March 31, 2026

Langfuse March Update

Agent Skill, Langfuse CLI, boolean and categorical LLM-as-a-Judge scores, Kiro integration, and more

Marc Klingen

Over the past months, we have been shipping a lot to make Langfuse much easier to use with your coding agents. If you are not using Langfuse through your coding agent yet, we strongly recommend giving it a spin!

Agent Skill


The Langfuse Agent Skill helps coding agents use Langfuse effectively. It follows the open Agent Skills standard and works with Claude Code, Cursor, Codex, and others.

Install it in one line:

npx skills add langfuse/skills --skill "langfuse"

Or just ask your coding agent to install it from github.com/langfuse/skills.

Once installed, your agent can query traces, create datasets, update prompts, migrate hardcoded prompts to Langfuse Prompt Management, and set up observability — all without leaving your editor. Even if you are already successfully using Langfuse, the Skill can help you improve your workflows and instrumentation.

Langfuse CLI


The Skill uses the Langfuse CLI under the hood. The CLI wraps the entire Langfuse API and is auto-generated from our OpenAPI spec, so it is always in sync. Every endpoint becomes a CLI command: traces, prompts, datasets, scores, sessions, metrics, and more.

Built for agents, but useful for humans too. Script your workflows, automate batch-scoring, or sync prompts across environments in CI/CD.
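As a sketch of what such scripting can look like, here is a minimal Python example of talking to the same public API the CLI wraps. The endpoint path, field names, and auth scheme are assumptions based on the Langfuse OpenAPI spec, not verified here:

```python
import base64

# Assumed Langfuse Cloud host; self-hosted deployments use their own URL.
LANGFUSE_HOST = "https://cloud.langfuse.com"

def basic_auth_header(public_key: str, secret_key: str) -> dict:
    """Langfuse's public API authenticates with HTTP basic auth:
    the project's public key is the username, the secret key the password."""
    token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

def score_payload(trace_id: str, name: str, value: float) -> dict:
    """Request body for attaching a numeric score to a trace
    (assumed shape for POST /api/public/scores)."""
    return {"traceId": trace_id, "name": name, "value": value}

# Batch-score a list of trace IDs, e.g. collected from a CLI query:
trace_ids = ["trace-1", "trace-2"]
payloads = [score_payload(t, "manual-review", 1.0) for t in trace_ids]
```

In practice you would let the CLI handle auth and pagination; the point is that everything the CLI does is plain HTTP you can also script directly.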


Further reading

Here are some pointers for what to do with the Skill and CLI:

  • Getting started with all Langfuse features. With the Skill it is incredibly easy to start using more of the Langfuse platform. Just tell your agent which feature you would like to try, and it can propose useful first use cases and start implementing them.
  • Automatic prompt improvement. Annotate a few traces in Langfuse, then let an agent fetch your feedback, analyze patterns, and propose prompt changes. A fast loop from rough to robust. → Read the guide
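The aggregation step of that prompt-improvement loop can be sketched in a few lines. The trace/annotation shape below is a simplified assumption; in practice an agent would fetch the real data via the Langfuse CLI or API:

```python
from collections import Counter

def failure_patterns(annotated_traces: list[dict], threshold: float = 0.5) -> Counter:
    """Count annotation comments on low-scoring traces so recurring
    failure modes surface before an agent proposes prompt changes."""
    patterns = Counter()
    for trace in annotated_traces:
        if trace["score"] < threshold:
            patterns[trace["comment"]] += 1
    return patterns

traces = [
    {"score": 0.2, "comment": "ignored system instructions"},
    {"score": 0.9, "comment": "good"},
    {"score": 0.1, "comment": "ignored system instructions"},
]
print(failure_patterns(traces).most_common(1))
# -> [('ignored system instructions', 2)]
```

The agent then takes the top patterns, drafts a prompt revision, and the loop repeats against fresh annotations.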

We have also learned a lot about building efficient Skills:

  • Evaluating skill quality. We used Langfuse datasets, tracing, and the Claude Agent SDK to systematically test and improve the Skill itself. Small details matter: a single comment saying "optional" instead of "mandatory" caused consistent agent failures. → Blog post
  • Optimizing skills with Autoresearch. We ran Karpathy's autoresearch on our prompt migration skill. Score went from 0.35 to 0.82. Not all changes were keepers, but the process surfaced failure modes we'd never have found manually. → Blog post
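The evaluation loop behind both posts reduces to: run the skill on each dataset item, grade the output, and track the aggregate score across iterations. Here is a toy sketch with stand-in skill and grading functions (everything here is illustrative, not the actual harness):

```python
def evaluate_skill(dataset: list[dict], run_skill, grade) -> float:
    """Average grade of the skill's output over all dataset items."""
    scores = [grade(run_skill(item["input"]), item["expected"]) for item in dataset]
    return sum(scores) / len(scores)

# Toy stand-ins: a "skill" that echoes its input, exact-match grading.
dataset = [
    {"input": "a", "expected": "a"},
    {"input": "b", "expected": "c"},
]
score = evaluate_skill(dataset, run_skill=lambda x: x,
                       grade=lambda out, exp: 1.0 if out == exp else 0.0)
print(score)  # -> 0.5
```

Rerunning this after each change to the skill is what turns anecdotes ("optional" vs. "mandatory") into measurable deltas like 0.35 to 0.82.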

Fixes & improvements

Upcoming events

