Langfuse for agents

Langfuse gives coding agents like Claude Code, Codex, and Cursor direct access to your LLM application data. Hand your agent the Langfuse skill, and it can instrument an app with tracing, query production traces, manage prompts, and set up evaluations, all from the editor or terminal.

There are three ways to connect an agent to Langfuse. They share the same underlying API, so pick the one that fits how your agent works:

Agent Skill: a playbook your coding agent loads to learn how to use Langfuse correctly.
CLI: full REST API coverage from the command line, for agents that can run bash.
MCP server: a Model Context Protocol server, for agents that can't run shell commands.

This page is also available as markdown. Append .md to the URL or send an Accept: text/markdown header to get a clean, token-efficient version for your agent: langfuse.com/agents.md.

Start in 30 seconds

Install the Langfuse Agent Skill, then describe what you want in plain language. The skill teaches your agent how to instrument code, query traces, manage prompts, and run evaluators.

Ask your coding agent to install the skill by pointing it at the GitHub repository.

Agent instruction

"Install the Langfuse Agent Skill from github.com/langfuse/skills."

Langfuse has a Cursor Plugin that includes the skill automatically.

Install Plugin in Cursor

Install via npm (skills CLI):

npx skills add langfuse/skills --skill "langfuse"

If you want to target a specific agent directly:

npx skills add langfuse/skills --skill "langfuse" --agent "<agent-id>"

Alternatively, you can clone the skill manually.

Clone the repo somewhere stable:

git clone https://github.com/langfuse/skills.git /path/to/langfuse-skills

Make sure your agent's skills directory exists:

mkdir -p /path/to/<agent-skill-root>/skills

Symlink the skill folder:

ln -s /path/to/langfuse-skills/skills/langfuse /path/to/<agent-skill-root>/skills/langfuse

Once it's installed, prompt your agent directly:

Show me the last 10 traces with a score below 0.5
Add Langfuse tracing to the OpenAI calls in src/agent.ts
Migrate the system prompt in src/agent.ts to Langfuse prompt management
Create a dataset called "edge-cases" and add these three failing traces to it

Agent Skill

The Agent Skill follows the open Agent Skills standard and works across Claude Code, Cursor, Codex, Windsurf, and other compatible agents. It is open source on GitHub.

A skill is a self-contained folder with a SKILL.md entrypoint plus reference docs for specific workflows. Coding agents produce better results with it installed because they follow Langfuse's documented best practices instead of guessing the API.

Those reference docs include a guide on using the Langfuse CLI, so the skill teaches your agent the exact commands to query traces, manage prompts, and run evaluators rather than leaving it to improvise HTTP requests.

The skill uses progressive disclosure: only the frontmatter sits in the agent's context, and the full instructions load on demand when a task is relevant. Context stays small until the agent actually needs the details.
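To make progressive disclosure concrete, here is a minimal Python sketch of how an agent harness might split a SKILL.md file into a small frontmatter block and a body that loads later. It assumes the frontmatter is a run of simple `key: value` lines between two `---` delimiters. The skill name and text below are hypothetical, and real agents may parse the file differently.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md-style document into (frontmatter, body).

    Assumption: frontmatter is a block of simple `key: value` lines
    delimited by `---` lines at the very top of the file.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        # No closing delimiter: treat the whole file as body.
        return {}, text
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


# Hypothetical skill file, used only for this sketch.
SKILL_MD = """---
name: example-skill
description: Hypothetical skill used only for this sketch.
---
# Instructions

Full instructions live here and load only when a task is relevant.
"""

meta, body = split_frontmatter(SKILL_MD)
context_at_startup = meta      # small: only name and description
context_when_relevant = body   # full instructions, fetched on demand
```

The point of the split is that `context_at_startup` costs a few dozen tokens per skill, while `context_when_relevant` is only paid for when the agent decides the skill applies.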

Set up the Agent Skill

Skills repository on GitHub

CLI

The langfuse-cli wraps the entire Langfuse REST API. It is generated from the OpenAPI spec, so every endpoint becomes a command with proper flags, validation, and help text. The skill uses it under the hood, and your agent can call it directly when it can run bash.

Run it without installing:

npx langfuse-cli api <resource> <action>

Agents discover the surface area through built-in help, which keeps token usage low:

npx langfuse-cli api __schema                   # list all resources
npx langfuse-cli api <resource> --help          # list actions for a resource
npx langfuse-cli api <resource> <action> --help # show args for an action

Because the CLI is generated from the spec, it stays in sync with the API automatically and covers every endpoint, including full-text search over observations.
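If an agent harness drives the CLI from code rather than typing commands, the three discovery commands above could be wrapped roughly as follows. This is a sketch in Python: `help_argv` and `run_help` are hypothetical helper names, and the resource and action names in the usage lines are placeholders rather than confirmed CLI resources.

```python
import shutil
import subprocess
from typing import Optional


def help_argv(resource: Optional[str] = None, action: Optional[str] = None) -> list[str]:
    """Build the argv for one of the three discovery commands."""
    base = ["npx", "langfuse-cli", "api"]
    if resource is None:
        if action is not None:
            raise ValueError("an action requires a resource")
        return base + ["__schema"]            # list all resources
    if action is None:
        return base + [resource, "--help"]    # list actions for a resource
    return base + [resource, action, "--help"]  # show args for an action


def run_help(resource: Optional[str] = None, action: Optional[str] = None,
             timeout: float = 60.0) -> str:
    """Run a discovery command and return its stdout as text."""
    if shutil.which("npx") is None:
        raise RuntimeError("npx not found on PATH")
    result = subprocess.run(
        help_argv(resource, action),
        capture_output=True, text=True, check=True, timeout=timeout,
    )
    return result.stdout


# "traces" and "list" are placeholder names for illustration only.
top_level = help_argv()
per_resource = help_argv("traces")
per_action = help_argv("traces", "list")
```

Walking the help output level by level means the agent reads only the part of the API surface it needs for the current task.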

CLI reference

langfuse-cli on npm

MCP server

For agents that can't run shell commands, Langfuse exposes a Model Context Protocol server. It authenticates with your Langfuse API keys and covers most of the platform: observations, metrics, scores, score configs, datasets, dataset items, dataset runs, dataset run items, prompts, comments, annotation queues, models, media, and health.

Agents such as Claude Cowork, Linear agents, or custom internal tools can investigate a production issue, pull the relevant observation, query metrics, leave a comment for the team, create a score, or stage a dataset item for regression testing, without leaving the chat.

MCP client config

{
  "mcpServers": {
    "langfuse": {
      "url": "https://cloud.langfuse.com/api/public/mcp",
      "transport": "streamableHttp",
      "headers": {
        "Authorization": "Basic <base64(LANGFUSE_PUBLIC_KEY:LANGFUSE_SECRET_KEY)>"
      }
    }
  }
}
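The `Authorization` value in the config above is standard HTTP Basic auth: the public and secret key joined by a colon, then base64-encoded. A small Python sketch for generating it follows. The helper names are hypothetical, and the returned dictionary simply mirrors the config shown above rather than any particular client's schema.

```python
import base64
import json


def basic_auth_header(public_key: str, secret_key: str) -> str:
    """Return the HTTP Basic auth value for a public/secret key pair."""
    raw = f"{public_key}:{secret_key}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def mcp_client_config(public_key: str, secret_key: str,
                      url: str = "https://cloud.langfuse.com/api/public/mcp") -> dict:
    """Build a config dictionary shaped like the example above."""
    return {
        "mcpServers": {
            "langfuse": {
                "url": url,
                "transport": "streamableHttp",
                "headers": {
                    "Authorization": basic_auth_header(public_key, secret_key)
                },
            }
        }
    }


# Dummy credentials for illustration; use your own Langfuse API keys.
example_header = basic_auth_header("user", "pass")
example_config_json = json.dumps(mcp_client_config("user", "pass"), indent=2)
```

Generating the file from environment variables at setup time, rather than committing the encoded value, keeps the secret key out of version control.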

To keep an agent read-only, allow-list the lookup tools and leave out the write tools.

Skill, CLI, or MCP? Use the Skill to give a coding agent the full Langfuse playbook. Use the CLI when the agent can run bash and pre-filter data before it hits the context window. Use the MCP server when the agent can't run shell commands.
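Pre-filtering is the main reason to prefer the CLI when bash is available: the agent can trim a response before any of it enters the context window. The Python sketch below shows the idea on already-parsed records. The field names and record shape are hypothetical and do not reflect the actual Langfuse response schema.

```python
from typing import Any, Iterable


def compact(records: Iterable[dict[str, Any]], fields: list[str],
            max_chars: int = 200) -> list[dict[str, Any]]:
    """Keep only the selected fields and truncate long string values."""
    out: list[dict[str, Any]] = []
    for record in records:
        row: dict[str, Any] = {}
        for field in fields:
            if field not in record:
                continue
            value = record[field]
            if isinstance(value, str) and len(value) > max_chars:
                value = value[:max_chars] + "..."
            row[field] = value
        out.append(row)
    return out


# Hypothetical records standing in for parsed CLI output.
records = [
    {"id": "t1", "name": "refund", "input": "x" * 1000, "metadata": {"a": 1}},
    {"id": "t2", "name": "search", "input": "short"},
]
trimmed = compact(records, ["id", "input"], max_chars=10)
```

Here `trimmed` carries two short rows instead of the full payloads, which is usually all the agent needs to decide which record to fetch in full.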

MCP server setup

MCP tool reference

Built to be fast for agents

Agents are impatient. They fan out many queries while working through a task, so slow responses waste both wall-clock time and tokens. Langfuse runs on ClickHouse, the columnar database built for fast analytical queries, which keeps reads quick even over hundreds of gigabytes of traces.

Full-text search via API. The matches operator on Observations API v2 gives agents token-based full-text search over trace inputs and outputs. In our benchmarks, a search that previously took 18 seconds and scanned 494 GB now returns in under half a second and reads less than a gigabyte. Your agent can pull the one trace where a refund failed out of millions of observations without scrolling.

A REST API that covers everything. Every Langfuse feature has an endpoint, so an agent can read traces, write scores, manage prompts, and run dataset experiments programmatically. The CLI and MCP server both build on this same API.

Docs built for agents

When an agent integrates Langfuse, it needs to find the right documentation fast and read it without wading through HTML.

Markdown endpoints. Any docs page is available as clean markdown. Append .md to the path or send an Accept: text/markdown header. Markdown pages are roughly 3 to 5 times smaller in tokens than the HTML equivalent.

```
curl https://langfuse.com/docs/observability/overview.md
```

llms.txt. A machine-readable index of every docs page, so an agent can orient itself before fetching anything.

```
curl -s https://langfuse.com/llms.txt
```

Public docs search. A RAG endpoint over all docs and indexed GitHub issues and discussions, no authentication required. Useful when the agent doesn't know which page to read.

```
curl -s "https://langfuse.com/api/search-docs?query=How+do+I+trace+LangGraph+agents"
```

Docs MCP server. The same docs search and page retrieval over MCP, for clients that prefer it.
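For agents that fetch docs from code instead of curl, the markdown endpoints and the docs search described above can be wrapped in two small helpers. This Python sketch only builds the requests and does not send them. The function names are hypothetical, while the URLs, the `Accept` header, and the `query` parameter come from the examples on this page.

```python
from urllib.parse import urlencode
from urllib.request import Request

DOCS_BASE = "https://langfuse.com"


def markdown_request(path: str, use_header: bool = False) -> Request:
    """Build (but do not send) a request for the markdown version of a page.

    With use_header=False the `.md` suffix is appended to the path;
    with use_header=True the path is unchanged and an Accept header is set.
    """
    path = "/" + path.strip("/")
    if use_header:
        return Request(DOCS_BASE + path, headers={"Accept": "text/markdown"})
    return Request(DOCS_BASE + path + ".md")


def search_docs_url(query: str) -> str:
    """Build the public docs search URL for a natural-language query."""
    return f"{DOCS_BASE}/api/search-docs?{urlencode({'query': query})}"


by_suffix = markdown_request("/docs/observability/overview")
by_header = markdown_request("/docs/observability/overview", use_header=True)
search_url = search_docs_url("How do I trace LangGraph agents")
```

Passing either request to `urllib.request.urlopen` would fetch the page. The header variant is handy when the agent already has a canonical URL and should not rewrite it.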

Wire up evaluation gates

Once an agent can read and write Langfuse data, it can help you measure quality, not just ship code. A few things worth pointing your agent at:

LLM-as-a-judge calibration. Have your agent check whether your judge agrees with your human annotators, and report accuracy, F1, precision, and recall.
Code evaluators. Deterministic checks like JSON validity, schema conformance, or required tool arguments, written as a small function and scored automatically.
Experiments in CI/CD. A GitHub Action that runs your Langfuse experiments on every pull request and fails the build when scores drop below your threshold.
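The three ideas above reduce to a few small functions. The Python sketch below is illustrative and does not use the Langfuse SDK or the GitHub Action: it computes judge-versus-human agreement for binary labels, scores one output with a deterministic JSON check, and turns a list of scores into a pass or fail exit code. All function names and the threshold value are assumptions made for this example.

```python
import json


def judge_agreement(human: list[bool], judge: list[bool]) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 of judge labels against human labels."""
    if len(human) != len(judge) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(human, judge))
    tp = sum(1 for h, j in pairs if h and j)
    fp = sum(1 for h, j in pairs if not h and j)
    fn = sum(1 for h, j in pairs if h and not j)
    tn = sum(1 for h, j in pairs if not h and not j)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def json_evaluator(output: str, required_keys: tuple[str, ...] = ()) -> float:
    """Return 1.0 if output is a JSON object with every required key, else 0.0."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(parsed, dict):
        return 0.0
    return 1.0 if all(key in parsed for key in required_keys) else 0.0


def gate(scores: list[float], threshold: float) -> int:
    """Return a process exit code: 0 if the mean score meets the threshold, else 1."""
    if not scores:
        return 1  # no scores means nothing was verified, so fail closed
    return 0 if sum(scores) / len(scores) >= threshold else 1


metrics = judge_agreement([True, True, False, False], [True, False, True, False])
score = json_evaluator('{"tool": "refund", "args": {}}', ("tool", "args"))
exit_code = gate([1.0, 0.0, 1.0], threshold=0.6)
```

In a CI job, calling `sys.exit(gate(scores, threshold))` as the last step is enough to fail the build when quality drops.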

Resources

Will you be my CLI? — the full story behind the agent skill, CLI, markdown endpoints, and MCP servers
Building Langfuse's MCP server
Using agent skills to improve your prompts
Evaluating AI agent skills
Agent Skill docs · CLI docs · MCP server docs
Skills on GitHub · langfuse-cli on npm · MCP reference
