June 10, 2026

Manage evaluators via MCP

Tobias Wochinger

Set up evaluators and evaluation rules from AI agents through the Langfuse MCP server, and create code evaluators through the unstable public API.

You can now set up and manage evaluation directly from AI agents: the Langfuse MCP server exposes evaluators and evaluation rules as tools, and the unstable public API now supports code evaluators in addition to LLM-as-a-Judge.

This lets agents own more of the evaluation loop. For example, an agent can inspect failing traces, write a code evaluator that catches the failure pattern, and wire up an evaluation rule that runs it on live observations, all without leaving the chat.

New MCP tools

Evaluators

listEvaluators

getEvaluator

createEvaluator

Evaluation rules

listEvaluationRules

getEvaluationRule

createEvaluationRule

updateEvaluationRule

deleteEvaluationRule
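
MCP clients invoke tools by sending a JSON-RPC 2.0 request with the method tools/call. As a rough sketch of what an agent's client might send for the tools above, the snippet below builds such messages in Python. The tool names come from this post; the argument names (such as evaluatorId) are assumptions for illustration, not the server's confirmed schema. Check the tool descriptions returned by the server for the real one.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC expects one id per request.
_request_ids = count(1)


def mcp_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# `listEvaluators` is named in this post; calling it with no arguments
# is an assumption.
list_request = mcp_tool_call("listEvaluators", {})

# `evaluatorId` is a hypothetical argument name, and the value is a placeholder.
get_request = mcp_tool_call("getEvaluator", {"evaluatorId": "<evaluator-id>"})
```

In practice an MCP client library or the agent runtime constructs these messages for you; the sketch only shows the shape of the call.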

Code evaluators in the API

The unstable evaluator endpoints now accept type: "code" to create deterministic Python or TypeScript evaluators programmatically, alongside the existing llm_as_judge type. Evaluation rules can reference code evaluators, and active rules are test-run before creation so broken evaluator code is rejected upfront.
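
To make this more concrete, here is a minimal sketch of how a request body for a code evaluator might be assembled in Python. Only the type: "code" value comes from this post. The other field names (name, language, code), the evaluate function signature, and its return shape are assumptions for illustration; the Evaluators API reference has the actual schema. The local syntax check is a convenience in the spirit of the server-side test run, not a substitute for it.

```python
import ast
import json

# Hypothetical evaluator source. The function name, signature, and return
# shape are assumptions, not the documented contract.
EVALUATOR_SOURCE = '''
def evaluate(output: str) -> dict:
    """Flag outputs that leak an unrendered template placeholder."""
    leaked = "{{" in output and "}}" in output
    return {"score": 0.0 if leaked else 1.0}
'''


def build_code_evaluator_payload(name: str, source: str, language: str = "python") -> dict:
    """Build a request body for a code evaluator (field names are assumed)."""
    if language == "python":
        # Local syntax check only: raises SyntaxError before anything is sent.
        ast.parse(source)
    return {
        "name": name,
        "type": "code",  # the value named in this post
        "language": language,  # assumed field
        "code": source,  # assumed field
    }


payload = build_code_evaluator_payload("no-unrendered-placeholders", EVALUATOR_SOURCE)
request_body = json.dumps(payload)
```

Because the evaluator is deterministic, the same output always yields the same score, which makes it cheap to run on every matching observation.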

These endpoints and MCP tools are unstable and may change while the underlying evaluation data model is being redesigned. The UI workflow remains fully supported.

Get started

MCP server documentation

Code evaluators

Evaluators API reference
