Manage evaluators via MCP
Set up evaluators and evaluation rules from AI agents through the Langfuse MCP server, and create code evaluators through the unstable public API.
You can now set up and manage evaluation directly from AI agents: the Langfuse MCP server exposes evaluators and evaluation rules as tools, and the unstable public API now supports code evaluators in addition to LLM-as-a-Judge.
This lets agents own more of the evaluation loop. For example, an agent can inspect failing traces, write a code evaluator that catches the failure pattern, and wire up an evaluation rule that runs it on live observations, all without leaving the chat.
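To make "a code evaluator that catches the failure pattern" concrete, here is a minimal Python sketch of such a deterministic check. The entry-point name `evaluate`, its single `output` argument, and the returned `{"score": ..., "comment": ...}` shape are all hypothetical; this is an illustration of the idea, not the contract Langfuse actually expects from code evaluators.

```python
import re

# Hypothetical failure pattern: the model refused instead of answering.
REFUSAL_PATTERN = re.compile(
    r"\b(I can(?:no|')t help|as an AI language model)\b", re.IGNORECASE
)


def evaluate(output: str) -> dict:
    """Hypothetical code-evaluator entry point.

    Returns a score of 0.0 when the output is empty or matches the refusal
    pattern, and 1.0 otherwise. The signature and return shape are assumptions.
    """
    text = (output or "").strip()
    if not text:
        return {"score": 0.0, "comment": "empty output"}
    if REFUSAL_PATTERN.search(text):
        return {"score": 0.0, "comment": "refusal detected"}
    return {"score": 1.0, "comment": "ok"}
```

Because the check is plain pattern matching, it gives the same score for the same output every time, which is what distinguishes a code evaluator from an LLM-as-a-Judge evaluator.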
New MCP tools
| Resource | Tools |
| --- | --- |
| Evaluators | listEvaluators, getEvaluator, createEvaluator |
| Evaluation rules | listEvaluationRules, getEvaluationRule, createEvaluationRule, updateEvaluationRule, deleteEvaluationRule |
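MCP clients invoke server tools with a JSON-RPC 2.0 `tools/call` request that carries the tool name and its arguments. The sketch below builds such a message for one of the tools listed above. The helper name `build_tool_call` is hypothetical, and calling `listEvaluators` with an empty argument object is an assumption: the real input schema for each tool is advertised by the server in its `tools/list` response.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool_name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message (hypothetical helper)."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Assumption: listEvaluators accepts an empty argument object.
# Check the tool's inputSchema from tools/list for its actual parameters.
request = build_tool_call(1, "listEvaluators", {})
print(request)
```

In practice an agent's MCP client builds and sends these messages itself; the sketch only shows what travels over the wire when the agent uses one of these tools.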
Code evaluators in the API
The unstable evaluator endpoints now accept type: "code" to create deterministic Python or TypeScript evaluators programmatically, alongside the existing llm_as_judge type. Evaluation rules can reference code evaluators, and active rules are test-run before creation so broken evaluator code is rejected upfront.
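The sketch below builds, but does not send, a request that creates a code evaluator. Only `type: "code"` comes from this changelog. The endpoint path and the `name`, `language`, and `code` field names are hypothetical placeholders, so consult the Langfuse API reference for the real request shape. The Basic auth header follows Langfuse's public API convention (public key as username, secret key as password).

```python
import base64
import json
import urllib.request

# Hypothetical endpoint path; check the Langfuse API reference for the real one.
EVALUATORS_PATH = "/api/public/unstable/evaluators"

# Evaluator source code sent as a string; its entry-point contract is also an assumption.
EVALUATOR_SOURCE = '''
def evaluate(output):
    return {"score": 0.0 if not output else 1.0}
'''


def build_create_code_evaluator_request(
    host: str, public_key: str, secret_key: str, name: str, source: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a code evaluator."""
    body = {
        "type": "code",        # from the changelog: a code evaluator rather than llm_as_judge
        "name": name,          # hypothetical field name
        "language": "python",  # hypothetical field name; Python or TypeScript per the changelog
        "code": source,        # hypothetical field name
    }
    # Basic auth: public key as username, secret key as password.
    token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    return urllib.request.Request(
        url=host.rstrip("/") + EVALUATORS_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_create_code_evaluator_request(
    "https://cloud.langfuse.com", "pk-lf-xxx", "sk-lf-xxx", "non-empty-output", EVALUATOR_SOURCE
)
# urllib.request.urlopen(req) would send it; omitted so the sketch has no side effects.
```

Since active rules are test-run before creation, a rule that references an evaluator whose code fails would be rejected at that point rather than failing later on live data.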
These endpoints and MCP tools are unstable and may change while the underlying evaluation data model is being redesigned. The UI workflow remains fully supported.