November 30, 2025

Langfuse November Update

Agent Tracing, Model Pricing Tiers in Cost Tracking, Score Analytics, Langfuse MCP Server & more

Marc Klingen

What a month!

We have just wrapped up Launch Week 4 (full recap here) with major updates to Agent Observability, Model Pricing Tiers in Cost Tracking, new Score Analytics to align Evaluations, a new hosted MCP server, and so much more…

Major Updates to Agent Observability

We have made agent tracing & evals much more powerful by introducing:

  • Improved tool call visibility, with inline details and arguments (available and selected tools)
  • A unified Trace Log View that simplifies scrolling and searching through agent observations
  • Additional observation types that add meaning to agent spans
  • Agent Graphs, now generally available, to visualize complex executions across agent frameworks and custom implementations
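To make this concrete, here is a minimal sketch of the new observation types, assuming a Langfuse Python SDK version whose @observe decorator accepts them via as_type; the functions below are hypothetical stand-ins for real agent code.

```python
# Sketch: tagging agent spans with observation types via @observe(as_type=...).
# Assumes an SDK version that supports the new types; `search_docs` and
# `run_agent` are hypothetical examples.
from langfuse import observe

@observe(as_type="tool")
def search_docs(query: str) -> str:
    # Tool calls surface inline with their arguments in the trace
    return f"results for: {query}"

@observe(as_type="agent")
def run_agent(task: str) -> str:
    # Nested observations show up in the unified Trace Log View and Agent Graph
    return search_docs(task)

run_agent("summarize open incidents")
```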

Agent Tools

Read more about the updates here

Model Pricing Tiers in Cost Tracking

Langfuse now supports pricing tiers for models with context-dependent pricing, enabling accurate cost calculation for models like Claude Sonnet 4.5, Gemini 2.5 Pro, and Gemini 3 Pro Preview, which charge different rates based on input token count. We have added pre-configured pricing tiers for three commonly used models. Alternatively, you can configure any number of custom pricing tiers via the Langfuse UI or API.
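As a back-of-the-envelope illustration of the tier logic: the rates and threshold below are made up, not Langfuse's pre-configured values, and the sketch assumes the provider bills the entire request at the higher rate once input crosses the threshold (as some long-context models do).

```python
# Illustrative sketch of context-dependent (tiered) input pricing.
# Rates are example values in USD per 1M input tokens, not real config.
def input_cost(input_tokens: int,
               base_rate: float = 1.25,           # e.g. <= 200k input tokens
               long_context_rate: float = 2.50,   # e.g. >  200k input tokens
               threshold: int = 200_000) -> float:
    rate = base_rate if input_tokens <= threshold else long_context_rate
    return input_tokens / 1_000_000 * rate

print(input_cost(150_000))  # 0.1875 -> billed at the base tier
print(input_cost(250_000))  # 0.625  -> whole request at the long-context tier
```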

Tiered Model Cost Graphic

Read more about the updates here

Score Analytics to align Evaluations

Score Analytics now provides comprehensive tools for analyzing and comparing evaluation scores across your LLM application. Whether you’re validating that different LLM judges agree, checking if human annotations align with automated evaluations, or exploring score distributions and trends, Score Analytics gives you the insights you need to trust your evaluation process.
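As an illustration of the kind of alignment check Score Analytics automates, here is a minimal sketch that computes the agreement between an LLM judge and human annotators by hand; the score values are made-up example data.

```python
# Sketch: do LLM-judge scores track human annotations? This is the sort of
# comparison Score Analytics surfaces in the UI. Scores are fabricated
# examples on a 0-1 scale.
from statistics import correlation  # Python 3.10+

llm_judge = [0.9, 0.4, 0.8, 0.2, 0.7, 0.6]
human     = [1.0, 0.5, 0.7, 0.1, 0.8, 0.6]

# A Pearson correlation close to 1.0 suggests the judge aligns with humans
print(f"agreement: {correlation(llm_judge, human):.2f}")
```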

Score Analytics Dashboard

Read more about the updates here

Schema Enforcement for Dataset Items

You can now add JSON Schema validation to your datasets to ensure all dataset items conform to the expected structure. This helps maintain data quality, catch errors early, and ensure consistency across your team when building and maintaining test datasets.
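For example, a dataset for a Q&A application might enforce a schema like the one below. This is a sketch with hypothetical field names, validated locally here with the jsonschema package to show the kind of error the check catches; Langfuse applies the schema to items on its side.

```python
# Sketch: a JSON Schema for dataset item inputs, validated locally with the
# `jsonschema` package (pip install jsonschema). Field names are hypothetical.
from jsonschema import validate, ValidationError

item_input_schema = {
    "type": "object",
    "properties": {
        "question": {"type": "string"},
        "context": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["question"],
    "additionalProperties": False,
}

good = {"question": "What is tracing?", "context": ["docs/tracing.md"]}
bad  = {"query": "typo'd field name"}  # missing "question", extra property

validate(good, item_input_schema)  # passes silently
try:
    validate(bad, item_input_schema)
except ValidationError as e:
    print("caught early:", e.message)
```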

Dataset Schema Enforcement

Read more about the updates here

MCP Server for Prompt Management

Langfuse now includes a hosted MCP server built directly into the platform (StreamableHTTP, authenticated with Langfuse API keys). You can use it to iterate on Prompts with Claude Code or let production agents fetch them dynamically as needed. We will extend it to the rest of the Langfuse data platform in the future.
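For instance, a production agent could connect to the hosted server with the official MCP Python client over streamable HTTP. The endpoint URL and credential format below are assumptions, so check the Langfuse docs for the exact values.

```python
# Sketch: connecting to the hosted Langfuse MCP server with the `mcp` client
# SDK. The URL and Authorization header format are assumptions, not the
# documented values.
import asyncio
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main():
    url = "https://cloud.langfuse.com/api/public/mcp"   # assumed endpoint
    headers = {"Authorization": "Bearer <LANGFUSE_API_KEY>"}  # assumed format
    async with streamablehttp_client(url, headers=headers) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])

asyncio.run(main())
```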

MCP Server

Read more about the updates here

New Integrations

Fixes & Improvements

  • Advanced Filtering for Public Traces and Observations API
  • Organize Your Datasets in Folders
  • Mentions and Reactions in Comments
  • Filter Sidebar in Table
  • Annotation Support in Experiment Compare View
  • Added IdP-Initiated SSO Support
  • UI: Enabled text selection in formatted views
  • UI: Dynamic row heights for trace tree
  • UI: Fixed data table rendering and scrolling
  • UX: Allow reusing deleted project names
  • UX: Fixed stars toggle state
  • API: Improved MCP authentication error codes
  • API: Stricter delete operation limits
  • Cost: Fixed trace-level cost display in datasets
  • Cost: Filtered repetition modeling from cost calculations
  • Data: Fixed API score deduplication
  • Performance: Virtualized sessions trace list
  • Public access: Fixed 401 errors for public traces
  • Integration: Fixed OpenTelemetry duplicate generations
  • Evaluations: Prevented duplicate evals
  • Exports: Handle empty tables in blob storage exports
  • Filters: Fixed special character handling in folder paths
  • Errors: Improved evaluator table error handling
  • Navigation: Better project access verification

Customer Story: Merck

The oldest pharmaceutical company in the world, Merck, is leading innovation in its field. Langfuse powers 80+ of their AI project teams globally, helping them ship large-scale AI applications and agents. Read the story.

Merck

"Generative AI will only earn enterprise trust when we can see what's happening under the hood. Langfuse enables us to track every prompt, response, cost, and latency in real time, turning black-box models into auditable, optimizable assets.
Walid Mehanna, Chief Data & AI Officer at Merck

Some Pointers

  • Our CEO Marc spoke on a panel at TypeScript AI Conf, where we chatted about tracing vs. evals, online vs. offline evals, OTEL, pre-AI vs. AI-native observability, and where the industry is heading (watch the session)
  • Due to popular demand, we recorded a workshop on Continuous Agent Evaluation using AWS Bedrock AgentCore and Langfuse with our friends at AWS (watch the workshop)
  • We relaunched our Guides Section in the Documentation, now featuring more helpful blog posts, videos, and notebooks on Agent Evals and Tracing! Go check it out and let us know what you think (here).