November 30, 2025

Langfuse November Update

Agent Tracing, Model Pricing Tiers in Cost Tracking, Score Analytics, Langfuse MCP Server & more

Marc Klingen

What a month!

We have just wrapped up Launch Week 4 (full recap here) with major updates to Agent Observability, Model Pricing Tiers in Cost Tracking, new Score Analytics to align Evaluations, a new hosted MCP server, and so much more…

Major Updates to Agent Observability

We have made agent tracing & evals much more powerful by introducing:

  • Improved tool call visibility, with inline details and arguments (available and selected tools)
  • A unified Trace Log View that simplifies scrolling and searching through agent observations
  • Additional observation types that add meaning to agent spans
  • Agent Graphs, now generally available, to visualize complex executions across agent frameworks and custom implementations
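To make this concrete, here is a minimal sketch of the new observation types, assuming a Langfuse Python SDK version whose @observe decorator accepts them via as_type; the functions below are hypothetical stand-ins for real agent code.

```python
# Sketch: tagging agent spans with observation types via @observe(as_type=...).
# Assumes an SDK version that supports the new types; `search_docs` and
# `run_agent` are hypothetical examples.
from langfuse import observe

@observe(as_type="tool")
def search_docs(query: str) -> str:
    # Tool calls surface inline with their arguments in the trace
    return f"results for: {query}"

@observe(as_type="agent")
def run_agent(task: str) -> str:
    # Nested observations show up in the unified Trace Log View and Agent Graph
    return search_docs(task)

run_agent("summarize open incidents")
```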

Agent Tools

Read more about the updates here

Model Pricing Tiers in Cost Tracking

Langfuse now supports pricing tiers for models with context-dependent pricing, enabling accurate cost calculation for models like Claude Sonnet 4.5, Gemini 2.5 Pro, and Gemini 3 Pro Preview, which charge different rates based on input token count. We have added pre-configured pricing tiers for three commonly used models. Alternatively, you can configure any number of custom pricing tiers via the Langfuse UI or API.
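As a back-of-the-envelope illustration of the tier logic: the rates and threshold below are made up, not Langfuse's pre-configured values, and the sketch assumes the provider bills the entire request at the higher rate once input crosses the threshold (as some long-context models do).

```python
# Illustrative sketch of context-dependent (tiered) input pricing.
# Rates are example values in USD per 1M input tokens, not real config.
def input_cost(input_tokens: int,
               base_rate: float = 1.25,           # e.g. <= 200k input tokens
               long_context_rate: float = 2.50,   # e.g. >  200k input tokens
               threshold: int = 200_000) -> float:
    rate = base_rate if input_tokens <= threshold else long_context_rate
    return input_tokens / 1_000_000 * rate

print(input_cost(150_000))  # 0.1875 -> billed at the base tier
print(input_cost(250_000))  # 0.625  -> whole request at the long-context tier
```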

Tiered Model Cost Graphic

Read more about the updates here

Score Analytics to align Evaluations

Score Analytics now provides comprehensive tools for analyzing and comparing evaluation scores across your LLM application. Whether you’re validating that different LLM judges agree, checking if human annotations align with automated evaluations, or exploring score distributions and trends, Score Analytics gives you the insights you need to trust your evaluation process.
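As an illustration of the kind of alignment check Score Analytics automates, here is a minimal sketch that computes the agreement between an LLM judge and human annotators by hand; the score values are made-up example data.

```python
# Sketch: do LLM-judge scores track human annotations? This is the sort of
# comparison Score Analytics surfaces in the UI. Scores are fabricated
# examples on a 0-1 scale.
from statistics import correlation  # Python 3.10+

llm_judge = [0.9, 0.4, 0.8, 0.2, 0.7, 0.6]
human     = [1.0, 0.5, 0.7, 0.1, 0.8, 0.6]

# A Pearson correlation close to 1.0 suggests the judge aligns with humans
print(f"agreement: {correlation(llm_judge, human):.2f}")
```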

Score Analytics Dashboard

Read more about the updates here

Schema Enforcement for Dataset Items

You can now add JSON Schema validation to your datasets to ensure all dataset items conform to the expected structure. This helps maintain data quality, catch errors early, and ensure consistency across your team when building and maintaining test datasets.
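For example, a dataset for a Q&A application might enforce a schema like the one below. This is a sketch with hypothetical field names, validated locally here with the jsonschema package to show the kind of error the check catches; Langfuse applies the schema to items on its side.

```python
# Sketch: a JSON Schema for dataset item inputs, validated locally with the
# `jsonschema` package (pip install jsonschema). Field names are hypothetical.
from jsonschema import validate, ValidationError

item_input_schema = {
    "type": "object",
    "properties": {
        "question": {"type": "string"},
        "context": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["question"],
    "additionalProperties": False,
}

good = {"question": "What is tracing?", "context": ["docs/tracing.md"]}
bad  = {"query": "typo'd field name"}  # missing "question", extra property

validate(good, item_input_schema)  # passes silently
try:
    validate(bad, item_input_schema)
except ValidationError as e:
    print("caught early:", e.message)
```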

Dataset Schema Enforcement

Read more about the updates here

MCP Server for Prompt Management

Langfuse now includes a hosted MCP server built directly into the platform (StreamableHTTP, authenticated with Langfuse API keys). You can use it to iterate on Prompts with Claude Code or let production agents fetch them dynamically as needed. We will extend it to the rest of the Langfuse data platform in the future.
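For instance, a production agent could connect to the hosted server with the official MCP Python client over streamable HTTP. The endpoint URL and credential format below are assumptions, so check the Langfuse docs for the exact values.

```python
# Sketch: connecting to the hosted Langfuse MCP server with the `mcp` client
# SDK. The URL and Authorization header format are assumptions, not the
# documented values.
import asyncio
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main():
    url = "https://cloud.langfuse.com/api/public/mcp"   # assumed endpoint
    headers = {"Authorization": "Bearer <LANGFUSE_API_KEY>"}  # assumed format
    async with streamablehttp_client(url, headers=headers) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])

asyncio.run(main())
```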

MCP Server

Read more about the updates here

New Integrations

Fixes & Improvements

  • Advanced Filtering for Public Traces and Observations API
  • Organize Your Datasets in Folders
  • Mentions and Reactions in Comments
  • Filter Sidebar in Table
  • Annotation Support in Experiment Compare View
  • Added IdP-Initiated SSO Support
  • UI: Enabled text selection in formatted views
  • UI: Dynamic row heights for trace tree
  • UI: Fixed data table rendering and scrolling
  • UX: Allow reusing deleted project names
  • UX: Fixed stars toggle state
  • API: Improved MCP authentication error codes
  • API: Stricter delete operation limits
  • Cost: Fixed trace-level cost display in datasets
  • Cost: Filtered repetition modeling from cost calculations
  • Data: Fixed API score deduplication
  • Performance: Virtualized sessions trace list
  • Public access: Fixed 401 errors for public traces
  • Integration: Fixed OpenTelemetry duplicate generations
  • Evaluations: Prevented duplicate evals
  • Exports: Handle empty tables in blob storage exports
  • Filters: Fixed special character handling in folder paths
  • Errors: Improved evaluator table error handling
  • Navigation: Better project access verification

Customer Story: Merck

The oldest pharmaceutical company in the world, Merck, is leading innovation in its field. Langfuse powers 80+ of their AI project teams globally, helping them ship large-scale AI applications and agents. Read the story.

Merck

"Generative AI will only earn enterprise trust when we can see what's happening under the hood. Langfuse enables us to track every prompt, response, cost, and latency in real time, turning black-box models into auditable, optimizable assets.
Walid Mehanna, Chief Data & AI Officer at Merck

Some Pointers

  • Our CEO Marc spoke on a panel at TypeScript AI Conf, where we chatted about tracing vs. evals, online vs. offline evals, OTEL, pre-AI vs. AI-native observability, and where the industry is heading (watch the session)
  • Due to popular demand, we recorded a workshop on Continuous Agent Evaluation using AWS Bedrock AgentCore and Langfuse with our friends at AWS (watch the workshop)
  • We relaunched our Guides Section in the Documentation, now featuring more helpful blog posts, videos, and notebooks on Agent Evals and Tracing! Go check it out and let us know what you think (here).