Launch Week 4
A week of new feature releases focused on collaboratively tracing, evaluating, and iterating on agents
It’s back! Starting Monday, November 3rd, we’re dropping a new feature every single day for five days.
This launch brings deeper insights into your agent applications, improved team collaboration, a big leap in experimentation and evaluation, and more ways to integrate with your favorite tools and frameworks.
We’ll unwrap a new feature each day:
Day 1: New Filters for Tables and API
Many users have millions of traces and observations. We’ve made it easier to filter and search for the data you need.
→ Learn more about filters in the UI and in the API
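If you prefer to pull filtered data programmatically, here is a minimal sketch against the public traces endpoint. It uses long-standing query parameters (userId, tags, limit) rather than the new filter syntax, which is documented behind the link above; the host and key handling are assumptions about a typical setup.

```python
# Minimal sketch: pull a filtered page of traces via the Langfuse public API.
# Uses basic query parameters, not the new advanced filter syntax.
import os
import requests

LANGFUSE_HOST = os.environ.get("LANGFUSE_HOST", "https://cloud.langfuse.com")

resp = requests.get(
    f"{LANGFUSE_HOST}/api/public/traces",
    # Public API uses basic auth: public key as username, secret key as password.
    auth=(os.environ["LANGFUSE_PUBLIC_KEY"], os.environ["LANGFUSE_SECRET_KEY"]),
    params={
        "userId": "user_123",   # only traces from this user (illustrative id)
        "tags": "production",   # only traces carrying this tag
        "limit": 50,            # page size
    },
    timeout=30,
)
resp.raise_for_status()

for trace in resp.json()["data"]:
    print(trace["id"], trace.get("name"), trace.get("latency"))
```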
Day 2: Collaborate with your team directly in Langfuse
Comments now support @mentions and emoji reactions, making it easier to collaborate with your team directly in Langfuse. Tag teammates to bring their attention to specific sessions, traces, observations, or prompts, and use reactions to quickly acknowledge insights without adding another comment.
→ Learn more about comments
We teamed up with Mixpanel to integrate LLM-related product metrics into your existing Mixpanel dashboards. This integration makes it easy to combine your regular product analytics with the LLM-specific metrics that Langfuse generates.
→ Get started with the Mixpanel integration here
Day 3: Langfuse for Agents
We’re introducing a set of upgrades to make complex agents radically easier to understand and debug:
- Agent Tools now surface all tools available to the LLM at the top of each generation, with clickable definitions. The Chat UI shows called tools, their arguments, and call IDs aligned with the tools list, so you can quickly verify the right ones were used.
- A new Trace Log View lets you skim every agent step in a single concatenated stream, making it easy to find specific details in loopy, verbose agents.
- Expanded Observation Types make it clear what each span represents, from tool calls to embeddings to agent steps.
- And with Agent Graphs now Generally Available for any framework or custom instrumentation, we infer graph structure from observation timings and nesting to visualize the real execution flow of your agents, especially in complex, looping scenarios.
We also added a new guide on how to evaluate LLM agents and their tools with Langfuse.
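As a rough illustration of how typed observations could be emitted from code, here is a hedged sketch using the Python SDK's observe decorator. The specific as_type values ("agent", "tool") are assumptions tied to the expanded observation types above; check your SDK version for the exact names it supports.

```python
# Hedged sketch: typed observations from instrumented code.
# The @observe decorator is part of the Langfuse Python SDK; the as_type
# values below are assumptions based on the expanded observation types.
from langfuse import observe


@observe(as_type="tool")
def search_docs(query: str) -> list[str]:
    # A tool the agent can invoke; typed spans make it easy to spot in the
    # trace log view and agent graph.
    return [f"result for {query}"]


@observe(as_type="agent")
def support_agent(question: str) -> str:
    # One agent step that calls a tool; nesting and timing drive the graph.
    hits = search_docs(question)
    return f"Answer based on {len(hits)} documents."


support_agent("How do I rotate my API keys?")
```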
Day 4: Experiments in Langfuse
We’re adding a set of new features to Dataset Experiments in Langfuse:
- Annotations in Compare View to add scores and comments directly alongside experiment results.
- Baseline Comparison to set a specific run as a baseline and identify regressions in newer runs.
- Compare View Filters to filter experiment results by criteria such as evaluator scores.
- Experiment Runner SDK, a high-level SDK abstraction for automatic tracing, concurrent execution, and flexible evaluation.
We also added guides on systematically interpreting experiment results and integrating Langfuse into CI/CD pipelines for automated testing.
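To give a feel for the Experiment Runner abstraction, here is a hedged sketch in Python. The overall shape (a task function, evaluators, a run over dataset items) follows the feature described above, but parameter and class names may differ in your SDK version, and the dataset name and agent are hypothetical.

```python
# Hedged sketch of the Experiment Runner abstraction: run a task over dataset
# items, score each result with evaluators, and record the run in Langfuse.
# Exact names may differ from your SDK version; treat them as assumptions.
from langfuse import get_client, Evaluation

langfuse = get_client()


def my_agent(question: str) -> str:
    # Stand-in for your real agent / LLM call.
    return f"echo: {question}"


def task(*, item, **kwargs):
    # Run the agent on one dataset item and return its output.
    return my_agent(item.input)


def exact_match(*, input, output, expected_output, **kwargs):
    # Simple evaluator: score 1.0 when the output matches the expectation.
    return Evaluation(name="exact_match", value=float(output == expected_output))


dataset = langfuse.get_dataset("support-agent/tool-calls")  # hypothetical name
result = langfuse.run_experiment(
    name="gpt-4o-baseline",
    data=dataset.items,
    task=task,
    evaluators=[exact_match],
)
print(result)
```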
Day 5: Score Analytics
Today we’re launching Score Analytics, a simple way to measure and align your evaluators.
Quickly answer questions like “Is my LLM-as-a-judge actually measuring what I expect?” and “How well does user feedback match our manually annotated data?”
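One hedged sketch of how the underlying data gets there: attach both a user-feedback score and an LLM-as-a-judge score to the same trace via the SDK, so Score Analytics can compare them. The score names, values, and trace id below are illustrative.

```python
# Hedged sketch: record two scores on the same trace so they can be compared,
# e.g. end-user feedback vs. an automated LLM-as-a-judge verdict.
from langfuse import get_client

langfuse = get_client()
trace_id = "abc123"  # hypothetical trace id from your application

# Thumbs-up/down collected from the end user.
langfuse.create_score(
    trace_id=trace_id,
    name="user_feedback",
    value=1,
    data_type="BOOLEAN",
)

# Verdict produced by an automated judge for the same trace.
langfuse.create_score(
    trace_id=trace_id,
    name="llm_judge_helpfulness",
    value=0.8,
    comment="Judge: answer addresses the question but misses an edge case.",
)
```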
Day 6: Datasets in Langfuse
Dataset Schema Enforcement: We are launching schema enforcement for dataset inputs and expected outputs. You can now define a schema that is enforced on all existing and new dataset items. This guarantees a consistent data structure, making datasets reliable and easier to consume in your experimentation workflows via the UI or the SDK.
Dataset Folders: As agents mature, test datasets multiply. Dataset Folders help you organize this complexity. Simply add slashes to your dataset names to automatically create folders. This allows you to structure datasets by agent capability, pipeline stage, or any workflow.
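A minimal sketch of both features together, assuming the existing create_dataset and create_dataset_item SDK methods: a slash in the dataset name creates the folder, and items keep a consistent input / expected_output structure. The dataset name and fields are illustrative.

```python
# Hedged sketch: folder-style dataset names and consistently structured items.
from langfuse import get_client

langfuse = get_client()

# "support-agent/" becomes a folder grouping this agent's test datasets.
langfuse.create_dataset(name="support-agent/tool-calls")

# Items keep the same input / expected_output shape, which is what a dataset
# schema would enforce across existing and new items.
langfuse.create_dataset_item(
    dataset_name="support-agent/tool-calls",
    input={"question": "How do I rotate my API keys?"},
    expected_output={"tool": "search_docs", "answer_contains": "rotate"},
)
```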