Launch Week 4
A week of new feature releases focused on collaboratively tracing, evaluating, and iterating on agents
It’s back! Starting Monday, November 3rd, we’re dropping a new feature every single day for five days.
This launch brings deeper insights into your agent applications, improved team collaboration, a big leap in experimentation and evaluation, and more ways to integrate with your favorite tools and frameworks.
We’ll unwrap a new feature each day:
Day 1: New Filters for Tables and API
Many users have millions of traces and observations. We’ve made it easier to filter and search for the data you need.
→ Learn more about filters in the UI and in the API
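If you prefer to pull filtered data programmatically, here is a minimal sketch against the public traces endpoint. It uses long-standing query parameters (userId, tags, limit) rather than the new filter syntax, which is documented behind the link above; the host and key handling are assumptions about a typical setup.

```python
# Minimal sketch: pull a filtered page of traces via the Langfuse public API.
# Uses basic query parameters, not the new advanced filter syntax.
import os
import requests

LANGFUSE_HOST = os.environ.get("LANGFUSE_HOST", "https://cloud.langfuse.com")

resp = requests.get(
    f"{LANGFUSE_HOST}/api/public/traces",
    # Public API uses basic auth: public key as username, secret key as password.
    auth=(os.environ["LANGFUSE_PUBLIC_KEY"], os.environ["LANGFUSE_SECRET_KEY"]),
    params={
        "userId": "user_123",   # only traces from this user (illustrative id)
        "tags": "production",   # only traces carrying this tag
        "limit": 50,            # page size
    },
    timeout=30,
)
resp.raise_for_status()

for trace in resp.json()["data"]:
    print(trace["id"], trace.get("name"), trace.get("latency"))
```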
Day 2: Collaborate with your team directly in Langfuse
Comments now support @mentions and emoji reactions, making it easier to collaborate with your team directly in Langfuse. Tag teammates to bring their attention to specific sessions, traces, observations, or prompts, and use reactions to quickly acknowledge insights without adding another comment.
→ Learn more about comments
We teamed up with Mixpanel to integrate LLM-related product metrics into your existing Mixpanel dashboards. This integration makes it easy to combine your regular product analytics with the LLM-specific metrics that Langfuse generates.
→ Get started with the Mixpanel integration here
Day 3: Langfuse for Agents
We’re introducing a set of upgrades to make complex agents radically easier to understand and debug:
- Agent Tools now surface all tools available to the LLM at the top of each generation, with clickable definitions. The Chat UI shows called tools, their arguments, and call IDs aligned with the tools list, so you can quickly verify the right ones were used.
- A new Trace Log View lets you skim every agent step in a single concatenated stream, making it easy to find specific details in loopy, verbose agents.
- Expanded Observation Types make it clear what each span represents, from tool calls to embeddings to agent steps.
- And with Agent Graphs now Generally Available for any framework or custom instrumentation, we infer graph structure from observation timings and nesting to visualize the real execution flow of your agents, especially in complex, looping scenarios.
We also added a new guide on how to evaluate LLM agents and their tools with Langfuse.
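As a rough illustration of how typed observations could be emitted from code, here is a hedged sketch using the Python SDK's observe decorator. The specific as_type values ("agent", "tool") are assumptions tied to the expanded observation types above; check your SDK version for the exact names it supports.

```python
# Hedged sketch: typed observations from instrumented code.
# The @observe decorator is part of the Langfuse Python SDK; the as_type
# values below are assumptions based on the expanded observation types.
from langfuse import observe


@observe(as_type="tool")
def search_docs(query: str) -> list[str]:
    # A tool the agent can invoke; typed spans make it easy to spot in the
    # trace log view and agent graph.
    return [f"result for {query}"]


@observe(as_type="agent")
def support_agent(question: str) -> str:
    # One agent step that calls a tool; nesting and timing drive the graph.
    hits = search_docs(question)
    return f"Answer based on {len(hits)} documents."


support_agent("How do I rotate my API keys?")
```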
Day 4: Experiments in Langfuse
We’re adding a set of new features to Dataset Experiments in Langfuse:
- Annotations in Compare View to add scores and comments directly alongside experiment results.
- Baseline Comparison to set a specific run as a baseline and identify regressions in newer runs.
- Compare View Filters to filter experiment results by criteria such as evaluator scores.
- Experiment Runner SDK, a high-level SDK abstraction for automatic tracing, concurrent execution, and flexible evaluation.
We also added guides on systematically interpreting experiment results and integrating Langfuse into CI/CD pipelines for automated testing.
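To give a feel for the Experiment Runner abstraction, here is a hedged sketch in Python. The overall shape (a task function, evaluators, a run over dataset items) follows the feature described above, but parameter and class names may differ in your SDK version, and the dataset name and agent are hypothetical.

```python
# Hedged sketch of the Experiment Runner abstraction: run a task over dataset
# items, score each result with evaluators, and record the run in Langfuse.
# Exact names may differ from your SDK version; treat them as assumptions.
from langfuse import get_client, Evaluation

langfuse = get_client()


def my_agent(question: str) -> str:
    # Stand-in for your real agent / LLM call.
    return f"echo: {question}"


def task(*, item, **kwargs):
    # Run the agent on one dataset item and return its output.
    return my_agent(item.input)


def exact_match(*, input, output, expected_output, **kwargs):
    # Simple evaluator: score 1.0 when the output matches the expectation.
    return Evaluation(name="exact_match", value=float(output == expected_output))


dataset = langfuse.get_dataset("support-agent/tool-calls")  # hypothetical name
result = langfuse.run_experiment(
    name="gpt-4o-baseline",
    data=dataset.items,
    task=task,
    evaluators=[exact_match],
)
print(result)
```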
Day 5: Score Analytics
Today we’re launching Score Analytics, a simple way to measure and align your evaluators.
Quickly answer questions like “Is my LLM-as-a-judge actually measuring what I expect?” and “How well does user feedback match our manually annotated data?”
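One hedged sketch of how the underlying data gets there: attach both a user-feedback score and an LLM-as-a-judge score to the same trace via the SDK, so Score Analytics can compare them. The score names, values, and trace id below are illustrative.

```python
# Hedged sketch: record two scores on the same trace so they can be compared,
# e.g. end-user feedback vs. an automated LLM-as-a-judge verdict.
from langfuse import get_client

langfuse = get_client()
trace_id = "abc123"  # hypothetical trace id from your application

# Thumbs-up/down collected from the end user.
langfuse.create_score(
    trace_id=trace_id,
    name="user_feedback",
    value=1,
    data_type="BOOLEAN",
)

# Verdict produced by an automated judge for the same trace.
langfuse.create_score(
    trace_id=trace_id,
    name="llm_judge_helpfulness",
    value=0.8,
    comment="Judge: answer addresses the question but misses an edge case.",
)
```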
Day 6: Datasets in Langfuse
Dataset Schema Enforcement: We are launching schema enforcement for dataset inputs and expected outputs. You can now define a schema that is enforced on all existing and new dataset items. This guarantees a consistent data structure, making datasets reliable and easier to consume in your experimentation workflows via the UI or the SDK.
Dataset Folders: As agents mature, test datasets multiply. Dataset Folders help you organize this complexity. Simply add slashes to your dataset names to automatically create folders. This allows you to structure datasets by agent capability, pipeline stage, or any workflow.
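A minimal sketch of both features together, assuming the existing create_dataset and create_dataset_item SDK methods: a slash in the dataset name creates the folder, and items keep a consistent input / expected_output structure. The dataset name and fields are illustrative.

```python
# Hedged sketch: folder-style dataset names and consistently structured items.
from langfuse import get_client

langfuse = get_client()

# "support-agent/" becomes a folder grouping this agent's test datasets.
langfuse.create_dataset(name="support-agent/tool-calls")

# Items keep the same input / expected_output shape, which is what a dataset
# schema would enforce across existing and new items.
langfuse.create_dataset_item(
    dataset_name="support-agent/tool-calls",
    input={"question": "How do I rotate my API keys?"},
    expected_output={"tool": "search_docs", "answer_contains": "rotate"},
)
```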