Compare Experiments Faster
Rebuilt experiment screens with faster loading, standalone access, and enhanced filtering for efficient analysis.
Experiments now load and filter faster, work as a standalone feature, and provide a more intuitive interface for comparing model versions. Run an A/B test between sonnet-4 and sonnet-4.5, compare evaluation scores across prompt variants, or triage regressions before shipping—all with quicker feedback loops.
This feature is in open beta and currently available on Langfuse Cloud only. Enable Fast Preview in the bottom left to get started.
What's new
Faster loading and filtering. Experiments now run on our rebuilt observation-centric data model. Tables and filters respond quickly even on large experiment runs.
Standalone experiments. Experiments no longer require a linked dataset. Runs executed against local data via the SDK now appear in the UI alongside dataset-backed experiments (see the sketch after this list).
Polished UI with extended filtering. A cleaner interface with visual deltas on scores, cost, and latency. Set a baseline, compare candidates side by side, and filter by score thresholds to quickly surface regressions.
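To give a rough idea of the standalone flow, here is a minimal sketch of running an experiment against local data with the Python SDK's experiment runner. The sample data, the task, and the exact_match evaluator are made-up placeholders, and the exact signatures (run_experiment, Evaluation, and the keyword-only task/evaluator arguments) may differ across SDK versions, so check the SDK reference before relying on them.

```python
from langfuse import Evaluation, get_client

# Assumes LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST are set.
langfuse = get_client()

# Local items only: no Langfuse dataset needs to exist for this run.
local_data = [
    {"input": "What is the capital of France?", "expected_output": "Paris"},
    {"input": "What is the capital of Japan?", "expected_output": "Tokyo"},
]

def my_task(*, item, **kwargs):
    # Placeholder task: swap in a real model call (e.g. your sonnet-4.5 candidate).
    return "Paris" if "France" in item["input"] else "Kyoto"

def exact_match(*, output, expected_output, **kwargs):
    # Returns one score per item; it surfaces as a score column in the experiment table.
    return Evaluation(name="exact_match", value=1.0 if output == expected_output else 0.0)

result = langfuse.run_experiment(
    name="capitals-smoke-test",
    data=local_data,
    task=my_task,
    evaluators=[exact_match],
)
# The run now shows up in the Experiments UI alongside dataset-backed runs.
```

From there you can set the run as a baseline or candidate in the UI and use the score-threshold filters to spot regressions.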
For a complete walkthrough on running experiments and interpreting results systematically, see our guide on Systematic Evaluation of AI Agents.