Experiments Overview
Experiments allow you to systematically test your LLM application using a dataset, enabling you to evaluate and compare its performance.
Each experiment is based on a Dataset containing inputs and, optionally, expected outputs. This Dataset can be either local or hosted on Langfuse. For each input, the experiment runs a task function—this could be your LLM application when using Experiments via SDK, or a prompt sent to a model when using Experiments via UI.
The results can be assessed and scored using various Evaluation Methods.
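Below is a minimal sketch of the SDK flow (Local/CI Execution against a Langfuse Dataset). It assumes the Langfuse Python SDK v2 dataset API, a hypothetical dataset named `my-eval-dataset`, a placeholder task function, and a simple exact-match score; consult the SDK reference for the current interface.

```python
from langfuse import Langfuse

langfuse = Langfuse()  # assumes LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY are set in the environment


def my_task(input: str) -> str:
    # Placeholder for your LLM application; replace with your own logic
    return f"echo: {input}"


dataset = langfuse.get_dataset("my-eval-dataset")  # hypothetical dataset name

for item in dataset.items:
    # Create a trace for this run and execute the task on the item's input
    trace = langfuse.trace(name="experiment-item")
    output = my_task(item.input)
    trace.update(input=item.input, output=output)

    # Link the trace to the dataset item under a named experiment run
    item.link(trace, run_name="experiment-v1")

    # Score the run, e.g. exact match against the expected output
    trace.score(
        name="exact_match",
        value=1.0 if output == item.expected_output else 0.0,
    )

langfuse.flush()
```

Each linked trace appears under the experiment run in Langfuse, where the attached scores can be compared across runs on the same dataset.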
Running Experiments
The matrix below shows the different experiment configurations based on where your data is hosted and where the experiment execution takes place:
|                      | Langfuse Execution  | Local/CI Execution   |
|----------------------|---------------------|----------------------|
| **Langfuse Dataset** | Experiments via UI  | Experiments via SDK  |
| **Local Dataset**    | Not supported       | Experiments via SDK  |
While optional, we recommend managing the underlying Datasets in Langfuse, as this enables [1] in-UI comparison tables across different experiments on the same data and [2] iterative improvement of the dataset based on production/staging traces.
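As a sketch of moving a local dataset into Langfuse (again assuming the Python SDK v2 and a hypothetical dataset name), you can create the dataset once and upload items individually:

```python
from langfuse import Langfuse

langfuse = Langfuse()

# Hypothetical local dataset: input / expected-output pairs
local_items = [
    {"input": "What is the capital of France?", "expected_output": "Paris"},
    {"input": "What is 2 + 2?", "expected_output": "4"},
]

# Create the dataset in Langfuse and upload the items
langfuse.create_dataset(name="my-eval-dataset")
for row in local_items:
    langfuse.create_dataset_item(
        dataset_name="my-eval-dataset",
        input=row["input"],
        expected_output=row["expected_output"],
    )
```

Once uploaded, the same dataset can be used for both UI- and SDK-based experiments, and new items can be added from production or staging traces over time.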