Hassieb Pakzad
September 17, 2025
Experiment Runner SDK

New high-level SDK abstraction for running experiments on datasets with automatic tracing, concurrent execution, and flexible evaluation.
Both the Python and JS/TS SDKs now provide a high-level abstraction for running experiments on datasets. The dataset can be either local or hosted on Langfuse. The experiment runner is the recommended way to run experiments on datasets with our SDKs.
Key Features
The experiment runner automatically handles:
- Concurrent execution of tasks with configurable limits
- Automatic tracing of all executions for observability
- Flexible evaluation with both item-level and run-level evaluators (see the sketch after the example below)
- Error isolation so individual failures don’t stop the experiment
- Traces in Langfuse even if the core task function is not instrumented with automatic input/return value capture
- Dataset integration for easy comparison and tracking
Example
```python
from langfuse import get_client
from langfuse.openai import OpenAI

# Initialize client
langfuse = get_client()


# Define your task function
def my_task(*, item, **kwargs):
    question = item["input"]
    response = OpenAI().chat.completions.create(
        model="gpt-4.1", messages=[{"role": "user", "content": question}]
    )
    return response.choices[0].message.content


# Run experiment on local data
local_data = [
    {"input": "What is the capital of France?"},
    {"input": "What is the capital of Germany?"},
]

result = langfuse.run_experiment(
    name="Geography Quiz",
    description="Testing basic functionality",
    data=local_data,
    task=my_task,
)

# Pretty print results
print(result.format())
```

This prints:
```
1. Item 1:
   Input: What is the capital of France?
   Actual: The capital of France is Paris.
   Trace ID: e52488cb13d426f55a2a7c178d4cb0d0

2. Item 2:
   Input: What is the capital of Germany?
   Actual: The capital of Germany is **Berlin**.
   Trace ID: 188cd8fc165446fa957a7c15423cbe0e

──────────────────────────────────────────────────
📊 Geography Quiz - Testing basic functionality
2 items
```
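To sketch how evaluation and concurrency plug into the example above: an item-level evaluator scores each task output, while a run-level evaluator aggregates over all item results. The snippet below assumes the `Evaluation` helper and the `evaluators`, `run_evaluators`, and `max_concurrency` parameters of the experiment runner; it reuses `my_task` and `local_data` from the example.

```python
from langfuse import Evaluation


# Item-level evaluator: runs once per dataset item
def contains_city(*, input, output, expected_output=None, **kwargs):
    # Illustrative check: does the answer mention one of the expected capitals?
    capitals = ["Paris", "Berlin"]
    hit = any(city in (output or "") for city in capitals)
    return Evaluation(name="contains_city", value=1.0 if hit else 0.0)


# Run-level evaluator: runs once over all item results
def average_output_length(*, item_results, **kwargs):
    lengths = [len(r.output or "") for r in item_results]
    return Evaluation(name="avg_output_length", value=sum(lengths) / max(len(lengths), 1))


result = langfuse.run_experiment(
    name="Geography Quiz",
    description="With item- and run-level evaluators",
    data=local_data,
    task=my_task,
    evaluators=[contains_city],              # scored per item
    run_evaluators=[average_output_length],  # scored once per run
    max_concurrency=5,                       # cap concurrent task executions
)
print(result.format())
```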
Get Started
Learn more about the experiment runner, including how to use it with Langfuse datasets and how to add evaluators, in our remote dataset runs documentation.
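For illustration, a run against a Langfuse-hosted dataset might look like the following sketch. It assumes a dataset named "geography-questions" already exists in your project and that the dataset client exposes the `run_experiment` method described in the linked docs; `my_task` is the task function from the example above.

```python
# Fetch a dataset managed in Langfuse and run the same task against it
dataset = langfuse.get_dataset("geography-questions")

result = dataset.run_experiment(
    name="Geography Quiz - hosted",
    description="Same task, items managed in Langfuse",
    task=my_task,
)
print(result.format())
```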