Datasets
A dataset is a collection of inputs and expected outputs and is used to test your application. Before executing your first dataset run, you need to create a dataset.
Why use datasets?
- Datasets are a prerequisite for Dataset Runs; they serve as the data input of Dataset Runs
- Create test cases for your application with real production traces
- Collaboratively create and collect dataset items with your team
- Have a single source of truth for your test data
Get Started
Creating a dataset
Datasets have a name which is unique within a project.
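from langfuse import Langfuse

# the client reads its credentials (e.g. LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY) from the environment
langfuse = Langfuse()

langfuse.create_dataset(
    name="<dataset_name>",
    # optional description
    description="My first dataset",
    # optional metadata
    metadata={
        "author": "Alice",
        "date": "2022-01-01",
        "type": "benchmark"
    }
)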
See Python SDK docs for details on how to initialize the Python client.
Create new dataset items
Dataset items can be added to a dataset by providing the input and optionally the expected output.
langfuse.create_dataset_item(
    dataset_name="<dataset_name>",
    # any python object or value, optional
    input={
        "text": "hello world"
    },
    # any python object or value, optional
    expected_output={
        "text": "hello world"
    },
    # metadata, optional
    metadata={
        "model": "llama3",
    }
)
See Python SDK v3 docs for details on how to initialize the Python client.
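When you already have a handful of test cases, for example in a list or a JSON file, you can loop over them and call create_dataset_item once per case. The sketch below assumes only a client object exposing create_dataset_item with the keyword arguments shown above; upload_test_cases, test_cases and RecordingClient are hypothetical names for illustration, and the recording stub stands in for a real Langfuse client so the example runs offline.

```python
from typing import Any, Iterable, Mapping, Protocol


class DatasetItemClient(Protocol):
    # Structural type: anything with a create_dataset_item(...) method,
    # such as an initialized Langfuse client.
    def create_dataset_item(self, **kwargs: Any) -> Any: ...


def upload_test_cases(
    client: DatasetItemClient,
    dataset_name: str,
    test_cases: Iterable[Mapping[str, Any]],
) -> int:
    """Create one dataset item per test case and return how many were created.

    Hypothetical helper: each test case is a dict with an "input" key and
    optional "expected_output" and "metadata" keys.
    """
    count = 0
    for case in test_cases:
        client.create_dataset_item(
            dataset_name=dataset_name,
            input=case["input"],
            expected_output=case.get("expected_output"),
            metadata=case.get("metadata"),
        )
        count += 1
    return count


class RecordingClient:
    """Offline stand-in that records calls instead of sending them."""

    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []

    def create_dataset_item(self, **kwargs: Any) -> None:
        self.calls.append(kwargs)


test_cases = [
    {"input": {"text": "hello world"}, "expected_output": {"text": "hello world"}},
    {"input": {"text": "good morning"}, "metadata": {"model": "llama3"}},
]

client = RecordingClient()
created = upload_test_cases(client, "<dataset_name>", test_cases)
```

With a real setup you would pass your initialized langfuse client instead of RecordingClient().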
Create synthetic datasets
Often you want to bootstrap your dataset with synthetic examples to test your application. LLMs are good at generating these when prompted for common questions or tasks.
To get started, have a look at this cookbook for examples of how to generate synthetic datasets:
Create items from production data
A common workflow is to select production traces where the application did not perform as expected and have an expert add the expected output. You can then test new versions of your application on the same data.
langfuse.create_dataset_item(
    dataset_name="<dataset_name>",
    input={"text": "hello world"},
    expected_output={"text": "hello world"},
    # link to a trace
    source_trace_id="<trace_id>",
    # optional: link to a specific span, event, or generation
    source_observation_id="<observation_id>"
)
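The selection step of this workflow can be sketched as a filter over trace records. The sketch assumes you have already exported traces as plain dicts with hypothetical "id", "input" and "score" fields (this is not the shape returned by any specific Langfuse API) and treats a score below a chosen threshold as "did not perform as expected". The corrections mapping is a hypothetical stand-in for the expected outputs an expert writes; traces without one get expected_output=None until they are reviewed.

```python
from typing import Any, Optional


def items_from_failed_traces(
    traces: list[dict[str, Any]],
    dataset_name: str,
    threshold: float,
    corrections: Optional[dict[str, Any]] = None,
) -> list[dict[str, Any]]:
    """Build create_dataset_item keyword arguments for low-scoring traces.

    `traces` uses hypothetical fields: "id", "input", "score".
    `corrections` maps a trace id to the expected output an expert wrote.
    """
    corrections = corrections or {}
    items = []
    for trace in traces:
        if trace["score"] >= threshold:
            continue  # performed as expected, skip
        items.append(
            {
                "dataset_name": dataset_name,
                "input": trace["input"],
                "expected_output": corrections.get(trace["id"]),
                "source_trace_id": trace["id"],  # link the item back to its trace
            }
        )
    return items


traces = [
    {"id": "trace-1", "input": {"text": "hello world"}, "score": 0.2},
    {"id": "trace-2", "input": {"text": "good morning"}, "score": 0.9},
    {"id": "trace-3", "input": {"text": "bonjour"}, "score": 0.4},
]
corrections = {"trace-1": {"text": "hello world"}}

items = items_from_failed_traces(
    traces, "<dataset_name>", threshold=0.5, corrections=corrections
)
# Each entry could then be passed as langfuse.create_dataset_item(**item).
```

Keeping source_trace_id on every item preserves the link back to the production trace the item came from.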
Edit/archive dataset items
You can edit or archive dataset items. Archiving items will remove them from future experiment runs.
You can upsert items by providing the id of the item you want to update.
langfuse.create_dataset_item(
    dataset_name="<dataset_name>",
    id="<item_id>",
    # example: update status to "ARCHIVED"
    status="ARCHIVED"
)
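To make the upsert and archive semantics concrete, here is a small in-memory model. It is not the Langfuse implementation: InMemoryDataset, Item, upsert and active_items are hypothetical names, and the sketch only assumes the behavior described above, namely that providing an existing id updates that item and that archived items are left out of future runs. The "ACTIVE" default status is an assumption of this sketch; only "ARCHIVED" appears in the example above.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Item:
    id: str
    input: Any = None
    expected_output: Any = None
    status: str = "ACTIVE"  # assumed default; only "ARCHIVED" appears in the docs


@dataclass
class InMemoryDataset:
    items: dict[str, Item] = field(default_factory=dict)

    def upsert(self, item_id: Optional[str] = None, **changes: Any) -> Item:
        """Create a new item, or update the existing one if `item_id` is known."""
        if item_id is not None and item_id in self.items:
            item = self.items[item_id]
            for key, value in changes.items():
                setattr(item, key, value)  # only the provided fields change
            return item
        item = Item(id=item_id or str(uuid.uuid4()), **changes)
        self.items[item.id] = item
        return item

    def active_items(self) -> list[Item]:
        """Items a future run would iterate over (archived ones excluded)."""
        return [i for i in self.items.values() if i.status != "ARCHIVED"]


ds = InMemoryDataset()
ds.upsert(item_id="item-1", input={"text": "hello world"}, expected_output={"text": "hello world"})
ds.upsert(item_id="item-2", input={"text": "good morning"})
ds.upsert(item_id="item-1", status="ARCHIVED")  # same id: update, not a new item
```

Because the third call reuses an existing id, the dataset still holds two items, and only item-2 remains eligible for future runs.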
Dataset runs
Once you have created a dataset, you can test and evaluate your application against it.
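Conceptually, a run executes your application on every active item and compares the result with the expected output. The sketch below shows that loop locally with plain Python data; it does not use the Langfuse Dataset Run API, and run_dataset, my_app and the exact-match scoring are hypothetical choices for illustration. In practice you would likely replace exact match with an evaluator suited to your task.

```python
from typing import Any, Callable


def run_dataset(items: list[dict[str, Any]], app: Callable[[Any], Any]) -> float:
    """Run `app` on each non-archived item and return the exact-match accuracy.

    Items without an expected output are executed but not scored.
    """
    scored = correct = 0
    for item in items:
        if item.get("status") == "ARCHIVED":
            continue  # archived items are excluded from runs
        output = app(item["input"])
        expected = item.get("expected_output")
        if expected is None:
            continue
        scored += 1
        correct += output == expected
    return correct / scored if scored else 0.0


def my_app(payload: dict[str, str]) -> dict[str, str]:
    # Hypothetical application under test: echoes the text back.
    return {"text": payload["text"]}


items = [
    {"input": {"text": "hello world"}, "expected_output": {"text": "hello world"}},
    {"input": {"text": "good morning"}, "expected_output": {"text": "Good morning!"}},
    {"input": {"text": "bonjour"}, "expected_output": {"text": "hello"}, "status": "ARCHIVED"},
]

accuracy = run_dataset(items, my_app)
```

Here the archived item is skipped, one of the two remaining items matches, so the accuracy is 0.5.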