Datasets

A dataset is a collection of inputs and expected outputs and is used to test your application. Both UI-based and SDK-based experiments support Langfuse Datasets.

Langfuse Dataset View

Why use datasets?

Create test cases for your application with real production traces
Collaboratively create and collect dataset items with your team
Have a single source of truth for your test data

Get Started

Creating a dataset

Datasets have a name which is unique within a project.

langfuse.create_dataset(
    name="<dataset_name>",
    # optional description
    description="My first dataset",
    # optional metadata
    metadata={
        "author": "Alice",
        "date": "2022-01-01",
        "type": "benchmark"
    }
)

See Python SDK docs for details on how to initialize the Python client.

import { LangfuseClient } from "@langfuse/client"
 
const langfuse = new LangfuseClient()
 
await langfuse.api.datasets.create({
  name: "<dataset_name>",
  // optional description
  description: "My first dataset",
  // optional metadata
  metadata: {
    author: "Alice",
    date: "2022-01-01",
    type: "benchmark",
  },
});

Navigate to Your Project > Datasets
Click on + New dataset to create a new dataset.

Create dataset

Upload or create new dataset items

Dataset items can be added to a dataset by providing the input and optionally the expected output. If preferred, dataset items can be imported using the CSV uploader in the Langfuse UI.

langfuse.create_dataset_item(
    dataset_name="<dataset_name>",
    # any python object or value, optional
    input={
        "text": "hello world"
    },
    # any python object or value, optional
    expected_output={
        "text": "hello world"
    },
    # metadata, optional
    metadata={
        "model": "llama3",
    }
)

See Python SDK docs for details on how to initialize the Python client.

import { LangfuseClient } from "@langfuse/client";
 
const langfuse = new LangfuseClient();
 
await langfuse.api.datasetItems.create({
  datasetName: "<dataset_name>",
  // any JS object or value
  input: {
    text: "hello world",
  },
  // any JS object or value, optional
  expectedOutput: {
    text: "hello world",
  },
  // metadata, optional
  metadata: {
    model: "llama3",
  },
});

See JS/TS SDK docs for details on how to initialize the JS/TS client.

Dataset Folders

Datasets can be organized into virtual folders to group datasets serving similar use cases. To create a folder, add slashes (/) to a dataset name. The UI shows every segment ending with a / as a folder automatically.

Create and fetch a dataset in a folder

Use the Langfuse UI or SDK to create and fetch a dataset in a folder by adding a slash (/) to a dataset name.

dataset_name = "evaluation/qa-dataset"
 
# When creating a dataset, use the full dataset name
langfuse.create_dataset(
    name=dataset_name,
)
 
# When fetching a dataset in a folder, use the full dataset name
langfuse.get_dataset(
    name=dataset_name
)

This creates and fetches a dataset named qa-dataset in a folder named evaluation. The full dataset name remains evaluation/qa-dataset.

import { LangfuseClient } from "@langfuse/client";
 
const langfuse = new LangfuseClient();
 
const datasetName = "evaluation/qa-dataset";
const encodedName = encodeURIComponent(datasetName); // "evaluation%2Fqa-dataset"
 
// When creating a dataset, use the full dataset name
await langfuse.dataset.create(datasetName);
 
// When fetching a dataset in a folder, use the encoded name
await langfuse.dataset.get(encodedName);

This creates and fetches a dataset named qa-dataset in a folder named evaluation. The full dataset name remains evaluation/qa-dataset.

In the UI, create a dataset and use a slash (/) in the name field to organize it into a folder. Fetch it by navigating to the folder, clicking on the folder name and clicking on the dataset name in the list.

URL Encoding: When using dataset names with slashes as path parameters in the API or JS/TS SDK, use URL encoding. For example, in TypeScript: encodeURIComponent(name).

Versioning

To access Dataset Versions via the Langfuse UI, navigate to: Datasets > Navigate to a specific dataset > Select Items Tab. On this page you can toggle the version view.

Every add, update, delete, or archive of dataset items produces a new dataset version. Versions track changes over time using timestamps.

GET APIs return the latest version at query time by default. Support for fetching datasets at specific version timestamps via API will be added shortly.

Versioning applies to dataset items only, not dataset schemas. Dataset schema changes do not create new versions.

Schema Enforcement

Optionally add JSON Schema validation to your datasets to ensure all dataset items conform to a defined structure. This helps maintain data quality, catch errors early, and ensure consistency across your team.

You can define JSON schemas for input and/or expectedOutput fields when creating or updating a dataset. Once set, all dataset items are automatically validated against these schemas. Valid items are accepted, invalid items are rejected with detailed error messages showing the validation issue.

langfuse.create_dataset(
    name="qa-conversations",
    input_schema={
        "type": "object",
        "properties": {
            "messages": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "role": {"type": "string", "enum": ["user", "assistant", "system"]},
                        "content": {"type": "string"}
                    },
                    "required": ["role", "content"]
                }
            }
        },
        "required": ["messages"]
    },
    expected_output_schema={
        "type": "object",
        "properties": {"response": {"type": "string"}},
        "required": ["response"]
    }
)

await langfuse.createDataset({
  name: "qa-conversations",
  inputSchema: {
    type: "object",
    properties: {
      messages: {
        type: "array",
        items: {
          type: "object",
          properties: {
            role: { type: "string", enum: ["user", "assistant", "system"] },
            content: { type: "string" }
          },
          required: ["role", "content"]
        }
      }
    },
    required: ["messages"]
  },
  expectedOutputSchema: {
    type: "object",
    properties: { response: { type: "string" } },
    required: ["response"]
  }
});

Create synthetic datasets

Frequently, you want to create synthetic examples to test your application to bootstrap your dataset. LLMs are great at generating these by prompting for common questions/tasks.

To get started have a look at this cookbook for examples on how to generate synthetic datasets:

Notebook: Synthetic Datasets

Create items from production data

A common workflow is to select production traces where the application did not perform as expected. Then you let an expert add the expected output to test new versions of your application on the same data.

langfuse.create_dataset_item(
    dataset_name="<dataset_name>",
    input={ "text": "hello world" },
    expected_output={ "text": "hello world" },
    # link to a trace
    source_trace_id="<trace_id>",
    # optional: link to a specific span, event, or generation
    source_observation_id="<observation_id>"
)

import { LangfuseClient } from "@langfuse/client";
 
const langfuse = new LangfuseClient();
 
await langfuse.api.datasetItems.create({
  datasetName: "<dataset_name>",
  input: { text: "hello world" },
  expectedOutput: { text: "hello world" },
  // link to a trace
  sourceTraceId: "<trace_id>",
  // optional: link to a specific span, event, or generation
  sourceObservationId: "<observation_id>",
});

In the UI, use + Add to dataset on any observation (span, event, generation) of a production trace.

Batch add observations to datasets

You can batch add multiple observations to a dataset directly from the observations table. This is useful for quickly building test datasets from production data.

The field mapping system gives you control over how observation data is transformed into dataset items. You can use the entire field as-is (e.g., map the full observation input to the dataset item input), extract specific values using JSON path expressions or build custom objects from multiple fields.

Navigate to the Observations table
Use filters to find relevant observations
Select observations using the checkboxes
Click Actions → Add to dataset
Choose to create a new dataset or select an existing one
Configure field mapping to control how observation data maps to dataset item fields
Preview the mapping and confirm

Batch operations run in the background with support for partial success. If some observations fail validation against a dataset schema, valid items are still added and errors are logged for review. You can monitor progress in Settings → Batch Actions.

Edit/archive dataset items

You can edit or archive dataset items. Archiving items will remove them from future experiment runs.

You can upsert items by providing the id of the item you want to update.

langfuse.create_dataset_item(
    id="<item_id>",
    # example: update status to "ARCHIVED"
    status="ARCHIVED"
)

You can upsert items by providing the id of the item you want to update.

import { LangfuseClient } from "@langfuse/client";
 
const langfuse = new LangfuseClient();
 
await langfuse.api.datasetItems.create({
  id: "<item_id>",
  // example: update status to "ARCHIVED"
  status: "ARCHIVED",
});

In the UI, you can edit the item by clicking on the item id. To archive or delete the item, click on the dots next to the item and select Archive or Delete.

Dataset runs

Once you created a dataset, you can test and evaluate your application based on it.

Experiments via SDK Experiments via UI

Learn more about the Experiments data model.

Overview Experiments via SDK

Was this page helpful?

Support