September 25, 2023

Datasets (beta)

Collect sets of inputs and expected outputs in Langfuse to evaluate your LLM app. Use evaluations to benchmark different experiments.

Datasets are collections of inputs and expected outputs that you can manage in Langfuse. Upload an existing dataset or create one based on production data (e.g. when discovering new edge cases).

When combined with automated evals, Datasets in Langfuse make it easy to systematically evaluate new iterations of your LLM app.

Run experiment on dataset

 
dataset = langfuse.get_dataset("<dataset_name>")
 
for item in dataset.items:
    # execute application function and get Langfuse parent observation (span/generation/event)
    # output also returned as it is used to evaluate the run
    generation, output = my_llm_application.run(item.input)
 
    # link the execution trace to the dataset item and give it a run_name
    item.link(generation, "<run_name>")
 
    # optionally, evaluate the output to compare different runs more easily
    generation.score(
        name="<example_eval>",
        # any float value
        value=my_eval_fn(
            item.input,
            output,
            item.expected_output
        )
    )

const dataset = await langfuse.getDataset("<dataset_name>");
 
for (const item of dataset.items) {
  // execute application function and get Langfuse parent observation (span/generation/event)
  // output also returned as it is used to evaluate the run
  const [generation, output] = await myLlmApplication.run(item.input);
 
  // link the execution trace to the dataset item and give it a run_name
  await item.link(generation, "<run_name>");
 
  // optionally, evaluate the output to compare different runs more easily
  generation.score({
    name: "<score_name>",
    value: myEvalFunction(item.input, output, item.expectedOutput),
  });
}

dataset = langfuse.get_dataset("<dataset_name>")
 
for item in dataset.items:
    # Langchain calback handler that automatically links the execution trace to the dataset item
    handler = item.get_langchain_handler(run_name="<run_name>")
 
    # Execute application and pass custom handler
    my_langchain_chain.run(item.input, callbacks=[handler])

Datasets are currently in beta on Langfuse Cloud as the API might still slightly change. If you’d like to try it, let us know via the in-app chat.

Links:

Was this page helpful?

Support