Evaluator Migration Guide

Langfuse now recommends running LLM-as-a-Judge evaluators on observations for live data evaluations. This guide helps you migrate your existing live data evaluators to the new system.

Prerequisites

SDK Requirements for Observation-Level Evaluators (Live Data)

Requirement            Python              JS/TS
Minimum SDK version    v3+ (OTel-based)    v4+ (OTel-based)
Migration guide        Python v2 → v3      JS/TS v3 → v4

Filtering by trace attributes: To filter observations by trace-level attributes (userId, sessionId, version, tags, metadata, trace_name), you must use propagate_attributes() in your instrumentation code.
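Conceptually, propagate_attributes() makes trace-level attributes visible on every observation created within its scope, so observation-level filters on userId or sessionId have something to match against. The following is a self-contained sketch of that idea using Python contextvars; it is an illustration of the mechanism, not the Langfuse API, and all names in it are ours:

```python
from contextvars import ContextVar

# Illustrative stand-in for attribute propagation; the real Langfuse SDK
# provides this via propagate_attributes() in your instrumentation code.
_attrs: ContextVar[dict] = ContextVar("attrs", default={})

class propagate_attributes:
    """Make trace-level attributes visible to observations created in scope."""
    def __init__(self, **attrs):
        self.attrs = attrs
    def __enter__(self):
        self._token = _attrs.set({**_attrs.get(), **self.attrs})
    def __exit__(self, *exc):
        _attrs.reset(self._token)

def make_observation(name: str) -> dict:
    # Each observation records the propagated trace attributes,
    # which is what observation-level filters match against.
    return {"name": name, **_attrs.get()}

with propagate_attributes(user_id="user-1", session_id="sess-9"):
    obs = make_observation("chat-completion")

print(obs)  # {'name': 'chat-completion', 'user_id': 'user-1', 'session_id': 'sess-9'}
```

Outside the with block, observations no longer carry the attributes, which mirrors why filters on userId or sessionId only work for spans instrumented inside the propagation scope.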

SDK Requirements for Experiments via SDK

Requirement            Python              JS/TS
Minimum SDK version    >= 3.9.0            >= 4.4.0
Required function      run_experiment()    experiment.run()

You must use the experiment runner SDK functions listed above. Simply having the correct SDK version is not sufficient.
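The version minimums above compare numerically, not lexicographically (e.g. 3.10.0 satisfies >= 3.9.0 even though it sorts lower as a string). A small sketch of such a check; the helper below is our own, not part of any SDK:

```python
def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare dotted version strings numerically, e.g. '3.10.2' >= '3.9.0'."""
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(installed) >= parse(minimum)

# Minimums for the experiment runner SDK functions (from the table above).
MINIMUMS = {"python": "3.9.0", "js": "4.4.0"}

print(meets_minimum("3.10.2", MINIMUMS["python"]))  # True
print(meets_minimum("3.8.1", MINIMUMS["python"]))   # False
```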

Why Migrate?

Benefits of Observation-Level Evaluators

1. Better Performance

  • Reduced database load enables faster evaluation processing
  • Scales better under high-volume workloads

2. Improved Reliability

  • More predictable behavior with evaluation targeting specific operations
  • Better error handling and retry logic

3. Greater Control

  • Evaluate specific observations (LLM calls, tool invocations, etc.) rather than entire traces
  • More precise filtering
  • Easier debugging when evaluations fail

4. Future-Proof

  • Built on Langfuse’s next-generation evaluation architecture

Understanding the Trade-offs

We recognize this migration may require work on your end. Here’s our perspective:

  • You can keep running evaluators on traces: They will continue to work for the foreseeable future
  • Some users benefit more than others: High-volume users or those with complex traces will see the biggest improvements
  • This enables long-term improvements: The architectural change allows us to build better, simpler features for everyone
  • We’re here to help: Use the built-in migration wizard and this guide

When to Migrate

✅ Migrate Now If:

  • You are using the OTel-based SDKs (Python v3+ or JS/TS v4+)
  • You are experiencing performance issues with current evaluators
  • You are setting up new evaluators and want the best experience

⏸️ Wait If:

  • You are still using the legacy SDKs (Python v2 or JS/TS v3)
  • Your current evaluators work perfectly for your use case

Migration Process

Step 1: Verify SDK Version

Confirm you are using the OTel-based SDKs (see Prerequisites above):

pip show langfuse
# Required: v3+

Step 2: Use the Upgrade Wizard

Langfuse provides a built-in wizard to migrate your evaluators.

  1. Navigate to your evaluators page

    • Go to your project → Evaluation → LLM-as-a-Judge
    • You’ll see a callout for evaluators marked “Legacy”
  2. Click “Upgrade” on any legacy evaluator

    • This opens the migration wizard
    • The wizard shows your current configuration on the left
  3. Review the migrated configuration

    • Left side: Your current (legacy) configuration (read-only)
    • Right side: Proposed configuration (editable)
  4. Adjust the new configuration

    • Filters: Add filters to narrow the evaluation down to the specific subset of data you’re interested in (observation type, trace name, trace tags, userId, sessionId, metadata, etc.)
    • Variable Mapping: Map variables from observation fields (input, output, metadata) to your evaluation prompt
  5. Choose what happens to the old evaluator

    • Keep both active: Test the new evaluator alongside the old one
    • Mark old as inactive (recommended initially): Old evaluator stops running, new one takes over
    • Delete old evaluator: Permanently remove the legacy evaluator

Step 3: Verify Evaluator Execution

Verify the new evaluator works correctly:

  1. Check execution metrics

    • Go to Evaluator Table → find new evaluator row → click “Logs”
    • View execution logs
  2. Compare results (if running both)

    • Review scores from both legacy and new evaluators. You might find our score analytics helpful to compare the results.
    • Ensure consistency in evaluation logic

Migration Examples

Example 1: Simple Trace Evaluator

Often, your trace’s input/output is identical to the input/output of a single observation within that trace. Your evaluator should now target that observation directly. In this example, assume you have a generation observation named “chat-completion” that holds the same input/output as your trace.

Before (Trace-level):

Target: Traces
Filter: trace.name = "chat-completion"
Variables:
  - user_query: trace.input
  - assistant_response: trace.output

After (Observation-level):

Target: Observations
Filter: trace.name = "chat-completion" AND observation.type = "generation" AND observation.name = "chat-completion"
Variables:
  - user_query: observation.input
  - assistant_response: observation.output

Key Changes:

  • Additional filters at observation level to identify the specific observation you want to evaluate in the trace tree
  • Variables come from observation instead of trace (e.g. observation.input and observation.output)
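The “After” configuration above amounts to a filter plus a variable mapping applied over a trace’s observations. A self-contained sketch of that logic; the dict-based records here are illustrative stand-ins, not the Langfuse API:

```python
# Illustrative observation records for one trace (not the Langfuse API).
observations = [
    {"type": "span", "name": "retrieve-docs", "input": "q", "output": "docs"},
    {"type": "generation", "name": "chat-completion",
     "input": "What is Langfuse?", "output": "An LLM engineering platform."},
]

def matches(obs: dict) -> bool:
    # Filter from the example: observation.type = "generation"
    # AND observation.name = "chat-completion".
    return obs["type"] == "generation" and obs["name"] == "chat-completion"

def map_variables(obs: dict) -> dict:
    # Variable mapping: prompt variables now come from the observation's
    # own fields rather than from the trace.
    return {"user_query": obs["input"], "assistant_response": obs["output"]}

targets = [map_variables(o) for o in observations if matches(o)]
```

Only the matching generation observation is evaluated; the retrieval span in the same trace is skipped, which is the precise targeting the migration buys you.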

Troubleshooting

Variables Don’t Map Correctly

Problem: You were mapping variables from two different observations

Solution:

  • If possible, store the necessary context in a single observation’s metadata during instrumentation
  • Consider breaking your single trace evaluator into multiple observation evaluators
  • Do not migrate your evaluator now. We do not yet have a translation for the new system, but are actively working on it.
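The first workaround can be sketched as consolidating context from several steps into the metadata of the one observation your evaluator targets. This is a dict-based illustration under our own field names, not the Langfuse API:

```python
# Context gathered across earlier steps of the trace (retrieval, routing, ...).
retrieval_output = ["doc-1", "doc-2"]
router_decision = "rag"

# Attach everything the evaluator needs to the single observation it
# targets, so one variable mapping can reach it all via metadata.
generation_observation = {
    "name": "chat-completion",
    "input": "What is Langfuse?",
    "output": "An LLM engineering platform.",
    "metadata": {
        "retrieved_docs": retrieval_output,
        "route": router_decision,
    },
}
```

With the context co-located like this, a single observation-level evaluator can map both the output and the supporting context (e.g. metadata.retrieved_docs) into its prompt.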

SDK Version-Specific Guidance

For Users on Legacy SDKs (Python v2, JS/TS v3)

You have two options:

Option 1: Upgrade to OTel-based SDK (Recommended)

  1. Upgrade to Python SDK v3+ or JS/TS SDK v4+ (see Prerequisites)
  2. Update your instrumentation code using the migration guides
  3. Migrate evaluators using the wizard

Option 2: Continue with Evaluators Running on Traces

  • No changes needed: trace-level evaluators will continue to work

For Users on OTel-based SDKs (Python v3+, JS/TS v4+)

If you have existing evaluators running on traces, we recommend migrating to observation-level evaluators using the wizard above to get the full benefits of the new architecture.

Getting Help

Rollback Plan

If you need to revert after migration:

  1. If you kept both evaluators: Simply mark the new one as inactive
  2. If you deleted the old evaluator: Create a new evaluator with the old configuration
  3. Data is preserved: All historical evaluation results remain accessible

Last updated: January 30, 2026