Evaluator Migration Guide
Langfuse now recommends running LLM-as-a-Judge evaluators on observations rather than traces for live data. This guide helps you migrate your existing live-data evaluators to the new system.
Prerequisites
SDK Requirements for Observation-Level Evaluators (Live Data)
| Requirement | Python | JS/TS |
|---|---|---|
| Minimum SDK version | v3+ (OTel-based) | v4+ (OTel-based) |
| Migration guide | Python v2 → v3 | JS/TS v3 → v4 |
Filtering by trace attributes: To filter observations by trace-level attributes (`userId`, `sessionId`, `version`, `tags`, `metadata`, `trace_name`), you must use `propagate_attributes()` in your instrumentation code, as sketched below.
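For illustration, a minimal sketch of attribute propagation with the Python SDK v3. The import path and keyword arguments are assumptions based on recent SDK versions, and `run_chat` is a hypothetical stand-in for your application logic; consult the Python SDK reference for the exact signature:

```python
# NOTE: exact import path and signature may differ by SDK version;
# see the Python SDK reference for propagate_attributes.
from langfuse import observe, propagate_attributes

@observe()
def handle_request(user_id: str, session_id: str, query: str) -> str:
    # Attributes set here propagate to all observations created inside
    # the context, so observation-level evaluators can filter on
    # userId, sessionId, and tags.
    with propagate_attributes(
        user_id=user_id,
        session_id=session_id,
        tags=["chat"],
    ):
        return run_chat(query)  # run_chat: your instrumented app logic (hypothetical)
```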
SDK Requirements for Experiments via SDK
| Requirement | Python | JS/TS |
|---|---|---|
| Minimum SDK version | >= 3.9.0 | >= 4.4.0 |
| Required function | run_experiment() | experiment.run() |
You must use the experiment runner SDK functions listed above. Simply having the correct SDK version is not sufficient.
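For illustration, a minimal sketch of the Python experiment runner. The dataset item, task, and evaluator here are hypothetical, and the exact parameter and return shapes may differ by SDK version; see the SDK reference:

```python
from langfuse import get_client

langfuse = get_client()

# Hypothetical task: runs your application on one dataset item.
def my_task(*, item, **kwargs):
    return my_llm_app(item["input"])  # my_llm_app: your application logic

# Hypothetical evaluator: scores a single task output.
# The return shape (name/value pair) is illustrative.
def exact_match(*, input, output, expected_output, **kwargs):
    return {"name": "exact_match", "value": float(output == expected_output)}

result = langfuse.run_experiment(
    name="chat-quality-check",
    data=[{"input": "What is 2+2?", "expected_output": "4"}],
    task=my_task,
    evaluators=[exact_match],
)
```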
Why Migrate?
Benefits of Observation-Level Evaluators
1. Better Performance
- Reduced database load enables faster evaluation processing
- Scales better under high-volume workloads
2. Improved Reliability
- More predictable behavior with evaluation targeting specific operations
- Better error handling and retry logic
3. Greater Control
- Evaluate specific observations (LLM calls, tool invocations, etc.) rather than entire traces
- More precise filtering
- Easier debugging when evaluations fail
4. Future-Proof
- Built on Langfuse’s next-generation evaluation architecture
Understanding the Trade-offs
We recognize this migration may require work on your end. Here’s our perspective:
- You can keep running evaluators on traces: They will continue to work for the foreseeable future
- Some users benefit more than others: High-volume users or those with complex traces will see the biggest improvements
- This enables long-term improvements: The architectural change allows us to build better, simpler features for everyone
- We’re here to help: Use the built-in migration wizard and this guide
When to Migrate
✅ Migrate Now If:
- You are using the OTel-based SDKs (Python v3+ or JS/TS v4+)
- You are experiencing performance issues with current evaluators
- You are setting up new evaluators and want the best experience
⏸️ Wait If:
- You are still using the legacy SDKs (Python v2 or JS/TS v3)
- Your current evaluators work perfectly for your use case
Migration Process
Step 1: Verify SDK Version
Confirm you are using the OTel-based SDKs (see Prerequisites above):
```bash
pip show langfuse
# Required: v3+
```
Step 2: Use the Upgrade Wizard
Langfuse provides a built-in wizard to migrate your evaluators.
1. Navigate to your evaluators page
   - Go to your project → Evaluation → LLM-as-a-Judge
   - You'll see a callout for evaluators marked "Legacy"
2. Click "Upgrade" on any legacy evaluator
   - This opens the migration wizard
   - The wizard shows your current configuration on the left
3. Review the migrated configuration
   - Left side: Your current (legacy) configuration (read-only)
   - Right side: Proposed configuration (editable)
4. Adjust the new configuration
   - Filters: Add filters to narrow the evaluation to the specific subset of data you're interested in (`observation type`, `trace name`, `trace tags`, `userId`, `sessionId`, `metadata`, etc.)
   - Variable Mapping: Map variables from observation fields (input, output, metadata) to your evaluation prompt
5. Choose what happens to the old evaluator
   - Keep both active: Test the new evaluator alongside the old one
   - Mark old as inactive (recommended initially): The old evaluator stops running and the new one takes over
   - Delete old evaluator: Permanently remove the legacy evaluator
Step 3: Verify Evaluator Execution
Verify the new evaluator works correctly:
1. Check execution metrics
   - Go to the Evaluator Table → find the new evaluator's row → click "Logs"
   - View execution logs
2. Compare results (if running both)
   - Review scores from both the legacy and new evaluators. You might find our score analytics helpful for comparing results.
   - Ensure consistency in evaluation logic
Migration Examples
Example 1: Simple Trace Evaluator
In many cases, your trace's input/output is identical to the input/output of a single observation within that trace, and your evaluator should now target that observation directly. In this example, assume you have a generation observation named "chat-completion" that holds the same input/output as your trace.
Before (Trace-level):
```
Target: Traces
Filter: trace.name = "chat-completion"
Variables:
- user_query: trace.input
- assistant_response: trace.output
```
After (Observation-level):
```
Target: Observations
Filter: trace.name = "chat-completion" AND observation.type = "generation" AND observation.name = "chat-completion"
Variables:
- user_query: observation.input
- assistant_response: observation.output
```
Key Changes:
- Additional observation-level filters identify the specific observation to evaluate within the trace tree
- Variables come from the observation instead of the trace (e.g. `observation.input` and `observation.output`)
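For context, here is a minimal instrumentation sketch (Python SDK v3) that produces a generation observation named "chat-completion" matching the filter above. `call_llm` is a hypothetical stand-in for your model call; check the SDK reference for exact method names:

```python
from langfuse import get_client

langfuse = get_client()

def chat(user_query: str) -> str:
    # Creates a generation observation named "chat-completion" that
    # the observation-level evaluator's filter can match.
    with langfuse.start_as_current_generation(name="chat-completion") as generation:
        generation.update(input=user_query)
        response = call_llm(user_query)  # hypothetical LLM call
        generation.update(output=response)
        return response
```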
Troubleshooting
Variables Don’t Map Correctly
Problem: You were mapping variables from two different observations
Solution:
- If possible, store the necessary context in a single observation's metadata during instrumentation (see the sketch below)
- Consider splitting your single trace evaluator into multiple observation-level evaluators
- Otherwise, do not migrate this evaluator yet. There is no direct translation for this setup in the new system, but we are actively working on one.
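As an illustration of the first option, a sketch that consolidates retrieval context into a single observation's metadata during instrumentation. `retrieve` and `call_llm` are hypothetical helpers, and exact method names may vary by SDK version:

```python
from langfuse import get_client

langfuse = get_client()

def answer_with_retrieval(user_query: str) -> str:
    retrieved_docs = retrieve(user_query)  # hypothetical retrieval step

    with langfuse.start_as_current_generation(name="chat-completion") as generation:
        # Store context from the earlier step in this observation's
        # metadata, so a single observation-level evaluator can map
        # all of its variables from one observation.
        generation.update(
            input=user_query,
            metadata={"retrieved_docs": retrieved_docs},
        )
        response = call_llm(user_query, retrieved_docs)  # hypothetical LLM call
        generation.update(output=response)
        return response
```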
SDK Version-Specific Guidance
For Users on Legacy SDKs (Python v2, JS/TS v3)
You have two options:
Option 1: Upgrade to OTel-based SDK (Recommended)
- Upgrade to Python SDK v3+ or JS/TS SDK v4+ (see Prerequisites)
- Update your instrumentation code using the migration guides
- Migrate evaluators using the wizard
Option 2: Continue with Evaluators Running on Traces
- No changes needed—trace-level evaluators will continue to work
For Users on OTel-based SDKs (Python v3+, JS/TS v4+)
If you have existing evaluators running on traces, we recommend migrating to observation-level evaluators using the wizard above to get the full benefits of the new architecture.
Getting Help
- Documentation: Refer to the LLM-as-a-Judge guide
- GitHub: Report issues at github.com/langfuse/langfuse
- Support: Contact support@langfuse.com for enterprise customers
Rollback Plan
If you need to revert after migration:
- If you kept both evaluators: Simply mark the new one as inactive
- If you deleted the old evaluator: Create a new evaluator with the old configuration
- Data is preserved: All historical evaluation results remain accessible
Last updated: January 30, 2026