October 16, 2025

LLM-as-a-Judge Execution Tracing & Enhanced Observability

Hassieb Pakzad

Every LLM-as-a-Judge evaluator execution now creates a trace, allowing you to inspect the exact prompts, responses, and token usage for each evaluation.

We’re excited to announce a major enhancement to Langfuse’s LLM-as-a-Judge evaluations: full tracing of evaluator executions. Every time an LLM-as-a-Judge evaluator runs, we now create a detailed trace that captures the complete LLM interaction, giving you full visibility into how your evaluators arrive at their scores.

What’s New

Every LLM-as-a-Judge evaluator execution going forward is linked to a Langfuse trace of the underlying LLM call. This means you can:

  • Debug evaluation prompts: See exactly what prompt was sent to the judge LLM
  • Inspect model responses: View the complete response including score and reasoning
  • Monitor token usage: Track costs and performance for each evaluator execution
  • Trace evaluation history: Navigate from any score back to its source LLM interaction

How to access execution traces: There are four ways to navigate to an evaluator execution trace:

  1. Score tooltip in trace view: For LLM-as-a-Judge scores, hover over any score badge and click “View execution trace”

Score tooltip with execution trace link

  2. Tracing table: Filter the environment to langfuse-llm-as-a-judge to view all evaluator execution traces; see the sketch after this list for programmatic access

Tracing table filtered to langfuse-llm-as-a-judge environment

  3. Scores table: Enable the “Execution Trace” column in the scores table to see all evaluator executions

Scores table with execution trace column

  4. Evaluator logs table: View execution trace IDs in the evaluator logs for detailed execution history

Evaluator logs with execution traces
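
If you prefer to pull these evaluator traces programmatically rather than through the UI, the same environment filter can be applied via the Langfuse public API. The snippet below is a minimal sketch, assuming the GET /api/public/traces endpoint accepts an environment query parameter; the public API authenticates with Basic auth using your project’s public and secret keys.

```python
# Minimal sketch: list LLM-as-a-Judge evaluator execution traces via the
# Langfuse public API. Assumes the traces endpoint accepts an `environment`
# query parameter; adjust the host and keys for your deployment.
import os
import requests

LANGFUSE_HOST = os.environ.get("LANGFUSE_HOST", "https://cloud.langfuse.com")
auth = (os.environ["LANGFUSE_PUBLIC_KEY"], os.environ["LANGFUSE_SECRET_KEY"])

resp = requests.get(
    f"{LANGFUSE_HOST}/api/public/traces",
    auth=auth,  # Basic auth: public key as username, secret key as password
    params={"environment": "langfuse-llm-as-a-judge", "limit": 50},
    timeout=30,
)
resp.raise_for_status()

for trace in resp.json().get("data", []):
    # Each returned trace corresponds to one evaluator execution (one judge LLM call)
    print(trace["id"], trace.get("name"), trace.get("timestamp"))
```

Each returned trace is the full record of a single judge LLM call, so the same prompts, responses, and token usage shown in the UI are available here as well.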

Why This Matters

Previously, debugging failed evaluations or understanding why a judge gave a particular score required guesswork. Now, with full tracing:

  1. Trust your evaluations: Verify that the judge received the correct input and made sound judgments
  2. Optimize costs: Identify expensive evaluation patterns and refine your prompts
  3. Debug faster: Instantly see what went wrong when an evaluation fails
  4. Maintain an audit trail: Keep a complete history of every evaluation decision for compliance and analysis

Getting Started

This feature is automatically enabled for all LLM-as-a-Judge executions going forward.
