06 Experiments
Workshop source
Workshop material is maintained in the public langfuse/langfuse-workshop repository. Use the repository for the runnable app, checkpoint branches, and local setup.
Learner guide: 06 Experiments
Instructor notes
- The key idea is reuse: the experiment runner calls the same runSupportConversation(...) as the web app.
- Contrast deterministic scoring in the script (keyword_overlap) with LLM-as-a-judge scoring (correctness).
- Confirm the default evaluator model before the Correctness setup. If learners did not configure it in session 4, send them to Project Settings → LLM Connections first.
- Emphasize the mixed setup: the script owns the cheap deterministic check, while Langfuse owns the semantic judge.
- Keep concurrency at one for workshops so traces and the final run summary are easy to follow.
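To make the "cheap deterministic check" concrete when talking through the notes above, here is a minimal sketch of what a keyword-overlap score could look like. The function names keywords and keywordOverlap, the tokenization rule, and the sample strings are all assumptions for illustration; the workshop's actual keyword_overlap evaluator in the script may tokenize or weight terms differently.

```typescript
// Hypothetical sketch of a deterministic keyword-overlap score.
// Not the workshop's implementation; names and tokenization are assumptions.

/** Lowercase the text and return its distinct alphanumeric words longer than two characters. */
function keywords(text: string): Set<string> {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return new Set(words.filter((word) => word.length > 2));
}

/** Fraction of the expected answer's keywords that also appear in the output, in [0, 1]. */
function keywordOverlap(output: string, expected: string): number {
  const expectedWords = Array.from(keywords(expected));
  if (expectedWords.length === 0) return 0; // nothing to compare against
  const outputWords = keywords(output);
  const hits = expectedWords.filter((word) => outputWords.has(word)).length;
  return hits / expectedWords.length;
}

const score = keywordOverlap(
  "Open settings and choose reset password.",
  "Reset the password in account settings.",
);
console.log(score.toFixed(2)); // 0.60 (3 of 5 expected keywords found)
```

A check like this is fast and repeatable, but it cannot tell whether the answer is actually right, which is the gap the LLM-as-a-judge correctness score is meant to cover.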
Demo rhythm
- Skim the numbered sections in scripts/run-dataset.ts.
- Point out the keyword_overlap evaluator inside the script.
- Configure the Correctness evaluator as a dataset-run evaluator.
- Run npm run dataset:run.
- Open the run table, per-item traces, and chart view.
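While skimming the script, it can help to have a mental model of the loop it runs. The sketch below is an assumption-laden illustration, not the contents of scripts/run-dataset.ts: the item shape is inferred only from the JsonPath expressions used elsewhere in these notes, runConversationStub is a hypothetical stand-in for the app's shared conversation function (whose real signature is not shown here), and placeholderScore is a trivial substitute for the deterministic evaluator. It shows a concurrency of one: each item is awaited before the next one starts.

```typescript
// Hypothetical shapes and helpers for illustration only.
type Message = { role: string; content: string };
type DatasetItem = {
  input: { messages: Message[] };
  expectedOutput: { idealAnswer: string };
};
type ItemResult = { output: string; score: number };

// Stand-in for the app's shared conversation function (assumed signature).
async function runConversationStub(messages: Message[]): Promise<string> {
  await new Promise<void>((resolve) => setTimeout(resolve, 10)); // simulate latency
  const last = messages[messages.length - 1];
  return `Echo: ${last ? last.content : ""}`;
}

// Placeholder deterministic check: 1 if the output mentions the ideal answer, else 0.
function placeholderScore(output: string, idealAnswer: string): number {
  return output.toLowerCase().includes(idealAnswer.toLowerCase()) ? 1 : 0;
}

// Concurrency of one: finish each item before starting the next.
async function runSequentially(items: DatasetItem[]): Promise<ItemResult[]> {
  const results: ItemResult[] = [];
  for (const item of items) {
    const output = await runConversationStub(item.input.messages);
    const score = placeholderScore(output, item.expectedOutput.idealAnswer);
    results.push({ output, score });
  }
  return results;
}

const sampleItems: DatasetItem[] = [
  {
    input: { messages: [{ role: "user", content: "Where is my order?" }] },
    expectedOutput: { idealAnswer: "order" },
  },
  {
    input: { messages: [{ role: "user", content: "Hello" }] },
    expectedOutput: { idealAnswer: "refund" },
  },
];

runSequentially(sampleItems).then((results) => {
  console.log(results.map((result) => result.score));
});
```

Running items one at a time is slower than a parallel run, but results arrive in dataset order, which keeps traces and the final summary easy to follow in a live session.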
Watch for
- Correctness evaluator target. Keep it on Dataset runs if you want the score to show up on the run rows and in run comparison.
- Have learners switch from the default Observations view to Dataset runs before they configure anything else.
- Correctness evaluator mapping uses three different source dropdowns: query is Input with $.messages[-1].content, generation is Output with no JsonPath, and ground_truth is Expected Output with $.idealAnswer.
- A common misconfiguration is leaving all three variables on Input, which silently makes the evaluator read the wrong data for every field.
- Learners assuming the deterministic check must live in Langfuse now. It does not; mention the code-evaluator docs only as an alternative.
- "No default model set" means Langfuse needs an LLM connection/default evaluator model; it is not fixed by editing .env.
- Slow asynchronous evaluator results; the console only shows the final summary, so refresh Langfuse after the run finishes if correctness is still pending.
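When debugging the Correctness mapping with learners, it can help to spell out what each variable should resolve to. The sketch below emulates the three mappings in plain TypeScript; it does not call any Langfuse API. The item shape is an assumption inferred from the JsonPath expressions in the list above, and resolveEvaluatorVariables is a hypothetical helper name.

```typescript
// Hypothetical item shape, inferred only from the JsonPath expressions in the notes.
type Message = { role: string; content: string };
type RunInput = { messages: Message[] };
type ExpectedOutput = { idealAnswer: string };

type EvaluatorVariables = {
  query: string;
  generation: string;
  ground_truth: string;
};

// Plain-TypeScript emulation of the three mappings; Langfuse resolves these itself in the UI.
function resolveEvaluatorVariables(
  input: RunInput,
  output: string,
  expectedOutput: ExpectedOutput,
): EvaluatorVariables {
  const lastMessage = input.messages[input.messages.length - 1]; // $.messages[-1]
  return {
    query: lastMessage?.content ?? "", // source: Input, JsonPath $.messages[-1].content
    generation: output, // source: Output, no JsonPath
    ground_truth: expectedOutput.idealAnswer, // source: Expected Output, JsonPath $.idealAnswer
  };
}

const variables = resolveEvaluatorVariables(
  {
    messages: [
      { role: "assistant", content: "Hi, how can I help?" },
      { role: "user", content: "How do I reset my password?" },
    ],
  },
  "Go to account settings and choose Reset password.",
  { idealAnswer: "Use the Reset password option in account settings." },
);
console.log(variables);
```

If the variables a learner sees in an evaluator trace do not line up with this picture (for example, ground_truth is empty or generation contains the whole message list), the source dropdowns are the first thing to check.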