07 Evaluate a Change
Workshop source
Workshop material is maintained in the public langfuse/langfuse-workshop repository. Use the repository for the runnable app, checkpoint branches, and local setup.
Learner guide: 07 Evaluate a Change
Instructor notes
- Make learners inspect run 1 before changing anything. The change should respond to evidence, not vibes.
- Keep the iteration deliberately small: one prompt rule, one rerun, one comparison.
- Emphasize regressions. The most useful comparison is often the item that got worse.
Demo rhythm
- Read low-scoring items from the first run.
- Add or promote a new prompt version.
- Run npm run dataset:run again.
- Compare both runs side by side and decide whether the change is worth shipping.
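The comparison step can be sketched as a per-item diff that puts the items that got worse at the top. This is a minimal sketch that assumes each run has already been exported as one numeric score per dataset item, produced by the same evaluator on the same scale. The RunScores shape and the compareRuns, regressions, and meanDelta helpers are hypothetical. They are not part of the workshop app or the Langfuse SDK.

```typescript
// Hypothetical shape: dataset item id -> score for one run.
// Assumes both runs used the same evaluator and score scale.
type RunScores = Record<string, number>;

interface ItemDelta {
  itemId: string;
  before: number;
  after: number;
  delta: number; // after - before; negative means the item got worse
}

// Pair up items scored in both runs and sort so the largest drop comes first.
function compareRuns(run1: RunScores, run2: RunScores): ItemDelta[] {
  const deltas: ItemDelta[] = [];
  for (const itemId of Object.keys(run1)) {
    // Skip items that were not scored in run 2: there is nothing to compare.
    if (!Object.prototype.hasOwnProperty.call(run2, itemId)) continue;
    const before = run1[itemId];
    const after = run2[itemId];
    deltas.push({ itemId, before, after, delta: after - before });
  }
  return deltas.sort((a, b) => a.delta - b.delta);
}

// Items whose score dropped between the two runs.
function regressions(deltas: ItemDelta[]): ItemDelta[] {
  return deltas.filter((d) => d.delta < 0);
}

// Average change across compared items. A positive mean can still hide regressions.
function meanDelta(deltas: ItemDelta[]): number {
  if (deltas.length === 0) return 0;
  return deltas.reduce((sum, d) => sum + d.delta, 0) / deltas.length;
}

// Example with made-up scores for three items.
const run1: RunScores = { "item-a": 0.4, "item-b": 0.9, "item-c": 0.7 };
const run2: RunScores = { "item-a": 0.8, "item-b": 0.6, "item-c": 0.7 };
const deltas = compareRuns(run1, run2);
console.log("regressions:", regressions(deltas).map((d) => d.itemId));
console.log("mean delta:", meanDelta(deltas).toFixed(2));
```

In this made-up example, the average improves while item-b gets worse. This is the pattern the instructor notes ask learners to look for before deciding to ship.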
Watch for
- Learners changing both model and prompt at the same time, making the comparison hard to interpret.
- New prompt versions that are saved but not promoted to the label the app fetches.
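The "saved but not promoted" pitfall can be checked directly before the rerun. This is a minimal sketch. It assumes the app fetches its prompt by a single label, for example production, and that you already have the list of saved versions and their labels, however you read them from Langfuse. The PromptVersion shape and the versionForLabel and newestIsPromoted helpers are hypothetical.

```typescript
// Hypothetical record of one saved prompt version and the labels attached to it.
interface PromptVersion {
  version: number;
  labels: string[];
}

// Version number currently carrying `label`, or undefined if no version has it.
function versionForLabel(versions: PromptVersion[], label: string): number | undefined {
  for (const v of versions) {
    if (v.labels.indexOf(label) !== -1) return v.version;
  }
  return undefined;
}

// True when the newest saved version is also the one the app fetches by `label`.
function newestIsPromoted(versions: PromptVersion[], label: string): boolean {
  if (versions.length === 0) return false;
  const newest = Math.max(...versions.map((v) => v.version));
  return versionForLabel(versions, label) === newest;
}

// Example with made-up data: version 2 was saved, but "production" still points at version 1.
const promptHistory: PromptVersion[] = [
  { version: 1, labels: ["production"] },
  { version: 2, labels: [] },
];
if (!newestIsPromoted(promptHistory, "production")) {
  console.warn(
    `"production" is on version ${versionForLabel(promptHistory, "production")}; promote the new version before rerunning.`
  );
}
```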