Evaluation Overview

Evals give you a repeatable check of your LLM application’s behavior. You replace guesswork with data.

They also help you catch regressions before you ship a change. For example, you tweak a prompt to handle an edge case, run your eval, and immediately see whether the change affected your application’s behavior in unintended ways.
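The regression check described above can be sketched in plain Python. This is a minimal illustration and does not use the Langfuse SDK: the dataset, the two stand-in applications (`baseline_app`, `candidate_app`), and the `exact_match` scorer are all hypothetical names chosen for the example.

```python
from typing import Callable

# Hypothetical fixed dataset: each item pairs an input with its expected output.
DATASET = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "10 / 2", "expected": "5"},
    {"input": "3 * 3", "expected": "9"},
]


def baseline_app(text: str) -> str:
    """Stand-in for the application before the prompt change."""
    answers = {"2 + 2": "4", "10 / 2": "5", "3 * 3": "9"}
    return answers.get(text, "unknown")


def candidate_app(text: str) -> str:
    """Stand-in for the application after the prompt change.

    It handles a new edge case ("0 * 7") but now formats one answer differently.
    """
    answers = {"2 + 2": "4", "10 / 2": "5.0", "3 * 3": "9", "0 * 7": "0"}
    return answers.get(text, "unknown")


def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when the output equals the expected answer, else 0.0."""
    return 1.0 if output.strip() == expected.strip() else 0.0


def run_eval(app: Callable[[str], str], dataset: list[dict]) -> list[float]:
    """Run the app on every dataset item and return one score per item."""
    return [exact_match(app(item["input"]), item["expected"]) for item in dataset]


def find_regressions(
    baseline: Callable[[str], str],
    candidate: Callable[[str], str],
    dataset: list[dict],
) -> list[str]:
    """Return the inputs whose score dropped from baseline to candidate."""
    before = run_eval(baseline, dataset)
    after = run_eval(candidate, dataset)
    return [item["input"] for item, b, a in zip(dataset, before, after) if a < b]


regressions = find_regressions(baseline_app, candidate_app, DATASET)
```

Because the dataset stays fixed between runs, any item that appears in `regressions` points to a behavior change introduced by the edit rather than by different test inputs.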

Getting Started

If you’re new to LLM evaluation, start by exploring the Concepts page. There’s a lot to uncover, and going through the concepts before diving in will help you learn faster.

Once you know what you want to do, you can:

Create a dataset to measure your LLM application’s performance consistently
Run an experiment to get an overview of how your application is doing
Set up a live evaluator to monitor your live traces
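To make the last step more concrete, here is a rough sketch of what a live evaluator does conceptually: it scores each incoming trace and surfaces the ones that fall below a threshold. The `Trace` and `Score` types, the `non_empty_answer` check, and the threshold are assumptions made for illustration; they are not the Langfuse data model or API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trace:
    """Hypothetical record of one application call."""
    trace_id: str
    input: str
    output: str


@dataclass
class Score:
    """Hypothetical evaluation result attached to a trace."""
    trace_id: str
    name: str
    value: float


def non_empty_answer(trace: Trace) -> Score:
    """Score 1.0 when the output contains non-whitespace text, else 0.0."""
    value = 1.0 if trace.output.strip() else 0.0
    return Score(trace.trace_id, "non_empty_answer", value)


def evaluate_traces(
    traces: list[Trace], evaluator: Callable[[Trace], Score]
) -> list[Score]:
    """Apply the evaluator to every trace and collect the scores."""
    return [evaluator(trace) for trace in traces]


def failing_trace_ids(scores: list[Score], threshold: float = 0.5) -> list[str]:
    """Return the ids of traces whose score is below the threshold."""
    return [score.trace_id for score in scores if score.value < threshold]


traces = [
    Trace("t1", "What is an eval?", "A repeatable check of behavior."),
    Trace("t2", "What is a dataset?", "   "),
]
scores = evaluate_traces(traces, non_empty_answer)
flagged = failing_trace_ids(scores)
```

In practice the evaluator could be a rule like this one, a model-based judge, or a human review step; the loop around it stays the same.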

Looking for something specific? Take a look under Evaluation Methods and Experiments for guides on specific topics.
