Getting Started

The Langfuse scoring system is built on an open architecture and API, so it can store results from any evaluation method.

Score object in Langfuse

Scores serve as objects for storing evaluation metrics in Langfuse. They are always associated with a trace and can be attached to specific observations within a trace. Optionally, scores can be linked to a score configuration to ensure they comply with a specific schema.

Attribute | Type | Description
--- | --- | ---
name | string | Name of the score, e.g. user_feedback, hallucination_eval
value | number | Optional: Numeric value of the score. Always defined for numeric and boolean scores. Optional for categorical scores.
stringValue | string | Optional: String equivalent of the score's numeric value for boolean and categorical data types. Automatically set for categorical scores based on the config if the configId is provided.
traceId | string | Id of the trace the score relates to
observationId | string | Optional: Observation (e.g. LLM call) the score relates to
comment | string | Optional: Evaluation comment, commonly used for user feedback, eval output or internal notes
id | string | Unique identifier of the score. Auto-generated by SDKs. Can optionally be used as an idempotency key to update scores.
source | string | Automatically set based on the source of the score. One of API, EVAL, or ANNOTATION
dataType | string | Automatically set based on the config data type when the configId is provided. Otherwise it can be defined manually as NUMERIC, CATEGORICAL or BOOLEAN
configId | string | Optional: Score config id to ensure that the score follows a specific schema. Can be defined in the Langfuse UI or via API. When provided, the score's dataType is automatically set based on the config
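
As a rough illustration of how these attributes fit together, the following Python sketch models a score as a plain dataclass. It is not the Langfuse SDK: the class name Score, the helper upsert, the snake_case field names, the NUMERIC default for data_type, and the in-memory dict standing in for a backend are all assumptions made for this example.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional

SOURCES = {"API", "EVAL", "ANNOTATION"}
DATA_TYPES = {"NUMERIC", "CATEGORICAL", "BOOLEAN"}


@dataclass
class Score:
    """Hypothetical local model of a score; mirrors the attribute table, not the SDK."""
    name: str                              # e.g. "user_feedback"
    trace_id: str                          # every score belongs to a trace
    value: Optional[float] = None          # required for NUMERIC and BOOLEAN
    string_value: Optional[str] = None     # used by BOOLEAN and CATEGORICAL
    observation_id: Optional[str] = None   # optional link to one observation
    comment: Optional[str] = None
    # Assumption: a random UUID stands in for the SDK-generated id.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    source: str = "API"
    data_type: str = "NUMERIC"             # assumed default for this sketch
    config_id: Optional[str] = None

    def __post_init__(self) -> None:
        if self.source not in SOURCES:
            raise ValueError(f"unknown source: {self.source}")
        if self.data_type not in DATA_TYPES:
            raise ValueError(f"unknown data type: {self.data_type}")
        if self.data_type in ("NUMERIC", "BOOLEAN") and self.value is None:
            raise ValueError("numeric and boolean scores need a value")


def upsert(store: Dict[str, Score], score: Score) -> None:
    """Write a score keyed by id, so re-sending the same id updates it."""
    store[score.id] = score
```

Reusing the same id on a second call to upsert overwrites the earlier entry instead of creating a duplicate, which is the behavior the idempotency-key description suggests.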

Using scores across Langfuse

Scores can be used in multiple ways across Langfuse:

  1. Display: scores are shown on each trace to provide a quick overview
  2. Segmentation: filter execution traces by score, e.g. to find all traces with a low quality score
  3. Analytics: detailed score reporting with drill-downs into use cases and user segments
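
To make the segmentation use case concrete, here is a minimal sketch that filters a list of score records for traces with a low score. The function name traces_below, the sample records, and the threshold are hypothetical. In Langfuse itself this kind of filtering is done in the UI or through the API, not with local code like this.

```python
from typing import Dict, List, Optional, Union

ScoreRecord = Dict[str, Union[str, float, None]]


def traces_below(scores: List[ScoreRecord], name: str, threshold: float) -> List[str]:
    """Return the sorted ids of traces that have a score `name` below `threshold`."""
    low = set()
    for s in scores:
        value: Optional[float] = s.get("value")  # categorical scores may have no value
        if s["name"] == name and value is not None and value < threshold:
            low.add(s["traceId"])
    return sorted(low)


# Hypothetical sample data using the attribute names from the score object.
sample = [
    {"name": "quality", "traceId": "trace-a", "value": 0.9},
    {"name": "quality", "traceId": "trace-b", "value": 0.2},
    {"name": "toxicity", "traceId": "trace-a", "value": 0.1},
    {"name": "quality", "traceId": "trace-c", "value": None},
]
low_quality = traces_below(sample, "quality", 0.5)
```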

Configure score schema

If you'd like to ensure that your scores follow a specific schema, you can define a score config in the Langfuse UI or via our API.

A score config includes the score name, data type, and constraints on score value range such as min and max values for numerical data types and custom categories for categorical data types. Configs are immutable but can be archived (and restored anytime). Using score configs allows you to standardize your scoring schema across your team and ensure that scores are consistent and comparable for future analysis.
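
The sketch below shows one way such a config could constrain score values. It is an illustration under stated assumptions, not Langfuse's implementation: ScoreConfig and validate are hypothetical names, categories are modeled as a label-to-number mapping, and boolean scores are assumed to take the values 0 or 1. The frozen dataclass mirrors the statement that configs are immutable.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Union


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ScoreConfig:
    """Hypothetical local model of a score config."""
    name: str
    data_type: str                                     # NUMERIC, CATEGORICAL or BOOLEAN
    min_value: Optional[float] = None                  # NUMERIC only
    max_value: Optional[float] = None                  # NUMERIC only
    categories: Optional[Mapping[str, float]] = None   # CATEGORICAL only (assumed shape)


def validate(config: ScoreConfig, name: str, value: Union[float, str]) -> None:
    """Raise ValueError if a score does not comply with the config."""
    if name != config.name:
        raise ValueError(f"expected score name {config.name!r}, got {name!r}")
    if config.data_type == "NUMERIC":
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError("numeric scores need a numeric value")
        if config.min_value is not None and value < config.min_value:
            raise ValueError(f"{value} is below the minimum {config.min_value}")
        if config.max_value is not None and value > config.max_value:
            raise ValueError(f"{value} is above the maximum {config.max_value}")
    elif config.data_type == "CATEGORICAL":
        if value not in (config.categories or {}):
            raise ValueError(f"{value!r} is not a configured category")
    elif config.data_type == "BOOLEAN":
        if value not in (0, 1):  # assumption: booleans are stored as 0 or 1
            raise ValueError("boolean scores must be 0 or 1")
    else:
        raise ValueError(f"unknown data type: {config.data_type}")


# Example configs with made-up names and ranges.
quality = ScoreConfig(name="quality", data_type="NUMERIC", min_value=0.0, max_value=1.0)
tone = ScoreConfig(name="tone", data_type="CATEGORICAL",
                   categories={"formal": 0.0, "casual": 1.0})
validate(quality, "quality", 0.7)  # passes silently
validate(tone, "tone", "casual")   # passes silently
```

Validating against a shared config before writing a score is what keeps values from different team members comparable later.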

Frequently used scores

Scores in Langfuse are adaptable: a score is identified only by its name, so you can define whatever metrics your LLM application requires. They typically measure the following aspects:

  • Quality
    • Factual accuracy
    • Completeness of the information provided
    • Verification against hallucinations
  • Style
    • Sentiment portrayed
    • Tonality of the content
    • Potential toxicity
  • Security
    • Similarity to prevalent prompt injections
    • Instances of model refusals (e.g., "as a language model, ...")

This flexible scoring system allows for a comprehensive evaluation of various elements integral to the function and performance of the LLM application.
