Scores Data Model
In Langfuse, Scores are the data objects that store evaluation results. They are flexible and are used by every evaluation method to attach results to different objects across the Langfuse platform. At a high level, you can think of a Score as the output of an evaluation method.
Scores
Scores serve as objects for storing evaluation metrics in Langfuse. Their core properties are:
- Scores reference a Trace, Observation, Session, or DatasetRun.
- Each Score references exactly one of the above objects.
- Scores are either numeric, categorical, or boolean.
- Scores can optionally be linked to a ScoreConfig to ensure they comply with a specific schema.
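
The "exactly one referenced object" rule can be sketched as a small check in plain Python. This is an illustration only, not Langfuse SDK code: the helper name `score_target` is hypothetical, and the field names are borrowed from the Score object attributes (`traceId`, `observationId`, `sessionId`, `datasetRunId`). It also assumes that an observation-level score may carry the id of its parent trace, so that combination is treated as a single Observation target.

```python
def score_target(score: dict) -> str:
    """Return the kind of object a score dict points at.

    Hypothetical helper for illustration, not part of any Langfuse SDK.
    """
    targets = []

    # Assumption: an observation score may also carry its parent traceId,
    # so a traceId next to an observationId is not counted as a second target.
    if score.get("observationId") is not None:
        targets.append("Observation")
    elif score.get("traceId") is not None:
        targets.append("Trace")

    if score.get("sessionId") is not None:
        targets.append("Session")
    if score.get("datasetRunId") is not None:
        targets.append("DatasetRun")

    if len(targets) != 1:
        raise ValueError(f"expected exactly one referenced object, found {targets}")
    return targets[0]


# Usage: a trace-level score and an observation-level score.
print(score_target({"name": "user_feedback", "traceId": "trace-1"}))
print(score_target({"name": "hallucination_eval", "traceId": "trace-1", "observationId": "obs-1"}))
```

A score dict with no reference, or with references to two different kinds of object (for example a trace and a session), raises a `ValueError` in this sketch.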
Common Use
| Level | Description |
|---|---|
| Trace | Used for evaluation of a single interaction (most common). |
| Observation | Used for evaluation of a single observation below the trace level. |
| Session | Used for comprehensive evaluation of outputs across multiple interactions. |
| Dataset Run | Used for performance scores of a Dataset Run. See Dataset Runs for context. |
Score object
| Attribute | Type | Description |
|---|---|---|
| name | string | Name of the score, e.g. user_feedback, hallucination_eval |
| value | number | Optional: Numeric value of the score. Always defined for numeric and boolean scores. Optional for categorical scores. |
| stringValue | string | Optional: String equivalent of the score’s numeric value for boolean and categorical data types. Automatically set for categorical scores based on the config if the configId is provided. |
| traceId | string | Optional: Id of the trace the score relates to |
| observationId | string | Optional: Observation (e.g. LLM call) the score relates to |
| sessionId | string | Optional: Id of the session the score relates to |
| datasetRunId | string | Optional: Id of the dataset run the score relates to |
| comment | string | Optional: Evaluation comment, commonly used for user feedback, eval reasoning output or internal notes |
| id | string | Unique identifier of the score. Auto-generated by SDKs. Optionally can also be used as an idempotency key to update scores. |
| source | string | Automatically set based on the source of the score. Can be either API, EVAL, or ANNOTATION |
| dataType | string | Automatically set based on the config data type when the configId is provided. Otherwise can be defined manually as NUMERIC, CATEGORICAL or BOOLEAN |
| configId | string | Optional: Score config id to ensure that the score follows a specific schema. Can be defined in the Langfuse UI or via API. When provided the score’s dataType is automatically set based on the config |
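
Because the `id` attribute can double as an idempotency key, sending a score with an existing id updates that score instead of creating a duplicate. The sketch below models that behavior with an in-memory dictionary. `ScoreStore` and `upsert` are hypothetical names chosen for illustration; this is not the Langfuse API, and the real backend may behave differently in detail.

```python
import uuid


class ScoreStore:
    """In-memory stand-in for a score backend (hypothetical, for illustration)."""

    def __init__(self):
        self._scores = {}

    def upsert(self, score: dict) -> str:
        # Generate an id when the caller did not supply one,
        # mirroring the "auto-generated by SDKs" behavior.
        score_id = score.get("id") or str(uuid.uuid4())
        # Writing under the same id replaces the earlier score.
        self._scores[score_id] = {**score, "id": score_id}
        return score_id

    def get(self, score_id: str) -> dict:
        return self._scores[score_id]

    def __len__(self) -> int:
        return len(self._scores)


store = ScoreStore()
store.upsert({"id": "feedback-1", "name": "user_feedback", "value": 1, "traceId": "trace-1"})
# Same id: the earlier score is overwritten rather than duplicated.
store.upsert({"id": "feedback-1", "name": "user_feedback", "value": 0, "traceId": "trace-1"})
print(len(store), store.get("feedback-1")["value"])
```

Choosing a deterministic id (for example one derived from the trace id and the score name) is one way to make repeated submissions safe, under the assumption that each trace should hold at most one score of that name.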
Score Config
Score configs are used to ensure that your scores follow a specific schema. Using score configs allows you to standardize your scoring schema across your team and ensure that scores are consistent and comparable for future analysis.
You can define a scoreConfig in the Langfuse UI or via our API. Configs are immutable but can be archived (and restored anytime).
A score config includes:
- Score name
- Data type: NUMERIC, CATEGORICAL, or BOOLEAN
- Constraints on the score value range (min/max for numeric, custom categories for categorical data types)
Score Config object
| Attribute | Type | Description |
|---|---|---|
| id | string | Unique identifier of the score config. |
| name | string | Name of the score config, e.g. user_feedback, hallucination_eval |
| dataType | string | Can be either NUMERIC, CATEGORICAL or BOOLEAN |
| isArchived | boolean | Whether the score config is archived. Defaults to false |
| minValue | number | Optional: Sets minimum value for numerical scores. If not set, the minimum value defaults to -∞ |
| maxValue | number | Optional: Sets maximum value for numerical scores. If not set, the maximum value defaults to +∞ |
| categories | list | Optional: Defines categories for categorical scores. List of objects with label value pairs |
| description | string | Optional: Provides further description of the score configuration |
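
To make the role of a score config concrete, here is a minimal sketch of how a score could be checked against one. It is written in plain Python and is not Langfuse's implementation: `apply_config` is a hypothetical helper, the dictionary keys follow the attribute tables on this page, and two details are assumptions: that each category is an object with `label` and `value` keys, and that boolean scores are stored as the numbers 0 or 1.

```python
import math


def apply_config(score: dict, config: dict) -> dict:
    """Validate a score dict against a config dict and return an enriched copy.

    Hypothetical helper for illustration, not part of any Langfuse SDK.
    """
    data_type = config["dataType"]
    # The data type is taken from the config when a config is used.
    result = {**score, "dataType": data_type, "configId": config["id"]}

    if data_type == "NUMERIC":
        # Unset bounds default to -inf / +inf.
        low = config.get("minValue")
        high = config.get("maxValue")
        low = -math.inf if low is None else low
        high = math.inf if high is None else high
        if not low <= score["value"] <= high:
            raise ValueError(f"value {score['value']} outside [{low}, {high}]")

    elif data_type == "CATEGORICAL":
        # Assumption: categories look like [{"label": "good", "value": 1}, ...].
        categories = config.get("categories") or []
        by_label = {c["label"]: c["value"] for c in categories}
        by_value = {c["value"]: c["label"] for c in categories}
        if score.get("stringValue") is not None:
            if score["stringValue"] not in by_label:
                raise ValueError(f"unknown category {score['stringValue']!r}")
            result["value"] = by_label[score["stringValue"]]
        elif score.get("value") in by_value:
            # Derive the string value from the config, as described for stringValue.
            result["stringValue"] = by_value[score["value"]]
        else:
            raise ValueError("score matches no configured category")

    elif data_type == "BOOLEAN":
        # Assumption: booleans are stored as numeric 0 or 1.
        if score.get("value") not in (0, 1):
            raise ValueError("boolean scores must have value 0 or 1")

    else:
        raise ValueError(f"unknown dataType {data_type!r}")

    return result


tone_config = {
    "id": "config-1",
    "name": "tone",
    "dataType": "CATEGORICAL",
    "categories": [{"label": "good", "value": 1}, {"label": "bad", "value": 0}],
}
print(apply_config({"name": "tone", "value": 0, "traceId": "trace-1"}, tone_config))
```

Keeping the check in one function reflects the purpose stated above: every score that passes through the same config ends up with the same data type and a value from the same range or category set, which is what makes scores comparable later.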