Observability Data Model
Tracing in Langfuse is a way to log and analyze the execution of your LLM applications. The following reference describes the underlying data model, which is inspired by OpenTelemetry.
Traces and Observations
Traces
A trace typically represents a single request or operation. It contains the overall input and output of the function, as well as metadata about the request (e.g. user, session, tags).
Observations
Each trace can contain multiple observations that log the individual steps of the execution. Usually, a trace corresponds to a single API call of an application.
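To make the trace/observation relationship concrete, here is a minimal sketch using plain Python dataclasses. It is an illustration of the data model only, not the Langfuse SDK: the class names `Trace` and `Observation` and the fields `user_id`, `session_id` and `tags` are hypothetical and simply mirror the attributes described above.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Observation:
    # One logged step of the execution, e.g. a retrieval or an LLM call (hypothetical model)
    name: str
    input: Any = None
    output: Any = None


@dataclass
class Trace:
    # Overall input/output of the request plus request-level metadata (hypothetical model)
    name: str
    input: Any = None
    output: Any = None
    user_id: Optional[str] = None
    session_id: Optional[str] = None
    tags: list[str] = field(default_factory=list)
    observations: list[Observation] = field(default_factory=list)


trace = Trace(name="answer-question", input={"question": "What is a trace?"},
              user_id="user-123", tags=["prod"])
trace.observations.append(Observation(name="retrieve-docs", input="What is a trace?"))
trace.observations.append(Observation(name="llm-call", output="(model answer)"))
# The trace output is the result of the last step in this example
trace.output = trace.observations[-1].output
print(len(trace.observations))  # → 2
```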
Types
- event: the basic building block, used to track discrete events in a trace.
- span: represents the duration of a unit of work in a trace.
- generation: logs generations of AI models, including prompts, token usage and costs.
- agent: decides on the application flow and can, for example, use tools with the guidance of an LLM.
- tool: represents a tool call, for example to a weather API.
- chain: a link between different application steps, such as passing context from a retriever to an LLM call.
- retriever: represents data retrieval steps, such as a call to a vector store or a database.
- evaluator: represents functions that assess the relevance, correctness or helpfulness of an LLM's outputs.
- embedding: a call to a model to generate embeddings; it can include the model, token usage and costs.
- guardrail: a component that protects against malicious content or jailbreaks.
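The type names listed above form a closed set, which maps naturally onto an enum. The sketch below is a hypothetical illustration, not Langfuse's own type definition: the `ObservationType` enum, the `MODEL_CALL_TYPES` set and the `tracks_model_usage` helper are assumptions. It encodes the fact, stated in the list, that generations and embeddings are the types that carry model, token usage and cost details.

```python
from enum import Enum


class ObservationType(Enum):
    # Lowercase values mirror the type names listed above (illustrative only)
    EVENT = "event"
    SPAN = "span"
    GENERATION = "generation"
    AGENT = "agent"
    TOOL = "tool"
    CHAIN = "chain"
    RETRIEVER = "retriever"
    EVALUATOR = "evaluator"
    EMBEDDING = "embedding"
    GUARDRAIL = "guardrail"


# Types described above as including model, token usage and cost details
MODEL_CALL_TYPES = {ObservationType.GENERATION, ObservationType.EMBEDDING}


def tracks_model_usage(obs_type: ObservationType) -> bool:
    """Return True if observations of this type are expected to carry usage and cost data."""
    return obs_type in MODEL_CALL_TYPES


# Parsing a raw type string, e.g. from a log record
print(ObservationType("tool"))             # → ObservationType.TOOL
print(tracks_model_usage(ObservationType.GENERATION))  # → True
```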
Nesting
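Observations can be nested within a trace, so that the steps of an execution form a hierarchy.
Figure: Hierarchical structure of traces in Langfuse
Figure: Example trace in the Langfuse UI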
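One common way to represent such a hierarchy is to give each observation a reference to its parent and rebuild the tree from those references. The following sketch assumes a hypothetical `parent_id` field (where `None` means "attached directly to the trace") and a hypothetical `render_tree` helper; it is meant to show the nesting idea, not Langfuse's storage format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Obs:
    id: str
    name: str
    type: str
    parent_id: Optional[str] = None  # None means a direct child of the trace (assumption)


def render_tree(observations: list[Obs]) -> list[str]:
    """Return one indented line per observation, depth-first, in input order."""
    # Group observations by parent so the children of any node can be looked up quickly
    children: dict[Optional[str], list[Obs]] = {}
    for obs in observations:
        children.setdefault(obs.parent_id, []).append(obs)

    lines: list[str] = []

    def visit(parent_id: Optional[str], depth: int) -> None:
        for obs in children.get(parent_id, []):
            lines.append("  " * depth + f"{obs.type}: {obs.name}")
            visit(obs.id, depth + 1)  # recurse into this observation's children

    visit(None, 0)  # start from observations attached directly to the trace
    return lines


observations = [
    Obs("1", "handle-request", "agent"),
    Obs("2", "search-docs", "retriever", parent_id="1"),
    Obs("3", "weather-api", "tool", parent_id="1"),
    Obs("4", "final-answer", "generation", parent_id="1"),
]
print("\n".join(render_tree(observations)))
```

Here the agent observation is the root step, and the retriever, tool and generation calls it makes are nested one level below it.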
Sessions
Optionally, traces can be grouped into sessions. Sessions are used to group traces that are part of the same user interaction. A common example is a thread in a chat interface.
Please refer to the Sessions documentation to add sessions to your traces.
Figure: Optionally, sessions aggregate traces
Figure: Example session in the Langfuse UI
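Conceptually, a session is just the set of traces that share the same session identifier. The sketch below groups traces by a hypothetical `session_id` field; `TraceRecord` and `group_by_session` are illustrative names, not part of any Langfuse API. Because sessions are optional, traces without a session ID are left ungrouped.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceRecord:
    trace_id: str
    session_id: Optional[str]  # sessions are optional, so this may be None


def group_by_session(traces: list[TraceRecord]) -> dict[str, list[str]]:
    """Map each session ID to the IDs of its traces, preserving input order."""
    sessions: dict[str, list[str]] = defaultdict(list)
    for t in traces:
        if t.session_id is not None:  # traces without a session stay ungrouped
            sessions[t.session_id].append(t.trace_id)
    return dict(sessions)


traces = [
    TraceRecord("t1", "chat-thread-42"),
    TraceRecord("t2", "chat-thread-42"),
    TraceRecord("t3", None),
    TraceRecord("t4", "chat-thread-7"),
]
print(group_by_session(traces))
# → {'chat-thread-42': ['t1', 't2'], 'chat-thread-7': ['t4']}
```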
Scores
Scores are flexible objects used to evaluate traces, observations, sessions and dataset runs.
They can be:
- Numeric, categorical, or boolean values
- Associated with a trace, a session, or a dataset run (exactly one is required)
- For trace-level scores only: linked to a specific observation within the trace (optional)
- Annotated with comments for additional context
- Validated against a score configuration schema (optional)
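The attachment rules in the list above can be expressed as a small validation check. This is a minimal sketch under stated assumptions: the `Score` dataclass and its field names (`trace_id`, `session_id`, `dataset_run_id`, `observation_id`) are hypothetical, and it does not model score configuration schemas. It enforces that exactly one target is set and that an observation link is only allowed on trace-level scores.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Score:
    name: str
    value: Union[float, str, bool]  # numeric, categorical, or boolean
    trace_id: Optional[str] = None
    session_id: Optional[str] = None
    dataset_run_id: Optional[str] = None
    observation_id: Optional[str] = None  # only allowed for trace-level scores
    comment: Optional[str] = None

    def __post_init__(self) -> None:
        # Exactly one of trace, session, or dataset run must be the score's target
        targets = [self.trace_id, self.session_id, self.dataset_run_id]
        if sum(t is not None for t in targets) != 1:
            raise ValueError("exactly one of trace_id, session_id, dataset_run_id is required")
        # An observation link only makes sense inside a trace
        if self.observation_id is not None and self.trace_id is None:
            raise ValueError("observation_id is only valid for trace-level scores")


ok = Score(name="helpfulness", value=0.9, trace_id="t1",
           observation_id="o3", comment="clear answer")
try:
    Score(name="resolved", value=True, trace_id="t1", session_id="s1")
except ValueError as err:
    print(err)  # → exactly one of trace_id, session_id, dataset_run_id is required
```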
Typically, session-level scores are used to evaluate conversational experiences as a whole across multiple interactions, while trace-level scores evaluate a single interaction. Dataset-run-level scores are used for the overall evaluation of a dataset run, e.g. precision, recall, or F1-score.
Please refer to the scores documentation to get started. For more details on score types and attributes, refer to the score data model documentation.
Billable Units
Langfuse Cloud pricing is based on the number of ingested units per billing period.
Units = Traces + Observations + Scores
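As a quick worked example of the formula above, the sketch below counts units for a made-up month of traffic. The `billable_units` helper and all volumes are hypothetical; for actual prices per unit, use the pricing calculator.

```python
def billable_units(traces: int, observations: int, scores: int) -> int:
    # Units = Traces + Observations + Scores
    return traces + observations + scores


# Hypothetical month: 10,000 traces, each with 5 observations and 1 score
traces = 10_000
units = billable_units(traces, observations=traces * 5, scores=traces * 1)
print(units)  # → 70000
```

Since every observation counts as a unit, the number of observations per trace usually dominates the total in applications with many steps per request.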
Use our pricing calculator to estimate your monthly costs based on your expected usage.
FAQ
How can I track my Langfuse Cloud usage?
Use the Usage Monitoring Report in the Dashboards tab in Langfuse to analyze your Langfuse Cloud usage.
How can I optimize my Langfuse Cloud usage to reduce cost?
If your application scales and you want to optimize Langfuse Cloud cost, please check out this guide.