Glossary

This glossary provides definitions for key terms and concepts used throughout the Langfuse documentation.

A

Agent

(Observation Type)

An observation type that represents an AI agent workflow, including multi-step reasoning processes, tool orchestration, and autonomous decision-making. Used to track agent behavior and interactions.

Agent Graph

A visual representation of complex AI agent workflows in Langfuse. Agent graphs help you understand and debug multi-step reasoning processes and agent interactions by displaying the flow of observations within a trace.

Annotation Queue

A manual evaluation method that allows domain experts to review and add scores and comments to traces, observations, or sessions. Useful for building ground truth, systematic labeling, and team collaboration.

API Keys

Credentials used to authenticate with the Langfuse API and SDKs. API keys consist of a public key and a secret key and are associated with a specific project. They are managed in project settings.
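
As an illustration, a minimal sketch of initializing the Python SDK with a project's API keys; the key values and host below are placeholders:

```python
from langfuse import Langfuse

# Keys are created in the project settings; the values below are placeholders.
# Alternatively, set LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST
# as environment variables and initialize the client without arguments.
langfuse = Langfuse(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
    host="https://cloud.langfuse.com",  # or your self-hosted URL
)
```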

B

Billable Units

The unit of measurement for Langfuse Cloud pricing. Units are the sum of traces, observations, and scores ingested per billing period.

C

Chain

(Observation Type)

An observation type that represents a link between different application steps, such as passing context from a retriever to an LLM call.

Chat Prompt

(Message Prompt)

A prompt type that consists of an array of messages with specific roles (system, user, assistant). Useful for managing complete conversation structures and chat history.

Custom Dashboards

Flexible, self-service analytics dashboards that allow you to visualize and monitor metrics from your LLM application. Dashboards support multiple chart types, filtering, and multi-level aggregations.

Related: Score · Token Tracking

D

Dataset

A collection of test cases (dataset items) used to test and benchmark LLM applications. Datasets contain inputs and optionally expected outputs for systematic testing.

Dataset Experiment

(Dataset Run, Experiment Run)

Also known as a Dataset Run. The execution of a dataset through your LLM application, producing outputs that can be evaluated. Links dataset items to their corresponding traces.

Dataset Item

An individual test case within a dataset. Each item contains an input (the scenario to test) and optionally an expected output.
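
For example, creating a dataset and a dataset item with the Python SDK might look like the following sketch; the dataset name and payloads are made up, and method signatures can differ slightly between SDK versions:

```python
from langfuse import Langfuse

langfuse = Langfuse()  # reads API keys from environment variables

# Create a dataset and add one test case (names and payloads are illustrative)
langfuse.create_dataset(name="capital-cities")
langfuse.create_dataset_item(
    dataset_name="capital-cities",
    input={"question": "What is the capital of France?"},
    expected_output={"answer": "Paris"},
)

# Fetch the dataset later, e.g. to run an experiment over its items
dataset = langfuse.get_dataset("capital-cities")
for item in dataset.items:
    print(item.input, item.expected_output)
```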

E

Embedding

(Observation Type)

An observation type that represents a call to an LLM to generate embeddings. Can include model information, token usage, and costs.

Related: Observation · Generation · Retriever · Token Tracking

Environment

A way to organize traces, observations, and scores from different deployment contexts (e.g., production, staging, development). Helps keep data separate while using the same project.

Related: Project · Trace · Tags

Evaluation

A function that scores traces, observations, sessions, or dataset runs. Methods include LLM-as-a-Judge for subjective assessments, Annotation Queues for human review, Scores via UI for spot checks, and Scores via API/SDK for programmatic evaluation.

Evaluator

(Observation Type)

An observation type that represents functions assessing the relevance, correctness, or helpfulness of LLM outputs. Also refers to the function that scores experiment results.

Event

(Observation Type)

A basic observation type used to track discrete events in a trace. Events are the building blocks of tracing.

F

Flushing

The process of sending buffered trace data to the Langfuse server. Important for short-lived applications to ensure no data is lost when the process terminates.
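
In the Python SDK, flushing is an explicit call; a minimal sketch for a short-lived script:

```python
from langfuse import Langfuse

langfuse = Langfuse()

# ... create traces, observations, and scores here ...

# Send all buffered events to the Langfuse server before the process exits.
# Important in scripts, cron jobs, and serverless functions.
langfuse.flush()
```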

G

Generation

(Observation Type)

An observation type that logs outputs from AI models including prompts, completions, token usage, and costs. The most common observation type for LLM calls.

Related: Observation · Token Tracking · Span

Guardrail

(Observation Type)

An observation type that represents a component protecting against malicious content, jailbreaks, or other security risks.

I

Instrumentation

The process of adding code to record application behavior. Langfuse provides context managers, observe wrappers, and manual observation methods for instrumenting your application.
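
As an example, the Python SDK's observe decorator wraps functions and records them as observations; the functions below are hypothetical, and the import path differs between SDK versions:

```python
# Import path for the decorator varies by SDK version:
#   newer releases: from langfuse import observe
#   older releases: from langfuse.decorators import observe
from langfuse import observe


@observe()
def retrieve_context(query: str) -> str:
    # Recorded as a nested observation of the surrounding trace.
    return "some retrieved context"


@observe()
def answer(query: str) -> str:
    # The outermost decorated call creates the trace; nested calls become children.
    context = retrieve_context(query)
    return f"Answer based on: {context}"


answer("What does Langfuse do?")
```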

L

LLM Connection

An API key configuration that allows Langfuse to call LLM models in the Playground or for LLM-as-a-Judge evaluations. Supports providers like OpenAI, Anthropic, and Google.

LLM-as-a-Judge

An evaluation method that uses an LLM to score the output of your application based on custom criteria. Provides scalable, repeatable evaluations with chain-of-thought reasoning.

Log View

A trace view that shows all observations concatenated, making it easy to scan through them quickly.

Related: Agent Graph

M

MCP Server

A Model Context Protocol server that enables AI-powered tools to interact with Langfuse data. Used for advanced integrations and AI-assisted workflows.

Related: Public API · SDK

Metrics API

An API endpoint for retrieving customized analytics from Langfuse data. Allows specifying dimensions, metrics, filters, and time granularity to build custom reports and dashboards for LLM applications.

Model Definition

A configuration that stores pricing information for an LLM model. Model definitions specify the cost per input and output token, enabling Langfuse to automatically calculate the price of generations based on token usage.

O

Observation

An individual step within a trace. Observations can be of different types (span, generation, event, tool, etc.) and can be nested to represent hierarchical workflows.

Offline Evaluation

Testing your application against a fixed dataset before deployment. Used to validate changes and catch regressions during development.

Online Evaluation

Scoring live production traces to catch issues in real traffic. Helps identify edge cases and monitor application quality in production.

OpenTelemetry

An open standard for collecting telemetry data from applications. Langfuse is built on OpenTelemetry, enabling interoperability and reducing vendor lock-in.

Organization

A top-level entity in Langfuse that contains projects. Organizations manage billing, team members, and SSO configuration.

Related: Project · RBAC

P

Playground

An interactive environment where you can test, iterate on, and compare different prompts and models directly in Langfuse without writing code.

Project

A container that groups all Langfuse data within an organization. Projects enable fine-grained role-based access control and separate data for different applications.

Prompt Label

A label that can be assigned to a prompt version. Used to mark prompt versions as production or staging to fetch them via the SDK or API.

Prompt Management

A systematic approach to storing, versioning, and retrieving prompts for LLM applications. Decouples prompt updates from code deployment.

Prompt Variables

Placeholders in prompts that are dynamically filled at runtime. Allow creating reusable prompt templates with customizable content.
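
A short sketch of fetching a managed prompt by label and filling its variables with the Python SDK; the prompt name, label, and variable are assumed to exist in your project:

```python
from langfuse import Langfuse

langfuse = Langfuse()

# Fetch the version currently deployed under the "production" label
prompt = langfuse.get_prompt("movie-critic", label="production")

# Fill prompt variables such as {{movie}} at runtime
compiled = prompt.compile(movie="Dune: Part Two")
print(compiled)
```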

Protected Prompt Labels

Restricts the ability to apply or modify certain prompt labels (e.g. production) on prompt versions to admins and owners. This prevents accidental or unauthorized changes to production prompts.

Public API

The REST API that provides access to all Langfuse data and features. Used for custom integrations, workflows, and programmatic access.
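
The Public API uses HTTP basic auth with the public key as username and the secret key as password; a minimal sketch of one such request (the endpoint and parameters shown are just one example of the available surface):

```python
import requests

LANGFUSE_HOST = "https://cloud.langfuse.com"  # or your self-hosted URL

# Basic auth: public key as username, secret key as password (placeholders)
response = requests.get(
    f"{LANGFUSE_HOST}/api/public/traces",
    auth=("pk-lf-...", "sk-lf-..."),
    params={"limit": 10},
)
response.raise_for_status()
print(response.json())
```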

R

RBAC

(Role-Based Access Control)

Role-Based Access Control that manages permissions within Langfuse. Roles include Owner, Admin, Member, Viewer, and None, each with specific scopes.

Remote Experiment

A webhook-based trigger that allows running SDK experiments from the Langfuse UI. Configure a webhook URL and default config, then trigger experiments that fetch the dataset, run your application, and ingest scores back into Langfuse.

Retriever

(Observation Type)

An observation type that represents data retrieval steps, such as calls to vector stores or databases in RAG applications.

S

Score

The output of an evaluation. Scores can be numeric, categorical, or boolean and are assigned to traces, observations, sessions, or dataset runs.
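
Scores can also be created programmatically; a sketch using the Python SDK, where the trace ID and values are placeholders and the method name may differ between SDK versions (newer releases expose a similarly named create-score method):

```python
from langfuse import Langfuse

langfuse = Langfuse()

# Attach a numeric score to an existing trace (ID and values are placeholders)
langfuse.score(
    trace_id="some-trace-id",
    name="correctness",
    value=0.9,
    comment="Answer matched the expected output",
)
langfuse.flush()
```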

Score Config

A configuration defining how a score is calculated and interpreted. Includes data type, value constraints, and categories for standardized scoring.

SDK

(Software Development Kit)

Software Development Kit. Langfuse provides native SDKs for Python and JavaScript/TypeScript that handle tracing, prompt management, and API access.

Session

A way to group related traces that are part of the same user interaction. Commonly used for multi-turn conversations or chat threads.

Span

(Observation Type)

An observation type that represents the duration of a unit of work in a trace. The default observation type for most operations.

T

Tags

Flexible labels that categorize and filter traces and observations. Useful for organizing by feature, API endpoint, workflow, or other criteria.

Task

A function definition that processes dataset items during an experiment. The task represents the application code you want to test.

Text Prompt

(String Prompt)

A prompt type that consists of a single string. Ideal for simple use cases or when you only need a system message.

Token Tracking

The basic unit of text that LLMs process. Tokens can be words, parts of words, or characters depending on the model's tokenizer. Token counts determine API costs and context window limits. Langfuse tracks input and output tokens for cost monitoring and optimization.

Tool

(Observation Type)

An observation type that represents a tool call in your application, such as calling a weather API or executing a database query.

Trace

A single request or operation in your LLM application. Traces contain the overall input, output, and metadata, along with nested observations that capture each step.

Tracing

The process of capturing structured logs of every request in your LLM application. Includes prompts, responses, token usage, latency, and any intermediate steps.
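
A minimal sketch of manually creating a trace with a nested generation using the low-level v2-style Python client; all names, values, and token counts are illustrative, and newer SDK versions use context managers instead:

```python
from langfuse import Langfuse

langfuse = Langfuse()

# Root trace for a single request (all names and values are illustrative)
trace = langfuse.trace(name="qa-request", input={"question": "What is Langfuse?"})

# Nested generation capturing the LLM call, including token usage for cost tracking
trace.generation(
    name="llm-call",
    model="gpt-4o",
    input=[{"role": "user", "content": "What is Langfuse?"}],
    output="Langfuse is an open-source LLM engineering platform.",
    usage={"input": 12, "output": 15},  # token counts; exact field names vary by version
)

trace.update(output="Langfuse is an open-source LLM engineering platform.")
langfuse.flush()
```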

U

Users

The ability to associate traces with users via a userId. Enables per-user analytics, cost tracking, and filtering.

Related: Trace · Session
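
Attaching a user and a session to a trace with the low-level v2-style Python client might look like this sketch; the identifiers are placeholders:

```python
from langfuse import Langfuse

langfuse = Langfuse()

# user_id enables per-user analytics; session_id groups multi-turn conversations
langfuse.trace(
    name="chat-turn",
    user_id="user_123",
    session_id="chat_session_456",
    input={"message": "Hello!"},
    output={"message": "Hi, how can I help?"},
)
langfuse.flush()
```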