Glossary

This glossary provides definitions for key terms and concepts used throughout the Langfuse documentation.

A

Agent

(Observation Type)

An observation type that represents an AI agent workflow, including multi-step reasoning processes, tool orchestration, and autonomous decision-making. Used to track agent behavior and interactions.

Agent Graph

A visual representation of complex AI agent workflows in Langfuse. Agent graphs help you understand and debug multi-step reasoning processes and agent interactions by displaying the flow of observations within a trace.

Annotation Queue

A manual evaluation method that allows domain experts to review and add scores and comments to traces, observations, or sessions. Useful for building ground truth, systematic labeling, and team collaboration.

API Keys

Credentials used to authenticate with the Langfuse API and SDKs. API keys consist of a public key and a secret key and are associated with a specific project. They are managed in project settings.
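
As an illustration, a minimal sketch of initializing the Python SDK with a project's API keys; the key values and host below are placeholders:

```python
from langfuse import Langfuse

# Keys are created in the project settings; the values below are placeholders.
# Alternatively, set LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST
# as environment variables and initialize the client without arguments.
langfuse = Langfuse(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
    host="https://cloud.langfuse.com",  # or your self-hosted URL
)
```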

B

Billable Units

The unit of measurement for Langfuse Cloud pricing. Units are the sum of traces, observations, and scores ingested per billing period.

C

Chain

(Observation Type)

An observation type that represents a link between different application steps, such as passing context from a retriever to an LLM call.

Chat Prompt

(Message Prompt)

A prompt type that consists of an array of messages with specific roles (system, user, assistant). Useful for managing complete conversation structures and chat history.

Custom Dashboards

Flexible, self-service analytics dashboards that allow you to visualize and monitor metrics from your LLM application. Dashboards support multiple chart types, filtering, and multi-level aggregations.

Related: Score · Token Tracking

D

Dataset

A collection of test cases (dataset items) used to test and benchmark LLM applications. Datasets contain inputs and optionally expected outputs for systematic testing.

Dataset Experiment

(Dataset Run, Experiment Run)

Also known as a Dataset Run. The execution of a dataset through your LLM application, producing outputs that can be evaluated. Links dataset items to their corresponding traces.

Dataset Item

An individual test case within a dataset. Each item contains an input (the scenario to test) and optionally an expected output.
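
For example, creating a dataset and a dataset item with the Python SDK might look like the following sketch; the dataset name and payloads are made up, and method signatures can differ slightly between SDK versions:

```python
from langfuse import Langfuse

langfuse = Langfuse()  # reads API keys from environment variables

# Create a dataset and add one test case (names and payloads are illustrative)
langfuse.create_dataset(name="capital-cities")
langfuse.create_dataset_item(
    dataset_name="capital-cities",
    input={"question": "What is the capital of France?"},
    expected_output={"answer": "Paris"},
)

# Fetch the dataset later, e.g. to run an experiment over its items
dataset = langfuse.get_dataset("capital-cities")
for item in dataset.items:
    print(item.input, item.expected_output)
```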

E

Embedding

(Observation Type)

An observation type that represents a call to an LLM to generate embeddings. Can include model information, token usage, and costs.

Related: Observation · Generation · Retriever · Token Tracking

Environment

A way to organize traces, observations, and scores from different deployment contexts (e.g., production, staging, development). Helps keep data separate while using the same project.

Related: Project · Trace · Tags

Evaluation

A function that scores traces, observations, sessions, or dataset runs. Methods include LLM-as-a-Judge for subjective assessments, Annotation Queues for human review, Scores via UI for spot checks, and Scores via API/SDK for programmatic evaluation.

Evaluator

(Observation Type)

An observation type that represents functions assessing the relevance, correctness, or helpfulness of LLM outputs. Also refers to the function that scores experiment results.

Event

(Observation Type)

A basic observation type used to track discrete events in a trace. Events are the building blocks of tracing.

F

Flushing

The process of sending buffered trace data to the Langfuse server. Important for short-lived applications to ensure no data is lost when the process terminates.
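
In the Python SDK, flushing is an explicit call; a minimal sketch for a short-lived script:

```python
from langfuse import Langfuse

langfuse = Langfuse()

# ... create traces, observations, and scores here ...

# Send all buffered events to the Langfuse server before the process exits.
# Important in scripts, cron jobs, and serverless functions.
langfuse.flush()
```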

G

Generation

(Observation Type)

An observation type that logs outputs from AI models including prompts, completions, token usage, and costs. The most common observation type for LLM calls.

Related: Observation · Token Tracking · Span

Guardrail

(Observation Type)

An observation type that represents a component protecting against malicious content, jailbreaks, or other security risks.

I

Instrumentation

The process of adding code to record application behavior. Langfuse provides context managers, observe wrappers, and manual observation methods for instrumenting your application.
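
As an example, the Python SDK's observe decorator wraps functions and records them as observations; the functions below are hypothetical, and the import path differs between SDK versions:

```python
# Import path for the decorator varies by SDK version:
#   newer releases: from langfuse import observe
#   older releases: from langfuse.decorators import observe
from langfuse import observe


@observe()
def retrieve_context(query: str) -> str:
    # Recorded as a nested observation of the surrounding trace.
    return "some retrieved context"


@observe()
def answer(query: str) -> str:
    # The outermost decorated call creates the trace; nested calls become children.
    context = retrieve_context(query)
    return f"Answer based on: {context}"


answer("What does Langfuse do?")
```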

L

LLM Connection

An API key configuration that allows Langfuse to call LLM models in the Playground or for LLM-as-a-Judge evaluations. Supports providers like OpenAI, Anthropic, and Google.

LLM-as-a-Judge

An evaluation method that uses an LLM to score the output of your application based on custom criteria. Provides scalable, repeatable evaluations with chain-of-thought reasoning.

Log View

A trace view that shows all observations concatenated, making it easy to scan through them quickly.

Related: Agent Graph

M

MCP Server

A Model Context Protocol server that enables AI-powered tools to interact with Langfuse data. Used for advanced integrations and AI-assisted workflows.

Related: Public API · SDK

Metrics API

An API endpoint for retrieving customized analytics from Langfuse data. Allows specifying dimensions, metrics, filters, and time granularity to build custom reports and dashboards for LLM applications.

Model Definition

A configuration that stores pricing information for an LLM model. Model definitions specify the cost per input and output token, enabling Langfuse to automatically calculate the price of generations based on token usage.

O

Observation

An individual step within a trace. Observations can be of different types (span, generation, event, tool, etc.) and can be nested to represent hierarchical workflows.

Offline Evaluation

Testing your application against a fixed dataset before deployment. Used to validate changes and catch regressions during development.

Online Evaluation

Scoring live production traces to catch issues in real traffic. Helps identify edge cases and monitor application quality in production.

OpenTelemetry

An open standard for collecting telemetry data from applications. Langfuse is built on OpenTelemetry, enabling interoperability and reducing vendor lock-in.

Organization

A top-level entity in Langfuse that contains projects. Organizations manage billing, team members, and SSO configuration.

Related: Project · RBAC

P

Playground

An interactive environment where you can test, iterate on, and compare different prompts and models directly in Langfuse without writing code.

Project

A container that groups all Langfuse data within an organization. Projects enable fine-grained role-based access control and separate data for different applications.

Prompt Label

A label that can be assigned to a prompt version. Used to mark prompt versions as production or staging to fetch them via the SDK or API.

Prompt Management

A systematic approach to storing, versioning, and retrieving prompts for LLM applications. Decouples prompt updates from code deployment.

Prompt Variables

Placeholders in prompts that are dynamically filled at runtime. Allow creating reusable prompt templates with customizable content.
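
A short sketch of fetching a managed prompt by label and filling its variables with the Python SDK; the prompt name, label, and variable are assumed to exist in your project:

```python
from langfuse import Langfuse

langfuse = Langfuse()

# Fetch the version currently deployed under the "production" label
prompt = langfuse.get_prompt("movie-critic", label="production")

# Fill prompt variables such as {{movie}} at runtime
compiled = prompt.compile(movie="Dune: Part Two")
print(compiled)
```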

Protected Prompt Labels

Restricts the ability to apply or modify certain prompt labels (e.g. production) on prompt versions to admins and owners. This prevents accidental or unauthorized changes to production prompts.

Public API

The REST API that provides access to all Langfuse data and features. Used for custom integrations, workflows, and programmatic access.
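
The Public API uses HTTP basic auth with the public key as username and the secret key as password; a minimal sketch of one such request (the endpoint and parameters shown are just one example of the available surface):

```python
import requests

LANGFUSE_HOST = "https://cloud.langfuse.com"  # or your self-hosted URL

# Basic auth: public key as username, secret key as password (placeholders)
response = requests.get(
    f"{LANGFUSE_HOST}/api/public/traces",
    auth=("pk-lf-...", "sk-lf-..."),
    params={"limit": 10},
)
response.raise_for_status()
print(response.json())
```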

R

RBAC

(Role-Based Access Control)

Role-Based Access Control that manages permissions within Langfuse. Roles include Owner, Admin, Member, Viewer, and None, each with specific scopes.

Remote Experiment

A webhook-based trigger that allows running SDK experiments from the Langfuse UI. Configure a webhook URL and default config, then trigger experiments that fetch the dataset, run your application, and ingest scores back into Langfuse.

Retriever

(Observation Type)

An observation type that represents data retrieval steps, such as calls to vector stores or databases in RAG applications.

S

Score

The output of an evaluation. Scores can be numeric, categorical, or boolean and are assigned to traces, observations, sessions, or dataset runs.
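
Scores can also be created programmatically; a sketch using the Python SDK, where the trace ID and values are placeholders and the method name may differ between SDK versions (newer releases expose a similarly named create-score method):

```python
from langfuse import Langfuse

langfuse = Langfuse()

# Attach a numeric score to an existing trace (ID and values are placeholders)
langfuse.score(
    trace_id="some-trace-id",
    name="correctness",
    value=0.9,
    comment="Answer matched the expected output",
)
langfuse.flush()
```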

Score Config

A configuration defining how a score is calculated and interpreted. Includes data type, value constraints, and categories for standardized scoring.

SDK

(Software Development Kit)

Software Development Kit. Langfuse provides native SDKs for Python and JavaScript/TypeScript that handle tracing, prompt management, and API access.

Session

A way to group related traces that are part of the same user interaction. Commonly used for multi-turn conversations or chat threads.

Span

(Observation Type)

An observation type that represents the duration of a unit of work in a trace. The default observation type for most operations.

T

Tags

Flexible labels that categorize and filter traces and observations. Useful for organizing by feature, API endpoint, workflow, or other criteria.

Task

A function definition that processes dataset items during an experiment. The task represents the application code you want to test.

Text Prompt

(String Prompt)

A prompt type that consists of a single string. Ideal for simple use cases or when you only need a system message.

Token Tracking

The basic unit of text that LLMs process. Tokens can be words, parts of words, or characters depending on the model's tokenizer. Token counts determine API costs and context window limits. Langfuse tracks input and output tokens for cost monitoring and optimization.

Tool

(Observation Type)

An observation type that represents a tool call in your application, such as calling a weather API or executing a database query.

Trace

A single request or operation in your LLM application. Traces contain the overall input, output, and metadata, along with nested observations that capture each step.

Tracing

The process of capturing structured logs of every request in your LLM application. Includes prompts, responses, token usage, latency, and any intermediate steps.
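
A minimal sketch of manually creating a trace with a nested generation using the low-level v2-style Python client; all names, values, and token counts are illustrative, and newer SDK versions use context managers instead:

```python
from langfuse import Langfuse

langfuse = Langfuse()

# Root trace for a single request (all names and values are illustrative)
trace = langfuse.trace(name="qa-request", input={"question": "What is Langfuse?"})

# Nested generation capturing the LLM call, including token usage for cost tracking
trace.generation(
    name="llm-call",
    model="gpt-4o",
    input=[{"role": "user", "content": "What is Langfuse?"}],
    output="Langfuse is an open-source LLM engineering platform.",
    usage={"input": 12, "output": 15},  # token counts; exact field names vary by version
)

trace.update(output="Langfuse is an open-source LLM engineering platform.")
langfuse.flush()
```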

U

Users

The ability to associate traces with users via a userId. Enables per-user analytics, cost tracking, and filtering.

Related: Trace · Session
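
Attaching a user and a session to a trace with the low-level v2-style Python client might look like this sketch; the identifiers are placeholders:

```python
from langfuse import Langfuse

langfuse = Langfuse()

# user_id enables per-user analytics; session_id groups multi-turn conversations
langfuse.trace(
    name="chat-turn",
    user_id="user_123",
    session_id="chat_session_456",
    input={"message": "Hello!"},
    output={"message": "Hi, how can I help?"},
)
langfuse.flush()
```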