Observability for OpenAI SDK (JS/TS)
Looking for the Python version? Check it out here.
The Langfuse JS/TS SDK offers a wrapper function around the OpenAI SDK, enabling you to easily add observability to your OpenAI calls. This includes tracking latencies, time-to-first-token on streamed responses, errors, and model usage.
import OpenAI from "openai";
import { observeOpenAI } from "@langfuse/openai";
const openai = observeOpenAI(new OpenAI());
const res = await openai.chat.completions.create({
  messages: [{ role: "system", content: "Tell me a story about a dog." }],
  model: "gpt-4o",
});
Langfuse automatically tracks:
- All prompts/completions with support for streaming and function calling (see the function-calling sketch after this list)
- Total latencies and time-to-first-token
- OpenAI API errors
- Model usage (tokens) and cost (USD) (learn more)
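For example, a traced function-calling request looks like a regular completion call; the tool calls requested by the model are captured as part of the generation. A minimal sketch, assuming a hypothetical get_weather tool:

import OpenAI from "openai";
import { observeOpenAI } from "@langfuse/openai";

const openai = observeOpenAI(new OpenAI());

// Hypothetical tool definition for illustration
const res = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "What is the weather in Berlin?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get the current weather for a city",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      },
    },
  ],
});

// The requested tool calls are part of the traced completion
console.log(res.choices[0].message.tool_calls);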
How it works
Install Langfuse SDK
The integration is compatible with OpenAI SDK versions >=4.0.0.
npm install @langfuse/openai openai
Register your credentials
Add your Langfuse credentials to your environment variables. Make sure that you have a .env file in your project root and a package like dotenv to load the variables.
LANGFUSE_SECRET_KEY = "sk-lf-..."
LANGFUSE_PUBLIC_KEY = "pk-lf-..."
LANGFUSE_BASE_URL = "https://cloud.langfuse.com" # 🇪🇺 EU region
# LANGFUSE_BASE_URL = "https://us.cloud.langfuse.com" # 🇺🇸 US region
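If you use dotenv, load the .env file before any other module reads process.env, for example via the config import at the top of your entry point (a minimal sketch):

// Must run before the OpenTelemetry and OpenAI setup below reads process.env
import "dotenv/config";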
Initialize OpenTelemetry
The Langfuse TypeScript SDK’s tracing is built on top of OpenTelemetry, so you need to set up the OpenTelemetry SDK. The LangfuseSpanProcessor is the key component that sends traces to Langfuse.
import { NodeSDK } from "@opentelemetry/sdk-node";
import { LangfuseSpanProcessor } from "@langfuse/otel";
const sdk = new NodeSDK({
spanProcessors: [new LangfuseSpanProcessor()],
});
sdk.start();
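This setup should run before any traced code executes. One common pattern (a sketch; the file name is just an example) is to keep it in a dedicated file, export the span processor so you can flush it explicitly later (see Troubleshooting below), and import that file first in your application entry point:

// instrumentation.ts (example file name)
import { NodeSDK } from "@opentelemetry/sdk-node";
import { LangfuseSpanProcessor } from "@langfuse/otel";

// Exported so short-lived applications can call forceFlush() before exiting
export const langfuseSpanProcessor = new LangfuseSpanProcessor();

const sdk = new NodeSDK({
  spanProcessors: [langfuseSpanProcessor],
});

sdk.start();

Then add import "./instrumentation"; as the very first import of your entry point so tracing is active before any OpenAI call runs.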
Call OpenAI methods with the wrapped client
With your environment configured, call OpenAI SDK methods as usual from the wrapped client.
import OpenAI from "openai";
import { observeOpenAI } from "@langfuse/openai";
const openai = observeOpenAI(new OpenAI());
const res = await openai.chat.completions.create({
messages: [{ role: "system", content: "Tell me a story about a dog." }],
model: "gpt-4o",
max_tokens: 300,
});
Done! ✨ You now have full observability of your OpenAI calls in Langfuse.
Check out the notebook for end-to-end examples of the integration.
Troubleshooting
Queuing and batching of events
The Langfuse SDKs queue and batch events in the background to reduce the number of network requests and improve overall performance. In a long-running application, this works without any additional configuration.
If you are running a short-lived application, you need to flush the Langfuse span processor to ensure that all queued events are sent before the application exits.
await langfuseSpanProcessor.forceFlush();
// If you have previously initialized a Langfuse client, you can use that for the flush call
await langfuse.flush();
Learn more about queuing and batching of events here.
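For example, a short-lived script can flush in a finally block so queued spans are exported even if the run fails. A sketch, assuming the langfuseSpanProcessor instance is exported from the OpenTelemetry setup shown above (the ./instrumentation path is just an example):

import { langfuseSpanProcessor } from "./instrumentation"; // example path, see the OpenTelemetry setup above
import OpenAI from "openai";
import { observeOpenAI } from "@langfuse/openai";

async function main() {
  const openai = observeOpenAI(new OpenAI());
  await openai.chat.completions.create({
    messages: [{ role: "user", content: "Tell me a story about a dog." }],
    model: "gpt-4o",
  });
}

main()
  .catch(console.error)
  // Export all queued spans before the process exits
  .finally(() => langfuseSpanProcessor.forceFlush());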
Assistants API
Tracing of the Assistants API is not supported by this integration, as OpenAI Assistants have server-side state that cannot easily be captured without additional API requests. We added more information on how to best track usage of the Assistants API in this FAQ.
Advanced usage
Custom trace properties
You can add the following properties to the langfuseConfig of the observeOpenAI function to use additional Langfuse features:
| Property | Description |
| --- | --- |
| generationName | Set generationName to identify a specific type of generation. |
| langfusePrompt | Pass a created or fetched Langfuse prompt to link it with the generations. |
| metadata | Set metadata with additional information that you want to see in Langfuse. |
| sessionId | The ID of the current session, used to group related traces. |
| userId | The ID of the current user. |
| tags | Set tags to categorize and filter traces. |
Example:
const res = await observeOpenAI(new OpenAI(), {
generationName: "Traced generation",
metadata: { someMetadataKey: "someValue" },
sessionId: "session-id",
userId: "user-id",
tags: ["tag1", "tag2"],
}).chat.completions.create({
messages: [{ role: "system", content: "Tell me a story about a dog." }],
model: "gpt-3.5-turbo",
max_tokens: 300,
});
Adding custom properties requires you to wrap the OpenAI SDK with the observeOpenAI function and pass the properties as the second langfuseConfig argument. Since the Langfuse client here is a singleton and the same client is used for all calls, you do not need to worry about mistakenly running multiple clients.
Link to Langfuse prompts
With Langfuse Prompt Management you can effectively manage and version your prompts. You can link your OpenAI generations to a prompt by passing the langfusePrompt property to the observeOpenAI function.
import { observeOpenAI } from "@langfuse/openai";
import { LangfuseClient } from "@langfuse/client";
import OpenAI from "openai";

const langfuse = new LangfuseClient();
const langfusePrompt = await langfuse.prompt.get("my-prompt"); // Fetch a previously created prompt
const res = await observeOpenAI(new OpenAI(), {
langfusePrompt,
}).completions.create({
prompt: langfusePrompt.prompt,
model: "gpt-3.5-turbo-instruct",
max_tokens: 300,
});
Resulting generations are now linked to the prompt in Langfuse, allowing you to track prompt usage and performance.
When working with chat prompts, you must typecast the compiled prompt messages as OpenAI.ChatCompletionMessageParam[] or use a type-guard utility function, because Langfuse message roles can be arbitrary strings whereas the OpenAI type definition is more restrictive.
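A sketch of what this can look like, assuming a chat prompt named "my-chat-prompt" with a {{topic}} variable exists in Langfuse and is fetched via the Langfuse client with the type: "chat" option:

import { observeOpenAI } from "@langfuse/openai";
import { LangfuseClient } from "@langfuse/client";
import OpenAI from "openai";

const langfuse = new LangfuseClient();

// Assumption: a chat prompt named "my-chat-prompt" exists in Langfuse
const chatPrompt = await langfuse.prompt.get("my-chat-prompt", { type: "chat" });
const compiledMessages = chatPrompt.compile({ topic: "dogs" });

const res = await observeOpenAI(new OpenAI(), {
  langfusePrompt: chatPrompt,
}).chat.completions.create({
  // Cast needed: Langfuse roles are plain strings, the OpenAI types are stricter
  messages: compiledMessages as OpenAI.ChatCompletionMessageParam[],
  model: "gpt-4o",
});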
OpenAI token usage on streamed responses
OpenAI returns token usage on streamed responses only when the include_usage parameter is set to true in stream_options. If you would like to benefit from OpenAI’s directly provided token usage, set { include_usage: true } in the stream_options argument.
import OpenAI from "openai";
import { observeOpenAI } from "@langfuse/openai";
const openai = observeOpenAI(new OpenAI());
const stream = await openai.chat.completions.create({
model: "gpt-4",
messages: [{ role: "user", content: "How are you?" }],
stream: true,
stream_options: { include_usage: true },
});
let result = "";
for await (const chunk of stream) {
// Check if chunk choices are not empty. OpenAI returns token usage in a final chunk with an empty choices list.
result += chunk.choices[0]?.delta?.content || "";
}
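If you also want to read the reported usage yourself, a variant of the loop above (a sketch) can capture it from that final chunk:

let result = "";
let usage: OpenAI.CompletionUsage | undefined;
for await (const chunk of stream) {
  // The final chunk arrives with an empty choices list and carries the usage
  if (chunk.usage) usage = chunk.usage;
  result += chunk.choices[0]?.delta?.content || "";
}
console.log(usage); // { prompt_tokens, completion_tokens, total_tokens }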