Observability for OpenAI SDK (JS/TS)
Looking for the Python version? Check it out here.
The Langfuse JS/TS SDK offers a wrapper function around the OpenAI SDK, enabling you to easily add observability to your OpenAI calls. This includes tracking latencies, time-to-first-token on stream responses, errors, and model usage.
import OpenAI from "openai";
import { observeOpenAI } from "langfuse";
const openai = observeOpenAI(new OpenAI());
const res = await openai.chat.completions.create({
messages: [{ role: "system", content: "Tell me a story about a dog." }],
});
Langfuse automatically tracks:
- All prompts/completions with support for streaming and function calling
- Total latencies and time-to-first-token
- OpenAI API Errors
- Model usage (tokens) and cost (USD) (learn more)
In the Langfuse Console
How it works
Install Langfuse SDK
The integration is compatible with OpenAI SDK versions >=4.0.0
.
npm install langfuse openai
Call OpenAI methods with the wrapped client
Langfuse wraps the OpenAI SDK and provides the functionality and method signatures. You can call them as usual.
The observeOpenAI
function automatically instantiates a Langfuse client in the background. You can either configure the client with environment variables or pass the configuration directly to the observeOpenAI
function.
Add your Langfuse credentials to your environment variables. You can find your credentials in your project settings in the Langfuse UI. Make sure that you have a .env
file in your project root and a package like dotenv
to load the variables.
LANGFUSE_SECRET_KEY="sk-lf-..."
LANGFUSE_PUBLIC_KEY="pk-lf-..."
# 🇪🇺 EU region
LANGFUSE_BASEURL="https://cloud.langfuse.com"
# 🇺🇸 US region
# LANGFUSE_BASEURL="https://us.cloud.langfuse.com"
With your environment configured, call OpenAI SDK methods as usual from the wrapped client.
import OpenAI from "openai";
import { observeOpenAI } from "langfuse";
const openai = observeOpenAI(new OpenAI());
const res = await openai.chat.completions.create({
messages: [{ role: "system", content: "Tell me a story about a dog." }],
model: "gpt-3.5-turbo",
max_tokens: 300,
});
Done!✨ You now have full observability of your OpenAI calls in Langfuse.
Check out the notebook for end-to-end examples of the integration:
Troubleshooting
Queuing and batching of events
The Langfuse SDKs queue and batches events in the background to reduce the number of network requests and improve overall performance. In a long-running application, this works without any additional configuration.
If you are running a short-lived application, you need to flush Langfuse to ensure that all events are flushed before the application exits.
await openai.flushAsync(); // method added by Langfuse wrapper
// If you have previously passed a parent span or trace for nesting, use that client for the flush call
await langfuse.flushAsync();
Learn more about queuing and batching of events here.
Assistants API
Tracing of the assistants api is not supported by this integration as OpenAI Assistants have server-side state that cannot easily be captured without additional api requests. We added some more information on how to best track usage of the assistants api in this FAQ.
Advanced usage
Custom trace properties
You can add the following properties to the langfuseConfig
of the observeOpenAI
function to use additional Langfuse features:
Property | Description |
---|---|
generationName | Set generationName to identify a specific type of generation. |
langfusePrompt | Pass a created or fetched Langfuse prompt to link it with the generations |
metadata | Set metadata with additional information that you want to see in Langfuse. |
sessionId | The current session. |
userId | The current user_id. |
version | Track different versions in Langfuse analytics |
release | Track different releases in Langfuse analytics |
tags | Set tags to categorize and filter traces. |
Example:
const res = await observeOpenAI(new OpenAI(), {
generationName: "Traced generation",
metadata: { someMetadataKey: "someValue" },
sessionId: "session-id",
userId: "user-id",
tags: ["tag1", "tag2"],
version: "0.0.1",
release: "beta",
}).chat.completions.create({
messages: [{ role: "system", content: "Tell me a story about a dog." }],
model: "gpt-3.5-turbo",
max_tokens: 300,
});
Adding custom properties requires you to wrap the OpenAI SDK with the
observeOpenAI
function and pass the properties as the second
langfuseConfig
argument. Since the Langfuse client here is a singleton and
the same client is used for all calls, you do not need to worry about
mistakingly having multiple clients running.
Link to Langfuse prompts
With Langfuse Prompt management you can effectively manage and version your prompts. You can link your OpenAI generations to a prompt by passing the langfusePrompt
property to the observeOpenAI
function.
import { observeOpenAI } from "langfuse";
import OpenAI from "openai";
const langfusePrompt = await langfuse.getPrompt("prompt-name"); // Fetch a previously created prompt
const res = await observeOpenAI(new OpenAI(), {
langfusePrompt,
}).completions.create({
prompt: langfusePrompt.prompt,
model: "gpt-3.5-turbo-instruct",
max_tokens: 300,
});
Resulting generations are now linked to the prompt in Langfuse, allowing you to track prompt usage and performance.
When working with chat prompts, you must typecast the compiled prompt messages as OpenAI.ChatCompletionMessageParam[]
or use a type-guard utility function as Langfuse message roles can be arbitrary strings whereas the OpenAI type definition is more restrictive.
OpenAI token usage on streamed responses
OpenAI returns the token usage on streamed responses only when in stream_options
the include_usage
parameter is set to true
. If you would like to benefit from OpenAI’s directly provided token usage, you can set { include_usage: true }
in the stream_options
argument.
import OpenAI from "openai";
import { observeOpenAI } from "langfuse";
const openai = observeOpenAI(new OpenAI());
const stream = await openai.chat.completions.create({
model: "gpt-4",
messages: [{ role: "user", content: "How are you?" }],
stream: true,
stream_options: { include_usage: true }
});
let result = "";
for await (const chunk of stream) {
// Check if chunk choices are not empty. OpenAI returns token usage in a final chunk with an empty choices list.
result += chunk.choices[0]?.delta?.content || "";
}
await openai.flushAsync();
Nested traces
Langfuse Tracing groups multiple observations (can be any LLM or non-LLM calls) into a single trace. This integration by default creates a single trace for each OpenAI call.
By passing an existing trace or span to the observeOpenAI
function as the parent
, you can:
- add non-OpenAI related observations to the trace.
- group multiple OpenAI calls into a single trace while customizing the trace.
- exert more control over the trace structure.
- leverage all Langfuse Tracing features.
New to Langfuse Tracing? Checkout this introduction to the basic concepts.
Use the Langfuse JS/TS SDK to create traces or spans manually and add OpenAI calls to it.
Example
Desired trace structure
TRACE: capital-poem-generator
|
|-- SPAN: France
| |
| |-- GENERATION: get-capital
| |
| |-- GENERATION: generate-poem
Implementation
import Langfuse, { observeOpenAI } from "langfuse";
// Initialize SDKs
const langfuse = new Langfuse();
const openai = new OpenAI();
// Create trace and add params
const trace = langfuse.trace({ name: "capital-poem-generator" });
// Create span
const country = "France";
const span = trace.span({ name: country });
// Call OpenAI
const capital = (
await observeOpenAI(openai, {
parent: span,
generationName: "get-capital",
}).chat.completions.create({
model: "gpt-3.5-turbo",
messages: [
{ role: "system", content: "What is the capital of the country?" },
{ role: "user", content: country },
],
})
).choices[0].message.content;
const poem = (
await observeOpenAI(openai, {
parent: span,
generationName: "generate-poem",
}).chat.completions.create({
model: "gpt-3.5-turbo",
messages: [
{
role: "system",
content: "You are a poet. Create a poem about this city.",
},
{ role: "user", content: capital },
],
})
).choices[0].message.content;
// End span to get span-level latencies
span.end();
// Flush the Langfuse client belonging to the parent span
await langfuse.flushAsync();