Observability for OpenAI SDK (JS/TS)

Looking for the Python version? Check it out here.

The Langfuse JS/TS SDK offers a wrapper function around the OpenAI SDK, enabling you to easily add observability to your OpenAI calls. This includes tracking latencies, time-to-first-token on stream responses, errors, and model usage.

import OpenAI from "openai";
import { observeOpenAI } from "langfuse";
 
const openai = observeOpenAI(new OpenAI());
 
const res = await openai.chat.completions.create({
  model: "gpt-3.5-turbo", // required by the OpenAI API
  messages: [{ role: "system", content: "Tell me a story about a dog." }],
});

Langfuse automatically tracks:

  • All prompts/completions with support for streaming and function calling
  • Total latencies and time-to-first-token
  • OpenAI API Errors
  • Model usage (tokens) and cost (USD) (learn more)

How it works

Install Langfuse SDK

The integration is compatible with OpenAI SDK versions >=4.0.0.

npm install langfuse openai

Call OpenAI methods with the wrapped client

Langfuse wraps the OpenAI SDK and exposes the same functionality and method signatures, so you can call its methods as usual.

The observeOpenAI function automatically instantiates a Langfuse client in the background. You can either configure the client with environment variables or pass the configuration directly to the observeOpenAI function.

Add your Langfuse credentials to your environment variables. You can find your credentials in your project settings in the Langfuse UI. Make sure that you have a .env file in your project root and a package like dotenv to load the variables.

.env
LANGFUSE_SECRET_KEY="sk-lf-..."
LANGFUSE_PUBLIC_KEY="pk-lf-..."
LANGFUSE_BASEURL="https://cloud.langfuse.com" # 🇪🇺 EU region
# LANGFUSE_BASEURL="https://us.cloud.langfuse.com" # 🇺🇸 US region
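It can help to fail fast when a credential is missing. As a minimal sketch, the helper below checks that the two keys from the .env example are set before the wrapped client is created. `missingLangfuseEnv` is a hypothetical name and not part of the Langfuse SDK; the environment is passed in as an argument so the check can be exercised without touching the real process environment.

```typescript
// Hypothetical helper, not part of the Langfuse SDK.
// The variable names are taken from the .env example above.
const REQUIRED_LANGFUSE_VARS = [
  "LANGFUSE_SECRET_KEY",
  "LANGFUSE_PUBLIC_KEY",
] as const;

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
function missingLangfuseEnv(env: Env): string[] {
  return REQUIRED_LANGFUSE_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js application you would typically call `missingLangfuseEnv(process.env)` after loading the .env file and throw if the returned list is not empty.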

With your environment configured, call OpenAI SDK methods as usual from the wrapped client.

import OpenAI from "openai";
import { observeOpenAI } from "langfuse";
 
const openai = observeOpenAI(new OpenAI());
 
const res = await openai.chat.completions.create({
  messages: [{ role: "system", content: "Tell me a story about a dog." }],
  model: "gpt-3.5-turbo",
  max_tokens: 300,
});

Done!✨ You now have full observability of your OpenAI calls in Langfuse.

Check out the notebook for end-to-end examples of the integration.

Troubleshooting

Queuing and batching of events

The Langfuse SDKs queue and batch events in the background to reduce the number of network requests and improve overall performance. In a long-running application, this works without any additional configuration.

If you are running a short-lived application, flush the Langfuse client before the application exits so that all queued events are sent.

await openai.flushAsync(); // method added by Langfuse wrapper
 
// If you have previously passed a parent span or trace for nesting, use that client for the flush call
await langfuse.flushAsync();

Learn more about queuing and batching of events here.
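To illustrate why a short-lived process needs an explicit flush, here is a minimal sketch of a batching queue. It is not Langfuse's implementation: `BatchQueue`, `enqueue`, `flush`, and the `send` callback are hypothetical names, and the real SDK may also flush on a timer. Events accumulate in memory and are sent only once a full batch is reached, so anything still buffered at exit is lost unless `flush` is awaited.

```typescript
// Illustrative only: a simplified model of client-side event batching.
// This is NOT the Langfuse SDK's implementation; all names are hypothetical.
type SendBatch<T> = (batch: T[]) => Promise<void>;

class BatchQueue<T> {
  private pending: T[] = [];
  private readonly send: SendBatch<T>;
  private readonly batchSize: number;

  constructor(send: SendBatch<T>, batchSize: number) {
    this.send = send;
    this.batchSize = batchSize;
  }

  // Buffer an event; send automatically once a full batch has accumulated.
  async enqueue(event: T): Promise<void> {
    this.pending.push(event);
    if (this.pending.length >= this.batchSize) {
      await this.flush();
    }
  }

  // Send whatever is buffered, even if the batch is not full.
  async flush(): Promise<void> {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    await this.send(batch);
  }

  // Number of events that have been buffered but not yet sent.
  get size(): number {
    return this.pending.length;
  }
}
```

With a batch size of 3, two enqueued events stay in memory until either a third event arrives or `flush` is called, which is the situation a script that exits early runs into.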

Assistants API

Tracing of the Assistants API is not supported by this integration because OpenAI Assistants keep server-side state that cannot easily be captured without additional API requests. We added more information on how to best track usage of the Assistants API in this FAQ.

Advanced usage

Custom trace properties

You can add the following properties to the langfuseConfig of the observeOpenAI function to use additional Langfuse features:

Property         Description
generationName   Set generationName to identify a specific type of generation.
langfusePrompt   Pass a created or fetched Langfuse prompt to link it with the generations.
metadata         Set metadata with additional information that you want to see in Langfuse.
sessionId        The current session.
userId           The current user_id.
version          Track different versions in Langfuse analytics.
release          Track different releases in Langfuse analytics.
tags             Set tags to categorize and filter traces.

Example:

const res = await observeOpenAI(new OpenAI(), {
  generationName: "Traced generation",
  metadata: { someMetadataKey: "someValue" },
  sessionId: "session-id",
  userId: "user-id",
  tags: ["tag1", "tag2"],
  version: "0.0.1",
  release: "beta",
}).chat.completions.create({
  messages: [{ role: "system", content: "Tell me a story about a dog." }],
  model: "gpt-3.5-turbo",
  max_tokens: 300,
});

Adding custom properties requires you to wrap the OpenAI SDK with the observeOpenAI function and pass the properties as the second argument, langfuseConfig. Since the Langfuse client here is a singleton and the same client is used for all calls, you do not need to worry about mistakenly running multiple clients.
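When the same properties are set on many calls, it may be convenient to build the config object in one place. The sketch below is one possible approach, not a Langfuse feature: `TraceConfig` is a local stand-in for the SDK's own config type, limited to the properties listed above (langfusePrompt omitted), and `baseConfig` and `configFor` are hypothetical names.

```typescript
// Local stand-in type covering the properties listed above.
// It is an assumption for illustration, not the SDK's exported type.
interface TraceConfig {
  generationName?: string;
  metadata?: Record<string, unknown>;
  sessionId?: string;
  userId?: string;
  version?: string;
  release?: string;
  tags?: string[];
}

// Shared defaults for every call in this (hypothetical) application.
const baseConfig: TraceConfig = {
  release: "beta",
  version: "0.0.1",
  tags: ["chat"],
};

// Hypothetical helper: merges per-request values over the shared defaults.
function configFor(
  userId: string,
  sessionId: string,
  extraTags: string[] = [],
): TraceConfig {
  return {
    ...baseConfig,
    userId,
    sessionId,
    tags: [...(baseConfig.tags ?? []), ...extraTags],
  };
}
```

The result would then be passed as the second argument, for example `observeOpenAI(new OpenAI(), configFor("user-id", "session-id"))`.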

With Langfuse Prompt management you can effectively manage and version your prompts. You can link your OpenAI generations to a prompt by passing the langfusePrompt property to the observeOpenAI function.

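import Langfuse, { observeOpenAI } from "langfuse";
import OpenAI from "openai";
 
// Initialize the Langfuse client used to fetch the prompt
const langfuse = new Langfuse();
 
const langfusePrompt = await langfuse.getPrompt("prompt-name"); // Fetch a previously created prompt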
 
const res = await observeOpenAI(new OpenAI(), {
  langfusePrompt,
}).completions.create({
  prompt: langfusePrompt.prompt,
  model: "gpt-3.5-turbo-instruct",
  max_tokens: 300,
});

Resulting generations are now linked to the prompt in Langfuse, allowing you to track prompt usage and performance.

When working with chat prompts, you must typecast the compiled prompt messages as OpenAI.ChatCompletionMessageParam[] or use a type-guard utility function as Langfuse message roles can be arbitrary strings whereas the OpenAI type definition is more restrictive.
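A minimal sketch of such a type guard follows. To stay self-contained it defines local stand-in types instead of importing `OpenAI.ChatCompletionMessageParam`; the role list and the names `LooseMessage`, `StrictMessage`, `isStrictMessage`, and `toStrictMessages` are assumptions for illustration, so check the roles against the type definitions of your installed OpenAI SDK version.

```typescript
// Stand-in types for illustration; in real code you would narrow to
// OpenAI.ChatCompletionMessageParam from the "openai" package instead.
const CHAT_ROLES = ["system", "user", "assistant"] as const;
type ChatRole = (typeof CHAT_ROLES)[number];

interface LooseMessage {
  role: string; // chat prompt roles can be arbitrary strings
  content: string;
}

interface StrictMessage {
  role: ChatRole;
  content: string;
}

// Type guard: narrows a loosely typed message to one with a known role.
function isStrictMessage(message: LooseMessage): message is StrictMessage {
  return (CHAT_ROLES as readonly string[]).includes(message.role);
}

// Validates every message and fails loudly on an unsupported role.
function toStrictMessages(messages: LooseMessage[]): StrictMessage[] {
  return messages.map((message) => {
    if (!isStrictMessage(message)) {
      throw new Error(`Unsupported chat role: ${message.role}`);
    }
    return message;
  });
}
```

Compared with a plain typecast, the guard checks the roles at runtime, so a prompt containing an unexpected role is rejected before the request is sent rather than by the API.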

OpenAI token usage on streamed responses

OpenAI returns token usage on streamed responses only when the include_usage parameter in stream_options is set to true. To use the token usage reported directly by OpenAI, pass stream_options: { include_usage: true } in the request.

import OpenAI from "openai";
import { observeOpenAI } from "langfuse";
 
const openai = observeOpenAI(new OpenAI());
 
const stream = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{ role: "user", content: "How are you?" }],
  stream: true,
  stream_options: { include_usage: true }
});
 
let result = "";
 
for await (const chunk of stream) {
  // Optional chaining guards against empty choices: OpenAI returns token usage in a final chunk with an empty choices list.
  result += chunk.choices[0]?.delta?.content || "";
}
 
await openai.flushAsync();
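The handling of the final usage chunk can be sketched without a network call. The example below uses a hypothetical `mockStream` generator whose chunk shape (`choices`, `usage.prompt_tokens`, `usage.completion_tokens`, `usage.total_tokens`) mirrors the fields OpenAI returns when include_usage is set. The local `Chunk` and `Usage` types, the token counts, and the `collect` helper are assumptions for illustration and not part of either SDK.

```typescript
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

interface Chunk {
  choices: { delta: { content?: string | null } }[];
  usage?: Usage | null;
}

// Hypothetical stand-in for a streamed response with include_usage enabled.
async function* mockStream(): AsyncGenerator<Chunk> {
  yield { choices: [{ delta: { content: "I am " } }], usage: null };
  yield { choices: [{ delta: { content: "fine." } }], usage: null };
  // Final chunk: empty choices list, usage populated.
  yield {
    choices: [],
    usage: { prompt_tokens: 11, completion_tokens: 4, total_tokens: 15 },
  };
}

// Concatenates the text deltas and keeps the usage object if one arrives.
async function collect(
  stream: AsyncIterable<Chunk>,
): Promise<{ text: string; usage: Usage | null }> {
  let text = "";
  let usage: Usage | null = null;
  for await (const chunk of stream) {
    // choices[0] is undefined on the final usage chunk, hence the ?. chain.
    text += chunk.choices[0]?.delta?.content ?? "";
    if (chunk.usage) usage = chunk.usage;
  }
  return { text, usage };
}
```

Reading `chunk.choices[0].delta` without the optional chaining would throw on the last chunk, which is why the loop in the example above is written defensively.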

Nested traces

Langfuse Tracing groups multiple observations (can be any LLM or non-LLM calls) into a single trace. This integration by default creates a single trace for each OpenAI call.

By passing an existing trace or span to the observeOpenAI function as the parent, you can:

  • add non-OpenAI related observations to the trace.
  • group multiple OpenAI calls into a single trace while customizing the trace.
  • exert more control over the trace structure.
  • leverage all Langfuse Tracing features.

New to Langfuse Tracing? Check out this introduction to the basic concepts.

Use the Langfuse JS/TS SDK to create traces or spans manually and add OpenAI calls to them.

Example

Desired trace structure

TRACE: capital-poem-generator
|
|-- SPAN: France
|   |
|   |-- GENERATION: get-capital
|   |
|   |-- GENERATION: generate-poem

Implementation

import Langfuse, { observeOpenAI } from "langfuse";
import OpenAI from "openai";
 
// Initialize SDKs
const langfuse = new Langfuse();
const openai = new OpenAI();
 
// Create trace and add params
const trace = langfuse.trace({ name: "capital-poem-generator" });
 
// Create span
const country = "France";
const span = trace.span({ name: country });
 
// Call OpenAI
const capital = (
  await observeOpenAI(openai, {
    parent: span,
    generationName: "get-capital",
  }).chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      { role: "system", content: "What is the capital of the country?" },
      { role: "user", content: country },
    ],
  })
).choices[0].message.content;
 
const poem = (
  await observeOpenAI(openai, {
    parent: span,
    generationName: "generate-poem",
  }).chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      {
        role: "system",
        content: "You are a poet. Create a poem about this city.",
      },
      { role: "user", content: capital ?? "" }, // message content can be null
    ],
  })
).choices[0].message.content;
 
// End span to get span-level latencies
span.end();
 
// Flush the Langfuse client belonging to the parent span
await langfuse.flushAsync();
