This is a Jupyter notebook

Cookbook: LiteLLM (Proxy) + Langfuse OpenAI Integration + @observe Decorator

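We want to share a stack that's commonly used by the Langfuse community to quickly experiment with 100+ models from different providers without changing code. This stack combines the LiteLLM Proxy, which serves models from many providers behind a single OpenAI-compatible API, with the Langfuse OpenAI integration, which logs each call to Langfuse.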

This cookbook is an end-to-end guide to set up and use this stack. As we'll use Python in this example, we will also use the @observe decorator to create nested traces. More on this below.

Let's dive right in!

Install dependencies

!pip install "litellm[proxy]" langfuse openai

Setup environment

import os
from langfuse.openai import auth_check
# Get keys for your project from the project settings page
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["LANGFUSE_HOST"] = "" # 🇪🇺 EU region
# os.environ["LANGFUSE_HOST"] = "" # 🇺🇸 US region
# Your openai key
os.environ["OPENAI_API_KEY"] = ""
# Test connection to Langfuse, not recommended for production as it is blocking
auth_check()
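
Blank credentials are an easy mistake at this step. As a minimal sketch, assuming a hypothetical helper named missing_env_vars (it is not part of Langfuse or LiteLLM), we can list which of the variables set above are still empty before sending any request:

```python
import os

# Variables this cookbook sets in the cell above.
REQUIRED_VARS = (
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_SECRET_KEY",
    "LANGFUSE_HOST",
    "OPENAI_API_KEY",
)


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or blank in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


# Hypothetical usage: report blanks now instead of hitting an auth error later.
missing = missing_env_vars()
if missing:
    print("Set these variables before continuing:", ", ".join(missing))
```

An empty result means all four variables carry a value; it does not check that the values are valid, which is what the connection test is for.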

Setup LiteLLM Proxy

In this example, we'll use gpt-3.5-turbo directly from OpenAI, and llama3 and mistral via Ollama on our local machine.


  1. Create a litellm_config.yaml to configure which models are available (see the LiteLLM docs). Make sure to replace <openai_key> with your OpenAI API key.
      model_list:
        - model_name: gpt-3.5-turbo
          litellm_params:
            model: gpt-3.5-turbo
            api_key: <openai_key>
        - model_name: ollama/llama3
          litellm_params:
            model: ollama/llama3
        - model_name: ollama/mistral
          litellm_params:
            model: ollama/mistral
  2. Ensure that you have installed Ollama and pulled the llama3 (8b) and mistral (7b) models: ollama pull llama3 && ollama pull mistral
  3. Run the following CLI command to start the proxy: litellm --config litellm_config.yaml
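
If you prefer not to paste the OpenAI key into the file, the config can also be generated. The following is a minimal sketch, not an official tool: build_litellm_config is a hypothetical helper, and it assumes LiteLLM's os.environ/<VAR> reference syntax so that the proxy reads the key from the environment at startup.

```python
def build_litellm_config(models):
    """Render LiteLLM proxy config text for (model_name, api_key) pairs.

    api_key may be None for local models such as the Ollama ones.
    """
    lines = ["model_list:"]
    for model_name, api_key in models:
        lines.append(f"  - model_name: {model_name}")
        lines.append("    litellm_params:")
        lines.append(f"      model: {model_name}")
        if api_key is not None:
            lines.append(f"      api_key: {api_key}")
    return "\n".join(lines) + "\n"


# The three models used in this cookbook; the key is referenced, not embedded.
config_text = build_litellm_config([
    ("gpt-3.5-turbo", "os.environ/OPENAI_API_KEY"),
    ("ollama/llama3", None),
    ("ollama/mistral", None),
])
print(config_text)
```

Saving the printed text as litellm_config.yaml gives the same three-model setup as step 1, with the key taken from OPENAI_API_KEY.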

The LiteLLM Proxy should now be running at the address shown in its startup output.

To verify the connection you can run litellm --test
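
The check can also be done from Python. This is a rough sketch that assumes the proxy exposes the OpenAI-style /v1/models route without requiring a key; list_proxy_models and extract_model_ids are hypothetical helper names, not LiteLLM functions.

```python
import json
import urllib.request


def extract_model_ids(payload):
    """Pull model ids out of an OpenAI-style model list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_proxy_models(base_url, timeout=5.0):
    """GET <base_url>/v1/models and return the model ids the proxy serves."""
    url = base_url.rstrip("/") + "/v1/models"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return extract_model_ids(json.load(response))
```

Calling list_proxy_models with the URL of your running proxy should return the model names from litellm_config.yaml. If the proxy is protected by a key, the request would additionally need an Authorization header.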

Log single LLM Call via Langfuse OpenAI Wrapper

The Langfuse SDK offers a wrapper function around the OpenAI SDK, automatically logging all OpenAI calls as generations to Langfuse.

For more details, please refer to our documentation.

from langfuse.openai import openai
# Set PROXY_URL to the URL of your LiteLLM proxy
PROXY_URL = ""
system_prompt = "You are a very accurate calculator. You output only the result of the calculation."
# Configure the OpenAI client to use the LiteLLM proxy
client = openai.OpenAI(base_url=PROXY_URL)
gpt_completion = client.chat.completions.create(
  model="gpt-3.5-turbo",
  name="gpt-3.5", # optional name of the generation in langfuse
  messages=[
      {"role": "system", "content": system_prompt},
      {"role": "user", "content": "1 + 1 = "}],
)
llama_completion = client.chat.completions.create(
  model="ollama/llama3",
  name="llama3", # optional name of the generation in langfuse
  messages=[
      {"role": "system", "content": system_prompt},
      {"role": "user", "content": "3 + 3 = "}],
)
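
Because every model sits behind the same proxy, comparing models comes down to changing the model string. A minimal sketch of that idea, assuming a hypothetical helper ask_models that takes the client configured above:

```python
def ask_models(client, models, system_prompt, question):
    """Send the same prompt to each model and return {model: answer}."""
    answers = {}
    for model in models:
        completion = client.chat.completions.create(
            model=model,
            # `name` is understood by the Langfuse wrapper used in this cookbook;
            # drop it when passing a plain OpenAI client.
            name=f"calc-{model}",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": question},
            ],
        )
        answers[model] = completion.choices[0].message.content
    return answers
```

With the proxy running, ask_models(client, ["gpt-3.5-turbo", "ollama/llama3", "ollama/mistral"], system_prompt, "1 + 1 = ") should log one named generation per model to Langfuse and return the three answers side by side.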


Trace nested LLM Calls via Langfuse OpenAI Wrapper and @observe decorator

Via the Langfuse @observe() decorator we can automatically capture execution details of any Python function, such as inputs, outputs, timings, and more. The decorator gives in-depth observability with minimal code, especially when non-LLM calls are involved for knowledge retrieval (RAG) or API calls (agents).

For more details on how to use this decorator and customize your tracing, refer to our documentation.

Let's have a look at a simple example which uses all three models we have set up in the LiteLLM Proxy:

from langfuse.decorators import observe
from langfuse.openai import openai
@observe()
def rap_battle(topic: str):
    client = openai.OpenAI(
        base_url=PROXY_URL,
    )
    messages = [
        {"role": "system", "content": "You are a rap artist. Drop a fresh line."},
        {"role": "user", "content": f"Kick it off, today's topic is {topic}, here's the mic..."}
    ]
    # First model (gpt-3.5-turbo) starts the rap
    gpt_completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        name="rap-gpt-3.5-turbo", # add custom name to Langfuse observation
        messages=messages,
    )
    first_rap = gpt_completion.choices[0].message.content
    messages.append({"role": "assistant", "content": first_rap})
    print("Rap 1:", first_rap)
    # Second model (ollama/llama3) responds
    llama_completion = client.chat.completions.create(
        model="ollama/llama3",
        messages=messages,
    )
    second_rap = llama_completion.choices[0].message.content
    messages.append({"role": "assistant", "content": second_rap})
    print("Rap 2:", second_rap)
    # Third model (ollama/mistral) adds the final touch
    mistral_completion = client.chat.completions.create(
        model="ollama/mistral",
        messages=messages,
    )
    third_rap = mistral_completion.choices[0].message.content
    messages.append({"role": "assistant", "content": third_rap})
    print("Rap 3:", third_rap)
    return messages
# Call the function
rap_battle("typography")  # example topic; any string works
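
The decorator is not limited to LLM calls. As a small sketch of the non-LLM case mentioned earlier, assume a hypothetical retrieval step (retrieve_context, with a toy in-memory lookup) called from a decorated parent function, so that it shows up as a nested observation. The fallback decorator is only a stand-in so the sketch runs where the langfuse.decorators module is unavailable; it records nothing.

```python
try:
    from langfuse.decorators import observe  # import path used in this cookbook
except ImportError:
    # Stand-in so the sketch still runs without that module; it records nothing.
    def observe(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@observe()
def retrieve_context(question: str) -> list[str]:
    """Hypothetical non-LLM step, e.g. a knowledge-base lookup."""
    knowledge = {"litellm": "LiteLLM Proxy exposes many models behind one API."}
    return [text for key, text in knowledge.items() if key in question.lower()]


@observe()
def build_prompt(question: str) -> str:
    """Parent function; the call to retrieve_context is nested beneath it."""
    context = "\n".join(retrieve_context(question))
    return f"Context:\n{context}\n\nQuestion: {question}"


print(build_prompt("What does LiteLLM do?"))
```

In a real application the prompt would then be passed to a completion call inside the same decorated function, which places the retrieval step and the generation under one trace.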


Learn more

Check out the Langfuse and LiteLLM docs to learn more about all components of this stack.

If you do not want to capture traces via the OpenAI SDK Wrapper, you can also directly log requests from the LiteLLM Proxy to Langfuse. For more details, refer to the LiteLLM Docs.
