Integration: 🚅 LiteLLM SDK
LiteLLM (GitHub): Use any LLM as a drop-in replacement for GPT. Use Azure, OpenAI, Cohere, Anthropic, Ollama, vLLM, SageMaker, Hugging Face, Replicate (100+ LLMs).
This integration is for the LiteLLM SDK. If you are looking for the LiteLLM Proxy integration, see the LiteLLM Proxy Integration page.
The LiteLLM SDK is a Python library that allows you to use any LLM as a drop-in replacement for the OpenAI SDK.
This integration is covered by the LiteLLM integration docs.
Get Started
LiteLLM relies on the Langfuse Python SDK v2. It is currently not compatible with the newer Python SDK v3. Please refer to the v2 documentation and pin the SDK version during installation:
pip install "langfuse>=2,<3" litellm
import os
import litellm
from litellm import completion

# Set Langfuse environment variables
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
# Langfuse host
os.environ["LANGFUSE_HOST"]="https://cloud.langfuse.com" # πͺπΊ EU region
# os.environ["LANGFUSE_HOST"]="https://us.cloud.langfuse.com" # πΊπΈ US region
# Model API keys (example)
os.environ["OPENAI_API_KEY"] = ""
os.environ["COHERE_API_KEY"] = ""
# set callbacks
litellm.success_callback = ["langfuse"]
litellm.failure_callback = ["langfuse"]
Quick Example
import litellm
# openai call
openai_response = litellm.completion(
model="gpt-3.5-turbo",
messages=[
{"role": "user", "content": "Hi π - i'm openai"}
]
)
print(openai_response)
# cohere call
cohere_response = litellm.completion(
model="command-nightly",
messages=[
{"role": "user", "content": "Hi π - i'm cohere"}
]
)
print(cohere_response)
Use within decorated function
If you want to use the LiteLLM SDK within a decorated function (observe() decorator), you can use the langfuse_context.get_current_trace_id() and langfuse_context.get_current_observation_id() methods to pass the correct nesting information to the LiteLLM SDK.
from litellm import completion
from langfuse.decorators import langfuse_context, observe

@observe()
def fn():
    # set custom langfuse trace params and generation params
    response = completion(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user", "content": "Hi 👋 - i'm openai"}
        ],
        metadata={
            "existing_trace_id": langfuse_context.get_current_trace_id(), # set langfuse trace ID
            "parent_observation_id": langfuse_context.get_current_observation_id(),
        },
    )
    print(response)
There is a GitHub issue tracking a native integration that would automatically capture nested traces when the LiteLLM SDK is used within a decorated function.
Set Custom Trace ID, Trace User ID and Tags
You can add additional Langfuse attributes to the requests in order to group requests into traces and add userIds, tags, sessionIds, and more. These attributes are shared across the LiteLLM Proxy and SDK; please refer to both documentation pages to learn about all available options:
from litellm import completion
# set custom langfuse trace params and generation params
response = completion(
model="gpt-3.5-turbo",
messages=[
{"role": "user", "content": "Hi π - i'm openai"}
],
metadata={
"generation_name": "test-generation", # set langfuse Generation Name
"generation_id": "gen-id", # set langfuse Generation ID
"trace_id": "trace-id", # set langfuse Trace ID
"trace_user_id": "user-id", # set langfuse Trace User ID
"session_id": "session-id", # set langfuse Session ID
"tags": ["tag1", "tag2"] # set langfuse Tags
},
)
print(response)
Use LangChain ChatLiteLLM + Langfuse
pip install langchain
from langchain.chat_models import ChatLiteLLM
from langchain.schema import HumanMessage
import litellm
chat = ChatLiteLLM(
    model="gpt-3.5-turbo",
    model_kwargs={
        "metadata": {
            "trace_user_id": "user-id", # set Langfuse Trace User ID
            "session_id": "session-id", # set Langfuse Session ID
            "tags": ["tag1", "tag2"] # set Langfuse Tags
        }
    }
)
messages = [
HumanMessage(
content="what model are you"
)
]
chat(messages)
Customize Langfuse Python SDK via Environment Variables
To customize Langfuse settings, use the Langfuse environment variables. These will be picked up by the LiteLLM SDK on initialization as it uses the Langfuse Python SDK under the hood.
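For example, a minimal sketch of setting optional variables before the first LiteLLM call. LANGFUSE_RELEASE and LANGFUSE_DEBUG are illustrative; which variables are available depends on the installed Langfuse SDK version, so check the Langfuse configuration docs for the full list:
import os

# Credentials and host, as shown in the Get Started section
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"

# Optional settings picked up when the Langfuse client is initialized
# (illustrative; availability depends on your Langfuse SDK version)
os.environ["LANGFUSE_RELEASE"] = "v1.2.3"  # tag traces with a release version
os.environ["LANGFUSE_DEBUG"] = "True"      # verbose SDK logging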
Learn more about LiteLLM
What is LiteLLM?
LiteLLM is an open source library and proxy server for managing authentication, load balancing, and spend tracking across more than 100 LLMs. It has become a popular utility for developers working with LLMs and is widely regarded as a useful abstraction.
Is LiteLLM an Open Source project?
Yes, LiteLLM is open source. The majority of its code is permissively MIT-licensed. You can find the open source LiteLLM repository on GitHub.
Can I use LiteLLM with Ollama and local models?
Yes, you can use LiteLLM with Ollama and other local models. LiteLLM supports all models from Ollama, and it provides a Docker image for an OpenAI API-compatible server for local LLMs like llama2, mistral, and codellama.
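As a rough sketch (assuming an Ollama server is running locally on the default port and the Langfuse callbacks from the Get Started section are configured), a local model is called through the same completion() interface by prefixing the model name with ollama/:
import litellm
from litellm import completion

litellm.success_callback = ["langfuse"]
litellm.failure_callback = ["langfuse"]

# "ollama/<model>" routes the request to a local Ollama server;
# api_base is only needed if Ollama is not listening on the default address.
response = completion(
    model="ollama/llama2",
    messages=[{"role": "user", "content": "Hi 👋 - i'm a local model"}],
    api_base="http://localhost:11434",
)
print(response.choices[0].message.content)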
How does LiteLLM simplify API calls across multiple LLM providers?
LiteLLM provides a unified interface for calling models from OpenAI, Anthropic, Cohere, Ollama, and other providers. This means you can call any supported model using a consistent method, such as completion(model, messages), and expect a uniform response format. The library does away with the need for if/else statements or provider-specific code, making it easier to manage and debug LLM interactions in your application.
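As an illustration of that uniform interface, the sketch below calls two of the providers used earlier on this page with an identical call shape and reads the responses the same way (the relevant provider API keys are assumed to be set as environment variables):
from litellm import completion

prompt = [{"role": "user", "content": "Summarize LiteLLM in one sentence."}]

# Only the model string changes between providers; the call signature and
# the response access pattern stay the same.
for model in ["gpt-3.5-turbo", "command-nightly"]:
    response = completion(model=model, messages=prompt)
    print(model, "->", response.choices[0].message.content)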