
Langfuse SDK Performance Test

Langfuse is designed to have minimal impact on latency. It achieves this by running almost entirely in the background and by batching all requests to the Langfuse API.
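The batching behavior is configurable on the client. A minimal sketch, assuming the flush_at and flush_interval parameters of the v2 Python SDK (check the SDK reference for your version's defaults):

from langfuse import Langfuse
 
# Events are queued in memory and sent to the API asynchronously in batches.
langfuse = Langfuse(
    flush_at=15,        # send a batch once this many events are queued
    flush_interval=0.5, # ...or after this many seconds, whichever comes first
)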

Coverage of this performance test:

  • Langfuse SDK: trace(), generation(), span()
  • Langchain Integration
  • OpenAI Integration
  • LlamaIndex Integration

Limitation: The integrations are tested against OpenAI's hosted models. This makes the experiment less controlled, but the measured latency impact of the integrations more realistic.

Setup

%pip install langfuse --upgrade
import os
 
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["LANGFUSE_HOST"] = ""
os.environ["OPENAI_API_KEY"] = ""
from langfuse import Langfuse
 
langfuse = Langfuse()
import pandas as pd
import timeit
 
def time_func(func, runs=100):
    # Execute `func` `runs` times and collect wall-clock durations
    durations = []
    for _ in range(runs):
        start = timeit.default_timer()
        func()
        stop = timeit.default_timer()
        durations.append(stop - start)
 
    # Summarize the samples; label every statistic except `count` with its unit
    desc = pd.Series(durations).describe()
    desc.index = [f'{name} (sec)' if name != 'count' else name for name in desc.index]
    return desc

Python SDK

trace()

time_func(lambda: langfuse.trace(name="perf-trace"))
count         100.000000
mean (sec)      0.000266
std (sec)       0.000381
min (sec)       0.000154
25% (sec)       0.000191
50% (sec)       0.000197
75% (sec)       0.000211
max (sec)       0.003784
dtype: float64

span()

trace = langfuse.trace(name="perf-trace")
 
time_func(lambda: trace.span(name="perf-span"))
count         100.000000
mean (sec)      0.000162
std (sec)       0.000199
min (sec)       0.000096
25% (sec)       0.000099
50% (sec)       0.000106
75% (sec)       0.000130
max (sec)       0.001635
dtype: float64

generation()

trace = langfuse.trace(name="perf-trace")
 
time_func(lambda: trace.generation(name="perf-generation"))
count         100.000000
mean (sec)      0.000196
std (sec)       0.000165
min (sec)       0.000132
25% (sec)       0.000137
50% (sec)       0.000148
75% (sec)       0.000173
max (sec)       0.001238
dtype: float64

event()

trace = langfuse.trace(name="perf-trace")
 
time_func(lambda: trace.event(name="perf-event"))
count         100.000000
mean (sec)      0.000236
std (sec)       0.000300
min (sec)       0.000152
25% (sec)       0.000177
50% (sec)       0.000189
75% (sec)       0.000219
max (sec)       0.003144
dtype: float64
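Note that these sub-millisecond timings only measure appending events to the SDK's in-memory queue; the actual network requests happen asynchronously. To observe the deferred cost, you can time a blocking flush (a small sketch using the langfuse client from Setup):

start = timeit.default_timer()
langfuse.flush()  # blocks until all queued events have been sent to the API
print(f"flush took {timeit.default_timer() - start:.3f}s")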

Langchain Integration

Docs: https://langfuse.com/docs/integrations/langchain

%pip install langchain langchain-openai --upgrade
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema import StrOutputParser
 
prompt = ChatPromptTemplate.from_template("what is the city {person} is from?")
model = ChatOpenAI(max_tokens=10)
chain = prompt | model | StrOutputParser()
from langfuse.callback import CallbackHandler
langfuse_handler = CallbackHandler()

Benchmark without Langfuse

langchain_stats_no_langfuse = time_func(lambda: chain.invoke({"person":"Paul Graham"}))
langchain_stats_no_langfuse
count         100.000000
mean (sec)      0.529463
std (sec)       0.685193
min (sec)       0.306092
25% (sec)       0.373373
50% (sec)       0.407278
75% (sec)       0.530427
max (sec)       7.107237
dtype: float64

With Langfuse Tracing

langchain_stats_with_langfuse = time_func(lambda: chain.invoke({"person":"Paul Graham"}, {"callbacks":[langfuse_handler]}))
langchain_stats_with_langfuse
count         100.000000
mean (sec)      0.618286
std (sec)       0.165149
min (sec)       0.464992
25% (sec)       0.518323
50% (sec)       0.598474
75% (sec)       0.675420
max (sec)       1.838614
dtype: float64
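Since both runs are captured as pandas Series, the overhead can be estimated by subtracting them. Note that single benchmark runs against a hosted model are noisy, so treat the difference as indicative only:

# Difference in mean latency between the traced and untraced runs
(langchain_stats_with_langfuse - langchain_stats_no_langfuse)["mean (sec)"]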

OpenAI Integration

Docs: https://langfuse.com/docs/integrations/openai

%pip install langfuse openai --upgrade --quiet
import openai

Benchmark without Langfuse

time_func(lambda: openai.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
      {"role": "user", "content": "what is the city Paul Graham is from?"}],
  temperature=0,
  max_tokens=10,
))
count         100.000000
mean (sec)      0.524097
std (sec)       0.220446
min (sec)       0.288002
25% (sec)       0.395479
50% (sec)       0.507395
75% (sec)       0.571789
max (sec)       1.789671
dtype: float64

With Langfuse Tracing

from langfuse.openai import openai
time_func(lambda: openai.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
      {"role": "user", "content": "what is the city Paul Graham is from?"}],
  temperature=0,
  max_tokens=10,
))
count         100.000000
mean (sec)      0.515243
std (sec)       0.286902
min (sec)       0.283431
25% (sec)       0.378736
50% (sec)       0.435775
75% (sec)       0.558746
max (sec)       2.613779
dtype: float64
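The langfuse.openai import is a drop-in replacement for the standard client that traces every request in the background. In short-lived scripts, flush the queue before the process exits; a sketch, assuming the flush_langfuse() helper exposed by the v2 integration (see the docs linked above):

from langfuse.openai import openai
 
# ... run your completions ...
 
openai.flush_langfuse()  # block until all queued traces have been sent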

LlamaIndex Integration

Docs: https://langfuse.com/docs/integrations/llama-index

%pip install llama-index --upgrade --quiet

Sample documents

from llama_index.core import Document
 
doc1 = Document(text="""
Maxwell "Max" Silverstein, a lauded movie director, screenwriter, and producer, was born on October 25, 1978, in Boston, Massachusetts. A film enthusiast from a young age, his journey began with home movies shot on a Super 8 camera. His passion led him to the University of Southern California (USC), majoring in Film Production. Eventually, he started his career as an assistant director at Paramount Pictures. Silverstein's directorial debut, “Doors Unseen,” a psychological thriller, earned him recognition at the Sundance Film Festival and marked the beginning of a successful directing career.
""")
doc2 = Document(text="""
Throughout his career, Silverstein has been celebrated for his diverse range of filmography and unique narrative technique. He masterfully blends suspense, human emotion, and subtle humor in his storylines. Among his notable works are "Fleeting Echoes," "Halcyon Dusk," and the Academy Award-winning sci-fi epic, "Event Horizon's Brink." His contribution to cinema revolves around examining human nature, the complexity of relationships, and probing reality and perception. Off-camera, he is a dedicated philanthropist living in Los Angeles with his wife and two children.
""")

Benchmark without Langfuse

Index

# Benchmark index construction; the LLM query is benchmarked below
from llama_index.core import VectorStoreIndex
 
time_func(lambda: VectorStoreIndex.from_documents([doc1,doc2]))
count         100.000000
mean (sec)      0.171673
std (sec)       0.058332
min (sec)       0.112696
25% (sec)       0.136361
50% (sec)       0.157330
75% (sec)       0.178455
max (sec)       0.459417
dtype: float64

Query

index = VectorStoreIndex.from_documents([doc1,doc2])
time_func(lambda: index.as_query_engine().query("What did he do growing up?"))
count         100.000000
mean (sec)      0.795817
std (sec)       0.338263
min (sec)       0.445060
25% (sec)       0.614282
50% (sec)       0.756573
75% (sec)       0.908411
max (sec)       3.495263
dtype: float64

With Langfuse Tracing

from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from langfuse.llama_index import LlamaIndexCallbackHandler
 
langfuse_callback_handler = LlamaIndexCallbackHandler()
Settings.callback_manager = CallbackManager([langfuse_callback_handler])

Index

time_func(lambda: VectorStoreIndex.from_documents([doc1,doc2]))
count         100.000000
mean (sec)      0.178796
std (sec)       0.101976
min (sec)       0.112530
25% (sec)       0.138217
50% (sec)       0.163698
75% (sec)       0.179563
max (sec)       0.992403
dtype: float64

Query

index = VectorStoreIndex.from_documents([doc1,doc2])
time_func(lambda: index.as_query_engine().query("What did he do growing up?"))
count         100.000000
mean (sec)      0.802315
std (sec)       0.230386
min (sec)       0.423413
25% (sec)       0.639373
50% (sec)       0.784945
75% (sec)       0.945300
max (sec)       2.164593
dtype: float64
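As with the other integrations, events are queued and sent asynchronously. Before a short-lived process (or this notebook kernel) exits, flush the handler so no queued events are lost; a sketch based on the v2 LlamaIndex integration:

langfuse_callback_handler.flush()  # send any remaining queued events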
