Caching of Prompts in Client SDKs

Langfuse prompts are cached client-side in the SDKs, so fetching a prompt adds no latency after the first use and poses no availability risk to your application. You can also pre-fetch prompts on startup to populate the cache, or provide a fallback prompt.

When the SDK cache contains a fresh prompt, it’s returned immediately without any network requests.

Optional: Customize caching duration (TTL)

The cache TTL is configurable if you want to reduce the network overhead of the Langfuse client. The default TTL is 60 seconds. After the TTL expires, the SDKs refetch the prompt in the background and update the cache. Refetching is asynchronous and does not block the application.

# Get current `production` prompt version and cache for 5 minutes
prompt = langfuse.get_prompt("movie-critic", cache_ttl_seconds=300)
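
The behavior described above (serve from the cache, refresh in the background once the TTL has expired) can be sketched in a few lines. This is only an illustration of the general pattern, not the SDK's actual implementation: `PromptCacheSketch`, its `fetch` callback, and the injectable `clock` are hypothetical names introduced here.

```python
import threading
import time
from typing import Callable, Dict, Tuple


class PromptCacheSketch:
    """Hypothetical TTL cache: serve cached prompts, refresh expired ones in the background."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch      # stand-in for the network call to the prompt API
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the sketch can be tested without sleeping
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (value, expiry time)
        self._refreshing: Dict[str, threading.Thread] = {}
        self._lock = threading.Lock()

    def get(self, name: str) -> str:
        if self._ttl <= 0:
            # Caching disabled: fetch on every call.
            return self._fetch(name)
        with self._lock:
            entry = self._entries.get(name)
        if entry is None:
            # First use: nothing is cached yet, so this call blocks on the fetch.
            return self._fetch_and_store(name)
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: return the stale value now and refresh without blocking.
            self._refresh_in_background(name)
        return value

    def _fetch_and_store(self, name: str) -> str:
        value = self._fetch(name)
        with self._lock:
            self._entries[name] = (value, self._clock() + self._ttl)
        return value

    def _refresh_in_background(self, name: str) -> None:
        with self._lock:
            running = self._refreshing.get(name)
            if running is not None and running.is_alive():
                return  # a refresh for this prompt is already in flight
            thread = threading.Thread(
                target=self._fetch_and_store, args=(name,), daemon=True
            )
            self._refreshing[name] = thread
            thread.start()
```

In this sketch only the very first `get` for a prompt name waits on the network; every later call returns whatever is cached, even if a refresh is still running.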

Optional: Disable caching

You can disable caching by setting `cache_ttl_seconds` to 0 (`cacheTtlSeconds` in the JS/TS SDK). The prompt is then fetched from the Langfuse API on every call. This is recommended for non-production use cases where you want the prompt to always match the latest version in Langfuse.

prompt = langfuse.get_prompt("movie-critic", cache_ttl_seconds=0)
 
# Common in non-production environments, no cache + latest version
prompt = langfuse.get_prompt("movie-critic", cache_ttl_seconds=0, label="latest")

Optional: Guaranteed availability of prompts

While usually not necessary, you can ensure 100% availability of prompts by pre-fetching them on application startup and providing a fallback prompt. Please follow this guide for more information.
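
As a rough illustration of the idea (the linked guide remains the reference for the SDK's own mechanism), pre-fetching with a fallback might look like the sketch below. The helper `prefetch_prompts`, its `fetch` callback, and the fallback texts are hypothetical and not part of the Langfuse SDK; `fetch` stands in for whatever call retrieves a prompt by name.

```python
from typing import Callable, Dict, Iterable


def prefetch_prompts(
    fetch: Callable[[str], str],
    names: Iterable[str],
    fallbacks: Dict[str, str],
) -> Dict[str, str]:
    """Fetch each prompt once at startup; use the local fallback if a fetch fails."""
    warmed: Dict[str, str] = {}
    for name in names:
        try:
            warmed[name] = fetch(name)
        except Exception:
            # The prompt service is unreachable or the prompt is missing:
            # fall back to the text shipped with the application.
            warmed[name] = fallbacks[name]
    return warmed
```

Calling this once during application startup means later requests never depend on the first network fetch succeeding.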

Performance measurement of initial fetch

We measured the execution time of the following snippet with caching fully disabled. You can run this notebook yourself to verify the results.

prompt = langfuse.get_prompt("perf-test", cache_ttl_seconds=0)
prompt.compile(input="test")

Results from 1000 sequential executions using Langfuse Cloud (includes network latency):

Performance Chart

count    1000.000000
mean        0.039335 sec
std         0.014172 sec
min         0.032702 sec
25%         0.035387 sec
50%         0.037030 sec
75%         0.041111 sec
99%         0.068914 sec
max         0.409609 sec
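
If you want to produce a comparable summary for your own setup, a minimal sketch using only the standard library could look like this. The helpers `time_calls` and `summarize` are hypothetical names, and this is not the notebook's code. Pass in any zero-argument callable, for example a function that wraps the two-line snippet above.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_calls(fn: Callable[[], object], runs: int = 1000) -> List[float]:
    """Run fn sequentially and return the wall-clock duration of each call in seconds."""
    durations: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Summarize durations with the same rows as the table above (needs at least 2 values)."""
    # 99 cut points; index i - 1 holds the i-th percentile (linear interpolation).
    pct = statistics.quantiles(durations, n=100, method="inclusive")
    return {
        "count": float(len(durations)),
        "mean": statistics.mean(durations),
        "std": statistics.stdev(durations),  # sample standard deviation
        "min": min(durations),
        "25%": pct[24],
        "50%": pct[49],
        "75%": pct[74],
        "99%": pct[98],
        "max": max(durations),
    }
```

Timing each call separately, rather than timing the whole loop and dividing, is what makes the percentile and max rows possible.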
