Prompt Management

Use Langfuse to effectively manage and version your prompts. This allows you to iterate quickly, publish new prompt versions without redeploying your app, and track metrics by prompt version.

from langfuse import Langfuse
 
langfuse = Langfuse()
 
# Get current production prompt in application
prompt = langfuse.get_prompt("prompt name")

The workflow for managing prompts in Langfuse includes the following steps:

Create / Update prompts

Create a prompt in the Langfuse UI or via the SDKs. To update an existing prompt, use the edit button in the Langfuse UI or create a new prompt version via the SDKs with the same name, as shown in the sketch below.
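A minimal sketch of creating a prompt, or a new version of an existing one by reusing its name, via the Python SDK. It assumes the SDK's create_prompt method; the template text and config values are illustrative only:

# Creates the prompt, or a new version if a prompt with this name already exists
langfuse.create_prompt(
    name="prompt name",
    prompt="Summarize the following text for {{audience}}: {{input}}",  # assumed example template
    config={"model": "gpt-3.5-turbo", "temperature": 0.5},  # assumed example config, retrievable via prompt.config
    is_active=False,  # not yet promoted to production
)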

Promote prompt to production

Set a prompt version to active when creating it via the SDKs (see the sketch below). In the Langfuse UI, you can promote a prompt version to production.
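The same sketch, with the new version marked as active at creation time (again assuming the create_prompt method and its is_active flag):

# Create a new version and promote it to production in one step
langfuse.create_prompt(
    name="prompt name",
    prompt="Summarize the following text for {{audience}}: {{input}}",
    is_active=True,  # this version becomes the production version returned by get_prompt()
)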

Retrieve prompt in your application

# Get current production version
prompt = langfuse.get_prompt("prompt name")
 
# Get specific version
prompt = langfuse.get_prompt("prompt name", version=3)
 
# Get specific version and extend cache TTL from default 60 to 300 seconds
prompt = langfuse.get_prompt("prompt name", version=3, cache_ttl_seconds=300)
 
# Insert variables into prompt template
compiled_prompt = prompt.compile(input="test")
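# For example, if the stored template were "Summarize the following text: {{input}}"
# (an assumed template; Langfuse templates use {{variable}} placeholders),
# compiled_prompt would be "Summarize the following text: test"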
 
# Access the config of the prompt
config = prompt.config
 

Caching in client SDKs

While Langfuse Tracing is fully asynchronous and non-blocking, retrieving a managed prompt from Langfuse adds latency to your application. To minimize this impact, the client SDKs cache each prompt for 60 seconds after it is fetched; the cache TTL can be configured directly in the SDKs.

If refetching a prompt fails while an expired version is still in the cache, the SDKs fall back to the expired version, so network issues do not block your application.

# Get current production prompt version and cache for 5 minutes
prompt = langfuse.get_prompt("prompt name", cache_ttl_seconds=300)
 
# Get a specific prompt version and cache for 5 minutes
prompt = langfuse.get_prompt("prompt name", version=3, cache_ttl_seconds=300)
 
# Disable caching for a prompt
prompt = langfuse.get_prompt("prompt name", cache_ttl_seconds=0)

Performance on first fetch (excluding cache hits)

We measured the execution time of the following snippet (prompt retrieval and compilation). Note that this excludes cache hits where the prompt is available immediately.

prompt = langfuse.get_prompt("perf-test")
prompt.compile(input="test")

Results from 1000 sequential executions in a local Jupyter notebook using Langfuse Cloud (includes network latency):

Performance Chart

count  1000
mean      0.178465 sec
std       0.058125 sec
min       0.137314 sec
25%       0.161333 sec
50%       0.165919 sec
75%       0.171736 sec
max       0.687994 sec
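For reference, a minimal sketch of a timing loop that produces such summary statistics. The exact notebook code is not shown here, so treat the loop below (including disabling the cache to avoid cache hits) as an assumption:

import time
import pandas as pd
from langfuse import Langfuse

langfuse = Langfuse()

durations = []
for _ in range(1000):
    start = time.perf_counter()
    prompt = langfuse.get_prompt("perf-test", cache_ttl_seconds=0)  # cache disabled to measure real fetches
    prompt.compile(input="test")
    durations.append(time.perf_counter() - start)

# Summary statistics in seconds
print(pd.Series(durations).describe())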

Link generations in Langfuse Tracing to prompt versions

Add the prompt object to the generation call in the SDKs to link the generation in Langfuse Tracing to the prompt version. This allows you to track metrics by prompt name and version in the Langfuse UI.

langfuse.generation(
    ...
    prompt=prompt,
    ...
)
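A slightly fuller sketch of the same call; the surrounding field values (name, model, input, output) are illustrative assumptions, and only the prompt argument performs the linking:

langfuse.generation(
    name="summarize-article",  # assumed example name
    model="gpt-3.5-turbo",  # assumed example model
    input=compiled_prompt,  # e.g. the output of prompt.compile(...)
    output="(model response)",
    prompt=prompt,  # links this generation to the retrieved prompt version
)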

Demo

We used this feature for our Docs Q&A Chatbot and traced it with Langfuse. You can get view-only access to the project by signing up to the public demo.
