Cookbook: Observability for Groq Models (Python)
This cookbook shows two ways to interact with Groq models and trace them with Langfuse:
- Using the OpenAI SDK to interact with the Groq model
- Using the Groq SDK to interact with Groq models
By following these examples, you’ll learn how to log and trace interactions with Groq language models, enabling you to debug and evaluate the performance of your AI-driven applications.
Note: Langfuse is also natively integrated with LangChain, LlamaIndex, LiteLLM, and other frameworks. If you use one of them, any call to a Groq model is instrumented out of the box.
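For example, if you already use LangChain, a Groq chat model can be traced by passing Langfuse's LangChain callback handler. This is a minimal sketch, assuming the langchain-groq package and the Langfuse Python SDK v3 (import paths may differ for other SDK versions):

# Hedged sketch: trace a LangChain + Groq call via the Langfuse callback handler
# Assumes `pip install langchain-groq langfuse` and the environment variables set below
from langfuse.langchain import CallbackHandler
from langchain_groq import ChatGroq

langfuse_handler = CallbackHandler()
llm = ChatGroq(model="llama3-8b-8192")

response = llm.invoke(
    "Explain observability in one sentence.",
    config={"callbacks": [langfuse_handler]},
)
print(response.content)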
Overview
In this notebook, we will explore various use cases where Langfuse can be integrated with the Groq SDK, including:
- Basic LLM Calls: Learn how to wrap standard Groq model interactions with Langfuse's @observe decorator for comprehensive logging.
- Chained Function Calls: See how to manage and observe complex workflows where multiple model interactions are linked together to produce a final result.
- Streaming Support: Discover how to use Langfuse with streaming responses from Groq models, ensuring that real-time interactions are fully traceable.
To get started, set up your environment variables for Langfuse and Groq:
import os
# Get keys for your project from the project settings page: https://cloud.langfuse.com
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com" # 🇪🇺 EU region
# os.environ["LANGFUSE_HOST"] = "https://us.cloud.langfuse.com" # 🇺🇸 US region
# Your Groq API key
os.environ["GROQ_API_KEY"] = "gsk_..."
Option 1: Using the OpenAI SDK to interact with the Groq model
Note: This example shows how to use the OpenAI Python SDK. If you use JS/TS, have a look at our OpenAI JS/TS SDK.
Install Required Packages
%pip install langfuse openai --upgrade
Import Necessary Modules
Instead of importing openai directly, import it from langfuse.openai. Also, import any other necessary modules.
# Instead of: import openai
from langfuse.openai import OpenAI
Initialize the OpenAI Client for the Groq Model
Initialize the OpenAI client, but point it to the Groq API endpoint and authenticate with your Groq API key (read from the environment variable set above).
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ.get("GROQ_API_KEY")
)
Chat Completion Request
Use the client to make a chat completion request to the Groq model.
completion = client.chat.completions.create(
    model="llama3-8b-8192",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "Write a poem about language models"
        }
    ]
)
print(completion.choices[0].message.content)
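Because langfuse.openai is a drop-in replacement, you can also combine it with the @observe decorator (introduced in Option 2) to group several Groq calls into a single trace. A minimal sketch; the function name and prompts are illustrative:

from langfuse import observe

@observe()
def summarize_and_translate(text):
    # Both completions are recorded as generations nested under one trace
    summary = client.chat.completions.create(
        model="llama3-8b-8192",
        messages=[{"role": "user", "content": f"Summarize in one sentence: {text}"}],
    ).choices[0].message.content

    translation = client.chat.completions.create(
        model="llama3-8b-8192",
        messages=[{"role": "user", "content": f"Translate to French: {summary}"}],
    ).choices[0].message.content
    return translation

print(summarize_and_translate("Langfuse traces LLM calls so you can debug and evaluate them."))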
Option 2: Using the Groq SDK to interact with Groq models
For more detailed guidance on the Groq SDK or the @observe decorator from Langfuse, please refer to the Groq Documentation and the Langfuse Documentation.
Install Required Packages
%pip install groq langfuse
Initialize the Groq client:
from groq import Groq
# Initialize Groq client
groq_client = Groq(api_key=os.environ["GROQ_API_KEY"])
Examples
Basic LLM Calls
We integrate the Groq SDK with Langfuse using the @observe decorator, which is central to logging and tracing interactions with large language models (LLMs). The @observe(as_type="generation") decorator specifically logs LLM interactions, capturing inputs, outputs, and model parameters. The resulting groq_chat_completion function can then be used across your project.
from langfuse import observe, get_client
langfuse = get_client()
# Function to handle Groq chat completion calls, wrapped with @observe to log the LLM interaction
@observe(as_type="generation")
def groq_chat_completion(**kwargs):
# Clone kwargs to avoid modifying the original input
kwargs_clone = kwargs.copy()
# Extract relevant parameters from kwargs
messages = kwargs_clone.pop('messages', None)
model = kwargs_clone.pop('model', None)
temperature = kwargs_clone.pop('temperature', None)
max_tokens = kwargs_clone.pop('max_tokens', None)
top_p = kwargs_clone.pop('top_p', None)
# Filter and prepare model parameters for logging
model_parameters = {
"max_tokens": max_tokens,
"temperature": temperature,
"top_p": top_p
}
model_parameters = {k: v for k, v in model_parameters.items() if v is not None}
# Log the input and model parameters before calling the LLM
langfuse.update_current_generation(
input=messages,
model=model,
model_parameters=model_parameters,
metadata=kwargs_clone,
)
# Call the Groq model to generate a response
response = groq_client.chat.completions.create(**kwargs)
# Log the usage details and output content after the LLM call
choice = response.choices[0]
langfuse.update_current_generation(
usage_details={
"input": len(str(messages)),
"output": len(choice.message.content)
},
output=choice.message.content
)
# Return the model's response object
return response
Simple Example
In the following example, we also added the decorator to the top-level function find_best_painter_from. This function calls the groq_chat_completion function, which is decorated with @observe(as_type="generation"). This hierarchical setup helps to trace more complex applications that involve multiple LLM calls and other non-LLM methods decorated with @observe.
@observe()
def find_best_painter_from(country="France"):
    response = groq_chat_completion(
        model="llama3-70b-8192",
        max_tokens=1024,
        temperature=0.4,
        messages=[
            {
                "role": "user",
                "content": f"Who is the best painter from {country}? Only provide the name."
            }
        ]
    )
    return response.choices[0].message.content
print(find_best_painter_from())
Chained Completions
This example demonstrates chaining multiple LLM calls using the @observe() decorator. The first call identifies the best painter from a specified country, and the second call uses that painter's name to find their most famous painting. Both interactions are logged by Langfuse because we use the wrapped groq_chat_completion function created above, ensuring full traceability across the chained requests.
from langfuse import observe, get_client
langfuse = get_client()
@observe()
def find_best_painting_from(country="France"):
    response = groq_chat_completion(
        model="llama3-70b-8192",
        max_tokens=1024,
        temperature=0.1,
        messages=[
            {
                "role": "user",
                "content": f"Who is the best painter from {country}? Only provide the name."
            }
        ]
    )
    painter_name = response.choices[0].message.content.strip()

    response = groq_chat_completion(
        model="llama3-70b-8192",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": f"What is the most famous painting of {painter_name}? Answer in one short sentence."
            }
        ]
    )
    return response.choices[0].message.content
print(find_best_painting_from("Germany"))
Streaming Completions
The following example demonstrates how to handle streaming responses from the Groq model using the @observe(as_type="generation") decorator. The process is similar to the completion example but includes handling streamed data in real time.
from langfuse import observe, get_client
langfuse = get_client()
@observe(as_type="generation")
def stream_groq_chat_completion(**kwargs):
kwargs_clone = kwargs.copy()
messages = kwargs_clone.pop('messages', None)
model = kwargs_clone.pop('model', None)
temperature = kwargs_clone.pop('temperature', None)
max_tokens = kwargs_clone.pop('max_tokens', None)
top_p = kwargs_clone.pop('top_p', None)
model_parameters = {
"max_tokens": max_tokens,
"temperature": temperature,
"top_p": top_p
}
model_parameters = {k: v for k, v in model_parameters.items() if v is not None}
langfuse.update_current_generation(
input=messages,
model=model,
model_parameters=model_parameters,
metadata=kwargs_clone,
)
stream = groq_client.chat.completions.create(stream=True, **kwargs)
final_response = ""
for chunk in stream:
content = str(chunk.choices[0].delta.content)
final_response += content
yield content
langfuse.update_current_generation(
usage_details={
"total_tokens": len(final_response.split())
},
output=final_response
)
Usage:
@observe()
def stream_find_best_five_painter_from(country="France"):
    response_chunks = stream_groq_chat_completion(
        model="llama3-70b-8192",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": f"Who are the best five painters from {country}? Give me a list of names and their most famous painting."
            }
        ]
    )
    final_response = ""
    for chunk in response_chunks:
        final_response += str(chunk)
        print(chunk, end="")
    return final_response
stream_find_best_five_painter_from("Spain")
Interoperability with the Python SDK
You can use this integration together with the Langfuse Python SDK to add additional attributes to the trace.
The @observe() decorator provides a convenient way to automatically wrap your instrumented code and add additional attributes to the trace.
from langfuse import observe, get_client
langfuse = get_client()
@observe()
def my_instrumented_function(input):
    # my_llm_call is a placeholder for your own LLM call,
    # e.g. the groq_chat_completion function defined above
    output = my_llm_call(input)

    langfuse.update_current_trace(
        input=input,
        output=output,
        user_id="user_123",
        session_id="session_abc",
        tags=["agent", "my-trace"],
        metadata={"email": "[email protected]"},
        version="1.0.0"
    )

    return output
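You can also attach evaluation scores to the current trace from inside a decorated function, for example to record user feedback. A hedged sketch, assuming the SDK's score_current_trace method; the score name and value are illustrative:

@observe()
def my_scored_function(input):
    output = my_llm_call(input)

    # Record a numeric evaluation score on the trace (name and value are examples)
    langfuse.score_current_trace(name="user-feedback", value=1)

    return output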
Learn more about using the Decorator in the Python SDK docs.
Next Steps
Once you have instrumented your code, you can manage, evaluate and debug your application: