
Helicone Integration

In this guide, we’ll show you how to integrate Langfuse with Helicone.

What is Helicone? Helicone is an open-source AI gateway that gives you access to 100+ AI models through an OpenAI-compatible interface. It offers features like intelligent routing, automatic failover, caching, cost tracking, and more.

What is Langfuse? Langfuse is an open source LLM engineering platform that helps teams trace LLM calls, monitor performance, and debug issues in their AI applications.

Since Helicone is OpenAI-compatible, we can use Langfuse’s native integration with the OpenAI SDK, which is available in both Python and TypeScript.

Get started

  1. In your terminal, install the following packages if you haven’t already:
pip install langfuse openai python-dotenv
  2. Then, create a .env file in your project and add your environment variables:
HELICONE_API_KEY=sk-helicone-... # Get it from your Helicone dashboard

LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_BASE_URL=https://cloud.langfuse.com # 🇪🇺 EU region
# LANGFUSE_BASE_URL=https://us.cloud.langfuse.com 🇺🇸 US region
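
Optionally, you can verify that your Langfuse credentials are picked up before making any LLM calls. The snippet below is a minimal sketch assuming the Langfuse Python SDK v3, which exposes a get_client() helper and an auth_check() method:

from langfuse import get_client
from dotenv import load_dotenv

load_dotenv()

# Returns True if the configured keys and base URL are valid
langfuse = get_client()
print("Langfuse authenticated:", langfuse.auth_check())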

Example 1: Simple LLM Call

We use Langfuse’s OpenAI SDK wrapper to automatically log Helicone calls as generations in Langfuse.

  • The base_url is set to Helicone’s AI Gateway endpoint.
  • You can replace "gpt-4o" with any model available in Helicone’s model registry.
  • The api_key is your Helicone API key, which Helicone uses to authenticate with the underlying model providers.
from langfuse.openai import openai
import os
from dotenv import load_dotenv
 
load_dotenv()
 
# Create an OpenAI client with Helicone's gateway endpoint
client = openai.OpenAI(
    api_key=os.getenv("HELICONE_API_KEY"),
    base_url="https://ai-gateway.helicone.ai/"
)
 
# Make a chat completion request
response = client.chat.completions.create(
    model="gpt-4o", # Or any other 100+ models
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a fun fact about space."}
    ],
    name="fun-fact-request"  # Optional: Name of the generation in Langfuse
)
 
print(response.choices[0].message.content)
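
If you run this in a short-lived environment (a one-off script, cron job, or serverless function), you may want to flush queued events before the process exits so the generation reliably reaches Langfuse. A minimal sketch, assuming the Langfuse v3 Python SDK's get_client() helper:

from langfuse import get_client

# Block until all buffered Langfuse events have been sent
get_client().flush()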

Example 2: Nested LLM Calls

By using the @observe() decorator, we capture execution details of any Python function, including nested LLM calls, inputs, outputs, and execution times. This provides in-depth observability with minimal code changes.

  • The @observe() decorator captures inputs, outputs, and execution details of the functions.
  • Nested functions summarize_text and analyze_sentiment are also decorated, creating a hierarchy of traces.
  • Each LLM call within the functions is logged, providing a detailed trace of the execution flow.

from langfuse import observe
from langfuse.openai import openai
import os
from dotenv import load_dotenv
 
load_dotenv()
 
# Create an OpenAI client with Helicone's base URL
client = openai.OpenAI(
    base_url="https://ai-gateway.helicone.ai/",
    api_key=os.getenv("HELICONE_API_KEY")
)
 
@observe()  # This decorator enables tracing of the function
def analyze_text(text: str):
    # First LLM call: Summarize the text
    summary_response = summarize_text(text)
    summary = summary_response.choices[0].message.content
 
    # Second LLM call: Analyze the sentiment of the summary
    sentiment_response = analyze_sentiment(summary)
    sentiment = sentiment_response.choices[0].message.content
 
    return {
        "summary": summary,
        "sentiment": sentiment
    }
 
@observe()  # Nested function to be traced
def summarize_text(text: str):
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You summarize texts in a concise manner."},
            {"role": "user", "content": f"Summarize the following text:\n{text}"}
        ],
        name="summarize-text"
    )
 
@observe()  # Nested function to be traced
def analyze_sentiment(summary: str):
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You analyze the sentiment of texts."},
            {"role": "user", "content": f"Analyze the sentiment of the following summary:\n{summary}"}
        ],
        name="analyze-sentiment"
    )
 
# Example usage
text_to_analyze = "OpenAI's GPT-4 model has significantly advanced the field of AI, setting new standards for language generation."
analyze_text(text_to_analyze)
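
After running this, you should see a single trace for analyze_text in Langfuse, with nested observations for summarize_text and analyze_sentiment and one generation per LLM call, so you can inspect inputs, outputs, and timings at each level.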

Example 3: Streaming Responses
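
Streaming also works through the Langfuse OpenAI wrapper: pass stream=True and iterate over the returned chunks, and the call is traced like any other generation.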

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Write a haiku about a robot."}
    ],
    stream=True,
    name="streaming-story"
)
 
print("🤖 Assistant (streaming):")
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
print("\n")

Example 4: Multi-Provider Access

Helicone provides access to 100+ models from multiple providers through a single, OpenAI-compatible interface. Simply change the model name to use a different provider, or pass a comma-separated list of models to fall back automatically when your first choice is unavailable:

# Use Anthropic Claude
response = client.chat.completions.create(
    model="claude-3.5-sonnet-v2/anthropic",
    messages=[{"role": "user", "content": "Hello!"}]
)
 
# Fall back to Gemini 2.5 Flash Lite if Claude 3.5 Sonnet is unavailable
response = client.chat.completions.create(
    model="claude-3.5-sonnet-v2/anthropic,gemini-2.5-flash-lite/google-ai-studio",
    messages=[{"role": "user", "content": "Hello!"}]
)
