Multi-Modality

Langfuse supports rendering Multi-Modal traces, including both text and image formats. We follow OpenAI's format convention.

How to trace Multi-Modal content in Langfuse?

To utilize our Multi-Modal Trace support, your trace or observation input/output should include a list of messages comprising the conversation so far. Each message should contain a role (system, user, assistant) and content. To display multi-modal content, you can pass a combination of text and image URLs. The content property of the messages follows the OpenAI convention.
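
For example, with the Python SDK you can attach such a message list as the input and output of a trace. The snippet below is a minimal sketch assuming the low-level `langfuse.trace()` method of the Python SDK (v2) and a hypothetical trace name; adapt it to your SDK version or framework integration.

```python
from langfuse import Langfuse

# Credentials are read from LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY and LANGFUSE_HOST
langfuse = Langfuse()

# Conversation so far, following the OpenAI message format
messages = [
    {
        "role": "system",
        "content": "You are an AI trained to describe and interpret images.",
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's happening in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
        ],
    },
]

# Log a trace whose input is the multi-modal message list;
# the output is the assistant response (hard-coded here for illustration)
langfuse.trace(
    name="describe-image",
    input=messages,
    output="A dog is catching a frisbee in a park.",
)

# Ensure all buffered events are sent before the script exits
langfuse.flush()
```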

We plan to extend support to base64 images, file attachments (e.g., PDFs), and audio soon. Please add your upvote and any thoughts to this thread.

Visual Representation in Langfuse

When the "Markdown" option is enabled in the Langfuse UI, you can click on the image icon to preview the image inline.

(Screenshot: Trace in the Langfuse UI)

Content Format

| Content | Type | Description |
| --- | --- | --- |
| Default: Text content | string | The text contents of the message. |
| Multi-Modal: Array of content parts | array | An array of content parts with a defined type; each can be of type `text` or `image_url`. You can pass multiple images by adding multiple `image_url` content parts. |

Content Examples

```json
{
  "content": [
    {
      "role": "system",
      "content": "You are an AI trained to describe and interpret images. Describe the main objects and actions in the image."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What's happening in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/image.jpg"
          }
        }
      ]
    }
  ]
}
```

Content Part Types

| Property | Type | Description |
| --- | --- | --- |
| type | string | Type of the content part, either `text` or `image_url` |
| text | string | Text content of the message (for `text` parts) |
| image_url | object | Object with a `url` property pointing to the image (for `image_url` parts) |
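
If you use the Langfuse OpenAI integration, multi-modal messages passed to the OpenAI SDK are captured on the trace automatically. The snippet below is a sketch assuming the Python drop-in import `langfuse.openai` and a vision-capable model (the model name is an assumption).

```python
# Drop-in replacement for the OpenAI SDK; calls made through it are traced in Langfuse
from langfuse.openai import openai

response = openai.chat.completions.create(
    model="gpt-4o",  # assumption: any vision-capable model works here
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's happening in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```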

For more details and examples, refer to our OpenAI cookbook.
