June 23, 2026

Multi-modal datasets

Tobias Wochinger

Create Langfuse dataset items with images, audio, video, documents, and other attachments for SDK-based multi-modal experiments.

You can now add media attachments to Langfuse dataset items and use them in SDK-based multi-modal experiments. Dataset item input, expectedOutput, and metadata can include media uploaded from the UI or via the Python and JS/TS SDKs.

Use this to build visual QA datasets, compare generated images against reference files, or run evaluations over audio, documents, and other multi-modal inputs. In SDK-based experiments, dataset media is resolved into media references by default, with helpers to fetch them as bytes, base64, or data URIs depending on the format your model provider expects.
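To make the three formats concrete, here is a minimal sketch using only the Python standard library. It does not use the Langfuse SDK helpers themselves, whose names are not given here; `to_base64` and `to_data_uri` are hypothetical illustration functions, and the PNG signature bytes are a stand-in for media bytes fetched from a dataset item.

```python
import base64


def to_base64(data: bytes) -> str:
    """Encode raw media bytes as a base64 string (ASCII text)."""
    return base64.b64encode(data).decode("ascii")


def to_data_uri(data: bytes, content_type: str) -> str:
    """Wrap raw media bytes in a data URI of the form data:<type>;base64,<payload>."""
    return f"data:{content_type};base64,{to_base64(data)}"


# Hypothetical stand-in for bytes fetched from a dataset item's media attachment:
# the 8-byte PNG file signature, not a complete image.
png_signature = b"\x89PNG\r\n\x1a\n"

image_b64 = to_base64(png_signature)
image_uri = to_data_uri(png_signature, "image/png")

print(image_b64)
print(image_uri)
```

Which form to pass along depends on the provider: some accept raw bytes, some expect a base64 string next to a separate content type field, and some take a data URI in place of an image URL.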

Multi-modal datasets are supported for SDK-based experiments with Python SDK >= 4.10.0 and JS/TS SDK @langfuse/client >= 5.6.0. UI-based experiments do not yet support dataset items with media attachments.
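If you want to guard an experiment script against an older SDK, a version check along these lines may help. This is a sketch under assumptions: `parse_version` and `meets_minimum` are hypothetical helpers rather than part of the Langfuse SDK, the Python SDK is assumed to be installed under the distribution name `langfuse`, and pre-release suffixes such as `rc1` are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the leading numeric components, e.g. '4.10.0rc1' -> (4, 10, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, padding with zeros so '4.10' equals '4.10.0'."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    try:
        # Assumption: the Python SDK distribution is named "langfuse".
        installed = version("langfuse")
    except PackageNotFoundError:
        print("langfuse is not installed")
    else:
        print(installed, "supports multi-modal datasets:", meets_minimum(installed, "4.10.0"))
```

The numeric comparison matters because plain string comparison would rank `4.9.3` above `4.10.0`.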

Get started

Datasets

Experiments via SDK
