Multi-modal datasets
Create Langfuse dataset items with images, audio, video, documents, and other attachments for SDK-based multi-modal experiments.
You can now add media attachments to Langfuse dataset items and use them in SDK-based multi-modal experiments. Dataset item input, expectedOutput, and metadata can include media uploaded from the UI or via the Python and JS/TS SDKs.
Use this to build visual QA datasets, compare generated images against reference files, or run evaluations over audio, documents, and other multi-modal inputs. In SDK-based experiments, dataset media is resolved into media references by default, with helpers to fetch them as bytes, base64, or data URIs depending on the format your model provider expects.
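To illustrate how the three formats relate, here is a minimal sketch using only the Python standard library. It does not use the Langfuse SDK helpers; `to_base64` and `to_data_uri` are hypothetical names chosen for this example, and the sample bytes stand in for media you would fetch from a dataset item.

```python
import base64


def to_base64(data: bytes) -> str:
    """Encode raw bytes as a base64 ASCII string."""
    return base64.b64encode(data).decode("ascii")


def to_data_uri(data: bytes, content_type: str) -> str:
    """Wrap raw bytes in a base64 data URI for the given MIME type."""
    return f"data:{content_type};base64,{to_base64(data)}"


# Hypothetical stand-in for bytes fetched from a dataset media reference
# (here: just the 8-byte PNG file signature, not a full image).
image_bytes = b"\x89PNG\r\n\x1a\n"

as_base64 = to_base64(image_bytes)
as_data_uri = to_data_uri(image_bytes, "image/png")
print(as_data_uri)
```

Which form to pass on depends on the provider: some accept raw bytes, others expect a base64 string or a data URI with an explicit content type.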
Multi-modal datasets are supported for SDK-based experiments with Python SDK >= 4.10.0 and JS/TS SDK @langfuse/client >= 5.6.0. UI-based experiments do not yet support dataset items with media attachments.