Databricks and Langfuse

What is Databricks? Databricks is a unified analytics platform founded by the creators of Apache Spark. It provides an interactive workspace for collaborative data engineering, machine learning, and data analytics. With Databricks, teams can build, train, and deploy models at scale, efficiently harnessing big data and advanced analytics tools.

What is Langfuse? Langfuse is a comprehensive platform designed to help developers monitor, trace, and evaluate their language models in production. It offers powerful insights through detailed logging and event tracing, ensuring robust performance monitoring and easier debugging of AI applications.

Tracing and Observability

Integrating Databricks with Langfuse enables you to trace applications built on Databricks, experiment with prompts in the Langfuse Playground, and benchmark your models through rigorous evaluations.
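
For example, because Databricks serving endpoints expose an OpenAI-compatible API, you can capture traces with Langfuse's OpenAI drop-in wrapper. The snippet below is a minimal sketch: the workspace URL, endpoint name, and token variable are placeholders you would replace with your own values, and the Langfuse API keys are expected in the environment.

```python
# pip install langfuse openai
# Expects LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST in the environment.
import os

# Langfuse's drop-in replacement for the OpenAI SDK records every call as a trace.
from langfuse.openai import OpenAI

# Databricks serving endpoints expose an OpenAI-compatible API under /serving-endpoints.
# Replace the workspace URL and token with your own values (placeholders).
client = OpenAI(
    base_url="https://<your-workspace>.cloud.databricks.com/serving-endpoints",
    api_key=os.environ["DATABRICKS_TOKEN"],  # Databricks personal access token
)

# "model" is the name of the serving endpoint you created in Databricks.
response = client.chat.completions.create(
    model="my-databricks-endpoint",
    messages=[{"role": "user", "content": "What is Databricks Model Serving?"}],
)
print(response.choices[0].message.content)
```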

Playground & Evaluations

This guide walks you through integrating Databricks language model endpoints with Langfuse. By doing so, you can quickly experiment with prompts and debug interactions using the Langfuse Playground, as well as benchmark your models systematically with Evaluations.

With Langfuse, you can:

  • Experiment in the Playground: The interactive Playground lets you test your language models in real time. You can send custom prompts, review detailed responses, and add prompts to your Prompt Library.
  • Benchmark with Evaluations: LLM-as-a-Judge evaluations provide a way to benchmark your application’s performance. You can run pre-defined test templates, analyze metrics like latency and accuracy, and refine your models based on measurable outcomes.

Set Up a Serving Endpoint in Databricks

Begin by setting up a serving endpoint in Databricks. This lets you query custom fine-tuned models or external models from providers such as OpenAI or Anthropic served via a gateway. For advanced configuration options, refer to the Databricks docs.
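
Once the endpoint is live, you can send it a quick test request before wiring it into Langfuse. The sketch below assumes a chat-task endpoint and uses placeholder values for the workspace URL, endpoint name, and access token; the exact request payload depends on the task type of your endpoint.

```python
# pip install requests
import os
import requests

# Placeholder workspace URL and endpoint name; replace with your own.
WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"
ENDPOINT_NAME = "my-databricks-endpoint"

# Databricks serving endpoints accept requests at /serving-endpoints/<name>/invocations.
response = requests.post(
    f"{WORKSPACE_URL}/serving-endpoints/{ENDPOINT_NAME}/invocations",
    headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
    json={"messages": [{"role": "user", "content": "Hello!"}]},
)
response.raise_for_status()
print(response.json())
```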

Add the Model in your Project Settings

Next, add your Databricks model endpoint to your Langfuse project settings.

Make sure you’ve entered the correct endpoint URL and authentication details. The model name is the name of the serving endpoint you created in Databricks.

Use the Model in the Playground

The Langfuse Playground offers an interactive interface where you can:

  • Send prompts and view quick results.
  • Add prompts to your Prompt Library.

Select Databricks as your LLM provider and choose the endpoint you configured earlier.

Use the Model for Evaluations

LLM-as-a-Judge is a technique for evaluating the quality of LLM applications by using another LLM as the judge. The judge model is given a trace or a dataset entry and asked to score and reason about the output. The resulting scores and reasoning are stored as scores in Langfuse.
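
Langfuse runs these evaluators for you once they are configured in the UI, but the sketch below illustrates the underlying idea by recording a judge score against a trace with the Langfuse Python SDK's `score` method. The trace ID, score value, and reasoning are hypothetical placeholders; in a real setup the judge model (for example, your Databricks endpoint) produces them.

```python
# pip install langfuse
# Expects LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST in the environment.
from langfuse import Langfuse

langfuse = Langfuse()

# Hypothetical values: in practice the judge model produces the score and reasoning,
# and the trace ID comes from the traced application call you want to evaluate.
judge_score = 0.9
judge_reasoning = "The answer is factually correct and directly addresses the question."

langfuse.score(
    trace_id="my-trace-id",            # ID of the trace to attach the score to
    name="llm-as-a-judge-correctness", # score name shown in Langfuse
    value=judge_score,
    comment=judge_reasoning,
)
```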

If you want to learn more about LLM Evals, check out our blog post.
