# Langfuse Integration with MariaDB Vector Store

## Overview

This document describes a Python script (`main.py`) that integrates Langfuse for tracing with a MariaDB vector store using LangChain and Sentence Transformers. The script demonstrates how to:
- Initialize a Sentence Transformer model for embeddings.
- Set up Langfuse for tracing application logic.
- Configure a MariaDB vector store.
- Add documents to the vector store and perform a similarity search.
- Log results to Langfuse for observability.
## Architecture Flow

## Prerequisites

To run the script, ensure the following are installed and configured:
- Python 3.8+
- Dependencies:
  - `langfuse`
  - `langchain-mariadb`
  - `langchain-community`
  - `sentence-transformers`
  - `python-dotenv`
- MariaDB: a running MariaDB instance with a database named `langchain`.
- Environment variables: create a `.env` file in the project root with the following:

  ```
  LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
  LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
  MARIADB_USER=<your-mariadb-username>
  MARIADB_PASSWORD=<your-mariadb-password>
  ```
- Langfuse account: sign up at [Langfuse](https://cloud.langfuse.com) to obtain `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY`.
## Installation

1. Clone the repository or copy the script to your local environment.
2. Install the required Python packages:

   ```
   pip install langfuse langchain-mariadb langchain-community sentence-transformers python-dotenv
   ```

3. Set up the `.env` file with your credentials.
4. Ensure MariaDB is running and accessible at `localhost`, with the `langchain` database created.
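Once the packages are installed, a quick sanity check can confirm that they are importable. This is an optional sketch; note that import names differ from the pip package names (`python-dotenv` installs as `dotenv`, and hyphens become underscores):

```python
import importlib.util

def missing_packages(modules):
    # Return the subset of module names that cannot be imported.
    return [m for m in modules if importlib.util.find_spec(m) is None]

# Import names, not pip package names.
required = ["langfuse", "langchain_mariadb", "langchain_community",
            "sentence_transformers", "dotenv"]
print("missing:", missing_packages(required))
```

An empty list means all dependencies are available.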
## Code Structure

The script (`main.py`) is structured as follows:

### 1. Importing Dependencies
```python
import os
from langfuse import get_client
from langchain_mariadb import MariaDBStore
from sentence_transformers import SentenceTransformer
from langchain_core.documents import Document
from langchain_community.embeddings import HuggingFaceEmbeddings
from dotenv import load_dotenv
```
- `os` and `dotenv`: load environment variables.
- `langfuse`: provides the Langfuse SDK for tracing.
- `langchain_mariadb`: MariaDB vector store integration with LangChain.
- `sentence_transformers` and `langchain_community`: generate embeddings using the `all-MiniLM-L6-v2` model.
- `langchain_core.documents`: defines the `Document` class for text storage.
### 2. Loading Environment Variables

```python
load_dotenv()
```

Loads environment variables from the `.env` file for secure access to credentials.
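Conceptually, `load_dotenv()` reads `KEY=VALUE` lines from the file and places them into `os.environ` without overriding variables that are already set. A simplified, stdlib-only sketch of that behavior (the real library also handles quoting, comments, and interpolation; the file name and values below are hypothetical):

```python
import os
from pathlib import Path

# Write a demo file so the sketch is self-contained.
Path(".env.demo").write_text("MARIADB_USER=demo_user\nMARIADB_PASSWORD=demo_pass\n")

for line in Path(".env.demo").read_text().splitlines():
    if line.strip() and not line.lstrip().startswith("#"):
        key, _, value = line.partition("=")
        # setdefault mirrors load_dotenv's default of not overriding existing values.
        os.environ.setdefault(key.strip(), value.strip())

print(os.environ["MARIADB_USER"])
```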
### 3. Initializing the Sentence Transformer Model

```python
model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
```

- Uses the `all-MiniLM-L6-v2` model to generate 384-dimensional embeddings for text documents.
- The model is lightweight and optimized for semantic text similarity.
### 4. Setting Up Langfuse

```python
os.environ["LANGFUSE_PUBLIC_KEY"] = os.getenv("LANGFUSE_PUBLIC_KEY")
os.environ["LANGFUSE_SECRET_KEY"] = os.getenv("LANGFUSE_SECRET_KEY")
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"

langfuse = get_client()
if langfuse.auth_check():
    print("Langfuse client is authenticated and ready!")
else:
    print("Authentication failed. Check your credentials.")
```
- Configures Langfuse with public and secret keys from environment variables.
- Sets the Langfuse host to the cloud instance.
- Verifies authentication with Langfuse.
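One subtlety worth noting: if either key is absent, `os.getenv` returns `None`, and assigning `None` into `os.environ` raises a `TypeError`. A small fail-fast check avoids that confusing error; this is a sketch with placeholder values (`pk-lf-demo` and `sk-lf-demo` are not real keys):

```python
import os

# Placeholder values so the sketch runs standalone; real keys come from .env.
os.environ.setdefault("LANGFUSE_PUBLIC_KEY", "pk-lf-demo")
os.environ.setdefault("LANGFUSE_SECRET_KEY", "sk-lf-demo")

required = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
print("All required Langfuse variables are set.")
```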
### 5. Configuring the MariaDB Vector Store

```python
url = f"mariadb+mariadbconnector://{os.getenv('MARIADB_USER')}:{os.getenv('MARIADB_PASSWORD')}@localhost/langchain"

vectorstore = MariaDBStore(
    embeddings=model,
    embedding_length=384,
    datasource=url,
    collection_name="my_docs",
)
```
- Constructs a MariaDB connection string using environment variables.
- Initializes a `MariaDBStore` instance with:
  - the Sentence Transformer model for embeddings;
  - an embedding length of 384 (specific to `all-MiniLM-L6-v2`);
  - a connection to the `langchain` database;
  - a collection named `my_docs` for storing documents.
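If the password contains characters such as `@`, `:`, or `/`, embedding it directly in the URL produces an invalid connection string. A sketch of URL-encoding the credentials first, using hypothetical values:

```python
from urllib.parse import quote_plus

# Hypothetical credentials for illustration only.
user = quote_plus("demo_user")
password = quote_plus("p@ss/word")

url = f"mariadb+mariadbconnector://{user}:{password}@localhost/langchain"
print(url)  # mariadb+mariadbconnector://demo_user:p%40ss%2Fword@localhost/langchain
```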
### 6. Application Logic with Langfuse Tracing

```python
with langfuse.start_as_current_span(name="mariadb-trace") as span:
    vectorstore.add_documents(
        [
            Document(page_content="The sun is a star."),
            Document(page_content="The moon is a natural satellite."),
        ]
    )
    results = vectorstore.similarity_search("Tell me about celestial bodies.")
    span.update_trace(
        metadata={"query": "Tell me about celestial bodies."}
    )
    print(f"Search results: {results}")
```
- Starts a Langfuse trace named `mariadb-trace`.
- Adds two sample documents to the vector store.
- Performs a similarity search with the query "Tell me about celestial bodies."
- Logs the query metadata to the Langfuse trace.
- Prints the search results.
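Under the hood, a similarity search ranks stored documents by how close their embeddings are to the query embedding, commonly via cosine similarity. A toy, pure-Python illustration of that ranking principle with made-up 3-dimensional vectors (the real embeddings here are 384-dimensional and come from the model, not hand-written):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings for illustration only.
docs = {
    "The sun is a star.": [0.9, 0.1, 0.0],
    "The moon is a natural satellite.": [0.8, 0.2, 0.1],
    "A recipe for pasta.": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of the celestial-bodies query

ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)  # celestial documents rank above the pasta one
```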
### 7. Flushing Langfuse Data

```python
langfuse.flush()
```

Ensures all trace data is sent to the Langfuse server before the script exits.
## Usage

1. Ensure MariaDB is running and the `langchain` database exists.
2. Populate the `.env` file with your Langfuse and MariaDB credentials.
3. Run the script:

   ```
   python main.py
   ```

4. Expected output: confirmation of Langfuse authentication, followed by search results from the MariaDB vector store, e.g.:

   ```
   Langfuse client is authenticated and ready!
   Search results: [Document(page_content='The sun is a star.'), Document(page_content='The moon is a natural satellite.')]
   ```

5. Check the Langfuse dashboard (https://cloud.langfuse.com) for trace details under the `mariadb-trace` span.
## Notes

- Embedding model: the `all-MiniLM-L6-v2` model is used for its balance of performance and efficiency. Other models can be substituted by changing the `model_name` parameter, but ensure `embedding_length` matches the model's output dimension.
- MariaDB: ensure the database user has permission to create and modify tables in the `langchain` database.
- Langfuse: traces are logged to the Langfuse cloud instance. Ensure your credentials are valid to avoid authentication errors.
- Error handling: the script includes a basic authentication check for Langfuse. Additional error handling (e.g., for MariaDB connection failures) can be added as needed.
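As a sketch of the additional error handling mentioned above, the vector-store calls can be wrapped in `try`/`except`. Since demonstrating a real connection failure requires a live database, the `FailingStore` class below is a hypothetical stand-in for a store whose MariaDB backend is unreachable:

```python
class FailingStore:
    # Hypothetical stand-in that simulates an unreachable MariaDB backend.
    def similarity_search(self, query):
        raise ConnectionError("could not connect to MariaDB at localhost")

vectorstore = FailingStore()
try:
    results = vectorstore.similarity_search("Tell me about celestial bodies.")
except ConnectionError as exc:
    print(f"Vector store query failed: {exc}")
    results = []
print(results)  # []
```

In the real script, the caught exception type would be the MariaDB connector's error class, and the same wrapping would apply to `add_documents` as well.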
## Troubleshooting

- Langfuse authentication failure:
  - Verify `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` in the `.env` file.
  - Ensure the Langfuse host URL is correct.
- MariaDB connection issues:
  - Check that MariaDB is running and accessible at `localhost`.
  - Confirm that `MARIADB_USER` and `MARIADB_PASSWORD` are correct.
  - Ensure the `langchain` database exists.
- No search results:
  - Verify that documents were added successfully to the vector store.
  - Ensure the query is relevant to the stored documents.