Masking of sensitive LLM data
Masking gives you precise control over the tracing data sent to the Langfuse server. With a custom masking function, you can sanitize traced data before it leaves your application. Whether for compliance reasons or to protect user privacy, masking sensitive data is a crucial step in responsible application development. It enables you to:
- Redact sensitive information from trace or observation inputs and outputs.
- Customize the content of events before transmission.
- Implement fine-grained data filtering based on your specific requirements.
To learn how Langfuse protects stored data, see our data security and privacy documentation.
How it works
- You define a custom masking function and pass it to the Langfuse client constructor.
- All event inputs and outputs are processed through this function.
- The masked data is then sent to the Langfuse server.
This approach ensures that you have complete control over the event input and output data traced by your application.
Define a masking function:
def masking_function(data):
    if isinstance(data, str) and data.startswith("SECRET_"):
        return "REDACTED"
    return data
Use with the @observe() decorator:
from langfuse.decorators import langfuse_context, observe
langfuse_context.configure(mask=masking_function)
@observe()
def fn():
    return "SECRET_DATA"
fn()
langfuse_context.flush()
# The trace output in Langfuse will have the output masked as "REDACTED".
Use with the low-level SDK:
from langfuse import Langfuse
langfuse = Langfuse(mask=masking_function)
trace = langfuse.trace(output="SECRET_DATA")
langfuse.flush()
# The trace output in Langfuse will have the output masked as "REDACTED".
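The masking function above only inspects top-level strings. Inputs and outputs are often dictionaries or lists, so a recursive variant can be useful. The following is a minimal sketch, assuming the data passed to the function may contain nested dicts, lists, and tuples; the helper name `mask_string` is hypothetical and not part of the Langfuse SDK.

```python
from typing import Any


def mask_string(text: str) -> str:
    # Same rule as the masking function above: redact strings with the SECRET_ prefix.
    return "REDACTED" if text.startswith("SECRET_") else text


def masking_function(data: Any) -> Any:
    # Recurse into common containers so nested strings are masked too.
    if isinstance(data, str):
        return mask_string(data)
    if isinstance(data, dict):
        return {key: masking_function(value) for key, value in data.items()}
    if isinstance(data, list):
        return [masking_function(item) for item in data]
    if isinstance(data, tuple):
        return tuple(masking_function(item) for item in data)
    # Numbers, None, and other types pass through unchanged.
    return data
```

This version returns new containers instead of mutating the originals, so the values your application keeps working with stay untouched.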
Examples
The following examples show how to use the masking feature. They use the Langfuse decorator, but the low-level SDK and the JS/TS SDK work analogously.
Example 1: Redacting Credit Card Numbers
In this example, we’ll demonstrate how to redact credit card numbers from strings using a regular expression. This helps in complying with PCI DSS by ensuring that credit card numbers are not transmitted or stored improperly.
You define a custom masking function and pass it to the Langfuse client. The function is applied to all event inputs and outputs and masks or redacts sensitive information according to your rules. Because every event passes through the masking function before it is sent, only the masked data is transmitted to the Langfuse server.
Steps:
- Import necessary modules.
- Define a masking function that uses a regular expression to detect and replace credit card numbers.
- Configure the masking function in Langfuse.
- Create a sample function to simulate processing sensitive data.
- Observe the trace to see the masked output.
import re
from langfuse.decorators import langfuse_context, observe
# Step 2: Define the masking function
def masking_function(data):
    if isinstance(data, str):
        # Regular expression to match credit card numbers (Visa, MasterCard, AmEx, etc.)
        pattern = r'\b(?:\d[ -]*?){13,19}\b'
        data = re.sub(pattern, '[REDACTED CREDIT CARD]', data)
    return data
# Step 3: Configure the masking function
langfuse_context.configure(mask=masking_function)
# Step 4: Create a sample function with sensitive data
@observe()
def process_payment():
    # Simulated sensitive data containing a credit card number
    transaction_info = "Customer paid with card number 4111 1111 1111 1111."
    return transaction_info
# Step 5: Observe the trace
result = process_payment()
print(result)  # Prints the original string; masking applies only to the data sent to Langfuse.
# Trace output in Langfuse: Customer paid with card number [REDACTED CREDIT CARD].
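The pattern in Example 1 matches any run of 13 to 19 digits, so order numbers or other long identifiers would be redacted as well. One way to reduce such false positives is to redact only matches that pass the Luhn checksum used by payment card numbers. This is a sketch under that assumption, not part of the Langfuse API; the helper names `passes_luhn` and `_redact_if_card` are hypothetical.

```python
import re

# Same pattern as in Example 1, compiled once.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]*?){13,19}\b")


def passes_luhn(digits: str) -> bool:
    # Luhn checksum: double every second digit from the right, subtract 9 from
    # doubled values above 9, and check that the total is divisible by 10.
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def _redact_if_card(match: re.Match) -> str:
    # Strip spaces and dashes before validating the candidate number.
    digits = re.sub(r"\D", "", match.group(0))
    return "[REDACTED CREDIT CARD]" if passes_luhn(digits) else match.group(0)


def masking_function(data):
    if isinstance(data, str):
        return CARD_PATTERN.sub(_redact_if_card, data)
    return data
```

The trade-off is that a mistyped card number fails the checksum and is left unmasked, so the stricter check suits cases where over-redaction is the bigger problem.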
Example 2: Using the llm-guard library
In this example, we’ll use the Anonymize scanner from llm-guard to remove personal names and other PII from the data. This is useful for anonymizing user data and protecting privacy.
Find out more about the llm-guard library in their documentation.
Steps:
- Install the llm-guard library.
- Import necessary modules.
- Initialize the Vault and configure the Anonymize scanner.
- Define a masking function that uses the Anonymize scanner.
- Configure the masking function in Langfuse.
- Create a sample function to simulate processing data with PII.
- Observe the trace to see the masked output.
pip install llm-guard
from langfuse.decorators import langfuse_context, observe
from llm_guard.vault import Vault
from llm_guard.input_scanners import Anonymize
from llm_guard.input_scanners.anonymize_helpers import BERT_LARGE_NER_CONF
# Step 3: Initialize the Vault and configure the Anonymize scanner
vault = Vault()
def create_anonymize_scanner():
    scanner = Anonymize(
        vault,
        recognizer_conf=BERT_LARGE_NER_CONF,
        language="en"
    )
    return scanner
# Step 4: Define the masking function
def masking_function(data):
    if isinstance(data, str):
        scanner = create_anonymize_scanner()
        # Scan and redact the data
        sanitized_data, is_valid, risk_score = scanner.scan(data)
        return sanitized_data
    return data
# Step 5: Configure the masking function
langfuse_context.configure(mask=masking_function)
# Step 6: Create a sample function with PII
@observe()
def generate_report():
    # Simulated data containing personal names
    report = "John Doe met with Jane Smith to discuss the project."
    return report
# Step 7: Observe the trace
result = generate_report()
print(result)  # Prints the original string; masking applies only to the data sent to Langfuse.
# Trace output in Langfuse: [REDACTED_PERSON] met with [REDACTED_PERSON] to discuss the project.
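The masking function in Example 2 builds a new scanner on every call, which can be costly when the scanner loads a model. One possible refinement is to create the scanner lazily and reuse it. The sketch below illustrates the caching pattern with `functools.lru_cache`; `StandInScanner` is a hypothetical placeholder that only mimics the three-value return of `scan` used above, so that the snippet runs without llm-guard installed. In real code, `get_scanner` would return the configured Anonymize scanner instead.

```python
from functools import lru_cache


class StandInScanner:
    """Hypothetical stand-in for an expensive-to-build scanner."""

    instances_created = 0

    def __init__(self):
        StandInScanner.instances_created += 1

    def scan(self, prompt: str):
        # Mimics the (sanitized_text, is_valid, risk_score) tuple from Example 2.
        return prompt.replace("John Doe", "[REDACTED_PERSON]"), True, 0.0


@lru_cache(maxsize=1)
def get_scanner() -> StandInScanner:
    # Built on first use, then reused for every later call.
    return StandInScanner()


def masking_function(data):
    if isinstance(data, str):
        sanitized_data, _is_valid, _risk_score = get_scanner().scan(data)
        return sanitized_data
    return data
```

Because masking runs for every traced input and output, reusing one scanner keeps the per-event overhead limited to the scan itself.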
Example 3: Masking Email and Phone Numbers
You can extend the masking function to redact other types of PII such as email addresses and phone numbers using regular expressions.
import re
from langfuse.decorators import langfuse_context, observe
def masking_function(data):
    if isinstance(data, str):
        # Mask email addresses
        data = re.sub(r'\b[\w.-]+?@\w+?\.\w+?\b', '[REDACTED EMAIL]', data)
        # Mask phone numbers
        data = re.sub(r'\b\d{3}[-. ]?\d{3}[-. ]?\d{4}\b', '[REDACTED PHONE]', data)
    return data
langfuse_context.configure(mask=masking_function)
@observe()
def contact_customer():
    info = "Please contact John at john.doe@example.com or call 555-123-4567."
    return info
result = contact_customer()
print(result)  # Prints the original string; masking applies only to the data sent to Langfuse.
# Trace output in Langfuse: Please contact John at [REDACTED EMAIL] or call [REDACTED PHONE].
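As the number of PII types grows, it may help to keep each rule separate and apply them in sequence. The following sketch reuses the two regular expressions from Example 3; the names `regex_masker` and `MASKERS` are hypothetical and only illustrate one way to organize the rules.

```python
import re
from typing import Any, Callable, List

StringMasker = Callable[[str], str]


def regex_masker(pattern: str, replacement: str) -> StringMasker:
    # Compile once and return a function that applies the substitution.
    compiled = re.compile(pattern)
    return lambda text: compiled.sub(replacement, text)


# Rules are applied in order; add new entries to cover more PII types.
MASKERS: List[StringMasker] = [
    regex_masker(r"\b[\w.-]+?@\w+?\.\w+?\b", "[REDACTED EMAIL]"),
    regex_masker(r"\b\d{3}[-. ]?\d{3}[-. ]?\d{4}\b", "[REDACTED PHONE]"),
]


def masking_function(data: Any) -> Any:
    if isinstance(data, str):
        for masker in MASKERS:
            data = masker(data)
    return data
```

Keeping the rules in a list also makes each one easy to test on its own before passing the combined function to Langfuse.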