Self HostingBackups
Version: v3

Backup Strategies

This guide covers backup strategies for self-hosted Langfuse deployments. Follow one of the deployment guides to get started.

Proper backup strategies are essential for protecting your Langfuse data and ensuring business continuity. This guide covers backup approaches for all components of your self-hosted Langfuse deployment.

ClickHouse

ClickHouse stores your observability data including traces, observations, and scores. Backup strategies vary depending on whether you use a managed service or self-hosted deployment.

ClickHouse Cloud (Managed Service)

Automatic Backups: ClickHouse Cloud automatically manages backups for you with:

  • Continuous incremental backups
  • Point-in-time recovery capabilities
  • Cross-region replication options
  • Enterprise-grade durability guarantees

No Action Required: If you’re using ClickHouse Cloud, backups are handled automatically. Refer to the ClickHouse Cloud documentation for backup retention policies and recovery procedures.

Self-Hosted ClickHouse

For self-hosted ClickHouse instances, you need to implement your own backup strategy.

Kubernetes Deployments

1. Volume Snapshots (Recommended)

Most cloud providers support volume snapshots for persistent volumes. Ensure that you’re also adding snapshots for the clickhouse zookeeper volumes.

# Create a VolumeSnapshot for each ClickHouse replica
kubectl apply -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: clickhouse-backup-$(date +%Y%m%d-%H%M%S)
  namespace: langfuse
spec:
  source:
    persistentVolumeClaimName: data-langfuse-clickhouse-0
  volumeSnapshotClassName: csi-hostpath-snapclass
EOF

2. Velero for Complete Cluster Backups

Velero provides comprehensive Kubernetes backup solutions:

# Install Velero
velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.8.0 \
  --bucket langfuse-backups --secret-file ./credentials-velero
 
# Create a backup schedule
velero schedule create langfuse-daily \
  --schedule="0 2 * * *" \
  --include-namespaces langfuse \
  --ttl 720h0m0s

3. ClickHouse Native Backups

Use ClickHouse’s built-in backup functionality. Follow the ClickHouse backup guide for all details.

-- Create a backup
BACKUP DATABASE default TO S3('s3://backup-bucket/clickhouse-backup-{timestamp}', 'access_key', 'secret_key');
 
-- Restore from backup
RESTORE DATABASE default FROM S3('s3://backup-bucket/clickhouse-backup-{timestamp}', 'access_key', 'secret_key');

Docker Deployments

For Docker-based deployments, implement regular volume backups:

#!/bin/bash
# Backup script for ClickHouse Docker volumes
 
BACKUP_DIR="/backups/clickhouse"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
CONTAINER_NAME="clickhouse-server"
 
# Create backup directory
mkdir -p "$BACKUP_DIR"
 
# Stop ClickHouse temporarily for consistent backup
docker stop "$CONTAINER_NAME"
 
# Create tar archive of data volume
docker run --rm \
  -v clickhouse_data:/source:ro \
  -v "$BACKUP_DIR":/backup \
  alpine tar czf "/backup/clickhouse-backup-$TIMESTAMP.tar.gz" -C /source .
 
# Restart ClickHouse
docker start "$CONTAINER_NAME"
 
# Clean up old backups (keep last 7 days)
find "$BACKUP_DIR" -name "clickhouse-backup-*.tar.gz" -mtime +7 -delete

Postgres

Postgres stores critical transactional data including users, organizations, projects, and API keys. We strongly recommend using managed database services for production deployments.

Cloud Provider Services: Use managed PostgreSQL services which provide automatic backups.

Self-Hosted Postgres Backups

If you must self-host Postgres, implement comprehensive backup strategies.

Kubernetes Generic Backup Solution

1. pg_dump with CronJob

apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
  namespace: langfuse
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: postgres-backup
            image: postgres:15
            env:
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: password
            command:
            - /bin/bash
            - -c
            - |
              TIMESTAMP=$(date +%Y%m%d_%H%M%S)
              pg_dump -h postgres-service -U langfuse -d langfuse > /backup/langfuse-backup-$TIMESTAMP.sql
 
              # Upload to S3 (optional)
              aws s3 cp /backup/langfuse-backup-$TIMESTAMP.sql s3://langfuse-backups/postgres/
 
              # Clean up local files older than 3 days
              find /backup -name "langfuse-backup-*.sql" -mtime +3 -delete
            volumeMounts:
            - name: backup-storage
              mountPath: /backup
          volumes:
          - name: backup-storage
            persistentVolumeClaim:
              claimName: postgres-backup-pvc
          restartPolicy: OnFailure

2. Volume Snapshots

# Create snapshot of Postgres PVC
kubectl apply -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-backup-$(date +%Y%m%d-%H%M%S)
  namespace: langfuse
spec:
  source:
    persistentVolumeClaimName: postgres-data-pvc
  volumeSnapshotClassName: csi-hostpath-snapclass
EOF

Docker Deployments

#!/bin/bash
# Postgres backup script for Docker
 
BACKUP_DIR="/backups/postgres"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
CONTAINER_NAME="postgres"
DB_NAME="langfuse"
DB_USER="langfuse"
 
mkdir -p "$BACKUP_DIR"
 
# Create SQL dump
docker exec "$CONTAINER_NAME" pg_dump -U "$DB_USER" "$DB_NAME" > "$BACKUP_DIR/langfuse-backup-$TIMESTAMP.sql"
 
# Compress backup
gzip "$BACKUP_DIR/langfuse-backup-$TIMESTAMP.sql"
 
# Upload to cloud storage (optional)
aws s3 cp "$BACKUP_DIR/langfuse-backup-$TIMESTAMP.sql.gz" s3://langfuse-backups/postgres/
 
# Clean up old backups
find "$BACKUP_DIR" -name "langfuse-backup-*.sql.gz" -mtime +7 -delete

MinIO

MinIO is Obsolete with Cloud Storage: If you’re using cloud storage services like AWS S3, Azure Blob Storage, or Google Cloud Storage, MinIO is not needed and backup strategies should focus on your cloud storage provider’s native backup features.

MinIO is only relevant for self-hosted deployments that don’t use cloud storage services. For most production deployments, we recommend using managed cloud storage instead.

When MinIO is Used

MinIO is typically used in:

  • Air-gapped environments
  • On-premises deployments without cloud access
  • Development environments
  • Specific compliance requirements

MinIO Backup Strategies

Configure MinIO to replicate to cloud storage:

# Configure MinIO client
mc alias set myminio http://localhost:9000 minio miniosecret
mc alias set s3backup https://s3.amazonaws.com ACCESS_KEY SECRET_KEY
 
# Set up bucket replication
mc replicate add myminio/langfuse --remote-bucket s3backup/langfuse-backup

Kubernetes MinIO Backups

1. Volume Snapshots

# Snapshot MinIO data volumes
kubectl apply -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: minio-backup-$(date +%Y%m%d-%H%M%S)
  namespace: langfuse
spec:
  source:
    persistentVolumeClaimName: data-minio-0
  volumeSnapshotClassName: csi-hostpath-snapclass
EOF

2. Scheduled Sync to External Storage

apiVersion: batch/v1
kind: CronJob
metadata:
  name: minio-backup-sync
  namespace: langfuse
spec:
  schedule: "0 3 * * *"  # Daily at 3 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: minio-backup
            image: minio/mc:latest
            command:
            - /bin/sh
            - -c
            - |
              mc alias set source http://minio:9000 $MINIO_ACCESS_KEY $MINIO_SECRET_KEY
              mc alias set backup s3://backup-bucket $AWS_ACCESS_KEY $AWS_SECRET_KEY
              mc mirror source/langfuse backup/langfuse-backup/$(date +%Y%m%d)
            env:
            - name: MINIO_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: minio-credentials
                  key: access-key
            - name: MINIO_SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: minio-credentials
                  key: secret-key
            - name: AWS_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: aws-credentials
                  key: access-key
            - name: AWS_SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: aws-credentials
                  key: secret-key
          restartPolicy: OnFailure

Was this page useful?

Questions? We're here to help

Subscribe to updates