Scaling
This guide covers how to operate your Langfuse deployment at scale, including best practices and tweaks to get the best performance.
Ingestion Throughput
Langfuse is designed to handle large volumes of ingested data. Under very high load, it may become necessary to apply additional settings that influence throughput.
Scaling the worker containers
For most environments, we recommend scaling the worker containers based on their CPU load, as this is a straightforward metric to measure. A sustained load above 50% on a 2-CPU container indicates that the instance is saturated and that throughput should be increased by adding more containers.
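As a minimal sketch, assuming a Kubernetes deployment named `langfuse-worker` (the name depends on your setup), a CPU-based autoscaler targeting the 50% threshold could look like this:

```bash
# Scale the worker between 2 and 10 replicas, targeting 50% average CPU.
# 'langfuse-worker' is a placeholder for your deployment name.
kubectl autoscale deployment langfuse-worker \
  --cpu-percent=50 --min=2 --max=10
```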
In addition, the Langfuse worker publishes queue length metrics via StatsD that can be used to scale the worker containers. `langfuse.queue.ingestion.length` is the main metric that we use to make scaling decisions.
The queue metrics can also be published to AWS CloudWatch by setting `ENABLE_AWS_CLOUDWATCH_METRIC_PUBLISHING=true`, which lets you configure auto-scalers based on AWS metrics.
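For example, as a plain environment setting on the worker container (how you pass it depends on your deployment method, e.g. Docker or Helm values):

```bash
# Publish Langfuse queue metrics to AWS CloudWatch.
# Assumes the container has AWS credentials permitted to put metrics.
export ENABLE_AWS_CLOUDWATCH_METRIC_PUBLISHING=true
```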
Reducing ClickHouse reads within the ingestion processing
By default, the Langfuse worker reads the existing event from ClickHouse and merges it with any incoming data.
This increases the load on ClickHouse and may limit the total throughput.
For projects that were not migrated from a previous version of Langfuse, this read is optional, as the full event history is available in S3.
You can set `LANGFUSE_SKIP_INGESTION_CLICKHOUSE_READ_MIN_PROJECT_CREATE_DATE` to a date in the past before your first project was created, e.g. `2025-01-01`.
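For example, assuming every project in your deployment was created after 2025-01-01:

```bash
# Safe to set because no project existed before this date, so no
# pre-S3 event history needs to be merged from ClickHouse.
export LANGFUSE_SKIP_INGESTION_CLICKHOUSE_READ_MIN_PROJECT_CREATE_DATE=2025-01-01
```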
Please note that any S3/blob storage deletion lifecycle rules in combination with late updates to events may cause duplicates in the event history.
If you use the default integration methods with the Langfuse SDKs or OpenTelemetry, this should not affect you.
Separating ingestion and user interface
When the ingestion load is high, the Langfuse web interface and API calls may become slow or unresponsive.
In this case, splitting the langfuse-web deployment into one deployment that handles ingestion and one that serves the user interface can help keep the UI responsive.
You can create a new, identical replica of the langfuse-web deployment and route all traffic for `/api/public/ingestion*`, `/api/public/media*`, and `/api/public/otel*` to the new deployment.
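A minimal sketch for an nginx reverse proxy, assuming the two deployments are reachable as `langfuse-web` and `langfuse-web-ingestion` (both names are placeholders for your setup):

```nginx
# Send ingestion, media, and OTel traffic to the dedicated ingestion replica.
location ~ ^/api/public/(ingestion|media|otel) {
    proxy_pass http://langfuse-web-ingestion:3000;
}

# All other traffic, including the UI, goes to the original deployment.
location / {
    proxy_pass http://langfuse-web:3000;
}
```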
Slow UI queries or API calls
If you notice long-loading screens within the UI or slow API calls, this is usually related to insufficient resources on the ClickHouse database or missing time filters. Tracing data is indexed by `projectId` and time, so adding filter conditions on those should significantly improve performance.
If all filters are in place, a larger ClickHouse instance may improve the observed performance. ClickHouse is designed to scale vertically, i.e. adding more memory to the instance should yield faster response times. You can check the ClickHouse docs on which memory size to choose for your workload. In general, we recommend at least 16 GiB of memory for larger deployments.
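For example, when querying traces via the public API, constraining the time range keeps the query on the index. The `fromTimestamp`/`toTimestamp` parameters below follow the Langfuse API reference, but verify them against your version:

```bash
# Fetch one day of traces instead of scanning the full history.
curl -s -u "$LANGFUSE_PUBLIC_KEY:$LANGFUSE_SECRET_KEY" \
  "https://langfuse.example.com/api/public/traces?fromTimestamp=2025-01-01T00:00:00Z&toTimestamp=2025-01-02T00:00:00Z"
```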
Increasing Disk Usage
LLM tracing data may contain large payloads, as inputs and outputs are tracked. In addition, ClickHouse stores observability data within its system tables. If you notice that your disk usage is growing significantly on S3/blob storage or ClickHouse, we recommend the following.
In general, the most effective way to free disk space is to configure a data retention policy. If this is not available in your plan, consider the options below.
S3 / Blob Storage Disk Usage
You can implement lifecycle rules to automatically remove old files from your blob storage. We recommend keeping events for as long as you want to access them within the UI or update them. For most customers, a default of 30 days is a good choice.
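A sketch using the AWS CLI, assuming your event bucket is named `langfuse-events` (a placeholder) and a 30-day expiration:

```bash
# Expire event objects 30 days after creation.
aws s3api put-bucket-lifecycle-configuration \
  --bucket langfuse-events \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-events-after-30-days",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Expiration": {"Days": 30}
    }]
  }'
```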
ClickHouse Disk Usage
To automatically remove data within ClickHouse, you can use the TTL feature.
See the ClickHouse documentation for more details on how to configure it.
This is applicable to the `traces`, `observations`, `scores`, and `event_log` tables within ClickHouse.
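A sketch of such a TTL, assuming `timestamp` is the event-time column of the `traces` table (verify column names against your schema; ClickHouse applies TTL deletes asynchronously during merges):

```bash
# Drop trace rows 90 days after their event timestamp.
clickhouse-client --query \
  "ALTER TABLE traces MODIFY TTL timestamp + INTERVAL 90 DAY"
```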
In addition, the `system.trace_log` and `system.text_log` tables within ClickHouse may grow large, even on smaller deployments.
You can modify your ClickHouse settings to set a TTL on both tables, and it is also safe to prune them manually from time to time.
Check the ClickHouse documentation for details on how to configure a TTL on those tables.
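As a sketch, the tables can also be altered directly; note that ClickHouse may recreate system log tables on restart, so the durable option is the corresponding TTL setting in the server configuration:

```bash
# Keep only two weeks of internal ClickHouse logs (the interval is an example).
clickhouse-client --query \
  "ALTER TABLE system.trace_log MODIFY TTL event_date + INTERVAL 14 DAY"
clickhouse-client --query \
  "ALTER TABLE system.text_log MODIFY TTL event_date + INTERVAL 14 DAY"
```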
FAQ
- Are prebuilt ARM images available?
- Are older versions of the SDK compatible with newer versions of Langfuse?
- I cannot connect to my docker deployment, what should I do?
- I have forgotten my password
- How can I restrict access on my self-hosted instance to internal users?
- How to manage different environments in Langfuse?
- Can I deploy multiple instances of Langfuse behind a load balancer?
- What kind of telemetry does Langfuse collect?