Version: v3

Scaling

This guide covers how you can operate your Langfuse deployment at scale and includes best practices and tweaks to get the best performance.

Minimum Infrastructure Requirements

| Service | Minimum Requirements |
| --- | --- |
| Langfuse Web Container | 2 CPU, 4 GiB Memory |
| Langfuse Worker Container | 2 CPU, 4 GiB Memory |
| PostgreSQL Database | 2 CPU, 4 GiB Memory |
| Redis/Valkey Instance | 1 CPU, 1.5 GiB Memory |
| ClickHouse | 2 CPU, 8 GiB Memory |
| Blob Storage | Serverless (S3 or compatible) or MinIO (2 CPU, 4 GiB Memory) |

Ingestion Throughput

Langfuse is designed to handle large volumes of ingested data. Under very high load, it may become necessary to apply additional settings to increase throughput.

Scaling the worker containers

For most environments, we recommend scaling the worker containers based on their CPU load, as this is a straightforward metric to measure. A load above 50% on a 2 CPU container indicates that the instance is saturated and that adding more containers should increase throughput.

In addition, the Langfuse worker publishes queue length metrics via StatsD that can be used to scale the worker containers. langfuse.queue.ingestion.length is the main metric we use to make scaling decisions. The queue metrics can also be published to AWS CloudWatch by setting ENABLE_AWS_CLOUDWATCH_METRIC_PUBLISHING=true, which allows you to configure auto-scalers based on CloudWatch metrics.
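As a minimal sketch, enabling CloudWatch publishing is a single environment variable on the worker:

```shell
# Publish Langfuse queue metrics (e.g. langfuse.queue.ingestion.length)
# to AWS CloudWatch so auto-scalers can act on them.
export ENABLE_AWS_CLOUDWATCH_METRIC_PUBLISHING=true
```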

Reducing ClickHouse reads within the ingestion processing

By default, the Langfuse worker reads the existing event from ClickHouse and merges it with any incoming data. This increases the load on ClickHouse and may limit total throughput. For projects that were not migrated from a previous version of Langfuse, this read is optional because the full event history is available in S3. To skip it, set LANGFUSE_SKIP_INGESTION_CLICKHOUSE_READ_MIN_PROJECT_CREATE_DATE to a date before your first project was created, e.g. 2025-01-01. Note that S3/blob storage deletion lifecycle rules combined with late updates to events may cause duplicates in the event history. If you use the default integration methods with the Langfuse SDKs or OpenTelemetry, this should not affect you.
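For example, if your first project was created in 2025, the setting could look like this (the date below is illustrative):

```shell
# Skip the ClickHouse read during ingestion merging. Only safe when the
# full event history is available in S3, i.e. the deployment was not
# migrated from a pre-v3 Langfuse version. Pick a date before your
# first project was created.
export LANGFUSE_SKIP_INGESTION_CLICKHOUSE_READ_MIN_PROJECT_CREATE_DATE=2025-01-01
```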

Separating ingestion and user interface

When the ingestion load is high, the Langfuse web interface and API calls may become slow or unresponsive. In this case, splitting the langfuse-web deployment into one ingestion-handling and one user-interface-handling deployment can help keep the user interface responsive. Create a new, identical replica of the langfuse-web deployment and route all traffic to /api/public/ingestion*, /api/public/media*, and /api/public/otel* to the new deployment.
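One way to implement this split is at the reverse proxy. The following nginx sketch assumes two identical langfuse-web deployments behind hypothetical service names (`langfuse-web-ui`, `langfuse-web-ingestion`):

```nginx
# Hypothetical upstreams: two identical langfuse-web deployments.
upstream langfuse_ui        { server langfuse-web-ui:3000; }
upstream langfuse_ingestion { server langfuse-web-ingestion:3000; }

server {
    listen 80;

    # Route ingestion-heavy endpoints to the dedicated deployment.
    location /api/public/ingestion { proxy_pass http://langfuse_ingestion; }
    location /api/public/media     { proxy_pass http://langfuse_ingestion; }
    location /api/public/otel      { proxy_pass http://langfuse_ingestion; }

    # Everything else (UI, remaining API calls) stays on the UI deployment.
    location / { proxy_pass http://langfuse_ui; }
}
```

The same routing can be expressed with path-based rules in a Kubernetes Ingress or a cloud load balancer.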

Increasing S3 (Blobstorage) Write Concurrency

The blob storage backend is used to store raw events, multi-modal inputs, batch exports, and other files. In very high throughput scenarios, the number of allowed sockets in the S3 client library may be exhausted and requests get throttled. If this happens, we usually observe an increase in memory usage on the web container that processes ingestion and media workloads. The corresponding log message looks like this: @smithy/node-http-handler:WARN - socket usage at capacity=150 and 387 additional requests are enqueued.

In this situation, we recommend increasing the number of concurrent writes by setting LANGFUSE_S3_CONCURRENT_WRITES to a value larger than 50 (the default). Each additional write socket comes with a small memory overhead, so increase the value gradually and observe the behaviour of your service.
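For example, a first gradual step up from the default could be (the value below is illustrative):

```shell
# Raise concurrent S3 writes from the default of 50. Each extra socket
# adds a small memory overhead, so increase gradually and monitor.
export LANGFUSE_S3_CONCURRENT_WRITES=100
```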

Slow UI queries or API calls

If you notice long loading times within the UI or slow API calls, this is usually related to insufficient resources on the ClickHouse database or missing time filters. The tracing data is indexed by projectId and time, so adding filter conditions on both should significantly improve performance.

If all filters are in place, a larger ClickHouse instance may increase the observed performance. ClickHouse is designed to scale vertically, i.e. adding more memory to the instance should yield faster response times. You can check the ClickHouse Docs on which memory size to choose for your workloads. In general, we recommend at least 16 GiB of memory for larger deployments.

Routing reads to a separate ClickHouse compute group

On deployments with compute-compute separation — primarily ClickHouse Cloud and BYOC — you can isolate heavy analytical reads from the write path by pointing Langfuse at a dedicated read-only compute group. This keeps dashboard and public-API read traffic from contending with ingestion inserts and background merges on the primary compute.

Set CLICKHOUSE_READ_ONLY_URL and Langfuse will route UI and public-API read queries to the given endpoint, while writes, migrations, and ingestion continue to use CLICKHOUSE_URL.
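As a sketch, the two endpoints could be configured like this (hostnames are hypothetical):

```shell
# Primary endpoint: writes, migrations, and ingestion.
export CLICKHOUSE_URL=https://primary.clickhouse.internal:8443
# Read-only compute group: UI and public-API read queries.
export CLICKHOUSE_READ_ONLY_URL=https://read.clickhouse.internal:8443
```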

Credentials (CLICKHOUSE_USER, CLICKHOUSE_PASSWORD) and the database name are reused from the primary configuration, so the read-only compute group must accept the same user. CLICKHOUSE_READ_ONLY_URL is optional and unset by default; on a single-node ClickHouse or a cluster without compute-compute separation, it provides no benefit because the endpoint would be the same as the primary.

Increasing Disk Usage

LLM tracing data may contain large payloads because inputs and outputs are tracked. In addition, ClickHouse stores observability data within its system tables. If you notice that your disk usage is growing significantly on S3/Blob Storage or ClickHouse, we recommend the following options.

In general, the most effective way to free disk space is to configure a data retention policy. If this is not available in your plan, consider the options below.

S3 / Blob Storage Disk Usage

You can implement lifecycle rules to automatically remove old files from your blob storage. We recommend keeping events for as long as you want to access or update them within the UI. For most customers, a default of 30 days is a good choice.
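On AWS S3, for instance, a lifecycle configuration expiring objects after 30 days could look like the following (rule ID is illustrative; apply it to the event bucket only, not the media bucket, as noted below):

```json
{
  "Rules": [
    {
      "ID": "expire-events-after-30-days",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "Days": 30 }
    }
  ]
}
```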

However, this does not apply to the media bucket used for storing uploaded media files. Setting a retention policy on this bucket is not recommended because:

  1. Referenced media files in traces would break
  2. Future uploads of the same file would fail since file upload status is tracked by hash in Postgres

Instead, we recommend using the Langfuse data-retention feature to manage media files properly and avoid broken references across the product.

ClickHouse Disk Usage

To automatically remove data within ClickHouse, you can use the TTL feature. See the ClickHouse documentation for more details on how to configure it. This is applicable to the traces, observations, scores, and event_log tables within ClickHouse.
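As a sketch, a 90-day TTL on the traces table might look like this (the retention period is illustrative, and you should verify the timestamp column name against your deployment's schema before applying):

```sql
-- Assumes the traces table has a `timestamp` column; verify first.
-- Rows older than 90 days are removed during background merges.
ALTER TABLE traces MODIFY TTL toDateTime(timestamp) + INTERVAL 90 DAY;
```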

The following query helps to identify the largest tables in ClickHouse:

SELECT table, formatReadableSize(size) as size, rows FROM (
    SELECT
        table,
        database,
        sum(bytes) AS size,
        sum(rows) AS rows
    FROM system.parts
    WHERE active
    GROUP BY table, database
    ORDER BY size DESC
)

Slow or timing-out deletions

If data retention jobs or project/trace deletions fail with timeout errors, the client-side ClickHouse HTTP timeout for delete operations is likely exhausted. Increase LANGFUSE_CLICKHOUSE_DELETION_TIMEOUT_MS (default 600000, i.e. 10 minutes) to give long-running deletes more headroom. This applies to both scheduled retention jobs and user-triggered deletes across traces, observations, scores, dataset run items, and events.

On ClickHouse 25.7 and above, you can further reduce mutation pressure by opting into lightweight deletes and updates. Set CLICKHOUSE_LIGHTWEIGHT_DELETE_MODE from its default alter_update to lightweight_update (or lightweight_update_force) to resolve DELETE statements via lightweight deletes instead of ALTER TABLE ... DELETE mutations. Additionally, set CLICKHOUSE_USE_LIGHTWEIGHT_UPDATE=true to route tracing-table updates through native UPDATE statements instead of ALTER TABLE ... UPDATE mutations. Both reduce the amount of background merge work ClickHouse performs on deletion-heavy workloads.
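Taken together, a deletion-tuning configuration might look like this (values are illustrative; the lightweight options require ClickHouse 25.7+):

```shell
# Allow long-running deletes up to 30 minutes (default: 600000 ms = 10 min).
export LANGFUSE_CLICKHOUSE_DELETION_TIMEOUT_MS=1800000

# ClickHouse 25.7+ only: prefer lightweight deletes/updates over
# ALTER TABLE ... DELETE/UPDATE mutations.
export CLICKHOUSE_LIGHTWEIGHT_DELETE_MODE=lightweight_update
export CLICKHOUSE_USE_LIGHTWEIGHT_UPDATE=true
```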

ClickHouse system log tables

On default ClickHouse configurations, the system log tables (trace_log, text_log, opentelemetry_span_log, asynchronous_metric_log, metric_log, latency_log) can dominate disk usage. They have no TTL by default, and the query profiler writes to system.trace_log continuously. Langfuse does not read from these tables, so you can safely reduce them. Two options:

Option 1 — Disable unused system log tables. Mount a file into /etc/clickhouse-server/config.d/ that opts out of the tables Langfuse never reads. This is the approach taken by default in the Langfuse Terraform AWS module:

<clickhouse>
    <trace_log remove="1"/>
    <text_log remove="1"/>
    <opentelemetry_span_log remove="1"/>
    <asynchronous_metric_log remove="1"/>
    <metric_log remove="1"/>
    <latency_log remove="1"/>
</clickhouse>

On the Bitnami ClickHouse Helm chart, you can ship the same block via clickhouse.extraOverrides. Keep query_log, part_log, and error_log enabled — they are useful for debugging and remain small.

Option 2 — Aggressive TTLs. If you want to keep the tables around for debugging, attach short TTLs and disable the query profiler instead:

<clickhouse>
    <profiles>
        <default>
            <query_profiler_real_time_period_ns>0</query_profiler_real_time_period_ns>
            <query_profiler_cpu_time_period_ns>0</query_profiler_cpu_time_period_ns>
        </default>
    </profiles>
    <trace_log>
        <engine>ENGINE = MergeTree PARTITION BY toYYYYMM(event_date) ORDER BY (event_date, event_time) TTL event_date + INTERVAL 7 DAY</engine>
    </trace_log>
    <opentelemetry_span_log>
        <engine>ENGINE = MergeTree PARTITION BY toYYYYMM(finish_date) ORDER BY (finish_date, finish_time_us) TTL finish_date + INTERVAL 7 DAY</engine>
    </opentelemetry_span_log>
    <query_log>
        <engine>ENGINE = MergeTree PARTITION BY toYYYYMM(event_date) ORDER BY (event_date, event_time) TTL event_date + INTERVAL 30 DAY</engine>
    </query_log>
</clickhouse>

TTL directives only apply when the system table is first created. On an existing install, after deploying the config you also need to retrofit the tables — for example, for system.trace_log:

SET max_table_size_to_drop = 0;
TRUNCATE TABLE system.trace_log;
ALTER TABLE system.trace_log MODIFY TTL event_date + INTERVAL 7 DAY;

Repeat for each table you want to cap. See the upstream discussion in langfuse/langfuse#13123.

High Redis CPU Load

If you observe high Redis engine CPU utilization (above 90%), we recommend checking the following:

  • Use an instance with at least 4 CPUs. This will allow Redis to schedule networking and background tasks on separate CPUs.
  • Ensure that you have Redis Cluster mode enabled.

If the high CPU utilization persists, it is possible to shard the queues that Langfuse uses across multiple nodes. Set LANGFUSE_INGESTION_QUEUE_SHARD_COUNT and LANGFUSE_TRACE_UPSERT_QUEUE_SHARD_COUNT to a value greater than 1 to enable sharding. We recommend a value that is approximately 2-3 times the number of shards you have within your Redis cluster to ensure an equal distribution among the nodes, as each queue-shard will be allocated to a random slot in Redis (see Redis Cluster docs for more details).

Sharding the queues is an advanced feature and should only be used if you have high Redis CPU load and have followed the recommendations above. Once you have sharded your queues, do not reduce the number of shards. Make sure to scale LANGFUSE_INGESTION_QUEUE_PROCESSING_CONCURRENCY and LANGFUSE_TRACE_UPSERT_WORKER_CONCURRENCY accordingly, as these settings apply per shard. By default, we target a concurrency of 20 per worker, i.e. set them to 2 if you have 10 queue shards.
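As a worked example, a Redis cluster with 5 shards could be configured with 10 queue shards (2x), keeping the total per-worker concurrency at the default target of 20 (all values illustrative):

```shell
# ~2-3x the number of Redis cluster shards for even slot distribution.
export LANGFUSE_INGESTION_QUEUE_SHARD_COUNT=10
export LANGFUSE_TRACE_UPSERT_QUEUE_SHARD_COUNT=10

# Concurrency applies per queue shard: 10 shards x 2 = 20 total,
# matching the default per-worker concurrency target.
export LANGFUSE_INGESTION_QUEUE_PROCESSING_CONCURRENCY=2
export LANGFUSE_TRACE_UPSERT_WORKER_CONCURRENCY=2
```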
