How to Answer Support Questions

This page is the playbook used by both human support engineers and AI support agents at Langfuse. Each branch in the decision tree below covers a recurring question pattern: how to triage it, what to check, a reply template you can adapt, and when to escalate.

The tree is derived from analyzing ~1,500 closed Pylon tickets across email, Slack, MS Teams, the in-app chat widget, and GitHub Discussions. Patterns and reply phrasings come from real resolutions by the team.

Tools you'll use to answer most tickets:

Pylon: primary inbox, ticket metadata, customer tier, internal notes
Metabase: usage stats, ingestion volume, ClickHouse queries
PostHog: product analytics, user activity, session replays
Stripe: subscription, invoice, charge, refund history
Impersonation View: see the customer's Langfuse UI exactly as they do
Google Forms: startup discount applications (the form is the source of truth)
DataDog: ingestion queue depth, worker health, ClickHouse latency
status.langfuse.com: public incident timeline

Before you reply (preflight)

Before you open a reply box, do these four things in order. Most of the rest of this page assumes you have already done them.

Identify the customer and tier. Pylon sidebar shows org name, plan tier (Hobby / Core / Pro / Pro + Teams Add-on / Enterprise / Self-hosted EE), data region, and contract notes. Tier dictates SLA and how aggressively to escalate.
Locate their environment. Are they on Langfuse Cloud (which region: EU cloud.langfuse.com, US us.cloud.langfuse.com, HIPAA hipaa.cloud.langfuse.com, Japan jp.cloud.langfuse.com) or self-hosted? If self-hosted, what version? The same symptom often has different causes on Cloud vs. self-hosted.
Check status.langfuse.com and DataDog. If the customer is reporting errors/latency, rule out a known ongoing incident before debugging their side.
Search Pylon for the same symptom in the last 7 days. If three other customers are reporting the same thing right now, you're seeing an incident: escalate to engineering rather than answering one by one.
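The "three other customers" rule can be sketched as a small check over recent tickets. This is only an illustration: the ticket fields (`customer`, `symptom`, `created_at`) are hypothetical and do not reflect Pylon's real export format. The 7-day window and the threshold of three come from the step above.

```python
from datetime import datetime, timedelta


def looks_like_incident(tickets, symptom, current_customer, now,
                        window_days=7, threshold=3):
    """Return True when at least `threshold` *other* customers reported
    the same symptom within the last `window_days` days."""
    cutoff = now - timedelta(days=window_days)
    # Count distinct customers, so one customer filing five tickets
    # does not look like an incident on its own.
    others = {
        t["customer"]
        for t in tickets
        if t["symptom"] == symptom
        and t["created_at"] >= cutoff
        and t["customer"] != current_customer
    }
    return len(others) >= threshold
```

Counting distinct customers rather than tickets is a deliberate choice here: the preflight step is about breadth of impact, not ticket volume.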

Once you've done these four, walk the tree below.

Decision tree

The headings below are top-level question categories. Click any to drill into specific sub-questions, each with triage steps, a reply template, and escalation rules. Use Cmd/Ctrl-F to jump to a keyword from the customer's message.

"I can't log in" / "invalid credentials" / "account not found"

The single most common cause is the wrong data region. Users sign up on one region and then try to log in to another. The reset-password flow says "no account associated", not because the account doesn't exist, but because it doesn't exist in the region they're looking at.

Triage steps:

Ask the customer (or check from the email signature/domain) which region they signed up in. If they don't know, ask them to try each one: EU cloud.langfuse.com, US us.cloud.langfuse.com, HIPAA hipaa.cloud.langfuse.com, Japan jp.cloud.langfuse.com.
If they used SSO originally (Google, GitHub, Azure AD), email+password login will fail with "Please sign in with the identity provider that is linked to your account." Have them try the SSO providers.
If they still can't see their account, look them up by email in the Impersonation View and confirm which region holds their account.
If region is correct and SSO is confirmed, check whether their email is on the email suppression list (see "password reset emails not arriving" below).
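When a customer pastes the URL they are trying to log in to, the region can be read straight off the hostname. A minimal sketch using only the four hostnames listed in this playbook; the helper name `region_for_url` is made up for illustration.

```python
from urllib.parse import urlparse

# Hostnames as listed in this playbook.
REGION_BY_HOST = {
    "cloud.langfuse.com": "EU",
    "us.cloud.langfuse.com": "US",
    "hipaa.cloud.langfuse.com": "HIPAA",
    "jp.cloud.langfuse.com": "Japan",
}


def region_for_url(url):
    """Map a URL or bare hostname to a Cloud region, or None if it is
    not a Langfuse Cloud host (for example a self-hosted instance)."""
    # urlparse only fills in `hostname` when a scheme or '//' is present.
    parsed = urlparse(url if "//" in url else "//" + url)
    host = (parsed.hostname or "").lower()
    return REGION_BY_HOST.get(host)
```

A `None` result usually means the customer is on a self-hosted instance, which changes the rest of the triage.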

Reply template:

Hi {name},

Sorry you're hitting this. The most common cause is signing in on the wrong data region. We run four separate regions and accounts in one are not visible in the others:

- EU:    https://cloud.langfuse.com
- US:    https://us.cloud.langfuse.com
- HIPAA: https://hipaa.cloud.langfuse.com
- Japan: https://jp.cloud.langfuse.com

Reference: https://langfuse.com/security/data-regions

A second possibility: if you originally signed up using Google / GitHub / Azure AD SSO, email+password login will fail. Try clicking the SSO provider button instead.

Could you confirm which region and login method, and I'll dig in from there?

Best,
{you}

Escalate when: customer confirms region and provider but still can't log in → ping engineering with their email and the org ID, since we may need to look up the account state directly.

"Password reset emails are not arriving"

The usual cause is an email suppression list. When a previous email to that address bounced or was marked spam by the recipient, our email provider stops delivering to them. This affects one user, not the whole domain.

Triage steps:

Confirm the customer is on the right data region first (see "I can't log in" above). If they're on the wrong region, no email will ever arrive because no account exists there.
Ask them to check spam, then escalate to engineering to remove the email from the suppression list.
Once unblocked, ask them to retry password reset.

Reply template:

Hi {name},

Thanks. A quick check first: which data region did you originally sign up in (cloud.langfuse.com EU vs. us.cloud.langfuse.com US vs. hipaa.cloud.langfuse.com HIPAA vs. jp.cloud.langfuse.com Japan)?

If you're on the right region and emails still aren't arriving, our email provider may have placed your address on a suppression list (this happens after a previous bounce or spam mark). I'll unblock it on our end. Please retry the password reset in ~10 minutes and let me know if it works.

Best,
{you}

Escalate when: the suppression list isn't the cause and the customer truly cannot receive any Langfuse email → engineering. Note that we cannot manually reset passwords for security reasons; engineering can confirm account state but the user has to complete the reset themselves.

SSO setup (Okta / Azure AD / Entra / Google Workspace)

On Cloud, SSO is included with Pro + Teams Add-on and above; setup is white-glove (support collects credentials, engineering applies them). Self-hosted (OSS and EE) customers configure SSO themselves via env vars.

Triage steps:

Confirm the customer is on a plan tier that includes SSO (Pro + Teams Add-on and above on Cloud; included in self-hosted OSS, where only SCIM / Org Management API are EE-gated). If not, route to sales and do not promise a discount.
Collect the four pieces of information: instance URL, issuer URL, client ID, client secret.
Recommend the customer share secrets via a password-manager link (1Password share link, Bitwarden Send, etc.). Do not accept secrets in plaintext email.
Pass the bundle to engineering for application.

Reply template:

Hi {name},

Happy to help set up SSO. I'll need the following from you:

- Instance URL (which Langfuse region: cloud.langfuse.com, us.cloud.langfuse.com, hipaa.cloud.langfuse.com, jp.cloud.langfuse.com, or your self-hosted URL)
- Issuer URL (e.g. https://example.okta.com)
- Client ID
- Client Secret

Please share the client secret via a password-manager link (1Password / Bitwarden / similar) rather than in plain email. Once I have all four, the team will get it applied within one business day.

Let me know if you have any questions on the IdP side.

Best,
{you}

Escalate when: customer asks for SCIM, custom claim mapping, or a non-standard IdP. Those need engineering review.

2FA recovery / lost authenticator / backup codes

We treat 2FA recovery as a high-trust operation. Customers must prove ownership.

Triage steps:

Confirm the customer's identity through a secondary signal: email matches a billing record, work email domain matches the org's domain, or they're on a Slack Connect channel we already trust.
If trust is established, engineering can disable 2FA on the account so the user can re-enroll. Do not do this yourself.
If the customer also lost access to the recovery email, the org owner must act. If the org owner is also locked out, escalate to engineering with full context; this is rare and handled case by case.

Reply template:

Hi {name},

For 2FA recovery we need to verify ownership before disabling MFA. The fastest path:

1. Confirm the org/project this affects.
2. Confirm the email tied to the account is one you still control.

Once verified, we'll disable 2FA so you can re-enroll on next login. If you've also lost access to the recovery email, please reply from a different verified address on the same org or have the org owner reach out.

Best,
{you}

Related FAQ: /faq/all/enforcing-2fa.

"I cannot see my org / project" (RBAC, viewer access, invites)

Usually one of: wrong region (see top), the user was invited to a different org under the same email, the inviting admin set them as VIEWER and that role hides administrative views, or SCIM/SSO group mapping didn't apply.

Triage steps:

Region check (see top of this section).
Look up the user in Impersonation View: which orgs do they belong to?
Verify role: VIEWER, MEMBER, ADMIN, OWNER. If they need a higher role, the org's OWNER has to change it; we don't change roles on the customer's behalf without approval from the org owner.
For self-hosted EE: there's no built-in "instance admin / superuser" role. To grant cross-project oversight, use the Instance Management API to script a one-time invite of the admin user to every org.

Reply template (cloud):

Hi {name},

Quick check first: are you signed into the same data region where you were invited (EU cloud.langfuse.com vs. US us.cloud.langfuse.com vs. HIPAA hipaa.cloud.langfuse.com vs. Japan jp.cloud.langfuse.com)?

If so, can you ask your org admin to confirm (a) your email is invited to the right org and (b) you have at least the MEMBER role? Owners are visible under Organization Settings → Members.

Best,
{you}

2. Billing, pricing, and contracts

"I have higher costs than usual / I was charged unexpectedly"

This is the most sensitive billing question. Lead with empathy and facts; never guess at the cause.

Triage steps:

Open Stripe, search by email domain or org → recent invoice, subscription, billing history.
Determine the source of the increase: plan change, usage increase (more traces/observations), seat increase, or one-off charge.
Cross-check the org in Impersonation View → Usage tab → confirm trace/observation volume in the billing period.
Check whether their OTEL configuration is exporting non-LLM spans. A common case: customers wire a pre-existing OTEL setup into Langfuse and their export filter lets HTTP/DB/framework spans through. Those spans are ingested and billed as observations like any other. The customer controls what is exported via should_export_span on the SDK (the older blocked_instrumentation_scopes parameter is deprecated). See /faq/all/existing-otel-setup#unwanted-spans-in-langfuse.
Reply with the specific reason and link the invoice or usage view.
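To back the "specific reason" with numbers when non-LLM OTEL spans are suspected, it helps to break observation volume down by instrumentation scope. A minimal sketch, assuming you already have a list of scope names, one per observation, from whatever query or export you use; how that list is obtained is not covered here.

```python
from collections import Counter


def observation_share_by_scope(scope_names):
    """Return {scope: share of total observations}, largest share first.

    `scope_names` holds one instrumentation-scope name per observation;
    observations without a scope are grouped under "<none>".
    """
    counts = Counter(name or "<none>" for name in scope_names)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {scope: n / total for scope, n in counts.most_common()}
```

If one HTTP or DB instrumentation scope accounts for most of the volume, that is a concrete figure to quote in the reply alongside the should_export_span recommendation.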

Reply template:

Hi {name},

Thanks for flagging this. I dug into your billing for {period}:

- {plan tier} → {tier with seat/feature breakdown}
- Usage in the period: {N} observations / {M} events
- Compared to prior month: {delta}

The increase comes from {specific cause}. {Invoice link / Usage tab screenshot}.

If something here still doesn't add up, let me know and I'll investigate further.

Best,
{you}

Cancel subscription / downgrade / non-renewal

Two distinct things customers conflate:

Stripe subscriptions (Hobby/Core/Pro Cloud, with or without the Teams Add-on): these are cancellable from the billing UI directly, but customers often email us. Acknowledge politely and confirm cancellation in Stripe. Note that downgrades take effect at the end of the billing period.
Self-hosted EE licenses: these are contractual and removing the LANGFUSE_EE_LICENSE_KEY env var does not cancel the contract. A separate written cancellation is required. This catches customers off-guard regularly.

Reply template (Cloud cancellation):

Hi {name},

Done. Your Pro subscription is canceled. You'll keep access until the end of the current billing period ({date}), after which the org will be downgraded to Hobby. Your data is retained according to the new plan's retention policy.

Sorry to see you go. If there's anything you wish Langfuse did differently, a few bullets would mean a lot. We read every one.

Best,
{you}

Reply template (Self-hosted EE, customer thought they had canceled):

Hi {name},

To clarify: removing the EE license key by itself does not cancel the contract; it only disables EE features at runtime. The subscription continues to renew until a written cancellation is filed with Langfuse Support.

I've now {canceled the subscription / refunded invoice {ID} / both}. You should see the refund in 5–10 business days.

For future reference: please send cancellation notice to support@langfuse.com (or your account contact) before the next renewal date.

Best,
{you}

Escalate when: the EE contract is above the standard tier → route to the enterprise team. For refunds, follow the approval limits in the private handbook.

Refund request

Triage steps:

Confirm what was charged and when via Stripe.
Do not confirm, deny, or estimate a refund on the thread. Whether a charge is eligible for a refund or goodwill credit, the approval limits, and when to loop in the team all follow the internal refund handling guidelines in the private handbook.
Acknowledge the request, gather the details finance needs, and route it per those guidelines. Do not approve refunds beyond your limit unilaterally.

Reply template (after the refund is approved per the internal guidelines):

Hi {name},

Sorry for the friction. I {refunded invoice {ID} for ${amount} / canceled the upcoming renewal / both}. Refunds usually take 5–10 business days to appear on the card.

{If goodwill credit: "I've also added {amount} in credit on your next invoice as a goodwill gesture."}

Let me know if there's anything else.

Best,
{you}

"Can we get a startup discount / 50% off?"

We run a standard startup program. Approved applicants get a 50% discount code by email after going through the form; there are no exceptions to the form requirement. The form gives us a paper trail.

Triage steps:

Direct the customer to langfuse.com/startups.
Ask them to fill out https://forms.gle/eJAYjRWeCZU1Mn6j8.
Do not promise a timeline or approval beyond what the page says.
Approved applicants receive the discount code via email automatically.
For VC firms / venture studios asking for portfolio-wide discounts, the same program applies: portfolio companies should each submit the form.

Reply template:

Hi {name},

Happy to help. Details on the program are here: https://langfuse.com/startups

To apply, please fill out: https://forms.gle/eJAYjRWeCZU1Mn6j8

Once approved, you'll get the discount code by email. You can apply it at checkout when upgrading, or in your billing settings if you already have a subscription.

Best,
{you}

Enterprise quote / contract / commercial license

Anything that mentions "enterprise", "POC", "Account Manager", "MSA", "DPA signature", "NDA", "PO", "quote for X seats", or "self-hosted commercial license for OSS compliance" → route to enterprise.

Triage steps:

Acknowledge quickly and route. Do not negotiate pricing on the support thread.
Direct the customer to the contact sales form.
For commercial licensing on self-hosted to satisfy OSS compliance tools (e.g. Black Duck flagging the ee/ directories), confirm with the customer whether they're actually using EE features. Many of these tickets are governance-only and resolve with a confirmation email plus a copy of the license terms.

Reply template:

Hi {name},

Thanks for reaching out. Please use our [contact sales form](https://langfuse.com/talk-to-us) so our enterprise team can follow up with pricing and contract details.

Best,
{you}

Escalate when: anything > $50k ACV, anything regulated (HIPAA BAA, financial services), or anything where legal is on the customer thread.

Invoice / receipt / PO / "where is my invoice"

Triage steps:

Stripe → search customer → invoices/receipts. Send the direct PDF link.
Custom POs from large enterprises (the "Purchase Order PO… please send your most competitive price" template) are usually spam or phishing. If the sender domain doesn't match a known customer, treat as spam and do not respond.
For legitimate POs from active customers, route to finance.

Reply template (Stripe invoice download):

Hi {name},

Your invoice for {period} is here: {Stripe-hosted invoice URL}. Receipts are also accessible directly from your Langfuse billing settings.

Let me know if you need a different format or VAT details.

Best,
{you}

3. Self-hosting

Install / Docker Compose / Kubernetes / Helm questions

Most self-hosted setup questions are answered by our docs; do not re-derive them. Send the link, ask which doc page they hit a wall on, and dig in.

Triage steps:

Ask: which deployment target (Docker Compose dev, Kubernetes via Helm, ECS, Cloud Run, etc.)? Which Langfuse version?
Point to langfuse.com/self-hosting. For K8s specifically, point to the Helm chart README and langfuse.com/self-hosting/deployment/kubernetes-helm.
If they're stuck on a specific error, ask for: full stack/log output, the values.yaml or docker-compose.yml, and the output of kubectl get pods or docker ps.

Escalate when: customer's setup involves an unsupported backend (e.g. Tencent TCHouse-C as a ClickHouse drop-in; we test against ClickHouse Cloud and OSS ClickHouse only), unusual ingress (service mesh, mTLS-only), or air-gapped envs without internet. These need engineering eyes.

ClickHouse: alternative backends, sizing, migrations

Hard rule: ClickHouse is the only supported OLAP backend. We do not support Elasticsearch, BigQuery, etc. as replacements. Customers asking about this should be redirected to the feature request channel; do not promise support for an alternative.

Triage steps for common ClickHouse questions:

"Can I use <alternative>?" No. Direct them to the feature request idea or the existing GitHub discussion if one exists.
"Failed migration / migration deadlock" → see /faq/all/self-hosting-clickhouse-handling-failed-migrations. For large version jumps, advise temporarily extending readiness/liveness probe windows so migration containers aren't killed mid-migration, and reducing to a single web replica during the migration.
"Direct DB ingestion (bypass the web/API)?" Not supported. The web/worker layer is the only contract. Even if it works today the schema can change in any minor release.
Disk usage too high → /faq/all/reduce-clickhouse-disk-size.
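For the failed-migration case above, it can help to estimate how long the current liveness probe tolerates a busy container before advising new values. A minimal sketch, assuming the standard Kubernetes probe fields initialDelaySeconds, periodSeconds, and failureThreshold; it ignores timeoutSeconds and scheduling jitter, so treat the result as a rough lower bound rather than an exact figure. The function names are illustrative.

```python
def seconds_before_restart(initial_delay, period, failure_threshold):
    """Rough lower bound on how long a container that fails every
    liveness probe survives: the first probe fires after
    `initial_delay`, then one probe per `period` until
    `failure_threshold` consecutive failures have accumulated."""
    return initial_delay + (failure_threshold - 1) * period


def threshold_for(migration_seconds, initial_delay, period):
    """Smallest failure threshold whose lower bound covers
    `migration_seconds`."""
    remaining = max(0, migration_seconds - initial_delay)
    # Ceiling division, plus one because the first failure happens
    # right at `initial_delay` and costs no extra wait.
    return -(-remaining // period) + 1
```

For example, with a 30-second initial delay and a 10-second period, `threshold_for(600, 30, 10)` suggests a threshold of 58 to ride out a ten-minute migration. Remind the customer to restore the original values afterwards.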

Reply template (alternative backend ask):

Hi {name},

ClickHouse is currently our only supported OLAP backend. We've intentionally bet on it for the trace/eval/score query patterns Langfuse needs, and alternative backends aren't on the near-term roadmap.

For OSS compliance / single-database environments, the practical paths are:
- Use ClickHouse Cloud (managed) so you don't operate it yourself
- Stand up a small dedicated ClickHouse cluster just for Langfuse

If this is blocking adoption, please upvote / comment on the existing GitHub discussion: {link if exists}. The product team reads those.

Best,
{you}

Postgres: migration failures, table ownership, RDS gotchas

Triage steps:

"Table ownership errors on migration" → /faq/all/self-hosting-postgresql-table-ownership-migration-failures. Common when running on RDS with a non-superuser DB role.
Migration deadlock with multiple replicas → migrations should run with a single web replica. Scale web to 1 before applying, scale back up after.
Connection issues → check DATABASE_URL, connection_limit, and that the Langfuse user has CREATE/ALTER on the schema.
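When a customer shares their DATABASE_URL, a quick parse shows the host, database, and connection_limit without anyone reading the password out of the string. A minimal sketch using only the standard library; the helper name and the default port fallback of 5432 are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlparse


def describe_database_url(url):
    """Summarize a Postgres connection URL without exposing the password."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    return {
        "scheme": parsed.scheme,
        "host": parsed.hostname,
        # Assumption: fall back to the conventional Postgres port.
        "port": parsed.port or 5432,
        "database": parsed.path.lstrip("/"),
        "user": parsed.username,
        # None means the parameter is absent from the URL.
        "connection_limit": query.get("connection_limit", [None])[0],
        # Report only whether a password is present, never its value.
        "password_set": parsed.password is not None,
    }
```

Ask customers to redact the password before sharing the URL anyway; this helper only avoids echoing it back in notes or replies.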

Redis / BullMQ / Queue / Valkey / Elasticache

Triage steps:

Confirm Redis is reachable: redis-cli -h $REDIS_HOST ping. We require Redis 7+ or compatible (Valkey, ElastiCache).
For Azure Redis with managed identity / Workload Identity, see GitHub discussion #13268; TLS/SNI setup matters.
For Redis Sentinel, see GitHub discussion #13359 (optional TLS env flag).
Queue management endpoints (BullMQ admin API) are documented at /faq/all/self-hosting-queue-management-bullmq-admin-api, useful when ingestion is stuck.
Symptoms of an unhealthy queue: events accepted by API but never appear in UI. Worker logs will show retries.

S3 / Blob storage / Media uploads / Event export

Langfuse uses S3-compatible storage for raw event uploads and media. Issues here usually surface as either ingestion failures (events accepted, never processed) or "blob storage export failed" emails.

Triage steps:

Verify LANGFUSE_S3_EVENT_UPLOAD_* env vars are set and the bucket exists.
Verify the IAM principal has s3:PutObject, s3:GetObject, s3:ListBucket. For MinIO, set LANGFUSE_S3_EVENT_UPLOAD_FORCE_PATH_STYLE=true.
For "blob storage export failed" notifications, check the bucket policy and lifecycle rule didn't recently change.
For media uploads, also set LANGFUSE_S3_MEDIA_UPLOAD_*.
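For customers on AWS S3, the three permissions above translate into a policy with two statements, because s3:ListBucket applies to the bucket itself while the object actions apply to keys inside it. A minimal sketch that builds such a policy; the bucket name is a placeholder, and the customer's setup may need extra statements (a key prefix, KMS, and so on) that are not shown.

```python
import json


def event_upload_policy(bucket):
    """Build an IAM policy granting the three S3 actions named above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level actions apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # ListBucket is a bucket-level action.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


if __name__ == "__main__":
    # "my-langfuse-events" is a placeholder bucket name.
    print(json.dumps(event_upload_policy("my-langfuse-events"), indent=2))
```

A frequent mistake is attaching all three actions to the `/*` resource only, which leaves ListBucket ineffective.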

Upgrade between Langfuse versions (self-hosted)

Triage steps:

Find current version (docker images | grep langfuse, or Helm appVersion) and target version.
Walk the upgrade notes for each intermediate major. Most v3.x → v3.x upgrades are seamless because they stay within the same major. v2 → v3 and v3 → v4 require following the migration guides.
For very large jumps (e.g. v3.132 → v3.175): migrations may take minutes. Temporarily extend K8s readiness/liveness probe windows, and scale to a single web replica during the migration to avoid Prisma/Postgres migration deadlocks with concurrent replicas.
Test in staging first if the customer has one.
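Deciding which migration guides apply comes down to comparing the major versions of the current and target tags. A minimal sketch, assuming tags look like `v3.132.0` or `3.132`; the helper names are illustrative.

```python
import re


def parse_version(tag):
    """Parse 'v3.132.0' or '3.132' into a tuple of ints such as (3, 132, 0)."""
    match = re.fullmatch(r"v?(\d+)(?:\.(\d+))?(?:\.(\d+))?", tag.strip())
    if not match:
        raise ValueError(f"unrecognized version tag: {tag!r}")
    # Missing minor or patch components default to 0.
    return tuple(int(part or 0) for part in match.groups())


def majors_crossed(current, target):
    """List the major versions entered on the way from `current` to
    `target`, i.e. the ones whose migration guide applies. Empty when
    both versions share the same major."""
    cur = parse_version(current)[0]
    tgt = parse_version(target)[0]
    if tgt < cur:
        raise ValueError("target is older than current; downgrades are not covered")
    return list(range(cur + 1, tgt + 1))
```

An empty list (as for v3.132 → v3.175) means the probe and single-replica advice is the main thing to communicate; a non-empty list means pointing to the migration guide for each listed major.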

Reply template:

Hi {name},

For a jump that large, the main risk is migration time. Two things to do before upgrading:

1. Temporarily increase the readiness/liveness probe initial-delay and failure-threshold on the web container so it isn't killed mid-migration.
2. Scale `web` to 1 replica during the migration. Concurrent replicas can deadlock on Prisma/Postgres migrations. Scale back up once migration completes.

We aim for full compatibility within a major version, and there are no known breaking changes between v3.132 and the latest v3.x.

Docs: https://langfuse.com/self-hosting/upgrade

Best,
{you}

Related FAQ: /faq/all/upgrade-langfuse.

EE license usage / "do I need an EE license for production?"

This is a governance/compliance question, not a technical one. The customer is usually preparing for an internal OSS review.

Canonical facts:

Langfuse core (tracing, observability, prompt management, evaluations, dashboards) is MIT-licensed. No EE license required for production use of these.
EE features require LANGFUSE_EE_LICENSE_KEY. See /self-hosting/license-key for the canonical list.

Reply template:

Hi {name},

Happy to confirm:

1. The core Langfuse features (tracing, observability, prompt management, evaluations, dashboards) are MIT-licensed and free to use in production, with no EE license required.
2. EE features require LANGFUSE_EE_LICENSE_KEY. Without that env var set, no EE code paths execute. Full list: https://langfuse.com/self-hosting/license-key.

If your compliance review needs this in writing on letterhead, please use our [contact sales form](https://langfuse.com/talk-to-us).

Best,
{you}

CVE / vulnerability report in the Docker image

Container scanners (Wiz, Snyk, Trivy, Black Duck) regularly produce long lists of CVEs in transitive Node.js dependencies. Most are not exploitable in our usage.

Triage steps:

Check the version the customer scanned. If it's not the latest, ask them to scan the current image first; many CVEs are already patched in the next release.
For genuine concerns, direct the customer to the ClickHouse Bugcrowd program for scope review and submission. Do not ask them to send sensitive proof-of-concept details through support channels.
If the report suggests active exploitation or customer data exposure, page engineering on Slack #security in parallel.
Do not promise fix timelines. We patch on a rolling cadence with each release.

Reply template:

Hi {name},

Thanks for the scan output. Could you re-run the scan against the latest image ({current_version}, released {date})? Several of the high-severity CVEs in your list are already addressed in recent releases.

For any that still appear after that, our security team will triage and prioritize. Most CVEs in transitive Node.js dependencies are in code paths Langfuse doesn't exercise. We don't ship a fix for every transitive-dependency CVE, but we do for anything reachable.

If you believe a remaining finding is exploitable, please review the scope and submit it through the ClickHouse Bugcrowd program: https://bugcrowd.com/engagements/clickhouse.

Best,
{you}

4. Ingestion (Cloud and self-hosted)

"Traces are missing / slow / not appearing"

Triage steps in order:

status.langfuse.com: rule out a current incident first.
DataDog: check ingestion queue depth, ClickHouse latency. If queues are deep, this is a platform issue and you should escalate, not debug per-customer.
Customer SDK version: ask. Old SDKs (Python pre-v3, JS pre-v4) used legacy endpoints with known performance issues. Recommend upgrade to the latest scoped packages (@langfuse/client, @langfuse/tracing, @langfuse/otel or langfuse Python v3+).
Customer's flush behavior: short-lived processes (Lambdas, CLIs, edge runtimes) must call langfuse.flush() before exit. Without this, in-flight events are dropped.
Customer's filter / time range: are they looking at the right project, the right environment tag, and a time range that includes "now-5 minutes" (ingestion can be delayed up to ~1–2 minutes in normal operation)?
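The flush advice above is easiest to explain to customers as a try/finally pattern, so the flush also runs when the handler raises. A minimal sketch: the only SDK call assumed is `flush()`, which this page already names; `run_and_flush` and `DemoClient` are hypothetical stand-ins, and constructing the real client is left to the SDK docs.

```python
def run_and_flush(client, work):
    """Run `work()` and always flush the client afterwards, even when
    `work` raises, so buffered events are sent before the process exits."""
    try:
        return work()
    finally:
        client.flush()


class DemoClient:
    """Stand-in for the SDK client; it only records that flush() ran."""

    def __init__(self):
        self.flushed = False

    def flush(self):
        self.flushed = True


if __name__ == "__main__":
    client = DemoClient()
    result = run_and_flush(client, lambda: "handled request")
    print(result, client.flushed)
```

In a Lambda or CLI entry point, the customer would pass their real client and wrap the handler body the same way.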

Reply template (cloud, after status check):

Hi {name},

Status page is clear and our queues look healthy on this side. A few things to confirm:

1. Are you on the latest SDK? For Python that's `langfuse` v3+, for JS that's the v4+ scoped packages (`@langfuse/client` / `@langfuse/tracing` / `@langfuse/otel`). The legacy `langfuse` JS v3 package and Python v2 SDK both used older endpoints with known delays.
2. If the process sending traces is short-lived (Lambda, CLI, edge runtime, batch job), make sure you call langfuse.flush() / shutdown() before exit. Otherwise in-flight events are dropped.
3. What time range are you looking at in the UI, and which environment tag?

If you can share an example traceId or sessionId that's missing, I'll look it up directly.

Best,
{you}

Escalate when: customer's SDK is current, flush is configured, time range is correct, and traces still don't appear → engineering with the traceId, project ID, and timestamp.

OTEL / OpenTelemetry: unwanted spans, double-counting, semantic conventions

A common OTEL configuration issue: the customer's existing OTEL setup exports every HTTP request, DB query, and framework span to Langfuse alongside LLM spans. Because every exported span is ingested and billed as an observation, this increases both cost and UI clutter until the export filter is tightened.

Triage steps:

Ask the customer how they wired Langfuse into their OTEL provider (sharing a TracerProvider? exporter-only? auto-instrumentation?).
If they're sharing a global TracerProvider with HTTP / DB / framework auto-instrumentation, recommend a should_export_span filter (Python SDK, replaces the deprecated blocked_instrumentation_scopes) or shouldExportSpan (JS SDK) to drop non-LLM spans.
For cost double-counting on agent frameworks (notably pydantic-ai, see issue #1819): there's a known bug we're tracking. Acknowledge and offer to file/link the issue; do not promise a fix date.
For langfuse.experiment.* attributes: customers using non-Python SDKs sometimes try to propagate experiment attributes manually and find evaluators don't run. LLM-as-a-Judge currently only runs against OTEL-ingested traces, so confirm the legacy SDK path is not in use.

Reply template (unwanted spans):

Hi {name},

That's a common one with existing OTEL setups. Your global TracerProvider is exporting HTTP/DB/framework spans alongside LLM spans, which is why volume is high.

Fix (Python):
  from langfuse import Langfuse
  from langfuse.span_filter import is_default_export_span

  blocked = {
      "opentelemetry.instrumentation.fastapi",
      "opentelemetry.instrumentation.asgi",
      "opentelemetry.instrumentation.httpx",
      # ... add yours
  }
  langfuse = Langfuse(
      should_export_span=lambda span: (
          is_default_export_span(span)
          and (
              span.instrumentation_scope is None
              or span.instrumentation_scope.name not in blocked
          )
      ),
  )

This typically cuts ingested volume by 50–90% and only LLM/agent spans land in Langfuse.

Full docs: https://langfuse.com/faq/all/existing-otel-setup#unwanted-spans-in-langfuse

Best,
{you}

Cost / token tracking mismatch ("the cost looks wrong")

Triage steps:

Is the model on our supported pricing list? Check the model in the UI's "Model" definition. Custom models need a Model entry with input/output token pricing or Langfuse can't compute cost.
Does the SDK / framework send token counts? If yes, Langfuse uses them; if no, we tokenize the input/output ourselves with the model's tokenizer (best-effort).
For agent frameworks (pydantic-ai notably), token double-counting can happen when both the parent agent span and the child LLM span report usage. Known issue, escalate with the trace link.
For frameworks where Langfuse calculates cost despite the framework also reporting it, the framework's otel operation.cost attribute is overridden: our pricing table is the source of truth.
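Before escalating a suspected double-count, a quick arithmetic check on the trace helps: if the parent span's token count equals the sum of its children's, the parent is probably repeating their usage. A minimal sketch of that heuristic only; it is not how Langfuse aggregates usage, and the inputs are plain numbers you read off the trace by hand.

```python
def looks_double_counted(parent_tokens, child_tokens, tolerance=0):
    """Heuristic: a parent span whose reported tokens match the sum of
    its children's tokens is probably repeating their usage."""
    if parent_tokens <= 0 or not child_tokens:
        return False
    return abs(parent_tokens - sum(child_tokens)) <= tolerance


def expected_total(parent_tokens, child_tokens):
    """Total the customer likely expects: ignore the parent's usage when
    it only repeats the children, otherwise add everything up."""
    children = sum(child_tokens)
    if looks_double_counted(parent_tokens, child_tokens):
        return children
    return parent_tokens + children
```

If the trace total shown in the UI is roughly twice `expected_total(...)`, include both numbers and the trace link in the escalation.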

Reply template:

Hi {name},

Cost discrepancies usually come from one of three places:

1. Custom or unsupported model: we need a Model entry (Project Settings → Models) with the right input/output token pricing for Langfuse to compute cost. If your model isn't there, cost shows as 0 or uses a generic estimate.
2. The framework you're using double-reports usage on both parent and child spans (this happens with some agent frameworks). If you can share a trace link, I'll check whether double-counting is the cause.
3. Tokenization difference between your provider's billing and our internal tokenizer when usage isn't sent: this causes small numerical drift and is not a bug.

Can you share a specific trace that looks off, and the model name?

Best,
{you}

5. SDKs and integrations

Python SDK

Common issues:

Using the legacy langfuse Python v2 package. The @observe decorator and OTEL-based ingestion live in v3+. Recommend upgrade.
Short-lived processes: must call langfuse.flush() before exit.
get_prompt() errors: usually wrong region, missing API key, or referencing a prompt with the wrong label.

Upgrade docs: /docs/observability/sdk/upgrade-path.

JS / TypeScript SDK

Common issues:

The legacy langfuse npm package is on v3.x. v4+ lives under the @langfuse/* scoped packages: @langfuse/client, @langfuse/tracing, @langfuse/otel. The in-app evaluator warning "JS SDK v4+ required" means switch to these scoped packages.
Edge runtime / serverless: make sure to await flushAsync().
Browser usage: only the public key, never the secret. Recommend a backend proxy.

Reply template (legacy package confusion):

Hi {name},

The "JS SDK v4+" message refers to the new scoped packages (@langfuse/client, @langfuse/tracing, @langfuse/otel), not the legacy `langfuse` npm package. We're freezing the legacy package at v3.x and shipping all new features (incl. evaluators-on-observations) in the scoped ones.

Upgrade guide: https://langfuse.com/docs/observability/sdk/upgrade-path

Best,
{you}

LangChain / LangGraph

Use CallbackHandler from langfuse.langchain. For LangGraph, the same callback works but you may want to set the trace name explicitly per node; see GitHub discussion #13261.
"How do I track non-LLM service costs in LangChain tools?": use update_current_generation(...usage_details=...) inside the tool. See GitHub discussion #13514.
Global callback registration is a recurring feature request (GitHub #13583); don't promise it.

LlamaIndex / LiteLLM / Vercel AI SDK / Pydantic-AI / CrewAI / Dify / others

LiteLLM: uses the standard Langfuse callback. Pricing config lives in LiteLLM's model_list.
Vercel AI SDK: uses our OTEL exporter. Make sure experimental_telemetry: { isEnabled: true }.
Pydantic-AI: known cost double-counting bug (issue #1819). Acknowledge; do not promise a fix date.
Dify: there was a Dify-side bug in May 2026 (langgenius/dify #36107) that routed spans to the wrong Langfuse projects. We deleted the affected data for the window 2026-05-12T09:42:00Z → 2026-05-13T09:54:00Z. Customers re-discovering this issue should be told it's resolved upstream.
LlamaIndex: duplicated token counts on generation spans are a known issue (#12897).
OpenAI Agent SDK: the reasoning summary is dropped in some cases (#12876).
Google ADK / Strands / Mastra / Agno / Haystack / Instructor: point to the relevant docs page under /integrations.

If the integration isn't documented, ask which framework/version and offer to file a docs issue. We do not custom-build integrations on demand.

6. Prompt management

Prompt management: versioning, labels, caching, get_prompt issues

Common issues:

Old prompt version served from cache: SDK caches by default. To bypass: get_prompt(name, cache_ttl_seconds=0).
Linked prompt label resolves to wrong version: labels are mutable; check Audit log / Prompt history.
MCP server supports prompt management today (read and write tools by default; clients can opt into a read-only allowlist). Datasets / Traces are on the roadmap.
Conditional / templated prompts: see /faq/all/conditional-prompt-embedding.
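
When explaining the stale-cache case to a customer, a toy TTL cache can help show why an old version keeps being served and why a TTL of 0 forces a refetch. This is a hypothetical sketch, not the SDK's actual cache. `PromptCache`, the 60-second default, and the injected clock are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class PromptCache:
    """Toy TTL cache; a hypothetical illustration, not the SDK's real cache."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, name: str, ttl_seconds: float = 60.0) -> str:
        now = self._clock()
        entry = self._entries.get(name)
        # Serve from cache only while the entry is younger than the TTL.
        if entry is not None and now - entry[0] < ttl_seconds:
            return entry[1]
        value = self._fetch(name)
        self._entries[name] = (now, value)
        return value


# Simulated server whose prompt changes between calls; the clock is frozen.
server = {"greeting": "v1"}
cache = PromptCache(fetch=lambda name: server[name], clock=lambda: 0.0)

first = cache.get("greeting")                 # fetches "v1" and caches it
server["greeting"] = "v2"                     # a new version is published
stale = cache.get("greeting")                 # still within the TTL: cached value
fresh = cache.get("greeting", ttl_seconds=0)  # a TTL of 0 always refetches
```

In this sketch `stale` is still "v1" while `fresh` is "v2", which mirrors the symptom customers report.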

7. Evaluations

"Evaluator is not running on my traces"

The single most common cause: the trace was ingested via a legacy SDK path that pre-dates OTEL. LLM-as-a-Judge currently only runs against OTEL-based observations.

Diagnostic check: Open the trace in the UI. If its metadata.scope.* and metadata.resourceAttributes.* fields exist, it was ingested via OTEL and evaluators should pick it up. If those fields are missing, the trace came via the legacy /api/public/ingestion endpoint and won't be scored.
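
If a customer shares several traces as JSON, the diagnostic check above can be scripted. The sketch below assumes each trace is a dict with a nested `metadata` object containing `scope` and `resourceAttributes` keys. That shape is an assumption about the export format, and `ingested_via_otel` is a hypothetical helper name.

```python
from collections.abc import Mapping
from typing import Any


def ingested_via_otel(trace: Mapping[str, Any]) -> bool:
    """Return True if the trace metadata carries both OTEL marker fields."""
    metadata = trace.get("metadata")
    if not isinstance(metadata, Mapping):
        return False
    # Both fields named in the diagnostic check must be present and non-empty.
    return bool(metadata.get("scope")) and bool(metadata.get("resourceAttributes"))


otel_trace = {
    "id": "t1",
    "metadata": {
        "scope": {"name": "example-scope"},
        "resourceAttributes": {"service.name": "api"},
    },
}
legacy_trace = {"id": "t2", "metadata": {"user": "abc"}}

# Split a batch of shared traces into the two cases from the check above.
legacy_ids = [t["id"] for t in (otel_trace, legacy_trace) if not ingested_via_otel(t)]
```

The ids in `legacy_ids` are the traces to mention when asking the customer to upgrade their SDK.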

Triage steps:

Look at one of the customer's recent traces, check for OTEL metadata.
If legacy: ask them to upgrade SDK (Python langfuse v3+, JS @langfuse/* v4+).
If OTEL: check that the evaluator config matches the trace (variable mapping, filter conditions, target observation type). Some evaluators target observations rather than traces.
Check evaluator logs in UI → Evaluators → click the config → recent runs.

Reply template:

Hi {name},

LLM-as-a-Judge currently only evaluates OTEL-ingested observations. If you open one of the traces that didn't get scored, check whether it has metadata.scope.* / metadata.resourceAttributes.* fields:

- Present → OTEL-based, should be scored
- Absent → ingested via the legacy SDK path, won't be scored

If you're seeing the absent case, the fix is to upgrade your SDK (Python langfuse v3+, JS @langfuse/* v4+). I'm happy to walk through which traces are which if you share a couple of traceIds.

Best,
{you}

Datasets and experiments

Common issues:

"Duplicate dataset items on ingestion": usually a customer-side issue where the same source row gets re-uploaded. Have them pass a stable, unique id when calling create_dataset_item.
"How do I version a dataset?": datasets are versioned automatically; experiments pin to a snapshot. See the experiments docs.
Java / non-Python SDKs running experiments: the customer must propagate the right OTEL attributes (langfuse.experiment.id, langfuse.experiment.dataset.id, langfuse.experiment.item.id, langfuse.experiment.item.root_observation_id) on the trace. The official langfuse-java client covers prompts and scores via the public API but does not provide native tracing, so experiments require manual OTEL attribute propagation. Route to engineering for the canonical attribute schema. See GitHub #13438.
Experiments in CI: point to the GitHub Action for Langfuse Experiments.
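
For the duplicate-items case, one way a customer could get a stable id is to derive it from the row content. The sketch below is a suggestion under assumptions, not an SDK feature. `stable_item_id` is a hypothetical helper, and the choice of SHA-256 truncated to 32 hex characters is arbitrary.

```python
import hashlib
import json
from typing import Any, Mapping


def stable_item_id(dataset_name: str, row: Mapping[str, Any]) -> str:
    """Derive a deterministic id from the dataset name and the row content."""
    # sort_keys makes the serialization independent of key order.
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"), default=str)
    digest = hashlib.sha256(f"{dataset_name}\n{canonical}".encode("utf-8")).hexdigest()
    return digest[:32]


row = {"question": "What is Langfuse?", "expected": "An LLM engineering platform"}
same_row_reordered = {"expected": "An LLM engineering platform", "question": "What is Langfuse?"}

first_id = stable_item_id("qa-set", row)
second_id = stable_item_id("qa-set", same_row_reordered)  # same content, same id
```

The resulting value would be passed as the id when calling create_dataset_item. Confirm against the SDK docs for the customer's version that a repeated id updates the existing item rather than creating a new one.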

Scores: score configs, custom scores, scores API filtering

Custom score type setup → /faq/all/manage-score-configs.
scores.get_many filter not applying: this was a known bug; verify customer is on the latest SDK. If still broken, escalate with the request body and expected output.
"What are scores?" → /faq/all/what-are-scores.

Related FAQ: /faq/all/manage-score-configs.

8. Security and compliance

SOC 2 / ISO 27001 reports

We hold SOC 2 Type II and ISO 27001. Reports go out under NDA to evaluating customers.

Triage steps:

Confirm the requester is from a real organization actively evaluating Langfuse (look up domain, role).
The enterprise team sends reports as PDFs attached to the email reply.
Note: we may be mid-audit with a new vendor; include the engagement letter as a forward-looking signal.

Reply template (route to enterprise team):

Hi {name},

Happy to share both. Please use our [contact sales form](https://langfuse.com/talk-to-us) and our enterprise team will send over the SOC 2 Type II and ISO 27001 reports.

For reference, our public security overview is at https://langfuse.com/security.

Best,
{you}

DPA (Data Processing Agreement)

Key fact: the DPA is auto-applied via our T&Cs at signup. We do not counter-sign on a per-customer basis unless an enterprise customer specifically requires it.

Triage steps:

Direct the customer to langfuse.com/dpa, where the current DPA is published.
If they explicitly need a counter-signed copy on their template, route to the enterprise team.

Reply template:

Hi {name},

Our DPA is auto-applied for all signups under the standard Terms. You can find the current version here for your records:

https://langfuse.com/dpa

If your procurement requires a counter-signed copy on your template, let me know and I'll loop in our enterprise team.

Best,
{you}

BAA / HIPAA

HIPAA is available on a dedicated cloud region: hipaa.cloud.langfuse.com. Customers must complete and sign the BAA via DocuSign before they process PHI.

Triage steps:

Confirm the customer is using or about to use hipaa.cloud.langfuse.com (not the standard US/EU regions).
Account migration: customers moving from EU/US must create a fresh account on hipaa.cloud.langfuse.com. Past trace data can be moved via the data migration cookbook.
For HIPAA-related legal requirements, direct the customer to complete and sign the BAA via DocuSign. Do not tell customers that the BAA applies automatically.
For HIPAA-region IP allowlisting (egress from Langfuse to customer infra, e.g. for LLM-as-a-judge): static IPs are 35.82.248.193, 34.211.191.155, 52.43.164.18 (us-west-2). Full list at langfuse.com/security/networking. Ingress to hipaa.cloud.langfuse.com sits behind AWS ALBs without static IPs, so we cannot publish a stable ingress IP range.
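
When a customer sends firewall logs and asks whether blocked requests came from Langfuse, the source address can be checked against the three egress IPs listed above. This is a minimal sketch: `HIPAA_EGRESS_IPS` and `is_langfuse_hipaa_egress` are hypothetical names, and the hard-coded addresses are copied from this page. The published list at langfuse.com/security/networking remains the source of truth.

```python
import ipaddress

# Static egress IPs for the HIPAA region as listed above (us-west-2).
HIPAA_EGRESS_IPS = frozenset(
    ipaddress.ip_address(ip)
    for ip in ("35.82.248.193", "34.211.191.155", "52.43.164.18")
)


def is_langfuse_hipaa_egress(source_ip: str) -> bool:
    """Return True if source_ip is one of the listed HIPAA egress addresses."""
    try:
        return ipaddress.ip_address(source_ip.strip()) in HIPAA_EGRESS_IPS
    except ValueError:
        # Not a valid IP address (e.g. a hostname or a malformed log field).
        return False


blocked_sources = ["35.82.248.193", "10.0.0.7", "not-an-ip"]
from_langfuse = [ip for ip in blocked_sources if is_langfuse_hipaa_egress(ip)]
```

Parsing with `ipaddress` rather than comparing strings avoids false negatives from stray whitespace and rejects malformed values cleanly.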

Reply template:

Hi {name},

For HIPAA usage you'll need to be on hipaa.cloud.langfuse.com (a separate region from us.cloud.langfuse.com and cloud.langfuse.com) and on a HIPAA-eligible plan (Pro or higher). Please complete and sign the BAA before processing PHI: https://powerforms.docusign.net/39437499-4b13-4604-b373-2b1f2ae6f6fd?env=eu&acct=33986eaf-fde8-48d4-8d61-cf8c0f573e98&accountId=33986eaf-fde8-48d4-8d61-cf8c0f573e98

A note for completeness: HIPAA accounts are provisioned fresh on hipaa.cloud.langfuse.com. If your team is currently on us.cloud.langfuse.com or cloud.langfuse.com, past trace data can be moved via our data migration cookbook: https://langfuse.com/guides/cookbook/example_data_migration.

Best,
{you}

Networking: IP allowlist, egress IPs, telemetry firewall rules

Egress (Langfuse → customer infra, e.g. for LLM-as-a-judge eval calls or webhooks): static IPs are published at langfuse.com/security/networking.
Ingress (customer SDK → Langfuse): behind AWS ALBs, no static IPs. Customer firewalls must allowlist by hostname.
Telemetry to PostHog is enabled by default in self-hosted Langfuse. See langfuse.com/self-hosting/security/telemetry.
- OSS (self-hosted): can be disabled via TELEMETRY_ENABLED=false. This is compliant under our standard self-hosted terms. A provision in older EE self-hosted terms previously required permission, but the current terms don't.
- EE (self-hosted): telemetry is used for license compliance and cannot be disabled. If a customer needs an exception, route to enterprise.

Bug bounty / vulnerability disclosure

Langfuse is included in the ClickHouse Bugcrowd program. Almost all inbound is one of:

Legitimate disclosure of a real security issue: direct the reporter to Bugcrowd. If the report suggests active exploitation or customer data exposure, also escalate immediately to engineering.
Outreach from agencies/freelancers offering paid security services: polite decline.
Auto-generated reports of "vulnerabilities" that turn out to be expected behavior (subdomain redirects, password length DoS, etc.): polite explanation that the behavior is intended.

Triage steps:

Direct vulnerability reports to the ClickHouse Bugcrowd program, where the reporter can review eligibility, scope, and reward details before submitting. Do not ask the reporter to send sensitive proof-of-concept details through support channels.
If the report suggests active exploitation or customer data exposure, page engineering on Slack #security in parallel.
If it's an agency pitch or a generic templated report → use the standard reply.

Reply template (vulnerability report or bug bounty inquiry):

Hi {name},

Thank you for reaching out. Langfuse participates in the ClickHouse Bugcrowd program. Please review the program's eligibility, scope, and reward details and submit your report through Bugcrowd:

https://bugcrowd.com/engagements/clickhouse

Best,
{you}

Reply template (report is a false positive, e.g. subdomain redirect flagged as takeover):

Hi {name},

Thanks. We've reviewed the report. The behavior you've identified is expected: each of the subdomains in your report redirects to a controlled landing or sub-page on langfuse.com. There is no dangling DNS or unclaimed third-party resource.

Please confirm findings against the live behavior before submitting future reports.

Best,
{you}

Escalate immediately when: there is any credible report of SSRF, IDOR, cross-tenant data access, authentication bypass, SCIM injection, or credential exposure. Page engineering on Slack #security.

9. Data deletion and retention

"Delete my account / org / project" / GDPR deletion

We require users to perform their own deletions for compliance reasons (clear paper trail that the user authorized it). We do not delete accounts on the customer's behalf.

Triage steps:

Confirm what they want to delete (project / organization / entire account).
Walk them through the in-product flow: Project Settings → Danger Zone for project; Organization Settings → Danger Zone for org. Account deletion is a final delete-all flow from the user settings.
For migrations between the HIPAA region and the standard regions where the customer wants their old account gone, confirm they've moved everything they need first, then ask them to delete it themselves.

Reply template:

Hi {name},

For compliance / paper-trail reasons we ask customers to perform deletions themselves. The in-product flows are:

- Project: Project Settings → Danger Zone → Delete project
- Organization: Organization Settings → Danger Zone → Delete organization
- Account: User Settings → Delete account

Before you delete: confirm you've moved any data, projects, or configurations you want to keep.

If you hit any error during the flow, send a screenshot and I'll dig in.

Best,
{you}

Related FAQ: /faq/all/delete-account-langfuse.

Data retention policies (EE feature)

Data retention is configurable on Pro Cloud and Enterprise (and self-hosted EE). Hobby and Core have fixed retention by plan.

Triage steps:

Confirm plan tier: Hobby and Core have fixed retention; Pro Cloud, Enterprise, and self-hosted EE can configure it.
For EE: retention is configured per project in Project Settings or via the Instance Management API.
Note: retention runs as a background job. Customers who still see data after the retention window has passed are usually between job runs. Escalate if the data is still present more than 24h after the window.

10. API errors

5xx errors (502 / 503 / 504 / 524 Bad Gateway / Gateway Timeout)

Triage steps:

Check status.langfuse.com first. If there's an active incident, point the customer to the status page and acknowledge.
If status is clear, check DataDog for elevated error rates in the last 30 minutes. A short outage may have happened without being posted to the status page yet. For short blips this is normal; document it internally if it repeats.
If it's only one customer and our side looks healthy: ask for the timestamp, region, and whether they're hitting cloud.langfuse.com / us.cloud.langfuse.com / etc. or going through a proxy.

Reply template (during/after a known short outage):

Hi {name},

We had a very short outage around {time}. Things should be back to normal now. Can you confirm if you're still seeing the errors? If yes, share the most recent timestamp and I'll dig in.

Best,
{you}

429 / rate limit errors

Triage steps:

Identify which endpoint they're hitting. Trace ingestion is much more permissive than prompt/API reads.
Recommend exponential backoff in the SDK (the official SDKs do this by default).
For genuine high-throughput needs, route to the enterprise team; we lift limits per agreement.
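
For customers calling the API with their own HTTP client rather than an official SDK, the exponential backoff recommendation can be sketched as follows. This is an illustration under assumptions, not the SDKs' retry logic. `RateLimited` is a hypothetical exception standing in for a 429 response, and the base delay, cap, and retry count are arbitrary example values.

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical exception standing in for an HTTP 429 response."""


def backoff_delays(retries: int, base: float = 0.5, cap: float = 30.0) -> List[float]:
    """Upper bound for each retry delay: base * 2**attempt, capped at cap."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


def call_with_backoff(
    request: Callable[[], T],
    retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Retry request on RateLimited, sleeping a random time up to each bound."""
    rng = rng or random.Random()
    for bound in backoff_delays(retries):
        try:
            return request()
        except RateLimited:
            # Full jitter: random delays keep clients from retrying in lockstep.
            sleep(rng.uniform(0, bound))
    # Final attempt; RateLimited propagates to the caller if it still fails.
    return request()
```

With the defaults, the delay bounds are 0.5, 1, 2, 4, and 8 seconds. The `sleep` and `rng` parameters are injectable so the behavior can be tested without waiting.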

Related FAQ: /faq/all/api-limits.

11. UI bug / view broken

"I can't see view X" / "the page is blank" / "Langfuse v4 preview" (formerly "Fast Mode")

Triage steps:

First check: is the Langfuse v4 preview toggled on? Many views are gated on the Langfuse v4 preview being enabled.
Hard refresh (Cmd-Shift-R / Ctrl-Shift-R) to bust any stale assets.
Try Impersonation View: can you reproduce the issue as the customer?
Ask for browser, version, and console errors.

Reply template:

Hi {name},

Quick check: is the "Langfuse v4 preview" toggled on? A few of the newer views (incl. Experiments, the redesigned Trace view) are gated on it.

If the Langfuse v4 preview is on and you still don't see it, a hard refresh (Cmd-Shift-R / Ctrl-Shift-R) usually fixes stale-asset cases. If neither helps, please share the browser, version, and any console errors.

Best,
{you}

Escalate when: the customer confirms the Langfuse v4 preview is on, has hard-refreshed, and you can reproduce in Impersonation View → engineering.

12. Customer leaving Langfuse

"We've decided to stop using Langfuse"

Triage steps:

Reply with empathy. Do not push for retention on this thread. That's a separate sales conversation, and only if the customer signals interest.
Ask for short feedback. Bullet-point format is fine. Promise nothing in return.
If they're on Cloud, confirm cancellation is processed (see "Cancel subscription" above).
If they're on self-hosted EE, the contract path applies: they need to cancel in writing.

Reply template:

Hi {name},

Thanks for letting us know, and sorry Langfuse fell short for you. We'd be really grateful if you could share a few bullets on what we could've done better. We read every one.

I've {canceled your subscription / forwarded to the EE team for contract cancellation}.

Wishing you the best with whatever you choose next.

Best,
{you}

13. Feature requests

"Can you add X?" / "It would be great if Langfuse could…"

Feature requests are not something to resolve on the thread; they're something to route so the product team sees the demand. How you handle it depends on where the request comes from.

Paying customers (Pylon):

Acknowledge the request and log the +1. For example: "Agree that this would make sense! I'll add your +1 to this feature request."
Create a new Linear ticket for the request, or connect an existing Linear issue to the Pylon issue so the demand is tracked against a concrete piece of work.

Community / GitHub ideas board:

Acknowledge the request. For example: "Thanks for adding this here!"
@-mention the Langfuse product engineer who owns that product area (see ownership) so it lands with the right person. For example: "@nimar FYI - feature request for faster evals."

Escalate when: the request is tied to a deal or renewal (route to the enterprise team), or several customers ask for the same thing in a short window (flag it to the owning product engineer directly rather than only logging +1s).

14. Not-actually-support inbox (filter these fast)

Spam / partnership / sponsorship / guest post / link insertion

About 1–2% of the inbox is outreach: "I'd love to write a guest post," "We sell partnerships," "Sponsor our event," "Buy backlinks." Close with a polite no, or no response.

Reply template:

Hi {name},

Thanks for reaching out. We're not currently exploring partnerships of this kind. Wishing you the best with your work.

Best,
{you}

Sending no reply is also acceptable for obvious spam.

Job applications / recruiting outreach

We route all applications to one place. Do not engage on the support thread.

Reply template:

Hi {name},

Thanks for your interest in Langfuse. Please apply through our official careers page so the hiring team picks it up:

https://langfuse.com/careers

Best,
{you}

Auto-reply / out-of-office / language we don't speak

If the inbound is purely an auto-reply (Zendesk "thank you for reaching out", OOO notices), close the ticket; no human action is needed.

For tickets in languages no one on the team reads natively, reply in English and offer to continue in English. Most customers are bilingual; if not, escalate to the team channel.

When in doubt

If the customer's question doesn't fit a branch above:

Search this page with Cmd/Ctrl-F for keywords from their message.
Search Pylon for the same symptom in the last 30 days; someone has likely answered it before.
Ask in #support Slack with the ticket link and your hypothesis. Internal notes on the Pylon ticket also work.
Hand off to the relevant owner: see ownership. If you can't tell who owns it, escalate to engineering (technical) or the enterprise team (commercial/legal).

Whenever you find yourself answering a new question for the third time, add it to this page, or add a FAQ entry under content/faq/all/ and link to it from here. Every recurring question we document is one that Inkeep, Dosu, and future support engineers can answer without humans.
