How to capture User Feedback for Evaluation of LLM apps?

User feedback is a great source to evaluate the quality of an LLM app’s output.

What are common feedback types?

Depending on the type of the application, there are different types of feedback that can be collected that vary in quality, detail, and quantity.

Explicit Feedback: Directly prompt the user to give feedback, this can be a rating, a like, a dislike, a scale or a comment. While it is simple to implement, quality and quantity of the feedback is often low.
Implicit Feedback: Measure the user’s behavior, e.g., time spent on a page, click-through rate, accepting/rejecting a model-generated output. This type of feedback is more difficult to implement but is often more frequent and reliable.

What is a common workflow of using user feedback?

A common workflow is to:

Collection of feedback alongside LLM traces in Langfuse. Example: Negative, Langfuse not included in response
Browsing of all user feedback, especially when collecting comments from users
Identification of the root cause of the low-quality response (can be managed via annotation queues). Example: Docs on Langfuse integration are not included in embedding similarity search

Demo

Langfuse UI on the left, demo application on the right

→ Try the Q&A chatbot yourself and browse the collected feedback in the public Langfuse project

Get started

Capturing user feedback in Langfuse is done via the External Evaluation Method which uses the SDK / API to attach scores to traces or observations.

Types of user feedback that can be collected as a score:

Be numeric (e.g. 1-5 stars)
Be categorical (e.g. thumbs up/down)
Be boolean (e.g. accept/reject)

Integration Example

In this example, we use the LangfuseWeb SDK to collect user feedback right from the browser.

Note: You can also use the SDKs server-side or add scores via the API.

User feedback on individual responses

Chat application

Integration

UserFeedbackComponent.tsx

import { LangfuseWeb } from "langfuse";
 
export function UserFeedbackComponent(props: { traceId: string }) {
  const langfuseWeb = new LangfuseWeb({
    publicKey: env.NEXT_PUBLIC_LANGFUSE_PUBLIC_KEY,
  });
 
  const handleUserFeedback = async (value: number) =>
    await langfuseWeb.score({
      traceId: props.traceId,
      name: "user_feedback",
      value,
    });
 
  return (
    <div>
      <button onClick={() => handleUserFeedback(1)}>👍</button>
      <button onClick={() => handleUserFeedback(0)}>👎</button>
    </div>
  );
}

Preview

API

Was this page helpful?

Support