User feedback is a valuable signal for evaluating the quality of an LLM app's output. In Langfuse, feedback is collected as a score and attached to an execution trace or an individual LLM generation.
Depending on the type of application, different kinds of feedback can be collected. They vary in quality, detail, and quantity.
- Explicit Feedback: Directly prompt the user to give feedback, for example a rating, a like or dislike, a scale, or a comment. While this is simple to implement, the quality and quantity of the feedback are often low.
- Implicit Feedback: Measure the user's behavior, e.g., time spent on a page, click-through rate, or accepting/rejecting a model-generated output. This type of feedback is more difficult to implement but is often more frequent and reliable.
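As a rough illustration of the two feedback types, the sketch below maps an explicit signal (a thumbs up/down with an optional comment) and an implicit signal (whether the user accepted a generated output) to score objects tied to a trace. The `FeedbackScore` type, the score names, the 0/1 value convention, and the `trace-123` ID are assumptions made for this example, not a schema taken from this page.

```typescript
// Hypothetical score shape for illustration; field names are assumptions.
type FeedbackScore = {
  traceId: string;
  name: string;
  value: number;
  comment?: string;
};

// Explicit feedback: the user clicked thumbs up/down and optionally left a comment.
function explicitFeedbackToScore(
  traceId: string,
  thumbsUp: boolean,
  comment?: string
): FeedbackScore {
  return {
    traceId,
    name: "user-explicit-feedback",
    value: thumbsUp ? 1 : 0,
    // Only include the comment key when a comment was actually given.
    ...(comment !== undefined ? { comment } : {}),
  };
}

// Implicit feedback: derived from behavior, here whether the user
// accepted (1) or rejected (0) the model-generated output.
function implicitFeedbackToScore(traceId: string, accepted: boolean): FeedbackScore {
  return { traceId, name: "user-implicit-feedback", value: accepted ? 1 : 0 };
}

const explicitScore = explicitFeedbackToScore(
  "trace-123",
  false,
  "Langchain not included in response"
);
const implicitScore = implicitFeedbackToScore("trace-123", true);
console.log(explicitScore, implicitScore);
```

Keeping both kinds of feedback under separate score names makes it possible to compare them later, for example to check whether rare explicit ratings agree with the more frequent implicit signal.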
The easiest way to collect user feedback is via the Langfuse Web SDK, which lets you ingest scores directly from the browser. See the Web SDK documentation for more details.
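A minimal sketch of what browser-side ingestion could look like is shown here. It assumes a client object exposing a `score` method that accepts a trace ID, a score name, a numeric value, and an optional comment. The `ScoreClient` interface, the `submitUserFeedback` helper, and the in-memory stub are hypothetical names introduced so the example runs without credentials; consult the Web SDK documentation for the actual client and its parameters.

```typescript
// Assumed payload shape; field names are illustrative, not confirmed by this page.
type ScorePayload = {
  traceId: string;
  name: string;
  value: number;
  comment?: string;
};

// Hypothetical minimal interface for any client that can ingest a score.
// A real SDK client would be adapted to this shape.
interface ScoreClient {
  score(payload: ScorePayload): Promise<void>;
}

// Intended to be called from a UI event handler, e.g. a thumbs up/down
// button rendered next to a chatbot answer.
function submitUserFeedback(
  client: ScoreClient,
  traceId: string,
  thumbsUp: boolean,
  comment?: string
): Promise<void> {
  const payload: ScorePayload = {
    traceId,
    name: "user-feedback",
    value: thumbsUp ? 1 : 0,
    ...(comment !== undefined ? { comment } : {}),
  };
  return client.score(payload);
}

// In-memory stand-in so the sketch runs without network access or API keys.
const recorded: ScorePayload[] = [];
const stubClient: ScoreClient = {
  score(payload) {
    recorded.push(payload);
    return Promise.resolve();
  },
};

submitUserFeedback(stubClient, "trace-123", false, "Langchain not included in response")
  .then(() => console.log("recorded scores:", recorded));
```

Passing the client in as a parameter keeps the UI handler independent of any particular SDK version and makes it easy to test with a stub like the one above.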
We integrated user feedback collection into the Q&A chatbot for the Langfuse docs.
In this example, you can see the following steps:
- Collection of feedback using the Langfuse Web SDK
  - Negative, Langchain not included in response
- Browsing of feedback
- Identification of the root cause of the low-quality response
  - Docs on Langchain integration are not included in embedding similarity search
→ Try the demo yourself and browse the collected feedback in Langfuse