LLM Monitoring


An LLM (Large Language Model) is a deep-learning-based artificial intelligence technology that can understand and generate natural language text. LLM monitoring associates LLM requests with the full application trace, tracks the complete flow of each conversation, and precisely measures the number of Tokens consumed by each generation task.

With the LLM monitoring service, you can:

  • View the complete trace of a single request: follow the entire process from receiving a user query, through intermediate processing (such as database queries), to calling the LLM model and returning the answer.

  • Analyze performance bottlenecks: precisely measure the time spent in each step (such as model calls and data retrieval) to quickly identify sources of latency.

  • Correlate upstream and downstream services: Associate LLM requests with related application and infrastructure metrics for comprehensive root cause analysis.

Core Capabilities

The essence of LLM observability is establishing a quantifiable correlation between input (Prompt), output (Completion), and system behavior. Its core capabilities span three dimensions:

1. Full-Link Tracing

Within the LLM invocation framework, use Traces and Spans to follow the entire request path precisely and locate latency bottlenecks.
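As an illustration, the sketch below uses the Langfuse Python SDK's decorator API (assuming the `@observe()` decorator and the instrumented OpenAI drop-in client; function names and the model name are placeholders): the outermost decorated call becomes the Trace, each nested call becomes a child Span, and the model call is recorded together with its Token usage.

```python
from langfuse.decorators import observe
from langfuse.openai import openai  # OpenAI drop-in client instrumented by Langfuse


@observe()
def retrieve_context(query: str) -> str:
    # Recorded as a child span; in practice this would query a database
    # or vector store.
    return "Relevant documents for: " + query


@observe()
def answer(query: str) -> str:
    # The outermost decorated call becomes the trace for this request.
    context = retrieve_context(query)
    response = openai.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[
            {"role": "system", "content": f"Answer using this context: {context}"},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content


print(answer("What did the user ask last week?"))
```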

2. Output Quality Assessment

Automatically evaluates the quality of output content based on rule engines and AI-assisted evaluation.
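The built-in evaluation is configured in the product itself, but a hypothetical rule-engine check like the one below illustrates the idea: each rule inspects the Completion and returns a pass/fail result that can then be aggregated per application or model. The rule names and thresholds are invented for this example.

```python
import re

# Hypothetical rules, purely for illustration.
RULES = [
    ("non_empty", lambda text: bool(text.strip())),
    ("no_secret_leak", lambda text: re.search(r"sk-[A-Za-z0-9]{20,}", text) is None),
    ("within_length", lambda text: len(text) <= 4000),
]


def evaluate_output(completion: str) -> dict[str, bool]:
    """Apply each rule to a model output and return per-rule results."""
    return {name: check(completion) for name, check in RULES}


print(evaluate_output("The capital of France is Paris."))
# {'non_empty': True, 'no_secret_leak': True, 'within_length': True}
```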

3. Cost Measurement

Automatically collects and correlates the Token consumption (broken down by input and output), model type, and invocation parameters of each request, enabling cost allocation across multiple business dimensions.
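For example, with the input/output Token breakdown and model type attached to each request, per-request cost can be estimated and then rolled up by any business dimension. The prices below are placeholders for illustration only, not the platform's or any provider's actual rates.

```python
# Placeholder per-1K-token prices, for illustration only.
PRICE_PER_1K_TOKENS = {
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
    "gpt-4o": {"input": 0.0025, "output": 0.01},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its Token breakdown."""
    price = PRICE_PER_1K_TOKENS[model]
    return (input_tokens / 1000) * price["input"] + (output_tokens / 1000) * price["output"]


# One request: 1,200 input Tokens and 300 output Tokens.
print(round(request_cost("gpt-4o-mini", 1200, 300), 6))  # 0.00036
```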

Getting Started

Ingest Data

Create an LLM application in the application list. Currently, the Langfuse integration framework is supported by default. After defining the application name and ID, the system generates configuration parameters and a Client Token. Follow the instructions to complete the integration configuration for Python, JS/TS, or other frameworks to start collecting data.
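As a minimal sketch, assuming the generated parameters map onto the standard Langfuse Python SDK settings (public key, secret key, and host; the exact names shown on the integration page may differ), initializing the client looks roughly like this:

```python
import os

# Replace the placeholders with the configuration parameters and Client
# Token generated when the LLM application was created. The variable
# names below are the standard Langfuse SDK settings, assumed here.
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-<your-public-key>"
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-<your-secret-key>"
os.environ["LANGFUSE_HOST"] = "https://<your-reporting-endpoint>"

from langfuse import Langfuse

langfuse = Langfuse()          # picks up the environment variables above
print(langfuse.auth_check())   # optional: verify credentials before sending data
```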

View and Analyze

After data ingestion, each LLM request is collected as a log event in the Explorer. Here, you can search and filter Trace or Span data and view input/output content, Token consumption, execution duration, and other metrics. You can also drill down to the details page to view the complete request trace.

Monitoring Overview

Use the analysis dashboard to get a graphical overview of the application's operational status, including core metrics such as request volume, error rate, Token consumption, response time, and the usage proportion of each model, providing a comprehensive understanding of application performance and cost distribution.
