Error Tracking¶
The Error Tracking Explorer centrally analyzes error data in APM. Through it, you can:
- View historical error trends: Observe how the frequency of specific error types or sources changes over time through charts such as Top Lists and Time Series.
- Analyze error distribution: Quickly locate high-frequency error sources, such as services or resource endpoints with high error rates.
- Aggregate similar errors: Automatically group error requests with the same exception stack or similar error characteristics to avoid repeatedly viewing individual traces.
- ......
Data Display¶
The Error Tracking Explorer provides various professional analysis views based on lists and charts.
List¶
Displays detailed records and aggregated results of APM errors in the current workspace, including occurrence time, error type, error message, associated services and resources, etc.
In list mode, two analysis modes are provided:
All Errors¶
Records every Span that is marked as an error (status=error) and carries an error type (error_type), so that you can view all error records that meet both criteria.
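The selection rule above can be sketched in a few lines of Python. This is only an illustration of the two conditions (status=error and a non-empty error_type). The dictionary-based span shape, the span_id field, and the function names are assumptions made for the example, not part of the product.

```python
from typing import Any


def is_tracked_error(span: dict[str, Any]) -> bool:
    """True when the span is marked as an error and carries a non-empty error_type."""
    return span.get("status") == "error" and bool(span.get("error_type"))


def all_errors(spans: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only the spans that would appear under All Errors (hypothetical helper)."""
    return [s for s in spans if is_tracked_error(s)]


# Illustrative data only.
spans = [
    {"span_id": "a1", "status": "error", "error_type": "TimeoutError"},
    {"span_id": "b2", "status": "ok", "error_type": ""},
    {"span_id": "c3", "status": "error"},  # error status, but no error_type
]
errors = all_errors(spans)
```

Only the first span satisfies both conditions, so it is the only one kept.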
Data Details¶
In the Error Tracking Explorer, click any error to view its trace details, including service, error type, content, distribution chart, details, trace details, extended attributes, and associated information like logs, hosts, and networks.
In the error distribution chart on the error details page, error traces with high similarity are aggregated and statistically analyzed based on the error_message and error_type fields. The time interval is automatically selected according to the time range, presenting the error distribution trend.
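As a rough illustration of the distribution chart, the sketch below buckets errors by time and by the error_type and error_message fields. Two simplifications are assumed: errors are matched on exact field values rather than by similarity, and the interval rule (at most about 60 buckets, chosen from a fixed list of step sizes) is hypothetical, since the actual selection logic is not described here. Timestamps are assumed to be epoch seconds in a field named time.

```python
from collections import Counter


def pick_interval_seconds(range_seconds: int) -> int:
    """Hypothetical rule: smallest step that yields at most 60 buckets."""
    for step in (60, 300, 900, 3600, 21600, 86400):
        if range_seconds / step <= 60:
            return step
    return 86400


def error_distribution(errors, start: int, end: int):
    """Count errors per (bucket start, error_type, error_message)."""
    step = pick_interval_seconds(end - start)
    counts = Counter()
    for e in errors:
        if start <= e["time"] < end:
            bucket = start + ((e["time"] - start) // step) * step
            counts[(bucket, e["error_type"], e["error_message"])] += 1
    return step, counts


# Illustrative data only: three identical errors within a one-hour range.
sample = [
    {"time": 10, "error_type": "TimeoutError", "error_message": "upstream timed out"},
    {"time": 50, "error_type": "TimeoutError", "error_message": "upstream timed out"},
    {"time": 70, "error_type": "TimeoutError", "error_message": "upstream timed out"},
]
step, counts = error_distribution(sample, start=0, end=3600)
```

With a one-hour range the sketch picks 60-second buckets, so the first two errors share a bucket and the third falls into the next one.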
Obsy AI Error Analysis¶
Guance can parse error data with one click. It uses large models to automatically extract key information from the data and combines this with online search engines and operations knowledge bases to quickly analyze possible fault causes and suggest preliminary solutions.
- Click on a single data entry to expand the details page.
- Click "Obsy AI Error Analysis" in the upper right corner.
- Automatic anomaly analysis begins.
Pattern Analysis¶
Pattern Analysis automatically groups similar errors and identifies high-frequency patterns across the top 10,000 error Spans within the selected time range. It calculates similarity on the error trace data based on the clustering fields and extracts common patterns, helping you quickly discover abnormal traces and locate problems.
By default, aggregation is based on the error_message field. You can enter custom clustering fields, up to a maximum of 3.
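The idea of extracting common patterns from clustering fields can be illustrated with a deliberately simple stand-in: mask the variable parts of each field value (numbers, IP-like sequences, hex ids) and count spans that share the resulting template. The actual similarity algorithm is not described here, so this masking approach, the regular expression, and the function names are all assumptions. Only the default field (error_message) and the limit of 3 clustering fields come from the text.

```python
import re
from collections import Counter

MAX_CLUSTER_FIELDS = 3  # limit stated in the text

# Assumed notion of "variable" tokens: hex ids, integers, dotted numbers.
_VARIABLE = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+(?:\.\d+)*)\b")


def to_pattern(text: str) -> str:
    """Replace variable tokens with a wildcard so similar messages collapse."""
    return _VARIABLE.sub("<*>", text)


def pattern_counts(spans, fields=("error_message",)):
    """Return (pattern, count) pairs, most frequent first."""
    if not 1 <= len(fields) <= MAX_CLUSTER_FIELDS:
        raise ValueError("between 1 and 3 clustering fields are allowed")
    counts = Counter(
        tuple(to_pattern(str(s.get(f, ""))) for f in fields) for s in spans
    )
    return counts.most_common()


# Illustrative data only.
spans = [
    {"error_message": "timeout after 30 ms on host 10.0.0.1"},
    {"error_message": "timeout after 45 ms on host 10.0.0.2"},
    {"error_message": "connection refused"},
]
patterns = pattern_counts(spans)
```

The two timeout messages differ only in their numeric parts, so they collapse into one pattern with a count of 2, listed ahead of the single "connection refused" entry.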
Pattern Analysis Details¶
In the Pattern Analysis list, click any error to view all associated traces.
On the associated traces page, you can sort by document count in ascending or descending order (descending by default).
Click on a specific data entry in the associated traces again to enter the details page. You can perform the following operations:
- View the host and service where the error occurred, error distribution, and other information.
- Click the icon in the upper right corner of the details page to export the current data.
- Perform AI Intelligent Analysis on the current error details.
- Click to jump to the associated traces of the current error details.
Charts¶
Charts aggregate data with the count, last, first, or count_distinct operations, group it under the specified by conditions, and present the result visually. The following chart types are available; select them as needed:
- Top List
- Time Series
- Pie Chart
- Treemap
- Grouped Table Chart
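To make the four operations concrete, the sketch below applies each of them to a small set of error records grouped by one field. It assumes that first and last refer to the earliest and latest value by timestamp, which is not stated here. The record shape and the aggregate function are hypothetical.

```python
from collections import defaultdict


def aggregate(records, op: str, field: str, by: str) -> dict:
    """Group records by `by`, then apply `op` to the values of `field`."""
    groups = defaultdict(list)
    # Sort by time so that first/last mean earliest/latest (an assumption).
    for r in sorted(records, key=lambda r: r["time"]):
        groups[r[by]].append(r[field])
    ops = {
        "count": len,
        "first": lambda values: values[0],
        "last": lambda values: values[-1],
        "count_distinct": lambda values: len(set(values)),
    }
    fn = ops[op]
    return {key: fn(values) for key, values in groups.items()}


# Illustrative data only.
records = [
    {"time": 1, "service": "cart", "error_type": "TimeoutError"},
    {"time": 2, "service": "cart", "error_type": "ValueError"},
    {"time": 3, "service": "cart", "error_type": "TimeoutError"},
    {"time": 4, "service": "pay", "error_type": "KeyError"},
]
by_count = aggregate(records, "count", "error_type", by="service")
by_distinct = aggregate(records, "count_distinct", "error_type", by="service")

# A Top List is then the grouped result sorted by value, largest first.
top_list = sorted(by_count.items(), key=lambda kv: kv[1], reverse=True)
```

The same grouped result can feed any of the chart types listed above; only the presentation differs.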
Issue Auto-Discovery¶
After you enable the "Issue Auto-Discovery" configuration, the system statistically analyzes abnormal data by the configured grouping dimensions, traces the stack of subsequent similar problems and consolidates them automatically, and finally generates an Issue. Issues generated through this entry help you quickly obtain the context and root cause of a problem, shortening resolution time.
Configure¶
Note
You must configure the rules before enabling this configuration. Otherwise, it cannot be enabled.
- Data Source: The enablement entry for the current configuration page.
- Combination Dimension: Statistical categorization and grouping based on the configured field content, including service, version, resource, and error_type.
  - For the data source, filter conditions can be added to narrow the data range. The system will further query data that meets the conditions.
- Detection Frequency: The system determines the time range for querying data based on the selected frequency. Options include 5 minutes, 10 minutes, 15 minutes, 30 minutes, and 1 hour.
- Issue Definition: After enabling this configuration, the Issue will be presented as defined here. To prevent information loss, fill in the fields in order.
  - Both the Title and Description of the Issue support the following template variables:

| Variable | Meaning |
| --- | --- |
| count | Statistical count |
| service | Service name |
| version | Version |
| resource | Resource name |
| error_type | Error type |
| error_message | Error content |
| error_stack | Error stack |
After saving the configuration and enabling it, Issues automatically discovered and generated by the system will be displayed in Incident.
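The configuration can be pictured as a small pipeline: take the errors found in one detection window, group them by the combination dimensions, and fill the Issue title and description from the template variables. The sketch below is an assumption-laden illustration. The `{name}` placeholder syntax, the choice of the most recent error as the source of variable values, and the function names are all hypothetical; only the dimension and variable names come from the configuration described above.

```python
from collections import defaultdict

TEMPLATE_VARIABLES = (
    "service", "version", "resource",
    "error_type", "error_message", "error_stack",
)


def discover_issues(errors, dimensions, title_tpl: str, desc_tpl: str):
    """Group one detection window's errors and render one Issue per group."""
    groups = defaultdict(list)
    for e in errors:
        groups[tuple(e.get(d, "") for d in dimensions)].append(e)

    issues = []
    for members in groups.values():
        latest = members[-1]  # assumption: variables come from the latest error
        variables = {name: latest.get(name, "") for name in TEMPLATE_VARIABLES}
        variables["count"] = len(members)
        issues.append({
            "title": title_tpl.format_map(variables),
            "description": desc_tpl.format_map(variables),
        })
    return issues


# Illustrative data only.
window = [
    {"service": "checkout", "error_type": "TimeoutError",
     "error_message": "upstream timed out"},
    {"service": "checkout", "error_type": "TimeoutError",
     "error_message": "upstream timed out"},
    {"service": "search", "error_type": "KeyError",
     "error_message": "missing key 'q'"},
]
issues = discover_issues(
    window,
    dimensions=("service", "error_type"),
    title_tpl="[{service}] {error_type} x{count}",
    desc_tpl="{error_message}",
)
```

In this example the two checkout timeouts fall into the same group and produce a single Issue, while the search error produces its own.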








