Application Performance Metrics Monitoring¶
Monitors key application performance monitoring (APM) metrics within the workspace. You set threshold ranges, and when a monitored metric exceeds them, the system automatically triggers an alert. Threshold-based alerting helps users promptly identify and respond to potential performance issues, keeping applications stable.
Use Cases¶
- Monitor metric data for all/individual APM services;
- Count eligible traces within a specified time period and trigger an anomaly event when the count exceeds a custom threshold.
Detection Configuration¶
Detection Frequency¶
The execution frequency of the detection rule; default is 5 minutes.
Detection Interval¶
The time range of metric data queried each time the task runs. The selectable detection intervals depend on the detection frequency.
Detection Frequency | Detection Interval (Dropdown Options) |
---|---|
1m | 1m/5m/15m/30m/1h/3h |
5m | 5m/15m/30m/1h/3h |
15m | 15m/30m/1h/3h/6h |
30m | 30m/1h/3h/6h |
1h | 1h/3h/6h/12h/24h |
6h | 6h/12h/24h |
12h | 12h/24h |
24h | 24h |
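The table above can be treated as a lookup from detection frequency to the intervals offered in the dropdown. The following is a minimal sketch of that lookup; the names `ALLOWED_INTERVALS` and `is_valid_interval` are hypothetical and are not part of the product.

```python
# Selectable detection intervals per detection frequency,
# copied from the table above.
ALLOWED_INTERVALS = {
    "1m": ["1m", "5m", "15m", "30m", "1h", "3h"],
    "5m": ["5m", "15m", "30m", "1h", "3h"],
    "15m": ["15m", "30m", "1h", "3h", "6h"],
    "30m": ["30m", "1h", "3h", "6h"],
    "1h": ["1h", "3h", "6h", "12h", "24h"],
    "6h": ["6h", "12h", "24h"],
    "12h": ["12h", "24h"],
    "24h": ["24h"],
}


def is_valid_interval(frequency: str, interval: str) -> bool:
    """Return True if `interval` is a dropdown option for `frequency`."""
    return interval in ALLOWED_INTERVALS.get(frequency, [])
```

For example, `is_valid_interval("5m", "15m")` is true, while `is_valid_interval("5m", "1m")` is false because the interval cannot be shorter than the frequency in any row of the table.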
Detection Metrics¶
Set the metrics to detect. You can configure metric data for all or individual services in the workspace within a given time range.
Field | Description |
---|---|
Services | Monitor all/individual service metric data within the current workspace's application performance monitoring, supporting full selection or individual selection. |
Metrics | Specific detection metrics, supporting configuration of single metrics, including request count, error request count, request error rate, average requests per second, average response time, P50 response time, P75 response time, P90 response time, P99 response time, etc. |
Filtering Conditions | Filter detection metric data based on tags, limiting the scope of detected data. Support adding one or more tag filters, allowing fuzzy matching and non-matching filtering conditions. |
Detection Dimensions | Any string type (keyword) field in the configured data can be selected as a detection dimension. Currently, up to three fields are supported as detection dimensions. Combining multiple detection dimension fields determines a specific detection object. The system judges whether the statistical metrics for each detection object meet the threshold conditions for triggering events. If the conditions are met, an event is generated. For example, selecting detection dimensions host and host_ip would result in a detection object {host: host1, host_ip: 127.0.0.1}. |
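To illustrate how detection dimensions determine detection objects, here is a minimal sketch that groups query rows by the selected dimension fields. The function name `group_by_dimensions` and the row layout (a dict with a `value` key) are assumptions made for this example, not the product's internal data model.

```python
from collections import defaultdict

MAX_DIMENSIONS = 3  # the text above states that up to three fields are supported


def group_by_dimensions(rows, dimensions):
    """Group metric values by detection object.

    Each detection object is the combination of the selected dimension
    fields, for example (("host", "host1"), ("host_ip", "127.0.0.1")).
    """
    if len(dimensions) > MAX_DIMENSIONS:
        raise ValueError("at most three detection dimensions are supported")
    groups = defaultdict(list)
    for row in rows:
        # Build a hashable key from the dimension names and their values.
        key = tuple((dim, row.get(dim)) for dim in dimensions)
        groups[key].append(row["value"])
    return dict(groups)
```

Each key in the result corresponds to one detection object, and the threshold check would then be applied to the statistics of each object separately.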
Count the number of eligible traces within a specified time period, triggering an anomaly event when exceeding custom thresholds. Can be used for abnormal trace error notifications.
Field | Description |
---|---|
Source | Data source for the current detection metric, supporting selection of all (*) or a specified single data source. |
Filtering Conditions | Filter trace spans by tag, limiting the scope of detected data. Supports adding one or more tag filters. |
Aggregation Algorithm | Default is “*”, for which the corresponding function is count. If another field is selected, the function automatically changes to count distinct (the number of data points where the keyword appears). |
Detection Dimensions | Any string type (keyword) field in the configured data can be selected as a detection dimension. Currently, up to three fields are supported as detection dimensions. Combining multiple detection dimension fields determines a specific detection object. The system judges whether the statistical metrics for each detection object meet the threshold conditions for triggering events. If the conditions are met, an event is generated. For example, selecting detection dimensions host and host_ip would result in a detection object {host: host1, host_ip: 127.0.0.1}. |
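The difference between the two aggregation functions can be sketched as follows. This example assumes that count distinct means the number of distinct values of the selected field among the matching spans; the function name `aggregate` and the span layout (plain dicts) are hypothetical.

```python
def aggregate(spans, field="*"):
    """Aggregate matching spans.

    With the default "*", every span is counted (count).
    With any other field, only distinct values of that field are
    counted (count distinct); spans lacking the field are ignored.
    """
    if field == "*":
        return len(spans)
    return len({span[field] for span in spans if field in span})
```

With three spans carrying `trace_id` values `a`, `a`, and `b`, `aggregate(spans)` counts three spans while `aggregate(spans, "trace_id")` counts two distinct traces.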
Trigger Conditions¶
Set the trigger conditions for alert levels: you can configure any of the urgent, important, warning, or normal trigger conditions.
Configure trigger conditions and severity levels. If the query results contain multiple values, an event is generated when any value meets the trigger condition.
For more details, refer to Event Level Description.
If Continuous Trigger Judgment is enabled, an event is triggered again once the trigger condition has been met in multiple consecutive judgments. The maximum limit is 10 times.
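One way to picture Continuous Trigger Judgment is a streak counter per detection object. The sketch below is illustrative only: the class name `ConsecutiveTrigger`, the use of a simple "greater than" operator, and the choice to reset the streak after an event fires are all assumptions, since the text does not specify these details.

```python
MAX_CONSECUTIVE = 10  # the text above states a maximum of 10


class ConsecutiveTrigger:
    """Fire only after `required` consecutive judgments meet the condition."""

    def __init__(self, threshold: float, required: int):
        if not 1 <= required <= MAX_CONSECUTIVE:
            raise ValueError("required must be between 1 and 10")
        self.threshold = threshold
        self.required = required
        self.streak = 0

    def observe(self, value: float) -> bool:
        """Record one detection result; return True if an event fires."""
        if value > self.threshold:  # assumed operator, for illustration
            self.streak += 1
        else:
            self.streak = 0  # any non-matching result breaks the streak
        if self.streak >= self.required:
            self.streak = 0  # assumed: start counting again after firing
            return True
        return False
```

With `ConsecutiveTrigger(threshold=100, required=3)`, the sequence 150, 160, 90, 120, 130, 140 fires only on the last value, because the result 90 breaks the first streak.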
Alert Levels
- Alert Levels Urgent (Red), Important (Orange), Warning (Yellow): based on the judgment operators in the configured conditions.
- Alert Level Normal (Green): based on the configured number of detections, as follows:
    - Each execution of a detection task counts as 1 detection. For example, with Detection Frequency = 5 Minutes, 1 detection = 5 minutes;
    - You can customize the number of detections. For example, with Detection Frequency = 5 Minutes, 3 detections = 15 minutes.
Level | Description |
---|---|
Normal | After the detection rule takes effect and generates urgent, important, or warning anomaly events, if the data detection results return to normal within the configured custom detection count, a recovery alert event is generated. |
Recovery alert events are not subject to Alert Mute restrictions. If no recovery alert event detection count is set, the alert event will not recover and will always appear in the Events > Unrecovered Events List.
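The relationship between detection count and elapsed time is simple multiplication, sketched here. The function name `recovery_window` is hypothetical.

```python
from datetime import timedelta


def recovery_window(frequency: timedelta, detections: int) -> timedelta:
    """Time span covered by `detections` runs at the given frequency."""
    if detections < 1:
        raise ValueError("detections must be at least 1")
    return frequency * detections
```

With a detection frequency of 5 minutes, `recovery_window(timedelta(minutes=5), 3)` gives 15 minutes, matching the example above.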
Data Gaps¶
You can configure seven strategies for data gap states.
- Linked to the detection interval time range: judge the query results of the detection metrics for the most recent minutes; no event is triggered;
- Linked to the detection interval time range: judge the query results of the detection metrics for the most recent minutes; the query results are treated as 0 and are then compared again with the thresholds configured in the Trigger Conditions to determine whether to trigger an anomaly event;
- Custom fill for the detection interval value: trigger a data gap event, an urgent event, an important event, a warning event, or a recovery event. If you select one of these strategies, it is recommended that the custom data gap time be >= the detection interval time. If the configured time is <= the detection interval time, a data gap and an anomaly may be satisfied at the same time, in which case only the data gap processing result is applied.
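The seven strategies can be sketched as an enumeration plus a small resolver. This is a simplified model under stated assumptions: the names `GapStrategy` and `resolve_gap` are hypothetical, events are represented as plain strings, and `evaluate` stands in for whatever trigger condition is configured.

```python
from enum import Enum
from typing import Callable, Optional


class GapStrategy(Enum):
    NO_EVENT = "no_event"            # a gap never triggers an event
    TREAT_AS_ZERO = "treat_as_zero"  # a gap is evaluated as the value 0
    DATA_GAP_EVENT = "data_gap"
    URGENT_EVENT = "urgent"
    IMPORTANT_EVENT = "important"
    WARNING_EVENT = "warning"
    RECOVERY_EVENT = "recovery"


def resolve_gap(
    strategy: GapStrategy,
    evaluate: Callable[[float], Optional[str]],
) -> Optional[str]:
    """Return the event produced for a data gap, or None for no event.

    `evaluate` maps a metric value to an alert level (or None) according
    to the configured trigger conditions.
    """
    if strategy is GapStrategy.NO_EVENT:
        return None
    if strategy is GapStrategy.TREAT_AS_ZERO:
        # Compare 0 against the trigger conditions, as described above.
        return evaluate(0)
    # The remaining five strategies trigger a fixed type of event.
    return strategy.value
```

Note that treating a gap as 0 only produces an event when the trigger condition matches 0, for example a rule that alerts when the request count falls below a minimum.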
Information Generation¶
After enabling this option, unmatched detection results will be written as "Information" events.
Note
If trigger conditions, data gaps, and information generation are configured simultaneously, the following priority order applies: Data Gaps > Trigger Conditions > Information Event Generation.
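The priority order in the note can be sketched as a chain of checks. The function name `decide_event` and its parameters are hypothetical, and events are again represented as plain strings for illustration.

```python
from typing import Optional


def decide_event(
    has_data_gap: bool,
    gap_event: Optional[str],
    matched_level: Optional[str],
    information_enabled: bool,
) -> Optional[str]:
    """Apply the priority: Data Gaps > Trigger Conditions > Information."""
    if has_data_gap:
        # The data gap outcome wins, even if it is "no event" (None).
        return gap_event
    if matched_level is not None:
        return matched_level
    if information_enabled:
        # Unmatched results are written as "Information" events.
        return "information"
    return None
```

For instance, `decide_event(True, "data_gap", "urgent", True)` returns `"data_gap"`, because the data gap result takes precedence over a matching trigger condition.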
Other Configurations¶
For more details, refer to Rule Configuration.