Application Performance Metrics Detection¶
Monitors key metric data for APM within the workspace. After you set a threshold range, the system automatically raises an alert when a metric exceeds the threshold.
Use Cases¶
- Monitoring the metric data of APM services;
- Counting the number of traces that meet the conditions within a specified time period, and triggering an anomaly event when the count exceeds a custom threshold.
Detection Configuration¶
Detection Frequency¶
The execution frequency of the detection rule.
Detection Interval¶
The time range queried for metrics each time the task runs. The available options depend on the detection frequency.
Detection Frequency | Detection Interval (Dropdown Options) |
---|---|
30s | 1m/5m/15m/30m/1h/3h |
1m | 1m/5m/15m/30m/1h/3h |
5m | 5m/15m/30m/1h/3h |
15m | 15m/30m/1h/3h/6h |
30m | 30m/1h/3h/6h |
1h | 1h/3h/6h/12h/24h |
6h | 6h/12h/24h |
12h | 12h/24h |
24h | 24h |
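The table above can be treated as a simple lookup when validating a rule before it is saved. The following is a minimal sketch, not part of the product: the names `INTERVAL_OPTIONS` and `is_valid_interval` are hypothetical, and only the values are taken from the table.

```python
# Allowed detection intervals per detection frequency, copied from the table above.
# The dictionary and function names are illustrative, not a product API.
INTERVAL_OPTIONS = {
    "30s": ["1m", "5m", "15m", "30m", "1h", "3h"],
    "1m": ["1m", "5m", "15m", "30m", "1h", "3h"],
    "5m": ["5m", "15m", "30m", "1h", "3h"],
    "15m": ["15m", "30m", "1h", "3h", "6h"],
    "30m": ["30m", "1h", "3h", "6h"],
    "1h": ["1h", "3h", "6h", "12h", "24h"],
    "6h": ["6h", "12h", "24h"],
    "12h": ["12h", "24h"],
    "24h": ["24h"],
}


def is_valid_interval(frequency: str, interval: str) -> bool:
    """Return True if the interval is offered for the given detection frequency."""
    return interval in INTERVAL_OPTIONS.get(frequency, [])


print(is_valid_interval("5m", "15m"))  # a 15m interval is offered for a 5m frequency
print(is_valid_interval("5m", "1m"))   # a 1m interval is not offered for a 5m frequency
```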
Detection Metrics¶
Set the metrics to detect. This configures which service metric data in the workspace is monitored over the specified time range.
Field | Description |
---|---|
Services | The APM services in the current workspace to monitor. |
Metrics | The specific metric to detect: request count, error request count, request error rate, average requests per second, average response time, or P50, P75, P90, or P99 response time. |
Filtering Conditions | Filter the detection data by the tags associated with the metrics to limit the scope of detection. Supports adding one or more tag filters, including fuzzy matching and non-matching conditions. |
Detection Dimensions | Any string-type (keyword) field in the configured data can be selected as a detection dimension. Currently, up to three fields are supported. Combining multiple detection dimension fields identifies a specific detection object. The system checks whether the statistical metric of that detection object meets the threshold of the trigger condition and, if so, generates an event. For example, selecting the detection dimensions host and host_ip yields a detection object like {host: host1, host_ip: 127.0.0.1}. |
Counts the number of traces that meet the conditions within a specified time period and triggers an anomaly event when the count exceeds a custom threshold. This can be used to send notifications about abnormal errors in service traces.
Field | Description |
---|---|
Source | The data source for the current detection metrics. |
Filtering Conditions | Filter trace spans by tag to limit the scope of the detection data. Supports adding one or more tag filtering conditions. |
Aggregation Algorithm | The default is "*", for which the aggregation function is count. If another field is selected, the aggregation function automatically changes to count distinct (the number of data points where the keyword appears). |
Detection Dimensions | Any string-type (keyword) field in the configured data can be selected as a detection dimension. Currently, up to three fields are supported. Combining multiple detection dimension fields identifies a specific detection object. The system checks whether the statistical metric of that detection object meets the threshold of the trigger condition and, if so, generates an event. For example, selecting the detection dimensions host and host_ip yields a detection object like {host: host1, host_ip: 127.0.0.1}. |
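To illustrate how detection dimensions and the aggregation algorithm interact, the sketch below groups spans into detection objects and applies either count or count distinct. It is an assumption-laden illustration: the function `aggregate_by_dimensions`, the sample spans, and the `resource` field are hypothetical, and count distinct is interpreted here as the number of distinct values of the selected field.

```python
from collections import defaultdict


def aggregate_by_dimensions(spans, dimensions, field="*"):
    """Group spans by the detection dimensions and aggregate each group.

    field="*" counts spans; any other field counts its distinct values
    (an assumed reading of "count distinct").
    """
    if len(dimensions) > 3:
        raise ValueError("up to three detection dimensions are supported")
    groups = defaultdict(list)
    for span in spans:
        # One detection object per unique combination of dimension values.
        key = tuple((d, span.get(d)) for d in dimensions)
        groups[key].append(span)
    result = {}
    for key, members in groups.items():
        if field == "*":
            result[key] = len(members)
        else:
            result[key] = len({m[field] for m in members if field in m})
    return result


# Hypothetical sample data.
spans = [
    {"host": "host1", "host_ip": "127.0.0.1", "resource": "/login"},
    {"host": "host1", "host_ip": "127.0.0.1", "resource": "/login"},
    {"host": "host1", "host_ip": "127.0.0.1", "resource": "/pay"},
    {"host": "host2", "host_ip": "10.0.0.2", "resource": "/login"},
]

counts = aggregate_by_dimensions(spans, ["host", "host_ip"])
distinct = aggregate_by_dimensions(spans, ["host", "host_ip"], field="resource")
for key, value in counts.items():
    print(dict(key), value, distinct[key])
```

Each resulting key corresponds to one detection object, and its aggregated value is what would be compared against the trigger condition.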
Trigger Conditions¶
Set the trigger conditions for each alert level: you can configure any of the emergency, critical, warning, or normal trigger conditions.
Configure trigger conditions and severity levels. When the query result contains multiple values, an event is generated if any value meets the trigger condition.
For more details, refer to Event Level Description.
If continuous trigger judgment is enabled, an event is triggered again after the trigger condition has been met multiple times in a row. The maximum is 10 times.
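The evaluation described above can be sketched as follows. This is only one possible reading, not the product's implementation: the function names, the tuple format for conditions, the severity order (emergency before critical before warning), and the interpretation of continuous triggering as "fire only after the condition held for a configured number of consecutive detections" are all assumptions.

```python
import operator

# Comparison operators a condition may use (illustrative subset).
OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}
# Assumed evaluation order, most severe first.
LEVELS = ["emergency", "critical", "warning"]


def evaluate(values, conditions):
    """Return the most severe level whose condition is met by ANY value, else None."""
    for level in LEVELS:
        if level not in conditions:
            continue
        op, threshold = conditions[level]
        if any(OPERATORS[op](v, threshold) for v in values):
            return level
    return None


def should_fire(history, required):
    """history: per-detection booleans, newest last.

    Assumed reading of continuous trigger judgment: fire only once the
    condition has been met `required` times in a row (1 to 10).
    """
    if not 1 <= required <= 10:
        raise ValueError("consecutive count must be between 1 and 10")
    return len(history) >= required and all(history[-required:])


# Hypothetical conditions on average response time in milliseconds.
conditions = {"critical": (">=", 900), "warning": (">=", 400)}
level = evaluate([120, 480, 950], conditions)
print(level)
print(should_fire([False, True, True, True], required=3))
```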
Alert Levels
- Alert levels Emergency (red), Critical (orange), and Warning (yellow): determined by evaluating the operators in the configured conditions.
- Alert level Normal (green): determined by the configured number of detections, as follows:
    - Each execution of a detection task counts as 1 detection. For example, if detection frequency = 5 minutes, then 1 detection = 5 minutes;
    - You can customize the number of detections. For example, if detection frequency = 5 minutes, then 3 detections = 15 minutes.

Level | Description |
---|---|
Normal | After the detection rule takes effect, if emergency, critical, or warning anomaly events occur and the data detection results return to normal within the configured number of detections, a recovery alert event is generated. |

Recovery alert events are not subject to alert muting. If the detection count for recovery alert events is not set, the alert event will not recover and will remain in the Events > Unrecovered Events List.
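The relationship between the detection frequency and the recovery window is plain multiplication. A minimal sketch, with the hypothetical helper name `recovery_window`:

```python
from datetime import timedelta


def recovery_window(frequency: timedelta, detections: int) -> timedelta:
    """Time span covered by a number of detections at a given detection frequency."""
    if detections < 1:
        raise ValueError("detections must be at least 1")
    return frequency * detections


# With a 5-minute detection frequency, 3 detections span 15 minutes.
window = recovery_window(timedelta(minutes=5), 3)
print(window)
```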
Data Gaps¶
You can configure seven strategies for handling data gaps.
- Linked to the detection interval time range: evaluate the query results of the detection metrics for the most recent minutes and do not trigger an event;
- Linked to the detection interval time range: evaluate the query results of the detection metrics for the most recent minutes and treat the query result as 0. The result is then compared again with the thresholds configured in the trigger conditions above to determine whether an anomaly event should be triggered;
- Fill in a custom detection interval value and choose to trigger a data gap event, an emergency event, a critical event, a warning event, or a recovery event. With this type of strategy, the custom data gap time should be >= the detection interval. If the configured time is <= the detection interval, a data gap and an anomaly may both be satisfied at the same time, in which case only the data gap handling result applies.
Information Generation¶
Enabling this option generates "information" events for detection results that do not match any of the above trigger conditions.
Note
If trigger conditions, data gaps, and information generation are configured simultaneously, the following priority order applies: data gaps > trigger conditions > information event generation.
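The priority order in the note can be expressed as a short decision function. This is a sketch under assumptions: `resolve_outcome` and its parameters are hypothetical names, and the return values are illustrative labels rather than the product's event types.

```python
def resolve_outcome(has_data, triggered_level, info_enabled, gap_configured=True):
    """Apply the priority: data gaps > trigger conditions > information events.

    has_data: whether the query returned data for the detection interval.
    triggered_level: "emergency", "critical", "warning", or None if no condition matched.
    """
    if gap_configured and not has_data:
        return "data_gap"          # data gap handling wins over everything else
    if triggered_level is not None:
        return triggered_level     # then any matched trigger condition
    if info_enabled:
        return "information"       # finally, the optional information event
    return None                    # nothing is generated


print(resolve_outcome(has_data=False, triggered_level="critical", info_enabled=True))
print(resolve_outcome(has_data=True, triggered_level=None, info_enabled=True))
```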
Other Configurations¶
For more details, refer to Rule Configuration.