APM Metrics Detection
Monitors key APM metrics within the workspace. After you set threshold ranges, the system automatically issues alerts when a metric exceeds its threshold.
- Monitor metric data of APM services.
- Count the traces that meet the specified conditions within a time period, and trigger an incident when the count exceeds a custom threshold.
Detection Configuration
Detection Frequency
The execution frequency of detection rules.
Detection Interval
The time range queried for metrics each time the detection task runs. The available options depend on the selected detection frequency:
Detection Frequency | Detection Interval (Dropdown Options) |
---|---|
30s | 1m/5m/15m/30m/1h/3h |
1m | 1m/5m/15m/30m/1h/3h |
5m | 5m/15m/30m/1h/3h |
15m | 15m/30m/1h/3h/6h |
30m | 30m/1h/3h/6h |
1h | 1h/3h/6h/12h/24h |
6h | 6h/12h/24h |
12h | 12h/24h |
24h | 24h |
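The table above can also be expressed as a lookup, which may help when checking rule definitions outside the UI. The following is a minimal Python sketch; the names `INTERVAL_OPTIONS` and `is_valid_interval` are hypothetical and not part of the product.

```python
# Allowed detection intervals for each detection frequency,
# copied from the table above. Names here are illustrative only.
INTERVAL_OPTIONS = {
    "30s": ["1m", "5m", "15m", "30m", "1h", "3h"],
    "1m": ["1m", "5m", "15m", "30m", "1h", "3h"],
    "5m": ["5m", "15m", "30m", "1h", "3h"],
    "15m": ["15m", "30m", "1h", "3h", "6h"],
    "30m": ["30m", "1h", "3h", "6h"],
    "1h": ["1h", "3h", "6h", "12h", "24h"],
    "6h": ["6h", "12h", "24h"],
    "12h": ["12h", "24h"],
    "24h": ["24h"],
}


def is_valid_interval(frequency: str, interval: str) -> bool:
    """Return True if `interval` is offered for the given `frequency`."""
    return interval in INTERVAL_OPTIONS.get(frequency, [])
```

For example, `is_valid_interval("5m", "1m")` is false, because the 1m interval is not offered once the frequency is 5m.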
Detection Metrics
Select the metric data to detect for services in the workspace within the specified time range.
Field | Description |
---|---|
Service | The APM service in the current workspace to monitor. |
Metric | The specific metric to detect, including request count, error request count, request error rate, average requests per second, average response time, P50 response time, P75 response time, P90 response time, P99 response time, etc. |
Filter Conditions | Filter the detection data by metric tags to define the detection scope. Supports one or more tag filters, including fuzzy match and fuzzy not-match conditions. |
Detection Dimensions | Any string-type (keyword) field in the configuration data can be selected as a detection dimension; up to three fields are currently supported. Combining multiple dimension fields defines a specific detection object. The system checks whether the statistical metric of that detection object meets the threshold set in the trigger condition and, if so, generates an event. For example, with host and host_ip selected as detection dimensions, a detection object could be {host: host1, host_ip: 127.0.0.1}. |
Count the traces that meet the specified conditions within a time period and trigger an incident when the count exceeds a custom threshold. This can be used for notifications about service trace errors.
Field | Description |
---|---|
Source | Data source of the current metric. |
Filter Conditions | Filter trace spans by tags to define the detection scope. Supports one or more tag filter conditions. |
Aggregation Algorithm | The default selection is "*", for which the aggregation function is count. If another field is selected, the aggregation function automatically changes to count distinct, which counts the distinct values of that keyword field. |
Detection Dimensions | Any string-type (keyword) field in the configuration data can be selected as a detection dimension; up to three fields are currently supported. Combining multiple dimension fields defines a specific detection object. The system checks whether the statistical metric of that detection object meets the threshold set in the trigger condition and, if so, generates an event. For example, with host and host_ip selected as detection dimensions, a detection object could be {host: host1, host_ip: 127.0.0.1}. |
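To illustrate how detection dimensions and the aggregation algorithm interact, here is a minimal Python sketch that groups spans into detection objects and applies either count (for "*") or count distinct (for a named field). The function `aggregate_by_dimensions`, the sample spans, and the `trace_id` field are assumptions made for illustration; this is not the product's implementation.

```python
from collections import defaultdict

MAX_DIMENSIONS = 3  # the document states up to three dimension fields


def aggregate_by_dimensions(spans, dimensions, field="*"):
    """Group spans by the dimension fields, then aggregate each group.

    field == "*" -> count of spans in the group
    otherwise    -> count of distinct values of `field` in the group
    """
    if len(dimensions) > MAX_DIMENSIONS:
        raise ValueError("at most three detection dimensions are supported")

    groups = defaultdict(list)
    for span in spans:
        # The detection object is the combination of dimension values.
        key = tuple((dim, span.get(dim)) for dim in dimensions)
        groups[key].append(span)

    result = {}
    for key, members in groups.items():
        if field == "*":
            result[key] = len(members)
        else:
            result[key] = len({m[field] for m in members if field in m})
    return result


# Hypothetical sample data.
spans = [
    {"host": "host1", "host_ip": "127.0.0.1", "trace_id": "a"},
    {"host": "host1", "host_ip": "127.0.0.1", "trace_id": "a"},
    {"host": "host1", "host_ip": "127.0.0.1", "trace_id": "b"},
    {"host": "host2", "host_ip": "10.0.0.2", "trace_id": "c"},
]

counts = aggregate_by_dimensions(spans, ["host", "host_ip"])
distinct = aggregate_by_dimensions(spans, ["host", "host_ip"], field="trace_id")
```

With this sample, the detection object {host: host1, host_ip: 127.0.0.1} has a count of 3 and a distinct `trace_id` count of 2; each value would then be compared against the trigger condition separately.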
Trigger Conditions
Configure trigger conditions and their severity levels. You can configure any of Emergency, Critical, Warning, or Normal as a trigger condition. When a query returns multiple values, an event is generated if any single value meets the trigger condition.
For more details, refer to Event Level Description.
If Continuous Trigger Judgment is enabled, the event is triggered again after the trigger condition has been met in multiple consecutive detections. The consecutive count can be set to at most 10.
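The rule that any single value meeting a condition generates an event can be sketched as follows. This is a hedged Python illustration: the `evaluate` function, the shape of `conditions`, and the choice to check the most severe level first are assumptions, since the document does not specify how overlapping levels are resolved.

```python
import operator

# Comparison operators a condition may use (illustrative subset).
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}

# Assumption: levels are checked from most to least severe.
LEVEL_ORDER = ["emergency", "critical", "warning"]


def evaluate(values, conditions):
    """Return the first level whose condition is met by ANY value, else None.

    `conditions` maps a level name to an (operator, threshold) pair,
    for example {"critical": (">", 200)}.
    """
    for level in LEVEL_ORDER:
        if level not in conditions:
            continue
        op, threshold = conditions[level]
        if any(OPS[op](value, threshold) for value in values):
            return level
    return None
```

For instance, with conditions `{"warning": (">", 100), "critical": (">", 200)}` and query values `[10, 250, 40]`, the single value 250 is enough for the sketch to return `"critical"`.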
Alert Levels
- Emergency (Red), Critical (Orange), Warning (Yellow): Based on configured conditions and comparison operators.
- Normal (Green): Based on configured detection counts, explained as follows:
    - Each execution of a detection task equals one detection. For example, if Detection Frequency = 5 minutes, then 1 detection = 5 minutes;
    - Custom detection counts can be set. For instance, if Detection Frequency = 5 minutes, then 3 detections = 15 minutes.

Level | Description |
---|---|
Normal | After the detection rule becomes effective, if data returns to normal within the configured custom detection count after generating emergency, critical, or warning events, a recovery alert event will be generated. Recovery alert events are not subject to alert mute settings. If no recovery detection count is configured, alert events will not recover and will remain in the Events > Unrecovered Events List. |
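The relationship between detection count and elapsed time is simple multiplication. A small Python sketch, using a hypothetical helper named `recovery_window`:

```python
from datetime import timedelta


def recovery_window(detection_frequency: timedelta, detection_count: int) -> timedelta:
    """Time span covered by `detection_count` consecutive detections."""
    return detection_frequency * detection_count


# 3 detections at a 5-minute detection frequency cover 15 minutes.
window = recovery_window(timedelta(minutes=5), 3)
```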
Data Gap Handling
Seven strategies are available for handling data gaps, grouped as follows:
- Do not trigger events: in coordination with the detection interval time range, the query results of the detected metric for the most recent minutes are evaluated, and no event is triggered.
- Treat query results as 0: in coordination with the detection interval time range, the query results of the detected metric for the most recent minutes are evaluated and treated as 0. The result is then compared again with the threshold set in Trigger Conditions above to determine whether an incident should be triggered.
- Customize the fill value for the detection interval and trigger one of the following: a data gap event, an emergency event, a critical event, a warning event, or a recovery event. For these strategies, it is recommended that the custom data gap time be >= the detection interval duration. If the configured time is <= the detection interval duration, the data gap and anomaly conditions might both be met at the same time, in which case only the data gap handling result applies.
Information Event Generation
When enabled, detection results not matching any of the above trigger conditions will be written as "information" events.
Note
When trigger conditions, data gaps, and information event generation are all configured, the following priority applies: Data Gaps > Trigger Conditions > Information Event Generation.
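The priority order in the note can be sketched as a short decision function. This is a simplified Python illustration under stated assumptions: `resolve_event` and its parameters are hypothetical, `gap_event` stands for whatever the chosen data gap strategy produces (it may be `None` for the "do not trigger events" strategy), and the "treat query results as 0" strategy, which re-runs the threshold comparison, is not modeled.

```python
def resolve_event(has_data_gap, gap_event, triggered_level, info_events_enabled):
    """Apply the priority: Data Gaps > Trigger Conditions > Information events.

    has_data_gap        -- True if the detection found a data gap
    gap_event           -- event produced by the data gap strategy, or None
    triggered_level     -- level matched by a trigger condition, or None
    info_events_enabled -- True if Information Event Generation is enabled
    """
    if has_data_gap:
        # Data gap handling takes precedence over everything else.
        return gap_event
    if triggered_level is not None:
        return triggered_level
    if info_events_enabled:
        return "information"
    return None
```

For example, when a data gap and a critical condition are both present, the sketch returns the data gap result rather than `"critical"`.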
Additional Configuration
For more details, refer to Rule Configuration.