Application Performance Metrics Detection
Monitors key APM metrics within the workspace. The system counts the traces that match the configured conditions within a specified time period and triggers an anomaly event when the count exceeds a custom threshold.
Detection Configuration
Detection Frequency
The execution frequency of the detection rule.
Detection Interval
The time range for querying metrics each time the task is executed. The available detection intervals vary depending on the detection frequency.
| Detection Frequency | Detection Interval (Dropdown Options) |
|---|---|
| 30s | 1m/5m/15m/30m/1h/3h |
| 1m | 1m/5m/15m/30m/1h/3h |
| 5m | 5m/15m/30m/1h/3h |
| 15m | 15m/30m/1h/3h/6h |
| 30m | 30m/1h/3h/6h |
| 1h | 1h/3h/6h/12h/24h |
| 6h | 6h/12h/24h |
| 12h | 12h/24h |
| 24h | 24h |
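The table above can be encoded as a lookup so that a script which generates monitor configurations can reject an interval the UI would not offer. This is a minimal sketch: `ALLOWED_INTERVALS`, `to_seconds`, and `is_valid_interval` are hypothetical names, not part of the product's API.

```python
# Detection interval options per detection frequency, copied from the table above.
ALLOWED_INTERVALS = {
    "30s": ["1m", "5m", "15m", "30m", "1h", "3h"],
    "1m": ["1m", "5m", "15m", "30m", "1h", "3h"],
    "5m": ["5m", "15m", "30m", "1h", "3h"],
    "15m": ["15m", "30m", "1h", "3h", "6h"],
    "30m": ["30m", "1h", "3h", "6h"],
    "1h": ["1h", "3h", "6h", "12h", "24h"],
    "6h": ["6h", "12h", "24h"],
    "12h": ["12h", "24h"],
    "24h": ["24h"],
}

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def to_seconds(duration: str) -> int:
    """Convert a duration such as '30s', '5m', or '24h' to seconds."""
    return int(duration[:-1]) * _UNIT_SECONDS[duration[-1]]


def is_valid_interval(frequency: str, interval: str) -> bool:
    """Return True if `interval` is offered for the given detection `frequency`."""
    return interval in ALLOWED_INTERVALS.get(frequency, [])
```

For example, `is_valid_interval("15m", "6h")` is true, while `is_valid_interval("5m", "1m")` is false because a 5-minute frequency only offers intervals of 5 minutes or longer.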
Detection Metrics
Set the metrics to detect. You can configure metric data for services within the workspace over a specified time range.
| Field | Description |
|---|---|
| Service | The APM service(s) in the current workspace to monitor. |
| Metrics | The specific metric to detect, such as request count, error request count, request error rate, average requests per second, average response time, and P50, P75, P90, or P99 response time. |
| Filter Conditions | Filter detection data by metric tags to limit the detection scope. One or more tag filters can be added, including fuzzy-match and fuzzy-non-match conditions. |
| Detection Dimensions | Any string (keyword) field in the configured data can be selected as a detection dimension; up to three fields are currently supported. A combination of detection dimension fields identifies a specific detection object. The system checks whether the statistical metric of each detection object meets the trigger condition threshold and, if it does, generates an event. For example, if the detection dimensions host and host_ip are selected, a detection object could be {host: host1, host_ip: 127.0.0.1}. |
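To illustrate how detection dimensions split data into detection objects, the sketch below groups data points by the selected dimension fields and flags each object whose aggregate exceeds a threshold. It assumes the points are plain dictionaries and that the metric is summed per object; the function names and the sum aggregation are illustrative assumptions, not the product's actual implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

MAX_DIMENSIONS = 3  # the documentation states up to three dimension fields


def detection_objects(points: Iterable[dict], dimensions: List[str], metric: str) -> Dict[Tuple, float]:
    """Sum `metric` per detection object, keyed by (field, value) pairs."""
    if len(dimensions) > MAX_DIMENSIONS:
        raise ValueError("at most three detection dimensions are supported")
    totals: Dict[Tuple, float] = defaultdict(float)
    for point in points:
        # The combination of dimension values identifies one detection object.
        key = tuple((field, point.get(field)) for field in dimensions)
        totals[key] += point[metric]
    return dict(totals)


def objects_over_threshold(totals: Dict[Tuple, float], threshold: float) -> List[dict]:
    """Return each detection object whose aggregated value exceeds `threshold`."""
    return [dict(key) for key, value in totals.items() if value > threshold]
```

With `dimensions=["host", "host_ip"]`, points from `host1`/`127.0.0.1` and `host2`/`127.0.0.2` are evaluated as two separate objects, so each can raise its own event.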
For trace metrics, the system counts the traces that match the conditions within a specified time period and triggers an anomaly event when the count exceeds a custom threshold. This can be used to send notifications about service trace anomalies.
| Field | Description |
|---|---|
| Source | The data source of the current detection metrics. |
| Filter Conditions | Filter trace spans by tags to limit the data scope of detection. One or more tag filter conditions can be added. |
| Aggregation Algorithm | "*" is selected by default, which corresponds to the count aggregation function. If another field is selected, the aggregation function automatically changes to count distinct (counting the data points in which the keyword appears). |
| Detection Dimensions | Any string (keyword) field in the configured data can be selected as a detection dimension; up to three fields are currently supported. A combination of detection dimension fields identifies a specific detection object. The system checks whether the statistical metric of each detection object meets the trigger condition threshold and, if it does, generates an event. For example, if the detection dimensions host and host_ip are selected, a detection object could be {host: host1, host_ip: 127.0.0.1}. |
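The difference between the default "*" (count) and selecting a field (count distinct) can be sketched in a few lines. This assumes spans are dictionaries of tags, that filter conditions are exact tag matches, and that count distinct means the number of distinct values of the selected field; `aggregate_spans` is a hypothetical helper, not the product's query engine.

```python
from typing import Any, Dict, List, Optional


def aggregate_spans(
    spans: List[Dict[str, Any]],
    field: str = "*",
    filters: Optional[Dict[str, Any]] = None,
) -> int:
    """Count matching spans ("*") or count distinct values of `field`."""
    filters = filters or {}
    # Keep only spans whose tags equal every filter value (exact match assumed).
    matched = [s for s in spans if all(s.get(k) == v for k, v in filters.items())]
    if field == "*":
        return len(matched)  # aggregation function: count
    # Any other field: count distinct (number of distinct values of that field).
    return len({s[field] for s in matched if field in s})
```

Counting `trace_id` with count distinct rather than counting spans keeps a trace with several matching spans from being counted more than once.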
Trigger Conditions
Set the trigger conditions for the alert levels. You can configure a trigger condition for any of the emergency, important, warning, and normal levels.
Configure the trigger conditions and severity. When the query returns multiple values, an event is generated if any one of them meets the trigger condition.
For more details, refer to Event Level Description.
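The rule that any single value meeting a condition generates an event can be sketched as follows. The operator set, the level names as dictionary keys, and the choice to check the most severe level first are assumptions for illustration; `evaluate_levels` is a hypothetical helper.

```python
import operator
from typing import Callable, Dict, Iterable, Optional, Tuple

# Comparison operators a trigger condition might use (hypothetical set).
OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt, ">=": operator.ge, "<": operator.lt,
    "<=": operator.le, "==": operator.eq, "!=": operator.ne,
}

# Assumed evaluation order: most severe level first.
LEVELS = ("emergency", "important", "warning")


def evaluate_levels(values: Iterable[float], conditions: Dict[str, Tuple[str, float]]) -> Optional[str]:
    """Return the first level whose condition any value meets, else None."""
    values = list(values)  # iterated once per configured level
    for level in LEVELS:
        if level not in conditions:
            continue  # levels without a configured condition are skipped
        op, threshold = conditions[level]
        # Any single value meeting the condition is enough to trigger.
        if any(OPERATORS[op](v, threshold) for v in values):
            return level
    return None
```

With `{"emergency": (">", 90), "warning": (">", 70)}`, the values `[50, 75]` yield `"warning"` because one value exceeds 70 even though the other does not.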
Continuous Trigger Judgment
When continuous trigger judgment is enabled, an event is triggered only after the trigger condition has been met for the configured number of consecutive detections. The maximum is 10.
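A consecutive-hit counter is one way to picture this behavior. The sketch assumes the count resets on any detection where the condition is not met, and that the event keeps firing while the condition stays met after the threshold is reached; `ContinuousTrigger` is a hypothetical class.

```python
class ContinuousTrigger:
    """Fire only after the condition is met N detections in a row (N <= 10)."""

    MAX_TIMES = 10  # maximum stated in the documentation

    def __init__(self, times: int) -> None:
        if not 1 <= times <= self.MAX_TIMES:
            raise ValueError("times must be between 1 and 10")
        self.times = times
        self.streak = 0

    def observe(self, condition_met: bool) -> bool:
        """Record one detection; return True when an event should be triggered."""
        # A single miss resets the consecutive count.
        self.streak = self.streak + 1 if condition_met else 0
        return self.streak >= self.times
```

With `times=3`, the sequence met, met, not met, met, met, met triggers only on the sixth detection, because the miss on the third resets the streak.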
Bulk Alert Protection
Enabled by default.
When the number of alerts generated by a single detection exceeds the preset threshold, the system will automatically switch to the status summary strategy: instead of processing each alert object individually, it will generate a small number of summary alerts based on the event status and push them.
This ensures the timeliness of notifications and significantly reduces alert noise, avoiding the risk of timeout due to processing too many alerts.
Note
When this switch is enabled, the event details generated by subsequent monitor detections will not display historical records and associated events.
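The switch from per-object alerts to status summaries can be sketched as below. The threshold value is not given in the documentation, so it is a parameter here, and the message formats and `build_notifications` name are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def build_notifications(alerts: List[Dict[str, str]], threshold: int) -> List[str]:
    """Return per-object notifications, or status summaries above `threshold`."""
    if len(alerts) <= threshold:
        # At or below the threshold: one notification per alert object.
        return [f"{a['object']}: {a['status']}" for a in alerts]
    # Above the threshold: one summary notification per event status.
    counts = Counter(a["status"] for a in alerts)
    return [f"status={status} count={n}" for status, n in sorted(counts.items())]
```

However many objects are affected, the summary path produces at most one notification per distinct status.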
Alert Levels

- Emergency (red), Important (orange), and Warning (yellow): triggered according to the configured condition operators.
- Normal (green): based on the configured number of detections:
    - Each execution of a detection task counts as 1 detection. For example, with a detection frequency of 5 minutes, 1 detection = 5 minutes.
    - The number of detections can be customized. For example, with a detection frequency of 5 minutes, 3 detections = 15 minutes.

| Level | Description |
|---|---|
| Normal | After the detection rule takes effect, if an emergency, important, or warning anomaly event is generated and the detection results return to normal within the configured number of detections, a recovery alert event is generated. |

Recovery alert events are not restricted by alert silence. If no number of detections is set for recovery alert events, the alert event never recovers and remains in the Events > Unrecovered Events List.
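The detection-count arithmetic and the recovery behavior can be sketched together. The sketch reads "returns to normal within the configured number of detections" as N consecutive normal detections after an anomaly, which is an interpretive assumption; `recovery_window` and `RecoveryTracker` are hypothetical names.

```python
from datetime import timedelta
from typing import Optional


def recovery_window(frequency: timedelta, detections: int) -> timedelta:
    """Time covered by a number of detections, e.g. 5 minutes x 3 = 15 minutes."""
    return frequency * detections


class RecoveryTracker:
    """Emit 'recovery' after N consecutive normal detections following an anomaly."""

    def __init__(self, recovery_detections: Optional[int]) -> None:
        self.recovery_detections = recovery_detections  # None means "not set"
        self.in_alert = False
        self.normal_streak = 0

    def observe(self, anomalous: bool) -> Optional[str]:
        if anomalous:
            self.in_alert = True
            self.normal_streak = 0
            return None
        if not self.in_alert or self.recovery_detections is None:
            return None  # nothing to recover, or recovery count not configured
        self.normal_streak += 1
        if self.normal_streak >= self.recovery_detections:
            self.in_alert = False
            self.normal_streak = 0
            return "recovery"
        return None
```

If `recovery_detections` is `None`, the tracker never returns `"recovery"`, mirroring an alert that stays in the unrecovered list.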
Data Gap
Seven strategies can be configured for the data gap state:

- Using the detection interval as the time range, evaluate the query result of the detection metric for the most recent minutes and do not trigger an event.
- Using the detection interval as the time range, evaluate the query result of the detection metric for the most recent minutes and treat the result as 0. The value is then compared against the thresholds configured in the trigger conditions above to decide whether to trigger an anomaly event.
- Specify a custom data gap time, then trigger one of the following: a data gap event, an emergency event, an important event, a warning event, or a recovery event. For these five strategies, it is recommended that the custom data gap time be >= the detection interval. If the configured time is <= the detection interval, the data gap and anomaly conditions may both be met; in that case, only the data gap handling result is applied.
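The seven strategies can be modeled as an enumeration with one handler. This is a sketch under assumptions: `GapStrategy`, `handle_gap`, and `gap_time_recommended` are hypothetical names, and `evaluate` stands in for whatever function applies the configured trigger conditions to a value.

```python
from datetime import timedelta
from enum import Enum
from typing import Callable, Optional


class GapStrategy(Enum):
    NO_EVENT = "no_event"            # evaluate the recent window, never trigger
    TREAT_AS_ZERO = "treat_as_zero"  # evaluate the recent window, substitute 0
    DATA_GAP = "data_gap"            # custom gap time -> data gap event
    EMERGENCY = "emergency"          # custom gap time -> emergency event
    IMPORTANT = "important"          # custom gap time -> important event
    WARNING = "warning"              # custom gap time -> warning event
    RECOVERY = "recovery"            # custom gap time -> recovery event


def handle_gap(strategy: GapStrategy, evaluate: Callable[[float], Optional[str]]) -> Optional[str]:
    """Return the event produced for a data gap under `strategy`, or None."""
    if strategy is GapStrategy.NO_EVENT:
        return None
    if strategy is GapStrategy.TREAT_AS_ZERO:
        # Re-compare 0 against the trigger-condition thresholds.
        return evaluate(0.0)
    return strategy.value


def gap_time_recommended(custom_gap: timedelta, detection_interval: timedelta) -> bool:
    """Check the recommendation that the custom gap time be >= the detection interval."""
    return custom_gap >= detection_interval
```

Treating a gap as 0 only produces an event if 0 itself meets a trigger condition, for example a "request count < 1" rule.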
Information Generation
Enable this option to generate "information" events for detection results that do not match the above trigger conditions.
Note
When trigger conditions, data gap, and information generation are configured simultaneously, the triggering priority is judged as follows: data gap > trigger conditions > information event generation.
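The priority order can be expressed as a short cascade. The inputs are assumed to be the already-computed data gap outcome and trigger-condition outcome (or `None`); `resolve_event` is a hypothetical helper.

```python
from typing import Optional


def resolve_event(gap_result: Optional[str], trigger_result: Optional[str], info_enabled: bool) -> Optional[str]:
    """Apply the priority: data gap > trigger conditions > information event."""
    if gap_result is not None:
        return gap_result  # data gap handling wins over everything else
    if trigger_result is not None:
        return trigger_result  # otherwise a matched trigger condition
    if info_enabled:
        return "information"  # results matching no condition become information events
    return None
```
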
Other Configurations
For more details, refer to Rule Configuration.