APM Metrics Monitoring


Monitors key APM Metrics data within the workspace, or counts the number of traces that meet the specified conditions within a given time period. When the custom threshold is exceeded, an Incident is triggered.

Monitoring Configuration

Monitoring Frequency

The execution frequency of the monitoring rule.

Monitoring Interval

The time range for querying Metrics each time the task is executed. The available monitoring intervals vary depending on the monitoring frequency.

| Monitoring Frequency | Monitoring Interval (Dropdown Options) |
| --- | --- |
| 30s | 1m/5m/15m/30m/1h/3h |
| 1m | 1m/5m/15m/30m/1h/3h |
| 5m | 5m/15m/30m/1h/3h |
| 15m | 15m/30m/1h/3h/6h |
| 30m | 30m/1h/3h/6h |
| 1h | 1h/3h/6h/12h/24h |
| 6h | 6h/12h/24h |
| 12h | 12h/24h |
| 24h | 24h |
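
The pairing of frequencies and intervals above can be expressed as a simple lookup. The following is a minimal sketch for validating a configuration before saving it; `ALLOWED_INTERVALS` and `is_valid_interval` are hypothetical names used for illustration and are not part of the product's API.

```python
# Hypothetical lookup that mirrors the frequency/interval table above.
ALLOWED_INTERVALS = {
    "30s": ["1m", "5m", "15m", "30m", "1h", "3h"],
    "1m": ["1m", "5m", "15m", "30m", "1h", "3h"],
    "5m": ["5m", "15m", "30m", "1h", "3h"],
    "15m": ["15m", "30m", "1h", "3h", "6h"],
    "30m": ["30m", "1h", "3h", "6h"],
    "1h": ["1h", "3h", "6h", "12h", "24h"],
    "6h": ["6h", "12h", "24h"],
    "12h": ["12h", "24h"],
    "24h": ["24h"],
}


def is_valid_interval(frequency: str, interval: str) -> bool:
    """Return True if the interval is offered for the given frequency."""
    return interval in ALLOWED_INTERVALS.get(frequency, [])
```

For instance, `is_valid_interval("5m", "15m")` is accepted, while `is_valid_interval("5m", "1m")` is rejected because the interval is shorter than the frequency.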

Monitoring Metrics

Select the Metrics to monitor. This option monitors the Metrics data of services within the workspace over the specified time range.

| Field | Description |
| --- | --- |
| Service | Monitor the APM services within the current workspace. |
| Metric | Specific monitoring Metrics, including request count, error request count, request error rate, average requests per second, average response time, P50 response time, P75 response time, P90 response time, P99 response time, etc. |
| Filter Conditions | Filter the monitoring data based on the tags of the Metrics to limit the monitoring scope. Supports adding one or more tag filters, and also supports fuzzy match and fuzzy non-match filter conditions. |
| Monitoring Dimensions | Any string type (keyword) fields in the configuration data can be selected as monitoring dimensions, currently supporting up to three fields. By combining multiple monitoring dimension fields, a specific monitoring object can be determined. The system will determine whether the statistical Metrics of this monitoring object meet the threshold of the trigger conditions, and if so, an Incident will be generated. |

For example, if the monitoring dimensions host and host_ip are selected, the monitoring object can be {host: host1, host_ip: 127.0.0.1}.
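
To illustrate how dimension fields combine into monitoring objects, here is a minimal sketch that groups data points by the selected dimensions and checks each group against a threshold. The data shape, the function names, the use of a sum as the statistic, and the `>` comparison are all assumptions made for illustration; the product's actual evaluation may differ.

```python
from collections import defaultdict


def group_by_dimensions(points, dimensions):
    """Sum values per monitoring object (one object per dimension combination)."""
    if len(dimensions) > 3:
        raise ValueError("at most three monitoring dimensions are supported")
    totals = defaultdict(float)
    for point in points:
        # The tuple of dimension values identifies one monitoring object.
        key = tuple(point["tags"].get(d) for d in dimensions)
        totals[key] += point["value"]
    return dict(totals)


def objects_over_threshold(points, dimensions, threshold):
    """Return the monitoring objects whose summed value exceeds the threshold."""
    totals = group_by_dimensions(points, dimensions)
    return [
        dict(zip(dimensions, key))
        for key, total in totals.items()
        if total > threshold
    ]
```

Each returned dictionary has the same shape as the monitoring object in the example above, such as `{"host": "host1", "host_ip": "127.0.0.1"}`.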

Count the number of traces that meet the conditions within the specified time period. When the custom threshold is exceeded, an Incident is triggered, which can be used for notifications of trace anomalies.

| Field | Description |
| --- | --- |
| Source | The data source of the current monitoring Metrics. |
| Filter Conditions | Filter the trace span through tags to limit the data scope of monitoring. Supports adding one or more tag filter conditions. |
| Aggregation Algorithm | By default, "*" is selected, corresponding to the aggregation function count. If other fields are selected, the aggregation function automatically changes to count distinct (counting the number of data points where the keyword appears). |
| Monitoring Dimensions | Any string type (keyword) fields in the configuration data can be selected as monitoring dimensions, currently supporting up to three fields. By combining multiple monitoring dimension fields, a specific monitoring object can be determined. The system will determine whether the statistical Metrics of this monitoring object meet the threshold of the trigger conditions, and if so, an Incident will be generated. |

For example, if the monitoring dimensions host and host_ip are selected, the monitoring object can be {host: host1, host_ip: 127.0.0.1}.
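
The difference between the two aggregation functions can be sketched as follows. This example assumes that count distinct carries its usual meaning of counting unique values of the selected field; the `aggregate` function and the span dictionaries are hypothetical and only illustrate the idea.

```python
def aggregate(spans, field="*"):
    """Count all spans for "*", otherwise count unique values of the field."""
    if field == "*":
        # Default selection: plain count of matching spans.
        return len(spans)
    # Any other field: count distinct values, skipping spans without the field.
    return len({span[field] for span in spans if field in span})


spans = [
    {"resource": "/login"},
    {"resource": "/login"},
    {"resource": "/checkout"},
]
total = aggregate(spans)                  # counts every span
unique = aggregate(spans, "resource")     # counts unique resource values
```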

Trigger Conditions

Set the trigger conditions for the alert levels. You can configure trigger conditions for any of the Critical, Major, Warning, and Normal levels.

Configure the trigger conditions and severity. When the query returns multiple values, any value that meets the trigger conditions generates an Incident.

For more details, refer to Event Level Description.

If Continuous Trigger Judgment is enabled, an Incident is generated again only after the trigger conditions have been met for the configured number of consecutive judgments. The maximum is 10 consecutive judgments.
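
One way to picture Continuous Trigger Judgment is as a check over the most recent judgment results. The sketch below is an illustration only; `should_generate_incident` is a hypothetical helper, and the representation of judgments as a list of booleans is an assumption.

```python
def should_generate_incident(results, required_consecutive):
    """Return True if the last N judgments all met the trigger conditions.

    results: list of bool, oldest first; True means the conditions were met.
    required_consecutive: configured number of consecutive judgments (1 to 10).
    """
    if not 1 <= required_consecutive <= 10:
        raise ValueError("consecutive judgments must be between 1 and 10")
    if len(results) < required_consecutive:
        return False
    return all(results[-required_consecutive:])
```

With three required judgments, the history `[False, True, True, True]` generates an Incident, while `[True, False, True]` does not, because the run was interrupted.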

Alert Levels
  1. Alert Levels Critical (Red), Major (Orange), Warning (Yellow): Based on the configured condition judgment operators.

  2. Alert Level Normal (Green): Based on the configured number of detections, explained as follows:

    • Each execution of a monitoring task counts as 1 detection, e.g., Monitoring Frequency = 5 minutes, then 1 detection = 5 minutes;
    • The number of detections can be customized, e.g., Monitoring Frequency = 5 minutes, then 3 detections = 15 minutes.
    | Level | Description |
    | --- | --- |
    | Normal | After the monitoring rule takes effect, if Critical, Major, or Warning Incidents are generated, and the monitoring data returns to normal within the configured number of detections, a recovery alert Incident will be generated. |
    ⚠ Recovery alert Incidents are not restricted by Alert Silence. If the number of detections for recovery alert Incidents is not set, the alert Incident will not recover and will remain in the Incidents > Unrecovered Incidents List.
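
The relationship between the number of detections and elapsed time is a simple multiplication. As a small sketch, with `recovery_window` being a hypothetical helper name:

```python
from datetime import timedelta


def recovery_window(frequency: timedelta, detections: int) -> timedelta:
    """Time span covered by a number of detections at a monitoring frequency."""
    if detections < 1:
        raise ValueError("the number of detections must be at least 1")
    return frequency * detections


# Monitoring Frequency = 5 minutes, 3 detections = 15 minutes.
window = recovery_window(timedelta(minutes=5), 3)
```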

Data Gap

Seven strategies can be configured for handling data gaps.

  1. Based on the monitoring interval's time range, evaluate the query result of the monitoring Metrics over the most recent minutes, and do not trigger an Incident.

  2. Based on the monitoring interval's time range, evaluate the query result of the monitoring Metrics over the most recent minutes, and treat the query result as 0. The query result is then compared again with the threshold configured in the Trigger Conditions above to determine whether to trigger an Incident.

  3. Customize the fill value for the monitoring interval and trigger one of the following: a data gap Incident, a Critical Incident, a Major Incident, a Warning Incident, or a recovery Incident. When selecting this type of strategy, it is recommended that the custom data gap time be >= the monitoring interval. If the configured time is <= the monitoring interval, both the data gap and anomaly conditions may be met at the same time, in which case only the data gap processing result is applied.
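
The strategies can be modeled as an enumeration with a small resolver. This sketch assumes that the seven strategies are "do not trigger", "treat as 0", and the five custom-fill outcomes (data gap, Critical, Major, Warning, recovery). `GapStrategy`, `resolve_gap`, and the `evaluate_conditions` callback are hypothetical names, not product identifiers.

```python
from enum import Enum


class GapStrategy(Enum):
    NO_TRIGGER = "no_trigger"
    TREAT_AS_ZERO = "treat_as_zero"
    DATA_GAP = "data_gap"
    CRITICAL = "critical"
    MAJOR = "major"
    WARNING = "warning"
    RECOVERY = "recovery"


def resolve_gap(strategy, evaluate_conditions):
    """Return the Incident status to raise for a data gap, or None.

    evaluate_conditions: callable taking a numeric value and returning the
    matching alert level name (or None), standing in for the Trigger Conditions.
    """
    if strategy is GapStrategy.NO_TRIGGER:
        return None
    if strategy is GapStrategy.TREAT_AS_ZERO:
        # The gap is filled with 0 and re-compared with the thresholds.
        return evaluate_conditions(0)
    # Custom strategies raise the chosen Incident type directly.
    return strategy.value
```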

Information Generation

When this option is enabled, the monitoring results that do not match the above trigger conditions will generate "Information" Incidents.

Note

When trigger conditions, data gap, and information generation are configured simultaneously, the triggering is judged in the following priority: data gap > trigger conditions > information Incident generation.
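
The priority order in this note can be sketched as a chain of checks. The function name, its parameters, and the `"info"` status string are assumptions chosen for illustration.

```python
def pick_incident(data_gap_status, trigger_status, information_enabled):
    """Apply the priority: data gap > trigger conditions > information.

    data_gap_status / trigger_status: a status string, or None if not matched.
    information_enabled: whether Information Generation is turned on.
    """
    if data_gap_status is not None:
        return data_gap_status
    if trigger_status is not None:
        return trigger_status
    if information_enabled:
        return "info"
    return None
```

When both a data gap and a trigger condition match in the same judgment, only the data gap result is returned.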

Other Configurations

For more details, refer to Rule Configuration.
