Interval Detection V2
Current Document Location
This document is the second step in the detection rule configuration process. After completing the configuration, please return to the main document to continue with the third step: Event Notification.
Data Scope: Metrics (M), Tracing (T), RUM (R)
The V2 version of interval detection builds a confidence interval from historical data to predict the normal range of fluctuation. The system then checks whether the characteristics of current data fall outside this confidence interval. If they do, the data is flagged as anomalous and an alert is triggered, helping to keep data stable and secure.
Key Features:
- In-depth Analysis: Predicts normal fluctuations based on confidence intervals built from historical data.
- Continuous Updates: The algorithm is continually refined by the Guance algorithm team to improve its data processing capabilities.
Concepts
Confidence Interval Range (confidence_interval): A parameter that sets how much fluctuation in time series data is tolerated within a specific detection range. Values range from 1% to 100%.
- When data volatility is high and randomness is strong, this value can be appropriately increased.
- When data fluctuations are regular, this value can be decreased.
If the confidence interval is:
- Too large: the upper and lower bounds widen and fewer anomalies are detected; in extreme cases, no anomalies are detected at all.
- Too small: the bounds narrow and too many anomalies may be detected.
Therefore, tuning this parameter to match the fluctuation characteristics of your data is key to balancing the sensitivity and accuracy of anomaly detection, avoiding both excessive false positives and missed anomalies.
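To make the trade-off concrete, here is a minimal sketch of how a confidence level could translate into upper and lower bounds. It assumes the historical values are roughly normally distributed and uses mean ± z·σ, with z taken from Python's `statistics.NormalDist`; this is an illustrative stand-in, not the Guance V2 algorithm, and the function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def confidence_bounds(history, confidence_interval):
    """Return (lower, upper) bounds built from historical values.

    Assumption: history is roughly normal, so the bounds are
    mean +/- z * stdev, where z is the two-sided z-score for the
    requested confidence level (1-100).
    """
    if not 1 <= confidence_interval <= 100:
        raise ValueError("confidence_interval must be between 1 and 100")
    # A true 100% interval is unbounded, so cap the level just below 1.
    level = min(confidence_interval / 100, 0.999)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    mu, sigma = mean(history), stdev(history)
    return mu - z * sigma, mu + z * sigma


def count_anomalies(history, current, confidence_interval):
    """Count current points that fall outside the confidence interval."""
    lower, upper = confidence_bounds(history, confidence_interval)
    return sum(1 for v in current if v < lower or v > upper)


history = [10, 12, 11, 13, 12, 11, 10, 12]
current = [11, 12, 15, 9, 30]
print(count_anomalies(history, current, 50))  # narrow interval → 3
print(count_anomalies(history, current, 99))  # wide interval → 2
```

Raising the level from 50% to 99% widens the bounds enough that the point at 9 is no longer flagged, which is the "too large reduces detections" effect described above.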
Detection Configuration
Detection Frequency
Set the time period for executing the detection.
- Fixed Frequency: 10 minutes (cannot be changed)
Detection Metrics
Define the detection data source and aggregation method based on DQL.
| Configuration Item | Description |
|---|---|
| Workspace | Defaults to the current workspace; you can switch to other authorized workspaces. Once authorized, detection metrics from other workspaces under the current account can be used to create monitors. |
| Data Type | The type of data being detected: Metrics, APM (Tracing), or RUM. |
| Measurement | The measurement where the current detection metric resides. |
| Metric | The metric targeted by the current detection. |
| Aggregation Algorithm | Supports Avg by (average), Min by (minimum), Max by (maximum), Sum by (sum), Last (last value), First by (first value), Count by (data point count), Count_distinct by (distinct data point count), p50 (median), p75 (75th percentile), p90 (90th percentile), p99 (99th percentile). |
| Detection Dimension | String-type (keyword) fields in the configured data can be selected as detection dimensions. Currently, up to three fields are supported. The combination of multiple detection dimension fields can determine a specific detection object (e.g., {host: host1, host_ip: 127.0.0.1}). |
| Filter Conditions | Filter detection data based on metric tags to limit the detection scope. Supports adding one or more tag filters, including fuzzy match and fuzzy non-match conditions. |
| Alias | Custom name for the detection metric. |
| Query Method | Supports Simple Query and Expression Query. |
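The way detection dimensions and an aggregation algorithm combine can be sketched as follows: points are grouped by the values of up to three dimension fields, so each group is one detection object, and the chosen aggregation is applied per group. This is a hypothetical illustration, not DQL: the point layout, function names, and the nearest-rank percentile are assumptions, and only exact-match tag filters are modeled (fuzzy match is not).

```python
import math
from collections import defaultdict


def _percentile(p):
    """Build a nearest-rank percentile aggregator (other methods may interpolate)."""
    def agg(values):
        ordered = sorted(values)
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]
    return agg


AGGREGATIONS = {
    "avg": lambda v: sum(v) / len(v),
    "min": min,
    "max": max,
    "sum": sum,
    "count": len,
    "count_distinct": lambda v: len(set(v)),
    "first": lambda v: v[0],
    "last": lambda v: v[-1],
    "p50": _percentile(50),
    "p75": _percentile(75),
    "p90": _percentile(90),
    "p99": _percentile(99),
}


def aggregate_by(points, dimensions, method, filters=None):
    """Group time-ordered points by dimension values and aggregate each group.

    points: dicts holding tag fields plus a numeric "value" key.
    Returns {detection object: aggregate}, where a detection object is a
    tuple of (field, value) pairs such as (("host", "host1"),).
    """
    if len(dimensions) > 3:
        raise ValueError("at most three detection dimensions are supported")
    agg = AGGREGATIONS[method]
    groups = defaultdict(list)
    for p in points:
        # Exact-match filters only; fuzzy (non-)match is not modeled here.
        if filters and any(p.get(k) != v for k, v in filters.items()):
            continue
        key = tuple((d, p.get(d)) for d in dimensions)
        groups[key].append(p["value"])
    return {key: agg(values) for key, values in groups.items()}


points = [
    {"host": "host1", "host_ip": "127.0.0.1", "value": 10},
    {"host": "host1", "host_ip": "127.0.0.1", "value": 30},
    {"host": "host2", "host_ip": "10.0.0.2", "value": 5},
]
print(aggregate_by(points, ["host", "host_ip"], "avg"))
```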
Trigger Conditions
Configure trigger conditions for each alert level (Critical, Severe, Important, Warning), as well as normal recovery conditions.
| Configuration Item | Description |
|---|---|
| Mutation Direction | Select the direction of data anomaly: |
| Confidence Interval Upper/Lower Bound Range | Sets the width of the predicted confidence interval, from 1% to 100%. For highly volatile metrics, increase the width to avoid false positives. |
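| Critical/Severe/Important/Warning | Triggered when Result >= [value]%. Result is the proportion of abnormal (mutated) data points, that is, points that fall outside the confidence interval; when this proportion reaches the configured value, an event of the corresponding level is generated. |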
| Normal | No events generated for [N] consecutive detections. After the detection rule takes effect, if the detection result returns from abnormal to normal and stays normal for the configured number of consecutive detections, a recovery alert event is triggered. |
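The Result percentage and the per-level thresholds can be sketched like this: count the points outside the interval in the chosen mutation direction, express them as a percentage, then check levels from most to least severe. The direction names `"up"`, `"down"`, and `"both"` and the lowercase level keys are hypothetical labels chosen for this sketch; they are not confirmed option values.

```python
LEVELS = ("critical", "severe", "important", "warning")  # most to least severe


def abnormal_ratio(values, lower, upper, direction):
    """Percentage of points outside [lower, upper] in the chosen direction."""
    if not values:
        return 0.0

    def is_abnormal(v):
        if direction == "up":
            return v > upper
        if direction == "down":
            return v < lower
        return v > upper or v < lower  # "both"

    return 100 * sum(1 for v in values if is_abnormal(v)) / len(values)


def evaluate_level(result, thresholds):
    """Return the most severe level whose threshold Result meets, else None."""
    for level in LEVELS:
        if level in thresholds and result >= thresholds[level]:
            return level
    return None


values = [1, 2, 3, 10, 12]
thresholds = {"critical": 80, "important": 30, "warning": 10}
result = abnormal_ratio(values, lower=0, upper=5, direction="up")
print(result, evaluate_level(result, thresholds))
```

Checking the most severe level first ensures a single detection produces one event at the highest matching level rather than several.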
Bulk Alert Protection¶
Enabled by default. When the number of alerts generated in a single detection exceeds the preset threshold (100), the system automatically switches to a summary-by-status strategy: instead of aggregating and muting events per object, it generates and pushes summary events grouped by status. This keeps notifications timely, significantly reduces noise, and avoids the risk of processing timeouts. When this switch is enabled, event details generated by this monitor after it detects anomalies will not display history records or associated events.
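A minimal sketch of the summary-by-status switch, assuming each alert is a hypothetical `(object_id, status)` pair: once the count exceeds the threshold of 100, per-object events are replaced by one summary per status. The function and field names are illustrative only.

```python
from collections import Counter

BULK_THRESHOLD = 100  # the preset threshold described above


def build_notifications(alerts, protection_enabled=True, threshold=BULK_THRESHOLD):
    """Return per-object events, or per-status summaries when the count exceeds threshold.

    alerts: list of (object_id, status) pairs from a single detection run.
    """
    if protection_enabled and len(alerts) > threshold:
        counts = Counter(status for _, status in alerts)
        # One summary event per status, sorted by status name for stable output.
        return [
            {"type": "summary", "status": status, "count": n}
            for status, n in sorted(counts.items())
        ]
    return [{"type": "object", "object": obj, "status": status} for obj, status in alerts]


alerts = [(f"host{i}", "critical") for i in range(100)]
alerts += [(f"host{i}", "warning") for i in range(100, 150)]
print(build_notifications(alerts))
```

Note that exactly 100 alerts still produce per-object events here, because the protection applies only when the threshold is exceeded.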
Note
Recovery alert events are not restricted by Alert Muting. If the number of consecutive detections required for recovery is not set, alert events will not recover and will remain in the Events > Unrecovered Events list.
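The recovery behavior can be sketched as a small per-object state machine: a streak of normal results is counted after an anomaly, and a recovery event fires once the streak reaches N. Here `recovery_after=None` models the case where the number is not set, so the alert never recovers. The class and method names are hypothetical, and muting is deliberately not modeled since recovery events bypass it.

```python
class RecoveryTracker:
    """Track recovery for one detection object (hypothetical helper)."""

    def __init__(self, recovery_after):
        self.recovery_after = recovery_after  # N, or None if not configured
        self.in_alert = False
        self.normal_streak = 0

    def observe(self, abnormal):
        """Feed one detection result; return "recovered" when recovery fires."""
        if abnormal:
            self.in_alert = True
            self.normal_streak = 0  # any anomaly resets the streak
            return None
        if not self.in_alert:
            return None  # nothing to recover from
        self.normal_streak += 1
        if self.recovery_after is not None and self.normal_streak >= self.recovery_after:
            self.in_alert = False
            self.normal_streak = 0
            return "recovered"
        return None


tracker = RecoveryTracker(recovery_after=3)
print([tracker.observe(x) for x in [True, False, False, False]])
```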
Data Gap
Processing strategy when the query result for the detection metric is empty within the detection interval:
| Option | Description |
|---|---|
| Do Not Trigger Event | Based on the time range of the detection interval, the query result of the detection metric for the most recent minutes is checked; if it is empty, no event is generated. |
| Treat Query Result as 0 | Based on the time range of the detection interval, the query result of the detection metric for the most recent minutes is treated as 0 and compared again against the thresholds configured in the trigger conditions above to determine whether to trigger an anomaly event. |
| Custom Fill and Trigger Event | Lets you customize the fill value for the detection interval and trigger one of the following event types: Data Gap Event, Critical Event, Important Event, Warning Event, or Recovery Event. With this strategy, it is recommended that the custom data gap time be ≥ the detection interval. If it is shorter than the detection interval, the data gap and anomaly conditions may both be met; in that case, the data gap result takes priority. |
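The three data gap options can be sketched as one dispatch function for an empty query result, plus a check for the recommended data gap time. The strategy strings, event-type keys, and the `evaluate` callback (standing in for the trigger-condition comparison) are hypothetical names used only for illustration.

```python
CUSTOM_EVENT_TYPES = {"data_gap", "critical", "important", "warning", "recovery"}


def handle_empty_result(strategy, evaluate, custom_event="data_gap"):
    """Decide the outcome when the detection query returned no data.

    strategy: "no_event", "treat_as_zero", or "custom" (hypothetical names).
    evaluate: callable mapping a list of values to an event type or None.
    custom_event: event type emitted under the custom strategy.
    """
    if strategy == "no_event":
        return None
    if strategy == "treat_as_zero":
        # Substitute 0 for the missing data and re-run the trigger conditions.
        return evaluate([0])
    if strategy == "custom":
        if custom_event not in CUSTOM_EVENT_TYPES:
            raise ValueError(f"unsupported event type: {custom_event}")
        return custom_event
    raise ValueError(f"unknown strategy: {strategy}")


def gap_window_ok(gap_minutes, interval_minutes):
    """True when the custom data gap time is >= the detection interval (recommended)."""
    return gap_minutes >= interval_minutes


def below_one(values):
    """Toy trigger condition: critical if the value is below 1."""
    return "critical" if values[0] < 1 else None


print(handle_empty_result("treat_as_zero", below_one))
```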
Information Generation
When this option is enabled, the system writes all detection results that do not match the above trigger conditions as "Information" events.
When trigger conditions, data gap, and information generation are configured simultaneously, the triggering priority is determined as follows: Data Gap > Trigger Conditions > Information Event Generation.
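The priority order above can be expressed as a simple first-match rule. This sketch assumes each stage has already produced an event type or `None`; the function name and the `"info"` label are hypothetical.

```python
def resolve_event(data_gap_event, trigger_event, info_enabled):
    """Pick one outcome per detection: Data Gap > Trigger Conditions > Information."""
    if data_gap_event is not None:
        return data_gap_event
    if trigger_event is not None:
        return trigger_event
    # Only results that matched nothing above are written as Information events.
    return "info" if info_enabled else None


print(resolve_event(None, "critical", info_enabled=True))
```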
Subsequent Configuration
After completing the above detection configuration, please continue to configure:
- Event Notification: Define event title, content, notification members, data gap handling, and associated incidents.
- Alert Configuration: Select alert strategies, set notification targets, and mute periods.
- Association: Associate dashboards for quick navigation to view data.
- Permissions: Set operation permissions to control who can edit/delete this monitor.
