Synthetic Testing Anomaly Detection¶
Current Document Location
This document is the second step in the detection rule configuration process. After completing the configuration, please return to the main document to continue with the third step: Event Notification.
This detection rule monitors Synthetic Tests data within the workspace. You can set threshold ranges for specified data volumes generated by testing tasks within a time period. Once the data volume reaches these thresholds, the system triggers alerts. Monitoring can be based on performance metrics or count statistics for protocol types such as HTTP, TCP, ICMP, WEBSOCKET, and Multistep Tests.
It suits monitoring scenarios such as URL availability, service port connectivity, and network latency in production environments. For example, you can monitor whether the average response time or availability rate of critical business interfaces meets standards.
Detection Configuration¶
Detection Frequency¶
Set the time cycle for executing the detection.
- Preset options: 1 minute, 5 minutes (default), 10 minutes, 15 minutes, 30 minutes, 1 hour, 6 hours, 12 hours, 24 hours
- Crontab mode: Click "Switch to Crontab Mode" to configure a custom cycle, supporting scheduled task execution based on seconds, minutes, hours, days, months, weeks, etc.
Detection Interval¶
Set the data time range queried for each detection (❗️The detection interval should be greater than or equal to the detection frequency, and should match the actual data reporting cycle to avoid missed detections or false positives).
| Detection Frequency | Detection Interval (Dropdown Options) |
|---|---|
| 30s | 1m/5m/15m/30m/1h/3h |
| 1m | 1m/5m/15m/30m/1h/3h |
| 5m | 5m/15m/30m/1h/3h |
| 15m | 15m/30m/1h/3h/6h |
| 30m | 30m/1h/3h/6h |
| 1h | 1h/3h/6h/12h/24h |
| 6h | 6h/12h/24h |
| 12h | 12h/24h |
| 24h | 24h |
- Custom format: Custom input for detection interval, e.g., 20m (last 20 minutes), 2h (last 2 hours), 1d (last 1 day).
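The rule that the detection interval should be greater than or equal to the detection frequency can be checked with a small helper. The following is a minimal sketch, not the product's own parser: the function names `parse_duration` and `interval_covers_frequency` are hypothetical, and the accepted unit suffixes are assumed from the examples above (`20m`, `2h`, `1d`) plus the `30s` row of the table.

```python
import re
from datetime import timedelta

# Unit suffixes assumed from the examples in the text (m, h, d);
# "s" is included only because the table lists a 30s frequency.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Convert a string such as '20m', '2h' or '1d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


def interval_covers_frequency(interval: str, frequency: str) -> bool:
    """Return True when the detection interval is >= the detection frequency."""
    return parse_duration(interval) >= parse_duration(frequency)
```

For example, `interval_covers_frequency("15m", "5m")` holds, while an interval of `1m` with a frequency of `5m` would leave four minutes of data unchecked between detections.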
Detection Metrics¶
Set the metrics for detection data. Supports setting specified data generated by all or individual testing tasks within the current workspace as detection metrics (❗️Avoid selecting high-cardinality fields as detection dimensions. If configured improperly, overly lenient trigger conditions may cause frequent alerts. The current query maximum return count is 100,000 records).
Supports two query modes:
- Testing Metrics (based on specific performance metrics)
- Count Statistics (based on data source queries)
Testing Metrics¶
Detection based on specific performance metrics of Synthetic Tests.
| Configuration Item | Description |
|---|---|
| Testing Type | Includes protocol types such as HTTP Tests, TCP Tests, ICMP Tests, WEBSOCKET Tests, Multistep Tests, etc. |
| Testing Address | Supports monitoring all or individual testing tasks within the current workspace's Synthetic Tests; obtains the corresponding testing task list based on the selected testing type. |
| Metric | Supports detection based on metric dimensions, including: Average Response Time, P50 Response Time, P75 Response Time, P90 Response Time, P99 Response Time, Availability Rate, Error Request Count, Request Count, etc. |
| Detection Dimension | Any string-type (keyword) field in the configuration data can be selected as a detection dimension. Currently, a maximum of three fields can be selected as detection dimensions. By combining multiple detection dimension fields, a specific detection object can be determined. The system judges whether the statistical metrics corresponding to a detection object meet the threshold of the trigger conditions. If the conditions are met, an event is generated. (For example, with the detection dimensions host and host_ip selected, a detection object could be {host: host1, host_ip: 127.0.0.1}.) |
| Filter Conditions | Filter the data of detection metrics based on the tags of the metrics to limit the data scope of detection; supports adding one or more tag filters; supports fuzzy match and fuzzy non-match filter conditions. |
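To illustrate how detection dimensions define detection objects, the sketch below groups records by the selected dimension fields and computes one average per object. It is an illustration under stated assumptions rather than the system's implementation: the function `average_by_dimensions`, the record layout, and the `response_time` field are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_by_dimensions(records, dimensions, value_field):
    """Group records by the chosen dimensions and average one field per group."""
    if len(dimensions) > 3:
        # The text states that at most three dimension fields can be selected.
        raise ValueError("at most three detection dimensions are supported")
    groups = defaultdict(list)
    for record in records:
        # Each distinct combination of dimension values is one detection object.
        key = tuple(record.get(name) for name in dimensions)
        groups[key].append(record[value_field])
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical sample data; field names are assumptions for illustration.
records = [
    {"host": "host1", "host_ip": "127.0.0.1", "response_time": 100},
    {"host": "host1", "host_ip": "127.0.0.1", "response_time": 300},
    {"host": "host2", "host_ip": "10.0.0.2", "response_time": 50},
]
averages = average_by_dimensions(records, ["host", "host_ip"], "response_time")
```

Each key in `averages` corresponds to one detection object, so the trigger conditions are evaluated once per key. This also shows why high-cardinality fields are a poor choice for dimensions: every distinct value creates another object to evaluate.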
Count Statistics¶
Statistical detection based on testing data sources.
You can perform query statistics on testing tasks by selecting corresponding data sources (http_dial_testing, tcp_dial_testing, icmp_dial_testing, websocket_dial_testing, multi_dial_testing, etc.) based on different testing types.
Supports limiting the detection scope through keyword search or tag filtering.
In addition to simple queries, expression query methods are also supported.
Additional Information¶
Additional fields are only used for extra queries and will not be used for trigger condition judgment. You can configure them into event notifications. If multiple matching values are detected, one record will be returned randomly.
Trigger Conditions¶
Configure trigger conditions and severity levels. When the query result contains multiple values, an event is generated if any value satisfies the trigger conditions.
Supports configuring four levels of thresholds: Critical, Severe, Important, Warning, and a Normal recovery condition.
| Level | Configuration | Description |
|---|---|---|
| Critical | When Result >= [value] | Highest level alert, requires immediate handling. |
| Severe | When Result >= [value] | High-level alert, requires priority handling. |
| Important | When Result >= [value] | Medium-level alert, requires attention. |
| Warning | When Result >= [value] | Low-level alert, requires notice. |
| Normal | No events generated for [N] consecutive detections | After the detection rule takes effect, if the data detection result changes from abnormal (Critical, Severe, Important, Warning) to normal within the configured custom number of detections, a recovery alert event is triggered. ❗️ Recovery alert events are not restricted by Alert Silence. If the recovery alert event detection count is not set, the alert event will not recover and will remain in the Events > Unrecovered Events List. |
For more details, refer to Event Level Description.
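The level thresholds can be pictured as an ordered comparison. The sketch below is a simplified model, assuming that levels are checked from most to least severe and that every condition uses `>=` as in the table; the names `classify` and `any_triggered` and the threshold values are hypothetical.

```python
from typing import Iterable, Optional

# Levels from the table, ordered from most to least severe (ordering of the
# check is an assumption made for this sketch).
LEVELS = ("critical", "severe", "important", "warning")


def classify(result: float, thresholds: dict) -> Optional[str]:
    """Return the first level whose threshold the result reaches, else None."""
    for level in LEVELS:
        limit = thresholds.get(level)
        if limit is not None and result >= limit:
            return level
    return None


def any_triggered(results: Iterable[float], thresholds: dict) -> bool:
    """An event is generated if any value in the query result meets a condition."""
    return any(classify(value, thresholds) is not None for value in results)


# Hypothetical thresholds for an average response time in milliseconds.
thresholds = {"critical": 2000, "severe": 1000, "important": 500, "warning": 200}
```

With these example thresholds, a result of 1200 is classified as `severe`, and a query result of `[50, 80, 250]` generates an event because one value reaches the warning threshold.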
Advanced Options¶
Consecutive Trigger Judgment¶
When enabled, events are generated only when trigger conditions are continuously met, avoiding false positives from transient fluctuations (❗️Maximum configuration limit is 10 times).
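Consecutive trigger judgment amounts to counting an unbroken streak of detections that meet the condition. The following is a minimal sketch of that idea; the class `ConsecutiveTrigger` is hypothetical, and the upper bound of 10 is taken from the limit stated above.

```python
class ConsecutiveTrigger:
    """Report an event only after the condition holds N detections in a row."""

    MAX_REQUIRED = 10  # maximum configuration limit stated in the text

    def __init__(self, required: int):
        if not 1 <= required <= self.MAX_REQUIRED:
            raise ValueError("required must be between 1 and 10")
        self.required = required
        self.streak = 0

    def observe(self, condition_met: bool) -> bool:
        # A single detection that misses the condition resets the streak,
        # which is what filters out transient fluctuations.
        self.streak = self.streak + 1 if condition_met else 0
        return self.streak >= self.required


trigger = ConsecutiveTrigger(required=3)
outcomes = [trigger.observe(met) for met in [True, True, False, True, True, True]]
```

In this run only the last detection produces an event, because the miss in the third position resets the count.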
Bulk Alert Protection¶
Enabled by default in the system.
When the number of alerts generated by a single detection exceeds a preset threshold, the system automatically switches to a status summary strategy: instead of processing each alert object individually, it generates a small number of summary alerts based on event status and pushes them.
This ensures notification timeliness while significantly reducing alert noise and avoiding timeout risks from processing too many alerts.
When this switch is on, subsequent event details generated by the monitor after detecting anomalies will not display historical records and associated events.
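The summary strategy can be sketched as a switch between per-object notifications and per-status counts. This is only an illustration: the function `build_notifications`, the alert record layout, and the `summary_threshold` parameter are hypothetical, and the actual preset threshold is not stated here.

```python
from collections import Counter


def build_notifications(alerts, summary_threshold):
    """Return one message per alert, or one per status when there are too many."""
    if len(alerts) <= summary_threshold:
        # Below the threshold: process each alert object individually.
        return [f"{alert['status']}: {alert['object']}" for alert in alerts]
    # Above the threshold: summarize by event status instead.
    counts = Counter(alert["status"] for alert in alerts)
    return [f"{status}: {count} detection objects" for status, count in counts.items()]


# Hypothetical alerts produced by a single detection.
alerts = [
    {"status": "critical", "object": "host1"},
    {"status": "warning", "object": "host2"},
    {"status": "critical", "object": "host3"},
]
summary = build_notifications(alerts, summary_threshold=2)
detailed = build_notifications(alerts, summary_threshold=5)
```

Here `summary` contains two messages (one per status) while `detailed` contains three, one per detection object.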
Data Gap¶
Processing strategy when the detection metric query result is empty within the detection interval:
| Option | Description |
|---|---|
| Do Not Trigger Event (Default) | Links to the time range of the detection interval. Determines whether to generate an event based on the query results of the detection metric within the last several minutes. Suitable for scenarios where data gaps are allowed. |
| Treat Query Result as 0 | Links to the time range of the detection interval. Treats the query result of the detection metric within the last several minutes as 0, and re-compares it with the thresholds configured in the Trigger Conditions above to determine whether to trigger an abnormal event. |
| Custom Fill and Trigger Event | Supports custom filling of the detection interval value and triggers the following event types respectively: Data Gap Event, Critical Event, Severe Event, Important Event, Warning Event, and Recovery Event. ❗️When choosing this strategy, it is recommended to configure the custom data gap time ≥ the detection interval time. If the configured time ≤ the detection interval time, situations where both data gap and anomaly conditions are met may occur. In such cases, the data gap processing result will be applied first. |
When Trigger Conditions, Data Gap, and Information Generation are configured simultaneously, the triggering priority is judged as follows: Data Gap > Trigger Conditions > Information Event Generation.
That is: first judge whether there is a data gap, then judge whether thresholds are triggered, and finally judge whether to generate an information event.
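The priority order can be expressed as a short decision function. The sketch below assumes a data gap strategy that emits a data gap event and models an empty query result as `None`; the function `decide_event`, its parameters, and the returned labels are hypothetical.

```python
from typing import Optional

LEVELS = ("critical", "severe", "important", "warning")


def decide_event(
    result: Optional[float], thresholds: dict, info_enabled: bool = False
) -> Optional[str]:
    """Apply the order: data gap, then trigger conditions, then information."""
    # 1. Data gap: the query returned nothing for the detection interval.
    if result is None:
        return "data_gap"
    # 2. Trigger conditions: compare against each configured threshold,
    #    most severe level first (an assumption of this sketch).
    for level in LEVELS:
        limit = thresholds.get(level)
        if limit is not None and result >= limit:
            return level
    # 3. Information: only written when the option is enabled.
    return "info" if info_enabled else None
```

For instance, `decide_event(None, {"warning": 200})` yields a data gap outcome without consulting the thresholds, and `decide_event(50, {"warning": 200}, info_enabled=True)` falls through to an information event.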
Information Generation¶
After enabling this option, the system writes all detection results that do not match the above trigger conditions as "Information" events.
Suitable for scenarios requiring recording of normal status changes or low-priority information.
Subsequent Configuration¶
After completing the above detection configuration, please continue to configure:
- Event Notification: Define the event title, content, notification members, data gap handling, and associated incidents;
- Alert Configuration: Select alert strategies and set notification targets and mute periods;
- Association: Associate dashboards so that you can jump to the related data quickly;
- Permissions: Set operation permissions to control who can edit or delete this monitor.