Process Anomaly Detection¶
Current Document Location
This document is the second step in the detection rule configuration process. After completing the configuration, please return to the main document to continue with the third step: Event Notification.
Process anomaly detection monitors process data within the workspace and supports configuring alert trigger conditions for one or more field types in that data. By tracking process count statistics, it helps detect process anomalies promptly (such as a process disappearing or an abnormal increase in process count) and so helps keep system services stable.
Data Scope: Supports the Process data type.
Applicable to scenarios requiring monitoring of specific process running status. For example:
- Monitoring whether critical business processes (such as nginx, mysql, java, etc.) are running normally
- Monitoring whether the process count in a specific mode is abnormal (e.g., too many zombie processes)
- Monitoring whether scheduled task processes start normally
Detection Configuration¶
Detection Frequency¶
Set the time cycle for executing detection.
- Preset Options: 1 minute, 5 minutes (default), 10 minutes, 15 minutes, 30 minutes, 1 hour
- Crontab Mode: Click "Switch to Crontab Mode" to configure a custom cycle, supporting scheduled task execution based on seconds, minutes, hours, days, months, weeks, etc.
Detection Interval¶
Set the data time range queried for each detection (❗️The detection interval should be greater than or equal to the detection frequency, and should match the actual data reporting cycle to avoid missed detection or false alarms).
| Detection Frequency | Detection Interval (Dropdown Options) |
|---|---|
| 30s | 1m/5m/15m/30m/1h/3h |
| 1m | 1m/5m/15m/30m/1h/3h |
| 5m | 5m/15m/30m/1h/3h |
| 15m | 15m/30m/1h/3h/6h |
| 30m | 30m/1h/3h/6h |
| 1h | 1h/3h/6h/12h/24h |
| 6h | 6h/12h/24h |
| 12h | 12h/24h |
| 24h | 24h |
- Custom Format: Custom input for detection interval, e.g.: 20m (last 20 minutes), 2h (last 2 hours), 1d (last 1 day).
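The rule that the detection interval should be greater than or equal to the detection frequency can be checked before saving a configuration. The following is a minimal sketch, not the product's implementation: the helper names `parse_duration` and `interval_covers_frequency` are hypothetical, and the accepted unit suffixes (`s`, `m`, `h`, `d`) are assumed from the examples in this section.

```python
import re

# Unit suffixes assumed from the examples in this section (30s, 20m, 2h, 1d).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> int:
    """Convert a duration string such as '20m' or '2h' into seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


def interval_covers_frequency(frequency: str, interval: str) -> bool:
    """Return True when the detection interval is >= the detection frequency."""
    return parse_duration(interval) >= parse_duration(frequency)


if __name__ == "__main__":
    print(interval_covers_frequency("5m", "15m"))  # interval longer than frequency
    print(interval_covers_frequency("1h", "30m"))  # interval shorter than frequency
```

An interval shorter than the frequency leaves part of each cycle unqueried, which is one way missed detections can arise.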
Detection Target¶
Set the process data to be detected (❗️Avoid selecting high-cardinality fields as detection dimensions. If configured improperly, overly lenient trigger conditions may cause frequent alerts. The current query returns a maximum of 100,000 records).
The detection counts the occurrences of keywords for one or more field types in the current workspace's process data over the specified time range.
Configuration Elements¶
| Configuration Item | Description |
|---|---|
| Detection Type | Fixed as "Process Count Statistics", used to count the number of processes matching the conditions |
| Process | Requires manual input of process name, supports wildcards for fuzzy matching (e.g., k8s*, C:\\Windows\\*), special characters do not need escaping, multiple values are separated by "," |
| Filter Conditions | Supports filtering fields of process data to limit the data scope for detection; supports adding one or more tag filters; supports fuzzy match and fuzzy not-match filter conditions |
| Detection Dimensions | Any string-type (keyword) field in the configured data can be selected as a detection dimension. Currently, a maximum of three fields can be selected. Combining multiple detection dimension fields determines a specific detection target. The system judges whether the statistical metric for each detection target meets the threshold of the trigger conditions and generates an event if it does. (For example, with detection dimensions host and host_ip, a detection target could be {host: host1, host_ip: 127.0.0.1}.) |
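To illustrate how the process patterns and detection dimensions combine into per-target counts, here is a minimal sketch under stated assumptions. The record field `process_name` and the function `count_by_target` are hypothetical names used only for illustration. Wildcards are approximated with Python's `fnmatchcase`, which treats `[` and `?` specially, so it is not an exact model of the product's matching.

```python
from collections import Counter
from fnmatch import fnmatchcase


def count_by_target(records, patterns, dimensions):
    """Count matching process records for each detection target.

    records:    list of dicts, one per process record (hypothetical shape)
    patterns:   process name patterns, e.g. "nginx,k8s*".split(",")
    dimensions: up to three field names that together identify a target
    """
    if len(dimensions) > 3:
        raise ValueError("at most three detection dimensions are supported")
    counts = Counter()
    for record in records:
        name = record.get("process_name", "")
        if not any(fnmatchcase(name, pattern) for pattern in patterns):
            continue
        # The combination of dimension values identifies the detection target.
        target = tuple(record.get(field) for field in dimensions)
        counts[target] += 1
    return counts


if __name__ == "__main__":
    sample = [
        {"process_name": "nginx", "host": "host1", "host_ip": "127.0.0.1"},
        {"process_name": "k8s-agent", "host": "host1", "host_ip": "127.0.0.1"},
        {"process_name": "nginx", "host": "host2", "host_ip": "10.0.0.2"},
        {"process_name": "bash", "host": "host2", "host_ip": "10.0.0.2"},
    ]
    print(count_by_target(sample, "nginx,k8s*".split(","), ["host", "host_ip"]))
```

Each distinct combination of dimension values becomes its own target, which is why high-cardinality dimension fields multiply the number of targets being evaluated.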
Trigger Conditions¶
Configure trigger conditions and severity levels. When the query result contains multiple values, an event is generated if any value satisfies the trigger conditions.
Supports configuring four-level thresholds: Critical, Severe, Important, Warning, as well as Normal recovery conditions.
| Level | Configuration | Description |
|---|---|---|
| Critical | When Result >= [value] | Highest level alert, requires immediate action |
| Severe | When Result >= [value] | High-level alert, requires priority handling |
| Important | When Result >= [value] | Medium-level alert, requires attention |
| Warning | When Result >= [value] | Low-level alert, requires awareness |
| Normal | No events generated for [N] consecutive detections | After the detection rule takes effect, if the data detection result changes from abnormal (Critical, Severe, Important, Warning) to normal within the configured custom number of detections, a recovery alert event is triggered. ❗️ Recovery alert events are not restricted by Alert Silence. If the recovery alert event detection count is not set, the alert event will not recover and will remain in the Events > Unrecovered Events List |
For more details, refer to Event Level Description.
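The threshold levels and the "any value satisfies" rule can be sketched as follows. This is an illustrative sketch only: the names `match_level` and `evaluate_targets` are hypothetical, the `>=` comparison is taken from the table above, and checking levels from highest to lowest is an assumption about evaluation order.

```python
# Severity levels from the table above, checked from highest to lowest (assumed order).
SEVERITY_ORDER = ("critical", "severe", "important", "warning")


def match_level(result, thresholds):
    """Return the highest level whose threshold the result meets, or None."""
    for level in SEVERITY_ORDER:
        limit = thresholds.get(level)
        if limit is not None and result >= limit:
            return level
    return None


def evaluate_targets(results, thresholds):
    """Map each detection target whose value meets a threshold to its level.

    Any single target that satisfies a condition produces an event, even if
    the other targets in the same query result are normal.
    """
    events = {}
    for target, value in results.items():
        level = match_level(value, thresholds)
        if level is not None:
            events[target] = level
    return events


if __name__ == "__main__":
    limits = {"critical": 100, "warning": 10}
    print(evaluate_targets({"host1": 150, "host2": 12, "host3": 3}, limits))
```

Levels left unconfigured are simply skipped, so a rule may define only some of the four thresholds.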
Advanced Options¶
Consecutive Trigger Judgment¶
When enabled, events are generated only when trigger conditions are continuously met, avoiding false alarms due to transient fluctuations (❗️Maximum configuration limit is 10 times).
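A minimal sketch of consecutive trigger judgment follows. The class name `ConsecutiveTrigger` is hypothetical, and the behavior after the required count is reached (the condition keeps reporting true while the streak continues) is an assumption rather than documented behavior.

```python
class ConsecutiveTrigger:
    """Report a trigger only after the condition holds N detections in a row."""

    MAX_REQUIRED = 10  # upper limit stated in this section

    def __init__(self, required: int):
        if not 1 <= required <= self.MAX_REQUIRED:
            raise ValueError("required must be between 1 and 10")
        self.required = required
        self.streak = 0

    def observe(self, condition_met: bool) -> bool:
        """Record one detection result and return whether an event should fire."""
        # A single detection that does not meet the condition resets the streak.
        self.streak = self.streak + 1 if condition_met else 0
        return self.streak >= self.required


if __name__ == "__main__":
    trigger = ConsecutiveTrigger(required=3)
    for met in [True, True, False, True, True, True]:
        print(met, trigger.observe(met))
```

In this sketch a transient spike that lasts fewer than `required` detections never produces an event.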
Bulk Alert Protection¶
Enabled by default in the system.
When the number of alerts generated in a single detection exceeds the preset threshold, the system automatically switches to a status summary strategy: instead of processing each alert target individually, it generates a small number of summary alerts based on event status and pushes them.
This ensures notification timeliness while significantly reducing alert noise and avoiding timeout risks due to processing too many alerts.
When this switch is on, subsequent Event Details generated by the monitor for such anomalies will not display historical records and associated events.
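The switch from per-target alerts to status summaries can be pictured with the sketch below. It is a simplified illustration: the function `build_notifications`, the alert dictionary shape, and the `bulk_threshold` parameter are all hypothetical, and the actual preset threshold is not stated in this document.

```python
from collections import Counter


def build_notifications(alerts, bulk_threshold):
    """Return one message per alert, or one summary per status when the
    number of alerts in a single detection exceeds bulk_threshold."""
    if len(alerts) <= bulk_threshold:
        return [f"{alert['status']}: {alert['target']}" for alert in alerts]
    # Summary strategy: group by event status instead of handling each target.
    per_status = Counter(alert["status"] for alert in alerts)
    return [f"{status}: {count} targets" for status, count in sorted(per_status.items())]


if __name__ == "__main__":
    burst = [{"status": "critical", "target": f"host{i}"} for i in range(4)]
    burst.append({"status": "warning", "target": "host9"})
    print(build_notifications(burst, bulk_threshold=3))
```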
Data Gap¶
Processing strategy when the detection metric query result is empty within the detection interval:
| Option | Description |
|---|---|
| Do Not Trigger Event (Default) | Tied to the time range of the detection interval: whether an event is generated is judged from the query results of the detection metric over the last several minutes. Suitable for scenarios where data gaps are allowed |
| Treat Query Result as 0 | Tied to the time range of the detection interval: the query result of the detection metric over the last several minutes is treated as 0 and compared again with the thresholds configured in the Trigger Conditions above to determine whether to trigger an abnormal event |
| Custom Fill and Trigger Event | Supports custom filling of the detection interval value, and triggers the following event types respectively: Data Gap Event, Critical Event, Severe Event, Important Event, Warning Event, and Recovery Event. ❗️When choosing this strategy, it is recommended to configure the custom data gap time ≥ the detection interval time span; if the configured time ≤ the detection interval time span, situations where both data gap and anomaly conditions are met may occur, in which case the data gap processing result will be applied first |
When trigger conditions, data gap, and information generation are configured simultaneously, the triggering priority is judged as follows: Data Gap > Trigger Conditions > Information Event Generation.
That is: first judge whether there is a data gap, then judge whether trigger thresholds are met, and finally judge whether to generate an information event.
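The priority order above can be written as a small decision function. This is a sketch under assumptions: the function `resolve_outcome` and its parameters are hypothetical, and it models only the ordering (Data Gap, then Trigger Conditions, then Information), not the individual data gap strategies.

```python
def resolve_outcome(has_data, matched_level, info_enabled):
    """Pick the event type for one detection, following the stated priority.

    has_data:      False when the query result for the interval is empty
    matched_level: the threshold level met (e.g. "critical"), or None
    info_enabled:  whether Information Generation is switched on
    """
    if not has_data:
        return "data_gap"        # 1. data gap is judged first
    if matched_level is not None:
        return matched_level     # 2. then the trigger thresholds
    if info_enabled:
        return "information"     # 3. finally the information event
    return None                  # nothing to generate


if __name__ == "__main__":
    print(resolve_outcome(False, "critical", True))
    print(resolve_outcome(True, "warning", True))
    print(resolve_outcome(True, None, True))
```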
Information Generation¶
When this option is enabled, the system writes all detection results that do not match the above trigger conditions as "Information" events.
Applicable to scenarios requiring recording of normal status changes or low-priority information.
Subsequent Configuration¶
After completing the above detection configuration, please continue to configure:
- Event Notification: Define event title, content, notification members, data gap handling, and associated incidents;
- Alert Configuration: Select alert strategies, set notification targets, and mute periods;
- Association: Associate dashboards for quick jump to view data;
- Permissions: Set operation permissions to control who can edit/delete this monitor.