Network Data Monitoring¶
A tool for monitoring network performance metrics within a workspace, allowing users to set threshold ranges and trigger alerts when metrics exceed these thresholds. Guance supports configuring alerting rules for single metrics and allows customization of alert severity levels.
Use Cases¶
Monitors metric data from the netflow and httpflow data sources. For example, you can monitor the request count, error count, and error rate of the httpflow data source on a host.
Configuration¶
Check Frequency¶
Refers to the execution frequency of the detection rule; defaults to 5 minutes.
Check Interval¶
Refers to the time range queried for the detection metrics each time the task is executed. The available check intervals vary depending on the check frequency.
| Check Frequency | Check Interval (Dropdown Options) |
|---|---|
| 30s | 1m/5m/15m/30m/1h/3h |
| 1m | 1m/5m/15m/30m/1h/3h |
| 5m | 5m/15m/30m/1h/3h |
| 15m | 15m/30m/1h/3h/6h |
| 30m | 30m/1h/3h/6h |
| 1h | 1h/3h/6h/12h/24h |
| 6h | 6h/12h/24h |
| 12h | 12h/24h |
| 24h | 24h |
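The table above can be expressed as a simple lookup, which is handy when validating a monitor configuration before submitting it. This is a minimal sketch: the mapping is copied from the table, while the names `ALLOWED_INTERVALS` and `is_valid_interval` are hypothetical and not part of any Guance API.

```python
# Allowed check intervals per check frequency, copied from the table above.
# The names used here are illustrative only, not a Guance API.
ALLOWED_INTERVALS = {
    "30s": ["1m", "5m", "15m", "30m", "1h", "3h"],
    "1m": ["1m", "5m", "15m", "30m", "1h", "3h"],
    "5m": ["5m", "15m", "30m", "1h", "3h"],
    "15m": ["15m", "30m", "1h", "3h", "6h"],
    "30m": ["30m", "1h", "3h", "6h"],
    "1h": ["1h", "3h", "6h", "12h", "24h"],
    "6h": ["6h", "12h", "24h"],
    "12h": ["12h", "24h"],
    "24h": ["24h"],
}


def is_valid_interval(frequency: str, interval: str) -> bool:
    """Return True if `interval` is offered for `frequency` in the table."""
    return interval in ALLOWED_INTERVALS.get(frequency, [])


print(is_valid_interval("5m", "15m"))  # 15m is listed for a 5m frequency
print(is_valid_interval("5m", "1m"))   # 1m is not listed for a 5m frequency
```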
Detection Metrics¶
Select the metrics to detect. You can monitor metric data for all services or for a single service in the workspace list over a given time range.
| Field | Description |
|---|---|
| Data Source | Supported: netflow, httpflow. |
| Metric | netflow: Bytes sent, Bytes received, TCP delay, TCP jitter, TCP connections, TCP retransmissions, TCP closures; httpflow: Request count, Error count, Error rate, Average response time, P99 response time, P95 response time, P75 response time, P50 response time. |
| Filters | Filter the data of the detection metrics based on the tags of the metrics to limit the scope of the detected data. Supports adding one or more tag filters, and supports fuzzy match and fuzzy not match filter conditions. |
| Detection Dimensions | Any string-type (keyword) field in the configuration data can be selected as a detection dimension. Currently, a maximum of three fields can be selected. Combining multiple detection dimension fields identifies a specific detection object. Guance determines whether the statistical metrics of each detection object meet the threshold of the trigger condition and generates an event if they do. For example, selecting the detection dimensions host and host_ip means a detection object could be {host: host1, host_ip: 127.0.0.1}. |
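To illustrate how detection dimensions identify detection objects, the sketch below groups sample rows by the selected dimension fields. It only models the idea described in the table; the function `group_by_dimensions`, the `value` field, and the sample rows are assumptions made for illustration, not Guance internals.

```python
from collections import defaultdict

MAX_DIMENSIONS = 3  # the table above states a maximum of three fields


def group_by_dimensions(rows, dimensions):
    """Group metric values by the combination of dimension field values.

    Each distinct combination (for example host + host_ip) is one
    detection object. `rows` and the `value` key are hypothetical.
    """
    if len(dimensions) > MAX_DIMENSIONS:
        raise ValueError("at most three detection dimensions are supported")
    groups = defaultdict(list)
    for row in rows:
        key = tuple((dim, row.get(dim)) for dim in dimensions)
        groups[key].append(row["value"])
    return dict(groups)


rows = [
    {"host": "host1", "host_ip": "127.0.0.1", "value": 12},
    {"host": "host1", "host_ip": "127.0.0.1", "value": 30},
    {"host": "host2", "host_ip": "10.0.0.2", "value": 5},
]
objects = group_by_dimensions(rows, ["host", "host_ip"])
for detection_object, values in objects.items():
    print(dict(detection_object), values)
```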
Trigger Conditions¶
Set trigger conditions for the alert levels. You can freely configure any one of the following trigger conditions: Critical, High, Warning, Normal.
Configure the trigger conditions and severity. If the query result contains multiple values, an event is generated if any value meets the trigger condition.
For more details, refer to Event Level Description.
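The rule that any single value meeting a condition generates an event can be sketched as follows. The operators, thresholds, and the choice to check the most severe level first are assumptions made for this example; the source does not specify how levels are ordered during evaluation.

```python
import operator

# Comparison operators available in this sketch (an illustrative subset).
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def evaluate(values, conditions):
    """Return the first level whose condition is met by ANY value, else None.

    `conditions` is a list of (level, op, threshold) tuples, ordered from
    most to least severe. That ordering is an assumption of this sketch.
    """
    for level, op, threshold in conditions:
        if any(OPS[op](value, threshold) for value in values):
            return level
    return None


# Hypothetical thresholds for an error-count metric.
conditions = [("Critical", ">=", 100), ("High", ">=", 50), ("Warning", ">=", 10)]
print(evaluate([3, 60, 7], conditions))  # one value (60) meets the High condition
```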
Consecutive Trigger Judgment¶
If Consecutive Trigger Judgment is enabled, an event is triggered only after the trigger condition has been met for a configured number of consecutive detections. The maximum is 10.
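One way to picture consecutive trigger judgment is a streak counter that resets whenever a detection does not meet the condition. This is a sketch under assumptions: the class name is hypothetical, and whether the real monitor keeps firing on every detection after the streak is reached is not stated in the source.

```python
class ConsecutiveTrigger:
    """Fire only after the condition is met N detections in a row."""

    MAX_REQUIRED = 10  # the documented maximum

    def __init__(self, required: int):
        if not 1 <= required <= self.MAX_REQUIRED:
            raise ValueError("required must be between 1 and 10")
        self.required = required
        self.streak = 0

    def observe(self, condition_met: bool) -> bool:
        """Record one detection result; return True once the streak is long enough."""
        self.streak = self.streak + 1 if condition_met else 0
        return self.streak >= self.required


trigger = ConsecutiveTrigger(required=3)
results = [trigger.observe(met) for met in [True, True, False, True, True, True]]
print(results)  # only the last detection completes a streak of three
```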
Bulk Alert Protection¶
Enabled by default.
When the number of alerts generated in a single detection exceeds a preset threshold, the system automatically switches to a status aggregation strategy: instead of processing each alert object individually, it generates a small number of summary alerts based on event status and pushes them.
This ensures timely notification while significantly reducing alert noise and avoiding timeout risks caused by processing too many alerts.
Note
When this switch is enabled, the subsequent event details generated by the monitor after detecting anomalies will not display historical records and associated events.
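The switch from per-object alerts to status-based summaries can be sketched roughly as below. The threshold value, the alert dictionary shape, and the message format are all hypothetical; the source only says that a preset threshold exists and that summaries are grouped by event status.

```python
from collections import Counter


def build_notifications(alerts, threshold):
    """Return one message per alert, or per-status summaries above the threshold.

    `threshold` stands in for the preset limit mentioned above; its real
    value is not documented here.
    """
    if len(alerts) <= threshold:
        return [f"{alert['status']}: {alert['object']}" for alert in alerts]
    counts = Counter(alert["status"] for alert in alerts)
    return [f"{status}: {count} objects" for status, count in sorted(counts.items())]


alerts = [
    {"object": "host1", "status": "critical"},
    {"object": "host2", "status": "warning"},
    {"object": "host3", "status": "warning"},
    {"object": "host4", "status": "critical"},
    {"object": "host5", "status": "warning"},
]
print(build_notifications(alerts, threshold=3))   # aggregated by status
print(build_notifications(alerts, threshold=10))  # one message per object
```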
Alert Level¶
- Alert Level Critical (red), High (orange), Warning (yellow);
- Alert Level Normal (green): Based on the configured number of detection times, explained as follows:
    - Each execution of a detection task counts as 1 detection. For example, if Check Frequency = 5 minutes, then 1 detection = 5 minutes;
    - The number of detections can be customized. For example, if Check Frequency = 5 minutes, then 3 detections = 15 minutes.

| Level | Description |
|---|---|
| Normal | After the detection rule takes effect, if the data detection result returns to normal within the configured number of custom detections after a Critical, High, or Warning abnormal event is generated, a recovery alert event is generated. ❗️ Recovery alert events are not subject to Alert Silence restrictions. If the number of detections for recovery alert events is not set, the alert event will not recover and will remain in the Events > Unrecovered Events List. |
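The relationship between the number of detections and elapsed time described above is a simple multiplication. A minimal sketch, with a hypothetical function name:

```python
def recovery_window_minutes(check_frequency_minutes: int, detections: int) -> int:
    """Time span covered by a number of detections at a given check frequency."""
    return check_frequency_minutes * detections


# With Check Frequency = 5 minutes, 3 detections cover 15 minutes.
print(recovery_window_minutes(5, 3))  # 15
```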
Data Gap¶
Seven strategies can be configured for data gap status.
- Linked with the check interval time range: judge the query result of the detection metrics for the most recent minutes and do not trigger an event.
- Linked with the check interval time range: judge the query result of the detection metrics for the most recent minutes and treat the query result as 0. The query result is then re-compared with the threshold configured in the Trigger Conditions above to determine whether to trigger an abnormal event.
- Custom fill of the check interval value, with one of five outcomes: trigger a data gap event, trigger a critical event, trigger a high event, trigger a warning event, or trigger a recovery event. For this type of strategy, the custom data gap time is recommended to be >= the check interval time span. If the configured time is <= the check interval time span, a data gap and an abnormal condition may both be met at the same time. In such cases, only the data gap processing result is applied.
Information Generation¶
Enable this option to generate "Information" events for detection results that do not match any of the above trigger conditions.
Note
If Trigger Conditions, Data Gap, and Information Generation are configured simultaneously, the triggering is judged according to the following priority: Data Gap > Trigger Conditions > Information Event Generation.
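The priority order in the note can be sketched as a chain of checks. The function and parameter names are hypothetical; each argument represents the outcome of a step that has already been evaluated separately.

```python
def resolve_event(data_gap_event, trigger_event, information_enabled):
    """Pick the event to generate: Data Gap > Trigger Conditions > Information.

    `data_gap_event` and `trigger_event` are the event level produced by
    that step, or None if the step did not match (assumed representation).
    """
    if data_gap_event is not None:
        return data_gap_event
    if trigger_event is not None:
        return trigger_event
    if information_enabled:
        return "Information"
    return None


# A data gap result wins even when a trigger condition also matched.
print(resolve_event("Data Gap", "Critical", True))
```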
Other Configuration¶
For more details, refer to Rule Configuration.