Log Monitoring¶
The Log Monitoring feature is used to monitor all log data generated by log collectors within a workspace. It supports setting alert rules based on log keywords to quickly identify abnormal patterns that deviate from expected behavior, such as abnormal tags appearing in log text. This enables timely detection and response to potential security threats or system issues.
Use Cases¶
Mainly applicable to detecting code exceptions or task-scheduling issues in IT monitoring scenarios, for example, monitoring an excessively high log error rate.
Configuration¶
Check Frequency¶
Refers to the execution frequency of the detection rule; defaults to 5 minutes.
Check Interval¶
Refers to the time range for metric queries each time the task is executed. The available check intervals vary depending on the check frequency.
| Check Frequency | Check Interval (Dropdown Options) |
|---|---|
| 1m | 1m/5m/15m/30m/1h/3h |
| 5m | 5m/15m/30m/1h/3h |
| 15m | 15m/30m/1h/3h/6h |
| 30m | 30m/1h/3h/6h |
| 1h | 1h/3h/6h/12h/24h |
| 6h | 6h/12h/24h |
| 12h | 12h/24h |
| 24h | 24h |
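As an illustration of the frequency-to-interval constraint above, the following sketch (plain Python, not the product's API; the `ALLOWED_INTERVALS` mapping simply mirrors the table) validates a check frequency/interval pair:

```python
# Allowed check intervals per check frequency, mirroring the table above.
ALLOWED_INTERVALS = {
    "1m":  ["1m", "5m", "15m", "30m", "1h", "3h"],
    "5m":  ["5m", "15m", "30m", "1h", "3h"],
    "15m": ["15m", "30m", "1h", "3h", "6h"],
    "30m": ["30m", "1h", "3h", "6h"],
    "1h":  ["1h", "3h", "6h", "12h", "24h"],
    "6h":  ["6h", "12h", "24h"],
    "12h": ["12h", "24h"],
    "24h": ["24h"],
}

def validate(check_frequency: str, check_interval: str) -> None:
    """Raise if the chosen interval is not offered for the chosen frequency."""
    options = ALLOWED_INTERVALS.get(check_frequency)
    if options is None:
        raise ValueError(f"unknown check frequency: {check_frequency}")
    if check_interval not in options:
        raise ValueError(
            f"check interval {check_interval} is not available for "
            f"check frequency {check_frequency}; choose one of {options}"
        )

validate("5m", "15m")   # OK: 15m appears in the 5m dropdown
```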
Detection Metrics¶
Monitors the number of logs containing the specified keywords within a certain time range for the designated detection objects in the log list.
| Field | Description |
|---|---|
| Index | The index to which the current detection metric belongs; multiple selections are allowed. ❗️ After indices are set up in Logs > Indices, when "Logs" is selected as the data source for chart queries, the index corresponding to the log content can be chosen; the default index is selected by default. |
| Source | The data source of the current detection metric; supports selecting all (*) or a specific single data source. |
| Keyword Search | Supports keyword search. |
| Filter Conditions | Filters the data of the detection metrics based on the metric's tags to limit the detection data scope; supports adding one or multiple tag filters; supports fuzzy match and fuzzy not match filter conditions. |
| Aggregation Algorithm | Defaults to "*", corresponding to the count function. If another field is selected, the function automatically changes to count distinct (counts distinct data points where the keyword appears). |
| Detection Dimension | Any string-type (keyword) field in the configured data can be selected as a detection dimension; currently, up to three fields can be selected. Combining multiple detection dimension fields determines a specific detection object. The system checks whether the statistical metric for each detection object meets the threshold in the trigger conditions; if it does, an event is generated. (For example, selecting the detection dimensions host and host_ip means a detection object could be {host: host1, host_ip: 127.0.0.1}.) When the detection object is "Logs", the default detection dimensions are status, host, service, source, and filename. (See the sketch after this table.) |
| Query Method | Supports simple query and expression query. When expression query is used and contains multiple queries, all queries share the same log detection object; for example, if expression query A's detection object is "Logs", then expression query B's detection object is also "Logs". |
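To make the idea of counting keyword hits per detection object concrete, here is a minimal Python sketch; the field names (`host`, `service`) and the in-memory log list are illustrative stand-ins for an actual index query:

```python
from collections import Counter

# Sample log records; in practice these come from the selected index and source.
logs = [
    {"host": "host1", "service": "web", "message": "request failed: timeout"},
    {"host": "host1", "service": "web", "message": "request ok"},
    {"host": "host2", "service": "db",  "message": "connection failed"},
]

keyword = "failed"
dimensions = ("host", "service")   # up to three string-type fields

# Count logs containing the keyword, grouped by detection object.
counts = Counter(
    tuple(log[d] for d in dimensions)
    for log in logs
    if keyword in log["message"]
)

for obj, n in counts.items():
    print(dict(zip(dimensions, obj)), "->", n)
# {'host': 'host1', 'service': 'web'} -> 1
# {'host': 'host2', 'service': 'db'} -> 1
```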
Trigger Conditions¶
Set trigger conditions for alert severity levels: you can configure trigger conditions for any of Critical, Important, Warning, or Normal.
When the query result contains multiple values, an event is generated if any single value meets the trigger condition.
For more details, refer to Event Level Description.
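The "any value matches" rule can be sketched as below; the severity order and threshold values are assumptions for illustration, not product defaults:

```python
# Hypothetical thresholds: keyword-hit count per detection object.
THRESHOLDS = [           # evaluated from most to least severe
    ("critical", 100),
    ("important", 50),
    ("warning",   10),
]

def evaluate(values: list[float]) -> str | None:
    """Return the highest severity for which any queried value crosses its threshold."""
    for severity, threshold in THRESHOLDS:
        if any(v >= threshold for v in values):
            return severity
    return None

print(evaluate([3, 12, 7]))   # "warning": one value (12) crosses 10
print(evaluate([1, 2]))       # None: no event generated
```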
Consecutive Trigger Judgment¶
If Consecutive Trigger Judgment is enabled, you can configure that an event is generated only after the trigger condition is met consecutively for a specified number of times. The maximum is 10 times.
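A minimal sketch of the consecutive-trigger idea, assuming each detection returns a boolean "condition met" result; the streak-reset behaviour is inferred from the description above and the function name is hypothetical:

```python
def consecutive_trigger(results: list[bool], required: int) -> list[int]:
    """Return the detection indexes at which an event would be generated,
    i.e. whenever the condition has been met `required` times in a row."""
    events, streak = [], 0
    for i, met in enumerate(results):
        streak = streak + 1 if met else 0   # any miss resets the streak
        if streak >= required:
            events.append(i)
    return events

# Condition met on detections 1-3; with required=3, only detection 3 generates an event.
print(consecutive_trigger([False, True, True, True, False], required=3))  # [3]
```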
Bulk Alert Protection¶
Enabled by default.
When the number of alerts generated in a single detection exceeds a preset threshold, the system automatically switches to a status summary strategy: instead of processing each alert object individually, it generates a small number of summary alerts based on event status and pushes them.
This ensures the timeliness of notifications while significantly reducing alert noise and avoiding timeout risks caused by processing too many alerts.
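Conceptually, the switch from per-object alerts to status summaries can be sketched as follows; the threshold value and summary format are illustrative, not the product's actual behaviour:

```python
from collections import Counter

BULK_THRESHOLD = 20   # illustrative value; the real preset threshold may differ

def build_notifications(alerts: list[dict]) -> list[str]:
    """Per-object alerts below the threshold, status summaries above it."""
    if len(alerts) <= BULK_THRESHOLD:
        return [f"{a['status']}: {a['object']}" for a in alerts]
    by_status = Counter(a["status"] for a in alerts)
    return [f"{status}: {count} objects affected" for status, count in by_status.items()]

alerts = [{"object": f"host{i}", "status": "critical"} for i in range(50)]
print(build_notifications(alerts))   # ['critical: 50 objects affected']
```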
Note
When this switch is enabled, the Event Details subsequently generated by the monitor for such anomalies will not display historical records or associated events.
Alert Level¶
- Alert Level Critical (red), Important (orange), Warning (yellow);
- Alert Level Normal (green): based on the configured number of detection times, explained as follows:
    - Each execution of a detection task counts as 1 detection. For example, if Check Frequency = 5 minutes, then 1 detection = 5 minutes;
    - The number of detections can be customized. For example, if Check Frequency = 5 minutes, then 3 detections = 15 minutes.

| Level | Description |
|---|---|
| Normal | After the detection rule takes effect, if an abnormal event (Critical, Important, Warning) occurs and the data detection result returns to normal within the configured custom number of detections, a recovery alert event is generated. ❗️ Recovery alert events are not subject to Alert Silence restrictions. If the number of detections for recovery alert events is not set, the alert event will not recover and will remain in the Events > Unrecovered Events List. |
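The recovery rule can be illustrated with the sketch below, which follows a literal reading of the table above (a recovery event is generated if the result returns to normal within the configured number of detections); the function name and inputs are hypothetical:

```python
def recovery_event(detections: list[bool], recovery_window: int) -> bool:
    """`detections` holds one 'abnormal?' flag per detection after an alert fired.
    A recovery event is generated if a normal result appears within the window."""
    window = detections[:recovery_window]
    return any(not abnormal for abnormal in window)

# Check Frequency = 5 minutes, recovery window = 3 detections (15 minutes).
print(recovery_event([True, True, False], recovery_window=3))  # True: recovery event generated
print(recovery_event([True, True, True],  recovery_window=3))  # False: event stays unrecovered
```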
Data Gap¶
Seven strategies can be configured for the data gap status:

- Linked to the check interval time range: evaluate the query result of the detection metric over the most recent check interval and do not trigger an event;
- Linked to the check interval time range: evaluate the query result of the detection metric over the most recent check interval and treat the query result as 0; the result is then re-compared with the thresholds configured in the Trigger Conditions above to determine whether to trigger an abnormal event;
- Custom-fill the check interval value and trigger a data gap event, a critical event, an important event, a warning event, or a recovery event. For this type of strategy, the recommended custom data gap time is >= the check interval time span; if the configured time is <= the check interval time span, a query may satisfy both the data gap and the abnormal conditions, in which case only the data gap processing result is applied.
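A simplified sketch of how the strategy families above differ when a query returns no data; the strategy names and the evaluation function are hypothetical, and the custom-fill variants are collapsed into a single "data_gap_event" branch:

```python
def handle_data_gap(query_result, strategy, trigger_fn):
    """`query_result` is None when no data was returned for the check interval."""
    if query_result is not None:
        return trigger_fn(query_result)   # normal evaluation path
    if strategy == "no_event":
        return None                       # strategy: do not trigger an event
    if strategy == "treat_as_zero":
        return trigger_fn(0)              # strategy: re-compare 0 with the thresholds
    if strategy == "data_gap_event":
        return "data_gap"                 # strategy: raise the configured data gap event
    raise ValueError(f"unknown strategy: {strategy}")

# Hypothetical trigger: warning when the count is >= 10, otherwise nothing.
trigger = lambda count: "warning" if count >= 10 else None
print(handle_data_gap(None, "treat_as_zero", trigger))   # None: 0 does not cross the threshold
print(handle_data_gap(None, "data_gap_event", trigger))  # 'data_gap'
```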
Information Generation¶
Enable this option to generate "Information" events for detection results that do not match any of the above trigger conditions.
Note
When trigger conditions, data gap, and information generation are configured simultaneously, triggering is judged according to the following priority: Data Gap > Trigger Conditions > Information Event Generation.
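The priority rule in the note above can be sketched as a single evaluation order; the function and event names are illustrative, and the "treat the result as 0" data gap strategy is ignored for brevity:

```python
def evaluate_detection(has_data: bool, severity: str | None,
                       data_gap_enabled: bool, info_enabled: bool) -> str | None:
    """Apply the documented priority: Data Gap > Trigger Conditions > Information."""
    if data_gap_enabled and not has_data:
        return "data_gap"        # highest priority
    if severity is not None:
        return severity          # Critical / Important / Warning / Normal
    if info_enabled:
        return "information"     # generated only when nothing else matched
    return None

print(evaluate_detection(has_data=True,  severity=None,      data_gap_enabled=True, info_enabled=True))  # 'information'
print(evaluate_detection(has_data=False, severity="warning", data_gap_enabled=True, info_enabled=True))  # 'data_gap'
```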
Other Configuration¶
For more details, refer to Rule Configuration.