
Log Detection


Log detection is used to monitor all log data generated by log collectors within the workspace. It supports setting alerts based on keywords in the logs to promptly identify abnormal patterns that do not conform to expected behavior (e.g., abnormal tags in log text data). It is widely applicable to scenarios such as detecting code anomalies or task scheduling in IT monitoring.

Use Case

Log detection is best suited to detecting code anomalies or task scheduling problems in IT monitoring scenarios, for example alerting when the log error rate is too high.

Setup

Step 1: Detection Configuration

Detection Frequency: How often the detection rule is executed. The options are 1m/5m/15m/30m/1h/6h (5m is selected by default).

Detection Interval: The time range queried each time the detection task runs. The available intervals depend on the selected detection frequency.

Detection Frequency    Detection Interval (drop-down options)
1m                     1m/5m/15m/30m/1h/3h
5m                     5m/15m/30m/1h/3h
15m                    15m/30m/1h/3h/6h
30m                    30m/1h/3h/6h
1h                     1h/3h/6h/12h/24h
6h                     6h/12h/24h
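
The frequency-to-interval pairing above can be expressed as a small lookup. The following is only an illustrative sketch: the names ALLOWED_INTERVALS, is_valid_interval and to_minutes are hypothetical helpers, not part of any product API, and the values are copied from the table.

```python
# Allowed detection intervals per detection frequency (values from the table above).
ALLOWED_INTERVALS = {
    "1m": ["1m", "5m", "15m", "30m", "1h", "3h"],
    "5m": ["5m", "15m", "30m", "1h", "3h"],
    "15m": ["15m", "30m", "1h", "3h", "6h"],
    "30m": ["30m", "1h", "3h", "6h"],
    "1h": ["1h", "3h", "6h", "12h", "24h"],
    "6h": ["6h", "12h", "24h"],
}


def is_valid_interval(frequency, interval):
    """Return True if `interval` is offered for `frequency` in the table."""
    return interval in ALLOWED_INTERVALS.get(frequency, [])


def to_minutes(duration):
    """Convert a duration such as '15m' or '6h' to minutes."""
    value, unit = int(duration[:-1]), duration[-1]
    return value * {"m": 1, "h": 60}[unit]
```

In every row of the table the shortest available interval equals the frequency itself, so each run queries at least the time elapsed since the previous run.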

Detection Metrics: Count the logs that contain the configured keywords for the specified detection objects within a given time range.

  • Index: The index to which the current detection belongs. Note: If indexes are configured in Log > Index, you can select the log content of a specific index when "Log" is chosen as the data source of a chart query; the default index is default.
  • Source: The data source of the current detection metric. Select all sources (*) or specify a single data source.
  • Keyword Search: Keyword search is supported.
  • Filtering: Filter the detected data by label to limit its range. You can add one or more label filters, including fuzzy-match and fuzzy-not-match conditions.
  • Aggregation Algorithm: * is selected by default, and the corresponding function is count. If another field is selected, the function automatically changes to count distinct.
  • Detection Dimension: Any string-type (keyword) field in the configured data can be selected as a detection dimension; up to three fields are currently supported. The combination of dimension fields identifies a detection object. Guance checks whether the aggregated value for each detection object meets the threshold of the trigger condition and generates an event if it does. For example, if the detection dimensions host and host_ip are selected, a detection object can be {host: host1, host_ip: 127.0.0.1}. When the detection object is "log", the default detection dimensions are status, host, service, source and filename.
  • Query Mode: Simple query and expression query are supported. In an expression query that contains multiple queries, all queries share the same detection object: if the detection object of query A is "log", the detection object of query B is also "log". For details, refer to query.
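
To make the relationship between detection dimensions and the aggregation function concrete, here is a minimal sketch. It assumes logs are plain dictionaries; the function name aggregate and the sample records are hypothetical and do not reflect how the product evaluates queries internally.

```python
from collections import defaultdict


def aggregate(logs, dimensions, field="*"):
    """Group logs by detection dimensions and aggregate each group.

    field="*" counts logs; any other field counts its distinct values.
    """
    if len(dimensions) > 3:
        raise ValueError("at most three detection dimensions are supported")
    groups = defaultdict(list)
    for log in logs:
        # The combination of dimension values identifies one detection object.
        key = tuple((d, log.get(d)) for d in dimensions)
        groups[key].append(log)
    result = {}
    for key, items in groups.items():
        if field == "*":
            result[key] = len(items)  # count
        else:
            result[key] = len({i[field] for i in items if field in i})  # count distinct
    return result


# Hypothetical sample data for illustration only.
logs = [
    {"host": "host1", "host_ip": "127.0.0.1", "message": "timeout"},
    {"host": "host1", "host_ip": "127.0.0.1", "message": "timeout"},
    {"host": "host2", "host_ip": "10.0.0.2", "message": "started"},
]
counts = aggregate(logs, ["host", "host_ip"])
distinct_messages = aggregate(logs, ["host", "host_ip"], field="message")
```

Each key in the result corresponds to one detection object, such as {host: host1, host_ip: 127.0.0.1}, and its value is what would be compared against the trigger threshold.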

Trigger Condition: Set the trigger condition for each alert level. You can configure any of the following levels: Critical, Error, Warning, No Data, or Information.

Configure the trigger condition and its severity. When the query returns multiple values, an event is generated if any of the values meets the trigger condition.

See Event Levels.
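
The "any value meets the trigger condition" rule can be sketched as follows. The thresholds, the operator table, and the choice to report only the most severe matching level per value are assumptions made for illustration, not documented product behavior.

```python
import operator

OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}

# Hypothetical thresholds, ordered from most to least severe.
CONDITIONS = [
    ("critical", ">=", 100),
    ("error", ">=", 50),
    ("warning", ">=", 10),
]


def evaluate(values, conditions=CONDITIONS):
    """Return one (detection_object, level) pair per value that meets a condition."""
    events = []
    for obj, value in values.items():
        for level, op, threshold in conditions:
            if OPERATORS[op](value, threshold):
                events.append((obj, level))
                break  # assumption: keep only the most severe matching level
    return events


# One query result with three values, one per detection object.
events = evaluate({"host1": 120, "host2": 12, "host3": 3})
```

Here host1 and host2 each produce an event, while host3 stays below every threshold and produces none.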

I. Alert levels: Critical (red), Error (orange), Warning (yellow): Triggered when the query result satisfies the configured condition, which is expressed with a comparison operator.

II. Alert levels: OK (green), Information (blue): Based on the configured number of detections, as explained below:

  • Each detection task performs one detection. For example, if the detection frequency is 5 minutes, one detection covers 5 minutes.
  • You can customize the number of detections. For example, if the detection frequency is 5 minutes, 3 detections cover 15 minutes.

OK: After the detection rule takes effect, if a critical, error, or warning event returns to normal within the configured number of detections, a recovery alert event is generated.
⚠ Recovery alert events are not affected by Mute Alerting. If no detection count is set for recovery alert events, the alert event never recovers and remains in the Events > Unrecovered Events List.
Information: An event is generated even when the detection result is normal.
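
The arithmetic behind the detection count can be sketched as below. The recovery check reflects one assumed reading of the rule (an earlier abnormal result followed by the configured number of consecutive normal results); the function names are hypothetical.

```python
def recovery_window_minutes(frequency_minutes, detections):
    """Time covered by `detections` consecutive detections."""
    return frequency_minutes * detections


def has_recovered(history, detections):
    """Decide whether a recovery event would be generated.

    history: per-detection results, oldest first; True means abnormal.
    Assumed interpretation: an earlier abnormal result followed by
    `detections` consecutive normal results. With no detection count
    set (0), the event never recovers.
    """
    if detections <= 0 or len(history) <= detections:
        return False
    recent = history[-detections:]
    earlier = history[:-detections]
    return any(earlier) and not any(recent)
```

With a detection frequency of 5 minutes and 3 detections, recovery_window_minutes(5, 3) gives the 15 minutes mentioned above.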

III. Alert level: No Data (gray): The no data state supports three configuration strategies: Trigger No-Data Event, Trigger Recovery Event, and Untrigger Event.

Step 2: Event Notification

Event Title: The name of the event generated when the alert trigger condition is met. Preset template variables are supported.

Note: In the latest version, the Monitor Name is generated automatically from the Event Title. In older monitors, the Monitor Name and the Event Title may be inconsistent; for a better experience, synchronize them to the latest version as soon as possible. One-click replacement with the event title is supported.

Event Content: The content of the event notification sent when the trigger conditions are met. You can enter text in Markdown format, preview the result, and use preset associated links and preset template variables.

Note: Different alert notification objects support different Markdown syntax. For example, WeCom does not support unordered lists.

Alert Strategy: When the monitor meets the trigger conditions, an alert message is sent immediately to the specified notification targets. The Alert Strategy defines the event levels to be notified, the notification targets, and the mute alerting period.

Synchronously create Issue: If abnormal events occur under this monitor, an issue for anomaly tracking will be created synchronously and delivered to the channel for anomaly tracking. You can go to Incident > Your selected Channel to view it.

Step 3: Association

Associate Dashboard: Every monitor supports associating with a dashboard for quick navigation and viewing.

Example

Take the log error rate per source and service as an example: divide the number of error logs by the total number of logs to obtain the log error rate.
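
The calculation can be sketched in a few lines. This is a minimal illustration that assumes each log carries source, service and status fields and that error logs have status equal to "error"; the function name error_rates and the sample records are hypothetical.

```python
from collections import Counter


def error_rates(logs):
    """Return errors / total for each (source, service) pair."""
    totals, errors = Counter(), Counter()
    for log in logs:
        key = (log["source"], log["service"])
        totals[key] += 1
        if log["status"] == "error":  # assumption: error logs are marked this way
            errors[key] += 1
    return {key: errors[key] / totals[key] for key in totals}


# Hypothetical sample data for illustration only.
sample = [
    {"source": "nginx", "service": "web", "status": "error"},
    {"source": "nginx", "service": "web", "status": "info"},
    {"source": "nginx", "service": "web", "status": "info"},
    {"source": "nginx", "service": "web", "status": "info"},
    {"source": "app", "service": "api", "status": "error"},
    {"source": "app", "service": "api", "status": "error"},
]
rates = error_rates(sample)
```

In monitor terms, the numerator and denominator correspond to two queries over the same detection object combined in an expression query, with the resulting ratio compared against the trigger thresholds.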
