Detection Rules¶
The system supports over a dozen monitoring detection rules, covering different data scopes.
Rule Types¶
Rule Name | Data Scope | Basic Description |
---|---|---|
Threshold Detection | All | Detect anomalies in metric data based on set thresholds. |
Mutation Detection | Metrics(M) | Detect sudden abnormal behavior of metrics based on historical data; suitable for business data and scenarios with short time windows. |
Interval Detection | Metrics(M) | Detect anomalous data points in metrics based on dynamic threshold ranges; suitable for stable trend timelines. |
Interval Detection V2 | Metrics(M) | Detect anomalous data points in metrics based on dynamic threshold ranges; suitable for stable trend timelines. |
Outlier Detection | Metrics(M) | Detect whether the metrics/statistics of the detection object in a specific group deviate from the norm. |
Log Detection | Logs(L) | Detect anomalies in business applications based on log data. |
Process Anomaly Detection | Process Objects(O::host_processes) | Periodically check process data to detect process anomalies. |
Infrastructure Survival Detection V2 | Objects(O) | Set survival conditions based on infrastructure object data to monitor the stability of infrastructure. |
Application Performance Metric Detection | Tracing(T) | Set threshold rules based on application performance monitoring data to detect anomalies. |
User Access Metric Detection | User Access Data(R) | Set threshold rules based on user access monitoring data to detect anomalies. |
Composite Detection | All | Combine multiple monitors' results into one monitor via expressions, then trigger alerts based on the combined result. |
Security Inspection Anomaly Detection | Security Inspection(S) | Detect anomalies based on security inspection data to assess the health status of hosts. |
Synthetic Testing Anomaly Detection | Synthetic Testing Data(L::Type) | Set threshold rules based on synthetic testing data to detect anomalies. |
Network Data Detection | Network(N) | Set threshold rules based on network data to detect the stability of network performance. |
Third-party Event Detection | Others | After you specify a URL address, anomaly events or records generated by third-party systems are sent to the HTTP server via POST requests to generate event data. |
Infrastructure Change Detection | Objects(O) | Track the lifecycle of infrastructure objects to monitor changes and identify configuration drift, unauthorized operations, and other abnormal conditions. |
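To make the difference between a fixed threshold and a dynamic threshold range concrete, here is a minimal Python sketch. It is not the product's detection algorithm: the function names, the sliding window, and the "mean plus or minus k standard deviations" band are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def threshold_anomalies(values, upper):
    """Return indexes of points above a fixed, user-set threshold."""
    return [i for i, v in enumerate(values) if v > upper]


def interval_anomalies(values, window=5, k=3.0):
    """Return indexes of points outside mean +/- k * stdev of the previous `window` points.

    The band is recomputed for every point, so it follows the recent trend.
    (Assumed rule for illustration; the real rule may differ.)
    """
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        center = mean(history)
        spread = stdev(history)
        if abs(values[i] - center) > k * spread:
            flagged.append(i)
    return flagged


# Hypothetical CPU usage samples with one spike at index 7.
cpu = [50, 51, 49, 50, 52, 51, 50, 75, 51, 50]
print(threshold_anomalies(cpu, upper=80))  # the spike stays below the fixed threshold
print(interval_anomalies(cpu))             # the spike falls outside the dynamic band
```

With these sample values, the fixed threshold of 80 misses the spike, while the dynamic band flags it because it deviates sharply from the preceding stable points.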
Rule Configuration¶
Detection Configuration¶
Set corresponding detection frequencies, detection intervals, detection metrics, etc., for different detection rules.
Event Notifications¶
Event Title¶
Define the event name for alert trigger conditions; you can use predefined template variables.
Note
In the latest version, the monitor name will be automatically generated after entering the event title. In older monitors, there may be inconsistencies between the monitor name and the event title; it is recommended to synchronize to the latest version.
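As an illustration of how template variables in an event title might be expanded, the sketch below substitutes placeholders in a title string. The `{{ name }}` placeholder syntax and the variable names `host` and `status` are assumptions made for this example; check the list of predefined template variables for the names actually available.

```python
import re

# Matches placeholders such as {{ host }} or {{status}} (assumed syntax).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_title(template, variables):
    """Replace each {{ name }} placeholder; unknown names are left untouched."""
    def substitute(match):
        name = match.group(1)
        return str(variables.get(name, match.group(0)))
    return PLACEHOLDER.sub(substitute, template)


# Hypothetical variable names and values, for illustration only.
event = {"host": "web-01", "status": "critical"}
print(render_title("[{{ status }}] High CPU usage on {{host}}", event))
```

Leaving unknown placeholders untouched, rather than replacing them with an empty string, makes a misspelled variable name easy to spot in the rendered title.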
Event Content¶
Enter the event notification content. When the trigger conditions are met, the system sends this content externally. It usually includes the following:
- A main body in Markdown format;
- Associated links and template variables;
- Associated logs or error messages, added through advanced settings;
- The target members to notify.
Note
The @ member configuration takes effect, and the event content is sent to the specified members, only when correlation with incident tracking is enabled.
Associated Links¶
Monitors automatically generate jump links based on the detection metrics in the detection configuration. After inserting a link, you can adjust its filtering conditions and time range. The generated links generally share a fixed prefix that contains the current domain name and workspace ID; you can also use custom jump links.
Additionally, to insert a link that jumps to a dashboard, add the dashboard's ID and name to the link following the same logic, and adjust the view variables and time range as needed.
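The structure described above, a fixed prefix plus adjustable filters and time range, can be sketched as follows. The path segment and the query parameter names (`workspace`, `time`) are hypothetical and do not reflect the product's real URL scheme; copy an automatically generated link to see the actual format.

```python
from urllib.parse import urlencode


def build_jump_link(domain, workspace_id, path, filters, time_range):
    """Compose a link from a fixed prefix plus adjustable query parameters.

    All parameter names here are placeholders for illustration.
    """
    prefix = f"https://{domain}/{path}"
    query = {"workspace": workspace_id, "time": time_range}
    query.update(filters)  # filtering conditions are appended after the fixed part
    return f"{prefix}?{urlencode(query)}"


link = build_jump_link(
    domain="console.example.com",
    workspace_id="wksp_123",
    path="logs",
    filters={"host": "web-01"},
    time_range="15m",
)
print(link)
```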
Custom Advanced Settings¶
Through advanced settings, you can add associated logs or error stacks in the event content to view contextual data when anomalies occur.
- Add associated logs:
Query:
For example, retrieve a log message with index default:
Associated logs:
- Add associated error stack
Query:
{% set dql_data = DQL("T::re(`.*`):(`error_message`,`error_stack`){ (`source` NOT IN ['service_map', 'tracing_stat', 'service_list_1m', 'service_list_1d', 'service_list_1h', 'profile']) AND (`error_stack` = exist()) } LIMIT 1") %}
Associated error stack:
Custom Notification Content¶
By default, the system will use the event content as the alert notification content. If you want to customize the actual external notification, you can enable the switch here and fill in the notification information.
Note
Different alert notification targets support different Markdown syntax. For example, WeCom does not support unordered lists.
Data Gaps Events¶
Customize the notification content for data gaps. You can configure the title, content, and other details sent externally for such events.
If no configuration is made here, the official default notification template will be used automatically when sending notifications externally.
Correlation with Incident Tracking¶
After enabling correlation, if an anomaly event occurs under this monitor, an Issue will be created simultaneously. You can choose to create Issues corresponding to different event levels.
- Select the event level;
- Define the final level of the generated Issue;
- Choose the responsible person for this type of Issue;
- Select the delivery channel;
- Choose whether to close the Issue automatically when the event is resolved.
Issues generated here can be viewed at Incident > Your selected Channel.
Alert Configuration¶
Once the monitor meets the trigger conditions, the system immediately sends an alert message to the designated notification targets. An alert strategy specifies the event levels to notify on, the notification targets, and the alert silence cycle.
You can select one or more alert strategies. Click a strategy name to expand its detail page. To modify the strategy, click Edit Alert Strategy.
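One way to picture the alert silence cycle is as a filter that suppresses repeat notifications for a period after the first one is sent. The sketch below assumes silencing is tracked per monitor and event level; that scoping, and the class and method names, are assumptions rather than documented behavior.

```python
from datetime import datetime, timedelta


class AlertSilencer:
    """Suppress repeat notifications for the same monitor and level within a silence period."""

    def __init__(self, silence):
        self.silence = silence
        self._last_sent = {}  # (monitor_id, level) -> time of the last notification

    def should_notify(self, monitor_id, level, now):
        key = (monitor_id, level)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.silence:
            return False  # still inside the silence period
        self._last_sent[key] = now
        return True


silencer = AlertSilencer(timedelta(minutes=30))
start = datetime(2024, 1, 1, 12, 0)
print(silencer.should_notify("monitor-1", "critical", start))
print(silencer.should_notify("monitor-1", "critical", start + timedelta(minutes=10)))
print(silencer.should_notify("monitor-1", "critical", start + timedelta(minutes=30)))
```

In this sketch the first event is sent, the one 10 minutes later is suppressed, and the one at 30 minutes is sent again because the silence period has elapsed.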
Correlation¶
You can associate monitors with dashboards to jump quickly to related data and visualize it.
Permissions¶
Setting operation permissions for a monitor ensures that users can perform only the actions allowed by their roles and permission levels.
- Not enabling this configuration: Follow the default permissions for "Monitor Configuration Management";
- Enabling this configuration and selecting custom permission objects: Only the creator and authorized objects can enable/disable, edit, or delete the rules set for this monitor;
- Enabling this configuration but not selecting custom permission objects: Only the creator has the enable/disable, edit, and delete permissions for this monitor.
Note
The Owner role of the current workspace is not affected by the operational permission configuration here.
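The three cases above, together with the Owner exception, amount to a small decision procedure. The following sketch encodes it in Python; the data model (`MonitorPermissions`, `can_manage`, and the `has_default_permission` flag standing in for the "Monitor Configuration Management" permission) is hypothetical and only mirrors the rules as stated here.

```python
from dataclasses import dataclass, field


@dataclass
class MonitorPermissions:
    creator: str
    enabled: bool = False                           # is the permission configuration enabled?
    authorized: set = field(default_factory=set)    # custom permission objects, if any


def can_manage(user, role, perms, has_default_permission):
    """Return True if `user` may enable/disable, edit, or delete the monitor."""
    if role == "Owner":
        return True  # workspace Owners are not affected by this configuration
    if not perms.enabled:
        return has_default_permission  # fall back to the default permission
    # Enabled: the creator plus any authorized objects (possibly none).
    return user == perms.creator or user in perms.authorized


perms = MonitorPermissions(creator="alice", enabled=True, authorized={"bob"})
print(can_manage("bob", "Standard", perms, has_default_permission=True))
print(can_manage("carol", "Standard", perms, has_default_permission=True))
print(can_manage("carol", "Owner", perms, has_default_permission=False))
```

Note that once the configuration is enabled, holding the default permission no longer matters: in the example, `carol` is denied as a standard member but allowed as an Owner.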
Restore Monitors¶
You can view the status, last update time, creation time, and creator of an existing monitor. You can also restore a historical configuration of the monitor, which makes it easier to collaborate with other team members when updating monitors.
Operation Example:
In Monitoring > Monitors, select an existing monitor to edit. On the monitor configuration page, click the button in the top-right corner to view the monitor's status, last update time, creation time, and creator.
Click the view button to the right of the Update Time to open a new browser window showing the configuration of the previous version of the monitor.
In that window, click the Restore This Version button in the top-right corner. In the pop-up dialog box, confirm the restoration; the previous version's configuration is then restored so that you can edit and save it.