Monitoring
Guance provides an anomaly monitoring system that covers detection, alerting, and incident management, built on unified platform data. By creating monitors, you can continuously assess the status of data such as metrics, logs, application performance, user access, and objects. When a monitor detects an anomaly, it automatically triggers an alert and generates an incident. Incidents are aggregated into the Incident Center for management and analysis. The system also supports alert muting and SLO management, enabling refined alert governance and stability measurement.
Monitoring System Architecture
The Guance monitoring system forms a closed loop of three stages: Detection, Alerting, and Governance.
- Detection Layer: Monitors perform continuous status assessments on multi-source data through Rule Detection or Intelligent Detection algorithms, generating incidents when anomalies are detected.
- Alerting Layer: Alert Strategies are bound to monitors, defining incident severity levels and notification rules. Alert information is sent to designated recipients via configured notification targets.
- Governance Layer: Mute Rules suppress alert noise during specific periods, and SLO management defines service stability objectives, enabling refined alert operations.
Core Process
Configure Monitors
Monitors are the core components that execute detection tasks. They support detection rules for data sources such as time series metrics, logs, APM, and RUM. You can choose between rule detection and intelligent detection depending on the monitoring scenario:
- Rule Detection: Supports various trigger rules such as threshold detection, sudden change detection, and range detection. Detection frequency and trigger conditions are configurable, which suits scenarios with clear anomaly judgment criteria.
- Intelligent Detection: Uses machine learning to analyze the historical characteristics and periodic patterns of metrics and to identify abnormal fluctuations. It suits complex metrics with periodicity and trends, where fixed thresholds fall short.
When a monitor detects an anomaly, it automatically generates an incident. Incidents are aggregated into the Incident Center, where you can view, analyze, and handle all monitoring-related incidents.
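The three rule types named above can be illustrated with a minimal sketch. This is not Guance's implementation or API: the `Incident` record, the function names, and the sample values are all hypothetical, and each check looks only at the most recent data points rather than at a configurable detection window.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    rule: str
    value: float
    detail: str


def threshold_check(values: Sequence[float], limit: float) -> Optional[Incident]:
    """Threshold detection: fire when the latest value exceeds a fixed limit."""
    if not values:
        return None
    latest = values[-1]
    if latest > limit:
        return Incident("threshold", latest, f"{latest} exceeds limit {limit}")
    return None


def sudden_change_check(values: Sequence[float], max_delta: float) -> Optional[Incident]:
    """Sudden change detection: fire when the last step is larger than max_delta."""
    if len(values) < 2:
        return None
    delta = abs(values[-1] - values[-2])
    if delta > max_delta:
        return Incident("sudden_change", values[-1], f"step {delta} exceeds {max_delta}")
    return None


def range_check(values: Sequence[float], low: float, high: float) -> Optional[Incident]:
    """Range detection: fire when the latest value leaves the band [low, high]."""
    if not values:
        return None
    latest = values[-1]
    if latest < low or latest > high:
        return Incident("range", latest, f"{latest} outside [{low}, {high}]")
    return None


Check = Callable[[Sequence[float]], Optional[Incident]]


def run_monitor(values: Sequence[float], checks: Sequence[Check]) -> List[Incident]:
    """Run every check once and collect the incidents that were generated."""
    incidents = []
    for check in checks:
        incident = check(values)
        if incident is not None:
            incidents.append(incident)
    return incidents


if __name__ == "__main__":
    cpu_usage = [41.0, 43.0, 95.0]  # made-up sample series
    checks = [
        lambda v: threshold_check(v, limit=90.0),
        lambda v: sudden_change_check(v, max_delta=30.0),
        lambda v: range_check(v, low=0.0, high=100.0),
    ]
    for incident in run_monitor(cpu_usage, checks):
        print(incident)
```

In a real monitor, each check would run at the configured detection frequency, and the collected incidents would be forwarded to the Incident Center.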
Set Up Alert Strategies
Alert strategies establish a complete mechanism from anomaly detection to notification handling.
When creating an alert strategy, you define the strategy name, select the associated objects (All, Monitors, Intelligent Monitoring, SLO, Security Monitoring), configure the notification time zone, the repeated alert time range, and the alert aggregation mode, and then set up notification rules.
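To make the shape of such a strategy concrete, the following sketch models the listed settings as a plain data structure. It is an illustration under assumptions, not Guance's schema: the class and field names, the severity labels, the target identifiers, and the aggregation mode value are hypothetical, and the repeated alert time range is modelled as a number of minutes purely for the example. Only the set of associated objects comes from the text above.

```python
from dataclasses import dataclass, field
from typing import List

# Associated object types as listed in the text above.
ASSOCIATED_OBJECTS = {
    "All", "Monitors", "Intelligent Monitoring", "SLO", "Security Monitoring",
}


@dataclass
class NotificationRule:
    severity: str        # hypothetical level name, e.g. "critical"
    targets: List[str]   # hypothetical notification target identifiers


@dataclass
class AlertStrategy:
    name: str
    associated_objects: str
    time_zone: str                 # notification time zone, e.g. "UTC"
    repeat_alert_minutes: int      # assumption: repeated alert range in minutes
    aggregation_mode: str          # placeholder value, not a real option name
    notification_rules: List[NotificationRule] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject object types that the text does not list.
        if self.associated_objects not in ASSOCIATED_OBJECTS:
            raise ValueError(f"unknown associated object: {self.associated_objects}")

    def targets_for(self, severity: str) -> List[str]:
        """Return every notification target configured for a severity level."""
        targets: List[str] = []
        for rule in self.notification_rules:
            if rule.severity == severity:
                targets.extend(rule.targets)
        return targets


strategy = AlertStrategy(
    name="core-services",
    associated_objects="Monitors",
    time_zone="UTC",
    repeat_alert_minutes=30,
    aggregation_mode="no-aggregation",
    notification_rules=[
        NotificationRule("critical", ["ops-oncall"]),
        NotificationRule("warning", ["ops-channel"]),
    ],
)
```

Calling `strategy.targets_for("critical")` then yields the targets that a critical incident from a bound monitor would be routed to.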
Anomaly Governance
During the operation of the monitoring system, refined operations can be achieved through the following two methods:
- Mute Management: To avoid alert noise during planned maintenance or known issues, mute rules can be set to suppress alert notifications within specified time periods based on monitor rules, alert strategies, monitor tags, or custom conditions.
- SLO Management: Based on data generated by monitors (e.g., request success rate, latency), service stability objectives are defined. After creating an SLO and configuring the target value, the system continuously tracks achievement status and remaining error budget, providing a quantitative basis for service stability.
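Both governance methods come down to simple checks, sketched below. The function names and sample numbers are hypothetical and do not reflect how Guance evaluates mute rules or SLOs. The mute check only tests whether an event falls inside a half-open time window, and the error budget uses the common definition of allowed failures as (1 - target) × total requests.

```python
from datetime import datetime, timezone


def is_muted(event_time: datetime, mute_start: datetime, mute_end: datetime) -> bool:
    """Return True when the event falls inside the window [mute_start, mute_end)."""
    return mute_start <= event_time < mute_end


def error_budget_remaining(target: float, good: int, total: int) -> float:
    """Return the fraction of the error budget that is still unspent.

    target is the SLO objective as a fraction (for example 0.99).
    A negative result means the budget has been overspent.
    """
    if total == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_bad = (1.0 - target) * total
    actual_bad = total - good
    if allowed_bad == 0:
        # A 100% target leaves no budget: any failure exhausts it.
        return 1.0 if actual_bad == 0 else 0.0
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Made-up maintenance window from 02:00 to 04:00 UTC.
    start = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
    end = datetime(2024, 1, 1, 4, 0, tzinfo=timezone.utc)
    event = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
    print("muted:", is_muted(event, start, end))

    # Made-up traffic: 10,000 requests, 9,950 successful, 99% target.
    print("budget left:", error_budget_remaining(0.99, good=9950, total=10000))
```

With a 99% target over 10,000 requests, 100 failures are allowed, so 50 failures leave roughly half of the budget.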