Metrics¶
Metrics are the core data units that Guance uses to continuously track the state of a system. Each metric consists of three parts: a numerical value, a timestamp, and dimensional labels. Metrics record quantifiable system characteristics as time series (such as resource utilization or business throughput) and enable multi-dimensional analysis through labels (such as host, service, or region), providing precise data support for real-time monitoring, performance optimization, and trend prediction.
Data Architecture¶
Metrics processing in Guance is divided into three layers:
- Collection Layer: handled by DataKit, which collects raw metrics from sources such as hosts, applications, and middleware.
- Transmission Layer: DataKit encrypts the data and sends it over HTTP/HTTPS to the Guance data center.
- Storage and Analysis Layer: Guance cleans and stores the data and provides visualization and analysis capabilities.
Key Roles¶
DataKit is a lightweight agent deployed in the user's environment (analogous to Prometheus's Exporters). It directly interfaces with data sources and handles the core responsibilities of collection, pre-processing, and secure transmission.
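Besides its built-in collectors, DataKit can also accept custom metrics pushed to it over HTTP. The sketch below is a minimal illustration only: the port, endpoint path, measurement, and field names are assumptions for this example, so verify them against the API documentation of your DataKit version.

```python
# Minimal sketch: push one custom metric to a locally running DataKit, which
# then forwards it to Guance. The port, endpoint path, measurement, and field
# names below are assumptions for illustration; check your DataKit version.
import time

import requests

DATAKIT_URL = "http://127.0.0.1:9529/v1/write/metric"  # assumed default listener and path

# One point in line-protocol form: measurement,tag=... field=...,field=... timestamp(ns)
ts_ns = time.time_ns()
point = f"app_request,host=server01,service=checkout latency_ms=42.7,hits=1i {ts_ns}"

resp = requests.post(DATAKIT_URL, data=point.encode("utf-8"))
resp.raise_for_status()
print("DataKit accepted the point:", resp.status_code)
```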
Data Composition¶
A complete metrics data unit contains three core elements:
- Measurement: the classification identifier for the data (e.g., `cpu` represents CPU metrics).
- Tags: dimensions used to filter and group the data (e.g., `host=server01`, `region=cn`).
- Fields: the specific numerical values (e.g., `usage=58.3`).
For example:
- Measurement: `cpu`
- Tags: `host=server01`, `core=0` (identifying the source server and CPU core)
- Fields: `usage_user=12.3` (user-space CPU usage), `usage_system=5.7` (system-space usage)
- Timestamp: `1690524000000000000` (2023-07-28 06:00:00 UTC)
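Conceptually, such a point maps onto InfluxDB-style line protocol, the single-line text form commonly used when writing metrics through DataKit. Below is a minimal sketch that assembles the example above into one line (tag and field escaping omitted):

```python
# Sketch: assemble the example above into InfluxDB-style line protocol
# ("measurement,tags fields timestamp"); escaping of special characters is omitted.
measurement = "cpu"
tags = {"host": "server01", "core": "0"}
fields = {"usage_user": 12.3, "usage_system": 5.7}
timestamp_ns = 1690524000000000000  # 2023-07-28 06:00:00 UTC, in nanoseconds

tag_part = ",".join(f"{key}={value}" for key, value in tags.items())
field_part = ",".join(f"{key}={value}" for key, value in fields.items())
line = f"{measurement},{tag_part} {field_part} {timestamp_ns}"

print(line)
# cpu,host=server01,core=0 usage_user=12.3,usage_system=5.7 1690524000000000000
```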
Use Cases¶
Metrics collected by DataKit support a wide range of critical business scenarios, providing end-to-end insight from technical operations to business decision-making. For example:
- Business Operations Analysis: Quantify key business metrics (such as user activity, transaction success rate, feature usage rate) and correlate technical data with business outcomes;
- Application Performance Optimization: Track service interface response time, error rates, throughput, and other performance metrics to identify performance bottlenecks in code logic or dependent services;
- Resource Cost Control: Monitor cloud resource utilization and cost distribution to locate idle or inefficient instances;
- Security and Compliance Management: Monitor abnormal login behavior, frequency of sensitive operations, and other security metrics to identify potential risks and trigger automated responses, meeting audit and compliance requirements.