Metrics


Metrics are the core data units Guance uses to continuously track the state of a system. Each metric consists of three parts: a numerical value, a timestamp, and dimensional labels. Metrics record quantifiable system characteristics (such as resource utilization or business throughput) as time series, and labels (such as host, service, or region) enable multi-dimensional analysis. Together they provide the data behind real-time monitoring, performance optimization, and trend prediction.
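To make multi-dimensional analysis concrete, here is a minimal Python sketch that averages values grouped by one label. The sample points and the `mean_by_tag` helper are hypothetical illustrations, not part of Guance or DataKit.

```python
from collections import defaultdict

# Hypothetical sample points as (labels, value) pairs; not real Guance data.
points = [
    ({"host": "server01", "region": "cn"}, 40.0),
    ({"host": "server02", "region": "cn"}, 60.0),
    ({"host": "server03", "region": "us"}, 30.0),
]


def mean_by_tag(points, tag):
    """Average the values of all points that share the same value for `tag`."""
    groups = defaultdict(list)
    for tags, value in points:
        groups[tags[tag]].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


# Slice the same data along two different dimensions.
by_region = mean_by_tag(points, "region")
by_host = mean_by_tag(points, "host")
```

The same set of points can be sliced by any label it carries, which is what makes labels useful for filtering and grouping.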

Data Architecture

Guance processes metrics data in three layers:

  • Collection Layer: Handled by DataKit, which gathers raw metrics from sources such as hosts, applications, and middleware.

  • Transmission Layer: DataKit encrypts the data and sends it via HTTP/HTTPS to Guance's data center.

  • Storage and Analysis Layer: Guance cleans and stores the data, and provides visualization and analysis capabilities.
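As a rough illustration of the transmission step, the following Python sketch packs text records into a single HTTPS POST request using only the standard library. The endpoint URL, the `build_upload_request` helper, and the content type are assumptions made for this example; they do not describe DataKit's actual upload API, and the request is built but never sent.

```python
import urllib.request

# Placeholder endpoint for illustration only; not a documented Guance or DataKit URL.
ENDPOINT = "https://metrics.example.invalid/write"


def build_upload_request(lines: list[str], endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Pack text records into one HTTPS POST request (built here, not sent)."""
    body = "\n".join(lines).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )


request = build_upload_request([
    "cpu,host=server01 usage=58.3 1690524000000000000",
])
# urllib.request.urlopen(request) would send it; omitted because the endpoint is a placeholder.
```

Using an `https://` URL means the payload is encrypted in transit by TLS; a plain `http://` URL would not provide that protection.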

Key Roles

DataKit is a lightweight agent deployed in the user's environment (analogous to Prometheus's Exporters). It directly interfaces with data sources and handles the core responsibilities of collection, pre-processing, and secure transmission.

Data Composition

A complete metrics data unit contains three core elements, together with the timestamp at which it was recorded:

  • Measurement: The classification identifier for the data (e.g., cpu represents CPU metrics).

  • Tags: Dimensions used for filtering and grouping data (e.g., host=server01, region=cn).

  • Fields: Specific numerical metrics (e.g., usage=58.3).

For example:

cpu,host=server01,core=0 usage_user=12.3,usage_system=5.7 1690524000000000000

  • Measurement: cpu

  • Tags: host=server01, core=0 (marking the source server and CPU core)

  • Fields: usage_user=12.3 (user space CPU usage), usage_system=5.7 (system space usage)

  • Timestamp: 1690524000000000000 (nanoseconds since the Unix epoch, i.e. 2023-07-28 06:00:00 UTC)
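The breakdown above can be reproduced with a short Python sketch. The `parse_line` helper is a hypothetical illustration that handles only simple records like this one (no escaped spaces or commas, no quoted string fields); it is not DataKit's parser.

```python
from datetime import datetime, timezone


def parse_line(line: str) -> dict:
    """Split one simple record into measurement, tags, fields, and timestamp.

    Assumes no escaped spaces or commas and only numeric field values.
    """
    head, field_part, ts = line.split(" ")
    measurement, *tag_items = head.split(",")
    tags = dict(item.split("=", 1) for item in tag_items)
    fields = {
        key: float(value)
        for key, value in (item.split("=", 1) for item in field_part.split(","))
    }
    return {
        "measurement": measurement,
        "tags": tags,
        "fields": fields,
        "timestamp_ns": int(ts),
    }


record = parse_line(
    "cpu,host=server01,core=0 usage_user=12.3,usage_system=5.7 1690524000000000000"
)
# Nanoseconds since the Unix epoch -> timezone-aware UTC datetime (second precision).
recorded_at = datetime.fromtimestamp(
    record["timestamp_ns"] // 1_000_000_000, tz=timezone.utc
)
```

Tag values stay as strings because they act as labels, while field values are converted to numbers because they are the quantities being measured.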

Use Cases

Metrics data collected via DataKit supports a range of critical business scenarios, providing end-to-end insight from technical operations to business decision-making. For example:

  • Business Operations Analysis: Quantify key business metrics (such as user activity, transaction success rate, feature usage rate) and correlate technical data with business outcomes;
  • Application Performance Optimization: Track service interface response time, error rates, throughput, and other performance metrics to identify performance bottlenecks in code logic or dependent services;
  • Resource Cost Control: Monitor cloud resource utilization and cost distribution to locate idle or inefficient instances;
  • Security and Compliance Management: Monitor abnormal login behavior, frequency of sensitive operations, and other security metrics to identify potential risks and trigger automated responses, meeting audit and compliance requirements.
