Infrastructure
Guance provides unified monitoring for all underlying computing resources that support application operations. This includes, but is not limited to:
- Physical machines and virtual machines
- Containers and Kubernetes clusters
- Network devices and services
- Various cloud services
Guance uses DataKit to collect metrics, logs, and object data from infrastructure such as hosts, containers, and cloud services in a uniform way. It automatically builds dynamic dependencies between components and presents them as a visual infrastructure topology. The topology shows where resources such as services, containers, and hosts actually run and how they relate to each other, giving users operational insight from the global level down to individual resources.
With a unified tagging system and flexible search, users can quickly locate target resources and correlate data from different sources such as metrics, traces, and logs. Cross-data-type navigation and contextual linkage let users trace an issue to its root cause quickly, which supports efficient troubleshooting and performance optimization.
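The idea behind tag-based correlation can be sketched as follows. This is only an illustration of grouping data from several sources by one shared tag; the record layout (`source`, `tags`, `message`), the `host` tag name, and the sample values are assumptions made for the example, not Guance's actual data model.

```python
from collections import defaultdict

# Hypothetical records from three data sources; the layout is an assumption
# made for illustration, not Guance's storage format.
RECORDS = [
    {"source": "metric", "tags": {"host": "web-01"}, "message": "cpu usage_total=93.5"},
    {"source": "log", "tags": {"host": "web-01"}, "message": "OOM killer invoked"},
    {"source": "trace", "tags": {"host": "web-02"}, "message": "GET /orders took 2.4s"},
    {"source": "log", "tags": {"host": "web-02"}, "message": "upstream timed out"},
]


def correlate_by_tag(records, tag_key):
    """Group records from any source by the value of one shared tag."""
    grouped = defaultdict(list)
    for record in records:
        value = record["tags"].get(tag_key)
        if value is not None:  # records without the tag cannot be correlated
            grouped[value].append(record)
    return dict(grouped)


by_host = correlate_by_tag(RECORDS, "host")
for host, items in sorted(by_host.items()):
    sources = sorted({item["source"] for item in items})
    print(host, sources)
```

Because every record carries the same tag key, a metric spike on one host can be viewed next to the logs and traces from that host without any source-specific join logic.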
Features
| Module | Monitoring Targets | Core Capabilities |
|---|---|---|
| Host | Physical machines, virtual machines, cloud servers | Collects 200+ system-level metrics in real time, covering in-depth performance data such as CPU steal time, memory swap activity, and disk IOPS |
| Containers | Docker, Kubernetes core resources | Automatically builds cluster resource topology maps and monitors Pod lifecycle status and resource quota utilization in real time |
| Process | Processes running on hosts | Monitors process-level resource consumption in depth and relates processes to the business they serve |
| Database | MySQL, Redis, PostgreSQL, etc. | Collects database performance metrics non-intrusively and monitors QPS, connection count, slow queries, etc. in real time |
| Network | Host and container network traffic | Collects network traffic non-intrusively based on eBPF and monitors metrics such as TCP retransmissions and connection anomalies |
| Resource Catalog | Cloud resources, custom resources | Integrates automatically with cloud provider APIs for unified monitoring of managed services such as RDS and load balancers |
Prerequisites
Before using Infrastructure monitoring, ensure the following steps are completed:
- Register and log in to the Guance workspace.
- Install DataKit on the target host.
- Enable the corresponding collectors based on monitoring requirements.
Core Concepts
Object (O): An object is an entity resource in the infrastructure, such as a host, container, Pod, or process. Object data includes resource attributes, status, and relationships, and describes the configuration and operational status of each resource.
For detailed explanations of concepts such as metrics, tags, and time series, refer to Metrics.
Data Display
The Explorer displays infrastructure data visually and supports the following views:
- List view: Displays real-time object status and supports sorting, filtering, and custom display columns.
- Honeycomb view: Presents the dynamic topology of resource clusters and supports drill-down across levels.
- Top List/Pie chart/Treemap: Data distribution analysis based on grouped statistics.
- Detail page: Displays complete object properties, Metric trends, associated data, and bound views.
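The grouped statistics behind the Top List, pie chart, and treemap views can be sketched in a few lines. This is a minimal illustration, not how the Explorer is implemented; the object fields (`name`, `os`, `cpu_usage`) and the sample values are assumptions made for the example.

```python
from collections import Counter

# Hypothetical host objects; field names and values are assumptions.
HOSTS = [
    {"name": "web-01", "os": "linux", "cpu_usage": 93.5},
    {"name": "web-02", "os": "linux", "cpu_usage": 41.0},
    {"name": "db-01", "os": "linux", "cpu_usage": 67.2},
    {"name": "win-01", "os": "windows", "cpu_usage": 12.8},
]


def group_counts(objects, field):
    """Count objects per value of `field` (the data behind a pie chart or treemap)."""
    return Counter(obj[field] for obj in objects)


def top_n(objects, field, n):
    """Return the n objects with the highest value of `field` (a Top List)."""
    return sorted(objects, key=lambda obj: obj[field], reverse=True)[:n]


os_distribution = group_counts(HOSTS, "os")
busiest = top_n(HOSTS, "cpu_usage", 2)
print(dict(os_distribution))         # {'linux': 3, 'windows': 1}
print([h["name"] for h in busiest])  # ['web-01', 'db-01']
```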
Getting Started
Host Monitoring
Host monitoring collects 200+ system-level metrics in real time, covering in-depth performance data such as CPU steal time, memory swap activity, and disk IOPS.
After DataKit is installed, a set of host-related collectors is enabled automatically and reports data to the workspace.
The following collectors are enabled by default:
| Collector Name | Function Description |
|---|---|
| cpu | Host CPU usage |
| disk | Disk usage |
| diskio | Host disk I/O status |
| mem | Host memory usage |
| swap | Swap memory usage |
| system | Host operating system load |
| net | Host network traffic status |
| host_processes | Host process list and resource usage (collects processes alive for more than 10 minutes by default) |
| hostobject | Host basic information (operating system, hardware information, etc.) |
| container | Host container or Kubernetes data (if no containers are present on the host, the collector automatically exits) |
For more details, refer to DataKit Collector Usage and HOST Object.
Containers and Kubernetes
Container monitoring automatically builds cluster resource topology maps, monitors Pod lifecycle status and resource quota utilization in real time, tracks HPA elastic scaling efficiency, and alerts on container restarts caused by insufficient resources.
Guance provides two ways to enable container data collection:
- Install DataKit on the host: Enable the container collector to collect container and Pod data.
- Install DataKit via DaemonSet: Collects the full set of Kubernetes resource data (Containers, Pods, Services, Deployments, Nodes, etc.) and automatically builds the cluster topology.
Process Monitoring
Process monitoring tracks process-level resource consumption in depth, relates processes to the business they serve, and supports drilling down quickly from an abnormal process to the corresponding APM traces and log data.
The process collector is enabled by default and collects processes that have been alive for more than 10 minutes. To also collect process metric data (CPU, memory, etc.), go to the conf.d/host folder in the DataKit installation directory, copy host_processes.conf.sample to host_processes.conf, set open_metric to true, and then restart DataKit.
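The copy-and-edit step can be scripted. The following is a minimal sketch, assuming the sample file contains an uncommented line of the form `open_metric = false`; the helper name `enable_process_metrics` and the example path in the comment are hypothetical, so check them against your own installation. Restarting DataKit is still a separate step.

```python
import re
import shutil
from pathlib import Path


def enable_process_metrics(host_conf_dir):
    """Copy the sample config and set open_metric to true.

    Returns the path of the written host_processes.conf.
    """
    conf_dir = Path(host_conf_dir)
    sample = conf_dir / "host_processes.conf.sample"
    target = conf_dir / "host_processes.conf"

    if not target.exists():
        shutil.copyfile(sample, target)  # the sample stays in place for reference

    text = target.read_text(encoding="utf-8")
    # Assumption: the setting appears as an uncommented `open_metric = <value>` line.
    new_text, replaced = re.subn(
        r"^([ \t]*)open_metric[ \t]*=[ \t]*\w+",
        r"\1open_metric = true",
        text,
        flags=re.MULTILINE,
    )
    if replaced == 0:
        raise ValueError(f"no open_metric setting found in {target}")
    target.write_text(new_text, encoding="utf-8")
    return target


# Example call; the installation path below is an assumption:
# enable_process_metrics("/usr/local/datakit/conf.d/host")
```

The function leaves an existing host_processes.conf in place and only rewrites the `open_metric` line, so other settings in the file are preserved.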
For more information about the collector, refer to Process.
Database Monitoring
Database monitoring collects performance metrics from mainstream databases such as MySQL, Redis, and PostgreSQL in a non-intrusive manner, and monitors key performance data such as QPS, connection count, and slow queries in real time.
Network Monitoring
Network monitoring collects network traffic non-intrusively based on eBPF, monitors network performance metrics such as TCP retransmissions and connection anomalies, and visualizes service dependencies through a real-time topology.
Resource Catalog
The Resource Catalog integrates automatically with cloud provider APIs, provides unified monitoring for managed services such as RDS and load balancers, and correlates cloud billing data so that cost and performance can be managed together.
By creating custom resources and combining the DataKit API with DataFlux Func, you can report any data to Guance, including cloud provider resource data and enterprise business data.
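As a rough illustration of reporting a custom resource, the sketch below builds one point in InfluxDB-style line protocol and posts it to a local DataKit. The endpoint path `/v1/write/custom_object`, the default address `http://localhost:9529`, the use of a `name` tag as the identifying tag, and the sample measurement and values are all assumptions for this example; confirm them against the Resource Catalog Data Reporting guide before relying on them.

```python
import urllib.request


def _escape_key(value):
    """Escape commas, equals signs, and spaces in tag/field keys and tag values."""
    return str(value).replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def _format_field(value):
    """Render a field value: bool, integer (i suffix), float, or quoted string."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_point(measurement, tags, fields):
    """Build a single line-protocol point (without a timestamp)."""
    if not fields:
        raise ValueError("at least one field is required")
    head = measurement.replace(",", "\\,").replace(" ", "\\ ")
    for key, value in sorted(tags.items()):
        head += f",{_escape_key(key)}={_escape_key(value)}"
    body = ",".join(
        f"{_escape_key(key)}={_format_field(value)}"
        for key, value in sorted(fields.items())
    )
    return f"{head} {body}"


def post_custom_object(line, base_url="http://localhost:9529"):
    """POST one point to DataKit. The URL and path are assumptions, see above."""
    request = urllib.request.Request(
        f"{base_url}/v1/write/custom_object",
        data=line.encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


# Hypothetical custom resource describing a business service.
line = build_point(
    "order_service",
    {"name": "orders-eu-1", "region": "eu"},
    {"owner": "team-a", "replicas": 3},
)
print(line)  # order_service,name=orders-eu-1,region=eu owner="team-a",replicas=3i
```

Calling `post_custom_object(line)` would send the point; it is left out of the example run because it needs a reachable DataKit instance.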
For specific operation procedures, refer to Resource Catalog Data Reporting.
Related Features
- Monitoring: Configure monitors like Infrastructure Liveness Detection and Threshold Detection based on infrastructure Metrics.
- Metrics: View details of infrastructure Measurements, configure Generate Metrics.
- Logs: Correlate and analyze log data from hosts and containers.
- APM: Drill down from infrastructure to application trace.