APM
APM is Guance's full-stack performance analysis solution built around distributed tracing. It follows standard protocols such as OpenTracing. A unified collection agent deployed on each host lets the platform correlate trace data, infrastructure metrics, and application logs, providing end-to-end observability from code to resources.
Core Architecture
APM uses a one-agent-per-host architecture: a DataKit is deployed on each application server and acts as the unified data collector.
Data Collection
The local DataKit processes the following data and reports it to the Guance platform, where it is stored and correlated centrally.
- Tracing data: Receives and processes distributed trace data reported via standard protocols like OpenTelemetry and Jaeger.
- Infrastructure metrics: Actively collects resource metrics from the host layer, such as CPU usage, memory usage, disk I/O, and network traffic.
- Application logs: Collects application standard output, specified log files, and operating system logs in real time.
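To illustrate what correlating these data types can look like, here is a minimal Python sketch that attaches the most recent host CPU sample to each span. It is not Guance's or DataKit's implementation: the `Span` and `MetricPoint` classes, their field names, and the matching rule (same host, latest sample at or before the span start) are all assumptions made for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Span:                # hypothetical span record
    trace_id: str
    host: str
    start_ts: float        # seconds since an arbitrary epoch
    duration_ms: float


@dataclass
class MetricPoint:         # hypothetical host metric sample
    host: str
    ts: float
    cpu_usage: float       # percent


def index_metrics(points):
    """Group metric points by host, each group sorted by timestamp."""
    by_host = {}
    for p in sorted(points, key=lambda p: p.ts):
        by_host.setdefault(p.host, []).append(p)
    return by_host


def cpu_at_span_start(span, by_host):
    """Return the latest CPU sample on the span's host taken at or
    before the span started, or None if no such sample exists."""
    series = by_host.get(span.host, [])
    i = bisect_right([p.ts for p in series], span.start_ts)
    return series[i - 1].cpu_usage if i else None


spans = [
    Span("t1", "web-1", 105.0, 820.0),
    Span("t2", "web-2", 101.0, 40.0),
]
metrics = [
    MetricPoint("web-1", 100.0, 35.0),
    MetricPoint("web-1", 110.0, 97.0),
    MetricPoint("web-2", 120.0, 12.0),
]

by_host = index_metrics(metrics)
correlated = {s.trace_id: cpu_at_span_start(s, by_host) for s in spans}
print(correlated)
```

Because every record carries the host it came from, a join like this needs no extra identifiers. That is one plausible benefit of collecting all three data types through a single agent per host.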
Core Features
- Service List and Topology View
  Provides a complete list of application services, showing the core performance metrics and real-time status of every service in one place. Visualizes real-time call relationships and dependency topologies between services, and monitors key global metrics, including the number of online services, P90 service response latency, and maximum service impact latency, giving a full picture of the service architecture.
- Based on distributed tracing, provides end-to-end tracing of each request along its full path. Built-in tools such as flame graphs, span lists, and waterfall charts support querying, visualizing, and analyzing all collected and reported trace data in depth, enabling performance profiling and precise fault localization from application interfaces down to the code method level.
- Provides aggregation, analysis, and tracking of the errors generated in distributed traces. Supports viewing the historical trend of a specific error type and its distribution across services, interfaces, or instances, which speeds up root cause identification and troubleshooting.
- The analysis dashboard aggregates and displays core application performance data, mainly trace statistics (span and request volumes and error counts), correlated anomalies (error logs), deep performance analysis (response latency, call count, service request distribution, etc.), and resource and anomaly correlation.
- Uses deep performance profiling tools such as flame graphs to visually analyze application runtime CPU usage, method latency, and similar data. Correlates application-level performance bottlenecks (e.g., slow calls, high-latency methods) with underlying infrastructure resource consumption (e.g., CPU usage) to pinpoint the root cause of performance issues.
- Application Performance Detection
  Supports configuring application performance monitors that match and filter trace performance data based on rules. You can define specific detection conditions (e.g., response time exceeding a threshold, occurrence of specific errors). The system then identifies the requests in the full trace data that meet those anomaly conditions, enabling precise discovery of, and alerting on, specific performance issues.
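The rule-based filtering described for application performance detection can be sketched in Python as follows. This is an illustration only, not Guance's monitor configuration format or API: `Span`, `DetectionRule`, and `find_anomalous_traces` are hypothetical names, and the rule supports just the two example conditions mentioned above (a response time threshold and a set of error types).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:                              # hypothetical span record
    trace_id: str
    service: str
    duration_ms: float
    error_type: Optional[str] = None


@dataclass
class DetectionRule:                     # hypothetical rule shape
    service: str
    max_duration_ms: Optional[float] = None
    error_types: frozenset = frozenset()

    def matches(self, span):
        """True if the span belongs to the rule's service and is either
        slower than the threshold or carries a listed error type."""
        if span.service != self.service:
            return False
        too_slow = (self.max_duration_ms is not None
                    and span.duration_ms > self.max_duration_ms)
        bad_error = (span.error_type is not None
                     and span.error_type in self.error_types)
        return too_slow or bad_error


def find_anomalous_traces(spans, rule):
    """Return IDs of traces with at least one matching span,
    de-duplicated and in first-seen order."""
    seen = {}
    for span in spans:
        if rule.matches(span):
            seen.setdefault(span.trace_id, None)
    return list(seen)


spans = [
    Span("a", "checkout", 1200.0),
    Span("b", "checkout", 90.0, "TimeoutError"),
    Span("c", "checkout", 80.0),
    Span("d", "search", 5000.0),
    Span("a", "checkout", 1500.0),
]
rule = DetectionRule("checkout", max_duration_ms=1000.0,
                     error_types=frozenset({"TimeoutError"}))
anomalous = find_anomalous_traces(spans, rule)
print(anomalous)  # ['a', 'b']
```

Trace `d` is slow but belongs to another service, so it is not reported. Scoping a rule to one service keeps alerts specific to the issue being watched.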
Storage Billing
The system counts the number of unique trace_id values in the current workspace and uses a tiered pricing model.
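As a rough illustration of the counting step and of one common form of tiered pricing, consider the Python sketch below. The tier boundaries and prices in `HYPOTHETICAL_TIERS` are invented for the example and are not Guance's actual rates. The sketch also assumes graduated tiers, where each rate applies only to the traces that fall inside its tier; the source does not specify this, so consult the billing documentation for the real rules.

```python
def count_unique_traces(spans):
    """Count distinct trace_id values; many spans can share one trace."""
    return len({span["trace_id"] for span in spans})


# Hypothetical tiers: (upper bound in traces or None for "no limit",
#                      price per 1,000 traces in arbitrary units).
HYPOTHETICAL_TIERS = [(1_000, 5), (10_000, 3), (None, 1)]


def tiered_cost(trace_count, tiers):
    """Graduated pricing: each tier's rate applies only to the traces
    that fall between the previous tier's bound and this tier's bound."""
    cost = 0.0
    lower = 0
    for upper, price_per_thousand in tiers:
        if trace_count <= lower:
            break
        top = trace_count if upper is None else min(trace_count, upper)
        cost += (top - lower) * price_per_thousand / 1000
        lower = top
    return cost


spans = [
    {"trace_id": "a", "name": "GET /cart"},
    {"trace_id": "a", "name": "SELECT cart"},
    {"trace_id": "b", "name": "GET /home"},
]
print(count_unique_traces(spans))             # 2
print(tiered_cost(2_500, HYPOTHETICAL_TIERS))  # 9.5
```

In the second call, the first 1,000 traces are charged at 5 per thousand (5.0) and the remaining 1,500 at 3 per thousand (4.5), for a total of 9.5 units.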
Specific billing rules and data storage policies (e.g., retention period) can be configured separately. Please refer to Data Storage Policies.
For more billing rules, please refer to Billing Methods.