Apollo¶
Collect Apollo-related metrics.
Installation and Configuration¶
Prerequisites¶
- Install DataKit
Apollo Monitoring¶
In a distributed deployment, Apollo runs as three types of processes: Portal, ConfigService, and AdminService. For example, separate ConfigService and AdminService instances are deployed for the test and production environments. Refer to the Apollo Deployment Architecture. All three types of processes expose Prometheus-format metrics at the /prometheus endpoint:
- Portal: 8070/prometheus
- ConfigService: 8080/prometheus
- AdminService: 8090/prometheus
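Before configuring a collector, it can help to confirm that an endpoint really returns Prometheus text-format output. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the parser only extracts metric names from simple `name{labels} value` lines; it is not a full implementation of the exposition format.

```python
import urllib.request


def parse_metric_names(text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or first space (value).
        end = len(line)
        for sep in ("{", " "):
            idx = line.find(sep)
            if idx != -1:
                end = min(end, idx)
        names.add(line[:end])
    return names


def fetch_metric_names(url, timeout=5):
    """Fetch a metrics endpoint and return the metric names it exposes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metric_names(resp.read().decode("utf-8"))
```

For example, `fetch_metric_names("http://127.0.0.1:8070/prometheus")` would query a Portal running locally on its default port. An unreachable endpoint raises an `OSError` subclass, which the caller can catch.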
DataKit Collector Configuration¶
Since Apollo exposes metrics URLs directly, they can be collected with the prom collector.
Navigate to the conf.d/prom directory under the DataKit installation directory, and copy prom.conf.sample as apollo-portal-prod-1.conf:
cp prom.conf.sample apollo-portal-prod-1.conf
Adjust the content as follows:
url = "http://127.0.0.1:8070/prometheus"
## Collector alias.
source = "apollo_portal_prod_1"
## (Optional) Collect interval: (defaults to "30s").
interval = "30s"
## If measurement_name is not empty, use this as the Measurement set name.
measurement_name = "apollo"
Following the same method, create configuration files for the ConfigService and AdminService collectors.
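Because the three collector files differ only in port and alias, they can be generated rather than edited by hand. The sketch below is one way to do that, assuming the naming pattern of apollo-portal-prod-1.conf and apollo_portal_prod_1 extends to the other two roles. The role labels, function names, and the `env` and `index` parameters are hypothetical. It emits only the four settings shown above, so the output is meant to be merged into a copy of prom.conf.sample rather than used as a complete file.

```python
# Default ports listed in this document for each Apollo process type.
APOLLO_PORTS = {"portal": 8070, "configservice": 8080, "adminservice": 8090}


def conf_filename(role, env="prod", index=1):
    """Build a file name following the apollo-portal-prod-1.conf pattern."""
    return f"apollo-{role}-{env}-{index}.conf"


def render_settings(role, env="prod", index=1, host="127.0.0.1", interval="30s"):
    """Render the settings that differ per collector as configuration lines."""
    port = APOLLO_PORTS[role]
    lines = [
        f'url = "http://{host}:{port}/prometheus"',
        f'source = "apollo_{role}_{env}_{index}"',
        f'interval = "{interval}"',
        'measurement_name = "apollo"',
    ]
    return "\n".join(lines) + "\n"


def render_all(env="prod", index=1):
    """Map each collector file name to its rendered settings."""
    return {
        conf_filename(role, env, index): render_settings(role, env, index)
        for role in APOLLO_PORTS
    }
```

Calling `render_all()` returns a dictionary with one entry per process type, which can then be reviewed and written into the conf.d/prom directory.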
Adjust the other settings as needed. The main parameters are:
- urls: Prometheus metrics address; fill in the metrics URL exposed by the corresponding component
- source: Collector alias; using a distinct value for each collector is recommended
- interval: Collection interval
Restart DataKit¶
Metrics¶
Apollo metrics are stored under the apollo measurement set. This section describes the Apollo-related metrics.
Metric Name | Description | Unit |
---|---|---|
http_server_requests_seconds | HTTP server response time when processing requests; clients connect to the Apollo server using HTTP | Second |
process_uptime_seconds | JVM uptime duration | Second |
hikaricp_connections_active | Number of active connections | Count |
hikaricp_connections_idle | Number of idle connections | Count |
hikaricp_connections_pending | Number of threads waiting for connections; normally 0, persistent non-zero values should trigger alerts, optimization methods include increasing maximum connections | Count |
hikaricp_connections_usage_seconds | Time that connections are occupied by business logic; long durations should trigger alerts, possibly due to slow database responses; pay attention to average and P99 extreme values | Second |
jvm_memory_max_bytes | Maximum number of bytes managed by the JVM, identified by different memory types using the id tag | Byte |
jvm_memory_usage_after_gc_percent | Percentage of long-lived objects in heap memory after last GC | % |
jvm_memory_used_bytes | Number of used bytes managed by the JVM, identified by different memory types using the id tag | Byte |
jvm_memory_committed_bytes | Number of committed bytes by the JVM | Byte |
jvm_gc_pause_seconds | Duration of JVM GC pauses | Second |
system_load_average_1m | Operating system average load over the past one minute | - |
system_cpu_count | Number of CPUs available to the JVM | Count |
system_cpu_usage | Operating system CPU usage | % |
process_cpu_usage | Process CPU usage | % |
process_files_max_files | Maximum number of file descriptors allowed to be opened by the process | Count |
process_files_open_files | Number of file descriptors opened by the process | Count |
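
Two of the descriptions above imply simple checks: hikaricp_connections_pending should alert only when it stays non-zero, and process_files_open_files is most meaningful relative to process_files_max_files. The sketch below illustrates both. The function names and the default of three consecutive samples are assumptions chosen for illustration, not recommended values from Apollo or DataKit.

```python
def pending_persistently_nonzero(samples, min_consecutive=3):
    """Return True if pending-connection samples stay above 0 for a sustained run.

    `samples` is a time-ordered sequence of hikaricp_connections_pending values.
    A single non-zero spike does not count; `min_consecutive` readings in a row do.
    """
    run = 0
    for value in samples:
        run = run + 1 if value > 0 else 0
        if run >= min_consecutive:
            return True
    return False


def fd_usage_ratio(open_files, max_files):
    """Return process_files_open_files as a fraction of process_files_max_files."""
    if max_files <= 0:
        raise ValueError("max_files must be positive")
    return open_files / max_files
```

With one sample per 30-second collection interval, three consecutive non-zero readings correspond to a little over a minute of threads waiting for connections.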