Hadoop Yarn NodeManager
Collect metrics from Yarn NodeManager.
Installation and Deployment
NodeManager is written in Java, so its metrics can be collected with jmx-exporter.
1. NodeManager Configuration
1.1 Download jmx-exporter
Download address: https://github.com/prometheus/jmx_exporter
1.2 Download jmx Script
Download address: https://github.com/lrwh/jmx-exporter/blob/main/hadoop-yarn-nodemanager.yml
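1.3 Adjust NodeManager Startup Parameters
Add the following to the NodeManager startup parameters. The -javaagent value has the form jar-path=host:port:config-path, so point both paths at the files downloaded in 1.1 and 1.2: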
{JAVA_GC_ARGS} -javaagent:/opt/guance/jmx/jmx_exporter-1.0.1.jar=localhost:17108:/opt/guance/jmx/jmx_node_manager.yml
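The agent argument packs the jar path, the listen address, and the rules file into a single string, which makes typos easy to miss. The following Python sketch only assembles that string so the three parts are visible; the helper name `javaagent_arg` is hypothetical, and the paths are the ones used in the example above.

```python
def javaagent_arg(jar: str, host: str, port: int, config: str) -> str:
    """Build a -javaagent argument of the form <jar>=<host>:<port>:<config>."""
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return f"-javaagent:{jar}={host}:{port}:{config}"


# Example values taken from the startup parameters above.
example = javaagent_arg(
    "/opt/guance/jmx/jmx_exporter-1.0.1.jar",
    "localhost",
    17108,
    "/opt/guance/jmx/jmx_node_manager.yml",
)
print(example)
```

The port chosen here is the one the DataKit collector must later point at.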
1.4 Restart NodeManager
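After the restart, the agent should be serving metrics on the port given in the startup parameters. A minimal Python sketch for checking this follows. It assumes the exporter emits plain `name value` lines without timestamps and that the raw metric names contain `nodemanager_`; the function names are hypothetical.

```python
from urllib.request import urlopen


def parse_samples(text: str, needle: str = "nodemanager_") -> dict:
    """Parse Prometheus text-format lines into {metric name: value}.

    Only names containing `needle` are kept. Labels are dropped, so when a
    name appears with several label sets, the last sample wins.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_part, _, value = line.rpartition(" ")
        name = name_part.split("{", 1)[0]
        if needle not in name:
            continue
        try:
            samples[name] = float(value)
        except ValueError:
            continue  # ignore lines that do not end in a number
    return samples


def fetch_samples(url: str = "http://localhost:17108/metrics",
                  timeout: float = 5.0) -> dict:
    """Fetch the exporter endpoint and return the NodeManager samples."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_samples(resp.read().decode("utf-8"))
```

Calling `fetch_samples()` on the NodeManager host should return a non-empty dict once the agent is loaded; an empty dict or a connection error suggests the startup parameters were not picked up.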
2. DataKit Collector Configuration
2.1 Install DataKit
2.2 Configure Collector
The jmx-exporter exposes a metrics url directly, so the metrics can be collected with the prom collector.
Go to the conf.d/prom directory under the DataKit installation directory and copy prom.conf.sample to nodemanager.conf:
cp prom.conf.sample nodemanager.conf
Adjust the content of nodemanager.conf as follows:
urls = ["http://localhost:17108/metrics"]
source = "yarn-nodemanager"
interval = "10s"
[inputs.prom.tags]
component = "yarn-nodemanager"
Adjust other settings as needed. Parameter descriptions:
- urls: The jmx-exporter metrics address; fill in the metrics url exposed by the corresponding component.
- source: Collector alias; use a distinct name for each component.
- keep_exist_metric_name: Keep the metric name.
- interval: Collection interval.
- inputs.prom.tags: Extra tags to add.
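In TOML, every key written after a table header belongs to that table, so scalar settings such as `interval` need to appear before `[inputs.prom.tags]`. The Python sketch below renders a collector block in that order. The function name `render_prom_conf` is hypothetical, and the `[[inputs.prom]]` header is assumed to match the one in prom.conf.sample.

```python
def render_prom_conf(url: str, source: str, interval: str = "10s",
                     tags: dict = None) -> str:
    """Render a prom collector block with scalar keys ahead of the tags table.

    Values are assumed not to contain quotes or backslashes.
    """
    lines = [
        "[[inputs.prom]]",
        f'  urls = ["{url}"]',
        f'  source = "{source}"',
        f'  interval = "{interval}"',
    ]
    if tags:
        # The tags table goes last so that no scalar key ends up inside it.
        lines.append("  [inputs.prom.tags]")
        for key, value in tags.items():
            lines.append(f'    {key} = "{value}"')
    return "\n".join(lines) + "\n"


print(render_prom_conf(
    "http://localhost:17108/metrics",
    "yarn-nodemanager",
    tags={"component": "yarn-nodemanager"},
))
```

Generating the file this way is optional; editing the copied sample by hand works just as well, provided the key order is preserved.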
3. Restart DataKit
Metrics
Hadoop Measurement Set
NodeManager metrics belong to the Hadoop measurement set. This section describes the NodeManager metrics.
Metric | Description | Unit |
---|---|---|
nodemanager_allocatedcontainers | Number of containers currently allocated | count |
nodemanager_allocatedgb | Memory currently allocated to containers | GB |
nodemanager_allocatedopportunisticgb | Memory allocated to opportunistic containers | GB |
nodemanager_allocatedopportunisticvcores | Number of vcores allocated to opportunistic containers | count |
nodemanager_allocatedvcores | Number of vcores currently allocated | count |
nodemanager_availablegb | Memory currently available for allocation | GB |
nodemanager_availablevcores | Number of vcores currently available for allocation | count |
nodemanager_badlocaldirs | Number of bad local directories | count |
nodemanager_badlogdirs | Number of bad log directories | count |
nodemanager_blocktransferratebytes_count | Total number of block transfer bytes | byte |
nodemanager_blocktransferratebytes_rate1 | 1-minute rate of block transfer bytes | B/s |
nodemanager_blocktransferratebytes_rate15 | 15-minute rate of block transfer bytes | B/s |
nodemanager_blocktransferratebytes_rate5 | 5-minute rate of block transfer bytes | B/s |
nodemanager_blocktransferratebytes_ratemean | Mean rate of block transfer bytes | B/s |
nodemanager_cachesizebeforeclean | Cache size before cleanup | byte |
nodemanager_callqueuelength | Length of the RPC call queue | count |
nodemanager_containerlaunchdurationavgtime | Average time to launch a container | ms |
nodemanager_containerlaunchdurationnumops | Number of container launch operations | count |
nodemanager_containerscompleted | Number of completed containers | count |
nodemanager_containersfailed | Number of failed containers | count |
nodemanager_containersiniting | Number of containers currently initializing | count |
nodemanager_containerskilled | Number of killed containers | count |
nodemanager_containerslaunched | Number of launched containers | count |
nodemanager_containersreiniting | Number of containers currently re-initializing | count |
nodemanager_containersrolledbackonfailure | Number of containers rolled back after a failure | count |
nodemanager_containersrunning | Number of containers currently running | count |
nodemanager_deferredrpcprocessingtimeavgtime | Average deferred RPC processing time | ms |
nodemanager_deferredrpcprocessingtimenumops | Number of deferred RPC operations | count |
nodemanager_droppedpuball | Number of metrics publishes dropped across all sinks | count |
nodemanager_gccount | Total garbage collection count | count |
nodemanager_gccountconcurrentmarksweep | Garbage collection count of the ConcurrentMarkSweep collector | count |
nodemanager_gccountparnew | Garbage collection count of the ParNew collector | count |
nodemanager_gcnuminfothresholdexceeded | Number of times the garbage collection info threshold was exceeded | count |
nodemanager_gcnumwarnthresholdexceeded | Number of times the garbage collection warn threshold was exceeded | count |
nodemanager_gctimemillis | Total garbage collection time | ms |
nodemanager_gctimemillisconcurrentmarksweep | Garbage collection time of the ConcurrentMarkSweep collector | ms |
nodemanager_gctimemillisparnew | Garbage collection time of the ParNew collector | ms |
nodemanager_gctotalextrasleeptime | Total extra sleep time attributed to garbage collection | ms |
nodemanager_getgroupsavgtime | Average time to get groups | ms |
nodemanager_getgroupsnumops | Number of get-groups operations | count |
nodemanager_goodlocaldirsdiskutilizationperc | Disk utilization across good local directories | % |
nodemanager_logerror | Number of error-level log events | count |
nodemanager_logfatal | Number of fatal-level log events | count |
nodemanager_loginfailureavgtime | Average time of failed logins | ms |
nodemanager_loginfailurenumops | Number of failed logins | count |
nodemanager_loginfo | Number of info-level log events | count |
nodemanager_loginsuccessavgtime | Average time of successful logins | ms |
nodemanager_loginsuccessnumops | Number of successful logins | count |
nodemanager_logwarn | Number of warn-level log events | count |
nodemanager_memheapcommittedm | Committed heap memory | MB |
nodemanager_memheapmaxm | Maximum heap memory | MB |
nodemanager_memheapusedm | Used heap memory | MB |
nodemanager_memmaxm | Maximum memory available to the JVM | MB |
nodemanager_memnonheapcommittedm | Committed non-heap memory | MB |
nodemanager_memnonheapmaxm | Maximum non-heap memory | MB |
nodemanager_memnonheapusedm | Used non-heap memory | MB |
nodemanager_numactiveconnections | Number of active connections | count |
nodemanager_numactivesinks | Number of active metrics sinks | count |
nodemanager_numactivesources | Number of active metrics sources | count |
nodemanager_numallsinks | Total number of metrics sinks | count |
nodemanager_numallsources | Total number of metrics sources | count |
nodemanager_numdroppedconnections | Number of dropped connections | count |
nodemanager_numopenconnections | Number of open connections | count |
nodemanager_numregisteredconnections | Number of registered connections | count |
nodemanager_openblockrequestlatencymillis_count | Number of open block requests | count |
nodemanager_openblockrequestlatencymillis_rate1 | 1-minute rate of open block requests | req/s |
nodemanager_openblockrequestlatencymillis_rate15 | 15-minute rate of open block requests | req/s |
nodemanager_openblockrequestlatencymillis_rate5 | 5-minute rate of open block requests | req/s |
nodemanager_openblockrequestlatencymillis_ratemean | Mean rate of open block requests | req/s |
nodemanager_privatebytesdeleted | Bytes deleted from the private local cache | byte |
nodemanager_publicbytesdeleted | Bytes deleted from the public local cache | byte |
nodemanager_publishavgtime | Average metrics publish time | ms |
nodemanager_publishnumops | Number of metrics publish operations | count |
nodemanager_receivedbytes | Number of bytes received | byte |
nodemanager_registeredexecutorssize | Number of registered executors | count |
nodemanager_registerexecutorrequestlatencymillis_count | Number of executor registration requests | count |
nodemanager_registerexecutorrequestlatencymillis_rate1 | 1-minute rate of executor registration requests | req/s |
nodemanager_registerexecutorrequestlatencymillis_rate15 | 15-minute rate of executor registration requests | req/s |
nodemanager_registerexecutorrequestlatencymillis_rate5 | 5-minute rate of executor registration requests | req/s |
nodemanager_registerexecutorrequestlatencymillis_ratemean | Mean rate of executor registration requests | req/s |
nodemanager_renewalfailures | Number of renewal failures | count |
nodemanager_renewalfailurestotal | Total number of renewal failures | count |
nodemanager_rpcauthenticationfailures | Number of RPC authentication failures | count |
nodemanager_rpcauthorizationsuccesses | Number of successful RPC authorizations | count |
nodemanager_rpcclientbackoff | Number of RPC client backoffs | count |
nodemanager_rpcprocessingtimeavgtime | Average RPC processing time | ms |
nodemanager_rpcprocessingtimenumops | Number of RPC processing operations | count |
nodemanager_rpcqueuetimeavgtime | Average RPC queue time | ms |
nodemanager_rpcqueuetimenumops | Number of RPC queue time operations | count |
nodemanager_rpcslowcalls | Number of slow RPC calls | count |
nodemanager_runningopportunisticcontainers | Number of running opportunistic containers | count |
nodemanager_securityenabled | Whether security is enabled | bool |
nodemanager_sentbytes | Number of bytes sent | byte |
nodemanager_shuffleconnections | Number of shuffle connections | count |
nodemanager_shuffleoutputbytes | Number of shuffle output bytes | byte |
nodemanager_shuffleoutputsfailed | Number of failed shuffle outputs | count |
nodemanager_shuffleoutputsok | Number of successful shuffle outputs | count |
nodemanager_snapshotavgtime | Average metrics snapshot time | ms |
nodemanager_snapshotnumops | Number of metrics snapshot operations | count |
nodemanager_threadsblocked | Number of threads in the BLOCKED state | count |
nodemanager_threadsnew | Number of threads in the NEW state | count |
nodemanager_threadsrunnable | Number of threads in the RUNNABLE state | count |
nodemanager_threadsterminated | Number of threads in the TERMINATED state | count |
nodemanager_threadstimedwaiting | Number of threads in the TIMED_WAITING state | count |
nodemanager_threadswaiting | Number of threads in the WAITING state | count |
nodemanager_totalbytesdeleted | Total bytes deleted from the public and private local caches | byte |
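As a small worked example of combining these metrics, the Python sketch below derives a utilization ratio from an allocated/available pair such as nodemanager_allocatedgb and nodemanager_availablegb (or the vcores equivalents). It assumes the node's total capacity equals allocated plus available, which is an assumption made for illustration; the function name is hypothetical.

```python
def utilization(allocated: float, available: float) -> float:
    """Return allocated / (allocated + available) as a ratio between 0 and 1.

    Assumes total capacity = allocated + available. Returns 0.0 when the
    node reports no capacity at all, to avoid dividing by zero.
    """
    total = allocated + available
    if total <= 0:
        return 0.0
    return allocated / total


# 6 GB allocated with 2 GB still available: three quarters of memory in use.
print(utilization(6, 2))  # 0.75
```

The same function applies to vcores by passing nodemanager_allocatedvcores and nodemanager_availablevcores.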