Hadoop Yarn NodeManager

Collect metrics from Yarn NodeManager.

Installation and Deployment

NodeManager is written in Java, so its metrics can be collected with jmx-exporter.

1. NodeManager Configuration

1.1 Download jmx-exporter

Download address: https://github.com/prometheus/jmx_exporter

1.2 Download jmx Script

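Download address: https://github.com/lrwh/jmx-exporter/blob/main/hadoop-yarn-nodemanager.yml

Save the file under the path referenced by the startup parameters in step 1.3 (/opt/guance/jmx/jmx_node_manager.yml in this example), or change that path to match the downloaded file name.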

1.3 Adjust NodeManager Startup Parameters

Add the following to the NodeManager startup parameters:

{JAVA_GC_ARGS} -javaagent:/opt/guance/jmx/jmx_exporter-1.0.1.jar=localhost:17108:/opt/guance/jmx/jmx_node_manager.yml
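
The agent argument follows the jmx-exporter form -javaagent:<jar>=[<host>:]<port>:<config file>. As a minimal sketch for checking that a startup line points at the intended port and configuration file, the helper below splits the argument into its parts. The names parse_javaagent and JavaAgentArg are hypothetical and exist only for this illustration; they are not part of jmx-exporter or Hadoop.

```python
from typing import NamedTuple, Optional


class JavaAgentArg(NamedTuple):
    """Parts of a jmx-exporter -javaagent argument (illustrative type)."""
    jar: str
    host: Optional[str]
    port: int
    config: str


def parse_javaagent(arg: str) -> JavaAgentArg:
    """Split '-javaagent:<jar>=[<host>:]<port>:<config>' into its parts."""
    prefix = "-javaagent:"
    if not arg.startswith(prefix):
        raise ValueError("not a -javaagent argument")
    jar, sep, options = arg[len(prefix):].partition("=")
    if not sep:
        raise ValueError("missing '=<options>' after the jar path")

    first, _, rest = options.partition(":")
    if first.isdigit():
        # Host omitted: the options start directly with the port.
        host, port, config = None, first, rest
    else:
        host = first
        port, _, config = rest.partition(":")
    if not port.isdigit() or not config:
        raise ValueError("expected [<host>:]<port>:<config file>")
    return JavaAgentArg(jar, host, int(port), config)


if __name__ == "__main__":
    line = ("-javaagent:/opt/guance/jmx/jmx_exporter-1.0.1.jar="
            "localhost:17108:/opt/guance/jmx/jmx_node_manager.yml")
    print(parse_javaagent(line))
```

The port reported here should match the port used in the collector's urls setting later on.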

1.4 Restart NodeManager
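
After the restart, it can be useful to confirm that the agent is serving metrics before configuring DataKit. The following is a minimal sketch that fetches the endpoint and lists the metric names found in the Prometheus text output. It assumes the agent listens on localhost:17108 as configured above; the function names fetch_metrics and metric_names are illustrative.

```python
import urllib.request


def fetch_metrics(url: str = "http://localhost:17108/metrics",
                  timeout: float = 5.0) -> str:
    """Download the raw Prometheus text exposition from the agent."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def metric_names(exposition_text: str) -> set:
    """Return the distinct metric names found in exposition text."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # The name ends at the first '{' (labels) or at whitespace.
        names.add(line.split("{", 1)[0].split()[0])
    return names


if __name__ == "__main__":
    try:
        names = metric_names(fetch_metrics())
    except OSError as exc:
        print(f"could not reach the exporter: {exc}")
    else:
        print(f"{len(names)} metric names exposed")
        for name in sorted(names):
            print(name)
```

If the request fails, check that NodeManager was restarted with the -javaagent parameter and that the port is not already in use.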

2. DataKit Collector Configuration

2.1 Install DataKit

2.2 Configure Collector

jmx-exporter exposes the metrics over an HTTP URL, so the data can be collected with the prom collector.

Go to the conf.d/prom directory under the DataKit installation directory and copy prom.conf.sample to nodemanager.conf.

cp prom.conf.sample nodemanager.conf

Adjust the content of nodemanager.conf as follows:

  urls = ["http://localhost:17108/metrics"]
  source = "yarn-nodemanager"
  interval = "10s"
  [inputs.prom.tags]
    component = "yarn-nodemanager"

Other settings can be adjusted as needed.

Parameter adjustment explanation:

  • urls: The jmx-exporter metrics address; fill in the metrics URL exposed by the corresponding component.
  • source: Alias for the collector; use a distinct value for each collector.
  • keep_exist_metric_name: Whether to keep the original metric names unchanged.
  • interval: Collection interval.
  • inputs.prom.tags: Add extra tags.

3. Restart DataKit

Restart DataKit so that the new collector configuration takes effect.

Metrics

Hadoop Metrics Set

NodeManager metrics belong to the Hadoop metrics set. The table below lists the NodeManager-related metrics.

Metrics Description Unit
nodemanager_allocatedcontainers Number of containers allocated by the node manager count
nodemanager_allocatedgb Memory allocated by the node manager GB
nodemanager_allocatedopportunisticgb Opportunistic memory allocated by the node manager GB
nodemanager_allocatedopportunisticvcores Opportunistic vCores allocated by the node manager count
nodemanager_allocatedvcores vCores allocated by the node manager count
nodemanager_availablegb Available memory on the node manager GB
nodemanager_availablevcores Available vCores on the node manager count
nodemanager_badlocaldirs Number of bad local directories on the node manager count
nodemanager_badlogdirs Number of bad log directories on the node manager count
nodemanager_blocktransferratebytes_count Block transfer bytes on the node manager byte
nodemanager_blocktransferratebytes_rate1 1-minute block transfer byte rate on the node manager B/s
nodemanager_blocktransferratebytes_rate15 15-minute block transfer byte rate on the node manager B/s
nodemanager_blocktransferratebytes_rate5 5-minute block transfer byte rate on the node manager B/s
nodemanager_blocktransferratebytes_ratemean Mean block transfer byte rate on the node manager B/s
nodemanager_cachesizebeforeclean Cache size before cleaning on the node manager byte
nodemanager_callqueuelength Call queue length on the node manager count
nodemanager_containerlaunchdurationavgtime Average container launch duration on the node manager s
nodemanager_containerlaunchdurationnumops Number of container launch operations on the node manager count
nodemanager_containerscompleted Number of completed containers on the node manager count
nodemanager_containersfailed Number of failed containers on the node manager count
nodemanager_containersiniting Number of initializing containers on the node manager count
nodemanager_containerskilled Number of killed containers on the node manager count
nodemanager_containerslaunched Number of launched containers on the node manager count
nodemanager_containersreiniting Number of reinitializing containers on the node manager count
nodemanager_containersrolledbackonfailure Number of containers rolled back on failure on the node manager count
nodemanager_containersrunning Number of running containers on the node manager count
nodemanager_deferredrpcprocessingtimeavgtime Average deferred RPC processing time on the node manager s
nodemanager_deferredrpcprocessingtimenumops Number of deferred RPC operations on the node manager count
nodemanager_droppedpuball Number of metrics updates dropped by all sinks on the node manager count
nodemanager_gccount Garbage collection count on the node manager count
nodemanager_gccountconcurrentmarksweep Concurrent mark sweep garbage collection count on the node manager count
nodemanager_gccountparnew ParNew garbage collection count on the node manager count
nodemanager_gcnuminfothresholdexceeded Number of garbage collection info threshold exceeded on the node manager count
nodemanager_gcnumwarnthresholdexceeded Number of garbage collection warning threshold exceeded on the node manager count
nodemanager_gctimemillis Garbage collection time in milliseconds on the node manager ms
nodemanager_gctimemillisconcurrentmarksweep Concurrent mark sweep garbage collection time in milliseconds on the node manager ms
nodemanager_gctimemillisparnew ParNew garbage collection time in milliseconds on the node manager ms
nodemanager_gctotalextrasleeptime Total extra sleep time for garbage collection on the node manager s
nodemanager_getgroupsavgtime Average time to get groups on the node manager s
nodemanager_getgroupsnumops Number of operations to get groups on the node manager count
nodemanager_goodlocaldirsdiskutilizationperc Disk utilization percentage of good local directories on the node manager %
nodemanager_logerror Number of log errors on the node manager count
nodemanager_logfatal Number of fatal logs on the node manager count
nodemanager_loginfailureavgtime Average login failure time on the node manager ms
nodemanager_loginfailurenumops Number of login failures on the node manager count
nodemanager_loginfo Number of log information entries on the node manager count
nodemanager_loginsuccessavgtime Average login success time on the node manager ms
nodemanager_loginsuccessnumops Number of successful logins on the node manager count
nodemanager_logwarn Number of log warnings on the node manager count
nodemanager_memheapcommittedm Committed heap memory on the node manager MB
nodemanager_memheapmaxm Maximum heap memory on the node manager MB
nodemanager_memheapusedm Used heap memory on the node manager MB
nodemanager_memmaxm Maximum memory on the node manager MB
nodemanager_memnonheapcommittedm Committed non-heap memory on the node manager MB
nodemanager_memnonheapmaxm Maximum non-heap memory on the node manager MB
nodemanager_memnonheapusedm Used non-heap memory on the node manager MB
nodemanager_numactiveconnections Number of active connections on the node manager count
nodemanager_numactivesinks Number of active sinks on the node manager count
nodemanager_numactivesources Number of active sources on the node manager count
nodemanager_numallsinks Total number of sinks on the node manager count
nodemanager_numallsources Total number of sources on the node manager count
nodemanager_numdroppedconnections Number of dropped connections on the node manager count
nodemanager_numopenconnections Number of open connections on the node manager count
nodemanager_numregisteredconnections Number of registered connections on the node manager count
nodemanager_openblockrequestlatencymillis_count Number of open block requests timed on the node manager count
nodemanager_openblockrequestlatencymillis_rate1 1-minute rate of open block requests on the node manager req/s
nodemanager_openblockrequestlatencymillis_rate15 15-minute rate of open block requests on the node manager req/s
nodemanager_openblockrequestlatencymillis_rate5 5-minute rate of open block requests on the node manager req/s
nodemanager_openblockrequestlatencymillis_ratemean Mean rate of open block requests on the node manager req/s
nodemanager_privatebytesdeleted Private bytes deleted on the node manager byte
nodemanager_publicbytesdeleted Public bytes deleted on the node manager byte
nodemanager_publishavgtime Average publish time on the node manager s
nodemanager_publishnumops Number of publish operations on the node manager count
nodemanager_receivedbytes Received bytes on the node manager byte
nodemanager_registeredexecutorssize Number of registered executors on the node manager count
nodemanager_registerexecutorrequestlatencymillis_count Number of register executor requests timed on the node manager count
nodemanager_registerexecutorrequestlatencymillis_rate1 1-minute rate of register executor requests on the node manager req/s
nodemanager_registerexecutorrequestlatencymillis_rate15 15-minute rate of register executor requests on the node manager req/s
nodemanager_registerexecutorrequestlatencymillis_rate5 5-minute rate of register executor requests on the node manager req/s
nodemanager_registerexecutorrequestlatencymillis_ratemean Mean rate of register executor requests on the node manager req/s
nodemanager_renewalfailures Number of renewal failures on the node manager count
nodemanager_renewalfailurestotal Total number of renewal failures on the node manager count
nodemanager_rpcauthenticationfailures Number of RPC authentication failures on the node manager count
nodemanager_rpcauthorizationsuccesses Number of successful RPC authorizations on the node manager count
nodemanager_rpcclientbackoff Number of RPC client backoffs on the node manager count
nodemanager_rpcprocessingtimeavgtime Average RPC processing time on the node manager s
nodemanager_rpcprocessingtimenumops Number of RPC processing operations on the node manager count
nodemanager_rpcqueuetimeavgtime Average RPC queue time on the node manager s
nodemanager_rpcqueuetimenumops Number of RPC queue time operations on the node manager count
nodemanager_rpcslowcalls Number of slow RPC calls on the node manager count
nodemanager_runningopportunisticcontainers Number of running opportunistic containers on the node manager count
nodemanager_securityenabled Whether security is enabled on the node manager count
nodemanager_sentbytes Bytes sent by the node manager byte
nodemanager_shuffleconnections Number of shuffle connections on the node manager count
nodemanager_shuffleoutputbytes Shuffle output bytes on the node manager byte
nodemanager_shuffleoutputsfailed Number of failed shuffle outputs on the node manager count
nodemanager_shuffleoutputsok Number of successful shuffle outputs on the node manager count
nodemanager_snapshotavgtime Average snapshot time on the node manager s
nodemanager_snapshotnumops Number of snapshot operations on the node manager count
nodemanager_threadsblocked Number of blocked threads on the node manager count
nodemanager_threadsnew Number of new threads on the node manager count
nodemanager_threadsrunnable Number of runnable threads on the node manager count
nodemanager_threadsterminated Number of terminated threads on the node manager count
nodemanager_threadstimedwaiting Number of threads in the timed-waiting state on the node manager count
nodemanager_threadswaiting Number of waiting threads on the node manager count
nodemanager_totalbytesdeleted Total bytes deleted on the node manager byte
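
Several of the metrics above are easier to read as ratios. The following is a minimal sketch that derives three such indicators from a dictionary of metric values keyed by the names in the table. The function name derive_indicators, the output keys, and the sample values are illustrative, and the sketch assumes that allocated plus available memory equals the memory the node offers to containers.

```python
from typing import Dict, Mapping


def derive_indicators(m: Mapping[str, float]) -> Dict[str, float]:
    """Compute a few ratios from NodeManager metric values."""
    out: Dict[str, float] = {}

    # Share of the maximum JVM heap that is currently in use.
    heap_max = m.get("nodemanager_memheapmaxm", 0.0)
    if heap_max > 0:
        out["heap_used_ratio"] = m.get("nodemanager_memheapusedm", 0.0) / heap_max

    # Share of finished containers that ended in failure.
    failed = m.get("nodemanager_containersfailed", 0.0)
    finished = (m.get("nodemanager_containerscompleted", 0.0)
                + failed
                + m.get("nodemanager_containerskilled", 0.0))
    if finished > 0:
        out["container_failure_ratio"] = failed / finished

    # Share of container memory that is allocated (assumption: see lead-in).
    allocated = m.get("nodemanager_allocatedgb", 0.0)
    total = allocated + m.get("nodemanager_availablegb", 0.0)
    if total > 0:
        out["memory_allocated_ratio"] = allocated / total

    return out


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = {
        "nodemanager_memheapusedm": 256.0,
        "nodemanager_memheapmaxm": 1024.0,
        "nodemanager_containerscompleted": 90.0,
        "nodemanager_containersfailed": 5.0,
        "nodemanager_containerskilled": 5.0,
        "nodemanager_allocatedgb": 6.0,
        "nodemanager_availablegb": 2.0,
    }
    for key, value in derive_indicators(sample).items():
        print(f"{key}: {value:.2f}")
```

Ratios are skipped when their denominator is zero, so a freshly started node with no finished containers simply returns fewer keys.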