Hadoop Yarn ResourceManager

Collect metrics from the Yarn ResourceManager.

Installation and Deployment

Because ResourceManager is written in Java, its metrics can be collected with jmx-exporter.

1. ResourceManager Configuration

1.1 Download jmx-exporter

Download address: https://github.com/prometheus/jmx_exporter

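1.2 Download the jmx-exporter Configuration File

Download address: https://github.com/lrwh/jmx-exporter/blob/main/hadoop-yarn-resourcemanager.yml

The startup parameter in 1.3 refers to this file as /opt/guance/jmx/jmx_resource_manager.yml, so either save it under that name or adjust the path in the startup parameter to match.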

1.3 Adjust ResourceManager Startup Parameters

Add the following to the ResourceManager startup parameters:

{JAVA_GC_ARGS} -javaagent:/opt/guance/jmx/jmx_exporter-1.0.1.jar=localhost:17109:/opt/guance/jmx/jmx_resource_manager.yml
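The agent argument above follows the jmx-exporter pattern `-javaagent:<jar>=[<host>:]<port>:<config>`. As a minimal sketch of how the pieces fit together, the helper below assembles that string from its parts. The function name `build_javaagent_arg` is hypothetical and only for illustration; the paths and port are the ones used in this guide.

```python
def build_javaagent_arg(jar_path: str, port: int, config_path: str, host: str = "") -> str:
    """Assemble a jmx-exporter agent argument: -javaagent:<jar>=[<host>:]<port>:<config>.

    Hypothetical helper for illustration; it only formats a string.
    """
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    # The host part is optional; without it the agent listens on all interfaces.
    listen = f"{host}:{port}" if host else str(port)
    return f"-javaagent:{jar_path}={listen}:{config_path}"


# Values taken from this guide.
arg = build_javaagent_arg(
    "/opt/guance/jmx/jmx_exporter-1.0.1.jar",
    17109,
    "/opt/guance/jmx/jmx_resource_manager.yml",
    host="localhost",
)
print(arg)
```

Binding to `localhost` keeps the metrics endpoint reachable only from the same machine, which is sufficient when DataKit runs on the ResourceManager host.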

1.4 Restart ResourceManager

2. DataKit Collector Configuration

2.1 Install DataKit

2.2 Configure Collector

jmx-exporter exposes the metrics directly at an HTTP URL, so they can be collected with the prom collector.
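Before configuring the collector, it can help to confirm that the endpoint returns data. The sketch below fetches the metrics URL with the Python standard library and parses the Prometheus text format into a dictionary. It is a simplified parser that assumes samples carry no trailing timestamp; the metric names in `SAMPLE` are illustrative, not captured from a real ResourceManager.

```python
from typing import Dict
from urllib.request import urlopen


def fetch_metrics_text(url: str = "http://localhost:17109/metrics", timeout: float = 5.0) -> str:
    """Download the raw exposition text from the jmx-exporter endpoint."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text: str) -> Dict[str, float]:
    """Map each series (name plus optional labels) to its value.

    Simplification: the value is taken as the last space-separated token,
    so lines with a trailing timestamp are not handled.
    """
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        series, _, value = line.rpartition(" ")
        if not series:
            continue
        try:
            samples[series] = float(value)
        except ValueError:
            continue  # ignore lines that do not end in a number
    return samples


# Illustrative payload; real output depends on the jmx-exporter rules file.
SAMPLE = """\
# HELP hadoop_resourcemanager_appsrunning example help text
# TYPE hadoop_resourcemanager_appsrunning untyped
hadoop_resourcemanager_appsrunning{queue="root"} 3.0
hadoop_resourcemanager_numactivenms 4.0
"""

parsed = parse_samples(SAMPLE)
for series, value in parsed.items():
    print(series, value)
```

Against a live ResourceManager, `parse_samples(fetch_metrics_text())` would return the real series; an empty result or a connection error suggests the agent is not loaded or the port differs.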

Go to the conf.d/prom directory under the DataKit installation directory and copy prom.conf.sample to resourcemanager.conf.

cp prom.conf.sample resourcemanager.conf

Adjust the content of resourcemanager.conf as follows:

  urls = ["http://localhost:17109/metrics"]
  source = "yarn-resourcemanager"
  # Keep interval above the tags table; keys placed after a table header belong to that table.
  interval = "10s"
  [inputs.prom.tags]
    component = "yarn-resourcemanager"

Adjust other settings as needed. Parameter descriptions:

  • urls: The jmx-exporter metrics address. Enter the metrics URL exposed by the corresponding component.
  • source: Collector alias. Use a distinct value for each component so that sources can be told apart.
  • keep_exist_metric_name: Keep the metric name unchanged.
  • interval: Collection interval.
  • inputs.prom.tags: Add additional tags.
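When several Hadoop components each expose their own jmx-exporter port, the same settings have to be repeated with a different URL, source, and tag per component. The sketch below renders such a file from parameters. `render_prom_conf` is a hypothetical helper, and the `[[inputs.prom]]` header is an assumption inferred from the `[inputs.prom.tags]` sub-table; check it against the prom.conf.sample shipped with your DataKit version.

```python
import json


def render_prom_conf(url: str, source: str, component: str, interval: str = "10s") -> str:
    """Render a prom collector config for one component (hypothetical helper).

    json.dumps is used to quote values because JSON string escaping is also
    valid for TOML basic strings.
    """
    lines = [
        "[[inputs.prom]]  # assumed top-level table, see the sample file",
        f"  urls = [{json.dumps(url)}]",
        f"  source = {json.dumps(source)}",
        # Top-level keys must precede the tags table, otherwise they become tags.
        f"  interval = {json.dumps(interval)}",
        "  [inputs.prom.tags]",
        f"    component = {json.dumps(component)}",
    ]
    return "\n".join(lines) + "\n"


conf_text = render_prom_conf(
    url="http://localhost:17109/metrics",
    source="yarn-resourcemanager",
    component="yarn-resourcemanager",
)
print(conf_text)
```

The output can be written to resourcemanager.conf, or the function called again with another port and name for a different component.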

3. Restart DataKit

Restart DataKit so that the new collector configuration takes effect.

Metrics

Hadoop Measurement

ResourceManager metrics belong to the Hadoop measurement set. The table below describes the ResourceManager-related metrics.

| Metric | Description | Unit |
| --- | --- | --- |
| resourcemanager_activeapplications | Number of active applications | count |
| resourcemanager_activeusers | Number of active users | count |
| resourcemanager_aggregatecontainersallocated | Total number of containers allocated | count |
| resourcemanager_aggregatecontainerspreempted | Total number of containers preempted | count |
| resourcemanager_aggregatecontainersreleased | Total number of containers released | count |
| resourcemanager_aggregatememorymbsecondspreempted | Aggregate memory-seconds of preempted containers | MB·s |
| resourcemanager_aggregatenodelocalcontainersallocated | Total number of node-local containers allocated | count |
| resourcemanager_aggregateoffswitchcontainersallocated | Total number of off-switch containers allocated | count |
| resourcemanager_aggregateracklocalcontainersallocated | Total number of rack-local containers allocated | count |
| resourcemanager_aggregatevcoresecondspreempted | Aggregate vcore-seconds of preempted containers | vcore·s |
| resourcemanager_allocatedcontainers | Number of containers currently allocated to applications | count |
| resourcemanager_allocatedmb | Amount of memory currently allocated | MB |
| resourcemanager_allocatedvcores | Number of CPU vcores currently allocated | count |
| resourcemanager_amlaunchdelayavgtime | Average ApplicationMaster (AM) launch delay | ms |
| resourcemanager_amlaunchdelaynumops | Number of AM launch delay samples | count |
| resourcemanager_amregisterdelayavgtime | Average AM registration delay | ms |
| resourcemanager_amregisterdelaynumops | Number of AM registration delay samples | count |
| resourcemanager_amresourceusagemb | Memory used by ApplicationMasters | MB |
| resourcemanager_amresourceusagevcores | CPU vcores used by ApplicationMasters | count |
| resourcemanager_appattemptfirstcontainerallocationdelayavgtime | Average delay before the first container of an application attempt is allocated | ms |
| resourcemanager_appattemptfirstcontainerallocationdelaynumops | Number of first-container allocation delay samples | count |
| resourcemanager_appscompleted | Number of completed applications | count |
| resourcemanager_appsfailed | Number of failed applications | count |
| resourcemanager_appskilled | Number of killed applications | count |
| resourcemanager_appspending | Number of applications pending execution | count |
| resourcemanager_appsrunning | Number of currently running applications | count |
| resourcemanager_appssubmitted | Number of submitted applications | count |
| resourcemanager_availablemb | Amount of available memory | MB |
| resourcemanager_availablevcores | Number of available CPU vcores | count |
| resourcemanager_callqueuelength | Length of the RPC call queue | count |
| resourcemanager_continuousschedulingrunavgtime | Average continuous scheduling run time | ms |
| resourcemanager_continuousschedulingrunimaxtime | Maximum continuous scheduling run time in the current interval | ms |
| resourcemanager_continuousschedulingrunimintime | Minimum continuous scheduling run time in the current interval | ms |
| resourcemanager_continuousschedulingruninumops | Number of continuous scheduling runs in the current interval | count |
| resourcemanager_continuousschedulingrunmaxtime | Maximum continuous scheduling run time | ms |
| resourcemanager_continuousschedulingrunmintime | Minimum continuous scheduling run time | ms |
| resourcemanager_continuousschedulingrunnumops | Number of continuous scheduling runs | count |
| resourcemanager_deferredrpcprocessingtimenumops | Number of deferred RPC processing time samples | count |
| resourcemanager_droppedpuball | Number of metrics updates dropped by all sinks | count |
| resourcemanager_fairsharemb | Fair share of memory | MB |
| resourcemanager_fairsharevcores | Fair share of CPU vcores | count |
| resourcemanager_gccount | Total number of garbage collections | count |
| resourcemanager_gccountconcurrentmarksweep | Number of ConcurrentMarkSweep garbage collections | count |
| resourcemanager_gccountparnew | Number of ParNew garbage collections | count |
| resourcemanager_gcnuminfothresholdexceeded | Number of times the GC info threshold was exceeded | count |
| resourcemanager_gcnumwarnthresholdexceeded | Number of times the GC warn threshold was exceeded | count |
| resourcemanager_gctimemillis | Total garbage collection time | ms |
| resourcemanager_gctimemillisconcurrentmarksweep | Total time spent in ConcurrentMarkSweep garbage collections | ms |
| resourcemanager_gctimemillisparnew | Total time spent in ParNew garbage collections | ms |
| resourcemanager_gctotalextrasleeptime | Total GC extra sleep time | ms |
| resourcemanager_getgroupsavgtime | Average time to get groups | ms |
| resourcemanager_logerror | Number of ERROR log entries | count |
| resourcemanager_logfatal | Number of FATAL log entries | count |
| resourcemanager_loginfailureavgtime | Average time of failed logins | ms |
| resourcemanager_loginfailurenumops | Number of failed logins | count |
| resourcemanager_loginfo | Number of INFO log entries | count |
| resourcemanager_loginsuccessavgtime | Average time of successful logins | ms |
| resourcemanager_loginsuccessnumops | Number of successful logins | count |
| resourcemanager_logwarn | Number of WARN log entries | count |
| resourcemanager_maxamsharemb | Maximum AM share of memory | MB |
| resourcemanager_maxamsharevcores | Maximum AM share of CPU vcores | count |
| resourcemanager_maxapps | Maximum number of applications | count |
| resourcemanager_memheapcommittedm | Committed heap memory | MB |
| resourcemanager_memheapmaxm | Maximum heap memory | MB |
| resourcemanager_memheapusedm | Used heap memory | MB |
| resourcemanager_memmaxm | Maximum memory available to the JVM | MB |
| resourcemanager_memnonheapcommittedm | Committed non-heap memory | MB |
| resourcemanager_memnonheapmaxm | Maximum non-heap memory | MB |
| resourcemanager_memnonheapusedm | Used non-heap memory | MB |
| resourcemanager_minsharemb | Minimum share of memory | MB |
| resourcemanager_minsharevcores | Minimum share of CPU vcores | count |
| resourcemanager_nodeheartbeatavgtime | Average node heartbeat call time | ms |
| resourcemanager_nodeheartbeatnumops | Number of node heartbeat calls | count |
| resourcemanager_nodeupdatecallavgtime | Average node update call time | s |
| resourcemanager_nodeupdatecallimaxtime | Maximum node update call time in the current interval | s |
| resourcemanager_nodeupdatecallimintime | Minimum node update call time in the current interval | s |
| resourcemanager_nodeupdatecallinumops | Number of node update calls in the current interval | count |
| resourcemanager_numactivenms | Number of active NodeManagers | count |
| resourcemanager_numactivesinks | Number of active metrics sinks | count |
| resourcemanager_numactivesources | Number of active metrics sources | count |
| resourcemanager_numallsinks | Total number of metrics sinks | count |
| resourcemanager_numallsources | Total number of metrics sources | count |
| resourcemanager_numdecommissionednms | Number of decommissioned NodeManagers | count |
| resourcemanager_numdecommissioningnms | Number of NodeManagers being decommissioned | count |
| resourcemanager_numdroppedconnections | Number of dropped connections | count |
| resourcemanager_numlostnms | Number of lost NodeManagers | count |
| resourcemanager_numopenconnections | Number of open connections | count |
| resourcemanager_numrebootednms | Number of rebooted NodeManagers | count |
| resourcemanager_numshutdownnms | Number of shut down NodeManagers | count |
| resourcemanager_numunhealthynms | Number of unhealthy NodeManagers | count |
| resourcemanager_pendingcontainers | Number of containers waiting to be allocated | count |
| resourcemanager_pendingmb | Amount of memory waiting to be allocated | MB |
| resourcemanager_pendingvcores | Number of CPU vcores waiting to be allocated | count |
| resourcemanager_publishavgtime | Average time to publish metrics | ms |
| resourcemanager_rpcprocessingtimeavgtime | Average RPC processing time | ms |
| resourcemanager_rpcprocessingtimenumops | Total number of RPC calls | count |
| resourcemanager_rpcqueuetimeavgtime | Average RPC queue time | ms |
| resourcemanager_rpcqueuetimenumops | Total number of RPC calls | count |
| resourcemanager_rpcslowcalls | Number of slow RPC calls | count |
| resourcemanager_running_0 | Number of running applications with an elapsed time under 60 minutes | count |
| resourcemanager_running_1440 | Number of running applications with an elapsed time over 1440 minutes | count |
| resourcemanager_running_300 | Number of running applications with an elapsed time between 300 and 1440 minutes | count |
| resourcemanager_running_60 | Number of running applications with an elapsed time between 60 and 300 minutes | count |
| resourcemanager_securityenabled | Whether security is enabled | boolean |
| resourcemanager_sentbytes | Number of bytes sent | byte |
| resourcemanager_snapshotavgtime | Average time to take a metrics snapshot | ms |
| resourcemanager_snapshotnumops | Number of metrics snapshot operations | count |
| resourcemanager_steadyfairsharemb | Steady fair share of memory | MB |
| resourcemanager_steadyfairsharevcores | Steady fair share of CPU vcores | count |
| resourcemanager_threadsblocked | Number of blocked threads | count |
| resourcemanager_threadsnew | Number of threads in the NEW state | count |
| resourcemanager_threadsrunnable | Number of runnable threads | count |
| resourcemanager_threadsterminated | Number of terminated threads | count |
| resourcemanager_threadstimedwaiting | Number of timed-waiting threads | count |
| resourcemanager_threadswaiting | Number of waiting threads | count |
| resourcemanager_updatethreadrunavgtime | Average update thread run time | s |
| resourcemanager_updatethreadrunimaxtime | Maximum update thread run time in the current interval | s |
| resourcemanager_updatethreadrunimintime | Minimum update thread run time in the current interval | s |
| resourcemanager_updatethreadruninumops | Number of update thread runs in the current interval | count |
| resourcemanager_updatethreadrunmaxtime | Maximum update thread run time | s |
| resourcemanager_updatethreadrunmintime | Minimum update thread run time | s |
| resourcemanager_updatethreadrunnumops | Number of update thread runs | count |
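Several of these metrics are most useful in combination. As a minimal sketch, the function below derives a few health indicators from a dictionary of metric values keyed by the names in the table. `summarize` and the sample values are hypothetical, and treating allocated plus available memory as the cluster total is an approximation that ignores reserved resources.

```python
from typing import Dict


def summarize(m: Dict[str, float]) -> Dict[str, float]:
    """Derive a few ratios from ResourceManager metrics (hypothetical helper)."""
    allocated = m.get("resourcemanager_allocatedmb", 0.0)
    available = m.get("resourcemanager_availablemb", 0.0)
    total_mb = allocated + available  # approximation of cluster memory
    heap_used = m.get("resourcemanager_memheapusedm", 0.0)
    heap_max = m.get("resourcemanager_memheapmaxm", 0.0)
    active = m.get("resourcemanager_numactivenms", 0.0)
    unhealthy = m.get("resourcemanager_numunhealthynms", 0.0)
    nodes = active + unhealthy
    return {
        "cluster_memory_utilization": allocated / total_mb if total_mb > 0 else 0.0,
        # A non-positive maximum means the JVM reports no heap limit.
        "heap_utilization": heap_used / heap_max if heap_max > 0 else 0.0,
        "unhealthy_node_ratio": unhealthy / nodes if nodes > 0 else 0.0,
        "apps_pending": m.get("resourcemanager_appspending", 0.0),
    }


# Made-up values for illustration only.
sample = {
    "resourcemanager_allocatedmb": 6144.0,
    "resourcemanager_availablemb": 2048.0,
    "resourcemanager_memheapusedm": 512.0,
    "resourcemanager_memheapmaxm": 2048.0,
    "resourcemanager_numactivenms": 3.0,
    "resourcemanager_numunhealthynms": 1.0,
    "resourcemanager_appspending": 2.0,
}
summary = summarize(sample)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Ratios such as these are often easier to alert on than raw values, because they stay comparable when the cluster is resized.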
