Hadoop HDFS DataNode¶
Collect HDFS datanode metric information.
Installation and deployment¶
Since DataNode is developed in Java language, it can collect metric information using jmx-exporter.
1. DataNode configuration¶
1.1 Download jmx-exporter¶
Download link:https://github.com/prometheus/jmx_exporter
1.2 Download jmx script¶
Download link:https://github.com/lrwh/jmx-exporter/blob/main/hadoop-hdfs-datanode.yml
1.3 DataNode startup parameter adjustment¶
Add startup parameters to the datanode
{{JAVA_GC_ARGS}} -javaagent:/opt/guance/jmx/jmx_exporter-1.0.1.jar=localhost:17106:/opt/guance/jmx/hadoop-hdfs-datanode.yml
1.4 Restart DataNode¶
2. DataKit Collector Configuration¶
2.1 Install DataKit¶
2.2 Configure collector¶
By using jmx exporter, the metrics
URL can be directly exposed, so it can be collected directly through the prom
collector.
Go to the installation directory of DataKit and copy prom.d/prom.sample
to datanode.conf
.
cp prom.conf.sample datanode.conf
Adjust the content of datanode.conf
as follows:
urls = ["http://localhost:17106/metrics"]
source ="hdfs-datanode"
[inputs.prom.tags]
component = "hdfs-datanode"
interval = "10s"
Adjust other configurations as needed
,parameter adjustment instructions :
- urls:
jmx-exporter
metric address, fill in the URL of the metric exposed by the corresponding component here - source:Collector alias, it is recommended to make a distinction
- keep_exist_metric_name: Maintain metric name
- interval:Collection interval
- inputs.prom.tags: Add additional tags
3. Restart DataKit¶
Metric¶
Hadoop Metric Set¶
The DataNode metric is located under the Hadoop metric set, and here we mainly introduce the description of DataNode related metrics
Metrics | Description | Unit |
---|---|---|
datanode_block_verification_failures |
Number of failed data node block verification attempts |
count |
datanode_blocks_cached |
The number of blocks cached by data nodes |
count |
datanode_blocks_read |
Number of blocks read by data nodes |
count |
datanode_blocks_removed |
Number of blocks removed by data nodes |
count |
datanode_blocks_replicated |
Number of blocks replicated by data nodes |
count |
datanode_blocks_uncached |
Number of uncached blocks in data nodes |
count |
datanode_blocks_verified |
Number of validated blocks for data nodes |
count |
datanode_blocks_written |
Number of blocks written by data nodes |
count |
datanode_bytes_read |
The number of bytes read by the data node |
byte |
datanode_bytes_written |
The number of bytes written by the data node |
byte |
datanode_cache_capacity |
Data node cache capacity |
byte |
datanode_cache_reports_avg_time |
Data node cache report average time |
ms |
datanode_cache_reports_num_ops |
Number of data node cache report operations |
count |
datanode_cache_used |
The amount of cache already used by the data node |
byte |
datanode_capacity |
Data node capacity |
count |
datanode_data_node_active_xceivers_count |
Number of active receivers for data nodes |
count |
datanode_datanode_network_errors |
Number of network errors in data nodes |
count |
datanode_dfs_used |
DFS space already used by data nodes |
btye |
datanode_dropped_pub_all |
The total number of published messages with data node loss |
count |
datanode_estimated_capacity_lost |
Estimation of lost capacity of data nodes |
byte |
datanode_flush_io_rate_avg_time |
Average refresh I/O rate time of data nodes |
ms |
datanode_flush_io_rate_num_ops |
Number of data node refresh I/O operations |
count |
datanode_flush_nanos_avg_time |
Average refresh time of data nodes (nanoseconds) |
ns |
datanode_flush_nanos_num_ops |
Number of data node refresh operations |
count |
datanode_fsync_count |
Number of fsync operations on data nodes |
count |
datanode_heartbeats_avg_time |
Average heartbeat time of data node |
ms |
datanode_heartbeats_num_ops |
Number of heartbeat operations of data node |
count |
datanode_heartbeats_total_avg_time |
Average total heartbeat time of data nodes |
ms |
datanode_heartbeats_total_num_ops |
Total number of heartbeat operations for data nodes |
count |
datanode_incremental_block_reports_avg_time |
Average time for incremental block reporting of data nodes |
ms |
datanode_incremental_block_reports_num_ops |
Number of incremental block report operations for data nodes |
count |
datanode_lifelines_avg_time |
Average time of data node lifecycle signal |
ms |
datanode_lifelines_num_ops |
Number of signal operations during the lifecycle of data nodes |
count |
datanode_metadata_operation_rate_avg_time |
Average rate and time of metadata operations on data nodes |
ms |
datanode_metadata_operation_rate_num_ops |
Number of metadata operations on data nodes |
count |
datanode_num_active_sinks |
Number of active receivers for data nodes |
count |
datanode_num_active_sources |
Number of active sources for data nodes |
count |
datanode_num_all_sinks |
Number of all receivers in the data node |
count |
datanode_num_all_sources |
Number of all sources for data nodes |
count |
datanode_num_blocks_cached |
The number of blocks cached by data nodes |
count |
datanode_num_blocks_failed_to_cache |
Number of blocks with failed data node caching |
count |
datanode_num_blocks_failed_to_un_cache |
The number of blocks that failed to cache in the data node |
count |
datanode_num_blocks_failed_to_uncache |
Number of blocks that failed to cache data nodes |
count |
datanode_num_failed_volumes |
Number of volumes with failed data nodes |
count |
datanode_publish_avg_time |
Average publishing time of data nodes |
ms |
datanode_publish_num_ops |
Number of data node publishing operations |
count |
datanode_ram_disk_blocks_deleted_before_lazy_persisted |
The number of RAM disk blocks deleted before data node delay persistence |
count |
datanode_ram_disk_blocks_evicted |
The number of RAM disk blocks evicted by data nodes |
count |
datanode_ram_disk_blocks_read_hits |
Number of read hits on data node RAM disk blocks |
count |
datanode_ram_disk_blocks_write |
Number of writes to data node RAM disk blocks |
count |
datanode_ram_disk_bytes_write |
Number of bytes written to data node RAM disk |
byte |
datanode_read_block_op_avg_time |
The average time for data node read block operations |
ms |
datanode_read_block_op_num_ops |
Number of block read operations for data nodes |
count |
datanode_read_io_rate_avg_time |
Data node read I/O average rate time |
ms |
datanode_read_io_rate_num_ops |
Number of data node read I/O operations |
count |
datanode_reads_from_local_client |
The number of times the data node is read from the local client |
count |
datanode_reads_from_remote_client |
The number of times data nodes are read from remote clients |
count |
datanode_remaining |
Remaining space of data nodes |
byte |
datanode_remote_bytes_read |
Number of bytes remotely read by data nodes |
byte |
datanode_remote_bytes_written |
Number of bytes remotely written to data nodes |
byte |
datanode_replace_block_op_avg_time |
Average time for data node replacement block operation |
ms |
datanode_replace_block_op_num_ops |
Number of data node replacement block operations |
count |
datanode_send_data_packet_blocked_on_network_nanos_avg_time |
Average network blocking time (nanoseconds) for data nodes to send data packets |
ns |
datanode_send_data_packet_blocked_on_network_nanos_num_ops |
Number of network blocking operations for data nodes sending data packets |
count |
datanode_send_data_packet_transfer_nanos_avg_time |
Average packet transmission time (nanoseconds) for data nodes to send data packets |
ns |
datanode_send_data_packet_transfer_nanos_num_ops |
Number of packet transmission operations sent by data nodes |
count |
datanode_snapshot_avg_time |
Average snapshot time of data nodes |
ms |
datanode_snapshot_num_ops |
Number of snapshot operations on data nodes |
count |
datanode_sync_io_rate_avg_time |
Data node synchronous I/O average rate time |
ms |
datanode_sync_io_rate_num_ops |
Number of synchronous I/O operations for data nodes |
count |
datanode_total_data_file_ios |
Total number of data file I/O times for data nodes |
count |
datanode_total_file_io_errors |
Total number of file I/O errors in data nodes |
count |
datanode_total_metadata_operations |
Total number of metadata operations on data nodes |
count |
datanode_total_read_time |
Total read time of data nodes |
ms |
datanode_total_write_time |
Total write time of data nodes |
ms |
datanode_volume_failures |
Number of data node volume failures |
count |
datanode_write_block_op_avg_time |
The average time for data node block writing operations |
ms |
datanode_write_block_op_num_ops |
Number of block writing operations for data nodes |
count |
datanode_write_io_rate_avg_time |
Data node write I/O average rate time |
ms |
datanode_write_io_rate_num_ops |
Number of I/O operations written by data nodes |
count |
datanode_writes_from_local_client |
The number of writes from the local client to the data node |
count |
datanode_writes_from_remote_client |
The number of writes from remote clients to data nodes |
count |
datanode_xceiver_count |
Number of data node receivers |
count |
datanode_xmits_in_progress |
The number of transmissions being carried out by the data node |
count |