etcd
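The etcd collector gathers a wide range of metrics from etcd instances, such as etcd server status and network status, and reports them to DataFlux to help you monitor and analyze etcd anomalies.
Configuration
Prerequisites
etcd version >= 3. Tested versions: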
- 3.5.7
- 3.4.24
- 3.3.27
Collector configuration
Start etcd. The default metrics endpoint is http://localhost:2379/metrics; you can change it in the configuration file.
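To check what the metrics endpoint returns before enabling the collector, a small script along the following lines may help. It is only a sketch: `fetch_metrics`, `parse_samples`, and `SAMPLE` are hypothetical names, not part of DataKit or etcd, and the parser assumes the simple `name{labels} value` lines of the Prometheus text format rather than covering the full specification.

```python
import re
import urllib.request

# Hypothetical sample of the text served by the metrics endpoint.
SAMPLE = """\
# HELP etcd_server_has_leader Whether or not a leader exists. 1 is existence, 0 is not.
# TYPE etcd_server_has_leader gauge
etcd_server_has_leader 1
etcd_server_version{server_version="3.5.7"} 1
grpc_server_started_total{grpc_method="Range",grpc_service="etcdserverpb.KV",grpc_type="unary"} 12
"""

LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(url="http://localhost:2379/metrics", timeout=5.0):
    """Return the raw text served by the metrics endpoint (needs a running etcd)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text):
    """Parse exposition text into {metric name: [(labels dict, value), ...]}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            name, rest = line.split("{", 1)
            label_part, value_part = rest.rsplit("}", 1)
        else:
            name, value_part = line.split(None, 1)
            label_part = ""
        labels = dict(LABEL_RE.findall(label_part))
        # The first field after the labels is the value; any timestamp is ignored.
        value = float(value_part.split()[0])
        samples.setdefault(name, []).append((labels, value))
    return samples
```

With etcd running, `parse_samples(fetch_metrics())` returns the same structure for the live endpoint; `parse_samples(SAMPLE)` works offline.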
Go to the conf.d/etcd directory under the DataKit installation directory, copy etcd.conf.sample, and rename the copy to etcd.conf. For example:
[[inputs.etcd]]
## Exporter URLs.
urls = ["http://127.0.0.1:2379/metrics"]
## TLS configuration.
tls_open = false
# tls_ca = "/tmp/ca.crt"
# tls_cert = "/tmp/peer.crt"
# tls_key = "/tmp/peer.key"
## Set to 'true' to enable election.
election = true
## Ignore tags. Multi supported.
## The matched tags would be dropped, but the item would still be sent.
# tags_ignore = ["xxxx"]
## Customize tags.
[inputs.etcd.tags]
# some_tag = "some_value"
# more_tag = "some_other_value"
## (Optional) Collect interval: (defaults to "30s").
# interval = "30s"
Once configured, restart DataKit.
Currently, the collector can be enabled by injecting its configuration through a ConfigMap.
Metrics
etcd
- Tags
Tag | Description |
---|---|
action | Action. |
cluster_version | Cluster version. |
code | Code. |
grpc_code | GRPC code. |
grpc_method | GRPC method. |
grpc_service | GRPC service name. |
grpc_type | GRPC type. |
host | Hostname. |
instance | Instance. |
server_go_version | Server go version. |
server_id | Server ID. |
server_version | Server version. |
version | Version. |
- Metric list
Metric | Description |
---|---|
etcd_cluster_version | Which version is running. 1 for 'cluster_version' label with current cluster version Type: float Unit: count |
etcd_debugging_auth_revision | The current revision of auth store. Type: float Unit: count |
etcd_debugging_disk_backend_commit_rebalance_duration_seconds | The latency distributions of commit.rebalance called by bboltdb backend. Type: float Unit: count |
etcd_debugging_disk_backend_commit_spill_duration_seconds | The latency distributions of commit.spill called by bboltdb backend. Type: float Unit: count |
etcd_debugging_disk_backend_commit_write_duration_seconds | The latency distributions of commit.write called by bboltdb backend. Type: float Unit: count |
etcd_debugging_lease_granted_total | The total number of granted leases. Type: float Unit: count |
etcd_debugging_lease_renewed_total | The number of renewed leases seen by the leader. Type: float Unit: count |
etcd_debugging_lease_revoked_total | The total number of revoked leases. Type: float Unit: count |
etcd_debugging_lease_ttl_total | Bucketed histogram of lease TTLs. Type: float Unit: count |
etcd_debugging_mvcc_compact_revision | The revision of the last compaction in store. Type: float Unit: count |
etcd_debugging_mvcc_current_revision | The current revision of store. Type: float Unit: count |
etcd_debugging_mvcc_db_compaction_keys_total | Total number of db keys compacted. Type: float Unit: count |
etcd_debugging_mvcc_db_compaction_last | The unix time of the last db compaction. Resets to 0 on start. Type: float Unit: count |
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds | Bucketed histogram of db compaction pause duration. Type: float Unit: count |
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds | Bucketed histogram of db compaction total duration. Type: float Unit: count |
etcd_debugging_mvcc_db_total_size_in_bytes | Total size of the underlying database physically allocated in bytes. Type: float Unit: count |
etcd_debugging_mvcc_delete_total | Total number of deletes seen by this member. Type: float Unit: count |
etcd_debugging_mvcc_events_total | Total number of events sent by this member. Type: float Unit: count |
etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds | Bucketed histogram of index compaction pause duration. Type: float Unit: count |
etcd_debugging_mvcc_keys_total | Total number of keys. Type: float Unit: count |
etcd_debugging_mvcc_pending_events_total | Total number of pending events to be sent. Type: float Unit: count |
etcd_debugging_mvcc_put_total | Total number of puts seen by this member. Type: float Unit: count |
etcd_debugging_mvcc_range_total | Total number of ranges seen by this member. Type: float Unit: count |
etcd_debugging_mvcc_slow_watcher_total | Total number of unsynced slow watchers. Type: float Unit: count |
etcd_debugging_mvcc_total_put_size_in_bytes | The total size of put kv pairs seen by this member. Type: float Unit: count |
etcd_debugging_mvcc_txn_total | Total number of txns seen by this member. Type: float Unit: count |
etcd_debugging_mvcc_watch_stream_total | Total number of watch streams. Type: float Unit: count |
etcd_debugging_mvcc_watcher_total | Total number of watchers. Type: float Unit: count |
etcd_debugging_server_alarms | Alarms for every member in cluster. 1 for 'server_id' label with current ID. 2 for 'alarm_type' label with type of this alarm Type: float Unit: count |
etcd_debugging_server_lease_expired_total | The total number of expired leases. Type: float Unit: count |
etcd_debugging_snap_save_marshalling_duration_seconds | The marshaling cost distributions of save called by snapshot. Type: float Unit: count |
etcd_debugging_snap_save_total_duration_seconds | The total latency distributions of save called by snapshot. Type: float Unit: count |
etcd_debugging_store_expires_total | Total number of expired keys. Type: float Unit: count |
etcd_debugging_store_reads_failed_total | Failed read actions by (get/getRecursive), local to this member. Type: float Unit: count |
etcd_debugging_store_reads_total | Total number of reads action by (get/getRecursive), local to this member. Type: float Unit: count |
etcd_debugging_store_watch_requests_total | Total number of incoming watch requests (new or reestablished). Type: float Unit: count |
etcd_debugging_store_watchers | Count of currently active watchers. Type: float Unit: count |
etcd_debugging_store_writes_failed_total | Failed write actions (e.g. set/compareAndDelete), seen by this member. Type: float Unit: count |
etcd_debugging_store_writes_total | Total number of writes (e.g. set/compareAndDelete) seen by this member. Type: float Unit: count |
etcd_disk_backend_commit_duration_seconds | The latency distributions of commit called by backend. Type: float Unit: count |
etcd_disk_backend_defrag_duration_seconds | The latency distribution of backend defragmentation. Type: float Unit: count |
etcd_disk_backend_snapshot_duration_seconds | The latency distribution of backend snapshots. Type: float Unit: count |
etcd_disk_defrag_inflight | Whether or not defrag is active on the member. 1 means active, 0 means not. Type: float Unit: count |
etcd_disk_wal_fsync_duration_seconds | The latency distributions of fsync called by WAL. Type: float Unit: count |
etcd_disk_wal_write_bytes_total | Total number of bytes written in WAL. Type: float Unit: count |
etcd_grpc_proxy_cache_hits_total | Total number of cache hits Type: float Unit: count |
etcd_grpc_proxy_cache_keys_total | Total number of keys/ranges cached Type: float Unit: count |
etcd_grpc_proxy_cache_misses_total | Total number of cache misses Type: float Unit: count |
etcd_grpc_proxy_events_coalescing_total | Total number of events coalescing Type: float Unit: count |
etcd_grpc_proxy_watchers_coalescing_total | Total number of current watchers coalescing Type: float Unit: count |
etcd_mvcc_db_open_read_transactions | The number of currently open read transactions Type: float Unit: count |
etcd_mvcc_db_total_size_in_bytes | Total size of the underlying database physically allocated in bytes. Type: float Unit: count |
etcd_mvcc_db_total_size_in_use_in_bytes | Total size of the underlying database logically in use in bytes. Type: float Unit: count |
etcd_mvcc_delete_total | Total number of deletes seen by this member. Type: float Unit: count |
etcd_mvcc_hash_duration_seconds | The latency distribution of storage hash operation. Type: float Unit: count |
etcd_mvcc_hash_rev_duration_seconds | The latency distribution of storage hash by revision operation. Type: float Unit: count |
etcd_mvcc_put_total | Total number of puts seen by this member. Type: float Unit: count |
etcd_mvcc_range_total | Total number of ranges seen by this member. Type: float Unit: count |
etcd_mvcc_txn_total | Total number of txns seen by this member. Type: float Unit: count |
etcd_network_active_peers | The current number of active peer connections. Type: float Unit: count |
etcd_network_client_grpc_received_bytes_total | The total number of bytes received from grpc clients. Type: float Unit: count |
etcd_network_client_grpc_sent_bytes_total | The total number of bytes sent to grpc clients. Type: float Unit: count |
etcd_network_disconnected_peers_total | The total number of disconnected peers. Type: float Unit: count |
etcd_network_known_peers | The current number of known peers. Type: float Unit: count |
etcd_network_peer_received_bytes_total | The total number of bytes received from peers. Type: float Unit: count |
etcd_network_peer_received_failures_total | The total number of receive failures from peers. Type: float Unit: count |
etcd_network_peer_round_trip_time_seconds | Round-Trip-Time histogram between peers Type: float Unit: count |
etcd_network_peer_sent_bytes_total | The total number of bytes sent to peers. Type: float Unit: count |
etcd_network_peer_sent_failures_total | The total number of send failures from peers. Type: float Unit: count |
etcd_network_server_stream_failures_total | The total number of stream failures from the local server. Type: float Unit: count |
etcd_network_snapshot_receive_failures | Total number of snapshot receive failures Type: float Unit: count |
etcd_network_snapshot_receive_inflights_total | Total number of inflight snapshot receives Type: float Unit: count |
etcd_network_snapshot_receive_success | Total number of successful snapshot receives Type: float Unit: count |
etcd_network_snapshot_receive_total_duration_seconds | Total latency distributions of v3 snapshot receives Type: float Unit: count |
etcd_network_snapshot_send_failures | Total number of snapshot send failures Type: float Unit: count |
etcd_network_snapshot_send_inflights_total | Total number of inflight snapshot sends Type: float Unit: count |
etcd_network_snapshot_send_success | Total number of successful snapshot sends Type: float Unit: count |
etcd_network_snapshot_send_total_duration_seconds | Total latency distributions of v3 snapshot sends Type: float Unit: count |
etcd_server_apply_duration_seconds | The latency distributions of v2 apply called by backend. Type: float Unit: count |
etcd_server_client_requests_total | The total number of client requests per client version. Type: float Unit: count |
etcd_server_go_version | Which Go version server is running with. 1 for 'server_go_version' label with current version. Type: float Unit: count |
etcd_server_has_leader | Whether or not a leader exists. 1 is existence, 0 is not. Type: float Unit: count |
etcd_server_health_failures | The total number of failed health checks Type: float Unit: count |
etcd_server_health_success | The total number of successful health checks Type: float Unit: count |
etcd_server_heartbeat_send_failures_total | The total number of leader heartbeat send failures (likely overloaded from slow disk). Type: float Unit: count |
etcd_server_id | Server or member ID in hexadecimal format. 1 for 'server_id' label with current ID. Type: float Unit: count |
etcd_server_is_leader | Whether or not this member is a leader. 1 if is, 0 otherwise. Type: float Unit: count |
etcd_server_is_learner | Whether or not this member is a learner. 1 if is, 0 otherwise. Type: float Unit: count |
etcd_server_leader_changes_seen_total | The number of leader changes seen. Type: float Unit: count |
etcd_server_learner_promote_failures | The total number of failed learner promotions (likely learner not ready) while this member is leader. Type: float Unit: count |
etcd_server_learner_promote_successes | The total number of successful learner promotions while this member is leader. Type: float Unit: count |
etcd_server_proposals_applied_total | The total number of consensus proposals applied. Type: float Unit: count |
etcd_server_proposals_committed_total | The total number of consensus proposals committed. Type: float Unit: count |
etcd_server_proposals_failed_total | The total number of failed proposals seen. Type: float Unit: count |
etcd_server_proposals_pending | The current number of pending proposals to commit. Type: float Unit: count |
etcd_server_quota_backend_bytes | Current backend storage quota size in bytes. Type: float Unit: count |
etcd_server_read_indexes_failed_total | The total number of failed read indexes seen. Type: float Unit: count |
etcd_server_slow_apply_total | The total number of slow apply requests (likely overloaded from slow disk). Type: float Unit: count |
etcd_server_slow_read_indexes_total | The total number of pending read indexes not in sync with leader's or timed out read index requests. Type: float Unit: count |
etcd_server_snapshot_apply_in_progress_total | 1 if the server is applying the incoming snapshot. 0 if none. Type: float Unit: count |
etcd_server_version | Which version is running. 1 for 'server_version' label with current version. Type: float Unit: count |
etcd_snap_db_fsync_duration_seconds | The latency distributions of fsyncing .snap.db file Type: float Unit: count |
etcd_snap_db_save_total_duration_seconds | The total latency distributions of v3 snapshot save Type: float Unit: count |
etcd_snap_fsync_duration_seconds | The latency distributions of fsync called by snap. Type: float Unit: count |
go_gc_duration_seconds | A summary of the pause duration of garbage collection cycles. Type: float Unit: count |
go_goroutines | Number of goroutines that currently exist. Type: float Unit: count |
go_info | Information about the Go environment. Type: float Unit: count |
go_memstats_alloc_bytes | Number of bytes allocated and still in use. Type: float Unit: count |
go_memstats_alloc_bytes_total | Total number of bytes allocated, even if freed. Type: float Unit: count |
go_memstats_buck_hash_sys_bytes | Number of bytes used by the profiling bucket hash table. Type: float Unit: count |
go_memstats_frees_total | Total number of frees. Type: float Unit: count |
go_memstats_gc_cpu_fraction | The fraction of this program's available CPU time used by the GC since the program started. Type: float Unit: count |
go_memstats_gc_sys_bytes | Number of bytes used for garbage collection system metadata. Type: float Unit: count |
go_memstats_heap_alloc_bytes | Number of heap bytes allocated and still in use. Type: float Unit: count |
go_memstats_heap_idle_bytes | Number of heap bytes waiting to be used. Type: float Unit: count |
go_memstats_heap_inuse_bytes | Number of heap bytes that are in use. Type: float Unit: count |
go_memstats_heap_objects | Number of allocated objects. Type: float Unit: count |
go_memstats_heap_released_bytes | Number of heap bytes released to OS. Type: float Unit: count |
go_memstats_heap_sys_bytes | Number of heap bytes obtained from system. Type: float Unit: count |
go_memstats_last_gc_time_seconds | Number of seconds since 1970 of last garbage collection. Type: float Unit: count |
go_memstats_lookups_total | Total number of pointer lookups. Type: float Unit: count |
go_memstats_mallocs_total | Total number of mallocs. Type: float Unit: count |
go_memstats_mcache_inuse_bytes | Number of bytes in use by mcache structures. Type: float Unit: count |
go_memstats_mcache_sys_bytes | Number of bytes used for mcache structures obtained from system. Type: float Unit: count |
go_memstats_mspan_inuse_bytes | Number of bytes in use by mspan structures. Type: float Unit: count |
go_memstats_mspan_sys_bytes | Number of bytes used for mspan structures obtained from system. Type: float Unit: count |
go_memstats_next_gc_bytes | Number of heap bytes when next garbage collection will take place. Type: float Unit: count |
go_memstats_other_sys_bytes | Number of bytes used for other system allocations. Type: float Unit: count |
go_memstats_stack_inuse_bytes | Number of bytes in use by the stack allocator. Type: float Unit: count |
go_memstats_stack_sys_bytes | Number of bytes obtained from system for stack allocator. Type: float Unit: count |
go_memstats_sys_bytes | Number of bytes obtained from system. Type: float Unit: count |
go_threads | Number of OS threads created. Type: float Unit: count |
grpc_server_handled_total | Total number of RPCs completed on the server, regardless of success or failure. Type: float Unit: count |
grpc_server_msg_received_total | Total number of RPC stream messages received on the server. Type: float Unit: count |
grpc_server_msg_sent_total | Total number of gRPC stream messages sent by the server. Type: float Unit: count |
grpc_server_started_total | Total number of RPCs started on the server. Type: float Unit: count |
os_fd_limit | The file descriptor limit. Type: float Unit: count |
os_fd_used | The number of used file descriptors. Type: float Unit: count |
process_cpu_seconds_total | Total user and system CPU time spent in seconds Type: float Unit: count |
process_max_fds | Maximum number of open file descriptors Type: float Unit: count |
process_open_fds | Number of open file descriptors Type: float Unit: count |
process_resident_memory_bytes | Resident memory size in bytes Type: float Unit: count |
process_start_time_seconds | Start time of the process since unix epoch in seconds Type: float Unit: count |
process_virtual_memory_bytes | Virtual memory size in bytes Type: float Unit: count |
process_virtual_memory_max_bytes | Maximum amount of virtual memory available in bytes Type: float Unit: count |
promhttp_metric_handler_requests_in_flight | Current number of scrapes being served. Type: float Unit: count |
promhttp_metric_handler_requests_total | Total number of scrapes by HTTP status code. Type: float Unit: count |
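As an illustration of how a few of the metrics above might be combined into basic health checks, here is a minimal sketch. The function name `check_etcd` and the 0.8 thresholds are assumptions chosen for the example, not recommendations from etcd or DataKit; `metrics` is assumed to be a plain mapping from metric name to its latest value.

```python
def check_etcd(metrics, fd_ratio_limit=0.8, db_ratio_limit=0.8):
    """Return a list of warning strings for one etcd member.

    The thresholds are hypothetical defaults; tune them for your cluster.
    """
    warnings = []

    # etcd_server_has_leader: 1 means a leader exists, 0 means it does not.
    if metrics.get("etcd_server_has_leader") != 1:
        warnings.append("no leader reported")

    # Share of the file descriptor limit currently in use.
    max_fds = metrics.get("process_max_fds", 0)
    if max_fds > 0 and metrics.get("process_open_fds", 0) / max_fds > fd_ratio_limit:
        warnings.append("file descriptors nearly exhausted")

    # Physically allocated database size relative to the backend quota.
    quota = metrics.get("etcd_server_quota_backend_bytes", 0)
    db_size = metrics.get("etcd_mvcc_db_total_size_in_bytes", 0)
    if quota > 0 and db_size / quota > db_ratio_limit:
        warnings.append("backend database close to quota")

    return warnings
```

For example, a member with a leader, 100 of 1024 descriptors open, and a 1 GiB database under a 2 GiB quota yields no warnings, while a missing leader or a database above 80% of its quota adds the corresponding message.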