Skip to content

etcd

·


The tcd collector can take many metrics from the etcd instance, such as the status of the etcd server and network, and collect the metrics to DataFlux to help you monitor and analyze various abnormal situations of etcd.

Configuration

Preconditions

etcd version >= 3, Already tested version:

  • 3.5.7
  • 3.4.24
  • 3.3.27

Collector Configuration

Open etcd, the default metrics interface is http://localhost:2379/metrics, or you can modify it in your configuration file.

Go to the conf.d/samples directory under the DataKit installation directory, copy etcd.conf.sample and name it etcd.conf. Examples are as follows:

[[inputs.etcd]]
  ## Exporter URLs.
  urls = ["http://127.0.0.1:2379/metrics"]

  ## TLS configuration.
  tls_open = false
  # tls_ca = "/tmp/ca.crt"
  # tls_cert = "/tmp/peer.crt"
  # tls_key = "/tmp/peer.key"

  ## Set to 'true' to enable election.
  election = true

  ## Ignore tags. Multi supported.
  ## The matched tags would be dropped, but the item would still be sent.
  # tags_ignore = ["xxxx"]

  ## Customize tags.
  [inputs.etcd.tags]
  # some_tag = "some_value"
  # more_tag = "some_other_value"

  ## (Optional) Collect interval: (defaults to "30s").
  # interval = "30s"

Once configured, restart DataKit.

The collector can now be turned on by ConfigMap injection collector configuration.

Metric

etcd

Tags & Fields Description
action
(tag)
Action.
cluster_version
(tag)
Cluster version.
code
(tag)
Code.
grpc_code
(tag)
GRPC code.
grpc_method
(tag)
GRPC method.
grpc_service
(tag)
GRPC service name.
grpc_type
(tag)
GRPC type.
host
(tag)
Hostname.
instance
(tag)
Instance.
server_go_version
(tag)
Server go version.
server_id
(tag)
Server ID.
server_version
(tag)
Server version.
version
(tag)
Version.
etcd_cluster_version Which version is running. 1 for 'cluster_version' label with current cluster version
Type: float | (gauge)
Unit: count
etcd_debugging_auth_revision The current revision of auth store.
Type: float | (gauge)
Unit: count
etcd_debugging_disk_backend_commit_rebalance_duration_seconds The latency distributions of commit.rebalance called by bboltdb backend.
Type: float | (histogram)
Unit: count
etcd_debugging_disk_backend_commit_spill_duration_seconds The latency distributions of commit.spill called by bboltdb backend.
Type: float | (histogram)
Unit: count
etcd_debugging_disk_backend_commit_write_duration_seconds The latency distributions of commit.write called by bboltdb backend.
Type: float | (histogram)
Unit: count
etcd_debugging_lease_granted_total The total number of granted leases.
Type: float | (count)
Unit: count
etcd_debugging_lease_renewed_total The number of renewed leases seen by the leader.
Type: float | (count)
Unit: count
etcd_debugging_lease_revoked_total The total number of revoked leases.
Type: float | (count)
Unit: count
etcd_debugging_lease_ttl_total Bucketed histogram of lease TTLs.
Type: float | (histogram)
Unit: count
etcd_debugging_mvcc_compact_revision The revision of the last compaction in store.
Type: float | (gauge)
Unit: count
etcd_debugging_mvcc_current_revision The current revision of store.
Type: float | (gauge)
Unit: count
etcd_debugging_mvcc_db_compaction_keys_total Total number of db keys compacted.
Type: float | (count)
Unit: count
etcd_debugging_mvcc_db_compaction_last The unix time of the last db compaction. Resets to 0 on start.
Type: float | (gauge)
Unit: count
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds Bucketed histogram of db compaction pause duration.
Type: float | (histogram)
Unit: count
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds Bucketed histogram of db compaction total duration.
Type: float | (histogram)
Unit: count
etcd_debugging_mvcc_db_total_size_in_bytes Total size of the underlying database physically allocated in bytes.
Type: float | (gauge)
Unit: count
etcd_debugging_mvcc_delete_total Total number of deletes seen by this member.
Type: float | (count)
Unit: count
etcd_debugging_mvcc_events_total Total number of events sent by this member.
Type: float | (count)
Unit: count
etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds Bucketed histogram of index compaction pause duration.
Type: float | (histogram)
Unit: count
etcd_debugging_mvcc_keys_total Total number of keys.
Type: float | (gauge)
Unit: count
etcd_debugging_mvcc_pending_events_total Total number of pending events to be sent.
Type: float | (gauge)
Unit: count
etcd_debugging_mvcc_put_total Total number of puts seen by this member.
Type: float | (count)
Unit: count
etcd_debugging_mvcc_range_total Total number of ranges seen by this member.
Type: float | (count)
Unit: count
etcd_debugging_mvcc_slow_watcher_total Total number of unsynced slow watchers.
Type: float | (gauge)
Unit: count
etcd_debugging_mvcc_total_put_size_in_bytes The total size of put kv pairs seen by this member.
Type: float | (gauge)
Unit: count
etcd_debugging_mvcc_txn_total Total number of txns seen by this member.
Type: float | (count)
Unit: count
etcd_debugging_mvcc_watch_stream_total Total number of watch streams.
Type: float | (gauge)
Unit: count
etcd_debugging_mvcc_watcher_total Total number of watchers.
Type: float | (gauge)
Unit: count
etcd_debugging_server_alarms Alarms for every member in cluster. 1 for 'server_id' label with current ID. 2 for 'alarm_type' label with type of this alarm
Type: float | (gauge)
Unit: count
etcd_debugging_server_lease_expired_total The total number of expired leases.
Type: float | (count)
Unit: count
etcd_debugging_snap_save_marshalling_duration_seconds The marshaling cost distributions of save called by snapshot.
Type: float | (histogram)
Unit: count
etcd_debugging_snap_save_total_duration_seconds The total latency distributions of save called by snapshot.
Type: float | (histogram)
Unit: count
etcd_debugging_store_expires_total Total number of expired keys.
Type: float | (count)
Unit: count
etcd_debugging_store_reads_failed_total Failed read actions by (get/getRecursive), local to this member.
Type: float | (count)
Unit: count
etcd_debugging_store_reads_total Total number of reads action by (get/getRecursive), local to this member.
Type: float | (count)
Unit: count
etcd_debugging_store_watch_requests_total Total number of incoming watch requests (new or reestablished).
Type: float | (count)
Unit: count
etcd_debugging_store_watchers Count of currently active watchers.
Type: float | (gauge)
Unit: count
etcd_debugging_store_writes_failed_total Failed write actions (e.g. set/compareAndDelete), seen by this member.
Type: float | (count)
Unit: count
etcd_debugging_store_writes_total Total number of writes (e.g. set/compareAndDelete) seen by this member.
Type: float | (count)
Unit: count
etcd_disk_backend_commit_duration_seconds The latency distributions of commit called by backend.
Type: float | (histogram)
Unit: count
etcd_disk_backend_defrag_duration_seconds The latency distribution of backend defragmentation.
Type: float | (histogram)
Unit: count
etcd_disk_backend_snapshot_duration_seconds The latency distribution of backend snapshots.
Type: float | (histogram)
Unit: count
etcd_disk_defrag_inflight Whether or not defrag is active on the member. 1 means active, 0 means not.
Type: float | (gauge)
Unit: count
etcd_disk_wal_fsync_duration_seconds The latency distributions of fsync called by WAL.
Type: float | (histogram)
Unit: count
etcd_disk_wal_write_bytes_total Total number of bytes written in WAL.
Type: float | (gauge)
Unit: count
etcd_grpc_proxy_cache_hits_total Total number of cache hits
Type: float | (gauge)
Unit: count
etcd_grpc_proxy_cache_keys_total Total number of keys/ranges cached
Type: float | (gauge)
Unit: count
etcd_grpc_proxy_cache_misses_total Total number of cache misses
Type: float | (gauge)
Unit: count
etcd_grpc_proxy_events_coalescing_total Total number of events coalescing
Type: float | (count)
Unit: count
etcd_grpc_proxy_watchers_coalescing_total Total number of current watchers coalescing
Type: float | (gauge)
Unit: count
etcd_mvcc_db_open_read_transactions The number of currently open read transactions
Type: float | (gauge)
Unit: count
etcd_mvcc_db_total_size_in_bytes Total size of the underlying database physically allocated in bytes.
Type: float | (gauge)
Unit: count
etcd_mvcc_db_total_size_in_use_in_bytes Total size of the underlying database logically in use in bytes.
Type: float | (gauge)
Unit: count
etcd_mvcc_delete_total Total number of deletes seen by this member.
Type: float | (count)
Unit: count
etcd_mvcc_hash_duration_seconds The latency distribution of storage hash operation.
Type: float | (histogram)
Unit: count
etcd_mvcc_hash_rev_duration_seconds The latency distribution of storage hash by revision operation.
Type: float | (histogram)
Unit: count
etcd_mvcc_put_total Total number of puts seen by this member.
Type: float | (count)
Unit: count
etcd_mvcc_range_total Total number of ranges seen by this member.
Type: float | (count)
Unit: count
etcd_mvcc_txn_total Total number of txns seen by this member.
Type: float | (count)
Unit: count
etcd_network_active_peers The current number of active peer connections.
Type: float | (gauge)
Unit: count
etcd_network_client_grpc_received_bytes_total The total number of bytes received from grpc clients.
Type: float | (count)
Unit: count
etcd_network_client_grpc_sent_bytes_total The total number of bytes sent to grpc clients.
Type: float | (count)
Unit: count
etcd_network_disconnected_peers_total The total number of disconnected peers.
Type: float | (count)
Unit: count
etcd_network_known_peers The current number of known peers.
Type: float | (gauge)
Unit: count
etcd_network_peer_received_bytes_total The total number of bytes received from peers.
Type: float | (count)
Unit: count
etcd_network_peer_received_failures_total The total number of receive failures from peers.
Type: float | (count)
Unit: count
etcd_network_peer_round_trip_time_seconds Round-Trip-Time histogram between peers
Type: float | (histogram)
Unit: count
etcd_network_peer_sent_bytes_total The total number of bytes sent to peers.
Type: float | (count)
Unit: count
etcd_network_peer_sent_failures_total The total number of send failures from peers.
Type: float | (count)
Unit: count
etcd_network_server_stream_failures_total The total number of stream failures from the local server.
Type: float | (count)
Unit: count
etcd_network_snapshot_receive_failures Total number of snapshot receive failures
Type: float | (count)
Unit: count
etcd_network_snapshot_receive_inflights_total Total number of inflight snapshot receives
Type: float | (gauge)
Unit: count
etcd_network_snapshot_receive_success Total number of successful snapshot receives
Type: float | (count)
Unit: count
etcd_network_snapshot_receive_total_duration_seconds Total latency distributions of v3 snapshot receives
Type: float | (histogram)
Unit: count
etcd_network_snapshot_send_failures Total number of snapshot send failures
Type: float | (count)
Unit: count
etcd_network_snapshot_send_inflights_total Total number of inflight snapshot sends
Type: float | (gauge)
Unit: count
etcd_network_snapshot_send_success Total number of successful snapshot sends
Type: float | (count)
Unit: count
etcd_network_snapshot_send_total_duration_seconds Total latency distributions of v3 snapshot sends
Type: float | (histogram)
Unit: count
etcd_server_apply_duration_seconds The latency distributions of v2 apply called by backend.
Type: float | (histogram)
Unit: count
etcd_server_client_requests_total The total number of client requests per client version.
Type: float | (count)
Unit: count
etcd_server_go_version Which Go version server is running with. 1 for 'server_go_version' label with current version.
Type: float | (gauge)
Unit: count
etcd_server_has_leader Whether or not a leader exists. 1 is existence, 0 is not.
Type: float | (gauge)
Unit: count
etcd_server_health_failures The total number of failed health checks
Type: float | (count)
Unit: count
etcd_server_health_success The total number of successful health checks
Type: float | (count)
Unit: count
etcd_server_heartbeat_send_failures_total The total number of leader heartbeat send failures (likely overloaded from slow disk).
Type: float | (count)
Unit: count
etcd_server_id Server or member ID in hexadecimal format. 1 for 'server_id' label with current ID.
Type: float | (gauge)
Unit: count
etcd_server_is_leader Whether or not this member is a leader. 1 if is, 0 otherwise.
Type: float | (gauge)
Unit: count
etcd_server_is_learner Whether or not this member is a learner. 1 if is, 0 otherwise.
Type: float | (gauge)
Unit: count
etcd_server_leader_changes_seen_total The number of leader changes seen.
Type: float | (count)
Unit: count
etcd_server_learner_promote_failures The total number of failed learner promotions (likely learner not ready) while this member is leader.
Type: float | (count)
Unit: count
etcd_server_learner_promote_successes The total number of successful learner promotions while this member is leader.
Type: float | (count)
Unit: count
etcd_server_proposals_applied_total The total number of consensus proposals applied.
Type: float | (gauge)
Unit: count
etcd_server_proposals_committed_total The total number of consensus proposals committed.
Type: float | (gauge)
Unit: count
etcd_server_proposals_failed_total The total number of failed proposals seen.
Type: float | (count)
Unit: count
etcd_server_proposals_pending The current number of pending proposals to commit.
Type: float | (gauge)
Unit: count
etcd_server_quota_backend_bytes Current backend storage quota size in bytes.
Type: float | (gauge)
Unit: count
etcd_server_read_indexes_failed_total The total number of failed read indexes seen.
Type: float | (count)
Unit: count
etcd_server_slow_apply_total The total number of slow apply requests (likely overloaded from slow disk).
Type: float | (count)
Unit: count
etcd_server_slow_read_indexes_total The total number of pending read indexes not in sync with leader's or timed out read index requests.
Type: float | (count)
Unit: count
etcd_server_snapshot_apply_in_progress_total 1 if the server is applying the incoming snapshot. 0 if none.
Type: float | (gauge)
Unit: count
etcd_server_version Which version is running. 1 for 'server_version' label with current version.
Type: float | (gauge)
Unit: count
etcd_snap_db_fsync_duration_seconds The latency distributions of fsyncing .snap.db file
Type: float | (histogram)
Unit: count
etcd_snap_db_save_total_duration_seconds The total latency distributions of v3 snapshot save
Type: float | (histogram)
Unit: count
etcd_snap_fsync_duration_seconds The latency distributions of fsync called by snap.
Type: float | (histogram)
Unit: count
go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
Type: float | (summary)
Unit: count
go_goroutines Number of goroutines that currently exist.
Type: float | (gauge)
Unit: count
go_info Information about the Go environment.
Type: float | (gauge)
Unit: count
go_memstats_alloc_bytes Number of bytes allocated and still in use.
Type: float | (gauge)
Unit: count
go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
Type: float | (count)
Unit: count
go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
Type: float | (gauge)
Unit: count
go_memstats_frees_total Total number of frees.
Type: float | (count)
Unit: count
go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
Type: float | (gauge)
Unit: count
go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
Type: float | (gauge)
Unit: count
go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
Type: float | (gauge)
Unit: count
go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
Type: float | (gauge)
Unit: count
go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
Type: float | (gauge)
Unit: count
go_memstats_heap_objects Number of allocated objects.
Type: float | (gauge)
Unit: count
go_memstats_heap_released_bytes Number of heap bytes released to OS.
Type: float | (gauge)
Unit: count
go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
Type: float | (gauge)
Unit: count
go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
Type: float | (gauge)
Unit: count
go_memstats_lookups_total Total number of pointer lookups.
Type: float | (count)
Unit: count
go_memstats_mallocs_total Total number of mallocs.
Type: float | (count)
Unit: count
go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
Type: float | (gauge)
Unit: count
go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
Type: float | (gauge)
Unit: count
go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
Type: float | (gauge)
Unit: count
go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
Type: float | (gauge)
Unit: count
go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
Type: float | (gauge)
Unit: count
go_memstats_other_sys_bytes Number of bytes used for other system allocations.
Type: float | (gauge)
Unit: count
go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
Type: float | (gauge)
Unit: count
go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
Type: float | (gauge)
Unit: count
go_memstats_sys_bytes Number of bytes obtained from system.
Type: float | (gauge)
Unit: count
go_threads Number of OS threads created.
Type: float | (gauge)
Unit: count
grpc_server_handled_total Total number of RPCs completed on the server, regardless of success or failure.
Type: float | (count)
Unit: count
grpc_server_msg_received_total Total number of RPC stream messages received on the server.
Type: float | (count)
Unit: count
grpc_server_msg_sent_total Total number of gRPC stream messages sent by the server.
Type: float | (count)
Unit: count
grpc_server_started_total Total number of RPCs started on the server.
Type: float | (count)
Unit: count
os_fd_limit The file descriptor limit.
Type: float | (gauge)
Unit: count
os_fd_used The number of used file descriptors.
Type: float | (gauge)
Unit: count
process_cpu_seconds_total Total user and system CPU time spent in seconds
Type: float | (gauge)
Unit: count
process_max_fds Maximum number of open file descriptors
Type: float | (gauge)
Unit: count
process_open_fds Number of open file descriptors
Type: float | (gauge)
Unit: count
process_resident_memory_bytes Resident memory size in bytes
Type: float | (gauge)
Unit: count
process_start_time_seconds Start time of the process since unix epoch in seconds
Type: float | (gauge)
Unit: count
process_virtual_memory_bytes Virtual memory size in bytes
Type: float | (gauge)
Unit: count
process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes
Type: float | (gauge)
Unit: count
promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
Type: float | (gauge)
Unit: count
promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
Type: float | (count)
Unit: count

Feedback

Is this page helpful? ×