Cassandra
Cassandra metrics can be collected with DDTrace. The data flows as follows: Cassandra -> DDTrace -> DataKit (StatsD).
DataKit has a built-in StatsD server: DDTrace collects the Cassandra data and reports it to DataKit over the StatsD protocol.
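To make the last hop of this flow concrete, here is a minimal Python sketch that formats one gauge in the DogStatsD text format and sends it as a UDP datagram, which is roughly what DataKit's StatsD listener receives for each collected value. The helper names `format_gauge` and `send_metric`, the sample metric name and tag, and the default address `127.0.0.1:11002` are assumptions for illustration; they are not part of DDTrace or DataKit.

```python
import socket
from typing import Dict, Optional


def format_gauge(name: str, value: float, tags: Optional[Dict[str, str]] = None) -> str:
    """Build one DogStatsD gauge line: <name>:<value>|g|#key:value,key:value"""
    line = f"{name}:{value}|g"
    if tags:
        # DogStatsD carries tags after "|#", comma separated.
        line += "|#" + ",".join(f"{k}:{v}" for k, v in tags.items())
    return line


def send_metric(line: str, host: str = "127.0.0.1", port: int = 11002) -> int:
    """Send one metric line as a single UDP datagram and return the bytes sent.

    The default port mirrors service_address = ":11002" from the sample
    collector configuration; adjust it to your own DataKit setup.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(line.encode("utf-8"), (host, port))
```

With a DataKit StatsD listener running, calling `send_metric(format_gauge("cassandra.active_tasks", 3, {"host_key": "demo-host"}))` would emit a single hand-written data point, which can be handy for checking that the UDP port is reachable before Cassandra is restarted with the agent.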
Configuration
Prerequisites
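- Tested versions:
    - 5.0
    - 4.1.3
    - 3.11.15
    - 3.0.24
    - 2.1.22
- Download the dd-java-agent.jar package; see here.
- DataKit side: see the StatsD configuration.
- Cassandra side:
Create a file named setenv.sh under /usr/local/cassandra/bin, make it executable, and write the following content into it:
export CATALINA_OPTS="-javaagent:dd-java-agent.jar \
-Ddd.jmxfetch.enabled=true \
-Ddd.jmxfetch.statsd.host=${DATAKIT_HOST} \
-Ddd.jmxfetch.statsd.port=${DATAKIT_STATSD_PORT} \
-Ddd.jmxfetch.cassandra.enabled=true"
The parameters are as follows:
- javaagent: the full path to dd-java-agent.jar;
- Ddd.jmxfetch.enabled: set to true to enable DDTrace's collection feature;
- Ddd.jmxfetch.statsd.host: the network address DataKit listens on, without the port number;
- Ddd.jmxfetch.statsd.port: the port DataKit listens on, usually 11002, as determined by the DataKit-side configuration;
- Ddd.jmxfetch.cassandra.enabled: set to true to enable DDTrace's Cassandra collection. Once enabled, an additional metric set named cassandra is reported.
Restart Cassandra for the configuration to take effect.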
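The parameter rules above (full path for the agent jar, host without a port, port matching the DataKit side) can be checked before the file is written. The following Python sketch assembles the JVM flags and rejects the most common mistakes. The function name `build_jvm_options` and its validation rules are assumptions made for illustration; the sketch only handles hostnames and IPv4 addresses.

```python
def build_jvm_options(agent_jar: str, datakit_host: str, statsd_port: int = 11002) -> str:
    """Assemble the DDTrace JVM flags described above into one string."""
    if not agent_jar.startswith("/"):
        # The javaagent parameter expects the full path to dd-java-agent.jar.
        raise ValueError("agent_jar must be an absolute path")
    if ":" in datakit_host:
        # The host must not carry a port (this sketch ignores IPv6 literals).
        raise ValueError("datakit_host must not include a port")
    if not 0 < statsd_port < 65536:
        raise ValueError("statsd_port must be between 1 and 65535")

    flags = [
        f"-javaagent:{agent_jar}",
        "-Ddd.jmxfetch.enabled=true",
        f"-Ddd.jmxfetch.statsd.host={datakit_host}",
        f"-Ddd.jmxfetch.statsd.port={statsd_port}",
        "-Ddd.jmxfetch.cassandra.enabled=true",
    ]
    return " ".join(flags)
```

For example, `build_jvm_options("/opt/dd-java-agent.jar", "10.0.0.5")` returns a single space-separated string that can be pasted between the quotes of the export line; the jar path and IP address here are placeholders.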
Collector configuration
Go to the conf.d/samples directory under the DataKit installation directory, copy cassandra.conf.sample, and name the copy cassandra.conf. An example follows:
[[inputs.statsd]]
## Collector alias.
source = "statsd/cassandra"
## Collect interval, default is 10 seconds. (optional)
# interval = '10s'
protocol = "udp"
## Address and port to host UDP listener on: (defaults to ":8125")
service_address = ":11002"
## Tag request metric. Used for distinguish feed metric name.
## eg, DD_TAGS=source_key:tomcat,host_key:cn-shanghai-sq5ei
## eg, -Ddd.tags=source_key:tomcat,host_key:cn-shanghai-sq5ei
# statsd_source_key = "source_key"
# statsd_host_key = "host_key"
## Indicate whether report tag statsd_source_key and statsd_host_key.
# save_above_key = false
delete_gauges = true
delete_counters = true
delete_sets = true
delete_timings = true
## Counter metric is float in new Datakit version, set true if want be int.
# set_counter_int = false
## Percentiles to calculate for timing & histogram stats
percentiles = [50.0, 90.0, 99.0, 99.9, 99.95, 100.0]
## separator to use between elements of a statsd metric
metric_separator = "_"
## Parses tags in the datadog statsd format
## http://docs.datadoghq.com/guides/dogstatsd/
parse_data_dog_tags = true
## Parses datadog extensions to the statsd format
datadog_extensions = true
## Parses distributions metric as specified in the datadog statsd format
## https://docs.datadoghq.com/developers/metrics/types/?tab=distribution#definition
datadog_distributions = true
## We do not need following tags(they may create tremendous of time-series under influxdb's logic)
## Examples:
## "runtime-id", "metric-type"
drop_tags = [ ]
## All metric-name prefixed with 'jvm_' are set to influxdb's measurement 'jvm'
## All metric-name prefixed with 'stats_' are set to influxdb's measurement 'stats'
## Attention: Must add these word in statsd conf file.
metric_mapping = ["cassandra_:cassandra", "jvm_:cassandra_jvm", "jmx_:cassandra_jmx", "datadog_:cassandra_datadog"]
## Number of UDP messages allowed to queue up, once filled,
## the statsd server will start dropping packets, default is 128.
# allowed_pending_messages = 128
## Number of timing/histogram values to track per-measurement in the
## calculation of percentiles. Raising this limit increases the accuracy
## of percentiles but also increases the memory usage and cpu time.
percentile_limit = 1000
## Max duration (TTL) for each metric to stay cached/reported without being updated.
# max_ttl = "1000h"
[inputs.statsd.tags]
# some_tag = "some_value"
# more_tag = "some_other_value"
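Once the configuration is in place, restart DataKit.
Alternatively, the collector configuration can be injected through a ConfigMap, or the collector can be enabled by setting ENV_DATAKIT_INPUTS.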
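The metric_mapping and metric_separator entries in the sample configuration decide which metric set each incoming StatsD name lands in. The Python sketch below shows one plausible reading of those two options, based only on the comments in the sample file: dots in the incoming name are joined with the separator, and the first matching prefix selects the metric set. The function name `split_metric` and the choice to return `None` for unmatched names are assumptions; DataKit's actual implementation may differ.

```python
from typing import List, Optional, Tuple

# Copied from the sample configuration above.
METRIC_MAPPING = [
    "cassandra_:cassandra",
    "jvm_:cassandra_jvm",
    "jmx_:cassandra_jmx",
    "datadog_:cassandra_datadog",
]


def split_metric(
    raw_name: str,
    mapping: List[str] = METRIC_MAPPING,
    separator: str = "_",
) -> Optional[Tuple[str, str]]:
    """Return (metric_set, field) for a StatsD metric name, or None if no prefix matches."""
    # Join the dot-separated elements of the StatsD name with metric_separator.
    name = raw_name.replace(".", separator)
    for entry in mapping:
        prefix, metric_set = entry.split(":", 1)
        if name.startswith(prefix):
            # Strip the prefix; the remainder becomes the field name.
            return metric_set, name[len(prefix):]
    return None  # Assumption: what happens to unmatched names is not modeled here.
```

Under this reading, a name such as `jvm.heap_memory` ends up as the field `heap_memory` in the `cassandra_jvm` metric set, which matches the tables in the Metrics section.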
Metrics
cassandra
| Tags & Fields | Description | Metric type and unit |
|---|---|---|
| active_tasks | The number of tasks that the thread pool is actively executing. Type: float | (gauge) Unit: count |
| bloom_filter_false_ratio | The ratio of Bloom filter false positives to total checks. Type: float | (gauge) Unit: count |
| bytes_flushed_count | The amount of data that was flushed since (re)start. Type: float | (gauge) Unit: digital,B |
| cas_commit_latency_75th_percentile | The latency of 'paxos' commit round - p75. Type: float | (gauge) Unit: time,ms |
| cas_commit_latency_95th_percentile | The latency of 'paxos' commit round - p95. Type: float | (gauge) Unit: time,ms |
| cas_commit_latency_one_minute_rate | The number of 'paxos' commit rounds per second. Type: float | (gauge) Unit: throughput,reqps |
| cas_prepare_latency_75th_percentile | The latency of 'paxos' prepare round - p75. Type: float | (gauge) Unit: time,ms |
| cas_prepare_latency_95th_percentile | The latency of 'paxos' prepare round - p95. Type: float | (gauge) Unit: time,ms |
| cas_prepare_latency_one_minute_rate | The number of 'paxos' prepare rounds per second. Type: float | (gauge) Unit: throughput,reqps |
| cas_propose_latency_75th_percentile | The latency of 'paxos' propose round - p75. Type: float | (gauge) Unit: time,ms |
| cas_propose_latency_95th_percentile | The latency of 'paxos' propose round - p95. Type: float | (gauge) Unit: time,ms |
| cas_propose_latency_one_minute_rate | The number of 'paxos' propose rounds per second. Type: float | (gauge) Unit: throughput,reqps |
| col_update_time_delta_histogram_75th_percentile | The column update time delta - p75. Type: float | (gauge) Unit: time,ms |
| col_update_time_delta_histogram_95th_percentile | The column update time delta - p95. Type: float | (gauge) Unit: time,ms |
| col_update_time_delta_histogram_min | The column update time delta - min. Type: float | (gauge) Unit: time,ms |
| compaction_bytes_written_count | The amount of data that was compacted since (re)start. Type: float | (gauge) Unit: digital,B |
| compression_ratio | The compression ratio for all SSTables. Contrary to what the name suggests, a low value means high compression. The formula used is 'size of the compressed SSTable / size of the original'. Type: float | (gauge) Unit: percent,percent |
| currently_blocked_tasks | The number of currently blocked tasks for the thread pool. Type: float | (gauge) Unit: count |
| currently_blocked_tasks_count | The number of currently blocked tasks for the thread pool. Type: float | (gauge) Unit: count |
| db_droppable_tombstone_ratio | The estimate of the droppable tombstone ratio. Type: float | (gauge) Unit: percent,percent |
| dropped_one_minute_rate | The tasks dropped during execution for the thread pool. Type: float | (gauge) Unit: count |
| exceptions_count | The number of exceptions thrown from 'Storage' metrics. Type: float | (gauge) Unit: count |
| key_cache_hit_rate | The key cache hit rate. Type: float | (gauge) Unit: count |
| latency_75th_percentile | The client request latency - p75. Type: float | (gauge) Unit: time,ms |
| latency_95th_percentile | The client request latency - p95. Type: float | (gauge) Unit: time,ms |
| latency_one_minute_rate | The number of client requests. Type: float | (gauge) Unit: throughput,reqps |
| live_disk_space_used_count | The disk space used by live SSTables (only counts in use files). Type: float | (gauge) Unit: digital,B |
| live_ss_table_count | Number of live (in use) SSTables. Type: float | (gauge) Unit: count |
| load_count | The disk space used by live data on a node. Type: float | (gauge) Unit: digital,B |
| max_partition_size | The size of the largest compacted partition. Type: float | (gauge) Unit: digital,B |
| max_row_size | The size of the largest compacted row. Type: float | (gauge) Unit: digital,B |
| mean_partition_size | The average size of compacted partitions. Type: float | (gauge) Unit: digital,B |
| mean_row_size | The average size of compacted rows. Type: float | (gauge) Unit: digital,B |
| metrics_75th_percentile | Metrics - p75. Type: float | (gauge) Unit: count |
| metrics_95th_percentile | Metrics - p95. Type: float | (gauge) Unit: count |
| metrics_count | Metrics count. Type: float | (gauge) Unit: count |
| metrics_one_minute_rate | The number of metrics. Type: float | (gauge) Unit: count |
| metrics_value | Metrics value. Type: float | (gauge) Unit: count |
| net_down_endpoint_count | The number of unhealthy nodes in the cluster. They represent each individual node's view of the cluster and thus should not be summed across reporting nodes. Type: float | (gauge) Unit: count |
| net_up_endpoint_count | The number of healthy nodes in the cluster. They represent each individual node's view of the cluster and thus should not be summed across reporting nodes. Type: float | (gauge) Unit: count |
| nodetool_status_load | Amount of file system data under the 'cassandra' data directory without snapshot content. Type: float | (gauge) Unit: digital,B |
| nodetool_status_owns | Percentage of the data owned by the node per data center times the replication factor. Type: float | (gauge) Unit: percent,percent |
| nodetool_status_replication_availability | Percentage of data available per 'keyspace' times replication factor. Type: float | (gauge) Unit: percent,percent |
| nodetool_status_replication_factor | Replication factor per 'keyspace'. Type: float | (gauge) Unit: count |
| nodetool_status_status | Node status: up (1) or down (0). Type: float | (gauge) Unit: bool |
| pending_compactions | The number of pending compactions. Type: float | (gauge) Unit: count |
| pending_flushes_count | The number of pending flushes. Type: float | (gauge) Unit: count |
| pending_tasks | The number of pending tasks for the thread pool. Type: float | (gauge) Unit: count |
| range_latency_75th_percentile | The local range request latency - p75. Type: float | (gauge) Unit: time,ms |
| range_latency_95th_percentile | The local range request latency - p95. Type: float | (gauge) Unit: time,ms |
| range_latency_one_minute_rate | The number of local range requests. Type: float | (gauge) Unit: throughput,reqps |
| read_latency_75th_percentile | The local read latency - p75. Type: float | (gauge) Unit: time,ms |
| read_latency_95th_percentile | The local read latency - p95. Type: float | (gauge) Unit: time,ms |
| read_latency_99th_percentile | The local read latency - p99. Type: float | (gauge) Unit: time,ms |
| read_latency_one_minute_rate | The number of local read requests. Type: float | (gauge) Unit: throughput,reqps |
| row_cache_hit_count | The number of row cache hits. Type: float | (gauge) Unit: count |
| row_cache_hit_out_of_range_count | The number of row cache hits that do not satisfy the query filter and went to disk. Type: float | (gauge) Unit: count |
| row_cache_miss_count | The number of table row cache misses. Type: float | (gauge) Unit: count |
| snapshots_size | The disk space truly used by snapshots. Type: float | (gauge) Unit: digital,B |
| ss_tables_per_read_histogram_75th_percentile | The number of SSTable data files accessed per read - p75. Type: float | (gauge) Unit: count |
| ss_tables_per_read_histogram_95th_percentile | The number of SSTable data files accessed per read - p95. Type: float | (gauge) Unit: count |
| timeouts_count | Count of requests not acknowledged within configurable timeout window. Type: float | (gauge) Unit: count |
| timeouts_one_minute_rate | Recent timeout rate, as an exponentially weighted moving average over a one-minute interval. Type: float | (gauge) Unit: count |
| tombstone_scanned_histogram_75th_percentile | Number of tombstones scanned per read - p75. Type: float | (gauge) Unit: count |
| tombstone_scanned_histogram_95th_percentile | Number of tombstones scanned per read - p95. Type: float | (gauge) Unit: count |
| total_blocked_tasks | Total blocked tasks. Type: float | (gauge) Unit: count |
| total_blocked_tasks_count | Total count of blocked tasks. Type: float | (count) Unit: count |
| total_commit_log_size | The size used on disk by commit logs. Type: float | (gauge) Unit: digital,B |
| total_disk_space_used_count | Total disk space used by SSTables, including obsolete ones waiting to be garbage collected. Type: float | (gauge) Unit: digital,B |
| view_lock_acquire_time_75th_percentile | The time taken acquiring a partition lock for materialized view updates - p75. Type: float | (gauge) Unit: time,ms |
| view_lock_acquire_time_95th_percentile | The time taken acquiring a partition lock for materialized view updates - p95. Type: float | (gauge) Unit: time,ms |
| view_lock_acquire_time_one_minute_rate | The number of requests to acquire a partition lock for materialized view updates. Type: float | (gauge) Unit: count |
| view_read_time_75th_percentile | The time taken during the local read of a materialized view update - p75. Type: float | (gauge) Unit: time,ms |
| view_read_time_95th_percentile | The time taken during the local read of a materialized view update - p95. Type: float | (gauge) Unit: time,ms |
| view_read_time_one_minute_rate | The number of local reads for materialized view updates. Type: float | (gauge) Unit: count |
| waiting_on_free_memtable_space_75th_percentile | The time spent waiting for free mem table space either on- or off-heap - p75. Type: float | (gauge) Unit: time,ms |
| waiting_on_free_memtable_space_95th_percentile | The time spent waiting for free mem table space either on- or off-heap - p95. Type: float | (gauge) Unit: time,ms |
| write_latency_75th_percentile | The local write latency - p75. Type: float | (gauge) Unit: time,ms |
| write_latency_95th_percentile | The local write latency - p95. Type: float | (gauge) Unit: time,ms |
| write_latency_99th_percentile | The local write latency - p99. Type: float | (gauge) Unit: time,ms |
| write_latency_one_minute_rate | The number of local write requests. Type: float | (gauge) Unit: throughput,reqps |
cassandra_jvm
| Tags & Fields | Description | Metric type and unit |
|---|---|---|
| buffer_pool_direct_capacity | Measure of total memory capacity of direct buffers. Type: float | (gauge) Unit: digital,B |
| buffer_pool_direct_count | Number of direct buffers in the pool. Type: float | (gauge) Unit: count |
| buffer_pool_direct_used | Measure of memory used by direct buffers. Type: float | (gauge) Unit: digital,B |
| buffer_pool_mapped_capacity | Measure of total memory capacity of mapped buffers. Type: float | (gauge) Unit: digital,B |
| buffer_pool_mapped_count | Number of mapped buffers in the pool. Type: float | (gauge) Unit: count |
| buffer_pool_mapped_used | Measure of memory used by mapped buffers. Type: float | (gauge) Unit: digital,B |
| cpu_load_process | Recent CPU utilization for the process. Type: float | (gauge) Unit: percent,percent |
| cpu_load_system | Recent CPU utilization for the whole system. Type: float | (gauge) Unit: percent,percent |
| daemon_code_cache_used | The number of daemon threads. Type: float | (count) Unit: count |
| daemon_thread_count | Daemon thread count. Type: float | (gauge) Unit: count |
| gc_cms_count | The total number of garbage collections that have occurred. Type: float | (count) Unit: count |
| gc_code_cache_used | GC code cache used. Type: float | (gauge) Unit: count |
| gc_eden_size | The 'eden' size in garbage collection. Type: float | (gauge) Unit: digital,B |
| gc_major_collection_count | The rate of major garbage collections. Set new_gc_metrics: true to receive this metric. Type: float | (gauge) Unit: count |
| gc_major_collection_time | The fraction of time spent in major garbage collection. Set new_gc_metrics: true to receive this metric. Type: float | (gauge) Unit: PPM |
| gc_metaspace_size | The metaspace size in garbage collection. Type: float | (gauge) Unit: digital,B |
| gc_minor_collection_count | The rate of minor garbage collections. Set new_gc_metrics: true to receive this metric. Type: float | (gauge) Unit: count |
| gc_minor_collection_time | The fraction of time spent in minor garbage collection. Set new_gc_metrics: true to receive this metric. Type: float | (gauge) Unit: PPM |
| gc_old_gen_size | The old gen size in garbage collection. Type: float | (gauge) Unit: digital,B |
| gc_parnew_time | The approximate accumulated garbage collection time elapsed. Type: float | (gauge) Unit: time,ms |
| gc_survivor_size | The survivor size in garbage collection. Type: float | (gauge) Unit: digital,B |
| heap_memory | The total Java heap memory used. Type: float | (gauge) Unit: digital,B |
| heap_memory_committed | The total Java heap memory committed to be used. Type: float | (gauge) Unit: digital,B |
| heap_memory_init | The initial Java heap memory allocated. Type: float | (gauge) Unit: digital,B |
| heap_memory_max | The maximum Java heap memory available. Type: float | (gauge) Unit: digital,B |
| loaded_classes | Number of classes currently loaded. Type: float | (gauge) Unit: count |
| non_heap_memory | The total Java non-heap memory used. Non-heap memory is: Metaspace + CompressedClassSpace + CodeCache. Type: float | (gauge) Unit: digital,B |
| non_heap_memory_committed | The total Java non-heap memory committed to be used. Type: float | (gauge) Unit: digital,B |
| non_heap_memory_init | The initial Java non-heap memory allocated. Type: float | (gauge) Unit: digital,B |
| non_heap_memory_max | The maximum Java non-heap memory available. Type: float | (gauge) Unit: digital,B |
| os_open_file_descriptors | The number of file descriptors used by this process (only available for processes run as the dd-agent user). Type: float | (gauge) Unit: count |
| peak_thread_count | The peak number of live threads. Type: float | (count) Unit: count |
| thread_count | The number of live threads. Type: float | (count) Unit: count |
| total_thread_count | The number of total threads. Type: float | (count) Unit: count |
cassandra_jmx
| Tags & Fields | Description | Metric type and unit |
|---|---|---|
| gc_cms.count | The total number of garbage collections that have occurred. Type: float | (count) Unit: count |
| gc_major_collection_count | The rate of major garbage collections. Set new_gc_metrics: true to receive this metric. Type: float | (gauge) Unit: count |
| gc_major_collection_time | The fraction of time spent in major garbage collection. Set new_gc_metrics: true to receive this metric. Type: float | (gauge) Unit: PPM |
| gc_minor_collection_count | The rate of minor garbage collections. Set new_gc_metrics: true to receive this metric. Type: float | (gauge) Unit: count |
| gc_minor_collection_time | The fraction of time spent in minor garbage collection. Set new_gc_metrics: true to receive this metric. Type: float | (gauge) Unit: PPM |
| gc_parnew.time | The approximate accumulated garbage collection time elapsed. Type: float | (gauge) Unit: time,ms |
| heap_memory | The total Java heap memory used. Type: float | (gauge) Unit: digital,B |
| heap_memory_committed | The total Java heap memory committed to be used. Type: float | (gauge) Unit: digital,B |
| heap_memory_init | The initial Java heap memory allocated. Type: float | (gauge) Unit: digital,B |
| heap_memory_max | The maximum Java heap memory available. Type: float | (gauge) Unit: digital,B |
| non_heap_memory | The total Java non-heap memory used. Non-heap memory is calculated as follows: Metaspace + CompressedClassSpace + CodeCache. Type: float | (gauge) Unit: digital,B |
| non_heap_memory_committed | The total Java non-heap memory committed to be used. Type: float | (gauge) Unit: digital,B |
| non_heap_memory_init | The initial Java non-heap memory allocated. Type: float | (gauge) Unit: digital,B |
| non_heap_memory_max | The maximum Java non-heap memory available. Type: float | (gauge) Unit: digital,B |
| thread_count | The number of live threads. Type: float | (count) Unit: count |
cassandra_datadog
| Tags & Fields | Description | Metric type and unit |
|---|---|---|
| tracer_agent_discovery_time | Tracer agent discovery time. Type: float | (gauge) Unit: time,ms |
| tracer_api_errors_total | Tracer api errors total. Type: float | (gauge) Unit: count |
| tracer_api_requests_total | Tracer api requests total. Type: float | (gauge) Unit: count |
| tracer_flush_bytes_total | Tracer flush bytes total. Type: float | (gauge) Unit: count |
| tracer_flush_traces_total | Tracer flush traces total. Type: float | (gauge) Unit: count |
| tracer_queue_enqueued_bytes | Tracer queue enqueued bytes. Type: float | (gauge) Unit: count |
| tracer_queue_enqueued_spans | Tracer queue enqueued spans. Type: float | (gauge) Unit: count |
| tracer_queue_enqueued_traces | Tracer queue enqueued traces. Type: float | (gauge) Unit: count |
| tracer_queue_max_length | Tracer queue max length. Type: float | (gauge) Unit: count |
| tracer_scope_activate_count | Tracer scope activate count. Type: float | (gauge) Unit: count |
| tracer_scope_close_count | Tracer scope close count. Type: float | (gauge) Unit: count |
| tracer_span_pending_created | Tracer span pending created. Type: float | (gauge) Unit: count |
| tracer_span_pending_finished | Tracer span pending finished. Type: float | (gauge) Unit: count |
| tracer_trace_agent_discovery_time | Tracer trace agent discovery time. Type: float | (gauge) Unit: count |
| tracer_trace_agent_send_time | Tracer trace agent send time. Type: float | (gauge) Unit: count |
| tracer_trace_pending_created | Tracer trace pending created. Type: float | (gauge) Unit: count |
| tracer_tracer_trace_buffer_fill_time | Tracer trace buffer fill time. Type: float | (gauge) Unit: count |