Cassandra

Cassandra metrics can be collected with DDTrace. The data flows as follows: Cassandra -> DDTrace -> DataKit (StatsD).

DataKit has a built-in StatsD server: DDTrace collects the Cassandra data and reports it to DataKit over the StatsD protocol.
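
To make the StatsD leg of this flow concrete, the sketch below builds and sends one gauge in the DogStatsD text format (`name:value|g|#tag:value`). It is only an illustration, not DDTrace's implementation: the metric name `cassandra.pending_tasks`, the tag, and the address `127.0.0.1:11002` are assumptions chosen for the example.

```python
import socket


def format_gauge(name, value, tags):
    """Build one DogStatsD-style gauge datagram: name:value|g|#k:v,k:v"""
    line = f"{name}:{value}|g"
    tag_part = ",".join(f"{key}:{val}" for key, val in tags.items())
    if tag_part:
        line += f"|#{tag_part}"
    return line.encode("utf-8")


def send_gauge(host, port, name, value, tags):
    """Send a single gauge over UDP and return the payload that was sent."""
    payload = format_gauge(name, value, tags)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload


if __name__ == "__main__":
    # Assumed address: a DataKit StatsD listener on the local host, port 11002.
    send_gauge("127.0.0.1", 11002, "cassandra.pending_tasks", 3, {"keyspace": "system"})
```

Because the transport is UDP, the sender gets no acknowledgement; whether the point arrived has to be checked on the DataKit side.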

Configuration

Prerequisites

Tested versions:
- 5.0
- 4.1.3
- 3.11.15
- 3.0.24
- 2.1.22
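Download the dd-java-agent.jar package.
DataKit side: see the StatsD collector configuration.
Cassandra side:

Create a file named setenv.sh under /usr/local/cassandra/bin, make it executable, and write the following content into it: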

export CATALINA_OPTS="-javaagent:dd-java-agent.jar \
                      -Ddd.jmxfetch.enabled=true \
                      -Ddd.jmxfetch.statsd.host=${DATAKIT_HOST} \
                      -Ddd.jmxfetch.statsd.port=${DATAKIT_STATSD_PORT} \
                      -Ddd.jmxfetch.cassandra.enabled=true"

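The parameters are as follows:

-javaagent: the full path to dd-java-agent.jar;
-Ddd.jmxfetch.enabled: set to true to enable DDTrace's collection feature;
-Ddd.jmxfetch.statsd.host: the network address DataKit listens on, without the port number;
-Ddd.jmxfetch.statsd.port: the port DataKit listens on, usually 11002, as determined by the DataKit-side configuration;
-Ddd.jmxfetch.cassandra.enabled: set to true to enable DDTrace's Cassandra collection. Once enabled, an additional measurement named cassandra is reported.

Restart Cassandra for the configuration to take effect.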
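
As a rough illustration of how the flags above fit together, the following sketch assembles the JVM option string from its parts. The helper name `build_jmxfetch_opts`, the jar path, and the host value are hypothetical placeholders; only the flag names come from the configuration above.

```python
def build_jmxfetch_opts(agent_jar, datakit_host, statsd_port=11002):
    """Assemble the JVM options that point DDTrace's JMXFetch at DataKit.

    agent_jar:    full path to dd-java-agent.jar
    datakit_host: address DataKit listens on (no port)
    statsd_port:  DataKit StatsD port (11002 in the sample configuration)
    """
    if not 0 < int(statsd_port) < 65536:
        raise ValueError(f"invalid port: {statsd_port}")
    opts = [
        f"-javaagent:{agent_jar}",
        "-Ddd.jmxfetch.enabled=true",
        f"-Ddd.jmxfetch.statsd.host={datakit_host}",
        f"-Ddd.jmxfetch.statsd.port={statsd_port}",
        "-Ddd.jmxfetch.cassandra.enabled=true",
    ]
    return " ".join(opts)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(build_jmxfetch_opts("/opt/ddtrace/dd-java-agent.jar", "10.0.0.5"))
```

Keeping the host and the port as separate arguments mirrors the two separate flags, which is why the host must not carry a port suffix.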

Collector configuration

Host installation:

Go to the conf.d/samples directory under the DataKit installation directory, copy cassandra.conf.sample, and rename the copy to cassandra.conf. An example follows:

[[inputs.statsd]]
  ## Collector alias.
  source = "statsd/cassandra"

  ## Collect interval, default is 10 seconds. (optional)
  # interval = '10s'

  protocol = "udp"

  ## Address and port to host UDP listener on: (defaults to ":8125")
  service_address = ":11002"

  ## Tag request metric. Used for distinguish feed metric name.
  ## eg, DD_TAGS=source_key:tomcat,host_key:cn-shanghai-sq5ei
  ## eg, -Ddd.tags=source_key:tomcat,host_key:cn-shanghai-sq5ei
  # statsd_source_key = "source_key"
  # statsd_host_key   = "host_key"
  ## Indicate whether report tag statsd_source_key and statsd_host_key.
  # save_above_key    = false

  delete_gauges = true
  delete_counters = true
  delete_sets = true
  delete_timings = true

  ## Counter metrics are floats in newer DataKit versions; set to true to report them as integers.
  # set_counter_int = false

  ## Percentiles to calculate for timing & histogram stats
  percentiles = [50.0, 90.0, 99.0, 99.9, 99.95, 100.0]

  ## separator to use between elements of a statsd metric
  metric_separator = "_"

  ## Parses tags in the datadog statsd format
  ## http://docs.datadoghq.com/guides/dogstatsd/
  parse_data_dog_tags = true

  ## Parses datadog extensions to the statsd format
  datadog_extensions = true

  ## Parses distributions metric as specified in the datadog statsd format
  ## https://docs.datadoghq.com/developers/metrics/types/?tab=distribution#definition
  datadog_distributions = true

  ## Tags listed here are dropped (they may create a huge number of time series under InfluxDB's logic).
  ## Examples:
  ## "runtime-id", "metric-type"
  drop_tags = [ ]

  ## Each entry has the form "prefix:measurement".
  ## For example, all metric names prefixed with 'jvm_' are stored under the measurement 'cassandra_jvm'.
  ## Attention: these mappings must be present in the statsd conf file.
  metric_mapping = ["cassandra_:cassandra", "jvm_:cassandra_jvm", "jmx_:cassandra_jmx", "datadog_:cassandra_datadog"]

  ## Number of UDP messages allowed to queue up, once filled,
  ## the statsd server will start dropping packets, default is 128.
  # allowed_pending_messages = 128

  ## Number of timing/histogram values to track per-measurement in the
  ## calculation of percentiles. Raising this limit increases the accuracy
  ## of percentiles but also increases the memory usage and cpu time.
  percentile_limit = 1000

  ## Max duration (TTL) for each metric to stay cached/reported without being updated.
  # max_ttl = "1000h"

  [inputs.statsd.tags]
    # some_tag = "some_value"
    # more_tag = "some_other_value"

After the configuration is done, restart DataKit.
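
The effect of `metric_separator` and `metric_mapping` can be sketched as follows. This is a simplified model inferred from the configuration comments and the measurement names on this page, not DataKit's actual code; the dotted wire names in the usage example are assumptions, and the handling of unmatched names is deliberately left out.

```python
METRIC_MAPPING = [
    "cassandra_:cassandra",
    "jvm_:cassandra_jvm",
    "jmx_:cassandra_jmx",
    "datadog_:cassandra_datadog",
]


def map_metric(statsd_name, mapping=METRIC_MAPPING, separator="_"):
    """Return (measurement, field) for a StatsD metric name, or None if no prefix matches."""
    # Join the dotted name elements with the configured separator.
    flat = statsd_name.replace(".", separator)
    for rule in mapping:
        prefix, measurement = rule.split(":", 1)
        if flat.startswith(prefix):
            # The remainder after the prefix is treated as the field name.
            return measurement, flat[len(prefix):]
    return None


if __name__ == "__main__":
    for name in ("cassandra.pending_tasks", "jvm.heap_memory", "datadog.tracer.queue.max_length"):
        print(name, "->", map_metric(name))
```

Under this model a name such as `jvm.heap_memory` would end up as the field `heap_memory` in the measurement `cassandra_jvm`, which matches the tables in the metrics section.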

Kubernetes: inject the collector configuration through a ConfigMap, or enable the collector by setting ENV_DATAKIT_INPUTS.

Metrics

`cassandra`

Tags & Fields	Description
columnfamily (`tag`)	Column family name, e.g. batches, built_views, columns, paxos, peer.
host (`tag`)	Host name.
instance (`tag`)	Instance name.
jmx_domain (`tag`)	JMX domain.
keyspace (`tag`)	Keyspace name, e.g. system, system_schema.
metric_type (`tag`)	Metric type.
name (`tag`)	Type name.
path (`tag`)	path=request
runtime-id (`tag`)	Runtime id.
scope (`tag`)	Scope name, e.g. ReadStage, MutationStage, HintsDispatcher, MemtableFlushWriter, MemtablePostFlush.
service (`tag`)	Service name.
table (`tag`)	Table name, e.g. IndexInfo, available_ranges, batches, built_views.
type (`tag`)	Object type.
active_tasks	The number of tasks that the thread pool is actively executing. Type: float \| (gauge) Unit: count
bloom_filter_false_ratio	The ratio of Bloom filter false positives to total checks. Type: float \| (gauge) Unit: count
bytes_flushed_count	The amount of data that was flushed since (re)start. Type: float \| (gauge) Unit: digital,B
cas_commit_latency_75th_percentile	The latency of the Paxos commit round - p75. Type: float \| (gauge) Unit: time,ms
cas_commit_latency_95th_percentile	The latency of the Paxos commit round - p95. Type: float \| (gauge) Unit: time,ms
cas_commit_latency_one_minute_rate	The number of Paxos commit rounds per second. Type: float \| (gauge) Unit: throughput,reqps
cas_prepare_latency_75th_percentile	The latency of the Paxos prepare round - p75. Type: float \| (gauge) Unit: time,ms
cas_prepare_latency_95th_percentile	The latency of the Paxos prepare round - p95. Type: float \| (gauge) Unit: time,ms
cas_prepare_latency_one_minute_rate	The number of Paxos prepare rounds per second. Type: float \| (gauge) Unit: throughput,reqps
cas_propose_latency_75th_percentile	The latency of the Paxos propose round - p75. Type: float \| (gauge) Unit: time,ms
cas_propose_latency_95th_percentile	The latency of the Paxos propose round - p95. Type: float \| (gauge) Unit: time,ms
cas_propose_latency_one_minute_rate	The number of Paxos propose rounds per second. Type: float \| (gauge) Unit: throughput,reqps
col_update_time_delta_histogram_75th_percentile	The column update time delta - p75. Type: float \| (gauge) Unit: time,ms
col_update_time_delta_histogram_95th_percentile	The column update time delta - p95. Type: float \| (gauge) Unit: time,ms
col_update_time_delta_histogram_min	The column update time delta - min. Type: float \| (gauge) Unit: time,ms
compaction_bytes_written_count	The amount of data that was compacted since (re)start. Type: float \| (gauge) Unit: digital,B
compression_ratio	The compression ratio for all SSTables. Contrary to what the name suggests, a low value means high compression. The formula is: size of the compressed SSTable / size of the original. Type: float \| (gauge) Unit: percent,percent
currently_blocked_tasks	The number of currently blocked tasks for the thread pool. Type: float \| (gauge) Unit: count
currently_blocked_tasks_count	The number of currently blocked tasks for the thread pool. Type: float \| (gauge) Unit: count
db_droppable_tombstone_ratio	The estimate of the droppable tombstone ratio. Type: float \| (gauge) Unit: percent,percent
dropped_one_minute_rate	The tasks dropped during execution for the thread pool. Type: float \| (gauge) Unit: count
exceptions_count	The number of exceptions thrown from 'Storage' metrics. Type: float \| (gauge) Unit: count
key_cache_hit_rate	The key cache hit rate. Type: float \| (gauge) Unit: count
latency_75th_percentile	The client request latency - p75. Type: float \| (gauge) Unit: time,ms
latency_95th_percentile	The client request latency - p95. Type: float \| (gauge) Unit: time,ms
latency_one_minute_rate	The number of client requests. Type: float \| (gauge) Unit: throughput,reqps
live_disk_space_used_count	The disk space used by live SSTables (only counts in use files). Type: float \| (gauge) Unit: digital,B
live_ss_table_count	Number of live (in use) SSTables. Type: float \| (gauge) Unit: count
load_count	The disk space used by live data on a node. Type: float \| (gauge) Unit: digital,B
max_partition_size	The size of the largest compacted partition. Type: float \| (gauge) Unit: digital,B
max_row_size	The size of the largest compacted row. Type: float \| (gauge) Unit: digital,B
mean_partition_size	The average size of compacted partition. Type: float \| (gauge) Unit: digital,B
mean_row_size	The average size of compacted rows. Type: float \| (gauge) Unit: digital,B
metrics_75th_percentile	Metrics - p75. Type: float \| (gauge) Unit: count
metrics_95th_percentile	Metrics - p95. Type: float \| (gauge) Unit: count
metrics_count	Metrics count. Type: float \| (gauge) Unit: count
metrics_one_minute_rate	The number of metrics. Type: float \| (gauge) Unit: count
metrics_value	Metrics value. Type: float \| (gauge) Unit: count
net_down_endpoint_count	The number of unhealthy nodes in the cluster. They represent each individual node's view of the cluster and thus should not be summed across reporting nodes. Type: float \| (gauge) Unit: count
net_up_endpoint_count	The number of healthy nodes in the cluster. They represent each individual node's view of the cluster and thus should not be summed across reporting nodes. Type: float \| (gauge) Unit: count
nodetool_status_load	Amount of file system data under the Cassandra data directory without snapshot content. Type: float \| (gauge) Unit: digital,B
nodetool_status_owns	Percentage of the data owned by the node per data center times the replication factor. Type: float \| (gauge) Unit: percent,percent
nodetool_status_replication_availability	Percentage of data available per keyspace times replication factor. Type: float \| (gauge) Unit: percent,percent
nodetool_status_replication_factor	Replication factor per keyspace. Type: float \| (gauge) Unit: count
nodetool_status_status	Node status: up (1) or down (0). Type: float \| (gauge) Unit: bool
pending_compactions	The number of pending compactions. Type: float \| (gauge) Unit: count
pending_flushes_count	The number of pending flushes. Type: float \| (gauge) Unit: count
pending_tasks	The number of pending tasks for the thread pool. Type: float \| (gauge) Unit: count
range_latency_75th_percentile	The local range request latency - p75. Type: float \| (gauge) Unit: time,ms
range_latency_95th_percentile	The local range request latency - p95. Type: float \| (gauge) Unit: time,ms
range_latency_one_minute_rate	The number of local range requests. Type: float \| (gauge) Unit: throughput,reqps
read_latency_75th_percentile	The local read latency - p75. Type: float \| (gauge) Unit: time,ms
read_latency_95th_percentile	The local read latency - p95. Type: float \| (gauge) Unit: time,ms
read_latency_99th_percentile	The local read latency - p99. Type: float \| (gauge) Unit: time,ms
read_latency_one_minute_rate	The number of local read requests. Type: float \| (gauge) Unit: throughput,reqps
row_cache_hit_count	The number of row cache hits. Type: float \| (gauge) Unit: count
row_cache_hit_out_of_range_count	The number of row cache hits that do not satisfy the query filter and went to disk. Type: float \| (gauge) Unit: count
row_cache_miss_count	The number of table row cache misses. Type: float \| (gauge) Unit: count
snapshots_size	The disk space truly used by snapshots. Type: float \| (gauge) Unit: digital,B
ss_tables_per_read_histogram_75th_percentile	The number of SSTable data files accessed per read - p75. Type: float \| (gauge) Unit: count
ss_tables_per_read_histogram_95th_percentile	The number of SSTable data files accessed per read - p95. Type: float \| (gauge) Unit: count
timeouts_count	Count of requests not acknowledged within configurable timeout window. Type: float \| (gauge) Unit: count
timeouts_one_minute_rate	Recent timeout rate, as an exponentially weighted moving average over a one-minute interval. Type: float \| (gauge) Unit: count
tombstone_scanned_histogram_75th_percentile	Number of tombstones scanned per read - p75. Type: float \| (gauge) Unit: count
tombstone_scanned_histogram_95th_percentile	Number of tombstones scanned per read - p95. Type: float \| (gauge) Unit: count
total_blocked_tasks	Total blocked tasks. Type: float \| (gauge) Unit: count
total_blocked_tasks_count	Total count of blocked tasks. Type: float \| (count) Unit: count
total_commit_log_size	The size used on disk by commit logs. Type: float \| (gauge) Unit: digital,B
total_disk_space_used_count	Total disk space used by SSTables, including obsolete ones waiting to be garbage collected. Type: float \| (gauge) Unit: digital,B
view_lock_acquire_time_75th_percentile	The time taken acquiring a partition lock for materialized view updates - p75. Type: float \| (gauge) Unit: time,ms
view_lock_acquire_time_95th_percentile	The time taken acquiring a partition lock for materialized view updates - p95. Type: float \| (gauge) Unit: time,ms
view_lock_acquire_time_one_minute_rate	The number of requests to acquire a partition lock for materialized view updates. Type: float \| (gauge) Unit: count
view_read_time_75th_percentile	The time taken during the local read of a materialized view update - p75. Type: float \| (gauge) Unit: time,ms
view_read_time_95th_percentile	The time taken during the local read of a materialized view update - p95. Type: float \| (gauge) Unit: time,ms
view_read_time_one_minute_rate	The number of local reads for materialized view updates. Type: float \| (gauge) Unit: count
waiting_on_free_memtable_space_75th_percentile	The time spent waiting for free memtable space, either on- or off-heap - p75. Type: float \| (gauge) Unit: time,ms
waiting_on_free_memtable_space_95th_percentile	The time spent waiting for free memtable space, either on- or off-heap - p95. Type: float \| (gauge) Unit: time,ms
write_latency_75th_percentile	The local write latency - p75. Type: float \| (gauge) Unit: time,ms
write_latency_95th_percentile	The local write latency - p95. Type: float \| (gauge) Unit: time,ms
write_latency_99th_percentile	The local write latency - p99. Type: float \| (gauge) Unit: time,ms
write_latency_one_minute_rate	The number of local write requests. Type: float \| (gauge) Unit: throughput,reqps
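
Since `compression_ratio` reads the opposite way from what its name suggests, a small worked example may help. The sketch applies the formula from the table (compressed size divided by original size); the byte counts are made-up values.

```python
def compression_ratio(compressed_bytes, original_bytes):
    """Ratio as defined in the table: compressed SSTable size / original size."""
    if original_bytes <= 0:
        raise ValueError("original_bytes must be positive")
    return compressed_bytes / original_bytes


if __name__ == "__main__":
    # Hypothetical SSTables: the first shrinks to a quarter of its original size,
    # the second only to 80% of it.
    well_compressed = compression_ratio(25, 100)
    poorly_compressed = compression_ratio(80, 100)
    print(well_compressed, poorly_compressed)
```

The better-compressed table yields the smaller number, so a falling value on a dashboard indicates that compression is becoming more effective.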

`cassandra_jvm`

Tags & Fields	Description
host (`tag`)	Host name.
instance (`tag`)	Instance name.
jmx_domain (`tag`)	JMX domain.
metric_type (`tag`)	Metric type.
name (`tag`)	Type name.
runtime-id (`tag`)	Runtime id.
service (`tag`)	Service name.
type (`tag`)	Object type.
buffer_pool_direct_capacity	Measure of total memory capacity of direct buffers. Type: float \| (gauge) Unit: digital,B
buffer_pool_direct_count	Number of direct buffers in the pool. Type: float \| (gauge) Unit: count
buffer_pool_direct_used	Measure of memory used by direct buffers. Type: float \| (gauge) Unit: digital,B
buffer_pool_mapped_capacity	Measure of total memory capacity of mapped buffers. Type: float \| (gauge) Unit: digital,B
buffer_pool_mapped_count	Number of mapped buffers in the pool. Type: float \| (gauge) Unit: count
buffer_pool_mapped_used	Measure of memory used by mapped buffers. Type: float \| (gauge) Unit: digital,B
cpu_load_process	Recent CPU utilization for the process. Type: float \| (gauge) Unit: percent,percent
cpu_load_system	Recent CPU utilization for the whole system. Type: float \| (gauge) Unit: percent,percent
daemon_code_cache_used	The number of daemon threads. Type: float \| (count) Unit: count
daemon_thread_count	Daemon thread count. Type: float \| (gauge) Unit: count
gc_code_cache_used	GC code cache used. Type: float \| (gauge) Unit: count
gc_eden_size	The eden size in garbage collection. Type: float \| (gauge) Unit: digital,B
gc_major_collection_count	The rate of major garbage collections. Type: float \| (gauge) Unit: count
gc_major_collection_time	The fraction of time spent (rate) in major garbage collection. Type: float \| (gauge) Unit: time,ms
gc_metaspace_size	The metaspace size in garbage collection. Type: float \| (gauge) Unit: digital,B
gc_minor_collection_count	The rate of minor garbage collections. Type: float \| (gauge) Unit: count
gc_minor_collection_time	The fraction of time spent (rate) in minor garbage collection. Type: float \| (gauge) Unit: time,ms
gc_old_gen_size	The old gen size in garbage collection. Type: float \| (gauge) Unit: digital,B
gc_survivor_size	The survivor size in garbage collection. Type: float \| (gauge) Unit: digital,B
heap_memory	The total Java heap memory used. Type: float \| (gauge) Unit: digital,B
heap_memory_committed	The total Java heap memory committed to be used. Type: float \| (gauge) Unit: digital,B
heap_memory_init	The initial Java heap memory allocated. Type: float \| (gauge) Unit: digital,B
heap_memory_max	The maximum Java heap memory available. Type: float \| (gauge) Unit: digital,B
loaded_classes	Number of classes currently loaded. Type: float \| (gauge) Unit: count
non_heap_memory	The total Java non-heap memory used. Non-heap memory is: `Metaspace + CompressedClassSpace + CodeCache`. Type: float \| (gauge) Unit: digital,B
non_heap_memory_committed	The total Java non-heap memory committed to be used. Type: float \| (gauge) Unit: digital,B
non_heap_memory_init	The initial Java non-heap memory allocated. Type: float \| (gauge) Unit: digital,B
non_heap_memory_max	The maximum Java non-heap memory available. Type: float \| (gauge) Unit: digital,B
os_open_file_descriptors	The number of file descriptors used by this process (only available for processes run as the dd-agent user). Type: float \| (gauge) Unit: count
peak_thread_count	The peak number of live threads. Type: float \| (count) Unit: count
thread_count	The number of live threads. Type: float \| (count) Unit: count
total_thread_count	The number of total threads. Type: float \| (count) Unit: count

`cassandra_jmx`

Tags & Fields	Description
host (`tag`)	Host name.
instance (`tag`)	Instance name.
jmx_domain (`tag`)	JMX domain.
metric_type (`tag`)	Metric type.
name (`tag`)	Type name.
runtime-id (`tag`)	Runtime id.
service (`tag`)	Service name.
type (`tag`)	Object type.
gc_cms.count	The total number of garbage collections that have occurred. Type: float \| (count) Unit: count
gc_major_collection_count	The rate of major garbage collections. Type: float \| (gauge) Unit: count
gc_major_collection_time	The fraction of time spent in major garbage collection. Set new_gc_metrics: true to receive this metric. Type: float \| (gauge) Unit: PPM
gc_minor_collection_count	The rate of minor garbage collections. Type: float \| (gauge) Unit: count
gc_minor_collection_time	The fraction of time spent in minor garbage collection. Set new_gc_metrics: true to receive this metric. Type: float \| (gauge) Unit: PPM
gc_parnew.time	The approximate accumulated garbage collection time elapsed. Type: float \| (gauge) Unit: time,ms
heap_memory	The total Java heap memory used. Type: float \| (gauge) Unit: digital,B
heap_memory_committed	The total Java heap memory committed to be used. Type: float \| (gauge) Unit: digital,B
heap_memory_init	The initial Java heap memory allocated. Type: float \| (gauge) Unit: digital,B
heap_memory_max	The maximum Java heap memory available. Type: float \| (gauge) Unit: digital,B
non_heap_memory	The total Java non-heap memory used. Non-heap memory is: `Metaspace + CompressedClassSpace + CodeCache`. Type: float \| (gauge) Unit: digital,B
non_heap_memory_committed	The total Java non-heap memory committed to be used. Type: float \| (gauge) Unit: digital,B
non_heap_memory_init	The initial Java non-heap memory allocated. Type: float \| (gauge) Unit: digital,B
non_heap_memory_max	The maximum Java non-heap memory available. Type: float \| (gauge) Unit: digital,B
thread_count	The number of live threads. Type: float \| (count) Unit: count

`cassandra_datadog`

Tags & Fields	Description
endpoint (`tag`)	Endpoint.
host (`tag`)	Host name.
lang (`tag`)	Lang type.
lang_interpreter (`tag`)	Lang interpreter.
lang_interpreter_vendor (`tag`)	Lang interpreter vendor.
lang_version (`tag`)	Lang version.
metric_type (`tag`)	Metric type.
priority (`tag`)	Priority.
service (`tag`)	Service name.
stat (`tag`)	Stat.
tracer_version (`tag`)	Tracer version.
tracer_agent_discovery_time	Tracer agent discovery time. Type: float \| (gauge) Unit: time,ms
tracer_api_errors_total	Tracer api errors total. Type: float \| (gauge) Unit: count
tracer_api_requests_total	Tracer api requests total. Type: float \| (gauge) Unit: count
tracer_flush_bytes_total	Tracer flush bytes total. Type: float \| (gauge) Unit: count
tracer_flush_traces_total	Tracer flush traces total. Type: float \| (gauge) Unit: count
tracer_queue_enqueued_bytes	Tracer queue enqueued bytes. Type: float \| (gauge) Unit: count
tracer_queue_enqueued_spans	Tracer queue enqueued spans. Type: float \| (gauge) Unit: count
tracer_queue_enqueued_traces	Tracer queue enqueued traces. Type: float \| (gauge) Unit: count
tracer_queue_max_length	Tracer queue max length. Type: float \| (gauge) Unit: count
tracer_scope_activate_count	Tracer scope activate count. Type: float \| (gauge) Unit: count
tracer_scope_close_count	Tracer scope close count. Type: float \| (gauge) Unit: count
tracer_span_pending_created	Tracer span pending created. Type: float \| (gauge) Unit: count
tracer_span_pending_finished	Tracer span pending finished. Type: float \| (gauge) Unit: count
tracer_trace_agent_discovery_time	Tracer trace agent discovery time. Type: float \| (gauge) Unit: count
tracer_trace_agent_send_time	Tracer trace agent send time. Type: float \| (gauge) Unit: count
tracer_trace_pending_created	Tracer trace pending created. Type: float \| (gauge) Unit: count
tracer_tracer_trace_buffer_fill_time	Tracer trace buffer fill time. Type: float \| (gauge) Unit: count
