Cassandra

Cassandra metrics are collected through DDTrace. The collected data flows as follows: Cassandra -> DDTrace -> DataKit (StatsD).

DataKit has a built-in StatsD server. DDTrace collects the Cassandra metric data and reports it to DataKit over the StatsD protocol.
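As a rough illustration of what travels over the last hop, the sketch below formats one gauge in the DogStatsD line format (`name:value|g|#tag:value`). The metric name, value, and tag are made-up placeholders, not output captured from a real agent, and the helper `statsd_gauge` is hypothetical.

```shell
# Format one DogStatsD gauge line: <name>:<value>|g|#<tag>
# All three arguments are illustrative placeholders.
statsd_gauge() {
  name=$1
  value=$2
  tag=$3
  printf '%s:%s|g|#%s\n' "$name" "$value" "$tag"
}

# Print a sample line; nothing is sent over the network here.
statsd_gauge "cassandra.demo_metric" 3 "scope:ReadStage"
```

To push such a line at a listener by hand, it could be piped into a UDP-capable tool such as `nc -u`, assuming one is installed, using the host and port that DataKit listens on.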

Configuration¶

Preconditions¶

Tested Cassandra versions:
- 5.0
- 4.1.3
- 3.11.15
- 3.0.24
- 2.1.22

DDTrace Configuration¶

Download dd-java-agent.jar, see here.

DataKit configuration:

See the configuration of StatsD.

Restart DataKit for the configuration to take effect.

Cassandra configuration:

Create the file setenv.sh under /usr/local/cassandra/bin and give it execute permission, then write the following:

export CATALINA_OPTS="-javaagent:dd-java-agent.jar \
                      -Ddd.jmxfetch.enabled=true \
                      -Ddd.jmxfetch.statsd.host=${DATAKIT_HOST} \
                      -Ddd.jmxfetch.statsd.port=${DATAKIT_STATSD_PORT} \
                      -Ddd.jmxfetch.cassandra.enabled=true"

The parameters are described below:

- -javaagent: The full path to dd-java-agent.jar.
- -Ddd.jmxfetch.enabled: Set to true to enable the DDTrace collection function.
- -Ddd.jmxfetch.statsd.host: The network address that DataKit listens on, without the port number.
- -Ddd.jmxfetch.statsd.port: The port number that DataKit listens on. Usually 11002, as determined by the DataKit-side configuration.
- -Ddd.jmxfetch.cassandra.enabled: Set to true to enable the Cassandra collection function of DDTrace. Once enabled, a metric set named cassandra shows up.

Restart Cassandra for the configuration to take effect.
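The options described above can be assembled from three inputs, which keeps the jar path, host, and port in one place. This is only a sketch: the helper `build_agent_opts` and the jar path, host, and port passed to it are hypothetical example values, not values taken from a real installation.

```shell
# Assemble the -javaagent and -D options from a jar path, a host, and a port.
# The values passed at the bottom are hypothetical examples.
build_agent_opts() {
  jar=$1
  host=$2
  port=$3
  printf '%s %s %s %s %s\n' \
    "-javaagent:${jar}" \
    "-Ddd.jmxfetch.enabled=true" \
    "-Ddd.jmxfetch.statsd.host=${host}" \
    "-Ddd.jmxfetch.statsd.port=${port}" \
    "-Ddd.jmxfetch.cassandra.enabled=true"
}

# Print the option string for a local DataKit listening on port 11002.
build_agent_opts "/opt/ddtrace/dd-java-agent.jar" "127.0.0.1" "11002"
```

The printed string is what would be assigned to the options variable exported in setenv.sh.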

Collector Configuration¶

Host deployment

Go to the conf.d/db directory under the DataKit installation directory, copy cassandra.conf.sample, and name the copy cassandra.conf. An example follows:

[[inputs.statsd]]
  ## Collector alias.
  source = "statsd/cassandra"

  ## Collect interval, default is 10 seconds. (optional)
  # interval = '10s'

  protocol = "udp"

  ## Address and port to host UDP listener on: (defaults to ":8125")
  service_address = ":11002"

  ## Tags carried by reported metrics, used to distinguish the metric source.
  ## eg, DD_TAGS=source_key:tomcat,host_key:cn-shanghai-sq5ei
  ## eg, -Ddd.tags=source_key:tomcat,host_key:cn-shanghai-sq5ei
  # statsd_source_key = "source_key"
  # statsd_host_key   = "host_key"
  ## Indicate whether report tag statsd_source_key and statsd_host_key.
  # save_above_key    = false

  delete_gauges = true
  delete_counters = true
  delete_sets = true
  delete_timings = true

  ## Counter metrics are floats in newer DataKit versions; set to true to report them as ints.
  # set_counter_int = false

  ## Percentiles to calculate for timing & histogram stats
  percentiles = [50.0, 90.0, 99.0, 99.9, 99.95, 100.0]

  ## separator to use between elements of a statsd metric
  metric_separator = "_"

  ## Parses tags in the datadog statsd format
  ## http://docs.datadoghq.com/guides/dogstatsd/
  parse_data_dog_tags = true

  ## Parses datadog extensions to the statsd format
  datadog_extensions = true

  ## Parses distributions metric as specified in the datadog statsd format
  ## https://docs.datadoghq.com/developers/metrics/types/?tab=distribution#definition
  datadog_distributions = true

  ## Tags we do not need (they may create a huge number of time series under influxdb's logic)
  ## Examples:
  ## "runtime-id", "metric-type"
  drop_tags = [ ]

  ## All metric names prefixed with 'jvm_' are assigned to influxdb's measurement 'jvm'
  ## All metric names prefixed with 'stats_' are assigned to influxdb's measurement 'stats'
  ## Attention: these mappings must be added to the statsd conf file.
  metric_mapping = ["cassandra_:cassandra", "jvm_:cassandra_jvm", "jmx_:cassandra_jmx", "datadog_:cassandra_datadog"]

  ## Number of UDP messages allowed to queue up, once filled,
  ## the statsd server will start dropping packets, default is 128.
  # allowed_pending_messages = 128

  ## Number of timing/histogram values to track per-measurement in the
  ## calculation of percentiles. Raising this limit increases the accuracy
  ## of percentiles but also increases the memory usage and cpu time.
  percentile_limit = 1000

  ## Max duration (TTL) for each metric to stay cached/reported without being updated.
  # max_ttl = "1000h"

  [inputs.statsd.tags]
    # some_tag = "some_value"
    # more_tag = "some_other_value"
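The effect of the metric_mapping entries in the sample can be pictured as a prefix lookup: the prefix selects the measurement and the rest of the name becomes the field. The sketch below mimics that idea for the four prefixes listed. It is not DataKit's implementation, and both the helper `map_metric` and the "unmapped" fallback are assumptions made for illustration.

```shell
# Split a flattened metric name into "<measurement> <field>" using the
# prefixes listed in metric_mapping. Illustrative only.
map_metric() {
  case $1 in
    cassandra_*) echo "cassandra ${1#cassandra_}" ;;
    jvm_*)       echo "cassandra_jvm ${1#jvm_}" ;;
    jmx_*)       echo "cassandra_jmx ${1#jmx_}" ;;
    datadog_*)   echo "cassandra_datadog ${1#datadog_}" ;;
    *)           echo "unmapped $1" ;;  # placeholder; real behavior not shown here
  esac
}

map_metric "cassandra_active_tasks"
map_metric "jvm_heap_memory"
```

Under this reading, `cassandra_active_tasks` lands in the cassandra measurement as `active_tasks`, which matches how the metric tables name their fields.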

Once configured, restart DataKit.
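A mismatch between the port in cassandra.conf and the port given to -Ddd.jmxfetch.statsd.port is an easy mistake to make, so a quick check can help. The sketch below reads the port from the service_address line of a config file. The helper `statsd_port_from_conf` is hypothetical, and the demo runs against a throwaway file rather than a real DataKit installation.

```shell
# Print the port from the first uncommented service_address line of a
# statsd collector config file (path given as $1).
statsd_port_from_conf() {
  sed -n 's/^[[:space:]]*service_address[[:space:]]*=[[:space:]]*"[^"]*:\([0-9][0-9]*\)".*$/\1/p' "$1" | head -n 1
}

# Demo against a temporary file that mimics the sample's service_address line.
demo_conf=$(mktemp)
printf '  service_address = ":11002"\n' > "$demo_conf"
statsd_port_from_conf "$demo_conf"
rm -f "$demo_conf"
```

Pointing the function at the actual cassandra.conf and comparing the result with the port in the Java options would confirm that both sides agree.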

Kubernetes

The collector can be turned on by injecting the collector configuration through a ConfigMap or by configuring ENV_DATAKIT_INPUTS.

Metric¶

`cassandra`¶

Tags

Tag	Description
columnfamily	Column family name, for example batches, built_views, columns, paxos, peer.
host	Host name.
instance	Instance name.
jmx_domain	JMX domain.
keyspace	Keyspace name, for example system, system_schema.
metric_type	Metric type.
name	Type name.
path	Path, for example request.
runtime-id	Runtime id.
scope	Scope, for example ReadStage, MutationStage, HintsDispatcher, MemtableFlushWriter, MemtablePostFlush.
service	Service name.
table	Table name, for example IndexInfo, available_ranges, batches, built_views.
type	Object type.

Metrics

Metric	Description
active_tasks	The number of tasks that the thread pool is actively executing. Type: float Unit: count
bloom_filter_false_ratio	The ratio of Bloom filter false positives to total checks. Type: float Unit: count
bytes_flushed_count	The amount of data that was flushed since (re)start. Type: float Unit: digital,B
cas_commit_latency_75th_percentile	The latency of the Paxos commit round - p75. Type: float Unit: time,ms
cas_commit_latency_95th_percentile	The latency of the Paxos commit round - p95. Type: float Unit: time,ms
cas_commit_latency_one_minute_rate	The number of Paxos commit rounds per second. Type: float Unit: throughput,reqps
cas_prepare_latency_75th_percentile	The latency of the Paxos prepare round - p75. Type: float Unit: time,ms
cas_prepare_latency_95th_percentile	The latency of the Paxos prepare round - p95. Type: float Unit: time,ms
cas_prepare_latency_one_minute_rate	The number of Paxos prepare rounds per second. Type: float Unit: throughput,reqps
cas_propose_latency_75th_percentile	The latency of the Paxos propose round - p75. Type: float Unit: time,ms
cas_propose_latency_95th_percentile	The latency of the Paxos propose round - p95. Type: float Unit: time,ms
cas_propose_latency_one_minute_rate	The number of Paxos propose rounds per second. Type: float Unit: throughput,reqps
col_update_time_delta_histogram_75th_percentile	The column update time delta - p75. Type: float Unit: time,ms
col_update_time_delta_histogram_95th_percentile	The column update time delta - p95. Type: float Unit: time,ms
col_update_time_delta_histogram_min	The column update time delta - min. Type: float Unit: time,ms
compaction_bytes_written_count	The amount of data that was compacted since (re)start. Type: float Unit: digital,B
compression_ratio	The compression ratio for all SSTables. Contrary to what the name suggests, a low value means high compression. The formula is: size of the compressed SSTable / size of the original. Type: float Unit: percent,percent
currently_blocked_tasks	The number of currently blocked tasks for the thread pool. Type: float Unit: count
currently_blocked_tasks_count	The number of currently blocked tasks for the thread pool. Type: float Unit: count
db_droppable_tombstone_ratio	The estimate of the droppable tombstone ratio. Type: float Unit: percent,percent
dropped_one_minute_rate	The tasks dropped during execution for the thread pool. Type: float Unit: count
exceptions_count	The number of exceptions thrown from Storage metrics. Type: float Unit: count
key_cache_hit_rate	The key cache hit rate. Type: float Unit: count
latency_75th_percentile	The client request latency - p75. Type: float Unit: time,ms
latency_95th_percentile	The client request latency - p95. Type: float Unit: time,ms
latency_one_minute_rate	The number of client requests. Type: float Unit: throughput,reqps
live_disk_space_used_count	The disk space used by live SSTables (only counts in use files). Type: float Unit: digital,B
live_ss_table_count	Number of live (in use) SSTables. Type: float Unit: count
load_count	The disk space used by live data on a node. Type: float Unit: digital,B
max_partition_size	The size of the largest compacted partition. Type: float Unit: digital,B
max_row_size	The size of the largest compacted row. Type: float Unit: digital,B
mean_partition_size	The average size of compacted partition. Type: float Unit: digital,B
mean_row_size	The average size of compacted rows. Type: float Unit: digital,B
metrics_75th_percentile	Metrics - p75. Type: float Unit: count
metrics_95th_percentile	Metrics - p95. Type: float Unit: count
metrics_count	Metrics count. Type: float Unit: count
metrics_one_minute_rate	The number of metrics. Type: float Unit: count
metrics_value	Metrics value. Type: float Unit: count
net_down_endpoint_count	The number of unhealthy nodes in the cluster. They represent each individual node's view of the cluster and thus should not be summed across reporting nodes. Type: float Unit: count
net_up_endpoint_count	The number of healthy nodes in the cluster. They represent each individual node's view of the cluster and thus should not be summed across reporting nodes. Type: float Unit: count
nodetool_status_load	Amount of file system data under the Cassandra data directory, excluding snapshot content. Type: float Unit: digital,B
nodetool_status_owns	Percentage of the data owned by the node per data center times the replication factor. Type: float Unit: percent,percent
nodetool_status_replication_availability	Percentage of data available per keyspace times the replication factor. Type: float Unit: percent,percent
nodetool_status_replication_factor	Replication factor per keyspace. Type: float Unit: count
nodetool_status_status	Node status: up (1) or down (0). Type: float Unit: bool
pending_compactions	The number of pending compactions. Type: float Unit: count
pending_flushes_count	The number of pending flushes. Type: float Unit: count
pending_tasks	The number of pending tasks for the thread pool. Type: float Unit: count
range_latency_75th_percentile	The local range request latency - p75. Type: float Unit: time,ms
range_latency_95th_percentile	The local range request latency - p95. Type: float Unit: time,ms
range_latency_one_minute_rate	The number of local range requests. Type: float Unit: throughput,reqps
read_latency_75th_percentile	The local read latency - p75. Type: float Unit: time,ms
read_latency_95th_percentile	The local read latency - p95. Type: float Unit: time,ms
read_latency_99th_percentile	The local read latency - p99. Type: float Unit: time,ms
read_latency_one_minute_rate	The number of local read requests. Type: float Unit: throughput,reqps
row_cache_hit_count	The number of row cache hits. Type: float Unit: count
row_cache_hit_out_of_range_count	The number of row cache hits that do not satisfy the query filter and went to disk. Type: float Unit: count
row_cache_miss_count	The number of table row cache misses. Type: float Unit: count
snapshots_size	The disk space truly used by snapshots. Type: float Unit: digital,B
ss_tables_per_read_histogram_75th_percentile	The number of SSTable data files accessed per read - p75. Type: float Unit: count
ss_tables_per_read_histogram_95th_percentile	The number of SSTable data files accessed per read - p95. Type: float Unit: count
timeouts_count	Count of requests not acknowledged within configurable timeout window. Type: float Unit: count
timeouts_one_minute_rate	Recent timeout rate, as an exponentially weighted moving average over a one-minute interval. Type: float Unit: count
tombstone_scanned_histogram_75th_percentile	Number of tombstones scanned per read - p75. Type: float Unit: count
tombstone_scanned_histogram_95th_percentile	Number of tombstones scanned per read - p95. Type: float Unit: count
total_blocked_tasks	Total blocked tasks. Type: float Unit: count
total_blocked_tasks_count	Total count of blocked tasks. Type: float Unit: count
total_commit_log_size	The size used on disk by commit logs. Type: float Unit: digital,B
total_disk_space_used_count	Total disk space used by SSTables, including obsolete ones waiting to be garbage collected. Type: float Unit: digital,B
view_lock_acquire_time_75th_percentile	The time taken acquiring a partition lock for materialized view updates - p75. Type: float Unit: time,ms
view_lock_acquire_time_95th_percentile	The time taken acquiring a partition lock for materialized view updates - p95. Type: float Unit: time,ms
view_lock_acquire_time_one_minute_rate	The number of requests to acquire a partition lock for materialized view updates. Type: float Unit: count
view_read_time_75th_percentile	The time taken during the local read of a materialized view update - p75. Type: float Unit: time,ms
view_read_time_95th_percentile	The time taken during the local read of a materialized view update - p95. Type: float Unit: time,ms
view_read_time_one_minute_rate	The number of local reads for materialized view updates. Type: float Unit: count
waiting_on_free_memtable_space_75th_percentile	The time spent waiting for free memtable space, either on- or off-heap - p75. Type: float Unit: time,ms
waiting_on_free_memtable_space_95th_percentile	The time spent waiting for free memtable space, either on- or off-heap - p95. Type: float Unit: time,ms
write_latency_75th_percentile	The local write latency - p75. Type: float Unit: time,ms
write_latency_95th_percentile	The local write latency - p95. Type: float Unit: time,ms
write_latency_99th_percentile	The local write latency - p99. Type: float Unit: time,ms
write_latency_one_minute_rate	The number of local write requests. Type: float Unit: throughput,reqps

`cassandra_jvm`¶

Tags

Tag	Description
host	Host name.
instance	Instance name.
jmx_domain	JMX domain.
metric_type	Metric type.
name	Type name.
runtime-id	Runtime id.
service	Service name.
type	Object type.

Metrics

Metric	Description
buffer_pool_direct_capacity	Measure of total memory capacity of direct buffers. Type: float Unit: digital,B
buffer_pool_direct_count	Number of direct buffers in the pool. Type: float Unit: count
buffer_pool_direct_used	Measure of memory used by direct buffers. Type: float Unit: digital,B
buffer_pool_mapped_capacity	Measure of total memory capacity of mapped buffers. Type: float Unit: digital,B
buffer_pool_mapped_count	Number of mapped buffers in the pool. Type: float Unit: count
buffer_pool_mapped_used	Measure of memory used by mapped buffers. Type: float Unit: digital,B
cpu_load_process	Recent CPU utilization for the process. Type: float Unit: percent,percent
cpu_load_system	Recent CPU utilization for the whole system. Type: float Unit: percent,percent
daemon_code_cache_used	The number of daemon threads. Type: float Unit: count
daemon_thread_count	Daemon thread count. Type: float Unit: count
gc_cms_count	The total number of garbage collections that have occurred. Type: float Unit: count
gc_code_cache_used	GC code cache used. Type: float Unit: count
gc_eden_size	The eden size in garbage collection. Type: float Unit: digital,B
gc_major_collection_count	The rate of major garbage collections. Set new_gc_metrics: true to receive this metric. Type: float Unit: count
gc_major_collection_time	The fraction of time spent in major garbage collection. Set new_gc_metrics: true to receive this metric. Type: float Unit: PPM
gc_metaspace_size	The metaspace size in garbage collection. Type: float Unit: digital,B
gc_minor_collection_count	The rate of minor garbage collections. Set new_gc_metrics: true to receive this metric. Type: float Unit: count
gc_minor_collection_time	The fraction of time spent in minor garbage collection. Set new_gc_metrics: true to receive this metric. Type: float Unit: PPM
gc_old_gen_size	The old gen size in garbage collection. Type: float Unit: digital,B
gc_parnew_time	The approximate accumulated garbage collection time elapsed. Type: float Unit: time,ms
gc_survivor_size	The survivor size in garbage collection. Type: float Unit: digital,B
heap_memory	The total Java heap memory used. Type: float Unit: digital,B
heap_memory_committed	The total Java heap memory committed to be used. Type: float Unit: digital,B
heap_memory_init	The initial Java heap memory allocated. Type: float Unit: digital,B
heap_memory_max	The maximum Java heap memory available. Type: float Unit: digital,B
loaded_classes	Number of classes currently loaded. Type: float Unit: count
non_heap_memory	The total Java non-heap memory used. Non-heap memory is: `Metaspace + CompressedClassSpace + CodeCache`. Type: float Unit: digital,B
non_heap_memory_committed	The total Java non-heap memory committed to be used. Type: float Unit: digital,B
non_heap_memory_init	The initial Java non-heap memory allocated. Type: float Unit: digital,B
non_heap_memory_max	The maximum Java non-heap memory available. Type: float Unit: digital,B
os_open_file_descriptors	The number of file descriptors used by this process (only available for processes run as the dd-agent user). Type: float Unit: count
peak_thread_count	The peak number of live threads. Type: float Unit: count
thread_count	The number of live threads. Type: float Unit: count
total_thread_count	The number of total threads. Type: float Unit: count

`cassandra_jmx`¶

Tags

Tag	Description
host	Host name.
instance	Instance name.
jmx_domain	JMX domain.
metric_type	Metric type.
name	Type name.
runtime-id	Runtime id.
service	Service name.
type	Object type.

Metrics

Metric	Description
gc_cms.count	The total number of garbage collections that have occurred. Type: float Unit: count
gc_major_collection_count	The rate of major garbage collections. Set new_gc_metrics: true to receive this metric. Type: float Unit: count
gc_major_collection_time	The fraction of time spent in major garbage collection. Set new_gc_metrics: true to receive this metric. Type: float Unit: PPM
gc_minor_collection_count	The rate of minor garbage collections. Set new_gc_metrics: true to receive this metric. Type: float Unit: count
gc_minor_collection_time	The fraction of time spent in minor garbage collection. Set new_gc_metrics: true to receive this metric. Type: float Unit: PPM
gc_parnew.time	The approximate accumulated garbage collection time elapsed. Type: float Unit: time,ms
heap_memory	The total Java heap memory used. Type: float Unit: digital,B
heap_memory_committed	The total Java heap memory committed to be used. Type: float Unit: digital,B
heap_memory_init	The initial Java heap memory allocated. Type: float Unit: digital,B
heap_memory_max	The maximum Java heap memory available. Type: float Unit: digital,B
non_heap_memory	The total Java non-heap memory used. Non-heap memory is calculated as follows: Metaspace + CompressedClassSpace + CodeCache. Type: float Unit: digital,B
non_heap_memory_committed	The total Java non-heap memory committed to be used. Type: float Unit: digital,B
non_heap_memory_init	The initial Java non-heap memory allocated. Type: float Unit: digital,B
non_heap_memory_max	The maximum Java non-heap memory available. Type: float Unit: digital,B
thread_count	The number of live threads. Type: float Unit: count

`cassandra_datadog`¶

Tags

Tag	Description
endpoint	Endpoint.
host	Host name.
lang	Lang type.
lang_interpreter	Lang interpreter.
lang_interpreter_vendor	Lang interpreter vendor.
lang_version	Lang version.
metric_type	Metric type.
priority	Priority.
service	Service name.
stat	Stat.
tracer_version	Tracer version.

Metrics

Metric	Description
tracer_agent_discovery_time	Tracer agent discovery time. Type: float Unit: time,ms
tracer_api_errors_total	Tracer api errors total. Type: float Unit: count
tracer_api_requests_total	Tracer api requests total. Type: float Unit: count
tracer_flush_bytes_total	Tracer flush bytes total. Type: float Unit: count
tracer_flush_traces_total	Tracer flush traces total. Type: float Unit: count
tracer_queue_enqueued_bytes	Tracer queue enqueued bytes. Type: float Unit: count
tracer_queue_enqueued_spans	Tracer queue enqueued spans. Type: float Unit: count
tracer_queue_enqueued_traces	Tracer queue enqueued traces. Type: float Unit: count
tracer_queue_max_length	Tracer queue max length. Type: float Unit: count
tracer_scope_activate_count	Tracer scope activate count. Type: float Unit: count
tracer_scope_close_count	Tracer scope close count. Type: float Unit: count
tracer_span_pending_created	Tracer span pending created. Type: float Unit: count
tracer_span_pending_finished	Tracer span pending finished. Type: float Unit: count
tracer_trace_agent_discovery_time	Tracer trace agent discovery time. Type: float Unit: count
tracer_trace_agent_send_time	Tracer trace agent send time. Type: float Unit: count
tracer_trace_pending_created	Tracer trace pending created. Type: float Unit: count
tracer_tracer_trace_buffer_fill_time	Tracer trace buffer fill time. Type: float Unit: count
