Cassandra


Cassandra metrics can be collected with DDTrace. The collected data flows as follows: Cassandra -> DDTrace -> DataKit (StatsD).

DataKit has a built-in StatsD server: DDTrace collects the Cassandra metrics and reports them to DataKit over the StatsD protocol.
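
To make the last hop of this flow concrete, here is a minimal Python sketch of what a StatsD report could look like on the wire. It assumes the DogStatsD text format (`name:value|type|#tag:value,...`) and a DataKit listener on UDP port 11002, the port used in the collector configuration on this page. The helper names, the sample metric, and the tag are illustrative only and are not taken from DDTrace.

```python
import socket
from typing import Dict, Optional


def format_gauge(name: str, value: float, tags: Optional[Dict[str, str]] = None) -> str:
    """Build one DogStatsD-style gauge line: name:value|g|#k:v,k:v"""
    line = f"{name}:{value}|g"
    if tags:
        line += "|#" + ",".join(f"{k}:{v}" for k, v in tags.items())
    return line


def send_metric(line: str, host: str = "127.0.0.1", port: int = 11002) -> None:
    """Send a single datagram over UDP; no reply is expected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


# Hypothetical sample resembling what a JMX collector might report.
sample = format_gauge("cassandra.pending_tasks", 3, {"scope": "ReadStage"})
print(sample)
```

Calling `send_metric(sample)` from the Cassandra host and then looking for the metric on the DataKit side is one way to check the path, since UDP itself gives no delivery confirmation.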

Configuration

Preconditions

  • Tested Cassandra versions:
    • 5.0
    • 4.1.3
    • 3.11.15
    • 3.0.24
    • 2.1.22

DDTrace Configuration

  • Download dd-java-agent.jar, see here;

  • DataKit configuration:

See the configuration of StatsD.

Restart DataKit to make configuration take effect.

  • Cassandra configuration:

Edit cassandra-env.sh in the Cassandra configuration directory (for example /usr/local/cassandra/conf/cassandra-env.sh) and append the following so that the options reach the Cassandra JVM:

export JVM_OPTS="$JVM_OPTS -javaagent:dd-java-agent.jar \
                      -Ddd.jmxfetch.enabled=true \
                      -Ddd.jmxfetch.statsd.host=${DATAKIT_HOST} \
                      -Ddd.jmxfetch.statsd.port=${DATAKIT_STATSD_PORT} \
                      -Ddd.jmxfetch.cassandra.enabled=true"

The parameters are described below:

  • -javaagent: the full path to dd-java-agent.jar;
  • -Ddd.jmxfetch.enabled: set to true to enable DDTrace's JMX metric collection;
  • -Ddd.jmxfetch.statsd.host: the network address DataKit listens on, without the port number;
  • -Ddd.jmxfetch.statsd.port: the port DataKit listens on, usually 11002, as determined by the DataKit-side configuration;
  • -Ddd.jmxfetch.cassandra.enabled: set to true to enable DDTrace's Cassandra collection. Once enabled, the metric set named cassandra will show up.

Restart Cassandra to make the configuration take effect.
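
As an illustration of the parameters described above, the following Python sketch assembles the five JVM flags with concrete values. The jar path and the DataKit address are hypothetical placeholders, and `build_jmxfetch_flags` is a helper invented for this example; the resulting string is what would go inside the quoted options variable shown earlier.

```python
def build_jmxfetch_flags(agent_jar: str, datakit_host: str, statsd_port: int = 11002) -> str:
    """Join the DDTrace JMX flags from this page into one option string."""
    flags = [
        f"-javaagent:{agent_jar}",                     # full path to the agent jar
        "-Ddd.jmxfetch.enabled=true",                  # turn on JMX metric collection
        f"-Ddd.jmxfetch.statsd.host={datakit_host}",   # address only, no port
        f"-Ddd.jmxfetch.statsd.port={statsd_port}",    # must match DataKit's service_address
        "-Ddd.jmxfetch.cassandra.enabled=true",        # enable the cassandra metric set
    ]
    return " ".join(flags)


# Hypothetical values: adjust the path and address to your environment.
flags = build_jmxfetch_flags("/opt/ddtrace/dd-java-agent.jar", "10.0.0.5")
print(flags)
```

Keeping the host and port as separate arguments mirrors the two separate flags and makes it harder to put a port into the host value by mistake.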

Collector Configuration

Go to the conf.d/db directory under the DataKit installation directory, copy cassandra.conf.sample, and rename the copy to cassandra.conf. An example follows:

[[inputs.statsd]]
  ## Collector alias.
  source = "statsd/cassandra"

  ## Collect interval, default is 10 seconds. (optional)
  # interval = '10s'

  protocol = "udp"

  ## Address and port to host UDP listener on: (defaults to ":8125")
  service_address = ":11002"

  ## Tag keys carried by reported metrics, used to distinguish which source a metric comes from.
  ## eg, DD_TAGS=source_key:tomcat,host_key:cn-shanghai-sq5ei
  ## eg, -Ddd.tags=source_key:tomcat,host_key:cn-shanghai-sq5ei
  # statsd_source_key = "source_key"
  # statsd_host_key   = "host_key"
  ## Indicate whether report tag statsd_source_key and statsd_host_key.
  # save_above_key    = false

  delete_gauges = true
  delete_counters = true
  delete_sets = true
  delete_timings = true

  ## Counter metrics are floats in newer DataKit versions; set to true to report them as integers.
  # set_counter_int = false

  ## Percentiles to calculate for timing & histogram stats
  percentiles = [50.0, 90.0, 99.0, 99.9, 99.95, 100.0]

  ## separator to use between elements of a statsd metric
  metric_separator = "_"

  ## Parses tags in the datadog statsd format
  ## http://docs.datadoghq.com/guides/dogstatsd/
  parse_data_dog_tags = true

  ## Parses datadog extensions to the statsd format
  datadog_extensions = true

  ## Parses distributions metric as specified in the datadog statsd format
  ## https://docs.datadoghq.com/developers/metrics/types/?tab=distribution#definition
  datadog_distributions = true

  ## Tags to drop (they may create a huge number of time series under InfluxDB's logic).
  ## Examples:
  ## "runtime-id", "metric-type"
  drop_tags = [ ]

  ## Each entry has the form "prefix:measurement".
  ## For example, all metric names prefixed with 'cassandra_' are assigned to the measurement 'cassandra'.
  ## Note: these mappings must be present in the StatsD conf file.
  metric_mapping = ["cassandra_:cassandra", "jvm_:cassandra_jvm", "jmx_:cassandra_jmx", "datadog_:cassandra_datadog"]

  ## Number of UDP messages allowed to queue up, once filled,
  ## the statsd server will start dropping packets, default is 128.
  # allowed_pending_messages = 128

  ## Number of timing/histogram values to track per-measurement in the
  ## calculation of percentiles. Raising this limit increases the accuracy
  ## of percentiles but also increases the memory usage and cpu time.
  percentile_limit = 1000

  ## Max duration (TTL) for each metric to stay cached/reported without being updated.
  # max_ttl = "1000h"

  [inputs.statsd.tags]
    # some_tag = "some_value"
    # more_tag = "some_other_value"

Once configured, restart DataKit.
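
The effect of metric_mapping can be pictured with a short Python sketch. This illustrates the behaviour the configuration comments describe and is not DataKit's actual implementation. It assumes that dots in the reported name are replaced by metric_separator before the prefix is matched and that the remainder of the name becomes the field name; `route_metric` is a hypothetical helper.

```python
# Same entries as the metric_mapping option in the sample configuration.
METRIC_MAPPING = [
    "cassandra_:cassandra",
    "jvm_:cassandra_jvm",
    "jmx_:cassandra_jmx",
    "datadog_:cassandra_datadog",
]


def route_metric(raw_name, mapping=METRIC_MAPPING, separator="_"):
    """Return (measurement, field) for a raw metric name, or None if no prefix matches."""
    # Assumption: dots are normalised to the configured separator first.
    name = raw_name.replace(".", separator)
    for rule in mapping:
        prefix, measurement = rule.split(":", 1)
        if name.startswith(prefix):
            # The part after the prefix is kept as the field name.
            return measurement, name[len(prefix):]
    return None


print(route_metric("cassandra.active_tasks"))
print(route_metric("jvm.heap_memory"))
```

Under these assumptions, a name such as cassandra.active_tasks would land in the cassandra metric set as active_tasks, which matches the names listed in the metric tables on this page.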


Metric

cassandra

  • Tags
Tag Description
columnfamily Column family, e.g. columnfamily=batches, columnfamily=built_views, columnfamily=columns, columnfamily=paxos, columnfamily=peer
host Host name.
instance Instance name.
jmx_domain JMX domain.
keyspace Keyspace, e.g. keyspace=system, keyspace=system_schema
metric_type Metric type.
name Type name.
path path=request
runtime-id Runtime id.
scope Scope, e.g. scope=ReadStage, scope=MutationStage, scope=HintsDispatcher, scope=MemtableFlushWriter, scope=MemtablePostFlush
service Service name.
table Table name, e.g. table=IndexInfo, table=available_ranges, table=batches, table=built_views
type Object type.
  • Metrics
Metric Description
active_tasks The number of tasks that the thread pool is actively executing.
Type: float
Unit: count
bloom_filter_false_ratio The ratio of Bloom filter false positives to total checks.
Type: float
Unit: count
bytes_flushed_count The amount of data that was flushed since (re)start.
Type: float
Unit: digital,B
cas_commit_latency_75th_percentile The latency of the Paxos commit round - p75.
Type: float
Unit: time,ms
cas_commit_latency_95th_percentile The latency of the Paxos commit round - p95.
Type: float
Unit: time,ms
cas_commit_latency_one_minute_rate The number of Paxos commit rounds per second.
Type: float
Unit: throughput,reqps
cas_prepare_latency_75th_percentile The latency of the Paxos prepare round - p75.
Type: float
Unit: time,ms
cas_prepare_latency_95th_percentile The latency of the Paxos prepare round - p95.
Type: float
Unit: time,ms
cas_prepare_latency_one_minute_rate The number of Paxos prepare rounds per second.
Type: float
Unit: throughput,reqps
cas_propose_latency_75th_percentile The latency of the Paxos propose round - p75.
Type: float
Unit: time,ms
cas_propose_latency_95th_percentile The latency of the Paxos propose round - p95.
Type: float
Unit: time,ms
cas_propose_latency_one_minute_rate The number of Paxos propose rounds per second.
Type: float
Unit: throughput,reqps
col_update_time_delta_histogram_75th_percentile The column update time delta - p75.
Type: float
Unit: time,ms
col_update_time_delta_histogram_95th_percentile The column update time delta - p95.
Type: float
Unit: time,ms
col_update_time_delta_histogram_min The column update time delta - min.
Type: float
Unit: time,ms
compaction_bytes_written_count The amount of data that was compacted since (re)start.
Type: float
Unit: digital,B
compression_ratio The compression ratio for all SSTables. Contrary to what the name suggests, a low value means high compression. Formula: size of the compressed SSTable / size of the original.
Type: float
Unit: percent,percent
currently_blocked_tasks The number of currently blocked tasks for the thread pool.
Type: float
Unit: count
currently_blocked_tasks_count The number of currently blocked tasks for the thread pool.
Type: float
Unit: count
db_droppable_tombstone_ratio The estimate of the droppable tombstone ratio.
Type: float
Unit: percent,percent
dropped_one_minute_rate The tasks dropped during execution for the thread pool.
Type: float
Unit: count
exceptions_count The number of exceptions thrown from Storage metrics.
Type: float
Unit: count
key_cache_hit_rate The key cache hit rate.
Type: float
Unit: count
latency_75th_percentile The client request latency - p75.
Type: float
Unit: time,ms
latency_95th_percentile The client request latency - p95.
Type: float
Unit: time,ms
latency_one_minute_rate The number of client requests.
Type: float
Unit: throughput,reqps
live_disk_space_used_count The disk space used by live SSTables (only counts in use files).
Type: float
Unit: digital,B
live_ss_table_count Number of live (in use) SSTables.
Type: float
Unit: count
load_count The disk space used by live data on a node.
Type: float
Unit: digital,B
max_partition_size The size of the largest compacted partition.
Type: float
Unit: digital,B
max_row_size The size of the largest compacted row.
Type: float
Unit: digital,B
mean_partition_size The average size of compacted partition.
Type: float
Unit: digital,B
mean_row_size The average size of compacted rows.
Type: float
Unit: digital,B
metrics_75th_percentile Metrics - p75.
Type: float
Unit: count
metrics_95th_percentile Metrics - p95.
Type: float
Unit: count
metrics_count Metrics count.
Type: float
Unit: count
metrics_one_minute_rate The number of metrics.
Type: float
Unit: count
metrics_value Metrics value.
Type: float
Unit: count
net_down_endpoint_count The number of unhealthy nodes in the cluster. They represent each individual node's view of the cluster and thus should not be summed across reporting nodes.
Type: float
Unit: count
net_up_endpoint_count The number of healthy nodes in the cluster. They represent each individual node's view of the cluster and thus should not be summed across reporting nodes.
Type: float
Unit: count
nodetool_status_load Amount of file system data under the Cassandra data directory, excluding snapshot content.
Type: float
Unit: digital,B
nodetool_status_owns Percentage of the data owned by the node per data center times the replication factor.
Type: float
Unit: percent,percent
nodetool_status_replication_availability Percentage of data available per keyspace times replication factor.
Type: float
Unit: percent,percent
nodetool_status_replication_factor Replication factor per keyspace.
Type: float
Unit: count
nodetool_status_status Node status: up (1) or down (0).
Type: float
Unit: bool
pending_compactions The number of pending compactions.
Type: float
Unit: count
pending_flushes_count The number of pending flushes.
Type: float
Unit: count
pending_tasks The number of pending tasks for the thread pool.
Type: float
Unit: count
range_latency_75th_percentile The local range request latency - p75.
Type: float
Unit: time,ms
range_latency_95th_percentile The local range request latency - p95.
Type: float
Unit: time,ms
range_latency_one_minute_rate The number of local range requests.
Type: float
Unit: throughput,reqps
read_latency_75th_percentile The local read latency - p75.
Type: float
Unit: time,ms
read_latency_95th_percentile The local read latency - p95.
Type: float
Unit: time,ms
read_latency_99th_percentile The local read latency - p99.
Type: float
Unit: time,ms
read_latency_one_minute_rate The number of local read requests.
Type: float
Unit: throughput,reqps
row_cache_hit_count The number of row cache hits.
Type: float
Unit: count
row_cache_hit_out_of_range_count The number of row cache hits that do not satisfy the query filter and went to disk.
Type: float
Unit: count
row_cache_miss_count The number of table row cache misses.
Type: float
Unit: count
snapshots_size The disk space truly used by snapshots.
Type: float
Unit: digital,B
ss_tables_per_read_histogram_75th_percentile The number of SSTable data files accessed per read - p75.
Type: float
Unit: count
ss_tables_per_read_histogram_95th_percentile The number of SSTable data files accessed per read - p95.
Type: float
Unit: count
timeouts_count Count of requests not acknowledged within configurable timeout window.
Type: float
Unit: count
timeouts_one_minute_rate Recent timeout rate, as an exponentially weighted moving average over a one-minute interval.
Type: float
Unit: count
tombstone_scanned_histogram_75th_percentile Number of tombstones scanned per read - p75.
Type: float
Unit: count
tombstone_scanned_histogram_95th_percentile Number of tombstones scanned per read - p95.
Type: float
Unit: count
total_blocked_tasks The total number of blocked tasks.
Type: float
Unit: count
total_blocked_tasks_count The total count of blocked tasks.
Type: float
Unit: count
total_commit_log_size The size used on disk by commit logs.
Type: float
Unit: digital,B
total_disk_space_used_count The total disk space used by SSTables, including obsolete ones waiting to be garbage collected.
Type: float
Unit: digital,B
view_lock_acquire_time_75th_percentile The time taken acquiring a partition lock for materialized view updates - p75.
Type: float
Unit: time,ms
view_lock_acquire_time_95th_percentile The time taken acquiring a partition lock for materialized view updates - p95.
Type: float
Unit: time,ms
view_lock_acquire_time_one_minute_rate The number of requests to acquire a partition lock for materialized view updates.
Type: float
Unit: count
view_read_time_75th_percentile The time taken during the local read of a materialized view update - p75.
Type: float
Unit: time,ms
view_read_time_95th_percentile The time taken during the local read of a materialized view update - p95.
Type: float
Unit: time,ms
view_read_time_one_minute_rate The number of local reads for materialized view updates.
Type: float
Unit: count
waiting_on_free_memtable_space_75th_percentile The time spent waiting for free memtable space, either on- or off-heap - p75.
Type: float
Unit: time,ms
waiting_on_free_memtable_space_95th_percentile The time spent waiting for free memtable space, either on- or off-heap - p95.
Type: float
Unit: time,ms
write_latency_75th_percentile The local write latency - p75.
Type: float
Unit: time,ms
write_latency_95th_percentile The local write latency - p95.
Type: float
Unit: time,ms
write_latency_99th_percentile The local write latency - p99.
Type: float
Unit: time,ms
write_latency_one_minute_rate The number of local write requests.
Type: float
Unit: throughput,reqps
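
As a worked example for one of the metrics above, the compression_ratio formula (size of the compressed SSTable divided by the size of the original) can be sketched in Python. The byte counts below are made-up numbers chosen for easy arithmetic, and the function name is illustrative.

```python
def compression_ratio(compressed_bytes: int, original_bytes: int) -> float:
    """Ratio as described for compression_ratio: compressed size / original size."""
    if original_bytes <= 0:
        raise ValueError("original_bytes must be positive")
    return compressed_bytes / original_bytes


# Hypothetical SSTable: 100 MiB of data stored in 25 MiB on disk.
ratio = compression_ratio(25 * 1024**2, 100 * 1024**2)
space_saved = 1 - ratio
print(ratio, space_saved)  # 0.25 0.75
```

A ratio of 0.25 here means the data takes a quarter of its original size, which is why a lower value indicates stronger compression.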

cassandra_jvm

  • Tags
Tag Description
host Host name.
instance Instance name.
jmx_domain JMX domain.
metric_type Metric type.
name Type name.
runtime-id Runtime id.
service Service name.
type Object type.
  • Metrics
Metric Description
buffer_pool_direct_capacity Measure of total memory capacity of direct buffers.
Type: float
Unit: digital,B
buffer_pool_direct_count Number of direct buffers in the pool.
Type: float
Unit: count
buffer_pool_direct_used Measure of memory used by direct buffers.
Type: float
Unit: digital,B
buffer_pool_mapped_capacity Measure of total memory capacity of mapped buffers.
Type: float
Unit: digital,B
buffer_pool_mapped_count Number of mapped buffers in the pool.
Type: float
Unit: count
buffer_pool_mapped_used Measure of memory used by mapped buffers.
Type: float
Unit: digital,B
cpu_load_process Recent CPU utilization for the process.
Type: float
Unit: percent,percent
cpu_load_system Recent CPU utilization for the whole system.
Type: float
Unit: percent,percent
daemon_code_cache_used The number of daemon threads.
Type: float
Unit: count
daemon_thread_count Daemon thread count.
Type: float
Unit: count
gc_cms_count The total number of garbage collections that have occurred.
Type: float
Unit: count
gc_code_cache_used GC code cache used.
Type: float
Unit: count
gc_eden_size The Eden space size in garbage collection.
Type: float
Unit: digital,B
gc_major_collection_count The rate of major garbage collections. Set new_gc_metrics: true to receive this metric.
Type: float
Unit: count
gc_major_collection_time The fraction of time spent in major garbage collection. Set new_gc_metrics: true to receive this metric.
Type: float
Unit: PPM
gc_metaspace_size The metaspace size in garbage collection.
Type: float
Unit: digital,B
gc_minor_collection_count The rate of minor garbage collections. Set new_gc_metrics: true to receive this metric.
Type: float
Unit: count
gc_minor_collection_time The fraction of time spent in minor garbage collection. Set new_gc_metrics: true to receive this metric.
Type: float
Unit: PPM
gc_old_gen_size The old gen size in garbage collection.
Type: float
Unit: digital,B
gc_parnew_time The approximate accumulated garbage collection time elapsed.
Type: float
Unit: time,ms
gc_survivor_size The survivor size in garbage collection.
Type: float
Unit: digital,B
heap_memory The total Java heap memory used.
Type: float
Unit: digital,B
heap_memory_committed The total Java heap memory committed to be used.
Type: float
Unit: digital,B
heap_memory_init The initial Java heap memory allocated.
Type: float
Unit: digital,B
heap_memory_max The maximum Java heap memory available.
Type: float
Unit: digital,B
loaded_classes Number of classes currently loaded.
Type: float
Unit: count
non_heap_memory The total Java non-heap memory used. Non-heap memory is: Metaspace + CompressedClassSpace + CodeCache.
Type: float
Unit: digital,B
non_heap_memory_committed The total Java non-heap memory committed to be used.
Type: float
Unit: digital,B
non_heap_memory_init The initial Java non-heap memory allocated.
Type: float
Unit: digital,B
non_heap_memory_max The maximum Java non-heap memory available.
Type: float
Unit: digital,B
os_open_file_descriptors The number of file descriptors used by this process (only available for processes run as the dd-agent user).
Type: float
Unit: count
peak_thread_count The peak number of live threads.
Type: float
Unit: count
thread_count The number of live threads.
Type: float
Unit: count
total_thread_count The number of total threads.
Type: float
Unit: count
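
When reading the JVM metrics above, two small conversions are often handy. The Python sketch below assumes that the PPM unit on the gc_*_collection_time metrics means parts per million, and that heap utilisation can be derived from heap_memory and heap_memory_max. Both helper names are hypothetical, and the sample values are invented.

```python
from typing import Optional


def ppm_to_percent(ppm: float) -> float:
    """Convert parts per million to percent (1% = 10,000 ppm)."""
    return ppm / 10_000


def heap_usage_percent(heap_memory: float, heap_memory_max: float) -> Optional[float]:
    """Heap used as a percentage of the maximum, or None if the maximum is not usable."""
    if heap_memory_max <= 0:
        # A JVM can report an undefined maximum as a non-positive value.
        return None
    return heap_memory / heap_memory_max * 100


# Invented readings: 2,500 ppm of time in minor GC, 512 MiB used of a 2 GiB heap.
print(ppm_to_percent(2500))                            # 0.25
print(heap_usage_percent(512 * 1024**2, 2 * 1024**3))  # 25.0
```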

cassandra_jmx

  • Tags
Tag Description
host Host name.
instance Instance name.
jmx_domain JMX domain.
metric_type Metric type.
name Type name.
runtime-id Runtime id.
service Service name.
type Object type.
  • Metrics
Metric Description
gc_cms.count The total number of garbage collections that have occurred.
Type: float
Unit: count
gc_major_collection_count The rate of major garbage collections. Set new_gc_metrics: true to receive this metric.
Type: float
Unit: count
gc_major_collection_time The fraction of time spent in major garbage collection. Set new_gc_metrics: true to receive this metric.
Type: float
Unit: PPM
gc_minor_collection_count The rate of minor garbage collections. Set new_gc_metrics: true to receive this metric.
Type: float
Unit: count
gc_minor_collection_time The fraction of time spent in minor garbage collection. Set new_gc_metrics: true to receive this metric.
Type: float
Unit: PPM
gc_parnew.time The approximate accumulated garbage collection time elapsed.
Type: float
Unit: time,ms
heap_memory The total Java heap memory used.
Type: float
Unit: digital,B
heap_memory_committed The total Java heap memory committed to be used.
Type: float
Unit: digital,B
heap_memory_init The initial Java heap memory allocated.
Type: float
Unit: digital,B
heap_memory_max The maximum Java heap memory available.
Type: float
Unit: digital,B
non_heap_memory The total Java non-heap memory used. Non-heap memory is: Metaspace + CompressedClassSpace + CodeCache.
Type: float
Unit: digital,B
non_heap_memory_committed The total Java non-heap memory committed to be used.
Type: float
Unit: digital,B
non_heap_memory_init The initial Java non-heap memory allocated.
Type: float
Unit: digital,B
non_heap_memory_max The maximum Java non-heap memory available.
Type: float
Unit: digital,B
thread_count The number of live threads.
Type: float
Unit: count

cassandra_datadog

  • Tags
Tag Description
endpoint Endpoint.
host Host name.
lang Lang type.
lang_interpreter Lang interpreter.
lang_interpreter_vendor Lang interpreter vendor.
lang_version Lang version.
metric_type Metric type.
priority Priority.
service Service name.
stat Stat.
tracer_version Tracer version.
  • Metrics
Metric Description
tracer_agent_discovery_time Tracer agent discovery time.
Type: float
Unit: time,ms
tracer_api_errors_total Tracer api errors total.
Type: float
Unit: count
tracer_api_requests_total Tracer api requests total.
Type: float
Unit: count
tracer_flush_bytes_total Tracer flush bytes total.
Type: float
Unit: count
tracer_flush_traces_total Tracer flush traces total.
Type: float
Unit: count
tracer_queue_enqueued_bytes Tracer queue enqueued bytes.
Type: float
Unit: count
tracer_queue_enqueued_spans Tracer queue enqueued spans.
Type: float
Unit: count
tracer_queue_enqueued_traces Tracer queue enqueued traces.
Type: float
Unit: count
tracer_queue_max_length Tracer queue max length.
Type: float
Unit: count
tracer_scope_activate_count Tracer scope activate count.
Type: float
Unit: count
tracer_scope_close_count Tracer scope close count.
Type: float
Unit: count
tracer_span_pending_created Tracer span pending created.
Type: float
Unit: count
tracer_span_pending_finished Tracer span pending finished.
Type: float
Unit: count
tracer_trace_agent_discovery_time Tracer trace agent discovery time.
Type: float
Unit: count
tracer_trace_agent_send_time Tracer trace agent send time.
Type: float
Unit: count
tracer_trace_pending_created Tracer trace pending created.
Type: float
Unit: count
tracer_tracer_trace_buffer_fill_time Tracer trace buffer fill time.
Type: float
Unit: count
