Doris

The Doris collector gathers Doris metrics. It currently supports only data in Prometheus format.

Configuration

Tested versions:

  • 2.0.0

Prerequisites

Doris enables its Prometheus ports by default.

Verify the frontend (FE): curl ip:8030/metrics

Verify the backend (BE): curl ip:8040/metrics
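
The two checks above can also be scripted. The following is a minimal sketch, not part of DataKit or Doris: the helper names `metric_names` and `check_endpoint` are hypothetical, and the address 127.0.0.1 in the usage note is an assumption to be replaced with your own FE and BE hosts. It pulls the metric names out of Prometheus text-format output so you can confirm that an endpoint actually returns metrics.

```python
from urllib.request import urlopen


def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or whitespace (value).
        end = len(line)
        for i, ch in enumerate(line):
            if ch == "{" or ch.isspace():
                end = i
                break
        names.add(line[:end])
    return names


def check_endpoint(url, timeout=5.0):
    """Fetch a /metrics URL and return the metric names it exposes."""
    with urlopen(url, timeout=timeout) as resp:
        text = resp.read().decode("utf-8", errors="replace")
    return metric_names(text)
```

For example, `check_endpoint("http://127.0.0.1:8030/metrics")` would be expected to return a non-empty set when the FE is up and reachable; an exception or an empty set suggests the port is closed or the path is wrong.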

Collector configuration

Go to the conf.d/db directory under the DataKit installation directory, copy doris.conf.sample, and name the copy doris.conf. An example follows:

[[inputs.prom]]
  ## Collector alias.
  source = "doris"

  ## (Optional) Collect interval: (defaults to "30s").
  # interval = "15s"

  ## Exporter URLs.
  urls = ["http://127.0.0.1:8030/metrics","http://127.0.0.1:8040/metrics"]

  ## Stream Size.
  ## The source stream segmentation size (defaults to 1).
  ## 0 leaves the source stream undivided.
  stream_size = 0

  ## TLS configuration.
  tls_open = false
  # tls_ca = "/tmp/ca.crt"
  # tls_cert = "/tmp/peer.crt"
  # tls_key = "/tmp/peer.key"

  ## Set to 'true' to enable election.
  election = true

  ## disable setting host tag for this input
  disable_host_tag = false

  ## disable setting instance tag for this input
  disable_instance_tag = false

  ## Measurement name.
  ## If measurement_name is empty, split metric name by '_', the first field after split as measurement set name, the rest as current metric name.
  ## If measurement_name is not empty, using this as measurement set name.
  ## Always add 'measurement_prefix' prefix at last.
  measurement_name = "doris_common"

## Customize measurement set name.
## Treat those metrics with prefix as one set.
## Takes priority over the 'measurement_name' configuration.
[[inputs.prom.measurements]]
  prefix = "doris_fe_"
  name = "doris_fe"

[[inputs.prom.measurements]]
  prefix = "doris_be_"
  name = "doris_be"

[[inputs.prom.measurements]]
  prefix = "jvm_"
  name = "doris_jvm"

## Customize tags.
# [inputs.prom.tags]
  # some_tag = "some_value"
  # more_tag = "some_other_value"

After configuring, restart DataKit.
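
To make the measurement rules in the sample configuration concrete, here is a minimal sketch of how a raw Prometheus metric name would map to a measurement set and a metric name under those rules. It is an illustration based on the configuration comments and the metric tables in this document, not DataKit's actual implementation; `split_metric`, `PREFIX_RULES`, and `DEFAULT_MEASUREMENT` are hypothetical names, and no `measurement_prefix` is assumed to be set.

```python
# Prefix rules mirror the [[inputs.prom.measurements]] entries in the sample.
PREFIX_RULES = [
    ("doris_fe_", "doris_fe"),
    ("doris_be_", "doris_be"),
    ("jvm_", "doris_jvm"),
]

# Mirrors measurement_name = "doris_common" in the sample configuration.
DEFAULT_MEASUREMENT = "doris_common"


def split_metric(metric_name):
    """Return (measurement set, metric name) for a raw Prometheus metric name."""
    # Prefix rules take priority over the default measurement name.
    for prefix, measurement in PREFIX_RULES:
        if metric_name.startswith(prefix):
            return measurement, metric_name[len(prefix):]
    # No prefix matched: fall back to the configured measurement_name.
    return DEFAULT_MEASUREMENT, metric_name


for raw in ("doris_fe_qps", "doris_be_cpu", "jvm_heap_size_bytes", "node_info"):
    print(raw, "->", split_metric(raw))
```

Under these assumptions, a metric exposed by the FE as `doris_fe_qps` appears as `qps` in the `doris_fe` set, while a name that matches no prefix, such as `node_info`, lands in `doris_common`.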

Metrics

doris_fe

  • Tags
Tag Description
catalog Catalog.
host Host name.
instance Instance endpoint.
job Job type.
method Method type.
name Metric name.
quantile Quantile.
state State.
type Metric type.
  • Metrics
Metric Description Type Unit
cache_added Cumulative number of cache entries added. float count
cache_hit Count of cache hits. float count
connection_total Current number of FE MySQL port connections. float count
counter_hit_sql_block_rule Number of queries blocked by SQL BLOCK RULE. float count
edit_log Metadata log statistics. float count
edit_log_clean The number of times the historical metadata log was cleared. float count
editlog_write_latency_ms Metadata log write latency. For example, {quantile=0.75} indicates the 75th percentile write latency. float ms
external_schema_cache Number of schema caches for the specified External Catalog. float count
hive_meta_cache Number of partition caches for the specified External Hive Metastore Catalog. float count
image_clean Number of times historical metadata image files were cleaned. float count
image_push Number of times metadata image files were successfully pushed to other FE nodes. float count
image_write Number of times metadata image files were generated. float count
job Current count of different job types and different job statuses. For example, {job=load, type=INSERT, state=LOADING} represents the number of import jobs of type INSERT in the LOADING state. float count
max_journal_id The maximum metadata log ID of the current FE node. On the Master FE, it is the maximum ID currently written; on a non-Master FE, it is the maximum ID of the metadata log currently being replayed. float count
max_tablet_compaction_score The largest compaction score value among all BE nodes. float percent
publish_txn_num Number of transactions being published by the specified DB. For example, {db=test} indicates the number of transactions currently being published by DB test. float count
qps Current number of FE queries per second (only query requests are counted). float req/s
query_err Cumulative number of failed queries. float count
query_err_rate Error queries per second. float req/s
query_instance_begin Number of fragment instances started by the specified user's requests. For example, {user=test_u} represents the number of instances started by requests from user test_u. float count
query_instance_num Number of fragment instances the specified user is currently requesting. For example, {user=test_u} represents the number of instances user test_u is currently requesting. float count
query_latency_ms Percentile statistics of query request latency. For example, {quantile=0.75} indicates the query latency at the 75th percentile. float ms
query_latency_ms_db Percentile statistics of query request latency for each DB. For example, {quantile=0.75,db=test} indicates the 75th percentile query latency of DB test. float ms
query_olap_table Number of requests for internal tables (OlapTable). float count
query_rpc_failed Number of failed RPCs sent to the specified BE. For example, {be=192.168.10.1} indicates the number of RPC failures sent to the BE with IP address 192.168.10.1. float count
query_rpc_size Size of RPC data sent to the specified BE. For example, {be=192.168.10.1} indicates the number of RPC data bytes sent to the BE with IP address 192.168.10.1. float count
query_rpc_total Number of RPCs sent to the specified BE. For example, {be=192.168.10.1} indicates the number of RPCs sent to the BE with IP address 192.168.10.1. float count
query_total All query requests. float count
report_queue_size The queue length of various periodic reporting tasks of BE on the FE side. float count
request_total All operation requests received through the MySQL port (including queries and other statements). float count
routine_load_error_rows Total number of error rows for all Routine Load jobs in the cluster. float count
routine_load_receive_bytes The amount of data received by all Routine Load jobs in the cluster. float B
routine_load_rows Number of data rows received by all Routine Load jobs in the cluster. float count
rps Current number of FE requests per second (including queries and other types of statements). float count
scheduled_tablet_num Number of tablets being scheduled by the Master FE node, including replicas being repaired and replicas being balanced. float count
tablet_max_compaction_score The compaction score reported by each BE node. For example, {backend=172.21.0.1:9556} represents the value reported by BE 172.21.0.1:9556. float percent
tablet_num Current total number of tablets on each BE node. For example, {backend=172.21.0.1:9556} indicates the current number of tablets on BE 172.21.0.1:9556. float count
tablet_status_count Cumulative number of tablets scheduled by the tablet scheduler on the Master FE node. float count
thread_pool Number of working threads and queuing status of various thread pools. active_thread_num indicates the number of tasks being executed, pool_size indicates the total number of threads in the thread pool, and task_in_queue indicates the number of tasks being queued. float count
thrift_rpc_latency_ms Time taken by RPC requests received by each method of the FE thrift interface. For example, {method=report} indicates the time taken by RPC requests received by the report method. float ms
thrift_rpc_total Number of RPC requests received by each method of the FE thrift interface. For example, {method=report} indicates the number of RPC requests received by the report method. float count
txn_counter Cumulative number of import transactions in each status. float count
txn_exec_latency_ms Percentile statistics of transaction execution time. For example, {quantile=0.75} indicates the 75th percentile transaction execution time. float ms
txn_num Number of transactions being executed by the specified DB. For example, {db=test} indicates the number of transactions currently being executed by DB test. float count
txn_publish_latency_ms Percentile statistics of transaction publish time. For example, {quantile=0.75} indicates the 75th percentile transaction publish time. float ms
txn_replica_num Number of replicas opened by the transactions being executed by the specified DB. For example, {db=test} indicates the number of replicas opened by the transactions currently being executed by DB test. float count
txn_status Number of import transactions currently in each state. For example, {type=committed} indicates the number of transactions in the committed state. float count

doris_be

  • Tags
Tag Description
device Device name.
host Host name.
instance Instance endpoint.
mode Metric mode.
name Metric name.
path File path.
quantile Quantile.
status Metric status.
type Metric type.
  • Metrics
Metric Description Type Unit
active_scan_context_count The number of scanners currently opened directly from the outside. float count
add_batch_task_queue_size Queue size of the thread pool that receives batches during import. float count
agent_task_queue_size Length of each Agent Task processing queue. For example, {type=CREATE_TABLE} indicates the length of the CREATE_TABLE task queue. float count
all_rowsets_num Current number of all rowsets. float count
all_segments_num The number of all current segments. float count
brpc_endpoint_stub_count Number of brpc stubs created for interaction between BEs. float count
brpc_function_endpoint_stub_count Number of brpc stubs created for interaction with Remote RPC. float count
cache_capacity Capacity of the specified LRU Cache. float B
cache_hit_count Number of hits in the specified LRU Cache. float count
cache_hit_ratio Hit rate of the specified LRU Cache. float percent
cache_lookup_count Number of times the specified LRU Cache is searched. float B
cache_usage Usage of the specified LRU Cache. float B
cache_usage_ratio Usage ratio of the specified LRU Cache. float percent
chunk_pool_local_core_alloc_count Number of times ChunkAllocator allocated memory from the memory queue of the bound core. float count
chunk_pool_other_core_alloc_count Number of times ChunkAllocator allocated memory from the memory queue of other cores. float count
chunk_pool_reserved_bytes Amount of memory reserved in ChunkAllocator. float B
chunk_pool_system_alloc_cost_ns Cumulative time SystemAllocator spent allocating memory. float ns
chunk_pool_system_alloc_count Number of times SystemAllocator allocated memory. float count
chunk_pool_system_free_cost_ns Cumulative time SystemAllocator spent releasing memory. float ns
chunk_pool_system_free_count Number of times SystemAllocator released memory. float count
compaction_bytes_total Cumulative amount of data processed by compaction. float B
compaction_deltas_total Cumulative number of rowsets processed by compaction. float count
compaction_used_permits The number of tokens used by the Compaction task. float count
compaction_waitting_permits Number of Compaction tokens being waited for. float count
cpu CPU-related metrics, collected from /proc/stat. Each value of each logical core is collected separately. For example, {device=cpu0,mode=nice} indicates the nice value of cpu0. float count
data_stream_receiver_count Number of data Receivers. float count
disk_bytes_read Cumulative value of disk reads, collected from /proc/diskstats. The values of each disk are collected separately. For example, {device=vdd} indicates the value for disk vdd. float B
disk_bytes_written The cumulative value of disk writes. float B
disk_io_time_ms Disk I/O time. float ms
disk_io_time_weighted Weighted disk I/O time. float ms
disk_read_time_ms Disk read time. float ms
disk_reads_completed Number of disk reads completed. float B
disk_write_time_ms Disk write time. float ms
disk_writes_completed Number of disk writes completed. float B
disks_avail_capacity Remaining space on the disk where the specified data directory is located. For example, {path=path1} indicates the remaining space on the disk where the /path1 directory is located. float B
disks_compaction_num Number of Compaction tasks being executed on the specified data directory. For example, {path=path1} indicates the number of tasks being executed on the /path1 directory. float count
disks_compaction_score Number of compaction tokens being executed on the specified data directory. For example, {path=path1} indicates the number of tokens being executed on the /path1 directory. float percent
disks_local_used_capacity Local used space on the disk where the specified data directory is located. float B
disks_remote_used_capacity Used space of the remote directory corresponding to the disk where the specified data directory is located. float B
disks_state Disk status of the specified data directory. 1 means normal. 0 means abnormal. float bool
disks_total_capacity Capacity of the disk where the specified data directory is located. float B
engine_requests_total Total engine requests on BE. float count
fd_num_limit Upper limit of system file handles, collected from /proc/sys/fs/file-nr. float count
fd_num_used Number of file handles used by the system, collected from /proc/sys/fs/file-nr. float count
file_created_total Cumulative number of local files created. float count
fragment_endpoint_count Value of various task execution statuses on BE. float count
fragment_instance_count The number of fragment instances currently received. float count
fragment_request_duration_us Cumulative execution time of all fragment instances. float μs
fragment_requests_total The cumulative number of executed fragment instances. float count
fragment_thread_pool_queue_size Current length of the query execution thread pool waiting queue. float count
heavy_work_active_threads Number of active threads in heavy thread pool. float count
heavy_work_max_threads Number of heavy thread pool threads. float count
heavy_work_pool_queue_size Maximum length of the heavy thread pool queue. If it is exceeded, the submission of work is blocked. float count
light_work_active_threads Number of active threads in light thread pool. float count
light_work_max_threads Number of light thread pool threads. float count
light_work_pool_queue_size Maximum length of the light thread pool queue. If it is exceeded, the submission of work is blocked. float count
load_average Machine load average metrics. For example, {mode=15_minutes} is the 15-minute load average. float count
load_bytes Cumulative number of bytes sent through tablet Sink. float B
load_channel_count The number of load channels currently open. float count
load_rows Cumulative number of rows sent through tablet Sink. float count
local_bytes_read_total Number of bytes read by LocalFileReader. float B
local_bytes_written_total Number of bytes written by LocalFileWriter. float B
local_file_open_reading Number of LocalFileReader instances currently open. float count
local_file_reader_total Cumulative number of LocalFileReader instances opened. float count
local_file_writer_total Cumulative number of LocalFileWriter instances opened. float count
max_disk_io_util_percent Largest I/O utilization among all disks. float percent
max_network_receive_bytes_rate Maximum receive rate calculated among all network cards. float B/S
max_network_send_bytes_rate Maximum send rate calculated among all network cards. float B/S
mem_consumption Current memory overhead of the specified module. For example, {type=compaction} represents the current total memory overhead of the compaction module. float B
memory_allocated_bytes BE process physical memory size, taken from /proc/self/status/VmRSS. float B
memory_jemalloc Jemalloc stats, taken from je_mallctl. float B
memory_pgpgin Amount of data the system paged in from disk to memory. float B
memory_pgpgout Amount of data the system paged out from memory to disk. float B
memory_pool_bytes_total Size of memory currently occupied by all MemPools. This is a statistical value and does not represent actual memory usage. float B
memory_pswpin The number of times the system swapped from disk to memory. float B
memory_pswpout The number of times the system swapped from memory to disk. float B
memtable_flush_duration_us Cumulative time taken to write memtables to disk. float μs
memtable_flush_total Cumulative number of memtable writes to disk. float count
meta_request_duration Cumulative time spent accessing meta in RocksDB. float μs
meta_request_total Cumulative number of requests accessing meta in RocksDB. float count
network_receive_bytes Cumulative number of bytes received by each network card. Collected from /proc/net/dev. float B
network_receive_packets Cumulative number of packets received by each network card. Collected from /proc/net/dev. float count
network_send_bytes Cumulative number of bytes sent by each network card. Collected from /proc/net/dev. float B
network_send_packets Cumulative number of packets sent by each network card. Collected from /proc/net/dev. float count
proc The number of current processes. float count
process_fd_num_limit_hard Hard limit on the number of file handles of the BE process. Collected from /proc/pid/limits. float count
process_fd_num_limit_soft Soft limit on the number of file handles of the BE process. Collected from /proc/pid/limits. float count
process_fd_num_used The number of file handles used by the BE process. Collected from /proc/pid/limits. float count
process_thread_num Number of BE process threads. Collected from /proc/pid/task. float count
query_cache_memory_total_byte Number of bytes occupied by Query Cache. float B
query_cache_partition_total_count Current number of Partition Cache caches. float count
query_cache_sql_total_count Current number of SQL Cache caches. float count
query_scan_bytes Cumulative amount of data read. Only data read from Olap tables is counted. float B
query_scan_bytes_per_second Read rate calculated from doris_be_query_scan_bytes. float B/S
query_scan_rows Cumulative number of rows read. Only rows read from Olap tables are counted, and the value is RawRowsRead (some data rows may be skipped by the index and not actually read, but are still recorded in this value). float count
result_block_queue_count The number of fragment instances in the current query result cache. float count
result_buffer_block_count The number of queries in the current query result cache. float count
routine_load_task_count The number of routine load tasks currently being executed. float count
rowset_count_generated_and_in_use Number of rowset IDs newly generated and in use since the last startup. float count
s3_bytes_read_total Cumulative S3FileReader count. float count
s3_file_open_reading Number of S3FileReader instances currently open. float count
scanner_thread_pool_queue_size Current queue length of the thread pool used for OlapScanner. float B
segment_read Cumulative number of segments read. float count
send_batch_thread_pool_queue_size Queue length of the thread pool used to send data packets during import. float count
send_batch_thread_pool_thread_num Number of threads in the thread pool used to send data packets during import. float count
small_file_cache_count Number of small files currently cached by BE. float count
snmp_tcp_in_errs Number of TCP packet reception errors. Collected from /proc/net/snmp. float count
snmp_tcp_in_segs Number of TCP packets received. Collected from /proc/net/snmp. float count
snmp_tcp_out_segs Number of TCP packets sent. Collected from /proc/net/snmp. float count
snmp_tcp_retrans_segs Number of TCP packet retransmissions. Collected from /proc/net/snmp. float count
stream_load Cumulative value of the amount received by stream load. float count
stream_load_pipe_count Current number of stream load data pipelines. float count
stream_load_txn_request Cumulative number of stream load transactions. float count
streaming_load_current_processing Number of stream load tasks currently running. float count
streaming_load_duration_ms The cumulative value of the execution time of all stream load tasks. float ms
streaming_load_requests_total Cumulative number of stream load tasks. float count
tablet_base_max_compaction_score The current largest Base Compaction Score. float percent
tablet_cumulative_max_compaction_score The current largest Cumulative Compaction Score. float percent
tablet_version_num_distribution The histogram of the number of tablet versions. float count
thrift_connections_total Cumulative number of Thrift connections created. For example, {name=heartbeat} indicates the cumulative number of connections to the heartbeat service. float count
thrift_current_connections Current number of Thrift connections. For example, {name=heartbeat} indicates the current number of connections to the heartbeat service. float count
thrift_opened_clients Number of Thrift clients currently open. For example, {name=frontend} indicates the number of clients accessing the FE service. float count
thrift_used_clients Number of Thrift clients currently in use. For example, {name=frontend} indicates the number of clients being used to access the FE service. float count
timeout_canceled_fragment_count Cumulative value of the number of fragment instances canceled due to timeout. float count
unused_rowsets_count The number of currently abandoned rowsets. float count
upload_fail_count Cumulative number of rowsets that failed to be uploaded to remote storage. float count
upload_rowset_count Cumulative number of rowsets successfully uploaded to remote storage. float count
upload_total_byte Cumulative amount of rowset data successfully uploaded to remote storage. float B

doris_common

  • Tags
Tag Description
host Host name.
instance Instance endpoint.
name Metric name.
state Metric state.
type Metric type.
  • Metrics
Metric Description Type Unit
node_info Number of nodes. float count
system_meminfo Memory metrics of the FE node machine. Collected from /proc/meminfo. Includes buffers, cached, memory_available, memory_free, and memory_total. float B
system_snmp Network metrics of the FE node machine. Collected from /proc/net/snmp. float count

doris_jvm

  • Tags
Tag Description
host Host name.
instance Instance endpoint.
type Metric type.
  • Metrics
Metric Description Type Unit
heap_size_bytes JVM heap memory metrics. The tags include max, used, and committed, corresponding to the maximum, used, and requested memory respectively. float B
non_heap_size_bytes JVM off-heap memory statistics. float B
old_gc Cumulative value of old generation GC. float count
old_size_bytes JVM old generation memory statistics. float B
thread JVM thread count statistics. float count
young_size_bytes JVM young generation memory statistics. float B
