Doris
The Doris collector gathers metrics from Doris. Currently only Prometheus-format data is supported.
Configuration
Tested versions:
- 2.0.0
Prerequisites
Doris exposes its Prometheus port by default.
Verify the frontend (FE): curl ip:8030/metrics
Verify the backend (BE): curl ip:8040/metrics
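The two curl checks above can also be scripted. The following is a minimal sketch, assuming the endpoints return standard Prometheus text-format output; the helper names `metric_names` and `check_endpoint` are hypothetical and are not part of DataKit or Doris.

```python
import urllib.request


def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or space (value).
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def check_endpoint(url, timeout=5):
    """Fetch a /metrics URL and return the metric names it exposes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return metric_names(resp.read().decode("utf-8", errors="replace"))
```

For example, `check_endpoint("http://127.0.0.1:8030/metrics")` would be expected to return a non-empty set when the FE port is reachable; replace the address with that of your own FE or BE node.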
Collector configuration
Go to the conf.d/db directory under the DataKit installation directory, copy doris.conf.sample, and rename it to doris.conf. An example follows:
[[inputs.prom]]
## Collector alias.
source = "doris"
## (Optional) Collect interval: (defaults to "30s").
# interval = "15s"
## Exporter URLs.
urls = ["http://127.0.0.1:8030/metrics","http://127.0.0.1:8040/metrics"]
## Stream Size.
## The source stream segmentation size.
## Default 1, source stream undivided.
stream_size = 0
## TLS configuration.
tls_open = false
# tls_ca = "/tmp/ca.crt"
# tls_cert = "/tmp/peer.crt"
# tls_key = "/tmp/peer.key"
## Set to 'true' to enable election.
election = true
## disable setting host tag for this input
disable_host_tag = false
## disable setting instance tag for this input
disable_instance_tag = false
## Measurement name.
## If measurement_name is empty, the metric name is split by '_': the first field becomes the measurement set name and the rest becomes the metric name.
## If measurement_name is not empty, it is used as the measurement set name.
## The 'measurement_prefix' prefix is always added last.
measurement_name = "doris_common"
## Customize measurement set name.
## Treat those metrics with prefix as one set.
## Takes priority over the 'measurement_name' configuration.
[[inputs.prom.measurements]]
prefix = "doris_fe_"
name = "doris_fe"
[[inputs.prom.measurements]]
prefix = "doris_be_"
name = "doris_be"
[[inputs.prom.measurements]]
prefix = "jvm_"
name = "doris_jvm"
## Customize tags.
# [inputs.prom.tags]
# some_tag = "some_value"
# more_tag = "some_other_value"
After configuring, restart DataKit.
The collector can also be enabled by injecting its configuration via ConfigMap or by setting ENV_DATAKIT_INPUTS.
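The measurement naming rules described in the configuration comments can be sketched as follows. This is a simplified model for illustration, not DataKit's actual implementation: it assumes the matched prefix is stripped from the field name (consistent with the metric names listed in this document), and the function name `classify` is hypothetical.

```python
# Prefix rules mirroring the [[inputs.prom.measurements]] entries in the sample.
PREFIX_RULES = [
    ("doris_fe_", "doris_fe"),
    ("doris_be_", "doris_be"),
    ("jvm_", "doris_jvm"),
]
DEFAULT_MEASUREMENT = "doris_common"  # measurement_name in the sample


def classify(metric_name):
    """Return (measurement set, field name) for a raw Prometheus metric name."""
    for prefix, measurement in PREFIX_RULES:
        if metric_name.startswith(prefix):
            # Prefix rules take priority; the prefix is dropped from the field.
            return measurement, metric_name[len(prefix):]
    # No prefix matched: fall back to measurement_name and keep the full name.
    return DEFAULT_MEASUREMENT, metric_name
```

Under these assumptions, `classify("doris_fe_query_total")` yields `("doris_fe", "query_total")`, while a metric without a matching prefix, such as `node_info`, lands in `doris_common`.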
Metrics
doris_fe
- Tags
Tag | Description |
---|---|
catalog | Catalog. |
host | Host name. |
instance | Instance endpoint. |
job | Job type. |
method | Method type. |
name | Metric name. |
quantile | Quantile. |
state | State. |
type | Metric type. |
- Metric list
Metric | Description |
---|---|
cache_added | Cumulative number of caches added. Type: float Unit: count |
cache_hit | Count of cache hits. Type: float Unit: count |
connection_total | Current number of FE MySQL port connections. Type: float Unit: count |
counter_hit_sql_block_rule | Number of queries blocked by SQL BLOCK RULE. Type: float Unit: count |
edit_log | Statistics of the metadata log. Type: float Unit: count |
edit_log_clean | The number of times the historical metadata log was cleared. Type: float Unit: count |
editlog_write_latency_ms | Metadata log write latency. For example, {quantile=0.75} indicates the 75th percentile write latency. Type: float Unit: time,ms |
external_schema_cache | The number of schema caches for the specified External Catalog. Type: float Unit: count |
hive_meta_cache | The number of partition caches for the specified External Hive Metastore Catalog. Type: float Unit: count |
image_clean | The number of times historical metadata image files were cleaned. Type: float Unit: count |
image_push | The number of times metadata image files were pushed to other FE nodes. Type: float Unit: count |
image_write | The number of times metadata image files were generated. Type: float Unit: count |
job | Current count of jobs by job type and job state. For example, {job=load, type=INSERT, state=LOADING} represents the number of load jobs of type INSERT in the LOADING state. Type: float Unit: count |
max_journal_id | The maximum metadata log ID of the current FE node. On the Master FE, it is the maximum ID currently written; on a non-Master FE, it is the maximum ID of the metadata log currently being replayed. Type: float Unit: count |
max_tablet_compaction_score | The largest compaction score value among all BE nodes. Type: float Unit: percent,percent |
publish_txn_num | The number of transactions being published in the specified DB. For example, {db=test} indicates the number of transactions currently being published in DB test. Type: float Unit: count |
qps | Current number of FE queries per second (only query requests are counted). Type: float Unit: throughput,reqps |
query_err | Cumulative number of failed queries. Type: float Unit: count |
query_err_rate | Error queries per second. Type: float Unit: throughput,reqps |
query_instance_begin | The number of fragment instances started by requests of the specified user. For example, {user=test_u} represents the number of instances started by requests of user test_u. Type: float Unit: count |
query_instance_num | The number of fragment instances currently being requested by the specified user. For example, {user=test_u} represents the number of instances user test_u is currently requesting. Type: float Unit: count |
query_latency_ms | Percentile statistics of query request latency. For example, {quantile=0.75} indicates the query delay at the 75th percentile. Type: float Unit: time,ms |
query_latency_ms_db | Percentile statistics of query request latency for each DB. For example, {quantile=0.75,db=test} indicates the 75th percentile query latency of DB test. Type: float Unit: time,ms |
query_olap_table | The number of requests for internal tables (OlapTable). Type: float Unit: count |
query_rpc_failed | The number of failed RPCs sent to the specified BE. For example, {be=192.168.10.1} indicates the number of RPC failures sent to the BE with IP address 192.168.10.1. Type: float Unit: count |
query_rpc_size | The size of RPC data sent to the specified BE. For example, {be=192.168.10.1} indicates the number of RPC data bytes sent to the BE with IP address 192.168.10.1. Type: float Unit: count |
query_rpc_total | The number of RPCs sent to the specified BE. For example, {be=192.168.10.1} indicates the number of RPCs sent to the BE with IP address 192.168.10.1. Type: float Unit: count |
query_total | All query requests. Type: float Unit: count |
report_queue_size | The queue length of various periodic reporting tasks of BE on the FE side. Type: float Unit: count |
request_total | All operation requests received through the MySQL port (including queries and other statements). Type: float Unit: count |
routine_load_error_rows | The total number of error rows for all Routine Load jobs in the cluster. Type: float Unit: count |
routine_load_receive_bytes | The amount of data received by all Routine Load jobs in the cluster. Type: float Unit: digital,B |
routine_load_rows | The number of data rows received by all Routine Load jobs in the cluster. Type: float Unit: count |
rps | Current number of FE requests per second (including queries and other types of statements). Type: float Unit: count |
scheduled_tablet_num | The number of tablets being scheduled by the Master FE node, including replicas being repaired and replicas being balanced. Type: float Unit: count |
tablet_max_compaction_score | The compaction score reported by each BE node. For example, {backend=172.21.0.1:9556} represents the value reported by BE 172.21.0.1:9556. Type: float Unit: percent,percent |
tablet_num | Current total number of tablets on each BE node. For example, {backend=172.21.0.1:9556} indicates the current number of tablets on BE 172.21.0.1:9556. Type: float Unit: count |
tablet_status_count | Cumulative number of tablets scheduled by the tablet scheduler on the Master FE node. Type: float Unit: count |
thread_pool | The number of working threads and the queuing status of various thread pools. active_thread_num indicates the number of tasks being executed, pool_size indicates the total number of threads in the thread pool, and task_in_queue indicates the number of tasks being queued. Type: float Unit: count |
thrift_rpc_latency_ms | Time taken by RPC requests received by each method of the FE thrift interface. For example, {method=report} indicates the time taken by RPC requests received by the report method. Type: float Unit: time,ms |
thrift_rpc_total | The number of RPC requests received by each method of the FE thrift interface. For example, {method=report} indicates the number of RPC requests received by the report method. Type: float Unit: count |
txn_counter | Cumulative number of load transactions in each status. Type: float Unit: count |
txn_exec_latency_ms | Percentile statistics of transaction execution time. For example, {quantile=0.75} indicates the 75th percentile transaction execution time. Type: float Unit: time,ms |
txn_num | The number of transactions being executed in the specified DB. For example, {db=test} indicates the number of transactions currently being executed in DB test. Type: float Unit: count |
txn_publish_latency_ms | Percentile statistics of transaction publish time. For example, {quantile=0.75} indicates the 75th percentile transaction publish time. Type: float Unit: time,ms |
txn_replica_num | The number of replicas opened by transactions being executed in the specified DB. For example, {db=test} indicates the number of replicas opened by transactions currently being executed in DB test. Type: float Unit: count |
txn_status | The number of load transactions currently in each state. For example, {type=committed} indicates the number of transactions in the committed state. Type: float Unit: count |
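The label examples in the table, such as {quantile=0.75}, correspond to labels in the Prometheus text format, which are reported as the tags listed above. The sketch below shows one way to split a single exposition line into name, labels, and value. It is a simplified parser for illustration only (it does not unescape label values), the function name `parse_sample` is hypothetical, and the sample line in the usage note is made up rather than taken from a real cluster.

```python
import re

# Metric name, optional {labels}, then the sample value.
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One key="value" pair inside the braces.
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one Prometheus text-format sample into (name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError("not a sample line: %r" % line)
    name, raw_labels, value = match.groups()
    labels = dict(LABEL_RE.findall(raw_labels or ""))
    return name, labels, float(value)
```

For instance, `parse_sample('doris_fe_query_latency_ms{quantile="0.75"} 12')` separates the `quantile` label from the metric name, which is how a single metric can carry one value per percentile.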
doris_be
- Tags
Tag | Description |
---|---|
device | Device name. |
host | Host name. |
instance | Instance endpoint. |
mode | Metric mode. |
name | Metric name. |
path | File path. |
quantile | Quantile. |
status | Metric status. |
type | Metric type. |
- Metric list
Metric | Description |
---|---|
active_scan_context_count | The number of scanners currently opened directly from the outside. Type: float Unit: count |
add_batch_task_queue_size | The queue size of the thread pool that receives batches during load. Type: float Unit: count |
agent_task_queue_size | The length of each Agent Task processing queue. For example, {type=CREATE_TABLE} indicates the length of the CREATE_TABLE task queue. Type: float Unit: count |
all_rowsets_num | The number of all current rowsets. Type: float Unit: count |
all_segments_num | The number of all current segments. Type: float Unit: count |
brpc_endpoint_stub_count | The number of created brpc stubs used for interaction between BEs. Type: float Unit: count |
brpc_function_endpoint_stub_count | The number of created brpc stubs used to interact with Remote RPC. Type: float Unit: count |
cache_capacity | Record the capacity of the specified LRU Cache. Type: float Unit: digital,B |
cache_hit_count | Record the number of hits in the specified LRU Cache. Type: float Unit: count |
cache_hit_ratio | Record the hit rate of the specified LRU Cache. Type: float Unit: percent,percent |
cache_lookup_count | Record the number of times the specified LRU Cache is searched. Type: float Unit: digital,B |
cache_usage | Record the usage of the specified LRU Cache. Type: float Unit: digital,B |
cache_usage_ratio | Record the usage ratio of the specified LRU Cache. Type: float Unit: percent,percent |
chunk_pool_local_core_alloc_count | The number of times ChunkAllocator allocated memory from the memory queue of the bound core. Type: float Unit: count |
chunk_pool_other_core_alloc_count | The number of times ChunkAllocator allocated memory from the memory queue of other cores. Type: float Unit: count |
chunk_pool_reserved_bytes | The amount of memory reserved in ChunkAllocator. Type: float Unit: digital,B |
chunk_pool_system_alloc_cost_ns | Cumulative time SystemAllocator spent allocating memory. Type: float Unit: time,ns |
chunk_pool_system_alloc_count | The number of times SystemAllocator allocated memory. Type: float Unit: count |
chunk_pool_system_free_cost_ns | Cumulative time SystemAllocator spent releasing memory. Type: float Unit: time,ns |
chunk_pool_system_free_count | The number of times SystemAllocator released memory. Type: float Unit: count |
compaction_bytes_total | Cumulative amount of data processed by compaction. Type: float Unit: digital,B |
compaction_deltas_total | Cumulative number of rowsets processed by compaction. Type: float Unit: count |
compaction_used_permits | The number of tokens used by the Compaction task. Type: float Unit: count |
compaction_waitting_permits | The number of tokens Compaction tasks are waiting for. Type: float Unit: count |
cpu | CPU-related metrics, collected from /proc/stat. Each value of each logical core is collected separately. For example, {device=cpu0,mode=nice} indicates the nice value of cpu0. Type: float Unit: count |
data_stream_receiver_count | The number of data Receivers. Type: float Unit: count |
disk_bytes_read | Cumulative number of bytes read from disk, collected from /proc/diskstats. The values of each disk are collected separately. For example, {device=vdd} indicates the value for disk vdd. Type: float Unit: digital,B |
disk_bytes_written | The cumulative value of disk writes. Type: float Unit: digital,B |
disk_io_time_ms | The disk IO time. Type: float Unit: time,ms |
disk_io_time_weighted | The weighted disk IO time. Type: float Unit: time,ms |
disk_read_time_ms | The disk read time. Type: float Unit: time,ms |
disk_reads_completed | The number of disk reads completed. Type: float Unit: digital,B |
disk_write_time_ms | The disk write time. Type: float Unit: time,ms |
disk_writes_completed | The number of disk writes completed. Type: float Unit: digital,B |
disks_avail_capacity | The remaining space on the disk where the specified data directory is located. For example, {path=path1} indicates the remaining space on the disk where the /path1 directory is located. Type: float Unit: digital,B |
disks_compaction_num | The number of compaction tasks being executed on the specified data directory. For example, {path=path1} indicates the number of tasks being executed on the /path1 directory. Type: float Unit: count |
disks_compaction_score | The number of compaction tokens being executed on the specified data directory. For example, {path=path1} indicates the number of tokens being executed on the /path1 directory. Type: float Unit: percent,percent |
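disks_local_used_capacity | The local used space on the disk where the specified data directory is located. Type: float Unit: digital,B |
disks_remote_used_capacity | The used space of the remote directory corresponding to the disk where the specified data directory is located. Type: float Unit: digital,B |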
disks_state | The disk status of the specified data directory. 1 means normal. 0 means abnormal. Type: float Unit: bool |
disks_total_capacity | Capacity of the disk where the specified data directory is located. Type: float Unit: digital,B |
engine_requests_total | Total number of engine requests on BE. Type: float Unit: count |
fd_num_limit | The system file handle limit, collected from /proc/sys/fs/file-nr. Type: float Unit: count |
fd_num_used | The number of file handles used by the system, collected from /proc/sys/fs/file-nr. Type: float Unit: count |
file_created_total | Cumulative number of local files created. Type: float Unit: count |
fragment_endpoint_count | Value of various task execution statuses on BE. Type: float Unit: count |
fragment_instance_count | The number of fragment instances currently received. Type: float Unit: count |
fragment_request_duration_us | Cumulative execution time of all fragment instances. Type: float Unit: time,μs |
fragment_requests_total | The cumulative number of executed fragment instances. Type: float Unit: count |
fragment_thread_pool_queue_size | The length of the current query execution thread pool waiting queue. Type: float Unit: count |
heavy_work_active_threads | Number of active threads in heavy thread pool. Type: float Unit: count |
heavy_work_max_threads | The maximum number of threads in the heavy thread pool. Type: float Unit: count |
heavy_work_pool_queue_size | The maximum length of the heavy thread pool queue. If it is exceeded, the submission of work is blocked. Type: float Unit: count |
light_work_active_threads | Number of active threads in light thread pool. Type: float Unit: count |
light_work_max_threads | The maximum number of threads in the light thread pool. Type: float Unit: count |
light_work_pool_queue_size | The maximum length of the light thread pool queue. If it is exceeded, the submission of work is blocked. Type: float Unit: count |
load_average | Machine load average metrics. For example, {mode=15_minutes} is the 15-minute load average. Type: float Unit: count |
load_bytes | Cumulative number of bytes sent through tablet Sink. Type: float Unit: digital,B |
load_channel_count | The number of load channels currently open. Type: float Unit: count |
load_rows | Cumulative number of rows sent through tablet Sink. Type: float Unit: count |
local_bytes_read_total | The number of bytes read by LocalFileReader. Type: float Unit: digital,B |
local_bytes_written_total | The number of bytes written by LocalFileWriter. Type: float Unit: digital,B |
local_file_open_reading | The number of LocalFileReaders currently open. Type: float Unit: count |
local_file_reader_total | Cumulative count of opened LocalFileReaders. Type: float Unit: count |
local_file_writer_total | Cumulative count of opened LocalFileWriters. Type: float Unit: count |
max_disk_io_util_percent | The IO util value of the disk with the largest IO util among all disks. Type: float Unit: percent,percent |
max_network_receive_bytes_rate | The maximum receive rate calculated among all network cards. Type: float Unit: traffic,B/S |
max_network_send_bytes_rate | The calculated maximum sending rate among all network cards. Type: float Unit: traffic,B/S |
mem_consumption | The current memory overhead of the specified module. For example, {type=compaction} represents the current total memory overhead of the compaction module. Type: float Unit: digital,B |
memory_allocated_bytes | BE process physical memory size, taken from /proc/self/status/VmRSS. Type: float Unit: digital,B |
memory_jemalloc | Jemalloc stats, taken from je_mallctl. Type: float Unit: digital,B |
memory_pgpgin | The amount of data the system paged in from disk to memory. Type: float Unit: digital,B |
memory_pgpgout | The amount of data the system paged out from memory to disk. Type: float Unit: digital,B |
memory_pool_bytes_total | The size of memory currently occupied by all MemPools. This is a statistical value and does not represent actual memory usage. Type: float Unit: digital,B |
memory_pswpin | The number of times the system swapped from disk to memory. Type: float Unit: digital,B |
memory_pswpout | The number of times the system swapped from memory to disk. Type: float Unit: digital,B |
memtable_flush_duration_us | Cumulative time taken to write memtables to disk. Type: float Unit: time,μs |
memtable_flush_total | Cumulative number of memtable writes to disk. Type: float Unit: count |
meta_request_duration | Cumulative time spent accessing meta in RocksDB. Type: float Unit: time,μs |
meta_request_total | Cumulative number of requests accessing meta in RocksDB. Type: float Unit: count |
network_receive_bytes | Cumulative number of bytes received by each network card. Collected from /proc/net/dev. Type: float Unit: digital,B |
network_receive_packets | Cumulative number of packets received by each network card. Collected from /proc/net/dev. Type: float Unit: count |
network_send_bytes | Cumulative number of bytes sent by each network card. Collected from /proc/net/dev. Type: float Unit: digital,B |
network_send_packets | Cumulative number of packets sent by each network card. Collected from /proc/net/dev. Type: float Unit: count |
proc | The number of current processes. Type: float Unit: count |
process_fd_num_limit_hard | The hard limit on the number of file handles for the BE process. Collected from /proc/pid/limits. Type: float Unit: count |
process_fd_num_limit_soft | The soft limit on the number of file handles for the BE process. Collected from /proc/pid/limits. Type: float Unit: count |
process_fd_num_used | The number of file handles used by the BE process. Collected from /proc/pid/limits. Type: float Unit: count |
process_thread_num | The number of BE process threads. Collected from /proc/pid/task. Type: float Unit: count |
query_cache_memory_total_byte | Number of bytes occupied by Query Cache. Type: float Unit: digital,B |
query_cache_partition_total_count | Current number of Partition Cache caches. Type: float Unit: count |
query_cache_sql_total_count | Current number of SQL Cache caches. Type: float Unit: count |
query_scan_bytes | Cumulative amount of data read. Only data read from Olap tables is counted. Type: float Unit: digital,B |
query_scan_bytes_per_second | The read rate calculated from doris_be_query_scan_bytes. Type: float Unit: traffic,B/S |
query_scan_rows | Cumulative number of rows read. Only rows read from Olap tables are counted. The value is RawRowsRead (some data rows may be skipped by the index and not actually read, but are still recorded in this value). Type: float Unit: count |
result_block_queue_count | The number of fragment instances in the current query result cache. Type: float Unit: count |
result_buffer_block_count | The number of queries in the current query result cache. Type: float Unit: count |
routine_load_task_count | The number of routine load tasks currently being executed. Type: float Unit: count |
rowset_count_generated_and_in_use | The number of rowset IDs newly generated and in use since the last startup. Type: float Unit: count |
s3_bytes_read_total | Cumulative number of bytes read by S3FileReader. Type: float Unit: count |
s3_file_open_reading | The number of S3FileReaders currently open. Type: float Unit: count |
scanner_thread_pool_queue_size | The current number of queued tasks in the thread pool used for OlapScanner. Type: float Unit: digital,B |
segment_read | Cumulative number of segments read. Type: float Unit: count |
send_batch_thread_pool_queue_size | The number of queued tasks in the thread pool used to send data packets during load. Type: float Unit: count |
send_batch_thread_pool_thread_num | The number of threads in the thread pool used to send data packets during load. Type: float Unit: count |
small_file_cache_count | The number of small files currently cached by BE. Type: float Unit: count |
snmp_tcp_in_errs | The number of TCP packet reception errors. Collected from /proc/net/snmp. Type: float Unit: count |
snmp_tcp_in_segs | The number of TCP packets received. Collected from /proc/net/snmp. Type: float Unit: count |
snmp_tcp_out_segs | The number of TCP packets sent. Collected from /proc/net/snmp. Type: float Unit: count |
snmp_tcp_retrans_segs | The number of TCP packet retransmissions. Collected from /proc/net/snmp. Type: float Unit: count |
stream_load | Cumulative value of the amount received by stream load. Type: float Unit: count |
stream_load_pipe_count | The current number of stream load data pipes. Type: float Unit: count |
stream_load_txn_request | Cumulative number of stream load transaction requests. Type: float Unit: count |
streaming_load_current_processing | Number of stream load tasks currently running. Type: float Unit: count |
streaming_load_duration_ms | The cumulative value of the execution time of all stream load tasks. Type: float Unit: time,ms |
streaming_load_requests_total | Cumulative number of stream load tasks. Type: float Unit: count |
tablet_base_max_compaction_score | The current largest Base Compaction Score. Type: float Unit: percent,percent |
tablet_cumulative_max_compaction_score | The current largest Cumulative Compaction Score. Type: float Unit: percent,percent |
tablet_version_num_distribution | The histogram of the number of tablet versions. Type: float Unit: count |
thrift_connections_total | Cumulative number of thrift connections created. For example, {name=heartbeat} indicates the cumulative number of connections to the heartbeat service. Type: float Unit: count |
thrift_current_connections | Current number of thrift connections. For example, {name=heartbeat} indicates the current number of connections to the heartbeat service. Type: float Unit: count |
thrift_opened_clients | The number of thrift clients currently open. For example, {name=frontend} indicates the number of clients accessing the FE service. Type: float Unit: count |
thrift_used_clients | The number of thrift clients currently in use. For example, {name=frontend} indicates the number of clients being used to access the FE service. Type: float Unit: count |
timeout_canceled_fragment_count | Cumulative value of the number of fragment instances canceled due to timeout. Type: float Unit: count |
unused_rowsets_count | The number of currently abandoned rowsets. Type: float Unit: count |
upload_fail_count | Cumulative number of rowsets that failed to upload to remote storage. Type: float Unit: count |
upload_rowset_count | Cumulative number of rowsets successfully uploaded to remote storage. Type: float Unit: count |
upload_total_byte | Cumulative amount of rowset data successfully uploaded to remote storage. Type: float Unit: digital,B |
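Many of the BE metrics above are cumulative values (for example query_scan_bytes or disk_bytes_read), and the table notes that query_scan_bytes_per_second is a rate calculated from doris_be_query_scan_bytes. The sketch below shows how such a per-second rate could be derived from two successive samples of a cumulative counter. It is an illustration under stated assumptions, not the calculation Doris or DataKit performs internally; the function name `per_second_rate` and the reset handling are assumptions.

```python
def per_second_rate(prev_value, curr_value, interval_seconds):
    """Per-second rate of a cumulative counter between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_value - prev_value
    if delta < 0:
        # The counter went backwards, typically after a process restart.
        # Assumption: treat the current value as the increase since the reset.
        delta = curr_value
    return delta / interval_seconds
```

With the default 30s collection interval from the sample configuration, a counter moving from 1000 to 4000 between two collections corresponds to `per_second_rate(1000, 4000, 30)`, that is, 100 units per second.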
doris_common
- Tags
Tag | Description |
---|---|
host | Host name. |
instance | Instance endpoint. |
name | Metric name. |
state | Metric state. |
type | Metric type. |
- Metric list
Metric | Description |
---|---|
node_info | The number of nodes. Type: float Unit: count |
system_meminfo | Memory information of the FE node machine. Collected from /proc/meminfo. Includes buffers, cached, memory_available, memory_free, and memory_total. Type: float Unit: digital,B |
system_snmp | Network statistics of the FE node machine. Collected from /proc/net/snmp. Type: float Unit: count |
doris_jvm
- Tags
Tag | Description |
---|---|
host | Host name. |
instance | Instance endpoint. |
type | Metric type. |
- Metric list
Metric | Description |
---|---|
heap_size_bytes | JVM heap memory metrics. The tags include max, used, and committed, corresponding to the maximum, used, and requested memory respectively. Type: float Unit: digital,B |
non_heap_size_bytes | JVM off-heap memory statistics. Type: float Unit: digital,B |
old_gc | Cumulative value of old generation GC. Type: float Unit: count |
old_size_bytes | JVM old generation memory statistics. Type: float Unit: digital,B |
thread | JVM thread count statistics. Type: float Unit: count |
young_size_bytes | JVM young generation memory statistics. Type: float Unit: digital,B |