RabbitMQ
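The RabbitMQ collector monitors RabbitMQ using data collected through the rabbitmq-management plugin. It can:
- Collect the RabbitMQ overview, such as the number of connections, the number of queues, and the total number of messages
- Track RabbitMQ queue information, such as queue size and consumer count
- Track RabbitMQ node information, such as socket and mem usage
- Track RabbitMQ exchange information, such as message_publish_count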
Configuration¶
Prerequisites¶
- RabbitMQ version >= 3.6.0. Tested versions:
  - 3.11.x
  - 3.10.x
  - 3.9.x
  - 3.8.x
  - 3.7.x
  - 3.6.x
- Install rabbitmq (Ubuntu is used as the example)
- Enable the REST API plug-ins
- Create a user
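The prerequisites above can be checked from a script before the collector is enabled. The following is a minimal sketch, not part of DataKit. It assumes the management plugin serves its HTTP API at the address later used as `url` in the collector configuration, and that `/api/overview` reports the broker version in a `rabbitmq_version` field. All function names are illustrative.

```python
import base64
import json
import urllib.request

MIN_VERSION = (3, 6, 0)  # minimum version stated in the prerequisites


def parse_version(text):
    """Turn '3.11.5' (or '3.8.9-rc.1') into a tuple of three integers."""
    core = text.split("-", 1)[0].split("+", 1)[0]
    parts = [int(p) for p in core.split(".")[:3]]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_supported(version_text):
    """True when the version is at least MIN_VERSION."""
    return parse_version(version_text) >= MIN_VERSION


def build_overview_request(url, username, password):
    """Build a basic-auth request for the management API overview endpoint."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    req = urllib.request.Request(url.rstrip("/") + "/api/overview")
    req.add_header("Authorization", "Basic " + token)
    return req


def check_broker(url, username, password):
    """Query the broker (network call) and report whether its version is supported."""
    req = build_overview_request(url, username, password)
    with urllib.request.urlopen(req, timeout=5) as resp:
        overview = json.load(resp)
    return is_supported(overview["rabbitmq_version"])
```

Calling `check_broker("http://localhost:15672", "guest", "guest")` against a running broker would confirm in one step that the plugin is enabled, the user can log in, and the version is new enough.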
Collector configuration¶
Go to the conf.d/samples directory under the DataKit installation directory, copy rabbitmq.conf.sample, and name the copy rabbitmq.conf. An example follows:
[[inputs.rabbitmq]]
## RabbitMQ URL, required
url = "http://localhost:15672"
## RabbitMQ user, required
username = "guest"
## RabbitMQ password, required
password = "guest"
## (Optional) collection interval, default is 30s
# interval = "30s"
## Optional TLS Config
# tls_ca = "/xxx/ca.pem"
# tls_cert = "/xxx/cert.cer"
# tls_key = "/xxx/key.key"
## Use TLS but skip chain & host verification
insecure_skip_verify = false
## Set true to enable election
election = true
# [inputs.rabbitmq.log]
# files = []
# #grok pipeline script path
# pipeline = "rabbitmq.p"
[inputs.rabbitmq.tags]
# some_tag = "some_value"
# more_tag = "some_other_value"
# ...
Once configured, restart DataKit.
In Kubernetes, the collector can currently be enabled by injecting its configuration through a ConfigMap.
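To make the effect of the TLS options in the sample configuration concrete, the sketch below shows how equivalent settings would be applied to a Python HTTPS client. This is an analogy only: DataKit's own implementation is not shown in this document, and the function name `build_ssl_context` is hypothetical. The parameter names simply mirror `tls_ca`, `tls_cert`, `tls_key`, and `insecure_skip_verify`.

```python
import ssl


def build_ssl_context(tls_ca=None, tls_cert=None, tls_key=None,
                      insecure_skip_verify=False):
    """Create an SSL context from settings named after the sample config keys."""
    # With tls_ca unset, the system's default CA bundle is used.
    ctx = ssl.create_default_context(cafile=tls_ca)
    if tls_cert:
        # Present a client certificate (and key) to the server.
        ctx.load_cert_chain(certfile=tls_cert, keyfile=tls_key)
    if insecure_skip_verify:
        # Skip chain and host verification. check_hostname has to be
        # disabled before verify_mode can be set to CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

Leaving `insecure_skip_verify` at `false`, as in the sample, keeps both certificate chain and hostname verification on. Skipping verification is normally reserved for test environments with self-signed certificates.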
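Metrics¶
All of the data collected below has the global election tags appended by default. Additional tags can be specified in the configuration through [inputs.rabbitmq.tags]: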
rabbitmq_overview¶
| Tags & Fields | Description |
|---|---|
| cluster_name (tag) | RabbitMQ cluster name |
| host (tag) | Hostname of the machine RabbitMQ is running on |
| rabbitmq_version (tag) | RabbitMQ version |
| url (tag) | RabbitMQ URL |
| message_ack_count | Number of messages delivered to clients and acknowledged. Type: int (count). Unit: count |
| message_ack_rate | Rate of messages delivered to clients and acknowledged per second. Type: float (gauge). Unit: percent,percent |
| message_confirm_count | Count of messages confirmed. Type: int (count). Unit: count |
| message_confirm_rate | Rate of messages confirmed per second. Type: float (gauge). Unit: percent,percent |
| message_deliver_get_count | Sum of messages delivered in acknowledgement mode to consumers, in no-acknowledgement mode to consumers, in acknowledgement mode in response to basic.get, and in no-acknowledgement mode in response to basic.get. Type: int (count). Unit: count |
| message_deliver_get_rate | Rate per second of the sum of messages delivered in acknowledgement mode to consumers, in no-acknowledgement mode to consumers, in acknowledgement mode in response to basic.get, and in no-acknowledgement mode in response to basic.get. Type: float (gauge). Unit: percent,percent |
| message_publish_count | Count of messages published. Type: int (count). Unit: count |
| message_publish_in_count | Count of messages published from channels into this overview. Type: int (count). Unit: count |
| message_publish_in_rate | Rate of messages published from channels into this overview per second. Type: float (gauge). Unit: percent,percent |
| message_publish_out_count | Count of messages published from this overview into queues. Type: int (count). Unit: count |
| message_publish_out_rate | Rate of messages published from this overview into queues per second. Type: float (gauge). Unit: percent,percent |
| message_publish_rate | Rate of messages published per second. Type: float (gauge). Unit: percent,percent |
| message_redeliver_count | Count of the subset of messages in deliver_get which had the redelivered flag set. Type: int (count). Unit: count |
| message_redeliver_rate | Rate per second of the subset of messages in deliver_get which had the redelivered flag set. Type: float (gauge). Unit: percent,percent |
| message_return_unroutable_count | Count of messages returned to the publisher as unroutable. Type: int (count). Unit: count |
| message_return_unroutable_count_rate | Rate of messages returned to the publisher as unroutable per second. Type: float (gauge). Unit: percent,percent |
| object_totals_channels | Total number of channels. Type: int (count). Unit: count |
| object_totals_connections | Total number of connections. Type: int (count). Unit: count |
| object_totals_consumers | Total number of consumers. Type: int (count). Unit: count |
| object_totals_queues | Total number of queues. Type: int (count). Unit: count |
| queue_totals_messages_count | Total number of messages (ready plus unacknowledged). Type: int (count). Unit: count |
| queue_totals_messages_rate | Total rate of messages (ready plus unacknowledged). Type: float (gauge). Unit: percent,percent |
| queue_totals_messages_ready_count | Number of messages ready for delivery. Type: int (count). Unit: count |
| queue_totals_messages_ready_rate | Rate of the number of messages ready for delivery. Type: float (gauge). Unit: percent,percent |
| queue_totals_messages_unacknowledged_count | Number of unacknowledged messages. Type: int (count). Unit: count |
| queue_totals_messages_unacknowledged_rate | Rate of the number of unacknowledged messages. Type: float (gauge). Unit: percent,percent |
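The field names in the rabbitmq_overview table follow the shape of the management plugin's `/api/overview` response, where counters sit next to a `*_details` object carrying a `rate`. The sketch below shows one plausible way such a response could be flattened into a subset of these fields. The mapping is an assumption made for illustration, not DataKit's actual code, and `flatten_overview` is a hypothetical helper.

```python
# Keys of the /api/overview sections that this sketch maps to fields.
OBJECT_TOTALS = ("channels", "connections", "consumers", "queues")
QUEUE_TOTALS = ("messages", "messages_ready", "messages_unacknowledged")
MESSAGE_STATS = ("ack", "confirm", "deliver_get", "publish", "redeliver")


def _count_and_rate(section, key, prefix, fields):
    """Copy section[key] and section[key + '_details']['rate'] into fields."""
    if key not in section:
        return
    fields[f"{prefix}_{key}_count"] = section[key]
    rate = section.get(f"{key}_details", {}).get("rate")
    if rate is not None:
        fields[f"{prefix}_{key}_rate"] = float(rate)


def flatten_overview(overview):
    """Return (tags, fields) for a decoded /api/overview JSON document."""
    tags = {
        "cluster_name": overview.get("cluster_name", ""),
        "rabbitmq_version": overview.get("rabbitmq_version", ""),
    }
    fields = {}
    totals = overview.get("object_totals", {})
    for key in OBJECT_TOTALS:
        if key in totals:
            fields[f"object_totals_{key}"] = totals[key]
    for key in QUEUE_TOTALS:
        _count_and_rate(overview.get("queue_totals", {}), key, "queue_totals", fields)
    for key in MESSAGE_STATS:
        _count_and_rate(overview.get("message_stats", {}), key, "message", fields)
    return tags, fields
```

Fields whose source key is absent from the response are simply omitted, which matches how a broker with no traffic yet reports no `message_stats` entries.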
rabbitmq_queue¶
| Tags & Fields | Description |
|---|---|
| cluster_name (tag) | RabbitMQ cluster name |
| host (tag) | Hostname of the machine RabbitMQ is running on |
| node_name (tag) | RabbitMQ node name |
| queue_name (tag) | RabbitMQ queue name |
| url (tag) | RabbitMQ host URL |
| vhost (tag) | Virtual host of the RabbitMQ queue |
| bindings_count | Number of bindings for a specific queue. Type: int (count). Unit: count |
| consumer_utilization | The ratio of time that a queue's consumers can take new messages. Type: float (gauge). Unit: percent,percent |
| consumers | Number of consumers. Type: int (count). Unit: count |
| head_message_timestamp | Timestamp of the head message of the queue, in milliseconds. Type: int (gauge). Unit: timeStamp,msec |
| memory | Bytes of memory consumed by the Erlang process associated with the queue, including stack, heap and internal structures. Type: int (gauge). Unit: digital,B |
| message_ack_count | Number of messages in queues delivered to clients and acknowledged. Type: int (count). Unit: count |
| message_ack_rate | Number per second of messages delivered to clients and acknowledged. Type: float (gauge). Unit: percent,percent |
| message_deliver_count | Count of messages delivered in acknowledgement mode to consumers. Type: int (count). Unit: count |
| message_deliver_get_count | Sum of messages in queues delivered in acknowledgement mode to consumers, in no-acknowledgement mode to consumers, in acknowledgement mode in response to basic.get, and in no-acknowledgement mode in response to basic.get. Type: int (count). Unit: count |
| message_deliver_get_rate | Rate per second of the sum of messages in queues delivered in acknowledgement mode to consumers, in no-acknowledgement mode to consumers, in acknowledgement mode in response to basic.get, and in no-acknowledgement mode in response to basic.get. Type: float (gauge). Unit: percent,percent |
| message_deliver_rate | Rate of messages delivered in acknowledgement mode to consumers. Type: float (gauge). Unit: percent,percent |
| message_publish_count | Count of messages in queues published. Type: int (count). Unit: count |
| message_publish_rate | Rate per second of messages published. Type: float (gauge). Unit: percent,percent |
| message_redeliver_count | Count of the subset of messages in queues in deliver_get which had the redelivered flag set. Type: int (count). Unit: count |
| message_redeliver_rate | Rate per second of the subset of messages in deliver_get which had the redelivered flag set. Type: float (gauge). Unit: percent,percent |
| messages | Count of the total messages in the queue. Type: int (count). Unit: count |
| messages_rate | Count per second of the total messages in the queue. Type: float (gauge). Unit: percent,percent |
| messages_ready | Number of messages ready to be delivered to clients. Type: int (count). Unit: count |
| messages_ready_rate | Number per second of messages ready to be delivered to clients. Type: float (gauge). Unit: percent,percent |
| messages_unacknowledged | Number of messages delivered to clients but not yet acknowledged. Type: int (count). Unit: count |
| messages_unacknowledged_rate | Number per second of messages delivered to clients but not yet acknowledged. Type: float (gauge). Unit: percent,percent |
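To illustrate how the tags and fields of a rabbitmq_queue point relate to each other, the sketch below renders one point in InfluxDB-style line protocol. That this mirrors the exact wire format DataKit emits is an assumption. The helper `to_line_protocol` and the sample values are hypothetical.

```python
def _escape(text):
    """Escape the characters that are special in tag/field keys and tag values."""
    return str(text).replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def _render_field(value):
    """Render a field value: booleans as true/false, ints with an 'i' suffix."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    return repr(float(value))


def to_line_protocol(measurement, tags, fields, time_ns):
    """Build 'measurement,tag=v field=v timestamp' with keys sorted by name."""
    tag_part = "".join(
        f",{_escape(k)}={_escape(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape(k)}={_render_field(v)}" for k, v in sorted(fields.items())
    )
    return f"{measurement}{tag_part} {field_part} {time_ns}"


line = to_line_protocol(
    "rabbitmq_queue",
    {"queue_name": "orders", "vhost": "/"},
    {"messages": 12, "consumers": 2, "message_publish_rate": 0.5},
    1622010006000000000,
)
print(line)
```

Tags such as `queue_name` and `vhost` identify the series, while fields such as `messages` carry the measured values. This split is why the tables mark the two kinds of entries differently.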
rabbitmq_exchange¶
| Tags & Fields | Description |
|---|---|
| auto_delete (tag) | If set, the exchange is deleted when all queues have finished using it |
| cluster_name (tag) | RabbitMQ cluster name |
| durable (tag) | If set when creating a new exchange, the exchange will be marked as durable. Durable exchanges remain active when a server restarts. Non-durable exchanges (transient exchanges) are purged if/when a server restarts. |
| exchange_name (tag) | RabbitMQ exchange name |
| host (tag) | Hostname of the machine RabbitMQ is running on |
| internal (tag) | If set, the exchange may not be used directly by publishers, but only when bound to other exchanges. Internal exchanges are used to construct wiring that is not visible to applications |
| type (tag) | RabbitMQ exchange type |
| url (tag) | RabbitMQ host URL |
| vhost (tag) | Virtual host of the RabbitMQ exchange |
| message_ack_count | Number of messages in exchanges delivered to clients and acknowledged. Type: int (count). Unit: count |
| message_ack_rate | Rate of messages in exchanges delivered to clients and acknowledged per second. Type: float (gauge). Unit: percent,percent |
| message_confirm_count | Count of messages in exchanges confirmed. Type: int (count). Unit: count |
| message_confirm_rate | Rate of messages in exchanges confirmed per second. Type: float (gauge). Unit: percent,percent |
| message_deliver_get_count | Sum of messages in exchanges delivered in acknowledgement mode to consumers, in no-acknowledgement mode to consumers, in acknowledgement mode in response to basic.get, and in no-acknowledgement mode in response to basic.get. Type: int (count). Unit: count |
| message_deliver_get_rate | Rate per second of the sum of exchange messages delivered in acknowledgement mode to consumers, in no-acknowledgement mode to consumers, in acknowledgement mode in response to basic.get, and in no-acknowledgement mode in response to basic.get. Type: float (gauge). Unit: percent,percent |
| message_publish_count | Count of messages in exchanges published. Type: int (count). Unit: count |
| message_publish_in_count | Count of messages published from channels into this exchange. Type: int (count). Unit: count |
| message_publish_in_rate | Rate of messages published from channels into this exchange per second. Type: float (gauge). Unit: percent,percent |
| message_publish_out_count | Count of messages published from this exchange into queues. Type: int (count). Unit: count |
| message_publish_out_rate | Rate of messages published from this exchange into queues per second. Type: float (gauge). Unit: percent,percent |
| message_publish_rate | Rate of messages in exchanges published per second. Type: float (gauge). Unit: percent,percent |
| message_redeliver_count | Count of the subset of messages in exchanges in deliver_get which had the redelivered flag set. Type: int (count). Unit: count |
| message_redeliver_rate | Rate per second of the subset of messages in exchanges in deliver_get which had the redelivered flag set. Type: float (gauge). Unit: percent,percent |
| message_return_unroutable_count | Count of messages in exchanges returned to the publisher as unroutable. Type: int (count). Unit: count |
| message_return_unroutable_count_rate | Rate of messages in exchanges returned to the publisher as unroutable per second. Type: float (gauge). Unit: percent,percent |
rabbitmq_node¶
| Tags & Fields | Description |
|---|---|
| cluster_name (tag) | RabbitMQ cluster name |
| host (tag) | Hostname of the machine RabbitMQ is running on |
| node_name (tag) | RabbitMQ node name |
| url (tag) | RabbitMQ URL |
| disk_free | Current free disk space. Type: int (gauge). Unit: digital,B |
| disk_free_alarm | Whether the node has a disk alarm. Type: bool (gauge). Unit: N/A |
| fd_used | Used file descriptors. Type: int (gauge). Unit: count |
| io_read_avg_time | Average wall time (milliseconds) for each disk read operation in the last statistics interval. Type: float (gauge). Unit: time,ms |
| io_seek_avg_time | Average wall time (milliseconds) for each seek operation in the last statistics interval. Type: float (gauge). Unit: time,ms |
| io_sync_avg_time | Average wall time (milliseconds) for each fsync() operation in the last statistics interval. Type: float (gauge). Unit: time,ms |
| io_write_avg_time | Average wall time (milliseconds) for each disk write operation in the last statistics interval. Type: float (gauge). Unit: time,ms |
| mem_alarm | Whether the node has a memory alarm. Type: bool (gauge). Unit: N/A |
| mem_limit | Memory usage high watermark in bytes. Type: int (gauge). Unit: digital,B |
| mem_used | Memory used in bytes. Type: int (gauge). Unit: digital,B |
| run_queue | Average number of Erlang processes waiting to run. Type: int (count). Unit: count |
| running | Whether the node is running. Type: bool (gauge). Unit: N/A |
| sockets_used | Number of file descriptors used as sockets. Type: int (count). Unit: count |
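As a worked example of reading the rabbitmq_node fields together, the sketch below derives a list of warnings from one node's values. The function name and the 0.8 memory threshold are hypothetical choices for illustration. Only the field names come from the table above.

```python
def node_warnings(fields, mem_ratio_threshold=0.8):
    """Return human-readable warnings for one rabbitmq_node data point."""
    warnings = []
    if not fields.get("running", False):
        warnings.append("node is not running")
    if fields.get("mem_alarm"):
        warnings.append("memory alarm is set")
    if fields.get("disk_free_alarm"):
        warnings.append("disk alarm is set")
    limit = fields.get("mem_limit", 0)
    if limit > 0:
        # mem_limit is the high watermark, so the ratio shows how close
        # the node is to raising a memory alarm.
        ratio = fields.get("mem_used", 0) / limit
        if ratio >= mem_ratio_threshold:
            warnings.append(f"memory at {ratio:.0%} of the high watermark")
    return warnings
```

Comparing `mem_used` with `mem_limit` gives an earlier signal than `mem_alarm` alone, since the alarm is only raised once the watermark has been reached.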
collector¶
| Tags & Fields | Description |
|---|---|
| instance (tag) | Server addr of the instance |
| job (tag) | Server name of the instance |
| up | Type: int (gauge). Unit: - |
Custom object¶
mq¶
| Tags & Fields | Description |
|---|---|
| col_co_status (tag) | Current status of the collector on the instance (OK/NotOK) |
| host (tag) | The server host address |
| ip (tag) | Connection IP of the instance |
| name (tag) | Unique ID of the object |
| reason (tag) | If the status is not OK, the reasons for that status |
| display_name | Displayed name in UI. Type: string (gauge). Unit: N/A |
| uptime | Current instance uptime. Type: int (gauge). Unit: time,s |
| version | Current version of the instance. Type: string (gauge). Unit: N/A |
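Logs¶
Note
DataKit must be installed on the host where RabbitMQ is running in order to collect RabbitMQ logs.
To collect RabbitMQ logs, enable files in rabbitmq.conf and set it to the absolute path of the RabbitMQ log file. For example: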
[[inputs.rabbitmq]]
...
[inputs.rabbitmq.log]
files = ["/var/log/rabbitmq/rabbit@your-hostname.log"]
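Once log collection is enabled, logs are produced with the source set to rabbitmq by default.
Fields extracted by the log Pipeline¶
- RabbitMQ general log parsing
Example of a general log line: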
2021-05-26 14:20:06.105 [warning] <0.12897.46> rabbitmqctl node_health_check and its HTTP API counterpart are DEPRECATED. See https://www.rabbitmq.com/monitoring.html#health-checks for replacement options.
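The fields extracted from it are as follows:
| Field name | Field value | Description |
|---|---|---|
| status | warning | Log level |
| msg | <0.12897.46>...replacement options | Log message content |
| time | 1622010006000000000 | Nanosecond timestamp (used as the line protocol time) |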
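The field extraction above can be reproduced outside DataKit for experimentation. The sketch below uses a plain regular expression rather than the grok rules in rabbitmq.p, whose contents are not shown here, so it is an approximation. It also assumes that log times are local to UTC+8, which is the offset at which the sample line matches the timestamp in the table. Unlike the table value, this sketch keeps the milliseconds.

```python
import re
from datetime import datetime, timedelta, timezone

# "<date> <time>.<ms> [<level>] <rest of line>"
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<status>\w+)\] "
    r"(?P<msg>.*)$"
)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_log_line(line, utc_offset_hours=8):
    """Split one general RabbitMQ log line into time (ns), status, and msg."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    local = datetime.strptime(match["time"], "%Y-%m-%d %H:%M:%S.%f")
    aware = local.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    delta = aware - EPOCH
    # Integer arithmetic avoids float rounding at nanosecond precision.
    seconds = delta.days * 86400 + delta.seconds
    time_ns = seconds * 1_000_000_000 + delta.microseconds * 1000
    return {"time": time_ns, "status": match["status"], "msg": match["msg"]}
```

Lines that do not match the pattern, such as continuation lines of a multi-line message, return `None` and would need separate handling.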