Dianping CAT
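Dianping-cat, or Cat for short, is an open-source distributed real-time monitoring system used mainly to monitor system performance, capacity, and business metrics. It was developed by Meituan-Dianping and has since been open-sourced and widely adopted.
Cat collects system metrics such as CPU, memory, network, and disk usage, and monitors and analyzes them in real time to help developers locate and resolve problems quickly. It also provides common monitoring features such as alerting, statistics, and log analysis.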
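Data Types
Data transmission protocols:
- plaintext: plain-text mode; not yet supported by DataKit.
- native: text format that uses specific characters as delimiters; supported by DataKit.
Data categories: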
Type code | Type | Description | Collected by current DataKit version | Guance Cloud data type |
---|---|---|---|---|
t | transaction start | Transaction start | true | trace |
T | transaction end | Transaction end | true | trace |
E | event | Event | false | - |
M | metric | Custom metric | false | - |
L | trace | Trace | false | - |
H | heartbeat | Heartbeat | true | metric |
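The table above can be read as a lookup from the one-character message code to what DataKit does with it. The sketch below encodes exactly that table; the names `MessageType`, `CAT_MESSAGE_TYPES`, and `classify` are hypothetical and are not part of DataKit or the CAT client.

```python
from typing import NamedTuple, Optional


class MessageType(NamedTuple):
    name: str
    collected: bool              # collected by the current DataKit version
    guance_type: Optional[str]   # Guance Cloud data type, None if not collected


# Hypothetical mirror of the data-category table (codes are case-sensitive: t vs T).
CAT_MESSAGE_TYPES = {
    "t": MessageType("transaction start", True, "trace"),
    "T": MessageType("transaction end", True, "trace"),
    "E": MessageType("event", False, None),
    "M": MessageType("metric", False, None),
    "L": MessageType("trace", False, None),
    "H": MessageType("heartbeat", True, "metric"),
}


def classify(code: str) -> Optional[str]:
    """Return the Guance Cloud data type for a CAT message code, or None if it is not collected."""
    entry = CAT_MESSAGE_TYPES.get(code)
    if entry is None or not entry.collected:
        return None
    return entry.guance_type
```

For example, `classify("H")` yields `"metric"`, while `classify("E")` yields `None` because events are not collected yet.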
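Client Startup Modes
- Starting a cat server
  - All data goes to DataKit, so the cat web UI has no data. Running it brings little benefit, and the page shows the error "出问题 CAT 的服务端[xxx.xxx]" ("CAT server [xxx.xxx] has a problem").
  - Client behavior can be configured when the client starts instead.
  - The cat server also forwards transaction data to DataKit, producing a large amount of junk data in Guance Cloud.
- Not starting a cat server: configure the following in DataKit:
  - `startTransactionTypes`: defines custom transaction types; Cat creates the specified transaction types automatically. Separate multiple types with semicolons.
  - `block`: a threshold in milliseconds for blocking monitoring. When a transaction runs longer than this threshold, Cat records the blocking for that transaction.
  - `routers`: address and port of the Cat server(s) (with DataKit, the DataKit address). Separate multiple entries with semicolons. Cat sends data to these servers automatically for reliability and failover.
  - `sample`: the sampling rate, i.e. only part of the data is sent to the Cat server. Valid values range from 0 to 1: 1 sends all data, 0 sends none.
  - `matchTransactionTypes`: matching rules for custom transaction types, typically used in API service monitoring to specify which endpoints' performance to monitor.
Therefore, running a cat_home (cat server) service is not recommended. The corresponding settings can be configured in client.xml; see below.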
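The kv settings described above are plain strings: list values are semicolon-separated, and `sample` is a fraction between 0 and 1. The sketch below shows one way a client could interpret them. The helper names are hypothetical (they are not the CAT client API), and treating `sample` as an independent per-message probability is an assumption made for illustration.

```python
import random

# Hypothetical helpers, not the CAT client API.


def split_list(value: str) -> list[str]:
    """Split a semicolon-separated value, dropping empty items such as the one
    left by the trailing ';' in "127.0.0.1:2280;"."""
    return [item.strip() for item in value.split(";") if item.strip()]


def parse_routers(value: str) -> list[tuple[str, int]]:
    """Turn "host:port;host:port" into (host, port) pairs."""
    routers = []
    for item in split_list(value):
        host, _, port = item.rpartition(":")
        routers.append((host, int(port)))
    return routers


def should_send(sample: float, rng: random.Random) -> bool:
    """Keep a message with probability `sample`: 1 keeps everything, 0 keeps nothing.
    Assumption: each message is sampled independently."""
    if not 0.0 <= sample <= 1.0:
        raise ValueError("sample must be between 0 and 1")
    return rng.random() < sample  # random() is in [0, 1), so 1.0 always passes and 0.0 never does


kvs = {"startTransactionTypes": "Cache.;Squirrel.", "routers": "127.0.0.1:2280;", "sample": "1.0"}
print(split_list(kvs["startTransactionTypes"]))  # ['Cache.', 'Squirrel.']
print(parse_routers(kvs["routers"]))             # [('127.0.0.1', 2280)]
```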
Configuration
Client Configuration
```xml
<?xml version="1.0" encoding="utf-8"?>
<config mode="client">
  <servers>
    <!-- DataKit IP, cat port, HTTP port -->
    <server ip="10.200.6.16" port="2280" http-port="9529"/>
  </servers>
</config>
```
Note: 9529 in this configuration is DataKit's HTTP port, and 2280 is the port opened by the cat collector.
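To make the roles of the two ports concrete, the sketch below reads a client.xml like the one above with the Python standard library and prints, for each `<server>` element, the address the client sends cat data to and the DataKit HTTP address. `datakit_endpoints` is a hypothetical helper for illustration only.

```python
import xml.etree.ElementTree as ET

# client.xml from the example above (comment removed).
CLIENT_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<config mode="client">
  <servers>
    <server ip="10.200.6.16" port="2280" http-port="9529"/>
  </servers>
</config>"""


def datakit_endpoints(xml_bytes: bytes) -> list[dict[str, str]]:
    """Hypothetical helper: list the cat TCP address and the DataKit HTTP address
    declared by each <server> element."""
    root = ET.fromstring(xml_bytes)
    endpoints = []
    for server in root.iter("server"):
        ip = server.get("ip")
        endpoints.append({
            "tcp": f"{ip}:{server.get('port')}",        # cat collector port
            "http": f"{ip}:{server.get('http-port')}",  # DataKit HTTP port
        })
    return endpoints


print(datakit_endpoints(CLIENT_XML))
# [{'tcp': '10.200.6.16:2280', 'http': '10.200.6.16:9529'}]
```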
Collector Configuration
Go to the `conf.d/cat` directory under the DataKit installation directory, copy `cat.conf.sample`, and rename it to `cat.conf`. Example:
```toml
[[inputs.cat]]
  ## TCP port
  tcp_port = "2280"

  ## native or plaintext; DataKit only supports native (NT1).
  decode = "native"

  ## Default cat-client kvs settings.
  startTransactionTypes = "Cache.;Squirrel."
  MatchTransactionTypes = "SQL"
  block = "false"
  routers = "127.0.0.1:2280;"
  sample = "1.0"

  ## Global tags.
  # [inputs.cat.tags]
  # key1 = "value1"
  # key2 = "value2"
  # ...
```
After configuring, restart DataKit.
The collector can also be enabled by injecting its configuration through a ConfigMap.
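Notes on the configuration file:
- `startTransactionTypes`, `MatchTransactionTypes`, `block`, `routers`, and `sample` are returned to the client.
- `routers` is DataKit's IP address or domain name.
- `tcp_port` corresponds to the `port` attribute of the `<server>` entry under `servers` in the client configuration.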
Tracing
cat
This is the field description for the trace.
- Tags
Tag | Description |
---|---|
base_service | Span Base service name |
container_host | Container hostname. Available in OpenTelemetry. Optional. |
db_host | DB host name: ip or domain name. Optional. |
db_name | Database name. Optional. |
db_system | Database system name, e.g. mysql, oracle. Optional. |
dk_fingerprint | DataKit fingerprint (the DataKit hostname). |
endpoint | Endpoint info. Available in SkyWalking, Zipkin. Optional. |
env | Application environment info. Available in Jaeger. Optional. |
host | Hostname. |
http_method | HTTP request method name. Available in DDTrace, OpenTelemetry. Optional. |
http_route | HTTP route. Optional. |
http_status_code | HTTP response code. Available in DDTrace, OpenTelemetry. Optional. |
http_url | HTTP URL. Optional. |
operation | Span name |
out_host | Database host, equivalent to db_host; DDTrace-go only. Optional. |
project | Project name. Available in Jaeger. Optional. |
service | Service name. Optional. |
source_type | Tracing source type |
span_type | Span type |
status | Span status |
version | Application version info. Available in Jaeger. Optional. |
- Fields
Field | Description |
---|---|
duration | Duration of span Type: int Unit: time,μs |
message | Original content of span Type: string Unit: N/A |
parent_id | Parent span ID of current span Type: string Unit: N/A |
resource | Resource name produce current span Type: string Unit: N/A |
span_id | Span id Type: string Unit: N/A |
start | Start time of span. Type: int Unit: timeStamp,usec |
trace_id | Trace id Type: string Unit: N/A |
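Both `start` and `duration` are in microseconds. The sketch below converts the start (t) and end (T) times of a transaction into those two fields. It assumes the times are already available as timezone-aware `datetime` values; `SpanTiming` and `to_span_timing` are hypothetical names, not DataKit code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MICROSECOND = timedelta(microseconds=1)


@dataclass
class SpanTiming:
    start: int     # `start` field: Unix timestamp in microseconds
    duration: int  # `duration` field: microseconds


def to_span_timing(started_at: datetime, ended_at: datetime) -> SpanTiming:
    """Hypothetical conversion of a transaction's start/end times (timezone-aware)
    into span start/duration in microseconds, using exact integer arithmetic."""
    if ended_at < started_at:
        raise ValueError("transaction ends before it starts")
    return SpanTiming(
        start=(started_at - EPOCH) // ONE_MICROSECOND,
        duration=(ended_at - started_at) // ONE_MICROSECOND,
    )


t0 = datetime(2024, 1, 1, 8, 0, 0, 250_000, tzinfo=timezone.utc)
print(to_span_timing(t0, t0 + timedelta(milliseconds=12)).duration)  # 12000
```

Dividing `timedelta` by `timedelta` keeps the result an exact integer, which avoids the float rounding that `datetime.timestamp() * 1_000_000` can introduce.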
Metric
Metric cat
- Tags
Tag | Description |
---|---|
domain | IP address. |
hostName | Host name. |
os_arch | CPU architecture, e.g. AMD/ARM. |
os_name | OS name, e.g. Windows/Linux/Mac. |
os_version | The kernel version of the OS. |
runtime_java-version | Java version. |
runtime_user-dir | Working directory of the JVM (user.dir), typically where the jar is launched. |
runtime_user-name | User name. |
- Fields
Metric | Description |
---|---|
disk_free | Free disk size. Type: float Unit: digital,B |
disk_total | Total disk size. Type: float Unit: digital,B |
disk_usable | Usable disk size (space available to the JVM). Type: float Unit: digital,B |
memory_free | Free memory size. Type: float Unit: count |
memory_heap-usage | The usage of heap memory. Type: float Unit: count |
memory_max | Max memory usage. Type: float Unit: count |
memory_non-heap-usage | The usage of non heap memory. Type: float Unit: count |
memory_total | Total memory size. Type: float Unit: count |
os_available-processors | The number of available processors in the host. Type: float Unit: count |
os_committed-virtual-memory | Committed virtual memory size. Type: float Unit: digital,B |
os_free-physical-memory | Free physical memory size. Type: float Unit: digital,B |
os_free-swap-space | Free swap space size. Type: float Unit: digital,B |
os_system-load-average | Average system load. Type: float Unit: percent,percent |
os_total-physical-memory | Total physical memory size. Type: float Unit: digital,B |
os_total-swap-space | Total swap space size. Type: float Unit: digital,B |
runtime_start-time | Start time. Type: int Unit: time,s |
runtime_up-time | Runtime. Type: int Unit: time,ms |
thread_cat_thread_count | The number of threads used by cat. Type: float Unit: count |
thread_count | Total number of threads. Type: float Unit: count |
thread_daemon_count | The number of daemon threads. Type: float Unit: count |
thread_http_thread_count | The number of http threads. Type: float Unit: count |
thread_peek_count | Peak thread count. Type: float Unit: count |
thread_pigeon_thread_count | The number of pigeon threads. Type: float Unit: count |
thread_total_started_count | Total number of started threads. Type: float Unit: count |
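Several heartbeat fields come in total/free pairs measured in bytes (`disk_total`/`disk_free`, `os_total-physical-memory`/`os_free-physical-memory`). If a utilization figure is needed, it can be derived as (total - free) / total. The sketch below does that arithmetic; these ratios are not fields reported by DataKit, and the helper names and sample values are made up for illustration.

```python
def used_ratio(total: float, free: float) -> float:
    """Fraction of capacity in use, given a total/free pair in the same unit."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (total - free) / total


def heartbeat_utilization(fields: dict[str, float]) -> dict[str, float]:
    """Hypothetical helper: derive utilization ratios from byte-valued heartbeat fields."""
    return {
        "disk_used_ratio": used_ratio(fields["disk_total"], fields["disk_free"]),
        "physical_memory_used_ratio": used_ratio(
            fields["os_total-physical-memory"], fields["os_free-physical-memory"]
        ),
    }


sample_fields = {  # made-up values for illustration
    "disk_total": 1000.0,
    "disk_free": 250.0,
    "os_total-physical-memory": 8.0,
    "os_free-physical-memory": 2.0,
}
print(heartbeat_utilization(sample_fields))
# {'disk_used_ratio': 0.75, 'physical_memory_used_ratio': 0.75}
```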