Dianping CAT

Version-1.9.0 · Experimental


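Dianping-cat, or CAT for short, is an open-source distributed real-time monitoring system, mainly used to monitor system performance, capacity, and business metrics. It was developed by Meituan-Dianping, has since been open-sourced, and is widely used.

CAT collects system metrics such as CPU, memory, network, and disk usage, and monitors and analyzes them in real time to help developers locate and resolve problems quickly. It also provides common monitoring features such as alerting, statistics, and log analysis.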

Data types

Data transfer protocols:

  • plaintext: plain-text mode. Not yet supported by DataKit.
  • native: a text format delimited by specific symbols. Currently supported by DataKit.

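Data categories:

| Abbreviation | Type | Description | Ingested by the current DataKit version | Data type in Guance |
| --- | --- | --- | --- | --- |
| t | transaction start | Transaction start | true | trace |
| T | transaction end | Transaction end | true | trace |
| E | event | Event | false | - |
| M | metric | Custom metric | false | - |
| L | trace | Trace | false | - |
| H | heartbeat | Heartbeat | true | metric |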
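
The table above can be read as a lookup from message abbreviation to what DataKit does with the message. The following is a minimal sketch of that lookup; `CAT_MESSAGE_TYPES` and `guance_category` are hypothetical names used only for illustration and are not part of DataKit or the CAT client.

```python
# Illustrative lookup built from the table above. The names here are
# hypothetical and are not part of DataKit or the CAT client.
CAT_MESSAGE_TYPES = {
    # abbreviation: (type, ingested by DataKit, data type in Guance)
    "t": ("transaction start", True, "trace"),
    "T": ("transaction end", True, "trace"),
    "E": ("event", False, None),
    "M": ("metric", False, None),
    "L": ("trace", False, None),
    "H": ("heartbeat", True, "metric"),
}


def guance_category(abbrev):
    """Return the Guance data type for a CAT message abbreviation,
    or None when the current DataKit version does not ingest it."""
    _name, ingested, category = CAT_MESSAGE_TYPES[abbrev]
    return category if ingested else None


if __name__ == "__main__":
    for code in CAT_MESSAGE_TYPES:
        print(code, guance_category(code))
```

Note that the abbreviations are case-sensitive: `t` and `T` are distinct message types.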

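Client startup modes

  • Starting the CAT server

    • All data goes to DataKit, so the CAT web pages no longer show any data and starting the server serves little purpose. The page also reports the error: 出问题 CAT 的服务端[xxx.xxx] (roughly, "problem with CAT server [xxx.xxx]").
    • Client behavior can be configured when the client starts.
    • The CAT server also sends its own transaction data to DataKit, which floods the Guance pages with junk data.
  • Not starting the CAT server: configure the following in DataKit

    • startTransactionTypes: defines custom transaction types. The listed types are created automatically by CAT. Multiple types are separated by semicolons.
    • block: specifies a threshold for blocking monitoring, in milliseconds. When a transaction runs longer than this threshold, CAT records it as blocked.
    • routers: specifies the addresses and ports of the CAT servers. Multiple address:port entries are separated by semicolons. CAT sends data to these servers automatically, for reliability and disaster tolerance.
    • sample: specifies the sampling rate, meaning that only part of the data is sent to the CAT server. The value ranges from 0 to 1, where 1 sends all data and 0 sends none.
    • matchTransactionTypes (written as MatchTransactionTypes in cat.conf): defines matching rules for custom transaction types. Typically used in API service monitoring to specify which endpoints should have their performance monitored.

In short, starting a cat_home (CAT server) service is not recommended. The corresponding settings can be made in client.xml, as described in the following sections.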
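
To make the semicolon-separated options and the sampling rate described above more concrete, here is a minimal sketch of how such values could be interpreted. It is an assumption made for illustration, not the CAT client's actual implementation; `split_list`, `parse_routers`, and `should_send` are hypothetical helpers.

```python
import random


def split_list(value):
    """Split a semicolon-separated option, dropping empty items
    (a value such as "127.0.0.1:2280;" ends with a separator)."""
    return [item for item in value.split(";") if item]


def parse_routers(value):
    """Turn "host:port;host:port;" into a list of (host, port) pairs.
    Assumes every entry has the form host:port."""
    routers = []
    for item in split_list(value):
        host, _, port = item.rpartition(":")
        routers.append((host, int(port)))
    return routers


def should_send(sample, rng=random.random):
    """Sampling decision for a rate in [0, 1]:
    1.0 sends everything, 0.0 sends nothing."""
    return rng() < sample


if __name__ == "__main__":
    print(split_list("Cache.;Squirrel."))
    print(parse_routers("127.0.0.1:2280;"))
    print(should_send(1.0))
```

Because `random.random()` returns a value in the half-open interval [0, 1), a rate of 1.0 always passes and a rate of 0.0 never does, which matches the description of `sample`.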

Configuration

Client configuration

<?xml version="1.0" encoding="utf-8"?>
<config mode="client">
    <servers>
        <!-- datakit ip, cat port , http port -->
        <server ip="10.200.6.16" port="2280" http-port="9529"/>
    </servers>
</config>

Note: port 9529 in this configuration is DataKit's HTTP port, and 2280 is the port opened by the cat collector.

Collector configuration

Go to the conf.d/samples directory under the DataKit installation directory, copy cat.conf.sample, and name the copy cat.conf. For example:

[[inputs.cat]]
  ## tcp port
  tcp_port = "2280"

  ## native or plaintext; DataKit only supports native (NT1).
  decode = "native"

  ## This is default cat-client Kvs configs.
  startTransactionTypes = "Cache.;Squirrel."
  MatchTransactionTypes = "SQL"
  block = "false"
  routers = "127.0.0.1:2280;"
  sample = "1.0"

  ## global tags.
  # [inputs.cat.tags]
    # key1 = "value1"
    # key2 = "value2"
    # ...

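After the configuration is in place, restart DataKit.

Currently, the collector can also be enabled by injecting the collector configuration through a ConfigMap.

Notes on the configuration file:

  1. startTransactionTypes, MatchTransactionTypes, block, routers, and sample are values returned to the client.
  2. routers is the IP address or domain name of DataKit.
  3. tcp_port must match the port configured for the server address under servers in the client configuration.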
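
The third note is easy to get wrong, so a small consistency check can help. The sketch below parses a client.xml like the one shown earlier with Python's standard `xml.etree.ElementTree` and compares each server's port with the collector's `tcp_port`. The helper names `server_ports` and `mismatched_servers` are hypothetical, and the XML is embedded as a literal only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

# Same content as the client configuration shown earlier, embedded
# as bytes so the example is self-contained.
CLIENT_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<config mode="client">
    <servers>
        <server ip="10.200.6.16" port="2280" http-port="9529"/>
    </servers>
</config>
"""


def server_ports(client_xml):
    """Return (ip, port, http-port) for every <server> element."""
    root = ET.fromstring(client_xml)
    return [
        (s.get("ip"), s.get("port"), s.get("http-port"))
        for s in root.iter("server")
    ]


def mismatched_servers(client_xml, tcp_port):
    """List server IPs whose port differs from the collector's tcp_port.
    tcp_port is a string, as it is in cat.conf."""
    return [ip for ip, port, _ in server_ports(client_xml) if port != tcp_port]


if __name__ == "__main__":
    print(server_ports(CLIENT_XML))
    print(mismatched_servers(CLIENT_XML, "2280"))
```

An empty result from `mismatched_servers` means every server entry in client.xml points at the port the collector listens on.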

Tracing

cat

The following are the tags and fields of the tracing data:

| Tags & Fields | Description |
| --- | --- |
| base_service (tag) | Span base service name. |
| container_host (tag) | Container hostname. Available in OpenTelemetry. Optional. |
| db_host (tag) | DB host name: IP or domain name. Optional. |
| db_name (tag) | Database name. Optional. |
| db_system (tag) | Database system name: mysql, oracle, etc. Optional. |
| dk_fingerprint (tag) | DataKit fingerprint (always DataKit's hostname). |
| endpoint (tag) | Endpoint info. Available in SkyWalking, Zipkin. Optional. |
| env (tag) | Application environment info. Available in Jaeger. Optional. |
| host (tag) | Hostname. |
| http_method (tag) | HTTP request method name. Available in DDTrace, OpenTelemetry. Optional. |
| http_route (tag) | HTTP route. Optional. |
| http_status_code (tag) | HTTP response code. Available in DDTrace, OpenTelemetry. Optional. |
| http_url (tag) | HTTP URL. Optional. |
| operation (tag) | Span name. |
| out_host (tag) | The database host, equivalent to db_host. DDTrace-go only. Optional. |
| project (tag) | Project name. Available in Jaeger. Optional. |
| service (tag) | Service name. Optional. |
| source_type (tag) | Tracing source type. |
| span_type (tag) | Span type. |
| status (tag) | Span status. |
| version (tag) | Application version info. Available in Jaeger. Optional. |
| duration | Duration of the span. Type: int (gauge). Unit: time, μs |
| message | Original content of the span. Type: string. Unit: N/A |
| parent_id | Parent span ID of the current span. Type: string. Unit: N/A |
| resource | Name of the resource that produced the current span. Type: string. Unit: N/A |
| span_id | Span ID. Type: string. Unit: N/A |
| start | Start time of the span. Type: int (gauge). Unit: timeStamp, usec |
| trace_id | Trace ID. Type: string. Unit: N/A |
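
Both `start` and `duration` are expressed in microseconds, which is easy to misread as milliseconds. The sketch below converts the two fields into a start and end time, assuming `start` counts microseconds since the Unix epoch (the field list only says "timeStamp, usec"). `span_window` is a hypothetical helper, not a DataKit function.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def span_window(start_us, duration_us):
    """Convert a span's start (assumed: microseconds since the Unix
    epoch) and duration (microseconds) into UTC begin/end datetimes."""
    begin = EPOCH + timedelta(microseconds=start_us)
    end = begin + timedelta(microseconds=duration_us)
    return begin, end


if __name__ == "__main__":
    begin, end = span_window(1_700_000_000_000_000, 1_500)
    print(begin.isoformat(), end.isoformat())
```

Using integer microseconds with `timedelta` avoids the rounding that can occur when dividing the timestamp into floating-point seconds.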

Metric

Metric cat

| Tags & Fields | Description |
| --- | --- |
| domain (tag) | IP address. |
| hostName (tag) | Host name. |
| os_arch (tag) | CPU architecture: AMD/ARM. |
| os_name (tag) | OS name: Windows, Linux, Mac, etc. |
| os_version (tag) | The kernel version of the OS. |
| runtime_java-version (tag) | Java version. |
| runtime_user-dir (tag) | The path of the jar. |
| runtime_user-name (tag) | User name. |
| disk_free | Free disk size. Type: float (gauge). Unit: digital, B |
| disk_total | Total disk size of data nodes. Type: float (gauge). Unit: digital, B |
| disk_usable | Used disk size. Type: float (gauge). Unit: digital, B |
| memory_free | Free memory size. Type: float (gauge). Unit: count |
| memory_heap-usage | The usage of heap memory. Type: float (gauge). Unit: count |
| memory_max | Max memory usage. Type: float (gauge). Unit: count |
| memory_non-heap-usage | The usage of non-heap memory. Type: float (gauge). Unit: count |
| memory_total | Total memory size. Type: float (gauge). Unit: count |
| os_available-processors | The number of available processors in the host. Type: float (gauge). Unit: count |
| os_committed-virtual-memory | Committed virtual memory size. Type: float (gauge). Unit: digital, B |
| os_free-physical-memory | Free physical memory size. Type: float (gauge). Unit: digital, B |
| os_free-swap-space | Free swap space size. Type: float (gauge). Unit: digital, B |
| os_system-load-average | Average system load. Type: float (gauge). Unit: percent, percent |
| os_total-physical-memory | Total physical memory size. Type: float (gauge). Unit: digital, B |
| os_total-swap-space | Total swap space size. Type: float (gauge). Unit: digital, B |
| runtime_start-time | Start time. Type: int (gauge). Unit: time, s |
| runtime_up-time | Runtime. Type: int (gauge). Unit: time, ms |
| thread_cat_thread_count | The number of threads used by CAT. Type: float (gauge). Unit: count |
| thread_count | Total number of threads. Type: float (gauge). Unit: count |
| thread_daemon_count | The number of daemon threads. Type: float (gauge). Unit: count |
| thread_http_thread_count | The number of HTTP threads. Type: float (gauge). Unit: count |
| thread_peek_count | Thread peak (peak thread count). Type: float (gauge). Unit: count |
| thread_pigeon_thread_count | The number of pigeon threads. Type: float (gauge). Unit: count |
| thread_total_started_count | Total number of started threads. Type: float (gauge). Unit: count |