主机健康检查
健康检查采集器可以定期去监控主机的进程和网络(如 TCP 和 HTTP)的健康状况,如果不符合健康要求,DataKit 会收集相应的信息,并上报指标数据。
配置¶
采集器配置¶
进入 DataKit 安装目录下的 conf.d/host
目录,复制 host_healthcheck.conf.sample
并命名为 host_healthcheck.conf
。示例如下:
[[inputs.host_healthcheck]]
## Collect interval
interval = "1m"
## Check process
[[inputs.host_healthcheck.process]]
# Process filtering based on process name
names = ["nginx", "mysql"]
# Process filtering based on regular expression
# names_regex = [ "my_process_.*" ]
# Process filtering based on cmd line
# cmd_lines = ["nginx", "mysql"]
# Process filtering based on regular expression
# cmd_lines_regex = [ "my_args_.*" ]
## Process minimal run time
# Only check the process when the running time of the process is greater than min_run_time
min_run_time = "10m"
## Check TCP
# [[inputs.host_healthcheck.tcp]]
## Host and port
# host_ports = ["10.100.1.2:3369", "192.168.1.2:6379"]
## TCP timeout
# connection_timeout = "3s"
## Check HTTP
# [[inputs.host_healthcheck.http]]
## HTTP urls
# http_urls = [ "http://127.0.0.1:8000/path/to/api?arg1=x&arg2=y" ]
## HTTP method
# method = "GET"
## Expected response status code
# expect_status = 200
## HTTP timeout
# timeout = "30s"
## Ignore tls validation
# ignore_insecure_tls = false
## HTTP headers
# [inputs.host_healthcheck.http.headers]
# Header1 = "header-value-1"
# Hedaer2 = "header-value-2"
## Extra tags
[inputs.host_healthcheck.tags]
# some_tag = "some_value"
# more_tag = "some_other_value"
# ...
配置好后,重启 DataKit 即可。
可通过 ConfigMap 方式注入采集器配置 或 配置 ENV_DATAKIT_INPUTS 开启采集器。
也支持以环境变量的方式修改配置参数(需要在 ENV_DEFAULT_ENABLED_INPUTS 中加为默认采集器):
-
ENV_INPUT_HEALTHCHECK_INTERVAL
采集器重复间隔时长
字段类型: Duration
采集器配置字段:
interval
默认值: 10s
-
ENV_INPUT_HEALTHCHECK_PROCESS
检查处理器
字段类型: JSON
采集器配置字段:
process
示例: [{"names":["nginx","mysql"],"min_run_time":"10m"}]
-
ENV_INPUT_HEALTHCHECK_TCP
检查 TCP
字段类型: JSON
采集器配置字段:
tcp
示例: [{"host_ports":["10.100.1.2:3369","192.168.1.2:6379"],"connection_timeout":"3s"}]
-
ENV_INPUT_HEALTHCHECK_HTTP
检查 HTTP
字段类型: JSON
采集器配置字段:
http
示例: [{"http_urls":["http://local-ip:port/path/to/api?arg1=x&arg2=y"],"method":"GET","expect_status":200,"timeout":"30s","ignore_insecure_tls":false,"headers":{"Header1":"header-value-1","Hedaer2":"header-value-2"}}]
-
ENV_INPUT_HEALTHCHECK_TAGS
自定义标签。如果配置文件有同名标签,将会覆盖它
字段类型: JSON
采集器配置字段:
tags
示例: {"some_tag":"some_value","more_tag":"some_other_value"}
指标¶
以下所有数据采集,默认会追加名为 host
的全局 tag(tag 值为 DataKit 所在主机名),也可以在配置中通过 [inputs.host_healthcheck.tags]
指定其它标签:
host_process_exception
¶
- 标签
Tag | Description |
---|---|
cmd_line |
The command line of the process |
host |
System hostname |
process |
The name of the process |
type |
The type of the exception |
- 字段列表
Metric | Description | Type | Unit |
---|---|---|---|
exception |
Exception value, 1 or 0 | int | - |
pid |
The process ID | int | int |
start_duration |
The total time the process has run | int | μs |
host_tcp_exception
¶
- 标签
Tag | Description |
---|---|
host |
System hostname |
port |
The port |
type |
The type of the exception |
- 字段列表
Metric | Description | Type | Unit |
---|---|---|---|
exception |
Exception value, 1 or 0 | int | - |
host_http_exception
¶
- 标签
Tag | Description |
---|---|
error |
The error message |
host |
System hostname |
url |
The URL |
- 字段列表
Metric | Description | Type | Unit |
---|---|---|---|
exception |
Exception value, 1 or 0 | int | - |