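Host Health Check
The health check collector periodically monitors the health of host processes and network endpoints (such as TCP and HTTP). When a check fails, DataKit collects the relevant information and reports it as metric data.
Configuration
Collector configuration
Go to the conf.d/samples directory under the DataKit installation directory, copy host_healthcheck.conf.sample, and rename the copy to host_healthcheck.conf. An example follows: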
[[inputs.host_healthcheck]]
  ## Collect interval
  interval = "1m" 
  ## Check process
  [[inputs.host_healthcheck.process]]
    # Process filtering based on process name
    names = ["nginx", "mysql"]
    # Process filtering based on regular expression of process name
    # names_regex = [ "my_process_.*" ]
    # Process filtering based on cmd line
    # cmd_lines = ["nginx", "mysql"]
    # Process filtering based on regular expression of cmd line
    # cmd_lines_regex = [ "my_args_.*" ]
    ## Process minimal run time
    # Only check the process when the running time of the process is greater than min_run_time
    min_run_time = "10m"
  ## Check TCP
  # [[inputs.host_healthcheck.tcp]]
    ## Host and port
    # host_ports = ["10.100.1.2:3369", "192.168.1.2:6379"]
    ## TCP timeout
    # connection_timeout = "3s"
  ## Check HTTP
  # [[inputs.host_healthcheck.http]]
    ## HTTP urls
    # http_urls = [ "http://127.0.0.1:8000/path/to/api?arg1=x&arg2=y" ]
    ## HTTP method
    # method = "GET"
    ## Expected response status code
    # expect_status = 200
    ## HTTP timeout
    # timeout = "30s"
    ## Ignore TLS validation
    # ignore_insecure_tls = false
    ## HTTP headers
    # [inputs.host_healthcheck.http.headers]
      # Header1 = "header-value-1"
      # Header2 = "header-value-2"
  ## Extra tags
  [inputs.host_healthcheck.tags]
  # some_tag = "some_value"
  # more_tag = "some_other_value"
  # ...
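Once the configuration is in place, restart DataKit.
The collector can also be enabled by injecting its configuration through a Kubernetes ConfigMap or by setting ENV_DATAKIT_INPUTS.
Configuration parameters can also be changed through environment variables (the collector must be added to ENV_DEFAULT_ENABLED_INPUTS as a default collector):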
- ENV_INPUT_HEALTHCHECK_INTERVAL: collection interval
    - Type: Duration
    - Config field: interval
    - Default: 10s
- ENV_INPUT_HEALTHCHECK_PROCESS: process checks
    - Type: JSON
    - Config field: process
    - Example: '[{"names":["nginx","mysql"],"min_run_time":"10m"}]'
- ENV_INPUT_HEALTHCHECK_TCP: TCP checks
    - Type: JSON
    - Config field: tcp
    - Example: '[{"host_ports":["10.100.1.2:3369","192.168.1.2:6379"],"connection_timeout":"3s"}]'
- ENV_INPUT_HEALTHCHECK_HTTP: HTTP checks
    - Type: JSON
    - Config field: http
    - Example: '[{"http_urls":["http://local-ip:port/path/to/api?arg1=x&arg2=y"],"method":"GET","expect_status":200,"timeout":"30s","ignore_insecure_tls":false,"headers":{"Header1":"header-value-1","Header2":"header-value-2"}}]'
- ENV_INPUT_HEALTHCHECK_TAGS: custom tags. If the configuration file defines a tag with the same name, the value set here overrides it.
    - Type: JSON
    - Config field: tags
    - Example: '{"some_tag":"some_value","more_tag":"some_other_value"}'
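The JSON-typed variables take compact JSON strings, which are easy to get wrong when written by hand. As a rough sketch, they could be generated with Python's standard `json` module. The check definitions below simply reuse the sample values from this page, and `to_env_value` is a hypothetical helper name, not part of DataKit.

```python
import json

# Sample check definitions, mirroring the fields of the sample config.
process_checks = [{"names": ["nginx", "mysql"], "min_run_time": "10m"}]
tcp_checks = [
    {
        "host_ports": ["10.100.1.2:3369", "192.168.1.2:6379"],
        "connection_timeout": "3s",
    }
]
tags = {"some_tag": "some_value", "more_tag": "some_other_value"}


def to_env_value(obj):
    """Serialize obj as compact JSON (no spaces after separators)."""
    return json.dumps(obj, separators=(",", ":"))


env = {
    "ENV_INPUT_HEALTHCHECK_INTERVAL": "1m",
    "ENV_INPUT_HEALTHCHECK_PROCESS": to_env_value(process_checks),
    "ENV_INPUT_HEALTHCHECK_TCP": to_env_value(tcp_checks),
    "ENV_INPUT_HEALTHCHECK_TAGS": to_env_value(tags),
}

# Print in a shell-friendly NAME='value' form.
for name, value in env.items():
    print(f"{name}='{value}'")
```

Generating the values this way keeps quoting and bracket nesting consistent; how the variables are then passed to DataKit depends on the deployment.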
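Metrics
All data collected below carries a global tag named host by default (its value is the hostname of the machine DataKit runs on). Additional tags can be specified in the configuration through [inputs.host_healthcheck.tags]: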
host_process_exception
| Tags & Fields | Description | Type | Unit |
|---|---|---|---|
| cmd_line (tag) | The command line of the process | | |
| host (tag) | System hostname | | |
| process (tag) | The name of the process | | |
| type (tag) | The type of the exception | | |
| exception | Exception value, 1 or 0 | int (bool) | - |
| pid | The process ID | int (gauge) | int |
| start_duration | The total time the process has run | int (gauge) | time, μs |
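Because start_duration is reported in microseconds, raw values are hard to read at a glance. The following minimal sketch converts such a value into a `timedelta`. The data point is made up for illustration and only borrows the field names from the table; `start_duration_to_timedelta` is a hypothetical helper.

```python
from datetime import timedelta


def start_duration_to_timedelta(start_duration_us: int) -> timedelta:
    """Convert a start_duration value (microseconds) to a timedelta."""
    return timedelta(microseconds=start_duration_us)


# Hypothetical data point shaped like a host_process_exception record.
point = {
    "process": "nginx",
    "pid": 1234,
    "exception": 1,
    "start_duration": 900_000_000,  # microseconds
}

uptime = start_duration_to_timedelta(point["start_duration"])
print(f"{point['process']} (pid {point['pid']}) has run for {uptime}")
```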
host_tcp_exception
| Tags & Fields | Description | Type | Unit |
|---|---|---|---|
| host (tag) | System hostname | | |
| port (tag) | The port | | |
| type (tag) | The type of the exception | | |
| exception | Exception value, 1 or 0 | int (bool) | - |
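To make the idea of a TCP check concrete, here is a minimal sketch of probing one `host:port` entry with a connection timeout. It is an illustration only, not DataKit's implementation. The function name `tcp_exception` is hypothetical, and the mapping of 1 to "connection failed" and 0 to "connection succeeded" is an assumption about how the exception field is meant to be read.

```python
import socket


def tcp_exception(host_port: str, timeout: float = 3.0) -> int:
    """Try to open a TCP connection to "host:port".

    Returns 1 if the connection fails or times out, 0 if it succeeds
    (assumed meaning of the exception field).
    """
    # Split on the last colon so the port is always the final component.
    host, _, port = host_port.rpartition(":")
    host = host.strip("[]")  # tolerate bracketed IPv6 literals such as [::1]:80
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return 0
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return 1
```

A call such as `tcp_exception("192.168.1.2:6379", timeout=3.0)` would correspond to one entry of host_ports with connection_timeout set to "3s".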
host_http_exception
| Tags & Fields | Description | Type | Unit |
|---|---|---|---|
| error (tag) | The error message | | |
| host (tag) | System hostname | | |
| url (tag) | The URL | | |
| exception | Exception value, 1 or 0 | int (bool) | - |
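The HTTP check can be pictured in a similar way: send a request, compare the response status with expect_status, and record an error message on failure. The sketch below uses only Python's standard library and is not DataKit's implementation. The function name `http_exception`, the returned `(exception, error)` pair, and the wording of the error text are all assumptions made for illustration.

```python
import urllib.error
import urllib.request


def http_exception(url, method="GET", expect_status=200, timeout=30.0, headers=None):
    """Return (exception, error): (0, "") when the status matches, else (1, message)."""
    request = urllib.request.Request(url, method=method, headers=headers or {})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status code is still available.
        status = err.code
    except (urllib.error.URLError, OSError) as err:
        # Connection refused, DNS failure, timeout, TLS error, and similar.
        return 1, str(err)

    if status != expect_status:
        return 1, f"unexpected status {status}, expected {expect_status}"
    return 0, ""
```

Note that `urlopen` follows redirects by default, so this sketch compares expect_status against the final response rather than the first one.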