
Health Check

Version-1.24.0



The health check collector periodically monitors the health of processes and network services (TCP and HTTP) on the host. When a check fails, DataKit collects the relevant information and reports it as metric data.

Configuration

Collector Configuration

Go to the conf.d/host directory under the DataKit installation directory, copy host_healthcheck.conf.sample, and rename the copy to host_healthcheck.conf. An example configuration:

[[inputs.host_healthcheck]]
  ## Collection interval
  interval = "1m"

  ## Check processes
  [[inputs.host_healthcheck.process]]
    # Filter processes by process name
    names = ["nginx", "mysql"]

    # Filter processes by a regular expression matched against the process name
    # names_regex = [ "my_process_.*" ]

    # Filter processes by command line
    # cmd_lines = ["nginx", "mysql"]

    # Filter processes by a regular expression matched against the command line
    # cmd_lines_regex = [ "my_args_.*" ]

    ## Minimum process run time
    # Only check a process whose running time is greater than min_run_time
    min_run_time = "10m"

  ## Check TCP
  # [[inputs.host_healthcheck.tcp]]
    ## Host and port
    # host_ports = ["10.100.1.2:3369", "192.168.1.2:6379"]

    ## TCP connection timeout
    # connection_timeout = "3s"

  ## Check HTTP
  # [[inputs.host_healthcheck.http]]
    ## HTTP URLs
    # http_urls = [ "http://127.0.0.1:8000/path/to/api?arg1=x&arg2=y" ]

    ## HTTP method
    # method = "GET"

    ## Expected response status code
    # expect_status = 200

    ## HTTP request timeout
    # timeout = "30s"

    ## Ignore TLS certificate validation
    # ignore_insecure_tls = false

    ## HTTP headers
    # [inputs.host_healthcheck.http.headers]
      # Header1 = "header-value-1"
      # Header2 = "header-value-2"

  ## Extra tags
  [inputs.host_healthcheck.tags]
  # some_tag = "some_value"
  # more_tag = "some_other_value"
  # ...
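To make the four filter keys and min_run_time concrete, here is a minimal Python sketch of how a single [[inputs.host_healthcheck.process]] table could select processes. It is a model, not DataKit's implementation: it assumes a process is selected when any one filter matches, that names is an exact match, cmd_lines is a substring match, and the *_regex keys use a regular-expression search. ProcessInfo, ProcessFilter and parse_duration are hypothetical names; parse_duration accepts Go-style duration strings such as "10m" or "1h30m".

```python
import re
from dataclasses import dataclass, field

# Seconds per unit for Go-style duration strings such as "10m" or "1h30m".
_UNITS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}
# Two-letter units come first in the alternation so "ms" is not read as "m" + "s".
_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|ms|s|m|h)")


def parse_duration(text: str) -> float:
    """Convert a duration string like "10m" or "1h30m" to seconds."""
    parts = _PART.findall(text)
    # Reject empty strings and leftover characters such as "10x".
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"invalid duration: {text!r}")
    return sum(float(n) * _UNITS[u] for n, u in parts)


@dataclass
class ProcessInfo:
    """Hypothetical snapshot of one running process."""
    name: str
    cmd_line: str
    run_time_s: float  # seconds since the process started


@dataclass
class ProcessFilter:
    """Mirrors one [[inputs.host_healthcheck.process]] table."""
    names: list = field(default_factory=list)
    names_regex: list = field(default_factory=list)
    cmd_lines: list = field(default_factory=list)
    cmd_lines_regex: list = field(default_factory=list)
    min_run_time: str = "0s"

    def matches(self, p: ProcessInfo) -> bool:
        # Assumption: a process is selected if ANY configured filter matches.
        return (p.name in self.names
                or any(re.search(rx, p.name) for rx in self.names_regex)
                or any(s in p.cmd_line for s in self.cmd_lines)
                or any(re.search(rx, p.cmd_line) for rx in self.cmd_lines_regex))

    def should_check(self, p: ProcessInfo) -> bool:
        # Per the sample comment: only check processes running longer than min_run_time.
        return self.matches(p) and p.run_time_s > parse_duration(self.min_run_time)


flt = ProcessFilter(names=["nginx", "mysql"], min_run_time="10m")
young = ProcessInfo("nginx", "nginx -g daemon off;", run_time_s=120)
old = ProcessInfo("nginx", "nginx -g daemon off;", run_time_s=3600)
print(flt.should_check(young), flt.should_check(old))  # False True
```

With min_run_time = "10m", an nginx process that started two minutes ago is skipped, while one that has been up for an hour is checked.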

Once configured, restart DataKit.

In Kubernetes, the collector can be enabled by injecting its configuration through a ConfigMap or by configuring ENV_DATAKIT_INPUTS.

It can also be enabled through the following environment variables (the collector must also be added to the default collectors in ENV_DEFAULT_ENABLED_INPUTS):

  • ENV_INPUT_HEALTHCHECK_INTERVAL

    Collection interval

    Type: Duration

    input.conf: interval

    Default: 10s

  • ENV_INPUT_HEALTHCHECK_PROCESS

    Check processes

    Type: JSON

    input.conf: process

    Example: [{"names":["nginx","mysql"],"min_run_time":"10m"}]

  • ENV_INPUT_HEALTHCHECK_TCP

    Check TCP

    Type: JSON

    input.conf: tcp

    Example: [{"host_ports":["10.100.1.2:3369","192.168.1.2:6379"],"connection_timeout":"3s"}]

  • ENV_INPUT_HEALTHCHECK_HTTP

    Check HTTP

    Type: JSON

    input.conf: http

    Example: [{"http_urls":["http://local-ip:port/path/to/api?arg1=x&arg2=y"],"method":"GET","expect_status":200,"timeout":"30s","ignore_insecure_tls":false,"headers":{"Header1":"header-value-1","Header2":"header-value-2"}}]

  • ENV_INPUT_HEALTHCHECK_TAGS

    Custom tags. A tag with the same name in the configuration file is overwritten by this value.

    Type: JSON

    input.conf: tags

    Example: {"some_tag":"some_value","more_tag":"some_other_value"}
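The JSON-typed variables above are easier to produce with a JSON serializer than by hand. This Python sketch builds several of the example values in compact single-line form and also shows the stated override rule: a tag set through ENV_INPUT_HEALTHCHECK_TAGS replaces a same-named tag from the configuration file. env_json and the file_tags values are hypothetical. As noted above, the collector must also be listed in ENV_DEFAULT_ENABLED_INPUTS.

```python
import json


def env_json(value) -> str:
    """Serialize a value as compact single-line JSON for an environment variable."""
    return json.dumps(value, separators=(",", ":"))


env = {
    "ENV_INPUT_HEALTHCHECK_INTERVAL": "1m",
    "ENV_INPUT_HEALTHCHECK_PROCESS": env_json(
        [{"names": ["nginx", "mysql"], "min_run_time": "10m"}]),
    "ENV_INPUT_HEALTHCHECK_TCP": env_json(
        [{"host_ports": ["10.100.1.2:3369", "192.168.1.2:6379"], "connection_timeout": "3s"}]),
    "ENV_INPUT_HEALTHCHECK_TAGS": env_json({"some_tag": "some_value"}),
}

# Tags from the configuration file (hypothetical values for illustration).
file_tags = {"some_tag": "from_conf_file", "region": "eu-west"}
# A same-named tag in ENV_INPUT_HEALTHCHECK_TAGS overwrites the file's value.
merged_tags = {**file_tags, **json.loads(env["ENV_INPUT_HEALTHCHECK_TAGS"])}

for name, value in env.items():
    print(f"{name}={value}")
print(merged_tags)  # {'some_tag': 'some_value', 'region': 'eu-west'}
```

The printed NAME=value lines can be pasted into a shell or a container spec; compact separators reproduce exactly the example strings listed above.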

Metric

By default, every measurement below carries a global tag named host (its value is the hostname of the machine running DataKit). Additional tags can be specified in the configuration under [inputs.host_healthcheck.tags]:

  [inputs.host_healthcheck.tags]
  # some_tag = "some_value"
  # more_tag = "some_other_value"
  # ...

host_process_exception

  • tag

    Tag       Description
    cmd_line  The command line of the process
    host      System hostname
    process   The name of the process
    type      The type of the exception

  • field list

    Metric          Description                          Type  Unit
    exception       Exception value, 1 or 0              int   -
    pid             The process ID                       int   -
    start_duration  The total time the process has run   int   μs
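start_duration is reported in microseconds, so a process that has been running for 10 minutes (the min_run_time in the sample configuration) has start_duration = 600 × 10^6 = 600,000,000. The Python sketch below shows that conversion. The helper name process_fields is hypothetical, and exception is passed in as a flag because this page does not describe what sets it to 1.

```python
from datetime import datetime, timedelta, timezone


def process_fields(pid: int, started_at: datetime, now: datetime, exception: bool) -> dict:
    """Build the fields of a host_process_exception point (hypothetical helper)."""
    # Floor-dividing one timedelta by another yields an exact integer count.
    start_duration_us = (now - started_at) // timedelta(microseconds=1)
    return {"exception": 1 if exception else 0, "pid": pid, "start_duration": start_duration_us}


started = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
now = datetime(2024, 1, 1, 12, 10, tzinfo=timezone.utc)  # 10 minutes later
print(process_fields(1234, started, now, exception=False))
# {'exception': 0, 'pid': 1234, 'start_duration': 600000000}
```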

host_tcp_exception

  • tag

    Tag   Description
    host  System hostname
    port  The port
    type  The type of the exception

  • field list

    Metric     Description              Type  Unit
    exception  Exception value, 1 or 0  int   -
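A host_tcp_exception point reflects whether a host_ports entry accepts TCP connections. The Python sketch below is a minimal probe under the assumption that a completed TCP handshake within connection_timeout counts as healthy and anything else sets exception to 1; this page does not spell out the exact criterion or the values of the type tag, so type is left out. tcp_check is a hypothetical name, and connection_timeout = "3s" corresponds to connection_timeout_s=3.0.

```python
import socket


def tcp_check(host_port: str, connection_timeout_s: float = 3.0) -> dict:
    """Try one TCP connection; report exception=1 if it cannot be established."""
    host, _, port = host_port.rpartition(":")
    host = host.strip("[]")  # allow IPv6 literals such as "[::1]:6379"
    try:
        # create_connection resolves the host and applies the timeout to the connection attempt.
        with socket.create_connection((host, int(port)), timeout=connection_timeout_s):
            exception = 0
    except OSError:  # refused, unreachable, DNS failure, or timeout
        exception = 1
    return {"tags": {"port": port}, "fields": {"exception": exception}}
```

Calling tcp_check("192.168.1.2:6379") opens and immediately closes one connection; the port is kept as a string tag, matching the port tag in the table above.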

host_http_exception

  • tag

    Tag    Description
    error  The error message
    host   System hostname
    url    The URL

  • field list

    Metric     Description              Type  Unit
    exception  Exception value, 1 or 0  int   -
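Similarly, a host_http_exception point can be modeled as one request per http_urls entry. This Python sketch uses the standard urllib module and assumes exception is 1 when the request fails or the status code differs from expect_status, with the failure text placed in the error tag. http_check is a hypothetical name; DataKit's own HTTP client may behave differently.

```python
import ssl
import urllib.error
import urllib.request


def http_check(url, method="GET", expect_status=200, timeout_s=30.0,
               ignore_insecure_tls=False, headers=None) -> dict:
    """Send one request; report exception=1 on errors or an unexpected status."""
    req = urllib.request.Request(url, method=method, headers=headers or {})
    ctx = None
    if ignore_insecure_tls:
        # Accept any certificate, mirroring ignore_insecure_tls = true.
        ctx = ssl.create_default_context()
        ctx.check_hostname = False  # must be disabled before setting CERT_NONE
        ctx.verify_mode = ssl.CERT_NONE
    status, error = None, ""
    try:
        with urllib.request.urlopen(req, timeout=timeout_s, context=ctx) as resp:
            status = resp.status
    except urllib.error.HTTPError as e:
        status = e.code  # 4xx/5xx responses still carry a status code
        e.close()
    except OSError as e:  # URLError, refused connection, timeout, TLS failure
        error = str(e)
    if status is not None and status != expect_status:
        error = f"unexpected status code {status}, expected {expect_status}"
    tags = {"url": url}
    if error:
        tags["error"] = error
    return {"tags": tags, "fields": {"exception": 1 if error else 0}}
```

One consequence of this design: urllib follows redirects automatically, so expect_status is compared with the final response, and a check expecting 301 would not pass for a URL that redirects.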
