Network Dial Test

This collector gathers the results of network dial tests and reports all data generated by the tests to Guance Cloud.

Configuration

Environment Variables

By default, the dialtesting service can dial any network, which may pose security risks. To prohibit connections to certain networks, set the following environment variables:

Environment variable name	Parameter example	Description
`ENV_INPUT_DIALTESTING_DISABLE_INTERNAL_NETWORK_TASK`	`true`	Whether to disable dial tests against internal networks. Default is `false`.
`ENV_INPUT_DIALTESTING_DISABLED_INTERNAL_NETWORK_CIDR_LIST`	`["192.168.0.0/16"]`	List of network CIDRs for which dial testing is prohibited; multiple entries are supported. If left empty, all private networks are disabled.
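
To make the interaction between the two variables concrete, here is a minimal Python sketch of the rule as the table describes it. It is not DataKit's implementation: the function name `is_destination_blocked` is hypothetical, and using `is_private` from Python's `ipaddress` module as the definition of "private network" is an assumption that may not match DataKit's own list exactly.

```python
import ipaddress


def is_destination_blocked(dest_ip, disable_internal, cidr_list):
    """Return True if a dial test to dest_ip would be refused.

    disable_internal mirrors ENV_INPUT_DIALTESTING_DISABLE_INTERNAL_NETWORK_TASK,
    cidr_list mirrors ENV_INPUT_DIALTESTING_DISABLED_INTERNAL_NETWORK_CIDR_LIST.
    """
    if not disable_internal:
        # The restriction is switched off, so every destination is allowed.
        return False

    addr = ipaddress.ip_address(dest_ip)
    if not cidr_list:
        # Empty list: every private address is refused.
        # Assumption: is_private approximates "private network" here.
        return addr.is_private

    # Non-empty list: only addresses inside one of the listed CIDRs are refused.
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidr_list)
```

With `["192.168.0.0/16"]` configured, `192.168.1.10` is refused while `10.0.0.1` is still allowed; with an empty list, both are refused because both are private addresses.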

Private Test Node Deployment

Host installation

To deploy a private dial-test node, first create the node on the Guance Cloud page. Then fill in conf.d/network/dialtesting.conf with the information shown on the page:

Go to the conf.d/network directory under the DataKit installation directory, copy dialtesting.conf.sample, and name the copy dialtesting.conf. An example follows:

[[inputs.dialtesting]]
  # We can also configure a JSON path like "file:///your/dir/json-file-name"
  server = "https://dflux-dial.guance.com"

  # [required] node ID
  region_id = "default"

  # If server is dflux-dial.guance.com, ak/sk are required
  ak = ""
  sk = ""

  # The interval to pull the tasks.
  pull_interval = "1m"

  # The timeout for the HTTP request.
  time_out = "30s"

  # The number of the workers.
  workers = 6

  # Collect related metric when job execution time error interval is larger than task_exec_time_interval
  task_exec_time_interval = "5s"

  # Stop the task after it fails to send data to dataway more than max_send_fail_count times.
  max_send_fail_count = 16

  # The max number of jobs sending data to dataway in parallel. Default 10.
  max_job_number = 10

  # The max number of job chan. Default 1000.
  max_job_chan_number = 1000

  # Disable internal network task.
  disable_internal_network_task = true

  # Disable internal network cidr list.
  disabled_internal_network_cidr_list = []

  # Custom tags.
  [inputs.dialtesting.tags]
  # some_tag = "some_value"
  # more_tag = "some_other_value"
  # ...

Once configured, restart DataKit.

Kubernetes

Currently, the collector can be enabled by injecting the collector configuration through a ConfigMap.

Attention

Currently, route tracing is supported only on Linux dial-test nodes. The trace data is stored in the traceroute field of the relevant metrics.

Dial Test Deployment Map

Metric

The dialtesting collector can expose some Prometheus metrics. You can upload these metrics to Guance Cloud through the DataKit collector, using the following configuration:

[[inputs.dk]]
  ......

  metric_name_filter = [

  ### others...

  ### dialtesting
  "datakit_dialtesting_.*",

  ]

  ......
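
As an illustration of what the pattern selects, the Python sketch below applies the same regular expression to a list of names. The metric names are made up for the example, and the assumption that a pattern must match the whole metric name is ours; the collector's actual matching rule may differ.

```python
import re

# Patterns as they appear in metric_name_filter above.
metric_name_filter = [r"datakit_dialtesting_.*"]


def is_selected(metric_name, patterns):
    # Assumption: a metric is kept when any pattern matches its whole name.
    return any(re.fullmatch(pattern, metric_name) for pattern in patterns)


# Hypothetical metric names, for illustration only.
names = ["datakit_dialtesting_example_total", "datakit_cpu_example"]
selected = [name for name in names if is_selected(name, metric_name_filter)]
```

Only the name carrying the `datakit_dialtesting_` prefix ends up in `selected`.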

Log

By default, all data collected below carries a global tag named host, whose value is the hostname of the machine running DataKit. You can specify a different host value through [inputs.dialtesting.tags] in the configuration.

`http_dial_testing`

tag

Tag	Description
`city`	The name of the city
`country`	The name of the country
`datakit_version`	The DataKit version
`dest_ip`	The IP address of the destination
`internal`	The boolean value, true for domestic and false for overseas
`isp`	ISP, such as `chinamobile`, `chinaunicom`, `chinatelecom`
`method`	HTTP method, such as `GET`
`name`	The name of the task
`owner`	The owner name
`proto`	The protocol of the HTTP, such as 'HTTP/1.1'
`province`	The name of the province
`status`	The status of the task, either 'OK' or 'FAIL'
`status_code_class`	The class of the status code, such as '2xx'
`status_code_string`	The status string, such as '200 OK'
`url`	The URL of the endpoint to be monitored

metric list

Metric	Description	Type	Unit
`fail_reason`	The reason that leads to the failure of the task	string	-
`message`	The message string which includes the header and the body of the request or the response	string	-
`response_body_size`	The length of the body of the response	int	B
`response_connection`	HTTP connection time	float	μs
`response_dns`	HTTP DNS parsing time	float	μs
`response_download`	HTTP downloading time	float	μs
`response_ssl`	HTTP SSL handshake time	float	μs
`response_time`	The time of the response	int	μs
`response_ttfb`	HTTP time to first byte (TTFB)	float	μs
`seq_number`	The sequence number of the test	int	count
`status_code`	The response code	int	-
`success`	Whether the test succeeded: 1 for success, -1 for failure	int	-
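
When reading these fields downstream, it helps to keep two conventions from the table in mind: timings are in microseconds, and `success` is 1 or -1 rather than a boolean. A minimal sketch, assuming the fields arrive as a Python dict (the function name `summarize_http_point` and the sample values are hypothetical):

```python
# Timing fields of http_dial_testing that the table lists in microseconds.
TIMING_FIELDS = [
    "response_dns",
    "response_connection",
    "response_ssl",
    "response_ttfb",
    "response_download",
    "response_time",
]


def summarize_http_point(fields):
    """Turn raw fields into a small summary with millisecond timings."""
    summary = {"ok": fields.get("success") == 1}
    for name in TIMING_FIELDS:
        if name in fields:
            summary[name + "_ms"] = fields[name] / 1000.0
    if not summary["ok"]:
        # fail_reason only matters when the test did not succeed.
        summary["fail_reason"] = fields.get("fail_reason", "")
    return summary


# Hypothetical values, for illustration only.
point = {"success": 1, "status_code": 200, "response_time": 250000, "response_dns": 12000.0}
summary = summarize_http_point(point)
```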

`tcp_dial_testing`

tag

Tag	Description
`city`	The name of the city
`country`	The name of the country
`datakit_version`	The DataKit version
`dest_host`	The name of the host to be monitored
`dest_ip`	The IP address
`dest_port`	The port of the TCP connection
`internal`	The boolean value, true for domestic and false for overseas
`isp`	ISP, such as `chinamobile`, `chinaunicom`, `chinatelecom`
`name`	The name of the task
`owner`	The owner name
`proto`	The protocol of the task
`province`	The name of the province
`status`	The status of the task, either 'OK' or 'FAIL'

metric list

Metric	Description	Type	Unit
`fail_reason`	The reason that leads to the failure of the task	string	-
`message`	The message string includes the response time or fail reason	string	-
`response_time`	The time of the response	int	μs
`response_time_with_dns`	The time of the response, which contains DNS time	int	μs
`seq_number`	The sequence number of the test	int	count
`success`	Whether the test succeeded: 1 for success, -1 for failure	int	-
`traceroute`	The JSON string of the `traceroute` result	string	-

`icmp_dial_testing`

tag

Tag	Description
`city`	The name of the city
`country`	The name of the country
`datakit_version`	The DataKit version
`dest_host`	The name of the host to be monitored
`internal`	The boolean value, true for domestic and false for overseas
`isp`	ISP, such as `chinamobile`, `chinaunicom`, `chinatelecom`
`name`	The name of the task
`owner`	The owner name
`proto`	The protocol of the task
`province`	The name of the province
`status`	The status of the task, either 'OK' or 'FAIL'

metric list

Metric	Description	Type	Unit
`average_round_trip_time`	The average round-trip time (RTT)	float	μs
`average_round_trip_time_in_millis`	The average round-trip time (RTT), deprecated	float	ms
`fail_reason`	The reason that leads to the failure of the task	string	-
`max_round_trip_time`	The maximum round-trip time (RTT)	float	μs
`max_round_trip_time_in_millis`	The maximum round-trip time (RTT), deprecated	float	ms
`message`	The message string, which includes the average round-trip time or the failure reason	string	-
`min_round_trip_time`	The minimum round-trip time (RTT)	float	μs
`min_round_trip_time_in_millis`	The minimum round-trip time (RTT), deprecated	float	ms
`packet_loss_percent`	The percentage of packets lost	float	-
`packets_received`	The number of packets received	int	count
`packets_sent`	The number of packets sent	int	count
`seq_number`	The sequence number of the test	int	count
`std_round_trip_time`	The standard deviation of the round-trip time	float	μs
`std_round_trip_time_in_millis`	The standard deviation of the round-trip time, deprecated	float	ms
`success`	Whether the test succeeded: 1 for success, -1 for failure	int	-
`traceroute`	The JSON string of the `traceroute` result	string	-
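
The relationship between some of these fields can be sketched in a few lines of Python. Both helpers rest on assumptions that the table does not state outright: that `packet_loss_percent` is the share of sent packets that received no reply, and that each deprecated `_in_millis` field carries the same quantity as its microsecond counterpart, divided by 1000.

```python
def packet_loss_percent(packets_sent, packets_received):
    """Share of sent packets that got no reply, as a percentage."""
    if packets_sent <= 0:
        return 0.0  # nothing was sent, so no loss can be measured
    return (packets_sent - packets_received) / packets_sent * 100.0


def micros_to_millis(value_us):
    """Convert a microsecond field to milliseconds."""
    return value_us / 1000.0
```

For example, 4 packets sent and 3 received gives a loss of 25 percent under this definition.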

`websocket_dial_testing`

tag

Tag	Description
`city`	The name of the city
`country`	The name of the country
`datakit_version`	The DataKit version
`internal`	The boolean value, true for domestic and false for overseas
`isp`	ISP, such as `chinamobile`, `chinaunicom`, `chinatelecom`
`name`	The name of the task
`owner`	The owner name
`proto`	The protocol of the task
`province`	The name of the province
`status`	The status of the task, either 'OK' or 'FAIL'
`url`	The URL string, such as `ws://www.abc.com`

metric list

Metric	Description	Type	Unit
`fail_reason`	The reason that leads to the failure of the task	string	-
`message`	The message string includes the response time or the failure reason	string	-
`response_message`	The message of the response	string	-
`response_time`	The time of the response	int	μs
`response_time_with_dns`	The time of the response, including DNS time	int	μs
`sent_message`	The sent message	string	-
`seq_number`	The sequence number of the test	int	count
`success`	Whether the test succeeded: 1 for success, -1 for failure	int	-

`traceroute` Field Description

traceroute is the JSON text of the route-trace data. The whole value is an array in which each element records one route probe, as shown in the following example:

[
    {
        "total": 2,
        "failed": 0,
        "loss": 0,
        "avg_cost": 12700395,
        "min_cost": 11902041,
        "max_cost": 13498750,
        "std_cost": 1129043,
        "items": [
            {
                "ip": "10.8.9.1",
                "response_time": 13498750
            },
            {
                "ip": "10.8.9.1",
                "response_time": 11902041
            }
        ]
    },
    {
        "total": 2,
        "failed": 0,
        "loss": 0,
        "avg_cost": 13775021,
        "min_cost": 13740084,
        "max_cost": 13809959,
        "std_cost": 49409,
        "items": [
            {
                "ip": "10.12.168.218",
                "response_time": 13740084
            },
            {
                "ip": "10.12.168.218",
                "response_time": 13809959
            }
        ]
    }
]

Field description:

Field	Type	Description
`total`	number	Total number of probes
`failed`	number	Number of failed probes
`loss`	number	Percentage of failed probes
`avg_cost`	number	Average time spent (μs)
`min_cost`	number	Minimum time spent (μs)
`max_cost`	number	Maximum time spent (μs)
`std_cost`	number	Standard deviation of time spent (μs)
`items`	Array of items	Per-probe information (see Item)

Item

Field	Type	Description
`ip`	string	IP address; the value is * if the probe fails
`response_time`	number	Response time (μs)
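
Because the traceroute field arrives as a JSON string, it has to be decoded before it can be inspected. The following Python sketch shows one way to reduce it to a per-element summary. The function name `summarize_traceroute` is hypothetical, and treating the array order as the hop order is an assumption.

```python
import json


def summarize_traceroute(traceroute_json):
    """Decode the traceroute string and list the responding IPs per element."""
    hops = json.loads(traceroute_json)
    summary = []
    for index, hop in enumerate(hops, start=1):
        # A failed probe reports "*" instead of an IP address, so skip those.
        ips = sorted({item["ip"] for item in hop["items"] if item["ip"] != "*"})
        summary.append({
            "hop": index,  # assumption: array order is the hop order
            "ips": ips,
            "loss": hop["loss"],
            "avg_cost": hop["avg_cost"],
        })
    return summary
```

Applied to the example above, the first element reduces to `{"hop": 1, "ips": ["10.8.9.1"], "loss": 0, "avg_cost": 12700395}`.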
