
Network Dial Testing


This collector gathers the results of network dial testing. All data generated by dial testing is reported to Guance.

Configuration

To deploy a private dial testing node, first create the node on the Guance page, then copy the information shown there into conf.d/network/dialtesting.conf.

Go to the conf.d/network directory under the DataKit installation directory, copy dialtesting.conf.sample, and rename the copy to dialtesting.conf. An example configuration follows:

[[inputs.dialtesting]]
  # The server can also be a local JSON file, such as "file:///your/dir/json-file-name"
  server = "https://dflux-dial.guance.com"

  # [required] node ID
  region_id = "default"

  # If the server is dflux-dial.guance.com, ak/sk are required
  ak = ""
  sk = ""

  # The interval to pull the tasks.
  pull_interval = "1m"

  # The timeout for the HTTP request.
  time_out = "30s"

  # The number of workers.
  workers = 6

  # Collect related metrics when the deviation in job execution time is larger than task_exec_time_interval.
  task_exec_time_interval = "5s"

  # Stop the task after it fails to send data to DataWay more than max_send_fail_count times.
  max_send_fail_count = 16

  # The maximum sleep time when sending data to DataWay fails.
  max_send_fail_sleep_time = "30m"

  # The maximum number of jobs sending data to DataWay in parallel. Default 10.
  max_job_number = 10

  # The maximum number of job channels. Default 1000.
  max_job_chan_number = 1000

  # Disable internal network tasks.
  disable_internal_network_task = true

  # List of internal network CIDRs that must not be tested.
  disabled_internal_network_cidr_list = []

  # Custom tags.
  [inputs.dialtesting.tags]
  # some_tag = "some_value"
  # more_tag = "some_other_value"
  # ...

After configuration, restart DataKit.

When DataKit runs in Kubernetes, you can inject the collector configuration via ConfigMap or enable the collector by configuring ENV_DATAKIT_INPUTS.

Configuration parameters can also be changed through environment variables (the collector must be added to ENV_DEFAULT_ENABLED_INPUTS so that it runs as a default collector):

  • ENV_INPUT_DIALTESTING_ENV_INPUT_DIALTESTING_DISABLE_INTERNAL_NETWORK_TASK

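    Whether to prohibit dial testing of internal network addresses/services. When true, internal targets are not tested; the sample configuration above sets disable_internal_network_task = true.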

    Field Type: Boolean

    Collector Configuration Field: disable_internal_network_task

    Example: true

    Default Value: false

  • ENV_INPUT_DIALTESTING_ENV_INPUT_DIALTESTING_DISABLED_INTERNAL_NETWORK_CIDR_LIST

    List of CIDR addresses that are prohibited from being tested.

    Field Type: List

    Collector Configuration Field: disabled_internal_network_cidr_list

    Example: ["192.168.0.0/16"]

    Default Value: -

  • ENV_INPUT_DIALTESTING_ENV_INPUT_DIALTESTING_ENABLE_DEBUG_API

    Whether to enable the dial testing debug API (disabled by default).

    Field Type: Boolean

    Collector Configuration Field: env_input_dialtesting_enable_debug_api

    Example: false

    Default Value: false
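
The effect of a CIDR list such as ["192.168.0.0/16"] can be illustrated with Python's standard ipaddress module. This is a minimal sketch of CIDR membership testing only. How DataKit matches targets internally is not described here, and is_disabled is a hypothetical helper.

```python
import ipaddress


def is_disabled(host_ip: str, cidr_list: list[str]) -> bool:
    """Return True if host_ip falls inside any CIDR in cidr_list (hypothetical helper)."""
    ip = ipaddress.ip_address(host_ip)
    # An address is never contained in a network of a different IP version.
    return any(ip in ipaddress.ip_network(cidr) for cidr in cidr_list)


if __name__ == "__main__":
    cidrs = ["192.168.0.0/16"]
    for target in ("192.168.1.10", "8.8.8.8"):
        print(target, "disabled" if is_disabled(target, cidrs) else "allowed")
```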


Attention

Currently, only Linux dial testing nodes support "route tracing", and the trace data will be saved in the traceroute field of the related metrics.

Dial Testing Node Deployment

The following diagram shows the network topology for deploying dial testing nodes. There are two ways to deploy dial testing nodes:

  • Public Network Dial Testing Nodes: Directly use the globally deployed dial testing nodes provided by Guance to monitor services on the public internet.
  • Private Network Dial Testing Nodes: If you need to test services within your internal network, you must deploy private dial testing nodes yourself. Of course, if the network allows, these private dial testing nodes can also test services on the public internet.

For both public and private dial testing nodes, dial testing tasks are created through the web interface.

graph TD
  %% node definitions
  dt_web(Dial Testing Web UI);
  dt_db(Public Storage for Dial Testing Tasks);
  dt_pub(DataKit Public Dial Testing Node);
  dt_pri(DataKit Private Dial Testing Node);
  site_inner(Internal Site);
  site_pub(Public Site);
  dw_inner(Internal Dataway);
  dw_pub(Public Dataway);
  guance(Guance);

  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  dt_web -->|Create Dial Testing Task| dt_db;
  dt_db -->|Fetch Dial Testing Task| dt_pub -->|Dial Testing Results| dw_pub --> guance;
  dt_db -->|Fetch Dial Testing Task| dt_pri;
  dt_pub <-->|Perform Dial Testing| site_pub;

  dt_pri <-.->|Perform Dial Testing| site_pub;
  dw_inner --> guance;
  subgraph "User Internal Network"
  dt_pri <-->|Perform Dial Testing| site_inner;
  dt_pri -->|Dial Testing Results| dw_inner;
  end

Logging

http_dial_testing

  • Tags
| Tag | Description |
| --- | --- |
| city | Name of the city |
| country | Name of the country |
| datakit_version | DataKit version |
| dest_ip | IP address of the destination |
| df_label | Label of the task |
| internal | Boolean value, true for domestic and false for overseas |
| isp | ISP, such as chinamobile, chinaunicom, chinatelecom |
| method | HTTP method, such as GET |
| name | Name of the task |
| node_name | Name of the node |
| owner | Owner name |
| proto | Protocol of the HTTP, such as 'HTTP/1.1' |
| province | Name of the province |
| status | Status of the task, either 'OK' or 'FAIL' |
| status_code_class | Class of the status code, such as '2xx' |
| status_code_string | Status string, such as '200 OK' |
| url | URL of the endpoint to be monitored |
  • Metrics List
| Metric | Description | Type | Unit |
| --- | --- | --- | --- |
| config_vars | Configuration variables of the task | string | - |
| fail_reason | Reason leading to the failure of the task | string | - |
| message | Message string which includes the header and body of the request or response | string | - |
| response_body_size | Length of the response body | int | B |
| response_connection | HTTP connection time | float | μs |
| response_dns | HTTP DNS resolution time | float | μs |
| response_download | HTTP downloading time | float | μs |
| response_ssl | HTTP SSL handshake time | float | μs |
| response_time | Response time | int | μs |
| response_ttfb | HTTP time to first byte (TTFB) | float | μs |
| seq_number | Sequence number of the test | int | count |
| status_code | Response code | int | - |
| success | Number specifying success, 1 for success, -1 for failure | int | - |
| task | Raw task string | string | - |
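
The per-phase timing fields above are all reported in microseconds, so a record can be turned into a millisecond breakdown to see which phase dominates. The sketch below assumes a record is available as a plain dict. The sample values are made up for illustration, and the helper names are hypothetical.

```python
# Timing fields of http_dial_testing, all in microseconds per the table above.
PHASES = (
    "response_dns",
    "response_connection",
    "response_ssl",
    "response_ttfb",
    "response_download",
)


def phase_breakdown_ms(fields: dict) -> dict:
    """Convert the phase timings present in `fields` from microseconds to milliseconds."""
    return {name: fields[name] / 1000.0 for name in PHASES if name in fields}


def slowest_phase(fields: dict) -> str:
    """Return the name of the phase with the largest timing."""
    breakdown = phase_breakdown_ms(fields)
    return max(breakdown, key=breakdown.get)


if __name__ == "__main__":
    # Illustrative values only, not taken from a real dial test.
    record = {
        "response_dns": 1500.0,
        "response_connection": 20000.0,
        "response_ssl": 45000.0,
        "response_ttfb": 80000.0,
        "response_download": 3000.0,
    }
    for name, ms in phase_breakdown_ms(record).items():
        print(f"{name}: {ms:.1f} ms")
    print("slowest:", slowest_phase(record))
```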

tcp_dial_testing

  • Tags
| Tag | Description |
| --- | --- |
| city | Name of the city |
| country | Name of the country |
| datakit_version | DataKit version |
| dest_host | Name of the host to be monitored |
| dest_ip | IP address |
| dest_port | Port of the TCP connection |
| df_label | Label of the task |
| internal | Boolean value, true for domestic and false for overseas |
| isp | ISP, such as chinamobile, chinaunicom, chinatelecom |
| name | Name of the task |
| node_name | Name of the node |
| owner | Owner name |
| proto | Protocol of the task |
| province | Name of the province |
| status | Status of the task, either 'OK' or 'FAIL' |
  • Metrics List
| Metric | Description | Type | Unit |
| --- | --- | --- | --- |
| config_vars | Configuration variables of the task | string | - |
| fail_reason | Reason leading to the failure of the task | string | - |
| message | Message string including response time or fail reason | string | - |
| response_time | Response time | int | μs |
| response_time_with_dns | Response time including DNS time | int | μs |
| seq_number | Sequence number of the test | int | count |
| success | Number specifying success, 1 for success, -1 for failure | int | - |
| task | Raw task string | string | - |
| traceroute | JSON string of the traceroute result | string | - |

icmp_dial_testing

  • Tags
| Tag | Description |
| --- | --- |
| city | Name of the city |
| country | Name of the country |
| datakit_version | DataKit version |
| dest_host | Name of the host to be monitored |
| df_label | Label of the task |
| internal | Boolean value, true for domestic and false for overseas |
| isp | ISP, such as chinamobile, chinaunicom, chinatelecom |
| name | Name of the task |
| node_name | Name of the node |
| owner | Owner name |
| proto | Protocol of the task |
| province | Name of the province |
| status | Status of the task, either 'OK' or 'FAIL' |
  • Metrics List
| Metric | Description | Type | Unit |
| --- | --- | --- | --- |
| average_round_trip_time | Average round trip time (RTT) | float | μs |
| average_round_trip_time_in_millis | Average round trip time (RTT), deprecated | float | ms |
| config_vars | Configuration variables of the task | string | - |
| fail_reason | Reason leading to the failure of the task | string | - |
| max_round_trip_time | Maximum round trip time (RTT) | float | μs |
| max_round_trip_time_in_millis | Maximum round trip time (RTT), deprecated | float | ms |
| message | Message string including average round trip time or failure reason | string | - |
| min_round_trip_time | Minimum round trip time (RTT) | float | μs |
| min_round_trip_time_in_millis | Minimum round trip time (RTT), deprecated | float | ms |
| packet_loss_percent | Packet loss percentage | float | - |
| packets_received | Number of packets received | int | count |
| packets_sent | Number of packets sent | int | count |
| seq_number | Sequence number of the test | int | count |
| std_round_trip_time | Standard deviation of the round trip time | float | μs |
| std_round_trip_time_in_millis | Standard deviation of the round trip time, deprecated | float | ms |
| success | Number specifying success, 1 for success, -1 for failure | int | - |
| task | Raw task string | string | - |
| traceroute | JSON string of the traceroute result | string | - |
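
The relationship between some of these fields can be sketched as follows. The packet loss formula is the conventional definition and is an assumption here, since this page does not state how DataKit computes packet_loss_percent. The unit conversion simply reflects the μs and ms units listed in the table. Both function names are hypothetical.

```python
def packet_loss_percent(packets_sent: int, packets_received: int) -> float:
    """Conventional packet loss percentage (assumed definition, not confirmed by this page)."""
    if packets_sent <= 0:
        return 0.0  # nothing was sent, so no loss can be measured
    return (packets_sent - packets_received) * 100.0 / packets_sent


def us_to_ms(value_us: float) -> float:
    """Convert a microsecond field to the unit of its deprecated *_in_millis counterpart."""
    return value_us / 1000.0


if __name__ == "__main__":
    print(packet_loss_percent(4, 3))
    print(us_to_ms(12500.0))
```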

websocket_dial_testing

  • Tags
| Tag | Description |
| --- | --- |
| city | Name of the city |
| country | Name of the country |
| datakit_version | DataKit version |
| df_label | Label of the task |
| internal | Boolean value, true for domestic and false for overseas |
| isp | ISP, such as chinamobile, chinaunicom, chinatelecom |
| name | Name of the task |
| node_name | Name of the node |
| owner | Owner name |
| proto | Protocol of the task |
| province | Name of the province |
| status | Status of the task, either 'OK' or 'FAIL' |
| url | URL string, such as ws://www.abc.com |
  • Metrics List
| Metric | Description | Type | Unit |
| --- | --- | --- | --- |
| config_vars | Configuration variables of the task | string | - |
| fail_reason | Reason leading to the failure of the task | string | - |
| message | Message string including response time or failure reason | string | - |
| response_message | Response message | string | - |
| response_time | Response time | int | μs |
| response_time_with_dns | Response time including DNS | int | μs |
| sent_message | Sent message | string | - |
| seq_number | Sequence number of the test | int | count |
| success | Number specifying success, 1 for success, -1 for failure | int | - |
| task | Raw task string | string | - |

multi_dial_testing

  • Tags
| Tag | Description |
| --- | --- |
| city | Name of the city |
| country | Name of the country |
| datakit_version | DataKit version |
| df_label | Label of the task |
| internal | Boolean value, true for domestic and false for overseas |
| isp | ISP, such as chinamobile, chinaunicom, chinatelecom |
| name | Name of the task |
| node_name | Name of the node |
| owner | Owner name |
| province | Name of the province |
| status | Status of the task, either 'OK' or 'FAIL' |
  • Metrics List
| Metric | Description | Type | Unit |
| --- | --- | --- | --- |
| config_vars | Configuration variables of the task | string | - |
| fail_reason | Reason leading to the failure of the task | string | - |
| last_step | Last step number executed | int | - |
| message | Message string which includes the header and body of the request or response | string | - |
| response_time | Response time | int | μs |
| seq_number | Sequence number of the test | int | count |
| steps | Result of each step | string | - |
| success | Number specifying success, 1 for success, -1 for failure | int | - |
| task | Raw task string | string | - |

traceroute

traceroute is the JSON text of the "route tracing" data. The data is a JSON array in which each element records the details of one route probe. For example:

[
    {
        "total": 2,
        "failed": 0,
        "loss": 0,
        "avg_cost": 12700395,
        "min_cost": 11902041,
        "max_cost": 13498750,
        "std_cost": 1129043,
        "items": [
            {
                "ip": "10.8.9.1",
                "response_time": 13498750
            },
            {
                "ip": "10.8.9.1",
                "response_time": 11902041
            }
        ]
    },
    {
        "total": 2,
        "failed": 0,
        "loss": 0,
        "avg_cost": 13775021,
        "min_cost": 13740084,
        "max_cost": 13809959,
        "std_cost": 49409,
        "items": [
            {
                "ip": "10.12.168.218",
                "response_time": 13740084
            },
            {
                "ip": "10.12.168.218",
                "response_time": 13809959
            }
        ]
    }
]

Field Descriptions:

| Field | Type | Description |
| --- | --- | --- |
| total | number | Total number of probes |
| failed | number | Number of failures |
| loss | number | Failure percentage |
| avg_cost | number | Average latency (μs) |
| min_cost | number | Minimum latency (μs) |
| max_cost | number | Maximum latency (μs) |
| std_cost | number | Standard deviation of latency (μs) |
| items | Array of Items | Details of each probe (see the items field description below) |

items Field Description

| Field | Type | Description |
| --- | --- | --- |
| ip | string | IP address, * if failed |
| response_time | number | Response time (μs) |
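
A consumer of the traceroute field might parse the JSON and summarize each element as follows. This is a minimal sketch using only the field names documented above. The summarize function is hypothetical, and skipping probes whose ip is * is an assumption about how failed probes should be treated. For the first element of the example, recomputing the truncated mean and the sample standard deviation from items reproduces its avg_cost and std_cost values.

```python
import json
import statistics

# First element of the traceroute example above.
TRACEROUTE = """
[
    {
        "total": 2, "failed": 0, "loss": 0,
        "avg_cost": 12700395, "min_cost": 11902041,
        "max_cost": 13498750, "std_cost": 1129043,
        "items": [
            {"ip": "10.8.9.1", "response_time": 13498750},
            {"ip": "10.8.9.1", "response_time": 11902041}
        ]
    }
]
"""


def summarize(traceroute_json: str) -> list[dict]:
    """Summarize each route probe element of a traceroute JSON string (hypothetical helper)."""
    summary = []
    for hop_no, hop in enumerate(json.loads(traceroute_json), start=1):
        # Assumption: probes reported with ip "*" failed and carry no usable timing.
        times = [it["response_time"] for it in hop["items"] if it["ip"] != "*"]
        summary.append({
            "hop": hop_no,
            "ips": sorted({it["ip"] for it in hop["items"]}),
            "avg": sum(times) // len(times) if times else None,
            "min": min(times) if times else None,
            "max": max(times) if times else None,
            # Sample standard deviation, truncated to an integer.
            "std": int(statistics.stdev(times)) if len(times) > 1 else 0,
        })
    return summary


if __name__ == "__main__":
    for row in summarize(TRACEROUTE):
        print(row)
```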

Self-Monitoring Metrics Collection for Dial Testing Collector

The dial testing collector exposes Prometheus metrics. To report these metrics to Guance, collect them with DataKit's own self-monitoring collector (inputs.dk), as in the following configuration example:

[[inputs.dk]]
  ......

  metric_name_filter = [

  ### others...

  ### dialtesting
  "datakit_dialtesting_.*",

  ]

  ......
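
The effect of a name filter such as "datakit_dialtesting_.*" can be illustrated with Python's re module. This is a sketch only: it assumes the pattern must match the whole metric name, which this page does not confirm, and the metric names below are hypothetical placeholders rather than names exposed by DataKit.

```python
import re

# Pattern taken from the configuration example above.
PATTERNS = ["datakit_dialtesting_.*"]


def is_selected(metric_name: str, patterns: list[str] = PATTERNS) -> bool:
    """Return True if metric_name fully matches any filter pattern (assumed semantics)."""
    return any(re.fullmatch(p, metric_name) for p in patterns)


if __name__ == "__main__":
    # Hypothetical metric names, for illustration only.
    for name in ("datakit_dialtesting_example_total", "datakit_other_example_total"):
        print(name, is_selected(name))
```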
