
Network Dial Testing


This collector gathers network dial testing results. All data produced by dial tests is reported to Guance.

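Configuration

To deploy a private dial testing node, first create the private node on the Guance page. Once it is created, fill the information shown on the page into conf.d/samples/dialtesting.conf:

Go to the conf.d/samples directory under the DataKit installation directory, copy dialtesting.conf.sample, and name the copy dialtesting.conf. An example follows: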

[[inputs.dialtesting]]
  # We can also configure a JSON path like "file:///your/dir/json-file-name"
  server = "https://dflux-dial.guance.com"

  # [require] node ID
  region_id = "default"

  # If server is dflux-dial.guance.com, ak/sk are required
  ak = ""
  sk = ""

  # The interval to pull the tasks.
  pull_interval = "1m"

  # The timeout for the HTTP request.
  time_out = "30s"

  # The number of the workers.
  workers = 6

  # Collect the related metric when a job's execution time deviates by more than task_exec_time_interval
  task_exec_time_interval = "5s"

  # Stop the task after it fails to send data to Dataway more than max_send_fail_count times.
  max_send_fail_count = 16

  # The max sleep time after sending data to Dataway fails.
  max_send_fail_sleep_time = "30m"

  # The max number of jobs sending data to dataway in parallel. Default 10.
  max_job_number = 10

  # The max number of job chan. Default 1000.
  max_job_chan_number = 1000

  # The max number of icmp packets sent at one time. Default 0, no limit.
  max_icmp_concurrency = 0

  # The max number of points in cache for each type of task. Default 10000.
  max_cache_points_number = 10000

  # Disable internal network task.
  disable_internal_network_task = true

  # CIDR list of internal networks that must not be dial-tested.
  disabled_internal_network_cidr_list = []

  # Set true to enable election
  election = false

  [inputs.dialtesting.browser]
    # Enable browser dialtesting on Linux nodes. Enabled by default.
    enabled = true

    # Browser engine used for browser dialtesting.
    # Supported engine: lightpanda.
    engine = "lightpanda"

    # Optional browser engine executable path.
    # If empty, the embedded browser runner will use LIGHTPANDA_EXECUTABLE_PATH or PATH.
    engine_path = ""

    # Max browser dialtesting tasks running at the same time. 0 means no limit.
    max_concurrency = 0

  # Custom tags.
  [inputs.dialtesting.tags]
  # some_tag = "some_value"
  # more_tag = "some_other_value"
  # ...

After the configuration is complete, restart DataKit.
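The effect of disabled_internal_network_cidr_list can be pictured as a membership test of the target address against each CIDR block. The following is a minimal Python sketch of that idea using the standard ipaddress module. The function name is_blocked is hypothetical, and DataKit's own matching logic is not described in this document.

```python
import ipaddress
from typing import Iterable


def is_blocked(target_ip: str, cidr_list: Iterable[str]) -> bool:
    """Return True if target_ip falls inside any CIDR block in cidr_list."""
    addr = ipaddress.ip_address(target_ip)
    for cidr in cidr_list:
        # strict=False tolerates entries whose host bits are set, e.g. "192.168.1.1/16"
        network = ipaddress.ip_network(cidr, strict=False)
        # An IPv4 address never matches an IPv6 network, and vice versa
        if addr.version == network.version and addr in network:
            return True
    return False


blocked = ["192.168.0.0/16"]  # same value as the example for the CIDR list setting
print(is_blocked("192.168.1.10", blocked))
print(is_blocked("8.8.8.8", blocked))
```

An empty list, as in the sample configuration, blocks nothing in this sketch.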

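The collector can be enabled by injecting its configuration through a ConfigMap or by setting ENV_DATAKIT_INPUTS.

Configuration parameters can also be changed through environment variables (the collector must be added to ENV_DEFAULT_ENABLED_INPUTS as a default collector):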

  • ENV_INPUT_DIALTESTING_DISABLE_INTERNAL_NETWORK_TASK

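    Whether to disable dial testing of internal network addresses/services. Internal targets are not allowed by default.

    Type: Boolean

    Config field: disable_internal_network_task

    Example: true

    Default: true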

  • ENV_INPUT_DIALTESTING_DISABLED_INTERNAL_NETWORK_CIDR_LIST

    List of CIDR blocks that must not be dial-tested

    Type: List

    Config field: disabled_internal_network_cidr_list

    Example: ["192.168.0.0/16"]

    Default: -

  • ENV_INPUT_DIALTESTING_ENABLE_DEBUG_API

    Whether to enable the dial testing debug API (disabled by default)

    Type: Boolean

    Config field: enable_debug_api

    Example: false

    Default: false

  • ENV_INPUT_DIALTESTING_ELECTION

    Whether to enable election (disabled by default)

    Type: Boolean

    Config field: election

    Example: false

    Default: false

  • ENV_INPUT_DIALTESTING_BROWSER_ENABLED

    Whether to enable browser dial testing

    Type: Boolean

    Config field: browser.enabled

    Example: false

    Default: true

  • ENV_INPUT_DIALTESTING_BROWSER_ENGINE

    Engine used for browser dial testing; lightpanda is supported

    Type: String

    Config field: browser.engine

    Example: lightpanda

    Default: lightpanda

  • ENV_INPUT_DIALTESTING_BROWSER_ENGINE_PATH

    Path to the browser engine executable used for browser dial testing

    Type: String

    Config field: browser.engine_path

    Example: /usr/local/bin/lightpanda

    Default: -

  • ENV_INPUT_DIALTESTING_BROWSER_MAX_CONCURRENCY

    Maximum number of browser dial testing tasks running at the same time; 0 means no limit

    Type: Int

    Config field: browser.max_concurrency

    Example: 1

    Default: 0
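The variable names in the list above follow a visible pattern: the prefix ENV_INPUT_DIALTESTING_ followed by the config field in upper case, with the dot of nested fields replaced by an underscore. The Python sketch below only reproduces that pattern for the variables listed here. The helper env_name_for is hypothetical, and fields that are not in the list (for example pull_interval) may have no corresponding variable.

```python
ENV_PREFIX = "ENV_INPUT_DIALTESTING_"


def env_name_for(field: str) -> str:
    """Map a listed config field such as 'browser.engine_path' to its env var name."""
    return ENV_PREFIX + field.replace(".", "_").upper()


for field in ("disable_internal_network_task", "browser.max_concurrency"):
    print(field, "->", env_name_for(field))
```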


Note

Currently only Linux dial testing nodes support traceroute. The trace data is stored in the traceroute field of the related metrics.

Note

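Browser dial testing is supported starting from DataKit Version-2.1.0.

Browser dial testing tasks (BROWSER) run by default on Linux dial testing nodes. To turn them off, set [inputs.dialtesting.browser].enabled = false. To run browser tasks, DataKit must be able to access Lightpanda. To cap peak resource usage, set [inputs.dialtesting.browser].max_concurrency to limit concurrency.

In Kubernetes, the datakit:<version> image with Lightpanda built in is recommended.

For more on deployment, task configuration, and troubleshooting, see Browser Dial Testing.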
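The comment on engine_path in the sample configuration says that an empty value makes the embedded browser runner fall back to LIGHTPANDA_EXECUTABLE_PATH or PATH. A minimal Python sketch of such a lookup is shown below. The function resolve_engine_path, the binary name lightpanda, and the order in which the environment variable and PATH are consulted are assumptions made for illustration, not DataKit's documented behavior.

```python
import os
import shutil
from typing import Mapping, Optional


def resolve_engine_path(engine_path: str,
                        env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Pick the engine executable: explicit config, then env var, then PATH."""
    if engine_path:
        return engine_path
    from_env = env.get("LIGHTPANDA_EXECUTABLE_PATH", "")
    if from_env:
        return from_env
    # Assumed binary name; shutil.which returns None when nothing is found.
    # If env has no PATH entry, shutil.which falls back to the process PATH.
    return shutil.which("lightpanda", path=env.get("PATH"))


print(resolve_engine_path("/usr/local/bin/lightpanda"))
```

A None result in this sketch would correspond to a node where Lightpanda still has to be installed or configured before browser tasks can run.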

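Dial Testing Node Deployment

The network deployment topology of dial testing nodes is shown below. There are two ways to deploy them:

  • Public dial testing nodes: use the dial testing nodes that Guance has deployed worldwide to check how services on the public network are running.
  • Private dial testing nodes: to dial-test services on a user's internal network, the user must deploy private dial testing nodes. If the network allows, these private nodes can also dial-test services on the public network.
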
Note

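When a dial testing node is deployed in an internal network and cannot reach the Internet, traffic can be forwarded through a proxy service. For configuration details, see the documentation on DataKit's built-in proxy.

For both public and private dial testing nodes, dial testing tasks are created through the web page.

If a dial testing node needs to run browser dial testing tasks, make sure its environment meets the following conditions:

  • Lightpanda is accessible to the DataKit process.
  • The node can reach the site under test, as well as the Dataway referenced by the task's post_url, which is used to report dial testing results.
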
graph TD
  %% node definitions
  dt_web(Dial Testing Web UI);
  dt_db(Public storage for dial testing tasks);
  dt_pub(DataKit public dial testing node);
  dt_pri(DataKit private dial testing node);
  site_inner(Internal site);
  site_pub(Public site);
  dw_inner(Internal Dataway);
  dw_pub(Public Dataway);
  server(Guance);

  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  dt_web -->|Create dial testing task| dt_db;
  dt_db -->|Pull dial testing tasks| dt_pub -->|Dial testing results| dw_pub --> server;
  dt_db -->|Pull dial testing tasks| dt_pri;
  dt_pub <-->|Run dial test| site_pub;

  dt_pri <-.->|Run dial test| site_pub;
  dw_inner --> server;
  subgraph "User intranet"
  dt_pri <-->|Run dial test| site_inner;
  dt_pri -->|Dial testing results| dw_inner;
  end
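Whether a node can reach the site under test and the Dataway can be checked with a plain TCP connection attempt before tasks are assigned to it. The Python sketch below shows one way to do that with the standard socket module. The function can_connect is hypothetical, and a successful TCP connect only shows that the port is reachable, not that the service behind it is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False
```

On a private node this could be called once for the host and port of the site under test and once for the host and port taken from the task's post_url.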

Logs

http_dial_testing

Tags & Fields Description
city
(tag)
The name of the city
country
(tag)
The name of the country
datakit_version
(tag)
The DataKit version
dest_ip
(tag)
The IP address of the destination
df_label
(tag)
The label of the task
internal
(tag)
The boolean value, true for domestic and false for overseas
isp
(tag)
ISP, such as chinamobile, chinaunicom, chinatelecom
method
(tag)
HTTP method, such as GET
name
(tag)
The name of the task
node_name
(tag)
The name of the node
owner
(tag)
The owner name
proto
(tag)
The HTTP protocol version, such as 'HTTP/1.1'
province
(tag)
The name of the province
status
(tag)
The status of the task, either 'OK' or 'FAIL'
status_code_class
(tag)
The class of the status code, such as '2xx'
status_code_string
(tag)
The status string, such as '200 OK'
url
(tag)
The URL of the endpoint to be monitored
config_vars The configuration variables of the task
Type: string | (gauge)
Unit: N/A
fail_reason The reason that leads to the failure of the task
Type: string | (gauge)
Unit: N/A
message The message string which includes the header and the body of the request or the response
Type: string | (gauge)
Unit: N/A
response_body_size The length of the body of the response
Type: int | (gauge)
Unit: digital,B
response_connection HTTP connection time
Type: float | (gauge)
Unit: time,μs
response_dns HTTP DNS resolution time
Type: float | (gauge)
Unit: time,μs
response_download HTTP downloading time
Type: float | (gauge)
Unit: time,μs
response_ssl HTTP SSL handshake time
Type: float | (gauge)
Unit: time,μs
response_time The time of the response
Type: int | (gauge)
Unit: time,μs
response_ttfb HTTP time to first byte (TTFB)
Type: float | (gauge)
Unit: time,μs
seq_number The sequence number of the test
Type: int | (gauge)
Unit: count
ssl_cert_expires_in_days The SSL certificate expires in days
Type: int | (gauge)
Unit: time,d
ssl_cert_not_after The SSL certificate not after time
Type: int | (gauge)
Unit: timeStamp,usec
status_code The response code
Type: int | (gauge)
Unit: N/A
success Whether the task succeeded: 1 for success, -1 for failure
Type: int | (gauge)
Unit: N/A
task The raw task string
Type: string | (gauge)
Unit: N/A
task_id The dialtesting task external ID
Type: string | (gauge)
Unit: N/A
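The http_dial_testing timing fields are reported in microseconds, and ssl_cert_not_after is a microsecond timestamp. The Python sketch below converts both into friendlier forms. It assumes the timestamp counts from the Unix epoch, which the field list does not state explicitly, and the sample values are made up for illustration.

```python
from datetime import datetime, timezone


def us_to_ms(value_us: float) -> float:
    """Convert a duration in microseconds to milliseconds."""
    return value_us / 1000.0


def cert_not_after_utc(ssl_cert_not_after_us: int) -> datetime:
    """Convert a microsecond timestamp (assumed Unix epoch) to an aware UTC datetime."""
    return datetime.fromtimestamp(ssl_cert_not_after_us / 1_000_000, tz=timezone.utc)


# Made-up sample values, not real dial testing output
point = {"response_time": 152_300, "ssl_cert_not_after": 1_767_225_600_000_000}
print(us_to_ms(point["response_time"]), "ms")
print(cert_not_after_utc(point["ssl_cert_not_after"]).isoformat())
```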

tcp_dial_testing

Tags & Fields Description
city
(tag)
The name of the city
country
(tag)
The name of the country
datakit_version
(tag)
The DataKit version
dest_host
(tag)
The name of the host to be monitored
dest_ip
(tag)
The IP address
dest_port
(tag)
The port of the TCP connection
df_label
(tag)
The label of the task
internal
(tag)
The boolean value, true for domestic and false for overseas
isp
(tag)
ISP, such as chinamobile, chinaunicom, chinatelecom
name
(tag)
The name of the task
node_name
(tag)
The name of the node
owner
(tag)
The owner name
proto
(tag)
The protocol of the task
province
(tag)
The name of the province
status
(tag)
The status of the task, either 'OK' or 'FAIL'
config_vars The configuration variables of the task
Type: string | (gauge)
Unit: N/A
fail_reason The reason that leads to the failure of the task
Type: string | (gauge)
Unit: N/A
message The message string includes the response time or fail reason
Type: string | (gauge)
Unit: N/A
response_time The time of the response
Type: int | (gauge)
Unit: time,μs
response_time_with_dns The time of the response, which contains DNS time
Type: int | (gauge)
Unit: time,μs
seq_number The sequence number of the test
Type: int | (gauge)
Unit: count
success Whether the task succeeded: 1 for success, -1 for failure
Type: int | (gauge)
Unit: N/A
task The raw task string
Type: string | (gauge)
Unit: N/A
task_id The dialtesting task external ID
Type: string | (gauge)
Unit: N/A
traceroute The JSON string of the traceroute result
Type: string | (gauge)
Unit: N/A

icmp_dial_testing

Tags & Fields Description
city
(tag)
The name of the city
country
(tag)
The name of the country
datakit_version
(tag)
The DataKit version
dest_host
(tag)
The name of the host to be monitored
df_label
(tag)
The label of the task
internal
(tag)
The boolean value, true for domestic and false for overseas
isp
(tag)
ISP, such as chinamobile, chinaunicom, chinatelecom
name
(tag)
The name of the task
node_name
(tag)
The name of the node
owner
(tag)
The owner name
proto
(tag)
The protocol of the task
province
(tag)
The name of the province
status
(tag)
The status of the task, either 'OK' or 'FAIL'
average_round_trip_time The average time of the round trip(RTT)
Type: float | (gauge)
Unit: time,μs
average_round_trip_time_in_millis The average time of the round trip(RTT), deprecated
Type: float | (gauge)
Unit: time,ms
config_vars The configuration variables of the task
Type: string | (gauge)
Unit: N/A
fail_reason The reason that leads to the failure of the task
Type: string | (gauge)
Unit: N/A
max_round_trip_time The maximum time of the round trip(RTT)
Type: float | (gauge)
Unit: time,μs
max_round_trip_time_in_millis The maximum time of the round trip(RTT), deprecated
Type: float | (gauge)
Unit: time,ms
message The message string includes the average time of the round trip or the failure reason
Type: string | (gauge)
Unit: N/A
min_round_trip_time The minimum time of the round trip(RTT)
Type: float | (gauge)
Unit: time,μs
min_round_trip_time_in_millis The minimum time of the round trip(RTT), deprecated
Type: float | (gauge)
Unit: time,ms
packet_loss_percent The loss percent of the packets
Type: float | (gauge)
Unit: percent,percent
packets_received The number of the packets received
Type: int | (gauge)
Unit: count
packets_sent The number of the packets sent
Type: int | (gauge)
Unit: count
seq_number The sequence number of the test
Type: int | (gauge)
Unit: count
std_round_trip_time The standard deviation of the round trip
Type: float | (gauge)
Unit: time,μs
std_round_trip_time_in_millis The standard deviation of the round trip, deprecated
Type: float | (gauge)
Unit: time,ms
success Whether the task succeeded: 1 for success, -1 for failure
Type: int | (gauge)
Unit: N/A
task The raw task string
Type: string | (gauge)
Unit: N/A
task_id The dialtesting task external ID
Type: string | (gauge)
Unit: N/A
traceroute The JSON string of the traceroute result
Type: string | (gauge)
Unit: N/A
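Several icmp_dial_testing fields are related by simple arithmetic: packet loss is conventionally the share of sent packets that were not received, and each deprecated *_in_millis field carries the same quantity as its microsecond counterpart in a different unit. The Python sketch below illustrates both relations. It uses the conventional definitions and is not taken from DataKit's source; the function names are hypothetical.

```python
def packet_loss_percent(packets_sent: int, packets_received: int) -> float:
    """Loss as a percentage of sent packets; 0.0 when nothing was sent."""
    if packets_sent <= 0:
        return 0.0
    return (packets_sent - packets_received) * 100.0 / packets_sent


def rtt_ms(rtt_us: float) -> float:
    """Convert a round-trip time from microseconds to milliseconds."""
    return rtt_us / 1000.0


print(packet_loss_percent(4, 3))
print(rtt_ms(12500.0))
```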

websocket_dial_testing

Tags & Fields Description
city
(tag)
The name of the city
country
(tag)
The name of the country
datakit_version
(tag)
The DataKit version
df_label
(tag)
The label of the task
internal
(tag)
The boolean value, true for domestic and false for overseas
isp
(tag)
ISP, such as chinamobile, chinaunicom, chinatelecom
name
(tag)
The name of the task
node_name
(tag)
The name of the node
owner
(tag)
The owner name
proto
(tag)
The protocol of the task
province
(tag)
The name of the province
status
(tag)
The status of the task, either 'OK' or 'FAIL'
url
(tag)
The URL string, such as ws://www.abc.com
config_vars The configuration variables of the task
Type: string | (gauge)
Unit: N/A
fail_reason The reason that leads to the failure of the task
Type: string | (gauge)
Unit: N/A
message The message string includes the response time or the failure reason
Type: string | (gauge)
Unit: N/A
response_message The message of the response
Type: string | (gauge)
Unit: N/A
response_time The time of the response
Type: int | (gauge)
Unit: time,μs
response_time_with_dns The time of the response, including DNS time
Type: int | (gauge)
Unit: time,μs
sent_message The sent message
Type: string | (gauge)
Unit: N/A
seq_number The sequence number of the test
Type: int | (gauge)
Unit: count
ssl_cert_expires_in_days The SSL certificate expires in days
Type: int | (gauge)
Unit: time,d
ssl_cert_not_after The SSL certificate not after time
Type: int | (gauge)
Unit: timeStamp,usec
success Whether the task succeeded: 1 for success, -1 for failure
Type: int | (gauge)
Unit: N/A
task The raw task string
Type: string | (gauge)
Unit: N/A
task_id The dialtesting task external ID
Type: string | (gauge)
Unit: N/A

multi_dial_testing

Tags & Fields Description
city
(tag)
The name of the city
country
(tag)
The name of the country
datakit_version
(tag)
The DataKit version
df_label
(tag)
The label of the task
internal
(tag)
The boolean value, true for domestic and false for overseas
isp
(tag)
ISP, such as chinamobile, chinaunicom, chinatelecom
name
(tag)
The name of the task
node_name
(tag)
The name of the node
owner
(tag)
The owner name
province
(tag)
The name of the province
status
(tag)
The status of the task, either 'OK' or 'FAIL'
config_vars The configuration variables of the task
Type: string | (gauge)
Unit: N/A
fail_reason The reason that leads to the failure of the task
Type: string | (gauge)
Unit: N/A
last_step The sequence number of the last step executed in the task
Type: int | (gauge)
Unit: count
message The message string which includes the header and the body of the request or the response
Type: string | (gauge)
Unit: N/A
response_time The time of the response
Type: int | (gauge)
Unit: time,μs
seq_number The sequence number of the test
Type: int | (gauge)
Unit: count
steps The result of each step
Type: string | (gauge)
Unit: N/A
success Whether the task succeeded: 1 for success, -1 for failure
Type: int | (gauge)
Unit: N/A
task The raw task string
Type: string | (gauge)
Unit: N/A
task_id The dialtesting task external ID
Type: string | (gauge)
Unit: N/A

grpc_dial_testing

Tags & Fields Description
city
(tag)
The name of the city
country
(tag)
The name of the country
datakit_version
(tag)
The DataKit version
dest_host
(tag)
The name of the host to be monitored
df_label
(tag)
The label of the task
internal
(tag)
The boolean value, true for domestic and false for overseas
isp
(tag)
ISP, such as chinamobile, chinaunicom, chinatelecom
method
(tag)
The gRPC method name
name
(tag)
The name of the task
node_name
(tag)
The name of the node
owner
(tag)
The owner name
proto
(tag)
The protocol of the task
province
(tag)
The name of the province
server
(tag)
The gRPC server address
status
(tag)
The status of the task, either 'OK' or 'FAIL'
config_vars The configuration variables of the task
Type: string | (gauge)
Unit: N/A
fail_reason The reason that leads to the failure of the task
Type: string | (gauge)
Unit: N/A
message The message string includes the response time or the failure reason
Type: string | (gauge)
Unit: N/A
response_time The time of the response
Type: int | (gauge)
Unit: time,μs
seq_number The sequence number of the test
Type: int | (gauge)
Unit: count
ssl_cert_expires_in_days The SSL certificate expires in days
Type: int | (gauge)
Unit: time,d
ssl_cert_not_after The SSL certificate not after time
Type: int | (gauge)
Unit: timeStamp,usec
success Whether the task succeeded: 1 for success, -1 for failure
Type: int | (gauge)
Unit: N/A
task The raw task string
Type: string | (gauge)
Unit: N/A
task_id The dialtesting task external ID
Type: string | (gauge)
Unit: N/A

browser_dial_testing

Tags & Fields Description
browser_engine
(tag)
The browser engine used to run the task
city
(tag)
The name of the city
country
(tag)
The name of the country
datakit_version
(tag)
The DataKit version
df_label
(tag)
The label of the task
internal
(tag)
The boolean value, true for domestic and false for overseas
isp
(tag)
ISP, such as chinamobile, chinaunicom, chinatelecom
name
(tag)
The name of the task
node_name
(tag)
The name of the node
owner
(tag)
The owner name
province
(tag)
The name of the province
status
(tag)
The status of the task, either 'OK' or 'FAIL'
url
(tag)
The URL of the page to be monitored
viewport
(tag)
The browser viewport size, such as 1920x1080
browser_config_vars The JSON string of variables defined in browser_config
Type: string | (gauge)
Unit: N/A
browser_run_id The browser run ID
Type: string | (gauge)
Unit: N/A
config_vars The configuration variables of the task
Type: string | (gauge)
Unit: N/A
fail_reason The reason that leads to the failure of the task
Type: string | (gauge)
Unit: N/A
has_screenshot Whether the browser run has uploaded screenshots
Type: bool | (gauge)
Unit: N/A
last_step The last browser step sequence number
Type: int | (gauge)
Unit: count
message The message string includes success message or failure reason
Type: string | (gauge)
Unit: N/A
response_time The browser run duration
Type: int | (gauge)
Unit: time,μs
retry_count The retry count of the browser run
Type: int | (gauge)
Unit: count
retry_records The JSON string of browser retry attempt records
Type: string | (gauge)
Unit: N/A
screenshot_upload_error The browser screenshot upload error
Type: string | (gauge)
Unit: N/A
seq_number The sequence number of the test
Type: int | (gauge)
Unit: count
steps The JSON string of browser step results
Type: string | (gauge)
Unit: N/A
success Whether the task succeeded: 1 for success, -1 for failure
Type: int | (gauge)
Unit: N/A
task_id The dialtesting task external ID
Type: string | (gauge)
Unit: N/A
trace_id The first trace ID captured during the browser run
Type: string | (gauge)
Unit: N/A
viewport_height The browser viewport height
Type: int | (gauge)
Unit: N/A
viewport_width The browser viewport width
Type: int | (gauge)
Unit: N/A

traceroute

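traceroute is the JSON text of the route tracing data. The whole value is an array, and each element of the array records the details of one route probe. An example follows: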

[
    {
        "total": 2,
        "failed": 0,
        "loss": 0,
        "avg_cost": 12700395,
        "min_cost": 11902041,
        "max_cost": 13498750,
        "std_cost": 1129043,
        "items": [
            {
                "ip": "10.8.9.1",
                "response_time": 13498750
            },
            {
                "ip": "10.8.9.1",
                "response_time": 11902041
            }
        ]
    },
    {
        "total": 2,
        "failed": 0,
        "loss": 0,
        "avg_cost": 13775021,
        "min_cost": 13740084,
        "max_cost": 13809959,
        "std_cost": 49409,
        "items": [
            {
                "ip": "10.12.168.218",
                "response_time": 13740084
            },
            {
                "ip": "10.12.168.218",
                "response_time": 13809959
            }
        ]
    }
]

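Field descriptions:

Field Type Description
total number Total number of probes
failed number Number of failed probes
loss number Failure percentage
avg_cost number Average time cost (μs)
min_cost number Minimum time cost (μs)
max_cost number Maximum time cost (μs)
std_cost number Standard deviation of time cost (μs)
items Array of Item Details of each probe (see the items field description below)

items field description

Field Type Description
ip string IP address; the value is * if the probe failed
response_time number Response time (μs)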
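Because the traceroute field is a JSON string, it has to be decoded before its elements can be inspected. The Python sketch below decodes it and produces a short summary per array element, skipping the * placeholder used for failed probes. The function summarize_traceroute and the shape of its output are hypothetical, and the sample is a trimmed copy of the example above.

```python
import json


def summarize_traceroute(raw: str) -> list:
    """Return one summary dict per element of a traceroute JSON string."""
    summary = []
    for index, element in enumerate(json.loads(raw), start=1):
        # "*" marks a failed probe, so it is not a real address
        ips = sorted({item["ip"] for item in element.get("items", [])
                      if item["ip"] != "*"})
        summary.append({
            "element": index,
            "ips": ips,
            "loss": element["loss"],
            "avg_cost": element["avg_cost"],
        })
    return summary


sample = """[
  {"total": 2, "failed": 0, "loss": 0, "avg_cost": 12700395,
   "min_cost": 11902041, "max_cost": 13498750, "std_cost": 1129043,
   "items": [{"ip": "10.8.9.1", "response_time": 13498750},
             {"ip": "10.8.9.1", "response_time": 11902041}]}
]"""
for row in summarize_traceroute(sample):
    print(row)
```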

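Dial Testing Collector Self-Metrics

The dial testing collector exposes Prometheus metrics. By default, the DataKit collector gathers these datakit_dialtesting_* metrics and reports them to Guance; no extra configuration is needed.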
