Network Dial Testing
This collector gathers network dial testing results. All data produced by dial tests is reported to Guance Cloud.
Configuration
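To deploy a private dial testing node, first create the private node on the Guance Cloud web page. After it is created, fill the information shown on that page into conf.d/network/dialtesting.conf: go to the conf.d/network directory under the DataKit installation directory, copy dialtesting.conf.sample, and name the copy dialtesting.conf. An example: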
[[inputs.dialtesting]]
# We can also configure a JSON path like "file:///your/dir/json-file-name"
server = "https://dflux-dial.guance.com"
# [required] node ID
region_id = "default"
# ak/sk are required if server is dflux-dial.guance.com
ak = ""
sk = ""
# The interval to pull the tasks.
pull_interval = "1m"
# The timeout for the HTTP request.
time_out = "30s"
# The number of the workers.
workers = 6
# Collect related metrics when the job execution time error interval is larger than task_exec_time_interval.
task_exec_time_interval = "5s"
# Stop the task when it fails to send data to Dataway more than max_send_fail_count times.
max_send_fail_count = 16
# The max sleep time when send data to dataway failed.
max_send_fail_sleep_time = "30m"
# The max number of jobs sending data to dataway in parallel. Default 10.
max_job_number = 10
# The capacity of the job channel. Default 1000.
max_job_chan_number = 1000
# Disable internal network task.
disable_internal_network_task = true
# CIDR list of internal networks that must not be dial tested.
disabled_internal_network_cidr_list = []
# Custom tags.
[inputs.dialtesting.tags]
# some_tag = "some_value"
# more_tag = "some_other_value"
# ...
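Before restarting, the values above can be sanity-checked. The Go sketch below is hypothetical: DialConfig and validate are illustrative names, not DataKit APIs, and it assumes the duration strings follow Go's time.ParseDuration syntax. It only encodes the rules stated in the sample comments: region_id is required, and ak/sk are required when server points at dflux-dial.guance.com.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"time"
)

// DialConfig mirrors a few [[inputs.dialtesting]] options.
// Hypothetical type for illustration; DataKit's own struct may differ.
type DialConfig struct {
	Server               string
	RegionID             string
	AK, SK               string
	PullInterval         string
	TimeOut              string
	MaxSendFailSleepTime string
}

// validate applies the rules stated in the sample comments and checks that
// the duration options parse, assuming Go time.ParseDuration syntax.
func validate(c DialConfig) error {
	if c.RegionID == "" {
		return errors.New("region_id (node ID) is required")
	}
	// ak/sk are required when server points at dflux-dial.guance.com.
	// A file:// server has an empty host, so it skips this check.
	if u, err := url.Parse(c.Server); err == nil && u.Hostname() == "dflux-dial.guance.com" {
		if c.AK == "" || c.SK == "" {
			return errors.New("ak and sk are required for dflux-dial.guance.com")
		}
	}
	durations := []struct{ name, value string }{
		{"pull_interval", c.PullInterval},
		{"time_out", c.TimeOut},
		{"max_send_fail_sleep_time", c.MaxSendFailSleepTime},
	}
	for _, d := range durations {
		if _, err := time.ParseDuration(d.value); err != nil {
			return fmt.Errorf("%s: %w", d.name, err)
		}
	}
	return nil
}
```

Calling validate with the sample values plus a non-empty ak/sk returns nil; leaving ak empty, or writing time_out as "30 seconds", returns an error.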
After configuring, restart DataKit.
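The collector configuration can be injected via ConfigMap, or the collector can be enabled by setting ENV_DATAKIT_INPUTS.
Configuration parameters can also be modified through environment variables (the collector must be added to ENV_DEFAULT_ENABLED_INPUTS as a default collector):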
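- ENV_INPUT_DIALTESTING_ENV_INPUT_DIALTESTING_DISABLE_INTERNAL_NETWORK_TASK
  Whether to disable dial testing of internal network addresses/services (disabled by default)
  Field type: Boolean
  Collector config field: disable_internal_network_task
  Example: true
  Default: false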
- ENV_INPUT_DIALTESTING_ENV_INPUT_DIALTESTING_DISABLED_INTERNAL_NETWORK_CIDR_LIST
  List of CIDR ranges that must not be dial tested
  Field type: List
  Collector config field: disabled_internal_network_cidr_list
  Example: ["192.168.0.0/16"]
  Default: -
- ENV_INPUT_DIALTESTING_ENV_INPUT_DIALTESTING_ENABLE_DEBUG_API
  Whether to enable the dial testing debug API (disabled by default)
  Field type: Boolean
  Collector config field: env_input_dialtesting_enable_debug_api
  Example: false
  Default: false
Note
Currently only dial testing nodes running on Linux support traceroute. The traceroute data is stored in the traceroute field of the related metrics.
Dial Testing Node Deployment
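The following diagram shows the network deployment topology of dial testing nodes. There are two ways to deploy them:
- Public dial testing nodes: use the dial testing nodes that Guance Cloud deploys around the world to check how services on the public network are running.
- Private dial testing nodes: to dial test services on the user's internal network, users need to deploy their own private dial testing nodes. If the network allows, these private nodes can also dial test services on the public network.
For both public and private dial testing nodes, dial testing tasks are created through the web page.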
graph TD
%% node definitions
dt_web(Dial testing Web UI);
dt_db(Public dial testing task storage);
dt_pub(DataKit public dial testing node);
dt_pri(DataKit private dial testing node);
site_inner(Intranet site);
site_pub(Public site);
dw_inner(Intranet Dataway);
dw_pub(Public Dataway);
server(Guance Cloud);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
dt_web -->|Create dial testing tasks| dt_db;
dt_db -->|Pull dial testing tasks| dt_pub -->|Dial testing results| dw_pub --> server;
dt_db -->|Pull dial testing tasks| dt_pri;
dt_pub <-->|Run dial tests| site_pub;
dt_pri <-.->|Run dial tests| site_pub;
dw_inner --> server;
subgraph "User intranet"
dt_pri <-->|Run dial tests| site_inner;
dt_pri -->|Dial testing results| dw_inner;
end
Logs
http_dial_testing
- Tags
Tag | Description |
---|---|
city | The name of the city |
country | The name of the country |
datakit_version | The DataKit version |
dest_ip | The IP address of the destination |
df_label | The label of the task |
internal | The boolean value, true for domestic and false for overseas |
isp | ISP, such as chinamobile, chinaunicom, chinatelecom |
method | HTTP method, such as GET |
name | The name of the task |
node_name | The name of the node |
owner | The owner name |
proto | The HTTP protocol version, such as 'HTTP/1.1' |
province | The name of the province |
status | The status of the task, either 'OK' or 'FAIL' |
status_code_class | The class of the status code, such as '2xx' |
status_code_string | The status string, such as '200 OK' |
url | The URL of the endpoint to be monitored |
- Metrics
Metric | Description |
---|---|
config_vars | The configuration variables of the task Type: string Unit: - |
fail_reason | The reason that leads to the failure of the task Type: string Unit: - |
message | The message string which includes the header and the body of the request or the response Type: string Unit: - |
response_body_size | The length of the body of the response Type: int Unit: digital,B |
response_connection | HTTP connection time Type: float Unit: time,μs |
response_dns | HTTP DNS parsing time Type: float Unit: time,μs |
response_download | HTTP downloading time Type: float Unit: time,μs |
response_ssl | HTTP SSL handshake time Type: float Unit: time,μs |
response_time | The time of the response Type: int Unit: time,μs |
response_ttfb | HTTP time to first byte (TTFB) Type: float Unit: time,μs |
seq_number | The sequence number of the test Type: int Unit: count |
status_code | The response code Type: int Unit: - |
success | Whether the test succeeded: 1 for success, -1 for failure Type: int Unit: - |
task | The raw task string Type: string Unit: - |
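The status_code_class, status_code_string, success and status values follow simple formats, which this Go sketch reproduces for illustration. The helper names are hypothetical and this is not DataKit's implementation; note that in Go, http.Response.Status already has the "200 OK" form.

```go
package main

import (
	"fmt"
	"net/http"
)

// statusCodeClass formats a status code as its class, e.g. 200 -> "2xx".
func statusCodeClass(code int) string {
	return fmt.Sprintf("%dxx", code/100)
}

// statusCodeString pairs a code with its standard reason phrase, e.g. "200 OK".
func statusCodeString(code int) string {
	return fmt.Sprintf("%d %s", code, http.StatusText(code))
}

// successValue maps a pass/fail outcome to the success field (1 or -1)
// and the status tag ("OK" or "FAIL").
func successValue(passed bool) (int, string) {
	if passed {
		return 1, "OK"
	}
	return -1, "FAIL"
}
```

For example, a 404 response yields the class "4xx" and the string "404 Not Found".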
tcp_dial_testing
- Tags
Tag | Description |
---|---|
city | The name of the city |
country | The name of the country |
datakit_version | The DataKit version |
dest_host | The name of the host to be monitored |
dest_ip | The IP address of the destination |
dest_port | The port of the TCP connection |
df_label | The label of the task |
internal | The boolean value, true for domestic and false for overseas |
isp | ISP, such as chinamobile, chinaunicom, chinatelecom |
name | The name of the task |
node_name | The name of the node |
owner | The owner name |
proto | The protocol of the task |
province | The name of the province |
status | The status of the task, either 'OK' or 'FAIL' |
- Metrics
Metric | Description |
---|---|
config_vars | The configuration variables of the task Type: string Unit: - |
fail_reason | The reason that leads to the failure of the task Type: string Unit: - |
message | The message string, which includes the response time or the failure reason Type: string Unit: - |
response_time | The time of the response Type: int Unit: time,μs |
response_time_with_dns | The time of the response, which contains DNS time Type: int Unit: time,μs |
seq_number | The sequence number of the test Type: int Unit: count |
success | Whether the test succeeded: 1 for success, -1 for failure Type: int Unit: - |
task | The raw task string Type: string Unit: - |
traceroute | The JSON string of the traceroute result Type: string Unit: - |
icmp_dial_testing
- Tags
Tag | Description |
---|---|
city | The name of the city |
country | The name of the country |
datakit_version | The DataKit version |
dest_host | The name of the host to be monitored |
df_label | The label of the task |
internal | The boolean value, true for domestic and false for overseas |
isp | ISP, such as chinamobile, chinaunicom, chinatelecom |
name | The name of the task |
node_name | The name of the node |
owner | The owner name |
proto | The protocol of the task |
province | The name of the province |
status | The status of the task, either 'OK' or 'FAIL' |
- Metrics
Metric | Description |
---|---|
average_round_trip_time | The average round trip time (RTT) Type: float Unit: time,μs |
average_round_trip_time_in_millis | The average round trip time (RTT), deprecated Type: float Unit: time,ms |
config_vars | The configuration variables of the task Type: string Unit: - |
fail_reason | The reason that leads to the failure of the task Type: string Unit: - |
max_round_trip_time | The maximum round trip time (RTT) Type: float Unit: time,μs |
max_round_trip_time_in_millis | The maximum round trip time (RTT), deprecated Type: float Unit: time,ms |
message | The message string, which includes the average round trip time or the failure reason Type: string Unit: - |
min_round_trip_time | The minimum round trip time (RTT) Type: float Unit: time,μs |
min_round_trip_time_in_millis | The minimum round trip time (RTT), deprecated Type: float Unit: time,ms |
packet_loss_percent | The percentage of packets lost Type: float Unit: - |
packets_received | The number of the packets received Type: int Unit: count |
packets_sent | The number of the packets sent Type: int Unit: count |
seq_number | The sequence number of the test Type: int Unit: count |
std_round_trip_time | The standard deviation of the round trip time Type: float Unit: time,μs |
std_round_trip_time_in_millis | The standard deviation of the round trip time, deprecated Type: float Unit: time,ms |
success | Whether the test succeeded: 1 for success, -1 for failure Type: int Unit: - |
task | The raw task string Type: string Unit: - |
traceroute | The JSON string of the traceroute result Type: string Unit: - |
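The packet and RTT fields relate as in the following Go sketch, which derives packet_loss_percent, the average/min/max RTT, and the deprecated millisecond variant from the RTTs (in μs) of the replies received. summarize and ICMPStats are hypothetical names, and the loss formula (lost packets as a percentage of packets sent) is an assumption consistent with the field names rather than a documented definition.

```go
package main

import "math"

// ICMPStats mirrors a subset of the icmp_dial_testing fields (hypothetical type).
type ICMPStats struct {
	PacketsSent, PacketsReceived int
	PacketLossPercent            float64
	AvgRTT, MinRTT, MaxRTT       float64 // microseconds
	AvgRTTMillis                 float64 // deprecated *_in_millis counterpart
}

// summarize derives the fields from the RTTs (in μs) of the replies received.
func summarize(sent int, rtts []float64) ICMPStats {
	s := ICMPStats{PacketsSent: sent, PacketsReceived: len(rtts)}
	if sent > 0 {
		// Assumed formula: lost packets as a percentage of packets sent.
		s.PacketLossPercent = float64(sent-len(rtts)) / float64(sent) * 100
	}
	if len(rtts) == 0 {
		return s // nothing received: RTT fields stay zero
	}
	s.MinRTT, s.MaxRTT = rtts[0], rtts[0]
	var sum float64
	for _, v := range rtts {
		sum += v
		s.MinRTT = math.Min(s.MinRTT, v)
		s.MaxRTT = math.Max(s.MaxRTT, v)
	}
	s.AvgRTT = sum / float64(len(rtts))
	s.AvgRTTMillis = s.AvgRTT / 1000 // μs -> ms
	return s
}
```

For 4 packets sent and replies of 1000, 2000 and 3000 μs, this gives a 25% loss and an average RTT of 2000 μs (2 ms).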
websocket_dial_testing
- Tags
Tag | Description |
---|---|
city | The name of the city |
country | The name of the country |
datakit_version | The DataKit version |
df_label | The label of the task |
internal | The boolean value, true for domestic and false for overseas |
isp | ISP, such as chinamobile, chinaunicom, chinatelecom |
name | The name of the task |
node_name | The name of the node |
owner | The owner name |
proto | The protocol of the task |
province | The name of the province |
status | The status of the task, either 'OK' or 'FAIL' |
url | The URL string, such as ws://www.abc.com |
- Metrics
Metric | Description |
---|---|
config_vars | The configuration variables of the task Type: string Unit: - |
fail_reason | The reason that leads to the failure of the task Type: string Unit: - |
message | The message string, which includes the response time or the failure reason Type: string Unit: - |
response_message | The message of the response Type: string Unit: - |
response_time | The time of the response Type: int Unit: time,μs |
response_time_with_dns | The time of the response, including DNS time Type: int Unit: time,μs |
sent_message | The sent message Type: string Unit: - |
seq_number | The sequence number of the test Type: int Unit: count |
success | Whether the test succeeded: 1 for success, -1 for failure Type: int Unit: - |
task | The raw task string Type: string Unit: - |
multi_dial_testing
- Tags
Tag | Description |
---|---|
city | The name of the city |
country | The name of the country |
datakit_version | The DataKit version |
df_label | The label of the task |
internal | The boolean value, true for domestic and false for overseas |
isp | ISP, such as chinamobile, chinaunicom, chinatelecom |
name | The name of the task |
node_name | The name of the node |
owner | The owner name |
province | The name of the province |
status | The status of the task, either 'OK' or 'FAIL' |
- Metrics
Metric | Description |
---|---|
config_vars | The configuration variables of the task Type: string Unit: - |
fail_reason | The reason that leads to the failure of the task Type: string Unit: - |
last_step | The number of the last executed step of the task Type: int Unit: - |
message | The message string which includes the header and the body of the request or the response Type: string Unit: - |
response_time | The time of the response Type: int Unit: time,μs |
seq_number | The sequence number of the test Type: int Unit: count |
steps | The result of each step Type: string Unit: - |
success | Whether the test succeeded: 1 for success, -1 for failure Type: int Unit: - |
task | The raw task string Type: string Unit: - |
traceroute
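traceroute is the JSON text of the traceroute data. The whole value is an array, and each array element records the results of one route probe (one hop along the path). Example: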
[
{
"total": 2,
"failed": 0,
"loss": 0,
"avg_cost": 12700395,
"min_cost": 11902041,
"max_cost": 13498750,
"std_cost": 1129043,
"items": [
{
"ip": "10.8.9.1",
"response_time": 13498750
},
{
"ip": "10.8.9.1",
"response_time": 11902041
}
]
},
{
"total": 2,
"failed": 0,
"loss": 0,
"avg_cost": 13775021,
"min_cost": 13740084,
"max_cost": 13809959,
"std_cost": 49409,
"items": [
{
"ip": "10.12.168.218",
"response_time": 13740084
},
{
"ip": "10.12.168.218",
"response_time": 13809959
}
]
}
]
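Field descriptions:
Field | Type | Description |
---|---|---|
total | number | Total number of probes |
failed | number | Number of failed probes |
loss | number | Failure percentage |
avg_cost | number | Average time (μs) |
min_cost | number | Minimum time (μs) |
max_cost | number | Maximum time (μs) |
std_cost | number | Standard deviation of the time (μs) |
items | Array of Item | Information on each probe (see the items field description below) |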
items field description:
Field | Type | Description |
---|---|---|
ip | string | IP address; * if the probe failed |
response_time | number | Response time (μs) |
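To consume the traceroute field, it can be decoded into structs that mirror the tables above. This Go sketch (the type and function names are hypothetical) also recomputes a hop's statistics from its items. For the example data, the sample standard deviation (dividing by n-1), truncated to an integer, reproduces std_cost, and the truncated mean reproduces avg_cost, so that is what it uses. Skipping items whose ip is * assumes failed probes carry no meaningful response_time.

```go
package main

import (
	"encoding/json"
	"math"
)

// Item is one probe inside a hop.
type Item struct {
	IP           string  `json:"ip"`
	ResponseTime float64 `json:"response_time"`
}

// Hop is one element of the traceroute array.
type Hop struct {
	Total   int     `json:"total"`
	Failed  int     `json:"failed"`
	Loss    float64 `json:"loss"`
	AvgCost float64 `json:"avg_cost"`
	MinCost float64 `json:"min_cost"`
	MaxCost float64 `json:"max_cost"`
	StdCost float64 `json:"std_cost"`
	Items   []Item  `json:"items"`
}

// parseTraceroute decodes the value of the traceroute field.
func parseTraceroute(s string) ([]Hop, error) {
	var hops []Hop
	err := json.Unmarshal([]byte(s), &hops)
	return hops, err
}

// recompute derives mean/min/max/sample std dev from successful items.
func recompute(h Hop) (mean, lo, hi, sd float64) {
	var rts []float64
	for _, it := range h.Items {
		if it.IP != "*" { // assumption: "*" marks a failed probe
			rts = append(rts, it.ResponseTime)
		}
	}
	if len(rts) == 0 {
		return
	}
	lo, hi = rts[0], rts[0]
	var sum float64
	for _, v := range rts {
		sum += v
		lo = math.Min(lo, v)
		hi = math.Max(hi, v)
	}
	mean = sum / float64(len(rts))
	if len(rts) > 1 {
		var sq float64
		for _, v := range rts {
			sq += (v - mean) * (v - mean)
		}
		sd = math.Sqrt(sq / float64(len(rts)-1)) // sample std dev (n-1)
	}
	return
}
```

For the first hop of the example, int64(mean) is 12700395 and int64(sd) is 1129043, matching avg_cost and std_cost.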
Collecting the Dial Testing Collector's Own Metrics
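The dial testing collector exposes Prometheus metrics. To report these metrics to Guance Cloud, collect them with a DataKit collector; refer to the following configuration: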