网络拨测
该采集器是网络拨测结果数据采集,所有拨测产生的数据,上报观测云。
配置¶
私有拨测节点部署,需在 观测云页面创建私有拨测节点。创建完成后,将页面上相关信息填入 conf.d/network/dialtesting.conf
即可:
进入 DataKit 安装目录下的 conf.d/network
目录,复制 dialtesting.conf.sample
并命名为 dialtesting.conf
。示例如下:
[[inputs.dialtesting]]
# We can also configure a JSON path like "file:///your/dir/json-file-name"
server = "https://dflux-dial.guance.com"
# [require] node ID
region_id = "default"
# if server are dflux-dial.guance.com, ak/sk required
ak = ""
sk = ""
# The interval to pull the tasks.
pull_interval = "1m"
# The timeout for the HTTP request.
time_out = "30s"
# The number of the workers.
workers = 6
# Collect related metric when job execution time error interval is larger than task_exec_time_interval
task_exec_time_interval = "5s"
# Stop the task when the task failed to send data to dataway over max_send_fail_count.
max_send_fail_count = 16
# The max sleep time when send data to dataway failed.
max_send_fail_sleep_time = "30m"
# The max number of jobs sending data to dataway in parallel. Default 10.
max_job_number = 10
# The max number of job chan. Default 1000.
max_job_chan_number = 1000
# Disable internal network task.
disable_internal_network_task = true
# Disable internal network cidr list.
disabled_internal_network_cidr_list = []
# Custom tags.
[inputs.dialtesting.tags]
# some_tag = "some_value"
# more_tag = "some_other_value"
# ...
配置好后,重启 DataKit 即可。
可通过 ConfigMap 方式注入采集器配置 或 配置 ENV_DATAKIT_INPUTS 开启采集器。
也支持以环境变量的方式修改配置参数(需要在 ENV_DEFAULT_ENABLED_INPUTS 中加为默认采集器):
-
ENV_INPUT_DIALTESTING_ENV_INPUT_DIALTESTING_DISABLE_INTERNAL_NETWORK_TASK
是否允许内网地址/服务的拨测。默认不允许
字段类型: Boolean
采集器配置字段:
disable_internal_network_task
示例:
true
默认值:
false
-
ENV_INPUT_DIALTESTING_ENV_INPUT_DIALTESTING_DISABLED_INTERNAL_NETWORK_CIDR_LIST
禁止拨测的 CIDR 地址列表
字段类型: List
采集器配置字段:
disabled_internal_network_cidr_list
示例:
["192.168.0.0/16"]
默认值:
-
-
ENV_INPUT_DIALTESTING_ENV_INPUT_DIALTESTING_ENABLE_DEBUG_API
禁止拨测调试接口(默认禁止)
字段类型: Boolean
采集器配置字段:
env_input_dialtesting_enable_debug_api
示例:
false
默认值:
false
Attention
目前只有 Linux 的拨测节点才支持「路由跟踪」,跟踪数据会保存在相关指标的 traceroute
字段中。
拨测节点部署¶
以下是拨测节点的网络部署拓扑图,这里存在两种拨测节点部署方式:
- 公网拨测节点:直接使用观测云在全球部署的拨测节点来检测 公网 的服务运行情况。
- 私网拨测节点:如果需要拨测用户 内网 的服务,此时需要用户自行部署 私有 的拨测节点。当让,如果网络允许,这些私有的拨测节点也能部署公网上的服务。
不管是公网拨测节点,还是私有拨测节点,它们都能通过 Web 页面创建拨测任务。
graph TD
%% node definitions
dt_web(拨测 Web UI);
dt_db(拨测任务公网存储);
dt_pub(Datakit 公网拨测节点);
dt_pri(Datakit 私有拨测节点);
site_inner(内网站点);
site_pub(公网站点);
dw_inner(内网 Dataway);
dw_pub(公网 Dataway);
guance(观测云);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
dt_web -->|创建拨测任务| dt_db;
dt_db -->|拉取拨测任务| dt_pub -->|拨测结果| dw_pub --> guance;
dt_db -->|拉取拨测任务| dt_pri;
dt_pub <-->|实施拨测| site_pub;
dt_pri <-.->|实施拨测| site_pub;
dw_inner --> guance;
subgraph "用户内网"
dt_pri <-->|实施拨测| site_inner;
dt_pri -->|拨测结果| dw_inner;
end
日志¶
http_dial_testing
¶
- 标签
Tag | Description |
---|---|
city |
The name of the city |
country |
The name of the country |
datakit_version |
The DataKit version |
dest_ip |
The IP address of the destination |
df_label |
The label of the task |
internal |
The boolean value, true for domestic and false for overseas |
isp |
ISP, such as chinamobile , chinaunicom , chinatelecom |
method |
HTTP method, such as GET |
name |
The name of the task |
node_name |
The name of the node |
owner |
The owner name |
proto |
The protocol of the HTTP, such as 'HTTP/1.1' |
province |
The name of the province |
status |
The status of the task, either 'OK' or 'FAIL' |
status_code_class |
The class of the status code, such as '2xx' |
status_code_string |
The status string, such as '200 OK' |
url |
The URL of the endpoint to be monitored |
- 指标列表
Metric | Description | Type | Unit |
---|---|---|---|
fail_reason |
The reason that leads to the failure of the task | string | - |
message |
The message string which includes the header and the body of the request or the response | string | - |
response_body_size |
The length of the body of the response | int | B |
response_connection |
HTTP connection time | float | μs |
response_dns |
HTTP DNS parsing time | float | μs |
response_download |
HTTP downloading time | float | μs |
response_ssl |
HTTP ssl handshake time | float | μs |
response_time |
The time of the response | int | μs |
response_ttfb |
HTTP response ttfb |
float | μs |
seq_number |
The sequence number of the test | int | count |
status_code |
The response code | int | - |
success |
The number to specify whether is successful, 1 for success, -1 for failure | int | - |
tcp_dial_testing
¶
- 标签
Tag | Description |
---|---|
city |
The name of the city |
country |
The name of the country |
datakit_version |
The DataKit version |
dest_host |
The name of the host to be monitored |
dest_ip |
The IP address |
dest_port |
The port of the TCP connection |
df_label |
The label of the task |
internal |
The boolean value, true for domestic and false for overseas |
isp |
ISP, such as chinamobile , chinaunicom , chinatelecom |
name |
The name of the task |
node_name |
The name of the node |
owner |
The owner name |
proto |
The protocol of the task |
province |
The name of the province |
status |
The status of the task, either 'OK' or 'FAIL' |
- 指标列表
Metric | Description | Type | Unit |
---|---|---|---|
fail_reason |
The reason that leads to the failure of the task | string | - |
message |
The message string includes the response time or fail reason | string | - |
response_time |
The time of the response | int | μs |
response_time_with_dns |
The time of the response, which contains DNS time | int | μs |
seq_number |
The sequence number of the test | int | count |
success |
The number to specify whether is successful, 1 for success, -1 for failure | int | - |
traceroute |
The json string fo the traceroute result |
string | - |
icmp_dial_testing
¶
- 标签
Tag | Description |
---|---|
city |
The name of the city |
country |
The name of the country |
datakit_version |
The DataKit version |
dest_host |
The name of the host to be monitored |
df_label |
The label of the task |
internal |
The boolean value, true for domestic and false for overseas |
isp |
ISP, such as chinamobile , chinaunicom , chinatelecom |
name |
The name of the task |
node_name |
The name of the node |
owner |
The owner name |
proto |
The protocol of the task |
province |
The name of the province |
status |
The status of the task, either 'OK' or 'FAIL' |
- 指标列表
Metric | Description | Type | Unit |
---|---|---|---|
average_round_trip_time |
The average time of the round trip(RTT) | float | μs |
average_round_trip_time_in_millis |
The average time of the round trip(RTT), deprecated | float | ms |
fail_reason |
The reason that leads to the failure of the task | string | - |
max_round_trip_time |
The maximum time of the round trip(RTT) | float | μs |
max_round_trip_time_in_millis |
The maximum time of the round trip(RTT), deprecated | float | ms |
message |
The message string includes the average time of the round trip or the failure reason | string | - |
min_round_trip_time |
The minimum time of the round trip(RTT) | float | μs |
min_round_trip_time_in_millis |
The minimum time of the round trip(RTT), deprecated | float | ms |
packet_loss_percent |
The loss percent of the packets | float | - |
packets_received |
The number of the packets received | int | count |
packets_sent |
The number of the packets sent | int | count |
seq_number |
The sequence number of the test | int | count |
std_round_trip_time |
The standard deviation of the round trip | float | μs |
std_round_trip_time_in_millis |
The standard deviation of the round trip, deprecated | float | ms |
success |
The number to specify whether is successful, 1 for success, -1 for failure | int | - |
traceroute |
The json string fo the traceroute result |
string | - |
websocket_dial_testing
¶
- 标签
Tag | Description |
---|---|
city |
The name of the city |
country |
The name of the country |
datakit_version |
The DataKit version |
df_label |
The label of the task |
internal |
The boolean value, true for domestic and false for overseas |
isp |
ISP, such as chinamobile , chinaunicom , chinatelecom |
name |
The name of the task |
node_name |
The name of the node |
owner |
The owner name |
proto |
The protocol of the task |
province |
The name of the province |
status |
The status of the task, either 'OK' or 'FAIL' |
url |
The URL string, such as ws://www.abc.com |
- 指标列表
Metric | Description | Type | Unit |
---|---|---|---|
fail_reason |
The reason that leads to the failure of the task | string | - |
message |
The message string includes the response time or the failure reason | string | - |
response_message |
The message of the response | string | - |
response_time |
The time of the response | int | μs |
response_time_with_dns |
The time of the response, include DNS | int | μs |
sent_message |
The sent message | string | - |
seq_number |
The sequence number of the test | int | count |
success |
The number to specify whether is successful, 1 for success, -1 for failure | int | - |
traceroute
¶
traceroute
是「路由跟踪」数据的 JSON 文本,整个数据是一个数组对象,对象中的每个数组元素记录了一次路由探测的相关情况,示例如下:
[
{
"total": 2,
"failed": 0,
"loss": 0,
"avg_cost": 12700395,
"min_cost": 11902041,
"max_cost": 13498750,
"std_cost": 1129043,
"items": [
{
"ip": "10.8.9.1",
"response_time": 13498750
},
{
"ip": "10.8.9.1",
"response_time": 11902041
}
]
},
{
"total": 2,
"failed": 0,
"loss": 0,
"avg_cost": 13775021,
"min_cost": 13740084,
"max_cost": 13809959,
"std_cost": 49409,
"items": [
{
"ip": "10.12.168.218",
"response_time": 13740084
},
{
"ip": "10.12.168.218",
"response_time": 13809959
}
]
}
]
字段描述:
字段 | 类型 | 说明 |
---|---|---|
total |
number | 总探测次数 |
failed |
number | 失败次数 |
loss |
number | 失败百分比 |
avg_cost |
number | 平均耗时(μs) |
min_cost |
number | 最小耗时(μs) |
max_cost |
number | 最大耗时(μs) |
std_cost |
number | 耗时标准差(μs) |
items |
Item 的 Array | 每次探测信息(详见下面 items 字段说明) |
items
字段说明
字段 | 类型 | 说明 |
---|---|---|
ip |
string | IP 地址,如果失败,值为 * |
response_time |
number | 响应时间(μs) |
拨测采集器自身指标采集¶
拨测采集器会暴露 Prometheus 指标,如果需要上报这些指标至观测云,可以通过 DataKit 采集器 进行采集,相关配置参考如下: