Troubleshooting Availability Monitoring¶
Managing Self-built Nodes for Synthetic Tests: "Name or service not known"¶
Problem Overview¶
The "Name or service not known" error appears when managing self-built nodes for Synthetic Tests.
Error Cause¶
- Due to load balancing issues with some cloud providers, services cannot access their own ingress domain.
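One way to confirm this cause is to check, from inside the affected environment, whether the ingress domain resolves at all. The following is a minimal Python sketch and not part of the product; the helper name `can_resolve` is an illustrative assumption. On Linux, a failed lookup is normally reported by Python as `socket.gaierror` with the message "Name or service not known".

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves, False if the lookup fails.

    `resolver` defaults to socket.getaddrinfo and is injectable only so the
    check can be exercised without real DNS (an assumption of this sketch).
    """
    try:
        resolver(hostname, None)
        return True
    except socket.gaierror:
        # On Linux this is typically "[Errno -2] Name or service not known".
        return False
```

Calling `can_resolve()` with your own ingress domain from a pod or host that shows the error should return `False` if name resolution is the problem. A `True` result suggests looking at connectivity rather than DNS.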
Steps¶
1. Open Launcher and Modify Application Configuration¶
Access the Launcher service and click Modify Application Configuration in the top-right corner.
2. Add Parameters¶
Edit the internal_server parameter in the core ConfigMap under the forethought-core namespace:
3. Automatically Restart Related Services After Modifying Configuration¶
Select the option to automatically restart related services after modifying the configuration.
Data Discontinuity in Synthetic Tests Explorer¶
Problem Overview¶
This section describes how to troubleshoot data discontinuity issues in the Synthetic Tests Explorer.
Flowchart¶
Troubleshooting Approach¶
Step One: Verify Configuration¶
First, check that the configuration files are correct:
- Verify the ConfigMap named core under the forethought-core Namespace:
# Synthetic Tests service
DialingServer:
  # Address configuration for the Synthetic Tests center
  use_https: true
  port: 443 ## Modify based on actual conditions
  host: 'dflux-dial.guance.com' ## Modify based on actual conditions
  timeout: 10
dflux-dial.guance.com is the official Synthetic Tests center. If switching to a private Synthetic Tests center, please refer to the ingress configuration.
- Verify the ConfigMap named dialtesting-config under the utils Namespace:
global:
  enable_inner_api: false
  stats_on: 256
  listen: ":9538"
  sys_external_id: "ak_R5Fxxxxxxxxx8Go8-wksp_system"
sys_external_id is the uuid and the external_id from the aksk table below, joined with a hyphen (in this example, ak_R5Fxxxxxxxxx8Go8 and wksp_system).
- Confirm the data in the aksk table within the MySQL df_dialtesting database:
id | uuid | accessKey | secretKey | owner | parent_ak | external_id | status | version | createAt | updateAt |
---|---|---|---|---|---|---|---|---|---|---|
1 | ak_R5Fxxxxxxxxx8Go8 | asjTxxxxxxxxxxxxxXMJ | zeiX99gxxxxxxxxxxxxxxxx2h5 | system | -1 | wksp_system | OK | 0 | 1686218468 | 1686218468 |
- Compare with the data in the main_config table in the df_core database where keyCode is DialingServerSet:
id | keyCode | description | value |
---|---|---|---|
6 | DialingServerSet | Synthetic Tests service configuration | "{\"ak\": \"asjTxxxxxxxxxxxxxXMJ\", \"sk\": \"zeiX99gxxxxxxxxxxxxxxxx2h5\", \"dataway\": \"http://deploy-openway.dataflux.cn?token={}\"}" |
- Compare with the data in the dialServiceAK module in launcher:
Steps: Log in to the launcher interface ---> Top-right button ---> Others
Correspondence table:
Key name in aksk table | Key name in dialServiceAK module of launcher others configuration |
---|---|
uuid | ak_id |
accessKey | ak |
secretKey | sk |
If any values are inconsistent, treat the database as authoritative and update the launcher configuration to match it.
- If you have switched the Synthetic Tests center or made modifications, reactivate the license on the launcher page to rewrite the information.
No need to change the license configuration; just reactivate it.
Confirm that the data gateway address matches the example format exactly. token={} does not need to be modified.
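The comparisons in this step can be sketched as a single consistency check. This is a hedged illustration only: the function name `check_dial_config` and its dictionary inputs are assumptions, and the values must be collected by hand from the aksk table, the main_config row, the launcher dialServiceAK module, and the dialtesting-config ConfigMap. The sketch also assumes the main_config value may be stored either as a JSON object or as a JSON-encoded string, as it is displayed in the table above.

```python
import json


def check_dial_config(aksk_row, main_config_value, launcher_ak, sys_external_id):
    """Return a list of mismatches; an empty list means everything agrees."""
    problems = []

    # main_config.value holds a JSON object with "ak" and "sk" keys. It may be
    # wrapped in an extra layer of JSON string encoding, so decode twice if needed.
    core = json.loads(main_config_value)
    if isinstance(core, str):
        core = json.loads(core)
    if core.get("ak") != aksk_row["accessKey"]:
        problems.append("main_config ak != aksk.accessKey")
    if core.get("sk") != aksk_row["secretKey"]:
        problems.append("main_config sk != aksk.secretKey")

    # Launcher key -> aksk column, following the correspondence table.
    mapping = {"ak_id": "uuid", "ak": "accessKey", "sk": "secretKey"}
    for launcher_key, column in mapping.items():
        if launcher_ak.get(launcher_key) != aksk_row[column]:
            problems.append(f"launcher {launcher_key} != aksk.{column}")

    # sys_external_id is uuid and external_id joined with a hyphen.
    expected = f"{aksk_row['uuid']}-{aksk_row['external_id']}"
    if sys_external_id != expected:
        problems.append(f"sys_external_id != {expected}")

    return problems
```

Each entry in the returned list names the pair of values that disagree, which indicates where to correct the configuration.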
Step Two: Confirm Communication¶
Use the ping command on the Synthetic Tests node machine to confirm communication with the Synthetic Tests center and DataWay.
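Because ping uses ICMP, it can succeed while the service port is blocked, or fail while the service is actually reachable. As a complement (a different technique from ping, not a replacement for it), the following Python sketch tries a TCP connection to the host and port taken from a DialingServer mapping. The function names are illustrative assumptions, and the default port fallback (443 for HTTPS, 80 otherwise) is an assumption of this sketch rather than documented behavior.

```python
import socket


def center_endpoint(dialing_server):
    """Return (host, port) from a mapping shaped like the DialingServer config."""
    # Assumed fallback when no port is given: 443 for HTTPS, otherwise 80.
    default_port = 443 if dialing_server.get("use_https") else 80
    return dialing_server["host"], int(dialing_server.get("port", default_port))


def tcp_reachable(host, port, timeout=10.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refused connections, and timeouts.
        return False
```

For example, `tcp_reachable(*center_endpoint(config))` checks the Synthetic Tests center, and the same call with the DataWay host and port checks the data gateway.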
Step Three: Check Data Reporting¶
Run the following commands on the Synthetic Tests node machine to check if data is being reported:
- Check I/O communication:
sudo datakit monitor -M IO
## DYNAMIC_DW represents the Synthetic Tests center service; Points(ok/total) being consistent indicates no issues.
┌IO Info───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Cat│ChanUsage│ Points(ok/total)│ Bytes(ok/total/gz) │
│DYNAMIC_DW│ 0/1│ 626 /626 │ 389.392 k/389.392 k(267.603 k) │
│ M│ 0/1│1.7955 M/1.796043 M│560.715703 M/560.880362 M(240.616886 M) │
│ O│ 0/1│ 588.045 k/588.2 k│ 516.475667 M/516.613201 M(37.964566 M)
- Check input:
sudo datakit monitor -M In
## Feeds and TotalPts of dialtesting not being zero means data is being uploaded.
┌Inputs Info(11 inputs)────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Input│Cat│ Feeds│ TotalPts│Filtered│ LastFeed│ AvgCost│Errors │
│ dialtesting│ L │ 626 │ 626 │ 0 │25 minutes ago│ 0s│ 0 │
│ cpu│ M │ 112.71 k│ 112.71 k│ 0 │ 6 seconds ago│ 237.807µs│ 0 │
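The two checks above come down to comparing numbers in the monitor tables. The Python sketch below parses a copied row and tests whether Points(ok/total) is consistent. It assumes the row layout and the k and M suffixes shown in the sample output, which may differ between DataKit versions. All function names are illustrative, and other suffixes are deliberately not handled.

```python
from decimal import Decimal

# Only the suffixes that appear in the sample output above.
_SUFFIX = {"k": Decimal(1_000), "M": Decimal(1_000_000)}


def row_cells(line):
    """Split one box-drawing table row into stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("│").split("│")]


def parse_count(text):
    """Convert a count such as '626', '588.045 k' or '1.7955 M' to a Decimal."""
    parts = text.split()
    value = Decimal(parts[0])
    if len(parts) > 1:
        value *= _SUFFIX[parts[1]]
    return value


def points_consistent(cell):
    """Return True if the ok and total parts of an 'ok/total' cell are equal."""
    ok, total = cell.split("/")
    return parse_count(ok) == parse_count(total)
```

Applied to the DYNAMIC_DW row, `points_consistent(row_cells(line)[2])` reports whether every point sent to the Synthetic Tests center was accepted. The same `parse_count` helper can confirm that the Feeds and TotalPts cells of the dialtesting row are non-zero.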
Step Four: Check Logs¶
Use the following command to view the DataKit logs on the Synthetic Tests node for further troubleshooting.