
Data Discontinuity Troubleshooting

Introduction

This article describes how to troubleshoot data discontinuity issues for logs, traces, and metrics in Guance.

Architecture Diagram

The Guance data flow is as follows:

  1. DataKit pushes collected data (metrics, logs, traces, etc.) to the Guance DataWay cluster.
  2. DataWay pushes the data to the kodo service for processing.
  3. kodo pushes the processed data to the nsqd message queue service.
  4. kodo-x requests data consumption from the nsqd message queue service.
  5. kodo-x pushes the consumed data to the corresponding storage engine.
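
Each hop in this flow corresponds to a workload that can be inspected directly. As a quick sanity check of the whole pipeline (a sketch assuming the default namespaces forethought-kodo and middleware used later in this article):

# kodo and kodo-x workloads
kubectl get pods -n forethought-kodo

# nsqd message queue and the storage engines
kubectl get pods -n middleware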

Data Discontinuity Troubleshooting Steps

Step One: Check Host Time

Please confirm the following information:

  • The host time of the Guance cluster matches the current time.
  • The DataKit collector host time matches the current time.

You can check this by running the command:

date

If the host time does not match the current time, correct it using one of the following methods:

# Install ntpdate
yum install ntpdate -y

# Synchronize with an NTP server
ntpdate time.windows.com
# or use another network time source
ntpdate cn.pool.ntp.org

# Alternatively, set the time manually
sudo date -s "2022-01-01 10:30:00"

Note: Replace the example time above with the actual current time.
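
To confirm that a host stays synchronized after correction, you can also check its NTP status (a quick check, assuming the host uses systemd):

# Show the current time, time zone, and whether NTP synchronization is active
timedatectl status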

Step Two: Collector Troubleshooting

Refer to DataKit Data Discontinuity Troubleshooting.
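
In addition, the collector's own status can be inspected locally on the DataKit host; for example (a sketch assuming DataKit is installed as a systemd service with its standard command-line tool):

# Check that the DataKit service is running
systemctl status datakit

# Inspect collector runtime status (enabled inputs, data sent, errors)
datakit monitor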

Step Three: View DataWay Service Logs

Please follow these steps:

# Log in to the container
kubectl exec -ti -n <Namespace> <dataway pod name> -- bash
# Go to the log directory
cd /usr/local/cloudcare/dataflux/dataway
# Search for error logs
grep -Ei error log
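
If the DataWay pod name is not known in advance, list it first; for example:

# Find the DataWay pod name (replace <Namespace> with the namespace DataWay is deployed in)
kubectl get pods -n <Namespace> | grep dataway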

Step Four: Check the Running Status of Each Service

  • Check if the cluster node status is normal
kubectl get node
  • Check if all services under forethought-kodo are running normally
kubectl get pods -n forethought-kodo
  • Check if the nsqd service status is normal
kubectl get pods -n middleware | grep nsqd
  • Check if the storage engine is functioning normally
kubectl get pods -n middleware
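
For any pod that is not healthy in the checks above, the following commands can help narrow down the cause (a general Kubernetes sketch, not specific to Guance):

# List pods that are not Running or Completed, across all namespaces
kubectl get pods -A | grep -Ev 'Running|Completed'

# Inspect the events of a problematic pod
kubectl describe pod -n <namespace> <pod name>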

Step Five: View kodo Service Logs

Note

Viewing the kodo service logs can help determine whether Guance successfully pushed data to the consumption queue.

  • Namespace: forethought-kodo

  • Deployment: kodo

  • Log path: /logdata/log

If the kodo service is normal, please execute the following commands:

# Log in to the container
kubectl exec -ti -n forethought-kodo <kodo pod name> -- bash
# Go to the log directory
cd /logdata
# Search for error logs
grep -Ei error log
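
To see more context around each match, grep can print the surrounding lines; for example:

# Show 3 lines of context around each error
grep -Ei -C 3 error log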

If the kodo service is abnormal, you will not be able to log in to the container. In that case, first switch the kodo log output to stdout, then view the container logs:

  • Modify kodo log output mode
kubectl get configmap kodo -n forethought-kodo -o yaml | \
       sed "s/\/logdata\/log/stdout/g" | \
       kubectl apply -f -
  • Restart the kodo container
kubectl rollout restart -n forethought-kodo deploy kodo
  • View kodo container logs
kubectl logs -f -n forethought-kodo <kodo pod name>
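
Once the logs go to stdout, recent errors can be filtered directly from the container output; for example:

# Show only the last 10 minutes of kodo logs and filter for errors
kubectl logs --since=10m -n forethought-kodo <kodo pod name> | grep -Ei error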

Step Six: View kodo-x Service Logs

Note

Viewing the kodo-x service logs can help determine whether Guance successfully wrote data, and whether there are issues such as write throttling or slow writes.

  • Namespace: forethought-kodo
  • Deployment: kodo-x
  • Log path: /logdata/log

If the kodo-x service is normal, please execute the following commands:

# Log in to the container
kubectl exec -ti -n forethought-kodo <kodo-x pod name> -- bash
# Go to the log directory
cd /logdata
# Search for error logs
grep -Ei error log

If the kodo-x service is abnormal, you will not be able to log in to the container. In that case, first switch the kodo-x log output to stdout, then view the container logs:

  • Modify kodo-x log output mode
kubectl get configmap kodo-x -n forethought-kodo -o yaml | \
       sed "s/\/logdata\/log/stdout/g" | \
       kubectl apply -f -
  • Restart the kodo-x container
kubectl rollout restart -n forethought-kodo deploy kodo-x
  • View kodo-x container logs
kubectl logs -f -n forethought-kodo <kodo-x pod name>
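
As with kodo, logs can also be addressed through the Deployment, which avoids looking up the pod name (kubectl picks one pod when there are multiple replicas):

# Follow logs via the Deployment instead of a specific pod
kubectl logs -f -n forethought-kodo deploy/kodo-x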
