Enable Observability for the Deployment Plan¶
Overview¶
The purpose of this document is to help users of the Deployment Plan implement observability for their deployment and improve the overall reliability of the Guance service. It covers two classic observability patterns and shows how to deploy DataKit data collection, log collection and parsing, APM, Synthetic Tests, and RUM in a Kubernetes environment. It also provides one-click import template files for infrastructure and middleware observability and for application service observability, to make monitoring your environment easier.
Deployment Plan Observability Patterns¶
Self-observability pattern: the deployment monitors itself, sending its telemetry to its own workspace. The advantage of this approach is ease of deployment. The disadvantages are that the monitoring data continuously feeds back into the system it monitors (an endless self-iteration loop), and that if the cluster goes down, the data needed to troubleshoot the outage goes down with it.
Account Information Preparation¶
Name | Type | Description | Creation Syntax (Note: modify the password) | Importance |
---|---|---|---|---|
Private FUNC Database Account | DB & USER | FUNC service connection account | `CREATE DATABASE private_func; CREATE USER 'private_func'@'%' IDENTIFIED BY 'V4KySbFhzDkxxxx'; GRANT ALL PRIVILEGES ON private_func.* TO private_func; FLUSH PRIVILEGES;` | Optional |
MySQL Self-Observability Account | USER | Self-observability account for collecting MySQL metrics | `CREATE USER 'datakit'@'%' IDENTIFIED WITH caching_sha2_password BY 'SFGS&DFxxxx32!'; GRANT PROCESS ON *.* TO 'datakit'@'%'; GRANT SELECT ON *.* TO 'datakit'@'%'; SHOW DATABASES LIKE 'performance_schema'; GRANT SELECT ON performance_schema.* TO 'datakit'@'%'; GRANT SELECT ON mysql.user TO 'datakit'@'%'; GRANT REPLICATION CLIENT ON *.* TO 'datakit'@'%';` | Important |
Business Data Collection Account | USER | Used by FUNC to collect business data | `CREATE USER 'read'@'%' IDENTIFIED BY 'u19e0LmkL8Fxxxx'; GRANT SELECT ON df_core.* TO 'read'@'%'; FLUSH PRIVILEGES;` | Optional |
PostgreSQL Self-Observability Account | USER | Used for GuanceDB 3.0 monitoring | `CREATE USER datakit WITH PASSWORD 'Z7ZdQ326EeexxxxP'; GRANT pg_monitor TO datakit; GRANT CONNECT ON DATABASE scopedb_meta TO datakit; GRANT SELECT ON pg_stat_database TO datakit;` | Optional |
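As a quick sanity check, the grants can be verified before wiring the accounts into DataKit. Below is a minimal sketch; the hostnames are placeholder assumptions, and the mysql/psql clients are assumed to be available on the operator host:

# Hypothetical host; substitute the real MySQL address and password.
mysql -h mysql.middleware -u datakit -p'SFGS&DFxxxx32!' -e "SHOW GRANTS FOR 'datakit'@'%';"
# Hypothetical host; \du shows the role's memberships, which should include pg_monitor.
PGPASSWORD='Z7ZdQ326EeexxxxP' psql -h postgres.middleware -U datakit -d scopedb_meta -c "\du datakit"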
Configure Data Collection¶
Deploy DataKit¶
Note

DataKit ships with default middleware collection configurations; only minor modifications are required.
2) Modify the DaemonSet template in datakit.yaml
- name: ENV_DATAWAY
value: https://openway.guance.com?token=tkn_a624xxxxxxxxxxxxxxxxxxxxxxxx74 ## Fill in the actual dataway address here
- name: ENV_GLOBAL_TAGS
value: host=__datakit_hostname,host_ip=__datakit_ip,guance_site=guance,cluster_name_k8s=guance # Modify panel variables according to actual conditions
- name: ENV_GLOBAL_ELECTION_TAGS
value: guance_site=guance,cluster_name_k8s=guance # Modify panel variables according to actual conditions
image: pubrepo.guance.com/datakit/datakit:1.65.2 ## Modify to the latest image version
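Before moving on, the edits can be spot-checked locally. A minimal sketch; the grep pattern simply matches the fields modified above:

# Spot-check the edited fields prior to deployment in step 6.
grep -nE 'ENV_DATAWAY|ENV_GLOBAL_TAGS|ENV_GLOBAL_ELECTION_TAGS|image: pubrepo' datakit.yaml
# Validate the manifest without creating any resources.
kubectl apply --dry-run=client -f datakit.yaml >/dev/null && echo "datakit.yaml OK"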
3) Modify the related ConfigMap configuration in datakit.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: datakit-conf
namespace: datakit
data:
mysql.conf: |-
[[inputs.mysql]]
host = "xxxxxxxxxxxxxxx" ## Modify the corresponding MySQL connection address
user = "ste3" ## Modify the MySQL username
pass = "Test1234" ## Modify the MySQL password
......
redis.conf: |-
[[inputs.redis]]
host = "r-xxxxxxxxx.redis.rds.ops.ste3.com" ## Modify the Redis connection address
port = 6379
# unix_socket_path = "/var/run/redis/redis.sock"
# Configure multiple dbs. If dbs is configured, db will also be included in the collection list. If dbs=[] or not configured, all non-empty dbs in Redis will be collected.
# dbs=[]
# username = "<USERNAME>"
password = "Test1234" ## Modify the Redis password
......
openes.conf: |-
[[inputs.elasticsearch]]
## Elasticsearch server configuration
# Supports Basic authentication:
# servers = ["http://user:pass@localhost:9200"]
servers = ["http://guance:123.com@opensearch-cluster-client.middleware:9200"] ## Modify the username, password, etc.
......
4) Mount the configuration files
- mountPath: /usr/local/datakit/conf.d/db/mysql.conf
name: datakit-conf
subPath: mysql.conf
readOnly: false
Note: Multiple configurations are handled the same way; add a mount entry for each in turn.
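Each volumeMounts entry added this way must be paired with a key of the same name under the datakit-conf ConfigMap's data section (the subPath must match the data key). A quick way to list the keys for cross-checking, assuming yq v4 is installed:

# List the data keys of the datakit-conf ConfigMap defined in datakit.yaml.
yq eval 'select(.kind == "ConfigMap" and .metadata.name == "datakit-conf") | .data | keys' datakit.yaml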
6) After completing the modifications, deploy DataKit
Environment Variable Specification¶
Variable Name | Description |
---|---|
ENV_DEFAULT_ENABLED_INPUTS | Default collectors to enable: self,cpu,disk,diskio,mem,swap,system,hostobject,net,host_processes,container,zipkin |
ENV_ENABLE_ELECTION | When election is enabled, Prometheus collection (and other election-aware components) runs in master or candidate mode |
ENV_GLOBAL_ELECTION_TAGS | Adds extra tag dimensions to election-based collectors, e.g. for tagging Prometheus collection (effective when election is enabled) |
ENV_INPUT_DDTRACE_COMPATIBLE_OTEL | Enables compatibility between OTel traces and DDTrace traces |
ENV_INPUT_DISK_USE_NSENTER | Uses the nsenter method to collect disk usage, including cluster dynamic storage blocks; must be set if dynamic storage blocks are used |
ENV_INPUT_HOSTOBJECT_USE_NSENTER | Uses the nsenter method to collect disk usage for the host object collector, including cluster dynamic storage blocks; must be set if dynamic storage blocks are used |
ENV_INPUT_CONTAINER_ENABLE_CONTAINER_METRIC | Enables container metric collection |
ENV_INPUT_CONTAINER_ENABLE_POD_METRIC | Enables Pod metric collection (CPU and memory usage) |
ENV_INPUT_CONTAINER_ENABLE_K8S_METRIC | Enables Kubernetes metric collection |
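These variables are normally set in the env section of the DataKit DaemonSet in datakit.yaml. Once DataKit is running, an individual switch can also be toggled without re-editing the manifest. A sketch, assuming the default datakit namespace and DaemonSet name:

# Toggling a switch triggers a rolling restart of the DaemonSet pods.
kubectl -n datakit set env ds/datakit ENV_INPUT_CONTAINER_ENABLE_POD_METRIC=true
kubectl -n datakit rollout status ds/datakit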
Import Views, Monitor Templates, and Pipelines¶
Note
After importing the monitor template, manually set the alert strategy and notification targets.
Import Views and Monitor Templates¶
Download View and Monitor Template
Navigate to "Manage" > "Workspace Settings" > "Import"
Note
After importing, update the corresponding jump link configuration in Monitoring: replace dsbd_xxxx in the URL with the target dashboard ID and wksp_xxxx with the ID of the workspace being monitored.
Import Pipelines¶
Unzip the guance-self-observing-latest.zip file; the pipelines are located under guance-self-observing-latest/pipeline.
Navigate to "Manage" > "Pipelines" > "Import"
Application Service Observability¶
Configure Prometheus Collection for Services¶
Download Prometheus Configuration File
Unzip guance-self-observing-prom-latest.zip and execute the following commands:
cd guance-self-observing-prom-latest
kubectl patch deploy kodo-x -n forethought-kodo --type merge --patch "$(cat kodo-x-prom.yaml)"
kubectl patch deploy kodo -n forethought-kodo --type merge --patch "$(cat kodo-prom.yaml)"
kubectl patch sts kodo-servicemap -n forethought-kodo --type merge --patch "$(cat kodo-servicemap-prom.yaml)"
kubectl patch sts kodo-x-backuplog -n forethought-kodo --type merge --patch "$(cat kodo-x-backuplog-prom.yaml)"
kubectl patch deploy inner -n forethought-core --type merge --patch "$(cat core-inner-prom.yaml)"
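The exact contents of the patch files depend on the downloaded archive, but each one modifies the pod template of the patched workload. One way to confirm a patch landed and rolled out, using kodo as an example:

# Inspect the patched pod template annotations (Prometheus collection
# settings are typically injected here).
kubectl -n forethought-kodo get deploy kodo -o jsonpath='{.spec.template.metadata.annotations}'
# Wait for the rollout triggered by the patch to complete.
kubectl -n forethought-kodo rollout status deploy/kodo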
Configure APM¶
Inject forethought-core Configuration¶
#!/bin/bash
set -euo pipefail
# Namespace
NAMESPACE="${NAMESPACE:-forethought-core}"
# Per-deployment env KV pairs are defined inline here (no external file).
# One entry per deployment: "<deploy> KEY=VAL KEY=VAL ..."
DEPLOY_ENV_CONFIG=(
'front-backend DD_PATCH_MODULES=redis:true,urllib3:true,httplib:true,sqlalchemy:true,httpx:true DD_AGENT_PORT=9529 DD_GEVENT_PATCH_ALL=true DD_SERVICE=front-backend DD_TAGS=pod_name:$(POD_NAME),project:dataflux'
'inner DD_PATCH_MODULES=redis:true,urllib3:true,httplib:true,sqlalchemy:true,httpx:true DD_AGENT_PORT=9529 DD_GEVENT_PATCH_ALL=true DD_SERVICE=inner DD_TAGS=pod_name:$(POD_NAME),project:dataflux'
'management-backend DD_PATCH_MODULES=redis:true,urllib3:true,httplib:true,sqlalchemy:true,httpx:true DD_AGENT_PORT=9529 DD_GEVENT_PATCH_ALL=true DD_SERVICE=management-backend DD_TAGS=pod_name:$(POD_NAME),project:dataflux'
'open-api DD_PATCH_MODULES=redis:true,urllib3:true,httplib:true,sqlalchemy:true,httpx:true DD_AGENT_PORT=9529 DD_GEVENT_PATCH_ALL=true DD_SERVICE=open-api DD_TAGS=pod_name:$(POD_NAME),project:dataflux'
'sse DD_PATCH_MODULES=redis:true,urllib3:true,httplib:true,sqlalchemy:true,httpx:true DD_AGENT_PORT=9529 DD_GEVENT_PATCH_ALL=true DD_SERVICE=sse DD_TAGS=pod_name:$(POD_NAME),project:dataflux'
'core-worker DD_TRACE_ENABLED=false'
'core-worker-0 DD_TRACE_ENABLED=false'
'core-worker-beat DD_TRACE_ENABLED=false'
'core-worker-correlation DD_TRACE_ENABLED=false'
)
# Prefix ddtrace-run to args[0] only (never modify the container command).
prefix_ddtrace_run_args_only() { # $1 deploy
local d="$1"
# If tracing is explicitly disabled, do not add
local trace_enabled
trace_enabled="$(kubectl get deploy "$d" -n "$NAMESPACE" \
-o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="DD_TRACE_ENABLED")].value}' 2>/dev/null || true)"
if [[ "$trace_enabled" == "false" ]]; then
echo " • DD_TRACE_ENABLED=false,skip ddtrace-run."
return 0
fi
# Read args[0]
local first_arg
first_arg="$(kubectl get deploy "$d" -n "$NAMESPACE" \
-o jsonpath='{.spec.template.spec.containers[0].args[0]}' 2>/dev/null || true)"
# Skip if already starts with ddtrace-run
if [[ "$first_arg" == "ddtrace-run" ]]; then
echo " • args already starts with ddtrace-run, skip."
return 0
fi
# Check if there is already an args array
local has_args
has_args="$(kubectl get deploy "$d" -n "$NAMESPACE" \
-o jsonpath='{.spec.template.spec.containers[0].args}' 2>/dev/null || true)"
if [[ -n "$has_args" ]]; then
# Insert ddtrace-run at the beginning of existing args
kubectl patch deploy "$d" -n "$NAMESPACE" --type='json' -p='[
{"op":"add","path":"/spec/template/spec/containers/0/args/0","value":"ddtrace-run"}
]' >/dev/null
echo " • Inserted ddtrace-run at args[0]"
else
# If no args, create args and set ddtrace-run as the first element
kubectl patch deploy "$d" -n "$NAMESPACE" --type='json' -p='[
{"op":"add","path":"/spec/template/spec/containers/0/args","value":["ddtrace-run"]}
]' >/dev/null
echo " • No args: created args=[\"ddtrace-run\"]"
fi
}
# —— Utility functions —— #
has_env() { # $1 deploy $2 KEY
kubectl get deploy "$1" -n "$NAMESPACE" \
-o jsonpath="{.spec.template.spec.containers[0].env[?(@.name=='$2')].name}" 2>/dev/null | grep -qx "$2"
}
ensure_env_array() { # $1 deploy
local has_array
has_array="$(kubectl get deploy "$1" -n "$NAMESPACE" -o jsonpath="{.spec.template.spec.containers[0].env}" 2>/dev/null || true)"
if [[ -z "${has_array}" ]]; then
kubectl patch deploy "$1" -n "$NAMESPACE" --type='json' -p="[
{\"op\":\"add\",\"path\":\"/spec/template/spec/containers/0/env\",\"value\":[]}
]" >/dev/null
fi
}
for item in "${DEPLOY_ENV_CONFIG[@]}"; do
deploy="${item%% *}"
# If the line only contains the deployment name, skip
rest="${item#* }"; [[ "$rest" == "$deploy" ]] && rest=""
echo "→ Processing: $deploy"
# Check if it exists
if ! kubectl get deploy "$deploy" -n "$NAMESPACE" >/dev/null 2>&1; then
echo " - Not found, skip."
continue
fi
# Ensure there is an env array (otherwise /env/- append will fail)
ensure_env_array "$deploy"
# Append Downward API env vars (add if missing): DD_AGENT_HOST=status.hostIP, POD_NAME=metadata.name
if ! has_env "$deploy" "DD_AGENT_HOST"; then
kubectl patch deploy "$deploy" -n "$NAMESPACE" --type='json' -p='[
{"op":"add","path":"/spec/template/spec/containers/0/env/-",
"value":{"name":"DD_AGENT_HOST","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"status.hostIP"}}}}
]' >/dev/null
echo " • add DD_AGENT_HOST (status.hostIP)"
else
echo " • DD_AGENT_HOST exists, skip."
fi
if ! has_env "$deploy" "POD_NAME"; then
kubectl patch deploy "$deploy" -n "$NAMESPACE" --type='json' -p='[
{"op":"add","path":"/spec/template/spec/containers/0/env/-",
"value":{"name":"POD_NAME","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"metadata.name"}}}}
]' >/dev/null
echo " • add POD_NAME (metadata.name)"
else
echo " • POD_NAME exists, skip."
fi
# Static KEY=VAL (add if missing; skip if exists)
for kv in $rest; do
key="${kv%%=*}"
val="${kv#*=}"
if has_env "$deploy" "$key"; then
echo " • $key exists, skip."
else
kubectl set env deploy/"$deploy" -n "$NAMESPACE" "$key=$val" >/dev/null
echo " • add $key=$val"
fi
done
# Ensure the command starts with ddtrace-run
prefix_ddtrace_run_args_only "$deploy"
echo " -> Done: $deploy"
done
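Saved to a file, the script can be run as-is and the result spot-checked per deployment. A usage sketch; the filename inject-core-apm.sh is hypothetical, and kubectl must point at the target cluster:

# Override the namespace if it differs from the default.
NAMESPACE=forethought-core bash inject-core-apm.sh
# Spot-check one deployment: args[0] should now be ddtrace-run and the
# injected env vars should be present.
kubectl -n forethought-core get deploy inner -o jsonpath='{.spec.template.spec.containers[0].args[0]}'
kubectl -n forethought-core get deploy inner -o jsonpath='{.spec.template.spec.containers[0].env[*].name}'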
Inject forethought-kodo Configuration¶
#!/bin/bash
set -euo pipefail
# Namespace
NAMESPACE="${NAMESPACE:-forethought-kodo}"
# Per-deployment env KV pairs are defined inline here (no external file).
# One entry per deployment: "<deploy> KEY=VAL KEY=VAL ..."
DEPLOY_ENV_CONFIG=(
'kodo DD_TRACE_ENABLED=true DD_TRACE_AGENT_PORT=9529 DD_TRACE_SAMPLE_RATE=0 DD_SERVICE=kodo DD_TAGS=pod_name:$(POD_NAME),project:dataflux'
'kodo-inner DD_TRACE_ENABLED=true DD_TRACE_AGENT_PORT=9529 DD_SERVICE=kodo-inner DD_TAGS=pod_name:$(POD_NAME),project:dataflux'
)
# —— Utility functions —— #
has_env() { # $1 deploy $2 KEY
kubectl get deploy "$1" -n "$NAMESPACE" \
-o jsonpath="{.spec.template.spec.containers[0].env[?(@.name=='$2')].name}" 2>/dev/null | grep -qx "$2"
}
ensure_env_array() { # $1 deploy
local has_array
has_array="$(kubectl get deploy "$1" -n "$NAMESPACE" -o jsonpath="{.spec.template.spec.containers[0].env}" 2>/dev/null || true)"
if [[ -z "${has_array}" ]]; then
kubectl patch deploy "$1" -n "$NAMESPACE" --type='json' -p="[
{\"op\":\"add\",\"path\":\"/spec/template/spec/containers/0/env\",\"value\":[]}
]" >/dev/null
fi
}
for item in "${DEPLOY_ENV_CONFIG[@]}"; do
deploy="${item%% *}"
# If the line only contains the deployment name, skip
rest="${item#* }"; [[ "$rest" == "$deploy" ]] && rest=""
echo "→ Processing: $deploy"
# Check if it exists
if ! kubectl get deploy "$deploy" -n "$NAMESPACE" >/dev/null 2>&1; then
echo " - Not found, skip."
continue
fi
# Ensure there is an env array (otherwise /env/- append will fail)
ensure_env_array "$deploy"
# Append Downward API env vars (add if missing): DD_AGENT_HOST=status.hostIP, POD_NAME=metadata.name
if ! has_env "$deploy" "DD_AGENT_HOST"; then
kubectl patch deploy "$deploy" -n "$NAMESPACE" --type='json' -p='[
{"op":"add","path":"/spec/template/spec/containers/0/env/-",
"value":{"name":"DD_AGENT_HOST","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"status.hostIP"}}}}
]' >/dev/null
echo " • add DD_AGENT_HOST (status.hostIP)"
else
echo " • DD_AGENT_HOST exists, skip."
fi
if ! has_env "$deploy" "POD_NAME"; then
kubectl patch deploy "$deploy" -n "$NAMESPACE" --type='json' -p='[
{"op":"add","path":"/spec/template/spec/containers/0/env/-",
"value":{"name":"POD_NAME","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"metadata.name"}}}}
]' >/dev/null
echo " • add POD_NAME (metadata.name)"
else
echo " • POD_NAME exists, skip."
fi
# Static KEY=VAL (add if missing; skip if exists)
for kv in $rest; do
key="${kv%%=*}"
val="${kv#*=}"
if has_env "$deploy" "$key"; then
echo " • $key exists, skip."
else
kubectl set env deploy/"$deploy" -n "$NAMESPACE" "$key=$val" >/dev/null
echo " • add $key=$val"
fi
done
echo " -> Done: $deploy"
done
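Because DD_TAGS references $(POD_NAME), Kubernetes resolves the value only when the container starts, so the final tags are best checked inside a running pod. A sketch; the app=kodo label selector is an assumption and may differ in your cluster:

# Pick a running kodo pod and print the resolved tracing variables.
POD="$(kubectl -n forethought-kodo get pod -l app=kodo -o jsonpath='{.items[0].metadata.name}')"
kubectl -n forethought-kodo exec "$POD" -- printenv DD_SERVICE DD_TAGS DD_AGENT_HOST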
Configure Synthetic Tests¶
1) Create a new website to monitor
2) Configure Synthetic Testing tasks
Note
Modify the tasks according to the domain names actually configured.
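Before creating the task, it can help to confirm that the target site is reachable from the network where the test will run. A minimal sketch with a placeholder domain:

# Placeholder domain; replace with the site actually being monitored.
curl -sSIL --max-time 10 https://guance.example.com | head -n 5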