
Enable observability for Deployment Plan

Overview

The purpose of this document is to help private Deployment Plan users implement observability for the Deployment Plan, thereby improving the overall reliability of Guance services. It introduces two classic observability patterns and explains how to deploy DataKit data collection, log collection and slicing, APM, Synthetic Tests, and RUM in a Kubernetes environment. It also provides one-click import template files for infrastructure and middleware observability and for application service observability, making it easier to observe your own environment.

The first pattern is self-observation, that is, sending data to your own space. Its advantage is easy deployment. Its disadvantages are that the continuously generated data feeds back into the same environment, creating a closed loop of self-generated data, and that if the environment goes down, its own observability data becomes unavailable, making further troubleshooting impossible.

The second pattern has multiple Guance instances send their data to a single dedicated instance. Its advantages are that no closed loop of data transmission occurs and that the clusters' status can be monitored in real time.


Infrastructure and Middleware Observability

Note

Enabling infrastructure and middleware observability covers the basic need to observe the state of middleware and infrastructure. For more detailed observation of application services, see Application Service Observability below.

Configure Data Collection

1) Download datakit.yaml

Note

All DataKit default configurations are already in place; only minor adjustments are needed before use.

2) Modify the DaemonSet template file in datakit.yaml

   - name: ENV_DATAWAY
     value: https://openway.guance.com?token=tkn_a624xxxxxxxxxxxxxxxxxxxxxxxx74 ## Enter the actual dataway address here
   - name: ENV_GLOBAL_TAGS
     value: host=__datakit_hostname,host_ip=__datakit_ip,guance_site=guance,cluster_name_k8s=guance # Modify these tags to match your environment; they are used as dashboard variables
   - name: ENV_GLOBAL_ELECTION_TAGS
     value: guance_site=guance,cluster_name_k8s=guance     # Modify these election tags to match your environment
   image: pubrepo.guance.com/datakit/datakit:1.65.2     ## Update to the latest image version

3) Modify the ConfigMap related configurations in datakit.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: datakit-conf
  namespace: datakit
data:
    mysql.conf: |-
        [[inputs.mysql]]
          host = "xxxxxxxxxxxxxxx"      ## Modify the corresponding MySQL connection address
          user = "ste3"                 ## Modify the MySQL username
          pass = "Test1234"             ## Modify the MySQL password
          ......

    redis.conf: |-
        [[inputs.redis]]
          host = "r-xxxxxxxxx.redis.rds.ops.ste3.com"            ## Modify Redis connection address
          port = 6379
          # unix_socket_path = "/var/run/redis/redis.sock"
          # Configure multiple dbs; if dbs=[] or not configured, all non-empty dbs in Redis will be collected
          # dbs=[]
          # username = "<USERNAME>"
          password = "Test1234"                                  ## Modify Redis password
          ......

    openes.conf: |-
        [[inputs.elasticsearch]]
          ## Elasticsearch server configuration
          # Supports Basic authentication:
          # servers = ["http://user:pass@localhost:9200"]
          servers = ["http://guance:123.com@opensearch-cluster-client.middleware:9200"]   ## Modify username, password, etc.
          ......

    influxdb.conf: |-
        [[inputs.influxdb]]
          url = "http://localhost:8086/debug/vars"

          ## (optional) collect interval, default is 10 seconds
          interval = '10s'

          ## Username and password to send using HTTP Basic Authentication.
          # username = ""
          # password = ""

          ## http request & header timeout
          timeout = "5s"

          ## Set true to enable election
          election = true
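Optionally, before deploying, verify that the middleware endpoints configured above are reachable from inside the cluster. A minimal sketch using a throwaway pod; the hostnames are the placeholders from the ConfigMap and the ubuntu image is only an illustrative choice:

kubectl run -it --rm conn-check --image=ubuntu --restart=Never -- bash -c '
  timeout 3 bash -c "</dev/tcp/xxxxxxxxxxxxxxx/3306" && echo "mysql reachable"
  timeout 3 bash -c "</dev/tcp/r-xxxxxxxxx.redis.rds.ops.ste3.com/6379" && echo "redis reachable"'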

4) Configure log collection for the DataKit collector itself

The annotation below disables collection of the DataKit container's own logs, preventing DataKit from collecting the log output it generates itself.

  template:
    metadata:
      annotations:
        datakit/logs: |
          [
            {
              "disable": true
            }
          ]

5) Mount Operations

        - mountPath: /usr/local/datakit/conf.d/db/mysql.conf
          name: datakit-conf
          subPath: mysql.conf
          readOnly: false

Note: Repeat the same mount for each additional configuration (redis.conf, openes.conf, influxdb.conf), adding them one by one; see the sketch below.
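For example, a sketch of the corresponding mount for redis.conf; the conf.d/db subdirectory is an assumption based on the collector catalog, so adjust the mountPath to match your DataKit version:

        - mountPath: /usr/local/datakit/conf.d/db/redis.conf
          name: datakit-conf
          subPath: redis.conf
          readOnly: false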

6) Deploy DataKit after modification

kubectl apply -f datakit.yaml
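After applying, a quick sanity check, assuming the default datakit namespace and DaemonSet name from datakit.yaml:

kubectl -n datakit get pods -o wide            # one Running DataKit Pod per node
kubectl -n datakit logs ds/datakit --tail=20   # look for dataway connection errors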

Import Views and Monitor Templates

Infrastructure and Middleware Template Download Address

  • Import Template


Note

After importing the monitor, modify the corresponding jump link configuration. Replace dsbd_xxxx with the corresponding dashboard and wksp_xxxx with the monitored workspace.

Application Service Observability (Optional)

Note

Enabling application service observability consumes a large amount of storage resources. Please evaluate before enabling.

Configure Logs

Prerequisites
  1. DataKit must be installed on your host

  2. If you are unfamiliar with Pipelines, see the Log Pipeline User Manual

Configure Log and Metric Collection

1) Inject via ConfigMap and container annotations from the command line

# Change log output mode
kubectl edit -n forethought-kodo cm <configmap_name>

2) In the ConfigMap under the forethought-kodo namespace, change the log output to stdout.
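Once the affected services reload, you can spot-check that their logs now go to stdout (the kodo Deployment is used here as an example):

kubectl -n forethought-kodo logs deploy/kodo --tail=5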

3) Enable metric collection for the corresponding services listed below

| Namespace | Service Name | Enable Metric Collection | Enable DDtrace Collection |
| --- | --- | --- | --- |
| forethought-core | front-backend | No | Yes |
| forethought-core | inner | Yes | Yes |
| forethought-core | management-backend | No | Yes |
| forethought-core | openapi | No | Yes |
| forethought-core | websocket | No | Yes |
| forethought-kodo | kodo | Yes | Yes |
| forethought-kodo | kodo-inner | Yes | Yes |
| forethought-kodo | kodo-x | Yes | Yes |
| forethought-kodo | kodo-asynq-client | Yes | Yes |
| forethought-kodo | kodo-x-backuplog | Yes | Yes |
| forethought-kodo | kodo-x-scan | Yes | No |
  • Configure the following annotations on each corresponding application's Deployment; the content below needs no changes
spec:
  template:
    metadata:
      annotations:
        datakit/prom.instances: |-
          [[inputs.prom]]
            ## Exporter address
            url = "http://$IP:9527/v1/metric?metrics_api_key=apikey_5577006791947779410"

            ## Collector alias
            source = "kodo-prom"

            ## Metric type filtering, optional values are counter, gauge, histogram, summary
            # By default, only counter and gauge types of metrics are collected
            # If empty, no filtering is performed
            # metric_types = ["counter","gauge"]
            metric_types = []

            ## Metric name filtering
            # Supports regex, multiple configurations can be set, meeting any one of them is sufficient
            # If empty, no filtering is performed
            # metric_name_filter = ["cpu"]

            ## Metric set name prefix
            # Configuring this item adds a prefix to the metric set name
            measurement_prefix = ""

            ## Metric set name
            # By default, the metric name is split by underscore "_", the first field after splitting becomes the metric set name, the remaining fields become the current metric name
            # If measurement_name is configured, the metric name will not be split
            # The final metric set name will have the measurement_prefix added as a prefix
            # measurement_name = "prom"

            ## Collection interval "ns", "us" (or "µs"), "ms", "s", "m", "h"
            interval = "10s"

            ## Filter tags, multiple tags can be configured
            # Matching tags will be ignored
            # tags_ignore = ["xxxx"]

            ## TLS Configuration
            tls_open = false
            # tls_ca = "/tmp/ca.crt"
            # tls_cert = "/tmp/peer.crt"
            # tls_key = "/tmp/peer.key"

            ## Custom metric set names
            # Metrics containing the prefix can be grouped into one metric set
            # Custom metric set name configuration takes precedence over measurement_name configuration
            [[inputs.prom.measurements]]
              prefix = "kodo_api_"
              name = "kodo_api"

            [[inputs.prom.measurements]]
              prefix = "kodo_workers_"
              name = "kodo_workers"

            [[inputs.prom.measurements]]
              prefix = "kodo_workspace_"
              name = "kodo_workspace"

            [[inputs.prom.measurements]]
              prefix = "kodo_dql_"
              name = "kodo_dql"

            ## Custom Tags
            [inputs.prom.tags]
            # some_tag = "some_value"
            # more_tag = "some_other_value"
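Optionally, spot-check that a service's exporter responds before relying on DataKit's scrape; this assumes the port and API key from the annotation above, with kodo as the example service:

kubectl -n forethought-kodo port-forward deploy/kodo 9527:9527 &
curl -s "http://127.0.0.1:9527/v1/metric?metrics_api_key=apikey_5577006791947779410" | head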
Note

The steps above only enable collection. To slice the logs' status field, you must configure the corresponding Pipeline.

Use Pipeline to Slice Logs

Import the Pipeline template with one click in the UI

Pipeline Download Address

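For reference, a minimal, hypothetical Pipeline sketch that slices a status field out of a log line; the grok pattern is illustrative and must be adapted to your actual log format:

# extract a timestamp and level from lines like "2024-01-02 03:04:05 ERROR ..."
grok(_, "%{TIMESTAMP_ISO8601:time} %{LOGLEVEL:status}")
# normalize the extracted level into a standard status value
group_in(status, ["error", "panic", "fatal"], "error")
# use the extracted timestamp as the log time
default_time(time)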

Configure APM

Prerequisites
  1. DataKit must be installed on your host

  2. The ddtrace collector must be enabled on DataKit, injected via a Kubernetes ConfigMap; see the sketch below
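A minimal sketch of enabling the collector, following the same datakit-conf ConfigMap and mount pattern used for mysql.conf earlier; the endpoints listed are the collector's documented defaults, but verify them against your DataKit version:

    ddtrace.conf: |-
        [[inputs.ddtrace]]
          ## endpoints that receive traces from DDtrace agents
          endpoints = ["/v0.3/traces", "/v0.4/traces", "/v0.5/traces"]

with the corresponding mount on the DaemonSet:

        - mountPath: /usr/local/datakit/conf.d/ddtrace/ddtrace.conf
          name: datakit-conf
          subPath: ddtrace.conf
          readOnly: false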

Start Configuration

  • Modify the Deployment configuration under the forethought-core Namespace
kubectl edit -n <namespace> deployment <service_name>
spec:
  template:
    spec:
      affinity: {}
      containers:
      - args:
        - ddtrace-run               ## Add this line
        - gunicorn
        - -c
        - wsgi.py
        - web:wsgi()
        - --limit-request-line
        - "0"
        - -e
        - RUN_APP_CODE=front
        - --timeout
        - "300"
        env:                     ## Enable DDtrace collection, add the following content (including services under the forethought-kodo Namespace in the table)
        - name: DD_PATCH_MODULES
          value: redis:true,urllib3:true,httplib:true,httpx:true
        - name: DD_AGENT_PORT
          value: "9529"
        - name: DD_GEVENT_PATCH_ALL
          value: "true"
        - name: DD_SERVICE
          value: py-front-backend    ## Modify to the corresponding service name
        - name: DD_TAGS
          value: project:dataflux
        - name: DD_AGENT_HOST
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
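To verify that DataKit is listening for traces, one option is to POST an empty payload to the ddtrace endpoint and expect an HTTP 200 once the collector is enabled; this assumes the default datakit namespace and DaemonSet name:

kubectl -n datakit port-forward ds/datakit 9529:9529 &
curl -s -o /dev/null -w '%{http_code}\n' -X POST 'http://127.0.0.1:9529/v0.4/traces' \
  -H 'Content-Type: application/json' -d '[]'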
Note

Multiple services under the forethought-core namespace can be enabled this way; apply the same args and env changes to each service's YAML while keeping each service's existing default args.


Configure Synthetic Tests

1) Create a new website to monitor


2) Configure the Synthetic Testing task


Note

Modify the addresses below to match your actual domain name settings

| Name | Synthetic Testing Address | Type | Task Status |
| --- | --- | --- | --- |
| cn4-console-api | https://cn4-console-api.guance.com | HTTP | Started |
| cn4-auth | https://cn4-auth.guance.com | HTTP | Started |
| cn4-openway | https://cn4-openway.guance.com | HTTP | Started |
| cn4-static-res | https://cn4-static-res.guance.com/dataflux-template/README.md | HTTP | Started |
| cn4-console | https://cn4-console.guance.com | HTTP | Started |
| cn4-management-api | https://cn4-management-api.guance.com | HTTP | Started |

Configure RUM

1) Deploy a dedicated DataKit as a Deployment

## deployment-datakit.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    workload.user.cattle.io/workloadselector: apps.deployment-utils-test-rum-datakit
    manager: kube-controller-manager
  name: test-rum-datakit  
  namespace: utils
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      workload.user.cattle.io/workloadselector: apps.deployment-utils-test-rum-datakit 
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        workload.user.cattle.io/workloadselector: apps.deployment-utils-test-rum-datakit
    spec:
      affinity: {}
      containers:
      - env:
        - name: ENV_DATAWAY
          value: http://internal-dataway.utils:9528?token=xxxxxx    ## Modify the token here
        - name: ENV_DISABLE_404PAGE
          value: "1"
        - name: ENV_GLOBAL_TAGS
          value: project=dataflux-saas-prodution,host_ip=__datakit_ip,host=__datakit_hostname
        - name: ENV_HTTP_LISTEN
          value: 0.0.0.0:9529
        - name: ENV_IPDB
          value: iploc
        - name: ENV_RUM_ORIGIN_IP_HEADER
          value: X-Forwarded-For
        - name: ENV_DEFAULT_ENABLED_INPUTS
          value: rum
        image: pubrepo.guance.com/datakit/datakit:1.5.0     ## Update to the latest image version
        imagePullPolicy: Always
        name: test-rum-datakit
        resources: {}
        securityContext:
          allowPrivilegeEscalation: false
          capabilities: {}
          privileged: false
          readOnlyRootFilesystem: false
          runAsNonRoot: false
        stdin: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        tty: true
        volumeMounts:
        - mountPath: /usr/local/datakit/data/ipdb/iploc/
          name: datakit-ipdb
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: registry-key
      initContainers:
      - args:
        - tar -xf /opt/iploc.tar.gz -C /usr/local/datakit/data/ipdb/iploc/
        command:
        - bash
        - -c
        image: pubrepo.guance.com/datakit/iploc:1.0
        imagePullPolicy: IfNotPresent
        name: init-volume
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/local/datakit/data/ipdb/iploc/
          name: datakit-ipdb
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - emptyDir: {}
        name: datakit-ipdb
---
apiVersion: v1
kind: Service
metadata:
  name: test-rum-datakit
  namespace: utils
spec:
  ports:
  - name: http
    port: 9529
    protocol: TCP
    targetPort: 9529
  selector:
    workload.user.cattle.io/workloadselector: apps.deployment-utils-test-rum-datakit
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}
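After applying the manifest, a quick check that the Pod is running and the Service resolves to it:

kubectl -n utils get pods -l workload.user.cattle.io/workloadselector=apps.deployment-utils-test-rum-datakit
kubectl -n utils get endpoints test-rum-datakit   # should list the Pod IP on port 9529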

2) Modify the ConfigMap configuration named config.js for the forethought-webclient service

Note

All inner-app domain names below must be changed to your actual corresponding domain names

window.DEPLOYCONFIG = {
    cookieDomain: '.guance.com',
    apiUrl: 'https://cn4-console-api.guance.com',
    wsUrl: 'wss://.guance.com',
    innerAppDisabled: 0,
    innerAppLogin: 'https://cn4-auth.guance.com/redirectpage/login',
    innerAppRegister: 'https://cn4-auth.guance.com/redirectpage/register',
    innerAppProfile: 'https://cn4-auth.guance.com/redirectpage/profile',
    innerAppCreateworkspace: 'https://cn4-auth.guance.com/redirectpage/createworkspace',
    staticFileUrl: 'https://cn4-static-res.guance.com',
    staticDatakit: 'https://static.guance.com',
    cloudDatawayUrl: '',
    isSaas: '1',
    showHelp: 1,
    rumEnable: 1,                                                                              // 0 = off, 1 = on; enable it here
    rumDatakitUrl: "",                                                                         // Set to the address of the DataKit deployed above
    rumApplicationId: "",                                                                      // Set to the actual appid
    rumJsUrl: "https://static.guance.com/browser-sdk/v2/dataflux-rum.js",
    rumDataEnv: 'prod',
    shrineApiUrl: '',
    upgradeUrl: '',
    rechargeUrl: '',
    paasCustomLoginInfo: []
};

3) Modify the ConfigMap configuration named dataway-config under the utils namespace

token: xxxxxxxxxxx       ## Modify to the actual token

Import Views and Monitor Templates

Application Service Monitoring Template Download Address

  • Import Template


Note

After importing, modify the corresponding jump link configurations in the monitor. Replace dsbd_xxxx with the corresponding dashboard and wksp_xxxx with the monitored workspace.

Func Self-Observability (Optional)

Func Task Log Data Reporting

DataFlux Func's function execution logs and automatic trigger configurations can be reported directly to Guance. In the Func settings, fill in the DataWay/OpenWay address and token in the Guance data reporting section, as follows:

https://openway.guance.com?token=tkn_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Note: If Func data reporting fails, see the DataFlux Func documentation.

Verification Method

  • Check that the dashboards in your scenarios display data
  • Check that hosts with DataKit installed appear under Infrastructure
  • Check that metrics include MySQL, Redis, and other database metric data
  • Check that logs are collected and the corresponding status values are sliced correctly
  • Check that APM and RUM data are present
