
Managing Configuration with Helm

This document describes how to use Helm to manage DataKit environment variables and collection configurations, so that configuration changes can be tracked and applied through Helm.

Installation and Configuration

Download DataKit Charts Package with Helm

helm pull datakit --repo https://pubrepo.guance.com/chartrepo/datakit --untar

Modify values.yaml

Info

values.yaml is located in the datakit directory.

Modify the Dataway URL

...
datakit:
  # DataKit will send the indicator data to dataway. Please be sure to change the parameters
  # @param dataway_url - string - optional - default: 'https://guance.com'
  # The host of the DataKit intake server to send Agent data to, only set this option
  dataway_url: https://openway.guance.com?token=tkn_xxxxxxxxxx
...

Add Default Collectors

Add the rum collector by appending it to the end of default_enabled_inputs.

...
datakit:
  ...
  # @param default_enabled_inputs - string
  # The default open collector list, format example: input1, input2, input3
  default_enabled_inputs: cpu,disk,diskio,mem,swap,system,hostobject,net,host_processes,rum
...

Add Global Tags

Add a cluster_name_k8s global tag.

datakit:
  ...
  # @param global_tags - string - optional - default: 'host=__datakit_hostname,host_ip=__datakit_ip'
  # It supports filling in global tags in the installation phase. The format example is: Project = ABC, owner = Zhang San (multiple tags are separated by English commas)
  global_tags: host=__datakit_hostname,host_ip=__datakit_ip,cluster_name_k8s=prod  

Add DataKit Environment Variables

For more environment variables, see Container Environment Variables.

# @param extraEnvs - array - optional
# extra env Add env for customization
# more, see: https://docs.guance.com/datakit/datakit-daemonset-deploy/#using-k8-env
# You can add more than one parameter  
extraEnvs:
 - name: ENV_NAMESPACE
   value: government-prod
 - name: ENV_GLOBAL_ELECTION_TAGS
   value: cluster_name_k8s=government-prod
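
For reference, with the values above the rendered DaemonSet pod spec carries these variables roughly as follows (a hypothetical excerpt of the rendered template, not verbatim chart output):

```yaml
# Hypothetical excerpt of the rendered DaemonSet pod spec
spec:
  containers:
    - name: datakit
      env:
        - name: ENV_NAMESPACE
          value: government-prod
        - name: ENV_GLOBAL_ELECTION_TAGS
          value: cluster_name_k8s=government-prod
```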

Mount Collector Configurations

Taking collection of host system logs from the container as an example: path is the path inside the container and must be under /usr/local/datakit/conf.d/; name is the configuration file name; value is the collector configuration content. You can find each collector's sample configuration files under /usr/local/datakit/conf.d/ inside the container.

dkconfig:   
 - path: "/usr/local/datakit/conf.d/logging.conf"
   name: logging.conf
   value: |-
     [[inputs.logging]]
       logfiles = [
         "/var/log/syslog",
         "/var/log/message",
       ]
       ignore = [""]
       source = ""
       service = ""
       pipeline = ""
       ignore_status = []
       character_encoding = ""
       auto_multiline_detection = true
       auto_multiline_extra_patterns = []
       remove_ansi_escape_codes = true
       blocking_mode = true
       ignore_dead_log = "1h"
       [inputs.logging.tags]

Mount Pipeline

Taking test.p as an example, path is the absolute path of the configuration file and must be under /usr/local/datakit/pipeline/. name is the Pipeline name. value is the Pipeline content.

dkconfig:
 - path: "/usr/local/datakit/pipeline/test.p"
   name: test.p
   value: |-
     # access log
     grok(_,"%{GREEDYDATA:ip_or_host} - - \\[%{HTTPDATE:time}\\] \"%{DATA:http_method} %{GREEDYDATA:http_url} HTTP/%{NUMBER:http_version}\" %{NUMBER:http_code} ")
     grok(_,"%{GREEDYDATA:ip_or_host} - - \\[%{HTTPDATE:time}\\] \"-\" %{NUMBER:http_code} ")
     default_time(time)
     cast(http_code,"int")

     # error log
     grok(_,"\\[%{HTTPDERROR_DATE:time}\\] \\[%{GREEDYDATA:type}:%{GREEDYDATA:status}\\] \\[pid %{GREEDYDATA:pid}:tid %{GREEDYDATA:tid}\\] ")
     grok(_,"\\[%{HTTPDERROR_DATE:time}\\] \\[%{GREEDYDATA:type}:%{GREEDYDATA:status}\\] \\[pid %{INT:pid}\\] ")
     default_time(time)
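
To see roughly what the first access-log pattern extracts, here is a stand-in using Python regular expressions rather than DataKit's grok engine (patterns simplified, and the sample log line is made up for illustration):

```shell
python3 - <<'EOF'
# Stand-in for the access-log grok above, using plain regexes (not grok).
import re

line = '1.2.3.4 - - [10/Apr/2023:10:00:00 +0800] "GET /index.html HTTP/1.1" 200 '
pattern = (r'(?P<ip_or_host>\S+) - - \[(?P<time>[^\]]+)\] '
           r'"(?P<http_method>\S+) (?P<http_url>\S+) HTTP/(?P<http_version>[\d.]+)" '
           r'(?P<http_code>\d+) ')
m = re.match(pattern, line)
print(m.group("http_method"), m.group("http_url"), m.group("http_code"))
EOF
# prints: GET /index.html 200
```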

Install DataKit

helm install datakit datakit \
         --repo  https://pubrepo.guance.com/chartrepo/datakit \
         -n datakit --create-namespace \
         -f values.yaml

Output:

NAME: datakit
LAST DEPLOYED: Tue Apr  4 19:13:29 2023
NAMESPACE: datakit
STATUS: deployed
REVISION: 1
NOTES:
1. Get the application URL by running these commands:
  export POD_NAME=$(kubectl get pods --namespace datakit -l "app.kubernetes.io/name=datakit,app.kubernetes.io/instance=datakit" -o jsonpath="{.items[0].metadata.name}")
  export CONTAINER_PORT=$(kubectl get pod --namespace datakit $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
  echo "Visit http://127.0.0.1:9527 to use your application"
  kubectl --namespace datakit port-forward $POD_NAME 9527:$CONTAINER_PORT
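
After the install completes, you can confirm the release is deployed and the DaemonSet pods are running (assumes kubectl access to the same cluster):

```shell
# Check the release status and the DataKit pods.
helm list -n datakit
kubectl get pods -n datakit -o wide
```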

Install Specific Version

helm install datakit datakit \
         --repo  https://pubrepo.guance.com/chartrepo/datakit \
         -n datakit --create-namespace \
         -f values.yaml \
         --version 1.5.x

Upgrade

Info

If values.yaml is lost, you can execute helm -n datakit get values datakit -o yaml > values.yaml to retrieve it.

helm upgrade datakit datakit \
         --repo  https://pubrepo.guance.com/chartrepo/datakit \
         -n datakit \
         -f values.yaml
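
After upgrading, the release history shows the new revision (revision numbers vary per cluster):

```shell
# Inspect release revisions; the upgrade should appear as a new revision.
helm history datakit -n datakit
```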

Uninstall

helm uninstall datakit -n datakit 

Configuration File Reference

values.yaml
# Default values for datakit.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

datakit:
  # DataKit will send the indicator data to dataway. Please be sure to change the parameters
  # @param dataway_url - string - optional - default: 'https://guance.com'
  # The host of the DataKit intake server to send Agent data to, only set this option
  dataway_url: https://openway.guance.com?token=tkn_xxxxxxxxxx

  # @param global_tags - string - optional - default: 'host=__datakit_hostname,host_ip=__datakit_ip'
  # It supports filling in global tags in the installation phase. The format example is: Project = ABC, owner = Zhang San (multiple tags are separated by English commas)
  global_tags: host=__datakit_hostname,host_ip=__datakit_ip,cluster_name_k8s=government-prod

  # @param default_enabled_inputs - string
  # The default open collector list, format example: input1, input2, input3
  default_enabled_inputs: cpu,disk,diskio,mem,swap,system,hostobject,net,host_processes,rum

  # @param enabled_election - boolean
  # Whether DataKit joins the election; enabled by default. (e.g. true / false)
  enabled_election: true

  # @param log_level - string
  # Set logging verbosity. Valid log levels are:
  # info, debug, stdout, warn, error, critical, and off
  log_level: info

  # @param http_listen - string
  # It supports specifying the network card bound to the DataKit HTTP service in the installation phase (default localhost)
  http_listen: 0.0.0.0:9529

image:
  # @param repository - string - required
  # Define the repository to use:
  #
  repository:  pubrepo.guance.com/datakit/datakit

  # @param tag - string - required
  # Define the DataKit image tag to use.
  #
  tag: ""

  # @param pullPolicy - string - optional
  # The Kubernetes [imagePullPolicy][] value
  #
  pullPolicy: Always

# https://docs.guance.com/datakit/datakit-daemonset-deploy/

git_repos:
  # Use Git to manage DataKit input configurations
  enable: false

  # @param git_url - string - required
  # You can set git@github.com:path/to/repository.git or http://username:password@github.com/path/to/repository.git.
  # see https://docs.guance.com/best-practices/insight/datakit-daemonset/#git
  git_url: "-"

  # @param git_key_path - string - optional
  # The Git SSH private key content, for example:
  # -----BEGIN OPENSSH PRIVATE KEY-----
  # xxxxx
  # -----END OPENSSH PRIVATE KEY-----
  git_key_path: "-"

  # @param git_key_pw - string - optional
  # The SSH key password
  git_key_pw: "-"

  # @param git_branch - string - optional
  # The branch to pull. If blank, the remote default branch (usually master) is used.
  git_branch: "master"

  # @param git_interval - string - optional
  # Timed pull interval. (e.g. 1m)
  git_interval: "1m"
  is_use_key: false

# If true, DataKit install ipdb.
# ref: https://docs.guance.com/datakit/datakit-tools-how-to/#install-ipdb
iploc:
  enable: true
  image:
    # @param repository - string - required
    # Define the repository to use:
    #
    repository: "pubrepo.guance.com/datakit/iploc"

    # @param tag - string - required
    # Define the iploc image tag to use.
    #
    tag: "1.0"

# @param extraEnvs - array - optional
# extra env Add env for customization
# more, see: https://docs.guance.com/datakit/datakit-daemonset-deploy/#using-k8-env
# You can add more than one parameter
extraEnvs:
 - name: ENV_NAMESPACE
   value: government-prod
 - name: ENV_GLOBAL_ELECTION_TAGS
   value: cluster_name_k8s=government-prod
 # - name: ENV_NAMESPACE # electoral
 #   value: k8s
 # - name: "NODE_OPTIONS"
 #   value: "--max-old-space-size=1800"


resources:
  requests:
    cpu: "200m"
    memory: "128Mi" 
  limits:
    cpu: "2000m"
    memory: "4Gi"

# @param nameOverride - string - optional
# Override name of app.
#
nameOverride: ""

# @param fullnameOverride - string - optional
# Override name of app.
#
fullnameOverride: ""

podAnnotations:
  datakit/logs: |
    [{"disable": true}]

# @param tolerations - array - optional
# Allow the DaemonSet to schedule on tainted nodes (requires Kubernetes >= 1.6)
#
tolerations:
  - operator: Exists

service:
  type: ClusterIP
  port: 9529

# @param dkconfig - array - optional
# Configure DataKit custom input
#
dkconfig: 
 - path: "/usr/local/datakit/conf.d/logging.conf"
   name: logging.conf
   value: |-
     [[inputs.logging]]
       logfiles = [
         "/var/log/syslog",
         "/var/log/message",
       ]
       ignore = [""]
       source = ""
       service = ""
       pipeline = ""
       ignore_status = []
       character_encoding = ""
       auto_multiline_detection = true
       auto_multiline_extra_patterns = []
       remove_ansi_escape_codes = true
       blocking_mode = true
       ignore_dead_log = "1h"
       [inputs.logging.tags]
 - path: "/usr/local/datakit/pipeline/test.p"
   name: test.p
   value: |-
     # access log
     grok(_,"%{GREEDYDATA:ip_or_host} - - \\[%{HTTPDATE:time}\\] \"%{DATA:http_method} %{GREEDYDATA:http_url} HTTP/%{NUMBER:http_version}\" %{NUMBER:http_code} ")
     grok(_,"%{GREEDYDATA:ip_or_host} - - \\[%{HTTPDATE:time}\\] \"-\" %{NUMBER:http_code} ")
     default_time(time)
     cast(http_code,"int")

     # error log
     grok(_,"\\[%{HTTPDERROR_DATE:time}\\] \\[%{GREEDYDATA:type}:%{GREEDYDATA:status}\\] \\[pid %{GREEDYDATA:pid}:tid %{GREEDYDATA:tid}\\] ")
     grok(_,"\\[%{HTTPDERROR_DATE:time}\\] \\[%{GREEDYDATA:type}:%{GREEDYDATA:status}\\] \\[pid %{INT:pid}\\] ")

# If true, deploys the kube-state-metrics deployment.
# ref: https://github.com/kubernetes/charts/tree/master/stable/kube-state-metrics
kubeStateMetricsEnabled: true

# If true, deploys the metrics-server deployment.
# ref: https://github.com/kubernetes-sigs/metrics-server/tree/master/charts/metrics-server
MetricsServerEnabled: false

FAQ

Securing Dataway Token with Kubernetes Secret

DataKit supports two ways to keep the Dataway token out of plain-text Kubernetes configuration.

  • When installing DataKit with Helm, you can hide the Dataway token by enabling Secret mode:

    Install with Helm and enable Secret mode

    helm install datakit charts/datakit \
      --set datakit.dataway_url="https://openway.example.com?token=tkn_xxxxxxxxxxxx" \
      --set datakit.dataway_secret_enabled=true
    

    With this approach:

      • Helm automatically creates a Kubernetes Secret that stores the dataway_url (base64-encoded, as with any Secret, not encrypted)
      • The ENV_DATAWAY environment variable in the Pod references the Secret

  • When installing with native YAML files, you need to manually create the Secret and modify environment variable references:

    1. Create a Secret containing ENV_DATAWAY:

      apiVersion: v1
      kind: Secret
      metadata:
        name: datakit-dataway-secret
        namespace: datakit
      type: Opaque
      data:
        ENV_DATAWAY: <base64-encoded-dataway-url>
      

      Base64 encode your dataway_url:

      echo -n "https://openway.example.com?token=tkn_xxxxxxxxxxxx" | base64
      
    2. Modify the environment variable reference: in datakit.template.yaml or datakit-deployment.template.yaml, change the ENV_DATAWAY environment variable definition from:

      - name: ENV_DATAWAY
        value: "https://openway.example.com?token=tkn_xxxxxxxxxxxx"
      

      to:

      - name: ENV_DATAWAY
        valueFrom:
          secretKeyRef:
            name: datakit-dataway-secret
            key: ENV_DATAWAY
      
    3. Apply the configuration:

    kubectl apply -f datakit.yaml
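
Since Secret data is base64-encoded rather than encrypted, you can check locally that the value you encoded in step 1 round-trips correctly (a sketch; assumes GNU coreutils base64):

```shell
# Encode the Dataway URL and confirm it decodes back to the original.
url="https://openway.example.com?token=tkn_xxxxxxxxxxxx"
encoded=$(printf '%s' "$url" | base64 -w0)
decoded=$(printf '%s' "$encoded" | base64 -d)
echo "$decoded"   # prints the original URL
```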
    

PodSecurityPolicy Issue

PodSecurityPolicy was deprecated in Kubernetes 1.21 and removed in Kubernetes 1.25. If you upgrade the cluster across these versions, the Helm deployment of kube-state-metrics reports an error:

Error: UPGRADE FAILED: current release manifest 
contains removed kubernetes api(s) for this kubernetes
version and it is therefore unable to build the
kubernetes objects for performing the diff. error from
kubernetes: unable to recognize "": no matches for kind
"PodSecurityPolicy" in version "policy/v1beta1"

Backup Helm Values

helm get values -n datakit datakit -o yaml > values.yaml

Clear Helm Information

Delete the Helm release secrets in the datakit namespace.

  • Get secrets
$ kubectl get secrets -n datakit
NAME                            TYPE                 DATA   AGE
sh.helm.release.v1.datakit.v1   helm.sh/release.v1   1      4h17m
sh.helm.release.v1.datakit.v2   helm.sh/release.v1   1      4h17m
sh.helm.release.v1.datakit.v3   helm.sh/release.v1   1      4h16m
  • Delete the secrets prefixed with sh.helm.release.v1.datakit
kubectl delete secrets sh.helm.release.v1.datakit.v1 sh.helm.release.v1.datakit.v2 sh.helm.release.v1.datakit.v3 -n datakit
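
Helm 3 labels its release secrets, so the same cleanup can be done with a label selector instead of listing each name (assumes the standard Helm 3 labels owner=helm and name=<release>):

```shell
# Delete all Helm release secrets for the datakit release via label selector.
kubectl delete secrets -n datakit -l "owner=helm,name=datakit"
```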

Re-upgrade or Install

helm upgrade -i -n datakit datakit  --repo  https://pubrepo.guance.com/chartrepo/datakit  -f values.yaml
