
Managing Configuration with Helm

This document describes how to use Helm to manage DataKit environment variables and collection configurations, so that configuration changes can be tracked and applied through Helm.

Installation and Configuration

Download DataKit Charts Package with Helm

helm pull datakit --repo https://pubrepo.guance.com/chartrepo/datakit --untar

Modify values.yaml

Info

values.yaml is located in the datakit directory.

Modify the Dataway URL

...
datakit:
  # DataKit will send the indicator data to dataway. Please be sure to change the parameters
  # @param dataway_url - string - optional - default: 'https://guance.com'
  # The host of the DataKit intake server to send Agent data to, only set this option
  dataway_url: https://openway.guance.com?token=tkn_xxxxxxxxxx
...

Add Default Collectors

Add the rum collector by appending it to the end of default_enabled_inputs.

...
datakit:
  ...
  # @param default_enabled_inputs - string
  # The default open collector list, format example: input1, input2, input3
  default_enabled_inputs: cpu,disk,diskio,mem,swap,system,hostobject,net,host_processes,rum
...

Add Global Tags

Add a cluster_name_k8s global tag.

datakit:
  ...
  # @param global_tags - string - optional - default: 'host=__datakit_hostname,host_ip=__datakit_ip'
  # It supports filling in global tags in the installation phase. The format example is: Project = ABC, owner = Zhang San (multiple tags are separated by English commas)
  global_tags: host=__datakit_hostname,host_ip=__datakit_ip,cluster_name_k8s=prod  

Add DataKit Environment Variables

For more environment variables, see Container Environment Variables.

# @param extraEnvs - array - optional
# extra env Add env for customization
# more, see: https://docs.guance.com/datakit/datakit-daemonset-deploy/#using-k8-env
# You can add more than one parameter  
extraEnvs:
 - name: ENV_NAMESPACE
   value: government-prod
 - name: ENV_GLOBAL_ELECTION_TAGS
   value: cluster_name_k8s=government-prod
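
For reference, with the values above the rendered DaemonSet pod spec carries these variables roughly as follows (a hypothetical excerpt of the rendered template, not verbatim chart output):

```yaml
# Hypothetical excerpt of the rendered DaemonSet pod spec
spec:
  containers:
    - name: datakit
      env:
        - name: ENV_NAMESPACE
          value: government-prod
        - name: ENV_GLOBAL_ELECTION_TAGS
          value: cluster_name_k8s=government-prod
```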

Mount Collector Configurations

Taking collection of host system logs from the container as an example: path is the path inside the container and must be under /usr/local/datakit/conf.d/; name is the configuration file name; value is the collector configuration content. You can find each collector's sample configuration files under /usr/local/datakit/conf.d/ inside the container.

dkconfig:   
 - path: "/usr/local/datakit/conf.d/logging.conf"
   name: logging.conf
   value: |-
     [[inputs.logging]]
       logfiles = [
         "/var/log/syslog",
         "/var/log/message",
       ]
       ignore = [""]
       source = ""
       service = ""
       pipeline = ""
       ignore_status = []
       character_encoding = ""
       auto_multiline_detection = true
       auto_multiline_extra_patterns = []
       remove_ansi_escape_codes = true
       blocking_mode = true
       ignore_dead_log = "1h"
       [inputs.logging.tags]

Mount Pipeline

Taking test.p as an example, path is the absolute path of the configuration file and must be under /usr/local/datakit/pipeline/. name is the Pipeline name. value is the Pipeline content.

dkconfig:
 - path: "/usr/local/datakit/pipeline/test.p"
   name: test.p
   value: |-
     # access log
     grok(_,"%{GREEDYDATA:ip_or_host} - - \\[%{HTTPDATE:time}\\] \"%{DATA:http_method} %{GREEDYDATA:http_url} HTTP/%{NUMBER:http_version}\" %{NUMBER:http_code} ")
     grok(_,"%{GREEDYDATA:ip_or_host} - - \\[%{HTTPDATE:time}\\] \"-\" %{NUMBER:http_code} ")
     default_time(time)
     cast(http_code,"int")

     # error log
     grok(_,"\\[%{HTTPDERROR_DATE:time}\\] \\[%{GREEDYDATA:type}:%{GREEDYDATA:status}\\] \\[pid %{GREEDYDATA:pid}:tid %{GREEDYDATA:tid}\\] ")
     grok(_,"\\[%{HTTPDERROR_DATE:time}\\] \\[%{GREEDYDATA:type}:%{GREEDYDATA:status}\\] \\[pid %{INT:pid}\\] ")
     default_time(time)
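
To see roughly what the first access-log pattern extracts, here is a stand-in using Python regular expressions rather than DataKit's grok engine (patterns simplified, and the sample log line is made up for illustration):

```shell
python3 - <<'EOF'
# Stand-in for the access-log grok above, using plain regexes (not grok).
import re

line = '1.2.3.4 - - [10/Apr/2023:10:00:00 +0800] "GET /index.html HTTP/1.1" 200 '
pattern = (r'(?P<ip_or_host>\S+) - - \[(?P<time>[^\]]+)\] '
           r'"(?P<http_method>\S+) (?P<http_url>\S+) HTTP/(?P<http_version>[\d.]+)" '
           r'(?P<http_code>\d+) ')
m = re.match(pattern, line)
print(m.group("http_method"), m.group("http_url"), m.group("http_code"))
EOF
# prints: GET /index.html 200
```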

Install DataKit

helm install datakit datakit \
         --repo  https://pubrepo.guance.com/chartrepo/datakit \
         -n datakit --create-namespace \
         -f values.yaml

Output:

NAME: datakit
LAST DEPLOYED: Tue Apr  4 19:13:29 2023
NAMESPACE: datakit
STATUS: deployed
REVISION: 1
NOTES:
1. Get the application URL by running these commands:
  export POD_NAME=$(kubectl get pods --namespace datakit -l "app.kubernetes.io/name=datakit,app.kubernetes.io/instance=datakit" -o jsonpath="{.items[0].metadata.name}")
  export CONTAINER_PORT=$(kubectl get pod --namespace datakit $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
  echo "Visit http://127.0.0.1:9527 to use your application"
  kubectl --namespace datakit port-forward $POD_NAME 9527:$CONTAINER_PORT
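
After the install completes, you can confirm the release is deployed and the DaemonSet pods are running (assumes kubectl access to the same cluster):

```shell
# Check the release status and the DataKit pods.
helm list -n datakit
kubectl get pods -n datakit -o wide
```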

Install Specific Version

helm install datakit datakit \
         --repo  https://pubrepo.guance.com/chartrepo/datakit \
         -n datakit --create-namespace \
         -f values.yaml \
         --version 1.5.x

Upgrade

Info

If values.yaml is lost, you can execute helm -n datakit get values datakit -o yaml > values.yaml to retrieve it.

helm upgrade datakit datakit \
         --repo  https://pubrepo.guance.com/chartrepo/datakit \
         -n datakit \
         -f values.yaml
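
After upgrading, the release history shows the new revision (revision numbers vary per cluster):

```shell
# Inspect release revisions; the upgrade should appear as a new revision.
helm history datakit -n datakit
```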

Uninstall

helm uninstall datakit -n datakit 

Configuration File Reference

values.yaml
# Default values for datakit.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

datakit:
  # DataKit will send the indicator data to dataway. Please be sure to change the parameters
  # @param dataway_url - string - optional - default: 'https://guance.com'
  # The host of the DataKit intake server to send Agent data to, only set this option
  dataway_url: https://openway.guance.com?token=tkn_xxxxxxxxxx

  # @param global_tags - string - optional - default: 'host=__datakit_hostname,host_ip=__datakit_ip'
  # It supports filling in global tags in the installation phase. The format example is: Project = ABC, owner = Zhang San (multiple tags are separated by English commas)
  global_tags: host=__datakit_hostname,host_ip=__datakit_ip,cluster_name_k8s=government-prod

  # @param default_enabled_inputs - string
  # The default open collector list, format example: input1, input2, input3
  default_enabled_inputs: cpu,disk,diskio,mem,swap,system,hostobject,net,host_processes,rum

  # @param enabled_election - boolean
  # Whether DataKit joins the election; enabled by default. (e.g. true / false)
  enabled_election: true

  # @param log_level - string
  # Set logging verbosity. Valid log levels are:
  # info, debug, stdout, warn, error, critical, and off
  log_level: info

  # @param http_listen - string
  # It supports specifying the network card bound to the DataKit HTTP service in the installation phase (default localhost)
  http_listen: 0.0.0.0:9529

image:
  # @param repository - string - required
  # Define the repository to use:
  #
  repository:  pubrepo.guance.com/datakit/datakit

  # @param tag - string - required
  # Define the DataKit image tag to use.
  #
  tag: ""

  # @param pullPolicy - string - optional
  # The Kubernetes [imagePullPolicy][] value
  #
  pullPolicy: Always

# https://docs.guance.com/datakit/datakit-daemonset-deploy/

git_repos:
  # Use Git to manage DataKit input configurations
  enable: false

  # @param git_url - string - required
  # You can set git@github.com:path/to/repository.git or http://username:password@github.com/path/to/repository.git.
  # see https://docs.guance.com/best-practices/insight/datakit-daemonset/#git
  git_url: "-"

  # @param git_key_path - string - optional
  # The Git SSH private key content, for example:
  # -----BEGIN OPENSSH PRIVATE KEY-----
  # xxxxx
  # -----END OPENSSH PRIVATE KEY-----
  git_key_path: "-"

  # @param git_key_pw - string - optional
  # The SSH key password
  git_key_pw: "-"

  # @param git_branch - string - optional
  # The branch to pull. If blank, the remote default branch (usually master) is used.
  git_branch: "master"

  # @param git_interval - string - optional
  # Timed pull interval. (e.g. 1m)
  git_interval: "1m"
  is_use_key: false

# If true, DataKit install ipdb.
# ref: https://docs.guance.com/datakit/datakit-tools-how-to/#install-ipdb
iploc:
  enable: true
  image:
    # @param repository - string - required
    # Define the repository to use:
    #
    repository: "pubrepo.guance.com/datakit/iploc"

    # @param tag - string - required
    # Define the iploc image tag to use.
    #
    tag: "1.0"

# @param extraEnvs - array - optional
# extra env Add env for customization
# more, see: https://docs.guance.com/datakit/datakit-daemonset-deploy/#using-k8-env
# You can add more than one parameter
extraEnvs:
 - name: ENV_NAMESPACE
   value: government-prod
 - name: ENV_GLOBAL_ELECTION_TAGS
   value: cluster_name_k8s=government-prod
 # - name: ENV_NAMESPACE # electoral
 #   value: k8s
 # - name: "NODE_OPTIONS"
 #   value: "--max-old-space-size=1800"


resources:
  requests:
    cpu: "200m"
    memory: "128Mi" 
  limits:
    cpu: "2000m"
    memory: "4Gi"

# @param nameOverride - string - optional
# Override name of app.
#
nameOverride: ""

# @param fullnameOverride - string - optional
# Override name of app.
#
fullnameOverride: ""

podAnnotations:
  datakit/logs: |
    [{"disable": true}]

# @param tolerations - array - optional
# Allow the DaemonSet to schedule on tainted nodes (requires Kubernetes >= 1.6)
#
tolerations:
  - operator: Exists

service:
  type: ClusterIP
  port: 9529

# @param dkconfig - array - optional
# Configure DataKit custom input
#
dkconfig: 
 - path: "/usr/local/datakit/conf.d/logging.conf"
   name: logging.conf
   value: |-
     [[inputs.logging]]
       logfiles = [
         "/var/log/syslog",
         "/var/log/message",
       ]
       ignore = [""]
       source = ""
       service = ""
       pipeline = ""
       ignore_status = []
       character_encoding = ""
       auto_multiline_detection = true
       auto_multiline_extra_patterns = []
       remove_ansi_escape_codes = true
       blocking_mode = true
       ignore_dead_log = "1h"
       [inputs.logging.tags]
 - path: "/usr/local/datakit/pipeline/test.p"
   name: test.p
   value: |-
     # access log
     grok(_,"%{GREEDYDATA:ip_or_host} - - \\[%{HTTPDATE:time}\\] \"%{DATA:http_method} %{GREEDYDATA:http_url} HTTP/%{NUMBER:http_version}\" %{NUMBER:http_code} ")
     grok(_,"%{GREEDYDATA:ip_or_host} - - \\[%{HTTPDATE:time}\\] \"-\" %{NUMBER:http_code} ")
     default_time(time)
     cast(http_code,"int")

     # error log
     grok(_,"\\[%{HTTPDERROR_DATE:time}\\] \\[%{GREEDYDATA:type}:%{GREEDYDATA:status}\\] \\[pid %{GREEDYDATA:pid}:tid %{GREEDYDATA:tid}\\] ")
     grok(_,"\\[%{HTTPDERROR_DATE:time}\\] \\[%{GREEDYDATA:type}:%{GREEDYDATA:status}\\] \\[pid %{INT:pid}\\] ")

# If true, deploys the kube-state-metrics deployment.
# ref: https://github.com/kubernetes/charts/tree/master/stable/kube-state-metrics
kubeStateMetricsEnabled: true

# If true, deploys the metrics-server deployment.
# ref: https://github.com/kubernetes-sigs/metrics-server/tree/master/charts/metrics-server
MetricsServerEnabled: false

FAQ

Securing Dataway Token with Kubernetes Secret

DataKit supports two ways to keep the Dataway token out of plain-text Kubernetes configuration.

  • When installing DataKit with Helm, you can hide the Dataway token by enabling Secret mode:

    Install with Helm and enable Secret mode

    helm install datakit charts/datakit \
      --set datakit.dataway_url="https://openway.example.com?token=tkn_xxxxxxxxxxxx" \
      --set datakit.dataway_secret_enabled=true
    

    With this approach:

      • Helm automatically creates a Kubernetes Secret that stores the dataway_url (base64-encoded, as with any Secret, not encrypted)
      • The ENV_DATAWAY environment variable in the Pod references the Secret

  • When installing with native YAML files, you need to manually create the Secret and modify environment variable references:

    1. Create a Secret containing ENV_DATAWAY:

      apiVersion: v1
      kind: Secret
      metadata:
        name: datakit-dataway-secret
        namespace: datakit
      type: Opaque
      data:
        ENV_DATAWAY: <base64-encoded-dataway-url>
      

      Base64 encode your dataway_url:

      echo -n "https://openway.example.com?token=tkn_xxxxxxxxxxxx" | base64
      
    2. Modify the environment variable reference: in datakit.template.yaml or datakit-deployment.template.yaml, change the ENV_DATAWAY environment variable definition from:

      - name: ENV_DATAWAY
        value: "https://openway.example.com?token=tkn_xxxxxxxxxxxx"
      

      to:

      - name: ENV_DATAWAY
        valueFrom:
          secretKeyRef:
            name: datakit-dataway-secret
            key: ENV_DATAWAY
      
    3. Apply the configuration:

    kubectl apply -f datakit.yaml
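
Since Secret data is base64-encoded rather than encrypted, you can check locally that the value you encoded in step 1 round-trips correctly (a sketch; assumes GNU coreutils base64):

```shell
# Encode the Dataway URL and confirm it decodes back to the original.
url="https://openway.example.com?token=tkn_xxxxxxxxxxxx"
encoded=$(printf '%s' "$url" | base64 -w0)
decoded=$(printf '%s' "$encoded" | base64 -d)
echo "$decoded"   # prints the original URL
```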
    

PodSecurityPolicy Issue

PodSecurityPolicy was deprecated in Kubernetes 1.21 and removed in Kubernetes 1.25. If you upgrade the cluster across these versions, the Helm deployment of kube-state-metrics reports an error:

Error: UPGRADE FAILED: current release manifest 
contains removed kubernetes api(s) for this kubernetes
version and it is therefore unable to build the
kubernetes objects for performing the diff. error from
kubernetes: unable to recognize "": no matches for kind
"PodSecurityPolicy" in version "policy/v1beta1"

Backup Helm Values

helm get values -n datakit datakit -o yaml > values.yaml

Clear Helm Information

Delete the Helm release secrets in the datakit namespace.

  • Get secrets
$ kubectl get secrets -n datakit
NAME                            TYPE                 DATA   AGE
sh.helm.release.v1.datakit.v1   helm.sh/release.v1   1      4h17m
sh.helm.release.v1.datakit.v2   helm.sh/release.v1   1      4h17m
sh.helm.release.v1.datakit.v3   helm.sh/release.v1   1      4h16m
  • Delete the secrets prefixed with sh.helm.release.v1.datakit
kubectl delete secrets sh.helm.release.v1.datakit.v1 sh.helm.release.v1.datakit.v2 sh.helm.release.v1.datakit.v3 -n datakit
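
Helm 3 labels its release secrets, so the same cleanup can be done with a label selector instead of listing each name (assumes the standard Helm 3 labels owner=helm and name=<release>):

```shell
# Delete all Helm release secrets for the datakit release via label selector.
kubectl delete secrets -n datakit -l "owner=helm,name=datakit"
```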

Re-upgrade or Install

helm upgrade -i -n datakit datakit  --repo  https://pubrepo.guance.com/chartrepo/datakit  -f values.yaml
