Configuring Container Log Collection via Kubernetes CRD

DataKit provides a declarative approach for container log collection configuration through Kubernetes Custom Resource Definitions (CRDs). Users can automatically configure DataKit's log collection by creating ClusterLoggingConfig resources, eliminating the need to manually modify DataKit configuration files or restart DataKit.

Prerequisites

  • Kubernetes cluster version 1.16+
  • DataKit version 1.84.0 or later
  • Cluster administrator permissions (for registering the CRD)

Usage Workflow

  1. Register the Kubernetes CRD
  2. Create CRD resources to automatically apply collection configurations
  3. Configure CRD-related RBAC permissions for DataKit and start the DataKit service

Register Kubernetes CRD

Use the following YAML to register the ClusterLoggingConfig CRD:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: clusterloggingconfigs.logging.datakits.io
  labels:
    app: datakit-logging-config
    version: v1alpha1
spec:
  group: logging.datakits.io
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            apiVersion:
              type: string
            kind:
              type: string
            metadata:
              type: object
            spec:
              type: object
              required:
                - selector
              properties:
                selector:
                  type: object
                  properties:
                    namespaceRegex:
                      type: string
                    podRegex:
                      type: string
                    podLabelSelector:
                      type: string
                    containerRegex:
                      type: string
                podTargetLabels:
                  type: array
                  items:
                    type: string
                configs:
                  type: array
                  items:
                    type: object
                    required:
                      - source
                      - type
                    properties:
                      type:
                        type: string
                      source:
                        type: string
                      disable:
                        type: boolean
                      path:
                        type: string
                      storage_index:
                        type: string
                      service:
                        type: string
                      pipeline:
                        type: string
                      multiline_match:
                        type: string
                      character_encoding:
                        type: string
                      remove_ansi_escape_codes:
                        type: boolean
                      from_beginning:
                        type: boolean
                      tags:
                        type: object
                        additionalProperties:
                          type: string
  scope: Cluster
  names:
    plural: clusterloggingconfigs
    singular: clusterloggingconfig
    kind: ClusterLoggingConfig
    shortNames:
      - logging

Apply the CRD (assuming the YAML above is saved as clusterloggingconfig-crd.yaml):

kubectl apply -f clusterloggingconfig-crd.yaml

Verify the CRD registration:

kubectl get crd clusterloggingconfigs.logging.datakits.io

Create CRD Configuration Resource

The following example configures the collection of log files from all Pods in the test01 namespace whose names start with logging:

apiVersion: logging.datakits.io/v1alpha1
kind: ClusterLoggingConfig
metadata:
  name: nginx-logs
spec:
  selector:
    namespaceRegex: "^(test01)$"
    podRegex: "^(logging.*)$"

  podTargetLabels:
    - app
    - version

  configs:
    - source: "nginx-access"
      type: "file"
      path: "/var/log/nginx/access.log"
      service: "nginx-logging"
      pipeline: "nginx-access.p"
      tags:
        log_type: "access"
        component: "nginx"

    - source: "nginx-error"
      type: "file"
      path: "/var/log/nginx/error.log"
      pipeline: "nginx-error.p"
      tags:
        log_type: "error"
        component: "nginx"

Apply the configuration (assuming the YAML above is saved as logging-config.yaml):

kubectl apply -f logging-config.yaml

Configuration Details

  • selector Configuration

The selector matches target Pods and containers. All specified conditions are combined with AND, so a container is selected only if it satisfies every one of them.

| Field | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| namespaceRegex | string | No | Regular expression to match namespace names | "^(default\|nginx)$" |
| podRegex | string | No | Regular expression to match Pod names | "^(nginx-log-demo.*)$" |
| podLabelSelector | string | No | Pod label selector (comma-separated key=value pairs) | "app=nginx,environment=production" |
| containerRegex | string | No | Regular expression to match container names | "^(nginx\|app-container)$" |

Example of a combined selector:

selector:
  namespaceRegex: "^(production|staging)$" # Match production or staging namespaces
  podLabelSelector: "app=web-server"       # Match Pods with the app=web-server label
  containerRegex: "^(app|web)$"            # Match containers named app or web

  • podTargetLabels Pod Label Propagation

| Field | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| podTargetLabels | []string | No | List of Pod label keys to copy to log tags | ["app", "version", "environment"] |

  • configs Collection Configuration

| Field | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| disable | boolean | No | Whether to disable this collection configuration | false |
| type | string | Yes | Collection type: file (log files) or stdout (standard output) | "file" |
| source | string | Yes | Log source identifier, used to distinguish different log streams | "nginx-access" |
| service | string | No | Service the logs belong to; defaults to the value of source | "nginx" |
| path | string | Conditional | Log file path (glob patterns supported); required when type=file | "/var/log/nginx/*.log" |
| multiline_match | string | No | Regular expression matching the first line of a multi-line log entry | "^\\d{4}-\\d{2}-\\d{2}" |
| pipeline | string | No | Name of the log parsing pipeline configuration file | "nginx-access.p" |
| storage_index | string | No | Index name for log storage | "app-logs" |
| remove_ansi_escape_codes | boolean | No | Whether to remove ANSI escape codes from log data | false |
| from_beginning | boolean | No | Whether to collect logs from the beginning of the file | false |
| from_beginning_threshold_size | int | No | Size threshold in bytes: a newly discovered file smaller than this value is read from the beginning; default 20MB | 1000 |
| character_encoding | string | No | Character encoding. An incorrect encoding may make the data unreadable. Supports utf-8, utf-16le, gbk, gb18030, or "". Default is empty | "utf-8" |
| tags | map[string]string | No | Key-value tags attached to the logs | {"log_type": "access", "component": "nginx"} |
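
The configs fields can be combined as in the following sketch, which collects container stdout and joins multi-line entries such as stack traces. The resource name, namespace, label, source, service, and tag values are hypothetical; only the field names come from the table above. The path field is omitted because it is required only when type=file.

```yaml
apiVersion: logging.datakits.io/v1alpha1
kind: ClusterLoggingConfig
metadata:
  name: app-stdout-logs                # hypothetical resource name
spec:
  selector:
    namespaceRegex: "^(production)$"   # hypothetical namespace
    podLabelSelector: "app=web-server" # hypothetical label
  configs:
    - source: "web-server-stdout"      # hypothetical source name
      type: "stdout"                   # stdout collection, so no path is set
      service: "web-server"
      # A new log entry starts on a line that begins with a date such as 2024-01-31
      multiline_match: "^\\d{4}-\\d{2}-\\d{2}"
      remove_ansi_escape_codes: true
      character_encoding: "utf-8"
      tags:
        log_type: "stdout"
```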

Add Relevant RBAC Configuration

Add the following permissions to DataKit's ClusterRole:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: datakit
rules:
  # Other existing permissions
  - apiGroups: ["logging.datakits.io"]
    resources: ["clusterloggingconfigs"]
    verbs: ["get", "list", "watch"]

Complete RBAC configuration example:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: datakit
rules:
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["clusterroles"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["nodes", "nodes/stats", "nodes/metrics", "namespaces", "pods", "pods/log", "events", "services", "endpoints", "persistentvolumes", "persistentvolumeclaims"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources: ["deployments", "daemonsets", "statefulsets", "replicasets"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["batch"]
  resources: ["jobs", "cronjobs"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["monitoring.coreos.com"]
  resources: ["podmonitors", "servicemonitors"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["logging.datakits.io"]
  resources: ["clusterloggingconfigs"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["metrics.k8s.io"]
  resources: ["pods", "nodes"]
  verbs: ["get", "list"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
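
A ClusterRole only takes effect once it is bound to the identity DataKit runs as. If your DataKit deployment does not already include a binding, a minimal sketch could look like the following. The ServiceAccount name datakit and the namespace datakit are assumptions, so adjust them to match your installation.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: datakit
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: datakit            # the ClusterRole shown above
subjects:
  - kind: ServiceAccount
    name: datakit          # assumed ServiceAccount name
    namespace: datakit     # assumed namespace
```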

Example Application

The following is a complete CRD test application example:

apiVersion: v1
kind: Namespace
metadata:
  name: test01

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: logging-deployment
  namespace: test01
  labels:
    app: logging
    version: v1.0
    environment: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: logging
  template:
    metadata:
      labels:
        app: logging
        version: v1.0
        environment: test
    spec:
      containers:
      - name: demo
        image: ubuntu:22.04
        resources:
          limits:
            cpu: "200m"
            memory: "100Mi"
          requests:
            cpu: "100m"
            memory: "50Mi"
        command: ["/bin/bash", "-c", "--"]
        args:
        - |
          mkdir -p /tmp/opt/abc;
          i=1;
          while true; do
            echo "Writing logs to file_${i}.log";
            for ((j=1;j<=10000;j++)); do
              echo "$(date +'%F %H:%M:%S')  [$j]  Bash For Loop Examples. Hello, world! Testing output." >> /tmp/opt/abc/file_${i}.log;
              sleep 1;
            done;
            echo "Finished writing 10000 lines to file_${i}.log";
            i=$((i+1));
          done

Apply the test application and the logging configuration:

kubectl apply -f test-application.yaml
kubectl apply -f logging-config.yaml
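
Note that the demo container writes to /tmp/opt/abc/file_<n>.log, not to the nginx paths used in the earlier ClusterLoggingConfig example. A configuration that targets the demo files might look like the sketch below. The resource name, source, service, and tag values are hypothetical.

```yaml
apiVersion: logging.datakits.io/v1alpha1
kind: ClusterLoggingConfig
metadata:
  name: logging-demo               # hypothetical resource name
spec:
  selector:
    namespaceRegex: "^(test01)$"   # namespace created by the test application
    podRegex: "^(logging.*)$"      # matches Pods created by logging-deployment
    containerRegex: "^(demo)$"     # container name from the Deployment
  podTargetLabels:
    - app
    - version
    - environment
  configs:
    - source: "logging-demo"       # hypothetical source name
      type: "file"
      path: "/tmp/opt/abc/*.log"   # glob covering file_1.log, file_2.log, ...
      service: "logging-demo"
      tags:
        log_type: "demo"
```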

FAQ

  • Does it support dynamic creation, modification, or deletion of CRD resources?

Yes. DataKit dynamically adjusts log and field collection based on the current ClusterLoggingConfig resources. When a resource is created or updated, its configuration is automatically applied to all matching containers. When a resource is deleted, any log collection started by that configuration stops, although container stdout continues to be collected with the default configuration.

  • Which has higher priority: CRD configuration or Pod Annotations configuration?

Pod Annotations configuration has higher priority. If a container matches a CRD configuration and its Pod also carries a datakit/logs annotation, the annotation configuration takes effect and the CRD configuration is ignored.
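
For reference, a Pod-level override is declared on the Pod template. The sketch below assumes the JSON-array form of the datakit/logs annotation; the source and service values are hypothetical, and the exact set of supported keys should be checked against DataKit's container log documentation.

```yaml
# Fragment of a Deployment: the annotation goes on the Pod template,
# not on the Deployment's own metadata
spec:
  template:
    metadata:
      annotations:
        datakit/logs: |
          [
            {
              "disable": false,
              "source": "annotated-source",
              "service": "annotated-service"
            }
          ]
```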

  • How long does it take for CRD configuration changes to take effect?

Configuration changes take effect within at most 1 minute.

  • What happens when multiple ClusterLoggingConfigs match the same Pod?

In theory, the ClusterLoggingConfig that was created first (with the smallest ResourceVersion) will be applied. It is best to avoid such situations.

  • Do I need to add a mount to collect log files inside containers?

Starting from version 1.84.0, log files inside containers can be collected without a mount when the cluster uses the standard Docker or containerd runtime. This does not apply to the CRI-O runtime, or to cases where Docker uses a tmpfs mount for the log path; in those cases an emptyDir mount must be added.
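
Where a mount is still needed, one option is to back the log directory with an emptyDir volume. The fragment below is a sketch based on the demo Deployment from the example application; the volume name is hypothetical, and the mount path is the directory the demo container writes to.

```yaml
# Fragment of a Pod template spec
spec:
  containers:
    - name: demo
      image: ubuntu:22.04
      volumeMounts:
        - name: app-logs             # hypothetical volume name
          mountPath: /tmp/opt/abc    # directory that contains the log files
  volumes:
    - name: app-logs
      emptyDir: {}
```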
