Syslog Collection

Syslog collection falls into two categories. The first is plain-text logs, typically located in /var/log/syslog or /var/log/messages, which can be collected directly by configuring the logging collector. The second is the Systemd Journal, stored as binary files typically found in /run/log/journal or /var/log/journal. The plain-text logs are usually produced from the Journal by the rsyslog service.

This article describes how to get Journald logs into DataKit by having rsyslog read the binary Journal and forward the converted plain-text logs over TCP/UDP. The basic flow is as follows:

sequenceDiagram
autonumber

box Linux Server
participant journald as Journald
participant rsyslog as rsyslog Service
end

participant dk as DataKit

rsyslog ->> journald: Read binary syslog
rsyslog ->> rsyslog: Convert to plain text logs
rsyslog ->> dk: Send to TCP/UDP log service

Prerequisites

  • Install rsyslog service
  • Install DataKit

Configuration Steps

  • Enable the DataKit log collector and its TCP/UDP listening port

    /usr/local/datakit/conf.d/logging.conf
    # Socket log reception, supports the TCP/UDP protocols
    # For security, listen on internal network addresses only
    sockets = [
      "tcp://0.0.0.0:9540",  # TCP listening port
      # Or use UDP
      #"udp://0.0.0.0:9541",  # UDP listening port
    ]
    source = "journal-syslog" # Any custom source name can be specified here; the default is 'default'
    
  • Restart DataKit
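After restarting, you can verify that the socket listener accepts data by pushing a line over TCP. The following Python sketch is illustrative (the host, port, and message are assumptions matching the example configuration above); the socket input treats each newline-terminated line as one log entry:

```python
import socket

def build_payload(message: str) -> bytes:
    # One newline-terminated line per log entry
    return message.rstrip("\n").encode("utf-8") + b"\n"

def send_test_log(message: str, host: str = "127.0.0.1", port: int = 9540) -> bytes:
    """Send one test log line to the DataKit TCP socket listener."""
    payload = build_payload(message)
    with socket.create_connection((host, port), timeout=3) as conn:
        conn.sendall(payload)
    return payload

# Example (requires the TCP listener from logging.conf to be active):
# send_test_log("hello from journal-syslog test")
```

If the line arrives, it will show up under the `journal-syslog` source in datakit monitor.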

Host Journald Log Collection

  • Edit /etc/rsyslog.conf (or create a new forward-to-datakit.conf under /etc/rsyslog.d/)
  • Add forwarding rules:

    /etc/rsyslog.d/forward-to-datakit.conf
    # Forward all logs (*.*) to the remote DataKit address and port
    # A single @ means UDP; a double @@ means TCP
    # Replace 0.0.0.0 below with the actual address DataKit listens on

    # TCP forwarding example
    *.* @@0.0.0.0:9540

    # UDP forwarding example
    #*.* @0.0.0.0:9541
    

    Enable only one of UDP/TCP here; TCP is used as an example below.

  • Restart rsyslog: systemctl restart rsyslog
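The *.* selector above matches every facility at every severity. As a rough illustration of how an rsyslog "facility.severity" selector is evaluated, here is a simplified Python sketch (not rsyslog's actual implementation; it ignores extended syntax such as `=`, `!`, and comma-separated facility lists):

```python
# Severities ordered from most to least urgent, as in RFC 5424
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

def selector_matches(selector: str, facility: str, severity: str) -> bool:
    # "facility.severity" matches the named severity and anything more urgent
    fac_pat, sev_pat = selector.split(".")
    if fac_pat not in ("*", facility):
        return False
    if sev_pat == "*":
        return True
    return SEVERITIES.index(severity) <= SEVERITIES.index(sev_pat)
```

For example, `mail.err` matches mail messages at err, crit, alert, or emerg, but not warning and below.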

Kubernetes Node Journald Log Collection

To collect a Node's Journald binary logs in Kubernetes, send them to the DataKit DaemonSet running in the cluster (the collection configuration is the same as above). In this case, mount a logging.conf collector configuration via ConfigMap and enable TCP/UDP log reception in it.

  • Expose DataKit's log-receiving ports through a Service so they are reachable from the Node (keep only the TCP or UDP port you use below):

    apiVersion: v1
    kind: Service
    metadata:
      name: datakit-service
      namespace: datakit
    spec:
      selector:
        app: daemonset-datakit
      ports:
        - name: tcp-logging
          protocol: TCP
          port: 9540
        - name: udp-logging
          protocol: UDP
          port: 9541
    
  • Change the rsyslog forwarding address to the DataKit service address:

    # Forward all logs (*.*) to the remote DataKit address and port
    # A single @ means UDP; a double @@ means TCP
    # Replace 0.0.0.0 below with the datakit-service address

    # TCP forwarding example
    *.* @@0.0.0.0:9540

    # UDP forwarding example
    #*.* @0.0.0.0:9541
    
  • Restart rsyslog: systemctl restart rsyslog
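Before restarting rsyslog, it can be worth checking from the Node that the forwarded port is actually reachable. A minimal sketch (the address and port are assumptions; substitute the datakit-service address you configured):

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    # Attempt a TCP connection to verify the DataKit listener is reachable
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example:
# port_reachable("<datakit-service address>", 9540)
```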

Collection Effect

If DataKit successfully collects data, you will see output similar to the following in datakit monitor:

|                  Source│Cat│Feeds│   P90Lat│P90Pts│Filtered│           LastFeed│       AvgCost│Errors
|socketLog.journal-syslog│ L │   2 │104.138µs│  2   │      0 │5m43.891165045s ago│      -       │

Here journal-syslog corresponds to the source value configured in the collector.

More Formatting

By default, the format of logs collected by DataKit from Journald might look like this:

<14>1 2024-03-19T15:53:04.391+0800 host-name app-name 12345 00000000 my-message-id This is the log message
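The leading `<14>` in the line above is the syslog PRI value, computed as facility * 8 + severity; 14 decodes to facility 1 (user-level) and severity 6 (informational). A small sketch to decode it:

```python
def parse_pri(line: str) -> dict:
    # <PRI> encodes facility * 8 + severity (RFC 5424);
    # e.g. <14> -> facility 1 (user-level), severity 6 (informational)
    end = line.index(">")
    pri = int(line[1:end])
    return {"facility": pri // 8, "severity": pri % 8}
```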

This includes log time, hostname, process info, etc., but it is not convenient for Pipeline extraction. We can configure rsyslog to output logs structurally using a JSON template. Below is a JSON configuration example. You can flexibly configure the JSON format and corresponding Pipeline processing according to your actual situation.

  • Add the following template to /etc/rsyslog.conf:

    /etc/rsyslog.conf
    # Define a template named "JsonFormat"
    template(name="JsonFormat" type="list") {
        # JSON start
        constant(value="{")
    
        # 1. Time field (key: time)
        # dateFormat="rfc3339" output format like: 2026-02-02T08:05:00Z
        constant(value="\"time\":\"")
        property(name="timereported" dateFormat="rfc3339")
        constant(value="\",")
    
        # 2. Hostname field (key: host)
        constant(value="\"host\":\"")
        property(name="hostname")
        constant(value="\",")
    
        # 3. Program/Service name (key: app) - Optional, recommended for distinguishing kubelet or kernel
        constant(value="\"app\":\"")
        property(name="programname")
        constant(value="\",")
    
        # 4. Log level (key: level) - Optional
        constant(value="\"level\":\"")
        property(name="syslogseverity-text")
        constant(value="\",")
    
        # 5. Message body (key: message)
        # format="json" is very important! It automatically escapes double quotes in messages to prevent JSON format breakage
        constant(value="\"message\":\"")
        property(name="msg" format="json")
        constant(value="\"")
    
        # JSON end and newline
        constant(value="}\n")
    }
    
  • Forwarding rule setting: Append JSON format forwarding

    /etc/rsyslog.d/forward-to-datakit.conf
    *.* @@0.0.0.0:9540;JsonFormat
    
  • Use the load_json function in Pipeline to easily process logs in this format. Place the following pipeline in the pipeline/logging/ directory of the DataKit installation path, with the filename matching the source field configured above:

    /usr/local/datakit/pipeline/logging/journal-syslog.p
    journald_log=load_json(_)
    
    drop_origin_data()
    
    # Set a group of tags
    pt_kvs_set("host", journald_log["host"], true) 
    pt_kvs_set("app", journald_log["app"], true)
    pt_kvs_set("status", journald_log["level"])
    pt_kvs_set("message", journald_log["message"])
    pt_kvs_set("time", journald_log["time"])
    default_time("time")
    

    You can see the Pipeline processing status in datakit monitor -MP, with output similar to:

    ║          Script│Cat│Namespace│TotalPts│DropPts│ErrPts│           PLUpdate│ AvgCost  ║
    ║journal-syslog.p│  L│  default│     12 │   -   │  -   │8m16.719853556s ago│65.548µs  ║
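The template and pipeline above can be sketched end to end in Python. The sample line below is illustrative, shaped like the JsonFormat output (note that rsyslog's "msg" property usually keeps a leading space); the function approximates what load_json and the pt_kvs_set calls do:

```python
import json

# Illustrative line in the shape produced by the "JsonFormat" template above
sample = ('{"time":"2024-03-19T15:53:04+08:00","host":"host-name",'
          '"app":"app-name","level":"info","message":" This is the log message"}')

def emulate_pipeline(raw: str) -> dict:
    # Rough Python equivalent of journal-syslog.p: load_json, then lift fields
    journald_log = json.loads(raw)
    return {
        "host": journald_log["host"],
        "app": journald_log["app"],
        "status": journald_log["level"],
        "message": journald_log["message"],
        "time": journald_log["time"],
    }
```

Running `emulate_pipeline(sample)` yields the host, app, status, message, and time fields that the real Pipeline attaches to each collected point.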
    
