# Syslog Collection
Syslog collection falls into two categories. The first is plain-text logs, typically located in /var/log/syslog or /var/log/messages, which can be collected directly by configuring the logging collector. The second is the systemd journal: binary files managed by Journald, typically found in /run/log/journal or /var/log/journal. The text-based logs are usually produced from the journal by the rsyslog service.
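As a quick check, the following shell sketch (assuming a typical systemd-based Linux host; both forms may coexist and paths vary by distribution) reports which of the two storage forms are present:

```shell
# Report which syslog storage forms exist on this host (illustrative sketch).
check_syslog_forms() {
  # Journald keeps binary logs under /run/log/journal (volatile) or
  # /var/log/journal (persistent).
  if [ -d /run/log/journal ] || [ -d /var/log/journal ]; then
    echo "journald binary logs: present"
  else
    echo "journald binary logs: not found"
  fi
  # Text syslog file written by rsyslog (name varies by distribution).
  if [ -f /var/log/syslog ] || [ -f /var/log/messages ]; then
    echo "text syslog file: present"
  else
    echo "text syslog file: not found"
  fi
}
check_syslog_forms
```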
This article mainly describes how to collect Journald binary logs directly via DataKit. The basic process is as follows:
```mermaid
sequenceDiagram
    autonumber
    box Linux Server
    participant journald as Journald
    participant rsyslog as rsyslog Service
    end
    participant dk as DataKit
    rsyslog ->> journald: Read binary syslog
    rsyslog ->> rsyslog: Convert to plain-text logs
    rsyslog ->> dk: Send to TCP/UDP log service
```
## Prerequisites
- Install rsyslog service
- Install DataKit
## Configuration Steps
- Enable the DataKit logging collector and its TCP/UDP listening port:

  ```toml
  # /usr/local/datakit/conf.d/logging.conf
  [[inputs.logging]]
    ## Socket log reception, supports tcp/udp protocols.
    ## For security, listening on internal network ports is recommended.
    sockets = [
      "tcp://0.0.0.0:9540",   # TCP listening port
      ## Or use UDP:
      # "udp://0.0.0.0:9541", # UDP listening port
    ]
    ## Any custom source can be specified here; the default is 'default'.
    source = "journal-syslog"
  ```

- Restart DataKit (e.g. `systemctl restart datakit`)
## Host Journald Log Collection
- Edit /etc/rsyslog.conf (or create a new forward-to-datakit.conf under /etc/rsyslog.d/)
- Add forwarding rules:

  ```
  # /etc/rsyslog.d/forward-to-datakit.conf
  # Forward all logs (*.*) to a remote IP:port.
  # A single @ means UDP; a double @@ means TCP.

  # TCP forwarding example
  *.* @@0.0.0.0:9540
  # UDP forwarding example
  *.* @0.0.0.0:9541
  ```

  Enable only one of TCP/UDP here; TCP is used as the example below.
- Restart rsyslog:

  ```shell
  systemctl restart rsyslog
  ```
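After editing the configuration it is worth a sanity check before relying on forwarding. The sketch below assumes `rsyslogd` and `logger` are available (it guards for their absence) and may need root:

```shell
# Dry-run syntax check of the rsyslog configuration; -N1 validates the
# config without starting the daemon.
rsyslog_config_check() {
  if command -v rsyslogd >/dev/null 2>&1; then
    rsyslogd -N1 2>&1
  else
    echo "rsyslogd not installed on this host"
  fi
}
rsyslog_config_check

# Emit a test line into syslog; with the forwarding rule in place it
# should arrive at the DataKit socket listener.
command -v logger >/dev/null 2>&1 && logger "datakit-forwarding-test" || true
```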
## Kubernetes Node Journald Log Collection
To collect Node-level Journald binary logs in Kubernetes, send the logs to the DataKit DaemonSet running on the same node (the collector configuration is the same as above). In this case, mount the logging.conf collector configuration via a ConfigMap and enable the TCP/UDP log reception settings.
- Expose the DataKit log-receiving port and map it to the Node (choose either the TCP or the UDP port below):
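The original snippet for this step is not shown here. As an illustration only, a port mapping in the DataKit DaemonSet spec could look like the fragment below (field names follow the standard Kubernetes container spec; the port numbers are the ones used in logging.conf above):

```yaml
# Excerpt from the DataKit DaemonSet container spec (illustrative sketch)
containers:
  - name: datakit
    ports:
      - name: journal-tcp
        containerPort: 9540   # TCP socket listener from logging.conf
        hostPort: 9540        # expose on the Node so rsyslog can reach it
        protocol: TCP
      # Or, for UDP:
      # - name: journal-udp
      #   containerPort: 9541
      #   hostPort: 9541
      #   protocol: UDP
```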
- Change the rsyslog forwarding address to the DataKit service address:
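Illustrative only: if the DataKit listener is exposed on the Node at port 9540 as sketched above, the host forwarding rule points at that address (`<NODE_IP>` is a placeholder for the actual address reachable from the node's rsyslog):

```
# /etc/rsyslog.d/forward-to-datakit.conf
*.* @@<NODE_IP>:9540
```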
- Restart rsyslog:

  ```shell
  systemctl restart rsyslog
  ```
## Collection Effect

If DataKit is collecting data successfully, you will see output similar to the following in `datakit monitor`:
| Source | Cat | Feeds | P90Lat | P90Pts | Filtered | LastFeed | AvgCost | Errors |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| socketLog.journal-syslog | L | 2 | 104.138µs | 2 | 0 | 5m43.891165045s ago | - | |
Here `journal-syslog` corresponds to the `source` value configured in the collector.
## More Formatting
By default, the format of logs collected by DataKit from Journald might look like this:
```
<14>1 2024-03-19T15:53:04.391+0800 host-name app-name 12345 00000000 my-message-id This is the log message
```
This line includes the log time, hostname, process info, and so on, but it is not convenient for Pipeline extraction. We can instead configure rsyslog to output structured logs using a JSON template. Below is a JSON configuration example; adjust the JSON fields and the corresponding Pipeline processing to your actual situation.
- Add the following template to /etc/rsyslog.conf:

  ```
  # /etc/rsyslog.conf
  # Define a template named "JsonFormat"
  template(name="JsonFormat" type="list") {
      # JSON start
      constant(value="{")

      # 1. Time field (key: time)
      #    dateFormat="rfc3339" outputs e.g. 2026-02-02T08:05:00Z
      constant(value="\"time\":\"")    property(name="timereported" dateFormat="rfc3339")
      constant(value="\",")

      # 2. Hostname field (key: host)
      constant(value="\"host\":\"")    property(name="hostname")
      constant(value="\",")

      # 3. Program/service name (key: app) - optional, useful for
      #    distinguishing e.g. kubelet from kernel messages
      constant(value="\"app\":\"")     property(name="programname")
      constant(value="\",")

      # 4. Log level (key: level) - optional
      constant(value="\"level\":\"")   property(name="syslogseverity-text")
      constant(value="\",")

      # 5. Message body (key: message)
      #    format="json" is important: it escapes double quotes in the
      #    message so the JSON structure is not broken
      constant(value="\"message\":\"") property(name="msg" format="json")
      constant(value="\"")

      # JSON end and newline
      constant(value="}\n")
  }
  ```
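The template emits one JSON object per line. As a quick sanity check, the sketch below (with hypothetical sample values) verifies that a line in this shape is valid JSON; it requires `python3`:

```shell
# A line in the shape the JsonFormat template emits (sample values)
line='{"time":"2024-03-19T15:53:04+08:00","host":"host-name","app":"app-name","level":"info","message":"This is the log message"}'
# Validate that it parses as JSON
echo "$line" | python3 -m json.tool >/dev/null && echo "valid JSON"
```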
- Set the forwarding rule: append a JSON-format forwarding action
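The original rule is not shown here. In rsyslog's legacy action syntax a template is applied by appending `;TemplateName` to the forwarding action, so the TCP rule from the host section would become (address and port as configured earlier):

```
# /etc/rsyslog.d/forward-to-datakit.conf
# Forward everything over TCP, formatted with the JsonFormat template
*.* @@0.0.0.0:9540;JsonFormat
```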
- Use the `load_json` function in a Pipeline to easily process logs in this format. Place the following Pipeline script in the pipeline/logging/ directory under the DataKit installation path, with the filename matching the `source` field configured above:

  ```
  # /usr/local/datakit/pipeline/logging/journal-syslog.p
  journald_log = load_json(_)
  drop_origin_data()

  # Set a group of tags/fields
  pt_kvs_set("host", journald_log["host"], true)
  pt_kvs_set("app", journald_log["app"], true)
  pt_kvs_set("status", journald_log["level"])
  pt_kvs_set("message", journald_log["message"])
  pt_kvs_set("time", journald_log["time"])
  default_time("time")
  ```

  You can check the Pipeline processing status with `datakit monitor -MP`.