Journald
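The Journald collector gathers logs from the systemd journal (journald) on Linux systems. It uses an external binary wrapper to interact with libsystemd and efficiently collects structured log entries from the journal.
Prerequisites
- Linux only: requires systemd and journald
- libsystemd: the external binary requires the libsystemd development library
- Permissions: DataKit needs read permission on the journal files (this usually means joining the systemd-journal group)
System requirements check
Before deploying the journald collector, verify that your system meets the requirements.
A quick check can be done with the following command: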
systemctl --version >/dev/null 2>&1 && journalctl -n 1 >/dev/null 2>&1 && echo "Systemd OK" || echo "Systemd not available"
The following is a more complete pre-check script:
journald-prereq-check.sh
#!/bin/bash
# journald-prereq-check.sh - verify systemd requirements
echo "=== Journald collector prerequisite check ==="
echo
# 1. Check that systemctl exists
echo -n "1. systemctl command: "
if command -v systemctl >/dev/null 2>&1; then
    VERSION=$(systemctl --version | head -1)
    echo "✅ found - $VERSION"
else
    echo "❌ not found - systemctl is not installed"
    exit 1
fi
# 2. Check the libsystemd library
echo -n "2. libsystemd.so.0: "
if ldconfig -p 2>/dev/null | grep -q "libsystemd.so.0"; then
    LIBPATH=$(ldconfig -p 2>/dev/null | grep "libsystemd.so.0" | head -1 | awk '{print $NF}')
    echo "✅ found - $LIBPATH"
else
    echo "❌ not found - libsystemd.so.0 is missing"
    exit 1
fi
# 3. Check journalctl access
echo -n "3. journalctl access: "
if journalctl -n 1 >/dev/null 2>&1; then
    echo "✅ OK - journal is readable"
else
    echo "⚠️ restricted - journalctl exists but has no read permission"
fi
# 4. Check the journal directories
echo "4. Journal directories:"
for dir in "/var/log/journal" "/run/log/journal"; do
    echo -n " $dir: "
    if [ -d "$dir" ]; then
        if [ -r "$dir" ]; then
            echo "✅ exists and is readable"
        else
            echo "⚠️ exists but is not readable"
        fi
    else
        echo "❌ not found"
    fi
done
# 5. Check the systemd version
echo -n "5. systemd version: "
SYSTEMD_VERSION=$(systemctl --version | head -1 | grep -oP 'systemd \K\d+' || echo "0")
if [ "$SYSTEMD_VERSION" -ge 205 ]; then
    echo "✅ v$SYSTEMD_VERSION (meets the minimum v205 requirement)"
else
    echo "⚠️ v$SYSTEMD_VERSION (below the recommended version v205)"
fi
echo
echo "=== Check complete ==="
Save it as journald-prereq-check.sh and run it, for example with bash journald-prereq-check.sh.
Expected output:
=== Journald collector prerequisite check ===

1. systemctl command: ✅ found - systemd 257 (257.3-1-arch)
2. libsystemd.so.0: ✅ found - /usr/lib/libsystemd.so.0
3. journalctl access: ✅ OK - journal is readable
4. Journal directories:
 /var/log/journal: ✅ exists and is readable
 /run/log/journal: ✅ exists and is readable
5. systemd version: ✅ v257 (meets the minimum v205 requirement)

=== Check complete ===
Possible fixes:
| Problem | Solution |
|---|---|
| systemctl: command not found | Install systemd or use an alternative log collection method |
| libsystemd.so.0: cannot open | Install the systemd libraries: apt install libsystemd0 or yum install systemd-libs |
| journalctl: no read access | Add the user to the systemd-journal group: usermod -aG systemd-journal $USER |
| /var/log/journal not found | Enable the persistent journal: mkdir -p /var/log/journal && systemd-tmpfiles --create |
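Configuration
Collector configuration
After DataKit has been installed and started, enable the Journald collector by copying its sample configuration file.
Go to the conf.d/samples directory under the DataKit installation directory, copy journald.conf.sample, and rename the copy to journald.conf. For example: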
# Collect systemd journal logs using external binary
[[inputs.journald]]
## Name of the collector
name = 'journald'
## Run as daemon (required for journald collection)
daemon = true
http_endpoint = "http://localhost:9529"
log_level = "info"
log_path = "/usr/local/datakit/externals/journald.log"
## Path to datakit-journald binary
## Default: searches in /usr/local/datakit/externals/datakit-journald and ./externals/datakit-journald
# cmd = "/usr/local/datakit/externals/datakit-journald"
## Interval to check external process (for non-daemon mode)
# interval = "10s"
## Rootfs mount point for container/Kubernetes mode only
## DataKit uses this as the host root prefix when auto-prefixing absolute paths
## and preparing host-side systemd libraries (copy_node_libs).
mount_dir = "/rootfs"
## Journal directory paths
## Host installation: use default paths
## Container/Kubernetes: DataKit auto-prefixes absolute paths with mount_dir.
paths = [
"/var/log/journal", # Persistent storage
"/run/log/journal", # Runtime storage
]
## Filter by systemd unit names (supports glob patterns)
## Empty = all units
# units = ["*.service", "docker.service", "kubelet.service"]
## Filter by priority levels
## Levels: emerg(0), alert(1), crit(2), err(3), warning(4), notice(5), info(6), debug(7)
## Empty = all priorities
# priorities = ["err", "warning", "crit", "alert", "emerg"]
## Field selection - collect all by default, exclude specific fields
exclude_fields = [
"_BOOT_ID",
"_MACHINE_ID",
"__MONOTONIC_TIMESTAMP",
]
## Collection behavior
## tail_only=true: Only collect new entries (cursor not needed)
## tail_only=false: Read from last position (cursor required)
tail_only = true
max_entries_per_batch = 1000
## Cursor management (only used when tail_only=false)
# save_cursor = true
# cursor_file = "/usr/local/datakit/cache/journald.cursor"
## Environment variables for external binary
# envs = [
# "LD_LIBRARY_PATH=/usr/local/datakit/externals:$LD_LIBRARY_PATH",
# ]
## Host-side systemd library prepare:
## - Container/Kubernetes (Docker or Kubernetes): auto forced to true.
## - Non-container host: disabled by default. If enabled manually, set copy_node_libs_files explicitly.
## - In container/kubernetes mode, when copy_node_libs_files is empty, DataKit first copies
## libsystemd.so* then runs "LD_LIBRARY_PATH=<dst> ldd libsystemd.so.0"
## style dependency probing and copies missing .so files automatically.
# copy_node_libs = true
## Optional override file list. If set, only these patterns/files are copied.
# copy_node_libs_files = [
# "libsystemd.so*",
# "liblz4.so*",
# "libzstd.so*",
# "liblzma.so*",
# "libcap.so*",
# "libgcrypt.so*",
# "libgpg-error.so*",
# "libselinux.so*",
# "libmount.so*",
# "libblkid.so*",
# "libacl.so*",
# "libpcre2-8.so*",
# "libpcre.so*",
# ]
## Additional arguments for external binary
# args = []
[inputs.journald.tags]
# Add custom tags as needed
# environment = "production"
# cluster = "k8s-cluster-1"
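After the configuration is in place, restart DataKit.
In Kubernetes, the collector can instead be enabled by injecting its configuration through a ConfigMap or by setting ENV_DATAKIT_INPUTS.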
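Configuration options
| Option | Type | Default | Description |
|---|---|---|---|
| paths | []string | ["/var/log/journal", "/run/log/journal"] | Journal directory paths |
| units | []string | [] | Filter by systemd unit name (glob patterns supported, e.g. *.service) |
| priorities | []string | [] | Filter by priority: emerg, alert, crit, err, warning, notice, info, debug |
| exclude_fields | []string | [] | Journal fields to exclude from collection (e.g. _BOOT_ID, _MACHINE_ID) |
| tail_only | bool | true | Collect only new entries (skip historical logs at startup) |
| max_entries_per_batch | int | 1000 | Maximum number of entries collected per batch |
| save_cursor | bool | true | Persist the read position so collection can resume after a restart |
| cursor_file | string | /usr/local/datakit/cache/journald.pos | Path where the cursor position is stored |
| mount_dir | string | "/rootfs" | Rootfs mount directory, used only in container/Kubernetes mode. DataKit uses it as the prefix for absolute paths and as the source root when preparing host shared libraries |
| copy_node_libs | bool | false (automatically forced to true in containers or Kubernetes) | Whether to copy host shared libraries from mount_dir into DataKit's own external-libs directory before starting the external collector. Enabled automatically in container or Kubernetes environments (datakit.Docker \|\| config.IsKubernetes()) |
| copy_node_libs_files | []string | [] | File names or globs of the shared libraries to copy. If set explicitly, only that list is copied. If empty in container/Kubernetes automatic mode, DataKit first copies libsystemd.so*, then runs an LD_LIBRARY_PATH=/usr/local/datakit/externals/systemd-libs ldd libsystemd.so.0 style dependency probe and copies any missing .so files automatically. If left empty outside containers/Kubernetes while copy_node_libs=true, startup fails with a configuration error |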
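The priorities option takes level names, while journal entries carry the numeric PRIORITY value. As a minimal sketch of the name-to-number mapping listed in the sample configuration comments (emerg(0) through debug(7)), the hypothetical helper below, priority_to_level, is not part of DataKit and only illustrates the correspondence:

```shell
#!/bin/bash
# Hypothetical helper (not a DataKit command): map a priority name
# to its numeric syslog level, as listed in the sample configuration.
priority_to_level() {
    case "$1" in
        emerg)   echo 0 ;;
        alert)   echo 1 ;;
        crit)    echo 2 ;;
        err)     echo 3 ;;
        warning) echo 4 ;;
        notice)  echo 5 ;;
        info)    echo 6 ;;
        debug)   echo 7 ;;
        *)       echo "unknown priority: $1" >&2; return 1 ;;
    esac
}

# Show the numeric levels behind an error-focused filter.
for p in err warning crit alert emerg; do
    printf '%s=%s\n' "$p" "$(priority_to_level "$p")"
done
```

Lower numbers are more severe, so a filter such as ["err", "crit", "alert", "emerg"] corresponds to levels 0 through 3.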
Log fields
journald
Systemd logs. Note: field availability varies by systemd version; see the version hint in each field description (e.g. v188+, v205+).
| Tags & Fields | Description |
|---|---|
| host (tag) | Hostname (from _HOSTNAME, v188+) |
| service (tag) | Service identifier (from SYSLOG_IDENTIFIER, _SYSTEMD_UNIT, or _COMM) |
| CODE_FILE | Source code filename for debugging (v188+) Type: string Unit: N/A |
| CODE_FUNC | Function name for debugging (v188+) Type: string Unit: N/A |
| CODE_LINE | Source code line number for debugging (v188+) Type: int Unit: N/A |
| COREDUMP_CMDLINE | Full command line at crash time (v188+) Type: string Unit: N/A |
| COREDUMP_CWD | Current working directory at crash time (v188+) Type: string Unit: N/A |
| COREDUMP_EXE | Executable path of crashed binary (v188+) Type: string Unit: N/A |
| COREDUMP_GID | Crashed process GID (v188+) Type: int Unit: N/A |
| COREDUMP_HOSTNAME | Hostname at crash time (v188+) Type: string Unit: N/A |
| COREDUMP_PID | Crashed process PID (v188+) Type: int Unit: N/A |
| COREDUMP_ROOT | Root directory, usually / (v188+) Type: string Unit: N/A |
| COREDUMP_SIGNAL | Signal number that caused crash (v188+) Type: int Unit: N/A |
| COREDUMP_STACKTRACE | Full stack trace backtrace (v188+) Type: string Unit: N/A |
| COREDUMP_TIMESTAMP | Crash timestamp in microseconds (v188+) Type: int Unit: time,μs |
| COREDUMP_UID | Crashed process UID (v188+) Type: int Unit: N/A |
| COREDUMP_UNIT | System unit that crashed (v198+) Type: string Unit: N/A |
| COREDUMP_USER_UNIT | User unit that crashed (v198+) Type: string Unit: N/A |
| DOCUMENTATION | Documentation URL http/https/file/man/info (v246+) Type: string Unit: N/A |
| ERRNO | Unix error number associated with message (v188+) Type: int Unit: N/A |
| INVOCATION_ID | Invocation ID for systemd code messages (v245+) Type: string Unit: N/A |
| MESSAGE_ID | 128-bit message identifier (UUID format, v188+) Type: string Unit: N/A |
| OBJECT_AUDIT_LOGINUID | Target login UID (v205+) Type: int Unit: N/A |
| OBJECT_AUDIT_SESSION | Target audit session ID (v205+) Type: int Unit: N/A |
| OBJECT_CMDLINE | Target process full command line (v205+) Type: string Unit: N/A |
| OBJECT_COMM | Target process comm (v205+) Type: string Unit: N/A |
| OBJECT_EXE | Target process executable path (v205+) Type: string Unit: N/A |
| OBJECT_GID | Target process GID (v205+) Type: int Unit: N/A |
| OBJECT_PID | Target process PID, requires UID 0 to set (v205+) Type: int Unit: N/A |
| OBJECT_SYSTEMD_CGROUP | Target cgroup path (v205+) Type: string Unit: N/A |
| OBJECT_SYSTEMD_INVOCATION_ID | Target invocation ID (v235+) Type: string Unit: N/A |
| OBJECT_SYSTEMD_OWNER_UID | Target session owner UID (v205+) Type: int Unit: N/A |
| OBJECT_SYSTEMD_SESSION | Target session ID (v205+) Type: string Unit: N/A |
| OBJECT_SYSTEMD_UNIT | Target unit name (v205+) Type: string Unit: N/A |
| OBJECT_SYSTEMD_USER_UNIT | Target user unit name (v205+) Type: string Unit: N/A |
| OBJECT_UID | Target process UID (v205+) Type: int Unit: N/A |
| SYSLOG_FACILITY | Syslog facility 0-23 (v188+) Type: int Unit: N/A |
| SYSLOG_PID | Client PID from syslog, may differ from _PID (v188+) Type: int Unit: N/A |
| SYSLOG_RAW | Original syslog line if MESSAGE modified or timestamp lost (v240+) Type: string Unit: N/A |
| SYSLOG_TIMESTAMP | Original syslog timestamp as received (v188+) Type: string Unit: N/A |
| TID | Thread ID numeric (v247+) Type: int Unit: N/A |
| UNIT | Unit name, user-provided alternative to _SYSTEMD_UNIT (v251+) Type: string Unit: N/A |
| USER_INVOCATION_ID | User invocation ID for user manager messages (v245+) Type: string Unit: N/A |
| USER_UNIT | User unit, user-provided alternative to _SYSTEMD_USER_UNIT (v251+) Type: string Unit: N/A |
| _AUDIT_LOGINUID | Login UID from kernel audit (v188+) Type: int Unit: N/A |
| _AUDIT_SESSION | Audit session ID from kernel (v188+) Type: int Unit: N/A |
| _BOOT_ID | Boot ID, 128-bit hex UUID (v188+) Type: string Unit: N/A |
| _CAP_EFFECTIVE | Effective capabilities bitmask (v206+) Type: int Unit: N/A |
| _CMDLINE | Full command line, most complete process info (v188+) Type: string Unit: N/A |
| _COMM | Command name truncated to 15 chars (v188+) Type: string Unit: N/A |
| _CONTAINER_ID | Container ID for nspawn/containers (v205+) Type: string Unit: N/A |
| _CONTAINER_IMAGE | Container image for nspawn/containers (v205+) Type: string Unit: N/A |
| _CONTAINER_NAME | Container name for nspawn/containers (v205+) Type: string Unit: N/A |
| _EXE | Executable path, full path (v188+) Type: string Unit: N/A |
| _GID | Group ID, trusted (v188+) Type: int Unit: N/A |
| _KERNEL_DEVICE | Kernel device name, format: bM:N, cM:N, nN, +subsys:name (v189+) Type: string Unit: N/A |
| _KERNEL_SUBSYSTEM | Kernel subsystem, e.g. block, net (v189+) Type: string Unit: N/A |
| _LINE_BREAK | Line termination info: nul, line-max, eof, pid-change (v235+) Type: string Unit: N/A |
| _MACHINE_ID | Machine ID from /etc/machine-id (v188+) Type: string Unit: N/A |
| _NAMESPACE | Journal namespace ID (v245+) Type: string Unit: N/A |
| _RUNTIME_SCOPE | Runtime scope: initrd, system, or user (v252+) Type: string Unit: N/A |
| _SELINUX_CONTEXT | SELinux security context label (v188+) Type: string Unit: N/A |
| _SOURCE_BOOTTIME_TIMESTAMP | Boottime timestamp in microseconds, CLOCK_BOOTTIME (v257+) Type: int Unit: time,μs |
| _SOURCE_REALTIME_TIMESTAMP | Source timestamp in microseconds, CLOCK_REALTIME (v188+) Type: int Unit: time,μs |
| _STREAM_ID | Stream connection ID, 128-bit UUID for stdout streams (v235+) Type: string Unit: N/A |
| _SYSTEMD_CGROUP | Control group path (v188+) Type: string Unit: N/A |
| _SYSTEMD_INVOCATION_ID | Unit invocation ID unique per unit start (v233+) Type: string Unit: N/A |
| _SYSTEMD_OWNER_UID | Session owner UID (v188+) Type: int Unit: N/A |
| _SYSTEMD_SESSION | Login session ID (v188+) Type: string Unit: N/A |
| _SYSTEMD_SLICE | Slice unit name, e.g. system.slice (v188+) Type: string Unit: N/A |
| _SYSTEMD_UNIT | Unit name, e.g. sshd.service (v188+) Type: string Unit: N/A |
| _SYSTEMD_USER_SLICE | User slice name, e.g. user.slice (v188+) Type: string Unit: N/A |
| _SYSTEMD_USER_UNIT | User unit name for user sessions (v188+) Type: string Unit: N/A |
| _TRANSPORT | How entry was received: audit, driver, syslog, journal, stdout, kernel (v205+) Type: string Unit: N/A |
| _UDEV_DEVLINK | Symlinks to device, can appear multiple times (v189+) Type: string Unit: N/A |
| _UDEV_DEVNODE | Device node in /dev/ full path (v189+) Type: string Unit: N/A |
| _UDEV_SYSNAME | Device name in /sys/ (v189+) Type: string Unit: N/A |
| _UID | User ID, trusted cannot be spoofed (v188+) Type: int Unit: N/A |
| __CURSOR | Entry cursor, address field export only (v188+) Type: string Unit: N/A |
| __MONOTONIC_TIMESTAMP | Monotonic timestamp in microseconds, address field export only (v188+) Type: int Unit: time,μs |
| __REALTIME_TIMESTAMP | Reception timestamp in microseconds, address field export only (v188+) Type: int Unit: time,μs |
| __SEQNUM | Sequence number, address field export only (v254+) Type: int Unit: N/A |
| __SEQNUM_ID | Sequence ID, address field export only (v254+) Type: string Unit: N/A |
| journald_timestamp | Journal entry timestamp in nanoseconds (from _SOURCE_REALTIME_TIMESTAMP or __REALTIME_TIMESTAMP, v188+) Type: int Unit: time,ns |
| message | Log message content (from MESSAGE, v188+) Type: string Unit: N/A |
| pid | Process ID (from _PID or SYSLOG_PID, v188+) Type: int Unit: N/A |
| priority | Numeric priority level 0-7 (from PRIORITY, v188+) Type: int Unit: N/A |
| status | Log status level mapped from priority: error, warn, critical, notice, info, debug, unknown Type: string Unit: N/A |
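The table lists journald_timestamp in nanoseconds, while the journal's own timestamp fields are in microseconds. The sketch below illustrates that unit conversion. The function name journald_timestamp_ns is hypothetical, and the preference for _SOURCE_REALTIME_TIMESTAMP over __REALTIME_TIMESTAMP is an assumption drawn from the order in which the table names them, not a confirmed detail of the collector:

```shell
#!/bin/bash
# Hypothetical helper: derive a nanosecond timestamp from journal fields.
# Assumption: prefer the source timestamp and fall back to the
# reception timestamp when the source one is absent.
journald_timestamp_ns() {
    source_us="$1"      # _SOURCE_REALTIME_TIMESTAMP in microseconds (may be empty)
    received_us="$2"    # __REALTIME_TIMESTAMP in microseconds
    us="${source_us:-$received_us}"
    echo $(( $us * 1000 ))   # microseconds -> nanoseconds
}

journald_timestamp_ns 1700000000000000 1700000000000500   # prints 1700000000000000000
journald_timestamp_ns "" 1700000000000500                 # prints 1700000000000500000
```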
Common use cases
- Collect logs from specific services
[[inputs.journald]]
units = ["nginx.service", "mysql.service", "docker.service"]
priorities = ["err", "crit", "alert", "emerg"]
tail_only = true
- Exclude redundant fields
[[inputs.journald]]
exclude_fields = [
"_BOOT_ID",
"_MACHINE_ID",
"__MONOTONIC_TIMESTAMP",
"_AUDIT_SESSION",
"_AUDIT_LOGINUID",
]
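- Kubernetes node journal collection (automatic mode)
  Notes:
  - The collector resolves the candidate directories in the configured order and opens the first readable journal directory
  - In container or Kubernetes environments (datakit.Docker || config.IsKubernetes()), DataKit automatically enables journald rootfs mode
  - In container/Kubernetes mode, absolute paths are automatically prefixed with mount_dir (default "/rootfs")
  - If a path is itself a journal root directory such as <mount_dir>/var/log/journal, the collector automatically descends into the machine-id subdirectory before opening it
  - In containerized node environments such as kind or k3d, verify logger/journalctl inside the node container, not directly on the host
- Kubernetes node journal collection, preparing the host's systemd-related libraries before startup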
[[inputs.journald]]
mount_dir = "/rootfs"
paths = ["/var/log/journal", "/run/log/journal"]
tail_only = true
copy_node_libs = true
copy_node_libs_files = [
"libsystemd.so*",
"liblz4.so*",
"libzstd.so*",
"liblzma.so*",
"libcap.so*",
"libgcrypt.so*",
"libgpg-error.so*",
"libselinux.so*",
"libmount.so*",
"libblkid.so*",
"libacl.so*",
"libpcre2-8.so*",
"libpcre.so*",
]
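- Collect all logs (debugging): leave units and priorities unset, since empty filters match all units and all priorities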
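The mount_dir prefixing described in the Kubernetes notes can be sketched as a small path helper. This is an illustration only: prefix_journal_path is a hypothetical name, and leaving already-prefixed and relative paths unchanged is an assumption, not a documented DataKit behavior.

```shell
#!/bin/bash
# Hypothetical helper mirroring the documented rule: in container/Kubernetes
# mode, absolute paths are prefixed with mount_dir.
prefix_journal_path() {
    mount_dir="${1%/}"   # drop one trailing slash from mount_dir
    path="$2"
    case "$path" in
        "$mount_dir"/*) printf '%s\n' "$path" ;;            # assumption: already prefixed, keep as-is
        /*)             printf '%s\n' "$mount_dir$path" ;;  # absolute path: add the prefix
        *)              printf '%s\n' "$path" ;;            # assumption: relative path, keep as-is
    esac
}

# The default paths as they would be seen from inside the DataKit container.
for p in /var/log/journal /run/log/journal; do
    prefix_journal_path /rootfs "$p"
done
```

With the default mount_dir this prints /rootfs/var/log/journal and /rootfs/run/log/journal, which are the directories to inspect from inside the container when checking that the host rootfs is mounted.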
Troubleshooting
Permission errors
Make sure DataKit has read permission on the journal files:
# Add the datakit user to the systemd-journal group
sudo usermod -aG systemd-journal datakit
# Restart DataKit
sudo systemctl restart datakit
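No logs collected
- Verify that journald is running, for example with systemctl status systemd-journald
- Check that journal files exist, for example with ls -la /var/log/journal /run/log/journal
- If journalctl is installed in the current environment, use it for additional verification; if the container has no journalctl, check DataKit's compatibility warnings and probe results instead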
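The three checks above can be combined into one script. The following is a minimal sketch, assuming the default journal directories from the sample configuration; check_journal_dir is a hypothetical helper, and each step is skipped when the corresponding tool is not installed:

```shell
#!/bin/bash
# Hypothetical helper: report whether a directory exists and holds journal files.
check_journal_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "$dir: missing"
    elif find "$dir" -name '*.journal' 2>/dev/null | grep -q .; then
        echo "$dir: has journal files"
    else
        echo "$dir: no journal files"
    fi
}

# 1. Is journald running? Only meaningful where systemctl is available.
if command -v systemctl >/dev/null 2>&1; then
    systemctl is-active systemd-journald || true
else
    echo "systemctl not available; skipping service check"
fi

# 2. Do journal files exist in the default directories?
for d in /var/log/journal /run/log/journal; do
    check_journal_dir "$d"
done

# 3. Optional: read one recent entry if journalctl is installed.
if command -v journalctl >/dev/null 2>&1; then
    journalctl -n 1 --no-pager || echo "journalctl could not read the journal"
fi
```

Inside a DataKit container, pass the mount_dir-prefixed directories (for example /rootfs/var/log/journal) to check_journal_dir instead.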
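If reason=unsupported-format appears in the startup log, the libsystemd version used by the collector at runtime is older than what the target journal file format requires. In that case DataKit logs a warning and keeps the journald collector inactive rather than emitting partial or misleading results.
This is not specific to EKS. In Kubernetes, the problem can occur whenever DataKit needs to collect the node's journal and the libsystemd bundled in the container image is older than the version required by the host's journal file format. Typical symptoms include:
- If journalctl is installed in the Pod, running it may report unsupported feature
- DataKit has started, but the journald collector stays inactive after the compatibility warning
In container or Kubernetes environments (datakit.Docker || config.IsKubernetes()), DataKit already prepares the host's systemd-related shared libraries automatically. To enable this outside containers as well, set copy_node_libs = true and list the libraries explicitly in copy_node_libs_files.
Once enabled, DataKit copies shared libraries from the candidate system library directories under mount_dir (default "/rootfs") into its own external-libs directory before starting the collector, and automatically prepends that directory to LD_LIBRARY_PATH.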
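Copy behavior details:
- If copy_node_libs_files is set and non-empty, only that list is copied.
- If copy_node_libs_files is empty in container/Kubernetes automatic mode, DataKit first copies libsystemd.so*, then runs an ldd dependency probe on libsystemd.so.0 in the copy directory and automatically copies any missing .so files.
- If the host is neither a container nor Kubernetes, copy_node_libs=true, and copy_node_libs_files is empty, DataKit reports a configuration error and keeps the collector inactive.
- If library preparation fails after copy_node_libs is enabled, the journald collector stays inactive (other DataKit collectors are not affected).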
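The rules above amount to a small decision table. The sketch below restates them as a shell function for illustration. The name copy_libs_plan, its arguments, and the returned labels are all hypothetical and do not correspond to DataKit's internal API:

```shell
#!/bin/bash
# Hypothetical decision helper restating the documented rules.
#   $1: "true" if running in a container or Kubernetes
#   $2: configured value of copy_node_libs ("true" or "false")
#   $3: number of entries in copy_node_libs_files
copy_libs_plan() {
    in_container="$1"
    copy_node_libs="$2"
    files_count="$3"

    # Container/Kubernetes environments force copy_node_libs to true.
    if [ "$in_container" = "true" ]; then
        copy_node_libs="true"
    fi

    if [ "$copy_node_libs" != "true" ]; then
        echo "disabled"
    elif [ "$files_count" -gt 0 ]; then
        echo "copy-explicit-list"
    elif [ "$in_container" = "true" ]; then
        echo "copy-libsystemd-then-ldd-probe"
    else
        echo "config-error"
    fi
}

copy_libs_plan true false 0    # prints copy-libsystemd-then-ldd-probe
copy_libs_plan false true 0    # prints config-error
copy_libs_plan false true 13   # prints copy-explicit-list
```

The case worth noting is the second one: on a plain host, enabling copy_node_libs without an explicit file list is treated as a configuration error rather than falling back to automatic probing.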
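After the collector opens the journal successfully, it writes a log entry to the external journald.log that helps confirm which libsystemd was actually loaded at runtime.
Constraints:
- The host's libsystemd is not guaranteed to be compatible with the journald external binary that DataKit currently uses
- If the host's libsystemd is too old, the external binary may fail to start at dynamic-link time because of symbol or version mismatches
- If the host's libsystemd is newer, unsupported feature may also appear when reading journal files
- copy_node_libs is therefore only a preparation step and does not guarantee that the copied libraries are compatible; the final judgment still depends on the startup log and the probe results
Do not add the host's entire /usr/lib64 directory to LD_LIBRARY_PATH. Doing so can pull incompatible glibc components into the collector process and cause problems that are harder to diagnose.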
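If the startup log shows that the collector opened the journal by directory, that is the recommended way to read a live journal. Do not make individual .journal file paths the primary configuration.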
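Cursor file issues
If the cursor file is corrupted (for example after a host reboot), the collector automatically falls back to tail mode and creates a new cursor. The cursor can also be reset manually.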
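A manual reset usually means removing the cursor file while DataKit is stopped. The following is a minimal sketch under that assumption: reset_cursor is a hypothetical helper, and the path to pass is whatever cursor_file is set to in journald.conf.

```shell
#!/bin/bash
# Hypothetical helper: remove a cursor file if present so that the
# collector creates a fresh one on its next start.
reset_cursor() {
    cursor_file="$1"
    if [ -f "$cursor_file" ]; then
        rm -f -- "$cursor_file" && echo "removed $cursor_file"
    else
        echo "no cursor file at $cursor_file"
    fi
}

# Typical sequence on a host install (run as root; the path is an
# assumption, use the cursor_file value from your configuration):
#   systemctl stop datakit
#   reset_cursor /usr/local/datakit/cache/journald.cursor
#   systemctl start datakit
```

Stopping DataKit first avoids the collector rewriting the cursor between the removal and the restart.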
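High memory usage
The default batch size is 1000 entries. If memory usage is a problem, reduce the batch size by lowering max_entries_per_batch.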