
Journald


The Journald collector gathers logs from the systemd journal (journald) on Linux. It uses an external binary wrapper that talks to libsystemd to read structured log entries from the journal efficiently.

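Prerequisites

  • Linux only: requires systemd (journald)
  • libsystemd: the external binary needs the libsystemd shared library (libsystemd.so.0)
  • Permissions: DataKit needs read access to the journal files (usually by adding its user to the systemd-journal group)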

System requirements check

Before deploying the journald collector, verify that your system meets the requirements:

For a quick check, run:

systemctl --version >/dev/null 2>&1 && journalctl -n 1 >/dev/null 2>&1 && echo "Systemd OK" || echo "Systemd not available"

The following is a more thorough pre-check script:

journald-prereq-check.sh
#!/bin/bash
# journald-prereq-check.sh - verify systemd requirements

echo "=== Journald collector prerequisite check ==="
echo

# 1. Check that systemctl exists
echo -n "1. systemctl command: "
if command -v systemctl >/dev/null 2>&1; then
    VERSION=$(systemctl --version | head -1)
    echo "✅ found - $VERSION"
else
    echo "❌ not found - systemctl is not installed"
    exit 1
fi

# 2. Check the libsystemd library
echo -n "2. libsystemd.so.0: "
if ldconfig -p 2>/dev/null | grep -q "libsystemd.so.0"; then
    LIBPATH=$(ldconfig -p 2>/dev/null | grep "libsystemd.so.0" | head -1 | awk '{print $NF}')
    echo "✅ found - $LIBPATH"
else
    echo "❌ not found - libsystemd.so.0 is missing"
    exit 1
fi

# 3. Check journalctl read access
echo -n "3. journalctl access: "
if ! command -v journalctl >/dev/null 2>&1; then
    echo "⚠️  journalctl is not installed"
elif journalctl -n 1 >/dev/null 2>&1; then
    echo "✅ OK - journal is readable"
else
    echo "⚠️  restricted - journalctl exists but cannot read the journal"
fi

# 4. Check the journal directories
echo "4. Journal directories:"
for dir in "/var/log/journal" "/run/log/journal"; do
    echo -n "   $dir: "
    if [ -d "$dir" ]; then
        if [ -r "$dir" ]; then
            echo "✅ exists and readable"
        else
            echo "⚠️  exists but not readable"
        fi
    else
        echo "❌ not found"
    fi
done

# 5. Check the systemd version: the first line looks like "systemd 257 (...)".
#    awk is used instead of grep -P, which is not available on every system.
echo -n "5. systemd version: "
SYSTEMD_VERSION=$(systemctl --version | head -1 | awk '{print $2}')
case "$SYSTEMD_VERSION" in
    ''|*[!0-9]*) SYSTEMD_VERSION=0 ;;   # non-numeric: treat as unknown
esac
if [ "$SYSTEMD_VERSION" -ge 205 ]; then
    echo "✅ v$SYSTEMD_VERSION (meets minimum v205 requirement)"
else
    echo "⚠️  v$SYSTEMD_VERSION (below recommended v205)"
fi

echo
echo "=== Check complete ==="

Save it as journald-prereq-check.sh and run:

chmod +x journald-prereq-check.sh
./journald-prereq-check.sh

Example output:

=== Journald collector prerequisite check ===

1. systemctl command: ✅ found - systemd 257 (257.3-1-arch)
2. libsystemd.so.0: ✅ found - /usr/lib/libsystemd.so.0
3. journalctl access: ✅ OK - journal is readable
4. Journal directories:
   /var/log/journal: ✅ exists and readable
   /run/log/journal: ✅ exists and readable
5. systemd version: ✅ v257 (meets minimum v205 requirement)

=== Check complete ===

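Possible fixes:

| Problem | Solution |
|---|---|
| systemctl: command not found | Install systemd, or use an alternative log collection method |
| libsystemd.so.0: cannot open | Install the systemd library: apt install libsystemd0 (Debian/Ubuntu) or yum install systemd-libs (RHEL/CentOS) |
| journalctl: no read access | Add the user to the systemd-journal group: usermod -aG systemd-journal $USER |
| /var/log/journal not found | Enable the persistent journal: mkdir -p /var/log/journal && systemd-tmpfiles --create |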

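Configuration

Collector configuration

After DataKit is installed and running, enable the Journald collector by copying its configuration file:

Go to the conf.d/samples directory under the DataKit installation directory, copy journald.conf.sample, and name the copy journald.conf. Example: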

# Collect systemd journal logs using external binary
[[inputs.journald]]
  ## Name of the collector
  name = 'journald'

  ## Run as daemon (required for journald collection)
  daemon = true

  http_endpoint = "http://localhost:9529"
  log_level = "info"
  log_path = "/usr/local/datakit/externals/journald.log"

  ## Path to datakit-journald binary
  ## Default: searches in /usr/local/datakit/externals/datakit-journald and ./externals/datakit-journald
  # cmd = "/usr/local/datakit/externals/datakit-journald"

  ## Interval to check external process (for non-daemon mode)
  # interval = "10s"

  ## Rootfs mount point for container/Kubernetes mode only
  ## DataKit uses this as the host root prefix when auto-prefixing absolute paths
  ## and preparing host-side systemd libraries (copy_node_libs).
  mount_dir = "/rootfs"

  ## Journal directory paths
  ## Host installation: use default paths
  ## Container/Kubernetes: DataKit auto-prefixes absolute paths with mount_dir.
  paths = [
    "/var/log/journal",      # Persistent storage
    "/run/log/journal",      # Runtime storage
  ]

  ## Filter by systemd unit names (supports glob patterns)
  ## Empty = all units
  # units = ["*.service", "docker.service", "kubelet.service"]

  ## Filter by priority levels
  ## Levels: emerg(0), alert(1), crit(2), err(3), warning(4), notice(5), info(6), debug(7)
  ## Empty = all priorities
  # priorities = ["err", "warning", "crit", "alert", "emerg"]

  ## Field selection - collect all by default, exclude specific fields
  exclude_fields = [
    "_BOOT_ID",
    "_MACHINE_ID",
    "__MONOTONIC_TIMESTAMP",
  ]

  ## Collection behavior
  ## tail_only=true: Only collect new entries (cursor not needed)
  ## tail_only=false: Read from last position (cursor required)
  tail_only = true
  max_entries_per_batch = 1000

  ## Cursor management (only used when tail_only=false)
  # save_cursor = true
  # cursor_file = "/usr/local/datakit/cache/journald.cursor"

  ## Environment variables for external binary
  # envs = [
  #   "LD_LIBRARY_PATH=/usr/local/datakit/externals:$LD_LIBRARY_PATH",
  # ]

  ## Host-side systemd library prepare:
  ## - Container/Kubernetes (Docker or Kubernetes): auto forced to true.
  ## - Non-container host: disabled by default. If enabled manually, set copy_node_libs_files explicitly.
  ## - In container/kubernetes mode, when copy_node_libs_files is empty, DataKit first copies
  ##   libsystemd.so* then runs "LD_LIBRARY_PATH=<dst> ldd libsystemd.so.0"
  ##   style dependency probing and copies missing .so files automatically.
  # copy_node_libs = true
  ## Optional override file list. If set, only these patterns/files are copied.
  # copy_node_libs_files = [
  #   "libsystemd.so*",
  #   "liblz4.so*",
  #   "libzstd.so*",
  #   "liblzma.so*",
  #   "libcap.so*",
  #   "libgcrypt.so*",
  #   "libgpg-error.so*",
  #   "libselinux.so*",
  #   "libmount.so*",
  #   "libblkid.so*",
  #   "libacl.so*",
  #   "libpcre2-8.so*",
  #   "libpcre.so*",
  # ]

  ## Additional arguments for external binary
  # args = []

  [inputs.journald.tags]
    # Add custom tags as needed
    # environment = "production"
    # cluster = "k8s-cluster-1"

After configuring, restart DataKit.
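The copy step can be scripted. This is a minimal sketch assuming the default install directory /usr/local/datakit (the path used elsewhere on this page); enable_journald_conf is a hypothetical helper, not a DataKit command. It refuses to overwrite an existing journald.conf so local edits are not lost.

```shell
# Hypothetical helper: copy journald.conf.sample to journald.conf inside
# <datakit_dir>/conf.d/samples, as described above.
# Usage: enable_journald_conf [datakit_dir]   (default: /usr/local/datakit)
enable_journald_conf() {
    dk_dir=${1:-/usr/local/datakit}
    samples="$dk_dir/conf.d/samples"

    if [ ! -f "$samples/journald.conf.sample" ]; then
        echo "journald.conf.sample not found in $samples" >&2
        return 1
    fi
    # Leave an existing journald.conf untouched so local edits survive.
    if [ -e "$samples/journald.conf" ]; then
        echo "journald.conf already exists, leaving it unchanged" >&2
        return 0
    fi
    cp "$samples/journald.conf.sample" "$samples/journald.conf"
}
```

After running it, edit journald.conf as needed and restart DataKit, for example with sudo systemctl restart datakit as used in the troubleshooting section.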

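Configuration options

| Option | Type | Default | Description |
|---|---|---|---|
| paths | []string | ["/var/log/journal", "/run/log/journal"] | Journal directory paths |
| units | []string | [] | Filter by systemd unit name (glob patterns supported, e.g. *.service) |
| priorities | []string | [] | Filter by priority: emerg, alert, crit, err, warning, notice, info, debug |
| exclude_fields | []string | [] | Journal fields to exclude from collection (e.g. _BOOT_ID, _MACHINE_ID) |
| tail_only | bool | true | Collect only new entries (skip historical logs at startup) |
| max_entries_per_batch | int | 1000 | Maximum number of entries collected per batch |
| save_cursor | bool | true | Persist the read position so collection resumes after a restart |
| cursor_file | string | /usr/local/datakit/cache/journald.pos | Path of the file that stores the cursor |
| mount_dir | string | "/rootfs" | Rootfs mount directory, used only in container/Kubernetes mode. DataKit uses it as the prefix for absolute paths and as the root of the source directories when preparing host-side shared libraries |
| copy_node_libs | bool | false (automatically forced to true in containers or Kubernetes) | Whether to copy the host's shared libraries from mount_dir into DataKit's own external-libs directory before starting the external collector. Enabled automatically in container or Kubernetes environments (when datakit.Docker or config.IsKubernetes() is true) |
| copy_node_libs_files | []string | [] | File names or glob patterns of the libraries to copy. If set explicitly, only this list is copied. If it is empty in container/Kubernetes auto mode, DataKit first copies libsystemd.so*, then runs an ldd-style dependency probe (LD_LIBRARY_PATH=/usr/local/datakit/externals/systemd-libs ldd libsystemd.so.0) and copies any missing .so files automatically. If it is empty outside containers/Kubernetes while copy_node_libs=true, startup fails with a configuration error |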
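To make the units and priorities semantics concrete, here is a sketch that decides whether a single entry passes both filters. It assumes that an empty list accepts everything (as stated above), that priorities is an explicit set of levels rather than a threshold (as the example list in the sample config suggests), and that unit globs behave like shell patterns. priority_num and entry_passes are hypothetical names; the collector's own matching may differ in edge cases.

```shell
# Map a syslog priority name to its numeric level (0-7), as listed in the sample config.
priority_num() {
    case "$1" in
        emerg) echo 0 ;; alert) echo 1 ;; crit) echo 2 ;; err) echo 3 ;;
        warning) echo 4 ;; notice) echo 5 ;; info) echo 6 ;; debug) echo 7 ;;
        *) return 1 ;;
    esac
}

# Hypothetical filter check.
# Usage: entry_passes UNIT PRIORITY_NUM "UNIT_GLOBS" "PRIORITY_NAMES"
# Both lists are space-separated; an empty list accepts everything.
entry_passes() {
    unit=$1 prio=$2 globs=$3 names=$4

    if [ -n "$globs" ]; then
        matched=no
        set -f                      # stop globs from expanding to file names while splitting
        for g in $globs; do
            case "$unit" in $g) matched=yes; break ;; esac
        done
        set +f
        [ "$matched" = yes ] || return 1
    fi

    if [ -n "$names" ]; then
        for n in $names; do
            [ "$(priority_num "$n")" = "$prio" ] && return 0
        done
        return 1
    fi
    return 0
}
```

For example, entry_passes nginx.service 3 "*.service" "err crit" succeeds, while the same unit at priority 6 (info) is rejected by that priority list.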

日志字段

journald

Systemd logs. Note: field availability varies with the systemd version; see the version hint in each field description (e.g. v188+, v205+).

| Tag/Field | Type | Unit | Description |
|---|---|---|---|
| host | tag | - | Hostname (from _HOSTNAME, v188+) |
| service | tag | - | Service identifier (from SYSLOG_IDENTIFIER, _SYSTEMD_UNIT, or _COMM) |
| CODE_FILE | string | N/A | Source code filename for debugging (v188+) |
| CODE_FUNC | string | N/A | Function name for debugging (v188+) |
| CODE_LINE | int | N/A | Source code line number for debugging (v188+) |
| COREDUMP_CMDLINE | string | N/A | Full command line at crash time (v188+) |
| COREDUMP_CWD | string | N/A | Current working directory at crash time (v188+) |
| COREDUMP_EXE | string | N/A | Executable path of the crashed binary (v188+) |
| COREDUMP_GID | int | N/A | Crashed process GID (v188+) |
| COREDUMP_HOSTNAME | string | N/A | Hostname at crash time (v188+) |
| COREDUMP_PID | int | N/A | Crashed process PID (v188+) |
| COREDUMP_ROOT | string | N/A | Root directory, usually / (v188+) |
| COREDUMP_SIGNAL | int | N/A | Signal number that caused the crash (v188+) |
| COREDUMP_STACKTRACE | string | N/A | Full stack trace (backtrace) (v188+) |
| COREDUMP_TIMESTAMP | int | time,μs | Crash timestamp in microseconds (v188+) |
| COREDUMP_UID | int | N/A | Crashed process UID (v188+) |
| COREDUMP_UNIT | string | N/A | System unit that crashed (v198+) |
| COREDUMP_USER_UNIT | string | N/A | User unit that crashed (v198+) |
| DOCUMENTATION | string | N/A | Documentation URL: http/https/file/man/info (v246+) |
| ERRNO | int | N/A | Unix error number associated with the message (v188+) |
| INVOCATION_ID | string | N/A | Invocation ID for systemd code messages (v245+) |
| MESSAGE_ID | string | N/A | 128-bit message identifier (UUID format, v188+) |
| OBJECT_AUDIT_LOGINUID | int | N/A | Target login UID (v205+) |
| OBJECT_AUDIT_SESSION | int | N/A | Target audit session ID (v205+) |
| OBJECT_CMDLINE | string | N/A | Target process full command line (v205+) |
| OBJECT_COMM | string | N/A | Target process comm (v205+) |
| OBJECT_EXE | string | N/A | Target process executable path (v205+) |
| OBJECT_GID | int | N/A | Target process GID (v205+) |
| OBJECT_PID | int | N/A | Target process PID, requires UID 0 to set (v205+) |
| OBJECT_SYSTEMD_CGROUP | string | N/A | Target cgroup path (v205+) |
| OBJECT_SYSTEMD_INVOCATION_ID | string | N/A | Target invocation ID (v235+) |
| OBJECT_SYSTEMD_OWNER_UID | int | N/A | Target session owner UID (v205+) |
| OBJECT_SYSTEMD_SESSION | string | N/A | Target session ID (v205+) |
| OBJECT_SYSTEMD_UNIT | string | N/A | Target unit name (v205+) |
| OBJECT_SYSTEMD_USER_UNIT | string | N/A | Target user unit name (v205+) |
| OBJECT_UID | int | N/A | Target process UID (v205+) |
| SYSLOG_FACILITY | int | N/A | Syslog facility 0-23 (v188+) |
| SYSLOG_PID | int | N/A | Client PID from syslog, may differ from _PID (v188+) |
| SYSLOG_RAW | string | N/A | Original syslog line if MESSAGE was modified or the timestamp was lost (v240+) |
| SYSLOG_TIMESTAMP | string | N/A | Original syslog timestamp as received (v188+) |
| TID | int | N/A | Numeric thread ID (v247+) |
| UNIT | string | N/A | Unit name, user-provided alternative to _SYSTEMD_UNIT (v251+) |
| USER_INVOCATION_ID | string | N/A | User invocation ID for user manager messages (v245+) |
| USER_UNIT | string | N/A | User unit, user-provided alternative to _SYSTEMD_USER_UNIT (v251+) |
| _AUDIT_LOGINUID | int | N/A | Login UID from kernel audit (v188+) |
| _AUDIT_SESSION | int | N/A | Audit session ID from the kernel (v188+) |
| _BOOT_ID | string | N/A | Boot ID, 128-bit hex UUID (v188+) |
| _CAP_EFFECTIVE | int | N/A | Effective capabilities bitmask (v206+) |
| _CMDLINE | string | N/A | Full command line, the most complete process info (v188+) |
| _COMM | string | N/A | Command name, truncated to 15 chars (v188+) |
| _CONTAINER_ID | string | N/A | Container ID for nspawn/containers (v205+) |
| _CONTAINER_IMAGE | string | N/A | Container image for nspawn/containers (v205+) |
| _CONTAINER_NAME | string | N/A | Container name for nspawn/containers (v205+) |
| _EXE | string | N/A | Executable path, full path (v188+) |
| _GID | int | N/A | Group ID, trusted (v188+) |
| _KERNEL_DEVICE | string | N/A | Kernel device name, format: bM:N, cM:N, nN, +subsys:name (v189+) |
| _KERNEL_SUBSYSTEM | string | N/A | Kernel subsystem, e.g. block, net (v189+) |
| _LINE_BREAK | string | N/A | Line termination info: nul, line-max, eof, pid-change (v235+) |
| _MACHINE_ID | string | N/A | Machine ID from /etc/machine-id (v188+) |
| _NAMESPACE | string | N/A | Journal namespace ID (v245+) |
| _RUNTIME_SCOPE | string | N/A | Runtime scope: initrd, system, or user (v252+) |
| _SELINUX_CONTEXT | string | N/A | SELinux security context label (v188+) |
| _SOURCE_BOOTTIME_TIMESTAMP | int | time,μs | Boottime timestamp in microseconds, CLOCK_BOOTTIME (v257+) |
| _SOURCE_REALTIME_TIMESTAMP | int | time,μs | Source timestamp in microseconds, CLOCK_REALTIME (v188+) |
| _STREAM_ID | string | N/A | Stream connection ID, 128-bit UUID for stdout streams (v235+) |
| _SYSTEMD_CGROUP | string | N/A | Control group path (v188+) |
| _SYSTEMD_INVOCATION_ID | string | N/A | Unit invocation ID, unique per unit start (v233+) |
| _SYSTEMD_OWNER_UID | int | N/A | Session owner UID (v188+) |
| _SYSTEMD_SESSION | string | N/A | Login session ID (v188+) |
| _SYSTEMD_SLICE | string | N/A | Slice unit name, e.g. system.slice (v188+) |
| _SYSTEMD_UNIT | string | N/A | Unit name, e.g. sshd.service (v188+) |
| _SYSTEMD_USER_SLICE | string | N/A | User slice name, e.g. user.slice (v188+) |
| _SYSTEMD_USER_UNIT | string | N/A | User unit name for user sessions (v188+) |
| _TRANSPORT | string | N/A | How the entry was received: audit, driver, syslog, journal, stdout, kernel (v205+) |
| _UDEV_DEVLINK | string | N/A | Symlinks to the device, can appear multiple times (v189+) |
| _UDEV_DEVNODE | string | N/A | Device node in /dev/, full path (v189+) |
| _UDEV_SYSNAME | string | N/A | Device name in /sys/ (v189+) |
| _UID | int | N/A | User ID, trusted, cannot be spoofed (v188+) |
| __CURSOR | string | N/A | Entry cursor; address field, export only (v188+) |
| __MONOTONIC_TIMESTAMP | int | time,μs | Monotonic timestamp in microseconds; address field, export only (v188+) |
| __REALTIME_TIMESTAMP | int | time,μs | Reception timestamp in microseconds; address field, export only (v188+) |
| __SEQNUM | int | N/A | Sequence number; address field, export only (v254+) |
| __SEQNUM_ID | string | N/A | Sequence ID; address field, export only (v254+) |
| journald_timestamp | int | time,ns | Journal entry timestamp in nanoseconds (from _SOURCE_REALTIME_TIMESTAMP or __REALTIME_TIMESTAMP, v188+) |
| message | string | N/A | Log message content (from MESSAGE, v188+) |
| pid | int | N/A | Process ID (from _PID or SYSLOG_PID, v188+) |
| priority | int | N/A | Numeric priority level 0-7 (from PRIORITY, v188+) |
| status | string | N/A | Log status level mapped from priority: error, warn, critical, notice, info, debug, unknown |
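The field list says journald_timestamp is in nanoseconds while the journal's own timestamps are in microseconds, and that service and pid each come from one of several source fields. A minimal sketch of those derivations, assuming the source fields are tried in the order listed (first non-empty value wins); all function names are hypothetical and this is not DataKit's code.

```shell
# Print the first non-empty argument (returns 1 if all are empty).
first_non_empty() {
    for v in "$@"; do
        if [ -n "$v" ]; then
            printf '%s\n' "$v"
            return 0
        fi
    done
    return 1
}

# journald_timestamp (ns) from _SOURCE_REALTIME_TIMESTAMP, falling back to
# __REALTIME_TIMESTAMP; both are in microseconds, and 1 us = 1000 ns.
# Usage: journald_ts_ns SOURCE_REALTIME_US REALTIME_US
journald_ts_ns() {
    us=$(first_non_empty "$1" "$2") || return 1
    echo $((us * 1000))
}

# service tag: SYSLOG_IDENTIFIER, then _SYSTEMD_UNIT, then _COMM (assumed order).
service_tag() { first_non_empty "$1" "$2" "$3"; }

# pid field: _PID, then SYSLOG_PID (assumed order).
pid_field() { first_non_empty "$1" "$2"; }
```

Current wall-clock timestamps in microseconds are around 1.7e15, so multiplying by 1000 stays well inside the 64-bit range that shell arithmetic uses.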

Common use cases

  • Collect logs from specific services
[[inputs.journald]]
  units = ["nginx.service", "mysql.service", "docker.service"]
  priorities = ["err", "crit", "alert", "emerg"]
  tail_only = true
  • Exclude redundant fields
[[inputs.journald]]
  exclude_fields = [
    "_BOOT_ID",
    "_MACHINE_ID",
    "__MONOTONIC_TIMESTAMP",
    "_AUDIT_SESSION",
    "_AUDIT_LOGINUID",
  ]
  • Kubernetes node journal collection (auto mode)
[[inputs.journald]]
  paths = ["/var/log/journal", "/run/log/journal"]
  tail_only = true

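Notes:

  • The collector resolves candidate directories in the configured order and opens the first readable journal directory.
  • In container or Kubernetes environments (datakit.Docker || config.IsKubernetes()), DataKit automatically enables journald rootfs mode.
  • In container/Kubernetes mode, absolute paths are automatically prefixed with mount_dir (default "/rootfs").
  • If a path is a journal root such as <mount_dir>/var/log/journal, the collector automatically descends into the machine-id subdirectory before opening it.
  • In containerized node environments such as kind or k3d, verify logger/journalctl inside the node container, not directly on the host.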
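The mount_dir prefixing from the notes above can be sketched as plain string concatenation. This assumes relative paths are left unchanged and that a path already under mount_dir is not prefixed twice; both are assumptions, and prefix_path is a hypothetical name.

```shell
# Hypothetical: prefix an absolute journal path with mount_dir (container/Kubernetes mode).
# Usage: prefix_path MOUNT_DIR PATH
prefix_path() {
    mount_dir=${1%/}            # drop a trailing slash: "/rootfs/" -> "/rootfs"
    p=$2
    case "$p" in
        "$mount_dir"/*) printf '%s\n' "$p" ;;               # already under mount_dir (assumed)
        /*)             printf '%s\n' "$mount_dir$p" ;;     # absolute: add the prefix
        *)              printf '%s\n' "$p" ;;               # relative: leave as-is (assumed)
    esac
}
```

With the defaults, prefix_path /rootfs /var/log/journal yields /rootfs/var/log/journal, which is the journal root the collector then descends into.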

  • Kubernetes node journal collection, preparing host-side systemd libraries before startup

[[inputs.journald]]
  mount_dir = "/rootfs"
  paths = ["/var/log/journal", "/run/log/journal"]
  tail_only = true
  copy_node_libs = true
  copy_node_libs_files = [
    "libsystemd.so*",
    "liblz4.so*",
    "libzstd.so*",
    "liblzma.so*",
    "libcap.so*",
    "libgcrypt.so*",
    "libgpg-error.so*",
    "libselinux.so*",
    "libmount.so*",
    "libblkid.so*",
    "libacl.so*",
    "libpcre2-8.so*",
    "libpcre.so*",
  ]
  • Collect all logs (debugging)
[[inputs.journald]]
  tail_only = false
  max_entries_per_batch = 500
  exclude_fields = []
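To preview what an exclude_fields list removes before changing the config, you can apply the same exclusion to journalctl's export output. This is a sketch that assumes the KEY=VALUE text lines of the journal export format; fields with binary payloads use a different length-prefixed encoding and are not handled. drop_fields is a hypothetical helper, not part of DataKit.

```shell
# Hypothetical: drop excluded fields from KEY=VALUE lines on stdin.
# Blank lines (entry separators in export format) are kept.
# Usage: journalctl -n 1 -o export | drop_fields _BOOT_ID _MACHINE_ID __MONOTONIC_TIMESTAMP
drop_fields() {
    awk -v excl="$*" '
        BEGIN { n = split(excl, list, " "); for (i = 1; i <= n; i++) skip[list[i]] = 1 }
        {
            key = $0
            sub(/=.*/, "", key)         # the field name is everything before the first "="
            if (!(key in skip)) print
        }'
}
```

Only the field name is compared, so a MESSAGE value that itself contains "=" is passed through intact.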

Troubleshooting

Permission errors

Make sure DataKit can read the journal files:

# Add the datakit user to the systemd-journal group
sudo usermod -aG systemd-journal datakit

# Restart DataKit
sudo systemctl restart datakit
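After running usermod, you can confirm the membership with a short check based on id -nG. This is a minimal sketch; user_in_group is a hypothetical helper, and the datakit user name is the one assumed by the commands above. Group changes only apply to processes started afterwards, which is why DataKit has to be restarted.

```shell
# Hypothetical check: is USER a member of GROUP? (defaults: datakit, systemd-journal)
user_in_group() {
    user=${1:-datakit}
    group=${2:-systemd-journal}
    # id -nG prints the user's group names separated by spaces; match one exactly.
    id -nG "$user" 2>/dev/null | tr ' ' '\n' | grep -qx "$group"
}
```

For example: user_in_group datakit systemd-journal && echo "datakit can read the journal group".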

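No logs collected

  1. Verify that journald is running:
systemctl status systemd-journald
  2. Check that the journal files exist:
ls -la /var/log/journal/
ls -la /run/log/journal/
  3. If journalctl is installed in the current environment, use it for further verification; if the container has no journalctl, check DataKit's compatibility warnings and probe results instead:
journalctl -n 10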
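An end-to-end variant of the last check writes a tagged message and reads it back (in kind or k3d, run it inside the node container). This is a sketch assuming util-linux logger and journalctl are available; the tag prefix dk-journald-test and the name journal_roundtrip are arbitrary. Success shows that journald is storing entries; it does not test DataKit itself.

```shell
# Hypothetical round-trip check: write a tagged test message with logger,
# then read it back with journalctl.
# Returns 0 on success, 1 if the message was not found, 2 if the tools are missing.
journal_roundtrip() {
    command -v logger >/dev/null 2>&1 && command -v journalctl >/dev/null 2>&1 || return 2
    tag="dk-journald-test-$$"
    msg="journald round-trip $(date +%s)"
    logger -t "$tag" "$msg" || return 1
    sleep 1                                   # give journald a moment to store the entry
    journalctl -t "$tag" -n 1 -o cat 2>/dev/null | grep -qF "$msg"
}
```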

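If the startup log contains reason=unsupported-format, the libsystemd used by the collector at runtime is older than the format of the target journal files. DataKit then logs a warning and keeps the journald collector inactive instead of emitting partial or misleading data.

This is not limited to EKS. In any Kubernetes cluster where DataKit collects a node's journal, the problem can occur whenever the libsystemd bundled in the container image is older than the version the host's journal file format requires. Typical symptoms:

  • If journalctl is installed in the Pod, running it may fail with unsupported feature
  • DataKit is running, but the journald collector stays inactive after a compatibility warning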
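To spot this condition from a script, you can scan a log for the reason=unsupported-format marker. This sketch assumes the warning is written to a log file you can read, such as DataKit's own log or the log_path of the external binary (/usr/local/datakit/externals/journald.log in the sample config); journald_compat_state is a hypothetical helper. "ok" only means the marker is absent.

```shell
# Hypothetical: scan LOGFILE for the compatibility warning described above.
# Prints "unsupported-format" (returns 1), "ok" (returns 0), or "no-log" (returns 2).
journald_compat_state() {
    [ -r "$1" ] || { echo no-log; return 2; }
    if grep -q 'reason=unsupported-format' "$1"; then
        echo unsupported-format
        return 1
    fi
    echo ok        # marker not found; this does not prove the collector is active
}
```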

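In container or Kubernetes environments (datakit.Docker || config.IsKubernetes()), DataKit already enables host-side systemd library preparation automatically. To enable it outside containers as well, configure:

[[inputs.journald]]
  copy_node_libs = true
  # Required outside containers/Kubernetes: an empty list is a configuration error
  copy_node_libs_files = ["libsystemd.so*", "liblz4.so*", "libzstd.so*", "liblzma.so*"]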

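When enabled, DataKit copies shared libraries from candidate system library directories under mount_dir (default "/rootfs") into its own external-libs directory before starting the collector, and automatically prepends that directory to LD_LIBRARY_PATH.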
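The prepend step has one pitfall worth showing: if LD_LIBRARY_PATH is empty, naively writing dir:$LD_LIBRARY_PATH leaves a trailing colon, and glibc's loader treats an empty element as the current working directory. A minimal sketch, assuming the copied libraries live in /usr/local/datakit/externals/systemd-libs (the path shown in the options table); prepend_ld_path is hypothetical. Whether the envs example in the sample config is affected depends on how DataKit expands $LD_LIBRARY_PATH there, which this page does not specify.

```shell
# Hypothetical: prepend DIR to a library search path without leaving an empty
# element behind (an empty element makes the loader also search the current directory).
# Usage: prepend_ld_path DIR [CURRENT]   (CURRENT defaults to $LD_LIBRARY_PATH)
prepend_ld_path() {
    dir=$1
    cur=${2-${LD_LIBRARY_PATH-}}
    if [ -z "$cur" ]; then
        printf '%s\n' "$dir"
    else
        printf '%s\n' "$dir:$cur"
    fi
}
```

For example: LD_LIBRARY_PATH=$(prepend_ld_path /usr/local/datakit/externals/systemd-libs) ldd /usr/local/datakit/externals/systemd-libs/libsystemd.so.0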

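Copy behavior details:

  • If copy_node_libs_files is set and non-empty, only that list is copied.
  • If copy_node_libs_files is empty in container/Kubernetes auto mode, DataKit first copies libsystemd.so*, then runs an ldd dependency probe against libsystemd.so.0 in the copy directory and copies any missing .so files automatically.
  • If DataKit runs neither in a container nor in Kubernetes, with copy_node_libs=true and copy_node_libs_files empty, it reports a configuration error and keeps the collector inactive.
  • If library preparation fails after copy_node_libs is enabled, the journald collector stays inactive (other DataKit collectors are unaffected).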
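The four rules above can be restated as a small decision function, plus a filter for the ldd probing step. This is a sketch of the documented behavior, not DataKit's implementation; copy_plan and missing_libs are hypothetical names, and missing_libs assumes glibc-style ldd output, where an unresolved dependency is printed as "name => not found".

```shell
# Hypothetical restatement of the copy rules as a decision function.
# Usage: copy_plan IN_CONTAINER_OR_K8S(true|false) COPY_NODE_LIBS(true|false) FILE_COUNT
# Prints: explicit-list | auto-probe | config-error | disabled
copy_plan() {
    in_ctr=$1 enabled=$2 count=$3
    if [ "$in_ctr" = true ]; then
        enabled=true                # forced on in containers/Kubernetes
    fi
    if [ "$enabled" != true ]; then
        echo disabled
    elif [ "$count" -gt 0 ]; then
        echo explicit-list          # copy only the configured files
    elif [ "$in_ctr" = true ]; then
        echo auto-probe             # libsystemd.so* first, then ldd-based completion
    else
        echo config-error           # host mode with an empty list
    fi
}

# List dependencies that ldd reports as missing ("libfoo.so.1 => not found").
# Usage: LD_LIBRARY_PATH=<dst> ldd <dst>/libsystemd.so.0 | missing_libs
missing_libs() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}
```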

After the collector opens the journal successfully, it writes a line like the following to the external journald.log, which shows which libsystemd was actually loaded at runtime:

loaded libsystemd paths: [/usr/local/datakit/externals/systemd-libs/libsystemd.so.0.35.0]
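To pull that path out of a long log, a sketch assuming the line format shown above (a Go-style bracketed, space-separated list); loaded_libsystemd is hypothetical, and the log path in the usage comment is the log_path from the sample config.

```shell
# Hypothetical: print the libsystemd path(s) from the most recent
# "loaded libsystemd paths: [...]" line in LOGFILE, one per line.
# Usage: loaded_libsystemd /usr/local/datakit/externals/journald.log
loaded_libsystemd() {
    grep 'loaded libsystemd paths: \[' "$1" | tail -n 1 |
        sed 's/.*loaded libsystemd paths: \[\(.*\)].*/\1/' | tr ' ' '\n'
}
```

A path under /usr/local/datakit/externals/systemd-libs points to the copied host library; any other location means the library came from elsewhere, for example the image's own libsystemd.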

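Constraints:

  • The host's libsystemd is not guaranteed to be compatible with the journald external binary that DataKit currently uses
  • If the host's libsystemd is too old, the external binary may fail to start at dynamic-link time because of symbol or version mismatches
  • If the host's libsystemd is newer, reading journal files may still fail with unsupported feature
  • copy_node_libs is therefore only a preparation step and does not guarantee that the copied libraries are compatible; confirm the result with the startup log and probe results

Do not add the host's entire /usr/lib64 to LD_LIBRARY_PATH. Doing so can pull incompatible glibc components into the collector process and cause problems that are much harder to diagnose.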

If the startup log shows:

resolved journal directory: target=...
opening journal from directory: ...

then the collector is opening the journal by directory. This is the recommended way to read the live journal; do not configure a single .journal file path as the primary setup.
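The directory resolution mentioned in the notes (a journal root such as <mount_dir>/var/log/journal is resolved to its machine-id subdirectory) can be sketched as follows. This assumes the subdirectory is named after the host's /etc/machine-id found under mount_dir; resolve_journal_dir is a hypothetical helper, and the real collector may use a different lookup.

```shell
# Hypothetical: if DIR holds no *.journal files but has a <machine-id>/ subdirectory,
# print that subdirectory; otherwise print DIR unchanged.
# MACHINE_ID defaults to the contents of <ROOT>/etc/machine-id.
# Usage: resolve_journal_dir DIR [ROOT] [MACHINE_ID]
resolve_journal_dir() {
    dir=${1%/}
    root=${2:-}
    mid=${3:-$(cat "$root/etc/machine-id" 2>/dev/null || true)}

    # Already a directory that contains journal files: use it as-is.
    for f in "$dir"/*.journal; do
        if [ -e "$f" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    if [ -n "$mid" ] && [ -d "$dir/$mid" ]; then
        printf '%s\n' "$dir/$mid"
    else
        printf '%s\n' "$dir"
    fi
}
```

For example, resolve_journal_dir /rootfs/var/log/journal /rootfs prints /rootfs/var/log/journal/<machine-id> when that directory exists.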

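Cursor file issues

If the cursor file is corrupted (for example after a host reboot), the collector automatically falls back to tail mode and creates a new cursor. To reset it manually:

# Delete the cursor file
rm /usr/local/datakit/cache/journald.pos

# Restart DataKit
sudo systemctl restart datakit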
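The fallback rule can be sketched as a check on the cursor file before choosing a start position. This assumes journal cursors have the usual s=...;i=...; text form and treats anything else as unusable; start_mode is hypothetical and does not reflect how DataKit actually validates the file. The comments show roughly equivalent journalctl options.

```shell
# Hypothetical: decide where reading starts, following the fallback described above.
# Usage: start_mode CURSOR_FILE   -> prints "cursor" or "tail"
start_mode() {
    cursor=$(head -n 1 "$1" 2>/dev/null || true)
    case "$cursor" in
        s=*\;i=*) echo cursor ;;   # looks like a journal cursor ("s=...;i=...;...")
        *)        echo tail ;;     # missing, empty, or unrecognized: fall back to tail
    esac
}

# Roughly equivalent journalctl invocations:
#   cursor: journalctl --after-cursor "$(cat /usr/local/datakit/cache/journald.pos)" -o export
#   tail:   journalctl -n 0 -f -o export
```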

High memory usage

The default batch size is 1000 entries. If memory usage is a concern, reduce it:

[[inputs.journald]]
  max_entries_per_batch = 100
