
Process


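The process collector monitors the processes running on the system in real time. It collects and analyzes their runtime metrics, including memory usage, CPU time, current state and listening ports. Based on these metrics, users can configure alerts in Guance Cloud to stay informed about process status and respond promptly when a process fails.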

Warning

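On macOS, the process collector (for both objects and metrics) can be resource-intensive and may cause CPU spikes. You can disable it manually if needed. Process object collection is currently still enabled by default, and it runs every 5 minutes by default.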

Configuration

Prerequisites

  • By default, the process collector does not collect process metrics. To collect them, set open_metric to true in host_processes.conf, for example:
[[inputs.host_processes]]
    ...
    open_metric = true

Collector Configuration

Go to the conf.d/host directory under the DataKit installation directory, copy host_processes.conf.sample, and rename the copy to host_processes.conf. Example:

[[inputs.host_processes]]
  # Only collect metrics for processes whose names match. This whitelist is
  # not applied to process objects. Process names support regular expressions.
  # process_name = [".*nginx.*", ".*mysql.*"]

  # Minimum process run time (default 10m)
  # Processes that have been running for less than this are ignored (for both metrics and objects)
  min_run_time = "10m"

  ## Enable process metric collecting
  open_metric = false

  ## Enable listen ports tag, default is false
  enable_listen_ports = false

  ## Enable open files field, default is false
  enable_open_files = false

  ## Only collect container-based processes (both objects and metrics)
  only_container_processes = false

  # Extra tags
  [inputs.host_processes.tags]
  # some_tag = "some_value"
  # more_tag = "some_other_value"
  # ...

After configuring the collector, restart DataKit.
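The config comments describe two filters. process_name is a regular-expression whitelist that applies only to metrics. min_run_time drops short-lived processes from both metrics and objects. The Python sketch below illustrates that selection logic. It is not DataKit's implementation, and the names `Proc` and `select_processes` are hypothetical. The sketch makes two assumptions. First, a name matches if any pattern matches anywhere in it (`re.search`). Second, an empty whitelist selects every process.

```python
import re
from dataclasses import dataclass


@dataclass
class Proc:
    name: str           # process name
    run_seconds: float  # how long the process has been running


def select_processes(procs, process_name=None, min_run_time=600.0):
    """Return (object_procs, metric_procs) following the documented filter rules.

    Assumption: a process matches the whitelist if any pattern matches
    somewhere in its name, and an empty whitelist selects every process.
    """
    patterns = [re.compile(p) for p in (process_name or [])]
    # min_run_time applies to both objects and metrics
    objects = [p for p in procs if p.run_seconds >= min_run_time]
    # the process_name whitelist applies to metrics only
    metrics = [
        p for p in objects
        if not patterns or any(rx.search(p.name) for rx in patterns)
    ]
    return objects, metrics


procs = [Proc("nginx", 3600), Proc("mysqld", 30), Proc("bash", 7200)]
objects, metrics = select_processes(procs, [".*nginx.*", ".*mysql.*"], min_run_time=600)
print([p.name for p in objects])  # ['nginx', 'bash']
print([p.name for p in metrics])  # ['nginx']
```

In this example, mysqld matches the whitelist but is still dropped, because it has been running for less than min_run_time. bash is reported as an object but not as a metric.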

You can also enable the collector by injecting its configuration via a ConfigMap or by setting ENV_DATAKIT_INPUTS.

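Configuration parameters can also be set through environment variables. To do this, the collector must be added to ENV_DEFAULT_ENABLED_INPUTS as a default collector: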

  • ENV_INPUT_HOST_PROCESSES_OPEN_METRIC

    Enable process metric collection

    Type: Boolean

    Config field: open_metric

    Default: false

  • ENV_INPUT_HOST_PROCESSES_PROCESS_NAME

    Process name whitelist

    Type: List

    Config field: process_name

    Example: .*datakit.*,guance

  • ENV_INPUT_HOST_PROCESSES_MIN_RUN_TIME

    Minimum process run time

    Type: Duration

    Config field: min_run_time

    Default: 10m

  • ENV_INPUT_HOST_PROCESSES_ENABLE_LISTEN_PORTS

    Enable the listen ports tag

    Type: Boolean

    Config field: enable_listen_ports

    Default: false

  • ENV_INPUT_HOST_PROCESSES_TAGS

    Custom tags. Any tag with the same name in the configuration file is overwritten

    Type: Map

    Config field: tags

    Example: tag1=value1,tag2=value2

  • ENV_INPUT_HOST_PROCESSES_ONLY_CONTAINER_PROCESSES

    Only collect metrics and objects for processes running in containers

    Type: Boolean

    Config field: only_container_processes

    Default: false

  • ENV_INPUT_HOST_PROCESSES_METRIC_INTERVAL

    Metric collection interval

    Type: Duration

    Config field: metric_interval

    Default: 30s

  • ENV_INPUT_HOST_PROCESSES_object_interval

    Object collection interval

    Type: Duration

    Config field: object_interval

    Default: 300s
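The variables above carry typed values as plain strings. The examples show their encodings: a List is comma-separated (`.*datakit.*,guance`), a Map is comma-separated `key=value` pairs (`tag1=value1,tag2=value2`), and a Duration is a string such as `10m` or `30s`. The Python sketch below decodes those encodings into config fields. The exact parsing rules are assumptions inferred from the examples, not DataKit's code. These assumptions include whitespace trimming, accepted boolean spellings, and support for only `h`/`m`/`s` duration units. All function names are hypothetical, and the sketch covers only a subset of the variables.

```python
import re

_UNITS = {"h": 3600.0, "m": 60.0, "s": 1.0}


def parse_bool(v: str) -> bool:
    s = v.strip().lower()
    if s in ("true", "1"):
        return True
    if s in ("false", "0"):
        return False
    raise ValueError(f"not a boolean: {v!r}")


def parse_list(v: str) -> list:
    # ".*datakit.*,guance" -> [".*datakit.*", "guance"]; empty items are dropped
    return [x.strip() for x in v.split(",") if x.strip()]


def parse_map(v: str) -> dict:
    # "tag1=value1,tag2=value2" -> {"tag1": "value1", "tag2": "value2"}
    out = {}
    for item in parse_list(v):
        key, sep, val = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        out[key.strip()] = val.strip()
    return out


def parse_duration(v: str) -> float:
    # Accepts only h/m/s components ("10m", "300s", "1h30m"); returns seconds
    total, pos = 0.0, 0
    for m in re.finditer(r"(\d+(?:\.\d+)?)([hms])", v):
        if m.start() != pos:
            break
        total += float(m.group(1)) * _UNITS[m.group(2)]
        pos = m.end()
    if pos == 0 or pos != len(v):
        raise ValueError(f"unsupported duration: {v!r}")
    return total


PREFIX = "ENV_INPUT_HOST_PROCESSES_"
# env var suffix -> (config field, parser); a subset of the variables listed above
FIELDS = {
    "OPEN_METRIC": ("open_metric", parse_bool),
    "PROCESS_NAME": ("process_name", parse_list),
    "MIN_RUN_TIME": ("min_run_time", parse_duration),
    "ENABLE_LISTEN_PORTS": ("enable_listen_ports", parse_bool),
    "TAGS": ("tags", parse_map),
    "ONLY_CONTAINER_PROCESSES": ("only_container_processes", parse_bool),
    "METRIC_INTERVAL": ("metric_interval", parse_duration),
}


def load_from_env(env) -> dict:
    """Build a partial config dict from any mapping of environment variables."""
    cfg = {}
    for suffix, (field, parse) in FIELDS.items():
        if PREFIX + suffix in env:
            cfg[field] = parse(env[PREFIX + suffix])
    return cfg


print(load_from_env({
    "ENV_INPUT_HOST_PROCESSES_PROCESS_NAME": ".*datakit.*,guance",
    "ENV_INPUT_HOST_PROCESSES_TAGS": "tag1=value1,tag2=value2",
    "ENV_INPUT_HOST_PROCESSES_MIN_RUN_TIME": "10m",
}))
```

On a real host, `load_from_env(os.environ)` works the same way. One consequence of this comma-splitting design is that a regular expression containing a comma, such as `a{1,3}`, cannot be passed through a List variable. Such patterns are better placed in host_processes.conf.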

Metrics

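By default, a global tag named host is appended to all of the data collected below. Its value is the hostname of the machine running DataKit. You can specify additional tags in the configuration via [inputs.host_processes.tags]:

  [inputs.host_processes.tags]
  # some_tag = "some_value"
  # more_tag = "some_other_value"
  # ...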

host_processes

Collect process metrics, including CPU/memory usage, etc.

  • Tags
Tag Description
container_id Container ID of the process (Linux only)
host Host name
pid Process ID
process_name Process name
username Username
  • Fields
Field Description
cpu_usage CPU usage: the percentage of CPU the process has used since it started. This value is more stable than the instantaneous percentage shown by top
Type: float
Unit: percent
cpu_usage_top CPU usage: the average CPU usage of the process within one collection cycle
Type: float
Unit: percent
mem_used_percent Memory usage percentage
Type: float
Unit: percent
nonvoluntary_ctxt_switches From /proc/[PID]/status. Number of involuntary context switches, where the process was forced off the CPU. Linux only
Type: int
Unit: count
open_files Number of open files (Linux only)
Type: int
Unit: count
page_children_major_faults From /proc/[PID]/stat. Number of major page faults of its child processes. Linux only
Type: int
Unit: count
page_children_minor_faults From /proc/[PID]/stat. Number of minor page faults of its child processes. Linux only
Type: int
Unit: count
page_major_faults From /proc/[PID]/stat. Number of major page faults of the process. Linux only
Type: int
Unit: count
page_minor_faults From /proc/[PID]/stat. Number of minor page faults of the process. Linux only
Type: int
Unit: count
proc_read_bytes Linux: from /proc/[PID]/io; Windows: from GetProcessIoCounters(). Bytes read from disk
Type: int
Unit: digital,B
proc_syscr Linux: from /proc/[PID]/io; Windows: from GetProcessIoCounters(). Number of read-type system calls, such as read(). Linux and Windows only
Type: int
Unit: count
proc_syscw Linux: from /proc/[PID]/io; Windows: from GetProcessIoCounters(). Number of write-type system calls, such as write(). Linux and Windows only
Type: int
Unit: count
proc_write_bytes Linux: from /proc/[PID]/io; Windows: from GetProcessIoCounters(). Bytes written to disk
Type: int
Unit: digital,B
rss Resident Set Size
Type: int
Unit: digital,B
threads Total number of threads
Type: int
Unit: count
vms Virtual memory size
Type: int
Unit: digital,B
voluntary_ctxt_switches From /proc/[PID]/status. Number of voluntary context switches, where the process gave up the CPU itself, e.g. via sleep()/read()/sched_yield(). Linux only
Type: int
Unit: count
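The page-fault fields and the two CPU figures above are all based on counters in /proc/[PID]/stat. The Python sketch below parses that file using the field numbering in proc(5). It then computes cpu_usage and cpu_usage_top as the descriptions above define them:

- cpu_usage: CPU time since start, divided by the process's elapsed lifetime.
- cpu_usage_top: CPU time consumed within one collection cycle, divided by the cycle length.

This document does not give DataKit's exact formula. In particular, it is not stated whether the value is normalized by CPU count. The sketch assumes no normalization, so a multi-threaded process can exceed 100%. `ProcStat`, `parse_proc_stat` and the two `cpu_usage_*` helpers are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ProcStat:
    minflt: int     # field 10: minor faults of the process
    cminflt: int    # field 11: minor faults of its waited-for children
    majflt: int     # field 12: major faults of the process
    cmajflt: int    # field 13: major faults of its waited-for children
    utime: int      # field 14: user-mode CPU time, in clock ticks
    stime: int      # field 15: kernel-mode CPU time, in clock ticks
    starttime: int  # field 22: start time after system boot, in clock ticks


def parse_proc_stat(line: str) -> ProcStat:
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so split at the LAST ')' rather than on whitespace.
    rest = line[line.rindex(")") + 2:].split()

    def field(n: int) -> int:
        return int(rest[n - 3])  # rest[0] is field 3 (state)

    return ProcStat(field(10), field(11), field(12), field(13),
                    field(14), field(15), field(22))


def cpu_usage_since_start(st: ProcStat, uptime_s: float, clk_tck: int = 100) -> float:
    """Average CPU percentage over the whole life of the process (cf. cpu_usage)."""
    elapsed = uptime_s - st.starttime / clk_tck
    cpu_s = (st.utime + st.stime) / clk_tck
    return 0.0 if elapsed <= 0 else cpu_s / elapsed * 100.0


def cpu_usage_in_interval(prev: ProcStat, cur: ProcStat, interval_s: float,
                          clk_tck: int = 100) -> float:
    """Average CPU percentage between two samples (cf. cpu_usage_top)."""
    delta_ticks = (cur.utime + cur.stime) - (prev.utime + prev.stime)
    return delta_ticks / clk_tck / interval_s * 100.0


LINE = "1234 (my proc) S 1 1234 1234 0 -1 4194560 500 20 3 1 1000 200 0 0 20 0 1 0 5000 1000000"
st = parse_proc_stat(LINE)
print(st.majflt, st.minflt)               # 3 500
print(cpu_usage_since_start(st, 1250.0))  # 1.0 (12 s of CPU over 1200 s of life)
```

On a real Linux host, uptime_s is the first number in /proc/uptime, and clk_tck can be read with `os.sysconf("SC_CLK_TCK")` (commonly 100). For the per-cycle value, `prev` would be the sample taken one metric_interval earlier. Note that the fault counters are cumulative counts, not byte sizes.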

Objects

host_processes

Collect data on process objects, including process names, process commands, etc.

  • Tags
Tag Description
container_id Container ID of the process if the process is running in container, Linux only
host Host name
name Process object name, in the form [host-name]_[pid]
process_name Process name
state Process status. Linux only
username Username
  • Fields
Field Description
cmdline Command line parameters for the process
Type: string
Unit: -
cpu_usage CPU usage: the percentage of CPU the process has used since it started. This value is more stable than the instantaneous percentage shown by top
Type: float
Unit: percent
cpu_usage_top CPU usage: the average CPU usage of the process within one collection cycle
Type: float
Unit: percent
listen_ports Ports the process is listening on
Type: string
Unit: -
mem_used_percent Memory usage percentage
Type: float
Unit: percent
message Process details
Type: string
Unit: -
nonvoluntary_ctxt_switches From /proc/[PID]/status. Number of involuntary context switches, where the process was forced off the CPU. Linux only
Type: int
Unit: count
open_files Number of open files (Linux only; requires the enable_open_files option)
Type: int
Unit: count
page_children_major_faults From /proc/[PID]/stat. Number of major page faults of its child processes. Linux only
Type: int
Unit: count
page_children_minor_faults From /proc/[PID]/stat. Number of minor page faults of its child processes. Linux only
Type: int
Unit: count
page_major_faults From /proc/[PID]/stat. Number of major page faults of the process. Linux only
Type: int
Unit: count
page_minor_faults From /proc/[PID]/stat. Number of minor page faults of the process. Linux only
Type: int
Unit: count
pid Process ID
Type: int
Unit: -
proc_read_bytes Linux: from /proc/[PID]/io; Windows: from GetProcessIoCounters(). Bytes read from disk
Type: int
Unit: digital,B
proc_syscr Linux: from /proc/[PID]/io; Windows: from GetProcessIoCounters(). Number of read-type system calls, such as read(). Linux and Windows only
Type: int
Unit: count
proc_syscw Linux: from /proc/[PID]/io; Windows: from GetProcessIoCounters(). Number of write-type system calls, such as write(). Linux and Windows only
Type: int
Unit: count
proc_write_bytes Linux: from /proc/[PID]/io; Windows: from GetProcessIoCounters(). Bytes written to disk
Type: int
Unit: digital,B
rss Resident set size
Type: int
Unit: digital,B
start_time Process start time
Type: int
Unit: timeStamp,msec
started_duration Time elapsed since the process started
Type: int
Unit: timeStamp,sec
state_zombie Whether the process is a zombie
Type: bool
Unit: -
threads Total number of threads
Type: int
Unit: count
vms Virtual memory size
Type: int
Unit: digital,B
voluntary_ctxt_switches From /proc/[PID]/status. Number of voluntary context switches, where the process gave up the CPU itself, e.g. via sleep()/read()/sched_yield(). Linux only
Type: int
Unit: count
work_directory Working directory (Linux only)
Type: string
Unit: -
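Several Linux-only object fields come from /proc/[PID]/status, where each line has the form `Key:<TAB>value`. The Python sketch below parses that format and derives three of these fields. It also builds the object name documented in the tag table ([host-name]_[pid]). Mapping the `State` code `Z` to state_zombie is an assumption based on the standard Linux state letters; it is not taken from DataKit's source. `parse_proc_status` and `object_fields` are hypothetical names.

```python
def parse_proc_status(text: str) -> dict:
    """Parse /proc/[PID]/status ("Key:<TAB>value" per line) into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def object_fields(host: str, pid: int, status_text: str) -> dict:
    st = parse_proc_status(status_text)
    # "State:\tZ (zombie)" -> "Z"; Linux reports zombie processes with Z
    state_code = st.get("State", "").split(" ", 1)[0]
    return {
        "name": f"{host}_{pid}",  # [host-name]_[pid]
        "state_zombie": state_code == "Z",
        "voluntary_ctxt_switches": int(st.get("voluntary_ctxt_switches", "0")),
        "nonvoluntary_ctxt_switches": int(st.get("nonvoluntary_ctxt_switches", "0")),
    }


SAMPLE = (
    "Name:\tnginx\n"
    "State:\tZ (zombie)\n"
    "Pid:\t1234\n"
    "voluntary_ctxt_switches:\t150\n"
    "nonvoluntary_ctxt_switches:\t7\n"
)
print(object_fields("web-01", 1234, SAMPLE))
# {'name': 'web-01_1234', 'state_zombie': True, 'voluntary_ctxt_switches': 150, 'nonvoluntary_ctxt_switches': 7}
```

On a Linux host, pass the contents of `/proc/<pid>/status` as status_text, for example `open(f"/proc/{pid}/status").read()`.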
