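Process
The process collector monitors the processes running on a system in real time. It collects and analyzes runtime metrics for each process, including memory usage, CPU time, current state, and listening ports. Based on these metrics, you can configure alerts in Guance so that you know the state of your processes and can attend to a failed process promptly.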
Warning
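The process collector (both object and metric collection) can be expensive on macOS and may cause CPU spikes; you can disable it manually. Currently the process object collector is still among the collectors enabled by default (it runs once every 5 minutes by default).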
Configuration¶
Prerequisites¶
- The process collector does not collect process metrics by default. To collect metrics, set open_metric to true in host_processes.conf.
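A minimal sketch of the relevant fragment in host_processes.conf, assuming every other option keeps the default shown in the sample configuration:

```toml
[[inputs.host_processes]]
  # Collect process metrics in addition to process objects.
  open_metric = true
```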
Collector configuration¶
Go to the conf.d/host directory under the DataKit installation directory, copy host_processes.conf.sample, and rename the copy to host_processes.conf. For example:
[[inputs.host_processes]]
# Only collect metrics for processes whose names match this whitelist.
# The whitelist does not apply to process objects. Process names support regexp.
# process_name = [".*nginx.*", ".*mysql.*"]
# Minimal process run time (default 10m).
# Processes that have run for less than this are ignored (for both metrics and objects).
min_run_time = "10m"
## Enable process metric collecting
open_metric = false
## Enable listen ports tag, default is false
enable_listen_ports = false
## Enable open files field, default is false
enable_open_files = false
## Only collect container-based processes (both objects and metrics)
only_container_processes = false
# Extra tags
[inputs.host_processes.tags]
# some_tag = "some_value"
# more_tag = "some_other_value"
# ...
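After configuring, restart DataKit for the changes to take effect.
Alternatively, you can inject the collector configuration through a ConfigMap, or enable the collector by setting ENV_DATAKIT_INPUTS.
Configuration parameters can also be changed through environment variables (the collector must be added to ENV_DEFAULT_ENABLED_INPUTS as a default collector):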
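As an illustration of how the options in the sample combine, the sketch below enables metric collection only for processes whose names match two patterns, and reports listening ports. The patterns come from the commented-out line in the sample; the tag name and value are hypothetical.

```toml
[[inputs.host_processes]]
  # Whitelist for metrics only; per the sample comments it does not apply to process objects.
  process_name = [".*nginx.*", ".*mysql.*"]

  # Ignore processes that have run for less than 10 minutes.
  min_run_time = "10m"

  # Collect metrics and add listening ports.
  open_metric = true
  enable_listen_ports = true

  [inputs.host_processes.tags]
    # Hypothetical tag, for illustration only.
    service_group = "web"
```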
- ENV_INPUT_HOST_PROCESSES_OPEN_METRIC
  Enable process metric collection
  Field type: Boolean
  Collector config field: open_metric
  Default: false
- ENV_INPUT_HOST_PROCESSES_PROCESS_NAME
  Process name whitelist
  Field type: List
  Collector config field: process_name
  Example: .*datakit.*,guance
- ENV_INPUT_HOST_PROCESSES_MIN_RUN_TIME
  Minimal process run time
  Field type: Duration
  Collector config field: min_run_time
  Default: 10m
- ENV_INPUT_HOST_PROCESSES_ENABLE_LISTEN_PORTS
  Enable the listening ports tag
  Field type: Boolean
  Collector config field: enable_listen_ports
  Default: false
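- ENV_INPUT_HOST_PROCESSES_TAGS
  Custom tags. If the configuration file defines a tag with the same name, the tag in the configuration file is overridden
  Field type: Map
  Collector config field: tags
  Example: tag1=value1,tag2=value2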
- ENV_INPUT_HOST_PROCESSES_ONLY_CONTAINER_PROCESSES
  Only collect metrics and objects for container processes
  Field type: Boolean
  Collector config field: only_container_processes
  Default: false
- ENV_INPUT_HOST_PROCESSES_METRIC_INTERVAL
  Metric collection interval
  Field type: Duration
  Collector config field: metric_interval
  Default: 30s
- ENV_INPUT_HOST_PROCESSES_object_interval
  Object collection interval
  Field type: Duration
  Collector config field: object_interval
  Default: 300s
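Each environment variable above names the collector config field it sets. As a rough sketch, the fragment below shows what the listed example and default values might look like in host_processes.conf. It assumes that List values become TOML arrays, that Map values become entries of the tags table, and that the interval fields accept the same duration strings as min_run_time; these mappings are inferred from the examples and are not confirmed by the source.

```toml
[[inputs.host_processes]]
  # ENV_INPUT_HOST_PROCESSES_PROCESS_NAME=".*datakit.*,guance"
  process_name = [".*datakit.*", "guance"]

  # ENV_INPUT_HOST_PROCESSES_MIN_RUN_TIME="10m"
  min_run_time = "10m"

  # ENV_INPUT_HOST_PROCESSES_METRIC_INTERVAL="30s" (string format assumed)
  metric_interval = "30s"

  # ENV_INPUT_HOST_PROCESSES_object_interval="300s" (string format assumed)
  object_interval = "300s"

  # ENV_INPUT_HOST_PROCESSES_TAGS="tag1=value1,tag2=value2"
  [inputs.host_processes.tags]
    tag1 = "value1"
    tag2 = "value2"
```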
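Metrics¶
For all of the data collected below, a global tag named host is appended by default (its value is the hostname of the machine where DataKit runs). You can also specify additional tags in the configuration through [inputs.host_processes.tags].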
host_processes¶
Collect process metrics, including CPU/memory usage, etc.
- Tags
Tag | Description |
---|---|
container_id | Container ID of the process. Linux only |
host | Host name |
pid | Process ID |
process_name | Process name |
username | Username |
- Fields
Metric | Description | Type | Unit |
---|---|---|---|
cpu_usage | CPU usage: the percentage of CPU used by the process since it started. This value is more stable than the instantaneous percentage shown by top | float | percent,percent |
cpu_usage_top | CPU usage: the average CPU usage of the process within one collection cycle | float | percent,percent |
mem_used_percent | Memory usage percentage | float | percent,percent |
nonvoluntary_ctxt_switches | From /proc/[PID]/status. Number of involuntary context switches, where the process was forced off the CPU. Linux only | int | count |
open_files | Number of open files. Linux only | int | count |
page_children_major_faults | From /proc/[PID]/stat. Number of major page faults of its child processes. Linux only | int | digital,B |
page_children_minor_faults | From /proc/[PID]/stat. Number of minor page faults of its child processes. Linux only | int | digital,B |
page_major_faults | From /proc/[PID]/stat. Number of major page faults. Linux only | int | digital,B |
page_minor_faults | From /proc/[PID]/stat. Number of minor page faults. Linux only | int | digital,B |
proc_read_bytes | Linux from /proc/[PID]/io, Windows from GetProcessIoCounters(). Bytes read from disk | int | digital,B |
proc_syscr | Linux from /proc/[PID]/io, Windows from GetProcessIoCounters(). Count of read()-like syscalls. Linux and Windows only | int | count |
proc_syscw | Linux from /proc/[PID]/io, Windows from GetProcessIoCounters(). Count of write()-like syscalls. Linux and Windows only | int | count |
proc_write_bytes | Linux from /proc/[PID]/io, Windows from GetProcessIoCounters(). Bytes written to disk | int | digital,B |
rss | Resident set size | int | digital,B |
threads | Total number of threads | int | count |
vms | Virtual memory size | int | digital,B |
voluntary_ctxt_switches | From /proc/[PID]/status. Number of voluntary context switches, where the process gave up the CPU, for example via sleep()/read()/sched_yield(). Linux only | int | count |
Objects¶
host_processes¶
Collect data on process objects, including process names, process commands, etc.
- Tags
Tag | Description |
---|---|
container_id | Container ID of the process, if it is running in a container. Linux only |
host | Host name |
name | Process object name field, consisting of [host-name]_[pid] |
process_name | Process name |
state | Process status. Linux only |
username | Username |
- Fields
Metric | Description | Type | Unit |
---|---|---|---|
cmdline | Command line parameters of the process | string | - |
cpu_usage | CPU usage: the percentage of CPU used by the process since it started. This value is more stable than the instantaneous percentage shown by top | float | percent,percent |
cpu_usage_top | CPU usage: the average CPU usage of the process within one collection cycle | float | percent,percent |
listen_ports | Ports the process is listening on | string | - |
mem_used_percent | Memory usage percentage | float | percent,percent |
message | Process details | string | - |
nonvoluntary_ctxt_switches | From /proc/[PID]/status. Number of involuntary context switches, where the process was forced off the CPU. Linux only | int | count |
open_files | Number of open files. Linux only; requires the enable_open_files option | int | count |
page_children_major_faults | From /proc/[PID]/stat. Number of major page faults of its child processes. Linux only | int | digital,B |
page_children_minor_faults | From /proc/[PID]/stat. Number of minor page faults of its child processes. Linux only | int | digital,B |
page_major_faults | From /proc/[PID]/stat. Number of major page faults. Linux only | int | digital,B |
page_minor_faults | From /proc/[PID]/stat. Number of minor page faults. Linux only | int | digital,B |
pid | Process ID | int | - |
proc_read_bytes | Linux from /proc/[PID]/io, Windows from GetProcessIoCounters(). Bytes read from disk | int | digital,B |
proc_syscr | Linux from /proc/[PID]/io, Windows from GetProcessIoCounters(). Count of read()-like syscalls. Linux and Windows only | int | count |
proc_syscw | Linux from /proc/[PID]/io, Windows from GetProcessIoCounters(). Count of write()-like syscalls. Linux and Windows only | int | count |
proc_write_bytes | Linux from /proc/[PID]/io, Windows from GetProcessIoCounters(). Bytes written to disk | int | digital,B |
rss | Resident set size | int | digital,B |
start_time | Process start time | int | timeStamp,msec |
started_duration | Time elapsed since the process started | int | timeStamp,sec |
state_zombie | Whether the process is a zombie process | bool | - |
threads | Total number of threads | int | count |
vms | Virtual memory size | int | digital,B |
voluntary_ctxt_switches | From /proc/[PID]/status. Number of voluntary context switches, where the process gave up the CPU, for example via sleep()/read()/sched_yield(). Linux only | int | count |
work_directory | Working directory. Linux only | string | - |