Host Observability Best Practices (Linux)¶
Basic Overview¶
Linux (in full, GNU/Linux) is a free, freely redistributable Unix-like operating system. It is among the most widely used operating systems in enterprises, so host stability is critical. Guance provides full coverage of host observability, built on years of customer experience, which helps teams quickly understand how their infrastructure is running and reduces maintenance costs.
Scene Overview¶
< Guance - Scene - Dashboard - Create Dashboard - Host Overview_Linux >
Prerequisites¶
Register an account on the official Guance website, then log in with that account and password.
Deployment Implementation¶
One-Click Installation¶
DataKit is the official data collection application released by Guance, supporting the collection of hundreds of types of data.
Log in to the Guance console, go to "Integration" - "DataKit", copy the install command, and run it on the server.
Default Paths¶
Directory | Path |
---|---|
Installation Directory | /usr/local/datakit/ |
Log Directory | /var/log/datakit/ |
Main Configuration File | /usr/local/datakit/conf.d/datakit.conf |
Plugin Configuration Directory | /usr/local/datakit/conf.d/ |
Default Plugins¶
After installation, some data collection plugins are enabled by default. They are listed in the main configuration file datakit.conf.
default_enabled_inputs = ["cpu", "disk", "diskio", "mem", "swap", "hostobject", "net", "host_processes", "container", "system"]
Plugin Description:
Metric data can be viewed in [ Guance - Metrics ], and object data can be viewed directly on relevant pages.
Plugin Name | Description | Data Type |
---|---|---|
cpu | Collects CPU usage information from the host | Metrics |
disk | Collects disk usage information | Metrics |
diskio | Collects disk IO usage information from the host | Metrics |
mem | Collects memory usage information from the host | Metrics |
swap | Collects Swap memory usage information | Metrics |
system | Collects operating system load information from the host | Metrics |
net | Collects network traffic information from the host | Metrics |
host_processes | Collects a list of resident processes (alive for over 10 minutes) on the host | Objects |
hostobject | Collects basic host information (such as OS information, hardware information, etc.) | Objects |
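To check which plugins a given host actually enables, the default_enabled_inputs line can be read straight from datakit.conf. The following is a minimal sketch, not a DataKit tool: the helper name `list_default_inputs` is hypothetical, and it assumes the setting sits on a single line, as in the example above.

```shell
#!/bin/sh
# list_default_inputs [CONF]
# Hypothetical helper (not part of DataKit): prints one enabled plugin per line.
# Assumption: default_enabled_inputs is written on a single line.
list_default_inputs() {
    conf="${1:-/usr/local/datakit/conf.d/datakit.conf}"
    # 1. pick the setting line, 2. keep the text between [ and ],
    # 3. split on commas, 4. strip quotes and surrounding whitespace.
    grep -E '^[[:space:]]*default_enabled_inputs[[:space:]]*=' "$conf" |
        sed -e 's/^[^[]*\[//' -e 's/].*$//' |
        tr ',' '\n' |
        sed -e 's/^[[:space:]]*"//' -e 's/"[[:space:]]*$//'
}
```

Called without arguments, `list_default_inputs` reads the default main configuration file path listed under Default Paths.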
Data Collection¶
When viewing metrics with Guance, you can use tags for quick condition filtering.
Default Collection¶
CPU Metrics¶
[ Guance - Metrics - cpu, view CPU status data ] [ Guance - Metrics - system, view CPU load and core count data ]
Memory Metrics¶
[ Guance - Metrics - mem, view memory data ] [ Guance - Metrics - swap, view memory swap data ]
Disk Metrics¶
[ Guance - Metrics - disk, view disk data ] [ Guance - Metrics - diskio, view disk IO data ]
Network Metrics¶
[ Guance - Metrics - net, view network data ]
Host Objects¶
[ Guance - Infrastructure - Host, view all host object lists ]
[ Guance - Infrastructure - Host - Click any host to view basic system information ]
The integration runtime status lists the plugins currently running on this server.
Process Objects¶
[ Guance - Infrastructure - Process, view all process object lists ]
[ Guance - Infrastructure - Process - Click any process name to view related process information ]
Advanced Collection¶
In addition to the default metric and object data, DataKit can supplement operating system monitoring data through additional plugins.
Process List¶
To view a real-time process list across all hosts (a global top), enable the process plugin.
- Enter the plugin configuration directory and copy the sample file
cd /usr/local/datakit/conf.d/host/
cp host_processes.conf.sample host_processes.conf
vi host_processes.conf
- Enable the process plugin
- Restart DataKit
[ Guance - Metrics - host_processes, view process data ]
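The copy step above can be wrapped so that it never overwrites a configuration file that has already been edited. This is a sketch under assumptions: `enable_from_sample` is a hypothetical helper, and the restart command in the trailing comment assumes DataKit runs as a systemd service named `datakit`.

```shell
#!/bin/sh
# enable_from_sample DIR NAME
# Hypothetical helper: copies DIR/NAME.conf.sample to DIR/NAME.conf
# unless NAME.conf already exists, so earlier edits are preserved.
enable_from_sample() {
    dir="$1"
    name="$2"
    sample="$dir/$name.conf.sample"
    target="$dir/$name.conf"

    if [ ! -f "$sample" ]; then
        echo "missing sample file: $sample" >&2
        return 1
    fi
    if [ -f "$target" ]; then
        echo "kept existing $target"
        return 0
    fi
    cp "$sample" "$target" && echo "created $target"
}

# Example (run as root on the monitored host):
#   enable_from_sample /usr/local/datakit/conf.d/host host_processes
#   vi /usr/local/datakit/conf.d/host/host_processes.conf
#   systemctl restart datakit   # assumption: systemd-managed service
```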
Network Interface Metrics¶
Use eBPF to collect TCP/UDP connection information for the host's network interfaces.
- Install the ebpf plugin
- Enter the plugin configuration directory and copy the sample file
- Enable the ebpf plugin
[[inputs.ebpf]]
daemon = true
name = 'ebpf'
cmd = "/usr/local/datakit/externals/datakit-ebpf"
args = ["--datakit-apiserver", "0.0.0.0:9529"]
enabled_plugins = ["ebpf-net"]
- Restart DataKit
[ Guance - Infrastructure - Host - Click on the host where the ebpf plugin is installed - Network, view system network interface information ]
Security Check¶
Perform real-time detection of security vulnerabilities on the host operating system.
- Install the Scheck service
Installation Instructions
Directory | Path |
---|---|
Installation Directory | /usr/local/scheck |
Log Directory | /usr/local/scheck/log |
Main Configuration File | /usr/local/scheck/scheck.conf |
Detection Rule Directory | /usr/local/scheck/rules.d |
- Modify the main configuration file
rule_dir='/usr/local/scheck/rules.d'
output='http://127.0.0.1:9529/v1/write/security'
log='/usr/local/scheck/log'
log_level='info'
- Start the service
[ Guance - Security Check - Explorer, view all security events ]
Extended Collection¶
In addition to its own data collection, DataKit is fully compatible with the Telegraf collector.
The steps below install Telegraf on CentOS; for other systems, refer to the official Telegraf documentation.
- Add yum source
cat <<EOF | tee /etc/yum.repos.d/influxdb.repo
[influxdb]
name = InfluxDB Repository - RHEL \$releasever
baseurl = https://repos.influxdata.com/rhel/\$releasever/\$basearch/stable
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdb.key
EOF
- Install the telegraf collector
- Modify the main configuration file telegraf.conf
- Disable the influxdb output and enable outputs.http (to upload data to DataKit)
- Disable telegraf default collection
#[[inputs.cpu]]
# percpu = true
# totalcpu = true
# collect_cpu_time = false
# report_active = false
#[[inputs.disk]]
# ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
#[[inputs.diskio]]
#[[inputs.mem]]
#[[inputs.processes]]
#[[inputs.swap]]
#[[inputs.system]]
- Start telegraf
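Pointing Telegraf at DataKit comes down to adding an outputs.http section to telegraf.conf. A minimal sketch follows. The helper name `add_datakit_output` is hypothetical, and the default URL (`/v1/write/metric?input=telegraf` on port 9529) is an assumption modeled on the DataKit write endpoint used for Scheck above, so confirm it against your DataKit version. The existing `[[outputs.influxdb]]` section still has to be commented out by hand.

```shell
#!/bin/sh
# add_datakit_output CONF [URL]
# Hypothetical helper: appends an outputs.http section to a Telegraf config
# unless an uncommented one is already present.
add_datakit_output() {
    conf="$1"
    # Assumption: DataKit listens on 127.0.0.1:9529 and accepts line protocol here.
    url="${2:-http://127.0.0.1:9529/v1/write/metric?input=telegraf}"

    if grep -q '^[[:space:]]*\[\[outputs\.http]]' "$conf" 2>/dev/null; then
        echo "outputs.http already present in $conf"
        return 0
    fi
    cat >> "$conf" <<EOF

# Forward Telegraf metrics to the local DataKit (URL is an assumption).
[[outputs.http]]
  url = "$url"
  method = "POST"
  data_format = "influx"
EOF
}

# Example (run as root):
#   add_datakit_output /etc/telegraf/telegraf.conf
#   systemctl restart telegraf
```

Running the helper twice leaves a single section in place, which keeps repeated provisioning runs from duplicating the output.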
Port Metrics¶
Detect important ports in the operating system.
- Modify the main configuration file telegraf.conf
- Enable port detection
[[inputs.net_response]]
protocol = "tcp"
address = "localhost:9090"
timeout = "3s"
[[inputs.net_response]]
protocol = "tcp"
address = "localhost:22"
timeout = "3s"
- Restart telegraf
[ Guance - Metrics - net_response, view port data ]
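When many ports need checking, the repeated `[[inputs.net_response]]` sections can be generated rather than typed by hand. A small sketch, assuming TCP checks with the same 3s timeout used above; `net_response_blocks` is a hypothetical helper name.

```shell
#!/bin/sh
# net_response_blocks ADDR...
# Hypothetical helper: prints one net_response section per host:port argument.
net_response_blocks() {
    for addr in "$@"; do
        printf '[[inputs.net_response]]\n'
        printf '  protocol = "tcp"\n'
        printf '  address = "%s"\n' "$addr"
        printf '  timeout = "3s"\n\n'
    done
}

# Example:
#   net_response_blocks localhost:9090 localhost:22 >> /etc/telegraf/telegraf.conf
```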
Process Metrics¶
Detect important processes in the operating system.
- Modify the main configuration file telegraf.conf
- Enable process detection
- Restart telegraf
[ Guance - Metrics - procstat, view process data ]
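Process detection typically needs one `[[inputs.procstat]]` section per process to watch. The sketch below assumes processes are matched by command-line pattern through procstat's `pattern` option. The helper name `procstat_blocks` and the process names in the example are hypothetical, so substitute the processes that matter on your host.

```shell
#!/bin/sh
# procstat_blocks PATTERN...
# Hypothetical helper: prints one procstat section per process pattern.
procstat_blocks() {
    for pat in "$@"; do
        printf '[[inputs.procstat]]\n'
        printf '  pattern = "%s"\n\n' "$pat"
    done
}

# Example (process names are placeholders):
#   procstat_blocks sshd datakit >> /etc/telegraf/telegraf.conf
```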
Single-point Testing¶
Use the local machine as the probe point to test important interfaces or sites.
For multi-point testing, see Synthetic Tests.
- Modify the main configuration file telegraf.conf
- Enable HTTP detection
[[inputs.http_response]]
urls = ["https://www.baidu.com","https://guance.com","http://localhost:9090"]
- Restart telegraf
[ Guance - Metrics - http_response, view test data ]
Monitoring Rules¶
Monitoring rules define alarm rules and notification targets so that system stability is tracked in real time.
Built-in Templates¶
Guance already includes some built-in detection library templates that can be used directly.
[ Guance - Monitoring - Create from Template - Host Detection Library ] [ Guance - Monitoring - Create from Template - Ping Status Detection Library ] [ Guance - Monitoring - Create from Template - Port Detection Library ]
Custom Detection Libraries¶
Add custom detection rules. Guance supports several detection types, including threshold, process, log, and network detection.
Threshold Detection¶
[ Guance - Monitoring - Create Monitor - Threshold Detection ]
Detection Metric: the alarm rule expression.
Trigger Condition: the final threshold range. An alarm is triggered when the condition is met; if a later check no longer meets the threshold, the alarm can recover (the normal condition requires a detection cycle to be filled in).
Event names and content can reference variables. Event content is written in Markdown (for example, a line break requires two trailing spaces).
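The trigger and recovery behavior described above can be pictured as a two-state machine. The sketch below is a simplified model for illustration only, not Guance's detection engine: `threshold_events` is a hypothetical helper that reads one metric value per line and reports when a fixed upper threshold is reached and when the value drops back below it.

```shell
#!/bin/sh
# threshold_events LIMIT
# Hypothetical helper: reads one numeric value per line on stdin and prints
# a line whenever the state changes between "ok" and "alert".
threshold_events() {
    limit="$1"
    awk -v limit="$limit" '
        BEGIN { state = "ok" }
        {
            value = $1 + 0
            if (value >= limit + 0 && state == "ok") {
                state = "alert"
                print NR ": alert (" $1 ")"
            } else if (value < limit + 0 && state == "alert") {
                state = "ok"
                print NR ": recovered (" $1 ")"
            }
        }'
}

# Example:
#   printf '50\n95\n97\n60\n' | threshold_events 90
#   2: alert (95)
#   4: recovered (60)
```

Only state changes are reported, so the third sample (97) produces no second alert while the alarm is already active.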
Notification Targets¶
Configure the notification targets for alarm rules.
[ Guance - Manage - Notification Targets ]
Group monitors, then assign notification targets to each group.
[ Guance - Monitoring - Monitors - Grouping - Alert Configuration ]