Collector Configuration
DataKit collector configurations are written in TOML, and all of them are stored in the `conf.d` directory:
- Linux/Mac: `/usr/local/datakit/conf.d/`
- Windows: `C:\Program Files\datakit\conf.d\`
Collector configurations are grouped by category into subdirectories of `conf.d`. Refer to each collector's documentation to find its subdirectory.
A typical collector configuration file has the following structure:

```toml
[[inputs.some_name]] # Required: declares which collector this TOML file configures
key = value
...

[[inputs.some_name.other_options]] # Optional: some collectors use this sub-table, others do not
key = value
...
```
Attention
DataKit only loads files with the `.conf` extension under `conf.d/`. Every collector configuration must therefore be placed in `conf.d` (or one of its subdirectories) and must use the `.conf` suffix. DataKit ignores any other file.
How to Modify Collector Configuration
Some collectors can be enabled without any configuration, while others require you to edit their configuration manually.
Enable Multiple Collections with the Same Collector
Taking MySQL as an example, there are two ways to configure multiple MySQL collections:
- Add a new conf file, such as `mysql-2.conf`, and place it in the same directory as the existing `mysql.conf`.
- In the existing `mysql.conf`, add another `[[inputs.mysql]]` section, like this:
```toml
# The first MySQL collection
[[inputs.mysql]]
host = "localhost"
user = "datakit"
pass = "<PASS>"
port = 3306

interval = "10s"

[inputs.mysql.log]
files = ["/var/log/mysql/*.log"]

[inputs.mysql.tags]
# Omit other configuration items...

#-----------------------------------------
# Another MySQL collection
#-----------------------------------------
[[inputs.mysql]]
host = "localhost"
user = "datakit"
pass = "<PASS>"
port = 3306

interval = "10s"

[inputs.mysql.log]
files = ["/var/log/mysql/*.log"]

[inputs.mysql.tags]
# Omit other configuration items...

#-----------------------------------------
# Continue to add another one below
#-----------------------------------------
[[inputs.mysql]]
...
```
The second method is usually easier to manage because all instances of the same collector live in a single conf file. The first method can clutter the configuration directory.
To sum up, the second approach uses the following structure:
This is a TOML array of tables: each `[[inputs.<name>]]` header adds one more element to the array. The same structure can configure multiple instances of any collector, subject to the single-instance limits described below.
Attention
- If two collector configuration files have identical contents (even under different file names), only one of them is applied. This prevents accidental duplicate configuration.
- Putting configurations for different collectors (such as MySQL and Nginx) into one conf file is not recommended. It can cause hard-to-diagnose problems and makes the configuration harder to manage.
- Some collectors are limited to a single instance. See input-singleton for details.
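The first point above says that identical files are applied only once. This page does not say how DataKit detects identical files or which copy it keeps. The Go sketch below assumes a byte-for-byte comparison, done by hashing each file's content with SHA-256, and keeps the first file seen. The type `confFile` and the function `dedupeByContent` are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// confFile is a hypothetical in-memory stand-in for a collector conf file.
type confFile struct {
	Name    string
	Content []byte
}

// dedupeByContent keeps the first file for each distinct content digest and
// drops later files with identical bytes, regardless of their file names.
func dedupeByContent(files []confFile) []confFile {
	seen := make(map[[sha256.Size]byte]bool)
	var kept []confFile
	for _, f := range files {
		sum := sha256.Sum256(f.Content)
		if seen[sum] {
			continue // identical content was already applied; skip this copy
		}
		seen[sum] = true
		kept = append(kept, f)
	}
	return kept
}

func main() {
	files := []confFile{
		{"mysql.conf", []byte("[[inputs.mysql]]\nhost = \"localhost\"\n")},
		{"mysql-copy.conf", []byte("[[inputs.mysql]]\nhost = \"localhost\"\n")},
		{"mysql-2.conf", []byte("[[inputs.mysql]]\nhost = \"10.0.0.2\"\n")},
	}
	for _, f := range dedupeByContent(files) {
		fmt.Println("applied:", f.Name) // mysql.conf, then mysql-2.conf
	}
}
```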
Single Instance Collector
Some collectors allow only a single instance: even if multiple copies are configured, only one instance runs. These single-instance collectors are:
Collector Name | Description |
---|---|
cpu | Collect the CPU usage of the host |
disk | Collect disk usage |
diskio | Collect the disk IO status of the host |
ebpf | Collect TCP and UDP connection information of the host network, Bash execution logs, etc. |
mem | Collect the memory usage of the host |
swap | Collect swap memory usage |
system | Collect the load of the host operating system |
net | Collect host network traffic |
netstat | Collect network connections, including TCP/UDP connections, connections waiting to be established, requests waiting to be processed, etc. |
host_processes | Collect the list of resident processes (running for more than 10 minutes) on the host |
hostobject | Collect basic host information (such as operating system and hardware information) |
container | Collect container or Kubernetes data on the host, if any. If there are no containers on the host, the collector exits immediately. |
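To make the single-instance rule concrete, the following Go sketch counts how many instances of each collector would run, given the collector name of every configured `[[inputs.<name>]]` block. It only models the rule stated above, that singletons run once while other collectors run once per block. It does not reflect which copy DataKit actually keeps. The map `singletons` and the function `instancesToRun` are hypothetical.

```go
package main

import "fmt"

// singletons mirrors the single-instance collectors listed in the table above.
var singletons = map[string]bool{
	"cpu": true, "disk": true, "diskio": true, "ebpf": true, "mem": true,
	"swap": true, "system": true, "net": true, "netstat": true,
	"host_processes": true, "hostobject": true, "container": true,
}

// instancesToRun takes one collector name per configured [[inputs.<name>]]
// block and returns how many instances of each would run: singletons are
// capped at one, every other collector gets one instance per block.
func instancesToRun(configured []string) map[string]int {
	counts := make(map[string]int)
	for _, name := range configured {
		if singletons[name] && counts[name] >= 1 {
			continue // extra copies of a single-instance collector are ignored
		}
		counts[name]++
	}
	return counts
}

func main() {
	fmt.Println(instancesToRun([]string{"cpu", "cpu", "mysql", "mysql", "mem"}))
	// map[cpu:1 mem:1 mysql:2]
}
```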
Close a Specific Collector
To temporarily disable a collector, there are two options:
- Rename the corresponding collector conf file, for example `mysql.conf` to `mysql.conf.bak`. Any name that no longer ends in `.conf` works.
- In the conf file, comment out the corresponding collector section, for example:
```toml
# Comment out the first MySQL collection
#[[inputs.mysql]]
#  host = "localhost"
#  user = "datakit"
#  pass = "<PASS>"
#  port = 3306
#
#  interval = "10s"
#
#  [inputs.mysql.log]
#  files = ["/var/log/mysql/*.log"]
#
#  [inputs.mysql.tags]
#
#  # Omit other configuration items...
#

# Keep this MySQL collection
[[inputs.mysql]]
host = "localhost"
user = "datakit"
pass = "<PASS>"
port = 3306

interval = "10s"

[inputs.mysql.log]
files = ["/var/log/mysql/*.log"]

[inputs.mysql.tags]
# Omit other configuration items...
```
The first approach is blunt but simple. The second requires careful editing, because a mistake can leave the TOML file invalid.
Regular Expressions in Collector Configuration
Some collector options take regular expressions.
DataKit is written mostly in Go, so the regular expressions in its configuration are evaluated by Go's own `regexp` package, which implements RE2 syntax. Regex dialects differ between languages, so a pattern that works elsewhere may not behave the same way in Go, and it is hard to get a pattern right on the first attempt.
We recommend debugging regular expressions with an online tool, as shown in the following figure:
In addition, because DataKit configurations are TOML, we recommend wrapping each regular expression in a TOML multi-line literal string, i.e. `'''your regular expression here'''` (three ASCII single quotes on each side). TOML does not process escapes inside literal strings, so backslashes in the pattern need no extra escaping.
Collector Turned on by Default
After DataKit is installed, a set of collectors is enabled by default, with no manual configuration required. These collectors are mostly host-related:
Collector Name | Description |
---|---|
cpu | Collect the CPU usage of the host |
disk | Collect disk usage |
diskio | Collect the disk IO status of the host |
mem | Collect the memory usage of the host |
swap | Collect swap memory usage |
system | Collect the load of the host operating system |
net | Collect host network traffic |
host_processes | Collect the list of resident processes (running for more than 10 minutes) on the host |
hostobject | Collect basic host information (such as operating system and hardware information) |
container | Collect container or Kubernetes data on the host, if any. If there are no containers on the host, the collector exits immediately. |
Password Encoding
When a password appears in a connection string, special characters such as `@#*` must be URL-encoded so that the connection string is parsed correctly. The encodings for these characters are listed below.
Note: Some of the listed characters (such as `~_-.`) do not require encoding. They are included for reference only.
Character | URL Encoding | Character | URL Encoding |
---|---|---|---|
`` ` `` | %60 | `~` | ~ |
`!` | %21 | `@` | %40 |
`#` | %23 | `$` | %24 |
`%` | %25 | `^` | %5E |
`&` | %26 | `*` | %2A |
`(` | %28 | `)` | %29 |
`_` | _ | `-` | - |
`+` | %2B | `=` | %3D |
`{` | %7B | `}` | %7D |
`[` | %5B | `]` | %5D |
`\` | %5C | `:` | %3A |
`\|` | %7C | `"` | %22 |
`'` | %27 | `;` | %3B |
`,` | %2C | `.` | . |
`<` | %3C | `>` | %3E |
`/` | %2F | `?` | %3F |
Suppose we have the following Git connection string:
We need to convert the `#` in the password to its URL-encoded form `%23`: