Collector Configuration


DataKit collectors are configured in TOML, and all collector configuration files live in the conf.d directory:

  • Linux/Mac: /usr/local/datakit/conf.d/
  • Windows: C:\Program Files\datakit\conf.d\

Collectors are grouped by category, and each category has its own subdirectory under conf.d. Refer to the documentation of a specific collector to find its subdirectory.

A typical collector configuration file has the following structure:

[[inputs.some_name]] # Required: identifies which collector this TOML file configures
  key = value
  ...

[[inputs.some_name.other_options]] # Optional: only some collectors have this section
  key = value
  ...

Attention

DataKit only loads files with the .conf extension from the conf.d directory and its subdirectories. A collector configuration that is placed anywhere else, or that does not end in .conf, is ignored.
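
The suffix rule above can be illustrated with a short Python sketch. This is not DataKit's own loader (DataKit is written in Go); it only lists the files under a directory that end in .conf, which are the ones the rule says are considered. The function name and the sample file names are hypothetical.

```python
import tempfile
from pathlib import Path


def find_collector_confs(conf_dir):
    """Return the .conf files under conf_dir (subdirectories included), sorted by path."""
    root = Path(conf_dir)
    # rglob descends into subdirectories; only names ending in ".conf" match.
    return sorted(p for p in root.rglob("*.conf") if p.is_file())


# Demonstration in a throwaway directory standing in for conf.d.
with tempfile.TemporaryDirectory() as tmp:
    conf_d = Path(tmp)
    (conf_d / "db").mkdir()
    (conf_d / "db" / "mysql.conf").write_text("# picked up\n")
    (conf_d / "db" / "mysql.conf.bak").write_text("# skipped: wrong suffix\n")
    (conf_d / "db" / "mysql.conf.sample").write_text("# skipped: wrong suffix\n")
    loaded = [p.relative_to(conf_d).as_posix() for p in find_collector_confs(conf_d)]
    print(loaded)
```

Only db/mysql.conf appears in the result; the .bak and .sample files are skipped because their names do not end in .conf.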

How to Modify Collector Configuration

Some collectors can be enabled without any configuration, while others require you to edit their configuration files manually.

Enable Multiple Collections with the Same Collector

Taking MySQL as an example, there are two ways to configure several MySQL collections:

  1. Add a new conf file, such as mysql-2.conf, in the same directory as the existing mysql.conf.
  2. In the existing mysql.conf, add another section like this:
# The first MySQL collection
[[inputs.mysql]]
  host = "localhost"
  user = "datakit"
  pass = "<PASS>"
  port = 3306

  interval = "10s"

  [inputs.mysql.log]
    files = ["/var/log/mysql/*.log"]

  [inputs.mysql.tags]

    # Omit other configuration items...

#-----------------------------------------
# Another MySQL collection
#-----------------------------------------
[[inputs.mysql]]
  host = "localhost"
  user = "datakit"
  pass = "<PASS>"
  port = 3306

  interval = "10s"

  [inputs.mysql.log]
    files = ["/var/log/mysql/*.log"]

  [inputs.mysql.tags]

    # Omit other configuration items...

#-----------------------------------------
# Continue to add another one below
#-----------------------------------------
[[inputs.mysql]]
  ...

The second method is usually easier to manage because it keeps all instances of one collector in a single conf file, whereas the first method can clutter the configuration directory.

In summary, a multi-instance configuration that uses the second method has the following structure:

[[inputs.some_name]]
   ...
[[inputs.some_name]]
   ...
[[inputs.some_name]]
   ...

This is a TOML array of tables, and the same structure works for any collector that supports multiple instances.

Attention

  • If two collector configuration files have identical contents (even under different file names), only one of them is applied, to guard against misconfiguration.

  • Putting different collectors (such as MySQL and Nginx) into one conf file is not recommended: it can cause unexpected problems and is harder to maintain.

  • Some collectors are limited to single-instance operation; see the Single Instance Collector section for details.

Single Instance Collector

Some collectors run as a single instance only: even if several copies are configured, just one instance runs. These single-instance collectors are:

Collector Name   Description
cpu              Collect CPU usage of the host
disk             Collect disk usage
diskio           Collect disk I/O status of the host
ebpf             Collect TCP and UDP connection information for the host network, Bash execution logs, etc.
mem              Collect memory usage of the host
swap             Collect swap memory usage
system           Collect the load of the host operating system
net              Collect host network traffic
netstat          Collect network connection statistics, including TCP/UDP connections, pending connections, pending requests, etc.
host_processes   Collect the list of resident processes (alive for more than 10 minutes) on the host
hostobject       Collect basic host information (such as operating system and hardware information)
container        Collect container or Kubernetes data on the host, if any. If the host has no containers, the collector exits immediately.

Disable a Specific Collector

To temporarily disable a collector, there are two ways:

  1. Rename the collector's conf file, for example mysql.conf to mysql.conf.bak. Any name works as long as the file no longer ends in .conf.
  2. In the conf file, comment out the relevant collector section, for example:
# Comment out the first MySQL collection
#[[inputs.mysql]]
#  host = "localhost"
#  user = "datakit"
#  pass = "<PASS>"
#  port = 3306
#  
#  interval = "10s"
#  
#  [inputs.mysql.log]
#    files = ["/var/log/mysql/*.log"]
#  
#  [inputs.mysql.tags]
#  
#    # Omit other configuration items...
#

# Keep this MySQL collection
[[inputs.mysql]]
  host = "localhost"
  user = "datakit"
  pass = "<PASS>"
  port = 3306

  interval = "10s"

  [inputs.mysql.log]
    files = ["/var/log/mysql/*.log"]

  [inputs.mysql.tags]

    # Omit other configuration items...

The first approach is cruder but simpler. The second requires careful editing, and a mistake can leave the file with TOML syntax errors.

Regular Expressions in Collector Configuration

Some collector settings take regular expressions.

Because DataKit is written mostly in Go, regular expressions in the configuration are evaluated by Go's own regular expression implementation. Regular expression dialects differ between languages, so it can be hard to get an expression right on the first attempt.

We recommend using an online tool to debug regular expressions, as shown in the following figure:

In addition, because DataKit configuration uses TOML, we recommend writing regular expressions as '''your regular expression here''' (that is, enclosed in three ASCII single quotation marks on each side), which avoids complicated escaping.

Collectors Enabled by Default

After DataKit is installed, a set of collectors is enabled by default, with no manual steps required. These collectors are mostly host-related:

Collector Name   Description
cpu              Collect CPU usage of the host
disk             Collect disk usage
diskio           Collect disk I/O status of the host
mem              Collect memory usage of the host
swap             Collect swap memory usage
system           Collect the load of the host operating system
net              Collect host network traffic
host_processes   Collect the list of resident processes (alive for more than 10 minutes) on the host
hostobject       Collect basic host information (such as operating system and hardware information)
container        Collect container or Kubernetes data on the host, if any. If the host has no containers, the collector exits immediately.

Password Encoding

When configuring connection strings, special characters in passwords, such as @, #, and *, must be URL-encoded so that the connection string is parsed correctly. The encodings for these characters are listed below:

Note: Some special characters (such as ~, _, -, and .) do not require encoding; they are listed here for reference.

Character  URL Encoding    Character  URL Encoding
`          %60             ~          ~
!          %21             @          %40
#          %23             $          %24
%          %25             ^          %5E
&          %26             *          %2A
(          %28             )          %29
_          _               -          -
+          %2B             =          %3D
{          %7B             }          %7D
[          %5B             ]          %5D
\          %5C             :          %3A
|          %7C             "          %22
'          %27             ;          %3B
,          %2C             .          .
<          %3C             >          %3E
/          %2F             ?          %3F

Suppose we have the following Git connection string:

http://username:pa55w#rd@github.com/path/to/repository.git 

We need to convert the # in the password to its URL-encoded form %23:

http://username:pa55w%23rd@github.com/path/to/repository.git 
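
Encoding by hand is error-prone for longer passwords, so the conversion can also be scripted. The following is a minimal sketch using Python's standard urllib.parse.quote; the helper name encode_password is hypothetical, and the user name, password, and URL are the example values from above.

```python
from urllib.parse import quote


def encode_password(password):
    """Percent-encode every character except letters, digits, and ~ _ - ."""
    # safe="" also encodes "/", which quote() would otherwise leave alone.
    return quote(password, safe="")


user = "username"
password = "pa55w#rd"
url = f"http://{user}:{encode_password(password)}@github.com/path/to/repository.git"
print(url)
```

With safe="" the function leaves only letters, digits, and the four characters ~ _ - . unencoded, which matches the table above.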
