Skip to content

DataKit Service Management


After installing DataKit, it is necessary to provide some basic information about the installed DataKit.

Introduction to DataKit Directories

DataKit currently supports three mainstream platforms: Linux, Windows, and Mac:

Operating System Architecture Installation Path
Linux kernel version 2.6.23 or higher amd64/386/arm/arm64 /usr/local/datakit
macOS version 10.13 or higher1 amd64 /usr/local/datakit
Windows 7, Server 2008R2 or higher amd64/386 64-bit: C:\Program Files\datakit
32-bit: C:\Program Files(32)\datakit

After the installation is complete, the DataKit directory list is roughly as follows:

├── [   12]  apm_inject/
├── [    0]  gitrepos/
├── [    0]  python.d/
├── [  430]  pipeline/
├── [   26]  pipeline_remote/
├── [   42]  cache/
├── [   36]  externals/
├── [  316]  data/
├── [ 138M]  datakit
├── [  958]  conf.d/
└── [    7]  .pid
Directory Name Description
apm_inject After enabling the APM auto-injection function, this directory is used to store some dependent files.
cache Store some data caches used during the collection process.
conf.d Store configuration examples of all collectors. The DataKit main configuration file datakit.conf is located in this directory.
data Store data files required for DataKit operation, such as the IP address database.
datakit The main DataKit program. On Windows, it is datakit.exe. Most of the collection functions of DataKit are integrated in this program.
externals Some collectors are not integrated in the DataKit main program and are compiled separately.
gitrepos If Git is used to manage collector configurations, store these configurations here.
pipeline Store Pipeline scripts.
pipeline_remote Store Pipeline scripts written in Studio.
python.d Store Python scripts.
.pid Store the process ID of the currently running DataKit.

There are two DataKit log files:

Directory Name Description
gin.log DataKit can receive external HTTP data input. This log file is equivalent to the HTTP access log.
log DataKit operation log (On Linux/Mac platforms, the DataKit operation log is located in the /var/log/datakit directory. On Windows, it is located in the *C:\Program Files\datakit* directory).
Check the Kernel Version
  • Linux/Mac: uname -r
  • Windows: Execute the cmd command (Press Win key + r, enter cmd and press Enter), and input winver to get the system version information.

DataKit Service Management

You can directly use the following commands to manage DataKit:

# Linux/Mac may require sudo
datakit service -T # stop
datakit service -S # start
datakit service -R # restart
Tip

You can use datakit help service to view more help information.

Handling of Service Management Failures

Sometimes, due to bugs in some components of DataKit, the service operation may fail (for example, after datakit service -T, the service does not stop). You can force the processing in the following way.

On Linux, if the above command fails, you can use the following commands instead:

sudo service datakit stop/start/restart
sudo systemctl stop/start/restart datakit

On Mac, you can use the following commands instead:

# Start DataKit
sudo launchctl load -w /Library/LaunchDaemons/com.datakit.plist

# Stop DataKit
sudo launchctl unload -w /Library/LaunchDaemons/com.datakit.plist

Service Uninstall and Reinstall

You can directly use the following commands to uninstall or restore the DataKit service:

Note: Uninstalling DataKit here will not delete DataKit-related files.

# Linux/Mac shell
datakit service -I # re-install
datakit service -U # uninstall

Impact of DataKit on the Host Environment

During the use of DataKit, the existing system may be affected in the following ways:

  1. Log collection will lead to high-speed disk reading. The larger the log volume, the higher the iops of reading.
  2. If the RUM SDK is added to a Web/App application, continuous RUM-related data upload will occur. If there are restrictions on the upload bandwidth, it may cause the Web/App page to freeze.
  3. After eBPF collection is enabled, due to the large amount of collected data, a certain amount of memory and CPU will be occupied. After bpf-netlog is enabled, a large number of logs will be generated based on all TCP packets of the host and container network cards.
  4. When DataKit is busy (a large number of logs/Traces are accessed, and external data is imported, etc.), it will occupy a considerable amount of CPU and memory resources. It is recommended to set reasonable resource limit configurations for control.
  5. When DataKit is deployed in Kubernetes, there will be a certain request pressure on the API server.
  6. When the default collector is enabled, the memory (RSS) consumption is approximately 100MB, and the CPU consumption is controlled within 10%. In addition to its own logs, the disk consumption also includes additional disk cache. The network traffic depends on the specific amount of collected data. The traffic uploaded by DataKit is compressed and uploaded using GZip by default.

FAQ

Failure to Start on Windows

DataKit is started as a service on Windows. After startup, a lot of Event logs will be written. As the logs accumulate, the following error may occur:

Start service failed: The event log file is full.

This error will prevent DataKit from starting. You can set the Windows Event to solve this problem.

Further References

Other documents related to the basic use of DataKit:


  1. Golang 1.18 requires macOS-amd64 version 10.13. 

Feedback

Is this page helpful? ×