Host Installation¶

This article describes the basic installation of DataKit.

Register/log in to Guance¶

Open the Guance registration portal in a browser, fill in the required information, and then log in to Guance.

Get the Installation Command¶

Log in to the workspace, click "Integration" in the left navigation, and select "DataKit" at the top to see the installation commands for each platform.

Note that the Linux/Mac/Windows installers below automatically detect the hardware platform (arm/x86, 32-bit/64-bit), so you do not need to select one.

Linux/macOS

The installation command supports bash and ash (Version-1.14.0). It looks roughly like this:

bash

DK_DATAWAY=https://openway.guance.com?token=<TOKEN> bash -c "$(curl -L https://static.guance.com/datakit/install.sh)"

ash

DK_DATAWAY=https://openway.guance.com?token=<TOKEN> ash -c "$(curl -L https://static.guance.com/datakit/install.sh)"

When the installation completes, the terminal displays a message confirming that it succeeded.
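Because the command embeds the workspace token in the DataWay URL, a quick pre-flight check can catch a placeholder that was never replaced. The following is only a sketch: `check_dataway` is a hypothetical helper, not something the installer provides, and `tkn_example` is a made-up token.

```shell
# Hypothetical helper (not part of DataKit): reject a DataWay URL that
# still contains the <TOKEN> placeholder or carries no token parameter.
check_dataway() {
    case "$1" in
        *"<TOKEN>"*)  echo "placeholder not replaced"; return 1 ;;
        *"?token="?*) echo "ok"; return 0 ;;
        *)            echo "missing token"; return 1 ;;
    esac
}

# The literal placeholder is rejected; a URL with a token value passes.
check_dataway "https://openway.guance.com?token=<TOKEN>" || true
check_dataway "https://openway.guance.com?token=tkn_example"
```

The check only inspects the shape of the URL; it does not verify that the token is valid for the workspace.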

Windows

Installation on Windows must be run from a PowerShell command line started as an administrator. Press the Windows key, type powershell, then right-click the PowerShell icon that appears and select "Run as administrator".

Remove-Item -ErrorAction SilentlyContinue Env:DK_*;
$env:DK_DATAWAY="https://openway.guance.com?token=<TOKEN>";
Set-ExecutionPolicy Bypass -scope Process -Force;
Import-Module bitstransfer;
start-bitstransfer  -source https://static.guance.com/datakit/install.ps1 -destination .install.ps1;
powershell ./.install.ps1;

Install DataKit lite¶

You can set the environment variable DK_LITE to install DataKit lite (Version-1.14.0):

Linux/macOS

DK_DATAWAY=https://openway.guance.com?token=<TOKEN> DK_LITE=1 bash -c "$(curl -L https://static.guance.com/datakit/install.sh)"

Windows

Remove-Item -ErrorAction SilentlyContinue Env:DK_*;
$env:DK_DATAWAY="https://openway.guance.com?token=<TOKEN>";
$env:DK_LITE="1";
Set-ExecutionPolicy Bypass -scope Process -Force;
Import-Module bitstransfer;
start-bitstransfer  -source https://static.guance.com/datakit/install.ps1 -destination .install.ps1;
powershell ./.install.ps1;

DataKit lite contains only the following collectors:

Collector Name	Description
`cpu`	Collect the CPU usage of the host
`disk`	Collect disk occupancy
`diskio`	Collect the disk IO status of the host
`mem`	Collect the memory usage of the host
`swap`	Collect Swap memory usage
`system`	Collect the load of host operating system
`net`	Collect host network traffic
`host_processes`	Collect the list of resident (surviving for more than 10min) processes on the host
`hostobject`	Collect basic information of host computer (such as operating system information, hardware information, etc.)
DataKit (`dk`)	Collect DataKit running metrics
RUM (`rum`)	Collect user access monitoring data
Net dialtesting (`dialtesting`)	Collect the data generated by dial testing
Prom (`prom`)	Collect data exposed by Prometheus Exporters
`logging`	Collect file log data

Install DataKit eBPF Span Linker Version¶

You can set the environment variable DK_ELINKER to install DataKit ELinker (Version-1.30.0):

Linux/macOS

DK_DATAWAY=https://openway.guance.com?token=<TOKEN> DK_ELINKER=1 bash -c "$(curl -L https://static.guance.com/datakit/install.sh)"

Windows

Remove-Item -ErrorAction SilentlyContinue Env:DK_*;
$env:DK_DATAWAY="https://openway.guance.com?token=<TOKEN>";
$env:DK_ELINKER="1";
Set-ExecutionPolicy Bypass -scope Process -Force;
Import-Module bitstransfer;
start-bitstransfer  -source https://static.guance.com/datakit/install.ps1 -destination .install.ps1;
powershell ./.install.ps1;

DataKit ELinker contains only the following collectors:

Collector Name	Description
`cpu`	Collect the CPU usage of the host
`disk`	Collect disk occupancy
`diskio`	Collect the disk IO status of the host
`ebpftrace`	Receive eBPF trace spans and link them to generate trace IDs
`mem`	Collect the memory usage of the host
`swap`	Collect Swap memory usage
`system`	Collect the load of host operating system
`net`	Collect host network traffic
`hostobject`	Collect basic information of host computer (such as operating system information, hardware information, etc.)
DataKit (`dk`)	Collect DataKit running metrics

Install Specific Version¶

To install a specific DataKit version, for example 1.2.3:

DK_DATAWAY=https://openway.guance.com?token=<TOKEN> bash -c "$(curl -L https://static.guance.com/datakit/install-1.2.3.sh)"

The same applies on Windows:

Remove-Item -ErrorAction SilentlyContinue Env:DK_*;
$env:DK_DATAWAY="https://openway.guance.com?token=<TOKEN>";
Set-ExecutionPolicy Bypass -scope Process -Force;
Import-Module bitstransfer;
start-bitstransfer  -source https://static.guance.com/datakit/install-1.2.3.ps1 -destination .install.ps1;
powershell ./.install.ps1;
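The only difference from the default command is the script name in the URL. As a rough sketch of that pattern for the Linux/macOS script, the hypothetical helper below (`installer_url` is not provided by DataKit) prints the URL for a given version, or the default script when no version is given:

```shell
# Hypothetical helper: build the installer script URL for a version.
# An empty argument selects the default script (install.sh).
installer_url() {
    base="https://static.guance.com/datakit"
    if [ -n "$1" ]; then
        echo "$base/install-$1.sh"
    else
        echo "$base/install.sh"
    fi
}

installer_url 1.2.3   # prints https://static.guance.com/datakit/install-1.2.3.sh
installer_url ""      # prints https://static.guance.com/datakit/install.sh
```

The Windows script follows the same naming with a .ps1 extension, as the commands above show.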

Additional Supported Environment Variable¶

If you need to define some DataKit configuration during the installation phase, add environment variables to the installation command alongside DK_DATAWAY. For example, to add the DK_NAMESPACE setting:

Linux/macOS

DK_DATAWAY=https://openway.guance.com?token=<TOKEN> DK_NAMESPACE=<namespace> bash -c "$(curl -L https://static.guance.com/datakit/install.sh)"

Windows

Remove-Item -ErrorAction SilentlyContinue Env:DK_*;
$env:DK_DATAWAY="https://openway.guance.com?token=<TOKEN>";
$env:DK_NAMESPACE="<namespace>";
Set-ExecutionPolicy Bypass -scope Process -Force;
Import-Module bitstransfer;
start-bitstransfer  -source https://static.guance.com/datakit/install.ps1 -destination .install.ps1;
powershell ./.install.ps1;

The format for setting multiple environment variables on each platform is:

# Windows: separate multiple environment variables with semicolons
$env:NAME1="value1"; $env:NAME2="value2"

# Linux/Mac: separate multiple environment variables with spaces
NAME1="value1" NAME2="value2"
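On Linux/Mac, variables written in front of a command are passed to that command only and do not persist in the calling shell. The short sketch below shows this with a harmless `sh -c` child in place of the installer; the values `demo` and `9529` are arbitrary examples.

```shell
# Start clean, mirroring the Remove-Item Env:DK_* step in the Windows commands.
unset DK_NAMESPACE DK_HTTP_PORT

# The two assignments are visible to the child process only.
DK_NAMESPACE="demo" DK_HTTP_PORT="9529" sh -c 'echo "$DK_NAMESPACE:$DK_HTTP_PORT"'

# Back in the calling shell, the variable was never set.
echo "${DK_NAMESPACE:-unset}"
```

This is why every variable has to appear on the same command line as the `bash -c` (or `ash -c`) invocation.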

The environment variables supported by the installation script are as follows (supported on all platforms).

Note

These environment variables are not supported for fully offline installation. However, they can be set when installing through a proxy or from a local installation address.
These environment variables take effect only in installation mode; they have no effect in upgrade mode.

Most Commonly Used Environment Variables¶

DK_DATAWAY: Specifies the DataWay address. The DataKit installation command already includes it by default.
DK_GLOBAL_TAGS: Deprecated; use DK_GLOBAL_HOST_TAGS instead.
DK_GLOBAL_HOST_TAGS: Sets the global host tags during installation. Format example: host=__datakit_hostname,host_ip=__datakit_ip (multiple tags are separated by commas)
DK_GLOBAL_ELECTION_TAGS: Sets the global election tags during installation. Format example: project=my-project,cluster=my-cluster (multiple tags are separated by commas)
DK_DEF_INPUTS: List of collector names enabled by default. Format example: cpu,mem,disk. You can also disable default inputs by prefixing the input name with -, such as -cpu,-mem,-disk. If the two forms are mixed, such as cpu,mem,-disk,-system, only the disabled list is honored: disk and system are disabled, and all other default inputs stay enabled.
DK_CLOUD_PROVIDER: Sets the cloud vendor during installation (currently supported: aliyun/aws/tencent/hwcloud/azure). Deprecated: DataKit can infer the cloud type automatically.
DK_USER_NAME: The user name that the DataKit service runs as. The default is root. See the note below for details.
DK_LITE: Set this variable to 1 to install DataKit lite. (Version-1.14.0)
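The DK_DEF_INPUTS rule for mixed lists is easy to misread, so here is a small sketch of it. `resolve_def_inputs` is a hypothetical helper written only to illustrate the rule as described; it is not DataKit's own parsing code.

```shell
# Illustration of the documented DK_DEF_INPUTS rule; this is a
# hypothetical helper, not DataKit's own parsing code.
resolve_def_inputs() (
    if [ "$1" = "-" ]; then
        echo "disable all defaults"
        exit 0
    fi
    enabled=""
    disabled=""
    IFS=,
    for name in $1; do
        case "$name" in
            -?*) disabled="${disabled:+$disabled,}${name#-}" ;;
            ?*)  enabled="${enabled:+$enabled,}$name" ;;
        esac
    done
    # A mixed list keeps only the "-" entries.
    if [ -n "$disabled" ]; then
        echo "disable: $disabled"
    else
        echo "enable: $enabled"
    fi
)

resolve_def_inputs "cpu,mem,disk"            # prints enable: cpu,mem,disk
resolve_def_inputs "cpu,mem,-disk,-system"   # prints disable: disk,system
```

In the mixed case the plain names cpu and mem are ignored, which matches the behavior stated for DK_DEF_INPUTS.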

Disable all default inputs Version-1.5.5

We can set DK_DEF_INPUTS to - to disable all default inputs:

DK_DEF_INPUTS="-" \
DK_DATAWAY=https://openway.guance.com?token=<TOKEN> \
bash -c "$(curl -L https://static.guance.com/datakit/install.sh)"

In addition, if DataKit has been installed before, you must delete all default input .conf files manually. During installation, DataKit can add new input configurations but cannot delete existing ones.

Note

For permission reasons, running DataKit with a non-root DK_USER_NAME may make the following collector unavailable:

eBPF

In addition, note the following.

Create the user and group manually before starting the installation. The commands differ between Linux distributions; the following are for reference:

CentOS/RedHat

groupadd --system datakit

adduser --system --no-create-home datakit -g datakit

usermod -s /sbin/nologin datakit

Ubuntu/Debian

groupadd --system datakit

adduser --system --no-create-home datakit

usermod -a -G datakit datakit

usermod -s /usr/sbin/nologin datakit

Other Linux

groupadd --system datakit

adduser --system --no-create-home datakit

usermod -a -G datakit datakit

usermod -s /bin/false datakit

After creating the user, run the installation command with DK_USER_NAME set:

DK_USER_NAME="datakit" DK_DATAWAY="..." bash -c ...
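Since the installer expects the account to exist already, it may be worth checking for it first. The sketch below uses the standard `id` utility; `user_exists` is a hypothetical helper name, and `datakit` is the user name from the commands above.

```shell
# Report whether an account exists, using the standard id(1) utility.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

if user_exists datakit; then
    echo "user datakit already exists"
else
    echo "user datakit is missing; create it before installing"
fi
```

The check is read-only, so it is safe to run before or after the user creation commands.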

On DataKit's Own Log¶

DK_LOG_LEVEL: Log level; the options are info and debug
DK_LOG: If set to stdout, the log is printed to the terminal instead of being written to a file.
DK_GIN_LOG: If set to stdout, the gin log is printed to the terminal instead of being written to a file.

On DataKit pprof¶

DK_ENABLE_PPROF (deprecated): Whether to turn on pprof
DK_PPROF_LISTEN: The pprof service listening address

Since Version-1.9.2, pprof is enabled by default.

On DataKit Election¶

DK_ENABLE_ELECTION: Enables election, which is off by default. To enable it, set the variable to any non-empty string (e.g. True). Because any non-empty value enables it, leave the variable unset to keep election off.
DK_NAMESPACE: Specifies the namespace during installation (used for election)

On HTTP/API Environment¶

DK_HTTP_LISTEN: Specifies, at installation time, the address (network interface) the DataKit HTTP service binds to (default localhost)
DK_HTTP_PORT: Specifies, at installation time, the port the DataKit HTTP service binds to (default 9529)
DK_RUM_ORIGIN_IP_HEADER: RUM-specific
DK_DISABLE_404PAGE: Disables the DataKit 404 page (commonly used when deploying DataKit RUM on the public network; e.g. True/False)
DK_INSTALL_IPDB: Specifies the IP library at installation time (currently only iploc and geolite2 are supported)
DK_UPGRADE_IP_WHITELIST: Starting from DataKit 1.5.9, DataKit can be upgraded through a remote HTTP API. This variable sets the IP whitelist of clients allowed to access that API remotely (separate multiple IPs with commas). Access from outside the whitelist is denied (default: not restricted).
DK_UPGRADE_LISTEN: Specifies the DK-Upgrader HTTP server address (default 0.0.0.0:9542). Version-1.38.1
DK_HTTP_PUBLIC_APIS: Specifies which DataKit HTTP APIs can be accessed remotely; generally configured together with the RUM input. Supported since DataKit 1.9.2.
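To make the DK_UPGRADE_IP_WHITELIST semantics concrete, the sketch below models them with a hypothetical helper, `ip_allowed`. It assumes exact string matching on each comma-separated entry, which is an assumption for illustration; the source does not say whether DataKit accepts other forms such as address ranges.

```shell
# Illustration only; exact-match comparison is an assumption.
ip_allowed() (
    whitelist=$1
    client=$2
    # An empty whitelist means access is not restricted (the documented default).
    if [ -z "$whitelist" ]; then exit 0; fi
    IFS=,
    for ip in $whitelist; do
        if [ "$ip" = "$client" ]; then exit 0; fi
    done
    exit 1
)

ip_allowed "10.0.0.1,10.0.0.2" "10.0.0.2" && echo "allowed"
ip_allowed "10.0.0.1,10.0.0.2" "10.0.0.9" || echo "denied"
```

The addresses are arbitrary examples; the point is that an unset whitelist admits every client, while a non-empty one admits only the listed entries.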

On DCA¶

DK_DCA_ENABLE: Turns on the DCA service during installation (off by default)
DK_DCA_WEBSOCKET_SERVER: The DCA websocket server address that DataKit can access

On External Collector¶

DK_INSTALL_EXTERNALS: Used to install external collectors not packaged with DataKit

On Confd Configuration¶

Environment Variable Name	Type	Applicable Scenario	Description	Sample Value
DK_CONFD_BACKEND	string	All	Backend Source Type	`etcdv3`, `zookeeper`, `redis` or `consul`
DK_CONFD_BASIC_AUTH	string	`etcdv3`, `consul`	Optional
DK_CONFD_CLIENT_CA_KEYS	string	`etcdv3`, `consul`	Optional
DK_CONFD_CLIENT_CERT	string	`etcdv3`, `consul`	Optional
DK_CONFD_CLIENT_KEY	string	`etcdv3`, `consul` or `redis`	Optional
DK_CONFD_BACKEND_NODES	string	All	Backend Source Address	`[IP address 1:2379, IP address 2:2379]`
DK_CONFD_PASSWORD	string	`etcdv3`, `consul`	Optional
DK_CONFD_SCHEME	string	`etcdv3`, `consul`	Optional
DK_CONFD_SEPARATOR	string	`redis`	Optional, default 0
DK_CONFD_USERNAME	string	`etcdv3`, `consul`	Optional

On Git Configuration¶

DK_GIT_URL: The remote git repo address for managing configuration files. (e.g. http://username:password@github.com/username/repository.git)
DK_GIT_KEY_PATH: The full path of the local PrivateKey. (e.g. /Users/username/.ssh/id_rsa)
DK_GIT_KEY_PW: The password to use the local PrivateKey. (e.g. passwd)
DK_GIT_BRANCH: Specifies the branch to pull. If empty, the default branch of the remote repository is used, which is usually master.
DK_GIT_INTERVAL: The interval of the timed pull. (e.g. 1m)

WAL¶

DK_WAL_WORKERS: Sets the number of WAL workers; defaults to limited-CPU-cores * 4
DK_WAL_CAPACITY: Sets the maximum disk size of a single WAL; defaults to 2GB

On Sinker Configuration¶

DK_SINKER_GLOBAL_CUSTOMER_KEYS sets the sinker tag/field keys. For example:

Linux/macOS

DK_DATAWAY=https://openway.guance.com?token=<TOKEN> DK_DATAWAY_ENABLE_SINKER=on DK_SINKER_GLOBAL_CUSTOMER_KEYS=key1,key2 bash -c "$(curl -L https://static.guance.com/datakit/install.sh)"

Windows

Remove-Item -ErrorAction SilentlyContinue Env:DK_*;
$env:DK_DATAWAY="https://openway.guance.com?token=<TOKEN>";
$env:DK_DATAWAY_ENABLE_SINKER="on";
$env:DK_SINKER_GLOBAL_CUSTOMER_KEYS="key1,key2";
Set-ExecutionPolicy Bypass -scope Process -Force;
Import-Module bitstransfer;
start-bitstransfer  -source https://static.guance.com/datakit/install.ps1 -destination .install.ps1;
powershell ./.install.ps1;

On Resource Limit Configuration¶

Only Linux and Windows (Version-1.15.0) operating systems are supported.

DK_LIMIT_DISABLED: Turns off the resource limit function (on by default)
DK_LIMIT_CPUMAX: Maximum CPU usage, default 30.0
DK_LIMIT_MEMMAX: Memory limit in MB (including swap), default 4096 (4GB)

APM Instrumentation¶

Version-1.62.0 · Experimental

By specifying DK_APM_INSTRUMENTATION_ENABLED in the installation command, you can automatically inject APM for Java/Python applications:

Enable host injection:

DK_APM_INSTRUMENTATION_ENABLED=host \
  DK_DATAWAY=https://openway.guance.com?token=<TOKEN>  \
  bash -c "$(curl -L https://static.guance.com/datakit/install.sh)"

Enable Docker injection:

DK_APM_INSTRUMENTATION_ENABLED=docker \
  DK_DATAWAY=https://openway.guance.com?token=<TOKEN> \
  bash -c "$(curl -L https://static.guance.com/datakit/install.sh)"

For host deployment, after DataKit is installed, reopen a terminal and restart the corresponding Java/Python application.

For a specific process on the host or in a container, you can disable the automatic injection feature by injecting the environment variable ENV_DATAKIT_DISABLE_APM_INS and setting the value to true.

To enable or disable this feature after installation, modify the value of instrumentation_enabled under [apm_inject] in the datakit.conf file:

"host", "docker" or "host,docker": enabled
"" or "disable": disabled
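The accepted values can be summarized in a small lookup. The sketch below is only an illustration of the two lists above: `apm_inject_state` is a hypothetical helper, and any value not listed in the source is reported as unknown rather than guessed.

```shell
# Illustration of the documented instrumentation_enabled values;
# not DataKit's own configuration parser.
apm_inject_state() {
    case "$1" in
        host|docker|host,docker) echo "enabled" ;;
        ""|disable)              echo "disabled" ;;
        *)                       echo "unknown" ;;
    esac
}

apm_inject_state "host,docker"   # prints enabled
apm_inject_state ""              # prints disabled
```

A value such as "docker,host" is not listed in the source, so the sketch deliberately does not classify it.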

Notes:

Before deleting the files in the DataKit installation directory, uninstall the feature first: execute datakit tool --remove-apm-auto-inject to clean up the system settings and Docker settings.
For Docker injection, additional steps are required when installing and configuring Docker injection and when deleting injection-related files in the DataKit installation directory.
After installing and configuring Docker injection, to make it take effect for containers that already exist:

# stop docker service
systemctl stop docker docker.socket

# change the runtime of the created container from runc to dk-runc provided by datakit
datakit tool --change-docker-containers-runtime dk-runc

# start docker service
systemctl start docker

# restart the container that exited due to dockerd restart
docker start <container_id1> <container_id2> ...

After uninstalling the feature (with Docker injection enabled), if you need to delete all files in the DataKit installation directory:

# stop docker service
systemctl stop docker docker.socket

# Change the runtime of the created container from dk-runc back to runc
datakit tool --change-docker-containers-runtime runc

# start docker service
systemctl start docker

# restart the container that exited due to dockerd restart
docker start <container_id1> <container_id2> ...

Operating environment requirements:

Linux system
- CPU architecture: x86_64 or arm64
- C standard library: glibc 2.4 and above, or musl
- Java 8 and above
- Python 3.7 and above
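The CPU architecture requirement can be checked on the target host before enabling injection. This is a minimal sketch with a hypothetical helper, `arch_supported`; it relies on `uname -m`, which reports arm64 machines as `aarch64` on Linux.

```shell
# Hypothetical helper: accept the architecture names listed above.
# Linux reports arm64 as "aarch64" in uname -m.
arch_supported() {
    case "$1" in
        x86_64|aarch64|arm64) return 0 ;;
        *)                    return 1 ;;
    esac
}

if arch_supported "$(uname -m)"; then
    echo "architecture OK"
else
    echo "unsupported architecture: $(uname -m)"
fi
```

The sketch covers only the architecture line; the C library, Java, and Python requirements need to be checked separately.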

In Kubernetes, you can inject APM through the DataKit Operator.

Other Installation Options¶

Environment Variable Name	Sample	Description
`DK_INSTALL_ONLY`	`on`	Install only, not run
`DK_HOSTNAME`	`some-host-name`	Support custom configuration hostname during installation
`DK_UPGRADE`	`1`	Upgrade to the latest version
`DK_UPGRADE_MANAGER`	`on`	Whether to also upgrade the Remote Upgrade Service when upgrading DataKit. Used together with `DK_UPGRADE`; supported since 1.5.9
`DK_INSTALLER_BASE_URL`	`https://your-url`	Selects the installation script source for different environments; defaults to `https://static.guance.com/datakit`
`DK_PROXY_TYPE`	-	Proxy type. The options are `datakit` or `nginx`, both lowercase
`DK_NGINX_IP`	-	Proxy server IP address (fill in only the IP, not the port). This has the highest priority: it is mutually exclusive with `HTTP_PROXY` and `HTTPS_PROXY` and overrides both.
`DK_INSTALL_LOG`	-	Sets the installation log path; defaults to install.log in the current directory. If set to `stdout`, the log is printed to the terminal.
`HTTPS_PROXY`	`IP:Port`	Install through the DataKit proxy
`DK_INSTALL_RUM_SYMBOL_TOOLS`	`on`	Installs the source map tools for RUM; supported since DataKit 1.9.2.
`DK_VERBOSE`	`on`	Enables more verbose output during installation (Linux/Mac only). Version-1.19.0
`DK_CRYPTO_AES_KEY`	`0123456789abcdfg`	The key used to decrypt encrypted passwords, which avoids storing plaintext passwords in collector configurations. Version-1.31.0
`DK_CRYPTO_AES_KEY_FILE`	`/usr/local/datakit/enc4dk`	An alternative way to configure the key, which takes priority over `DK_CRYPTO_AES_KEY`. Put the key in a file and set this variable to the file path.

FAQ¶

How to Deal with the Unfriendly Host Name¶

DataKit uses the hostname as the basis for associating data. Some hostnames are not very friendly, such as iZbp141ahn...., and for various reasons cannot be changed, which makes them awkward to work with. In DataKit, such a hostname can be overridden in the main configuration.

In datakit.conf, modify the following configuration and DataKit will read ENV_HOSTNAME to override the real hostname:

[environments]
    ENV_HOSTNAME = "your-fake-hostname-for-datakit"

Note: If a host has collected data for a period of time, after changing the host name, the historical data will no longer be associated with the new host name. Changing the host name is equivalent to adding a brand-new host.

Issue on macOS installation¶

If the following error appears during installation or upgrade on macOS:

"launchctl" failed with stderr: /Library/LaunchDaemons/com.datakit.plist: Service is disabled

Execute:

sudo launchctl enable system/datakit

Then execute the following command:

sudo launchctl load -w /Library/LaunchDaemons/com.datakit.plist

Are there any high-risk operations on files and data in DataKit?¶

During its operation, DataKit reads a significant amount of system information based on the collection configuration, such as process lists and hardware and software information (e.g., OS information, CPU, memory, disk, network card, etc.). However, it does not proactively delete or modify data outside of itself. File reading and writing falls into two parts: file/port operations performed for data collection, and the file reads and writes DataKit needs for its own runtime.

Host file reads/writes during data collection:

During process information collection and hardware and software information collection, Linux systems read the relevant information from the /proc directory; Windows systems mainly use WMI and the Golang Windows SDK to obtain this information.
If log collection is configured, DataKit will scan and read logs that match the configuration (e.g., syslog, user application logs, etc.).
Port usage: DataKit may open some ports to receive external data for interfacing with other systems. These ports are opened as needed based on the collector.
eBPF collection: Due to its particularity, eBPF requires more binary information of the Linux kernel and processes, resulting in the following actions:
- Analyze the binary files of all (or specified) running programs (dynamic libraries, processes within containers) for symbols and addresses.
- Read and write files under the kernel DebugFS mount point or interact with the PMU (Performance Monitoring Unit) to place kprobe/uprobe/tracepoint eBPF probes.
- uprobe probes will modify the CPU instructions of user processes to read relevant data.

In addition to collection, DataKit performs the following file reading and writing operations:

Its own log files

On Linux, these are located in the /var/log/datakit/ directory; on Windows, they are located in the C:\Program Files\datakit directory.

Log files will automatically rotate when they reach a specified size (default 32MB), with a maximum number of rotations (default maximum of 5 + 1 segments).
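The rotation settings put an upper bound on the disk space a log can occupy. As a worked example, assuming every segment grows to the full rotation size, the bound is the segment size multiplied by the number of segments; `max_log_mb` is a hypothetical helper for that arithmetic.

```shell
# Upper bound on disk used by one log file and its rotated segments,
# assuming every segment reaches the full rotation size.
# $1: rotation size in MB, $2: maximum number of rotated segments.
max_log_mb() {
    echo $(( $1 * ($2 + 1) ))
}

max_log_mb 32 5   # default settings; prints 192
```

With the defaults, that is at most 192MB per log under this assumption.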

Disk cache

Some data collection requires the use of disk cache functionality (which must be manually enabled). This cache will involve file creation and deletion during the generation and consumption process. Disk cache also has a maximum capacity setting; when full, it will automatically perform FIFO deletion operations to prevent disk overflow.

How does DataKit control its own resource consumption?¶

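DataKit's resource usage can be limited through mechanisms such as cgroup. For the installation-time settings, see the DK_LIMIT_* variables under "On Resource Limit Configuration" above. If DataKit is deployed in Kubernetes, see the Kubernetes deployment documentation.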

What is DataKit's own observability?¶

During its operation, DataKit exposes many internal metrics. By default, DataKit collects these metrics using the built-in collector and reports them to the user's workspace.

In addition, DataKit also comes with a monitor command-line tool that allows users to view the current operational status as well as the collection and reporting status.
