Host Installation


This article describes the basic installation of DataKit.

Register/log in to Guance Cloud

Visit the Guance Cloud registration portal in a browser, fill in the required information, and then log in to Guance Cloud.

Get the Installation Command

Log in to the workspace, click "Integration" in the left-hand menu, and select "Datakit" at the top to see the installation commands for each platform.

Note that the Linux/Mac/Windows installers below automatically identify the hardware platform (arm/x86, 32bit/64bit), so there is no need to select one manually.

The installation command supports bash and ash (Version-1.14.0). The command looks roughly like this:

  • bash
DK_DATAWAY=https://openway.guance.com?token=<TOKEN> bash -c "$(curl -L https://static.guance.com/datakit/install.sh)" 
  • ash
DK_DATAWAY=https://openway.guance.com?token=<TOKEN> ash -c "$(curl -L https://static.guance.com/datakit/install.sh)"

Once the installation completes, the terminal shows a message confirming that it succeeded.
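
As a minimal sketch of how the one-line command above is put together, the helper below only prints the command for a given token and shell, so it can be reviewed before it is run. The function name print_install_cmd and the sample token are illustrative assumptions; the URLs are the ones shown above.

```shell
# Hypothetical helper: print (not run) the install command for a token and shell.
# Usage: print_install_cmd <token> [bash|ash]
print_install_cmd() {
  token=$1
  shell_name=${2:-bash}
  case $shell_name in
    bash|ash) ;;
    *) echo "unsupported shell: $shell_name" >&2; return 1 ;;
  esac
  # Single quotes keep $(curl ...) literal, so nothing is downloaded here.
  printf 'DK_DATAWAY="https://openway.guance.com?token=%s" %s -c "$(curl -L https://static.guance.com/datakit/install.sh)"\n' \
    "$token" "$shell_name"
}

# "tkn_example" is a placeholder, not a real token.
print_install_cmd "tkn_example" bash
```

Quoting the DK_DATAWAY value is optional in an assignment, but it keeps the ? in the URL from being misread when the line is copied into other contexts.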

Installation on Windows is done from the PowerShell command line, and PowerShell must be run as an administrator. Press the Windows key, type powershell, then right-click the PowerShell icon that appears and select "Run as administrator".

Remove-Item -ErrorAction SilentlyContinue Env:DK_*;
$env:DK_DATAWAY="https://openway.guance.com?token=<TOKEN>";
Set-ExecutionPolicy Bypass -scope Process -Force;
Import-Module bitstransfer;
start-bitstransfer  -source https://static.guance.com/datakit/install.ps1 -destination .install.ps1;
powershell ./.install.ps1;

Install DataKit lite

You can specify the environment variable DK_LITE to install DataKit lite ( Version-1.14.0):

  • Linux/Mac
DK_DATAWAY=https://openway.guance.com?token=<TOKEN> DK_LITE=1 bash -c "$(curl -L https://static.guance.com/datakit/install.sh)"
  • Windows
Remove-Item -ErrorAction SilentlyContinue Env:DK_*;
$env:DK_DATAWAY="https://openway.guance.com?token=<TOKEN>";
$env:DK_LITE="1";
Set-ExecutionPolicy Bypass -scope Process -Force;
Import-Module bitstransfer;
start-bitstransfer  -source https://static.guance.com/datakit/install.ps1 -destination .install.ps1;
powershell ./.install.ps1;

DataKit lite contains only the following collectors:

Collector Name                Description
cpu                           Collect the CPU usage of the host
disk                          Collect disk usage
diskio                        Collect the disk IO status of the host
mem                           Collect the memory usage of the host
swap                          Collect swap memory usage
system                        Collect the load of the host operating system
net                           Collect host network traffic
host_processes                Collect the list of resident processes (alive for more than 10 min) on the host
hostobject                    Collect basic host information (such as operating system and hardware information)
DataKit(dk)                   Collect Datakit running metrics
RUM(rum)                      Collect user access monitoring data
Net dialtesting(dialtesting)  Collect the data generated by dial testing
Prom(prom)                    Collect data exposed by Prometheus Exporters
logging                       Collect file log data
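
When scripting around a lite installation, it can help to check whether a collector is part of the list above. The following is a small sketch under that assumption; the function name is hypothetical, the short names are copied from the table, and mysql is used only as an example of a name that is not in the list.

```shell
# Collector names taken from the table above (short names for the entries
# that list one in parentheses).
LITE_COLLECTORS="cpu disk diskio mem swap system net host_processes hostobject dk rum dialtesting prom logging"

# Return 0 if the given collector name is in the lite list, 1 otherwise.
is_lite_collector() {
  for c in $LITE_COLLECTORS; do
    if [ "$c" = "$1" ]; then
      return 0
    fi
  done
  return 1
}

for name in cpu mysql; do
  if is_lite_collector "$name"; then
    echo "$name: included in DataKit lite"
  else
    echo "$name: not included in DataKit lite"
  fi
done
```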

Install Specific Version

You can install a specific DataKit version, for example 1.2.3:

DK_DATAWAY=https://openway.guance.com?token=<TOKEN> bash -c "$(curl -L https://static.guance.com/datakit/install-1.2.3.sh)"

The same applies on Windows:

Remove-Item -ErrorAction SilentlyContinue Env:DK_*;
$env:DK_DATAWAY="https://openway.guance.com?token=<TOKEN>";
Set-ExecutionPolicy Bypass -scope Process -Force;
Import-Module bitstransfer;
start-bitstransfer  -source https://static.guance.com/datakit/install-1.2.3.ps1 -destination .install.ps1;
powershell ./.install.ps1;
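
The two commands above differ from the default ones only in the script name, which carries the version. A minimal sketch of that naming pattern for the Linux/Mac script follows; the helper name install_script_url is an assumption for illustration, and it only builds the URL without downloading anything.

```shell
# Hypothetical helper: build the installer URL for an optional version.
# With no argument it returns the URL of the default install script.
install_script_url() {
  base="https://static.guance.com/datakit"
  if [ -n "${1:-}" ]; then
    echo "$base/install-$1.sh"
  else
    echo "$base/install.sh"
  fi
}

install_script_url 1.2.3
install_script_url
```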

Additional Supported Installation Variables

If you need to define some DataKit configuration during the installation phase, add environment variables to the installation command, placing them next to DK_DATAWAY and before the bash command. For example, to add the DK_NAMESPACE setting:

  • Linux/Mac
DK_DATAWAY=https://openway.guance.com?token=<TOKEN> DK_NAMESPACE=<namespace> bash -c "$(curl -L https://static.guance.com/datakit/install.sh)"
  • Windows
Remove-Item -ErrorAction SilentlyContinue Env:DK_*;
$env:DK_DATAWAY="https://openway.guance.com?token=<TOKEN>";
$env:DK_NAMESPACE="<namespace>";
Set-ExecutionPolicy Bypass -scope Process -Force;
Import-Module bitstransfer;
start-bitstransfer  -source https://static.guance.com/datakit/install.ps1 -destination .install.ps1;
powershell ./.install.ps1;

Environment variables are set in the following format on the two kinds of platform:

# Windows: Multiple environment variables are separated by semicolons
$env:NAME1="value1"; $env:NAME2="value2"

# Linux/Mac: Multiple environment variables are separated by spaces
NAME1="value1" NAME2="value2"
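
On Linux/Mac, variables written in front of a command are passed to that command only. The short sketch below illustrates this with a child sh process standing in for the installer; NAME1 and NAME2 are the placeholder names from the format above and are assumed to be unset beforehand.

```shell
# Prefix assignments are exported to the child process only.
NAME1="value1" NAME2="value2" sh -c 'echo "child sees: $NAME1 $NAME2"'

# Back in the parent shell the variables are still unset.
echo "parent sees: ${NAME1:-<unset>} ${NAME2:-<unset>}"
```

This is why the DK_* settings have to appear on the same command line as bash -c: setting them on an earlier line without export would not reach the install script.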

The environment variables supported by the installation script are listed below (supported on all platforms).

Attention

These environment variable settings are not supported for fully offline installation. However, they can be used when installing through a proxy or from a local installation address.

Most Commonly Used Environment Variables

  • DK_DATAWAY: Specify the DataWay address; the DataKit installation command already includes it by default
  • DK_GLOBAL_TAGS: Deprecated, use DK_GLOBAL_HOST_TAGS instead
  • DK_GLOBAL_HOST_TAGS: Set global host tags during installation, format example: host=__datakit_hostname,host_ip=__datakit_ip (multiple tags are separated by commas)
  • DK_GLOBAL_ELECTION_TAGS: Set global election tags during installation, format example: project=my-project,cluster=my-cluster (multiple tags are separated by commas)
  • DK_DEF_INPUTS: List of collector names enabled by default, format example: cpu,mem,disk. Default inputs can also be disabled by prefixing the input name with -, such as -cpu,-mem,-disk. If the two forms are mixed, such as cpu,mem,-disk,-system, only the disabled list is accepted: disk and system are disabled and all other default inputs stay enabled.
  • DK_CLOUD_PROVIDER: Specify the cloud vendor during installation (currently supported: aliyun/aws/tencent/hwcloud/azure). Deprecated: Datakit can infer the cloud type automatically.
  • DK_USER_NAME: User name that the Datakit service runs as. Default is root. More details are in the Attention section below.
  • DK_LITE: Set this variable to 1 to install DataKit lite. (Version-1.14.0)

Disable all default inputs (Version-1.5.5)

We can set DK_DEF_INPUTS to - to disable all default inputs:

DK_DEF_INPUTS="-" \
DK_DATAWAY=https://openway.guance.com?token=<TOKEN> \
bash -c "$(curl -L https://static.guance.com/datakit/install.sh)"

In addition, if Datakit has been installed before, all default input .conf files must be deleted manually: during installation, Datakit can add new input configuration files but cannot delete existing ones.
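
The DK_DEF_INPUTS rules described above (plain names enable, a - prefix disables, a mixed list keeps only the disabled names, and a single - disables everything) can be sketched as a small classifier. This is only an illustration of the documented behaviour, not the installer's actual code; the function name and the output format are assumptions.

```shell
# Hypothetical helper: classify a DK_DEF_INPUTS value per the rules in the text.
# Prints one of: "disable-all", "disable:<names>", "enable:<names>".
classify_def_inputs() {
  spec=$1
  if [ "$spec" = "-" ]; then
    echo "disable-all"
    return 0
  fi
  banned=""
  allowed=""
  old_ifs=$IFS
  IFS=,
  for name in $spec; do
    case $name in
      -*) banned="${banned:+$banned,}${name#-}" ;;
      *)  allowed="${allowed:+$allowed,}$name" ;;
    esac
  done
  IFS=$old_ifs
  # A mixed list is treated as a disabled list only.
  if [ -n "$banned" ]; then
    echo "disable:$banned"
  else
    echo "enable:$allowed"
  fi
}

classify_def_inputs "cpu,mem,disk"
classify_def_inputs "cpu,mem,-disk,-system"
classify_def_inputs "-"
```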

Attention

For privilege reasons, using DK_USER_NAME with a non-root user name could make some collectors unavailable.

In addition, note the following:

  • Create the user and group manually before starting the installation. The commands differ between Linux distributions; the three variants below are for reference:

    # Variant 1: user added to the group on creation, login shell /sbin/nologin
    groupadd --system datakit
    adduser --system --no-create-home datakit -g datakit
    usermod -s /sbin/nologin datakit

    # Variant 2: user added to the group with usermod, login shell /usr/sbin/nologin
    groupadd --system datakit
    adduser --system --no-create-home datakit
    usermod -a -G datakit datakit
    usermod -s /usr/sbin/nologin datakit

    # Variant 3: same as variant 2, login shell /bin/false
    groupadd --system datakit
    adduser --system --no-create-home datakit
    usermod -a -G datakit datakit
    usermod -s /bin/false datakit

  • Then run the installation with DK_USER_NAME set:

    DK_USER_NAME="datakit" DK_DATAWAY="..." bash -c ...
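
Because the user has to exist before the installer runs, a quick pre-check can save a failed attempt. The sketch below assumes the user name datakit from the commands above and relies only on the standard id utility; the function name is hypothetical.

```shell
# Hypothetical pre-check: succeed only if the named user exists on this host.
user_exists() {
  id -u "$1" >/dev/null 2>&1
}

# "datakit" matches the example user created above.
run_as=${DK_USER_NAME:-datakit}
if user_exists "$run_as"; then
  echo "user $run_as exists; installation can proceed"
else
  echo "user $run_as is missing; create the user and group first"
fi
```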
    

On DataKit's Own Log

  • DK_LOG_LEVEL: Log level, either info or debug
  • DK_LOG: If set to stdout, the log is not written to a file but is printed to the terminal.
  • DK_GIN_LOG: If set to stdout, the gin log is not written to a file but is printed to the terminal.

On DataKit pprof

  • DK_ENABLE_PPROF (deprecated): Whether to turn on pprof
  • DK_PPROF_LISTEN: pprof service listening address

pprof has been enabled by default since Version-1.9.2.

On DataKit Election

  • DK_ENABLE_ELECTION: Enable election (disabled by default). To enable it, set this variable to any non-empty string value (e.g. True/False).
  • DK_NAMESPACE: Specify the namespace during installation (used for election)

On HTTP/API Environment

  • DK_HTTP_LISTEN: Specify the network interface that the DataKit HTTP service binds to during installation (default localhost)
  • DK_HTTP_PORT: Specify the port that the DataKit HTTP service binds to during installation (default 9529)
  • DK_RUM_ORIGIN_IP_HEADER: RUM-specific
  • DK_DISABLE_404PAGE: Disable the DataKit 404 page (commonly used when deploying DataKit RUM on the public network), e.g. True/False
  • DK_INSTALL_IPDB: Specify the IP library at installation time (currently only iploc and geolite2 are supported)
  • DK_UPGRADE_IP_WHITELIST: Starting from Datakit 1.5.9, Datakit can be upgraded through a remote HTTP API. This variable sets the IP whitelist of clients allowed to access that API (multiple IPs are separated by commas). Access from outside the whitelist is denied (not restricted by default).
  • DK_HTTP_PUBLIC_APIS: Specify which Datakit HTTP APIs can be accessed remotely; generally configured together with the RUM input. Supported since Datakit 1.9.2.

On DCA

  • DK_DCA_ENABLE: Enable the DCA service during installation (disabled by default)
  • DK_DCA_LISTEN: Customize the listening address and port of the DCA service during installation (default 0.0.0.0:9531)
  • DK_DCA_WHITE_LIST: Set the access whitelist of the DCA service; multiple entries are separated by commas (e.g. 192.168.0.1/24,10.10.0.1/24)

On External Collector

  • DK_INSTALL_EXTERNALS: Used to install external collectors not packaged with DataKit

On Confd Configuration

Environment Variable Name  Type    Applicable Scenario      Description             Sample Value
DK_CONFD_BACKEND           string  All                      Backend source type     etcdv3, zookeeper, redis or consul
DK_CONFD_BASIC_AUTH        string  etcdv3, consul           Optional
DK_CONFD_CLIENT_CA_KEYS    string  etcdv3, consul           Optional
DK_CONFD_CLIENT_CERT       string  etcdv3, consul           Optional
DK_CONFD_CLIENT_KEY        string  etcdv3, consul or redis  Optional
DK_CONFD_BACKEND_NODES     string  All                      Backend source address  [IP address:2379,IP address 2:2379]
DK_CONFD_PASSWORD          string  etcdv3, consul           Optional
DK_CONFD_SCHEME            string  etcdv3, consul           Optional
DK_CONFD_SEPARATOR         string  redis                    Optional, default 0
DK_CONFD_USERNAME          string  etcdv3, consul           Optional
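
As an illustration of how the Confd variables might be combined, the sketch below builds a node list in the bracketed form shown in the sample value and prints (without running) an install command for an etcdv3 backend. The helper name, the example addresses, and the choice of port 2379 for every node are assumptions taken from the sample value; check the exact format your backend expects.

```shell
# Hypothetical helper: join hosts into the bracketed list format shown in the
# DK_CONFD_BACKEND_NODES sample value, using port 2379 as in that sample.
confd_nodes() {
  out=""
  for host in "$@"; do
    out="${out:+$out,}${host}:2379"
  done
  echo "[$out]"
}

# Example addresses only. The command is printed, not executed.
nodes=$(confd_nodes 10.0.0.1 10.0.0.2)
echo "DK_CONFD_BACKEND=etcdv3 DK_CONFD_BACKEND_NODES=\"$nodes\" DK_DATAWAY=\"https://openway.guance.com?token=<TOKEN>\" bash -c \"\$(curl -L https://static.guance.com/datakit/install.sh)\""
```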

On Git Configuration

  • DK_GIT_URL: The remote git repo address for managing configuration files. (e.g. http://username:password@github.com/username/repository.git)
  • DK_GIT_KEY_PATH: The full path of the local PrivateKey. (e.g. /Users/username/.ssh/id_rsa)
  • DK_GIT_KEY_PW: The password to use the local PrivateKey. (e.g. passwd)
  • DK_GIT_BRANCH: Specify the branch to pull. If empty, the remote repository's default branch is used, which is usually master.
  • DK_GIT_INTERVAL: The interval of the timed pull. (e.g. 1m)

On Sinker Configuration

DK_SINKER_GLOBAL_CUSTOMER_KEYS is used to set the sinker tag/field keys. Here is an example:

  • Linux/Mac
DK_DATAWAY=https://openway.guance.com?token=<TOKEN> DK_DATAWAY_ENABLE_SINKER=on DK_SINKER_GLOBAL_CUSTOMER_KEYS=key1,key2 bash -c "$(curl -L https://static.guance.com/datakit/install.sh)"
  • Windows
Remove-Item -ErrorAction SilentlyContinue Env:DK_*;
$env:DK_DATAWAY="https://openway.guance.com?token=<TOKEN>";
$env:DK_DATAWAY_ENABLE_SINKER="on";
$env:DK_SINKER_GLOBAL_CUSTOMER_KEYS="key1,key2";
Set-ExecutionPolicy Bypass -scope Process -Force;
Import-Module bitstransfer;
start-bitstransfer  -source https://static.guance.com/datakit/install.ps1 -destination .install.ps1;
powershell ./.install.ps1;

On Resource Limit Configuration

Only the Linux and Windows (Version-1.15.0) operating systems are supported.

  • DK_LIMIT_DISABLED: Turn off the resource limit function (on by default)
  • DK_LIMIT_CPUMAX: Maximum CPU usage, default 30.0
  • DK_LIMIT_MEMMAX: Maximum memory (including swap) in MB, default 4096 (4GB)

Other Installation Options

Environment Variable Name    Sample            Description
DK_INSTALL_ONLY              on                Install only, do not run
DK_HOSTNAME                  some-host-name    Set a custom hostname during installation
DK_UPGRADE                   1                 Upgrade to the latest version (Note: once this option is turned on, all other options except DK_UPGRADE_MANAGER are ignored)
DK_UPGRADE_MANAGER           on                Whether to also upgrade the Remote Upgrade Service when upgrading Datakit; used together with DK_UPGRADE, supported since 1.5.9
DK_INSTALLER_BASE_URL        https://your-url  Choose the installation script source for different environments, defaults to https://static.guance.com/datakit
DK_PROXY_TYPE                -                 Proxy type. The options are datakit or nginx, both lowercase
DK_NGINX_IP                  -                 Proxy server IP address (only the IP is needed, not the port). It has the highest priority, is mutually exclusive with "HTTP_PROXY" and "HTTPS_PROXY", and overrides both.
DK_INSTALL_LOG               -                 Set the installation log path, defaults to install.log in the current directory; if set to stdout, the log is printed to the terminal.
HTTPS_PROXY                  IP:Port           Install through the Datakit proxy
DK_INSTALL_RUM_SYMBOL_TOOLS  on                Install source map tools for RUM, supported since Datakit 1.9.2.
DK_VERBOSE                   on                Print more verbose information during installation (Linux/Mac only), Version-1.19.0
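
Since DK_UPGRADE makes the installer ignore every other option except DK_UPGRADE_MANAGER, an upgrade command needs at most those two variables. The sketch below prints such a command rather than running it; the helper name and its with-manager argument are assumptions for illustration.

```shell
# Hypothetical helper: print (not run) an upgrade command.
# Pass "with-manager" to also upgrade the Remote Upgrade Service.
upgrade_cmd() {
  vars="DK_UPGRADE=1"
  if [ "${1:-}" = "with-manager" ]; then
    vars="$vars DK_UPGRADE_MANAGER=on"
  fi
  # Single quotes keep $(curl ...) literal, so nothing is downloaded here.
  printf '%s bash -c "$(curl -L https://static.guance.com/datakit/install.sh)"\n' "$vars"
}

upgrade_cmd
upgrade_cmd with-manager
```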

FAQ

How to Deal with the Unfriendly Host Name

DataKit uses the hostname as the basis for correlating data. Some host names are not very friendly, such as iZbp141ahn...., and for various reasons cannot be changed, which makes them inconvenient to work with. In DataKit, such a host name can be overridden in the main configuration.

In datakit.conf, modify the following configuration and DataKit will read ENV_HOSTNAME to override the real hostname:

[environments]
    ENV_HOSTNAME = "your-fake-hostname-for-datakit"

Note: If a host has already been collecting data for some time, its historical data will no longer be associated with the new host name after the change. Changing the host name is equivalent to adding a brand-new host.
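
To confirm which override is in effect, the value can be read back from the configuration file. The following is a rough sketch that assumes the one-line KEY = "value" layout shown above; the function name is hypothetical, and the file path is passed as an argument because the location of datakit.conf depends on the installation. The demo runs against a temporary file rather than a real configuration.

```shell
# Hypothetical helper: print the ENV_HOSTNAME value found in a datakit.conf-style file.
env_hostname_from_conf() {
  sed -n 's/^[[:space:]]*ENV_HOSTNAME[[:space:]]*=[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p' "$1"
}

# Demo on a temporary file that mirrors the snippet above.
tmp=$(mktemp)
printf '[environments]\n    ENV_HOSTNAME = "your-fake-hostname-for-datakit"\n' > "$tmp"
env_hostname_from_conf "$tmp"
rm -f "$tmp"
```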

Issue on macOS Installation

If the following error appears during installation or upgrade on macOS:

"launchctl" failed with stderr: /Library/LaunchDaemons/cn.dataflux.datakit.plist: Service is disabled
# or
"launchctl" failed with stderr: /Library/LaunchDaemons/com.guance.datakit.plist: Service is disabled

Execute:

sudo launchctl enable system/datakit

Then load the service using whichever plist file exists on the host:

sudo launchctl load -w /Library/LaunchDaemons/cn.dataflux.datakit.plist
# or
sudo launchctl load -w /Library/LaunchDaemons/com.guance.datakit.plist
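
Which of the two plist files is present depends on the installation, so a script can look for both before calling launchctl. The sketch below is one possible way to do that; the function name is an assumption, and the optional directory argument exists only so the helper can be tried outside macOS. It prints the load command instead of running it.

```shell
# Hypothetical helper: print whichever of the two plist files named above exists.
# The directory defaults to /Library/LaunchDaemons.
find_datakit_plist() {
  dir=${1:-/Library/LaunchDaemons}
  for name in cn.dataflux.datakit.plist com.guance.datakit.plist; do
    if [ -f "$dir/$name" ]; then
      echo "$dir/$name"
      return 0
    fi
  done
  return 1
}

if plist=$(find_datakit_plist); then
  echo "sudo launchctl load -w $plist"
else
  echo "no DataKit plist found"
fi
```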
