Managing DataKit Configuration with Git¶
This article explains how to use Git to manage DataKit configurations, including collection configurations and Pipeline scripts. By maintaining a local or remote Git repository, you can manage changes to DataKit's configurations while leveraging Git's version control features to track historical changes.
Operating Mechanism¶
DataKit has a built-in Git client that periodically (every 1 minute by default) pulls the latest configuration from the Git repository. DataKit then loads the pulled configuration, which is how configuration updates take effect.
Usage Example¶
The complete steps for the usage example are as follows:
- Create a Git repository
- Lay out the repository according to the required directory structure
- Push the configuration to the Git repository
- Add the Git repository to the main DataKit configuration
- Restart DataKit
Note
The repository does not have to be created in exactly this order. For example, you can create the remote repository first and then clone it to make changes. The following example creates a local Git repository first and then pushes it to a remote repository.
Create a Git Repository¶
First, create a local Git repository:
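The commands for this step are not shown above, so the following is a minimal sketch of what creating the local repository could look like. The directory name `datakit-confs` is an assumption, borrowed from the remote URL used later in this article; any name works.

```shell
# Hypothetical directory name, borrowed from the remote URL used later on.
mkdir -p datakit-confs
cd datakit-confs

# Initialize an empty local repository; this creates the .git directory.
git init
```

Depending on your Git version and settings, the initial branch may not be called `master`. Whatever it is called, it has to match the `branch` value configured in datakit.conf later.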
Directory Planning¶
Create the required top-level directories:
mkdir -p conf.d && touch conf.d/.gitkeep
mkdir -p pipeline && touch pipeline/.gitkeep
mkdir -p python.d && touch python.d/.gitkeep
Push Configuration¶
Use common Git commands to push configuration changes to the repository:
# cd your/path/to/repo
# Add any conf, pipeline, or Python files under conf.d, pipeline, and python.d first
git add conf.d pipeline python.d
git commit -m "init datakit repo"
# Push the repo to YOUR GitHub (ssh or https)
git remote add origin ssh://git@github.com/PATH/TO/datakit-confs.git
git push origin --all
Configure the Repository in DataKit¶
In the datakit.conf configuration file, locate git_repos and enable it as shown below:
[[git_repos.repo]]
enable = true # Enable the repo
###########################################
# Git supports http/git/ssh authentication
###########################################
url = "http://username:password@github.com/PATH/TO/datakit-confs.git"
branch = "master" # Specify which branch to pull
# git/ssh authentication requires ssh_private_key_path and ssh_private_key_password
# url = "git@github.com:PATH/TO/datakit-confs.git"
# url = "ssh://git@github.com/PATH/TO/datakit-confs.git"
# ssh_private_key_path = "/Users/username/.ssh/id_rsa"
# ssh_private_key_password = "<YOUR-PASSWORD>"
If the password contains special characters, refer to here.
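The linked guidance is not reproduced here. As general background, reserved characters such as `@`, `:`, `/`, and `#` inside the username or password part of a URL normally have to be percent-encoded (RFC 3986). Assuming that is the issue, the sketch below shows one way to encode a password before pasting it into `url`. The function name `urlencode` is hypothetical, and the sketch only handles ASCII input.

```shell
# urlencode STRING
# Percent-encode every character except the RFC 3986 unreserved set
# (letters, digits, '-', '.', '_', '~'). ASCII input only.
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [A-Za-z0-9._~-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}

# Example: encode a password that contains reserved characters.
urlencode 'p@ss:w/rd#1'   # prints p%40ss%3Aw%2Frd%231
```

The encoded value then replaces `password` in a URL of the form `http://username:password@...`.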
Restart DataKit¶
After the configuration is complete, restart DataKit. After a short wait, you can check the status of the collectors through DataKit Monitor.
Git Usage in Kubernetes¶
Refer to here.
FAQ¶
Error: authentication required¶
This error may occur in the following situations.
With SSH, the cause is usually an incorrect key. With HTTP, possible causes are:
- Incorrect username and password provided
- The protocol for the git address is filled in incorrectly
For example, if the original address begins with https but is written with http in the configuration, this error also occurs. Changing http back to https resolves it.
Repository Directory Constraints¶
The Git repository must organize its configurations in the following directory structure:
├── conf.d # Dedicated to storing collector configurations
├── pipeline # Dedicated to storing pipeline scripts
└── python.d # Store python scripts
Among them:
- conf.d is dedicated to storing collector configurations. Its subdirectories can be organized arbitrarily; a collector configuration file only needs to end with .conf
- pipeline is used to store Pipeline scripts, and it is recommended to plan Pipeline scripts according to data type
- python.d is used to store Python scripts
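Before pushing, it can help to verify that a repository follows this layout. The following is a minimal sketch, not part of DataKit: the function name `check_repo_layout` is hypothetical, and it only checks that the three top-level directories exist.

```shell
# check_repo_layout REPO_DIR
# Returns 0 if conf.d, pipeline, and python.d all exist under REPO_DIR,
# otherwise prints the missing directories to stderr and returns 1.
check_repo_layout() {
  repo=$1
  rc=0
  for dir in conf.d pipeline python.d; do
    if [ ! -d "$repo/$dir" ]; then
      echo "missing directory: $repo/$dir" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `check_repo_layout . && git push origin --all` would push only when the layout check passes.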
Here is an example of DataKit's directory structure after Git synchronization is enabled:
DataKit root directory
├── conf.d # Default main configuration directory
├── pipeline # Top-level Pipeline scripts
├── python.d # Top-level python scripts
└── gitrepos
    ├── repo-1 # Repository 1
    │   ├── conf.d # Dedicated to storing collector configurations
    │   ├── pipeline # Dedicated to storing pipeline scripts
    │   └── python.d # Store python scripts
    └── repo-2 # Repository 2
        ├── ...
Git Configuration Loading Mechanism¶
After Git synchronization is enabled, the configuration (.conf/pipeline) priority is defined as follows:
- All collector configurations are loaded from the gitrepos directory
- The order of Git repository loading is based on the order in which they appear in datakit.conf
- For Pipeline, the first matching Pipeline file found is used. In the example above, when looking for nginx.p, if it is found in repo-1, DataKit does not look in repo-2. If neither repository has nginx.p, DataKit looks in the top-level pipeline directory. The search mechanism for Python scripts is the same.
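The lookup order described above can be illustrated with a short sketch. This is not DataKit's implementation: the function name `find_pipeline` is hypothetical, the repository order is passed in by hand rather than read from datakit.conf, and the sketch assumes scripts sit directly in each pipeline directory (no subdirectories, no remote Pipeline).

```shell
# find_pipeline DATAKIT_ROOT SCRIPT_NAME REPO...
# Prints the path of the first matching Pipeline script and returns 0.
# Repositories are searched in the order given; the top-level pipeline
# directory is the fallback. Returns 1 if nothing is found.
find_pipeline() {
  dk_root=$1
  name=$2
  shift 2
  for repo in "$@"; do
    candidate="$dk_root/gitrepos/$repo/pipeline/$name"
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  if [ -f "$dk_root/pipeline/$name" ]; then
    printf '%s\n' "$dk_root/pipeline/$name"
    return 0
  fi
  return 1
}
```

With the directory tree shown earlier, calling `find_pipeline <DataKit root> nginx.p repo-1 repo-2` prints the copy under repo-1 if one exists there, then falls back to repo-2, and finally to the top-level pipeline directory.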
Attention
After the remote Pipeline feature is enabled, the Pipeline synchronized from the center is loaded first.
After Git synchronization is enabled, the original collector configurations in the conf.d directory no longer take effect. In addition, the main configuration file datakit.conf cannot be managed through Git.