DataKit Pipeline Offload
You can use DataKit's Pipeline Offload function to reduce high data latency and high host load caused by data processing.
Configuration Method
Enable and configure the function in the main configuration file, datakit.conf; see the reference configuration below. The currently supported values for receiver are datakit-http and ploffload. Multiple DataKit addresses can be listed in addresses to achieve load balancing.
Notice:
- Currently, only data processing tasks of the logging (Logging) category can be offloaded;
- The address of the current DataKit must not be listed in the addresses configuration item; otherwise a loop forms and the data never leaves the current DataKit;
- Keep the DataWay configuration of the target DataKit consistent with that of the current DataKit; otherwise the data recipient sends the data to its own DataWay address;
- If receiver is set to ploffload, the DataKit on the receiving end must have the ploffload collector enabled.
Check that the target network address is reachable from the current host. A target that listens only on the loopback address cannot be reached.
Reference configuration:
[pipeline]
# Offload data processing tasks to post-level data processors.
[pipeline.offload]
receiver = "datakit-http"
addresses = [
# "http://<ip>:<port>"
]
If the receiving DataKit has the ploffload collector enabled, the configuration can be:
[pipeline]
# Offload data processing tasks to post-level data processors.
[pipeline.offload]
receiver = "ploffload"
addresses = [
# "http://<ip>:<port>"
]
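The restrictions on addresses from the notice above can be checked before a configuration is rolled out. The following is a minimal sketch in Python, not part of DataKit: the function name check_offload_addresses, the own_hosts parameter, and the example IP addresses and port are all assumptions made for illustration. It flags entries that point at the current DataKit or at a loopback address; DNS names are not resolved.

```python
import ipaddress
from urllib.parse import urlsplit


def check_offload_addresses(addresses, own_hosts):
    """Return (address, problem) pairs for entries that look misconfigured.

    addresses: URLs as they would appear in the `addresses` item.
    own_hosts: host names or IPs of the current DataKit (hypothetical input).
    """
    problems = []
    for addr in addresses:
        host = urlsplit(addr).hostname
        if host is None:
            problems.append((addr, "no host in URL"))
            continue
        if host in own_hosts:
            # Offloading to ourselves would create a forwarding loop.
            problems.append((addr, "points at the current DataKit (loop)"))
            continue
        if host == "localhost":
            problems.append((addr, "loopback address"))
            continue
        try:
            if ipaddress.ip_address(host).is_loopback:
                problems.append((addr, "loopback address"))
        except ValueError:
            # Not an IP literal (a DNS name); this sketch does not resolve it.
            pass
    return problems


if __name__ == "__main__":
    # Made-up addresses; 10.0.0.1 stands in for the current DataKit.
    candidates = [
        "http://10.0.0.2:9529",
        "http://127.0.0.1:9529",
        "http://10.0.0.1:9529",
    ]
    for addr, problem in check_offload_addresses(candidates, {"10.0.0.1"}):
        print(f"{addr}: {problem}")
```

An empty result means only that none of these two checks failed; it does not prove that the target is reachable or that its DataWay configuration matches.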
Working Principle
After DataKit finds the Pipeline data processing script, it checks whether the script is a remote script from GuanceCloud. If so, it forwards the data to a post-level data processor (such as DataKit) for processing. Load is balanced across the configured addresses by round robin.
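The decision and the round-robin selection described above can be sketched as follows. This is an illustrative Python model under stated assumptions, not DataKit's actual implementation: the class RoundRobinOffloader, the function route, and the example addresses are hypothetical names introduced here.

```python
from itertools import cycle


class RoundRobinOffloader:
    """Hands out receiver addresses in a fixed rotating order."""

    def __init__(self, addresses):
        if not addresses:
            raise ValueError("at least one address is required")
        # cycle() repeats the list endlessly: a, b, a, b, ...
        self._next_address = cycle(addresses)

    def pick(self):
        return next(self._next_address)


def route(script_is_remote, offloader):
    """Return the receiver address for offloaded data, or None to process locally.

    script_is_remote: whether the matched Pipeline script is a remote script
    (assumed to be determined elsewhere).
    """
    if script_is_remote:
        return offloader.pick()
    return None


if __name__ == "__main__":
    offloader = RoundRobinOffloader(
        ["http://10.0.0.2:9529", "http://10.0.0.3:9529"]  # made-up addresses
    )
    for _ in range(3):
        print(route(True, offloader))
    print(route(False, offloader))
```

Round robin needs no knowledge of how busy each receiver is, which keeps the sender simple; the trade-off is that an overloaded receiver still gets its regular share of the data.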
Deploy post-level data processor
There are several ways to deploy the data processor (DataKit) that receives the computing tasks:
- Host deployment
A DataKit dedicated to data processing is not currently supported; to deploy DataKit on a host, see the documentation.
- Container deployment
The environment variables ENV_DATAWAY and ENV_HTTP_LISTEN must be set, and the DataWay address must be consistent with that of the DataKit configured with the Pipeline Offload function. It is recommended to map the listening port of the DataKit running in the container to the host.
Reference command: