
DataKit Election



When a cluster has a single collection target (for example, Kubernetes) but many DataKits are deployed in bulk with identical configurations, every DataKit would collect from that same target. Enabling the election feature in DataKit avoids this duplicate data collection.

Currently, DataKit only supports a "self-election" mode. Under the same election namespace, one DataKit instance will be elected as the leader and take charge of all data collection tasks, while the other instances remain on standby.

The advantages and disadvantages of this mode are as follows:

  • Advantages: Simple configuration, with no need to deploy additional components.
  • Disadvantages: The elected DataKit leader bears a higher load, as all collection tasks are concentrated on this single instance, which may lead to a significant increase in its system resource usage.
Warning

Starting from DataKit Version-1.85.0, the collector task election feature has been removed. Correspondingly, DataKit-Operator v1.6.0 also no longer supports the election interface.

DataKit Self Election

Election Configuration

Edit conf.d/datakit.conf, and the election-related configuration is as follows:

[election]
  # Enable election
  enable = false

  # Election namespace (defaults to "default")
  namespace = "default"

  # When true, append the election namespace to collected data as a tag
  enable_namespace_tag = false

  ## election.tags: Election-related global tags
  [election.tags]
    #  project = "my-project"
    #  cluster = "my-cluster"

To run separate elections for different groups of DataKits (for example, one group of 10 DataKits and another group of 8), give each group its own namespace so that the elections do not interfere with each other. DataKits in the same namespace participate in the same election.

After election is enabled, if enable_namespace_tag = true is also set (Version-1.4.7), the tag election_namespace = <your-namespace-name> is automatically added to the data collected by election-enabled collectors.

After enabling election in conf.d/datakit.conf, set election = true in each collector that should participate in the election. (Currently, every collector that supports election has an election entry in its configuration file.)

Note: A collector that supports election but is configured with election = false does not participate in the election, and its collection behavior and tag settings are not affected by the election. If election is disabled in datakit.conf but enabled in the collector, the collector's collection behavior and tag settings are the same as when election is disabled.
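The interaction between the two switches can be summarized as a small truth table. The following is a minimal Python sketch, not DataKit code: the function name `participates_in_election` is hypothetical and only encodes the rule stated in the note above.

```python
def participates_in_election(global_enable: bool, collector_election: bool) -> bool:
    """Return True only when election is enabled in both places.

    global_enable      -- `enable` under [election] in datakit.conf
    collector_election -- `election` in the collector's own configuration
    """
    return global_enable and collector_election


if __name__ == "__main__":
    # Print the full truth table for the two switches.
    for global_enable in (False, True):
        for collector_election in (False, True):
            result = participates_in_election(global_enable, collector_election)
            print(f"datakit.conf enable={global_enable!s:5} "
                  f"collector election={collector_election!s:5} "
                  f"-> participates={result}")
```

Only the combination where both switches are on yields a collector that takes part in the election; in every other case the collector behaves like a non-election collector.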


Viewing Election Status

After the election is configured, you can check the current election status of DataKit by viewing the monitor. In the Basic Info section, there will be a line like this:

Elected default::success|MacBook-Pro.local(elected: 4m40.554909s)

Here's what each part means:

  • default indicates the election-namespace in which the current DataKit participates in the election. A workspace can have multiple election-namespaces dedicated to elections.
  • success indicates that the current DataKit has election enabled and has been chosen as the leader.
  • MacBook-Pro.local shows the hostname of the DataKit that was elected in the current namespace. If this hostname is the same as the current DataKit's, the time it has been the leader is shown after it, as in (elected: 4m40.554909s) (Version-1.5.8).

If it is displayed as follows, it means that the current DataKit was not elected, but it will show which one was elected:

Elected default::defeat|host-abc

Here's the breakdown:

  • default indicates the namespace in which the current DataKit is participating in the election, as explained above.
  • defeat indicates that the current DataKit has election enabled but was not elected. Besides defeat, the other possible statuses are:

    • disabled: The election feature is not enabled.
    • success: The current DataKit was elected as the leader.
    • banned: The election feature is enabled, but the current DataKit is not on the election whitelist (Version-1.35.0).
  • host-abc shows the hostname of the DataKit that was elected in the current namespace.
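For scripting around the monitor output, the status line can be split into its parts. The following Python sketch assumes the line always has the shape of the two samples above (`Elected <namespace>::<status>|<host>` with an optional `(elected: <duration>)` suffix); other shapes, such as the exact text shown for a disabled election, are not covered by the source. The names `ElectionStatus` and `parse_elected_line` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ElectionStatus(NamedTuple):
    namespace: str               # election namespace, e.g. "default"
    status: str                  # e.g. "success", "defeat"
    leader_host: str             # hostname of the elected DataKit
    elected_for: Optional[str]   # raw duration text, or None if absent


# Pattern inferred from the two sample lines only (an assumption).
_ELECTED_RE = re.compile(
    r"^Elected\s+(?P<ns>[^:]+)::(?P<status>\w+)\|(?P<host>[^()]*)"
    r"(?:\(elected:\s*(?P<dur>[^)]+)\))?$"
)


def parse_elected_line(line: str) -> ElectionStatus:
    """Parse one 'Elected ...' line from the monitor's Basic Info section."""
    match = _ELECTED_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized election status line: {line!r}")
    return ElectionStatus(
        namespace=match["ns"],
        status=match["status"],
        leader_host=match["host"].strip(),
        elected_for=match["dur"],
    )


if __name__ == "__main__":
    status = parse_elected_line("Elected default::defeat|host-abc")
    print(status.status, status.leader_host)  # prints: defeat host-abc
```

The duration is kept as raw text because the source does not specify its format beyond the single sample.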

Election Principle

Take MySQL as an example. Suppose one cluster (such as a Kubernetes cluster) contains 10 DataKits and 2 MySQL instances, and every DataKit has both election and the MySQL collector enabled (in DaemonSet mode, all DataKits share the same configuration):

  • Once a DataKit is elected, it performs all MySQL data collection (the same applies to other election-enabled collectors), whether there is one collection target or several: the winner takes all. The DataKits that were not elected remain on standby.
  • Guance checks, via heartbeat, whether the current leader DataKit is alive. If the leader is abnormal, it is forcibly removed and one of the standby DataKits replaces it.
  • A DataKit that has not enabled election (it may not even be in the current cluster) is not constrained by the election. If the MySQL collector is configured on it, it still collects MySQL data.
  • The scope of an election is workspace + election-namespace. Within a single workspace + election-namespace, only one DataKit can be the leader at a time.
    • The workspace is identified in datakit.conf by the token URL parameter in the DataWay address string; each workspace has its own token.
    • The election namespace is set in datakit.conf by the namespace configuration item. Different DataKits within one workspace can be configured with different namespaces.

Global Tag Settings for Election Collectors

When election is enabled in conf.d/datakit.conf, DataKit tries to append the global election tags configured in datakit.conf to all data collected by collectors that have election enabled:

[election]
  [election.tags]
    # project = "my-project"
    # cluster = "my-cluster"

If the original data already has a tag with the same name, the tag in the original data prevails and is not overwritten.

If election is not enabled, the data collected by election collectors carries the global_host_tags configured in datakit.conf, the same as non-election collectors (Version-1.4.8):

[global_host_tags]
  ip         = "__datakit_ip"
  host       = "__datakit_hostname"
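The tag rules in this section can be sketched as a merge in which the original data always wins. This is an illustrative Python sketch, not DataKit's code: the function name `apply_global_tags` is hypothetical, and the tag values are plain literals (the sketch does not resolve placeholders such as `__datakit_hostname`).

```python
from typing import Dict


def apply_global_tags(
    point_tags: Dict[str, str],
    election_active: bool,
    election_tags: Dict[str, str],
    global_host_tags: Dict[str, str],
) -> Dict[str, str]:
    """Return the tags a data point ends up with.

    election_active -- True when election is enabled in datakit.conf AND
                       in the collector; otherwise host tags apply.
    """
    extra = election_tags if election_active else global_host_tags
    merged = dict(extra)
    merged.update(point_tags)  # tags already on the data are never overwritten
    return merged


if __name__ == "__main__":
    election_tags = {"project": "my-project", "cluster": "my-cluster"}
    host_tags = {"host": "node-1", "ip": "10.0.0.1"}  # example values only
    point = {"cluster": "from-data"}

    print(apply_global_tags(point, True, election_tags, host_tags))
    print(apply_global_tags(point, False, election_tags, host_tags))
```

In the first call the data keeps its own `cluster` tag and only gains `project`; in the second call the election tags are ignored and the host tags are appended instead.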

See here for the configuration of elections in Kubernetes and here for the setting of global tags.

Election Whitelist

Version-1.35.0

For host installations, the election whitelist is configured through the datakit.conf file:

[election]

  # Election whitelist. If the list is empty, all hosts/nodes are allowed to participate in the election.
  node_whitelist = ["host-name-1", "host-name-2", "..."]
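The whitelist rule, together with the disabled and banned statuses described earlier, can be sketched as follows. This Python example is an assumption-laden illustration: the function name `election_eligibility` and the label "candidate" (meaning the DataKit may take part and will end up as either success or defeat) are hypothetical.

```python
from typing import List


def election_eligibility(enabled: bool, hostname: str, node_whitelist: List[str]) -> str:
    """Classify a DataKit before the election itself takes place.

    Returns "disabled", "banned", or the hypothetical label "candidate".
    """
    if not enabled:
        return "disabled"
    # An empty whitelist allows every host/node to participate.
    if node_whitelist and hostname not in node_whitelist:
        return "banned"
    return "candidate"


if __name__ == "__main__":
    whitelist = ["host-name-1", "host-name-2"]
    for host in ("host-name-1", "host-name-9"):
        print(host, election_eligibility(True, host, whitelist))
    print("host-name-9", election_eligibility(True, "host-name-9", []))
```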


Collectors Supporting Election

The list of collectors currently supporting elections is as follows:

In fact, there are more collectors that support elections, and this information may not be up-to-date. Please refer to the specific documentation of the collector for the most accurate information.

FAQ

host Field Problem

For objects collected by election collectors, such as MySQL, the DataKit that collects the data may change when an election rotation occurs. For this reason, data collected by such collectors does not carry the host tag by default, which avoids timeline growth. We recommend adding a tags field to the MySQL collector configuration:

[inputs..tags]
  host = "real-mysql-instance-name"

This way, the host value configured in tags stays the same even when the leader DataKit changes after an election rotation.
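The effect on timelines can be seen in a small Python sketch. It assumes, as is common in time-series storage, that a timeline is identified by the measurement name plus its full tag set; the function `series_key` and the DataKit hostnames are hypothetical.

```python
from typing import Dict, Tuple


def series_key(measurement: str, tags: Dict[str, str]) -> Tuple[str, Tuple[Tuple[str, str], ...]]:
    """Identify a timeline by measurement name plus sorted tag pairs (an assumption)."""
    return (measurement, tuple(sorted(tags.items())))


# Three successive leaders after election rotations (hypothetical hostnames).
leaders = ["dk-node-1", "dk-node-2", "dk-node-3"]

# If host followed the collecting DataKit, each rotation would start a new timeline.
drifting = {series_key("mysql", {"host": leader}) for leader in leaders}

# With host pinned in the collector's tags, the timeline stays the same.
pinned = {series_key("mysql", {"host": "real-mysql-instance-name"}) for _ in leaders}

if __name__ == "__main__":
    print("timelines with DataKit hostname as host:", len(drifting))
    print("timelines with pinned host tag:", len(pinned))
```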
