
Log Collection


Guance offers comprehensive log collection capabilities, covering both host log collection and K8S container log collection. The DataKit installation methods differ between the two, as do the log collection methods. The collected log data is centrally reported to Guance for unified storage, search, and analysis, helping you quickly locate and resolve issues.

This article mainly introduces how to collect logs in a host environment. For log collection in a K8S environment, please refer to the best practice Several Methods for Log Collection in Kubernetes Clusters.

Prerequisites

Install DataKit.

Alternatively, you can log in to Guance, go to Integration > DataKit, and select Linux, Windows, or macOS based on your host's operating system to obtain the DataKit installation commands and steps.

Log Collector Configuration

After DataKit is installed, you can enable either standard log collection or custom log collection to gather log data from various sources such as system logs, application logs (e.g., Nginx, Redis, Docker, ES).

Navigate to the conf.d/samples directory under the DataKit installation directory, copy logging.conf.sample, rename it to logging.conf, and edit it as needed. After editing, restart DataKit for the changes to take effect.

For details, refer to Host Log Collection.
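
As a reference, a minimal custom logging.conf might look like the sketch below, assuming an Nginx access log as the source. The field names follow the bundled logging.conf.sample; the log file path and the source, service, and pipeline values are placeholders to replace with your own, and the available options may differ by DataKit version.

[[inputs.logging]]
  # Log files to collect; glob patterns are supported
  logfiles = ["/var/log/nginx/access.log"]
  # Source name of the logs as they appear in Guance
  source = "nginx"
  # Service name; falls back to the source name if left empty
  service = "nginx"
  # Pipeline script used to parse each log line (see the Note below)
  pipeline = "nginx.p"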

By enabling the standard log collectors supported by Guance, such as Nginx, Redis, and ES, you can start log collection with one click.

Note

When configuring a log collector, you need to enable the Pipeline function for logs to extract the time and status fields:

  • time: The generation time of the log. If the time field is not extracted or parsing fails, the current system time is used by default.
  • status: The level of the log. If the status field is not extracted, status is set to unknown by default.

For more details, refer to the documentation Pipeline Configuration and Usage.
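
For illustration, a Pipeline script that extracts both fields from a log line such as 2023-06-01T12:00:00 ERROR connection refused might look like the sketch below; the grok pattern is an assumption about the log format and needs to be adapted to your own logs.

# Split the log line into time, status, and message fields
grok(_, "%{TIMESTAMP_ISO8601:time} %{LOGLEVEL:status} %{GREEDYDATA:msg}")
# Use the extracted time field as the log's timestamp
default_time(time)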

Log Data Storage

After the log collector is configured, restart DataKit, and the log data will be uniformly reported to the Guance workspace.
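
On a Linux host where DataKit runs as a systemd service, restarting typically looks like the command below; on other platforms or installation modes, restart DataKit through the mechanism used when it was installed (see the DataKit service management documentation).

# restart the DataKit service so the new collector configuration takes effect
sudo systemctl restart datakit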

  • If you have a large volume of log data, you can configure Log Index or Log Blacklist to save on data storage costs.
  • If you need long-term log storage, you can use Log Backup to preserve log data.

Writing Data with Large Time Deviation

❓ When writing data points with timestamps that significantly deviate from the current time, it can harm the efficiency of the min-max index in the storage engine's data blocks. This means that even querying a small time range might require scanning a large number of data blocks, severely degrading query performance.

❗ To mitigate this issue, the system filters out data points whose timestamps deviate from the current time by more than 12 hours during the write process (only the out-of-range data points are discarded, not the entire data packet). This mechanism helps maintain index effectiveness and improves query efficiency.
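
Conceptually, the check amounts to comparing each point's timestamp against the current time, as in the hypothetical Go sketch below (function and parameter names are illustrative, not the actual Kodo-X implementation):

package discard

import "time"

// shouldDiscard reports whether a data point's timestamp deviates from the
// current time by more than the configured cutoff (discard_expired_seconds,
// 12 hours by default). Only such points are dropped; the rest of the
// packet is written as usual.
func shouldDiscard(pointTime time.Time, cutoff time.Duration) bool {
	deviation := time.Since(pointTime)
	if deviation < 0 {
		deviation = -deviation
	}
	return deviation > cutoff
}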

Kodo-X adds the following configuration to kodo-x.yaml (three parameters):

  • enable_discard_expired_data: whether to enable discarding data with a large time deviation. Enabled by default.
  • discard_expired_seconds: the threshold for judging a large time deviation. Defaults to 12 hours (43200 seconds).

global:
   enable_discard_expired_data: true
   discard_expired_seconds: 43200  # 12 hours (12 * 3600)

  • discard_data_type: the data types to discard. Default value:

DiscardDataType: map[string]bool{
    // "metering": true,
    // "TAE":      true,
    // "AE":       true,
    "B":  true,
    "CO": true,
    "D":  true,
    "E":  true,
    "EL": true,
    "L":  true,
    // "NM":       true,
    "N":  true,
    "OH": true,
    "O":  true,
    "P":  true,
    // "RM":       true,
    "R":  true,
    "S":  true,
    // "TM":       true,
    "T":  true,
},
