
Guance vs. ELK and EFK


Overview of ELK, EFK, and Guance

As software systems grow more complex, logs are typically generated by servers and written to different files: system logs, application logs, security logs, and so on. These logs end up scattered across different machines. When a failure occurs, engineers have to log in to each server and hunt for the cause with Linux tools such as grep, sed, and awk. Without a centralized logging system, they must first identify which server handled the request; if several instances are deployed on that server, they then have to search the log directory of each application instance. Each instance also has its own log rotation policy (for example, one file per day) and an archiving policy for compressed logs. All of this makes troubleshooting slow and finding the root cause difficult.

When deploying in the cloud, logging in to individual nodes to check the logs of different modules is nearly impossible. It is inefficient, and for security reasons engineers often cannot access the physical nodes at all. Moreover, large-scale software systems are generally deployed as clusters, meaning each service is backed by multiple identical Pods, and every container writes its own logs. From the log content alone you cannot tell which Pod produced it, which makes viewing distributed logs even harder.

Therefore, if we can centrally manage these logs and provide centralized search functionality, it not only improves diagnostic efficiency but also provides a comprehensive understanding of the system situation, avoiding the passive state of firefighting after an incident.

ELK

So, what exactly is ELK? "ELK" is the acronym for three open-source projects: Elasticsearch, Logstash, and Kibana. Elasticsearch is a search and analytics engine. Logstash is a server-side data processing pipeline that can simultaneously collect data from multiple sources, transform data, and send it to storage repositories such as Elasticsearch. Kibana allows users to visualize data in Elasticsearch using graphs and charts.

Elasticsearch

Elasticsearch is a JSON-based distributed search and analytics engine. It can be accessed via RESTful Web service interfaces and uses JSON (JavaScript Object Notation) documents to store data. It is based on the Java programming language, allowing Elasticsearch to run on different platforms. Users can search vast amounts of data very quickly.
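
As a quick illustration of the JSON/RESTful interface, a search request is an HTTP call with a JSON body; the index name and field below are placeholders for illustration only:

curl -X GET "http://localhost:9200/nginx-logs/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": { "status": "500" }
  }
}'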

Key Features

  • Distributed real-time document storage with every field indexed and searchable
  • Distributed real-time analytical search engine
  • Can scale to hundreds of servers, handling petabytes of structured or unstructured data


Logstash

Logstash is an open-source stream-processing tool for ETL (Extract, Transform, Load) that lets you build data pipelines in minutes. It scales horizontally and is resilient, with adaptive buffering. Its ecosystem includes more than 200 integration and processor plugins, and it can be monitored and managed with the Elastic Stack.

Key Features

  • Accesses almost any type of data
  • Integrates with multiple external applications
  • Supports elastic scalability

Components of Logstash

  • Inputs: define how data is received, such as collecting the contents of files.

  • Filters: filter and transform the data in transit, for example with grok rules.

  • Outputs: define where the received data is sent, such as to Elasticsearch (see the sketch below).
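
As a minimal sketch of how the three stages fit together (the log path, Grok pattern, and Elasticsearch address are illustrative, not taken from this article's test setup):

input {
  file {
    path => "/var/log/nginx/access.log"
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
  }
}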


Kibana

Kibana is an open-source data analysis and visualization platform, part of the Elastic Stack designed to work with Elasticsearch. You can use Kibana to search, view, and interact with data indexed in Elasticsearch. You can easily analyze and present data in diverse formats using charts, tables, and maps.

Kibana makes big data understandable. Its simple, browser-based interface makes it easy to quickly create and share dynamic dashboards that track changes in Elasticsearch data in real time.

EFK

EFK is not a single software but a solution set. EFK stands for Elasticsearch, Fluentd, Kibana or Elasticsearch, Filebeat, Kibana. Elasticsearch handles log analysis and storage, Fluentd and Filebeat handle log collection, and Kibana handles interface display. They work together seamlessly, efficiently meeting many application requirements and forming a mainstream log analysis system solution.

Fluentd

Fluentd is an open-source data collector designed for processing data streams, using JSON as its data format. It has a plugin-based architecture with high scalability and availability, and implements reliable message forwarding. In practice, data from various sources is sent to Fluentd first, and Fluentd then forwards it, according to its configuration, through different plugins to destinations such as files, SaaS platforms, databases, or even another Fluentd instance.

Key Features

  • Easy installation
  • Small footprint
  • Semi-structured log recording
  • Flexible plugin mechanism
  • Reliable buffering
  • Log forwarding

Components of Fluentd

Fluentd's Input/Buffer/Output structure is similar to Flume's Source/Channel/Sink.

  • Input: Input is responsible for receiving data or actively fetching data. Supports syslog, http, file tail, etc.

  • Buffer: Buffer ensures performance and reliability of data acquisition, with configurable file or memory buffers.

  • Output: Output is responsible for sending data to destinations such as files, AWS S3, or other Fluentd instances.
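
A minimal sketch of this Input/Buffer/Output flow (the paths and Elasticsearch address are illustrative; the elasticsearch output is provided by the fluent-plugin-elasticsearch plugin bundled with td-agent):

<source>
  @type tail
  path /var/log/app/app.log
  pos_file /var/log/td-agent/app.log.pos
  tag app.access
  <parse>
    @type json
  </parse>
</source>

<match app.**>
  @type elasticsearch
  host localhost
  port 9200
  <buffer>
    @type file
    path /var/log/td-agent/buffer/es
    flush_interval 10s
  </buffer>
</match>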


Filebeat

Filebeat is a lightweight log collector implemented in Golang and is part of the Elasticsearch stack. Essentially an agent, it can be installed on various nodes to read logs from specified locations and report them accordingly.

Filebeat is highly reliable, ensuring at least once delivery of logs. It also considers various issues during log collection, such as resuming reading from breakpoints, handling filename changes, and dealing with truncated logs.

Filebeat does not depend on Elasticsearch and can operate independently, so it can be used purely for log collection and reporting. It ships with output components for Kafka, Elasticsearch, and Redis, and can also write to the console or a file for debugging. The built-in outputs cover most reporting needs, and custom output components can be written to forward logs to any other destination.

Filebeat is part of the elastic/beats suite, along with Heartbeat, Packetbeat, and others, all of which are built on the libbeat framework.

Components of Filebeat

  • Harvester: The harvester is responsible for reading the contents of a single file. It reads the file and sends the content to the output. One harvester is started per file; it opens the file and keeps the file descriptor open while it runs, so if the file is deleted or renamed during reading, Filebeat continues to read it.

  • Prospector: The prospector’s main responsibility is to manage harvesters and find all sources of files to read. If the input type is logs, the prospector looks for all files matching the path and starts a harvester for each file. Each prospector runs in its own Go coroutine.
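
A minimal filebeat.yml sketch tying these pieces together (the paths and output address are illustrative; in older Filebeat versions the inputs section is called filebeat.prospectors):

filebeat.inputs:
  - type: log
    paths:
      - /var/log/nginx/*.log

output.elasticsearch:
  hosts: ["localhost:9200"]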

Note: Filebeat prospector can only read local files and has no functionality to connect to remote hosts to read stored files or logs. Due to limited application scope, Filebeat is not extensively compared in this article.


Guance

DataKit

DataKit is the basic data collection tool that runs on users' hosts. It collects various metrics and logs from running systems and aggregates them into Guance, where users can view and analyze them. DataKit is a critical data collection component of Guance: all data in Guance originates from DataKit.

  1. DataKit periodically collects data from the configured sources and sends it to DataWay over HTTP(S) at regular intervals. Each DataKit carries a token used to identify different users (see the configuration sketch after this list).

  2. After DataWay receives the data, it forwards it to Guance, attaching API signatures to the data sent to Guance.

  3. Guance receives legitimate data and writes it to different storages based on the data type.
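
As a rough sketch of step 1, the DataWay address and token are set in DataKit's main configuration file, datakit.conf; the URL and token below are placeholders, and the exact section layout may differ between DataKit versions:

[dataway]
  ## DataWay endpoint; the token identifies the workspace/user
  urls = ["https://openway.guance.com?token=tkn_xxxxxxxxxxxxxxxx"]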

For data collection, partial data loss is acceptable (data is collected at intervals, so the data within one interval can be treated as lost). Currently, the transmission chain provides the following protections against loss:

  1. If DataKit fails to send data to DataWay due to network issues, it caches up to 1000 data points. When the cache exceeds this limit, it gets cleared.

  2. If DataWay fails to send data to Guance due to certain reasons or heavy traffic, it persists the data to disk. Once traffic decreases or the network recovers, DataWay sends the cached data to Guance. Delayed data does not affect timeliness as timestamps are attached to the cached data.

On DataWay, the maximum disk usage is configurable so that the node's storage is not exhausted; once that limit is exceeded, DataWay discards further data. In practice this limit is usually set to a fairly large value.


Components of DataKit

From top to bottom, DataKit internally consists of three main layers:

  • Top Layer: entry modules and some common modules
    • Configuration Loading Module: besides the main configuration (conf.d/datakit.conf), each collector's configuration is kept in a separate file; merging them all into one file would make it large and awkward to edit.
    • Service Management Module: manages the DataKit service as a whole.
    • Toolchain Module: beyond data collection, DataKit offers many peripheral functions, such as viewing documentation, restarting the service, and updating; these are implemented in the toolchain module.
    • Pipeline Module: for log processing, Pipeline scripts (Grok syntax) split unstructured log text into structured data; similar processing can be applied to other, non-log data.
    • Election Module: when many DataKits are deployed with identical configurations (for example, in automated batch deployments), the election module ensures that at most one DataKit in the cluster collects a given data source at any time, avoiding duplicate collection and reducing pressure on the source.
    • Documentation Module: documentation ships with DataKit; users can view the documentation list at http://localhost:9529/man or browse it from the command line.
  • Transmission Layer: handles almost all data input and output
    • HTTP Service Module: DataKit accepts third-party data, such as from Telegraf or Prometheus; currently this data is ingested over HTTP.
    • IO Module: each round of collected data is handed to the IO module, which provides unified interfaces for constructing, processing, and sending data, and periodically ships the data to DataWay over HTTP(S).
  • Collection Layer: collects the various types of data, divided into two categories:
    • Active collectors: gather data at the configured frequency, such as CPU metrics, NIC traffic, and cloud dial testing.
    • Passive collectors: receive data from external inputs, such as RUM and Tracing; these sources generally run outside DataKit and upload standardized data to Guance through DataKit's open data-upload API.


Guance Platform

Building on these data collection capabilities, Guance provides observability across infrastructure, containers, middleware, databases, message queues, application traces, frontend visits, system security, and network performance. With the standard Guance product, once DataKit collection is configured correctly, users can quickly achieve complete observability for their projects. In addition, using the line protocol and Guance's scenario-building capabilities, users can report custom metrics to extend that observability further.
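
For reference, the line protocol mentioned above encodes a data point as a measurement name, tags, fields, and a timestamp; the names in this example are purely illustrative:

cpu,host=web-01,project=demo usage_total=12.3,usage_user=8.1 1700000000000000000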

“Guance” as a comprehensive observability technology product inherently has many technical challenges. Compared to various open-source solutions, Guance emphasizes reducing user learning costs and enhancing ease-of-use from the start. Therefore, from the installation and deployment of DataKit to all configurable capabilities, “Guance” strives to lower user configuration difficulty, aligning with the habits of most programmers and operations engineers, while improving the usability and professionalism of the UI, allowing users to quickly understand the product's value.

Platform Comparison

One of Logstash's original advantages was that, being written in JRuby, it could run on Windows.

Fluentd did not support Windows until recently, because it depended on an event library centered on Linux; it now supports Windows, and the in_windows_eventlog plugin can be used to track Windows event logs.

DataKit is the data collector officially provided by Guance. It ships with integrations for many data sources, supports multiple data inputs, and runs on Windows, Linux, and macOS, on both X86 and ARM, giving full-platform coverage for log collection.

| Collector | Supported Platforms |
| --- | --- |
| Logstash | Linux and Windows |
| Fluentd | Linux and Windows |
| DataKit | All platforms, with visual configuration management that significantly reduces installation, deployment, and configuration complexity |

Event Routing Comparison

For event routing configuration, Fluentd's method is more declarative, while Logstash's method is procedural. Developers trained in procedural programming may find Logstash's configuration easier to start with. Additionally, Fluentd's tag-based routing allows clear expression of complex routes. However, Guance can achieve event alerts and data browsing without relying on other products, ensuring data security and providing excellent user experience.

Logstash Event Routing

Logstash routes all data into one stream and uses if-then statements to send them to the expected destination. Here is an example of sending error events in production to PagerDuty:

output {
  if [loglevel] == "ERROR" and [deployment] == "production" {
    pagerduty {
      ...
    }
  }
}

Fluentd Event Routing

Fluentd relies on tags to route events: each event carries a tag that tells Fluentd where to send it. For example, the configuration below receives events over the forward input, uses a filter to add the hostname to every event tagged app.**, and then matches the same tag to route those events to a file output:

<source>
  @type forward
</source>

<filter app.**>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
  </record>
</filter>

<match app.**>
  @type file
  # ...
</match>

DataKit Proxy

As a core component of Guance, DataKit reports data directly to the Guance cloud platform for observation and analysis, so there is no need for the kind of event routing Logstash and Fluentd use to ship data to other tools for analysis and caching. To keep user data secure and to let DataKit report data from intranet hosts that have no direct Internet access, a DataKit instance can act as a proxy; enabling it only requires turning on the proxy collector. Simple configuration, rich product capabilities.

[[inputs.proxy]]
  ## default bind ip address
  bind = "0.0.0.0"
  ## default bind port
  port = 9530

Plugin Ecosystem Comparison

Logstash, Fluentd, and DataKit all have rich plugin ecosystems covering many inputs (files, TCP/UDP, and so on), filters (field splitting and filtering), and outputs.

Logstash Plugins

Logstash manages all of its plugins under its GitHub organization. There are more than 200 input, filter, and output plugins, many of them maintained by the community without official support.


Fluentd Plugins

Fluentd has eight plugin types (input, parser, filter, output, formatter, storage, service discovery, and buffer), totaling more than 500 plugins. Only about ten of them are officially maintained; the rest are community-maintained, without official support for the plugins or the technology stacks behind them.


DataKit Plugins

DataKit includes powerful built-in functionality: dynamic Grok query debugging, fast data querying with the proprietary DQL, real-time monitoring of collection inputs, edge computing capabilities, and visual client-side configuration and deployment of collection sources. It supports more than 200 officially maintained data-source inputs with corresponding technology-stack support, and is compatible with external inputs such as Telegraf, Beats, Logstash, and Fluentd. Users can visually manage plugins and agents and monitor data collection in real time.


Queue Comparison

Logstash lacks a persistent internal message queue: it currently has a fixed-size in-memory queue that holds 20 events and relies on an external queue such as Redis for persistence across restarts. Fluentd has a configurable buffering system that can be in-memory or on-disk, but configuring it for reliability is complex. DataKit has a built-in caching mechanism; a few simple parameter changes are enough to enable data caching.

Logstash Queue

Because there is no built-in persistent message queue, Logstash's internal queue model is simple, and an external Redis queue is required for persistence.

Fluentd Queue

Compared to Logstash, Fluentd has built-in reliability but is complex to configure, increasing user learning costs.

DataKit Queue

DataKit has built-in caching mechanisms. When DataKit fails to send data to DataWay due to network issues, it caches up to 1000 data points to prevent data loss. Cache limits can be controlled via DataKit configuration files. Configuration is simple and easy to use, with almost zero learning cost.

Log Parsing Comparison

Log analysis is a core technology widely used in enterprise security teams, IT development teams, and business teams. Security teams extract log data for discovering unknown security incidents, tracing known incidents, and complying with national regulations. IT development teams use log analysis to discover and analyze known issues, focusing on system monitoring and APM (APM includes all monitoring items relevant to development teams). Business teams use log analysis for risk control, operational promotion, user profiling, and website profiling. Thus, log analysis adds value to log information, indirectly reflecting a company's technical strength.

Common Logstash parsers include grok for parsing arbitrary text, mutate for transforming event fields, drop for discarding events, clone for duplicating events, and geoip for adding geographical information about IP addresses.

Fluentd log parsing typically involves filtering events by the values of one or more fields, enriching events by adding new fields, and protecting privacy and compliance by deleting or masking certain fields. However, it has fewer parsing plugins, essentially five: record_transformer, filter_stdout, filter_grep, parser, and filter_geoip.

DataKit log parsing is built around Pipelines, which split unstructured text or extract parts of structured text (such as JSON). It uses glob rules to specify log files conveniently, supports automatic file discovery and filtering, provides an interactive Grok matching tool that lowers the barrier to using Grok, and offers many script functions that make data transformation more flexible.
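
For example, here is a hedged sketch of a DataKit logging-collector configuration; the glob path, source name, and Pipeline script name are illustrative, and exact option names may vary between DataKit versions:

[[inputs.logging]]
  ## glob rules for the log files to tail
  logfiles = ["/var/log/nginx/*.log"]
  source = "nginx"
  ## Pipeline script used to split each log line into fields
  pipeline = "nginx.p"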

Logstash Log Parsing

Grok is currently the best way in Logstash to parse unstructured log data into structured, queryable content. Logstash ships with 120 built-in Grok patterns, but they are maintained by the community in a GitHub repository without official support, so tuning Grok performance and adapting the patterns to business needs is left to the user.


Fluentd Log Parsing

Fluentd's log parsing is similar to Logstash but more flexible in configuration. However, it lacks built-in Grok templates and only provides configuration examples, requiring users to configure parsing functions autonomously based on documentation examples. Technical support for issues encountered during configuration is limited, often requiring users to seek help independently.


For a sample server environment, take Nginx's access log as an example: each log line is roughly 365 bytes and is structured into 14 fields.


In the upcoming tests, the log will be repeatedly written to files under different pressures, with the time field of each log entry set to the current system time, and the other 13 fields remaining the same.

Log parsing in this simulated scenario is no different from a real one; the main difference is that the repeated content compresses better, which reduces the network write traffic.

Logstash

Using Logstash version 7.1.0, logs are parsed via Grok and written to Kafka (built-in plugin, with gzip compression enabled).

Log parsing configuration:

grok {
  patterns_dir => "/home/admin/workspace/survey/logstash/patterns"

  match => { "message" => "%{IPORHOST:ip} %{USERNAME:rt} - \"%{WORD:method} %{DATA:url}\" %{NUMBER:status} %{NUMBER:size} \"%{DATA:ref}\" \"%{DATA:agent}\" \"%{DATA:cookie_unb}\" \"%{DATA:cookie_cookie2}\" \"%{DATA:monitor_traceid}\" %{WORD:cell} %{WORD:ups} %{BASE10NUM:remote_port}" }

  remove_field => [ "message" ]
}
Test results:

| Write TPS | Write Traffic (KB/s) | CPU Usage (%) | Memory Usage (MB) |
| --- | --- | --- | --- |
| 500 | 178.89 | 25.3 | 432 |
| 1000 | 346.65 | 46.9 | 476 |
| 5000 | 1882.23 | 231.1 | 489 |
| 10000 | 3564.45 | 511.2 | 512 |

Fluentd

Using td-agent version 4.1.0, logs are parsed via regular expressions and written to Kafka (third-party plugin fluent-plugin-kafka, with gzip compression enabled).

Log parsing configuration:

<source>
  type tail
  format /^(?<ip>\S+)\s(?<rt>\d+)\s-\s\[(?<time>[^\]]*)\]\s"(?<url>[^\"]+)"\s(?<status>\d+)\s(?<size>\d+)\s"(?<ref>[^\"]+)"\s"(?<agent>[^\"]+)"\s"(?<cookie_unb>\d+)"\s"(?<cookie_cookie2>\w+)"\s"(?<monitor_traceid>\w+)"\s(?<cell>\w+)\s(?<ups>\w+)\s(?<remote_port>\d+).*$/
  time_format %d/%b/%Y:%H:%M:%S %z
  path /home/admin/workspace/temp/mock_log/access.log
  pos_file /home/admin/workspace/temp/mock_log/nginx_access.pos
  tag nginx.access
</source>
Test results:

| Write TPS | Write Traffic (KB/s) | CPU Usage (%) | Memory Usage (MB) |
| --- | --- | --- | --- |
| 500 | 174.272 | 13.8 | 58 |
| 1000 | 336.85 | 24.4 | 61 |
| 5000 | 1771.43 | 95.3 | 103 |
| 10000 | 3522.45 | 140.2 | 140 |

DataKit

Using DataKit 1.1.8-rc3, logs are split with a Pipeline script.

# access log
grok(_, "%{NOTSPACE:ip} %{NOTSPACE:rt} - \"%{NOTSPACE:method} %{NOTSPACE:url}\" %{NOTSPACE:status} %{NOTSPACE:size} \"%{NOTSPACE:ref}\" \"%{NOTSPACE:agent}\" \"%{NOTSPACE:cookie_unb}\" \"%{NOTSPACE:cookie_cookie2}\" \"%{NOTSPACE:monitor_traceid}\" %{NOTSPACE:cell} %{NOTSPACE:ups} %{NOTSPACE:remote_port}")

cast(status, "int")
cast(size, "int")

default_time(time)
Test results:

| Write TPS | Write Traffic (KB/s) | CPU Usage (%) | Memory Usage (MB) |
| --- | --- | --- | --- |
| 500 | 178.24 | 8.5 | 41 |
| 1000 | 356.45 | 13.8 | 45 |
| 5000 | 1782.23 | 71.1 | 76 |
| 10000 | 3522.45 | 101.2 | 88 |

Log Collection Architecture Comparison

ELK Solution

Solution One


This is the simplest ELK architecture. Its advantage is simplicity and ease of setup. Disadvantages include high resource consumption by Logstash, leading to high CPU and memory usage, and no message queue cache, posing a risk of data loss. Users must be proficient in Logstash, Elasticsearch, and Kibana to solve complex business problems and maintain the performance and resource management of the Logstash and Elasticsearch clusters.

In this architecture, Logstash distributed on various nodes collects related logs and data, analyzes and filters them, and sends them to a remote Elasticsearch server for storage. Elasticsearch compresses and stores the data in shards and provides APIs for user queries and operations. Users can configure Kibana Web to intuitively query logs and generate reports.

Solution Two


This is a relatively mature ELK architecture. Advantages include introducing Kafka to store data temporarily if the remote Logstash cluster stops due to faults, preventing data loss. Disadvantages include complex setup and a complex tech stack, making it harder to learn quickly. Additionally, maintaining a Kafka cluster (and possibly a Zookeeper cluster for large-scale scenarios) is required. Users must be proficient in Logstash, Elasticsearch, Kafka, and Kibana to solve complex business problems and maintain the performance and resource management of the Logstash, Kafka, and Elasticsearch clusters.

This architecture introduces a messaging queue. Logstash Agents on various nodes pass data/logs to Kafka (or Redis) and then indirectly to Logstash. After filtering and analyzing, Logstash passes the data to Elasticsearch for storage. Finally, Kibana presents the logs and data to users. Introducing Kafka (or Redis) prevents data loss if the remote Logstash server stops running.

EFK Solution

Solution One


This is a more flexible EFK architecture. Advantages include greater flexibility, less resource consumption compared to Logstash, and stronger scalability. Disadvantages include requiring a large Logstash cluster for log processing and needing proficiency in Logstash, Elasticsearch, and Kibana to solve complex business problems and maintain the performance and resource management of the Logstash and Elasticsearch clusters.

This architecture replaces the Logstash collection agents with Filebeat, and the Logstash and Elasticsearch clusters can be sized for monitoring and querying the operation logs of large-scale systems.

Solution Two


Building on ELK, this architecture uses Filebeat for log collection. One advantage is that a Java environment is no longer required on every server, since Logstash runs on the JVM while Filebeat does not. Disadvantages include a complex technology stack that is hard to learn quickly, and the need to maintain a Kafka cluster (and possibly a ZooKeeper cluster for large-scale scenarios). Users must be proficient in Filebeat, Logstash, Elasticsearch, Kafka, and Kibana to solve complex business problems and to manage the performance and resources of the Logstash, Kafka, and Elasticsearch clusters.

In this architecture, when the collection end gathers log files, in Filebeat's input, we define a field called log_topic to categorize log files from specified paths. In Output, we specify output to Kafka. Kafka acts as a message queue, receiving all logs collected by Filebeat clients and forwarding them by type (e.g., nginx, php, system). In Kafka, we create different topics based on the custom log type defined in the input. Logstash receives messages from Kafka, classifies logs based on different topics, and writes them to Elasticsearch. Kibana matches indexes in Elasticsearch to analyze, search, and visualize log content (users need to design visualizations themselves).
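
A hedged sketch of the Filebeat side of this setup, assuming the custom log_topic field described above (the paths and broker addresses are illustrative):

filebeat.inputs:
  - type: log
    paths:
      - /var/log/nginx/*.log
    fields:
      log_topic: nginx

output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092"]
  # route each event to the Kafka topic named by its log_topic field
  topic: '%{[fields.log_topic]}'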

Solution Three


Using Fluentd for log collection. Advantages include Fluentd consuming far fewer resources than Logstash, simplifying the architecture. Disadvantages include complex configuration, higher learning curve, and complicated configuration files. Users must be proficient in Fluentd, Elasticsearch, and Kibana to solve complex business problems and maintain the performance and resource management of the Elasticsearch cluster.

In this architecture, Fluentd collects program logs, stores them in the Elasticsearch cluster, and finally associates them with Elasticsearch in Kibana for log querying.

Guance Architecture


DataKit serves as the basic data collection tool, collecting various metrics and logs from running systems and aggregating them through DataWay into Guance, where users can view and analyze them. DataKit is a crucial data collection component of Guance; all data in Guance originates from DataKit.

DataKit deployment and configuration are simple and intuitive, and a visual client helps users manage DataKit. DataKit collects not only log data but also APM data, infrastructure, container, middleware, and network performance data, among others. Unlike Logstash and Fluentd, DataKit does not need components such as Elasticsearch or Kafka to round out its functionality; Guance takes care of those concerns so users can focus on their business. DataKit does not require mastering a complex technology stack: the learning cost is low, and complex business problems can be addressed with simple configuration. ELK and EFK, by contrast, carry significant overall maintenance costs, especially for the Elasticsearch cluster, and designing hot/cold data tiers to control those costs is itself challenging. With Guance, users are freed from these concerns and can concentrate on their business.

Hardware Cost Comparison

Price is a factor everyone cares about. We compare the costs between ELK, EFK, and Guance using cloud services.

ELK Costs

Elastic's basic components are open-source, with main costs coming from hardware. We calculate costs for collecting logs from 10 servers, with each server generating 1GB of logs daily.

LogStash Cluster + Kafka Cluster + Elasticsearch Cluster + Kibana

  • LogStash Cluster

| Billing Item | Value | Unit Price | Cost (CNY) |
| --- | --- | --- | --- |
| Server | 1 × 2-core 4GB | 216.7 CNY/month | 216.7 |
| Storage | 50GB | ESSD: 0.5 CNY/GB | 25 |
| Total | | | 241.7 |

  • Kafka Cluster

| Billing Item | Value | Unit Price | Cost (CNY) |
| --- | --- | --- | --- |
| Server | 3 × 4-core 16GB | 788 CNY/month | 2364 |
| Storage | 200GB | ESSD: 0.5 CNY/GB | 300 |
| Total | | | 2664 |

  • Elasticsearch Cluster

| Billing Item | Value | Unit Price | Cost (CNY) |
| --- | --- | --- | --- |
| Server | 3 × 2-core 8GB | 383 CNY/month | 1149 |
| Storage | 500GB | ESSD: 0.5 CNY/GB | 750 |
| Total | | | 1899 |

  • Kibana Node

| Billing Item | Value | Unit Price | Cost (CNY) |
| --- | --- | --- | --- |
| Server | 1 × 1-core 2GB | 104 CNY/month | 104 |
| Storage | 50GB | ESSD: 0.5 CNY/GB | 25 |
| Total | | | 129 |

Total monthly cost for LogStash + Kafka + Elasticsearch + Kibana is 5175.4 CNY.

Total monthly cost for a simple architecture without Kafka is 2511.4 CNY.

EFK Costs

Elastic's basic components are open-source, with main costs coming from hardware. We calculate costs for collecting logs from 10 servers, with each server generating 1GB of logs daily.

Fluentd + Elasticsearch Cluster + Kibana

  • Fluentd

Fluentd runs on the existing servers and does not need a separate cluster, so its cost is not counted here; only Elasticsearch + Kibana are calculated.

  • Elasticsearch Cluster

| Billing Item | Value | Unit Price | Cost (CNY) |
| --- | --- | --- | --- |
| Server | 3 × 2-core 8GB | 383 CNY/month | 1149 |
| Storage | 500GB | ESSD: 0.5 CNY/GB | 750 |
| Total | | | 1899 |

  • Kibana Node

| Billing Item | Value | Unit Price | Cost (CNY) |
| --- | --- | --- | --- |
| Server | 1 × 1-core 2GB | 104 CNY/month | 104 |
| Storage | 50GB | ESSD: 0.5 CNY/GB | 25 |
| Total | | | 129 |

Total monthly cost for Fluentd + Elasticsearch + Kibana is 2028 CNY.

Guance Costs

Guance does not charge for the product itself; billing is based on usage: the number of DataKit collectors, log data volume, backup log data volume, daily task scheduling runs, daily sessions for user-access monitoring, trace counts for application performance monitoring, and so on. As before, we calculate the cost of collecting logs from 10 servers, each generating 1GB of logs per day.

| Billing Item / Plan | Free Plan | Agile Plan |
| --- | --- | --- |
| DataKit Quantity | Unlimited | 5 CNY/day |
| Time Series | Total time series < 500 | If a single DataKit's time series ≤ 500, DataKit fee = DataKit quantity × base price; if a single DataKit's time series > 500, DataKit quantity = total workspace time series / 500 (rounded up), and DataKit fee = DataKit quantity × base price |
| Log Data Volume | 2 million entries | 0.5 CNY/day (per 1 million entries) |
| Backup Log Data Volume | None | 0.2 CNY/day (per 1 million entries) |
| Trace Count | 10,000 | 1 CNY/day (per 1 million entries) |
| Session Count / PV Count | 100 sessions | 1 CNY/day (per 100 sessions or per 1,000 PVs, whichever is lower) |
| Cloud Dial Testing API Task Times | 5 | 1 CNY/day (per 1,000 times) |
| Cloud Dial Testing Browser Task Times | | 15 CNY/day (per 1,000 times) |
| Task Scheduling Times | 5,000 times | 1 CNY/day (per 10,000 times) |
| SMS Sending Times | None | 0.1 CNY/day (per message) |

Installing DataKit on 10 servers, collecting logs at 1GB per server per day, assuming 4KB per log entry.

| Billing Item | Value | Unit Price | Cost (CNY) |
| --- | --- | --- | --- |
| Servers | 10 DataKits | 150 CNY/month each | 1500 |
| Storage | 1GB of logs per server per day | 0.5 CNY/day (per 1 million entries) | 325 |
| Total | | | 1825 |

Similarly, the total monthly cost for Guance is 1825 CNY.

Maintenance Cost Comparison

Maintenance costs matter as well: a self-hosted cluster has to be kept healthy and intact. Let's compare the maintenance burden of the different solutions.

ELK Maintenance Costs

Since Elastic's components are open-source, users have to build and manage their own clusters. For a full ELK architecture (LogStash Cluster + Kafka Cluster + Elasticsearch Cluster + Kibana Nodes), large log volumes and complex processing logic put high demands on the scale and configuration of the Logstash, Kafka, and Elasticsearch clusters. The skill requirements for maintenance personnel also grow with cluster size: large clusters run into all kinds of issues, and performance tuning takes real experience, so maintainers need a solid grasp of the whole technology stack.

EFK Maintenance Costs

Similar to ELK, EFK is based on open-source components, requiring users to build and manage their own clusters. The advantage of EFK is that using Fluentd as the collector reduces resource consumption compared to Logstash. However, it still requires dynamic scaling of the Elasticsearch cluster based on business volume, facing similar challenges in large-scale cluster management. Fluentd's configuration is more complex than Logstash, relying heavily on the experience of maintenance personnel. Poor script optimization can impact existing services on the server, necessitating higher expertise to ensure stable online operations.

Guance Maintenance Costs

Guance is a SaaS-based observability platform. Users only need to deploy DataKit on servers where data collection is required, enabling remote visual management for configuration. Guance provides optimal log parsing templates to help users achieve maximum performance with minimal server load through simple configurations. This allows users to focus on business optimization and expansion without worrying about optimizing the collection end or log parsing clusters. Additionally, Guance offers comprehensive observability from infrastructure, containers, middleware, databases, message queues, application chains, frontend visits, system security, and network performance, allowing users to implement their own observation scenarios based on business needs. This eliminates the need to research or modify immature open-source products, achieving true zero-maintenance costs and focusing entirely on business development.

Learning Cost Comparison

For setting up or using a log analysis system, learning costs are an essential part. To use these systems effectively, one must first understand them thoroughly. Let's compare the learning difficulty among different solutions.

ELK Learning Costs

Since Elastic components are open-source, users need to set up their own clusters. For using ELK to analyze logs, users must learn about environment preparation and component configuration of ELK. This includes gaining knowledge of Elasticsearch, mastering basic Elasticsearch operations, index management, cluster planning, and performance optimization for open-source versions of Logstash and Elasticsearch.

EFK Learning Costs

EFK requires learning much the same material as ELK. Fluentd's configuration is more flexible than Logstash's, which raises the learning curve, and performance tuning depends more on user experience. Compared with Logstash's 120+ Grok patterns, Fluentd requires more reading of documentation before it can be used effectively.

Guance Learning Costs

For Guance, users only need to deploy DataKit in the environment for data collection. Officially provided configuration references and usage guides cover over 200 supported technology stacks. Users can perform log analysis, business observability, or trace tracking by learning the relevant modules of Guance and configuring DataKit items. This avoids the need to learn numerous technologies to ensure the operation of an open-source cluster, allowing users to focus more on business issues rather than spending time ensuring the cluster runs smoothly.

User Experience Comparison

Comparing user experiences is important. How do different solutions differ in terms of user perception when implementing the same functionality?

ELK User Experience

To use ELK for log analysis, users first need to set up Logstash, Kafka, Elasticsearch clusters, and Kibana display nodes. To collect and parse specific component data, users must check if there are usable templates among the existing 120+ parsing templates. If not, they may need extensive testing to achieve data collection and parsing. Performance issues during collection or slow parsing can complicate debugging. Increased log volumes can lead to frequent Elasticsearch query index optimizations and cluster expansions. Finally, displaying specific business metrics requires learning Kibana's KQL for data queries and displays. Implementing real-time tracking and alerts may require additional open-source components. Throughout the process, 80% of the time is spent resolving various issues with open-source components, leaving only a small portion for actual business analysis and optimization.

EFK User Experience

Using EFK for log analysis similarly involves setting up Elasticsearch clusters and Kibana display nodes. Configuring Fluentd might require more time due to its flexible configuration, making successful data collection and parsing more challenging. Using Elasticsearch clusters still requires index optimization and cluster expansion. Finally, users must learn Kibana for data queries and displays. Similarly, 80% of the time is spent resolving various issues with open-source components, leaving only a small portion for actual business analysis and optimization.

Guance User Experience

Using Guance for log analysis is much friendlier. First, installing and configuring DataKit is straightforward, requiring just one command. Second, Guance supports over 200 mainstream technology stacks for collecting component logs or data, providing comprehensive support from infrastructure, containers, middleware, databases, message queues, application chains, frontend visits, system security, and network performance. It also offers a complete documentation system, addressing all user needs through official documentation. The visual client for DataKit helps reduce usage difficulties. Additionally, Guance provides many official scenario views to better observe business health status.

Guance builds observability across infrastructure, containers, middleware, databases, message queues, application chains, frontend visits, system security, and network performance. Based on Guance's standard product, after correctly configuring DataKit, users can quickly achieve complete observability for their projects. Additionally, based on line protocol (Line Protocol) and Guance's scenario-building capabilities, users can customize required observable metrics for further observability.

Simple DataKit Installation

DataKit installation can be completed with just one command.
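
For reference, the Linux/macOS install is typically a single command of roughly the following form; the installer URL and the DK_DATAWAY value should be taken from your workspace's integration page, and the token below is a placeholder:

DK_DATAWAY="https://openway.guance.com?token=tkn_xxxxxxxxxxxxxxxx" \
  bash -c "$(curl -L https://static.guance.com/datakit/install.sh)"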


Convenient Collection Item Management

After enabling DataKit client access, users can modify collection items directly in the DataKit client. Built-in templates allow users to enable corresponding configurations based on desired data collection.


Rich Official Component Support

DataKit includes powerful built-in features such as dynamic Grok syntax debugging, fast data querying with the proprietary DQL, real-time monitoring of collection inputs, edge computing capabilities, and visual client-side configuration and deployment of collection sources. It supports more than 200 officially maintained data-source inputs with corresponding technology-stack support, and is compatible with external inputs such as Telegraf, Beats, Logstash, and Fluentd.


Stronger Product Capabilities

Guance builds observability across infrastructure, containers, middleware, databases, message queues, application chains, frontend visits, logs, system security, and network performance. Based on Guance's standard product, after correctly configuring DataKit, users can quickly achieve complete observability for their projects. Additionally, it supports multi-technology-stack anomaly detection libraries, providing more options for handling complex business issues.



In summary, Guance offers significant advantages over traditional ELK and EFK solutions in terms of cost, ease of use, scalability, and overall user experience. By leveraging Guance, users can achieve efficient and comprehensive observability with minimal effort, allowing them to focus on optimizing and growing their business.
