Log Engine Capacity Planning¶
Before using Elasticsearch/OpenSearch as the log/tracing storage backend for Guance, evaluate the required storage capacity and cluster specifications. This article provides an estimation method based on community references and actual test results.
Warning
- Due to the change in Elasticsearch's open-source license, OpenSearch is recommended as the first choice;
- It is recommended to first measure the rate at which monitoring data is generated using a small Elasticsearch/OpenSearch deployment, and then plan the cluster based on the test results.
Data Compression¶
The following are the compression test results for Elasticsearch using the official esrally dataset nyc_taxis, which is 74GB in size.
Codec Configuration | Compression Algorithm | Compressed Size/GB | Compression Ratio |
---|---|---|---|
default | LZ4 | 35.5 | 0.48 |
best_compression | DEFLATE | 26.4 | 0.36 |
The following are the compression test results for OpenSearch using the eventdata dataset provided by opensearch-benchmark, which is 15GB in size.
Codec Configuration | Compression Algorithm | Compressed Size/GB | Compression Ratio |
---|---|---|---|
default | LZ4 | 5.1 | 0.34 |
best_compression | DEFLATE | 3.7 | 0.25 |
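The compression behavior tested above is controlled by the index-level `index.codec` setting, which applies to both Elasticsearch and OpenSearch and can only be set when an index is created (or while the index is closed). Below is a minimal sketch of creating an index with `best_compression`; the endpoint URL and index name are placeholder assumptions, not values from this article.

```python
import requests

# Placeholder endpoint and index name; adjust to your own cluster.
ES_URL = "http://localhost:9200"
INDEX = "codec-compression-test"

# Create the index with the DEFLATE-based "best_compression" codec.
# Omitting "index.codec" keeps the default LZ4 codec.
resp = requests.put(
    f"{ES_URL}/{INDEX}",
    json={"settings": {"index": {"codec": "best_compression"}}},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```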
Disk Capacity Estimation¶
Assume the following conditions:
- Daily generation of **1 TB** of monitoring source data
- Compression ratio of **0.5**
- **1** replica per index
- Monitoring data retained for the most recent **7** days
Here, we refer to the Tencent Cloud Elasticsearch storage calculation formula.
Parameter | Reference Value |
---|---|
Data Expansion/Index Overhead | 10% |
Internal Task Overhead | 20% |
Operating System Reserved | 5% |
Safety Reserve | 15% |
According to Guance's data retention policy, we increase the safety reserve to 30%. The original formula is as follows.
Actual Space = Source Data × Compression Ratio × (1 + Number of Replicas) × (1 + Data Expansion) ÷ (1 − Internal Task Overhead) ÷ (1 − Operating System Reserved) × (1 + Safety Reserve)
Substituting the reference values, the required space per day is approximately:

1 TB × 0.5 × (1 + 1) × (1 + 10%) ÷ (1 − 20%) ÷ (1 − 5%) × (1 + 30%) ≈ 1.88 TB

For retaining the most recent 7 days of data, the possible peak value would be approximately 1.88 TB × 7 ≈ 13.2 TB.
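For a quick sanity check, the same calculation can be scripted. The sketch below simply restates the formula with the reference values above; substitute your own measured data volume and compression ratio.

```python
# Reference values from the tables above; replace with measured values.
source_per_day_tb = 1.0    # daily source data, TB
compression_ratio = 0.5    # assumed compression ratio
replicas = 1               # replicas per index
retention_days = 7         # data retention, days

data_expansion = 0.10      # data expansion / index overhead
internal_task = 0.20       # internal task overhead
os_reserved = 0.05         # operating system reserved
safety_reserve = 0.30      # safety reserve (raised from 15% to 30%)

daily_tb = (
    source_per_day_tb
    * compression_ratio
    * (1 + replicas)
    * (1 + data_expansion)
    / (1 - internal_task)
    / (1 - os_reserved)
    * (1 + safety_reserve)
)
total_tb = daily_tb * retention_days
print(f"Per day: {daily_tb:.2f} TB; {retention_days} days: {total_tb:.1f} TB")
# Per day: 1.88 TB; 7 days: 13.2 TB
```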
Cluster Planning¶
Data Nodes¶
Below is the recommended Elasticsearch cluster planning; the higher the specification of a single node, the larger the cluster it can support.
Single Node Specification | Maximum Number of Cluster Nodes | Maximum Disk Capacity per Node |
---|---|---|
2C4GB | 10 | 200GB |
2C8GB | 10 | 400GB |
4C16GB | 20 | 800GB |
8C32GB | 40 | 1.5TB |
16C64GB | 80 | 3TB |
If using **4C16GB** nodes, based on the above capacity estimation (about 13.2 TB), the recommended number of data nodes in the cluster is 13.2 TB ÷ 800 GB ≈ 17.
If using **8C32GB** nodes, based on the above capacity estimation, the recommended number of data nodes in the cluster is 13.2 TB ÷ 1.5 TB ≈ 9.
If using **16C64GB** nodes, based on the above capacity estimation, the recommended number of data nodes in the cluster is 13.2 TB ÷ 3 TB ≈ 5.
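The node counts above are obtained by dividing the estimated total capacity by the maximum disk capacity per node and rounding up. A minimal sketch, using the roughly 13.2 TB estimate from the previous section:

```python
import math

total_tb = 13.2  # estimated capacity for 7 days of data (see above)

# Maximum disk capacity per node for each specification, in TB.
node_disk_tb = {"4C16GB": 0.8, "8C32GB": 1.5, "16C64GB": 3.0}

for spec, disk_tb in node_disk_tb.items():
    nodes = math.ceil(total_tb / disk_tb)
    print(f"{spec}: {nodes} data nodes")
# 4C16GB: 17 data nodes
# 8C32GB: 9 data nodes
# 16C64GB: 5 data nodes
```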
Coordinating Nodes¶
It is recommended to add independent coordinating nodes at a ratio of 1 coordinating node per 5 data nodes (starting with at least 2), with a CPU:memory ratio of 1:4 or 1:8. For example, for 10 8C32GB data nodes, configure 2 independent 8C32GB coordinating nodes. Master (management) nodes should be deployed in an odd number (2n+1, typically 3) for high availability to prevent cluster split-brain.
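The 1:5 ratio above can be expressed as a simple rule of thumb. The sketch below only restates that recommendation (at least 2 coordinating nodes, 1 for every 5 data nodes); it is not an Elasticsearch/OpenSearch requirement.

```python
import math

def coordinating_nodes(data_nodes: int) -> int:
    # 1 coordinating node per 5 data nodes, with a minimum of 2.
    return max(2, math.ceil(data_nodes / 5))

for n in (9, 10, 17):
    print(f"{n} data nodes -> {coordinating_nodes(n)} coordinating nodes")
# 9 data nodes -> 2 coordinating nodes
# 10 data nodes -> 2 coordinating nodes
# 17 data nodes -> 4 coordinating nodes
```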