DataKit Log Collector Performance
Environment and Tools¶
- Operating system: Ubuntu 20.04.2 LTS
- CPU: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
- Memory: 16GB Speed 2133 MT/s
- DataKit: 1.1.8-rc1
- Log text (nginx access log):
172.17.0.1 - - [06/Jan/2017:16:16:37 +0000] "GET /datadoghq/company?test=var1%20Pl HTTP/1.1" 401 612 "http://www.perdu.com/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36" "-"
- Number of log lines: 100,000
- Pipeline: see the appendices below
Test Results¶
Test Condition | Time Consumed
---|---
No Pipeline; plain log-text processing only (encoding, multi-line handling, field detection, etc.) | 3.63 seconds
Full multi-match Pipeline (see Appendix I) | 43.70 seconds
Single-match Pipeline covering only the access log, rather than matching multiple formats such as the nginx error log (see Appendix II) | 16.91 seconds
Single-match Pipeline with the performance-intensive patterns replaced by optimized ones (see Appendix III) | 4.40 seconds
Note:
- While the Pipeline is running, a single CPU core is fully loaded, with utilization holding at about 100%; CPU usage falls back once all 100,000 log lines have been processed.
- Memory consumption remained stable throughout the test, with no significant increase in usage.
- The timings are computed by the DataKit program itself and may vary in different environments.
Comparison¶
When Fluentd collected the same 100,000-line log, CPU utilization rose from 43% to 77% within about 3 seconds and then dropped, which presumably marks the end of processing.
Because Fluentd caches metadata and outputs results in batches, the exact processing time cannot be measured.
Fluentd's Pipeline matching is single-pattern only; it provides no Pipeline that handles multiple formats from the same data source (for example, its nginx support covers the access log but not the error log).
Conclusion¶
With a single-match Pipeline, DataKit's log collection and processing time differs from Fluentd's by about 30%.
With the full multi-match Pipeline, however, the time consumed increases dramatically.
Appendix¶
I. Full/multi-match Pipeline¶
add_pattern("date2", "%{YEAR}[./]%{MONTHNUM}[./]%{MONTHDAY} %{TIME}")
# access log (basic fields)
grok(_, "%{IPORHOST:client_ip} %{NOTSPACE:http_ident} %{NOTSPACE:http_auth} \\[%{HTTPDATE:time}\\] \"%{DATA:http_method} %{GREEDYDATA:http_url} HTTP/%{NUMBER:http_version}\" %{INT:status_code} %{INT:bytes}")
# access log (with referrer and user agent)
add_pattern("access_common", "%{IPORHOST:client_ip} %{NOTSPACE:http_ident} %{NOTSPACE:http_auth} \\[%{HTTPDATE:time}\\] \"%{DATA:http_method} %{GREEDYDATA:http_url} HTTP/%{NUMBER:http_version}\" %{INT:status_code} %{INT:bytes}")
grok(_, '%{access_common} "%{NOTSPACE:referrer}" "%{GREEDYDATA:agent}"')
user_agent(agent)
# error log
grok(_, "%{date2:time} \\[%{LOGLEVEL:status}\\] %{GREEDYDATA:msg}, client: %{IPORHOST:client_ip}, server: %{IPORHOST:server}, request: \"%{DATA:http_method} %{GREEDYDATA:http_url} HTTP/%{NUMBER:http_version}\", (upstream: \"%{GREEDYDATA:upstream}\", )?host: \"%{IPORHOST:ip_or_host}\"")
grok(_, "%{date2:time} \\[%{LOGLEVEL:status}\\] %{GREEDYDATA:msg}, client: %{IPORHOST:client_ip}, server: %{IPORHOST:server}, request: \"%{GREEDYDATA:http_method} %{GREEDYDATA:http_url} HTTP/%{NUMBER:http_version}\", host: \"%{IPORHOST:ip_or_host}\"")
grok(_,"%{date2:time} \\[%{LOGLEVEL:status}\\] %{GREEDYDATA:msg}")
group_in(status, ["warn", "notice"], "warning")
group_in(status, ["error", "crit", "alert", "emerg"], "error")
cast(status_code, "int")
cast(bytes, "int")
group_between(status_code, [200,299], "OK", status)
group_between(status_code, [300,399], "notice", status)
group_between(status_code, [400,499], "warning", status)
group_between(status_code, [500,599], "error", status)
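# clear placeholder "-" / empty values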
nullif(http_ident, "-")
nullif(http_auth, "-")
nullif(upstream, "")
default_time(time)
II. Single-match Pipeline¶
# access log
grok(_, "%{IPORHOST:client_ip} %{NOTSPACE:http_ident} %{NOTSPACE:http_auth} \\[%{HTTPDATE:time}\\] \"%{DATA:http_method} %{GREEDYDATA:http_url} HTTP/%{NUMBER:http_version}\" %{INT:status_code} %{INT:bytes}")
cast(status_code, "int")
cast(bytes, "int")
default_time(time)
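III. Optimized single-match Pipeline (illustrative sketch)¶
A minimal sketch of the kind of optimization referred to in Appendix III, assuming the costly grok patterns are swapped for cheaper, more specific ones: IPORHOST tries both a hostname and an IP regex, and DATA/GREEDYDATA can backtrack heavily, whereas NOTSPACE matches in a single pass. The exact script used in the test may differ.
# access log, with cheaper patterns substituted for IPORHOST/DATA/GREEDYDATA (assumed optimization)
grok(_, "%{NOTSPACE:client_ip} %{NOTSPACE:http_ident} %{NOTSPACE:http_auth} \\[%{HTTPDATE:time}\\] \"%{NOTSPACE:http_method} %{NOTSPACE:http_url} HTTP/%{NUMBER:http_version}\" %{INT:status_code} %{INT:bytes}")
cast(status_code, "int")
cast(bytes, "int")
default_time(time)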