Skip to content

DataKit Performance Test of Log Collector


Environment and Tools

  • Operating system: Ubuntu 20.04.2 LTS
  • CPU:Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  • Memory: 16GB Speed 2133 MT/s
  • DataKit:1.1.8-rc1
  • Log text (nginx access log):
172.17.0.1 - - [06/Jan/2017:16:16:37 +0000] "GET /datadoghq/company?test=var1%20Pl HTTP/1.1" 401 612 "http://www.perdu.com/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36" "-"
  • Number of logs: 10w lines
  • Using Pipeline: See below

Test Result

Test Conditions Time Consuming
No Pipeline, pure log text processing (including encoding, multi-line, field detection and so on) 3.63 seconds
Use the full Pipeline (see Appendix I) 43.70 seconds
Use a single matching Pipeline instead of multiple matching formats, such as nginx error log, as compared to the full version, for access log only (see Appendix II) 16.91 seconds
Replace the performance-intensive pattern with an optimized single matching Pipeline (see Appendix III) 4.40 seconds

Note:

Pipeline time-consuming period, the CPU single core runs at full load, the utilization rate continues at about 100%, and the CPU falls back when the 10w log processing is finished

Memory consumption was stable during the test, with no significant increase in usage

Time-consuming computation for DataKit programs, which may be biased in different environments

Comparison

Using Fluentd to capture the same 10w row log, CPU utilization increased from 43% to 77% in 3 seconds and then dropped, which can be predicted to be the end of processing.

As Fluentd has a metadata caching mechanism, which outputs results in batches, it is impossible to calculate exactly how much time it takes.

Fluentd's Pipeline matching pattern is single, and there is no Pipeline with multiple formats of the same data source (for example, nginx only supports access log but not error log).

Conclusion

In Pipeline single matching mode, DataKit log collection processing time is about 30% different from Fluentd.

However, if you use the full version of the full matching Pipeline, the time consumption increases dramatically.

Appendix

1, full version/full match Pipeline

add_pattern("date2", "%{YEAR}[./]%{MONTHNUM}[./]%{MONTHDAY} %{TIME}")

# access log
grok(_, "%{IPORHOST:client_ip} %{NOTSPACE:http_ident} %{NOTSPACE:http_auth} \\[%{HTTPDATE:time}\\] \"%{DATA:http_method} %{GREEDYDATA:http_url} HTTP/%{NUMBER:http_version}\" %{INT:status_code} %{INT:bytes}")

# access log
add_pattern("access_common", "%{IPORHOST:client_ip} %{NOTSPACE:http_ident} %{NOTSPACE:http_auth} \\[%{HTTPDATE:time}\\] \"%{DATA:http_method} %{GREEDYDATA:http_url} HTTP/%{NUMBER:http_version}\" %{INT:status_code} %{INT:bytes}")
grok(_, '%{access_common} "%{NOTSPACE:referrer}" "%{GREEDYDATA:agent}')
user_agent(agent)

# error log
grok(_, "%{date2:time} \\[%{LOGLEVEL:status}\\] %{GREEDYDATA:msg}, client: %{IPORHOST:client_ip}, server: %{IPORHOST:server}, request: \"%{DATA:http_method} %{GREEDYDATA:http_url} HTTP/%{NUMBER:http_version}\", (upstream: \"%{GREEDYDATA:upstream}\", )?host: \"%{IPORHOST:ip_or_host}\"")
grok(_, "%{date2:time} \\[%{LOGLEVEL:status}\\] %{GREEDYDATA:msg}, client: %{IPORHOST:client_ip}, server: %{IPORHOST:server}, request: \"%{GREEDYDATA:http_method} %{GREEDYDATA:http_url} HTTP/%{NUMBER:http_version}\", host: \"%{IPORHOST:ip_or_host}\"")
grok(_,"%{date2:time} \\[%{LOGLEVEL:status}\\] %{GREEDYDATA:msg}")

group_in(status, ["warn", "notice"], "warning")
group_in(status, ["error", "crit", "alert", "emerg"], "error")

cast(status_code, "int")
cast(bytes, "int")

group_between(status_code, [200,299], "OK", status)
group_between(status_code, [300,399], "notice", status)
group_between(status_code, [400,499], "warning", status)
group_between(status_code, [500,599], "error", status)


nullif(http_ident, "-")
nullif(http_auth, "-")
nullif(upstream, "")
default_time(time)

2, a single matching Pipeline

# access log
grok(_, "%{IPORHOST:client_ip} %{NOTSPACE:http_ident} %{NOTSPACE:http_auth} \\[%{HTTPDATE:time}\\] \"%{DATA:http_method} %{GREEDYDATA:http_url} HTTP/%{NUMBER:http_version}\" %{INT:status_code} %{INT:bytes}")

cast(status_code, "int")
cast(bytes, "int")

default_time(time)

3, the optimized single matching Pipeline

# access log
grok(_, "%{NOTSPACE:client_ip} %{NOTSPACE:http_ident} %{NOTSPACE:http_auth} \\[%{HTTPDATE:time}\\] \"%{DATA:http_method} %{GREEDYDATA:http_url} HTTP/%{NUMBER:http_version}\" %{INT:status_code} %{INT:bytes}")

cast(status_code, "int")
cast(bytes, "int")

default_time(time)

Feedback

Is this page helpful? ×