Performance Benchmarks and Optimizations


Test Environment

  • OS: Ubuntu 22.04.1 LTS
  • CPU: 12th Gen Intel(R) Core(TM) i7-12700H
  • Memory: 40GB DDR5-4800
  • DataKit: v1.9.1

JSON Benchmark

Test data:

{
    "tcpSeq": "71234234923",
    "language": "C",
    "channel": "26",
    "check_bit": "dSa-aoHjw7b42-dcCE2Sc-aULcaeZav",
    "cid": "2139-02102-213-122341-1190",
    "address": "application/a.out_func_a_64475d386d3:0xfe21023",
    "time": 1681508212100,
    "sub_source": "N",
    "id": 508,
    "status": "ok",
    "cost": 101020 
}

Extract the fields with the json() function and with the load_json() function:

  • load_json
data = load_json(_)
add_key(tcpSeq, data["tcpSeq"])
add_key(language, data["language"])
add_key(channel, data["channel"])
add_key(check_bit, data["check_bit"])
add_key(cid, data["cid"])
add_key(address, data["address"])
add_key(time, data["time"])
add_key(sub_source, data["sub_source"])
add_key(id, data["id"])
add_key(status, data["status"])
add_key(cost, data["cost"])
  • json
json(_, tcpSeq, tcpSeq)
json(_, language, language)
json(_, channel, channel)
json(_, check_bit, check_bit)
json(_, cid, cid)
json(_, address, address)
json(_, time, time)
json(_, sub_source, sub_source)
json(_, id, id)
json(_, status, status)
json(_, cost, cost)

Benchmark results:

BenchmarkScript/load_json()-20            202762          5674 ns/op        2865 B/op         61 allocs/op
BenchmarkScript/json()-20                  31024         41463 ns/op       21144 B/op        455 allocs/op

The results show that load_json() processes the test data far faster than json(): the running time of a single script drops from 41.46 µs to 5.67 µs (roughly 7.3 times faster), memory per run drops from 21144 B to 2865 B, and allocations drop from 455 to 61.
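
The source does not describe how json() and load_json() are implemented. One plausible reason for the gap, stated here as an assumption, is that each json() call works from the raw text again, while load_json() decodes the document once and later lookups read from the decoded value. The Go sketch below illustrates that general trade-off with the standard encoding/json package. The function names extractPerField and extractOnce are hypothetical and are not part of DataKit.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sample is a shortened version of the test data above.
const sample = `{"language": "C", "status": "ok", "id": 508, "cost": 101020}`

// extractPerField decodes the whole document once per requested key,
// the way a sequence of independent field lookups might (assumption).
func extractPerField(raw string, keys []string) (map[string]any, error) {
	out := make(map[string]any, len(keys))
	for _, k := range keys {
		var doc map[string]any
		if err := json.Unmarshal([]byte(raw), &doc); err != nil {
			return nil, err
		}
		out[k] = doc[k]
	}
	return out, nil
}

// extractOnce decodes the document a single time and then reads
// every key from the decoded map.
func extractOnce(raw string, keys []string) (map[string]any, error) {
	var doc map[string]any
	if err := json.Unmarshal([]byte(raw), &doc); err != nil {
		return nil, err
	}
	out := make(map[string]any, len(keys))
	for _, k := range keys {
		out[k] = doc[k]
	}
	return out, nil
}

func main() {
	keys := []string{"language", "status", "id", "cost"}
	a, _ := extractPerField(sample, keys)
	b, _ := extractOnce(sample, keys)
	fmt.Println(a["status"], b["status"])
}
```

Both functions return the same values; the difference is that the per-field version repeats the decoding work for every key, so its cost grows with the number of fields extracted.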

Grok Benchmark

The test data is a line from an nginx access log:

192.168.158.20 - - [19/Jun/2021:04:04:58 +0000] "POST /baxrrrrqc.php?daxd=a%20&d=1 HTTP/1.1" 404 118 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; fr; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8"

The following two scripts compare how the choice of Grok pattern affects the performance of the grok() function. The only change between them is that IPORHOST is replaced with NOTSPACE:

  • Script before optimization
grok(_, "%{IPORHOST:client_ip} %{NOTSPACE:http_ident} %{NOTSPACE:http_auth} \\[%{HTTPDATE:time}\\] \"%{DATA:http_method} %{GREEDYDATA:http_url} HTTP/%{NUMBER:http_version}\" %{INT:status_code} %{INT:bytes}")

cast(status_code, "int")
cast(bytes, "int")

default_time(time)
  • Script after optimization
grok(_, "%{NOTSPACE:client_ip} %{NOTSPACE:http_ident} %{NOTSPACE:http_auth} \\[%{HTTPDATE:time}\\] \"%{DATA:http_method} %{GREEDYDATA:http_url} HTTP/%{NUMBER:http_version}\" %{INT:status_code} %{INT:bytes}")

cast(status_code, "int")
cast(bytes, "int")

default_time(time)
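
The two scripts differ only in the pattern used for client_ip. As a rough illustration of why that can matter, the Go sketch below matches the first field of the sample log line with two regular expressions. ipOrHostLike is a simplified stand-in written for this example, not the real IPORHOST definition (which also covers IPv6), and notSpaceLike approximates NOTSPACE as one or more non-whitespace characters. The names and the helper firstField are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// ipOrHostLike is a simplified stand-in for an "IP or hostname" pattern:
// an IPv4 address with range-checked octets, or a dotted hostname.
// It is NOT the real Grok IPORHOST definition.
var ipOrHostLike = regexp.MustCompile(
	`^((?:(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.){3}` +
		`(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])` +
		`|[0-9A-Za-z][0-9A-Za-z-]{0,62}(?:\.[0-9A-Za-z][0-9A-Za-z-]{0,62})*) `)

// notSpaceLike approximates NOTSPACE: one or more non-whitespace characters.
var notSpaceLike = regexp.MustCompile(`^(\S+) `)

// firstField returns the first capture group of re in line, or "" if
// the expression does not match.
func firstField(re *regexp.Regexp, line string) string {
	m := re.FindStringSubmatch(line)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	line := `192.168.158.20 - - [19/Jun/2021:04:04:58 +0000] "POST /baxrrrrqc.php?daxd=a%20&d=1 HTTP/1.1" 404 118`
	fmt.Println(firstField(ipOrHostLike, line))
	fmt.Println(firstField(notSpaceLike, line))
}
```

Both expressions extract the same address from the sample line, but the simpler one accepts any non-whitespace token. Swapping in NOTSPACE therefore trades validation of the field for a much smaller expression, which is reasonable when the log format is known to be well-formed.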

Benchmark results:

BenchmarkScript/grok_nginx-20            19292       67006 ns/op        3828 B/op         42 allocs/op
BenchmarkScript/grok_p1-20              139440        7860 ns/op        3665 B/op         42 allocs/op

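The results show that replacing the Grok pattern IPORHOST with NOTSPACE reduces the running time of a single script on the test data from about 67 µs (grok_nginx) to 7.86 µs (grok_p1), roughly 8.5 times faster. Allocations per run stay at 42, so the gain comes from matching time rather than memory.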
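
The speedups on this page can be derived directly from the ns/op column of the benchmark output. The Go sketch below is one way to do that, assuming the whitespace-separated layout shown in the results above; nsPerOp and speedup are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// nsPerOp extracts the ns/op value from one line of benchmark output
// by finding the "ns/op" token and parsing the field before it.
func nsPerOp(line string) (float64, error) {
	fields := strings.Fields(line)
	for i := 1; i < len(fields); i++ {
		if fields[i] == "ns/op" {
			return strconv.ParseFloat(fields[i-1], 64)
		}
	}
	return 0, fmt.Errorf("no ns/op value in %q", line)
}

// speedup returns how many times faster the second line is than the first.
func speedup(before, after string) (float64, error) {
	b, err := nsPerOp(before)
	if err != nil {
		return 0, err
	}
	a, err := nsPerOp(after)
	if err != nil {
		return 0, err
	}
	return b / a, nil
}

func main() {
	before := "BenchmarkScript/grok_nginx-20  19292  67006 ns/op  3828 B/op  42 allocs/op"
	after := "BenchmarkScript/grok_p1-20  139440  7860 ns/op  3665 B/op  42 allocs/op"
	s, err := speedup(before, after)
	if err != nil {
		panic(err)
	}
	fmt.Printf("IPORHOST -> NOTSPACE: %.1fx faster\n", s)
}
```

The same helper applied to the json() and load_json() lines gives the ratio for the JSON benchmark.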
