Built-in Functions
Function parameter description:
- In function arguments, the anonymous argument (`_`) refers to the original input text data
- A JSON path is expressed directly as `x.y.z`, without any other modifications. For example, given `{"a":{"first":2.3, "second":2, "third":"abc", "forth":true}, "age":47}`, the JSON path `a.third` indicates that the data to be manipulated is `abc`
- The relative order of all function arguments is fixed, and the engine checks it
- All of the `key` parameters mentioned below refer to the `key` generated after the initial extraction (via `grok()` or `json()`)
- The path of the JSON to be processed must be written as an identifier and cannot be a string. If you are generating new keys, you need to use strings
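As a rough illustration of how a dotted JSON path selects a value, here is a minimal Python sketch. The helper `resolve_path` is hypothetical and is not part of Pipeline; it only handles nested objects, not list indexes.

```python
import json
from functools import reduce


def resolve_path(document: dict, path: str):
    """Walk a dotted path such as "a.third" through nested dicts."""
    return reduce(lambda node, part: node[part], path.split("."), document)


raw = '{"a":{"first":2.3, "second":2, "third":"abc", "forth":true}, "age":47}'
data = json.loads(raw)
print(resolve_path(data, "a.third"))  # abc
```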
Function List
add_key()
Function prototype: fn add_key(key, value)
Function description: Add a key to the point
Function parameters:
- `key`: key name
- `value`: key value
Example:
# input: {"age": 17, "name": "zhangsan", "height": 180}
# script
add_key(city, "shanghai")
# result
{
"age": 17,
"height": 180,
"name": "zhangsan",
"city": "shanghai"
}
add_pattern()
Function prototype: fn add_pattern(name: str, pattern: str)
Function description: Create custom grok patterns. Grok patterns are scoped: for example, an if/else statement opens a new scope, and a pattern added there is only valid within that scope. This function cannot overwrite an existing grok pattern in the same scope or in an enclosing scope
Function parameters:
- `name`: pattern name
- `pattern`: custom pattern content
Example:
# input data: "11,abc,end1", "22,abc,end1", "33,abc,end3"
# script
add_pattern("aa", "\\d{2}")
grok(_, "%{aa:aa}")
if false {
} else {
    add_pattern("bb", "[a-z]{3}")
    if aa == "11" {
        add_pattern("cc", "end1")
        grok(_, "%{aa:aa},%{bb:bb},%{cc:cc}")
    } elif aa == "22" {
        # Using pattern cc here will cause compilation failure: no pattern found for %{cc}
        grok(_, "%{aa:aa},%{bb:bb},%{INT:cc}")
    } elif aa == "33" {
        add_pattern("bb", "[\\d]{5}") # Overwriting bb here fails
        add_pattern("cc", "end3")
        grok(_, "%{aa:aa},%{bb:bb},%{cc:cc}")
    }
}
# result
{
    "aa": "11",
    "bb": "abc",
    "cc": "end1",
    "message": "11,abc,end1"
}
{
    "aa": "22",
    "message": "22,abc,end1"
}
{
    "aa": "33",
    "bb": "abc",
    "cc": "end3",
    "message": "33,abc,end3"
}
adjust_timezone()
Function prototype: fn adjust_timezone(key: int, minute: int)
Function parameters:
- `key`: Nanosecond timestamp, such as the timestamp obtained by the `default_time(time)` function
- `minute`: The number of minutes (integer) by which the result is allowed to be ahead of the current time; the value range is [0, 15], and the default value is 2 minutes
Function description: Shift the incoming timestamp by whole hours so that the difference between it and the timestamp of the function execution time falls within (-60+minute, minute] minutes. The function is not applicable to data whose real time difference exceeds this range, otherwise wrong data will be obtained. Calculation process:
- Add hours to the value of key so that it falls within the current hour
- Calculate the difference between the two minute values. Each minute value is in the range [0, 60), so the difference is in the range (-60, 60)
- If the difference is less than or equal to -60 + minute, add 1 hour; if the difference is greater than minute, subtract 1 hour
- With the default minute value of 2, the allowed range of the difference is (-58, 2]. If the current time is 11:10 and the log time is 3:12:00.001, the final result is 10:12:00.001; if the current time is 11:59:1.000 and the log time is 3:01:1.000, the final result is 12:01:1.000
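The calculation process above can be sketched in Python as follows. This is only an illustration of the described steps, not DataKit's implementation: the names `adjust_to_current_hour` and `clock` are hypothetical, timestamps are plain nanosecond integers, and the sketch assumes time zone offsets are whole hours.

```python
HOUR_NS = 3600 * 10**9
MINUTE_NS = 60 * 10**9


def adjust_to_current_hour(ts_ns: int, now_ns: int, minute: int = 2) -> int:
    """Shift ts_ns by whole hours so that ts - now lies in (-60+minute, minute] minutes."""
    # Step 1: keep the minutes/seconds of ts, but place it in the current hour of now.
    shifted = now_ns - now_ns % HOUR_NS + ts_ns % HOUR_NS
    # Step 2: difference between the shifted timestamp and the current time.
    diff = shifted - now_ns
    # Step 3: move one hour forward or backward if the difference is out of range.
    if diff <= (-60 + minute) * MINUTE_NS:
        shifted += HOUR_NS
    elif diff > minute * MINUTE_NS:
        shifted -= HOUR_NS
    return shifted


def clock(hour: int, minute: int, second: int, milli: int = 0) -> int:
    """Helper for the demo: time of day as nanoseconds since midnight."""
    return ((hour * 3600 + minute * 60 + second) * 1000 + milli) * 10**6


# The two cases from the description, with the default minute value of 2.
print(adjust_to_current_hour(clock(3, 12, 0, 1), clock(11, 10, 0)) == clock(10, 12, 0, 1))
print(adjust_to_current_hour(clock(3, 1, 1), clock(11, 59, 1)) == clock(12, 1, 1))
```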
Example:
Script:
json(_, time) # Extract the time field (if the time zone in the container is UTC+0000)
default_time(time) # Convert the extracted time field into a timestamp
# (Use local time zone UTC+0800/UTC+0900... parsing for data without time zone)
adjust_timezone(time)
# Automatically (re)select time zone, calibrate time offset
Execute `datakit pipeline -P <name>.p -F <input_file_name> --date`:
# output 1
{
"message": "{\n \"time\":\"11 Jul 2022 12:49:20.937\",\n \"second\":2,\n \"third\":\"abc\",\n \"forth\":true\n}",
"status": "unknown",
"time": "2022-07-11T20:49:20.937+08:00"
}
local time: 2022-07-11T20:55:10.521+08:00
Using only `default_time` and parsing according to the default local time zone (UTC+8), the time obtained is:
- Output result of input 1: 2022-07-11T12:49:20.937+08:00
After using `adjust_timezone`, the time obtained is:
- Output result of input 1: 2022-07-11T20:49:20.937+08:00
agg_create()
Function prototype: fn agg_create(bucket: str, on_interval: str = "60s", on_count: int = 0, keep_value: bool = false, const_tags: map[string]string = nil, category: str = "M")
Function description: Create an aggregation measurement. Use `on_interval` or `on_count` to set a time span or a number of points as the aggregation period. The aggregated data is uploaded when the period ends, and you can choose whether to keep the last aggregated data. This function does not work with central Pipeline.
Function parameters:
- `bucket`: String type, used as an aggregated field. If the bucket has already been created, the function does nothing.
- `on_interval`: The default value is `60s`. Uses time as the aggregation period, in units of `s`; the parameter takes effect when the value is greater than `0`. It cannot be less than or equal to 0 at the same time as `on_count`.
- `on_count`: The default value is `0`. Uses the number of processed points as the aggregation period; the parameter takes effect when the value is greater than `0`.
- `keep_value`: The default value is `false`.
- `const_tags`: Custom tags, empty by default.
- `category`: Data category of the aggregated data, optional parameter; the default value is "M", indicating metric category data.
Example:
agg_metric()
Function prototype: fn agg_metric(bucket: str, new_field: str, agg_fn: str, agg_by: []string, agg_field: str, category: str = "M")
Function description: Based on the field names given for the input data, take the values of those fields as tags of the aggregated data, and store the aggregated data in the corresponding bucket. This function does not work with central Pipeline.
Function parameters:
- `bucket`: String type, the bucket created by the agg_create function. If the bucket has not been created, the function does nothing.
- `new_field`: The name of the field in the aggregated data; the data type of its value is `float`.
- `agg_fn`: Aggregation function, one of `"avg"`, `"sum"`, `"min"`, `"max"`, `"set"`.
- `agg_by`: The names of fields in the input data that are used as tags of the aggregated data; the values of these fields must be strings.
- `agg_field`: The name of a field in the input data; its value is automatically taken for aggregation.
- `category`: Data category of the aggregated data, optional parameter; the default value is "M", indicating metric category data.
Example:
Take `logging` category data as an example:
Multiple inputs in a row:
- Sample log one: `{"a": 1}`
- Sample log two: `{"a": 2}`
script:
agg_create("cpu_agg_info", on_interval="10s", const_tags={"tag1":"value_user_define_tag"})
set_tag("tag1", "value1")
field1 = load_json(_)
field1 = field1["a"]
agg_metric("cpu_agg_info", "agg_field_1", "sum", ["tag1", "host"], "field1")
metric output:
append()
Function prototype: fn append(arr, elem) arr
Function description: Add the element elem to the end of the array arr.
Function parameters:
- `arr`: array
- `elem`: element being added.
Example:
# Example 1
abc = ["1", "2"]
abc = append(abc, 5.1)
# abc = ["1", "2", 5.1]
# Example 2
a = [1, 2]
b = [3, 4]
c = append(a, b)
# c = [1, 2, [3, 4]]
b64dec()
Function prototype: fn b64dec(key: str)
Function description: Base64 decode the string data obtained from the specified field
Function parameters:
- `key`: the field to decode
Example:
# input data {"str": "aGVsbG8sIHdvcmxk"}
json(_, `str`)
b64dec(`str`)
# result
# {
# "str": "hello, world"
# }
b64enc()
Function prototype: fn b64enc(key: str)
Function description: Base64 encode the string data obtained from the specified field
Function parameters:
- `key`: key name
Example:
# input data {"str": "hello, world"}
json(_, `str`)
b64enc(`str`)
# result
# {
# "str": "aGVsbG8sIHdvcmxk"
# }
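For readers who want to check the encoded and decoded values outside of Pipeline, the same round trip can be reproduced with Python's standard `base64` module. This is only a cross-check of the values used in the `b64dec()` and `b64enc()` examples; it assumes the text is UTF-8.

```python
import base64

plain = "hello, world"

# Encode: text -> bytes -> Base64 text
encoded = base64.b64encode(plain.encode("utf-8")).decode("ascii")
print(encoded)  # aGVsbG8sIHdvcmxk

# Decode: Base64 text -> bytes -> text
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)  # hello, world
```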
cache_get()
Function prototype: fn cache_get(key: str) nil|str
Function description: Given a key, cache_get() gets the corresponding value from the cache
Function parameters:
- `key`: key
Example:
cache_set()
Function prototype: fn cache_set(key: str, value: str, expiration: int) nil
Function description: Save a key-value pair to the cache
Function parameters:
- `key`: key (required)
- `value`: value (required)
- `expiration`: expiration time (default=100s)
Example:
cast()
Function prototype: fn cast(key, dst_type: str)
Function description: Convert the key value to the specified type
Function parameters:
- `key`: key name
- `dst_type`: The target type of the conversion; supports `"str"`, `"float"`, `"int"`, `"bool"`
Example:
# input data: {"first": 1,"second":2,"third":"aBC","forth":true}
# script
json(_, first)
cast(first, "str")
# result
{
"first": "1"
}
cidr()
Function prototype: fn cidr(ip: str, prefix: str) bool
Function description: Determine whether the IP is in a CIDR block
Function parameters:
- `ip`: IP address
- `prefix`: IP prefix, such as `192.0.2.1/24`
Example:
# script
ip = "192.0.2.233"
if cidr(ip, "192.0.2.1/24") {
add_key(ip_prefix, "192.0.2.1/24")
}
# result
{
"ip_prefix": "192.0.2.1/24"
}
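The membership test itself can be reproduced with Python's standard `ipaddress` module. This is a sketch of the concept rather than of `cidr()` itself; the helper name `in_cidr` is hypothetical, and `strict=False` is used because the prefix in the example (`192.0.2.1/24`) has host bits set.

```python
import ipaddress


def in_cidr(ip: str, prefix: str) -> bool:
    """Return True when ip falls inside the CIDR block described by prefix."""
    # strict=False accepts prefixes with host bits set, such as 192.0.2.1/24.
    network = ipaddress.ip_network(prefix, strict=False)
    return ipaddress.ip_address(ip) in network


print(in_cidr("192.0.2.233", "192.0.2.1/24"))  # True
print(in_cidr("192.0.3.1", "192.0.2.1/24"))    # False
```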
conv_traceid_w3c_to_dd()
Function prototype: fn conv_traceid_w3c_to_dd(key)
Function description: Convert a hex-encoded 128-bit or 64-bit W3C Trace ID string (32 or 16 characters long) to a decimal-encoded 64-bit DataDog Trace ID string.
Function parameters:
- `key`: the 128-bit or 64-bit Trace ID to convert
Example:
# script input:
"18962fdd9eea517f2ae0771ea69d6e16"
# script:
grok(_, "%{NOTSPACE:trace_id}")
conv_traceid_w3c_to_dd(trace_id)
# result:
{
"trace_id": "3089600317904219670",
}
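The value in the example above is what you get by keeping the low 64 bits (the last 16 hex characters) of the W3C ID and printing them in decimal. The Python sketch below assumes that rule holds in general; the helper name `w3c_to_dd_trace_id` is hypothetical.

```python
def w3c_to_dd_trace_id(hex_id: str) -> str:
    """Keep the low 64 bits of a hex trace ID and render them in decimal."""
    if len(hex_id) not in (16, 32):
        raise ValueError("expected a 16- or 32-character hex string")
    # Assumption: the 64-bit ID is the last 16 hex characters of the input.
    return str(int(hex_id[-16:], 16))


print(w3c_to_dd_trace_id("18962fdd9eea517f2ae0771ea69d6e16"))  # 3089600317904219670
```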
cover()
Function prototype: fn cover(key: str, range: list)
Function description: Mask (desensitize) a range of the string data obtained from the specified field
Function parameters:
- `key`: key name
- `range`: The index range of the string to mask (`[start, end]`). Both start and end support negative subscripts, which count back from the end of the string. Any valid interval is accepted. If end is greater than the maximum length of the string, it defaults to the maximum length
Example:
# input data {"str": "13789123014"}
json(_, `str`)
cover(`str`, [8, 9])
# input data {"abc": "13789123014"}
json(_, abc)
cover(abc, [2, 4])
create_point()
Function prototype: fn create_point(name, tags, fields, ts = 0, category = "M", after_use = "")
Function description: Create new data and output it. This function does not work with central Pipeline.
Function parameters:
- `name`: point name, which is regarded as the name of the metric set, log source, etc.
- `tags`: data tags
- `fields`: data fields
- `ts`: optional parameter, unix nanosecond timestamp, defaults to the current time
- `category`: optional parameter, data category; supports the category name and its abbreviation, for example the metric category can be written as `M` or `metric`, and the log category as `L` or `logging`
- `after_use`: optional parameter; after the point is created, execute the specified pl script on the created point. If the original data category is L and the created data category is M, the script under the L category is executed
Example:
datetime()
Function prototype: fn datetime(key, precision: str, fmt: str, tz: str = "")
Function description: Convert a timestamp to the specified date format
Function parameters:
- `key`: Extracted timestamp (required parameter)
- `precision`: Input timestamp precision (s, ms, us, ns)
- `fmt`: date format; built-in date formats are provided and custom date formats are supported
- `tz`: time zone (optional parameter); converts the timestamp to the time in the specified time zone. By default, the host's time zone is used
Built-in date formats:
Built-in format | date | description |
---|---|---|
"ANSI-C" | "Mon Jan _2 15:04:05 2006" | |
"UnixDate" | "Mon Jan _2 15:04:05 MST 2006" | |
"RubyDate" | "Mon Jan 02 15:04:05 -0700 2006" | |
"RFC822" | "02 Jan 06 15:04 MST" | |
"RFC822Z" | "02 Jan 06 15:04 -0700" | RFC822 with numeric zone |
"RFC850" | "Monday, 02-Jan-06 15:04:05 MST" | |
"RFC1123" | "Mon, 02 Jan 2006 15:04:05 MST" | |
"RFC1123Z" | "Mon, 02 Jan 2006 15:04:05 -0700" | RFC1123 with numeric zone |
"RFC3339" | "2006-01-02T15:04:05Z07:00" | |
"RFC3339Nano" | "2006-01-02T15:04:05.999999999Z07:00" | |
"Kitchen" | "3:04PM" | |
Custom date format:
The output date format can be customized through the combination of placeholders
character | example | description |
---|---|---|
a | %a | abbreviated weekday name, such as Wed |
A | %A | full weekday name, such as Wednesday |
b | %b | abbreviated month name, such as Mar |
B | %B | full month name, such as March |
C | %C | century, current year divided by 100 |
d | %d | day of the month; range [01, 31] |
e | %e | day of the month; range [1, 31] , pad with spaces |
H | %H | hour, using 24-hour clock; range [00, 23] |
I | %I | hour, using 12-hour clock; range [01, 12] |
j | %j | day of the year, range [001, 365] |
k | %k | hour, using 24-hour clock; range [0, 23] |
l | %l | hour, using 12-hour clock; range [1, 12] , padding with spaces |
m | %m | month, range [01, 12] |
M | %M | minutes, range [00, 59] |
n | %n | represents a newline character \n |
p | %p | AM or PM |
P | %P | am or pm |
s | %s | seconds since 1970-01-01 00:00:00 UTC |
S | %S | seconds, range [00, 60] |
t | %t | represents the tab character \t |
u | %u | day of the week, Monday is 1, range [1, 7] |
w | %w | day of the week, 0 for Sunday, range [0, 6] |
y | %y | year in range [00, 99] |
Y | %Y | decimal representation of the year |
z | %z | RFC 822/ISO 8601:1988 style time zone (e.g. -0600 or +0800 etc.) |
Z | %Z | time zone abbreviation, such as CST |
% | %% | represents the character % |
Example:
# input data:
# {
# "a":{
# "timestamp": "1610960605000",
# "second":2
# },
# "age":47
# }
# script
json(_, a.timestamp)
datetime(a.timestamp, 'ms', 'RFC3339')
# script
ts = timestamp()
datetime(ts, 'ns', fmt='%Y-%m-%d %H:%M:%S', tz="UTC")
# output
{
"ts": "2023-03-08 06:43:39"
}
# script
ts = timestamp()
datetime(ts, 'ns', '%m/%d/%y %H:%M:%S %z', "Asia/Tokyo")
# output
{
"ts": "03/08/23 15:44:59 +0900"
}
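The placeholders in the custom format table follow the familiar strftime conventions, so the custom formats can be previewed with Python's `datetime.strftime`. The sketch below is an illustration only: the helper `format_timestamp` is hypothetical, it always formats in UTC, and it drops the sub-second part of the timestamp.

```python
from datetime import datetime, timezone

# Number of timestamp units per second for each supported precision.
UNITS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def format_timestamp(ts: int, precision: str, fmt: str) -> str:
    """Format an integer timestamp of the given precision as a UTC date string."""
    seconds = ts // UNITS_PER_SECOND[precision]
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)


# The millisecond timestamp from the first example above.
print(format_timestamp(1610960605000, "ms", "%Y-%m-%d %H:%M:%S"))  # 2021-01-18 09:03:25
```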
decode()
Function prototype: fn decode(text: str, text_encode: str)
Function description: Convert text to UTF-8 encoding, to handle original logs that are not UTF-8 encoded. Currently supported encodings are utf-16le/utf-16be/gbk/gb18030 (these encoding names must be lowercase)
Example:
decode("wwwwww", "gbk")
# Extracted data(drop: false, cost: 33.279µs):
# {
# "message": "wwwwww",
# }
default_time()
Function prototype: fn default_time(key: str, timezone: str = "")
Function description: Use an extracted field as the timestamp of the final data
Function parameters:
- `key`: key name
- `timezone`: Specifies the time zone used by the time text to be parsed; optional parameter, defaults to the current system time zone. Time zone examples: `+8`, `-8`, `+8:30`
The data to be processed supports the following time formats:
date format |
---|
2014-04-26 17:24:37.3186369 |
May 8, 2009 5:57:51 PM |
2012-08-03 18:31:59.257000000 |
oct 7, 1970 |
2014-04-26 17:24:37.123 |
oct 7, '70 |
2013-04-01 22:43 |
oct. 7, 1970 |
2013-04-01 22:43:22 |
oct. 7, 70 |
2014-12-16 06:20:00 UTC |
Mon Jan 2 15:04:05 2006 |
2014-12-16 06:20:00 GMT |
Mon Jan 2 15:04:05 MST 2006 |
2014-04-26 05:24:37 PM |
Mon Jan 02 15:04:05 -0700 2006 |
2014-04-26 13:13:43 +0800 |
Monday, 02-Jan-06 15:04:05 MST |
2014-04-26 13:13:43 +0800 +08 |
Mon, 02 Jan 2006 15:04:05 MST |
2014-04-26 13:13:44 +09:00 |
Tue, 11 Jul 2017 16:28:13 +0200 (CEST) |
2012-08-03 18:31:59.257000000 +0000 UTC |
Mon, 02 Jan 2006 15:04:05 -0700 |
2015-09-30 18:48:56.35272715 +0000 UTC |
Thu, 4 Jan 2018 17:53:36 +0000 |
2015-02-18 00:12:00 +0000 GMT |
Mon 30 Sep 2018 09:09:09 PM UTC |
2015-02-18 00:12:00 +0000 UTC |
Mon Aug 10 15:44:11 UTC+0100 2015 |
2015-02-08 03:02:00 +0300 MSK m=+0.000000001 |
Thu, 4 Jan 2018 17:53:36 +0000 |
2015-02-08 03:02:00.001 +0300 MSK m=+0.000000001 |
Fri Jul 03 2015 18:04:07 GMT+0100 (GMT Daylight Time) |
2017-07-19 03:21:51+00:00 |
September 17, 2012 10:09am |
2014-04-26 |
September 17, 2012 at 10:09am PST-08 |
2014-04 |
September 17, 2012, 10:10:09 |
2014 |
2014:3:31 |
2014-05-11 08:20:13,787 |
2014:03:31 |
3.31.2014 |
2014:4:8 22:05 |
03.31.2014 |
2014:04:08 22:05 |
08.21.71 |
2014:04:2 03:00:51 |
2014.03 |
2014:4:02 03:00:51 |
2014.03.30 |
2012:03:19 10:11:59 |
20140601 |
2012:03:19 10:11:59.3186369 |
20140722105203 |
2014 年 04 月 08 日 |
1332151919 |
2006-01-02T15:04:05+0000 |
1384216367189 |
2009-08-12T22:15:09-07:00 |
1384216367111222 |
2009-08-12T22:15:09 |
1384216367111222333 |
2009-08-12T22:15:09Z |
Example JSON extraction:
# raw json
{
"time":"06/Jan/2017:16:16:37 +0000",
"second":2,
"third":"abc",
"forth":true
}
# script
json(_, time) # extract time field
default_time(time) # convert the extracted time field into a timestamp
# result
{
"time": 1483719397000000000,
}
Text extraction example:
# raw log text
# 2021-01-11T17:43:51.887+0800 DEBUG io io/io.go:458 post cost 6.87021ms
# script
grok(_, '%{TIMESTAMP_ISO8601:log_time}') # Extract the log time and name the field log_time
default_time(log_time) # Convert the extracted log_time field into a timestamp
# result
{
"log_time": 1610358231887000000,
}
# For the data collected by logging, it is better to name the time field as time, otherwise the logging collector will fill it with the current time
rename("time", log_time)
# result
{
"time": 1610358231887000000,
}
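The nanosecond values in the two examples above can be cross-checked with a short Python sketch. Unlike `default_time()`, which recognizes the format automatically, this sketch needs an explicit `strptime` layout that includes a UTC offset (`%z`); the helper name `to_unix_nanos` is hypothetical.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_nanos(text: str, layout: str) -> int:
    """Parse a time string that carries a UTC offset and return Unix nanoseconds."""
    moment = datetime.strptime(text, layout)
    delta = moment - EPOCH
    # Integer arithmetic avoids the rounding errors of float timestamps.
    return (delta.days * 86400 + delta.seconds) * 10**9 + delta.microseconds * 1000


print(to_unix_nanos("06/Jan/2017:16:16:37 +0000", "%d/%b/%Y:%H:%M:%S %z"))
print(to_unix_nanos("2021-01-11T17:43:51.887+0800", "%Y-%m-%dT%H:%M:%S.%f%z"))
```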
delete()
Function prototype: fn delete(src: map[string]any, key: str)
Function description: Delete the key in the JSON map
# input
# {"a": "b", "b":[0, {"c": "d"}], "e": 1}
# script
j_map = load_json(_)
delete(j_map["b"][-1], "c")
delete(j_map, "a")
add_key("j_map", j_map)
# result:
# {
# "j_map": "{\"b\":[0,{}],\"e\":1}",
# }
drop()
Function prototype: fn drop()
Function description: Discard the entire log without uploading
Example:
# in << {"str_a": "2", "str_b": "3"}
json(_, str_a)
if str_a == "2"{
drop()
exit()
}
json(_, str_b)
# Extracted data(drop: true, cost: 30.02µs):
# {
# "message": "{\"str_a\": \"2\", \"str_b\": \"3\"}",
# "str_a": "2"
# }
drop_key()
Function prototype: fn drop_key(key)
Function description: Delete a key
Function parameters:
- `key`: key to be deleted
Example:
# data = "{\"age\": 17, \"name\": \"zhangsan\", \"height\": 180}"
json(_, age)
json(_, name)
json(_, height)
drop_key(height)
# result
# {
# "age": 17,
# "name": "zhangsan"
# }
drop_origin_data()
Function prototype: fn drop_origin_data()
Function description: Discard the initial text; if this function is not called, the initial text is kept in the message field
Example:
# input data: {"age": 17, "name": "zhangsan", "height": 180}
# delete message field
drop_origin_data()
duration_precision()
Function prototype: fn duration_precision(key, old_precision: str, new_precision: str)
Function description: Convert the precision of a duration. The current precision and the target precision are specified through the parameters. Conversion between s, ms, us, and ns is supported.
Example:
# in << {"ts":12345}
json(_, ts)
cast(ts, "int")
duration_precision(ts, "ms", "ns")
# Extracted data(drop: false, cost: 33.279µs):
# {
# "message": "{\"ts\":12345}",
# "ts": 12345000000
# }
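The conversion amounts to rescaling by the ratio between the two units. A minimal Python sketch follows; the helper name `convert_duration` is hypothetical, and truncating when converting to a coarser unit is an assumption, not documented behavior.

```python
# Nanoseconds contained in one unit of each supported precision.
NANOS_PER_UNIT = {"s": 10**9, "ms": 10**6, "us": 10**3, "ns": 1}


def convert_duration(value: int, old_precision: str, new_precision: str) -> int:
    """Rescale an integer duration from one precision to another."""
    # Assumption: converting to a coarser unit truncates toward zero for positive values.
    return value * NANOS_PER_UNIT[old_precision] // NANOS_PER_UNIT[new_precision]


print(convert_duration(12345, "ms", "ns"))  # 12345000000
```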
exit()
Function prototype: fn exit()
Function description: End the parsing of the current log. If the function drop() has not been called, the part already parsed is still output
# in << {"str_a": "2", "str_b": "3"}
json(_, str_a)
if str_a == "2"{
exit()
}
json(_, str_b)
# Extracted data(drop: false, cost: 48.233µs):
# {
# "message": "{\"str_a\": \"2\", \"str_b\": \"3\"}",
# "str_a": "2"
# }
format_int()
Function prototype: fn format_int(val: int, base: int) str
Function description: Converts a numeric value to a numeric string in the specified base.
Function parameters:
- `val`: The number to be converted.
- `base`: Base, ranging from 2 to 36; when the base is greater than 10, lowercase letters a to z represent the values 10 and above.
Example:
# script0
a = 7665324064912355185
b = format_int(a, 16)
if b != "6a60b39fd95aaf71" {
add_key(abc, b)
} else {
add_key(abc, "ok")
}
# result
'''
{
"abc": "ok"
}
'''
# script1
a = "7665324064912355185"
b = format_int(parse_int(a, 10), 16)
if b != "6a60b39fd95aaf71" {
add_key(abc, b)
} else {
add_key(abc, "ok")
}
# result
'''
{
"abc": "ok"
}
'''
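The base conversion described above can be illustrated with a small Python sketch. The function below is a stand-alone re-implementation for illustration, not the Pipeline built-in; its handling of negative numbers is an assumption.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def format_int(val: int, base: int) -> str:
    """Render an integer in the given base (2 to 36) using lowercase digits."""
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    if val == 0:
        return "0"
    sign = "-" if val < 0 else ""
    val = abs(val)
    digits = []
    while val:
        val, remainder = divmod(val, base)
        digits.append(DIGITS[remainder])
    # Digits were collected least significant first, so reverse them.
    return sign + "".join(reversed(digits))


print(format_int(7665324064912355185, 16))  # 6a60b39fd95aaf71
```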
geoip()
Function prototype: fn geoip(ip: str)
Function description: Append more IP information to the IP. `geoip()` will generate additional fields, such as:
- `isp`: operator
- `city`: city
- `province`: province
- `country`: country
Function parameters:
- `ip`: The extracted IP field; both IPv4 and IPv6 are supported
Example:
# input data: {"ip":"1.2.3.4"}
# script
json(_, ip)
geoip(ip)
# result
{
"city" : "Brisbane",
"country" : "AU",
"ip" : "1.2.3.4",
"province" : "Queensland",
"isp" : "unknown",
"message" : "{\"ip\": \"1.2.3.4\"}"
}
get_key()
Function prototype: fn get_key(key)
Function description: Read the value of key from the input point
Function parameters:
- `key`: key name
Example:
add_key("city", "shanghai")
# Here you can directly access the value of the key with the same name in point through city
if city == "shanghai" {
add_key("city_1", city)
}
# Due to the right associativity of assignment, get the value whose key is "city" first,
# Then create a variable named city
city = city + " --- ningbo" + " --- " +
"hangzhou" + " --- suzhou ---" + ""
# get_key gets the value of "city" from point
# If there is a variable named city, it cannot be obtained directly from point
if city != get_key("city") {
add_key("city_2", city)
}
# result
"""
{
"city": "shanghai",
"city_1": "shanghai",
"city_2": "shanghai --- ningbo --- hangzhou --- suzhou ---"
}
"""
gjson()
Function prototype: fn gjson(input, json_path: str, newkey: str)
Function description: Extract specified fields from JSON, rename them as new fields, and ensure they are arranged in the original order.
Function parameters:
- `input`: The JSON to be extracted; it can be either the original text (`_`) or a specific `key` after the initial extraction.
- `json_path`: JSON path information
- `newkey`: The new key that the extracted data is written to
# Directly extract the field x.y from the original input JSON and rename it as a new field abc.
gjson(_, "x.y", "abc")
# Extract the x.y field from a previously extracted key, and name the extracted field as x.y.
gjson(key, "x.y")
# Extract arrays, where `key` and `abc` are arrays.
gjson(key, "1.abc.2")
Example 1:
# input data:
# {"info": {"age": 17, "name": "zhangsan", "height": 180}}
# script:
gjson(_, "info", "zhangsan")
gjson(zhangsan, "name")
gjson(zhangsan, "age", "age")
# result:
{
"age": 17,
"message": "{\"info\": {\"age\": 17, \"name\": \"zhangsan\", \"height\": 180}}",
"name": "zhangsan",
"zhangsan": "{\"age\":17,\"height\":180,\"name\":\"zhangsan\"}"
}
Example 2:
# input data:
# data = {
# "name": {"first": "Tom", "last": "Anderson"},
# "age":37,
# "children": ["Sara","Alex","Jack"],
# "fav.movie": "Deer Hunter",
# "friends": [
# {"first": "Dale", "last": "Murphy", "age": 44, "nets": ["ig", "fb", "tw"]},
# {"first": "Roger", "last": "Craig", "age": 68, "nets": ["fb", "tw"]},
# {"first": "Jane", "last": "Murphy", "age": 47, "nets": ["ig", "tw"]}
# ]
# }
# script:
gjson(_, "name")
gjson(name, "first")
Example 3:
# input data:
# [
# {"first": "Dale", "last": "Murphy", "age": 44, "nets": ["ig", "fb", "tw"]},
# {"first": "Roger", "last": "Craig", "age": 68, "nets": ["fb", "tw"]},
# {"first": "Jane", "last": "Murphy", "age": 47, "nets": ["ig", "tw"]}
# ]
# scripts for JSON list:
gjson(_, "0.nets.1")
grok()
Function prototype: fn grok(input: str, pattern: str, trim_space: bool = true) bool
Function description: Extract the contents of the text string `input` by `pattern`, and return true when pattern matches input successfully, otherwise return false.
Function parameters:
- `input`: The text to be extracted; it can be the original text (`_`) or a `key` after the initial extraction
- `pattern`: grok expression; the data type of the specified key can be given in the expression: bool, float, int, string (corresponding to Pipeline's str, can also be written as str); the default is string
- `trim_space`: Delete the leading and trailing whitespace characters in the extracted text; the default value is true
grok(_, pattern) # Use the input text directly as raw data
grok(key, pattern) # For a key that has been extracted before, do grok again
Example:
# input data: "12/01/2021 21:13:14.123"
# script
add_pattern("_second", "(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)")
add_pattern("_minute", "(?:[0-5][0-9])")
add_pattern("_hour", "(?:2[0123]|[01]?[0-9])")
add_pattern("time", "([^0-9]?)%{_hour:hour:string}:%{_minute:minute:int}(?::%{_second:second:float})([^0-9]?)")
grok_match_ok = grok(_, "%{DATE_US:date} %{time}")
add_key(grok_match_ok)
# result
{
"date": "12/01/2021",
"hour": "21",
"message": "12/01/2021 21:13:14.123",
"minute": 13,
"second": 14.123
}
{
"date": "12/01/2021",
"grok_match_ok": true,
"hour": "21",
"message": "12/01/2021 21:13:14.123",
"minute": 13,
"second": 14.123,
"status": "unknown",
"time": 1665994187473917724
}
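For readers more familiar with plain regular expressions, the extraction in this example roughly corresponds to a regex with named groups followed by per-field type conversion. The Python sketch below is an approximation only: the `date` group is a simplified stand-in for the built-in DATE_US pattern, and the helper name `extract` is hypothetical.

```python
import re

PATTERN = re.compile(
    r"(?P<date>\d{1,2}/\d{1,2}/\d{4}) "                # simplified stand-in for DATE_US
    r"(?P<hour>2[0123]|[01]?[0-9]):"                   # _hour, kept as a string
    r"(?P<minute>[0-5][0-9]):"                         # _minute, converted to int
    r"(?P<second>(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)"  # _second, converted to float
)


def extract(text: str):
    """Return the typed fields, or None when the pattern does not match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "date": fields["date"],
        "hour": fields["hour"],
        "minute": int(fields["minute"]),
        "second": float(fields["second"]),
    }


print(extract("12/01/2021 21:13:14.123"))
```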
group_between()
Function prototype: fn group_between(key: int, between: list, new_value: int|float|bool|str|map|list|nil, new_key)
Function description: If the `key` value is within the specified range `between` (note: it can only be a single interval, such as `[0,100]`), a new field can be created and assigned the new value. If no new field is provided, the original field value is overwritten
Example 1:
# input data: {"http_status": 200, "code": "success"}
json(_, http_status)
# If the field http_status value is within the specified range, change its value to "OK"
group_between(http_status, [200, 300], "OK")
# result
# {
# "http_status": "OK"
# }
Example 2:
# input data: {"http_status": 200, "code": "success"}
json(_, http_status)
# If the value of the field http_status is within the specified range, create a new status field with the value "OK"
group_between(http_status, [200, 300], "OK", status)
# result
{
"http_status": 200,
"status": "OK"
}
group_in()
Function prototype: fn group_in(key: int|float|bool|str, range: list, new_value: int|float|bool|str|map|list|nil, new_key = "")
Function description: If the `key` value is in the list `range`, a new field can be created and assigned the new value. If no new field is provided, the original field value is overwritten
Example:
# If the field log_level value is in the list, change its value to "OK"
group_in(log_level, ["info", "debug"], "OK")
# If the field log_level value is in the specified list, create a new status field with the value "not-ok"
group_in(log_level, ["error", "panic"], "not-ok", status)
hash()
Function prototype: fn hash(text: str, method: str) -> str
Function description: Calculate the hash of the text
Function parameters:
- `text`: input text
- `method`: Hash algorithm; allowed values include `md5`, `sha1`, `sha256`, `sha512`
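To get a feel for the four algorithms, the digests can be computed with Python's standard `hashlib` module. This sketch assumes the text is hashed as UTF-8 and the result is a lowercase hex string; neither is confirmed by the description above, and `hash_text` is a hypothetical helper.

```python
import hashlib

ALLOWED_METHODS = ("md5", "sha1", "sha256", "sha512")


def hash_text(text: str, method: str) -> str:
    """Hash text with one of the listed algorithms and return a hex digest."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported method: {method}")
    # Assumptions: UTF-8 input encoding, lowercase hex output.
    return hashlib.new(method, text.encode("utf-8")).hexdigest()


print(hash_text("abc", "md5"))  # 900150983cd24fb0d6963f7d28e17f72
```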
Example:
http_request()
Function prototype: fn http_request(method: str, url: str, headers: map, body: any) map
Function description: Send an HTTP request, receive the response, and encapsulate it into a map
Function parameters:
- `method`: GET|POST
- `url`: Request path
- `headers`: Additional headers; the type is map[string]string
- `body`: Request body
Return type: map
The returned map contains the status code (`status_code`) and the response body (`body`):
- `status_code`: Status code
- `body`: Response body
Example:
resp = http_request("GET", "http://localhost:8080/testResp")
resp_body = load_json(resp["body"])
add_key(abc, resp["status_code"])
add_key(abc, resp_body["a"])
json()
Function prototype: fn json(input: str, json_path, newkey, trim_space: bool = true, delete_after_extract = false)
Function description: Extract the specified field in JSON and name it as a new field.
Function parameters:
- `input`: The JSON to be extracted; it can be the original text (`_`) or a `key` after the initial extraction
- `json_path`: JSON path information
- `newkey`: The new key that the extracted data is written to
- `trim_space`: Delete the leading and trailing whitespace characters in the extracted text; the default value is true
- `delete_after_extract`: After extraction, delete the extracted information from the input. Only map keys and map values can be deleted; list (array) elements are not supported. The default is `false`.
# Directly extract the x.y field in the original input json, and name it as a new field abc
json(_, x.y, abc)
# For a `key` that has been extracted, extract `x.y` again, and the extracted field name is `x.y`
json(key, x.y)
Example 1:
# input data:
# {"info": {"age": 17, "name": "zhangsan", "height": 180}}
# script:
json(_, info, "zhangsan")
json(zhangsan, name)
json(zhangsan, age, "age")
# result:
{
"age": 17,
"message": "{\"info\": {\"age\": 17, \"name\": \"zhangsan\", \"height\": 180}}",
"name": "zhangsan",
"zhangsan": "{\"age\":17,\"height\":180,\"name\":\"zhangsan\"}"
}
Example 2:
# input data:
# data = {
# "name": {"first": "Tom", "last": "Anderson"},
# "age":37,
# "children": ["Sara","Alex","Jack"],
# "fav.movie": "Deer Hunter",
# "friends": [
# {"first": "Dale", "last": "Murphy", "age": 44, "nets": ["ig", "fb", "tw"]},
# {"first": "Roger", "last": "Craig", "age": 68, "nets": ["fb", "tw"]},
# {"first": "Jane", "last": "Murphy", "age": 47, "nets": ["ig", "tw"]}
# ]
# }
# script:
json(_, name)
json(name, first)
Example 3:
# input data:
# [
# {"first": "Dale", "last": "Murphy", "age": 44, "nets": ["ig", "fb", "tw"]},
# {"first": "Roger", "last": "Craig", "age": 68, "nets": ["fb", "tw"]},
# {"first": "Jane", "last": "Murphy", "age": 47, "nets": ["ig", "tw"]}
# ]
# script:
json(_, .[0].nets[-1])
Example 4:
# input data:
{"item": " not_space ", "item2":{"item3": [123]}}
# script:
json(_, item2.item3, item, delete_after_extract = true)
# result:
{
"item": "[123]",
"message": "{\"item\":\" not_space \",\"item2\":{}}",
}
Example 5:
# input data:
{"item": " not_space ", "item2":{"item3": [123]}}
# If you try to remove a list element it will fail the script check.
# Script:
json(_, item2.item3[0], item, delete_after_extract = true)
# test command:
# datakit pipeline -P j2.p -T '{"item": " not_space ", "item2":{"item3": [123]}}'
# report error:
# [E] j2.p:1:54: does not support deleting elements in the list
kv_split()
Function prototype: fn kv_split(key, field_split_pattern = " ", value_split_pattern = "=", trim_key = "", trim_value = "", include_keys = [], prefix = "") -> bool
Function description: Extract all key-value pairs from a string
Function parameters:
- `key`: key name
- `include_keys`: list of key names; only the keys in the list are extracted. The default value is [], in which case no keys are extracted
- `field_split_pattern`: a regular expression used to split the string into key-value pairs; the default value is " "
- `value_split_pattern`: used to split the key and the value of each key-value pair string, non-recursive; the default value is "="
- `trim_key`: delete all the specified characters leading and trailing the extracted key; the default value is ""
- `trim_value`: delete all the specified characters leading and trailing the extracted value; the default value is ""
- `prefix`: add a prefix to all keys
Example:
# input: "a=1, b=2 c=3"
kv_split(_)
'''output:
{
"message": "a=1, b=2 c=3",
"status": "unknown",
"time": 1679558730846377132
}
'''
# input: "a=1, b=2 c=3"
kv_split(_, include_keys=["a", "c", "b"])
'''output:
{
"a": "1,",
"b": "2",
"c": "3",
"message": "a=1, b=2 c=3",
"status": "unknown",
"time": 1678087119072769560
}
'''
# input: "a=1, b=2 c=3"
kv_split(_, trim_value=",", include_keys=["a", "c", "b"])
'''output:
{
"a": "1",
"b": "2",
"c": "3",
"message": "a=1, b=2 c=3",
"status": "unknown",
"time": 1678087173651846101
}
'''
# input: "a=1, b=2 c=3"
kv_split(_, trim_value=",", include_keys=["a", "c"])
'''output:
{
"a": "1",
"c": "3",
"message": "a=1, b=2 c=3",
"status": "unknown",
"time": 1678087514906492912
}
'''
# input: "a::1,+b::2+c::3"
kv_split(_, field_split_pattern="\\+", value_split_pattern="[:]{2}",
prefix="with_prefix_",trim_value=",", trim_key="a", include_keys=["a", "b", "c"])
'''output:
{
"message": "a::1,+b::2+c::3",
"status": "unknown",
"time": 1678087473255241547,
"with_prefix_b": "2",
"with_prefix_c": "3"
}
'''
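The splitting behavior shown in the examples above can be approximated outside of Pipeline. The following Python function is only a sketch for reasoning about the parameters, not Datakit's implementation. It returns a plain dict instead of writing to a point, and it assumes that a key left empty after trimming is skipped, which is inferred from the last example.

```python
import re


def kv_split(text, field_split_pattern=" ", value_split_pattern="=",
             trim_key="", trim_value="", include_keys=(), prefix=""):
    """Approximate kv_split(); returns the extracted pairs as a dict."""
    result = {}
    if not include_keys:
        # Default include_keys=[]: nothing is extracted.
        return result
    for field in re.split(field_split_pattern, text):
        # Non-recursive: split each field only once into key and value.
        parts = re.split(value_split_pattern, field, maxsplit=1)
        if len(parts) != 2:
            continue
        key, value = parts
        if trim_key:
            key = key.strip(trim_key)
        if trim_value:
            value = value.strip(trim_value)
        # Assumption: empty keys and keys outside include_keys are skipped.
        if not key or key not in include_keys:
            continue
        result[prefix + key] = value
    return result


print(kv_split("a=1, b=2 c=3", trim_value=",", include_keys=["a", "c"]))
```

With the input of the last example, the key `a` is trimmed to an empty string by `trim_key="a"`, so only the prefixed `b` and `c` remain.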
len()
¶
Function prototype: fn len(val: str|map|list) int
Function description: Calculate the number of bytes in string, the number of elements in map and list.
Function parameters:
val
: Can be map, list or string
Example:
# example 1
add_key(abc, len("abc"))
# result
{
"abc": 3
}
# example 2
add_key(abc, len(["abc"]))
# result
{
"abc": 1
}
load_json()
¶
Function prototype: fn load_json(val: str) nil|bool|float|map|list
Function description: Convert the JSON string to one of map, list, nil, bool, float; the value can be read and modified through index expressions. If deserialization fails, nil is returned instead of terminating the script run.
Function parameters:
val
: Requires data of type string.
Example:
# _: {"a":{"first": [2.2, 1.1], "ff": "[2.2, 1.1]","second":2,"third":"aBC","forth":true},"age":47}
abc = load_json(_)
add_key(abc, abc["a"]["first"][-1])
abc["a"]["first"][-1] = 11
# Need to synchronize the data on the stack to point
add_key(abc, abc["a"]["first"][-1])
add_key(len_abc, len(abc))
add_key(len_abc, len(load_json(abc["a"]["ff"])))
lowercase()
¶
Function prototype: fn lowercase(key: str)
Function description: Convert the content of the extracted key to lowercase
Function parameters:
key
: Specify the extracted field name to be converted
Example:
# input data: {"first": "HeLLo","second":2,"third":"aBC","forth":true}
# script
json(_, first) lowercase(first)
# result
{
"first": "hello"
}
match()
¶
Function prototype: fn match(pattern: str, s: str) bool
Function description: Use the specified regular expression to match the string, return true if the match is successful, otherwise return false
Function parameters:
- pattern: regular expression
- s: string to match
Example:
# script
test_1 = "pattern 1,a"
test_2 = "pattern -1,"
add_key(match_1, match('''\w+\s[,\w]+''', test_1))
add_key(match_2, match('''\w+\s[,\w]+''', test_2))
# result
{
"match_1": true,
"match_2": false
}
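To check a pattern before putting it into a script, the example above can be reproduced in Python. This is a sketch that assumes unanchored (search) matching. Python's `re` module and the regular expression engine used by Pipeline are not identical, so more exotic patterns may behave differently.

```python
import re


def match(pattern: str, s: str) -> bool:
    """Return True if the pattern matches anywhere in s."""
    return re.search(pattern, s) is not None


print(match(r"\w+\s[,\w]+", "pattern 1,a"))
print(match(r"\w+\s[,\w]+", "pattern -1,"))
```

The second string fails because the character after the space is `-`, which is neither a comma nor a word character.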
mquery_refer_table()
¶
Function prototype: fn mquery_refer_table(table_name: str, keys: list, values: list)
Function description: Query the external reference table by specifying multiple keys, and append all columns of the first row of the query result to field. This function does not work with central Pipeline.
Function parameters:
- table_name: the name of the table to be looked up
- keys: a list of multiple column names
- values: the values corresponding to each column
Example:
json(_, table)
json(_, key)
json(_, value)
# Query and append the data of the current column, which is added to the data as a field by default
mquery_refer_table(table, values=[value, false], keys=[key, "col4"])
# result
# {
# "col": "ab",
# "col2": 1234,
# "col3": 1235,
# "col4": false,
# "key": "col2",
# "message": "{\"table\": \"table_abc\", \"key\": \"col2\", \"value\": 1234.0}",
# "status": "unknown",
# "table": "table_abc",
# "time": "2022-08-16T16:23:31.940600281+08:00",
# "value": 1234
# }
nullif()
¶
Function prototype: fn nullif(key, value)
Function description: If the content of the field specified by the extracted key is equal to value, delete this field
Function parameters:
- key: specified field
- value: target value
Example:
# input data: {"first": 1,"second":2,"third":"aBC","forth":true}
# script
json(_, first) json(_, second) nullif(first, "1")
# result
{
"second":2
}
Note: This feature can also be implemented with if/else semantics.
parse_date()
¶
Function prototype: fn parse_date(key: str, yy: str, MM: str, dd: str, hh: str, mm: str, ss: str, ms: str, zone: str)
Function description: Convert the value of each part of the incoming date field into a timestamp
Function parameters:
- key: newly inserted field
- yy: year numeric string, four or two digits; if it is an empty string, the current year is used
- MM: month string; supports numbers, English names, and English abbreviations
- dd: day string
- hh: hour string
- mm: minute string
- ss: seconds string
- ms: milliseconds string
- us: microseconds string
- ns: nanoseconds string
- zone: time zone string, in the form of "+8" or "Asia/Shanghai"
Example:
parse_date(aa, "2021", "May", "12", "10", "10", "34", zone="Asia/Shanghai") # Result aa=1620785434000000000
parse_date(aa, "2021", "12", "12", "10", "10", "34", zone="Asia/Shanghai") # result aa=1639275034000000000
parse_date(aa, "2021", "12", "12", "10", "10", "34", "100", zone="Asia/Shanghai") # Result aa=1639275034000000100
parse_date(aa, "20", "February", "12", "10", "10", "34", zone="+8") # result aa=1581473434000000000
parse_duration()
¶
Function prototype: fn parse_duration(key: str)
Function description: If the value of key is a golang duration string (such as 123ms), then key will be automatically parsed into an integer in nanoseconds
The current duration units in golang are as follows:
- ns: nanoseconds
- us/µs: microseconds
- ms: milliseconds
- s: seconds
- m: minutes
- h: hours
Function parameters:
key
: the field to be parsed
Example:
# assume abc = "3.5s"
parse_duration(abc) # result abc = 3500000000
# Support negative numbers: abc = "-3.5s"
parse_duration(abc) # result abc = -3500000000
# support floating point: abc = "-2.3s"
parse_duration(abc) # result abc = -2300000000
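The conversion from a duration string to nanoseconds can be sketched in Python as follows. This is an illustration of the unit table above, not the code Datakit runs. `Decimal` is used so that values such as `2.3s` are converted exactly instead of picking up floating-point error.

```python
import re
from decimal import Decimal

# Nanoseconds per unit, following the unit list above.
UNIT_NS = {
    "ns": 1,
    "us": 1_000,
    "µs": 1_000,
    "ms": 1_000_000,
    "s": 1_000_000_000,
    "m": 60_000_000_000,
    "h": 3_600_000_000_000,
}
TOKEN = re.compile(r"(\d+\.?\d*|\.\d+)(ns|us|µs|ms|s|m|h)")


def parse_duration(text: str) -> int:
    """Convert a duration string such as "3.5s" or "1h2m" to nanoseconds."""
    sign = 1
    if text.startswith(("+", "-")):
        sign = -1 if text[0] == "-" else 1
        text = text[1:]
    if text == "0":
        return 0
    if not text:
        raise ValueError("empty duration")
    total = Decimal(0)
    pos = 0
    while pos < len(text):
        token = TOKEN.match(text, pos)
        if token is None:
            raise ValueError("invalid duration: " + text)
        total += Decimal(token.group(1)) * UNIT_NS[token.group(2)]
        pos = token.end()
    return sign * int(total)


print(parse_duration("3.5s"))
print(parse_duration("-2.3s"))
```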
parse_int()
¶
Function prototype: fn parse_int(val: str, base: int) int
Function description: Converts the string representation of a numeric value to a numeric value.
Function parameters:
- val: The string to be converted.
- base: Base, the range is 0, or 2 to 36; when the value is 0, the base is judged according to the string prefix.
Example:
# script0
a = "7665324064912355185"
b = format_int(parse_int(a, 10), 16)
if b != "6a60b39fd95aaf71" {
add_key(abc, b)
} else {
add_key(abc, "ok")
}
# result
'''
{
"abc": "ok"
}
'''
# script1
a = "6a60b39fd95aaf71"
b = parse_int(a, 16) # base 16
if b != 7665324064912355185 {
add_key(abc, b)
} else {
add_key(abc, "ok")
}
# result
'''
{
"abc": "ok"
}
'''
# script2
a = "0x6a60b39fd95aaf71"
b = parse_int(a, 0) # the true base is implied by the string's prefix
if b != 7665324064912355185 {
add_key(abc, b)
} else {
c = format_int(b, 16)
if "0x"+c != a {
add_key(abc, c)
} else {
add_key(abc, "ok")
}
}
# result
'''
{
"abc": "ok"
}
'''
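The base handling used in the three scripts above can be tried out with Python's built-in `int()`, which also accepts base 0 and bases 2 to 36. This is only an analogy. Python and Pipeline may differ on details such as surrounding whitespace, underscores, or how a leading `0` is treated with base 0.

```python
def parse_int(val: str, base: int) -> int:
    """Convert a numeric string using an explicit base, or base 0 for prefix detection."""
    if base != 0 and not 2 <= base <= 36:
        raise ValueError("base must be 0 or in the range 2 to 36")
    return int(val, base)


# The same hexadecimal digits, with and without the "0x" prefix.
explicit = parse_int("6a60b39fd95aaf71", 16)
implied = parse_int("0x6a60b39fd95aaf71", 0)
print(explicit == implied)
print(format(explicit, "x"))
```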
point_window()
¶
Function prototype: fn point_window(before: int, after: int, stream_tags = ["filepath", "host"])
Function description: Record the discarded data and use it with the window_hit function to upload the discarded context Point data.
Function parameters:
- before: The maximum number of points that can be temporarily stored before window_hit is executed; data that has not been discarded is included in the count.
- after: The number of points retained after window_hit is executed; data that has not been discarded is included in the count.
- stream_tags: Distinguishes log (metrics, tracing, etc.) streams by tags on the data; by default filepath and host are used, which distinguishes logs from the same file.
Example:
# It is recommended to place it in the first line of the script
#
point_window(8, 8)
# If it is a panic log, keep the first 8 entries
# and the last 8 entries (including the current one)
#
if grok(_, "abc.go:25 panic: xxxxxx") {
# This function will only take effect if point_window() is executed during this run.
# Trigger data recovery behavior within the window
#
window_hit()
}
# By default, all logs whose service is test_app are discarded;
# If it contains panic logs, keep the 15 adjacent ones and the current one.
#
if service == "test_app" {
drop()
}
pt_kvs_del()
¶
Function prototype: fn pt_kvs_del(name: str)
Function description: Delete the key specified in Point
Function parameters:
name
: Key to be deleted
Example:
key_blacklist = ["k1", "k2", "k3"]
for k in pt_kvs_keys() {
if k in key_blacklist {
pt_kvs_del(k)
}
}
pt_kvs_get()
¶
Function prototype: fn pt_kvs_get(name: str) -> any
Function description: Return the value of the specified key in Point
Function parameters:
name
: Key name
Example:
pt_kvs_keys()
¶
Function prototype: fn pt_kvs_keys(tags: bool = true, fields: bool = true) -> list
Function description: Return the key list in Point
Function parameters:
- tags: Whether to include the names of all tags
- fields: Whether to include the names of all fields
Example:
pt_kvs_set()
¶
Function prototype: fn pt_kvs_set(name: str, value: any, as_tag: bool = false) -> bool
Function description: Add a key to a Point or modify the value of a key in a Point
Function parameters:
- name: The name of the field or tag to be added or modified
- value: The value of the field or tag
- as_tag: Whether to set it as a tag
Example:
pt_name()
¶
Function prototype: fn pt_name(name: str = "") -> str
Function description: Get the name of point; if the parameter is not empty, set the new name.
Function parameters:
name
: Value as point name; defaults to empty string.
The field mapping relationship between point name and various types of data storage:
category | field name |
---|---|
custom_object | class |
keyevent | - |
logging | source |
metric | - |
network | source |
object | class |
profiling | source |
rum | source |
security | rule |
tracing | source |
query_refer_table()
¶
Function prototype: fn query_refer_table(table_name: str, key: str, value)
Function description: Query the external reference table through the specified key, and append all the columns of the first row of the query result to field. This function does not work with central Pipeline.
Function parameters:
table_name
: the name of the table to be looked upkey
: column namevalue
: the value corresponding to the column
Example:
# extract table name, column name, column value from input
json(_, table)
json(_, key)
json(_, value)
# Query and append the data of the current column, which is added to the data as a field by default
query_refer_table(table, key, value)
Result:
{
"col": "ab",
"col2": 1234,
"col3": 123,
"col4": true,
"key": "col2",
"message": "{\"table\": \"table_abc\", \"key\": \"col2\", \"value\": 1234.0}",
"status": "unknown",
"table": "table_abc",
"time": "2022-08-16T15:02:14.158452592+08:00",
"value": 1234
}
rename()
¶
Function prototype: fn rename(new_key, old_key)
Function description: Rename the extracted fields
Function parameters:
- new_key: new field name
- old_key: the extracted field name
Example:
# Data to be processed: {"info": {"age": 17, "name": "zhangsan", "height": 180}}
# process script
json(_, info.name, "name")
rename(full_name, name)
# process result
{
"message": "{\"info\": {\"age\": 17, \"name\": \"zhangsan\", \"height\": 180}}",
"full_name": "zhangsan"
}
replace()
¶
Function prototype: fn replace(key: str, regex: str, replace_str: str)
Function description: Replace the string data obtained on the specified field according to regular rules
Function parameters:
- key: the field to be extracted
- regex: regular expression
- replace_str: string to replace
Example:
# Phone number: {"str_abc": "13789123014"}
json(_, str_abc)
replace(str_abc, "(1[0-9]{2})[0-9]{4}([0-9]{4})", "$1****$2")
# English name {"str_abc": "zhang san"}
json(_, str_abc)
replace(str_abc, "([a-z]*) \\w*", "$1 ***")
# ID number {"str_abc": "362201200005302565"}
json(_, str_abc)
replace(str_abc, "([1-9]{4})[0-9]{10}([0-9]{4})", "$1**********$2")
# Chinese name {"str_abc": "小阿卡"}
json(_, str_abc)
replace(str_abc, '([\u4e00-\u9fa5])[\u4e00-\u9fa5]([\u4e00-\u9fa5])', "$1*$2")
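The masking patterns above can be verified offline with Python's `re.sub`. This is a rough equivalent for experimenting, not the Pipeline implementation. Python writes group references as `\1` and `\2`, whereas the scripts above use `$1` and `$2`.

```python
import re


def mask(value: str, regex: str, replacement: str) -> str:
    """Replace every regex match in value; replacement uses \\1-style group references."""
    return re.sub(regex, replacement, value)


# Phone number: keep the first three and the last four digits.
print(mask("13789123014", r"(1[0-9]{2})[0-9]{4}([0-9]{4})", r"\1****\2"))
# English name: keep the first word only.
print(mask("zhang san", r"([a-z]*) \w*", r"\1 ***"))
# ID number: keep the first four and the last four digits.
print(mask("362201200005302565", r"([1-9]{4})[0-9]{10}([0-9]{4})", r"\1**********\2"))
```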
sample()
¶
Function prototype: fn sample(p)
Function description: Choose to collect/discard data with probability p.
Function parameters:
p
: the probability that the sample function returns true, the value range is [0, 1]
Example:
# process script
if !sample(0.3) { # sample(0.3) indicates that the sampling rate is 30%, that is, it returns true with a 30% probability, and 70% of the data will be discarded here
drop() # mark the data to be discarded
exit() # Exit the follow-up processing process
}
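The sampling decision can be pictured as a single random draw compared against p. The Python sketch below illustrates this. The `rng` parameter is a hypothetical addition so the draw can be seeded in tests, and how Datakit generates its random numbers is not stated in this document.

```python
import random


def sample(p: float, rng=random) -> bool:
    """Return True with probability p, where 0 <= p <= 1."""
    return rng.random() < p


rng = random.Random(0)
kept = sum(1 for _ in range(10000) if sample(0.3, rng))
print(kept / 10000)
```

Over many points the kept fraction approaches p, so `sample(0.3)` keeps roughly 30% of the data.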
set_measurement()
¶
Function prototype: fn set_measurement(name: str, delete_key: bool = false)
Function description: change the name of the line protocol
Function parameters:
- name: the value used as the measurement name; can be passed in as a string constant or a variable
- delete_key: if there is a tag or field with the same name as the variable in point, delete it
The field mapping relationship between the line protocol name and the various types of data storage or other purposes:
category | field name | other usage |
---|---|---|
custom_object | class | - |
keyevent | - | - |
logging | source | - |
metric | - | metric set name |
network | source | - |
object | class | - |
profiling | source | - |
rum | source | - |
security | rule | - |
tracing | source | - |
set_tag()
¶
Function prototype: fn set_tag(key, value: str)
Function description: Mark the specified field as a tag in the output. After a field is set as a tag, other functions can still operate on the variable. If the key set as a tag is a field that has already been extracted, it no longer appears among the fields, which avoids a clash between an extracted field key and a tag key of the same name on existing data
Function parameters:
- key: the field to be tagged
- value: can be a string literal or a variable
Example:
# in << {"str": "13789123014"}
set_tag(str)
json(_, str) # str == "13789123014"
replace(str, "(1[0-9]{2})[0-9]{4}([0-9]{4})", "$1****$2")
# Extracted data(drop: false, cost: 49.248µs):
# {
# "message": "{\"str\": \"13789123014\"}",
# "str#": "137****3014"
# }
# * The `#` suffix only marks a field as a tag in the output of datakit --pl <path> --txt <str>
# in << {"str_a": "2", "str_b": "3"}
json(_, str_a)
set_tag(str_a, "3") # str_a == 3
# Extracted data(drop: false, cost: 30.069µs):
# {
# "message": "{\"str_a\": \"2\", \"str_b\": \"3\"}",
# "str_a#": "3"
# }
# in << {"str_a": "2", "str_b": "3"}
json(_, str_a)
json(_, str_b)
set_tag(str_a, str_b) # str_a == str_b == "3"
# Extracted data(drop: false, cost: 32.903µs):
# {
# "message": "{\"str_a\": \"2\", \"str_b\": \"3\"}",
# "str_a#": "3",
# "str_b": "3"
# }
slice_string()
¶
Function prototype: fn slice_string(name: str, start: int, end: int) -> str
Function description: Returns the substring of the string from index start to end.
Function Parameters:
- name: The string to be sliced
- start: The starting index of the substring (inclusive)
- end: The ending index of the substring (exclusive)
Example:
sql_cover()
¶
Function prototype: fn sql_cover(sql_test: str)
Function description: Desensitize the SQL statement
Example:
# in << {"select abc from def where x > 3 and y < 5"}
sql_cover(_)
# Extracted data(drop: false, cost: 33.279µs):
# {
# "message": "select abc from def where x > ? and y < ?"
# }
strfmt()
¶
Function prototype: fn strfmt(key, fmt: str, args ...: int|float|bool|str|list|map|nil)
Function description: Format the content of the fields specified by the extracted arg1, arg2, ... according to fmt, and write the formatted content into the key field
Function parameters:
- key: the field name that the formatted data is written to
- fmt: format string template
- args: variadic arguments; can be multiple extracted field names to be formatted
Example:
# Data to be processed: {"a":{"first":2.3,"second":2,"third":"abc","forth":true},"age":47}
# process script
json(_, a.second)
json(_, a.third)
cast(a.second, "int")
json(_, a.forth)
strfmt(bb, "%v %s %v", a.second, a.third, a.forth)
timestamp()
¶
Function prototype: fn timestamp(precision: str = "ns") -> int
Function description: Returns the current Unix timestamp; the default precision is ns
Function parameters:
- precision: timestamp precision; valid values are "ns", "us", "ms", "s"; the default is "ns".
Example:
# process script
add_key(time_now_record, timestamp())
datetime(time_now_record, "ns",
"%Y-%m-%d %H:%M:%S", "UTC")
# process result
{
"time_now_record": "2023-03-07 10:41:12"
}
# process script
add_key(time_now_record, timestamp())
datetime(time_now_record, "ns",
"%Y-%m-%d %H:%M:%S", "Asia/Shanghai")
# process result
{
"time_now_record": "2023-03-07 18:41:49"
}
# process script
add_key(time_now_record, timestamp("ms"))
# process result
{
"time_now_record": 1678185980578
}
trim()
¶
Function prototype: fn trim(key, cutset: str = "")
Function description: Delete the specified characters from the beginning and end of the key; when cutset is an empty string, all whitespace characters are deleted by default
Function parameters:
- key: a field that has been extracted, string type
- cutset: the characters to delete from the beginning and end of the key
Example:
# Data to be processed: "trim(key, cutset)"
# process script
add_key(test_data, "ACCAA_test_DataA_ACBA")
trim(test_data, "ABC_")
# process result
{
"test_data": "test_Data"
}
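Note that cutset is treated as a set of characters, not as a literal prefix or suffix. Python's `str.strip` has the same set-based semantics, so the example above can be reproduced with the short sketch below, which is an illustration and not the Pipeline implementation.

```python
def trim(value: str, cutset: str = "") -> str:
    """Strip characters in cutset from both ends; strip whitespace if cutset is empty."""
    return value.strip(cutset) if cutset else value.strip()


print(trim("ACCAA_test_DataA_ACBA", "ABC_"))
print(trim("  padded  "))
```

Stripping stops at the first character on each side that is not in the set, which is why the lowercase letters and the inner underscore survive.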
uppercase()
¶
Function prototype: fn uppercase(key: str)
Function description: Convert the content in the extracted key to uppercase
Function parameters:
- key: the extracted field name whose content is converted to uppercase
Example:
# Data to be processed: {"first": "hello","second":2,"third":"aBC","forth":true}
# process script
json(_, first) uppercase(first)
# process result
{
"first": "HELLO"
}
url_decode()
¶
Function prototype: fn url_decode(key: str)
Function description: Decode the URL in the extracted key into plain text
Function parameters:
- key: a key that has been extracted
Example:
# Data to be processed: {"url":"http%3a%2f%2fwww.baidu.com%2fs%3fwd%3d%e6%b5%8b%e8%af%95"}
# process script
json(_, url) url_decode(url)
# process result
{
"message": "{\"url\":\"http%3a%2f%2fwww.baidu.com%2fs%3fwd%3d%e6%b5%8b%e8%af%95\"}",
"url": "http://www.baidu.com/s?wd=测试"
}
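Percent-decoding of the sample URL can be checked with Python's standard library. The sketch below uses `urllib.parse.unquote`, which decodes `%XX` sequences as UTF-8 and leaves `+` untouched. Whether Pipeline's url_decode() treats `+` as a space is not stated in this document.

```python
from urllib.parse import unquote


def url_decode(value: str) -> str:
    """Decode %XX escape sequences in value as UTF-8 text."""
    return unquote(value)


encoded = "http%3a%2f%2fwww.baidu.com%2fs%3fwd%3d%e6%b5%8b%e8%af%95"
print(url_decode(encoded))
```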
url_parse()
¶
Function prototype: fn url_parse(key)
Function description: parse the url whose field name is key.
Function parameters:
key
: field name of the url to parse.
Example:
# Data to be processed: {"url": "https://www.baidu.com"}
# process script
json(_, url)
m = url_parse(url)
add_key(scheme, m["scheme"])
# process result
{
"url": "https://www.baidu.com",
"scheme": "https"
}
The above example extracts the scheme from the URL. It can also extract information such as the host, port, path, and the parameters carried in the URL, as shown in the following example:
# Data to be processed: {"url": "https://www.google.com/search?q=abc&sclient=gws-wiz"}
# process script
json(_, url)
m = url_parse(url)
add_key(sclient, m["params"]["sclient"]) # The parameters carried in the url are saved under the params field
add_key(h, m["host"])
add_key(path, m["path"])
# process result
{
"url": "https://www.google.com/search?q=abc&sclient=gws-wiz",
"h": "www.google.com",
"path": "/search",
"sclient": "gws-wiz"
}
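The shape of the map used in the examples above can be mimicked with Python's `urllib.parse`. This is a sketch under assumptions: only the keys `scheme`, `host`, `path`, and `params` appear in the examples, the `port` key name is a guess, and keeping only the first value of each query parameter is a simplification.

```python
from urllib.parse import parse_qs, urlsplit


def url_parse(url: str) -> dict:
    """Split a URL into a dict resembling the map used in the examples above."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname or "",
        "port": "" if parts.port is None else str(parts.port),  # key name assumed
        "path": parts.path,
        # Simplification: keep only the first value of each query parameter.
        "params": {k: v[0] for k, v in parse_qs(parts.query).items()},
    }


m = url_parse("https://www.google.com/search?q=abc&sclient=gws-wiz")
print(m["scheme"], m["host"], m["path"], m["params"]["sclient"])
```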
use()
¶
Function prototype: fn use(name: str)
Function description: call other scripts, all current data can be accessed in the called script
Function parameters:
- name: script name, such as abp.p
Example:
# Data to be processed: {"ip":"1.2.3.4"}
# Process script a.p
use("b.p")
# Process script b.p
json(_, ip)
geoip(ip)
# Execute the processing result of script a.p
{
"city" : "Brisbane",
"country" : "AU",
"ip" : "1.2.3.4",
"province" : "Queensland",
"isp" : "unknown",
"message" : "{\"ip\": \"1.2.3.4\"}"
}
user_agent()
¶
Function prototype: fn user_agent(key: str)
Function description: Obtain client information on the specified field
Function parameters:
key
: the field to be extracted
user_agent() will generate multiple fields, such as:
- os: operating system
- browser: browser
Example:
# data to be processed
# {
# "userAgent" : "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36",
# "second" : 2,
# "third" : "abc",
# "forth" : true
# }
json(_, userAgent) user_agent(userAgent)
valid_json()
¶
Function prototype: fn valid_json(val: str) bool
Function description: Determine if it is a valid JSON string.
Function parameters:
val
: Requires data of type string.
Example:
a = "null"
if valid_json(a) { # true
if load_json(a) == nil {
add_key("a", "nil")
}
}
b = "[1, 2, 3]"
if valid_json(b) { # true
add_key("b", load_json(b))
}
c = "{\"a\": 1}"
if valid_json(c) { # true
add_key("c", load_json(c))
}
d = "???{\"d\": 1}"
if valid_json(d) { # false
add_key("d", load_json(d))
} else {
add_key("d", "invalid json")
}
Result:
value_type()
¶
Function prototype: fn value_type(val) str
Function description: Obtain the type of the variable's value; the return value is one of ["int", "float", "bool", "str", "list", "map", ""]. If the value is nil, return an empty string.
Function parameters:
val
: The value of the type to be determined.
Example:
Input:
Script:
Output:
// Fields
{
"message": "{\"a\":{\"first\": [2.2, 1.1], \"ff\": \"[2.2, 1.1]\",\"second\":2,\"third\":\"aBC\",\"forth\":true},\"age\":47}",
"val_type": "map"
}
window_hit()
¶
Function prototype: fn window_hit()
Function description: Trigger recovery of the discarded context data, restoring it from the data recorded by the point_window function.
Function parameters: None
Example:
# It is recommended to place it in the first line of the script
#
point_window(8, 8)
# If it is a panic log, keep the first 8 entries
# and the last 8 entries (including the current one)
#
if grok(_, "abc.go:25 panic: xxxxxx") {
# This function will only take effect if point_window() is executed during this run.
# Trigger data recovery behavior within the window
#
window_hit()
}
# By default, all logs whose service is test_app are discarded;
# If it contains panic logs, keep the 15 adjacent ones and the current one.
#
if service == "test_app" {
drop()
}
xml()
¶
Function prototype: fn xml(input: str, xpath_expr: str, key_name)
Function description: Extract fields from XML through xpath expressions.
Function parameters:
- input: XML to extract
- xpath_expr: xpath expression
- key_name: The extracted data is written to a new key
Example one:
# data to be processed
<entry>
<fieldx>valuex</fieldx>
<fieldy>...</fieldy>
<fieldz>...</fieldz>
<fieldarray>
<fielda>element_a_1</fielda>
<fielda>element_a_2</fielda>
</fieldarray>
</entry>
# process script
xml(_, '/entry/fieldarray//fielda[1]/text()', field_a_1)
# process result
{
"field_a_1": "element_a_1", # extracted element_a_1
"message": "\t\t\u003centry\u003e\n \u003cfieldx\u003evaluex\u003c/fieldx\u003e\n \u003cfieldy\u003e...\u003c/fieldy\u003e\n \u003cfieldz\u003e...\u003c/fieldz\u003e\n \u003cfieldarray\u003e\n \u003cfielda\u003eelement_a_1\u003c/fielda\u003e\n \u003cfielda\u003eelement_a_2\u003c/fielda\u003e\n \u003c/fieldarray\u003e\n\u003c/entry\u003e",
"status": "unknown",
"time": 1655522989104916000
}
Example two:
# data to be processed
<OrderEvent actionCode = "5">
<OrderNumber>ORD12345</OrderNumber>
<VendorNumber>V11111</VendorNumber>
</OrderEvent>
# process script
xml(_, '/OrderEvent/@actionCode', action_code)
xml(_, '/OrderEvent/OrderNumber/text()', OrderNumber)
# process result
{
"OrderNumber": "ORD12345",
"action_code": "5",
"message": "\u003cOrderEvent actionCode = \"5\"\u003e\n \u003cOrderNumber\u003eORD12345\u003c/OrderNumber\u003e\n \u003cVendorNumber\u003eV11111\u003c/VendorNumber\u003e\n\u003c/OrderEvent\u003e",
"status": "unknown",
"time": 1655523193632471000
}