Built-in Function¶

Function parameter description:

In function arguments, the anonymous argument (_) refers to the original input text data.
A JSON path is written directly as x.y.z, without any other decoration. For example, given {"a":{"first":2.3, "second":2, "third":"abc", "forth":true}, "age":47}, the JSON path a.third refers to the value abc.
The relative order of function arguments is fixed, and the engine checks it.
All key parameters mentioned below refer to keys generated by the initial extraction (via grok() or json()).
A JSON path to be processed must be written as an identifier, not as a string. When generating a new key, use a string.

Function List¶

`add_key()`¶

Function prototype: fn add_key(key, value)

Function description: Add a key to the point

Function parameters:

key: key name
value: key value

Example:

# input: {"age": 17, "name": "zhangsan", "height": 180}

# script
add_key(city, "shanghai")

# result
{
    "age": 17,
    "height": 180,
    "name": "zhangsan",
    "city": "shanghai"
}

`add_pattern()`¶

Function prototype: fn add_pattern(name: str, pattern: str)

Function description: Create a custom grok pattern. Grok patterns are scoped: for example, each branch of an if/else statement opens a new scope, and a pattern added there is valid only within that scope. This function cannot overwrite a grok pattern that already exists in the same scope or in an enclosing scope

Function parameters:

name: pattern name
pattern: custom pattern content

Example:

# input data: "11,abc,end1", "22,abc,end1", "33,abc,end3"

# script
add_pattern("aa", "\\d{2}")
grok(_, "%{aa:aa}")
if false {

} else {
    add_pattern("bb", "[a-z]{3}")
    if aa == "11" {
        add_pattern("cc", "end1")
        grok(_, "%{aa:aa},%{bb:bb},%{cc:cc}")
    } elif aa == "22" {
        # Using pattern cc here will cause compilation failure: no pattern found for %{cc}
        grok(_, "%{aa:aa},%{bb:bb},%{INT:cc}")
    } elif aa == "33" {
        add_pattern("bb", "[\\d]{5}") # Overwriting bb here fails
        add_pattern("cc", "end3")
        grok(_, "%{aa:aa},%{bb:bb},%{cc:cc}")
    }
}

# result
{
    "aa":      "11",
    "bb":      "abc",
    "cc":      "end1",
    "message": "11,abc,end1"
}
{
    "aa":      "22",
    "message": "22,abc,end1"
}
{
    "aa":      "33",
    "bb":      "abc",
    "cc":      "end3",
    "message": "33,abc,end3"
}

`adjust_timezone()`¶

Function prototype: fn adjust_timezone(key: int, minute: int)

Function parameters:

key: Nanosecond timestamp, such as the timestamp obtained by the default_time(time) function
minute: The number of minutes (integer) by which the returned time may exceed the current time; the value range is [0, 15], and the default value is 2

Function description: Adjust the incoming timestamp so that its difference from the timestamp at function execution time falls within (-60+minute, minute] minutes. The function is not suitable for data whose real time difference lies outside this range; such data will end up with a wrong time. Calculation process:

Shift the value of key by whole hours so that it falls within the current hour
Calculate the difference between the two times at this point. Both minute values lie in [0, 60), so the difference lies in (-60, 60)
If the difference is less than or equal to -60 + minute, add 1 hour; if the difference is greater than minute, subtract 1 hour
With the default minute of 2, the allowed difference range is (-58, 2]. If the current time is 11:10 and the log time is 3:12:00.001, the final result is 10:12:00.001; if the current time is 11:59:1.000 and the log time is 3:01:1.000, the final result is 12:01:1.000
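
The calculation steps above can be sketched in Python. This is only an illustration of the described arithmetic, not the engine's implementation; the helper names adjust_to_current_hour and clock are hypothetical, and timestamps are assumed to be plain nanosecond integers.

```python
HOUR_NS = 3_600 * 1_000_000_000
MINUTE_NS = 60 * 1_000_000_000


def adjust_to_current_hour(log_ts_ns: int, now_ts_ns: int, minute: int = 2) -> int:
    """Hypothetical helper: shift log_ts_ns by whole hours toward now_ts_ns."""
    allow_ns = minute * MINUTE_NS
    # Step 1: move the log timestamp into the same hour as the current time.
    log_ts_ns -= (log_ts_ns // HOUR_NS - now_ts_ns // HOUR_NS) * HOUR_NS
    # Step 2: the difference now lies strictly between -60 and 60 minutes.
    diff = log_ts_ns - now_ts_ns
    # Step 3: pull the result back into (-60 + minute, minute] minutes.
    if diff > allow_ns:
        log_ts_ns -= HOUR_NS
    elif diff <= allow_ns - HOUR_NS:
        log_ts_ns += HOUR_NS
    return log_ts_ns


def clock(h: int, m: int, s: int = 0, ms: int = 0) -> int:
    """Nanoseconds since midnight for a wall-clock time (illustration only)."""
    return ((h * 60 + m) * 60 + s) * 1_000_000_000 + ms * 1_000_000


# The two cases from the text: 3:12:00.001 seen at 11:10, and 3:01:01 seen at 11:59:01.
print(adjust_to_current_hour(clock(3, 12, 0, 1), clock(11, 10)) == clock(10, 12, 0, 1))
print(adjust_to_current_hour(clock(3, 1, 1), clock(11, 59, 1)) == clock(12, 1, 1))
```

Because only whole hours are added or removed, the minutes, seconds and sub-second part of the log time are always preserved.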

Example:

# input data 1 
{
    "time":"11 Jul 2022 12:49:20.937", 
    "second":2,
    "third":"abc",
    "forth":true
}

Script:

json(_, time)      # Extract the time field (if the time zone in the container is UTC+0000)
default_time(time) # Convert the extracted time field into a timestamp
                   # (Use local time zone UTC+0800/UTC+0900... parsing for data without time zone)
adjust_timezone(time)
                   # Automatically (re)select time zone, calibrate time offset

Execute datakit pipeline -P <name>.p -F <input_file_name> --date:

# output 1
{
  "message": "{\n    \"time\":\"11 Jul 2022 12:49:20.937\",\n    \"second\":2,\n    \"third\":\"abc\",\n    \"forth\":true\n}",
  "status": "unknown",
  "time": "2022-07-11T20:49:20.937+08:00"
}

local time: 2022-07-11T20:55:10.521+08:00

Using only default_time, which parses the time in the default local time zone (UTC+8), gives:

Output result of input 1: 2022-07-11T12:49:20.937+08:00

After also using adjust_timezone, the result is:

Output result of input 1: 2022-07-11T20:49:20.937+08:00

`agg_create()`¶

Function prototype: fn agg_create(bucket: str, on_interval: str = "60s", on_count: int = 0, keep_value: bool = false, const_tags: map[string]string = nil, category: str = "M")

Function description: Create a measurement for aggregation. The aggregation period is set either by time through on_interval or by number of points through on_count. The aggregated data is uploaded when a period completes, and you can choose whether to keep the last aggregated data. This function does not work with central Pipeline.

Function parameters:

bucket: String type, used as an aggregated field. If the bucket has already been created, the function does nothing.
on_interval: The default value is 60s. Uses time as the aggregation period, in seconds; the parameter takes effect when its value is greater than 0. on_interval and on_count cannot both be less than or equal to 0.
on_count: The default value is 0. Uses the number of processed points as the aggregation period; the parameter takes effect when its value is greater than 0.
keep_value: The default value is false.
const_tags: Custom tags, empty by default.
category: Data category of the aggregated data; optional parameter, the default value is "M", which denotes metric data.

Example:

agg_create("cpu_agg_info", on_interval = "30s")

`agg_metric()`¶

Version-1.5.10

Function prototype: fn agg_metric(bucket: str, new_field: str, agg_fn: str, agg_by: []string, agg_field: str, category: str = "M")

Function description: Take the values of the named fields in the input data as the tags of the aggregated data, and store the aggregated data in the corresponding bucket. This function does not work with central Pipeline.

Function parameters:

bucket: String type, the bucket created by the agg_create function. If the bucket has not been created, the function does nothing.
new_field: The name of the field in the aggregated data; its value is of type float.
agg_fn: Aggregation function, one of "avg", "sum", "min", "max", "set".
agg_by: Names of fields in the input data to use as tags of the aggregated data; the values of these fields must be strings.
agg_field: The name of the field in the input data whose value is aggregated.
category: Data category of the aggregated data; optional parameter, the default value is "M", which denotes metric data.

Example:

Take logging category data as an example:

Multiple inputs in a row:

Sample log one: {"a": 1}
Sample log two: {"a": 2}

script:

agg_create("cpu_agg_info", on_interval="10s", const_tags={"tag1":"value_user_define_tag"})

set_tag("tag1", "value1")

field1 = load_json(_)

field1 = field1["a"]

agg_metric("cpu_agg_info", "agg_field_1", "sum", ["tag1", "host"], "field1")

metric output:

{
    "host": "your_hostname",
    "tag1": "value1",
    "agg_field_1": 3
}

`append()`¶

Function prototype: fn append(arr, elem) arr

Function description: Add the element elem to the end of the array arr.

Function parameters:

arr: array
elem: element being added.

Example:

# Example 1
abc = ["1", "2"]
abc = append(abc, 5.1)
# abc = ["1", "2", 5.1]

# Example 2
a = [1, 2]
b = [3, 4]
c = append(a, b)
# c = [1, 2, [3, 4]]

`b64dec()`¶

Function prototype: fn b64dec(key: str)

Function description: Base64 decodes the string data obtained on the specified field

Function parameters:

key: fields to extract

Example:

# input data {"str": "aGVsbG8sIHdvcmxk"}
json(_, `str`)
b64dec(`str`)

# result
# {
#   "str": "hello, world"
# }

`b64enc()`¶

Function prototype: fn b64enc(key: str)

Function description: Base64 encode the string data obtained on the specified field

Function parameters:

key: key name

Example:

# input data {"str": "hello, world"}
json(_, `str`)
b64enc(`str`)

# result
# {
#   "str": "aGVsbG8sIHdvcmxk"
# }

`cache_get()`¶

Function prototype: fn cache_get(key: str) nil|str

Function description: Given a key, cache_get() gets the corresponding value from the cache

Function parameters:

key: key

Example:

a = cache_get("a")
add_key(abc, a)

`cache_set()`¶

Function prototype: fn cache_set(key: str, value: str, expiration: int) nil

Function description: Save a key-value pair to the cache

Function parameters:

key: key (required)
value: value (required)
expiration: expiration time (default: 100s)

Example:

a = cache_set("a", "123")
a = cache_get("a")
add_key(abc, a)

`cast()`¶

Function prototype: fn cast(key, dst_type: str)

Function description: Convert the key value to the specified type

Function parameters:

key: key name
dst_type: The target type of the conversion; supports "str", "float", "int", and "bool"

Example:

# input data: {"first": 1,"second":2,"third":"aBC","forth":true}

# script
json(_, first) 
cast(first, "str")

# result
{
  "first": "1"
}

`cidr()`¶

Function prototype: fn cidr(ip: str, prefix: str) bool

Function description: Determine whether the IP is in a CIDR block

Function parameters:

ip: IP address
prefix: IP prefix, such as 192.0.2.1/24

Example:

# script
ip = "192.0.2.233"
if cidr(ip, "192.0.2.1/24") {
    add_key(ip_prefix, "192.0.2.1/24")
}

# result
{
  "ip_prefix": "192.0.2.1/24"
}
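
The membership check in the example above can be reproduced with Python's standard ipaddress module. This is a sketch for cross-checking a prefix outside the Pipeline, not the engine's code; the helper name in_cidr is hypothetical, and strict=False is an assumption needed because the prefix 192.0.2.1/24 has host bits set.

```python
import ipaddress


def in_cidr(ip: str, prefix: str) -> bool:
    """Hypothetical helper: report whether ip lies inside the CIDR block prefix."""
    # strict=False accepts prefixes such as 192.0.2.1/24 whose host bits are not zero.
    network = ipaddress.ip_network(prefix, strict=False)
    return ipaddress.ip_address(ip) in network


print(in_cidr("192.0.2.233", "192.0.2.1/24"))
print(in_cidr("192.0.3.1", "192.0.2.1/24"))
```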

`conv_traceid_w3c_to_dd()`¶

Function prototype: fn conv_traceid_w3c_to_dd(key)

Function description: Convert a hex-encoded 128-bit or 64-bit W3C Trace ID string (32 or 16 characters long) to a decimal-encoded 64-bit DataDog Trace ID string.

Function parameters:

key: 128-bit/64-bit Trace ID to convert

Example:

# script input:

"18962fdd9eea517f2ae0771ea69d6e16"

# script:

grok(_, "%{NOTSPACE:trace_id}")

conv_traceid_w3c_to_dd(trace_id)

# result:

{
    "trace_id": "3089600317904219670"
}
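
The result above is consistent with keeping the low 64 bits (the last 16 hex characters) of the W3C Trace ID and printing them in decimal. The Python sketch below assumes that rule; the helper name w3c_to_dd_trace_id is hypothetical and the engine's own implementation is not shown in this document.

```python
def w3c_to_dd_trace_id(w3c_hex: str) -> str:
    """Hypothetical helper: low 64 bits of a hex trace ID, as a decimal string."""
    if len(w3c_hex) not in (16, 32):
        raise ValueError("expected a 16- or 32-character hex string")
    # Assumption: only the last 16 hex characters (64 bits) are kept.
    low64 = int(w3c_hex[-16:], 16)
    return str(low64)


print(w3c_to_dd_trace_id("18962fdd9eea517f2ae0771ea69d6e16"))
```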

`cover()`¶

Function prototype: fn cover(key: str, range: list)

Function description: Desensitize (mask) the string data in the specified field over the given index range

Function parameters:

key: Key name
range: The index range of the string to desensitize ([start, end]). Both start and end support negative subscripts, which count back from the end of the string. The interval must be valid. If end is greater than the length of the string, it defaults to the length of the string

Example:

# input data {"str": "13789123014"}
json(_, `str`)
cover(`str`, [8, 9])

# input data {"abc": "13789123014"}
json(_, abc)
cover(abc, [2, 4])

`create_point()`¶

Function prototype: fn create_point(name, tags, fields, ts = 0, category = "M", after_use = "")

Function description: Create new data and output. This function does not work with central Pipeline.

Function parameters:

name: point name, which is regarded as the name of the metric set, log source, etc.
tags: data tags
fields: data fields
ts: optional parameter, unix nanosecond timestamp, defaults to current time
category: optional parameter, data category, supports category name and name abbreviation, such as metric category can be filled with M or metric, log is L or logging
after_use: optional parameter; after the point is created, the specified pl script is executed on it. If the original data category is L and the created data category is M, the script that runs is still one under the L category

Example:

# input
'''
{"a": "b"}
'''
fields = load_json(_)
create_point("name_pt", {"a": "b"}, fields)

`datetime()`¶

Function prototype: fn datetime(key, precision: str, fmt: str, tz: str = "")

Function description: Convert timestamp to specified date format

Function parameters:

key: Extracted timestamp (required parameter)
precision: Input timestamp precision (s, ms, us, ns)
fmt: date format, provides built-in date format and supports custom date format
tz: time zone (optional parameter); converts the timestamp to the time in the specified time zone. Defaults to the time zone of the host

Built-in date formats:

Built-in format	date	description
"ANSI-C"	"Mon Jan _2 15:04:05 2006"
"UnixDate"	"Mon Jan _2 15:04:05 MST 2006"
"RubyDate"	"Mon Jan 02 15:04:05 -0700 2006"
"RFC822"	"02 Jan 06 15:04 MST"
"RFC822Z"	"02 Jan 06 15:04 -0700"	RFC822 with numeric zone
"RFC850"	"Monday, 02-Jan-06 15:04:05 MST"
"RFC1123"	"Mon, 02 Jan 2006 15:04:05 MST"
"RFC1123Z"	"Mon, 02 Jan 2006 15:04:05 -0700"	RFC1123 with numeric zone
"RFC3339"	"2006-01-02T15:04:05Z07:00"
"RFC3339Nano"	"2006-01-02T15:04:05.999999999Z07:00"
"Kitchen"	"3:04PM"

Custom date format:

The output date format can be customized through the combination of placeholders

character	example	description
a	%a	abbreviated weekday name, such as `Wed`
A	%A	full weekday name, such as `Wednesday`
b	%b	abbreviated month name, such as `Mar`
B	%B	full month name, such as `March`
C	%C	century, the current year divided by 100
d	%d	day of the month; range `[01, 31]`
e	%e	day of the month; range `[1, 31]`, pad with spaces
H	%H	hour, using 24-hour clock; range `[00, 23]`
I	%I	hour, using 12-hour clock; range `[01, 12]`
j	%j	day of the year, range `[001, 366]`
k	%k	hour, using 24-hour clock; range `[0, 23]`
l	%l	hour, using 12-hour clock; range `[1, 12]`, padding with spaces
m	%m	month, range `[01, 12]`
M	%M	minutes, range `[00, 59]`
n	%n	represents a newline character `\n`
p	%p	`AM` or `PM`
P	%P	`am` or `pm`
s	%s	seconds since 1970-01-01 00:00:00 UTC
S	%S	seconds, range `[00, 60]`
t	%t	represents the tab character `\t`
u	%u	day of the week, Monday is 1, range `[1, 7]`
w	%w	day of the week, 0 for Sunday, range `[0, 6]`
y	%y	year in range `[00, 99]`
Y	%Y	decimal representation of the year
z	%z	RFC 822/ISO 8601:1988 style time zone (e.g. `-0600` or `+0800` etc.)
Z	%Z	time zone abbreviation, such as `CST`
%	%%	represents the character `%`

Example:

# input data:
#    {
#        "a":{
#            "timestamp": "1610960605000",
#            "second":2
#        },
#        "age":47
#    }

# script
json(_, a.timestamp)
datetime(a.timestamp, 'ms', 'RFC3339')

# script
ts = timestamp()
datetime(ts, 'ns', fmt='%Y-%m-%d %H:%M:%S', tz="UTC")

# output
{
  "ts": "2023-03-08 06:43:39"
}

# script
ts = timestamp()
datetime(ts, 'ns', '%m/%d/%y  %H:%M:%S %z', "Asia/Tokyo")

# output
{
  "ts": "03/08/23  15:44:59 +0900"
}
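
For checking what a given timestamp and placeholder combination should produce, a rough Python equivalent can help. The sketch below is not the Pipeline implementation: the helper name format_timestamp is hypothetical, it always formats in UTC, it discards the sub-second part, and Python's strftime supports only some of the placeholders in the table above (for example %k, %l and %P are platform-dependent).

```python
from datetime import datetime, timezone

# Number of timestamp units per second for each supported precision.
UNITS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def format_timestamp(ts: int, precision: str, fmt: str) -> str:
    """Hypothetical helper: format an integer timestamp in UTC with strftime."""
    seconds = ts // UNITS_PER_SECOND[precision]
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)


# The millisecond timestamp from the first example above.
print(format_timestamp(1610960605000, "ms", "%Y-%m-%d %H:%M:%S"))
```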

`decode()`¶

Function prototype: fn decode(text: str, text_encode: str)

Function description: Convert text to UTF-8 encoding, for cases where the original log is not UTF-8 encoded. The currently supported encodings are utf-16le, utf-16be, gbk, and gb18030 (these encoding names must be written in lowercase)

Example:

decode("wwwwww", "gbk")

# Extracted data(drop: false, cost: 33.279µs):
# {
#   "message": "wwwwww",
# }

`default_time()`¶

Function prototype: fn default_time(key: str, timezone: str = "")

Function description: Use an extracted field as the timestamp of the final data

Function parameters:

key: key name
timezone: Specifies the time zone of the time text to be parsed; optional parameter, defaults to the current system time zone. Time zone examples: +8, -8, +8:30

The following time formats are supported:

date format	date format	date format	date format
`2014-04-26 17:24:37.3186369`	`May 8, 2009 5:57:51 PM`	`2012-08-03 18:31:59.257000000`	`oct 7, 1970`
`2014-04-26 17:24:37.123`	`oct 7, '70`	`2013-04-01 22:43`	`oct. 7, 1970`
`2013-04-01 22:43:22`	`oct. 7, 70`	`2014-12-16 06:20:00 UTC`	`Mon Jan 2 15:04:05 2006`
`2014-12-16 06:20:00 GMT`	`Mon Jan 2 15:04:05 MST 2006`	`2014-04-26 05:24:37 PM`	`Mon Jan 02 15:04:05 -0700 2006`
`2014-04-26 13:13:43 +0800`	`Monday, 02-Jan-06 15:04:05 MST`	`2014-04-26 13:13:43 +0800 +08`	`Mon, 02 Jan 2006 15:04:05 MST`
`2014-04-26 13:13:44 +09:00`	`Tue, 11 Jul 2017 16:28:13 +0200 (CEST)`	`2012-08-03 18:31:59.257000000 +0000 UTC`	`Mon, 02 Jan 2006 15:04:05 -0700`
`2015-09-30 18:48:56.35272715 +0000 UTC`	`Thu, 4 Jan 2018 17:53:36 +0000`	`2015-02-18 00:12:00 +0000 GMT`	`Mon 30 Sep 2018 09:09:09 PM UTC`
`2015-02-18 00:12:00 +0000 UTC`	`Mon Aug 10 15:44:11 UTC+0100 2015`	`2015-02-08 03:02:00 +0300 MSK m=+0.000000001`	`Thu, 4 Jan 2018 17:53:36 +0000`
`2015-02-08 03:02:00.001 +0300 MSK m=+0.000000001`	`Fri Jul 03 2015 18:04:07 GMT+0100 (GMT Daylight Time)`	`2017-07-19 03:21:51+00:00`	`September 17, 2012 10:09am`
`2014-04-26`	`September 17, 2012 at 10:09am PST-08`	`2014-04`	`September 17, 2012, 10:10:09`
`2014`	`2014:3:31`	`2014-05-11 08:20:13,787`	`2014:03:31`
`3.31.2014`	`2014:4:8 22:05`	`03.31.2014`	`2014:04:08 22:05`
`08.21.71`	`2014:04:2 03:00:51`	`2014.03`	`2014:4:02 03:00:51`
`2014.03.30`	`2012:03:19 10:11:59`	`20140601`	`2012:03:19 10:11:59.3186369`
`20140722105203`	`2014 年 04 月 08 日`	`1332151919`	`2006-01-02T15:04:05+0000`
`1384216367189`	`2009-08-12T22:15:09-07:00`	`1384216367111222`	`2009-08-12T22:15:09`
`1384216367111222333`	`2009-08-12T22:15:09Z`

Example JSON extraction:

# raw json
{
    "time":"06/Jan/2017:16:16:37 +0000",
    "second":2,
    "third":"abc",
    "forth":true
}

# script
json(_, time)      # extract time field
default_time(time) # convert the extracted time field into a timestamp

# result
{
  "time": 1483719397000000000
}

Text extraction example:

# raw log text
# 2021-01-11T17:43:51.887+0800  DEBUG io  io/io.go:458  post cost 6.87021ms

# script
grok(_, '%{TIMESTAMP_ISO8601:log_time}')   # Extract the log time and name the field log_time
default_time(log_time)                     # Convert the extracted log_time field into a timestamp

# result
{
  "log_time": 1610358231887000000
}

# For the data collected by logging, it is better to name the time field as time, otherwise the logging collector will fill it with the current time
rename("time", log_time)

# result
{
  "time": 1610358231887000000
}

`delete()`¶

Function prototype: fn delete(src: map[string]any, key: str)

Function description: Delete the key in the JSON map

# input
# {"a": "b", "b":[0, {"c": "d"}], "e": 1}

# script
j_map = load_json(_)

delete(j_map["b"][-1], "c")

delete(j_map, "a")

add_key("j_map", j_map)

# result:
# {
#   "j_map": "{\"b\":[0,{}],\"e\":1}",
# }

`drop()`¶

Function prototype: fn drop()

Function description: Discard the entire log without uploading

Example:

# in << {"str_a": "2", "str_b": "3"}
json(_, str_a)
if str_a == "2"{
  drop()
  exit()
}
json(_, str_b)

# Extracted data(drop: true, cost: 30.02µs):
# {
#   "message": "{\"str_a\": \"2\", \"str_b\": \"3\"}",
#   "str_a": "2"
# }

`drop_key()`¶

Function prototype: fn drop_key(key)

Function description: Delete key

Function parameters:

key: key to be deleted

Example:

# data = "{\"age\": 17, \"name\": \"zhangsan\", \"height\": 180}"

json(_, age)
json(_, name)
json(_, height)
drop_key(height)

# result
# {
#     "age": 17,
#     "name": "zhangsan"
# }

`drop_origin_data()`¶

Function prototype: fn drop_origin_data()

Function description: Discard the initial text, otherwise the initial text is placed in the message field

Example:

# input data: {"age": 17, "name": "zhangsan", "height": 180}

# delete message field
drop_origin_data()

`duration_precision()`¶

Function prototype: fn duration_precision(key, old_precision: str, new_precision: str)

Function description: Convert the precision of a duration. The current precision and the target precision are specified through the function parameters. Conversion between s, ms, us, and ns is supported.

Example:

# in << {"ts":12345}
json(_, ts)
cast(ts, "int")
duration_precision(ts, "ms", "ns")

# Extracted data(drop: false, cost: 33.279µs):
# {
#   "message": "{\"ts\":12345}",
#   "ts": 12345000000
# }
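
The conversion in this example is a plain change of unit. A minimal Python sketch follows, with a hypothetical helper name convert_duration; truncation toward zero when converting to a coarser unit is an assumption, since the text above does not say how remainders are handled.

```python
# Nanoseconds per unit for each supported precision.
NS_PER_UNIT = {"s": 10**9, "ms": 10**6, "us": 10**3, "ns": 1}


def convert_duration(value: int, old_precision: str, new_precision: str) -> int:
    """Hypothetical helper: convert a non-negative integer duration between units."""
    # Assumption: converting to a coarser unit drops the remainder.
    return value * NS_PER_UNIT[old_precision] // NS_PER_UNIT[new_precision]


# 12345 ms expressed in ns, as in the example above.
print(convert_duration(12345, "ms", "ns"))
```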

`exit()`¶

Function prototype: fn exit()

Function description: End the parsing of the current log. If drop() has not been called, the part already parsed is still output

# in << {"str_a": "2", "str_b": "3"}
json(_, str_a)
if str_a == "2"{
  exit()
}
json(_, str_b)

# Extracted data(drop: false, cost: 48.233µs):
# {
#   "message": "{\"str_a\": \"2\", \"str_b\": \"3\"}",
#   "str_a": "2"
# }

`format_int()`¶

Function prototype: fn format_int(val: int, base: int) str

Function description: Converts a numeric value to a numeric string in the specified base.

Function parameters:

val: The number to be converted.
base: Base, ranging from 2 to 36; when the base is greater than 10, lowercase letters a to z are used to represent values 10 and later.

Example:

# script0
a = 7665324064912355185
b = format_int(a, 16)
if b != "6a60b39fd95aaf71" {
    add_key(abc, b)
} else {
    add_key(abc, "ok")
}

# result
'''
{
    "abc": "ok"
}
'''

# script1
a = "7665324064912355185"
b = format_int(parse_int(a, 10), 16)
if b != "6a60b39fd95aaf71" {
    add_key(abc, b)
} else {
    add_key(abc, "ok")
}

# result
'''
{
    "abc": "ok"
}
'''
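
The base conversion described above (digits 0 to 9, then lowercase a to z) can be sketched in Python for cross-checking expected strings. The function name format_int_sketch is hypothetical and the handling of negative numbers is an assumption; this is not the engine's code.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def format_int_sketch(val: int, base: int) -> str:
    """Hypothetical helper: render val in the given base (2 to 36)."""
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    if val == 0:
        return "0"
    sign = "-" if val < 0 else ""  # assumption: negatives get a leading minus sign
    val = abs(val)
    out = []
    while val:
        val, rem = divmod(val, base)
        out.append(DIGITS[rem])
    return sign + "".join(reversed(out))


# The value used in script0 and script1 above.
print(format_int_sketch(7665324064912355185, 16))
```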

`geoip()`¶

Function prototype: fn geoip(ip: str)

Function description: Add geographic and carrier information for an IP address. geoip() generates additional fields, such as:

isp: carrier
city: city
province: province
country: country

Function parameters:

ip: The extracted IP field; both IPv4 and IPv6 are supported

Example:

# input data: {"ip":"1.2.3.4"}

# script
json(_, ip)
geoip(ip)

# result
{
  "city"     : "Brisbane",
  "country"  : "AU",
  "ip"       : "1.2.3.4",
  "province" : "Queensland",
  "isp"      : "unknown",
  "message"  : "{\"ip\": \"1.2.3.4\"}"
}

`get_key()`¶

Function prototype: fn get_key(key)

Function description: Read the value of key from the input point

Function parameters:

key: key name

Example:

add_key("city", "shanghai")

# Here you can directly access the value of the key with the same name in point through city
if city == "shanghai" {
  add_key("city_1", city)
}

# Due to the right associativity of assignment, get the value whose key is "city" first,
# Then create a variable named city
city = city + " --- ningbo" + " --- " +
    "hangzhou" + " --- suzhou ---" + ""

# get_key gets the value of "city" from point
# If there is a variable named city, it cannot be obtained directly from point
if city != get_key("city") {
  add_key("city_2", city)
}

# result
"""
{
  "city": "shanghai",
  "city_1": "shanghai",
  "city_2": "shanghai --- ningbo --- hangzhou --- suzhou ---"
}
"""

`gjson()`¶

Function prototype: fn gjson(input, json_path: str, newkey: str)

Function description: Extract specified fields from JSON, rename them as new fields, and ensure they are arranged in the original order.

Function parameters:

input: The JSON to be extracted can either be the original text (_) or a specific key after the initial extraction.
json_path: JSON path information
newkey: Write the data to the new key after extraction

# Directly extract the field x.y from the original input JSON and rename it as a new field abc.
gjson(_, "x.y", "abc")

# Extract the x.y field from a previously extracted key, and name the extracted field as x.y.
gjson(key, "x.y")

# Extract arrays, where `key` and `abc` are arrays.
gjson(key, "1.abc.2")

Example 1:

# input data:
# {"info": {"age": 17, "name": "zhangsan", "height": 180}}

# script:
gjson(_, "info", "zhangsan")
gjson(zhangsan, "name")
gjson(zhangsan, "age", "age")

# result:
{
  "age": 17,
  "message": "{\"info\": {\"age\": 17, \"name\": \"zhangsan\", \"height\": 180}}",
  "name": "zhangsan",
  "zhangsan": "{\"age\":17,\"height\":180,\"name\":\"zhangsan\"}"
}

Example 2:

# input data:
#    data = {
#        "name": {"first": "Tom", "last": "Anderson"},
#        "age":37,
#        "children": ["Sara","Alex","Jack"],
#        "fav.movie": "Deer Hunter",
#        "friends": [
#            {"first": "Dale", "last": "Murphy", "age": 44, "nets": ["ig", "fb", "tw"]},
#            {"first": "Roger", "last": "Craig", "age": 68, "nets": ["fb", "tw"]},
#            {"first": "Jane", "last": "Murphy", "age": 47, "nets": ["ig", "tw"]}
#        ]
#    }

# script:
gjson(_, "name")
gjson(name, "first")

Example 3:

# input data:
#    [
#            {"first": "Dale", "last": "Murphy", "age": 44, "nets": ["ig", "fb", "tw"]},
#            {"first": "Roger", "last": "Craig", "age": 68, "nets": ["fb", "tw"]},
#            {"first": "Jane", "last": "Murphy", "age": 47, "nets": ["ig", "tw"]}
#    ]

# scripts for JSON list:
gjson(_, "0.nets.1")

`grok()`¶

Function prototype: fn grok(input: str, pattern: str, trim_space: bool = true) bool

Function description: Extract the contents of the text string input by pattern, and return true when pattern matches input successfully, otherwise return false.

Function parameters:

input: The text to be extracted; can be the original text (_) or a key from the initial extraction
pattern: grok expression; the data type of a key can be specified in the expression: bool, float, int, or string (which corresponds to Pipeline's str and can also be written as str). The default is string
trim_space: Remove leading and trailing whitespace from the extracted text; the default value is true

grok(_, pattern)    # Use the input text directly as the raw data
grok(key, pattern)  # Run grok again on a key that was extracted earlier

Example:

# input data: "12/01/2021 21:13:14.123"

# script
add_pattern("_second", "(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)")
add_pattern("_minute", "(?:[0-5][0-9])")
add_pattern("_hour", "(?:2[0123]|[01]?[0-9])")
add_pattern("time", "([^0-9]?)%{_hour:hour:string}:%{_minute:minute:int}(?::%{_second:second:float})([^0-9]?)")

grok_match_ok = grok(_, "%{DATE_US:date} %{time}")

add_key(grok_match_ok)

# result
{
  "date": "12/01/2021",
  "hour": "21",
  "message": "12/01/2021 21:13:14.123",
  "minute": 13,
  "second": 14.123
}

{
  "date": "12/01/2021",
  "grok_match_ok": true,
  "hour": "21",
  "message": "12/01/2021 21:13:14.123",
  "minute": 13,
  "second": 14.123,
  "status": "unknown",
  "time": 1665994187473917724
}

`group_between()`¶

Function description: If the value of key falls within the specified range between (note: only a single interval, such as [0,100], is supported), a new field can be created and assigned a new value. If no new field is provided, the original field's value is overwritten

Example 1:

# input data: {"http_status": 200, "code": "success"}

json(_, http_status)

# If the field http_status value is within the specified range, change its value to "OK"
group_between(http_status, [200, 300], "OK")

# result
# {
#     "http_status": "OK"
# }

Example 2:

# input data: {"http_status": 200, "code": "success"}

json(_, http_status)

# If the value of the field http_status is within the specified range, create a new status field with the value "OK"
group_between(http_status, [200, 300], "OK", status)

# result
{
    "http_status": 200,
    "status": "OK"
}

`group_in()`¶

Function description: If the value of key appears in the list passed as in, a new field can be created and assigned the new value. If no new field is provided, the original field's value is overwritten

Example:

# If the field log_level value is in the list, change its value to "OK"
group_in(log_level, ["info", "debug"], "OK")

# If the field log_level value is in the specified list, create a new status field with the value "not-ok"
group_in(log_level, ["error", "panic"], "not-ok", status)

`hash()`¶

Function prototype: fn hash(text: str, method: str) -> str

Function description: Calculate the hash of the text

Function parameters:

text: input text
method: Hash algorithm, allowing values including md5, sha1, sha256, sha512

Example:

pt_kvs_set("md5sum", hash("abc", "sha1"))
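
To check a digest outside the Pipeline, the four allowed methods map directly onto Python's standard hashlib. The sketch below assumes the text is hashed as UTF-8 bytes and the digest is returned as a lowercase hex string; the helper name text_hash is hypothetical.

```python
import hashlib

ALLOWED_METHODS = ("md5", "sha1", "sha256", "sha512")


def text_hash(text: str, method: str) -> str:
    """Hypothetical helper: hex digest of text using one of the allowed methods."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported method: {method}")
    # Assumption: the input is encoded as UTF-8 before hashing.
    return hashlib.new(method, text.encode("utf-8")).hexdigest()


# The same arguments as in the example above.
print(text_hash("abc", "sha1"))
```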

`http_request()`¶

Function prototype: fn http_request(method: str, url: str, headers: map, body: any) map

Function description: Send an HTTP request, receive the response, and encapsulate it into a map

Function parameters:

method: GET|POST
url: Request path
headers: Additional headers; the type is map[string]string
body: Request body

Return type: map

The returned map contains the status code (status_code) and the response body (body):

status_code: Status code
body: Response body

Example:

resp = http_request("GET", "http://localhost:8080/testResp")
resp_body = load_json(resp["body"])

add_key(abc, resp["status_code"])
add_key(abc, resp_body["a"])

`json()`¶

Function prototype: fn json(input: str, json_path, newkey, trim_space: bool = true)

Function description: Extract the specified field in JSON and name it as a new field.

Function parameters:

input: The JSON to be extracted; can be the original text (_) or a key from the initial extraction
json_path: JSON path information
newkey: The new key to which the extracted data is written
trim_space: Remove leading and trailing whitespace from the extracted text; the default value is true
delete_after_extract: After extraction, delete the extracted data from input. Only map keys and map values can be deleted; list (array) elements are not supported. The default is `false`.

# Directly extract the x.y field in the original input json, and name it as a new field abc
json(_, x.y, abc)

# For a `key` that has been extracted, extract `x.y` again, and the extracted field name is `x.y`
json(key, x.y)

Example 1:

# input data: 
# {"info": {"age": 17, "name": "zhangsan", "height": 180}}

# script:
json(_, info, "zhangsan")
json(zhangsan, name)
json(zhangsan, age, "age")

# result:
{
  "age": 17,
  "message": "{\"info\": {\"age\": 17, \"name\": \"zhangsan\", \"height\": 180}}",
  "name": "zhangsan",
  "zhangsan": "{\"age\":17,\"height\":180,\"name\":\"zhangsan\"}"
}

Example 2:

# input data:
#    data = {
#        "name": {"first": "Tom", "last": "Anderson"},
#        "age":37,
#        "children": ["Sara","Alex","Jack"],
#        "fav.movie": "Deer Hunter",
#        "friends": [
#            {"first": "Dale", "last": "Murphy", "age": 44, "nets": ["ig", "fb", "tw"]},
#            {"first": "Roger", "last": "Craig", "age": 68, "nets": ["fb", "tw"]},
#            {"first": "Jane", "last": "Murphy", "age": 47, "nets": ["ig", "tw"]}
#        ]
#    }

# script:
json(_, name) 
json(name, first)

Example 3:

# input data:
#    [
#            {"first": "Dale", "last": "Murphy", "age": 44, "nets": ["ig", "fb", "tw"]},
#            {"first": "Roger", "last": "Craig", "age": 68, "nets": ["fb", "tw"]},
#            {"first": "Jane", "last": "Murphy", "age": 47, "nets": ["ig", "tw"]}
#    ]

# script:
json(_, .[0].nets[-1])

Example 4:

# input data:
{"item": " not_space ", "item2":{"item3": [123]}}

# script:
json(_, item2.item3, item, delete_after_extract = true)

# result:
{
  "item": "[123]",
  "message": "{\"item\":\" not_space \",\"item2\":{}}"
}

Example 5:

# input data:
{"item": " not_space ", "item2":{"item3": [123]}}

# If you try to remove a list element it will fail the script check.
# Script:
json(_, item2.item3[0], item, delete_after_extract = true)


# test command:
# datakit pipeline -P j2.p -T '{"item": " not_space ", "item2":{"item3": [123]}}'
# report error:
# [E] j2.p:1:54: does not support deleting elements in the list

`kv_split()`¶

Function prototype: fn kv_split(key, field_split_pattern = " ", value_split_pattern = "=", trim_key = "", trim_value = "", include_keys = [], prefix = "") -> bool

Function description: extract all key-value pairs from a string

Function parameters:

key: key name
include_keys: list of key names; only keys in this list are extracted. The default value is [], which extracts no keys
field_split_pattern: a regular expression used to split the string into key-value pairs; the default value is " "
value_split_pattern: used to split the key from the value in each key-value pair string, non-recursive; the default value is "="
trim_key: remove all of the specified characters from the start and end of the extracted key; the default value is ""
trim_value: remove all of the specified characters from the start and end of the extracted value; the default value is ""
prefix: a prefix added to all keys

Example:

# input: "a=1, b=2 c=3"
kv_split(_)

'''output:
{
  "message": "a=1, b=2 c=3",
  "status": "unknown",
  "time": 1679558730846377132
}
'''

# input: "a=1, b=2 c=3"
kv_split(_, include_keys=["a", "c", "b"])

'''output:
{
  "a": "1,",
  "b": "2",
  "c": "3",
  "message": "a=1, b=2 c=3",
  "status": "unknown",
  "time": 1678087119072769560
}
'''

# input: "a=1, b=2 c=3"
kv_split(_, trim_value=",", include_keys=["a", "c", "b"])

'''output:
{
  "a": "1",
  "b": "2",
  "c": "3",
  "message": "a=1, b=2 c=3",
  "status": "unknown",
  "time": 1678087173651846101
}
'''

# input: "a=1, b=2 c=3"
kv_split(_, trim_value=",", include_keys=["a", "c"])

'''output:
{
  "a": "1",
  "c": "3",
  "message": "a=1, b=2 c=3",
  "status": "unknown",
  "time": 1678087514906492912
}
'''

# input: "a::1,+b::2+c::3" 
kv_split(_, field_split_pattern="\\+", value_split_pattern="[:]{2}",
    prefix="with_prefix_",trim_value=",", trim_key="a", include_keys=["a", "b", "c"])

'''output:
{
  "message": "a::1,+b::2+c::3",
  "status": "unknown",
  "time": 1678087473255241547,
  "with_prefix_b": "2",
  "with_prefix_c": "3"
}
'''
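
The behaviour shown in the outputs above can be approximated in a few lines of Python. This is a sketch under assumptions, not the engine's implementation: it assumes both split patterns are regular expressions, that trimming happens before the include_keys check, and that a key left empty after trimming is skipped (which matches the last example, where trim_key="a" removes key a).

```python
import re


def kv_split(text, field_split_pattern=" ", value_split_pattern="=",
             trim_key="", trim_value="", include_keys=(), prefix=""):
    """Return the extracted pairs as a dict (illustrative only)."""
    result = {}
    for field in re.split(field_split_pattern, text):
        # Non-recursive: split each field once into key and value.
        parts = re.split(value_split_pattern, field, maxsplit=1)
        if len(parts) != 2:
            continue
        key, value = parts
        if trim_key:
            key = key.strip(trim_key)
        if trim_value:
            value = value.strip(trim_value)
        # Assumption: empty keys and keys outside include_keys are dropped.
        if not key or key not in include_keys:
            continue
        result[prefix + key] = value
    return result


print(kv_split("a=1, b=2 c=3", trim_value=",", include_keys=["a", "c"]))
```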

`len()`¶

Function prototype: fn len(val: str|map|list) int

Function description: Return the number of bytes in a string, or the number of elements in a map or list.

Function parameters:

val: Can be map, list or string

Example:

# example 1
add_key(abc, len("abc"))
# result
{
  "abc": 3
}

# example 2
add_key(abc, len(["abc"]))
# result
{
  "abc": 1
}

`load_json()`¶

Function prototype: fn load_json(val: str) nil|bool|float|map|list

Function description: Convert a JSON string to one of map, list, nil, bool, or float; the value can be read and modified through index expressions. If deserialization fails, the function returns nil instead of terminating the script run.

Function parameters:

val: Requires data of type string.

Example:

# _: {"a":{"first": [2.2, 1.1], "ff": "[2.2, 1.1]","second":2,"third":"aBC","forth":true},"age":47}
abc = load_json(_)

add_key(abc, abc["a"]["first"][-1])

abc["a"]["first"][-1] = 11

# The modified value only exists in the local variable; write it to the point with add_key
add_key(abc, abc["a"]["first"][-1])

add_key(len_abc, len(abc))

add_key(len_abc, len(load_json(abc["a"]["ff"])))

`lowercase()`¶

Function prototype: fn lowercase(key: str)

Function description: Convert the content of the extracted key to lowercase

Function parameters:

key: Specify the extracted field name to be converted

Example:

# input data: {"first": "HeLLo","second":2,"third":"aBC","forth":true}

# script
json(_, first) lowercase(first)

# result
{
    "first": "hello"
}

`match()`¶

Function prototype: fn match(pattern: str, s: str) bool

Function description: Use the specified regular expression to match the string, return true if the match is successful, otherwise return false

Function parameters:

pattern: regular expression
s: string to match

Example:

# script
test_1 = "pattern 1,a"
test_2 = "pattern -1,"

add_key(match_1, match('''\w+\s[,\w]+''', test_1)) 

add_key(match_2, match('''\w+\s[,\w]+''', test_2)) 

# result
{
    "match_1": true,
    "match_2": false
}
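
The two results can be reproduced with Python's re module. This sketch assumes match() performs an unanchored search (the pattern may match anywhere in the string); the Python function name is reused here only for readability.

```python
import re


def match(pattern, s):
    # Assumption: unanchored search, like a "contains a match" test.
    return re.search(pattern, s) is not None


pattern = r"\w+\s[,\w]+"
match_1 = match(pattern, "pattern 1,a")   # "pattern 1,a" satisfies the pattern
match_2 = match(pattern, "pattern -1,")   # "-" after the space breaks the match
print(match_1, match_2)
```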

`mquery_refer_table()`¶

Function prototype: fn mquery_refer_table(table_name: str, keys: list, values: list)

Function description: Query the external reference table by specifying multiple keys, and append all columns of the first row of the query result to field. This function does not work with central Pipeline.

Function parameters:

table_name: the name of the table to be looked up
keys: a list of multiple column names
values: the values corresponding to each column

Example:

json(_, table)
json(_, key)
json(_, value)

# Query and append the data of the current column, which is added to the data as a field by default
mquery_refer_table(table, values=[value, false], keys=[key, "col4"])

# result

# {
#   "col": "ab",
#   "col2": 1234,
#   "col3": 1235,
#   "col4": false,
#   "key": "col2",
#   "message": "{\"table\": \"table_abc\", \"key\": \"col2\", \"value\": 1234.0}",
#   "status": "unknown",
#   "table": "table_abc",
#   "time": "2022-08-16T16:23:31.940600281+08:00",
#   "value": 1234
# }
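
As a rough mental model, the lookup returns the first row in which every listed column equals its paired value. The Python sketch below uses a hypothetical in-memory table (the real reference table contents are not shown in this document) and a hypothetical helper name.

```python
def first_matching_row(rows, keys, values):
    """Return a copy of the first row where row[k] == v for every pair."""
    for row in rows:
        if all(row.get(k) == v for k, v in zip(keys, values)):
            return dict(row)
    return None


# Hypothetical table content, chosen to resemble the results in this document.
table_abc = [
    {"col": "ab", "col2": 1234, "col3": 123, "col4": True},
    {"col": "ab", "col2": 1234, "col3": 1235, "col4": False},
]

row = first_matching_row(table_abc, keys=["col2", "col4"], values=[1234, False])
print(row)
```

With a single key the same helper models query_refer_table(): the first row whose column matches is used.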

`nullif()`¶

Function prototype: fn nullif(key, value)

Function description: If the content of the field specified by key equals value, delete the field

Function parameters:

key: specified field
value: target value

Example:

# input data: {"first": 1,"second":2,"third":"aBC","forth":true}

# script
json(_, first) json(_, second) nullif(first, "1")

# result
{
    "second":2
}

Note: This feature can be implemented with if/else semantics:

if first == "1" {
    drop_key(first)
}

`parse_date()`¶

Function prototype: fn parse_date(key: str, yy: str, MM: str, dd: str, hh: str, mm: str, ss: str, ms: str, zone: str)

Function description: Convert the value of each part of the incoming date field into a timestamp

Function parameters:

key: newly inserted field
yy: year numeric string; four-digit or two-digit strings are supported. If it is an empty string, the current year is used
MM: month string; supports numbers, English names, and English abbreviations
dd: day string
hh: hour string
mm: minute string
ss: seconds string
ms: milliseconds string
us: microseconds string
ns: nanoseconds string
zone: time zone string, in the form of "+8" or "Asia/Shanghai"

Example:

parse_date(aa, "2021", "May", "12", "10", "10", "34", zone="Asia/Shanghai") # Result aa=1620785434000000000

parse_date(aa, "2021", "12", "12", "10", "10", "34", zone="Asia/Shanghai") # result aa=1639275034000000000

parse_date(aa, "2021", "12", "12", "10", "10", "34", "100", zone="Asia/Shanghai") # Result aa=1639275034000000100

parse_date(aa, "20", "February", "12", "10", "10", "34", zone="+8") # Result aa=1581473434000000000
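
The whole-second results above can be checked with a small Python sketch. It is deliberately simplified and rests on assumptions: the helper names are hypothetical, a two-digit year is expanded by adding 2000 (consistent with the "20" example), the zone is given as a fixed UTC offset in hours, and sub-second parts and named time zones are left out.

```python
from datetime import datetime, timedelta, timezone

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]


def month_number(text):
    """Accept a number, an English month name, or its 3-letter abbreviation."""
    if text.isdigit():
        return int(text)
    lowered = text.lower()
    for index, name in enumerate(MONTHS, start=1):
        if lowered in (name, name[:3]):
            return index
    raise ValueError(f"unknown month: {text!r}")


def parse_date_ns(yy, MM, dd, hh, mm, ss, utc_offset_hours):
    year = int(yy)
    if len(yy) == 2:
        year += 2000  # assumption: two-digit years belong to the 2000s
    tz = timezone(timedelta(hours=utc_offset_hours))
    moment = datetime(year, month_number(MM), int(dd),
                      int(hh), int(mm), int(ss), tzinfo=tz)
    return int(moment.timestamp()) * 1_000_000_000


print(parse_date_ns("2021", "May", "12", "10", "10", "34", 8))
```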

`parse_duration()`¶

Function prototype: fn parse_duration(key: str)

Function description: If the value of key is a golang duration string (such as 123ms), then key will be automatically parsed into an integer in nanoseconds

The current duration units in golang are as follows:

ns nanoseconds
us/µs microseconds
ms milliseconds
s seconds
m minutes
h hours

Function parameters:

key: the field to be parsed

Example:

# assume abc = "3.5s"
parse_duration(abc) # result abc = 3500000000

# Support negative numbers: abc = "-3.5s"
parse_duration(abc) # result abc = -3500000000

# support floating point: abc = "-2.3s"
parse_duration(abc) # result abc = -2300000000
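
To make the conversion concrete, here is a Python sketch that turns a Go-style duration string into integer nanoseconds using the unit table above. It is an illustration rather than the engine's parser: the helper name is hypothetical, and Decimal is used so that values such as 2.3s do not pick up floating-point error.

```python
import re
from decimal import Decimal

UNIT_NS = {
    "ns": 1,
    "us": 1_000,
    "µs": 1_000,
    "ms": 1_000_000,
    "s": 1_000_000_000,
    "m": 60 * 1_000_000_000,
    "h": 3600 * 1_000_000_000,
}
TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")


def parse_duration_ns(text):
    sign = -1 if text.startswith("-") else 1
    body = text[1:] if text[:1] in ("+", "-") else text
    if not body:
        raise ValueError("empty duration")
    total = Decimal(0)
    pos = 0
    while pos < len(body):
        token = TOKEN.match(body, pos)
        if token is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += Decimal(token.group(1)) * UNIT_NS[token.group(2)]
        pos = token.end()
    return sign * int(total)


print(parse_duration_ns("3.5s"))
```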

`parse_int()`¶

Function prototype: fn parse_int(val: str, base: int) int

Function description: Convert the string representation of an integer to an integer value.

Function parameters:

val: The string to be converted.
base: Base, the range is 0, or 2 to 36; when the value is 0, the base is judged according to the string prefix.

Example:

# script0
a = "7665324064912355185"
b = format_int(parse_int(a, 10), 16)
if b != "6a60b39fd95aaf71" {
    add_key(abc, b)
} else {
    add_key(abc, "ok")
}

# result
'''
{
    "abc": "ok"
}
'''

# script1
a = "6a60b39fd95aaf71" 
b = parse_int(a, 16)            # base 16
if b != 7665324064912355185 {
    add_key(abc, b)
} else {
    add_key(abc, "ok")
}

# result
'''
{
    "abc": "ok"
}
'''


# script2
a = "0x6a60b39fd95aaf71" 
b = parse_int(a, 0)            # the true base is implied by the string's prefix
if b != 7665324064912355185 {
    add_key(abc, b)
} else {
    c = format_int(b, 16)
    if "0x"+c != a {
        add_key(abc, c)
    } else {
        add_key(abc, "ok")
    }
}


# result
'''
{
    "abc": "ok"
}
'''
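
The three scripts rely on the same round trip between decimal and hexadecimal text. Python's built-in int() follows the same base convention (an explicit base, or base 0 to infer it from a prefix such as 0x), so the values can be checked independently of the Pipeline engine:

```python
decimal_text = "7665324064912355185"
hex_text = "6a60b39fd95aaf71"

as_int = int(decimal_text, 10)           # explicit base 10
as_hex = format(as_int, "x")             # back to lowercase hex, no prefix

from_hex = int(hex_text, 16)             # explicit base 16
from_prefixed = int("0x" + hex_text, 0)  # base 0: inferred from the "0x" prefix

print(as_hex, from_hex, from_prefixed)
```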

`point_window()`¶

Function prototype: fn point_window(before: int, after: int, stream_tags = ["filepath", "host"])

Function description: Record the discarded data; used together with the window_hit function to upload the discarded Point data surrounding a hit as context.

Function parameters:

before: The maximum number of points that can be temporarily stored before window_hit is executed; points that have not been discarded are included in the count.
after: The number of points retained after window_hit is executed; points that have not been discarded are included in the count.
stream_tags: Distinguishes log (metric, tracing, etc.) streams by the tags on the data; by default filepath and host are used, which can distinguish logs from the same file.

Example:

# It is recommended to place it in the first line of the script
#
point_window(8, 8)

# If it is a panic log, keep the first 8 entries 
# and the last 8 entries (including the current one)
#
if grok(_, "abc.go:25 panic: xxxxxx") {
    # This function will only take effect if point_window() is executed during this run.
    # Trigger data recovery behavior within the window
    #
    window_hit()
}

# By default, all logs whose service is test_app are discarded;
# If it contains panic logs, keep the 15 adjacent ones and the current one.
#
if service == "test_app" {
    drop()
}
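
One way to picture the before/after window is a bounded buffer of recent points. The Python sketch below is a simplified model under stated assumptions, not the actual implementation: the class and method names are hypothetical, there is a single stream (no stream_tags), and the point that triggers the hit counts toward after, as the comments in the example above suggest.

```python
from collections import deque


class PointWindow:
    def __init__(self, before, after):
        self.recent = deque(maxlen=before)  # (point, was_dropped) pairs
        self.after = after
        self.keep_next = 0

    def feed(self, point, dropped=True, hit=False):
        """Return the points that end up uploaded for this input."""
        if hit:
            # Recover the buffered points that had been discarded.
            recovered = [p for p, was_dropped in self.recent if was_dropped]
            self.recent.clear()
            self.keep_next = self.after - 1  # the current point counts
            return recovered + [point]
        if self.keep_next > 0:
            self.keep_next -= 1
            return [point]
        self.recent.append((point, dropped))
        return [] if dropped else [point]


window = PointWindow(before=2, after=2)
uploaded = []
for index in range(6):
    uploaded += window.feed(f"log-{index}", dropped=True, hit=(index == 3))
print(uploaded)
```

In this run every point is marked as dropped and the fourth one triggers the hit, so the two points before it, the hit itself, and one following point are kept.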

`pt_kvs_del()`¶

Function prototype: fn pt_kvs_del(name: str)

Function description: Delete the key specified in Point

Function parameters:

name: Key to be deleted

Example:

key_blacklist = ["k1", "k2", "k3"]
for k in pt_kvs_keys() {
    if k in key_blacklist {
        pt_kvs_del(k)
    }
}

`pt_kvs_get()`¶

Function prototype: fn pt_kvs_get(name: str) -> any

Function description: Return the value of the specified key in Point

Function parameters:

name: Key name

Example:

host = pt_kvs_get("host")

`pt_kvs_keys()`¶

Function prototype: fn pt_kvs_keys(tags: bool = true, fields: bool = true) -> list

Function description: Return the key list in Point

Function parameters:

tags: Whether to include the names of all tags
fields: Whether to include the names of all fields

Example:

for k in pt_kvs_keys() {
    if match("^prefix_", k) {
        pt_kvs_del(k)
    }
}

`pt_kvs_set()`¶

Function prototype: fn pt_kvs_set(name: str, value: any, as_tag: bool = false) -> bool

Function description: Add a key to a Point or modify the value of a key in a Point

Function parameters:

name: The name of the field or label to be added or modified
value: The value of a field or label
as_tag: Set as tag or not

Example:

kvs = {
    "a": 1,
    "b": 2
}

for k in kvs {
    pt_kvs_set(k, kvs[k])
}

`pt_name()`¶

Function prototype: fn pt_name(name: str = "") -> str

Function description: Get the name of point; if the parameter is not empty, set the new name.

Function parameters:

name: Value as point name; defaults to empty string.

The field mapping relationship between point name and various types of data storage:

category	field name
custom_object	class
keyevent	-
logging	source
metric	-
network	source
object	class
profiling	source
rum	source
security	rule
tracing	source

`query_refer_table()`¶

Function prototype: fn query_refer_table(table_name: str, key: str, value)

Function description: Query the external reference table through the specified key, and append all the columns of the first row of the query result to field. This function does not work with central Pipeline.

Function parameters:

table_name: the name of the table to be looked up
key: column name
value: the value corresponding to the column

Example:

# extract table name, column name, column value from input
json(_, table)
json(_, key)
json(_, value)

# Query and append the data of the current column, which is added to the data as a field by default
query_refer_table(table, key, value)

Result:

{
   "col": "ab",
   "col2": 1234,
   "col3": 123,
   "col4": true,
   "key": "col2",
   "message": "{\"table\": \"table_abc\", \"key\": \"col2\", \"value\": 1234.0}",
   "status": "unknown",
   "table": "table_abc",
   "time": "2022-08-16T15:02:14.158452592+08:00",
   "value": 1234
}

`rename()`¶

Function prototype: fn rename(new_key, old_key)

Function description: Rename the extracted fields

Function parameters:

new_key: new field name
old_key: the extracted field name

Example:

# Rename the extracted abc field to abc1
rename('abc1', abc)

# or

rename(abc1, abc)

# Data to be processed: {"info": {"age": 17, "name": "zhangsan", "height": 180}}

# process script
json(_, info.name, "name")

# process result
{
   "message": "{\"info\": {\"age\": 17, \"name\": \"zhangsan\", \"height\": 180}}",
   "name": "zhangsan"
}

`replace()`¶

Function prototype: fn replace(key: str, regex: str, replace_str: str)

Function description: Replace the string data of the specified field according to a regular expression

Function parameters:

key: the field whose content is to be replaced
regex: regular expression
replace_str: the replacement string; $1, $2, ... refer to the capture groups of regex

Example:

# Phone number: {"str_abc": "13789123014"}
json(_, str_abc)
replace(str_abc, "(1[0-9]{2})[0-9]{4}([0-9]{4})", "$1****$2")

# English name {"str_abc": "zhang san"}
json(_, str_abc)
replace(str_abc, "([a-z]*) \\w*", "$1 ***")

# ID number {"str_abc": "362201200005302565"}
json(_, str_abc)
replace(str_abc, "([1-9]{4})[0-9]{10}([0-9]{4})", "$1**********$2")

# Chinese name {"str_abc": "小阿卡"}
json(_, str_abc)
replace(str_abc, '([\u4e00-\u9fa5])[\u4e00-\u9fa5]([\u4e00-\u9fa5])', "$1＊$2")
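
The masking patterns can be tried out with Python's re.sub. Note that Python writes capture-group references as \1 and \2 where the Pipeline examples use $1 and $2; the mask() helper is a hypothetical name used only in this sketch.

```python
import re


def mask(value, regex, replacement):
    return re.sub(regex, replacement, value)


phone = mask("13789123014", r"(1[0-9]{2})[0-9]{4}([0-9]{4})", r"\1****\2")
name = mask("zhang san", r"([a-z]*) \w*", r"\1 ***")
id_number = mask("362201200005302565",
                 r"([1-9]{4})[0-9]{10}([0-9]{4})", r"\1**********\2")

print(phone, name, id_number)
```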

`sample()`¶

Function prototype: fn sample(p)

Function description: Choose to collect/discard data with probability p.

Function parameters:

p: the probability that the sample function returns true, the value range is [0, 1]

Example:

# process script
# sample(0.3) means a sampling rate of 30%: it returns true with a probability of 30%,
# so about 70% of the data is discarded here
if !sample(0.3) {
   drop() # mark the data to be discarded
   exit() # stop the subsequent processing
}
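
The effect of probabilistic sampling is easy to check numerically. The Python sketch below assumes sample(p) behaves like a comparison of a uniform random number against p; the seeded generator is there only to make the run repeatable.

```python
import random


def sample(p, rng=random):
    # Returns True with probability p (0 <= p <= 1).
    return rng.random() < p


rng = random.Random(0)
total = 10_000
kept = sum(1 for _ in range(total) if sample(0.3, rng))
print(f"kept {kept} of {total} points")
```

Over many points the kept share approaches p, while any individual point is kept or dropped independently of the others.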

`set_measurement()`¶

Function prototype: fn set_measurement(name: str, delete_key: bool = false)

Function description: Change the name of the line protocol

Function parameters:

name: The value is used as the measurement name; it can be passed in as a string constant or a variable
delete_key: If the point has a tag or field with the same name as the variable, delete it

The field mapping relationship between the line protocol name and the various types of data storage or other purposes:

category	field name	other usage
custom_object	class	-
keyevent	-	-
logging	source	-
metric	-	metric set name
network	source	-
object	class	-
profiling	source	-
rum	source	-
security	rule	-
tracing	source	-

`set_tag()`¶

Function prototype: fn set_tag(key, value: str)

Function description: Mark the specified field as a tag in the output. After it is set as a tag, other functions can still operate on the variable. If the key set as a tag is a field that has already been extracted, it no longer appears among the fields, which avoids a clash between the extracted field key and a tag key of the same name on the existing data

Function parameters:

key: the field to be tagged
value: can be a string literal or a variable

Example:

# in << {"str": "13789123014"}
set_tag(str)
json(_, str) # str == "13789123014"
replace(str, "(1[0-9]{2})[0-9]{4}([0-9]{4})", "$1****$2")
# Extracted data(drop: false, cost: 49.248µs):
# {
# "message": "{\"str\": \"13789123014\", \"str_b\": \"3\"}",
# "str#": "137****3014"
# }
# * The `#` suffix only marks a field as a tag in the output of datakit --pl <path> --txt <str>

# in << {"str_a": "2", "str_b": "3"}
json(_, str_a)
set_tag(str_a, "3") # str_a == 3
# Extracted data(drop: false, cost: 30.069µs):
# {
# "message": "{\"str_a\": \"2\", \"str_b\": \"3\"}",
# "str_a#": "3"
# }


# in << {"str_a": "2", "str_b": "3"}
json(_, str_a)
json(_, str_b)
set_tag(str_a, str_b) # str_a == str_b == "3"
# Extracted data(drop: false, cost: 32.903µs):
# {
# "message": "{\"str_a\": \"2\", \"str_b\": \"3\"}",
# "str_a#": "3",
# "str_b": "3"
# }

`setopt()`¶

Function prototype: fn setopt(status_mapping: bool = true)

Function description: Modify Pipeline settings; parameters must be given in the form key=value

Function parameters:

status_mapping: Set the mapping function of the status field of log data, enabled by default

Example:

# Disable the mapping function for the status field
setopt(status_mapping=false)

add_key("status", "w")

# Processing result
{
"status": "w"
}

# Enable the mapping function for the status field by default
setopt(status_mapping=true)

add_key("status", "w")

# Processing result
{
"status": "warning"
}

`slice_string()`¶

Function prototype: fn slice_string(name: str, start: int, end: int) -> str

Function description: Returns the substring of the string from index start to end.

Function Parameters:

name: The string to be sliced
start: The starting index of the substring (inclusive)
end: The ending index of the substring (exclusive)

Example:

substring = slice_string("15384073392", 0, 3)
# substring will be "153"

`sql_cover()`¶

Function prototype: fn sql_cover(sql_test: str)

Function description: Desensitize an SQL statement

Example:

# in << {"select abc from def where x > 3 and y < 5"}
sql_cover(_)

# Extracted data(drop: false, cost: 33.279µs):
# {
# "message": "select abc from def where x > ? and y < ?"
# }

`strfmt()`¶

Function prototype: fn strfmt(key, fmt: str, args ...: int|float|bool|str|list|map|nil)

Function description: Format the contents of the extracted fields arg1, arg2, ... according to fmt, and write the formatted result into the key field

Function parameters:

key: the name of the field that receives the formatted data
fmt: format string template
args: variable-length arguments; one or more extracted field names to be formatted

Example:

# Data to be processed: {"a":{"first":2.3,"second":2,"third":"abc","forth":true},"age":47}

# process script
json(_, a.second)
json(_, a.third)
cast(a.second, "int")
json(_, a.forth)
strfmt(bb, "%v %s %v", a.second, a.third, a.forth)

`strlen()`¶

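Function prototype: fn strlen(val: str) int

Function description: Calculate the number of characters in a string, rather than the number of bytes.

Function parameters:

val: input string

Example:

add_key("len_char", strlen("hello 你好"))
add_key("len_byte", len("hello 你好"))

Output:

{
  "len_char": 8,
  "len_byte": 12
}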
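
The gap between the two numbers comes from UTF-8: each of the two Chinese characters occupies three bytes. A quick Python check, used here only to illustrate the character count versus the byte count:

```python
text = "hello 你好"

char_count = len(text)                  # counts Unicode characters
byte_count = len(text.encode("utf-8"))  # counts UTF-8 bytes

print(char_count, byte_count)
```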

`timestamp()`¶

Function prototype: fn timestamp(precision: str = "ns") -> int

Function description: Return the current Unix timestamp; the default precision is ns

Function parameters:

precision: timestamp precision; valid values are "ns", "us", "ms", "s"; the default value is "ns".

Example:

# process script
add_key(time_now_record, timestamp())

datetime(time_now_record, "ns", 
    "%Y-%m-%d %H:%M:%S", "UTC")


# process result
{
  "time_now_record": "2023-03-07 10:41:12"
}

# process script
add_key(time_now_record, timestamp())

datetime(time_now_record, "ns", 
    "%Y-%m-%d %H:%M:%S", "Asia/Shanghai")


# process result
{
  "time_now_record": "2023-03-07 18:41:49"
}

# process script
add_key(time_now_record, timestamp("ms"))


# process result
{
  "time_now_record": 1678185980578
}
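
The precisions differ only by a power-of-ten divisor applied to the nanosecond clock. A minimal Python sketch, assuming integer truncation; the now_ns argument is a hypothetical addition that makes the function testable with a fixed instant.

```python
import time

DIVISORS = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}


def timestamp(precision="ns", now_ns=None):
    if now_ns is None:
        now_ns = time.time_ns()
    return now_ns // DIVISORS[precision]


fixed = 1678185980578123456  # an arbitrary instant in nanoseconds
print(timestamp("ms", fixed), timestamp("s", fixed))
```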

`trim()`¶

Function prototype: fn trim(key, cutset: str = "")

Function description: Delete the specified characters from the beginning and end of the key's content; when cutset is an empty string, all leading and trailing whitespace characters are deleted by default

Function parameters:

key: a field that has been extracted, string type
cutset: the set of characters to delete from the beginning and end of the key's content

Example:

# Data to be processed: "trim(key, cutset)"

# process script
add_key(test_data, "ACCAA_test_DataA_ACBA")
trim(test_data, "ABC_")

# process result
{
   "test_data": "test_Data"
}
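
cutset is a set of characters, not a prefix or suffix string: trimming continues from each end until a character outside the set is found. Python's str.strip() follows the same rule, so it reproduces the result above; the wrapper below is only an illustration.

```python
def trim(value, cutset=""):
    # An empty cutset means "strip whitespace", mirroring the default.
    return value.strip(cutset) if cutset else value.strip()


trimmed = trim("ACCAA_test_DataA_ACBA", "ABC_")
print(trimmed)
```

The lowercase characters stop the trimming on both sides, which is why the inner underscore and the capital D of Data survive.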

`uppercase()`¶

Function prototype: fn uppercase(key: str)

Function description: Convert the content in the extracted key to uppercase

Function parameters:

key: Specify the extracted field name to be converted, and convert the content of key to uppercase

Example:

# Data to be processed: {"first": "hello","second":2,"third":"aBC","forth":true}

# process script
json(_, first) uppercase(first)

# process result
{
    "first": "HELLO"
}

`url_decode()`¶

Function prototype: fn url_decode(key: str)

Function description: Decode the URL-encoded content of the extracted key into plain text

Function parameters:

key: a key that has been extracted

Example:

# Data to be processed: {"url":"http%3a%2f%2fwww.baidu.com%2fs%3fwd%3d%e6%b5%8b%e8%af%95"}

# process script
json(_, url) url_decode(url)

# process result
{
   "message": "{\"url\":\"http%3a%2f%2fwww.baidu.com%2fs%3fwd%3d%e6%b5%8b%e8%af%95\"}",
   "url": "http://www.baidu.com/s?wd=测试"
}
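
Percent-decoding can be verified with Python's standard library. The escaped bytes %e6%b5%8b%e8%af%95 are the UTF-8 encoding of the two characters 测试:

```python
from urllib.parse import unquote

encoded = "http%3a%2f%2fwww.baidu.com%2fs%3fwd%3d%e6%b5%8b%e8%af%95"
decoded = unquote(encoded)  # decodes %XX escapes as UTF-8 by default
print(decoded)
```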

`url_parse()`¶

Function prototype: fn url_parse(key)

Function description: parse the url whose field name is key.

Function parameters:

key: field name of the url to parse.

Example:

# Data to be processed: {"url": "https://www.baidu.com"}

# process script
json(_, url)
m = url_parse(url)
add_key(scheme, m["scheme"])

# process result
{
     "url": "https://www.baidu.com",
     "scheme": "https"
}

The above example extracts the scheme from the URL. It is also possible to extract the host, port, path, and the parameters carried in the URL, as shown in the following example:

# Data to be processed: {"url": "https://www.google.com/search?q=abc&sclient=gws-wiz"}

# process script
json(_, url)
m = url_parse(url)
add_key(sclient, m["params"]["sclient"]) # The parameters carried in the URL are saved under the params field
add_key(h, m["host"])
add_key(path, m["path"])

# process result
{
     "url": "https://www.google.com/search?q=abc&sclient=gws-wiz",
     "h": "www.google.com",
     "path": "/search",
     "sclient": "gws-wiz"
}
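
For comparison, the same decomposition can be done with Python's urllib.parse. The dictionary layout below mirrors the keys used in the examples (scheme, host, port, path, params) but is an assumption about shape, not the documented return value; in particular, each query parameter is reduced to its first value.

```python
from urllib.parse import parse_qs, urlsplit


def url_parse(url):
    parts = urlsplit(url)
    params = {name: values[0] for name, values in parse_qs(parts.query).items()}
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL carries no explicit port
        "path": parts.path,
        "params": params,
    }


m = url_parse("https://www.google.com/search?q=abc&sclient=gws-wiz")
print(m["host"], m["path"], m["params"]["sclient"])
```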

`use()`¶

Function prototype: fn use(name: str)

Function description: Call another script; all current data can be accessed in the called script

Function parameters:

name: script name, such as abp.p

Example:

# Data to be processed: {"ip":"1.2.3.4"}

# Process script a.p
use("b.p")

# Process script b.p
json(_, ip)
geoip(ip)

# Execute the processing result of script a.p
{
   "city" : "Brisbane",
   "country" : "AU",
   "ip" : "1.2.3.4",
   "province" : "Queensland",
   "isp" : "unknown",
   "message" : "{\"ip\": \"1.2.3.4\"}"
}

`user_agent()`¶

Function prototype: fn user_agent(key: str)

Function description: Extract client information from the User-Agent string in the specified field

Function parameters:

key: the field to be extracted

user_agent() will generate multiple fields, such as:

os: operating system
browser: browser

Example:

# data to be processed
# {
# "userAgent" : "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36",
# "second" : 2,
# "third" : "abc",
# "forth" : true
# }

json(_, userAgent) user_agent(userAgent)

`valid_json()`¶

Function prototype: fn valid_json(val: str) bool

Function description: Determine if it is a valid JSON string.

Function parameters:

val: Requires data of type string.

Example:

a = "null"
if valid_json(a) { # true
    if load_json(a) == nil {
        add_key("a", "nil")
    }
}

b = "[1, 2, 3]"
if valid_json(b) { # true
    add_key("b", load_json(b))
}

c = "{\"a\": 1}"
if valid_json(c) { # true
    add_key("c", load_json(c))
}

d = "???{\"d\": 1}"
if valid_json(d) { # false
    add_key("d", load_json(d))
} else {
    add_key("d", "invalid json")
}

Result:

{
  "a": "nil",
  "b": "[1,2,3]",
  "c": "{\"a\":1}",
  "d": "invalid json"
}
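
The pairing of a validity check with a forgiving loader can be mirrored in Python. This is a sketch under the assumption that validity means "parses as JSON"; the function names are reused from this page for readability only.

```python
import json


def valid_json(text):
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


def load_json(text):
    # Returns None on failure instead of raising, like the nil result above.
    try:
        return json.loads(text)
    except ValueError:
        return None


for candidate in ["null", "[1, 2, 3]", '{"a": 1}', '???{"d": 1}']:
    print(candidate, valid_json(candidate))
```

Because the literal null is valid JSON that also loads as a nil value, valid_json() is the only reliable way to tell a failed parse from a successful one.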

`value_type()`¶

Function prototype: fn value_type(val) str

Function description: Obtain the type of the variable's value. The return value is one of ["int", "float", "bool", "str", "list", "map", ""]; if the value is nil, an empty string is returned.

Function parameters:

val: The value of the type to be determined.

Example:

Input:

{"a":{"first": [2.2, 1.1], "ff": "[2.2, 1.1]","second":2,"third":"aBC","forth":true},"age":47}

Script:

d = load_json(_)

if value_type(d) == "map" && "a" in d  {
    add_key("val_type", value_type(d["a"]))
}

Output:

// Fields
{
  "message": "{\"a\":{\"first\": [2.2, 1.1], \"ff\": \"[2.2, 1.1]\",\"second\":2,\"third\":\"aBC\",\"forth\":true},\"age\":47}",
  "val_type": "map"
}
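
The type names map naturally onto Python's built-in types, which makes the example easy to replay. The mapping below is an illustrative sketch; note that bool must be tested before int because Python treats bool as a subclass of int.

```python
import json


def value_type(val):
    if val is None:
        return ""
    if isinstance(val, bool):
        return "bool"
    if isinstance(val, int):
        return "int"
    if isinstance(val, float):
        return "float"
    if isinstance(val, str):
        return "str"
    if isinstance(val, list):
        return "list"
    if isinstance(val, dict):
        return "map"
    raise TypeError(f"unsupported type: {type(val).__name__}")


raw = '{"a":{"first": [2.2, 1.1], "ff": "[2.2, 1.1]","second":2,"third":"aBC","forth":true},"age":47}'
d = json.loads(raw)
print(value_type(d), value_type(d["a"]), value_type(d["a"]["first"]))
```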

`window_hit()`¶

Function prototype: fn window_hit()

Function description: Trigger the recovery of discarded context data, restoring it from the data recorded by the point_window function.

Function parameters: None

Example:

# It is recommended to place it in the first line of the script
#
point_window(8, 8)

# If it is a panic log, keep the first 8 entries 
# and the last 8 entries (including the current one)
#
if grok(_, "abc.go:25 panic: xxxxxx") {
    # This function will only take effect if point_window() is executed during this run.
    # Trigger data recovery behavior within the window
    #
    window_hit()
}

# By default, all logs whose service is test_app are discarded;
# If it contains panic logs, keep the 15 adjacent ones and the current one.
#
if service == "test_app" {
    drop()
}

`xml()`¶

Function prototype: fn xml(input: str, xpath_expr: str, key_name)

Function description: Extract fields from XML through xpath expressions.

Function parameters:

input: XML to extract
xpath_expr: xpath expression
key_name: The extracted data is written to a new key

Example one:

# data to be processed
<entry>
    <fieldx>valuex</fieldx>
    <fieldy>...</fieldy>
    <fieldz>...</fieldz>
    <fieldarray>
        <fielda>element_a_1</fielda>
        <fielda>element_a_2</fielda>
    </fieldarray>
</entry>

# process script
xml(_, '/entry/fieldarray//fielda[1]/text()', field_a_1)

# process result
{
   "field_a_1": "element_a_1", # extracted element_a_1
   "message": "\t\t\u003centry\u003e\n \u003cfieldx\u003evaluex\u003c/fieldx\u003e\n \u003cfieldy\u003e...\u003c/fieldy\u003e\n \u003cfieldz\u003e...\u003c/fieldz\u003e\n \u003cfieldarray\u003e\n \u003cfielda\u003eelement_a_1\u003c/fielda\u003e\n \u003cfielda\u003eelement_a_2\u003c/fielda\u003e\n \u003c/fieldarray\u003e\n \u003c/entry\u003e",
   "status": "unknown",
   "time": 1655522989104916000
}

Example two:

# data to be processed
<OrderEvent actionCode = "5">
  <OrderNumber>ORD12345</OrderNumber>
  <VendorNumber>V11111</VendorNumber>
</OrderEvent>

# process script
xml(_, '/OrderEvent/@actionCode', action_code)
xml(_, '/OrderEvent/OrderNumber/text()', OrderNumber)

# process result
{
   "OrderNumber": "ORD12345",
   "action_code": "5",
   "message": "\u003cOrderEvent actionCode = \"5\"\u003e\n \u003cOrderNumber\u003eORD12345\u003c/OrderNumber\u003e\n \u003cVendorNumber\u003eV11111\u003c/VendorNumber\u003e\n\u003c/OrderEvent\u003e",
   "status": "unknown",
   "time": 1655523193632471000
}
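
The two extractions in Example two, an attribute value and an element's text, can be reproduced with Python's standard library. xml.etree.ElementTree implements only a subset of XPath, without @attribute or text() selection steps, so the sketch below uses its get() and findtext() methods as stand-ins for the two expressions.

```python
import xml.etree.ElementTree as ET

document = """<OrderEvent actionCode = "5">
  <OrderNumber>ORD12345</OrderNumber>
  <VendorNumber>V11111</VendorNumber>
</OrderEvent>"""

root = ET.fromstring(document)

action_code = root.get("actionCode")        # stands in for /OrderEvent/@actionCode
order_number = root.findtext("OrderNumber") # stands in for /OrderEvent/OrderNumber/text()

print(action_code, order_number)
```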
