The following is the DataFlux Query Language (dql) definition. With the gradual support of different syntax, the document will be adjusted, added and deleted to varying degrees.
The whole constraints are as follows:
Non-keywords (such as metric name and tag name) are case sensitive; keywords and function names are case-insensitive.
With # as line comment character; inline comment is not supported.
Supported operators:
+ - addition
- - subtraction
* - multiplication
/ - division
% - modulo
= - equal to
!= - not equal to
<= - greater than or equal to
< - less than
>= - greater than or equal to
> - greater than
^ - exponential operation
&& - logical and
|| - logical or
Supported keywords:
AND AS ASC AUTO
BY DESC FALSE
LIMIT LINEAR
NIL OFFSET OR PREVIOUS
SLIMIT SOFFSET TRUE
Identifiers: There are several forms of identifiers, which are compatible with various variable naming forms.
Only [_a-zA-Z0-9] can appear in normal variable names, and the first character cannot be a number, such as _abc, _abc123, _123ab.
Other forms of processing variable name:
this+is-a*xx/yy^zz?variable, by should be written as `this+is-a*xx/yy^zz?variable`, `by`, with operators in the former variable, and by in the latter is the DQL keyword.
Support UTF8 identifiers such as Chinese, such as M::cpu:(usage AS 使用率) [5m]
Support emoji: M::cpu:(usage AS 使用率👍) [5m]
The variable is enclosed in an inverse quotation mark, this`is-a-vairalbe should be modified by `identifier("this`is-a-vairalbe")` .
String values can be used in double and single quotation marks: "this is a string" and 'this is a string' are equivalent.
Special string
Base64 strings: DQL supports processing Base64 strings. For Bas64 strings, DQL can automatically solve the original strings when querying, which is written as follows:
b64`some-base64-string`
b64'some-base64-string'
b64"some-base64-string"
Regular expression strings: The original re('xxx') is deprecated, and the following form is recommended to identify regular strings.
The query follows the following syntax pattern, noting that the relative order between the parts cannot be reversed, for example, time-expr cannot appear before filter-clause.
Syntactically, data-source is required (similar to the FROM clause in SQL), and the rest is optional. However, in the actual query process, some constraints are imposed on the actual execution of the query. For example, time_expr does not allow too long a time span.
Examples:
# Get the data of all fields of the measurement cpu in the last 5 minutes.M::cpu[5m]}# Find the data that matches all metrics of regular expression *db in the last 5 minutes.M::re('*db')[5m]# Get all field data of measurement cpu from 10 minutes ago to 5 minutes ago.M::cpu[10m:5m]# Get all the field data of the measurement cpu from 10 minutes ago to 5 minutes ago, and aggregate them at 1 minute intervals.M::cpu[10m:5m:1m]# Query the time_active and time_guest_nice of the time series data measurement cpu in the last 5 minutes,# Filter with two tags of host and cpu, and display the results in groups of host and cpu.M::cpu:(time_active,time_guest_nice){host="host-name",cpu="cpu0"}[5m]BYhost,cpu# Reverse by height and get the top tenO::human:(height,age){age>100,sex="straight men"}ORDERBYheightLIMIT10M::cpu,mem:(time_active,time_guest_nice,host){host="host-name",cpu="cpu0"}[5m]BYhost,cpu
Note: that white space characters can be added on both sides of :: and : , and the following statements are equivalent:
M::cpu:(time_active,system_usage){host="biz_prod"}[5m]# Here, the calculation between different metrics (types should be basically matched) on the same measurement is supported.M::cpu:(time_active+1,time_active/time_guest_nice)[5m]
The filter clause is used to filter the resulting data, similar to the where condition in SQL:
# Query the height of 100-year-old straight men in the population object (__class=human)O::human:(height){age>100,sex="直男"}# Filtering with regularityO::human:(height){age>100,sex!=re("男")}# Filtering with computed expressionsO::human:(height){(age+1)/2>31,sex!=re("男")}# Filtering of expressions with OR operationO::human:(height){age>31||sex!=re("男"),weight>70}# Result column with aggregationM::cpu:(avg(time_active)AStime_active_avg,time_guest_nice)[1d::1h]# Result column with aggregate paddingM::cpu:(fill(avg(time_active)AStime_active_avg,0.1),time_guest_nice)[1d::1h]# Query with in list, where the option relationship in in is logical or, and in list can only be numeric value or string.O::human:(height){agein[30,40,50],weight>70}
About filling:
Numeric filling: such as cpu:(fill(f1, 123), fill(f2, "foo bar"), fill(f3, 123.456))
Linear filling: such as cpu:(fill(f1, LINEAR))
Pre-value filling: such as cpu:(fill(f1, PREVIOUS))
Note: Between multiple filter conditions. The default is an AND relationship, but if you want to express an OR relationship, you can use the || operator. The following two statements have the same meaning:
DataFlux data features all have a time attribute, so the expression of time is represented by a separate clause:
time-expr consists of 3 parts [start_time:end_time:interval:rollup]:
Number
Name
Required
Description
Example
1
start_time
No
The start time for time filtering
1672502400000 / 1672502400
2
end_time
No
The end time for time filtering
1672588800000 / 1672588800
3
interval
No
Time aggregation period, generally used in conjunction with aggregation or rolling aggregation, supports time units like s, m, h, d, etc., and can be used in combination
1s/1m/1h etc.
4
rollup
No
The rollup function name, currently supported avg, sum, min, max, count, first, last, last
Note:
start_time and end_time support 3 formats:
Numerical value with time unit, e.g., 1m
Timestamp, e.g., 1672502400
Time value in milliseconds, 1672502400000
The interval time unit supports the following:
ns - nanoseconds
us - microseconds
ms - milliseconds
s - seconds
m - minutes
h - hours
d - days
w - weeks
y - years, specified as 365d, leap years not distinguished.
Rollup function names for aggregation currently supported include:
avg: Average
sum: Summation
min: Minimum value
max: Maximum value
count: Count
first: First value
last: Last value
deriv: Rate of change per second, estimated by subtracting the first value from the last and dividing by the time interval
rate: Rate of change per second, similar to deriv but returns no negative results, consistent with PromQL logic
irate: Instantaneous rate of change, estimated by dividing the difference of the last two values by the time interval, consistent with PromQL logic
p99, p95, p90, p75: Percentile calculations, allowing any number to follow p to express a percentage
median: Median, equivalent to p50
stddev: Standard deviation
Rollup aggregation splits a single timeline into different segments based on the given interval and aggregates values in each segment. When rollup is null, it indicates no rolling aggregation is performed.
The ORDER BY clause sorts the results, similar to the ORDER BY in MySQL.
⚠️ 1. "Metric Data" only time field sorting is supported.; 2. When grouping is by in the query, order-by will not take effect. Please use sorder-by sort.
# Get the CPU utilization of different hosts, in reverse order of time.M::cpu:(`usage_total`)orderbytimedesc
# Get all log data, in ascending order of response time.L::`*`:(`*`)orderbyresponse_timeasc
For time series data, if the dql statement contains both the by phrase and the limit phrase, the limit constrains the number of returned items in each aggregate group.
# Back to three cpu recordsM::cpu:()limit3# Back to three cpu records per hostM::cpu:()byhostlimit3
# Back to cpu usage information for three hostsM::cpu:(last(usage_total))byhostslimit3# Back to CPU information of three hosts, where each host returns three recordsM::cpu:()byhostlimit3slimit3
DQL supports secondary evaluation of query results:
func::dataflux__dql:(EXPR_EVAL(expr='data1.f1+data1.f2',data=dql('M::cpu:(f1, f2)')))# Calculate across data sets through funcF::dataflux__dql:(SOME_FUNC(data1=dql('M::cpu:(f1, f2)'),data2=dql('O::ecs:(f1, f2)'),some_args))# Calculate complex expression evaluation across data sets through funcF::dataflux__dql:(EXPR_EVAL(expr='data1.f1/(data2.f2+3)',# expressiondata1=dql('M::cpu:(f1, f2)'),data2=dql('O::ecs:(f1, f2)'),))
metric::(# the second layer querymetric::(# the third layer querymetric::a:(f1,f2,f3){host="foo"}[10m::1m]):(f1,f2)):(f1)
In principle, there is no restriction on nesting level, but multiple horizontal subqueries are not allowed, for example:
object::(# the second layer queryobject::(# the third layer queryobject::a:(f1,f2,f3){host="foo"}[10m::1m]):(f1,f2),object::(# the third layer query: unavailableobject::b:(f1,f2,f3){host="foo"}[10m::1m]):(f1,f2)):(f1)
If the message field is the data type of json (currently supported by L/O/T/R and so on), it is supported to extract the field directly through DQL in the following form: