# DQL Definition

The following is the definition of the DataFlux Query Language (DQL). As support for different syntax evolves, this document will be adjusted, extended, and trimmed accordingly.
The overall constraints are as follows:

- Non-keywords (such as metric names and tag names) are case-sensitive; keywords and function names are case-insensitive.
- `#` is the line-comment character; inline comments are not supported.
- Supported operators:
    - `+` - addition
    - `-` - subtraction
    - `*` - multiplication
    - `/` - division
    - `%` - modulo
    - `=` - equal to
    - `!=` - not equal to
    - `>=` - greater than or equal to
    - `>` - greater than
    - `<=` - less than or equal to
    - `<` - less than
    - `^` - exponentiation
    - `&&` - logical AND
    - `||` - logical OR
- Supported keywords.
- Identifiers: several forms of identifiers are supported, compatible with various variable-naming conventions.
    - Normal variable names may contain only `[_a-zA-Z0-9]`, and the first character cannot be a digit, e.g. `_abc`, `_abc123`, `_123ab`.
    - Variable names of other forms, such as `this+is-a*xx/yy^zz?variable` or `by`, should be written as `` `this+is-a*xx/yy^zz?variable` `` and `` `by` ``: the former contains operators, and the latter is a DQL keyword.
    - UTF-8 identifiers such as Chinese are supported, e.g. `M::cpu:(usage AS 使用率) [5m]`.
    - Emoji are supported: `M::cpu:(usage AS 使用率👍) [5m]`.
    - If a variable name itself contains a backtick, such as `` this`is-a-vairalbe ``, it should be written as ``identifier("this`is-a-vairalbe")``.
- String values may be quoted with either double or single quotation marks: `"this is a string"` and `'this is a string'` are equivalent.
- Special strings:
    - Base64 strings: DQL supports Base64 strings and automatically decodes them to the original string during queries. They are written as follows:
        - `` b64`some-base64-string` ``
        - `b64'some-base64-string'`
        - `b64"some-base64-string"`
    - Regular-expression strings: the original `re('xxx')` form is deprecated; the following forms are recommended:
        - `` re`some-regexp` `` (recommended)
        - `re'some-regexp'`
        - `re"some-regexp"`
- Supported data types:
    - Floating point (`123.4`, `5.67E3`)
    - Integer (`123`, `-1`)
    - String (`'zhangsan'`, `"hello world"`)
    - Boolean (`true`, `false`)
    - Duration (`1y`, `1w`, `1d`, `1h`, `1m`, `1s`, `1ms`, `1us`, `1ns` for 1 year/week/day/hour/minute/second/millisecond/microsecond/nanosecond, respectively)
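The constraints above can be combined in a single query. A minimal sketch (measurement and field names are hypothetical, and it assumes `&&` may be used between filter conditions just like `||`, per the operator list above):

```
# Arithmetic in the target clause, comparison and logical operators in the
# filter clause, and a duration literal (5m) in the time expression
M::cpu:(time_active + time_guest_nice) { usage_total >= 50 && cpu != "cpu0" } [5m]
```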
## Query

Queries follow the syntax pattern below. Note that the relative order of the parts cannot be changed; for example, `time-expr` cannot appear before `filter-clause`.

```
namespace::
data-source
target-clause
filter-clause
time-expr
by-clause
having-clause
order-by-clause
limit-clause
offset-clause
sorder-by-clause
slimit-clause
soffset-clause
```

Syntactically, `data-source` is required (similar to the `FROM` clause in SQL), and the rest is optional. In the actual query process, however, some constraints are imposed on query execution; for example, `time-expr` is not allowed to cover too long a time span.
Examples:

```
# Get all field data of the measurement cpu in the last 5 minutes
M::cpu [5m]

# Find data for all metrics matching the regular expression *db in the last 5 minutes
M::re('*db') [5m]

# Get all field data of the measurement cpu from 10 minutes ago to 5 minutes ago
M::cpu [10m:5m]

# Get all field data of the measurement cpu from 10 minutes ago to 5 minutes ago,
# aggregated at 1-minute intervals
M::cpu [10m:5m:1m]

# Query time_active and time_guest_nice of the time series measurement cpu in the last 5 minutes,
# filter on the host and cpu tags, and group the results by host and cpu
M::cpu:(time_active, time_guest_nice)
    { host = "host-name", cpu = "cpu0" } [5m] BY host,cpu

# Sort by height in reverse order and take the top ten
O::human:(height, age) { age > 100, sex = "straight men" } ORDER BY height LIMIT 10

# Query multiple measurements at once
M::cpu,mem:(time_active, time_guest_nice, host) { host = "host-name", cpu = "cpu0" } [5m] BY host,cpu
```
Note that whitespace characters may appear on either side of `::` and `:`; the following statements are equivalent:

```
M::cpu:(time_active, time_guest_nice)
    { host = "host-name", cpu = "cpu0" } [5m]

M ::cpu : (time_active, time_guest_nice)
    { host = "host-name", cpu = "cpu0" } [5m]

M :: cpu : (time_active, time_guest_nice)
    { host = "host-name", cpu = "cpu0" } [5m]
```
## Statement

### Namespace

At the semantic level, the following data sources are currently supported:

- M/metric - time series metric data
- O/object - object data
- CO/custom_object - user-defined object data
- L/logging - log data
- E/event - event data
- T/tracing - tracing data
- R/rum - RUM data
- F/func - Func function computation
- N/network - network eBPF data

At the syntax level, the data source is not constrained for the time being.

In a specific query, if no data source is specified, the default is `metric` (or `M`); that is, time series metrics are the default data source for DQL.
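Each namespace has a long and a short form (e.g. `M`/`metric`), so the following two queries are equivalent:

```
# Short namespace form
M::cpu:(time_active) [5m]

# Long namespace form
metric::cpu:(time_active) [5m]
```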
### Target-clause

The target clause lists the result columns of the query:

```
M::cpu:(time_active, system_usage) {host="biz_prod"} [5m]

# Calculations between different metrics (whose types should basically match)
# of the same measurement are supported
M::cpu:(time_active+1, time_active/time_guest_nice) [5m]
```
### Filter-clause

The filter clause filters the result data, similar to the `WHERE` condition in SQL:

```
# Query the height of 100-year-old straight men among the human objects (__class=human)
O::human:(height) { age > 100, sex = "直男" }

# Filter with a regular expression
O::human:(height) { age > 100, sex != re("男") }

# Filter with a computed expression
O::human:(height) { (age + 1)/2 > 31, sex != re("男") }

# Filter expression with an OR operation
O::human:(height) { age > 31 || sex != re("男"), weight > 70}

# Result column with aggregation
M::cpu:(avg(time_active) AS time_active_avg, time_guest_nice) [1d::1h]

# Result column with aggregation and filling
M::cpu:(fill(avg(time_active) AS time_active_avg, 0.1), time_guest_nice) [1d::1h]

# Query with an in-list: the relationship among the in-list options is logical OR,
# and the list may contain only numeric values or strings
O::human:(height) { age in [30, 40, 50], weight > 70}
```

About filling:

- Numeric filling: e.g. `cpu:(fill(f1, 123), fill(f2, "foo bar"), fill(f3, 123.456))`
- Linear filling: e.g. `cpu:(fill(f1, LINEAR))`
- Previous-value filling: e.g. `cpu:(fill(f1, PREVIOUS))`
Note: multiple filter conditions are combined with an `AND` relationship by default; to express an `OR` relationship, use the `||` operator.
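As an illustration (a sketch assuming `&&` is accepted between filter conditions, per the operator list above), the following two statements have the same meaning, and `||` can be combined with the default `AND` to form a complex filter expression:

```
# The comma between conditions defaults to AND, so these two are equivalent
O::human:(height) { age > 31, sex != re("男") }
O::human:(height) { age > 31 && sex != re("男") }

# A complex filter expression: OR within a group, AND between groups
O::human:(height) { age > 31 || sex != re("男"), weight > 70 }
```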
### Time-expr

All DataFlux data carries a time attribute, so the time range is expressed with a separate clause. `time-expr` consists of four parts, `[start_time:end_time:interval:rollup]`:
| Number | Name | Required | Description | Example |
|---|---|---|---|---|
| 1 | `start_time` | No | The start time for time filtering | `1672502400000` / `1672502400` |
| 2 | `end_time` | No | The end time for time filtering | `1672588800000` / `1672588800` |
| 3 | `interval` | No | Time aggregation period, generally used together with aggregation or rollup aggregation; supports time units such as `s`, `m`, `h`, `d`, which can be used in combination | `1s` / `1m` / `1h` |
| 4 | `rollup` | No | The rollup function name; currently supports `avg`, `sum`, `min`, `max`, `count`, `first`, `last` | |
Note: `start_time` and `end_time` support three formats:

- A numeric value with a time unit, e.g. `1m`
- A timestamp in seconds, e.g. `1672502400`
- A timestamp in milliseconds, e.g. `1672502400000`
The `interval` supports the following time units:

- `ns` - nanoseconds
- `us` - microseconds
- `ms` - milliseconds
- `s` - seconds
- `m` - minutes
- `h` - hours
- `d` - days
- `w` - weeks
- `y` - years, fixed at 365 days; leap years are not distinguished
The `rollup` aggregation function names currently supported are:

- `avg`: average
- `sum`: sum
- `min`: minimum value
- `max`: maximum value
- `count`: count
- `first`: first value
- `last`: last value
- `deriv`: rate of change per second, estimated by subtracting the first value from the last and dividing by the time interval
- `rate`: rate of change per second, similar to `deriv` but never returns negative results, consistent with PromQL logic
- `irate`: instantaneous rate of change, estimated by dividing the difference of the last two values by their time interval, consistent with PromQL logic
- `p99`, `p95`, `p90`, `p75`: percentile calculations; any number may follow `p` to express a percentile
- `median`: median, equivalent to `p50`
- `stddev`: standard deviation

A `rollup` aggregation splits a single timeline into segments of the given `interval` and aggregates the values within each segment. When `rollup` is empty, no rollup aggregation is performed.
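Putting the four parts together, a hedged example (measurement and field names are hypothetical):

```
# Over the last day (start_time = 1d, end_time empty), aggregate time_active
# per hour (interval = 1h), rolling up each segment with avg (rollup = avg)
M::cpu:(time_active) [1d::1h:avg]
```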
### By-clause Statement

The `BY` clause categorizes and aggregates the results, similar to `GROUP BY` in MySQL.
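A minimal sketch (the `usage_total` field name is reused from the CPU examples elsewhere in this document):

```
# Aggregate the last CPU usage per host
M::cpu:(last(`usage_total`)) by host
```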
### Having-clause Statement

The `HAVING` clause filters the results obtained after aggregation, similar to the `HAVING` clause in MySQL.

```
# Retrieve all hosts with CPU usage greater than 80%
M::cpu:(max(`usage_total`) as `max_usage`) by host having max_usage > 80
```
### Order-by-clause Statement

The `ORDER BY` clause sorts the results, similar to `ORDER BY` in MySQL.

⚠️ 1. For metric data, only sorting by the time field is supported. 2. When the query contains a `BY` grouping, `order by` does not take effect; use `sorder by` instead.

```
# Get the CPU utilization of different hosts, in reverse time order
M::cpu:(`usage_total`) order by time desc
```
### Sorder-by-clause Statement

The `SORDER BY` clause sorts the groups.

```
# Get the maximum CPU utilization of each host, in reverse order of host name
M::cpu:(max(`usage_total`)) by host sorder by host desc

# Get the maximum CPU utilization of each host, in ascending order of that maximum
M::cpu:(max(`usage_total`) as m) by host sorder by m

# Get CPU utilization, in ascending order of the last CPU utilization
M::cpu:(`usage_total`) sorder by usage_total
```
### Limit Statement

Used to specify the number of rows returned.

Note: for time series data, if a DQL statement contains both a `BY` clause and a `LIMIT` clause, the limit constrains the number of rows returned within each aggregation group.

```
# Return three cpu records
M::cpu:() limit 3

# Return three cpu records per host
M::cpu:() by host limit 3
```
### Slimit Statement

Used to specify the number of groups returned.

```
# Return CPU usage information for three hosts
M::cpu:(last(usage_total)) by host slimit 3

# Return CPU information for three hosts, with three records per host
M::cpu:() by host limit 3 slimit 3
```
## SHOW Statement

`SHOW_xxx` statements are used to browse data (function names are case-insensitive):

- `SHOW_MEASUREMENT()` - view the list of measurements; supports `filter-clause`, `limit`, and `offset` statements
- `SHOW_OBJECT_CLASS()` - view the list of object classes
- `SHOW_CUSTOM_OBJECT_SOURCE()` - view the list of custom object data types
- `SHOW_EVENT_SOURCE()` - view the list of event sources
- `SHOW_LOGGING_SOURCE()` - view the list of log sources
- `SHOW_TRACING_SERVICE()` - view the list of tracing services
- `SHOW_RUM_TYPE()` - view the list of RUM data types
- `SHOW_NETWORK_SOURCE()` - view the list of network eBPF data types
- `SHOW_SECURITY_SOURCE()` - view the list of security checker data types
- `SHOW_WORKSPACES()` - view the current workspace and the workspaces authorized to it

For more SHOW functions, see the Function Documentation.
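A hedged sketch of combining `SHOW_MEASUREMENT()` with the `limit` and `offset` statements it supports:

```
# Page through the measurement list, 10 entries per page, starting from the 21st
SHOW_MEASUREMENT() limit 10 offset 20
```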
## Result Set Function Computation

DQL supports secondary computation on query results:

```
# Evaluate an expression on a single data set through func
func::dataflux__dql:(EXPR_EVAL(expr='data1.f1+data1.f2', data1=dql('M::cpu:(f1, f2)')))

# Calculate across data sets through func
F::dataflux__dql:(SOME_FUNC(
    data1=dql('M::cpu:(f1, f2)'),
    data2=dql('O::ecs:(f1, f2)'), some_args))

# Evaluate a complex expression across data sets through func
F::dataflux__dql:(EXPR_EVAL(
    expr='data1.f1/(data2.f2+3)',  # expression
    data1=dql('M::cpu:(f1, f2)'),
    data2=dql('O::ecs:(f1, f2)'),))
```
## Nested Queries and Statement Blocks

Use `()` to separate a subquery from its outer query, for example with two layers of nesting:

```
metric::(
    # Subquery
    metric::cpu,mem:(f1, f2) {host="abcd"} [1m:2m:5s] BY f1 DESC
):(f1)                # Outer query target column
{ host=re("abc*") }   # Outer query filter criteria
[1m:2m:1s]            # Outer query time limit
```

Three layers of nesting:

```
metric::(         # the second-layer query
    metric::(     # the third-layer query
        metric::a:(f1,f2,f3) {host="foo"} [10m::1m]
    ):(f1,f2)
):(f1)
```

In principle, there is no limit on the nesting depth, but multiple parallel subqueries at the same level are not allowed, for example:

```
object::(         # the second-layer query
    object::(     # the third-layer query
        object::a:(f1,f2,f3) {host="foo"} [10m::1m]
    ):(f1,f2),
    object::(     # the third-layer query: not allowed
        object::b:(f1,f2,f3) {host="foo"} [10m::1m]
    ):(f1,f2)
):(f1)
```
## Special Usage

If the `message` field is of JSON type (currently supported for L/O/T/R, etc.), DQL can extract fields from it directly, in the following form:

```
L::nginx { @abc.def = "xyz" }
```

This is equivalent to the following query; that is, `@` is shorthand for `message@json.`:

```
L::nginx { `message@json.abc.def` = "xyz" }
```

## Function Description

See the DQL Functions documentation.