DQL¶
DQL (Debug Query Language) is the core query language of the Guance platform, specifically designed for efficiently querying and analyzing time series data, log data, event data, etc. DQL combines the semantic expression of SQL with the syntax structure of PromQL, aiming to provide users with a flexible and powerful query tool.
This document will help you quickly understand the basic syntax and design concepts of DQL, and demonstrate how to write DQL queries through examples.
Basic Query Structure¶
The basic query structure of DQL is as follows:
namespace[index]::datasource[:select-clause] [{where-clause}] [time-expr] [group-by-clause] [having-clause] [order-by-clause] [limit-clause] [sorder-by-clause] [slimit-clause] [soffset-clause]
Execution Order¶
The execution order of a DQL query is very important, as it determines the semantics and performance of the query.
1. Data Filtering: Filter data based on namespace::datasource, where-clause, and time-expr
    - Determine the data source
    - Apply WHERE conditions to filter original data rows
    - Apply time range filtering
    - Filtering data as early as possible at this stage improves subsequent processing efficiency
2. Time Aggregation: If time-expr contains a rollup, execute the rollup logic first
    - Rollup functions preprocess data on the time dimension
    - For Counter type Metrics, rate or increment is usually calculated instead of using the raw value directly
    - For Gauge type Metrics, aggregation functions like last or avg might be used
3. Group Aggregation: Execute group-by-clause grouping, and execute aggregation functions in the select-clause within groups
    - Group data according to the expressions in the BY clause
    - Calculate aggregation functions (sum, count, avg, max, min, etc.) within each group
    - If there is also a time window, a two-dimensional data structure is formed
4. Group Filtering: Execute having-clause to filter aggregated groups
    - The HAVING clause acts on the aggregated results
    - Results of aggregation functions can be used for filtering
    - This is a key difference from the WHERE clause
5. Non-aggregation Functions: Execute non-aggregation functions in the select-clause
    - Process expressions and functions that do not require aggregation
    - Perform further calculations on the aggregation results
6. Intra-group Sorting: Execute order-by-clause and limit-clause to sort and paginate data within groups
    - ORDER BY is executed independently within each group
    - LIMIT limits the number of data rows returned per group
7. Inter-group Sorting: Execute sorder-by-clause, slimit-clause, and soffset-clause to sort and paginate groups
    - SORDER BY sorts the groups themselves
    - This requires dimensionality reduction of the group results (e.g., using functions like max, avg, last)
    - SLIMIT limits the number of groups returned
Complete Example¶
Let's understand the structure of DQL through a complete example:
M("production")::cpu:(avg(usage) as avg_usage, max(usage) as max_usage) {host =~ 'web-.*', usage > 50} [1h::5m] BY host, env HAVING avg_usage > 60 ORDER BY time DESC LIMIT 100 SORDER BY avg_usage DESC SLIMIT 10
The meaning of this query is:
- Namespace: M (Metric data)
- Index: production
- Datasource: cpu
- Selected Fields: Average and maximum values of the usage field
- Time Range: Last 1 hour, aggregated every 5 minutes
- Filter Conditions: Hostname matches web-.* and CPU usage is greater than 50%
- Grouping: Grouped by hostname and environment
- Group Filtering: Average CPU usage greater than 60%
- Intra-group Sorting: Sorted in descending order by time, maximum 100 entries per group
- Inter-group Sorting: Sorted in descending order by average usage, maximum 10 groups
Namespace (namespace)¶
Namespaces are used to distinguish different types of data. Each data type has its specific query methods and storage strategies. DQL supports querying various business data types:
| Namespace | Description | Typical Uses |
|---|---|---|
| M | Metric, time series Metrics data | CPU usage, memory usage, request count, etc. |
| L | Logging, log data | Application logs, system logs, error logs, etc. |
| O | Object, infrastructure object data | Server information, container information, network devices, etc. |
| OH | History object, object history data | Server configuration change history, performance Metrics history, etc. |
| CO | Custom object, custom object data | Business-defined object information |
| COH | History custom object, custom object history data | Historical change records of custom objects |
| N | Network, network data | Network traffic, DNS queries, HTTP requests, etc. |
| T | Trace, link call data | Distributed tracing, call chain analysis, etc. |
| P | Profile, call profiling data | Performance profiling, CPU flame graphs, etc. |
| R | RUM, user access data | Frontend performance, user behavior analysis, etc. |
| E | Event, event data | Alert events, deployment events, system events, etc. |
| UE | Unrecover Event, unrecovered event data | Unresolved alerts and events |
Index (index)¶
Index is an important mechanism for DQL query optimization, which can be understood as a table or partition in a traditional database. Within a single namespace, the system may split data due to factors such as data source, data volume, and access patterns to improve query performance and management efficiency.
Role of Index¶
- Performance Optimization: Disperse data storage through indexes to reduce the amount of data scanned in a single query
- Data Isolation: Data from different businesses, environments, or time ranges can be stored in different indexes
- Permission Management: Different access permissions can be set for different indexes
- Lifecycle Management: Different data retention policies can be set for different indexes
Naming Rules for Index¶
- Index names must be explicitly declared; wildcards or regular expressions for matching are not supported
- Index names usually reflect the business attributes of the data, such as: production, staging, web-logs, api-logs
- The default index name is default, used when no index is explicitly specified
Basic Syntax¶
// Use default index (automatically uses default when not specified)
M::cpu // Equivalent to M("default")::cpu
L::nginx // Equivalent to L("default")::nginx
// Specify a single index
M("production")::cpu // Query CPU Metrics in the production index
L("web-logs")::nginx // Query Nginx logs in the web-logs index
// Multi-index query (query data from multiple indexes simultaneously)
M("production", "staging")::cpu // Query CPU Metrics from production and staging environments
L("web-logs", "api-logs")::nginx // Query Web and API logs
Index and Performance¶
Reasonable use of indexes can significantly improve query performance:
- Precise Index: When you know exactly which index the data is in, specify that index directly
- Multi-index Query: When querying across multiple indexes is needed, use multi-index syntax instead of wildcards
- Avoid Full Index Scans: Try to reduce the data scan range through the combination of index and WHERE conditions
Compatible Syntax (Not Recommended)¶
For historical reasons, DQL also supports specifying indexes in the where clause, but its use is not recommended:
// Old syntax, not recommended
L::nginx { index = "web-logs" }
L::nginx { index IN ["web-logs", "api-logs"] }
Application Examples¶
// Query CPU usage in the production environment
M("production")::cpu:(avg(usage)) [1h] BY host
// Compare production and staging environments
M("production", "staging")::cpu:(avg(usage)) [1h] BY index, host
// Analyze Web server logs
L("web-logs")::nginx:(count(*)) {status >= 400} [1h] BY status
// Compare error rates between Web and API servers
L("web-logs", "api-logs")::*:(count(*)) {status >= 500} [1h] BY index
Datasource (datasource)¶
The datasource specifies the specific data source for the query, which can be a dataset name, wildcard pattern, regular expression, or subquery.
Basic Datasource¶
The definition of datasource varies across different namespaces:
| Namespace | Datasource Type | Examples |
|---|---|---|
| M | Measurement | cpu, memory, network |
| L | Source | nginx, tomcat, java-app |
| O | Infrastructure object classification | host, container, process |
| T | Service name (service) | user-service, order-service |
| R | RUM data type | session, view, resource, error |
Datasource Syntax¶
Specify Datasource Name¶
M::cpu:(usage) // Query CPU Metrics
L::nginx:(count(*)) // Query Nginx logs
T::user-service:(traces) // Query user service traces
Wildcard Matching¶
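A minimal sketch, assuming the `*` wildcard form that appears elsewhere in this document (e.g. `L::*:(count(*))`):

```
M::*:(usage)        // Match all measurements
L::*:(count(*))     // Match all log sources
```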
Regular Expression Matching¶
M::re('cpu.*'):(usage) // Query Metrics starting with cpu
L::re('web.*'):(count(*)) // Query logs starting with web
T::re('.*-service'):(traces) // Query services ending with -service
Subquery Datasource¶
Subqueries are an important feature in DQL for implementing complex analysis. They allow the result of one query to be used as the datasource for another query. This nested query mechanism supports multi-level analysis requirements.
Execution Mechanism¶
The execution of subqueries follows these principles:
- Serial Execution: The inner subquery executes first, and its result is then used as the datasource for the outer query
- Result Encapsulation: The subquery result is encapsulated into a temporary table structure for use by the outer query
- Namespace Mixing: Subqueries support mixing different namespaces, enabling cross-data type analysis
- Performance Consideration: Subqueries increase computational complexity, so the query logic needs to be designed reasonably
Basic Syntax¶
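A hedged sketch of the general form, inferred from the examples in this section: the parenthesized inner query takes the place of a datasource name:

```
namespace::(inner-query):(select-clause)
```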
Execution Process¶
Take a typical subquery as an example:
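The typical subquery referred to here is the same query shown under Application Examples:

```
L::(L::*:(count(*)) {level = 'error'} BY app_id):(count_distinct(app_id))
```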
The execution process is:
1. Inner Subquery: L::*:(count(*)) {level = 'error'} BY app_id
    - Scan all log data
    - Filter out logs with error level
    - Group by app_id to count the number of errors
    - Generate a temporary table: app_id | count(*)
2. Outer Query: L::(...):(count_distinct(app_id))
    - Use the subquery result as the datasource
    - Count how many distinct app_ids there are
    - Final result: the number of applications with errors
Application Examples¶
// Count the number of applications with errors
L::(L::*:(count(*)) {level = 'error'} BY app_id):(count_distinct(app_id))
// Analyze servers with high CPU usage
M::(M::cpu:(avg(usage)) [1h] BY host {avg(usage) > 80}):(count(host))
// First find service endpoints with an error rate exceeding 1%, then count the number of affected services
M::(M::http_requests:(sum(request_count), sum(error_count)) [1h] BY service, endpoint
{sum(error_count) / sum(request_count) > 0.01}
):(count(service))
Select Clause (select-clause)¶
The Select clause is used to specify the fields or expressions to be returned by the query. It is one of the most basic and important parts of a DQL query.
Field Selection¶
Basic Syntax¶
// Select a single field
M::cpu:(usage)
// Select multiple fields
M::cpu:(usage, system, user)
// Select all fields
M::cpu:(*)
Field Name Rules¶
Field names can be written in the following forms:
1. Write the name directly: Suitable for ordinary identifiers
    - ✅ message
    - ✅ host_name
    - ✅ response_time
2. Enclose in backticks: Suitable for field names containing special characters or keywords
    - ✅ `message`
    - ✅ `limit`
    - ✅ `host-name`
    - ✅ `column with spaces`
3. Avoid these forms: Single and double quotes enclose strings, not field names
    - ❌ 'message'
    - ❌ "message"
JSON Field Extraction¶
When data fields contain content in JSON format, a subset of JSON Path syntax can be used to extract data from internal fields.
Basic Syntax¶
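Judging from the examples below, a JSON path is attached to the field name with `@`:

```
field@path.to.value
// e.g. extract request.method from the JSON in the message field
L::auth_logs:(message@request.method)
```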
JSON Path Syntax¶
- Dot notation for object properties: .field_name
- Square brackets for object properties: ["key"] (suitable for keys containing spaces or special characters)
- Array index for value extraction: [index]
Application Examples¶
Assume the following JSON log data:
{
"message": "User login attempt",
"request": {
"method": "POST",
"path": "/api/login",
"headers": {
"user-agent": "Mozilla/5.0",
"content-type": "application/json"
},
"body": {
"username": "john.doe",
"password": "***",
"permissions": ["read", "write", "admin"]
}
},
"response": {
"status": 200,
"time": 156,
"data": [
{"id": 1, "name": "user1"},
{"id": 2, "name": "user2"}
]
}
}
// Extract request method
L::auth_logs:(message@request.method)
// Extract request path
L::auth_logs:(message@request.path)
// Extract username
L::auth_logs:(message@request.body.username)
// Extract response status
L::auth_logs:(message@response.status)
// Extract User-Agent (contains hyphen, requires square brackets)
L::auth_logs:(message@request.headers["user-agent"])
// Extract the first element of the permissions array
L::auth_logs:(message@request.body.permissions[0])
// Extract the name of the first object in the response data
L::auth_logs:(message@response.data[0].name)
// Count the number of different request methods
L::auth_logs:(count(*)) [1h] BY message@request.method
// Analyze response time distribution
L::auth_logs:(avg(message@response.time), max(message@response.time)) [1h] BY message@request.method
// Extract multiple fields
L::auth_logs:(
message,
message@request.method as method,
message@request.path as path,
message@response.status as status,
message@response.time as response_time
) {message@response.status >= 400} [1h]
Calculated Fields¶
Expression Calculation¶
Supports basic arithmetic operations:
// Unit conversion (milliseconds to seconds)
L::nginx:(response_time / 1000 as response_time_seconds)
// Calculate percentage
M::memory:(used / total * 100 as usage_percentage)
// Compound calculation
M::network:((bytes_in + bytes_out) / 1024 / 1024 as total_traffic_mb)
Function Calculation¶
Supports various aggregation and transformation functions:
// Aggregation functions
M::cpu:(max(usage), min(usage), avg(usage)) [1h] BY host
// Transformation functions
L::logs:(int(response_time) as response_time_seconds)
L::logs:(floor(response_time) as response_time_seconds)
Aliases¶
Assign aliases to fields or expressions to make results more readable and facilitate subsequent references.
Basic Syntax¶
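The alias form used throughout this document:

```
expression as alias_name
// e.g.
M::cpu:(avg(usage) as avg_usage)
```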
Application Examples¶
// Simple alias
M::cpu:(avg(usage) as avg_usage, max(usage) as max_usage) [1h] BY host
// Expression alias
M::memory:((used / total) * 100 as usage_percent) [1h] BY host
// Function alias
L::logs:(count(*) as error_count) {level = 'error'} [1h] BY service
// JSON extraction alias
L::api_logs:(
message@request.method as http_method,
message@response.status as http_status,
message@response.time as response_time_ms
) [1h]
Usage Tips¶
In DQL, the result of an aggregation function can be referenced directly using the original field name, which reduces the need for aliases:
M::cpu:(max(usage)) [1h] BY host
// The result contains a max(usage) column; an outer query can reference it by the original field name usage:
M::(M::cpu:(max(usage)) [1h] BY host):(max(usage)) { usage > 80 }
However, when multiple aggregation functions use the same field, aliases must be used to correctly distinguish them later:
// Cases where aliases are mandatory
M::cpu:(max(usage) as max_usage, min(usage) as min_usage) [1h] BY host
Time Clause (time-clause)¶
The time clause is one of the core features of DQL, used to specify the query time range, aggregation time window, and Rollup aggregation function.
Basic Syntax¶
Time Range¶
Absolute Timestamp¶
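A hedged sketch, assuming Unix timestamps are accepted in the same `[start:end]` position as relative durations (whether seconds or milliseconds are expected should be verified against the platform reference):

```
[1672531200000:1672534800000]   // Assumed form: explicit start and end timestamps
```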
Relative Time¶
Supports various duration units, which can be mixed:
[1h] // Past 1 hour to now
[1h:5m] // From past 1 hour to past 5 minutes
[1h30m] // Past 1 hour 30 minutes
[2h15m30s] // Past 2 hours 15 minutes 30 seconds
Duration Expression Description¶
| Unit | Description | Example |
|---|---|---|
| s | Seconds | 30s |
| m | Minutes | 5m |
| h | Hours | 2h |
| d | Days | 7d |
| w | Weeks | 4w |
| y | Years | 1y |
When used in a time clause, a duration expression represents an offset from the current time backwards. When used in a Select clause or Where clause, etc., it is treated as a millisecond integer for calculation.
When used in aggregation queries, two additional duration units are supported:
| Unit | Description | Example |
|---|---|---|
| i, is | Multiples of the aggregation time window, return value is float seconds | 1i, 1is |
| ims | Multiples of the aggregation time window, return value is integer milliseconds | 1ims |
O::HOST:(count(*)){ `last_update_time` > (now()-10m) } // 10m is treated as the integer 600,000 (milliseconds)
L::*:( count(*) / 1i ) [::1m] // Divide by the time window size (1m) in seconds, calculate the QPS of log writes
Preset Time Ranges¶
Provides commonly used time range keywords:
| Keyword | Description | Time Range |
|---|---|---|
| TODAY | Today | From today 00:00 to now |
| YESTERDAY | Yesterday | From yesterday 00:00 to today 00:00 |
| THIS WEEK | This week | From Monday 00:00 of this week to now |
| LAST WEEK | Last week | From Monday 00:00 of last week to Monday 00:00 of this week |
| THIS MONTH | This month | From the 1st 00:00 of this month to now |
| LAST MONTH | Last month | From the 1st 00:00 of last month to the 1st 00:00 of this month |
[TODAY] // Today's data
[YESTERDAY] // Yesterday's data
[THIS WEEK] // This week's data
[LAST WEEK] // Last week's data
[THIS MONTH] // This month's data
[LAST MONTH] // Last month's data
When using time range keywords, make sure the workspace timezone setting is correct: the boundaries of these ranges are computed strictly in the requesting user's timezone.
Time Window Aggregation¶
Time windows group and aggregate data at specified time intervals. The time column in the returned result represents the start time of each time window.
Single Time Window¶
The entire time range is aggregated into one value:
Query Result:
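A sketch of such a query and an illustrative result (hypothetical values, assuming the `cpu` measurement with a `usage_total` field used elsewhere in this document):

```
M::cpu:(max(usage_total)) [1h]
```

```
{
  "columns": ["time", "max(usage_total)"],
  "values": [
    [1721059200000, 37.46]
  ]
}
```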
Time Window Aggregation¶
Group and aggregate by time interval:
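A sketch of a query that would produce a result shaped like the one below (assuming the `cpu` measurement with a `usage_total` field, aggregated in 10-minute windows):

```
M::cpu:(max(usage_total)) [1h::10m]
```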
Query Result:
{
"columns": ["time", "max(usage_total)"],
"values": [
[1721059200000, 37.46],
[1721058600000, 34.12],
[1721058000000, 33.81],
[1721057400000, 30.92],
[1721056800000, 34.53],
[1721056200000, 36.11]
]
}
Rollup Functions¶
Rollup functions are an important preprocessing step in DQL. They are executed before group aggregation and are used to preprocess raw time series data.
Execution Sequence¶
The position of Rollup in the query execution flow:
Raw data → WHERE filtering → **Rollup preprocessing** → Group aggregation → HAVING filtering → Final result
Execution Mechanism¶
The Rollup execution process is divided into two stages:
- Per-time series processing: Apply the Rollup function separately to each independent time series
- Aggregation calculation: Execute group aggregation on the results after Rollup processing
Application Scenarios¶
A typical application scenario for Rollup functions is Counter Metrics processing.
For Prometheus Counter type Metrics, aggregating using raw values directly is meaningless because Counters are monotonically increasing. It is necessary to first calculate the growth rate for each time series, and then perform aggregation.
Problem Example: Assume there are request counters for two servers:
{
"host": "web-server-01",
"data": [
{"time": "2024-07-15 08:25:00", "request_count": 150},
{"time": "2024-07-15 08:20:00", "request_count": 140},
{"time": "2024-07-15 08:15:00", "request_count": 130},
{"time": "2024-07-15 08:10:00", "request_count": 120},
{"time": "2024-07-15 08:05:00", "request_count": 110},
{"time": "2024-07-15 08:00:00", "request_count": 100}
]
}
{
"host": "web-server-02",
"data": [
{"time": "2024-07-15 08:25:00", "request_count": 250},
{"time": "2024-07-15 08:20:00", "request_count": 240},
{"time": "2024-07-15 08:15:00", "request_count": 230},
{"time": "2024-07-15 08:10:00", "request_count": 220},
{"time": "2024-07-15 08:05:00", "request_count": 210},
{"time": "2024-07-15 08:00:00", "request_count": 200}
]
}
Problem with direct aggregation:
- The starting value of request_count for web-server-01 is 100
- The starting value of request_count for web-server-02 is 200
- Although the request rates of the two servers are the same (both 10 requests per 5 minutes), the absolute values are different
Solution using Rollup:
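A sketch of the Rollup-based query for this data (assuming a `request_count` Counter in an `http_requests` measurement, as in the Application Examples below):

```
M::http_requests:(sum(request_count)) [rate]
```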
Execution process:
-
Rollup stage (executed separately on each time series):
- web-server-01: rate([100, 110, 120, 130, 140, 150]) = 2 requests/minute
- web-server-02: rate([200, 210, 220, 230, 240, 250]) = 2 requests/minute
-
Aggregation stage:
- sum([2, 2]) = 4 requests/minute
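The two-stage computation above can be sketched in Python (an illustration of the principle only, not the platform's implementation):

```python
# Per-series rate followed by cross-series aggregation (illustrative only).

def rate_per_minute(samples):
    """Growth per minute between the first and last (minute, counter) sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# Counter samples for the two servers above: (minutes elapsed, request_count)
web_server_01 = [(0, 100), (5, 110), (10, 120), (15, 130), (20, 140), (25, 150)]
web_server_02 = [(0, 200), (5, 210), (10, 220), (15, 230), (20, 240), (25, 250)]

# Rollup stage: rate computed independently per time series
r1 = rate_per_minute(web_server_01)  # 2.0 requests/minute
r2 = rate_per_minute(web_server_02)  # 2.0 requests/minute

# Aggregation stage: sum the per-series rates
total_rate = r1 + r2                 # 4.0 requests/minute
```

Summing the raw counter values instead would mix the servers' unrelated starting offsets (100 vs. 200) into the result, which is why the rate must be taken per series first.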
Function Types¶
Common Rollup functions include:
| Function Type | Description | Applicable Scenarios |
|---|---|---|
| `rate()` | Calculate growth rate | Counter type Metrics |
| `increase()` | Calculate growth amount | Counter type Metrics |
| `last()` | Take the last value | Gauge type Metrics |
| `avg()` | Calculate average value | Data smoothing |
| `max()` | Take maximum value | Peak analysis |
| `min()` | Take minimum value | Trough analysis |
In practice, almost any aggregation function that returns a single value can be used as a Rollup function, so this table is not exhaustive.
Default Rollup¶
If no Rollup function is explicitly specified, DQL defaults to not performing Rollup calculation. The default Rollup for PromQL is last, so if you are calculating Prometheus Metrics, be sure to understand this difference and manually specify the Rollup function.
Application Examples¶
// Calculate the total request rate for all servers
M::http_requests:(sum(request_count)) [rate]
// Calculate error rate
M::http_requests:(
sum(error_count) as errors,
sum(request_count) as requests
) [rate] BY service
Flexible Time Window Syntax¶
DQL supports various shorthand formats for time windows, making query writing more convenient.
Shorthand Formats¶
[1h] // Only specify time range
[1h::5m] // Time range + aggregation step
[1h:5m] // Start time + end time
[1h:5m:1m] // Start + end + step
[1h:5m:1m:avg] // Full format
[::5m] // Only specify aggregation step
[:::sum] // Only specify rollup function
[sum] // Only specify rollup function (simplest form)
Filter Conditions (where-clause)¶
Filter conditions are used to filter data rows, retaining only data that meets the conditions for subsequent processing.
Basic Syntax¶
Multiple conditions can be connected by commas, AND, OR, &&, ||.
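Based on the operator tables below, a where-clause is a brace-enclosed list of conditions; a minimal sketch:

```
{condition1, condition2}        // Comma-separated conditions are ANDed
{condition1 AND condition2}     // Equivalent to the comma form
{condition1 OR condition2}
```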
Comparison Operators¶
| Operator | Description | Example |
|---|---|---|
| `=` | Equal to | `host = 'web-01'` |
| `!=` | Not equal to | `status != 200` |
| `>` | Greater than | `cpu_usage > 80` |
| `>=` | Greater than or equal to | `memory_usage >= 90` |
| `<` | Less than | `response_time < 1000` |
| `<=` | Less than or equal to | `disk_usage <= 80` |
Pattern Matching Operators¶
| Operator | Description | Example |
|---|---|---|
| `=~` | Regular expression match | `message =~ 'error.*\\d+'` |
| `!~` | Regular expression does not match | `message !~ 'debug.*'` |
Set Operators¶
| Operator | Description | Example |
|---|---|---|
| `IN` | In the set | `status IN [200, 201, 202]` |
| `NOT IN` | Not in the set | `level NOT IN ['debug', 'info']` |
Logical Operators¶
| Operator | Description | Example |
|---|---|---|
AND or && |
Logical AND | cpu > 80 AND memory > 90 |
OR or | | |
Logical OR | status = 500 OR status = 502 |
NOT |
Logical NOT | NOT status = 200 |
Tip: The `OR` operator, besides performing logical OR, also provides null value fallback semantics: when the left expression returns `NULL`, it directly returns the result of the right expression. This can be used to implement priority fallback logic, such as `status OR backup_status`.
Application Examples¶
Basic Filtering¶
// Single condition
M::cpu:(usage) {host = 'web-01'} [1h]
// Multiple AND conditions
M::cpu:(usage) {host = 'web-01', usage > 80} [1h]
// Use AND keyword
M::cpu:(usage) {host = 'web-01' AND usage > 80} [1h]
// Mixed use of logical operators
M::cpu:(usage) {(host = 'web-01' OR host = 'web-02') AND usage > 80} [1h]
Regular Expression Matching¶
// Match error logs
L::logs:(message) {message =~ 'ERROR.*\\d{4}'} [1h]
// Match logs of specific format
L::logs:(message) {message =~ '\\[(ERROR|WARN)\\].*'} [1h]
// Exclude debug information
L::logs:(message) {message !~ 'DEBUG.*'} [1h]
// Hostname pattern matching
M::cpu:(usage) {host =~ 'web-.*\\.prod\\.com'} [1h]
Set Operations¶
// Status code filtering
L::nginx:(count(*)) {status IN [200, 201, 202, 204]} [1h]
// Exclude specific status codes
L::nginx:(count(*)) {status NOT IN [404, 500, 502]} [1h]
// Log level filtering
L::app_logs:(count(*)) {level IN ['ERROR', 'WARN', 'CRITICAL']} [1h]
Handling Array Fields¶
When the field type is an array, DQL supports various array matching operations.
Assume a field tags = ['web', 'prod', 'api']:
Recommended Syntax: Use IN and NOT IN¶
// Check if the array contains a certain value
{tags IN ['web']} // true, because tags contains 'web'
{tags IN ['mobile']} // false, because tags does not contain 'mobile'
// Check if the array does not contain a certain value
{tags NOT IN ['mobile']} // true, because tags does not contain 'mobile'
{tags NOT IN ['web']} // false, because tags contains 'web'
// Check if the array contains all specified values
{tags IN ['web', 'api']} // true, contains 'web' and 'api'
{tags IN ['web', 'api', 'mobile']} // false, does not contain 'mobile'
Compatible Syntax (Not Recommended)¶
The following syntax is retained for historical compatibility purposes and is not recommended for use in new queries. The semantics of these operators on array fields are different from those on ordinary fields and can easily cause confusion.
// Historical syntax: Single value inclusion check (overloaded semantics of the equal operator)
{tags = 'web'} // true, because tags contains 'web'
{tags = 'mobile'} // false, because tags does not contain 'mobile'
// Historical syntax: Single value exclusion check (overloaded semantics of the not equal operator)
{tags != 'mobile'} // true, because tags does not contain 'mobile'
{tags != 'web'} // false, because tags contains 'web'
Function Filtering¶
Any function that returns a Boolean value can be used as a filter condition.
// String matching
L::logs:(message) { match(message, 'error') }
L::logs:(message) { wildcard(message, 'error*') }
// Query requests with abnormal response time
L::access_logs:(*) {
response_time > 1000 AND
match(message, 'timeout')
}
// Query hosts with abnormal memory usage
M::memory:(usage) {
(usage > 90 OR usage < 10) AND
host =~ 'prod-.*' AND
tags IN ['critical', 'important']
}
WHERE Subquery¶
WHERE subqueries are a powerful feature in DQL for implementing dynamic filtering. They allow the result of one query to be used as a filter condition for another query. This mechanism supports dynamic filtering based on data analysis results.
Query Characteristics¶
- Dynamic Filtering: The filter condition is not a fixed value but is dynamically calculated through a query
- Namespace Mixing: Supports cross-namespace queries, enabling correlation analysis of different data types
- Serial Execution: The subquery executes first, and its result is used for filtering the main query
- Array Result: The subquery result is encapsulated as an array, therefore only the IN and NOT IN operators are supported
Execution Flow¶
Take a typical WHERE subquery as an example:
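The typical WHERE subquery referred to here is the same query shown under Application Examples:

```
M::cpu:(avg(usage)) { host IN (O::HOST:(hostname) {provider = 'cloud-a'}) } [1h] BY host
```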
Execution process:
1. Subquery Execution: O::HOST:(hostname) {provider = 'cloud-a'}
    - Query all infrastructure objects
    - Filter out hosts where provider is 'cloud-a'
    - Return hostname list: ['host-01', 'host-02', 'host-03']
2. Main Query Execution: M::cpu:(avg(usage)) [1h] BY host {host IN [...]}
    - Query CPU usage data
    - Only count hosts returned by the subquery
    - Calculate average usage grouped by host
Application Examples¶
// Monitor servers of a specific cloud provider
M::cpu:(avg(usage)) { host IN (O::HOST:(hostname) {provider = 'cloud-a'}) } [1h] BY host
// Compare performance of different cloud providers
M::memory:(avg(used / total * 100)) { host IN (O::HOST:(hostname) {provider IN ['cloud-a', 'cloud-b']}) } [1h] BY host
// Analyze logs of specific business
L::app_logs:(count(*)) { service IN (T::services:(service_name) {business_unit = 'ecommerce'}) } [1h] BY level
// Monitor application performance of key businesses
M::response_time:(avg(response_time)) { service IN (T::services:(service_name) {criticality = 'high'}) } [1h] BY service
Grouping (group-by-clause)¶
Grouping is a core function of data analysis, used to group and aggregate data according to specified dimensions.
Basic Syntax¶
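A minimal sketch of the grouping form, based on the examples that follow:

```
BY field1, field2, ...
// e.g.
M::cpu:(avg(usage)) [1h] BY host, env
```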
Grouping Types¶
Field Grouping¶
// Single field grouping
M::cpu:(avg(usage)) [1h] BY host
// Multiple field grouping
M::cpu:(avg(usage)) [1h] BY host, env
// Nested grouping
M::cpu:(avg(usage)) [1h] BY datacenter, rack, host
Expression Grouping¶
// Compound mathematical expression
M::memory:(avg(used)) [1h] BY ((used / total) * 100) as usage_percent
// Multiple field mathematical operation
M::performance:(avg(response_time)) [1h] BY (response_time / 1000) as response_seconds
Function Grouping¶
// Drain clustering algorithm
L::logs:(count(*)) BY drain(message, 0.7) as sample
// Regular expression extraction grouping
L::logs:(count(*)) [1h] BY regexp_extract(message, 'error_code: (\\d+)', 1)
Grouping Result Processing¶
Group By can be mixed with time windows. When a query contains both grouping and time windows, the result is a two-dimensional array: the first dimension consists of the groups distinguished by the grouping keys, and the second dimension is the per-time-interval data within a single group.
Two-dimensional data structure example:
Query: M::cpu:(max(usage_total)) [1h::10m] by host
Query result structure:
{
"series": [
{
"columns": ["time", "max(usage_total)"],
"name": "cpu",
"tags": {"host": "web-server-01"},
"values": [
[1721059200000, 78.5],
[1721058600000, 82.3],
[1721058000000, 75.8],
[1721057400000, 88.2]
]
},
{
"columns": ["time", "max(usage_total)"],
"name": "cpu",
"tags": {"host": "web-server-02"},
"values": [
[1721059200000, 45.2],
[1721058600000, 52.8],
[1721058000000, 48.5],
[1721057400000, 61.3]
]
},
{
"columns": ["time", "max(usage_total)"],
"name": "cpu",
"tags": {"host": "web-server-03"},
"values": [
[1721059200000, 92.1],
[1721058600000, 95.7],
[1721058000000, 89.4],
[1721057400000, 97.6]
]
}
]
}
Further processing this two-dimensional array:
- To filter the results of this two-dimensional array, use the Having clause
- To sort or paginate the data within a single group in the two-dimensional array, use the order by, limit, offset series of statements
- To sort or paginate the groups of the two-dimensional array, use the sorder by, slimit, soffset series of statements
For a detailed description of the two-dimensional data structure and sorting and pagination functions, please refer to Sorting and Pagination.
Having Clause (having-clause)¶
The Having clause is used to filter the results after group aggregation, similar to the WHERE clause, but acts on the aggregated data.
Basic Syntax¶
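A minimal sketch, mirroring the examples under Application Scenarios below:

```
HAVING condition
// e.g.
M::cpu:(avg(usage) as avg_usage) [1h] BY host HAVING avg_usage > 80
```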
Difference from WHERE¶
WHERE and HAVING are both clauses used to filter data, but they operate at different stages of query execution and handle different types of filter conditions.
Difference in Execution Sequence¶
- WHERE clause:
    - Executed before group aggregation
    - Acts on original data rows
    - Filters out rows that do not meet the conditions, reducing the amount of data for subsequent processing
- HAVING clause:
    - Executed after group aggregation
    - Acts on the aggregated results
    - Filters based on the results of aggregation functions
Application Scenarios¶
The HAVING clause is suitable for filtering aggregation results:
// Filtering based on aggregation function values
M::cpu:(avg(usage) as avg_usage) [1h] BY host HAVING avg_usage > 80
// Filtering based on multiple aggregation conditions
M::http_requests:(
sum(request_count) as total,
sum(error_count) as errors
) [1h] BY service, endpoint
HAVING errors / total > 0.01 AND total > 1000
// Filtering based on group statistics
L::logs:(count(*) as count) [1h] BY service HAVING count > 100
// Filtering based on compound aggregation conditions
M::response_time:(
avg(response_time) as avg_time,
max(response_time) as max_time,
min(response_time) as min_time
) [1h] BY endpoint
HAVING avg_time > 1000 AND max_time > 5000 AND (max_time - min_time) > 2000
Sorting and Pagination¶
Sorting and pagination in DQL is an important and distinctive feature. Tailored to the characteristics of time series data, DQL provides a dual sorting mechanism: intra-group sorting and inter-group sorting. This design enables DQL to efficiently handle complex multi-dimensional time series analysis requirements.
Understanding DQL's Data Structure¶
Before delving into sorting and pagination, first understand the two-dimensional data structure of DQL query results. When a query contains both grouping (BY) and time windows, a two-dimensional array is produced:
- First dimension (Grouping dimension): Multiple groups distinguished by grouping keys
- Second dimension (Time dimension): Data aggregated by time window within each group
For a detailed description of the two-dimensional data structure and JSON format example, please refer to Grouping Result Processing. This two-dimensional structure is the foundation of DQL's sorting and pagination functions. Understanding this structure is crucial to mastering DQL's sorting mechanism.
Intra-group Sorting and Pagination (ORDER BY, LIMIT, OFFSET)¶
Intra-group sorting and pagination act on the second dimension of the two-dimensional structure, i.e., the data within each group. This sorting is executed independently within each group and does not affect data in other groups.
Basic Syntax¶
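A sketch of the intra-group clauses, inferred from the overall query structure at the top of this document; they follow the grouping clause and act on the rows inside each group:

```
... [group-by-clause] ORDER BY sort-field [ASC|DESC] [LIMIT n] [OFFSET m]
```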
Execution Mechanism¶
The execution process of intra-group sorting:
- Group Processing: Perform sorting operations independently for each group
- Sorting Basis: Time fields, aggregation function results, or calculation expressions can be used
- Pagination Limit: LIMIT restricts the number of data rows returned per group
- Offset Handling: OFFSET skips the first N rows of data in each group
Application Examples¶
Basic Time Sorting¶
// Sort in descending order by time, show the latest data for each host
M::cpu:(max(usage_total)) [1h::10m] BY host ORDER BY time DESC
// Sort in ascending order by time, show historical trends
M::cpu:(max(usage_total)) [1h::10m] BY host ORDER BY time ASC
Execution Result (ORDER BY time DESC):
{
"series": [
{
"columns": ["time", "max(usage_total)"],
"name": "cpu",
"tags": {"host": "web-server-01"},
"values": [
[1721059200000, 78.5], // 12:00:00
[1721058600000, 82.3], // 11:50:00
[1721058000000, 75.8], // 11:40:00
[1721057400000, 88.2], // 11:30:00
[1721056800000, 72.1], // 11:20:00
[1721056200000, 69.4] // 11:10:00
]
},
{
"columns": ["time", "max(usage_total)"],
"name": "cpu",
"tags": {"host": "web-server-02"},
"values": [
[1721059200000, 45.2], // 12:00:00
[1721058600000, 52.8], // 11:50:00
[1721058000000, 48.5], // 11:40:00
[1721057400000, 61.3], // 11:30:00
[1721056800000, 55.7], // 11:20:00
[1721056200000, 58.9] // 11:10:00
]
}
]
}
Value-based Sorting¶
// Sort in descending order by CPU usage, find the peak periods for each host
M::cpu:(max(usage_total) as max_usage_total) [1h::10m] BY host ORDER BY max_usage_total DESC
// Sort in ascending order by response time, find the periods with the best performance
M::response_time:(avg(response_time) as avg_response_time) [1h::5m] BY endpoint ORDER BY avg_response_time ASC
Intra-group Pagination¶
// Show only the latest 3 data points for each host
M::cpu:(max(usage_total)) [1h::10m] BY host ORDER BY time DESC LIMIT 3
// Skip the latest 2 data points, show the next 3
M::cpu:(max(usage_total)) [1h::10m] BY host ORDER BY time DESC LIMIT 3 OFFSET 2
Execution Result (LIMIT 3):
{
"series": [
{
"columns": ["time", "max(usage_total)"],
"name": "cpu",
"tags": {"host": "web-server-01"},
"values": [
[1721059200000, 78.5], // 12:00:00 - Latest
[1721058600000, 82.3], // 11:50:00
[1721058000000, 75.8] // 11:40:00
// Only the first 3 data points are returned
]
},
{
"columns": ["time", "max(usage_total)"],
"name": "cpu",
"tags": {"host": "web-server-02"},
"values": [
[1721059200000, 45.2], // 12:00:00 - Latest
[1721058600000, 52.8], // 11:50:00
[1721058000000, 48.5] // 11:40:00
// Only the first 3 data points are returned
]
}
]
}
Inter-group Sorting and Pagination (SORDER BY, SLIMIT, SOFFSET)¶
Inter-group sorting and pagination are distinctive features of DQL. They act on the first dimension of the two-dimensional structure, i.e., sorting the groups themselves. This type of sorting requires reducing the data of each group to a single value, and then comparing this value across different groups.
Basic Syntax¶
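A sketch of the inter-group clauses, inferred from the overall query structure at the top of this document; note the S prefix that distinguishes them from their intra-group counterparts:

```
... [group-by-clause] SORDER BY agg-func(field) [ASC|DESC] [SLIMIT n] [SOFFSET m]
```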
Execution Mechanism¶
The execution process of inter-group sorting:
- Dimensionality Reduction Calculation: Apply an aggregation function to each group to calculate a representative numerical value
- Group Sorting: Sort all groups based on the value after dimensionality reduction
- Group Pagination: SLIMIT limits the number of groups returned, SOFFSET skips the first N groups
Choice of Dimensionality Reduction Function¶
Inter-group sorting must use aggregation functions for dimensionality reduction. When no aggregation function is specified, the default aggregation function is last.
Commonly used dimensionality reduction functions include:
| Function | Description | Applicable Scenarios |
|---|---|---|
| `last()` | Take the last value | Current state of a time series |
| `max()` | Take the maximum value | Peak analysis |
| `min()` | Take the minimum value | Trough analysis |
| `avg()` | Take the average value | Overall trend analysis |
| `sum()` | Sum the values | Total statistics |
| `count()` | Count the values | Frequency analysis |
In practice, almost any aggregation function that returns a single value can be used for dimensionality reduction, so the full list of functions is not enumerated here.
Application Examples¶
Sorting Based on Average¶
// Sort in descending order by average CPU usage, find the hosts with the highest load
M::cpu:(avg(usage)) [1h::10m] BY host SORDER BY avg(usage) DESC SLIMIT 5
// Sort in ascending order by average response time, find the services with the best performance
M::response_time:(avg(response_time)) [1h] BY service SORDER BY avg(response_time) ASC SLIMIT 10
Execution Process Analysis:
- Dimensionality Reduction Calculation:
  - web-server-01: avg(usage) = 77.7
  - web-server-02: avg(usage) = 53.7
  - web-server-03: avg(usage) = 90.9
- Group Sorting (by avg(usage) DESC):
  - web-server-03: 90.9
  - web-server-01: 77.7
  - web-server-02: 53.7
- Final Result (SLIMIT 2):
{
"series": [
{
"columns": ["time", "avg(usage)"],
"name": "cpu",
"tags": {"host": "web-server-03"}, // Average usage: 90.9 - Rank 1
"values": [
[1721059200000, 92.1],
[1721058600000, 95.7],
[1721058000000, 89.4]
]
},
{
"columns": ["time", "avg(usage)"],
"name": "cpu",
"tags": {"host": "web-server-01"}, // Average usage: 77.7 - Rank 2
"values": [
[1721059200000, 78.5],
[1721058600000, 82.3],
[1721058000000, 75.8]
]
}
// web-server-02 (avg: 53.7) is filtered out because SLIMIT 2
]
}
Sorting Based on Other Functions¶
// Sort by maximum CPU usage, find hosts with abnormal peaks
M::cpu:(max(usage_total)) [1h::10m] BY host SORDER BY max(usage_total) DESC SLIMIT 10
// Sort by minimum memory usage, find hosts with the lowest resource utilization
M::memory:(min(usage_percent)) [24h::1h] BY host SORDER BY min(usage_percent) ASC SLIMIT 5
// Sort by latest CPU usage, find hosts with the highest current load
M::cpu:(usage) [1h::10m] BY host SORDER BY last(usage) DESC SLIMIT 10
// Sort by latest error rate, find services with the most current issues
L::logs:(count(*) as error_count) {level = 'error'} [1h] BY service SORDER BY error_count DESC SLIMIT 5
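SOFFSET pages through the groups themselves, in the same way OFFSET pages through rows within a group. The query below is a sketch combining the clauses described above: it skips the 5 hosts with the highest average CPU usage and returns the next 5:

```
// Page 2 of the host ranking: skip the top 5 hosts by average
// CPU usage, return hosts ranked 6-10
M::cpu:(avg(usage)) [1h::10m] BY host SORDER BY avg(usage) DESC SLIMIT 5 SOFFSET 5
```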
Combined Use of Dual Sorting and Pagination¶
In practical applications, intra-group sorting and inter-group sorting are often used together to achieve complex data display requirements. This combination can simultaneously control the order of groups and the order of data within groups.
Execution Order¶
The execution order of dual sorting:
- Inter-group Sorting: First sort and paginate all groups
- Intra-group Sorting: Sort and paginate internally for the selected groups
Application Examples¶
Dashboard Scenario¶
// Find the 10 servers with the highest CPU usage, show the latest 5 data points for each
M::cpu:(avg(usage)) [1h::10m] BY host
SORDER BY avg(usage) DESC SLIMIT 10 // Inter-group sorting: Find the 10 with highest usage
ORDER BY time DESC LIMIT 5 // Intra-group sorting: Show latest 5 points for each
Execution Result:
{
"series": [
{
"columns": ["time", "avg(usage)"],
"name": "cpu",
"tags": {"host": "web-server-03"}, // Average usage: 90.9 - Rank 1
"values": [
[1721059200000, 92.1], // 12:00:00 - Latest
[1721058600000, 95.7], // 11:50:00
[1721058000000, 89.4], // 11:40:00
[1721057400000, 97.6], // 11:30:00
[1721056800000, 87.3] // 11:20:00
// Only return the latest 5 data points (LIMIT 5)
]
},
{
"columns": ["time", "avg(usage)"],
"name": "cpu",
"tags": {"host": "web-server-01"}, // Average usage: 77.7 - Rank 2
"values": [
[1721059200000, 78.5], // 12:00:00 - Latest
[1721058600000, 82.3], // 11:50:00
[1721058000000, 75.8], // 11:40:00
[1721057400000, 88.2], // 11:30:00
[1721056800000, 72.1] // 11:20:00
// Only return the latest 5 data points (LIMIT 5)
]
}
// Other hosts are filtered out because SLIMIT 10 only returns the 10 with highest usage
]
}
By mastering DQL's sorting and pagination functions, you can build powerful monitoring dashboards, performance analysis tools, and business insight systems.
Note
Reasonable use of the combination of intra-group sorting and inter-group sorting can greatly improve the efficiency and effectiveness of data analysis.