DQL¶
DQL (Debug Query Language) is the core query language of the Guance platform, designed for efficient querying and analysis of time series data, log data, event data, and more. DQL combines the semantic expression of SQL with the syntax structure of PromQL, aiming to provide users with a flexible and powerful query tool.
This document will help you quickly understand the basic syntax and design philosophy of DQL, and demonstrate how to write DQL queries through examples.
Basic Query Structure¶
The basic query structure of DQL is as follows:
namespace[index]::datasource[:select-clause] [{where-clause}] [time-expr] [group-by-clause] [having-clause] [order-by-clause] [limit-clause] [sorder-by-clause] [slimit-clause] [soffset-clause]
Execution Order¶
The execution order of DQL queries is crucial as it determines the semantics and performance of the query.
1. Data Filtering: filter data based on namespace::datasource, the where-clause, and time-expr
    - Identify the data source
    - Apply WHERE conditions to filter raw data rows
    - Apply time range filtering
    - Filtering data as early as possible improves subsequent processing efficiency
2. Time Aggregation: if time-expr includes a rollup, execute the rollup logic first
    - Rollup functions preprocess data on the time dimension
    - For Counter-type Metrics, rate or increment is usually calculated instead of using raw values
    - For Gauge-type Metrics, aggregation functions like last, avg, etc., may be used
3. Group Aggregation: execute group-by-clause grouping and aggregation functions within each group
    - Group data based on BY clause expressions
    - Calculate aggregation functions (sum, count, avg, max, min, etc.) within each group
    - If there is a time window, a two-dimensional data structure is formed
4. Group Filtering: execute having-clause to filter aggregated groups
    - The HAVING clause operates on aggregated results
    - Can filter using aggregation function results
    - This is the key difference from the WHERE clause
5. Non-Aggregation Functions: execute non-aggregation functions in the select-clause
    - Process expressions and functions that do not require aggregation
    - Perform further calculations on aggregation results
6. Intra-Group Sorting: execute order-by-clause and limit-clause to sort and paginate data within groups
    - ORDER BY is executed independently within each group
    - LIMIT limits the number of data rows returned per group
7. Inter-Group Sorting: execute sorder-by-clause, slimit-clause, and soffset-clause to sort and paginate groups
    - SORDER BY sorts the groups themselves
    - Requires dimensionality reduction (e.g., using max, avg, last, etc.)
    - SLIMIT limits the number of groups returned
Complete Example¶
Let's understand the structure of DQL through a complete example:
M("production")::cpu:(avg(usage) as avg_usage, max(usage) as max_usage) {host =~ 'web-.*', usage > 50} [1h::5m] BY host, env HAVING avg_usage > 60 ORDER BY time DESC LIMIT 100 SORDER BY avg_usage DESC SLIMIT 10
The meaning of this query is:
- Namespace: `M` (Metrics data)
- Index: `production`
- Datasource: `cpu`
- Selected Fields: average and maximum values of the `usage` field
- Time Range: last 1 hour, aggregated every 5 minutes
- Filter Conditions: hostname starts with `web-` and CPU usage is greater than 50%
- Grouping: grouped by hostname and environment
- Group Filtering: average CPU usage greater than 60%
- Intra-Group Sorting: sorted by time in descending order, with a maximum of 100 rows per group
- Inter-Group Sorting: sorted by average usage in descending order, with a maximum of 10 groups
Namespace (namespace)¶
Namespaces are used to distinguish different types of data, each with its specific query methods and storage strategies. DQL supports querying various business data types:
| Namespace | Description | Typical Uses |
| --- | --- | --- |
| M | Metric, time series Metrics | CPU usage, memory usage, request count, etc. |
| L | Logging, log data | Application logs, system logs, error logs, etc. |
| O | Object, infrastructure object data | Server information, container information, network devices, etc. |
| OH | History object, object history data | Server configuration change history, performance Metrics history, etc. |
| CO | Custom object, custom object data | Business-defined object information |
| COH | History custom object, custom object history data | Custom object history change records |
| N | Network, network data | Network traffic, DNS queries, HTTP requests, etc. |
| T | Trace, trace data | Distributed tracing, call chain analysis, etc. |
| P | Profile, profiling data | Performance profiling, CPU flame graphs, etc. |
| R | RUM, user access data | Frontend performance, user behavior analysis, etc. |
| E | Event, event data | Alert events, deployment events, system events, etc. |
| UE | Unrecovered Event, unrecoverable event data | Unresolved alerts and events |
Index (index)¶
Indexes are an important mechanism for DQL query optimization, which can be understood as tables or partitions in traditional databases. Within a single namespace, the system may split data based on data sources, data volume, access patterns, etc., to improve query performance and management efficiency.
Index Functions¶
- Performance Optimization: Reduce the amount of data scanned in a single query by distributing data across indexes
- Data Isolation: Data from different businesses, environments, or time ranges can be stored in different indexes
- Permission Management: Different access permissions can be set for different indexes
- Lifecycle Management: Different retention policies can be set for different indexes
Index Naming Rules¶
- Index names must be explicitly declared; wildcard or regular expression matching is not supported
- Index names usually reflect the business attributes of the data, such as `production`, `staging`, `web-logs`, `api-logs`
- The default index name is `default`, which is used when no index is explicitly specified
Basic Syntax¶
// Use the default index (automatically uses default when not specified)
M::cpu // Equivalent to M("default")::cpu
L::nginx // Equivalent to L("default")::nginx
// Specify a single index
M("production")::cpu // Query CPU Metrics in the production index
L("web-logs")::nginx // Query Nginx logs in the web-logs index
// Multi-index query (query data from multiple indexes simultaneously)
M("production", "staging")::cpu // Query CPU Metrics in production and staging environments
L("web-logs", "api-logs")::nginx // Query Web and API logs
Index and Performance¶
Proper use of indexes can significantly improve query performance:
- Precise Index: Directly specify the index when the data location is known
- Multi-index Query: Use multi-index syntax instead of wildcards when querying across multiple indexes
- Avoid Full Index Scan: Reduce data scan range by combining indexes and WHERE conditions
Compatible Syntax (Not Recommended)¶
Due to historical reasons, DQL also supports specifying indexes in the where clause, but this is not recommended:
// Old syntax, not recommended
L::nginx { index = "web-logs" }
L::nginx { index IN ["web-logs", "api-logs"] }
Application Examples¶
// Query CPU usage in the production environment
M("production")::cpu:(avg(usage)) [1h] BY host
// Compare production and staging environments
M("production", "staging")::cpu:(avg(usage)) [1h] BY index, host
// Analyze Web server logs
L("web-logs")::nginx:(count(*)) {status >= 400} [1h] BY status
// Compare error rates between Web and API servers
L("web-logs", "api-logs")::*:(count(*)) {status >= 500} [1h] BY index
Datasource (datasource)¶
Datasource specifies the specific data source to query, which can be a dataset name, wildcard pattern, regular expression, or subquery.
Basic Datasource¶
Different namespaces have different datasource definitions:
| Namespace | Datasource Type | Examples |
| --- | --- | --- |
| M | Measurement | cpu, memory, network |
| L | Source | nginx, tomcat, java-app |
| O | Infrastructure object class | host, container, process |
| T | Service name (service) | user-service, order-service |
| R | RUM data type | session, view, resource, error |
Datasource Syntax¶
Specify Datasource Name¶
M::cpu:(usage) // Query CPU Metrics
L::nginx:(count(*)) // Query Nginx logs
T::user-service:(traces) // Query user service traces
Wildcard Matching¶
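A minimal sketch: `*` matches all datasource names in a namespace (this form also appears in the multi-index and subquery examples in this document; `M::*` is assumed to behave the same way). For prefix or suffix matching, use `re()` as shown below:

L::*:(count(*))    // Query logs from all sources
M::*:(usage)       // Query the usage field across all measurements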
Regular Expression Matching¶
M::re('cpu.*'):(usage) // Query Metrics starting with cpu
L::re('web.*'):(count(*)) // Query logs starting with web
T::re('.*-service'):(traces) // Query services ending with -service
Subquery Datasource¶
Subqueries are an important feature in DQL for implementing complex analysis, allowing the results of one query to be used as the datasource for another query. This nested query mechanism supports multi-level analysis requirements.
Execution Mechanism¶
Subqueries follow these principles:
- Serial Execution: The inner subquery executes first, and its results are used as the datasource for the outer query
- Result Encapsulation: Subquery results are encapsulated into temporary table structures for use by the outer query
- Namespace Mixing: Subqueries support mixed namespace queries, enabling cross-data type analysis
- Performance Consideration: Subqueries increase computational complexity, so query logic should be designed properly
Basic Syntax¶
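A sketch of the general form: the inner query, wrapped in parentheses, takes the place of the datasource name:

namespace::(inner_query):(select-clause) [other clauses]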
Execution Process¶
Take a typical subquery as an example:
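L::(L::*:(count(*)) {level = 'error'} BY app_id):(count_distinct(app_id))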
The execution process is:
1. Inner Subquery: `L::*:(count(*)) {level = 'error'} BY app_id`
    - Scan all log data
    - Filter logs with error level
    - Group by app_id and count errors
    - Generate a temporary table: `app_id | count(*)`
2. Outer Query: `L::(...):(count_distinct(app_id))`
    - Use the subquery results as the datasource
    - Count the number of distinct app_ids
    - Final result: the number of applications with errors
Application Examples¶
// Count the number of applications with errors
L::(L::*:(count(*)) {level = 'error'} BY app_id):(count_distinct(app_id))
// Analyze servers with high CPU usage
M::(M::cpu:(avg(usage)) [1h] BY host {avg(usage) > 80}):(count(host))
// First find service endpoints with error rates exceeding 1%, then count the number of affected services
M::(M::http_requests:(sum(request_count), sum(error_count)) [1h] BY service, endpoint
{sum(error_count) / sum(request_count) > 0.01}
):(count(service))
Select Clause (select-clause)¶
The select clause is used to specify the fields or expressions to be returned by the query, and is one of the most basic and important parts of a DQL query.
Field Selection¶
Basic Syntax¶
// Select a single field
M::cpu:(usage)
// Select multiple fields
M::cpu:(usage, system, user)
// Select all fields
M::cpu:(*)
Field Name Rules¶
Field names can be written in the following forms:
1. Direct Name: suitable for ordinary identifiers
    - ✅ `message`
    - ✅ `host_name`
    - ✅ `response_time`
2. Backtick Wrapped: suitable for field names containing special characters or keywords
    - ✅ `` `message` ``
    - ✅ `` `limit` ``
    - ✅ `` `host-name` ``
    - ✅ `` `column with spaces` ``
3. Avoid Quotes: single and double quotes wrap string literals, not field names
    - ❌ `'message'`
    - ❌ `"message"`
JSON Field Extraction¶
When data fields contain JSON-formatted content, a subset of JSON Path syntax can be used to extract data from internal fields.
Basic Syntax¶
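As the examples below illustrate, the `@` operator attaches a JSON path to a field name:

field_name@json.path    // e.g. message@request.method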
JSON Path Syntax¶
- Dot notation for object properties: `.field_name`
- Square brackets for object properties: `["key"]` (suitable for keys containing spaces or special characters)
- Array index for values: `[index]`
Application Examples¶
Assume the following JSON log data:
{
"message": "User login attempt",
"request": {
"method": "POST",
"path": "/api/login",
"headers": {
"user-agent": "Mozilla/5.0",
"content-type": "application/json"
},
"body": {
"username": "john.doe",
"password": "***",
"permissions": ["read", "write", "admin"]
}
},
"response": {
"status": 200,
"time": 156,
"data": [
{"id": 1, "name": "user1"},
{"id": 2, "name": "user2"}
]
}
}
// Extract request method
L::auth_logs:(message@request.method)
// Extract request path
L::auth_logs:(message@request.path)
// Extract username
L::auth_logs:(message@request.body.username)
// Extract response status
L::auth_logs:(message@response.status)
// Extract User-Agent (contains a hyphen, requires square brackets)
L::auth_logs:(message@request.headers["user-agent"])
// Extract first element of permissions array
L::auth_logs:(message@request.body.permissions[0])
// Extract name of the first object in response data
L::auth_logs:(message@response.data[0].name)
// Count the number of different request methods
L::auth_logs:(count(*)) [1h] BY message@request.method
// Analyze response time distribution
L::auth_logs:(avg(message@response.time), max(message@response.time)) [1h] BY message@request.method
// Extract multiple fields
L::auth_logs:(
message,
message@request.method as method,
message@request.path as path,
message@response.status as status,
message@response.time as response_time
) {message@response.status >= 400} [1h]
Calculated Fields¶
Expression Calculation¶
Basic arithmetic operations are supported:
// Unit conversion (milliseconds to seconds)
L::nginx:(response_time / 1000 as response_time_seconds)
// Calculate percentage
M::memory:(used / total * 100 as usage_percentage)
// Compound calculation
M::network:((bytes_in + bytes_out) / 1024 / 1024 as total_traffic_mb)
Function Calculation¶
Various aggregation and transformation functions are supported:
// Aggregation functions
M::cpu:(max(usage), min(usage), avg(usage)) [1h] BY host
// Transformation functions
L::logs:(int(response_time) as response_time_int)
L::logs:(floor(response_time) as response_time_floor)
Aliases¶
Assign aliases to fields or expressions to make results more readable and convenient for subsequent references.
Basic Syntax¶
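The general form, used throughout the examples below:

expression as alias_name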
Application Examples¶
// Simple alias
M::cpu:(avg(usage) as avg_usage, max(usage) as max_usage) [1h] BY host
// Expression alias
M::memory:((used / total) * 100 as usage_percent) [1h] BY host
// Function alias
L::logs:(count(*) as error_count) {level = 'error'} [1h] BY service
// JSON extraction alias
L::api_logs:(
message@request.method as http_method,
message@response.status as http_status,
message@response.time as response_time_ms
) [1h]
Usage Tips¶
In DQL, aggregation function results can be directly referenced using the original field name, reducing the need for aliases:
M::cpu:(max(usage)) [1h] BY host
// The result will contain the max(usage) column, which can be directly referenced in subsequent queries:
M::(M::cpu:(max(usage)) [1h] BY host):(max(usage)) { usage > 80 }
However, when multiple aggregation functions use the same field, aliases must be used to correctly distinguish them in subsequent queries:
// Case where aliases are necessary
M::cpu:(max(usage) as max_usage, min(usage) as min_usage) [1h] BY host
Time Expression (time-expr)¶
Time expressions are one of the core features of DQL, used to specify the query time range, aggregation time window, and Rollup aggregation functions.
Basic Syntax¶
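The full form is `[start_time:end_time:aggregation_step:rollup_function]`, where each part is optional (see Flexible Time Window Syntax below for all shorthand forms):

[1h]            // Last 1 hour
[1h::5m]        // Last 1 hour, aggregated in 5-minute windows
[1h:5m:1m:avg]  // From 1 hour ago to 5 minutes ago, 1-minute step, avg rollup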
Time Range¶
Absolute Timestamp¶
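A hedged sketch, assuming absolute Unix timestamps are accepted in the start and end positions; verify the exact format and precision (seconds vs. milliseconds) on your platform:

[1721055600:1721059200]    // Assumed form: an explicit window between two Unix timestamps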
Relative Time¶
Supports multiple time units, which can be mixed:
[1h] // Last 1 hour to now
[1h:5m] // From last 1 hour to last 5 minutes
[1h30m] // Last 1 hour 30 minutes
[2h15m30s] // Last 2 hours 15 minutes 30 seconds
Time Units¶
| Unit | Description | Example |
| --- | --- | --- |
| s | Seconds | 30s |
| m | Minutes | 5m |
| h | Hours | 2h |
| d | Days | 7d |
| w | Weeks | 4w |
| y | Years | 1y |
Preset Time Ranges¶
Provides commonly used time range keywords:
| Keyword | Description | Time Range |
| --- | --- | --- |
| TODAY | Today | From today 00:00 to now |
| YESTERDAY | Yesterday | From yesterday 00:00 to today 00:00 |
| THIS WEEK | This week | From Monday 00:00 to now |
| LAST WEEK | Last week | From last Monday 00:00 to this Monday 00:00 |
| THIS MONTH | This month | From the 1st 00:00 to now |
| LAST MONTH | Last month | From the 1st of last month 00:00 to the 1st of this month 00:00 |
[TODAY] // Today's data
[YESTERDAY] // Yesterday's data
[THIS WEEK] // This week's data
[LAST WEEK] // Last week's data
[THIS MONTH] // This month's data
[LAST MONTH] // Last month's data
When using preset time range keywords, make sure the workspace timezone is configured correctly: the boundaries of these ranges (e.g., "today 00:00") are resolved in that timezone.
Time Window Aggregation¶
Time windows group data by specified time intervals for aggregation, and the time column in the results represents the start time of each time window.
Single Time Window¶
Aggregate the entire time range into a single value:
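A minimal sketch (the measurement and field mirror the windowed example below; the result value is illustrative):

M::cpu:(max(usage_total)) [1h]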
Query Result:
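{
    "columns": ["time", "max(usage_total)"],
    "values": [
        [1721055600000, 37.46]
    ]
}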
Multiple Time Windows¶
Group by time interval for aggregation:
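A sketch consistent with the result below (a 1-hour range aggregated in 10-minute windows):

M::cpu:(max(usage_total)) [1h::10m]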
Query Result:
{
"columns": ["time", "max(usage_total)"],
"values": [
[1721059200000, 37.46],
[1721058600000, 34.12],
[1721058000000, 33.81],
[1721057400000, 30.92],
[1721056800000, 34.53],
[1721056200000, 36.11]
]
}
Rollup Functions¶
Rollup functions are an important preprocessing step in DQL, executed before group aggregation, used to preprocess raw time series data.
Execution Sequence¶
Rollup's position in the query execution flow:
Raw data → WHERE filtering → **Rollup preprocessing** → Group aggregation → HAVING filtering → Final result
Execution Mechanism¶
Rollup execution is divided into two stages:
- Per Time Series Processing: Apply Rollup functions independently to each time series
- Aggregation Calculation: Execute group aggregation on the Rollup-processed results
Application Scenarios¶
A typical application scenario for Rollup functions is Counter Metrics processing.
For Prometheus Counter-type Metrics, directly using raw values for aggregation is meaningless because Counters are monotonically increasing. First, calculate the rate or increment for each time series, then perform aggregation.
Problem Example: assume two servers' request counters:
{
"host": "web-server-01",
"data": [
{"time": "2024-07-15 08:25:00", "request_count": 150},
{"time": "2024-07-15 08:20:00", "request_count": 140},
{"time": "2024-07-15 08:15:00", "request_count": 130},
{"time": "2024-07-15 08:10:00", "request_count": 120},
{"time": "2024-07-15 08:05:00", "request_count": 110},
{"time": "2024-07-15 08:00:00", "request_count": 100}
]
}
{
"host": "web-server-02",
"data": [
{"time": "2024-07-15 08:25:00", "request_count": 250},
{"time": "2024-07-15 08:20:00", "request_count": 240},
{"time": "2024-07-15 08:15:00", "request_count": 230},
{"time": "2024-07-15 08:10:00", "request_count": 220},
{"time": "2024-07-15 08:05:00", "request_count": 210},
{"time": "2024-07-15 08:00:00", "request_count": 200}
]
}
Direct Aggregation Problem:
- web-server-01's request_count starts at 100
- web-server-02's request_count starts at 200
- Although both servers have the same request rate (10 requests every 5 minutes), the absolute values are different
Rollup Solution:
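One formulation, using the `[rate]` shorthand shown in the Application Examples below (apply `rate` to each series first, then sum across series):

M::http_requests:(sum(request_count)) [rate]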
Execution Process:
1. Rollup Stage (executed independently on each time series):
    - web-server-01: rate([100, 110, 120, 130, 140, 150]) = 2 requests/minute
    - web-server-02: rate([200, 210, 220, 230, 240, 250]) = 2 requests/minute
2. Aggregation Stage:
    - sum([2, 2]) = 4 requests/minute
Function Types¶
Common Rollup functions include:
| Function Type | Description | Applicable Scenarios |
| --- | --- | --- |
| `rate()` | Calculate growth rate | Counter-type Metrics |
| `increase()` | Calculate growth amount | Counter-type Metrics |
| `last()` | Take the last value | Gauge-type Metrics |
| `avg()` | Calculate average | Data smoothing |
| `max()` | Take maximum | Peak analysis |
| `min()` | Take minimum | Valley analysis |
In practice, almost any aggregation function that returns a single value can be used, so this table is not exhaustive.
Default Rollup¶
If no Rollup function is explicitly specified, DQL performs no Rollup calculation by default, whereas PromQL's default Rollup is last. If you are querying Prometheus Metrics, be aware of this difference and specify the Rollup function manually.
Application Examples¶
// Calculate the total request rate for all servers
M::http_requests:(sum(request_count)) [rate]
// Calculate error rate
M::http_requests:(
sum(error_count) as errors,
sum(request_count) as requests
) [rate] BY service
Flexible Time Window Syntax¶
DQL supports various shorthand formats for time windows, making query writing more convenient.
Shorthand Formats¶
[1h]            // Only specify time range
[1h::5m]        // Time range + aggregation step
[1h:5m]         // Start time + end time
[1h:5m:1m]      // Start + end + step
[1h:5m:1m:avg]  // Full format
[::5m]          // Only specify aggregation step
[:::sum]        // Only specify rollup function
[sum]           // Only specify rollup function (shortest form)
Filter Conditions (where-clause)¶
Filter conditions are used to filter data rows, retaining only data that meets the conditions for subsequent processing.
Basic Syntax¶
Multiple conditions can be connected with commas, AND, OR, &&, ||.
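A sketch of the general form:

{condition1, condition2}                    // Comma acts as logical AND
{condition1 AND condition2 OR condition3}   // Keyword connectors
{(condition1 OR condition2) && condition3}  // Symbol connectors with grouping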
Comparison Operators¶
| Operator | Description | Example |
| --- | --- | --- |
| `=` | Equal | `host = 'web-01'` |
| `!=` | Not equal | `status != 200` |
| `>` | Greater than | `cpu_usage > 80` |
| `>=` | Greater than or equal | `memory_usage >= 90` |
| `<` | Less than | `response_time < 1000` |
| `<=` | Less than or equal | `disk_usage <= 80` |
Pattern Matching Operators¶
| Operator | Description | Example |
| --- | --- | --- |
| `=~` | Regex match | `message =~ 'error.*\\d+'` |
| `!~` | Regex not match | `message !~ 'debug.*'` |
Set Operators¶
| Operator | Description | Example |
| --- | --- | --- |
| `IN` | In set | `status IN [200, 201, 202]` |
| `NOT IN` | Not in set | `level NOT IN ['debug', 'info']` |
Logical Operators¶
| Operator | Description | Example |
| --- | --- | --- |
| `AND` or `&&` | Logical AND | `cpu > 80 AND memory > 90` |
| `OR` or `\|\|` | Logical OR | `status = 500 OR status = 502` |
| `NOT` | Logical NOT | `NOT status = 200` |
Application Examples¶
Basic Filtering¶
// Single condition
M::cpu:(usage) {host = 'web-01'} [1h]
// Multiple AND conditions
M::cpu:(usage) {host = 'web-01', usage > 80} [1h]
// Use AND keyword
M::cpu:(usage) {host = 'web-01' AND usage > 80} [1h]
// Mixed logical operators
M::cpu:(usage) {(host = 'web-01' OR host = 'web-02') AND usage > 80} [1h]
Regular Expression Matching¶
// Match error logs
L::logs:(message) {message =~ 'ERROR.*\\d{4}'} [1h]
// Match specific log format
L::logs:(message) {message =~ '\\[(ERROR|WARN)\\].*'} [1h]
// Exclude debug information
L::logs:(message) {message !~ 'DEBUG.*'} [1h]
// Hostname pattern matching
M::cpu:(usage) {host =~ 'web-.*\\.prod\\.com'} [1h]
Set Operations¶
// Status code filtering
L::nginx:(count(*)) {status IN [200, 201, 202, 204]} [1h]
// Exclude specific status codes
L::nginx:(count(*)) {status NOT IN [404, 500, 502]} [1h]
// Log level filtering
L::app_logs:(count(*)) {level IN ['ERROR', 'WARN', 'CRITICAL']} [1h]
Array Field Handling¶
When the field type is an array, DQL supports various array matching operations.
Assume a field `tags = ['web', 'prod', 'api']`:
Recommended Syntax: Use IN and NOT IN¶
// Check if array contains a value
{tags IN ['web']} // true, because tags contains 'web'
{tags IN ['mobile']} // false, because tags does not contain 'mobile'
// Check if array does not contain a value
{tags NOT IN ['mobile']} // true, because tags does not contain 'mobile'
{tags NOT IN ['web']} // false, because tags contains 'web'
// Check if array contains all specified values
{tags IN ['web', 'api']} // true, contains 'web' and 'api'
{tags IN ['web', 'api', 'mobile']} // false, does not contain 'mobile'
Compatible Syntax (Not Recommended)¶
The following syntax is retained for historical compatibility and is not recommended for new queries. These operators have different semantics on array fields than on regular fields, which can cause confusion.
// Historical syntax: Single value inclusion check (overloaded semantics of equal operator)
{tags = 'web'} // true, because tags contains 'web'
{tags = 'mobile'} // false, because tags does not contain 'mobile'
// Historical syntax: Single value exclusion check (overloaded semantics of not equal operator)
{tags != 'mobile'} // true, because tags does not contain 'mobile'
{tags != 'web'} // false, because tags contains 'web'
Function Filtering¶
Any function that returns a boolean value can be used as a filter condition.
// String matching
L::logs:(message) { match(message, 'error') }
L::logs:(message) { wildcard(message, 'error*') }
// Query requests with abnormal response times
L::access_logs:(*) {
response_time > 1000 AND
match(message, 'timeout')
}
// Query hosts with abnormal memory usage
M::memory:(usage) {
(usage > 90 OR usage < 10) AND
host =~ 'prod-.*' AND
tags IN ['critical', 'important']
}
WHERE Subquery¶
WHERE subqueries are a powerful feature in DQL for implementing dynamic filtering, allowing the results of one query to be used as the filter condition for another query. This mechanism supports dynamic filtering based on data analysis results.
Query Characteristics¶
- Dynamic Filtering: Filter conditions are not fixed values but are dynamically calculated through queries
- Namespace Mixing: Supports cross-namespace queries, enabling association analysis across different data types
- Serial Execution: Subqueries execute first, and their results are used for filtering in the main query
- Array Results: Subquery results are encapsulated as arrays, so only `IN` and `NOT IN` operators are supported
Execution Flow¶
Take a typical WHERE subquery as an example:
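M::cpu:(avg(usage)) [1h] BY host
    {host IN (O::HOST:(hostname) {provider = 'cloud-a'})}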
Execution process:
1. Subquery Execution: `O::HOST:(hostname) {provider = 'cloud-a'}`
    - Query all infrastructure objects
    - Filter hosts whose provider is 'cloud-a'
    - Return hostname list: `['host-01', 'host-02', 'host-03']`
2. Main Query Execution: `M::cpu:(avg(usage)) [1h] BY host {host IN [...]}`
    - Query CPU usage data
    - Only count hosts returned by the subquery
    - Calculate average usage by host
Application Examples¶
// Monitor servers of a specific cloud provider
M::cpu:(avg(usage)) [1h] BY host
{host IN (O::HOST:(hostname) {provider = 'cloud-a'})}
// Compare performance across different cloud providers
M::memory:(avg(used / total * 100)) [1h] BY host
{host IN (O::HOST:(hostname) {provider IN ['cloud-a', 'cloud-b']})}
// Analyze logs of a specific business
L::app_logs:(count(*)) [1h] BY level
{service IN (T::services:(service_name) {business_unit = 'ecommerce'})}
// Monitor application performance of critical businesses
M::response_time:(avg(response_time)) [1h] BY service
{service IN (T::services:(service_name) {criticality = 'high'})}
Grouping (group-by-clause)¶
Grouping is a core feature of data analysis, used to group data by specified dimensions for aggregation.
Basic Syntax¶
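As a sketch, the BY clause accepts one or more fields, expressions, or functions, optionally aliased:

BY field1, field2
BY (expression) as alias
BY function(field, args) as alias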
Grouping Types¶
Field Grouping¶
// Single field grouping
M::cpu:(avg(usage)) [1h] BY host
// Multiple field grouping
M::cpu:(avg(usage)) [1h] BY host, env
// Nested grouping
M::cpu:(avg(usage)) [1h] BY datacenter, rack, host
Expression Grouping¶
// Compound mathematical expression
M::memory:(avg(used)) [1h] BY ((used / total) * 100) as usage_percent
// Multi-field mathematical operations
M::performance:(avg(response_time)) [1h] BY (response_time / 1000) as response_seconds
Function Grouping¶
// Drain clustering algorithm
L::logs:(count(*)) BY drain(message, 0.7) as sample
// Regex extraction grouping
L::logs:(count(*)) [1h] BY regexp_extract(message, 'error_code: (\\d+)', 1)
Grouping Result Processing¶
When a query includes both grouping (BY) and time windows, the result is a two-dimensional array: the first dimension is the set of groups distinguished by group keys, and the second dimension is the multi-interval time data within a single group.
Two-dimensional data structure example:
Query: M::cpu:(max(usage_total)) [1h::10m] BY host
Query Result Structure:
{
"series": [
{
"columns": ["time", "max(usage_total)"],
"name": "cpu",
"tags": {"host": "web-server-01"},
"values": [
[1721059200000, 78.5],
[1721058600000, 82.3],
[1721058000000, 75.8],
[1721057400000, 88.2]
]
},
{
"columns": ["time", "max(usage_total)"],
"name": "cpu",
"tags": {"host": "web-server-02"},
"values": [
[1721059200000, 45.2],
[1721058600000, 52.8],
[1721058000000, 48.5],
[1721057400000, 61.3]
]
},
{
"columns": ["time", "max(usage_total)"],
"name": "cpu",
"tags": {"host": "web-server-03"},
"values": [
[1721059200000, 92.1],
[1721058600000, 95.7],
[1721058000000, 89.4],
[1721057400000, 97.6]
]
}
]
}
To further process this two-dimensional array:
- If you want to filter the results of this two-dimensional array, use the Having clause
- If you want to sort or paginate the data within a single group of the two-dimensional array, use the order by, limit, offset series of statements
- If you want to sort or paginate the groups of the two-dimensional array, use the sorder by, slimit, soffset series of statements
For a detailed explanation of the two-dimensional data structure and sorting and pagination features, refer to Sorting and Pagination.
Having Clause (having-clause)¶
The Having clause is used to filter grouped and aggregated results, similar to the WHERE clause, but operates on aggregated data.
Basic Syntax¶
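A sketch of the general form, following the examples below:

... [time-expr] BY group_fields HAVING aggregation_condition

// e.g.
M::cpu:(avg(usage) as avg_usage) [1h] BY host HAVING avg_usage > 80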
Differences from WHERE¶
WHERE and HAVING are both clauses used to filter data, but they operate at different stages of query execution and handle different types of filter conditions.
Execution Sequence Differences¶
- WHERE Clause:
    - Executed before group aggregation
    - Operates on raw data rows
    - Filters out rows that do not meet conditions, reducing the amount of data for subsequent processing
- HAVING Clause:
    - Executed after group aggregation
    - Operates on aggregated results
    - Filters based on aggregation function results
Application Scenarios¶
The Having clause is suitable for filtering aggregation results:
// Filter based on aggregation function values
M::cpu:(avg(usage) as avg_usage) [1h] BY host HAVING avg_usage > 80
// Filter based on multiple aggregation conditions
M::http_requests:(
sum(request_count) as total,
sum(error_count) as errors
) [1h] BY service, endpoint
HAVING errors / total > 0.01 AND total > 1000
// Filter based on group statistics
L::logs:(count(*) as count) [1h] BY service HAVING count > 100
// Filter based on compound aggregation conditions
M::response_time:(
avg(response_time) as avg_time,
max(response_time) as max_time,
min(response_time) as min_time
) [1h] BY endpoint
HAVING avg_time > 1000 AND max_time > 5000 AND (max_time - min_time) > 2000
Sorting and Pagination¶
Sorting and pagination are among DQL's most distinctive features: a dual sorting mechanism, designed around the characteristics of time series data, that separates intra-group sorting from inter-group sorting. This design enables DQL to efficiently handle complex multi-dimensional time series analysis requirements.
Understanding DQL's Data Structure¶
Before diving into sorting and pagination, first understand the two-dimensional data structure of DQL query results. When a query includes both grouping (BY) and time windows, a two-dimensional array is produced:
- First Dimension (Group Dimension): Multiple groups distinguished by group keys
- Second Dimension (Time Dimension): Data aggregated by time windows within each group
For a detailed explanation of the two-dimensional data structure and JSON format example, refer to Grouping Result Processing. This two-dimensional structure is the foundation of DQL's sorting and pagination features, and understanding this structure is crucial for mastering DQL's sorting mechanism.
Intra-Group Sorting and Pagination (ORDER BY, LIMIT, OFFSET)¶
Intra-group sorting and pagination operate on the second dimension of the two-dimensional structure, i.e., the data within each group. This sorting is executed independently within each group and does not affect data in other groups.
Basic Syntax¶
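A sketch of the general form, following the examples below:

ORDER BY field [ASC | DESC] LIMIT n [OFFSET m]

// e.g.
M::cpu:(max(usage_total)) [1h::10m] BY host ORDER BY time DESC LIMIT 3 OFFSET 2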
Execution Mechanism¶
Intra-group sorting execution process:
- Group Processing: Sorting is executed independently for each group
- Sorting Basis: Can use time fields, aggregation function results, or calculation expressions
- Pagination Limit: LIMIT limits the number of data rows returned per group
- Offset Handling: OFFSET skips the first N rows of data per group
Application Examples¶
Basic Time Sorting¶
// Sort by time in descending order, show the latest data for each host
M::cpu:(max(usage_total)) [1h::10m] BY host ORDER BY time DESC
// Sort by time in ascending order, show historical trends
M::cpu:(max(usage_total)) [1h::10m] BY host ORDER BY time ASC
Execution Result (ORDER BY time DESC):
{
"series": [
{
"columns": ["time", "max(usage_total)"],
"name": "cpu",
"tags": {"host": "web-server-01"},
"values": [
[1721059200000, 78.5], // 12:00:00
[1721058600000, 82.3], // 11:50:00
[1721058000000, 75.8], // 11:40:00
[1721057400000, 88.2], // 11:30:00
[1721056800000, 72.1], // 11:20:00
[1721056200000, 69.4] // 11:10:00
]
},
{
"columns": ["time", "max(usage_total)"],
"name": "cpu",
"tags": {"host": "web-server-02"},
"values": [
[1721059200000, 45.2], // 12:00:00
[1721058600000, 52.8], // 11:50:00
[1721058000000, 48.5], // 11:40:00
[1721057400000, 61.3], // 11:30:00
[1721056800000, 55.7], // 11:20:00
[1721056200000, 58.9] // 11:10:00
]
}
]
}
Value-Based Sorting¶
// Sort by CPU usage in descending order, find peak periods for each host
M::cpu:(max(usage_total) as max_usage_total) [1h::10m] BY host ORDER BY max_usage_total DESC
// Sort by response time in ascending order, find the best performance periods
M::response_time:(avg(response_time) as avg_response_time) [1h::5m] BY endpoint ORDER BY avg_response_time ASC
Intra-Group Pagination¶
// Each host only shows the latest 3 data points
M::cpu:(max(usage_total)) [1h::10m] BY host ORDER BY time DESC LIMIT 3
// Skip the latest 2 data points, show the next 3
M::cpu:(max(usage_total)) [1h::10m] BY host ORDER BY time DESC LIMIT 3 OFFSET 2
Execution Result (LIMIT 3):
{
"series": [
{
"columns": ["time", "max(usage_total)"],
"name": "cpu",
"tags": {"host": "web-server-01"},
"values": [
[1721059200000, 78.5], // 12:00:00 - Latest
[1721058600000, 82.3], // 11:50:00
[1721058000000, 75.8] // 11:40:00
// Only return the first 3 data points
]
},
{
"columns": ["time", "max(usage_total)"],
"name": "cpu",
"tags": {"host": "web-server-02"},
"values": [
[1721059200000, 45.2], // 12:00:00 - Latest
[1721058600000, 52.8], // 11:50:00
[1721058000000, 48.5] // 11:40:00
// Only return the first 3 data points
]
}
]
}
Inter-Group Sorting and Pagination (SORDER BY, SLIMIT, SOFFSET)¶
Inter-group sorting and pagination are unique features of DQL, operating on the first dimension of the two-dimensional structure, i.e., sorting the groups themselves. This sorting requires reducing each group's data to a single value, then comparing this value across different groups.
Basic Syntax¶
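A sketch of the general form, following the examples below:

SORDER BY aggregation_function(field) [ASC | DESC] SLIMIT n [SOFFSET m]

// e.g.
M::cpu:(avg(usage)) [1h::10m] BY host SORDER BY avg(usage) DESC SLIMIT 5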
Execution Mechanism¶
Inter-group sorting execution process:
- Dimensionality Reduction: Apply an aggregation function to each group to calculate a representative value
- Group Sorting: Sort all groups based on the dimensionality-reduced value
- Group Pagination: SLIMIT limits the number of groups returned, SOFFSET skips the first N groups
Dimensionality Reduction Function Selection¶
Inter-group sorting must use an aggregation function for dimensionality reduction. When no aggregation function is specified, the default aggregation function is `last`.
Common dimensionality reduction functions include:
| Function | Description | Applicable Scenarios |
| --- | --- | --- |
| `last()` | Take the last value | Current state of a time series |
| `max()` | Take maximum | Peak analysis |
| `min()` | Take minimum | Valley analysis |
| `avg()` | Take average | Overall trend analysis |
| `sum()` | Sum | Total statistics |
| `count()` | Count | Frequency analysis |
In practice, almost any aggregation function that returns a single value can be used, so this table is not exhaustive.
Application Examples¶
Average-Based Sorting¶
// Sort by average CPU usage in descending order, find the highest load hosts
M::cpu:(avg(usage)) [1h::10m] BY host SORDER BY avg(usage) DESC SLIMIT 5
// Sort by average response time in ascending order, find the best performing services
M::response_time:(avg(response_time)) [1h] BY service SORDER BY avg(response_time) ASC SLIMIT 10
Execution Process Analysis:
1. Dimensionality Reduction:
    - web-server-01: avg(usage) = 77.7
    - web-server-02: avg(usage) = 53.7
    - web-server-03: avg(usage) = 90.9
2. Group Sorting (by avg(usage) DESC):
    - web-server-03: 90.9
    - web-server-01: 77.7
    - web-server-02: 53.7
3. Final Result (SLIMIT 2):
{
"series": [
{
"columns": ["time", "avg(usage)"],
"name": "cpu",
"tags": {"host": "web-server-03"}, // Average usage: 90.9 - Rank 1
"values": [
[1721059200000, 92.1],
[1721058600000, 95.7],
[1721058000000, 89.4]
]
},
{
"columns": ["time", "avg(usage)"],
"name": "cpu",
"tags": {"host": "web-server-01"}, // Average usage: 77.7 - Rank 2
"values": [
[1721059200000, 78.5],
[1721058600000, 82.3],
[1721058000000, 75.8]
]
}
// web-server-02 (avg: 53.7) is filtered out due to SLIMIT 2
]
}
More Application Examples¶
// Sort by maximum CPU usage, find hosts with abnormal peaks
M::cpu:(max(usage_total)) [1h::10m] BY host SORDER BY max(usage_total) DESC SLIMIT 10
// Sort by minimum memory usage, find hosts with the lowest resource utilization
M::memory:(min(usage_percent)) [24h::1h] BY host SORDER BY min(usage_percent) ASC SLIMIT 5
// Sort by latest CPU usage, find the highest load hosts
M::cpu:(usage) [1h::10m] BY host SORDER BY last(usage) DESC SLIMIT 10
// Sort by latest error rate, find services with the most current issues
L::logs:(count(*) as error_count) {level = 'error'} [1h] BY service SORDER BY error_count DESC SLIMIT 5
Combined Use of Dual Sorting and Pagination¶
In practical applications, intra-group sorting and inter-group sorting are often used together to achieve complex data display requirements. This combination can simultaneously control the order of groups and the order of data within groups.
Execution Sequence¶
Dual sorting execution sequence:
- Inter-Group Sorting: First sort and paginate all groups
- Intra-Group Sorting: Sort and paginate the selected groups internally
Application Examples¶
Monitoring Dashboard Scenario¶
// Find the 10 servers with the highest CPU usage, show the latest 5 data points for each
M::cpu:(avg(usage)) [1h::10m] BY host
SORDER BY avg(usage) DESC SLIMIT 10 // Inter-group sorting: Find the 10 highest usage servers
ORDER BY time DESC LIMIT 5 // Intra-group sorting: Show the latest 5 points for each
Execution Result:
{
"series": [
{
"columns": ["time", "avg(usage)"],
"name": "cpu",
"tags": {"host": "web-server-03"}, // Average usage: 90.9 - Rank 1
"values": [
[1721059200000, 92.1], // 12:00:00 - Latest
[1721058600000, 95.7], // 11:50:00
[1721058000000, 89.4], // 11:40:00
[1721057400000, 97.6], // 11:30:00
[1721056800000, 87.3] // 11:20:00
// Only return the latest 5 data points (LIMIT 5)
]
},
{
"columns": ["time", "avg(usage)"],
"name": "cpu",
"tags": {"host": "web-server-01"}, // Average usage: 77.7 - Rank 2
"values": [
[1721059200000, 78.5], // 12:00:00 - Latest
[1721058600000, 82.3], // 11:50:00
[1721058000000, 75.8], // 11:40:00
[1721057400000, 88.2], // 11:30:00
[1721056800000, 72.1] // 11:20:00
// Only return the latest 5 data points (LIMIT 5)
]
}
// Other hosts are filtered out due to SLIMIT 10, only returning the 10 highest usage servers
]
}
By mastering DQL's sorting and pagination features, you can build powerful monitoring dashboards, performance analysis tools, and business insight systems.
Note: Proper use of the combination of intra-group sorting and inter-group sorting can greatly improve the efficiency and effectiveness of data analysis.