DQL


DQL (Debug Query Language) is the core query language of the Guance platform, designed for efficient querying and analysis of time series data, log data, event data, etc. DQL combines the semantic expression of SQL with the syntax structure of PromQL, aiming to provide users with a flexible and powerful query tool.

This document will help you quickly understand the basic syntax and design concepts of DQL, and demonstrate how to write DQL queries through examples.

Basic Query Structure

The basic query structure of DQL is as follows:

namespace[index]::datasource[:select-clause] [{where-clause}] [time-expr] [group-by-clause] [having-clause] [order-by-clause] [limit-clause] [sorder-by-clause] [slimit-clause] [soffset-clause]

Execution Order

The execution order of DQL queries is crucial, as it determines the semantics and performance of the query.

  1. Data Filtering: Filter data based on namespace::datasource, where-clause, time-expr

    • Identify the data source
    • Apply WHERE conditions to filter raw data rows
    • Apply time range filtering
    • Filter data early in this stage to improve subsequent processing efficiency
  2. Time Aggregation: If time-expr contains rollup, execute the rollup logic first

    • Rollup functions preprocess data on the time dimension
    • For Counter type metrics, rates or increments are usually calculated instead of using raw values
    • For Gauge type metrics, aggregation functions like last, avg, etc., may be used
  3. Group Aggregation: Execute group-by-clause grouping and aggregation functions within the group

    • Group data based on BY clause expressions
    • Calculate aggregation functions (sum, count, avg, max, min, etc.) within each group
    • If there is a time window, a two-dimensional data structure will be formed
  4. Group Filtering: Execute having-clause to filter aggregated groups

    • HAVING clause acts on aggregated results
    • Results of aggregation functions can be used for filtering
    • This is a key difference from the WHERE clause
  5. Non-Aggregation Functions: Execute non-aggregation functions in select-clause

    • Process expressions and functions that do not require aggregation
    • Perform further calculations on aggregated results
  6. Intra-Group Sorting: Execute order-by-clause, limit-clause to sort and paginate data within groups

    • ORDER BY is executed independently within each group
    • LIMIT limits the number of data rows returned per group
  7. Inter-Group Sorting: Execute sorder-by-clause, slimit-clause, soffset-clause to sort and paginate groups

    • SORDER BY sorts the groups themselves
    • Requires dimensionality reduction (e.g., using max, avg, last, etc., functions)
    • SLIMIT limits the number of groups returned
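
As an illustration of this pipeline, the main stages (data filtering, group aggregation, group filtering, inter-group sorting) can be modeled in a few lines of ordinary code. This is a sketch with invented sample rows and thresholds, not how the engine is implemented:

```python
from collections import defaultdict
from statistics import mean

# Invented sample rows: (host, usage) measurements.
rows = [
    {"host": "web-01", "usage": 85}, {"host": "web-01", "usage": 90},
    {"host": "web-02", "usage": 55}, {"host": "web-02", "usage": 65},
    {"host": "db-01",  "usage": 95},
]

# Stage 1. Data filtering (WHERE): keep web hosts with usage > 50.
filtered = [r for r in rows if r["host"].startswith("web-") and r["usage"] > 50]

# Stage 3. Group aggregation (BY host, avg within each group).
groups = defaultdict(list)
for r in filtered:
    groups[r["host"]].append(r["usage"])
aggregated = {host: mean(vals) for host, vals in groups.items()}

# Stage 4. Group filtering (HAVING avg_usage > 60).
kept = {h: v for h, v in aggregated.items() if v > 60}

# Stage 7. Inter-group sorting (SORDER BY avg_usage DESC) + SLIMIT 10.
result = sorted(kept.items(), key=lambda kv: kv[1], reverse=True)[:10]
print(result)
```

The time-aggregation and intra-group stages are omitted here to keep the model minimal.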

Complete Example

Let's understand the structure of DQL through a complete example:

M("production")::cpu:(avg(usage) as avg_usage, max(usage) as max_usage) {host =~ 'web-.*', usage > 50} [1h::5m] BY host, env HAVING avg_usage > 60 ORDER BY time DESC LIMIT 100 SORDER BY avg_usage DESC SLIMIT 10

The meaning of this query is:

  • Namespace: M (metric data)
  • Index: production
  • Data Source: cpu
  • Selected Fields: Average and maximum values of the usage field
  • Time Range: Last 1 hour, aggregated by 5 minutes
  • Filter Conditions: Hostnames starting with web and CPU usage greater than 50%
  • Grouping: Grouped by hostname and environment
  • Group Filtering: Average CPU usage greater than 60%
  • Sorting: Sorted by time in descending order, up to 100 rows per group
  • Inter-Group Sorting: Sorted by average usage in descending order, up to 10 groups

Namespace (namespace)

Namespaces are used to distinguish different types of data, with each data type having its specific query methods and storage strategies. DQL supports querying multiple business data types:

| Namespace | Description | Typical Uses |
| --- | --- | --- |
| M | Metric, time series metrics | CPU usage, memory usage, request count, etc. |
| L | Logging, log data | Application logs, system logs, error logs, etc. |
| O | Object, infrastructure object data | Server information, container information, network devices, etc. |
| OH | History object, object history data | Server configuration change history, performance metric history, etc. |
| CO | Custom object, custom object data | Business-defined object information |
| COH | History custom object, custom object history data | Historical change records of custom objects |
| N | Network, network data | Network traffic, DNS queries, HTTP requests, etc. |
| T | Trace, trace data | Distributed tracing, call chain analysis, etc. |
| P | Profile, profiling data | Performance profiling, CPU flame graphs, etc. |
| R | RUM, user access data | Frontend performance, user behavior analysis, etc. |
| E | Event, event data | Alert events, deployment events, system events, etc. |
| UE | Unrecovered Event, unrecovered event data | Unresolved alerts and events |

Index (index)

Indexes are an important mechanism for DQL query optimization, similar to tables or partitions in traditional databases. Within a single namespace, the system may split data based on data sources, data volume, access patterns, etc., to improve query performance and management efficiency.

Role of Indexes

  • Performance Optimization: Reduce the amount of data scanned per query by distributing data storage
  • Data Isolation: Data from different businesses, environments, or time ranges can be stored in different indexes
  • Permission Management: Different access permissions can be set for different indexes
  • Lifecycle Management: Different retention policies can be set for different indexes

Index Naming Rules

  • Index names must be explicitly declared; wildcards or regular expressions are not supported for matching
  • Index names typically reflect the business attributes of the data, e.g., production, staging, web-logs, api-logs
  • The default index name is default, used when no index is explicitly specified

Basic Syntax

// Use the default index (automatically uses default when not specified)
M::cpu           // Equivalent to M("default")::cpu
L::nginx         // Equivalent to L("default")::nginx

// Specify a single index
M("production")::cpu     // Query CPU metrics in the production index
L("web-logs")::nginx     // Query Nginx logs in the web-logs index

// Multi-index query (query data from multiple indexes simultaneously)
M("production", "staging")::cpu     // Query CPU metrics in production and staging environments
L("web-logs", "api-logs")::nginx     // Query Web and API logs

Index and Performance

Proper use of indexes can significantly improve query performance:

  • Precise Index: Directly specify the index when the data location is known
  • Multi-index Query: Use multi-index syntax when querying across multiple indexes instead of using wildcards
  • Avoid Full Index Scans: Reduce data scan range through index and WHERE condition combinations

For historical reasons, DQL also supports specifying indexes in the where clause, but this is not recommended:

// Old syntax, not recommended
L::nginx { index = "web-logs" }
L::nginx { index IN ["web-logs", "api-logs"] }

Application Examples

// Query CPU usage in the production environment
M("production")::cpu:(avg(usage)) [1h] BY host

// Compare production and staging environments
M("production", "staging")::cpu:(avg(usage)) [1h] BY index, host

// Analyze Web server logs
L("web-logs")::nginx:(count(*)) {status >= 400} [1h] BY status

// Compare error rates between Web and API servers
L("web-logs", "api-logs")::*:(count(*)) {status >= 500} [1h] BY index

Data Source (datasource)

The datasource specifies the specific data source for the query, which can be a dataset name, wildcard pattern, regular expression, or subquery.

Basic Data Sources

The definition of data sources varies across namespaces:

| Namespace | Data Source Type | Examples |
| --- | --- | --- |
| M | Measurement | cpu, memory, network |
| L | Source | nginx, tomcat, java-app |
| O | Infrastructure object class | host, container, process |
| T | Service name | user-service, order-service |
| R | RUM data type | session, view, resource, error |

Data Source Syntax

Specifying Data Source Name

M::cpu:(usage)                    // Query CPU metrics
L::nginx:(count(*))                // Query Nginx logs
T::user-service:(traces)           // Query user service traces

Wildcard Matching

M::*:(usage)                      // Query all metrics

Regular Expression Matching

M::re('cpu.*'):(usage)            // Query metrics starting with cpu
L::re('web.*'):(count(*))          // Query logs starting with web
T::re('.*-service'):(traces)      // Query services ending with -service

Subquery Data Source

Subqueries are an important feature in DQL for implementing complex analysis, allowing the results of one query to be used as the data source for another. This nested query mechanism supports multi-level analysis requirements.

Execution Mechanism

Subqueries follow these principles:

  1. Serial Execution: Inner subqueries execute first, and their results are used as the data source for outer queries
  2. Result Encapsulation: Subquery results are encapsulated into temporary table structures for outer queries
  3. Namespace Mixing: Subqueries support mixing different namespaces, enabling cross-data type analysis
  4. Performance Consideration: Subqueries increase computational complexity, so nested queries should be designed with care

Basic Syntax

namespace::(subquery):(projections)

Execution Process

Take a typical subquery as an example:

L::(L::*:(count(*)) {level = 'error'} BY app_id):(count_distinct(app_id))

Execution process:

  1. Inner Subquery: L::*:(count(*)) {level = 'error'} BY app_id

    • Scan all log data
    • Filter out logs with error level
    • Group by app_id and count errors
    • Generate temporary table: app_id | count(*)
  2. Outer Query: L::(...):(count_distinct(app_id))

    • Use subquery results as the data source
    • Count the number of distinct app_ids
    • Final result: Number of applications with errors
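
The same two-step flow can be sketched in ordinary code (with invented log rows) to show how the inner result becomes the outer query's data source:

```python
# Invented log rows; 'level' and 'app_id' mirror the fields in the DQL above.
logs = [
    {"app_id": "shop", "level": "error"},
    {"app_id": "shop", "level": "error"},
    {"app_id": "pay",  "level": "error"},
    {"app_id": "pay",  "level": "info"},
    {"app_id": "cms",  "level": "info"},
]

# Inner subquery: count(*) {level = 'error'} BY app_id.
inner = {}
for row in logs:
    if row["level"] == "error":
        inner[row["app_id"]] = inner.get(row["app_id"], 0) + 1
# 'inner' plays the role of the temporary table: app_id | count(*)

# Outer query: count_distinct(app_id) over the temporary table.
outer = len(set(inner.keys()))
print(outer)  # number of applications that logged errors
```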

Application Examples

// Count the number of applications with errors
L::(L::*:(count(*)) {level = 'error'} BY app_id):(count_distinct(app_id))

// Analyze servers with high CPU usage
M::(M::cpu:(avg(usage)) [1h] BY host {avg(usage) > 80}):(count(host))

// First find service endpoints with error rates exceeding 1%, then count the number of affected services
M::(M::http_requests:(sum(request_count), sum(error_count)) [1h] BY service, endpoint
   {sum(error_count) / sum(request_count) > 0.01}
):(count(service))

Select Clause (select-clause)

The select clause specifies the fields or expressions to be returned by the query, and is one of the most basic and important parts of a DQL query.

Field Selection

Basic Syntax

// Select a single field
M::cpu:(usage)

// Select multiple fields
M::cpu:(usage, system, user)

// Select all fields
M::cpu:(*)

Field Name Rules

Field names can be written in the following forms:

  1. Direct Name: Suitable for regular identifiers

    • message
    • host_name
    • response_time
  2. Backtick Wrapped: Suitable for field names containing special characters or keywords

    • `message`
    • `limit`
    • `host-name`
    • `column with spaces`
  3. Syntax to Avoid: Single and double quotes wrap string literals, not field names

    • 'message'
    • "message"

JSON Field Extraction

When data fields contain JSON content, a subset of JSON Path syntax can be used to extract internal field data.

Basic Syntax

field-name@json-path
@json-path                    // Defaults to the message field

JSON Path Syntax

  • Dot notation for object properties: .field_name
  • Square brackets for object properties: ["key"] (suitable for keys containing spaces or special characters)
  • Array index for values: [index]
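
As a rough sketch of this path subset (illustrative only; the tokenizer below is not the platform's parser):

```python
import re

def extract(doc, path):
    """Walk a parsed JSON value using the subset above:
    dotted names, ["key"] for special keys, and [index] for arrays."""
    # Tokenize: name | .name | ["key"] | [0]
    tokens = re.findall(r'\.?([A-Za-z_][\w-]*)|\["([^"]+)"\]|\[(\d+)\]', path)
    cur = doc
    for name, key, idx in tokens:
        if name:
            cur = cur[name]          # object property via dot notation
        elif key:
            cur = cur[key]           # object property via ["key"]
        else:
            cur = cur[int(idx)]      # array element via [index]
    return cur

doc = {"request": {"headers": {"user-agent": "Mozilla/5.0"},
                   "body": {"permissions": ["read", "write", "admin"]}}}
print(extract(doc, 'request.headers["user-agent"]'))  # Mozilla/5.0
print(extract(doc, 'request.body.permissions[0]'))    # read
```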

Application Examples

Assume the following JSON log data:

{
  "message": "User login attempt",
  "request": {
    "method": "POST",
    "path": "/api/login",
    "headers": {
      "user-agent": "Mozilla/5.0",
      "content-type": "application/json"
    },
    "body": {
      "username": "john.doe",
      "password": "***",
      "permissions": ["read", "write", "admin"]
    }
  },
  "response": {
    "status": 200,
    "time": 156,
    "data": [
      {"id": 1, "name": "user1"},
      {"id": 2, "name": "user2"}
    ]
  }
}

// Extract request method
L::auth_logs:(message@request.method)

// Extract request path
L::auth_logs:(message@request.path)

// Extract username
L::auth_logs:(message@request.body.username)

// Extract response status
L::auth_logs:(message@response.status)

// Extract User-Agent (contains hyphen, requires square brackets)
L::auth_logs:(message@request.headers["user-agent"])

// Extract first element of permissions array
L::auth_logs:(message@request.body.permissions[0])

// Extract name of first object in response data
L::auth_logs:(message@response.data[0].name)

// Count requests by method
L::auth_logs:(count(*)) [1h] BY message@request.method

// Analyze response time distribution
L::auth_logs:(avg(message@response.time), max(message@response.time)) [1h] BY message@request.method

// Extract multiple fields
L::auth_logs:(
    message,
    message@request.method as method,
    message@request.path as path,
    message@response.status as status,
    message@response.time as response_time
) {message@response.status >= 400} [1h]

Calculated Fields

Expression Calculation

Supports basic arithmetic operations:

// Unit conversion (milliseconds to seconds)
L::nginx:(response_time / 1000) as response_time_seconds

// Calculate percentage
M::memory:(used / total * 100) as usage_percentage

// Compound calculation
M::network:((bytes_in + bytes_out) / 1024 / 1024) as total_traffic_mb

Function Calculation

Supports various aggregation and transformation functions:

// Aggregation functions
M::cpu:(max(usage), min(usage), avg(usage)) [1h] BY host

// Transformation functions
L::logs:(int(response_time) as response_time_seconds)
L::logs:(floor(response_time) as response_time_seconds)

Aliases

Assign aliases to fields or expressions to make results more readable and convenient for subsequent references.

Basic Syntax

expression as alias_name

Application Examples

// Simple alias
M::cpu:(avg(usage) as avg_usage, max(usage) as max_usage) [1h] BY host

// Expression alias
M::memory:((used / total) * 100 as usage_percent) [1h] BY host

// Function alias
L::logs:(count(*) as error_count) {level = 'error'} [1h] BY service

// JSON extraction alias
L::api_logs:(
    message@request.method as http_method,
    message@response.status as http_status,
    message@response.time as response_time_ms
) [1h]

Usage Tips

In DQL, the results of aggregation functions can be directly referenced using the original field name, reducing the need for aliases:

M::cpu:(max(usage)) [1h] BY host

// The result will contain the max(usage) column, which can be directly referenced in subsequent queries:
M::(M::cpu:(max(usage)) [1h] BY host):(max(usage)) { usage > 80 }

However, when multiple aggregation functions use the same field, aliases must be used to correctly distinguish them in subsequent queries:

// Scenario where aliases are required
M::cpu:(max(usage) as max_usage, min(usage) as min_usage) [1h] BY host

Time Clause (time-clause)

The time clause is one of the core features of DQL, used to specify the query time range, aggregation time window, and Rollup aggregation function.

Basic Syntax

[start_time:end_time:interval:rollup]

Time Range

Absolute Timestamp

[1672502400000:1672588800000]     // Millisecond timestamp
[1672502400:1672588800]           // Second timestamp

Relative Time

Supports multiple duration units, which can be mixed:

[1h]                             // Last 1 hour to now
[1h:5m]                          // From last 1 hour to last 5 minutes
[1h30m]                          // Last 1 hour 30 minutes
[2h15m30s]                       // Last 2 hours 15 minutes 30 seconds

Duration Expression Explanation

| Unit | Description | Example |
| --- | --- | --- |
| s | Seconds | 30s |
| m | Minutes | 5m |
| h | Hours | 2h |
| d | Days | 7d |
| w | Weeks | 4w |
| y | Years | 1y |

When used in the time clause, duration expressions represent offsets from the current time. When used in the select clause or where clause, they are treated as integer milliseconds in calculations.

When used in aggregation queries, two additional duration units are supported:

| Unit | Description | Example |
| --- | --- | --- |
| i, is | Multiple of the aggregation time window, returned as float seconds | 1i, 1is |
| ims | Multiple of the aggregation time window, returned as integer milliseconds | 1ims |

O::HOST:(count(*)){ `last_update_time` > (now()-10m) } // 10m is treated as the integer 600,000 in calculations
L::*:( count(*) / 1i ) [::1m] // Divide by the time window size (1m) in seconds to compute log write QPS
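
The base units can be sketched as a small parser. This is an illustration only: the year unit is assumed here to be 365 days (the platform may define it differently), and the window-relative i/is/ims units are omitted because they depend on query context:

```python
import re

# Milliseconds per base unit, per the table above.
# 'y' as 365 days is an assumption for this sketch.
UNIT_MS = {"s": 1_000, "m": 60_000, "h": 3_600_000,
           "d": 86_400_000, "w": 604_800_000, "y": 31_536_000_000}

def duration_ms(expr):
    """Parse a mixed duration like '1h30m' or '2h15m30s' into integer ms."""
    parts = re.findall(r"(\d+)([smhdwy])", expr)
    if not parts or "".join(n + u for n, u in parts) != expr:
        raise ValueError(f"bad duration: {expr!r}")
    return sum(int(n) * UNIT_MS[u] for n, u in parts)

print(duration_ms("10m"))     # 600000, the value used in WHERE arithmetic
print(duration_ms("1h30m"))   # 5400000
```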

Preset Time Ranges

Provides commonly used time range keywords:

| Keyword | Description | Time Range |
| --- | --- | --- |
| TODAY | Today | From today 00:00 to now |
| YESTERDAY | Yesterday | From yesterday 00:00 to today 00:00 |
| THIS WEEK | This week | From this Monday 00:00 to now |
| LAST WEEK | Last week | From last Monday 00:00 to this Monday 00:00 |
| THIS MONTH | This month | From this month's 1st 00:00 to now |
| LAST MONTH | Last month | From last month's 1st 00:00 to this month's 1st 00:00 |

[TODAY]                          // Today's data
[YESTERDAY]                      // Yesterday's data
[THIS WEEK]                      // This week's data
[LAST WEEK]                      // Last week's data
[THIS MONTH]                     // This month's data
[LAST MONTH]                     // Last month's data

When using time range keywords, ensure the workspace timezone setting is correct: boundaries such as today's 00:00 are resolved in that timezone.

Time Window Aggregation

Time windows group data by the specified time interval for aggregation, with the time column in the results representing the start time of each time window.

Single Time Window

Aggregate the entire time range into a single value:

M::cpu:(max(usage_total)) [1h]

Query Result:

{
  "columns": ["time", "max(usage_total)"],
  "values": [
    [1721059200000, 37.46]
  ]
}

Time Window Aggregation

Group by time interval for aggregation:

M::cpu:(max(usage_total)) [1h::10m]

Query Result:

{
  "columns": ["time", "max(usage_total)"],
  "values": [
    [1721059200000, 37.46],
    [1721058600000, 34.12],
    [1721058000000, 33.81],
    [1721057400000, 30.92],
    [1721056800000, 34.53],
    [1721056200000, 36.11]
  ]
}
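
The windowing behavior can be illustrated with a short sketch (sample points are invented): each point is assigned to the window containing its timestamp, and the result's time column is the window start.

```python
from collections import defaultdict

WINDOW_MS = 10 * 60 * 1000  # the 10m step in [1h::10m]

# Invented (timestamp_ms, usage_total) samples.
points = [
    (1721059210000, 37.46), (1721059300000, 31.02),
    (1721058650000, 34.12), (1721058700000, 29.90),
]

buckets = defaultdict(list)
for ts, value in points:
    # Truncate each timestamp down to the start of its 10-minute window.
    buckets[ts - ts % WINDOW_MS].append(value)

# max() per window, newest window first, as in the result above.
values = sorted(((start, max(vs)) for start, vs in buckets.items()), reverse=True)
print(values)
```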

Rollup Functions

Rollup functions are an important preprocessing step in DQL, executed before group aggregation, and used to preprocess raw time series data.

Execution Sequence

Rollup's position in the query execution flow:

Raw data → WHERE filtering → **Rollup preprocessing** → Group aggregation → HAVING filtering → Final result

Execution Mechanism

Rollup execution is divided into two stages:

  1. Per Time Series Processing: Apply Rollup functions to each independent time series
  2. Aggregation Calculation: Execute group aggregation on the Rollup-processed results

Application Scenarios

A typical application scenario for Rollup functions is Counter metric processing.

For Prometheus Counter type metrics, directly using raw values for aggregation is meaningless because Counters are monotonically increasing. It is necessary to first calculate the rate of increase for each time series, and then perform aggregation.

Problem Example: Assume two servers' request counters:

{
  "host": "web-server-01",
  "data": [
    {"time": "2024-07-15 08:25:00", "request_count": 150},
    {"time": "2024-07-15 08:20:00", "request_count": 140},
    {"time": "2024-07-15 08:15:00", "request_count": 130},
    {"time": "2024-07-15 08:10:00", "request_count": 120},
    {"time": "2024-07-15 08:05:00", "request_count": 110},
    {"time": "2024-07-15 08:00:00", "request_count": 100}
  ]
}
{
  "host": "web-server-02",
  "data": [
    {"time": "2024-07-15 08:25:00", "request_count": 250},
    {"time": "2024-07-15 08:20:00", "request_count": 240},
    {"time": "2024-07-15 08:15:00", "request_count": 230},
    {"time": "2024-07-15 08:10:00", "request_count": 220},
    {"time": "2024-07-15 08:05:00", "request_count": 210},
    {"time": "2024-07-15 08:00:00", "request_count": 200}
  ]
}

Problem with Direct Aggregation:

  • web-server-01's request_count starts at 100
  • web-server-02's request_count starts at 200
  • Although both servers have the same request rate (10 requests per 5 minutes), the absolute values are different

Solution Using Rollup:

M::http:(sum(request_count)) [rate]

Execution Process:

  1. Rollup Stage (executed on each time series separately):

    • web-server-01: rate([100, 110, 120, 130, 140, 150]) = 2 requests/minute
    • web-server-02: rate([200, 210, 220, 230, 240, 250]) = 2 requests/minute
  2. Aggregation Stage:

    • sum([2, 2]) = 4 requests/minute
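
A minimal sketch of this two-stage computation, using the counter samples above. The simple delta-based rate here ignores counter resets, which real rate() implementations also handle:

```python
def rate_per_minute(samples):
    """Per-series rollup: overall increase divided by the minutes covered.
    (A sketch only; no counter-reset handling.)"""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / ((t1 - t0) / 60)

# (unix_seconds, request_count) for each series, oldest first.
web01 = [(0, 100), (300, 110), (600, 120), (900, 130), (1200, 140), (1500, 150)]
web02 = [(0, 200), (300, 210), (600, 220), (900, 230), (1200, 240), (1500, 250)]

# Stage 1: rollup applied to each time series independently.
per_series = [rate_per_minute(s) for s in (web01, web02)]  # [2.0, 2.0]

# Stage 2: aggregation over the rollup results.
total = sum(per_series)  # 4.0 requests/minute
print(per_series, total)
```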

Function Types

Common Rollup functions include:

| Function | Description | Application Scenarios |
| --- | --- | --- |
| rate() | Calculate rate | Counter type metrics |
| increase() | Calculate increment | Counter type metrics |
| last() | Take last value | Gauge type metrics |
| avg() | Calculate average | Data smoothing |
| max() | Take maximum | Peak analysis |
| min() | Take minimum | Trough analysis |

In practice, almost any aggregation function that returns a single value can be used as a Rollup function, so the table above is not exhaustive.

Default Rollup

If no Rollup function is explicitly specified, DQL performs no Rollup calculation by default, whereas PromQL defaults to last. If you are querying Prometheus metrics, be aware of this difference and specify the Rollup function explicitly.

Application Examples

// Calculate total request rate for all servers
M::http_requests:(sum(request_count)) [rate]

// Calculate error rate
M::http_requests:(
    sum(error_count) as errors,
    sum(request_count) as requests
) [rate] BY service

Flexible Time Window Syntax

DQL supports various shorthand formats for time windows, making query writing more convenient.

Shorthand Formats

[1h]                             // Only specify time range
[1h::5m]                         // Time range + aggregation step
[1h:5m]                          // Start time + end time
[1h:5m:1m]                       // Start + end + step
[1h:5m:1m:avg]                   // Full format
[::5m]                           // Only specify aggregation step
[:::sum]                         // Only specify rollup function
[sum]                            // Only specify rollup function (simplest form)

Filter Conditions (where-clause)

Filter conditions are used to filter data rows, retaining only data that meets the conditions for subsequent processing.

Basic Syntax

{condition1, condition2, condition3}

Multiple conditions can be connected with commas, AND, OR, &&, ||.

Comparison Operators

| Operator | Description | Example |
| --- | --- | --- |
| = | Equal | host = 'web-01' |
| != | Not equal | status != 200 |
| > | Greater than | cpu_usage > 80 |
| >= | Greater than or equal | memory_usage >= 90 |
| < | Less than | response_time < 1000 |
| <= | Less than or equal | disk_usage <= 80 |

Pattern Matching Operators

| Operator | Description | Example |
| --- | --- | --- |
| =~ | Regular expression match | message =~ 'error.*\\d+' |
| !~ | Regular expression mismatch | message !~ 'debug.*' |

Set Operators

| Operator | Description | Example |
| --- | --- | --- |
| IN | In set | status IN [200, 201, 202] |
| NOT IN | Not in set | level NOT IN ['debug', 'info'] |

Logical Operators

| Operator | Description | Example |
| --- | --- | --- |
| AND or && | Logical AND | cpu > 80 AND memory > 90 |
| OR or \|\| | Logical OR | status = 500 OR status = 502 |
| NOT | Logical NOT | NOT status = 200 |

Tip: Besides logical OR, the OR operator also provides null-coalescing semantics: when the left expression evaluates to NULL, it returns the right expression instead. This can be used for fallback logic, for example status OR backup_status.

Application Examples

Basic Filtering

// Single condition
M::cpu:(usage) {host = 'web-01'} [1h]

// Multiple AND conditions
M::cpu:(usage) {host = 'web-01', usage > 80} [1h]

// Using AND keyword
M::cpu:(usage) {host = 'web-01' AND usage > 80} [1h]

// Mixed logical operators
M::cpu:(usage) {(host = 'web-01' OR host = 'web-02') AND usage > 80} [1h]

Regular Expression Matching

// Match error logs
L::logs:(message) {message =~ 'ERROR.*\\d{4}'} [1h]

// Match specific log formats
L::logs:(message) {message =~ '\\[(ERROR|WARN)\\].*'} [1h]

// Exclude debug information
L::logs:(message) {message !~ 'DEBUG.*'} [1h]

// Hostname pattern matching
M::cpu:(usage) {host =~ 'web-.*\\.prod\\.com'} [1h]

Set Operations

// Status code filtering
L::nginx:(count(*)) {status IN [200, 201, 202, 204]} [1h]

// Exclude specific status codes
L::nginx:(count(*)) {status NOT IN [404, 500, 502]} [1h]

// Log level filtering
L::app_logs:(count(*)) {level IN ['ERROR', 'WARN', 'CRITICAL']} [1h]

Array Field Handling

When fields are of array type, DQL supports multiple array matching operations.

Assume a field tags = ['web', 'prod', 'api']:

// Check if array contains a value
{tags IN ['web']}           // true, because tags contains 'web'
{tags IN ['mobile']}        // false, because tags does not contain 'mobile'

// Check if array does not contain a value
{tags NOT IN ['mobile']}    // true, because tags does not contain 'mobile'
{tags NOT IN ['web']}       // false, because tags contains 'web'

// Check if array contains all specified values
{tags IN ['web', 'api']}           // true, contains 'web' and 'api'
{tags IN ['web', 'api', 'mobile']} // false, does not contain 'mobile'
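
These containment rules can be modeled directly. This is a sketch; the NOT IN behavior for multiple values is an assumption extrapolated from the single-value cases above:

```python
def array_in(tags, values):
    """IN on an array field, as described above: true only when the
    array contains every listed value."""
    return all(v in tags for v in values)

def array_not_in(tags, values):
    """NOT IN on an array field: true when none of the listed values
    is present. (Assumed, by extension of the single-value cases.)"""
    return not any(v in tags for v in values)

tags = ["web", "prod", "api"]
print(array_in(tags, ["web"]))                   # True
print(array_in(tags, ["web", "api", "mobile"]))  # False
print(array_not_in(tags, ["mobile"]))            # True
```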

The following syntax is retained for historical compatibility and is not recommended for new queries. These operators have different semantics on array fields compared to regular fields, which can cause confusion.

// Historical syntax: Single value containment check (overloaded semantics of equality operator)
{tags = 'web'}           // true, because tags contains 'web'
{tags = 'mobile'}        // false, because tags does not contain 'mobile'

// Historical syntax: Single value non-containment check (overloaded semantics of inequality operator)
{tags != 'mobile'}       // true, because tags does not contain 'mobile'
{tags != 'web'}          // false, because tags contains 'web'

Function Filtering

Any function that returns a boolean value can be used as a filter condition.

// String matching
L::logs:(message) { match(message, 'error') }
L::logs:(message) { wildcard(message, 'error*') }

// Query requests with abnormal response times
L::access_logs:(*) {
    response_time > 1000 AND
    match(message, 'timeout')
}

// Query hosts with abnormal memory usage
M::memory:(usage) {
    (usage > 90 OR usage < 10) AND
    host =~ 'prod-.*' AND
    tags IN ['critical', 'important']
}

WHERE Subquery

WHERE subqueries are a powerful feature in DQL for implementing dynamic filtering, allowing the results of one query to be used as the filter condition for another. This mechanism supports dynamic filtering based on data analysis results.

Query Characteristics

  • Dynamic Filtering: Filter conditions are not fixed values but are dynamically calculated through queries
  • Namespace Mixing: Supports cross-namespace queries, enabling correlation analysis across different data types
  • Serial Execution: Subqueries execute first, and their results are used for the main query's filtering
  • Array Results: Subquery results are encapsulated as arrays, so only IN and NOT IN operators are supported

Execution Process

Take a typical WHERE subquery as an example:

M::cpu:(avg(usage)) [1h] BY host
{host IN (O::HOST:(hostname) {provider = 'cloud-a'})}

Execution process:

  1. Subquery Execution: O::HOST:(hostname) {provider = 'cloud-a'}

    • Query all infrastructure objects
    • Filter out hosts with provider 'cloud-a'
    • Return hostname list: ['host-01', 'host-02', 'host-03']
  2. Main Query Execution: M::cpu:(avg(usage)) [1h] BY host {host IN [...]}

    • Query CPU usage data
    • Only count hosts returned by the subquery
    • Calculate average usage by host
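
A sketch of the same flow in ordinary code, with invented host and CPU rows, showing why the subquery result behaves as an array usable only with IN / NOT IN:

```python
# Invented object rows (the subquery's namespace) and metric rows.
hosts = [
    {"hostname": "host-01", "provider": "cloud-a"},
    {"hostname": "host-02", "provider": "cloud-a"},
    {"hostname": "host-03", "provider": "cloud-b"},
]
cpu = [
    {"host": "host-01", "usage": 40.0},
    {"host": "host-02", "usage": 60.0},
    {"host": "host-03", "usage": 90.0},
]

# Step 1. Subquery: collect hostnames where provider = 'cloud-a'.
# The result is a plain array of values.
allowed = [h["hostname"] for h in hosts if h["provider"] == "cloud-a"]

# Step 2. Main query: the array result is applied as an IN filter.
matched = [r for r in cpu if r["host"] in allowed]
print(allowed, [r["host"] for r in matched])
```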

Application Examples

// Monitor servers from a specific cloud provider
M::cpu:(avg(usage)) [1h] BY host
{host IN (O::HOST:(hostname) {provider = 'cloud-a'})}

// Compare performance across different cloud providers
M::memory:(avg(used / total * 100)) [1h] BY host
{host IN (O::HOST:(hostname) {provider IN ['cloud-a', 'cloud-b']})}

// Analyze logs for a specific business
L::app_logs:(count(*)) [1h] BY level
{service IN (T::services:(service_name) {business_unit = 'ecommerce'})}

// Monitor application performance for critical businesses
M::response_time:(avg(response_time)) [1h] BY service
{service IN (T::services:(service_name) {criticality = 'high'})}

Grouping (group-by-clause)

Grouping is a core feature of data analysis, used to group data by specified dimensions for aggregation.

Basic Syntax

BY expression1, expression2, ...

Grouping Types

Field Grouping

// Single field grouping
M::cpu:(avg(usage)) [1h] BY host

// Multiple field grouping
M::cpu:(avg(usage)) [1h] BY host, env

// Nested grouping
M::cpu:(avg(usage)) [1h] BY datacenter, rack, host

Expression Grouping

// Compound mathematical expression
M::memory:(avg(used)) [1h] BY ((used / total) * 100) as usage_percent

// Multiple field mathematical operations
M::performance:(avg(response_time)) [1h] BY (response_time / 1000) as response_seconds

Function Grouping

// Drain clustering algorithm
L::logs:(count(*)) BY drain(message, 0.7) as sample

// Regular expression extraction grouping
L::logs:(count(*)) [1h] BY regexp_extract(message, 'error_code: (\\d+)', 1)

Grouping Result Processing

When a query includes both grouping and time windows, the result is a two-dimensional array: the first dimension is the set of groups, distinguished by the grouping key, and the second dimension is the per-time-window data within each group.

Two-dimensional data structure example:

Query: M::cpu:(max(usage_total)) [1h::10m] BY host

Query Result Structure:

{
  "series": [
    {
      "columns": ["time", "max(usage_total)"],
      "name": "cpu",
      "tags": {"host": "web-server-01"},
      "values": [
        [1721059200000, 78.5],
        [1721058600000, 82.3],
        [1721058000000, 75.8],
        [1721057400000, 88.2]
      ]
    },
    {
      "columns": ["time", "max(usage_total)"],
      "name": "cpu",
      "tags": {"host": "web-server-02"},
      "values": [
        [1721059200000, 45.2],
        [1721058600000, 52.8],
        [1721058000000, 48.5],
        [1721057400000, 61.3]
      ]
    },
    {
      "columns": ["time", "max(usage_total)"],
      "name": "cpu",
      "tags": {"host": "web-server-03"},
      "values": [
        [1721059200000, 92.1],
        [1721058600000, 95.7],
        [1721058000000, 89.4],
        [1721057400000, 97.6]
      ]
    }
  ]
}

Further processing of this two-dimensional array:

  1. To filter the results of this two-dimensional array, use the HAVING clause
  2. To sort or paginate data within a single group of the two-dimensional array, use ORDER BY, LIMIT, OFFSET statements
  3. To sort or paginate the groups of the two-dimensional array, use SORDER BY, SLIMIT, SOFFSET statements

For a detailed explanation of the two-dimensional data structure and sorting/pagination features, refer to Sorting and Pagination.

Having Clause (having-clause)

The Having clause is used to filter the results after group aggregation, similar to the WHERE clause, but acting on aggregated data.

Basic Syntax

HAVING condition

Difference from WHERE

WHERE and HAVING are both clauses used to filter data, but they operate at different stages of query execution, and the filter conditions they handle are also different.

Difference in Execution Sequence

Raw data → WHERE filtering → Group aggregation → HAVING filtering → Final result
  1. WHERE Clause:

    • Executes before group aggregation
    • Acts on raw data rows
    • Filters data rows that do not meet the conditions, reducing the amount of data for subsequent processing
  2. HAVING Clause:

    • Executes after group aggregation
    • Acts on aggregated results
    • Filters based on the results of aggregation functions

Application Scenarios

The HAVING clause is suitable for filtering aggregated results:

// Filter based on aggregation function values
M::cpu:(avg(usage) as avg_usage) [1h] BY host HAVING avg_usage > 80

// Filter based on multiple aggregation conditions
M::http_requests:(
    sum(request_count) as total,
    sum(error_count) as errors
) [1h] BY service, endpoint
HAVING errors / total > 0.01 AND total > 1000

// Filter based on group statistics
L::logs:(count(*) as count) [1h] BY service HAVING count > 100

// Filter based on compound aggregation conditions
M::response_time:(
    avg(response_time) as avg_time,
    max(response_time) as max_time,
    min(response_time) as min_time
) [1h] BY endpoint
HAVING avg_time > 1000 AND max_time > 5000 AND (max_time - min_time) > 2000

Sorting and Pagination

Sorting and pagination in DQL are distinctive features built around a dual-sorting mechanism tailored to time series data: intra-group sorting and inter-group sorting. This design lets DQL efficiently handle complex, multi-dimensional time series analysis.

Understanding DQL's Data Structure

Before diving into sorting and pagination, first understand the two-dimensional data structure of DQL query results. When a query includes both grouping (BY) and time windows, a two-dimensional array is produced:

  • First Dimension (Grouping Dimension): Multiple groups distinguished by the grouping key
  • Second Dimension (Time Dimension): Data aggregated by time window within each group

For a detailed explanation of the two-dimensional data structure and JSON format examples, refer to Grouping Result Processing. This two-dimensional structure is the foundation of DQL's sorting and pagination features, and understanding this structure is crucial for mastering DQL's sorting mechanism.

Intra-Group Sorting and Pagination (ORDER BY, LIMIT, OFFSET)

Intra-group sorting and pagination act on the second dimension of the two-dimensional structure, i.e., the data within each group. This sorting is executed independently within each group and does not affect data in other groups.

Basic Syntax

ORDER BY expression [ASC|DESC]
LIMIT row_count
OFFSET row_offset

Execution Mechanism

Intra-group sorting execution process:

  1. Group Processing: Sort operation is executed independently for each group
  2. Sorting Basis: Can use time fields, aggregation function results, or calculation expressions
  3. Pagination Limit: LIMIT limits the number of data rows returned per group
  4. Offset Handling: OFFSET skips the first N rows of data in each group
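The per-group mechanism above can be modeled in Python against the `series` JSON structure shown in this document. A minimal sketch (sample values are hypothetical) of `ORDER BY time DESC LIMIT 2 OFFSET 1`:

```python
# Each group's rows are sorted and paginated independently
series = [
    {"tags": {"host": "web-server-01"},
     "values": [[1721057400000, 88.2], [1721058000000, 75.8],
                [1721058600000, 82.3], [1721059200000, 78.5]]},
    {"tags": {"host": "web-server-02"},
     "values": [[1721057400000, 61.3], [1721058000000, 48.5],
                [1721058600000, 52.8], [1721059200000, 45.2]]},
]

LIMIT, OFFSET = 2, 1
for group in series:
    # ORDER BY time DESC: sort this group's rows by the time column
    ordered = sorted(group["values"], key=lambda row: row[0], reverse=True)
    # OFFSET skips the first N rows of the group, LIMIT caps what remains
    group["values"] = ordered[OFFSET:OFFSET + LIMIT]

print(series[0]["values"])  # [[1721058600000, 82.3], [1721058000000, 75.8]]
```

The key point is that neither sorting nor pagination crosses group boundaries: each host keeps its own top-of-list.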

Application Examples

Basic Time Sorting
// Sort by time in descending order, showing the latest data for each host
M::cpu:(max(usage_total)) [1h::10m] BY host ORDER BY time DESC

// Sort by time in ascending order, showing historical trends
M::cpu:(max(usage_total)) [1h::10m] BY host ORDER BY time ASC

Execution Result (ORDER BY time DESC):

{
  "series": [
    {
      "columns": ["time", "max(usage_total)"],
      "name": "cpu",
      "tags": {"host": "web-server-01"},
      "values": [
        [1721059200000, 78.5],  // 12:00:00
        [1721058600000, 82.3],  // 11:50:00
        [1721058000000, 75.8],  // 11:40:00
        [1721057400000, 88.2],  // 11:30:00
        [1721056800000, 72.1],  // 11:20:00
        [1721056200000, 69.4]   // 11:10:00
      ]
    },
    {
      "columns": ["time", "max(usage_total)"],
      "name": "cpu",
      "tags": {"host": "web-server-02"},
      "values": [
        [1721059200000, 45.2],  // 12:00:00
        [1721058600000, 52.8],  // 11:50:00
        [1721058000000, 48.5],  // 11:40:00
        [1721057400000, 61.3],  // 11:30:00
        [1721056800000, 55.7],  // 11:20:00
        [1721056200000, 58.9]   // 11:10:00
      ]
    }
  ]
}
Value-Based Sorting
// Sort by CPU usage in descending order, find peak periods for each host
M::cpu:(max(usage_total) as max_usage_total) [1h::10m] BY host ORDER BY max_usage_total DESC

// Sort by response time in ascending order, find the best performing periods
M::response_time:(avg(response_time) as avg_response_time) [1h::5m] BY endpoint ORDER BY avg_response_time ASC
Intra-Group Pagination
// Show only the latest 3 data points for each host
M::cpu:(max(usage_total)) [1h::10m] BY host ORDER BY time DESC LIMIT 3

// Skip the latest 2 data points, show the next 3
M::cpu:(max(usage_total)) [1h::10m] BY host ORDER BY time DESC LIMIT 3 OFFSET 2

Execution Result (LIMIT 3):

{
  "series": [
    {
      "columns": ["time", "max(usage_total)"],
      "name": "cpu",
      "tags": {"host": "web-server-01"},
      "values": [
        [1721059200000, 78.5],  // 12:00:00 - Latest
        [1721058600000, 82.3],  // 11:50:00
        [1721058000000, 75.8]   // 11:40:00
        // Only return the latest 3 data points (LIMIT 3)
      ]
    },
    {
      "columns": ["time", "max(usage_total)"],
      "name": "cpu",
      "tags": {"host": "web-server-02"},
      "values": [
        [1721059200000, 45.2],  // 12:00:00 - Latest
        [1721058600000, 52.8],  // 11:50:00
        [1721058000000, 48.5]   // 11:40:00
        // Only return the latest 3 data points (LIMIT 3)
      ]
    }
  ]
}

Inter-Group Sorting and Pagination (SORDER BY, SLIMIT, SOFFSET)

Inter-group sorting and pagination are unique features of DQL, acting on the first dimension of the two-dimensional structure, i.e., sorting the groups themselves. This sorting requires reducing each group's data to a single value, then comparing this value across different groups.

Basic Syntax

SORDER BY aggregate_function(expression) [ASC|DESC]
SLIMIT group_count
SOFFSET group_offset

Execution Mechanism

Inter-group sorting execution process:

  1. Dimensionality Reduction: Apply aggregation functions to each group to calculate a representative value
  2. Group Sorting: Sort all groups based on the dimensionality-reduced value
  3. Group Pagination: SLIMIT limits the number of groups returned, SOFFSET skips the first N groups
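The three steps above can be sketched in Python over the same `series` structure. This is a hedged illustration of the mechanism, not the engine's code; sample values mirror the averages used later in this section (77.7, 53.7, 90.9):

```python
series = [
    {"tags": {"host": "web-server-01"}, "values": [[1, 78.5], [2, 82.3], [3, 72.3]]},
    {"tags": {"host": "web-server-02"}, "values": [[1, 45.2], [2, 52.8], [3, 63.1]]},
    {"tags": {"host": "web-server-03"}, "values": [[1, 92.1], [2, 95.7], [3, 84.9]]},
]

def reduce_avg(group):
    # Step 1, dimensionality reduction: collapse the group to one value
    vals = [v for _, v in group["values"]]
    return sum(vals) / len(vals)

SLIMIT, SOFFSET = 2, 0
ranked = sorted(series, key=reduce_avg, reverse=True)   # Step 2: SORDER BY ... DESC
top = ranked[SOFFSET:SOFFSET + SLIMIT]                  # Step 3: SOFFSET / SLIMIT
print([g["tags"]["host"] for g in top])  # ['web-server-03', 'web-server-01']
```

Swapping `reduce_avg` for a max, min, or last-value reduction gives the other SORDER BY variants described below.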

Dimensionality Reduction Function Selection

Inter-group sorting must use aggregation functions for dimensionality reduction. When no aggregation function is specified, the default aggregation function is last.

Common dimensionality reduction functions include:

  • last(): take the last value; suited to the current state of a time series
  • max(): take the maximum; suited to peak analysis
  • min(): take the minimum; suited to trough analysis
  • avg(): take the average; suited to overall trend analysis
  • sum(): sum the values; suited to total statistics
  • count(): count the rows; suited to frequency analysis

Almost any aggregation function that returns a single value can be used, so the list above is not exhaustive.

Application Examples

Average-Based Sorting
// Sort by average CPU usage in descending order, find the most loaded hosts
M::cpu:(avg(usage)) [1h::10m] BY host SORDER BY avg(usage) DESC SLIMIT 5

// Sort by average response time in ascending order, find the best performing services
M::response_time:(avg(response_time)) [1h] BY service SORDER BY avg(response_time) ASC SLIMIT 10

Execution Process Analysis (illustrated with three hosts and SLIMIT 2):

  1. Dimensionality Reduction:

    • web-server-01: avg(usage) = 77.7
    • web-server-02: avg(usage) = 53.7
    • web-server-03: avg(usage) = 90.9
  2. Group Sorting (by avg(usage) DESC):

    • web-server-03: 90.9
    • web-server-01: 77.7
    • web-server-02: 53.7
  3. Final Result (SLIMIT 2):

{
  "series": [
    {
      "columns": ["time", "avg(usage)"],
      "name": "cpu",
      "tags": {"host": "web-server-03"},  // Average usage: 90.9 - Rank 1
      "values": [
        [1721059200000, 92.1],
        [1721058600000, 95.7],
        [1721058000000, 89.4]
      ]
    },
    {
      "columns": ["time", "avg(usage)"],
      "name": "cpu",
      "tags": {"host": "web-server-01"},  // Average usage: 77.7 - Rank 2
      "values": [
        [1721059200000, 78.5],
        [1721058600000, 82.3],
        [1721058000000, 75.8]
      ]
    }
    // web-server-02 (avg: 53.7) is filtered out due to SLIMIT 2
  ]
}
More Sorting Examples
// Sort by maximum CPU usage, find hosts with abnormal peaks
M::cpu:(max(usage_total)) [1h::10m] BY host SORDER BY max(usage_total) DESC SLIMIT 10

// Sort by minimum memory usage, find hosts with the lowest resource utilization
M::memory:(min(usage_percent)) [24h::1h] BY host SORDER BY min(usage_percent) ASC SLIMIT 5

// Sort by latest CPU usage, find the most loaded hosts
M::cpu:(usage) [1h::10m] BY host SORDER BY last(usage) DESC SLIMIT 10

// Sort by latest error rate, find services with the most issues
L::logs:(count(*) as error_count) {level = 'error'} [1h] BY service SORDER BY error_count DESC SLIMIT 5

Combined Use of Dual Sorting and Pagination

In practical applications, intra-group sorting and inter-group sorting are often used together to achieve complex data display requirements. This combination can simultaneously control the order of groups and the order of data within groups.

Execution Order

Dual sorting execution order:

  1. Inter-Group Sorting: First sort and paginate all groups
  2. Intra-Group Sorting: Sort and paginate within the selected groups
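The two-stage order above can be sketched in Python: inter-group selection runs first, then intra-group sorting runs only inside the surviving groups. Hosts and values here are hypothetical:

```python
series = [
    {"tags": {"host": "a"}, "values": [[1, 10.0], [2, 90.0], [3, 50.0]]},
    {"tags": {"host": "b"}, "values": [[1, 20.0], [2, 30.0], [3, 40.0]]},
    {"tags": {"host": "c"}, "values": [[1, 80.0], [2, 70.0], [3, 60.0]]},
]

avg = lambda g: sum(v for _, v in g["values"]) / len(g["values"])

# Stage 1: SORDER BY avg(...) DESC SLIMIT 2 -- keep the two hottest groups
selected = sorted(series, key=avg, reverse=True)[:2]

# Stage 2: ORDER BY time DESC LIMIT 2 -- newest two points within each kept group
for g in selected:
    g["values"] = sorted(g["values"], key=lambda row: row[0], reverse=True)[:2]

print([(g["tags"]["host"], g["values"]) for g in selected])
# [('c', [[3, 60.0], [2, 70.0]]), ('a', [[3, 50.0], [2, 90.0]])]
```

Running the stages in this order matters: reversing them would trim each group's rows before the group averages are computed, changing which groups survive SLIMIT.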

Application Examples

Dashboard Scenario
// Find the 10 servers with the highest CPU usage, show the latest 5 data points for each
M::cpu:(avg(usage)) [1h::10m] BY host
SORDER BY avg(usage) DESC SLIMIT 10    // Inter-group sorting: Find the 10 servers with the highest usage
ORDER BY time DESC LIMIT 5             // Intra-group sorting: Show the latest 5 points for each

Execution Result:

{
  "series": [
    {
      "columns": ["time", "avg(usage)"],
      "name": "cpu",
      "tags": {"host": "web-server-03"},  // Average usage: 90.9 - Rank 1
      "values": [
        [1721059200000, 92.1],  // 12:00:00 - Latest
        [1721058600000, 95.7],  // 11:50:00
        [1721058000000, 89.4],  // 11:40:00
        [1721057400000, 97.6],  // 11:30:00
        [1721056800000, 87.3]   // 11:20:00
        // Only return the latest 5 data points (LIMIT 5)
      ]
    },
    {
      "columns": ["time", "avg(usage)"],
      "name": "cpu",
      "tags": {"host": "web-server-01"},  // Average usage: 77.7 - Rank 2
      "values": [
        [1721059200000, 78.5],  // 12:00:00 - Latest
        [1721058600000, 82.3],  // 11:50:00
        [1721058000000, 75.8],  // 11:40:00
        [1721057400000, 88.2],  // 11:30:00
        [1721056800000, 72.1]   // 11:20:00
        // Only return the latest 5 data points (LIMIT 5)
      ]
    }
    // Other hosts are filtered out due to SLIMIT 10, only returning the 10 servers with the highest usage
  ]
}

By mastering DQL's sorting and pagination features, you can build powerful monitoring dashboards, performance analysis tools, and business insight systems.

Note

Proper use of intra-group and inter-group sorting combinations can greatly improve the efficiency and effectiveness of data analysis.

Conclusion

DQL (Debug Query Language) is a robust and versatile query language designed to handle complex time series, log, and event data analysis. By combining the semantic expressiveness of SQL with the syntactic structure of PromQL, DQL provides users with a flexible and powerful tool for querying and analyzing data.

Key Takeaways

  1. Basic Query Structure: Understanding the fundamental structure of DQL queries, including namespaces, indexes, data sources, and clauses, is essential for writing effective queries.
  2. Execution Order: The order in which DQL queries are executed—data filtering, time aggregation, group aggregation, group filtering, non-aggregation functions, intra-group sorting, and inter-group sorting—plays a crucial role in determining the query's semantics and performance.
  3. Namespaces and Indexes: Proper use of namespaces and indexes can significantly optimize query performance and manage data more efficiently.
  4. Data Sources: Specifying the correct data source, whether through direct naming, wildcards, regular expressions, or subqueries, is key to retrieving the desired data.
  5. Select Clause: The select clause is vital for specifying the fields or expressions to be returned, including field selection, JSON field extraction, calculated fields, and aliases.
  6. Time Clause: The time clause is central to DQL, allowing users to specify time ranges, aggregation windows, and rollup functions for time series data.
  7. Filter Conditions: Using where clauses to filter data rows based on comparison operators, pattern matching, set operations, and logical operators ensures that only relevant data is processed.
  8. Grouping: Grouping data by specified dimensions using the group-by clause is fundamental for aggregation and analysis.
  9. Having Clause: The having clause filters aggregated results, providing a way to refine the output based on aggregation function results.
  10. Sorting and Pagination: DQL's dual-sorting mechanism—intra-group and inter-group sorting—along with pagination, allows for efficient data navigation and analysis.

Best Practices

  • Optimize Index Usage: Always specify indexes explicitly to reduce data scan ranges and improve query performance.
  • Use Aliases Wisely: Use aliases to make query results more readable and to distinguish between multiple aggregation functions on the same field.
  • Leverage Subqueries: Utilize subqueries for complex analysis and dynamic filtering, but be mindful of their impact on performance.
  • Understand Rollup Functions: Properly use rollup functions to preprocess time series data, especially for counter metrics.
  • Combine Sorting and Pagination: Use intra-group and inter-group sorting along with pagination to efficiently manage and analyze large datasets.

Advanced Techniques

  • Dynamic Filtering with Subqueries: Use where subqueries to implement dynamic filtering based on the results of another query.
  • Multi-Level Aggregation: Combine multiple levels of aggregation and sorting to perform detailed and nuanced data analysis.
  • Time-Based Analysis: Utilize time clauses and rollup functions to perform time-based analysis, such as rate calculations and trend analysis.

Final Thoughts

DQL is a powerful tool for anyone working with time series, log, and event data. By mastering its syntax, understanding its execution order, and applying best practices, users can unlock the full potential of DQL for their data analysis needs. Whether you're building monitoring dashboards, performing performance analysis, or gaining business insights, DQL provides the flexibility and power required to handle complex data queries efficiently.

For further exploration, refer to the official DQL documentation and experiment with different queries to gain hands-on experience. Remember, the key to mastering DQL lies in practice and continuous learning.


By following this guide, you should now have a solid foundation in DQL and be well-equipped to write and optimize your own queries. Happy querying!
