DQL


DQL (Debug Query Language) is the core query language of the Guance platform, designed for efficient querying and analysis of time series data, log data, event data, etc. DQL combines the semantic expression of SQL with the syntax structure of PromQL, aiming to provide users with a flexible and powerful query tool.

This document will help you quickly understand the basic syntax and design concepts of DQL, and demonstrate how to write DQL queries through examples.

Basic Query Structure

The basic query structure of DQL is as follows:

namespace[index]::datasource[:select-clause] [{where-clause}] [time-expr] [group-by-clause] [having-clause] [order-by-clause] [limit-clause] [sorder-by-clause] [slimit-clause] [soffset-clause]

Execution Order

The execution order of DQL queries is crucial, as it determines the semantics and performance of the query.

  1. Data Filtering: Filter data based on namespace::datasource, where-clause, time-expr

    • Identify the data source
    • Apply WHERE conditions to filter raw data rows
    • Apply time range filtering
    • Filter data early in this stage to improve subsequent processing efficiency
  2. Time Aggregation: If time-expr contains rollup, execute the rollup logic first

    • Rollup functions preprocess data on the time dimension
    • For Counter type metrics, rates or increments are usually calculated instead of using raw values
    • For Gauge type metrics, aggregation functions like last, avg, etc., may be used
  3. Group Aggregation: Execute group-by-clause grouping and aggregation functions within the group

    • Group data based on BY clause expressions
    • Calculate aggregation functions (sum, count, avg, max, min, etc.) within each group
    • If there is a time window, a two-dimensional data structure will be formed
  4. Group Filtering: Execute having-clause to filter aggregated groups

    • HAVING clause acts on aggregated results
    • Results of aggregation functions can be used for filtering
    • This is a key difference from the WHERE clause
  5. Non-Aggregation Functions: Execute non-aggregation functions in select-clause

    • Process expressions and functions that do not require aggregation
    • Perform further calculations on aggregated results
  6. Intra-Group Sorting: Execute order-by-clause, limit-clause to sort and paginate data within groups

    • ORDER BY is executed independently within each group
    • LIMIT limits the number of data rows returned per group
  7. Inter-Group Sorting: Execute sorder-by-clause, slimit-clause, soffset-clause to sort and paginate groups

    • SORDER BY sorts the groups themselves
    • Requires dimensionality reduction (e.g., using max, avg, last, etc., functions)
    • SLIMIT limits the number of groups returned
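
As an illustration of this pipeline, the main stages (data filtering, group aggregation, group filtering, inter-group sorting) can be modeled in a few lines of ordinary code. This is a sketch with invented sample rows and thresholds, not how the engine is implemented:

```python
from collections import defaultdict
from statistics import mean

# Invented sample rows: (host, usage) measurements.
rows = [
    {"host": "web-01", "usage": 85}, {"host": "web-01", "usage": 90},
    {"host": "web-02", "usage": 55}, {"host": "web-02", "usage": 65},
    {"host": "db-01",  "usage": 95},
]

# Stage 1. Data filtering (WHERE): keep web hosts with usage > 50.
filtered = [r for r in rows if r["host"].startswith("web-") and r["usage"] > 50]

# Stage 3. Group aggregation (BY host, avg within each group).
groups = defaultdict(list)
for r in filtered:
    groups[r["host"]].append(r["usage"])
aggregated = {host: mean(vals) for host, vals in groups.items()}

# Stage 4. Group filtering (HAVING avg_usage > 60).
kept = {h: v for h, v in aggregated.items() if v > 60}

# Stage 7. Inter-group sorting (SORDER BY avg_usage DESC) + SLIMIT 10.
result = sorted(kept.items(), key=lambda kv: kv[1], reverse=True)[:10]
print(result)
```

The time-aggregation and intra-group stages are omitted here to keep the model minimal.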

Complete Example

Let's understand the structure of DQL through a complete example:

M("production")::cpu:(avg(usage) as avg_usage, max(usage) as max_usage) {host =~ 'web-.*', usage > 50} [1h::5m] BY host, env HAVING avg_usage > 60 ORDER BY time DESC LIMIT 100 SORDER BY avg_usage DESC SLIMIT 10

The meaning of this query is:

  • Namespace: M (metric data)
  • Index: production
  • Data Source: cpu
  • Selected Fields: Average and maximum values of the usage field
  • Time Range: Last 1 hour, aggregated by 5 minutes
  • Filter Conditions: Hostnames starting with web and CPU usage greater than 50%
  • Grouping: Grouped by hostname and environment
  • Group Filtering: Average CPU usage greater than 60%
  • Sorting: Sorted by time in descending order, up to 100 rows per group
  • Inter-Group Sorting: Sorted by average usage in descending order, up to 10 groups

Namespace (namespace)

Namespaces are used to distinguish different types of data, with each data type having its specific query methods and storage strategies. DQL supports querying multiple business data types:

| Namespace | Description | Typical Uses |
| --- | --- | --- |
| M | Metric, time series metrics | CPU usage, memory usage, request count, etc. |
| L | Logging, log data | Application logs, system logs, error logs, etc. |
| O | Object, infrastructure object data | Server information, container information, network devices, etc. |
| OH | History object, object history data | Server configuration change history, performance metric history, etc. |
| CO | Custom object, custom object data | Business-defined object information |
| COH | History custom object, custom object history data | Historical change records of custom objects |
| N | Network, network data | Network traffic, DNS queries, HTTP requests, etc. |
| T | Trace, trace data | Distributed tracing, call chain analysis, etc. |
| P | Profile, profiling data | Performance profiling, CPU flame graphs, etc. |
| R | RUM, user access data | Frontend performance, user behavior analysis, etc. |
| E | Event, event data | Alert events, deployment events, system events, etc. |
| UE | Unrecovered Event, unrecovered event data | Unresolved alerts and events |

Index (index)

Indexes are an important mechanism for DQL query optimization, similar to tables or partitions in traditional databases. Within a single namespace, the system may split data based on data sources, data volume, access patterns, etc., to improve query performance and management efficiency.

Role of Indexes

  • Performance Optimization: Reduce the amount of data scanned per query by distributing data storage
  • Data Isolation: Data from different businesses, environments, or time ranges can be stored in different indexes
  • Permission Management: Different access permissions can be set for different indexes
  • Lifecycle Management: Different retention policies can be set for different indexes

Index Naming Rules

  • Index names must be explicitly declared; wildcards or regular expressions are not supported for matching
  • Index names typically reflect the business attributes of the data, e.g., production, staging, web-logs, api-logs
  • The default index name is default, used when no index is explicitly specified

Basic Syntax

// Use the default index (automatically uses default when not specified)
M::cpu           // Equivalent to M("default")::cpu
L::nginx         // Equivalent to L("default")::nginx

// Specify a single index
M("production")::cpu     // Query CPU metrics in the production index
L("web-logs")::nginx     // Query Nginx logs in the web-logs index

// Multi-index query (query data from multiple indexes simultaneously)
M("production", "staging")::cpu     // Query CPU metrics in production and staging environments
L("web-logs", "api-logs")::nginx     // Query Web and API logs

Index and Performance

Proper use of indexes can significantly improve query performance:

  • Precise Index: Directly specify the index when the data location is known
  • Multi-index Query: Use multi-index syntax when querying across multiple indexes instead of using wildcards
  • Avoid Full Index Scans: Reduce data scan range through index and WHERE condition combinations

For historical reasons, DQL also supports specifying indexes in the where clause, but this is not recommended:

// Old syntax, not recommended
L::nginx { index = "web-logs" }
L::nginx { index IN ["web-logs", "api-logs"] }

Application Examples

// Query CPU usage in the production environment
M("production")::cpu:(avg(usage)) [1h] BY host

// Compare production and staging environments
M("production", "staging")::cpu:(avg(usage)) [1h] BY index, host

// Analyze Web server logs
L("web-logs")::nginx:(count(*)) {status >= 400} [1h] BY status

// Compare error rates between Web and API servers
L("web-logs", "api-logs")::*:(count(*)) {status >= 500} [1h] BY index

Data Source (datasource)

The datasource specifies the specific data source for the query, which can be a dataset name, wildcard pattern, regular expression, or subquery.

Basic Data Sources

The definition of data sources varies across namespaces:

| Namespace | Data Source Type | Examples |
| --- | --- | --- |
| M | Measurement | cpu, memory, network |
| L | Source | nginx, tomcat, java-app |
| O | Infrastructure object class | host, container, process |
| T | Service name | user-service, order-service |
| R | RUM data type | session, view, resource, error |

Data Source Syntax

Specifying Data Source Name

M::cpu:(usage)                    // Query CPU metrics
L::nginx:(count(*))                // Query Nginx logs
T::user-service:(traces)           // Query user service traces

Wildcard Matching

M::*:(usage)                      // Query all metrics

Regular Expression Matching

M::re('cpu.*'):(usage)            // Query metrics starting with cpu
L::re('web.*'):(count(*))          // Query logs starting with web
T::re('.*-service'):(traces)      // Query services ending with -service

Subquery Data Source

Subqueries are an important feature in DQL for implementing complex analysis, allowing the results of one query to be used as the data source for another. This nested query mechanism supports multi-level analysis requirements.

Execution Mechanism

Subqueries follow these principles:

  1. Serial Execution: Inner subqueries execute first, and their results are used as the data source for outer queries
  2. Result Encapsulation: Subquery results are encapsulated into temporary table structures for outer queries
  3. Namespace Mixing: Subqueries support mixing different namespaces, enabling cross-data type analysis
  4. Performance Consideration: Subqueries increase computational complexity, so nested queries should be designed with care

Basic Syntax

namespace::(subquery):(projections)

Execution Process

Take a typical subquery as an example:

L::(L::*:(count(*)) {level = 'error'} BY app_id):(count_distinct(app_id))

Execution process:

  1. Inner Subquery: L::*:(count(*)) {level = 'error'} BY app_id

    • Scan all log data
    • Filter out logs with error level
    • Group by app_id and count errors
    • Generate temporary table: app_id | count(*)
  2. Outer Query: L::(...):(count_distinct(app_id))

    • Use subquery results as the data source
    • Count the number of distinct app_ids
    • Final result: Number of applications with errors
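
The same two-step flow can be sketched in ordinary code (with invented log rows) to show how the inner result becomes the outer query's data source:

```python
# Invented log rows; 'level' and 'app_id' mirror the fields in the DQL above.
logs = [
    {"app_id": "shop", "level": "error"},
    {"app_id": "shop", "level": "error"},
    {"app_id": "pay",  "level": "error"},
    {"app_id": "pay",  "level": "info"},
    {"app_id": "cms",  "level": "info"},
]

# Inner subquery: count(*) {level = 'error'} BY app_id.
inner = {}
for row in logs:
    if row["level"] == "error":
        inner[row["app_id"]] = inner.get(row["app_id"], 0) + 1
# 'inner' plays the role of the temporary table: app_id | count(*)

# Outer query: count_distinct(app_id) over the temporary table.
outer = len(set(inner.keys()))
print(outer)  # number of applications that logged errors
```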

Application Examples

// Count the number of applications with errors
L::(L::*:(count(*)) {level = 'error'} BY app_id):(count_distinct(app_id))

// Analyze servers with high CPU usage
M::(M::cpu:(avg(usage)) [1h] BY host {avg(usage) > 80}):(count(host))

// First find service endpoints with error rates exceeding 1%, then count the number of affected services
M::(M::http_requests:(sum(request_count), sum(error_count)) [1h] BY service, endpoint
   {sum(error_count) / sum(request_count) > 0.01}
):(count(service))

Select Clause (select-clause)

The select clause specifies the fields or expressions to be returned by the query, and is one of the most basic and important parts of a DQL query.

Field Selection

Basic Syntax

// Select a single field
M::cpu:(usage)

// Select multiple fields
M::cpu:(usage, system, user)

// Select all fields
M::cpu:(*)

Field Name Rules

Field names can be written in the following forms:

  1. Direct Name: Suitable for regular identifiers

    • message
    • host_name
    • response_time
  2. Backtick Wrapped: Suitable for field names containing special characters or keywords

    • `message`
    • `limit`
    • `host-name`
    • `column with spaces`
  3. Syntax to Avoid: Single and double quotes wrap string literals, not field names

    • 'message'
    • "message"

JSON Field Extraction

When data fields contain JSON content, a subset of JSON Path syntax can be used to extract internal field data.

Basic Syntax

field-name@json-path
@json-path                    // Defaults to the message field

JSON Path Syntax

  • Dot notation for object properties: .field_name
  • Square brackets for object properties: ["key"] (suitable for keys containing spaces or special characters)
  • Array index for values: [index]
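
As a rough sketch of this path subset (illustrative only; the tokenizer below is not the platform's parser):

```python
import re

def extract(doc, path):
    """Walk a parsed JSON value using the subset above:
    dotted names, ["key"] for special keys, and [index] for arrays."""
    # Tokenize: name | .name | ["key"] | [0]
    tokens = re.findall(r'\.?([A-Za-z_][\w-]*)|\["([^"]+)"\]|\[(\d+)\]', path)
    cur = doc
    for name, key, idx in tokens:
        if name:
            cur = cur[name]          # object property via dot notation
        elif key:
            cur = cur[key]           # object property via ["key"]
        else:
            cur = cur[int(idx)]      # array element via [index]
    return cur

doc = {"request": {"headers": {"user-agent": "Mozilla/5.0"},
                   "body": {"permissions": ["read", "write", "admin"]}}}
print(extract(doc, 'request.headers["user-agent"]'))  # Mozilla/5.0
print(extract(doc, 'request.body.permissions[0]'))    # read
```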

Application Examples

Assume the following JSON log data:

{
  "message": "User login attempt",
  "request": {
    "method": "POST",
    "path": "/api/login",
    "headers": {
      "user-agent": "Mozilla/5.0",
      "content-type": "application/json"
    },
    "body": {
      "username": "john.doe",
      "password": "***",
      "permissions": ["read", "write", "admin"]
    }
  },
  "response": {
    "status": 200,
    "time": 156,
    "data": [
      {"id": 1, "name": "user1"},
      {"id": 2, "name": "user2"}
    ]
  }
}

// Extract request method
L::auth_logs:(message@request.method)

// Extract request path
L::auth_logs:(message@request.path)

// Extract username
L::auth_logs:(message@request.body.username)

// Extract response status
L::auth_logs:(message@response.status)

// Extract User-Agent (contains hyphen, requires square brackets)
L::auth_logs:(message@request.headers["user-agent"])

// Extract first element of permissions array
L::auth_logs:(message@request.body.permissions[0])

// Extract name of first object in response data
L::auth_logs:(message@response.data[0].name)

// Count requests by method
L::auth_logs:(count(*)) [1h] BY message@request.method

// Analyze response time distribution
L::auth_logs:(avg(message@response.time), max(message@response.time)) [1h] BY message@request.method

// Extract multiple fields
L::auth_logs:(
    message,
    message@request.method as method,
    message@request.path as path,
    message@response.status as status,
    message@response.time as response_time
) {message@response.status >= 400} [1h]

Calculated Fields

Expression Calculation

Supports basic arithmetic operations:

// Unit conversion (milliseconds to seconds)
L::nginx:(response_time / 1000) as response_time_seconds

// Calculate percentage
M::memory:(used / total * 100) as usage_percentage

// Compound calculation
M::network:((bytes_in + bytes_out) / 1024 / 1024) as total_traffic_mb

Function Calculation

Supports various aggregation and transformation functions:

// Aggregation functions
M::cpu:(max(usage), min(usage), avg(usage)) [1h] BY host

// Transformation functions
L::logs:(int(response_time) as response_time_seconds)
L::logs:(floor(response_time) as response_time_seconds)

Aliases

Assign aliases to fields or expressions to make results more readable and convenient for subsequent references.

Basic Syntax

expression as alias_name

Application Examples

// Simple alias
M::cpu:(avg(usage) as avg_usage, max(usage) as max_usage) [1h] BY host

// Expression alias
M::memory:((used / total) * 100 as usage_percent) [1h] BY host

// Function alias
L::logs:(count(*) as error_count) {level = 'error'} [1h] BY service

// JSON extraction alias
L::api_logs:(
    message@request.method as http_method,
    message@response.status as http_status,
    message@response.time as response_time_ms
) [1h]

Usage Tips

In DQL, the results of aggregation functions can be directly referenced using the original field name, reducing the need for aliases:

M::cpu:(max(usage)) [1h] BY host

// The result will contain the max(usage) column, which can be directly referenced in subsequent queries:
M::(M::cpu:(max(usage)) [1h] BY host):(max(usage)) { usage > 80 }

However, when multiple aggregation functions use the same field, aliases must be used to correctly distinguish them in subsequent queries:

// Scenario where aliases are required
M::cpu:(max(usage) as max_usage, min(usage) as min_usage) [1h] BY host

Time Clause (time-clause)

The time clause is one of the core features of DQL, used to specify the query time range, aggregation time window, and Rollup aggregation function.

Basic Syntax

[start_time:end_time:interval:rollup]

Time Range

Absolute Timestamp

[1672502400000:1672588800000]     // Millisecond timestamp
[1672502400:1672588800]           // Second timestamp

Relative Time

Supports multiple duration units, which can be mixed:

[1h]                             // Last 1 hour to now
[1h:5m]                          // From last 1 hour to last 5 minutes
[1h30m]                          // Last 1 hour 30 minutes
[2h15m30s]                       // Last 2 hours 15 minutes 30 seconds

Duration Expression Explanation

| Unit | Description | Example |
| --- | --- | --- |
| s | Seconds | 30s |
| m | Minutes | 5m |
| h | Hours | 2h |
| d | Days | 7d |
| w | Weeks | 4w |
| y | Years | 1y |

When used in the time clause, duration expressions represent offsets from the current time. When used in the select clause or where clause, they are treated as integer milliseconds in calculations.

When used in aggregation queries, two additional duration units are supported:

| Unit | Description | Example |
| --- | --- | --- |
| i, is | Multiple of the aggregation time window, returned as float seconds | 1i, 1is |
| ims | Multiple of the aggregation time window, returned as integer milliseconds | 1ims |

O::HOST:(count(*)){ `last_update_time` > (now()-10m) } // 10m is treated as the integer 600,000 in calculations
L::*:( count(*) / 1i ) [::1m] // Divide by the time window size (1m) in seconds to compute log write QPS
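
The base units can be sketched as a small parser. This is an illustration only: the year unit is assumed here to be 365 days (the platform may define it differently), and the window-relative i/is/ims units are omitted because they depend on query context:

```python
import re

# Milliseconds per base unit, per the table above.
# 'y' as 365 days is an assumption for this sketch.
UNIT_MS = {"s": 1_000, "m": 60_000, "h": 3_600_000,
           "d": 86_400_000, "w": 604_800_000, "y": 31_536_000_000}

def duration_ms(expr):
    """Parse a mixed duration like '1h30m' or '2h15m30s' into integer ms."""
    parts = re.findall(r"(\d+)([smhdwy])", expr)
    if not parts or "".join(n + u for n, u in parts) != expr:
        raise ValueError(f"bad duration: {expr!r}")
    return sum(int(n) * UNIT_MS[u] for n, u in parts)

print(duration_ms("10m"))     # 600000, the value used in WHERE arithmetic
print(duration_ms("1h30m"))   # 5400000
```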

Preset Time Ranges

Provides commonly used time range keywords:

| Keyword | Description | Time Range |
| --- | --- | --- |
| TODAY | Today | From today 00:00 to now |
| YESTERDAY | Yesterday | From yesterday 00:00 to today 00:00 |
| THIS WEEK | This week | From this Monday 00:00 to now |
| LAST WEEK | Last week | From last Monday 00:00 to this Monday 00:00 |
| THIS MONTH | This month | From this month's 1st 00:00 to now |
| LAST MONTH | Last month | From last month's 1st 00:00 to this month's 1st 00:00 |

[TODAY]                          // Today's data
[YESTERDAY]                      // Yesterday's data
[THIS WEEK]                      // This week's data
[LAST WEEK]                      // Last week's data
[THIS MONTH]                     // This month's data
[LAST MONTH]                     // Last month's data

When using time range keywords, ensure the workspace timezone setting is correct: boundaries such as today's 00:00 are resolved in that timezone.

Time Window Aggregation

Time windows group data by the specified time interval for aggregation, with the time column in the results representing the start time of each time window.

Single Time Window

Aggregate the entire time range into a single value:

M::cpu:(max(usage_total)) [1h]

Query Result:

{
  "columns": ["time", "max(usage_total)"],
  "values": [
    [1721059200000, 37.46]
  ]
}

Time Window Aggregation

Group by time interval for aggregation:

M::cpu:(max(usage_total)) [1h::10m]

Query Result:

{
  "columns": ["time", "max(usage_total)"],
  "values": [
    [1721059200000, 37.46],
    [1721058600000, 34.12],
    [1721058000000, 33.81],
    [1721057400000, 30.92],
    [1721056800000, 34.53],
    [1721056200000, 36.11]
  ]
}
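
The windowing behavior can be illustrated with a short sketch (sample points are invented): each point is assigned to the window containing its timestamp, and the result's time column is the window start.

```python
from collections import defaultdict

WINDOW_MS = 10 * 60 * 1000  # the 10m step in [1h::10m]

# Invented (timestamp_ms, usage_total) samples.
points = [
    (1721059210000, 37.46), (1721059300000, 31.02),
    (1721058650000, 34.12), (1721058700000, 29.90),
]

buckets = defaultdict(list)
for ts, value in points:
    # Truncate each timestamp down to the start of its 10-minute window.
    buckets[ts - ts % WINDOW_MS].append(value)

# max() per window, newest window first, as in the result above.
values = sorted(((start, max(vs)) for start, vs in buckets.items()), reverse=True)
print(values)
```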

Rollup Functions

Rollup functions are an important preprocessing step in DQL, executed before group aggregation, and used to preprocess raw time series data.

Execution Sequence

Rollup's position in the query execution flow:

Raw data → WHERE filtering → **Rollup preprocessing** → Group aggregation → HAVING filtering → Final result

Execution Mechanism

Rollup execution is divided into two stages:

  1. Per Time Series Processing: Apply Rollup functions to each independent time series
  2. Aggregation Calculation: Execute group aggregation on the Rollup-processed results

Application Scenarios

A typical application scenario for Rollup functions is Counter metric processing.

For Prometheus Counter type metrics, directly using raw values for aggregation is meaningless because Counters are monotonically increasing. It is necessary to first calculate the rate of increase for each time series, and then perform aggregation.

Problem Example: Assume two servers' request counters:

{
  "host": "web-server-01",
  "data": [
    {"time": "2024-07-15 08:25:00", "request_count": 150},
    {"time": "2024-07-15 08:20:00", "request_count": 140},
    {"time": "2024-07-15 08:15:00", "request_count": 130},
    {"time": "2024-07-15 08:10:00", "request_count": 120},
    {"time": "2024-07-15 08:05:00", "request_count": 110},
    {"time": "2024-07-15 08:00:00", "request_count": 100}
  ]
}
{
  "host": "web-server-02",
  "data": [
    {"time": "2024-07-15 08:25:00", "request_count": 250},
    {"time": "2024-07-15 08:20:00", "request_count": 240},
    {"time": "2024-07-15 08:15:00", "request_count": 230},
    {"time": "2024-07-15 08:10:00", "request_count": 220},
    {"time": "2024-07-15 08:05:00", "request_count": 210},
    {"time": "2024-07-15 08:00:00", "request_count": 200}
  ]
}

Problem with Direct Aggregation:

  • web-server-01's request_count starts at 100
  • web-server-02's request_count starts at 200
  • Although both servers have the same request rate (10 requests per 5 minutes), the absolute values are different

Solution Using Rollup:

M::http:(sum(request_count)) [rate]

Execution Process:

  1. Rollup Stage (executed on each time series separately):

    • web-server-01: rate([100, 110, 120, 130, 140, 150]) = 2 requests/minute
    • web-server-02: rate([200, 210, 220, 230, 240, 250]) = 2 requests/minute
  2. Aggregation Stage:

    • sum([2, 2]) = 4 requests/minute
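
A minimal sketch of this two-stage computation, using the counter samples above. The simple delta-based rate here ignores counter resets, which real rate() implementations also handle:

```python
def rate_per_minute(samples):
    """Per-series rollup: overall increase divided by the minutes covered.
    (A sketch only; no counter-reset handling.)"""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / ((t1 - t0) / 60)

# (unix_seconds, request_count) for each series, oldest first.
web01 = [(0, 100), (300, 110), (600, 120), (900, 130), (1200, 140), (1500, 150)]
web02 = [(0, 200), (300, 210), (600, 220), (900, 230), (1200, 240), (1500, 250)]

# Stage 1: rollup applied to each time series independently.
per_series = [rate_per_minute(s) for s in (web01, web02)]  # [2.0, 2.0]

# Stage 2: aggregation over the rollup results.
total = sum(per_series)  # 4.0 requests/minute
print(per_series, total)
```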

Function Types

Common Rollup functions include:

| Function | Description | Application Scenarios |
| --- | --- | --- |
| rate() | Calculate rate | Counter type metrics |
| increase() | Calculate increment | Counter type metrics |
| last() | Take last value | Gauge type metrics |
| avg() | Calculate average | Data smoothing |
| max() | Take maximum | Peak analysis |
| min() | Take minimum | Trough analysis |

In practice, almost any aggregation function that returns a single value can be used as a Rollup function, so the table above is not exhaustive.

Default Rollup

If no Rollup function is explicitly specified, DQL performs no Rollup calculation by default, whereas PromQL defaults to last. If you are querying Prometheus metrics, be aware of this difference and specify the Rollup function explicitly.

Application Examples

// Calculate total request rate for all servers
M::http_requests:(sum(request_count)) [rate]

// Calculate error rate
M::http_requests:(
    sum(error_count) as errors,
    sum(request_count) as requests
) [rate] BY service

Flexible Time Window Syntax

DQL supports various shorthand formats for time windows, making query writing more convenient.

Shorthand Formats

[1h]                             // Only specify time range
[1h::5m]                         // Time range + aggregation step
[1h:5m]                          // Start time + end time
[1h:5m:1m]                       // Start + end + step
[1h:5m:1m:avg]                   // Full format
[::5m]                           // Only specify aggregation step
[:::sum]                         // Only specify rollup function
[sum]                            // Only specify rollup function (simplest form)

Filter Conditions (where-clause)

Filter conditions are used to filter data rows, retaining only data that meets the conditions for subsequent processing.

Basic Syntax

{condition1, condition2, condition3}

Multiple conditions can be connected with commas, AND, OR, &&, ||.

Comparison Operators

| Operator | Description | Example |
| --- | --- | --- |
| = | Equal | host = 'web-01' |
| != | Not equal | status != 200 |
| > | Greater than | cpu_usage > 80 |
| >= | Greater than or equal | memory_usage >= 90 |
| < | Less than | response_time < 1000 |
| <= | Less than or equal | disk_usage <= 80 |

Pattern Matching Operators

| Operator | Description | Example |
| --- | --- | --- |
| =~ | Regular expression match | message =~ 'error.*\\d+' |
| !~ | Regular expression mismatch | message !~ 'debug.*' |

Set Operators

| Operator | Description | Example |
| --- | --- | --- |
| IN | In set | status IN [200, 201, 202] |
| NOT IN | Not in set | level NOT IN ['debug', 'info'] |

Logical Operators

| Operator | Description | Example |
| --- | --- | --- |
| AND or && | Logical AND | cpu > 80 AND memory > 90 |
| OR or \|\| | Logical OR | status = 500 OR status = 502 |
| NOT | Logical NOT | NOT status = 200 |

Tip: Besides logical OR, the OR operator also provides null-coalescing semantics: when the left expression evaluates to NULL, it returns the right expression instead. This can be used for fallback logic, for example status OR backup_status.

Application Examples

Basic Filtering

// Single condition
M::cpu:(usage) {host = 'web-01'} [1h]

// Multiple AND conditions
M::cpu:(usage) {host = 'web-01', usage > 80} [1h]

// Using AND keyword
M::cpu:(usage) {host = 'web-01' AND usage > 80} [1h]

// Mixed logical operators
M::cpu:(usage) {(host = 'web-01' OR host = 'web-02') AND usage > 80} [1h]

Regular Expression Matching

// Match error logs
L::logs:(message) {message =~ 'ERROR.*\\d{4}'} [1h]

// Match specific log formats
L::logs:(message) {message =~ '\\[(ERROR|WARN)\\].*'} [1h]

// Exclude debug information
L::logs:(message) {message !~ 'DEBUG.*'} [1h]

// Hostname pattern matching
M::cpu:(usage) {host =~ 'web-.*\\.prod\\.com'} [1h]

Set Operations

// Status code filtering
L::nginx:(count(*)) {status IN [200, 201, 202, 204]} [1h]

// Exclude specific status codes
L::nginx:(count(*)) {status NOT IN [404, 500, 502]} [1h]

// Log level filtering
L::app_logs:(count(*)) {level IN ['ERROR', 'WARN', 'CRITICAL']} [1h]

Array Field Handling

When fields are of array type, DQL supports multiple array matching operations.

Assume a field tags = ['web', 'prod', 'api']:

// Check if array contains a value
{tags IN ['web']}           // true, because tags contains 'web'
{tags IN ['mobile']}        // false, because tags does not contain 'mobile'

// Check if array does not contain a value
{tags NOT IN ['mobile']}    // true, because tags does not contain 'mobile'
{tags NOT IN ['web']}       // false, because tags contains 'web'

// Check if array contains all specified values
{tags IN ['web', 'api']}           // true, contains 'web' and 'api'
{tags IN ['web', 'api', 'mobile']} // false, does not contain 'mobile'
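
These containment rules can be modeled directly. This is a sketch; the NOT IN behavior for multiple values is an assumption extrapolated from the single-value cases above:

```python
def array_in(tags, values):
    """IN on an array field, as described above: true only when the
    array contains every listed value."""
    return all(v in tags for v in values)

def array_not_in(tags, values):
    """NOT IN on an array field: true when none of the listed values
    is present. (Assumed, by extension of the single-value cases.)"""
    return not any(v in tags for v in values)

tags = ["web", "prod", "api"]
print(array_in(tags, ["web"]))                   # True
print(array_in(tags, ["web", "api", "mobile"]))  # False
print(array_not_in(tags, ["mobile"]))            # True
```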

The following syntax is retained for historical compatibility and is not recommended for new queries. These operators have different semantics on array fields compared to regular fields, which can cause confusion.

// Historical syntax: Single value containment check (overloaded semantics of equality operator)
{tags = 'web'}           // true, because tags contains 'web'
{tags = 'mobile'}        // false, because tags does not contain 'mobile'

// Historical syntax: Single value non-containment check (overloaded semantics of inequality operator)
{tags != 'mobile'}       // true, because tags does not contain 'mobile'
{tags != 'web'}          // false, because tags contains 'web'

Function Filtering

Any function that returns a boolean value can be used as a filter condition.

// String matching
L::logs:(message) { match(message, 'error') }
L::logs:(message) { wildcard(message, 'error*') }

// Query requests with abnormal response times
L::access_logs:(*) {
    response_time > 1000 AND
    match(message, 'timeout')
}

// Query hosts with abnormal memory usage
M::memory:(usage) {
    (usage > 90 OR usage < 10) AND
    host =~ 'prod-.*' AND
    tags IN ['critical', 'important']
}

WHERE Subquery

WHERE subqueries are a powerful feature in DQL for implementing dynamic filtering, allowing the results of one query to be used as the filter condition for another. This mechanism supports dynamic filtering based on data analysis results.

Query Characteristics

  • Dynamic Filtering: Filter conditions are not fixed values but are dynamically calculated through queries
  • Namespace Mixing: Supports cross-namespace queries, enabling correlation analysis across different data types
  • Serial Execution: Subqueries execute first, and their results are used for the main query's filtering
  • Array Results: Subquery results are encapsulated as arrays, so only IN and NOT IN operators are supported

Execution Process

Take a typical WHERE subquery as an example:

M::cpu:(avg(usage)) [1h] BY host
{host IN (O::HOST:(hostname) {provider = 'cloud-a'})}

Execution process:

  1. Subquery Execution: O::HOST:(hostname) {provider = 'cloud-a'}

    • Query all infrastructure objects
    • Filter out hosts with provider 'cloud-a'
    • Return hostname list: ['host-01', 'host-02', 'host-03']
  2. Main Query Execution: M::cpu:(avg(usage)) [1h] BY host {host IN [...]}

    • Query CPU usage data
    • Only count hosts returned by the subquery
    • Calculate average usage by host
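
A sketch of the same flow in ordinary code, with invented host and CPU rows, showing why the subquery result behaves as an array usable only with IN / NOT IN:

```python
# Invented object rows (the subquery's namespace) and metric rows.
hosts = [
    {"hostname": "host-01", "provider": "cloud-a"},
    {"hostname": "host-02", "provider": "cloud-a"},
    {"hostname": "host-03", "provider": "cloud-b"},
]
cpu = [
    {"host": "host-01", "usage": 40.0},
    {"host": "host-02", "usage": 60.0},
    {"host": "host-03", "usage": 90.0},
]

# Step 1. Subquery: collect hostnames where provider = 'cloud-a'.
# The result is a plain array of values.
allowed = [h["hostname"] for h in hosts if h["provider"] == "cloud-a"]

# Step 2. Main query: the array result is applied as an IN filter.
matched = [r for r in cpu if r["host"] in allowed]
print(allowed, [r["host"] for r in matched])
```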

Application Examples

// Monitor servers from a specific cloud provider
M::cpu:(avg(usage)) [1h] BY host
{host IN (O::HOST:(hostname) {provider = 'cloud-a'})}

// Compare performance across different cloud providers
M::memory:(avg(used / total * 100)) [1h] BY host
{host IN (O::HOST:(hostname) {provider IN ['cloud-a', 'cloud-b']})}

// Analyze logs for a specific business
L::app_logs:(count(*)) [1h] BY level
{service IN (T::services:(service_name) {business_unit = 'ecommerce'})}

// Monitor application performance for critical businesses
M::response_time:(avg(response_time)) [1h] BY service
{service IN (T::services:(service_name) {criticality = 'high'})}

Grouping (group-by-clause)

Grouping is a core feature of data analysis, used to group data by specified dimensions for aggregation.

Basic Syntax

BY expression1, expression2, ...

Grouping Types

Field Grouping

// Single field grouping
M::cpu:(avg(usage)) [1h] BY host

// Multiple field grouping
M::cpu:(avg(usage)) [1h] BY host, env

// Nested grouping
M::cpu:(avg(usage)) [1h] BY datacenter, rack, host

Expression Grouping

// Compound mathematical expression
M::memory:(avg(used)) [1h] BY ((used / total) * 100) as usage_percent

// Multiple field mathematical operations
M::performance:(avg(response_time)) [1h] BY (response_time / 1000) as response_seconds

Function Grouping

// Drain clustering algorithm
L::logs:(count(*)) BY drain(message, 0.7) as sample

// Regular expression extraction grouping
L::logs:(count(*)) [1h] BY regexp_extract(message, 'error_code: (\\d+)', 1)

Grouping Result Processing

When a query includes both grouping and time windows, the result is a two-dimensional array: the first dimension is the set of groups, distinguished by the grouping key, and the second dimension is the per-time-window data within each group.

Two-dimensional data structure example:

Query: M::cpu:(max(usage_total)) [1h::10m] BY host

Query Result Structure:

{
  "series": [
    {
      "columns": ["time", "max(usage_total)"],
      "name": "cpu",
      "tags": {"host": "web-server-01"},
      "values": [
        [1721059200000, 78.5],
        [1721058600000, 82.3],
        [1721058000000, 75.8],
        [1721057400000, 88.2]
      ]
    },
    {
      "columns": ["time", "max(usage_total)"],
      "name": "cpu",
      "tags": {"host": "web-server-02"},
      "values": [
        [1721059200000, 45.2],
        [1721058600000, 52.8],
        [1721058000000, 48.5],
        [1721057400000, 61.3]
      ]
    },
    {
      "columns": ["time", "max(usage_total)"],
      "name": "cpu",
      "tags": {"host": "web-server-03"},
      "values": [
        [1721059200000, 92.1],
        [1721058600000, 95.7],
        [1721058000000, 89.4],
        [1721057400000, 97.6]
      ]
    }
  ]
}

Further processing of this two-dimensional array:

  1. To filter the results of this two-dimensional array, use the HAVING clause
  2. To sort or paginate data within a single group of the two-dimensional array, use ORDER BY, LIMIT, OFFSET statements
  3. To sort or paginate the groups of the two-dimensional array, use SORDER BY, SLIMIT, SOFFSET statements

For a detailed explanation of the two-dimensional data structure and sorting/pagination features, refer to Sorting and Pagination.

Having Clause (having-clause)

The Having clause is used to filter the results after group aggregation, similar to the WHERE clause, but acting on aggregated data.

Basic Syntax

HAVING condition

Difference from WHERE

WHERE and HAVING are both clauses used to filter data, but they operate at different stages of query execution, and the filter conditions they handle are also different.

Difference in Execution Sequence

Raw data → WHERE filtering → Group aggregation → HAVING filtering → Final result
  1. WHERE Clause:

    • Executes before group aggregation
    • Acts on raw data rows
    • Filters data rows that do not meet the conditions, reducing the amount of data for subsequent processing
  2. HAVING Clause:

    • Executes after group aggregation
    • Acts on aggregated results
    • Filters based on the results of aggregation functions

Application Scenarios

The HAVING clause is suitable for filtering aggregated results:

// Filter based on aggregation function values
M::cpu:(avg(usage) as avg_usage) [1h] BY host HAVING avg_usage > 80

// Filter based on multiple aggregation conditions
M::http_requests:(
    sum(request_count) as total,
    sum(error_count) as errors
) [1h] BY service, endpoint
HAVING errors / total > 0.01 AND total > 1000

// Filter based on group statistics
L::logs:(count(*) as count) [1h] BY service HAVING count > 100

// Filter based on compound aggregation conditions
M::response_time:(
    avg(response_time) as avg_time,
    max(response_time) as max_time,
    min(response_time) as min_time
) [1h] BY endpoint
HAVING avg_time > 1000 AND max_time > 5000 AND (max_time - min_time) > 2000

Sorting and Pagination

Sorting and pagination in DQL are distinctive features built around a dual-sorting mechanism tailored to time series data: intra-group sorting and inter-group sorting. This design lets DQL efficiently handle complex, multi-dimensional time series analysis.

Understanding DQL's Data Structure

Before diving into sorting and pagination, first understand the two-dimensional data structure of DQL query results. When a query includes both grouping (BY) and time windows, a two-dimensional array is produced:

  • First Dimension (Grouping Dimension): Multiple groups distinguished by the grouping key
  • Second Dimension (Time Dimension): Data aggregated by time window within each group

For a detailed explanation of the two-dimensional data structure and JSON format examples, refer to Grouping Result Processing. This two-dimensional structure is the foundation of DQL's sorting and pagination features, and understanding this structure is crucial for mastering DQL's sorting mechanism.

Intra-Group Sorting and Pagination (ORDER BY, LIMIT, OFFSET)

Intra-group sorting and pagination act on the second dimension of the two-dimensional structure, i.e., the data within each group. This sorting is executed independently within each group and does not affect data in other groups.

Basic Syntax

ORDER BY expression [ASC|DESC]
LIMIT row_count
OFFSET row_offset

Execution Mechanism

Intra-group sorting execution process:

  1. Group Processing: Sort operation is executed independently for each group
  2. Sorting Basis: Can use time fields, aggregation function results, or calculation expressions
  3. Pagination Limit: LIMIT limits the number of data rows returned per group
  4. Offset Handling: OFFSET skips the first N rows of data in each group
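The per-group mechanism above can be modeled in Python against the `series` JSON structure shown in this document. A minimal sketch (sample values are hypothetical) of `ORDER BY time DESC LIMIT 2 OFFSET 1`:

```python
# Each group's rows are sorted and paginated independently
series = [
    {"tags": {"host": "web-server-01"},
     "values": [[1721057400000, 88.2], [1721058000000, 75.8],
                [1721058600000, 82.3], [1721059200000, 78.5]]},
    {"tags": {"host": "web-server-02"},
     "values": [[1721057400000, 61.3], [1721058000000, 48.5],
                [1721058600000, 52.8], [1721059200000, 45.2]]},
]

LIMIT, OFFSET = 2, 1
for group in series:
    # ORDER BY time DESC: sort this group's rows by the time column
    ordered = sorted(group["values"], key=lambda row: row[0], reverse=True)
    # OFFSET skips the first N rows of the group, LIMIT caps what remains
    group["values"] = ordered[OFFSET:OFFSET + LIMIT]

print(series[0]["values"])  # [[1721058600000, 82.3], [1721058000000, 75.8]]
```

The key point is that neither sorting nor pagination crosses group boundaries: each host keeps its own top-of-list.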

Application Examples

Basic Time Sorting
// Sort by time in descending order, showing the latest data for each host
M::cpu:(max(usage_total)) [1h::10m] BY host ORDER BY time DESC

// Sort by time in ascending order, showing historical trends
M::cpu:(max(usage_total)) [1h::10m] BY host ORDER BY time ASC

Execution Result (ORDER BY time DESC):

{
  "series": [
    {
      "columns": ["time", "max(usage_total)"],
      "name": "cpu",
      "tags": {"host": "web-server-01"},
      "values": [
        [1721059200000, 78.5],  // 12:00:00
        [1721058600000, 82.3],  // 11:50:00
        [1721058000000, 75.8],  // 11:40:00
        [1721057400000, 88.2],  // 11:30:00
        [1721056800000, 72.1],  // 11:20:00
        [1721056200000, 69.4]   // 11:10:00
      ]
    },
    {
      "columns": ["time", "max(usage_total)"],
      "name": "cpu",
      "tags": {"host": "web-server-02"},
      "values": [
        [1721059200000, 45.2],  // 12:00:00
        [1721058600000, 52.8],  // 11:50:00
        [1721058000000, 48.5],  // 11:40:00
        [1721057400000, 61.3],  // 11:30:00
        [1721056800000, 55.7],  // 11:20:00
        [1721056200000, 58.9]   // 11:10:00
      ]
    }
  ]
}
Value-Based Sorting
// Sort by CPU usage in descending order, find peak periods for each host
M::cpu:(max(usage_total) as max_usage_total) [1h::10m] BY host ORDER BY max_usage_total DESC

// Sort by response time in ascending order, find the best performing periods
M::response_time:(avg(response_time) as avg_response_time) [1h::5m] BY endpoint ORDER BY avg_response_time ASC
Intra-Group Pagination
// Show only the latest 3 data points for each host
M::cpu:(max(usage_total)) [1h::10m] BY host ORDER BY time DESC LIMIT 3

// Skip the latest 2 data points, show the next 3
M::cpu:(max(usage_total)) [1h::10m] BY host ORDER BY time DESC LIMIT 3 OFFSET 2

Execution Result (LIMIT 3):

{
  "series": [
    {
      "columns": ["time", "max(usage_total)"],
      "name": "cpu",
      "tags": {"host": "web-server-01"},
      "values": [
        [1721059200000, 78.5],  // 12:00:00 - Latest
        [1721058600000, 82.3],  // 11:50:00
        [1721058000000, 75.8]   // 11:40:00
        // Only return the latest 3 data points (LIMIT 3)
      ]
    },
    {
      "columns": ["time", "max(usage_total)"],
      "name": "cpu",
      "tags": {"host": "web-server-02"},
      "values": [
        [1721059200000, 45.2],  // 12:00:00 - Latest
        [1721058600000, 52.8],  // 11:50:00
        [1721058000000, 48.5]   // 11:40:00
        // Only return the latest 3 data points (LIMIT 3)
      ]
    }
  ]
}

Inter-Group Sorting and Pagination (SORDER BY, SLIMIT, SOFFSET)

Inter-group sorting and pagination are unique features of DQL, acting on the first dimension of the two-dimensional structure, i.e., sorting the groups themselves. This sorting requires reducing each group's data to a single value, then comparing this value across different groups.

Basic Syntax

SORDER BY aggregate_function(expression) [ASC|DESC]
SLIMIT group_count
SOFFSET group_offset

Execution Mechanism

Inter-group sorting execution process:

  1. Dimensionality Reduction: Apply aggregation functions to each group to calculate a representative value
  2. Group Sorting: Sort all groups based on the dimensionality-reduced value
  3. Group Pagination: SLIMIT limits the number of groups returned, SOFFSET skips the first N groups
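The three steps above can be sketched in Python over the same `series` structure. This is a hedged illustration of the mechanism, not the engine's code; sample values mirror the averages used later in this section (77.7, 53.7, 90.9):

```python
series = [
    {"tags": {"host": "web-server-01"}, "values": [[1, 78.5], [2, 82.3], [3, 72.3]]},
    {"tags": {"host": "web-server-02"}, "values": [[1, 45.2], [2, 52.8], [3, 63.1]]},
    {"tags": {"host": "web-server-03"}, "values": [[1, 92.1], [2, 95.7], [3, 84.9]]},
]

def reduce_avg(group):
    # Step 1, dimensionality reduction: collapse the group to one value
    vals = [v for _, v in group["values"]]
    return sum(vals) / len(vals)

SLIMIT, SOFFSET = 2, 0
ranked = sorted(series, key=reduce_avg, reverse=True)   # Step 2: SORDER BY ... DESC
top = ranked[SOFFSET:SOFFSET + SLIMIT]                  # Step 3: SOFFSET / SLIMIT
print([g["tags"]["host"] for g in top])  # ['web-server-03', 'web-server-01']
```

Swapping `reduce_avg` for a max, min, or last-value reduction gives the other SORDER BY variants described below.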

Dimensionality Reduction Function Selection

Inter-group sorting must use aggregation functions for dimensionality reduction. When no aggregation function is specified, the default aggregation function is last.

Common dimensionality reduction functions include:

  • last(): take the last value; suited to the current state of a time series
  • max(): take the maximum; suited to peak analysis
  • min(): take the minimum; suited to trough analysis
  • avg(): take the average; suited to overall trend analysis
  • sum(): sum the values; suited to total statistics
  • count(): count the rows; suited to frequency analysis

Almost any aggregation function that returns a single value can be used, so the list above is not exhaustive.

Application Examples

Average-Based Sorting
// Sort by average CPU usage in descending order, find the most loaded hosts
M::cpu:(avg(usage)) [1h::10m] BY host SORDER BY avg(usage) DESC SLIMIT 5

// Sort by average response time in ascending order, find the best performing services
M::response_time:(avg(response_time)) [1h] BY service SORDER BY avg(response_time) ASC SLIMIT 10

Execution Process Analysis (illustrated with three hosts and SLIMIT 2):

  1. Dimensionality Reduction:

    • web-server-01: avg(usage) = 77.7
    • web-server-02: avg(usage) = 53.7
    • web-server-03: avg(usage) = 90.9
  2. Group Sorting (by avg(usage) DESC):

    • web-server-03: 90.9
    • web-server-01: 77.7
    • web-server-02: 53.7
  3. Final Result (SLIMIT 2):

{
  "series": [
    {
      "columns": ["time", "avg(usage)"],
      "name": "cpu",
      "tags": {"host": "web-server-03"},  // Average usage: 90.9 - Rank 1
      "values": [
        [1721059200000, 92.1],
        [1721058600000, 95.7],
        [1721058000000, 89.4]
      ]
    },
    {
      "columns": ["time", "avg(usage)"],
      "name": "cpu",
      "tags": {"host": "web-server-01"},  // Average usage: 77.7 - Rank 2
      "values": [
        [1721059200000, 78.5],
        [1721058600000, 82.3],
        [1721058000000, 75.8]
      ]
    }
    // web-server-02 (avg: 53.7) is filtered out due to SLIMIT 2
  ]
}
More Sorting Examples
// Sort by maximum CPU usage, find hosts with abnormal peaks
M::cpu:(max(usage_total)) [1h::10m] BY host SORDER BY max(usage_total) DESC SLIMIT 10

// Sort by minimum memory usage, find hosts with the lowest resource utilization
M::memory:(min(usage_percent)) [24h::1h] BY host SORDER BY min(usage_percent) ASC SLIMIT 5

// Sort by latest CPU usage, find the most loaded hosts
M::cpu:(usage) [1h::10m] BY host SORDER BY last(usage) DESC SLIMIT 10

// Sort by latest error rate, find services with the most issues
L::logs:(count(*) as error_count) {level = 'error'} [1h] BY service SORDER BY error_count DESC SLIMIT 5

Combined Use of Dual Sorting and Pagination

In practical applications, intra-group sorting and inter-group sorting are often used together to achieve complex data display requirements. This combination can simultaneously control the order of groups and the order of data within groups.

Execution Order

Dual sorting execution order:

  1. Inter-Group Sorting: First sort and paginate all groups
  2. Intra-Group Sorting: Sort and paginate within the selected groups
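The two-stage order above can be sketched in Python: inter-group selection runs first, then intra-group sorting runs only inside the surviving groups. Hosts and values here are hypothetical:

```python
series = [
    {"tags": {"host": "a"}, "values": [[1, 10.0], [2, 90.0], [3, 50.0]]},
    {"tags": {"host": "b"}, "values": [[1, 20.0], [2, 30.0], [3, 40.0]]},
    {"tags": {"host": "c"}, "values": [[1, 80.0], [2, 70.0], [3, 60.0]]},
]

avg = lambda g: sum(v for _, v in g["values"]) / len(g["values"])

# Stage 1: SORDER BY avg(...) DESC SLIMIT 2 -- keep the two hottest groups
selected = sorted(series, key=avg, reverse=True)[:2]

# Stage 2: ORDER BY time DESC LIMIT 2 -- newest two points within each kept group
for g in selected:
    g["values"] = sorted(g["values"], key=lambda row: row[0], reverse=True)[:2]

print([(g["tags"]["host"], g["values"]) for g in selected])
# [('c', [[3, 60.0], [2, 70.0]]), ('a', [[3, 50.0], [2, 90.0]])]
```

Running the stages in this order matters: reversing them would trim each group's rows before the group averages are computed, changing which groups survive SLIMIT.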

Application Examples

Dashboard Scenario
// Find the 10 servers with the highest CPU usage, show the latest 5 data points for each
M::cpu:(avg(usage)) [1h::10m] BY host
SORDER BY avg(usage) DESC SLIMIT 10    // Inter-group sorting: Find the 10 servers with the highest usage
ORDER BY time DESC LIMIT 5             // Intra-group sorting: Show the latest 5 points for each

Execution Result:

{
  "series": [
    {
      "columns": ["time", "avg(usage)"],
      "name": "cpu",
      "tags": {"host": "web-server-03"},  // Average usage: 90.9 - Rank 1
      "values": [
        [1721059200000, 92.1],  // 12:00:00 - Latest
        [1721058600000, 95.7],  // 11:50:00
        [1721058000000, 89.4],  // 11:40:00
        [1721057400000, 97.6],  // 11:30:00
        [1721056800000, 87.3]   // 11:20:00
        // Only return the latest 5 data points (LIMIT 5)
      ]
    },
    {
      "columns": ["time", "avg(usage)"],
      "name": "cpu",
      "tags": {"host": "web-server-01"},  // Average usage: 77.7 - Rank 2
      "values": [
        [1721059200000, 78.5],  // 12:00:00 - Latest
        [1721058600000, 82.3],  // 11:50:00
        [1721058000000, 75.8],  // 11:40:00
        [1721057400000, 88.2],  // 11:30:00
        [1721056800000, 72.1]   // 11:20:00
        // Only return the latest 5 data points (LIMIT 5)
      ]
    }
    // Other hosts are filtered out due to SLIMIT 10, only returning the 10 servers with the highest usage
  ]
}

By mastering DQL's sorting and pagination features, you can build powerful monitoring dashboards, performance analysis tools, and business insight systems.

Note

Proper use of intra-group and inter-group sorting combinations can greatly improve the efficiency and effectiveness of data analysis.

Conclusion

DQL (Debug Query Language) is a robust and versatile query language designed to handle complex time series, log, and event data analysis. By combining the semantic expressiveness of SQL with the syntactic structure of PromQL, DQL provides users with a flexible and powerful tool for querying and analyzing data.

Key Takeaways

  1. Basic Query Structure: Understanding the fundamental structure of DQL queries, including namespaces, indexes, data sources, and clauses, is essential for writing effective queries.
  2. Execution Order: The order in which DQL queries are executed—data filtering, time aggregation, group aggregation, group filtering, non-aggregation functions, intra-group sorting, and inter-group sorting—plays a crucial role in determining the query's semantics and performance.
  3. Namespaces and Indexes: Proper use of namespaces and indexes can significantly optimize query performance and manage data more efficiently.
  4. Data Sources: Specifying the correct data source, whether through direct naming, wildcards, regular expressions, or subqueries, is key to retrieving the desired data.
  5. Select Clause: The select clause is vital for specifying the fields or expressions to be returned, including field selection, JSON field extraction, calculated fields, and aliases.
  6. Time Clause: The time clause is central to DQL, allowing users to specify time ranges, aggregation windows, and rollup functions for time series data.
  7. Filter Conditions: Using where clauses to filter data rows based on comparison operators, pattern matching, set operations, and logical operators ensures that only relevant data is processed.
  8. Grouping: Grouping data by specified dimensions using the group-by clause is fundamental for aggregation and analysis.
  9. Having Clause: The having clause filters aggregated results, providing a way to refine the output based on aggregation function results.
  10. Sorting and Pagination: DQL's dual-sorting mechanism—intra-group and inter-group sorting—along with pagination, allows for efficient data navigation and analysis.

Best Practices

  • Optimize Index Usage: Always specify indexes explicitly to reduce data scan ranges and improve query performance.
  • Use Aliases Wisely: Use aliases to make query results more readable and to distinguish between multiple aggregation functions on the same field.
  • Leverage Subqueries: Utilize subqueries for complex analysis and dynamic filtering, but be mindful of their impact on performance.
  • Understand Rollup Functions: Properly use rollup functions to preprocess time series data, especially for counter metrics.
  • Combine Sorting and Pagination: Use intra-group and inter-group sorting along with pagination to efficiently manage and analyze large datasets.

Advanced Techniques

  • Dynamic Filtering with Subqueries: Use where subqueries to implement dynamic filtering based on the results of another query.
  • Multi-Level Aggregation: Combine multiple levels of aggregation and sorting to perform detailed and nuanced data analysis.
  • Time-Based Analysis: Utilize time clauses and rollup functions to perform time-based analysis, such as rate calculations and trend analysis.

Final Thoughts

DQL is a powerful tool for anyone working with time series, log, and event data. By mastering its syntax, understanding its execution order, and applying best practices, users can unlock the full potential of DQL for their data analysis needs. Whether you're building monitoring dashboards, performing performance analysis, or gaining business insights, DQL provides the flexibility and power required to handle complex data queries efficiently.

For further exploration, refer to the official DQL documentation and experiment with different queries to gain hands-on experience. Remember, the key to mastering DQL lies in practice and continuous learning.


By following this guide, you should now have a solid foundation in DQL and be well-equipped to write and optimize your own queries. Happy querying!
