DQL Function Reference¶
DQL provides a rich set of functions for data aggregation, transformation, and matching. This document details the semantics, parameters, and usage of each function.
Aggregation Functions¶
Aggregation functions collapse multiple rows of data into a single value. They are typically used with time windows (time-expr) and grouping (BY clause).
Basic Aggregation¶
sum¶
Calculates the sum of field values.
Syntax:
sum(field)
Parameters:
- field: A numeric field
Examples:
// Calculate total request count
M::http_requests:(sum(request_count)) [1h]
// Calculate total request count grouped by service
M::http_requests:(sum(request_count)) [1h] BY service
avg¶
Calculates the average of field values.
Syntax:
avg(field)
Parameters:
- field: A numeric field
Examples:
// Calculate average response time
M::response_time:(avg(duration)) [1h] BY endpoint
// Calculate average CPU usage
M::cpu:(avg(usage)) [1h] BY host
count¶
Counts the number of data rows.
Syntax:
count(field)
count(*)
Parameters:
- field: Any field; counts the number of non-null values
- *: Counts all rows
Examples:
// Count log entries
L::nginx:(count(*)) [1h]
// Count requests that have a response time
M::response_time:(count(duration)) [1h] BY service
min / max¶
Calculates the minimum or maximum value of a field.
Syntax:
min(field)
max(field)
Parameters:
- field: A numeric field
Examples:
// Find the maximum response time
M::response_time:(max(duration)) [1h] BY endpoint
// Find the range of CPU usage
M::cpu:(min(usage), max(usage)) [1h] BY host
first / last¶
Gets the first or last value (in chronological order).
Syntax:
first(field)
last(field)
Parameters:
- field: Any field
Notes:
- first: Returns the earliest value in time
- last: Returns the latest value in time; if the field is an array type, it will be expanded
- last_row: Returns the latest value in time; array types are not expanded
Examples:
// Get the latest status value
M::system:(last(status)) [1h] BY host
// Get initial and final values
M::counter:(first(value), last(value)) [1h] BY metric
any¶
Returns any non-null value. Suitable for obtaining sample data or scenarios where a specific aggregation order is not required.
Syntax:
any(field)
Parameters:
- field: Any field
Examples:
// Get a sample message
L::logs:(any(message)) [1h] BY service
// Get a sample error stack trace
L::error_logs:(any(stack_trace)) [1h] BY error_type
spread¶
Calculates the range (difference between maximum and minimum).
Syntax:
spread(field)
Parameters:
- field: A numeric field
Examples:
// Calculate the fluctuation range of response time
M::response_time:(spread(duration)) [1h] BY endpoint
stddev¶
Calculates the standard deviation.
Syntax:
stddev(field)
Parameters:
- field: A numeric field
Examples:
// Calculate the standard deviation of response time
M::response_time:(stddev(duration)) [1h] BY endpoint
mode¶
Calculates the mode (the most frequently occurring value).
Syntax:
mode(field)
Parameters:
- field: Any field
Examples:
count_series¶
Counts the number of time series (groupings). Returns how many independent time series exist within the current query scope.
Syntax:
count_series(field)
Parameters:
- field: Any field (typically * or any existing field)
Examples:
// Count how many hosts are reporting CPU metrics
M::cpu:(count_series(*)) [1h]
// Count how many instances each service has
M::http_requests:(count_series(*)) [1h] BY service
Statistical Aggregation (Estimation Functions)¶
The following functions use probabilistic data structures for estimation, suitable for large-scale data scenarios, balancing accuracy and performance.
count_distinct¶
Estimates the number of distinct values in a field.
Syntax:
count_distinct(field)
Parameters:
- field: Any field
Algorithm Notes:
Uses the HyperLogLog algorithm for cardinality estimation:
- Number of registers: 2¹⁶ = 65536
- Uses LogLog-Beta estimation method
- Standard error: approximately 0.4%
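As a sanity check on the ~0.4% figure: HyperLogLog's relative standard error is approximately 1.04/√m for m registers. This is general HLL theory applied to the stated register count, not engine code:

```python
import math

# HyperLogLog standard error ≈ 1.04 / sqrt(m), where m is the register count.
# With m = 2**16 = 65536 registers, as stated above:
m = 2 ** 16
std_error = 1.04 / math.sqrt(m)

print(f"{std_error:.4%}")  # ≈ 0.41%, matching the ~0.4% stated above
```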
Use Cases:
- Counting unique users (UV)
- Calculating the number of distinct IP addresses
- Analyzing the number of unique request IDs
Examples:
// Count unique users
L::access_logs:(count_distinct(user_id)) [1d] BY service
// Count distinct accessing IPs
L::nginx:(count_distinct(client_ip)) [1h] BY endpoint
percentile¶
Estimates the percentile of a field.
Syntax:
percentile(field, n)
Parameters:
- field: A numeric field
- n: Percentile, range 0-100
Shorthand Forms:
- p50(field) is equivalent to percentile(field, 50)
- p95(field) is equivalent to percentile(field, 95)
- p99(field) is equivalent to percentile(field, 99)
Algorithm Notes:
Uses a logarithmic linear interpolation histogram for estimation:
- Bucket range: 10⁻⁹ to 10¹⁸, covering the vast majority of numerical scenarios
- Each order of magnitude is divided into 128 buckets
- Uses linear interpolation in logarithmic space to improve accuracy
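A minimal sketch of this bucketing scheme (illustrative only; the engine's exact bucket layout may differ): with 128 buckets per decade, adjacent bucket bounds differ by a ratio of 10^(1/128) ≈ 1.018, which bounds the relative error of the estimate.

```python
import math

BUCKETS_PER_DECADE = 128  # each order of magnitude is split into 128 buckets

def bucket_index(x: float) -> int:
    """Map a positive value to its log-linear bucket index (sketch)."""
    return math.floor(BUCKETS_PER_DECADE * math.log10(x))

def bucket_bounds(i: int) -> tuple[float, float]:
    """Lower/upper bounds of bucket i in log10 space."""
    return (10 ** (i / BUCKETS_PER_DECADE), 10 ** ((i + 1) / BUCKETS_PER_DECADE))

# A value always falls between the bounds of its own bucket:
lo, hi = bucket_bounds(bucket_index(250.0))
print(lo <= 250.0 < hi)  # True
```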
Use Cases:
- Calculating P99, P95 for response times
- Analyzing tail latency of performance metrics
- Evaluating Service Level Agreement (SLA) compliance
Examples:
// Calculate P99 response time
M::response_time:(percentile(duration, 99)) [1h] BY service
// Using shorthand form
M::response_time:(p99(duration)) [1h] BY service
// Calculate multiple percentiles simultaneously
M::response_time:(p50(duration), p95(duration), p99(duration)) [1h] BY service
median¶
Calculates the median, equivalent to percentile(field, 50).
Syntax:
median(field)
Examples:
Histogram Functions¶
DQL provides three histogram-related functions for different data sources and scenarios:
| Function | Use Case | Data Source Type | Recommendation |
|---|---|---|---|
| histogram_auto | Numerical distribution statistics for detailed data like logs and Trace | Detail model (Log/Trace) | ⭐⭐⭐ Recommended |
| histogram | Histogram with fixed bucket boundaries | Detail model (Log/Trace) | ⭐⭐ Deprecated |
| histogram_quantile | Calculate quantiles from Prometheus histogram metrics | Prometheus metrics | ⭐⭐⭐ Recommended |
histogram_auto (Recommended)¶
Automatically generates a distribution histogram, specifically designed for numerical distribution statistics of detailed data like logs and Trace.
Features:
- No need to specify bucket boundaries; automatically adapts to data distribution
- Uses logarithmic linear interpolation histogram algorithm, covering numerical range from 10⁻⁹ to 10¹⁸
- Returns both quantile statistics and bucket distribution information
Syntax:
histogram_auto(field)
Parameters:
- field: A numeric field
Return Values:
| Column Name | Description |
|---|---|
| lower_bounds | Array of lower bounds for each bucket |
| upper_bounds | Array of upper bounds for each bucket |
| counts | Array of counts for each bucket |
| min | Minimum value |
| p50 | Median (50th percentile) |
| p75 | 75th percentile |
| p90 | 90th percentile |
| p95 | 95th percentile |
| p99 | 99th percentile |
| max | Maximum value |
Algorithm Notes:
Uses an estimation histogram (logarithmic linear interpolation), dividing each order of magnitude into 128 buckets, suitable for distribution statistics on large datasets.
Use Cases:
- Analyzing response time distribution in logs
- Statistics on latency distribution in Trace
- Exploratory data analysis without preset bucket boundaries
Examples:
// Analyze response time distribution in Nginx access logs
L::nginx:(histogram_auto(response_time)) [1h]
// Statistics on request duration distribution by service
L::app_logs:(histogram_auto(duration)) [1h] BY service
// Statistics on call duration distribution in Trace
T::http_client:(histogram_auto(elapsed)) [1h] BY operation
Result Example:
| lower_bounds | upper_bounds | counts | min | p50 | p75 | p90 | p95 | p99 | max |
|---|---|---|---|---|---|---|---|---|---|
| [0, 10, 100] | [10, 100, 1000] | [1000, 500, 100] | 0.5 | 45 | 120 | 280 | 450 | 850 | 1200 |
Note
lower_bounds, upper_bounds, counts are array types, representing the boundaries and counts of each bucket.
histogram (Deprecated)¶
Generates a histogram with specified bucket boundaries. This function is deprecated; use histogram_auto instead.
Notes:
histogram requires manually specifying bucket boundary parameters, making it less flexible. histogram_auto automatically adapts to data distribution, covers a wider numerical range, and returns richer statistical information.
Syntax:
histogram(field, left_bound, right_bound, bucket_size[, threshold])
Parameters:
- field: A numeric field
- left_bound: Left boundary
- right_bound: Right boundary
- bucket_size: Bucket size
- threshold (optional): Minimum count per bucket; buckets below this value are not returned
Return Values:
Returns two columns: bucket_le (right boundary of the bucket) and count (number of data points in the bucket)
Examples:
// Generate a histogram in the range 0-1000ms, with 100ms per bucket
M::response_time:(histogram(duration, 0, 1000, 100)) [1h]
// Recommended: Use histogram_auto instead
M::response_time:(histogram_auto(duration)) [1h]
Result Example:
| bucket_le | count |
|---|---|
| 100 | 1500 |
| 200 | 2800 |
| 300 | 3500 |
| ... | ... |
| 1000 | 5000 |
histogram_quantile¶
Calculates quantiles from Prometheus histogram metrics.
Features:
- Specifically designed to handle histogram-type metrics reported by Prometheus
- Relies on the le label (or VictoriaMetrics' vmrange label) to identify bucket boundaries
- Input data should be cumulative counts
Syntax:
histogram_quantile(field, q)
Parameters:
- field: Histogram count field (e.g., http_request_duration_bucket)
- q: Quantile, range 0-1 (e.g., 0.99 for P99)
Use Case Comparison:
| Scenario | Recommended Function | Notes |
|---|---|---|
| Analyzing response time distribution in logs | histogram_auto | Logs are detailed data, no pre-aggregated histogram |
| Analyzing P99 of Prometheus histogram metrics | histogram_quantile | Metrics are pre-aggregated by le label |
| Statistics on call duration distribution in Trace | histogram_auto | Trace is detailed data |
le Label Processing Mechanism:
histogram_quantile relies on the le label (less than or equal) to identify histogram bucket boundaries:
- Prometheus format (default): Uses the le label directly to represent the bucket upper boundary
  - le value is numeric (e.g., "0.1", "1", "10") or "+Inf" (infinity)
  - Data should be cumulative counts
- VictoriaMetrics format: Uses the vmrange label to represent ranges
  - Format is "lower...upper" (e.g., "0.1...0.2")
  - Data is range counts (non-cumulative)
  - The function automatically converts range counts to cumulative counts
Calculation Process:
1. Sort all buckets by le value
2. If in vmrange format, accumulate counts to convert to a cumulative distribution
3. Ensure bucket counts are monotonically increasing (fixing possible abnormal data)
4. Use linear interpolation to calculate the target quantile
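The calculation process can be sketched in Python. This is a simplified illustration over assumed (le, cumulative_count) input pairs, not the engine's implementation:

```python
def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Sketch of quantile estimation over cumulative histogram buckets.

    buckets: (le_upper_bound, cumulative_count) pairs; float('inf') is +Inf.
    """
    if q < 0:
        return float("-inf")
    if q > 1:
        return float("inf")
    # Step 1: sort buckets by `le` value.
    buckets = sorted(buckets)
    # Steps 2-3: enforce monotonically non-decreasing counts (fix abnormal data).
    fixed, running = [], 0.0
    for le, count in buckets:
        running = max(running, count)
        fixed.append((le, running))
    total = fixed[-1][1]
    rank = q * total
    # Step 4: linear interpolation inside the first bucket whose count >= rank.
    prev_le, prev_count = 0.0, 0.0
    for le, count in fixed:
        if count >= rank:
            if le == float("inf"):
                return prev_le  # fall back to the last finite upper bound
            if count == prev_count:
                return le
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return prev_le

# Buckets le=0.1/0.5/1/+Inf with cumulative counts 50/90/100/100:
print(histogram_quantile(0.5, [(0.1, 50), (0.5, 90), (1.0, 100), (float("inf"), 100)]))  # 0.1
```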
Differences from PromQL:
| Feature | DQL | PromQL |
|---|---|---|
| Function Type | Aggregation function | Transformation function |
| Input Data | Directly reads metrics with le label | Requires combination with sum(rate(...)) by (le) |
| Usage | histogram_quantile(field, 0.99) | histogram_quantile(0.99, sum(rate(...)) by (le)) |
| Data Format | Supports both le and vmrange labels | Only supports le label |
| Grouping | Through DQL's BY clause | Through explicit by (le) grouping |
Equivalent Examples:
Assume a histogram metric http_request_duration_bucket, containing le labels (e.g., 0.1, 0.5, 1, 5, +Inf) and service label.
Scenario 1: Calculate P99 Latency
DQL:
M::http_request_duration:(histogram_quantile(duration_bucket, 0.99)) [1h]
PromQL equivalent:
histogram_quantile(0.99, sum(rate(http_request_duration_bucket[1h])) by (le))
Scenario 2: Calculate P95 Latency per Service (Multiple Groups)
DQL:
M::http_request_duration:(histogram_quantile(duration_bucket, 0.95)) [1h] BY service
PromQL equivalent:
histogram_quantile(0.95, sum(rate(http_request_duration_bucket[1h])) by (le, service))
Scenario 3: Calculate P50 (Median) and P99
DQL:
M::http_request_duration:(
histogram_quantile(duration_bucket, 0.50) as p50,
histogram_quantile(duration_bucket, 0.99) as p99
) [1h] BY service
PromQL equivalent:
label_join(
histogram_quantile(0.50, sum(rate(http_request_duration_bucket[1h])) by (le, service)), "quantile", "", "0.50"
)
or
label_join(
histogram_quantile(0.99, sum(rate(http_request_duration_bucket[1h])) by (le, service)), "quantile", "", "0.99"
)
Note
PromQL requires label_join or label_replace to distinguish results from different quantiles.
Notes:
- Input data must contain le or vmrange labels, otherwise calculation is impossible
- If there is no +Inf bucket, the upper boundary of the last bucket will be used as the maximum value
- Buckets with count 0 or NaN are skipped
- Returns -Inf when quantile < 0, and +Inf when quantile > 1
TopN Functions¶
top¶
Gets the top N largest values.
Syntax:
top(field, n)
Parameters:
- field: A numeric field
- n: Number of values to return
Examples:
// Get the 5 requests with the longest response time
M::response_time:(top(duration, 5)) [1h] BY service
// Get the 10 hosts with the highest traffic
M::network:(top(bytes, 10)) [1h]
Result Example:
| service | top(duration, 5) |
|---|---|
| api | 1250 |
| api | 1180 |
| api | 1050 |
| api | 980 |
| api | 920 |
Note
Returns multiple rows, each containing one TopN value.
bottom¶
Gets the top N smallest values.
Syntax:
bottom(field, n)
Parameters:
- field: A numeric field
- n: Number of values to return
Examples:
// Get the 5 requests with the shortest response time
M::response_time:(bottom(duration, 5)) [1h] BY service
Result Example:
| service | bottom(duration, 5) |
|---|---|
| api | 12 |
| api | 18 |
| api | 25 |
| api | 32 |
| api | 45 |
Note
Returns multiple rows, each containing one BottomN value.
Value Collection Functions¶
distinct¶
Returns all distinct values of a field.
Syntax:
distinct(field)
Examples:
Result Example:
| endpoint | distinct(status) |
|---|---|
| /api/v1 | 200 |
| /api/v1 | 404 |
| /api/v1 | 500 |
| /health | 200 |
Note
Returns multiple rows, each containing one distinct value.
distinct_by_collapse¶
Gets distinct values of a field according to a collapse strategy, retaining the last value of other fields during deduplication.
Syntax:
distinct_by_collapse(field, [last_fields])
Parameters:
- field: The field used as the basis for deduplication
- last_fields (optional): List of fields for which the last value should be retained
Notes:
Unlike distinct, distinct_by_collapse retains information from other related fields (taking the last value) during deduplication, suitable for scenarios requiring contextual information.
Examples:
// Get distinct user IDs, retaining the last access time for each user
L::access_logs:(distinct_by_collapse(user_id, [timestamp])) [1h]
// Get distinct hosts, retaining the last status and message
O::HOST:(distinct_by_collapse(host, [status, message])) [1h]
Result Example:
| user_id | last(timestamp) | last(path) |
|---|---|---|
| user001 | 1704067200000 | /checkout |
| user002 | 1704067100000 | /product |
| user003 | 1704067000000 | /home |
Note
Returns the deduplicated main field values, and the last value of other fields specified by last_fields.
collect¶
Collects all values (including duplicates).
Syntax:
collect(field[, limit])
Parameters:
- field: Any field
- limit (optional): Maximum number of values to collect
Examples:
// Collect all response times
M::response_time:(collect(duration)) [1h] BY service
// Collect at most 100 values
M::response_time:(collect(duration, 100)) [1h] BY service
Result Example:
| service | collect(duration) |
|---|---|
| api | [120, 135, 98, 142, ...] |
| web | [45, 52, 48, 61, ...] |
Note
Returns an array type, containing all collected values (may include duplicates).
collect_distinct¶
Collects all distinct values.
Syntax:
collect_distinct(field[, limit])
Parameters:
- field: Any field
- limit (optional): Maximum number of values to collect
Examples:
Result Example:
| service | collect_distinct(error_type) |
|---|---|
| api | ["timeout", "connection refused", "404"] |
| web | ["200", "301", "404"] |
Note
Returns an array type, containing all deduplicated values.
field_values¶
Gets all values of a field, returning an array type.
Syntax:
field_values(field)
Examples:
Result Example:
| metric_name | field_values(tags) |
|---|---|
| cpu_usage | ["host:A", "env:prod", "team:backend"] |
| memory_used | ["host:B", "env:staging", "team:frontend"] |
Note
Returns an array type, containing all values of the field.
Filter Aggregation¶
count_filter¶
Counts the number of field values within a specified list.
Syntax:
count_filter(field, values)
Parameters:
- field: Any field
- values: List of values
Examples:
// Count requests with specific status codes
M::http:(count_filter(status, [200, 201, 204])) [1h] BY endpoint
// Count error-level logs
L::logs:(count_filter(level, ["error", "critical"])) [1h] BY service
Helper Functions¶
default¶
Sets a default value for a field, returning the default value when the field is empty.
Syntax:
default(field, default_value)
Parameters:
- field: Any field
- default_value: Default value (can be numeric, string, boolean, or null)
Examples:
// Set a default response time for null values
M::response_time:(default(duration, 0)) [1h] BY service
Time Series Functions¶
Time series functions are used to process data that changes over time, especially Counter-type metrics.
Rollup Functions¶
Rollup functions are used to preprocess raw time series data within a time window. For detailed explanation, refer to Rollup Functions in this document.
Writing Notes:
- Rollup is written in the time clause, e.g., [rate], [1h::5m:rate].
- Writing in the outer query (e.g., rate(DQL)) belongs to outer functions, not Rollup.
Supported Rollup Functions:
| Function | Description |
|---|---|
| rate | Calculate growth rate (per second) |
| irate | Calculate instantaneous growth rate |
| increase | Calculate increase amount |
| deriv | Calculate derivative (rate of change) |
| difference | Calculate difference |
| non_negative_derivative | Calculate non-negative derivative |
| non_negative_difference | Calculate non-negative difference |
| rate_over_sum | Calculate average per second |
| rate_over_count | Calculate count per second |
| sum | Sum |
| avg | Average |
| min | Minimum |
| max | Maximum |
| count | Count |
| first | First value |
| last | Last value |
| stddev | Standard deviation |
| mode | Mode |
| spread | Range |
| any | Any value |
Examples:
// Calculate request QPS
M::http_requests:(sum(request_count)) [1h::5m:rate] BY service
// Shorthand form
M::cpu:(max(usage)) [rate]
Growth Rate Calculation¶
rate¶
Calculates the growth rate of a metric (per second).
Syntax:
Notes:
rate calculates the average growth rate of a Counter metric within a time window. For monotonically increasing Counter-type metrics, aggregating raw values directly is meaningless; the growth rate needs to be calculated first.
Use Cases:
- Calculating request QPS
- Calculating data write rate
- Analyzing traffic growth trends
Examples:
// Calculate request QPS
M::http_requests:(sum(request_count)) [rate] BY service
// Calculate data ingestion rate
M::data_ingestion:(sum(bytes)) [rate] BY source
irate¶
Calculates the instantaneous growth rate of a metric.
Syntax:
Notes:
Unlike rate, irate uses only the last two data points to calculate the growth rate. It reflects the instantaneous rate of change and is therefore better suited to alerting scenarios.
Examples:
increase¶
Calculates the increase amount of a metric.
Syntax:
Notes:
increase returns the total increase within the time window, not the growth rate.
Examples:
rate_over_sum¶
Calculates the average per second (sum / time window seconds).
Syntax:
rate_over_sum(field)
Notes:
Equivalent to sum(field) / time window (seconds), used to calculate the average per second. Often used in the Rollup stage to convert cumulative values to per-second rates.
Difference from rate:
- rate: Calculates the growth rate of a Counter (handles resets)
- rate_over_sum: Simply divides the sum by the time window in seconds
Examples:
// Calculate average requests per second
M::http_requests:(rate_over_sum(request_count)) [1h] BY service
rate_over_count¶
Calculates the count per second (count / time window seconds).
Syntax:
rate_over_count(field)
Notes:
Equivalent to count(field) / time window (seconds), used to calculate occurrences per second.
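The two equivalences above can be sketched directly (hypothetical helper names mirroring the DQL functions):

```python
def rate_over_sum(values: list[float], window_seconds: float) -> float:
    # Equivalent to sum(field) / time window (seconds).
    return sum(values) / window_seconds

def rate_over_count(values: list[float], window_seconds: float) -> float:
    # Equivalent to count(field) / time window (seconds).
    return len(values) / window_seconds

# 3 samples summing to 3600 requests over a 1h (3600 s) window:
print(rate_over_sum([1200.0, 1200.0, 1200.0], 3600.0))    # 1.0 request/s
print(rate_over_count([1200.0, 1200.0, 1200.0], 3600.0))  # 3 points / 3600 s
```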
Examples:
Difference Calculation¶
This section describes function semantics. The same functions can be used both as Rollup (e.g., [rate], [increase]) and within query expressions (e.g., rate(field), increase(field)); they execute at different stages, so choose the form that fits the business need.
rate / deriv¶
Calculates the rate of change (derivative). rate is used for Counter-type metrics (ignores negative values), deriv is used for Gauge-type metrics (preserves negative values).
Aliases: non_negative_derivative is an alias of rate; derivative (PromQL style) is an alias of deriv.
Syntax:
// Counter metrics: Calculate non-negative rate of change (ignores negative values from resets)
rate(field)
// Gauge metrics: Calculate full rate of change (including negative values)
deriv(field)
Function Selection:
| Function | Description | Use Case |
|---|---|---|
| rate | Calculates only non-negative rate of change | Counter-type metrics (monotonically increasing) |
| deriv | Calculates full rate of change (including negative) | Gauge-type metrics (can increase or decrease) |
Examples:
// Counter metrics: Calculate request QPS
M::requests:(rate(count)) [1h::5m] BY service
// Gauge metrics: Calculate memory usage change rate
M::memory:(deriv(used)) [1h::5m] BY host
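The Counter/Gauge distinction can be sketched as follows. This illustrates the "ignores negative values" semantics stated above, not the engine's exact counter-reset handling:

```python
def deriv(points: list[tuple[float, float]]) -> list[float]:
    """Full per-second rate of change between adjacent (ts, value) points (Gauge)."""
    return [(v1 - v0) / (t1 - t0) for (t0, v0), (t1, v1) in zip(points, points[1:])]

def rate(points: list[tuple[float, float]]) -> list[float]:
    """Non-negative rate of change: negative slopes (e.g., counter resets) are ignored (Counter)."""
    return [d for d in deriv(points) if d >= 0]

pts = [(0, 100.0), (10, 160.0), (20, 40.0), (30, 100.0)]  # counter reset at t=20
print(deriv(pts))  # [6.0, -12.0, 6.0]
print(rate(pts))   # [6.0, 6.0]
```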
increase / difference¶
Calculates the difference between adjacent values. increase is used for Counter-type metrics (ignores negative values), difference is used for Gauge-type metrics (preserves negative values).
Notes: increase and difference are two independent functions with different behaviors, not aliases.
Syntax:
// Counter metrics: Calculate non-negative difference (ignores negative values from resets)
increase(field)
// Gauge metrics: Calculate full difference (including negative values)
difference(field)
Function Selection:
| Function | Description | Use Case |
|---|---|---|
| increase | Calculates only non-negative difference | Counter-type metrics (monotonically increasing) |
| difference | Calculates full difference (including negative) | Gauge-type metrics (can increase or decrease) |
Examples:
// Counter metrics: Calculate request increase amount
M::requests:(increase(count)) [1h::5m] BY service
// Gauge metrics: Calculate request count change (may increase or decrease)
M::requests:(difference(count)) [1h::5m] BY service
Moving Calculation¶
moving_average¶
Calculates the moving average.
Syntax:
moving_average(field, n)
Parameters:
- field: A numeric field
- n: Window size (number of data points)
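A sketch of the sliding-window semantics, where each output point averages the last n input points:

```python
def moving_average(values: list[float], n: int) -> list[float]:
    """Average over a sliding window of n consecutive data points (sketch)."""
    return [sum(values[i - n + 1 : i + 1]) / n for i in range(n - 1, len(values))]

print(moving_average([1.0, 2.0, 3.0, 4.0, 5.0], 3))  # [2.0, 3.0, 4.0]
```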
Examples:
cumsum¶
Calculates the cumulative sum.
Syntax:
cumsum(field)
Examples:
Transformation Functions¶
Transformation functions are used for mathematical operations, type conversions, or string processing on field values.
Mathematical Functions¶
abs¶
Calculates the absolute value.
Syntax:
abs(field)
Examples:
round / ceil / floor¶
Rounding functions.
Syntax:
round(field)
ceil(field)
floor(field)
Examples:
// Round up response time
M::response_time:(ceil(duration)) [1h]
// Round percentage
M::cpu:(round(usage)) [1h] BY host
log / log2 / log10¶
Logarithmic functions.
Syntax:
log(field)
log2(field)
log10(field)
Examples:
Type Conversion¶
int / uint / float / string / bool¶
Type conversion functions.
Syntax:
int(field) // Convert to signed integer
uint(field) // Convert to unsigned integer
float(field) // Convert to floating-point
string(field) // Convert to string
bool(field) // Convert to boolean
Examples:
// Convert string to numeric
L::logs:(int(response_time)) [1h] BY service
// Convert numeric to string for concatenation
M::metrics:(string(value)) [1h] BY metric_name
String Functions¶
substr¶
Extracts a substring.
Syntax:
substr(field, start[, length])
Parameters:
- field: A string field
- start: Starting position (0-indexed; negative numbers count from the end)
- length (optional): Length of substring
Return Value:
Returns the extracted substring.
Examples:
// Extract first 100 characters of a message
L::logs:(substr(message, 0, 100)) [1h]
// Extract last 10 characters
L::logs:(substr(message, -10)) [1h]
Result Example:
| message | substr(message, 0, 10) | substr(message, -5) |
|---|---|---|
| "Error: connection timeout" | "Error: con" | "meout" |
regexp_extract¶
Regular expression extraction.
Syntax:
regexp_extract(field, pattern[, n])
Parameters:
- field: A string field
- pattern: Regular expression
- n (optional): Extract the nth capture group; defaults to 0 (entire match)
Return Value:
Returns a single string, the content of the nth capture group. Returns null if no match.
Examples:
// Extract error code
L::logs:(regexp_extract(message, 'error_code: (\d+)', 1)) [1h] BY service
// Extract IP address
L::nginx:(regexp_extract(message, '(\d+\.\d+\.\d+\.\d+)', 1)) [1h]
Result Example:
| service | regexp_extract(message, 'error_code: (\d+)', 1) |
|---|---|
| api | "404" |
| api | "500" |
| web | null |
regexp_extract_all¶
Extracts all matching results.
Syntax:
regexp_extract_all(field, pattern[, n])
Return Value:
Returns a string array containing all matching substrings.
Examples:
// Extract all numbers
L::logs:(regexp_extract_all(message, '\d+', 0)) [1h]
// Extract all IP addresses
L::logs:(regexp_extract_all(message, '\d+\.\d+\.\d+\.\d+', 0)) [1h]
Result Example:
| message | regexp_extract_all(message, '\d+\.\d+\.\d+\.\d+', 0) |
|---|---|
| Request from 192.168.1.1 to 10.0.0.1 | ["192.168.1.1", "10.0.0.1"] |
md5¶
Calculates the MD5 hash.
Syntax:
md5(field)
Return Value:
Returns a 32-character hexadecimal string (lowercase).
Examples:
Result Example:
| service | md5(message) |
|---|---|
| api | 5d41402abc4b2a76b9719d911017c592 |
| web | 098f6bcd4621d373cade4e832627b4f6 |
concat¶
String concatenation.
Syntax:
concat(value1, value2, ...)
Return Value:
Returns a single concatenated string.
Examples:
// Concatenate multiple fields
L::logs:(concat(service, ":", level)) [1h]
// Results: "api:error", "web:info", etc.
Result Example:
| service | level | concat(service, ":", level) |
|---|---|---|
| api | error | "api:error" |
| web | info | "web:info" |
set¶
Deduplicates and sorts an array field.
Syntax:
set(field)
Return Value:
Returns a deduplicated and sorted array.
Examples:
// Get all distinct tags
M::metrics:(set(tags)) [1h] BY metric_name
// Deduplicate collect results
M::http:(set(collect(status))) [1h] BY endpoint
Result Example:
| metric_name | set(tags) |
|---|---|
| cpu_usage | ["env:prod", "host:A", "team:backend"] |
| memory_used | ["env:staging", "host:B", "team:frontend"] |
Log Clustering¶
drain¶
Uses the Drain algorithm to cluster logs and generate log templates.
Syntax:
drain(field, similarity_threshold, max_clusters)
Parameters:
- field: A string field (usually message)
- similarity_threshold: Similarity threshold, range (0, 1]; higher values mean stricter clustering
- max_clusters: Maximum number of clusters, range [1, 10000]
Algorithm Notes:
Drain is a log clustering algorithm based on a parse tree, capable of automatically identifying constant and variable parts in logs to generate log templates.
Use Cases:
- Log pattern recognition
- Anomaly log clustering analysis
- Log noise reduction
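A toy illustration of log templating, to convey the idea of constant vs. variable parts. This is NOT the real Drain parse-tree algorithm, just a minimal sketch that masks varying tokens with the `<...>` placeholder drain() uses in its output:

```python
from collections import defaultdict

def simple_templates(messages: list[str]) -> list[str]:
    """Toy log templating: group messages by token count, then replace
    tokens that vary across the group with the <...> placeholder."""
    groups = defaultdict(list)
    for msg in messages:
        tokens = msg.split()
        groups[len(tokens)].append(tokens)
    templates = []
    for rows in groups.values():
        cols = list(zip(*rows))
        templates.append(" ".join(
            col[0] if len(set(col)) == 1 else "<...>" for col in cols))
    return templates

logs = [
    "Request from 10.0.0.1 completed",
    "Request from 10.0.0.2 completed",
]
print(simple_templates(logs))  # ['Request from <...> completed']
```

Real Drain additionally uses a fixed-depth parse tree and the similarity threshold to decide which cluster a message joins.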
Examples:
// Cluster logs, similarity 0.7, up to 1000 clusters
L::logs:(drain(message, 0.7, 1000)) [1h] BY service
// Stricter clustering
L::logs:(drain(message, 0.9, 500)) [1h] BY service
Result Example:
| service | drain(message, 0.7, 1000) |
|---|---|
| api | "Request from <...>" |
| db | "Query <...>" |
Note
Returns a string-type log template, where <...> indicates variable parts.
Matching Functions¶
Matching functions are used for text matching in WHERE clauses and can also be used as expressions returning boolean values.
Substring Matching¶
match¶
Checks if a field contains a specified substring.
Syntax:
match(pattern)
match(field, pattern)
Parameters:
- pattern: Substring to match
- field: Field name (optional; can be omitted in WHERE clause)
Examples:
// Use in WHERE
L::logs:(message) {match(message, "error")} [1h]
// Shorthand form
L::logs:(message) {match("error")} [1h]
// As an expression
L::logs:(match(message, "timeout")) [1h] BY match_result
Phrase Matching¶
phrase / search¶
Tokenized phrase matching, supports mixed Chinese and English.
Syntax:
phrase(query)
phrase(field, query)
search(query)
search(field, query)
Parameters:
- query: Query phrase
- field: Field name (optional)
Matching Rules:
- Chinese: Tokenized by character for matching
- English: Tokenized by word boundaries (space, punctuation)
- Case-insensitive
Examples:
// Match logs containing "connection timeout"
L::logs:(message) {phrase("connection timeout")} [1h]
// Chinese matching
L::logs:(message) {phrase("连接超时")} [1h]
// Mixed Chinese and English
L::logs:(message) {search("error 错误")} [1h]
Regular Expression Matching¶
re / regex / regexp¶
Regular expression matching.
Syntax:
re(pattern)
re(field, pattern)
regex(pattern)
regex(field, pattern)
regexp(pattern)
regexp(field, pattern)
Parameters:
- pattern: Regular expression (supports PromRegex syntax)
- field: Field name (optional)
Examples:
// Match logs starting with error
L::logs:(message) {re("error.*")} [1h]
// Match error codes of specific format
L::logs:(message) {regexp(message, "ERR-\d{4}")} [1h]
// Use regex in data source
M::re('cpu.*'):(usage) [1h]
Wildcard Matching¶
wildcard¶
Wildcard pattern matching.
Syntax:
wildcard(pattern)
wildcard(field, pattern)
Parameters:
- pattern: Wildcard pattern
  - *: Matches any number of characters
  - ?: Matches a single character
- field: Field name (optional)
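Python's fnmatch module implements the same * and ? semantics and is handy for reasoning about patterns before running a query (a reference for the semantics, not DQL's matcher):

```python
import fnmatch

# * matches any number of characters; ? matches exactly one character.
print(fnmatch.fnmatch("error: disk full", "error*"))  # True
print(fnmatch.fnmatch("ERR-1234", "ERR-????"))        # True
print(fnmatch.fnmatch("ERR-12", "ERR-????"))          # False (only 2 chars after ERR-)
```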
Examples:
// Match messages starting with error
L::logs:(message) {wildcard("error*")} [1h]
// Match specific format
L::logs:(message) {wildcard(message, "ERR-????")} [1h]
CIDR Matching¶
cidr¶
IP address CIDR range matching.
Syntax:
cidr(cidr)
cidr(field, cidr)
Parameters:
- cidr: Network segment in CIDR notation, e.g., 192.168.1.0/24
- field: IP address field (optional)
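The containment check that cidr() performs can be illustrated with Python's ipaddress module (a sketch of the semantics, not DQL's implementation):

```python
import ipaddress

# An IP matches when it falls inside the CIDR network segment.
net = ipaddress.ip_network("10.0.0.0/8")
print(ipaddress.ip_address("10.1.2.3") in net)     # True (internal IP)
print(ipaddress.ip_address("192.168.1.5") in net)  # False
```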
Examples:
// Match internal IPs
L::nginx:(*) {cidr(client_ip, "10.0.0.0/8")} [1h]
// Match specific network segment
L::nginx:(*) {cidr(client_ip, "192.168.1.0/24")} [1h]
Field Existence Check¶
exists¶
Checks if a field exists.
Syntax:
exists(field)
Examples:
Query String Syntax¶
query_string¶
Uses query string syntax for complex matching.
Syntax:
query_string(query)
query_string(field, query)
Parameters:
- query: Query string
- field: Field name (optional; defaults to full-text matching)
Query String Syntax:
1. Term Matching¶
foo # Match content containing foo
"foo bar" # Exact phrase match, must appear consecutively
foo\ bar # Escape space, match "foo bar" as a whole
2. Wildcards¶
foo* # Match content starting with foo
foo?bar # ? matches a single character
"foo*bar" # Wildcards inside quotes are not parsed, matched literally
3. Regular Expressions¶
/ERR-\d{4}/ # Content wrapped in slashes is parsed as a regular expression
4. Boolean Operators¶
foo AND bar # Logical AND, both must be contained
foo OR bar # Logical OR, at least one must be contained
NOT foo # Logical NOT, does not contain foo
# Shorthand forms
foo && bar # Equivalent to foo AND bar
foo || bar # Equivalent to foo OR bar
!foo # Equivalent to NOT foo
5. Grouping¶
(foo OR bar) AND baz # Use parentheses to change precedence
!(status 429 reading) # Negate the entire expression
6. Default Operator¶
When multiple terms are separated by spaces, they are connected by OR by default (configurable to AND):
Examples:
// Simple term matching
L::logs:(message) {query_string("error timeout")} [1h]
// Boolean combination
L::logs:(message) {query_string("error AND NOT timeout")} [1h]
// Regular expression
L::logs:(message) {query_string("/ERR-\d{4}/")} [1h]
// Complex query
L::logs:(message) {query_string("(error OR warn) AND service")} [1h]
// Specify field
L::logs:(*) {query_string(message, "error AND timeout")} [1h]
// Chinese query
L::logs:(message) {query_string("错误 AND 超时")} [1h]
Outer Functions¶
Usage Recommendation
Outer functions are a legacy design. For scenarios that can be solved with Rollup + aggregation functions (e.g., [rate], [last], [increase]), prefer the Rollup approach. Use outer functions only where Rollup is not applicable (e.g., secondary calculation on aggregated results).
Outer functions operate on the entire DQL query result, used for secondary calculation on the time series data output by the query. Outer functions wrap the entire DQL expression, rather than being written inside the Select clause.
In-Query Functions vs. Outer Functions¶
- In-Query Functions: Used inside DQL expressions, e.g., sum, avg, max, etc.
- Outer Functions: Wrap the entire DQL query result, performing post-processing on the output time series.
Writing Comparison:
// Rollup (time clause): First calculate growth rate for each time series, then aggregate
M::http_requests:(sum(request_count)) [rate] BY service
// Outer function: First get query result, then perform secondary calculation
rate(M::http_requests:(sum(request_count)) [1h::1m] BY service)
Syntax:
Examples:
// In-query function: Average raw data
M::cpu:(avg(usage)) [1h::5m] BY host
// Outer function: Calculate moving average on query result
moving_average(M::cpu:(avg(usage)) [1h::5m] BY host, 5)
Cumulative Calculation¶
cumsum¶
Calculates the cumulative sum, computing the sum of all previous points for each point in the time series.
Syntax:
cumsum(DQL_expression)
Examples:
Difference and Derivative¶
Recommended Usage
These functions are prioritized as Rollup functions (e.g., [rate], [deriv]). The outer function form is only used for secondary calculation on aggregated results.
The following functions can be used as outer functions:
| Function | Description |
|---|---|
| derivative(DQL) | Calculate derivative (rate of change) |
| difference(DQL) | Calculate difference from previous value |
| non_negative_derivative(DQL) | Calculate non-negative derivative |
| non_negative_difference(DQL) | Calculate non-negative difference |
| rate(DQL) | Calculate growth rate (per second) |
| irate(DQL) | Calculate instantaneous growth rate |
Examples:
// Outer function: Calculate derivative on query result
derivative(M::cpu:(avg(usage)) [1h::5m] BY host)
// Recommended: Use Rollup approach
M::cpu:(deriv(usage)) [1h::5m:last] BY host
Moving Calculation¶
moving_average¶
Calculates the moving average on query results.
Recommended Usage
Prioritize using the Rollup approach moving_average(field, n). The outer function form is only used for secondary smoothing on aggregated results.
Syntax:
moving_average(DQL_expression, n)
Parameters:
- DQL_expression: DQL query expression
- n: Window size (number of data points)
Examples:
// Outer function: Calculate moving average on query result
moving_average(M::cpu:(avg(usage)) [1h::1m] BY host, 5)
// Recommended: Use Rollup approach
M::cpu:(moving_average(usage, 5)) [1h::1m] BY host
TopN¶
top / bottom¶
Recommended Usage
Prioritize using the Rollup approach top(field, n) or bottom(field, n). The outer function form is only used for secondary filtering on aggregated results.
Gets TopN or BottomN from query results.
Syntax:
top(DQL_expression, n)
bottom(DQL_expression, n)
Examples:
// Outer function: Get TopN from query result
top(M::response_time:(max(duration)) [1h::5m] BY service, 5)
// Recommended: Use Rollup approach
M::response_time:(top(duration, 5)) [1h::5m] BY service
Null Value Filling¶
fill¶
Fills null values in query results.
For a detailed explanation, refer to the fill function.
Examples:
// Fill nulls with 0
fill(M::cpu:(avg(usage)) [1h::5m] BY host, 0)
// Fill with linear interpolation
fill(M::cpu:(avg(usage)) [1h::5m] BY host, LINEAR)
Other Outer Functions¶
The following functions can also be used as outer functions:
| Function | Description |
|---|---|
| `abs(DQL)` | Absolute value |
| `round(DQL)` | Round |
| `ceil(DQL)` | Round up |
| `floor(DQL)` | Round down |
| `log(DQL)` / `log2(DQL)` / `log10(DQL)` | Logarithmic transformation |
| `set(DQL)` | Deduplicate and sort |
| `concat(DQL, ...)` | String concatenation |
Combined Usage¶
Outer functions can be combined:
// Calculate moving average then round
round(moving_average(M::cpu:(avg(usage)) [1h::1m] BY host, 5))
// Calculate moving average of growth rate
moving_average(rate(M::requests:(sum(count)) [1h::5m] BY service), 3)
eval Expression Calculation¶
eval is a special function that allows expression calculation outside the query, referencing results from multiple subqueries for combined operations.
Syntax¶
eval(expression, name1=query1, name2=query2, ..., alias="alias")
Parameters:
- expression: Mathematical expression, referencing subquery results using name.field
- name=query: Named subquery
- alias: Result alias (optional)
How It Works¶
- Execute all named subqueries
- Align results of each subquery by time
- Calculate the expression for each time point
- Return the calculation result
Use Cases¶
- Calculating ratios of multiple metrics (e.g., error rate, utilization)
- Comparing metrics across different time periods
- Combining calculation results from multiple data sources
Examples¶
Calculate Error Rate¶
// Error rate = error count / total requests * 100
eval(a / b * 100,
a=M::http:(sum(error_count)) [1h] BY service,
b=M::http:(sum(request_count)) [1h] BY service,
alias="error_rate")
Calculate CPU Usage¶
// Usage = used / total * 100
eval(used / total * 100,
used=M::memory:(sum(used_bytes)) [1h] BY host,
total=M::memory:(sum(total_bytes)) [1h] BY host,
alias="memory_usage_percent")
Calculate Year-over-Year Growth¶
// Compare this week with last week
eval(this_week / last_week - 1,
this_week=M::sales:(sum(amount)) [7d],
last_week=M::sales:(sum(amount)) [7d offset 7d],
alias="week_over_week_growth")
Reference Subquery Fields¶
// Reference specific fields of subqueries
eval(a.usage / b.total * 100,
a=M::cpu:(avg(usage)) [1h] BY host,
b=M::cpu:(avg(total)) [1h] BY host,
alias="cpu_percent")
Notes¶
- Time windows for all subqueries must be compatible
- Grouping dimensions of subqueries should be consistent
- Field names referenced in expressions use the name.field format
- If there is only one subquery, field names can be used directly
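As a hedged illustration of the single-subquery case (names are illustrative), the field can be referenced without the name. prefix:

```
// Single subquery: reference the field directly
eval(usage * 100,
     a=M::cpu:(avg(usage)) [1h] BY host,
     alias="usage_percent")
```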
Other Functions¶
fill¶
Fills specified values for null values in query results.
Recommended Usage
fill is recommended to be used as an outer function, operating on the entire query result:
Although fill(avg(usage), 0) is also supported in the Select clause, fill actually fills the results after aggregation is complete, so the outer function form better matches its working mechanism.
Syntax (Outer Function):
fill(DQL_expression, value)
Parameters:
- DQL_expression: DQL query expression
- value: Fill value, supports multiple modes:
  - Specific value: numeric, string, null
  - LINEAR: Linear interpolation
  - PREVIOUS: Fill with previous non-null value
Examples:
// Recommended: Use as outer function
fill(M::cpu:(avg(usage)) [1h::5m] BY host, 0)
// Fill with linear interpolation
fill(M::cpu:(avg(usage)) [1h::5m] BY host, LINEAR)
// Fill with previous value
fill(M::cpu:(avg(usage)) [1h::5m] BY host, PREVIOUS)
now¶
Returns the current timestamp (milliseconds).
Syntax:
now()
Examples:
// Query recently updated data
O::HOST:(*) {last_update_time > now() - 600000} // Updated within 10 minutes
unwrap¶
Unwraps the packaging of aggregated results.
Syntax:
Examples:
Show Functions¶
Show functions are used to view metadata (e.g., measurement, tag, field, cardinality and series count), often used for modeling troubleshooting and pre-query exploration.
General Syntax¶
- where, time_window, LIMIT, and OFFSET are all optional;
- LIMIT / OFFSET cannot be negative.
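A hedged sketch of the general shape, combining the optional clauses listed above (the measurement and tag names are illustrative):

```
// Optional time window and paging on a show query
show_tag_value(from=['cpu'], keyin=['host']) [1h] LIMIT 10 OFFSET 0
```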
M Namespace Built-in Show Functions¶
| Function | Parameters | Return Columns | Description |
|---|---|---|---|
| `show_measurement` | Optional `re('pattern')` | name | List measurements |
| `show_tag_key` | Optional `from=['measurement']` | tagKey | List tag keys |
| `show_field_key` | Optional `from=['measurement']` | fieldKey, fieldType | List field keys (current fieldType is float) |
| `show_tag_value` | `keyin=['tagKey']` (required), optional `from` | key, value | List tag values |
| `show_measurement_cardinality` | No mandatory parameters | count | Number of measurements |
| `show_series_cardinality` | No mandatory parameters | count | Series cardinality (estimated) |
| `show_tag_key_cardinality` | No mandatory parameters | count | Tag key cardinality (estimated) |
| `show_tag_value_cardinality` | `keyin=['tagKey']` (required) | count | Value cardinality for the specified tag key (estimated) |
| `show_field_key_cardinality` | No mandatory parameters | count | Field key cardinality (estimated) |
| `show_series_count_by_field_key` | `from=['measurement']` (recommended) | name, count | Count series by field key |
| `show_series_count_by_tag_key` | `from=['measurement']` (recommended) | name, count, value_count | Count series and values by tag key |
| `show_series_count_by_tag_value` | `keyin=['tagKey']` (required), `from=['measurement']` (recommended) | name, count | Count series by value of the specified tag key |
Cardinality-related functions use HyperLogLog merging internally and return estimated values.
Non-M Namespace Show Functions (Suffix Pattern)¶
For non-M namespaces, the following suffix patterns are supported:
- show_<namespace>_source
- show_<namespace>_class
- show_<namespace>_type
- show_<namespace>_field
- show_<namespace>_label
Where <namespace> is automatically mapped from the middle segment of the function name, for example:
- show_logging_source -> L
- show_tracing_field -> T
- show_object_source -> O
Common examples:
show_logging_source()
show_tracing_field('mysql')
show_logging_field('*')
show_logging_label(name='env')
show_logging_label(names=['env', 'team'])
Parameter and Behavior Notes¶
- from: List of measurements; supports a string or string array;
- keyin: List of tag keys; supports a string or string array;
- field: List of fields; supports a string or string array (for metric show field filtering);
- For show_*_field:
  - Unnamed parameters (e.g., 'mysql') are usually treated as source filters;
  - '*' is equivalent to not specifying a source;
  - Named parameters are converted to where filter conditions.
- For show_*_label:
  - Named parameters are required; names is treated as an alias for name.
Constraints and Notes¶
- show_tag_value and show_tag_value_cardinality must provide keyin;
- show_series_count_by_tag_value must provide keyin;
- show_series_count_by_* must provide from, or an equivalent source constraint in where (e.g., a @__source__ condition);
- show_<namespace>_source, show_<namespace>_class, and show_<namespace>_type currently share the same execution path, returning a deduplicated source list;
- The parser does not currently support the show_<namespace>_index syntax (even though the execution layer has corresponding branches);
- When the Query API does not provide a show time range, some log show queries fall back to the last 30-minute window.
Return Examples¶
The following examples only show typical column structure and sample rows; actual results are affected by tenant data, filter conditions, time range, and LIMIT/OFFSET.
| Query | Typical Columns | Sample Rows (Illustrative) |
|---|---|---|
| `show_measurement()` | name | cpu, disk, memory |
| `show_tag_value(from=['cpu'], keyin=['host'])` | key, value | host, web-01; host, web-02 |
| `show_series_count_by_tag_key(from=['cpu'])` | name, count, value_count | host, 3200, 120; service, 2800, 35 |
| `show_tag_value_cardinality(keyin=['host'])` | count | 120 |
| `show_logging_field('*')` | fieldKey, fieldType, fieldIndices | service, keyword, ["idx_service"] |
| `show_logging_source()` | source | nginx, mysql, redis |
Function Classification Quick Reference¶
Basic Aggregation¶
| Function | Description | Exact/Estimated |
|---|---|---|
| sum | Sum | Exact |
| avg | Average | Exact |
| count | Count | Exact |
| count_distinct | Distinct count | Estimated (HyperLogLog, error≈0.4%) |
| min / max | Minimum/Maximum | Exact |
| first / last | First/Last value | Exact |
| any | Any value | Exact |
Statistical Aggregation¶
| Function | Description | Exact/Estimated |
|---|---|---|
| percentile / pXX | Percentile | Estimated (log histogram) |
| median | Median | Estimated |
| stddev | Standard deviation | Exact |
| mode | Mode | Exact |
| spread | Range | Exact |
| count_series | Time series count | Exact |
Filter Aggregation¶
| Function | Description | Exact/Estimated |
|---|---|---|
| top / bottom | TopN / BottomN | Exact |
| count_filter | Conditional count | Exact |
Histogram Functions¶
| Function | Description | Data Source | Exact/Estimated |
|---|---|---|---|
| histogram_auto | Automatic histogram (Recommended) | Log, Trace detail data | Estimated |
| histogram | Fixed-boundary histogram (Deprecated) | Log, Trace detail data | Exact |
| histogram_quantile | Calculate quantile from Prometheus histogram | Prometheus metrics | Estimated |
Set Functions¶
| Function | Description | Exact/Estimated |
|---|---|---|
| distinct | Distinct value list | Exact |
| distinct_by_collapse | Collapse deduplication (retain other fields) | Exact |
| collect | Collect all values | Exact |
| collect_distinct | Collect distinct values | Exact |
Helper Functions¶
| Function | Description | Exact/Estimated |
|---|---|---|
| default | Set default value | Exact |
Time Series Functions¶
| Function | Description |
|---|---|
| rate | Growth rate (per second) |
| irate | Instantaneous growth rate |
| increase | Increase amount |
| derivative | Derivative |
| difference | Difference |
| non_negative_derivative | Non-negative derivative |
| non_negative_difference | Non-negative difference |
| moving_average | Moving average |
| cumsum | Cumulative sum |
Rollup Functions¶
| Function | Description |
|---|---|
| rate | Calculate growth rate (per second) |
| irate | Calculate instantaneous growth rate |
| increase | Calculate increase amount |
| rate_over_sum | Calculate average per second |
| rate_over_count | Calculate count per second |
| deriv | Calculate derivative |
| difference | Calculate difference |
| sum | Sum |
| avg | Average |
| min | Minimum |
| max | Maximum |
| count | Count |
| first | First value |
| last | Last value |
| stddev | Standard deviation |
| mode | Mode |
| spread | Range |
| any | Any value |
Transformation Functions¶
| Function | Description |
|---|---|
| abs | Absolute value |
| round / ceil / floor | Rounding |
| log / log2 / log10 | Logarithm |
| int / uint / float / string / bool | Type conversion |
| substr | Substring |
| regexp_extract | Regex extraction |
| regexp_extract_all | Regex extract all |
| md5 | MD5 hash |
| concat | String concatenation |
| set | Array deduplicate and sort |
| drain | Log clustering |
Matching Functions¶
| Function | Description |
|---|---|
| match | Substring match |
| phrase / search | Tokenized phrase match |
| re / regex / regexp | Regex match |
| wildcard | Wildcard match |
| cidr | CIDR range match |
| query_string | Query string syntax |
| exists | Field existence check |
Outer Functions (Prioritize Rollup)¶
| Function | Description |
|---|---|
| cumsum | Cumulative sum |
| rate / irate | Growth rate (non-negative) |
| deriv | Derivative (allows negative) |
| increase | Increase amount (non-negative) |
| difference | Difference (allows negative) |
| moving_average | Moving average |
| top / bottom | TopN |
| fill | Null fill (recommended as outer) |
| abs / round / ceil / floor | Mathematical operations |
| set | Deduplicate and sort |
| concat | String concatenation |
Expression Calculation¶
| Function | Description |
|---|---|
| eval | Multi-query expression calculation |