Service Data Source Definitions and DQL Queries
Service Performance Data Source Definitions
The TM index space stores data related to service lists and performance Metrics. The data displayed on APM > Performance Metrics is primarily queried from this index space. To improve query efficiency, TM aggregates the Metrics data of each service at three granularities: minute, hour, and day.
For example, to query all service Metrics data for a 15-minute period from 2024-03-19 15:00:00 to 2024-03-19 15:15:00, you can use the DQL:
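The query text is not reproduced here, so the following Python sketch only illustrates how such a filter could be assembled. It assumes that the wall-clock times in this section are UTC+8 (the offset implied by the timestamps in the combined-granularity query further down), that the range is half-open (`date >= start and date < end`), and that an empty field list `()` returns all fields. The helper names `to_ms` and `build_tm_query` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the times quoted in this document are UTC+8 wall-clock times.
TZ = timezone(timedelta(hours=8))


def to_ms(local_time: str) -> int:
    """Convert 'YYYY-MM-DD HH:MM:SS' (UTC+8) to a millisecond epoch timestamp."""
    dt = datetime.strptime(local_time, "%Y-%m-%d %H:%M:%S").replace(tzinfo=TZ)
    return int(dt.timestamp() * 1000)


def build_tm_query(source: str, start: str, end: str) -> str:
    """Build a TM query for one granularity over the half-open range [start, end)."""
    return (
        "TM::`*`:(){ "
        f'source="{source}" and date >= {to_ms(start)} and date < {to_ms(end)}'
        " }"
    )


# Minute-granularity query for the 15-minute window described above.
print(build_tm_query("service_list_1m", "2024-03-19 15:00:00", "2024-03-19 15:15:00"))
```

The printed string is the DQL to run; passing `service_list_1h` or `service_list_1d` as `source` gives the coarser granularities.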
This will return results similar to the following:
Query Result Example
[
{
"time": 1710835681000,
"time_us": 1710835681000000,
"__docid": "T_cnskg71jdosvib6m44s0",
"__source": "service_list_1m",
"source": "service_list_1m",
"__namespace": "tracing",
"r_env": "demo",
"r_error_count": 0,
"r_max_duration": 2293857,
"r_psketch": "Av1KgVq/UvA/AAAAAAAAAAAJAbgL",
"r_request_count": 1,
"r_resp_time": 2293857,
"r_service": "go-profiling-demo-1",
"r_service_sub": "go-profiling-demo-1:demo:v0.8.888",
"r_type": "custom",
"r_version": "v0.8.888",
"create_time": 1710835740447,
"date": 1710835681000,
"date_ns": 0
},
{
"time": 1710835201000,
"time_us": 1710835201000000,
"__docid": "T_cnskcf9jdosvib6jl4kg",
"__source": "service_list_1m",
"source": "service_list_1m",
"__namespace": "tracing",
"r_env": "demo",
"r_error_count": 0,
"r_max_duration": 2370648,
"r_psketch": "Av1KgVq/UvA/AAAAAAAAAAAJAboL",
"r_request_count": 1,
"r_resp_time": 2370648,
"r_service": "go-profiling-demo-1",
"r_service_sub": "go-profiling-demo-1:demo:v0.8.888",
"r_type": "custom",
"r_version": "v0.8.888",
"create_time": 1710835261477,
"date": 1710835201000,
"date_ns": 0
}
]
The main fields are described below:
| Field | Type | Description |
|---|---|---|
| source | string | Data aggregation granularity: minute (source="service_list_1m"), hour (source="service_list_1h"), or day (source="service_list_1d") |
| r_env | string | Service deployment environment |
| r_error_count | int | Number of service errors |
| r_max_duration | int | Maximum response time within the time granularity, unit: microseconds |
| r_request_count | int | Number of requests |
| r_resp_time | int | Sum of response times aggregated within the time granularity |
| r_service | string | Service name |
| r_service_sub | string | |
| r_type | string | Service type, e.g., http/web/db/gateway... |
| r_version | string | Service version |
| date | int | Millisecond timestamp of the aggregation bucket: hh:mm:00 when source="service_list_1m", hh:00:00 when source="service_list_1h", 00:00:00 when source="service_list_1d" |
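Similarly, to query data for a two-hour period from 2024-03-19 15:00:00 to 2024-03-19 17:00:00, filter on the hour granularity (source="service_list_1h") with the matching date bounds.
To query data for a two-day period from 2024-03-19 00:00:00 to 2024-03-21 00:00:00, filter on the day granularity (source="service_list_1d") in the same way.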
Different time granularities can be combined. For example, to query data for a two-and-a-half-hour period from 2024-03-19 15:00:00 to 2024-03-19 17:30:00, you can use the DQL:
TM::`*`:(){ (source="service_list_1h" and date >= 1710831600000 and date < 1710838800000) or (source="service_list_1m" and date >= 1710838800000 and date <= 1710840600000) }
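The split between granularities can be computed rather than written by hand. The sketch below is one possible approach, not the product's own logic: it covers the hour-aligned middle of a range with `service_list_1h` and any leftover minutes at either end with `service_list_1m`. It assumes minute-aligned bounds, a time zone with a whole-hour UTC offset (so hour boundaries fall on multiples of 3,600,000 ms), and half-open segments, so the last bound uses `<` where the query above uses `<=`. The function names are hypothetical.

```python
HOUR_MS = 3_600_000
MINUTE_MS = 60_000


def split_range(start_ms: int, end_ms: int):
    """Split [start_ms, end_ms) into (source, lo, hi) segments, coarsest where possible."""
    if start_ms % MINUTE_MS or end_ms % MINUTE_MS:
        raise ValueError("bounds must be minute-aligned")
    if start_ms >= end_ms:
        return []
    first_hour = (start_ms + HOUR_MS - 1) // HOUR_MS * HOUR_MS  # round up to an hour
    last_hour = end_ms // HOUR_MS * HOUR_MS                     # round down to an hour
    if first_hour >= last_hour:
        # No complete hour inside the range: minute granularity only.
        return [("service_list_1m", start_ms, end_ms)]
    segments = []
    if start_ms < first_hour:
        segments.append(("service_list_1m", start_ms, first_hour))
    segments.append(("service_list_1h", first_hour, last_hour))
    if last_hour < end_ms:
        segments.append(("service_list_1m", last_hour, end_ms))
    return segments


def to_filter(segments) -> str:
    """Join the segments into a DQL filter expression."""
    return " or ".join(
        f'(source="{src}" and date >= {lo} and date < {hi})' for src, lo, hi in segments
    )


# 2024-03-19 15:00:00 to 17:30:00 (UTC+8), as in the query above.
print(to_filter(split_range(1710831600000, 1710840600000)))
```

Day-level segments are left out on purpose, because day boundaries depend on the time zone used for aggregation, which is not stated here.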
Note
You could also use only the minute granularity (source="service_list_1m") to query the entire range from 2024-03-19 15:00:00 to 2024-03-19 17:30:00, or any range that spans hours or days. However, this significantly reduces query efficiency and greatly increases the amount of data returned (up to 60 times as many rows as the hour granularity for the same hour), so it is strongly discouraged.
Further processing the query results allows calculation of relevant service-level Metrics, for example:
total_count = SUM(r_request_count)
error_count = SUM(r_error_count)
error_rate = SUM(r_error_count) / SUM(r_request_count)
max_duration = MAX(r_max_duration)
sum_resp_time = SUM(r_resp_time)
avg_per_second = SUM(r_request_count) / <query time range in seconds>
avg_resp_time = SUM(r_resp_time) / SUM(r_request_count)
p50 (approximate): build an array in which each row's average response time (r_resp_time / r_request_count) is repeated r_request_count times, sort the array, and take the element at index SUM(r_request_count) * 0.5
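The formulas above can be applied to the rows returned by a TM query. The following Python sketch is an illustration only: the function name `service_metrics` and the `range_seconds` parameter are hypothetical, and the p50 value is the approximation described above (built from per-row averages), not an exact percentile.

```python
def service_metrics(rows, range_seconds):
    """Compute service-level Metrics from TM rows (one dict per returned row)."""
    total = sum(r["r_request_count"] for r in rows)
    errors = sum(r["r_error_count"] for r in rows)
    resp = sum(r["r_resp_time"] for r in rows)

    # Approximate p50: repeat each row's average response time once per request.
    samples = []
    for r in rows:
        n = r["r_request_count"]
        if n > 0:
            samples.extend([r["r_resp_time"] / n] * n)
    samples.sort()

    return {
        "total_count": total,
        "error_count": errors,
        "error_rate": errors / total if total else 0.0,
        "max_duration": max((r["r_max_duration"] for r in rows), default=0),
        "sum_resp_time": resp,
        "avg_per_second": total / range_seconds,
        "avg_resp_time": resp / total if total else 0.0,
        "p50": samples[int(total * 0.5)] if samples else None,
    }


# The two rows from the query result example above, reduced to the fields used.
rows = [
    {"r_request_count": 1, "r_error_count": 0,
     "r_max_duration": 2293857, "r_resp_time": 2293857},
    {"r_request_count": 1, "r_error_count": 0,
     "r_max_duration": 2370648, "r_resp_time": 2370648},
]
metrics = service_metrics(rows, range_seconds=15 * 60)
print(metrics)
```

Expanding the array is fine for a sketch but grows with the request count; a cumulative count over rows sorted by average would give the same element without materializing it.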
For example, to query the QPS (queries per second) for each service over a period of time:
TM::`*`:(r_service, sum(r_request_count) / (1737099000000 - 1737093600000) * 1000 as QPS){ (source="service_list_1h" and date >= 1737093600000 and date < 1737097200000) or (source="service_list_1m" and date >= 1737097200000 and date <= 1737099000000) } by r_service
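The arithmetic in this query is worth spelling out: the bounds are millisecond timestamps, so 1737099000000 - 1737093600000 = 5,400,000 ms (1.5 hours), and multiplying by 1000 converts requests per millisecond into requests per second. The Python sketch below reproduces the same grouping on rows that have already been fetched; the function name and the input rows are made-up illustrative values, not real query output.

```python
from collections import defaultdict


def qps_by_service(rows, start_ms, end_ms):
    """Sum r_request_count per r_service and divide by the range length in seconds."""
    counts = defaultdict(int)
    for r in rows:
        counts[r["r_service"]] += r["r_request_count"]
    seconds = (end_ms - start_ms) / 1000
    return {service: n / seconds for service, n in counts.items()}


# Hypothetical rows: one hour-level and one minute-level row for service "a".
rows = [
    {"r_service": "a", "r_request_count": 3600},
    {"r_service": "a", "r_request_count": 1800},
    {"r_service": "b", "r_request_count": 2700},
]
qps = qps_by_service(rows, 1737093600000, 1737099000000)
print(qps)
```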
Service Topology Data Source Definitions
The TSM index space primarily stores data on the calling relationships between services, pre-aggregated at a minute granularity. For example, to query all service call relationships for a 15-minute period from 2024-03-19 15:00:00 to 2024-03-19 15:15:00, you can use the DQL:
This returns query results similar to the following:
Query Result Example
[
{
"time": 1710835700438,
"time_us": 1710835700438064,
"__docid": "6340252d-331c-6e1dd9338-0ed04e818c4d",
"__source": "relationship",
"source_service": "go-profiling-demo-1",
"source_wsuuid": "wksp_8d351d83bdf14b8b8270ab75fe29a990",
"source_env": "demo",
"source_project": "",
"source_version": "v0.8.888",
"source_type": "custom",
"source_organization": "",
"source_status": "ok",
"source_start": 1710835700059433,
"source_duration": 220210272,
"target_service": "go-profiling-demo-2",
"target_wsuuid": "wksp_8d351d83bdf14b8b8270ab75fe29a990",
"target_env": "demo",
"target_project": "",
"target_version": "v0.8.888",
"target_type": "custom",
"target_organization": "",
"target_status": "ok",
"target_start": 1710835700438064,
"target_duration": 886040,
"count": 96,
"unique_id": "XTzHH-jScNjXgBSXNIdFcSOVpHwWKyAZroh71ttyPnXK9nl3jW0re0hlKOeHj6PYgo-profiling-demo-1go-profiling-demo-2",
"unique_id_env_version": "Zp-9KKvEb4m9aU0OeUMon8MiH2isxqXU742YFlVtokgL2Vy73NwtykkG3vA3X0z1go-profiling-demo-1go-profiling-demo-2",
"error_count": 0
},
{
"time": 1710835243699,
"time_us": 1710835243699376,
"__docid": "f4c7de67-ca1b-6e17fd78e-e8bcb4d1d64d",
"__source": "relationship",
"source_service": "go-profiling-demo-1",
"source_wsuuid": "wksp_8d351d83bdf14b8b8270ab75fe29a990",
"source_env": "demo",
"source_project": "",
"source_version": "v0.8.888",
"source_type": "custom",
"source_organization": "",
"source_status": "ok",
"source_start": 1710835243379392,
"source_duration": 227582208,
"target_service": "go-profiling-demo-2",
"target_wsuuid": "wksp_8d351d83bdf14b8b8270ab75fe29a990",
"target_env": "demo",
"target_project": "",
"target_version": "v0.8.888",
"target_type": "custom",
"target_organization": "",
"target_status": "ok",
"target_start": 1710835243699376,
"target_duration": 856217,
"count": 96,
"unique_id": "XTzHH-jScNjXgBSXNIdFcSOVpHwWKyAZroh71ttyPnXK9nl3jW0re0hlKOeHj6PYgo-profiling-demo-1go-profiling-demo-2",
"unique_id_env_version": "Zp-9KKvEb4m9aU0OeUMon8MiH2isxqXU742YFlVtokgL2Vy73NwtykkG3vA3X0z1go-profiling-demo-1go-profiling-demo-2",
"error_count": 0
}
]
The main fields are described below:
| Field | Type | Description |
|---|---|---|
| time | int | Time the service call occurred, millisecond timestamp |
| time_us | int | Time the service call occurred, microsecond precision |
| source_service | string | Name of the calling service |
| source_wsuuid | string | ID of the workspace that the calling service reports to |
| source_env | string | Deployment environment of the calling service |
| source_project | string | Project name of the calling service |
| source_version | string | Version of the calling service |
| source_type | string | Type of the calling service |
| source_organization | string | Organization to which the workspace of the calling service belongs |
| source_status | string | Status of the calling service, ok/error |
| source_start | int | Start time of the calling service's Span, microsecond timestamp |
| source_duration | int | Sum of calling service Span durations per minute, unit: microseconds |
| target_service | string | Name of the called service |
| target_wsuuid | string | ID of the workspace that the called service reports to |
| target_env | string | Deployment environment of the called service |
| target_project | string | Project name of the called service |
| target_version | string | Version of the called service |
| target_type | string | Type of the called service |
| target_organization | string | Organization to which the workspace of the called service belongs |
| target_status | string | Status of the called service, ok/error |
| target_start | int | Start time of the called service's Span (the time the service call occurred), microsecond timestamp |
| target_duration | int | Sum of called service Span durations per minute, unit: microseconds |
| count | int | Number of calls per minute |
| unique_id | string | Unique ID generated only from the calling service name and the called service name |
| unique_id_env_version | string | Unique ID generated from the service name, environment, and version of both the calling and the called service |
| error_count | int | Number of failed calls per minute |
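Because each TSM row is already a per-minute aggregate, building a topology edge for a longer window means summing rows that share the same caller and callee. The following Python sketch shows one way to do this on fetched rows. The function name `aggregate_edges` is hypothetical, and the average duration assumes that target_duration divided by count gives a per-call average, which follows from the field descriptions above but is not stated outright.

```python
from collections import defaultdict


def aggregate_edges(rows):
    """Merge per-minute TSM rows into one edge per (source_service, target_service)."""
    sums = defaultdict(lambda: {"count": 0, "error_count": 0, "target_duration": 0})
    for r in rows:
        edge = sums[(r["source_service"], r["target_service"])]
        edge["count"] += r["count"]
        edge["error_count"] += r["error_count"]
        edge["target_duration"] += r["target_duration"]

    edges = {}
    for key, e in sums.items():
        calls = e["count"]
        edges[key] = {
            "count": calls,
            "error_count": e["error_count"],
            "error_rate": e["error_count"] / calls if calls else 0.0,
            # Assumption: summed Span duration / call count = average per call (µs).
            "avg_target_duration_us": e["target_duration"] / calls if calls else 0.0,
        }
    return edges


# The two rows from the query result example above, reduced to the fields used.
rows = [
    {"source_service": "go-profiling-demo-1", "target_service": "go-profiling-demo-2",
     "count": 96, "error_count": 0, "target_duration": 886040},
    {"source_service": "go-profiling-demo-1", "target_service": "go-profiling-demo-2",
     "count": 96, "error_count": 0, "target_duration": 856217},
]
edges = aggregate_edges(rows)
print(edges)
```

Grouping on unique_id_env_version instead of the service names would keep separate edges per environment and version.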
To derive service call relationships and call-level Metrics, you can use the following DQL to query the relationships between services: