Service Data Source Definitions and DQL Queries
Service Performance Data Source Definition
The TM index space stores data related to service lists and performance metrics. The data on the APM - Metrics page is primarily queried from this index space. TM aggregates service metric data at three granularities (minute, hour, and day) to improve query efficiency.
For example, to query all service metric data within the 15-minute period from 2024-03-19 15:00:00 to 2024-03-19 15:15:00, you can use DQL:
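A minimal sketch of how such a per-minute query could be assembled. The query shape mirrors the combined-granularity query shown later in this section. The UTC+8 offset is an assumption, inferred from the literal timestamps in that query, and the half-open upper bound (`<`) is a choice made for this sketch. The helper names `to_epoch_ms` and `minute_query` are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: wall-clock times in this section are expressed in UTC+8.
TZ = timezone(timedelta(hours=8))


def to_epoch_ms(text: str) -> int:
    """Convert 'YYYY-MM-DD HH:MM:SS' (interpreted in TZ) to epoch milliseconds."""
    local = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=TZ)
    return int(local.timestamp()) * 1000


def minute_query(start: str, end: str) -> str:
    """Build a per-minute TM query over the half-open range [start, end)."""
    return (
        'TM::`*`:(){ source="service_list_1m" '
        f"and date >= {to_epoch_ms(start)} and date < {to_epoch_ms(end)} }}"
    )


print(minute_query("2024-03-19 15:00:00", "2024-03-19 15:15:00"))
```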
This will return results similar to the following:
Sample Query Result
[
{
"time": 1710835681000,
"time_us": 1710835681000000,
"__docid": "T_cnskg71jdosvib6m44s0",
"__source": "service_list_1m",
"source": "service_list_1m",
"__namespace": "tracing",
"r_env": "demo",
"r_error_count": 0,
"r_max_duration": 2293857,
"r_psketch": "Av1KgVq/UvA/AAAAAAAAAAAJAbgL",
"r_request_count": 1,
"r_resp_time": 2293857,
"r_service": "go-profiling-demo-1",
"r_service_sub": "go-profiling-demo-1:demo:v0.8.888",
"r_type": "custom",
"r_version": "v0.8.888",
"create_time": 1710835740447,
"date": 1710835681000,
"date_ns": 0
},
{
"time": 1710835201000,
"time_us": 1710835201000000,
"__docid": "T_cnskcf9jdosvib6jl4kg",
"__source": "service_list_1m",
"source": "service_list_1m",
"__namespace": "tracing",
"r_env": "demo",
"r_error_count": 0,
"r_max_duration": 2370648,
"r_psketch": "Av1KgVq/UvA/AAAAAAAAAAAJAboL",
"r_request_count": 1,
"r_resp_time": 2370648,
"r_service": "go-profiling-demo-1",
"r_service_sub": "go-profiling-demo-1:demo:v0.8.888",
"r_type": "custom",
"r_version": "v0.8.888",
"create_time": 1710835261477,
"date": 1710835201000,
"date_ns": 0
}
]
The main field descriptions are as follows:
Field | Type | Description |
---|---|---|
source | string | Aggregation granularity of the data: per minute (source="service_list_1m"), per hour (source="service_list_1h"), or per day (source="service_list_1d") |
r_env | string | Deployment environment of the service |
r_error_count | int | Number of service errors |
r_max_duration | int | Maximum response time within the time granularity, unit: microseconds |
r_request_count | int | Number of requests |
r_resp_time | int | Sum of response times within the time granularity, unit: microseconds |
r_service | string | Service name |
r_service_sub | string | <Service Name>:<Deployment Environment>:<Service Version> |
r_type | string | Service type, for example http, web, db, or gateway |
r_version | string | Service version |
date | int | Millisecond timestamp of the start of the aggregation bucket: zero seconds of the minute (hh:mm:00) when source="service_list_1m", zero minutes of the hour (hh:00:00) when source="service_list_1h", and midnight (00:00:00) when source="service_list_1d" |
Similarly, to query data for the two hours from 2024-03-19 15:00:00 to 2024-03-19 17:00:00, you can use DQL:
To query data for the two days from 2024-03-19 00:00:00 to 2024-03-21 00:00:00, you can use DQL:
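The hourly and daily cases differ from the per-minute case only in the source value and in where the range boundaries must fall. The sketch below builds both queries and rejects bounds that are not aligned to the chosen granularity. It assumes UTC+8 (inferred from the timestamps used in this section) and that daily buckets start at local midnight. `ZERO_FIELDS`, `epoch_ms`, and `tm_query` are hypothetical names for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: wall-clock times in this section are expressed in UTC+8.
TZ = timezone(timedelta(hours=8))

# Datetime fields that must be zero on a bucket boundary, per source value.
ZERO_FIELDS = {
    "service_list_1m": ("second", "microsecond"),
    "service_list_1h": ("minute", "second", "microsecond"),
    "service_list_1d": ("hour", "minute", "second", "microsecond"),
}


def epoch_ms(moment: datetime) -> int:
    """Epoch milliseconds for a timezone-aware datetime."""
    return int(moment.timestamp()) * 1000


def tm_query(source: str, start: datetime, end: datetime) -> str:
    """Build a single-granularity TM query over the half-open range [start, end)."""
    for bound in (start, end):
        if any(getattr(bound, name) for name in ZERO_FIELDS[source]):
            raise ValueError(f"{bound} is not aligned to {source} buckets")
    return (
        f'TM::`*`:(){{ source="{source}" '
        f"and date >= {epoch_ms(start)} and date < {epoch_ms(end)} }}"
    )


two_hours = tm_query(
    "service_list_1h",
    datetime(2024, 3, 19, 15, tzinfo=TZ),
    datetime(2024, 3, 19, 17, tzinfo=TZ),
)
two_days = tm_query(
    "service_list_1d",
    datetime(2024, 3, 19, tzinfo=TZ),
    datetime(2024, 3, 21, tzinfo=TZ),
)
print(two_hours)
print(two_days)
```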
Different time granularities can be combined. For example, to query data for the two and a half hours from 2024-03-19 15:00:00 to 2024-03-19 17:30:00, you can use DQL:
TM::`*`:(){ (source="service_list_1h" and date >= 1710831600000 and date < 1710838800000) or (source="service_list_1m" and date >= 1710838800000 and date <= 1710840600000) }
Note
You can also use the minute granularity source="service_list_1m" to query data across hours or days, such as from 2024-03-19 15:00:00 to 2024-03-19 17:30:00. However, this significantly reduces query efficiency and increases the amount of returned data, so it is not recommended.
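The combined query can be generated by splitting the range at the last full hour: whole hours are read from the hourly source and the remainder from the per-minute source. The sketch below reproduces the combined query shown above for the same range, keeping its inclusive upper bound (`<=`) on the minute clause. It assumes UTC+8 and a start time that falls on an hour boundary. `combined_query` is an illustrative name, and ranges that end exactly on the hour or span less than one hour are not specially handled.

```python
from datetime import datetime, timedelta, timezone

# Assumption: wall-clock times in this section are expressed in UTC+8.
TZ = timezone(timedelta(hours=8))


def epoch_ms(moment: datetime) -> int:
    """Epoch milliseconds for a timezone-aware datetime."""
    return int(moment.timestamp()) * 1000


def combined_query(start: datetime, end: datetime) -> str:
    """Whole hours come from service_list_1h, the trailing minutes from service_list_1m."""
    if start.minute or start.second or start.microsecond:
        raise ValueError("start must fall on an hour boundary")
    # Split point: the last hour boundary at or before `end`.
    last_hour = end.replace(minute=0, second=0, microsecond=0)
    hours = (
        f'(source="service_list_1h" and date >= {epoch_ms(start)} '
        f"and date < {epoch_ms(last_hour)})"
    )
    # The upper bound is inclusive here, matching the query in the text.
    minutes = (
        f'(source="service_list_1m" and date >= {epoch_ms(last_hour)} '
        f"and date <= {epoch_ms(end)})"
    )
    return f"TM::`*`:(){{ {hours} or {minutes} }}"


print(combined_query(
    datetime(2024, 3, 19, 15, tzinfo=TZ),
    datetime(2024, 3, 19, 17, 30, tzinfo=TZ),
))
```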
Service-level metrics can be calculated by further processing the query results, for example:
total_count = SUM(r_request_count)
error_count = SUM(r_error_count)
error_rate = SUM(r_error_count) / SUM(r_request_count)
max_duration = MAX(r_max_duration)
sum_resp_time = SUM(r_resp_time)
avg_per_second = SUM(r_request_count) / <query time range in seconds>
avg_resp_time = SUM(r_resp_time) / SUM(r_request_count)
p50: build an array in which each row's average response time (r_resp_time / r_request_count) is repeated r_request_count times, sort the array, and take the element at index SUM(r_request_count) * 0.5. This approximates the median, because every request is represented by the average of its aggregation bucket.
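The formulas above can be sketched client-side as follows. The function works on rows shaped like the sample query result (dictionaries with the `r_*` fields). `service_metrics` and `range_seconds` are hypothetical names, and truncating the p50 index to an integer is an assumption of this sketch.

```python
from typing import Dict, List


def service_metrics(rows: List[dict], range_seconds: float) -> Dict[str, float]:
    """Compute service-level metrics from TM rows covering `range_seconds`."""
    total = sum(r["r_request_count"] for r in rows)
    errors = sum(r["r_error_count"] for r in rows)
    resp_sum = sum(r["r_resp_time"] for r in rows)

    # p50: repeat each row's average response time once per request, then sort.
    samples: List[float] = []
    for r in rows:
        count = r["r_request_count"]
        if count:
            samples.extend([r["r_resp_time"] / count] * count)
    samples.sort()
    # Assumption: the fractional index is truncated to an integer.
    p50 = samples[int(total * 0.5)] if samples else 0.0

    return {
        "total_count": total,
        "error_count": errors,
        "error_rate": errors / total if total else 0.0,
        "max_duration": max((r["r_max_duration"] for r in rows), default=0),
        "sum_resp_time": resp_sum,
        "avg_per_second": total / range_seconds,
        "avg_resp_time": resp_sum / total if total else 0.0,
        "p50": p50,
    }


# The two rows from the sample query result, reduced to the fields used here.
rows = [
    {"r_request_count": 1, "r_error_count": 0,
     "r_resp_time": 2293857, "r_max_duration": 2293857},
    {"r_request_count": 1, "r_error_count": 0,
     "r_resp_time": 2370648, "r_max_duration": 2370648},
]
metrics = service_metrics(rows, range_seconds=15 * 60)
print(metrics)
```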
For example, to query the QPS (queries per second) for each service over a time range, divide the total request count by the length of the range in milliseconds and multiply by 1000:
TM::`*`:(r_service, sum(r_request_count) / (1737099000000 - 1737093600000) * 1000 as QPS){ (source="service_list_1h" and date >= 1737093600000 and date < 1737097200000) or (source="service_list_1m" and date >= 1737097200000 and date <= 1737099000000) } by r_service
Service Topology Data Source Definition
The TSM index space primarily stores service call topology relationship data, which is pre-aggregated on a per-minute basis. For example, to query all service call relationships within the 15-minute period from 2024-03-19 15:00:00 to 2024-03-19 15:15:00, you can use DQL:
This returns query results similar to the following:
Sample Query Result
[
{
"time": 1710835700438,
"time_us": 1710835700438064,
"__docid": "6340252d-331c-6e1dd9338-0ed04e818c4d",
"__source": "relationship",
"source_service": "go-profiling-demo-1",
"source_wsuuid": "wksp_8d351d83bdf14b8b8270ab75fe29a990",
"source_env": "demo",
"source_project": "",
"source_version": "v0.8.888",
"source_type": "custom",
"source_organization": "",
"source_status": "ok",
"source_start": 1710835700059433,
"source_duration": 220210272,
"target_service": "go-profiling-demo-2",
"target_wsuuid": "wksp_8d351d83bdf14b8b8270ab75fe29a990",
"target_env": "demo",
"target_project": "",
"target_version": "v0.8.888",
"target_type": "custom",
"target_organization": "",
"target_status": "ok",
"target_start": 1710835700438064,
"target_duration": 886040,
"count": 96,
"unique_id": "XTzHH-jScNjXgBSXNIdFcSOVpHwWKyAZroh71ttyPnXK9nl3jW0re0hlKOeHj6PYgo-profiling-demo-1go-profiling-demo-2",
"unique_id_env_version": "Zp-9KKvEb4m9aU0OeUMon8MiH2isxqXU742YFlVtokgL2Vy73NwtykkG3vA3X0z1go-profiling-demo-1go-profiling-demo-2",
"error_count": 0
},
{
"time": 1710835243699,
"time_us": 1710835243699376,
"__docid": "f4c7de67-ca1b-6e17fd78e-e8bcb4d1d64d",
"__source": "relationship",
"source_service": "go-profiling-demo-1",
"source_wsuuid": "wksp_8d351d83bdf14b8b8270ab75fe29a990",
"source_env": "demo",
"source_project": "",
"source_version": "v0.8.888",
"source_type": "custom",
"source_organization": "",
"source_status": "ok",
"source_start": 1710835243379392,
"source_duration": 227582208,
"target_service": "go-profiling-demo-2",
"target_wsuuid": "wksp_8d351d83bdf14b8b8270ab75fe29a990",
"target_env": "demo",
"target_project": "",
"target_version": "v0.8.888",
"target_type": "custom",
"target_organization": "",
"target_status": "ok",
"target_start": 1710835243699376,
"target_duration": 856217,
"count": 96,
"unique_id": "XTzHH-jScNjXgBSXNIdFcSOVpHwWKyAZroh71ttyPnXK9nl3jW0re0hlKOeHj6PYgo-profiling-demo-1go-profiling-demo-2",
"unique_id_env_version": "Zp-9KKvEb4m9aU0OeUMon8MiH2isxqXU742YFlVtokgL2Vy73NwtykkG3vA3X0z1go-profiling-demo-1go-profiling-demo-2",
"error_count": 0
}
]
Main field descriptions are as follows:
Field | Type | Description |
---|---|---|
time | int | Millisecond timestamp, the time when the service call occurred |
time_us | int | Time when the service call occurred, in microseconds |
source_service | string | Name of the calling service |
source_wsuuid | string | ID of the workspace where the calling service reports |
source_env | string | Deployment environment of the calling service |
source_project | string | Project name of the calling service |
source_version | string | Version of the calling service |
source_type | string | Type of the calling service |
source_organization | string | Organization where the calling service's reporting workspace is located |
source_status | string | Status of the calling service, ok/error |
source_start | int | Start time of the calling service span, in microseconds |
source_duration | int | Sum of durations of the calling service spans within each minute, unit: microseconds |
target_service | string | Name of the called service |
target_wsuuid | string | ID of the workspace where the called service reports |
target_env | string | Deployment environment of the called service |
target_project | string | Project name of the called service |
target_version | string | Version of the called service |
target_type | string | Type of the called service |
target_organization | string | Organization where the called service's reporting workspace is located |
target_status | string | Status of the called service, ok/error |
target_start | int | Start time of the called service span (time when the service call occurred), in microseconds |
target_duration | int | Sum of durations of the called service spans within each minute, unit: microseconds |
count | int | Number of calls within each minute |
unique_id | string | Unique ID generated from only the calling service name and the called service name |
unique_id_env_version | string | Unique ID generated from the service, environment, and version of both the calling service and the called service |
error_count | int | Number of failed calls within each minute |
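Because each TSM row holds per-minute sums, call-level metrics for a service pair can be derived by summing rows and dividing by the call count. The sketch below groups rows by (source_service, target_service). It assumes that target_duration divided by count gives the average duration of the called spans; `edge_metrics` and the output key names are illustrative and not part of the data source.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SUMMED_FIELDS = ("count", "error_count", "target_duration")


def edge_metrics(rows: List[dict]) -> Dict[Tuple[str, str], dict]:
    """Aggregate per-minute TSM rows into one metrics record per call edge."""
    totals = defaultdict(lambda: dict.fromkeys(SUMMED_FIELDS, 0))
    for row in rows:
        edge = (row["source_service"], row["target_service"])
        for field in SUMMED_FIELDS:
            totals[edge][field] += row[field]

    result = {}
    for edge, summed in totals.items():
        calls = summed["count"]
        result[edge] = {
            "count": calls,
            "error_rate": summed["error_count"] / calls if calls else 0.0,
            # Assumption: summed duration / calls is the mean called-span duration.
            "avg_target_duration_us": summed["target_duration"] / calls if calls else 0.0,
        }
    return result


# The two rows from the sample query result, reduced to the fields used here.
rows = [
    {"source_service": "go-profiling-demo-1", "target_service": "go-profiling-demo-2",
     "count": 96, "error_count": 0, "target_duration": 886040},
    {"source_service": "go-profiling-demo-1", "target_service": "go-profiling-demo-2",
     "count": 96, "error_count": 0, "target_duration": 856217},
]
edges = edge_metrics(rows)
print(edges)
```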
To query the call relationships between services and their call-level metrics, you can use the following DQL: