
Service Data Source Definitions and DQL Queries

Service Performance Data Source Definition

The TM index space stores service list and service performance metric data, and most of the data on the APM - Metrics page is queried from it. To improve query efficiency, TM pre-aggregates service metrics at three granularities: per minute, per hour, and per day.

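For example, to query all service metric data for the 15 minutes from 2024-03-19 15:00:00 to 2024-03-19 15:15:00, you can use the following DQL (the millisecond timestamps in the examples on this page correspond to UTC+8 wall-clock times):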

TM::`*`:(){source="service_list_1m"} [1710831600000:1710832500000]

This will return results similar to the following:

[
  {
    "time": 1710835681000,
    "time_us": 1710835681000000,
    "__docid": "T_cnskg71jdosvib6m44s0",
    "__source": "service_list_1m",
    "source": "service_list_1m",
    "__namespace": "tracing",
    "r_env": "demo",
    "r_error_count": 0,
    "r_max_duration": 2293857,
    "r_psketch": "Av1KgVq/UvA/AAAAAAAAAAAJAbgL",
    "r_request_count": 1,
    "r_resp_time": 2293857,
    "r_service": "go-profiling-demo-1",
    "r_service_sub": "go-profiling-demo-1:demo:v0.8.888",
    "r_type": "custom",
    "r_version": "v0.8.888",
    "create_time": 1710835740447,
    "date": 1710835681000,
    "date_ns": 0
  },
  {
    "time": 1710835201000,
    "time_us": 1710835201000000,
    "__docid": "T_cnskcf9jdosvib6jl4kg",
    "__source": "service_list_1m",
    "source": "service_list_1m",
    "__namespace": "tracing",
    "r_env": "demo",
    "r_error_count": 0,
    "r_max_duration": 2370648,
    "r_psketch": "Av1KgVq/UvA/AAAAAAAAAAAJAboL",
    "r_request_count": 1,
    "r_resp_time": 2370648,
    "r_service": "go-profiling-demo-1",
    "r_service_sub": "go-profiling-demo-1:demo:v0.8.888",
    "r_type": "custom",
    "r_version": "v0.8.888",
    "create_time": 1710835261477,
    "date": 1710835201000,
    "date_ns": 0
  }
]

The main fields are described below:

| Field | Type | Description |
| --- | --- | --- |
| source | string | Aggregation granularity of the data: per minute (source="service_list_1m"), per hour (source="service_list_1h"), or per day (source="service_list_1d") |
| r_env | string | Deployment environment of the service |
| r_error_count | int | Number of errors |
| r_max_duration | int | Maximum response time within the time bucket, in microseconds |
| r_request_count | int | Number of requests |
| r_resp_time | int | Sum of response times within the time bucket, in microseconds |
| r_service | string | Service name |
| r_service_sub | string | <service name>:<deployment environment>:<service version> |
| r_type | string | Service type, such as http, web, db, or gateway |
| r_version | string | Service version |
| date | int | Millisecond timestamp of the start of the bucket: second zero of each minute (hh:mm:00) when source="service_list_1m", minute zero of each hour (hh:00:00) when source="service_list_1h", and midnight (00:00:00) when source="service_list_1d" |
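If you need the service name, environment, and version as separate values, r_service_sub can be split on its colon separators. The sketch below uses a hypothetical helper, `parse_service_sub`, and assumes that the environment and version never contain a colon; it splits from the right so that a colon inside the service name would be preserved.

```python
from typing import NamedTuple


class ServiceSub(NamedTuple):
    service: str
    env: str
    version: str


def parse_service_sub(value: str) -> ServiceSub:
    """Split '<service>:<env>:<version>' into its parts (hypothetical helper)."""
    # Split at most twice, starting from the right; assumes env and version
    # contain no ':' (raises ValueError if fewer than two ':' are present).
    service, env, version = value.rsplit(":", 2)
    return ServiceSub(service, env, version)
```

For example, parse_service_sub("go-profiling-demo-1:demo:v0.8.888") yields the service go-profiling-demo-1, the environment demo, and the version v0.8.888.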

Similarly, to query data for two hours from 2024-03-19 15:00:00 to 2024-03-19 17:00:00, you can use DQL:

TM::`*`:(){source="service_list_1h"} [1710831600000:1710838800000]

To query data for two days from 2024-03-19 00:00:00 to 2024-03-21 00:00:00, you can use DQL:

TM::`*`:(){source="service_list_1d"} [1710777600000:1710950400000]
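The three queries above differ only in the source value and the millisecond range, so they are easy to generate from wall-clock times. The following is a minimal sketch, not part of any Guance SDK: `to_ms` and `tm_query` are hypothetical names, and the UTC+8 offset is an assumption chosen to match the timestamps used on this page.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the wall-clock times in these examples are UTC+8.
UTC8 = timezone(timedelta(hours=8))


def to_ms(wall_clock: str, tz: timezone = UTC8) -> int:
    """Convert 'YYYY-MM-DD HH:MM:SS' in the given time zone to epoch milliseconds."""
    dt = datetime.strptime(wall_clock, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    # Work in whole seconds to avoid floating-point rounding, then scale to ms.
    return int(dt.timestamp()) * 1000


def tm_query(source: str, start: str, end: str) -> str:
    """Build a TM query string for one granularity (hypothetical helper)."""
    return f'TM::`*`:(){{source="{source}"}} [{to_ms(start)}:{to_ms(end)}]'
```

For example, tm_query("service_list_1m", "2024-03-19 15:00:00", "2024-03-19 15:15:00") reproduces the 15-minute query shown earlier.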

Different time granularities can also be combined. For example, to query the two and a half hours from 2024-03-19 15:00:00 to 2024-03-19 17:30:00, use hourly data for 15:00 to 17:00 and minute data for 17:00 to 17:30:

TM::`*`:(){ (source="service_list_1h" and date >= 1710831600000 and date < 1710838800000) or (source="service_list_1m" and date >= 1710838800000 and date <= 1710840600000) }
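The split can be generalized to any minute-aligned range: complete hours go to service_list_1h, and the leftover minutes at either end go to service_list_1m. This is a minimal sketch with a hypothetical helper, `build_mixed_filter`. It uses half-open ranges (date < end) throughout, and it relies on the fact that UTC+8 hour boundaries coincide with epoch hour boundaries because the offset is a whole number of hours; day boundaries would need separate handling.

```python
HOUR_MS = 3_600_000


def build_mixed_filter(start_ms: int, end_ms: int) -> str:
    """Combine hourly and minute TM data for [start_ms, end_ms) (hypothetical helper)."""
    first_hour = -(-start_ms // HOUR_MS) * HOUR_MS  # round start up to a full hour
    last_hour = end_ms // HOUR_MS * HOUR_MS  # round end down to a full hour

    def clause(source: str, lo: int, hi: int) -> str:
        return f'(source="{source}" and date >= {lo} and date < {hi})'

    clauses = []
    if first_hour >= last_hour:
        # No complete hour inside the range: minute data only.
        clauses.append(clause("service_list_1m", start_ms, end_ms))
    else:
        if start_ms < first_hour:  # leading partial hour
            clauses.append(clause("service_list_1m", start_ms, first_hour))
        clauses.append(clause("service_list_1h", first_hour, last_hour))
        if last_hour < end_ms:  # trailing partial hour
            clauses.append(clause("service_list_1m", last_hour, end_ms))
    return "TM::`*`:(){ " + " or ".join(clauses) + " }"
```

For 15:00 to 17:30 this produces the same two clauses as the query above, except that the minute clause ends with date < 1710840600000.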
Note

You can also use minute-granularity data (source="service_list_1m") for ranges that span hours or days, such as 2024-03-19 15:00:00 to 2024-03-19 17:30:00. However, this significantly reduces query efficiency and increases the amount of returned data, so it is not recommended.

You can further process the query results to compute service-level metrics, for example:

total_count = SUM(r_request_count)
error_count = SUM(r_error_count)
error_rate = SUM(r_error_count) / SUM(r_request_count)
max_duration = MAX(r_max_duration)
sum_resp_time = SUM(r_resp_time)
avg_per_second = SUM(r_request_count) / <query time range, in seconds>
avg_resp_time = SUM(r_resp_time) / SUM(r_request_count)
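p50: build an array in which each record contributes its average response time r_resp_time / r_request_count, repeated r_request_count times, that is [r_resp_time1/r_request_count1 (repeated r_request_count1 times), r_resp_time2/r_request_count2 (repeated r_request_count2 times), r_resp_time3/r_request_count3 (repeated r_request_count3 times), ...]; sort the array in ascending order and take the element at index SUM(r_request_count) * 0.5. Because each request is represented by its bucket's average, this p50 is an approximation.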
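The formulas above can be applied client-side to the rows returned by a TM query. This is a minimal sketch with a hypothetical function, `service_metrics`, assuming each row is a dict with the field names shown in the sample result. For p50 it does not materialize the repeated array; it walks the records sorted by average response time and stops at the element whose 0-based index is int(SUM(r_request_count) * 0.5), which selects the same element as the array method. The zero-request guards are an addition of this sketch.

```python
from typing import Optional


def service_metrics(rows: list, range_seconds: float) -> dict:
    """Compute service-level metrics from TM rows (hypothetical helper)."""
    total = sum(r["r_request_count"] for r in rows)
    errors = sum(r["r_error_count"] for r in rows)
    resp = sum(r["r_resp_time"] for r in rows)

    p50: Optional[float] = None
    if total > 0:
        target = int(total * 0.5)  # index into the (virtual) expanded array
        # (average response time, weight) pairs, ascending by average
        weighted = sorted(
            (r["r_resp_time"] / r["r_request_count"], r["r_request_count"])
            for r in rows
            if r["r_request_count"] > 0
        )
        seen = 0
        for avg, n in weighted:
            seen += n  # this record occupies indices [seen - n, seen)
            if seen > target:
                p50 = avg
                break

    return {
        "total_count": total,
        "error_count": errors,
        "error_rate": errors / total if total else 0.0,
        "max_duration": max((r["r_max_duration"] for r in rows), default=0),
        "sum_resp_time": resp,
        "avg_per_second": total / range_seconds,
        "avg_resp_time": resp / total if total else 0.0,
        "p50": p50,
    }
```

With the two sample records above and a 900-second range, the sketch reports total_count 2, avg_resp_time 2332252.5 microseconds, and p50 2370648 microseconds.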

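For example, to compute QPS (requests per second) for each service from 2025-01-17 14:00:00 to 2025-01-17 15:30:00 (UTC+8), combining hourly and minute data as above, you can use the following DQL. The request total is divided by the length of the range in milliseconds and then multiplied by 1000 to convert it to a per-second rate: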

TM::`*`:(r_service, sum(r_request_count) / (1737099000000 - 1737093600000) * 1000 as QPS){ (source="service_list_1h" and date >= 1737093600000 and date < 1737097200000) or (source="service_list_1m" and date >= 1737097200000 and date <= 1737099000000) } by r_service
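If you already have the raw rows, the same per-service QPS can be computed client-side. This is a minimal sketch with a hypothetical function, `qps_by_service`, assuming the rows use the TM field names shown earlier; it mirrors the DQL's arithmetic, dividing by the range in milliseconds and then multiplying by 1000.

```python
from collections import defaultdict


def qps_by_service(rows: list, start_ms: int, end_ms: int) -> dict:
    """Requests per second for each r_service over [start_ms, end_ms) (hypothetical helper)."""
    counts = defaultdict(int)
    for r in rows:
        counts[r["r_service"]] += r["r_request_count"]
    # Same arithmetic as the DQL: requests / range in ms * 1000 = requests per second
    return {svc: n / (end_ms - start_ms) * 1000 for svc, n in counts.items()}
```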

Service Topology Data Source Definition

The TSM index space primarily stores service call topology relationship data, which is pre-aggregated on a per-minute basis. For example, to query all service call relationships within a 15-minute period from 2024-03-19 15:00:00 to 2024-03-19 15:15:00, you can use DQL:

TSM::`*`:(){} [1710831600000:1710832500000]

This returns query results similar to the following:

[
    {
        "time": 1710835700438,
        "time_us": 1710835700438064,
        "__docid": "6340252d-331c-6e1dd9338-0ed04e818c4d",
        "__source": "relationship",
        "source_service": "go-profiling-demo-1",
        "source_wsuuid": "wksp_8d351d83bdf14b8b8270ab75fe29a990",
        "source_env": "demo",
        "source_project": "",
        "source_version": "v0.8.888",
        "source_type": "custom",
        "source_organization": "",
        "source_status": "ok",
        "source_start": 1710835700059433,
        "source_duration": 220210272,
        "target_service": "go-profiling-demo-2",
        "target_wsuuid": "wksp_8d351d83bdf14b8b8270ab75fe29a990",
        "target_env": "demo",
        "target_project": "",
        "target_version": "v0.8.888",
        "target_type": "custom",
        "target_organization": "",
        "target_status": "ok",
        "target_start": 1710835700438064,
        "target_duration": 886040,
        "count": 96,
        "unique_id": "XTzHH-jScNjXgBSXNIdFcSOVpHwWKyAZroh71ttyPnXK9nl3jW0re0hlKOeHj6PYgo-profiling-demo-1go-profiling-demo-2",
        "unique_id_env_version": "Zp-9KKvEb4m9aU0OeUMon8MiH2isxqXU742YFlVtokgL2Vy73NwtykkG3vA3X0z1go-profiling-demo-1go-profiling-demo-2",
        "error_count": 0
    },
    {
        "time": 1710835243699,
        "time_us": 1710835243699376,
        "__docid": "f4c7de67-ca1b-6e17fd78e-e8bcb4d1d64d",
        "__source": "relationship",
        "source_service": "go-profiling-demo-1",
        "source_wsuuid": "wksp_8d351d83bdf14b8b8270ab75fe29a990",
        "source_env": "demo",
        "source_project": "",
        "source_version": "v0.8.888",
        "source_type": "custom",
        "source_organization": "",
        "source_status": "ok",
        "source_start": 1710835243379392,
        "source_duration": 227582208,
        "target_service": "go-profiling-demo-2",
        "target_wsuuid": "wksp_8d351d83bdf14b8b8270ab75fe29a990",
        "target_env": "demo",
        "target_project": "",
        "target_version": "v0.8.888",
        "target_type": "custom",
        "target_organization": "",
        "target_status": "ok",
        "target_start": 1710835243699376,
        "target_duration": 856217,
        "count": 96,
        "unique_id": "XTzHH-jScNjXgBSXNIdFcSOVpHwWKyAZroh71ttyPnXK9nl3jW0re0hlKOeHj6PYgo-profiling-demo-1go-profiling-demo-2",
        "unique_id_env_version": "Zp-9KKvEb4m9aU0OeUMon8MiH2isxqXU742YFlVtokgL2Vy73NwtykkG3vA3X0z1go-profiling-demo-1go-profiling-demo-2",
        "error_count": 0
    }
]

The main fields are described below:

| Field | Type | Description |
| --- | --- | --- |
| time | int | Millisecond timestamp of when the service call occurred |
| time_us | int | Microsecond timestamp of when the service call occurred |
| source_service | string | Name of the calling service |
| source_wsuuid | string | ID of the workspace the calling service reports to |
| source_env | string | Deployment environment of the calling service |
| source_project | string | Project name of the calling service |
| source_version | string | Version of the calling service |
| source_type | string | Type of the calling service |
| source_organization | string | Organization that owns the calling service's workspace |
| source_status | string | Status of the calling service: ok or error |
| source_start | int | Start time of the calling service span, in microseconds |
| source_duration | int | Sum of the calling service's span durations within each minute, in microseconds |
| target_service | string | Name of the called service |
| target_wsuuid | string | ID of the workspace the called service reports to |
| target_env | string | Deployment environment of the called service |
| target_project | string | Project name of the called service |
| target_version | string | Version of the called service |
| target_type | string | Type of the called service |
| target_organization | string | Organization that owns the called service's workspace |
| target_status | string | Status of the called service: ok or error |
| target_start | int | Start time of the called service span (when the service call occurred), in microseconds |
| target_duration | int | Sum of the called service's span durations within each minute, in microseconds |
| count | int | Number of calls within each minute |
| unique_id | string | Unique ID derived only from the calling service name and the called service name |
| unique_id_env_version | string | Unique ID derived from the calling service's name, environment, and version together with the called service's name, environment, and version |
| error_count | int | Number of failed calls within each minute |

To query the call relationships between services together with call-level metrics, you can use the following DQL, which groups the per-minute records by unique_id:

TSM::`*`:(first(source_service) as source_service,
first(source_wsuuid) as source_wsuuid,
first(target_service) as target_service,
first(target_wsuuid) as target_wsuuid,
sum(count) as total_count,
sum(error_count) as total_error_count,
sum(target_duration) as total_duration
){} by unique_id
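The same aggregation can be sketched client-side, which also makes it easy to derive per-call values. This is a minimal sketch with a hypothetical function, `aggregate_edges`, assuming records use the TSM field names above. The key parameter lets you group by unique_id_env_version instead when environments and versions should be kept apart. "First" here means the first record in list order, which may not match how DQL's first() orders records. The derived avg_duration (total target_duration / total count, in microseconds) and error_rate are illustrative additions of this sketch.

```python
def aggregate_edges(records: list, key: str = "unique_id") -> dict:
    """Group TSM records into service-call edges (hypothetical helper)."""
    edges = {}
    for rec in records:
        k = rec[key]
        e = edges.get(k)
        if e is None:
            # Like first(...): keep endpoint fields from the first record seen.
            e = edges[k] = {
                "source_service": rec["source_service"],
                "source_wsuuid": rec["source_wsuuid"],
                "target_service": rec["target_service"],
                "target_wsuuid": rec["target_wsuuid"],
                "total_count": 0,
                "total_error_count": 0,
                "total_duration": 0,
            }
        e["total_count"] += rec["count"]
        e["total_error_count"] += rec["error_count"]
        e["total_duration"] += rec["target_duration"]
    for e in edges.values():
        n = e["total_count"]
        # Illustrative derived metrics: mean called-span duration per call (us), error rate
        e["avg_duration"] = e["total_duration"] / n if n else 0.0
        e["error_rate"] = e["total_error_count"] / n if n else 0.0
    return edges
```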
