Datakit Tracing Data Structure
Introduction
This document describes the tracing data structures of mainstream telemetry platforms and how they map to the Datakit data structure. Currently supported data structures: DataDog, Jaeger, OpenTelemetry, SkyWalking, Zipkin, and PinPoint.
Data transformation steps:
- External Tracing data ingestion, where data is received via multiple protocols and then deserialized.
- Deserialized objects are converted to Line Protocol format.
- Span data operations include: sampling, filtering, adding specific tags, etc.
Datakit Point Protocol Data Structure
| Tag | Description |
| --- | --- |
| container_host | Host name of container |
| endpoint | Endpoint of resource |
| env | Environment arguments |
| http_host | HTTP host |
| http_method | HTTP method |
| http_route | HTTP route |
| http_status_code | HTTP status code |
| http_url | HTTP URL |
| operation | Operation of resource |
| pid | Process ID |
| project | Project name |
| service | Service name |
| source_type | Source types [app/framework/cache/message_queue/custom/db/web/... ] |
| span_type | Span types |
| status | Span status |
| Metric | Description | Type | Unit |
| --- | --- | --- | --- |
| create_time | Guancedb storage create timestamp | int | s |
| duration | Span duration | int | us |
| message | Raw data content | string | |
| parent_id | Parent ID of span | string | |
| priority | Priority rules | string | |
| resource | Resource of service | string | |
| span_id | Span ID | string | |
| start | Span start timestamp | int | us |
| time | Datakit received timestamp | int | ns |
| trace_id | Trace ID | string | |
The span_type indicates the relative position of the current Span within the Trace. The values are as follows:
- entry: The current API is the entry point, i.e., the first call after entering the service.
- local: The current API is between the entry and exit points.
- exit: The current API is the last call in the trace on the service.
- unknown: The relative position of the current API is unclear.
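As an illustration of how a collector might derive span_type from an upstream span kind, here is a minimal sketch for OpenTelemetry kinds. The mapping table and the function name `span_type_for` are assumptions made for this example; the document does not state which kind maps to which value.

```python
# Hypothetical mapping from OpenTelemetry span kinds to span_type values.
# This table is an assumption for illustration, not Datakit's confirmed logic.
KIND_TO_SPAN_TYPE = {
    "SPAN_KIND_SERVER": "entry",    # request enters the service
    "SPAN_KIND_CONSUMER": "entry",  # message enters the service
    "SPAN_KIND_INTERNAL": "local",  # work between entry and exit
    "SPAN_KIND_CLIENT": "exit",     # outgoing call leaves the service
    "SPAN_KIND_PRODUCER": "exit",   # outgoing message leaves the service
}


def span_type_for(kind: str) -> str:
    """Return entry/local/exit, or unknown when the kind is not recognized."""
    return KIND_TO_SPAN_TYPE.get(kind, "unknown")
```

Any kind missing from the table falls back to unknown, which matches the "position is unclear" case described above.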
The priority represents the client-side sampling priority rules:
- PRIORITY_USER_REJECT = -1: User chooses to reject reporting
- PRIORITY_AUTO_REJECT = 0: Client sampler chooses to reject reporting
- PRIORITY_AUTO_KEEP = 1: Client sampler chooses to report
- PRIORITY_USER_KEEP = 2: User chooses to report
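The four priority values can be modeled as an enumeration. The sketch below uses the numeric values listed above; the class name `Priority` and the helper `is_kept` are hypothetical names introduced for this example.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Client-side sampling priorities, using the values listed above."""
    USER_REJECT = -1
    AUTO_REJECT = 0
    AUTO_KEEP = 1
    USER_KEEP = 2


def is_kept(priority: int) -> bool:
    """True when the priority means the span should be reported.

    Raises ValueError for a value outside the four defined priorities.
    """
    return Priority(priority) in (Priority.AUTO_KEEP, Priority.USER_KEEP)
```

For example, `is_kept(2)` is true because the user chose to report, while `is_kept(0)` is false because the client sampler rejected the span.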
OpenTelemetry Tracing Data Structure
When Datakit collects data sent by the OpenTelemetry Exporter (OTLP), the simplified raw data, serialized in a JSON-like text format, looks as follows:
resource_spans:{
  resource:{
    attributes:{key:"message.type" value:{string_value:"message-name"}}
    attributes:{key:"service.name" value:{string_value:"test-name"}}
  }
  instrumentation_library_spans:{instrumentation_library:{name:"test-tracer"}
    spans:{
      trace_id:"\x94<\xdf\x00zx\x82\xe7Wy\xfe\x93\xab\x19\x95a"
      span_id:".\xbd\x06c\x10ɫ*"
      parent_span_id:"\xa7*\x80Z#\xbeL\xf6"
      name:"Sample-0"
      kind:SPAN_KIND_INTERNAL
      start_time_unix_nano:1644312397453313100
      end_time_unix_nano:1644312398464865900
      status:{}
    }
    spans:{
      ...
    }
  }
}
The correspondence between resource_spans in OpenTelemetry and DKProto is as follows:

| Field Name | Data Type | Unit | Description | Corresponds To |
| --- | --- | --- | --- | --- |
| trace_id | [16]byte | | Trace ID | DKProto.TraceID |
| span_id | [8]byte | | Span ID | DKProto.SpanID |
| parent_span_id | [8]byte | | Parent Span ID | DKProto.ParentID |
| name | string | | Span Name | DKProto.Operation |
| kind | string | | Span Type | DKProto.SpanType |
| start_time_unix_nano | int64 | Nanoseconds | Span Start Time | DKProto.Start |
| end_time_unix_nano | int64 | Nanoseconds | Span End Time | DKProto.Duration = end - start |
| status | string | | Span Status | DKProto.Status |
| name | string | | Resource Name | DKProto.Resource |
| resource.attributes | map[string]string | | Resource Tags | DKProto.tags.service, DKProto.tags.project, DKProto.tags.env, DKProto.tags.version, DKProto.tags.container_host, DKProto.tags.http_method, DKProto.tags.http_status_code |
| span.attributes | map[string]string | | Span Tags | DKProto.tags |
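To make the mapping concrete, the sketch below converts the sample span shown earlier into a few Datakit fields. It assumes that byte IDs are rendered as lowercase hex strings and that nanosecond timestamps are converted to the microsecond units used by the start and duration fields; both are assumptions for illustration, and `otel_span_to_dk` is a hypothetical helper name.

```python
def otel_span_to_dk(span: dict) -> dict:
    """Map a decoded OTLP span (as a plain dict) to a subset of Datakit fields."""
    start_ns = span["start_time_unix_nano"]
    end_ns = span["end_time_unix_nano"]
    return {
        # Assumption: raw ID bytes are hex-encoded into strings.
        "trace_id": span["trace_id"].hex(),
        "span_id": span["span_id"].hex(),
        "parent_id": span["parent_span_id"].hex(),
        "operation": span["name"],
        # Assumption: nanoseconds are truncated to microseconds.
        "start": start_ns // 1000,
        "duration": (end_ns - start_ns) // 1000,  # Duration = end - start
    }


# The sample span from the raw data above, with the IDs written as bytes.
sample = {
    "trace_id": b"\x94<\xdf\x00zx\x82\xe7Wy\xfe\x93\xab\x19\x95a",
    "span_id": b".\xbd\x06c\x10\xc9\xab*",
    "parent_span_id": b"\xa7*\x80Z#\xbeL\xf6",
    "name": "Sample-0",
    "start_time_unix_nano": 1644312397453313100,
    "end_time_unix_nano": 1644312398464865900,
}
dk = otel_span_to_dk(sample)
```

With the sample values, the span lasts 1011552800 ns, which becomes a duration of 1011552 us.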
OpenTelemetry has some unique fields that do not have corresponding fields in DKProto, so they are placed in the tags. These values are only displayed when non-zero, such as:
| Field Name | Data Type | Unit | Description | Corresponds To |
| --- | --- | --- | --- | --- |
| span.dropped_attributes_count | int | | Number of dropped span attributes | DKProto.tags.dropped_attributes_count |
| span.dropped_events_count | int | | Number of dropped span events | DKProto.tags.dropped_events_count |
| span.dropped_links_count | int | | Number of dropped span links | DKProto.tags.dropped_links_count |
| span.events_count | int | | Number of associated span events | DKProto.tags.events_count |
| span.links_count | int | | Number of associated span links | DKProto.tags.links_count |
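The "only displayed when non-zero" rule can be sketched as a small filter. The helper name `count_tags` and the choice to store the counts as strings are assumptions made for this example.

```python
# Tag names taken from the table above.
COUNT_FIELDS = (
    "dropped_attributes_count",
    "dropped_events_count",
    "dropped_links_count",
    "events_count",
    "links_count",
)


def count_tags(span: dict) -> dict:
    """Return only the count fields whose value is non-zero."""
    tags = {}
    for name in COUNT_FIELDS:
        value = span.get(name, 0)
        if value != 0:
            tags[name] = str(value)  # assumption: tag values are strings
    return tags
```

A span with two events and no dropped items would therefore carry a single extra tag, events_count.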
Jaeger Tracing Data Structure
Jaeger Thrift Protocol Batch Data Structure
| Field Name | Data Type | Unit | Description | Corresponds To |
| --- | --- | --- | --- | --- |
| Process | struct pointer | | Process-related data structure | DKProto.Service |
| SeqNo | int64 pointer | | Sequence number | No direct correspondence with DKProto |
| Spans | array | | Span array structure | See table below |
| Stats | struct pointer | | Client statistics structure | Does not directly correspond to DKProto |
Jaeger Thrift Protocol Span Data Structure
| Field Name | Data Type | Unit | Description | Corresponds To |
| --- | --- | --- | --- | --- |
| TraceIdHigh | int64 | | High bits of Trace ID, combined with TraceIdLow to form the Trace ID | DKProto.TraceID |
| TraceIdLow | int64 | | Low bits of Trace ID, combined with TraceIdHigh to form the Trace ID | DKProto.TraceID |
| ParentSpanId | int64 | | Parent Span ID | DKProto.ParentID |
| SpanId | int64 | | Span ID | DKProto.SpanID |
| OperationName | string | | Method name that generated this Span | DKProto.Operation |
| Flags | int32 | | Span Flags | Does not directly correspond to DKProto |
| Logs | array | | Span Logs | Does not directly correspond to DKProto |
| References | array | | Span References | Does not directly correspond to DKProto |
| StartTime | int64 | Nanoseconds | Span start time | DKProto.Start |
| Duration | int64 | Nanoseconds | Duration | DKProto.Duration |
| Tags | array | | Span Tags; currently only the Span status fields are used | DKProto.Status |
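A minimal sketch of combining TraceIdHigh and TraceIdLow into one trace ID follows. It assumes the result is a 32-character lowercase hex string with the high bits first; the exact string form used by Datakit is not stated in this document, and `jaeger_trace_id` is a hypothetical helper name.

```python
MASK64 = (1 << 64) - 1  # reinterpret a signed int64 as unsigned 64 bits


def jaeger_trace_id(high: int, low: int) -> str:
    """Join the two signed 64-bit halves into one 128-bit hex string."""
    return f"{high & MASK64:016x}{low & MASK64:016x}"
```

The masking step matters because the Thrift fields are signed: a negative half such as -1 has to be rendered as ffffffffffffffff, not as a value with a minus sign.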
SkyWalking Tracing Data Structure
Segment Object Generated By Protobuf Protocol V3
| Field Name | Data Type | Unit | Description | Corresponds To |
| --- | --- | --- | --- | --- |
| TraceId | string | | Trace ID | DKProto.TraceID |
| TraceSegmentId | string | | Segment ID used together with Span ID to uniquely identify a Span | DKProto.SpanID high bits |
| Service | string | | Service name | DKProto.Service |
| ServiceInstance | string | | Logical relationship name of node | Unused field |
| Spans | array | | Tracing Span array | See table below |
| IsSizeLimited | bool | | Whether all Spans on the path are included | Unused field |
SkyWalking Span Object Data Structure in Segment Object
| Field Name | Data Type | Unit | Description | Corresponds To |
| --- | --- | --- | --- | --- |
| ComponentId | int32 | | Numerical definition of third-party frameworks | Unused field |
| Refs | array | | Stores the Parent Segment in cross-thread and cross-process scenarios | DKProto.ParentID high bits |
| ParentSpanId | int32 | | Parent Span ID, used together with Segment ID to uniquely identify a Parent Span | DKProto.ParentID low bits |
| SpanId | int32 | | Span ID, used together with Segment ID to uniquely identify a Span | DKProto.SpanID low bits |
| OperationName | string | | Span Operation Name | DKProto.Operation |
| Peer | string | | Communication peer | DKProto.Endpoint |
| IsError | bool | | Span status field | DKProto.Status |
| SpanType | int32 | | Numerical definition of Span Type | DKProto.SpanType |
| StartTime | int64 | Milliseconds | Span start time | DKProto.Start |
| EndTime | int64 | Milliseconds | Span end time; EndTime minus StartTime gives the duration | DKProto.Duration |
| Logs | array | | Span Logs | Unused field |
| SkipAnalysis | bool | | Skip backend analysis | Unused field |
| SpanLayer | int32 | | Numerical definition of Span technology stack | Unused field |
| Tags | array | | Span Tags | Unused field |
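The "high bits" and "low bits" wording above can be illustrated with a short sketch: the Segment ID forms the leading part of the Datakit span ID and the numeric SpanId forms the trailing part. Plain string concatenation is an assumption here, as is the conversion of millisecond timestamps to the microsecond unit of the duration field. The function names are hypothetical.

```python
def sw_span_id(segment_id: str, span_id: int) -> str:
    """Segment ID as the leading part, span number as the trailing part."""
    return f"{segment_id}{span_id}"  # assumption: simple concatenation


def sw_duration_us(start_ms: int, end_ms: int) -> int:
    """Duration = EndTime - StartTime, converted from ms to us."""
    return (end_ms - start_ms) * 1000
```

Because SpanId values restart in every segment, the segment prefix is what keeps two spans with the same SpanId from colliding.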
Zipkin Tracing Data Structure
Zipkin Thrift Protocol Span Data Structure V1
| Field Name | Data Type | Unit | Description | Corresponds To |
| --- | --- | --- | --- | --- |
| TraceIDHigh | uint64 | | High bits of Trace ID | No direct correspondence |
| TraceID | uint64 | | Trace ID | DKProto.TraceID |
| ID | uint64 | | Span ID | DKProto.SpanID |
| ParentID | uint64 | | Parent Span ID | DKProto.ParentID |
| Annotations | array | | Used to get Service Name | DKProto.Service |
| Name | string | | Span Operation Name | DKProto.Operation |
| BinaryAnnotations | array | | Used to get Span status | DKProto.Status |
| Timestamp | uint64 | Microseconds | Span start time | DKProto.Start |
| Duration | uint64 | Microseconds | Span duration | DKProto.Duration |
| Debug | bool | | Debug status field | Unused field |
Zipkin Span Data Structure V2
| Field Name | Data Type | Unit | Description | Corresponds To |
| --- | --- | --- | --- | --- |
| TraceID | struct | | Trace ID | DKProto.TraceID |
| ID | uint64 | | Span ID | DKProto.SpanID |
| ParentID | uint64 | | Parent Span ID | DKProto.ParentID |
| Name | string | | Span Operation Name | DKProto.Operation |
| Debug | bool | | Debug status | Unused field |
| Sampled | bool | | Sampling status field | Unused field |
| Err | string | | Error Message | Does not directly correspond to DKProto |
| Kind | string | | Span Type | DKProto.SpanType |
| Timestamp | struct | Microseconds | Microsecond-level time structure representing Span start time | DKProto.Start |
| Duration | int64 | Microseconds | Span duration | DKProto.Duration |
| Shared | bool | | Shared status | Unused field |
| LocalEndpoint | struct | | Used to get Service Name | DKProto.Service |
| RemoteEndpoint | struct | | Communication peer | DKProto.Endpoint |
| Annotations | array | | Used to explain delay-related events | Unused field |
| Tags | map | | Used to get Span status | DKProto.Status |
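As a rough sketch of the V2 mapping, the function below takes a Zipkin V2 span in its JSON form (as a Python dict) and extracts the service, operation, status, start, and duration. The status strings "ok" and "error", the rule that an "error" tag marks a failed span, and the function name `zipkin_v2_to_dk` are assumptions for illustration.

```python
def zipkin_v2_to_dk(span: dict) -> dict:
    """Map a Zipkin V2 JSON span to a subset of Datakit fields."""
    local = span.get("localEndpoint") or {}
    tags = span.get("tags") or {}
    return {
        "service": local.get("serviceName", ""),  # LocalEndpoint -> Service
        "operation": span.get("name", ""),        # Name -> Operation
        # Assumption: the presence of an "error" tag marks the span as failed.
        "status": "error" if "error" in tags else "ok",
        # Zipkin already reports microseconds, matching the start/duration units.
        "start": span.get("timestamp", 0),
        "duration": span.get("duration", 0),
    }
```

No unit conversion is needed for start and duration in this case, since both sides use microseconds.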