OpenTelemetry
OpenTelemetry (hereinafter referred to as OTEL) is an observability project under CNCF (Cloud Native Computing Foundation). It aims to provide a standardized solution in the field of observability, addressing standardization issues related to the data model, collection, processing, and export of observability data.
OTEL is a collection of standards and tools designed to manage observability data such as traces, metrics, and logs. This document describes how to configure and enable OTEL data ingestion on DataKit, as well as best practices for Java and Go.
Configuration
Navigate to the `conf.d/opentelemetry` directory under the DataKit installation directory, copy `opentelemetry.conf.sample`, and rename it to `opentelemetry.conf`. An example is as follows:
```toml
[[inputs.opentelemetry]]
## customer_tags works as a whitelist: only the listed attributes are sent to the data center as tags.
## Every "." in a tag key is replaced with "_", for example:
## "project.name" is sent to the data center as "project_name".
# customer_tags = ["sink_project", "custom.otel.tag"]
## If set to true, all attributes are extracted as tags and message.Attributes will be empty.
# customer_tags_all = false
## Switch for keeping the list of rare tracing resources.
## If a resource is rare enough (not present within 1 hour), it is always sent
## to the data center, regardless of samplers and filters.
# keep_rare_resource = false
## By default, every span that carries an error is sent to the data center, bypassing all filters
## and samplers. To exclude certain error statuses from this rule, list them here.
# omit_err_status = ["404"]
## compatible_ddtrace: make OTEL traces compatible with DDTrace traces (trace_id is converted to decimal).
# compatible_ddtrace=false
## Split the service name from xx.system attributes.
## See: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/database/database-spans.md
split_service_name = true
## Delete the trace message.
# del_message = true
## Maximum length of log message data, in KB; the default is 500 KB.
log_max = 500
## JSON marshaler. Available marshalers are:
## gojson/jsoniter/protojson
##
## gojson and jsoniter perform better than protojson,
## but protojson remains the default for compatibility.
jmarshaler = "protojson"
## Clean the top-level fields in message. Default is true.
clean_message = true
## tracing_metric_enable: extract the following metrics from the collected traces:
## trace_hits, trace_hits_by_http_status, trace_latency, trace_errors, trace_errors_by_http_status, trace_apdex.
# tracing_metric_enable = true
## Blacklist of metric tags: the "tracing_metrics" metric set carries many tags.
## Use the blacklist to remove the tags you don't need.
## By default, the tags include: source, span_name, env, service, status, version, resource, http_status_code, http_status_class,
## plus "customer_tags", k8s-related tags, and others.
# tracing_metric_tag_blacklist = ["resource", "operation", "tag_a", "tag_b"]
## Ignore tracing resources, as a map of service:[resources...].
## The service name is the full service name in the current application.
## The resource list contains regular expressions used to block resource names.
## To block resources under all services, set the service name to "*".
## Note: the double quotes "" cannot be omitted.
# [inputs.opentelemetry.close_resource]
# service1 = ["resource1", "resource2", ...]
# service2 = ["resource1", "resource2", ...]
# "*" = ["close_resource_under_all_services"]
# ...
## Sampler config sets the global sampling strategy.
## sampling_rate sets the global sampling rate.
# [inputs.opentelemetry.sampler]
# sampling_rate = 1.0
# [inputs.opentelemetry.tags]
# key1 = "value1"
# key2 = "value2"
# ...
## Threads config controls how many goroutines the agent can start to handle HTTP requests.
## buffer is the size of the worker channel's job buffer.
## threads is the total number of goroutines at run time.
# [inputs.opentelemetry.threads]
# buffer = 100
# threads = 8
## Storage config sets up local disk space to cache trace data.
## path is the local file path used to cache data.
## capacity is the total space (MB) used to store data.
# [inputs.opentelemetry.storage]
# path = "./otel_storage"
# capacity = 5120
## OTEL agent HTTP config for traces, metrics, and logs.
## By default, data is received on the following paths:
## trace : /otel/v1/traces
## metric: /otel/v1/metrics
## logs  : /otel/v1/logs
## The client side must be configured with the DataKit listening port (default: 9529)
## or the custom HTTP request path,
## for example http://127.0.0.1:9529/otel/v1/traces
## The acceptable http_status_ok values are 200 or 202.
[inputs.opentelemetry.http]
http_status_ok = 200
trace_api = "/otel/v1/traces"
metric_api = "/otel/v1/metrics"
logs_api = "/otel/v1/logs"
## OTEL agent gRPC config for traces and metrics.
## addr is the listening address of the gRPC server.
[inputs.opentelemetry.grpc]
addr = "127.0.0.1:4317"
max_payload = 16777216 # default 16MiB
## If 'expected_headers' is configured, the client must send the listed HTTP headers;
## otherwise the request is rejected with HTTP status code 400 (bad request).
## Note: expected_headers applies to both traces and metrics if set.
# [inputs.opentelemetry.expected_headers]
# ex_version = "1.2.3"
# ex_name = "env_resource_name"
# ...
```
Restart DataKit for the configuration to take effect.
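The `customer_tags` comments in the sample describe two things: a whitelist, and a key rewrite in which `.` becomes `_`. The Go sketch below illustrates that behavior as we read those comments. `extractCustomerTags` is a hypothetical helper, not DataKit's code. Matching whitelist entries against the original dotted attribute key is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// extractCustomerTags keeps only whitelisted attributes and rewrites every "."
// in the tag key to "_". Matching on the original dotted key is an assumption.
func extractCustomerTags(attrs map[string]string, whitelist []string) map[string]string {
	tags := make(map[string]string, len(whitelist))
	for _, key := range whitelist {
		if v, ok := attrs[key]; ok {
			tags[strings.ReplaceAll(key, ".", "_")] = v
		}
	}
	return tags
}

func main() {
	attrs := map[string]string{
		"project.name":    "shop",
		"custom.otel.tag": "blue",
		"internal.secret": "dropped", // not whitelisted, so not promoted
	}
	fmt.Println(extractCustomerTags(attrs, []string{"project.name", "custom.otel.tag"}))
}
```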
When running in Kubernetes, you can enable the collector by injecting its configuration through a ConfigMap or by setting ENV_DATAKIT_INPUTS.
You can also modify configuration parameters through environment variables (the collector must be added to ENV_DEFAULT_ENABLED_INPUTS as a default collector):
ENV | input.conf | Type | Description | Default / Example |
---|---|---|---|---|
ENV_INPUT_OTEL_CUSTOMER_TAGS | customer_tags | JSON | Whitelist of attributes to extract as tags | Example: `'["project_id", "custom.tag"]'` |
ENV_INPUT_OTEL_CUSTOMER_TAGS_ALL | customer_tags_all | Boolean | Extract all attributes as tags | Default: `false` |
ENV_INPUT_OTEL_KEEP_RARE_RESOURCE | keep_rare_resource | Boolean | Switch for keeping the list of rare tracing resources | Default: `false` |
ENV_INPUT_OTEL_COMPATIBLE_DD_TRACE | compatible_dd_trace | Boolean | Convert trace_id to decimal for compatibility with DDTrace | Default: `false` |
ENV_INPUT_OTEL_SPLIT_SERVICE_NAME | split_service_name | Boolean | Use the xx.system value from span attributes as the service name | Default: `false` |
ENV_INPUT_OTEL_TRACING_METRIC_ENABLE | tracing_metric_enable | Boolean | Extract tracing metrics (request counts, error counts, and latency) from traces | Default: `false` |
ENV_INPUT_OTEL_TRACING_METRIC_TAG_BLACKLIST | tracing_metric_tag_blacklist | JSON | Blacklist of tags in the tracing_metrics metric set | Example: `'["tag_a", "tag_b"]'` |
ENV_INPUT_OTEL_DEL_MESSAGE | del_message | Boolean | Delete the trace message | Default: `false` |
ENV_INPUT_OTEL_OMIT_ERR_STATUS | omit_err_status | JSON | Error statuses excluded from the rule that always sends error spans | Example: `'["404", "403", "400"]'` |
ENV_INPUT_OTEL_CLOSE_RESOURCE | close_resource | JSON | Resources to ignore per service (regular expressions) | Example: `'{"service1":["resource1","other"],"service2":["resource2","other"]}'` |
ENV_INPUT_OTEL_SAMPLER | sampler | Float | Global sampling rate | Example: `0.3` |
ENV_INPUT_OTEL_THREADS | threads | JSON | Job buffer size and total number of worker threads | Example: `'{"buffer":1000, "threads":100}'` |
ENV_INPUT_OTEL_STORAGE | storage | JSON | Local cache file path and capacity (MB) | Example: `'{"storage":"./otel_storage", "capacity": 5120}'` |
ENV_INPUT_OTEL_HTTP | http | JSON | HTTP receiver config | Example: `'{"enable":true, "http_status_ok": 200, "trace_api": "/otel/v1/traces", "metric_api": "/otel/v1/metrics"}'` |
ENV_INPUT_OTEL_GRPC | grpc | JSON | gRPC receiver config | Example: `'{"addr": "127.0.0.1:4317", "max_payload": 16777216 }'` |
ENV_INPUT_OTEL_EXPECTED_HEADERS | expected_headers | JSON | HTTP headers the client must send; requests without them are rejected with HTTP 400 | Example: `'{"ex_version": "1.2.3", "ex_name": "env_resource_name"}'` |
ENV_INPUT_OTEL_CLEAN_MESSAGE | clean_message | Boolean | Clean the message field to make it smaller | Example: `true`/`false` |
ENV_INPUT_OTEL_TAGS | tags | JSON | Custom tags. A tag with the same name in the configuration file is overwritten | Example: `'{"k1":"v1", "k2":"v2", "k3":"v3"}'` |
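`close_resource` (or `ENV_INPUT_OTEL_CLOSE_RESOURCE`) maps each service to a list of regular expressions. The `"*"` key applies to every service. The Go sketch below parses the JSON form and checks a span's resource against it.

- `parseCloseResource` and `shouldDrop` are hypothetical names.
- Go's `MatchString` matches substrings, so anchor a pattern with `^...$` if you want a whole-name match.
- Whether DataKit itself anchors its patterns is not stated in this document.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// closeResource maps a service name (or "*" for all services) to compiled
// regular expressions; a span whose resource matches any of them is dropped.
type closeResource map[string][]*regexp.Regexp

// parseCloseResource parses a value shaped like ENV_INPUT_OTEL_CLOSE_RESOURCE.
func parseCloseResource(raw string) (closeResource, error) {
	var m map[string][]string
	if err := json.Unmarshal([]byte(raw), &m); err != nil {
		return nil, err
	}
	cr := make(closeResource, len(m))
	for service, patterns := range m {
		for _, p := range patterns {
			re, err := regexp.Compile(p)
			if err != nil {
				return nil, fmt.Errorf("service %q: %w", service, err)
			}
			cr[service] = append(cr[service], re)
		}
	}
	return cr, nil
}

// shouldDrop checks the service's own patterns first, then the "*" patterns.
func (cr closeResource) shouldDrop(service, resource string) bool {
	for _, key := range []string{service, "*"} {
		for _, re := range cr[key] {
			if re.MatchString(resource) {
				return true
			}
		}
	}
	return false
}

func main() {
	cr, err := parseCloseResource(`{"service1":["^GET /health$"],"*":["actuator"]}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(cr.shouldDrop("service1", "GET /health"))
	fmt.Println(cr.shouldDrop("service2", "GET /health"))
	fmt.Println(cr.shouldDrop("service2", "GET /actuator/info"))
}
```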
Notes
- gRPC is recommended: it offers a high compression ratio, fast serialization, and higher efficiency.
- Starting from DataKit 1.10.0, the HTTP routes are configurable. The default request paths for traces, logs, and metrics are `/otel/v1/traces`, `/otel/v1/logs`, and `/otel/v1/metrics`, respectively.
- For `float`/`double` data, at most two decimal places are retained.
- Both HTTP and gRPC support gzip compression. Enable it in the exporter with the environment variable `OTEL_EXPORTER_OTLP_COMPRESSION=gzip`; gzip is disabled by default.
- HTTP requests can be serialized as either JSON or Protobuf, whereas gRPC supports only Protobuf.
Warning
- In DDTrace data, the service name is derived from the service name or from the referenced third-party library, whereas the service name in OTEL data is defined by `otel.service.name`.
- To display such services separately, set `split_service_name = true`.
- The service name is then extracted from the span tags. For example, for a DB span tagged `db.system=mysql`, the service name becomes `mysql`; for a message queue span tagged `messaging.system=kafka`, it becomes `kafka`.
- By default, the service name is extracted from these three tags: `db.system`, `rpc.system`, and `messaging.system`.
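The extraction rule in the list above can be sketched in Go as follows.

- `splitServiceName` is a hypothetical helper.
- The order in which the three keys are checked is an assumption, since the document names the keys but not their precedence.

```go
package main

import "fmt"

// systemKeys are the attributes checked for a replacement service name.
// The check order is an assumption.
var systemKeys = []string{"db.system", "rpc.system", "messaging.system"}

// splitServiceName returns the first non-empty xx.system attribute value,
// falling back to the service name set by otel.service.name.
func splitServiceName(service string, attrs map[string]string) string {
	for _, k := range systemKeys {
		if v, ok := attrs[k]; ok && v != "" {
			return v
		}
	}
	return service
}

func main() {
	fmt.Println(splitServiceName("order-api", map[string]string{"db.system": "mysql"}))
	fmt.Println(splitServiceName("order-api", map[string]string{"messaging.system": "kafka"}))
	fmt.Println(splitServiceName("order-api", map[string]string{"http.method": "GET"}))
}
```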
Pay attention to the environment variables when using the OTEL HTTP exporter. DataKit's default paths are `/otel/v1/traces`, `/otel/v1/logs`, and `/otel/v1/metrics`. Because of this, you need to configure the endpoint of each signal (traces, metrics, and logs) separately when using the HTTP protocol.
Agent V2 Version
In V2, the agent still uses the `otlp` exporter by default, but its default protocol changed from `grpc` to `http/protobuf`. You can switch to gRPC with `-Dotel.exporter.otlp.protocol=grpc`, or keep the default `http/protobuf`.
If using HTTP, the path for each exporter needs to be explicitly configured. For example:
```shell
java -javaagent:/usr/local/ddtrace/opentelemetry-javaagent-2.5.0.jar \
  -Dotel.exporter=otlp \
  -Dotel.exporter.otlp.protocol=http/protobuf \
  -Dotel.exporter.otlp.logs.endpoint=http://localhost:9529/otel/v1/logs \
  -Dotel.exporter.otlp.traces.endpoint=http://localhost:9529/otel/v1/traces \
  -Dotel.exporter.otlp.metrics.endpoint=http://localhost:9529/otel/v1/metrics \
  -Dotel.service.name=app \
  -jar app.jar
```
If using the gRPC protocol, explicit configuration is required; otherwise, the default HTTP protocol will be used:
```shell
java -javaagent:/usr/local/ddtrace/opentelemetry-javaagent-2.5.0.jar \
  -Dotel.exporter=otlp \
  -Dotel.exporter.otlp.protocol=grpc \
  -Dotel.exporter.otlp.endpoint=http://localhost:4317 \
  -Dotel.service.name=app \
  -jar app.jar
```
Log export is enabled by default in V2. To disable log collection, set the logs exporter to `none`: `-Dotel.logs.exporter=none`.
For more major changes in the V2 version, refer to the official documentation or GitHub release notes: Github-v2.0.0
Common Commands
The following configurations are commonly used when starting an application:
ENV (Corresponding Command) | Description |
---|---|
OTEL_SDK_DISABLED (otel.sdk.disabled) | Disable the SDK; default is `false`. No traces or metrics are generated once it is disabled. |
OTEL_RESOURCE_ATTRIBUTES (otel.resource.attributes) | Add global custom tags, which are included in every span. Example: `service.name=App,project=app-a` |
OTEL_SERVICE_NAME (otel.service.name) | Set the service name; it takes priority over `service.name` in the custom tags. |
OTEL_LOG_LEVEL (otel.log.level) | Log level; default is `info`. |
OTEL_PROPAGATORS (otel.propagators) | Set the propagation protocol; default is `tracecontext,baggage`. |
OTEL_TRACES_SAMPLER (otel.traces.sampler) | Set the sampler type. |
OTEL_TRACES_SAMPLER_ARG (otel.traces.sampler.arg) | Argument for the sampler above; range 0 to 1.0; default is `1.0`. |
OTEL_EXPORTER_OTLP_PROTOCOL (otel.exporter.otlp.protocol) | Set the transport protocol: `grpc`, `http/protobuf`, or `http/json`. The default is `grpc` in Agent V1 and `http/protobuf` in Agent V2. |
OTEL_EXPORTER_OTLP_ENDPOINT (otel.exporter.otlp.endpoint) | Base export address for all signals. With `grpc`, set it to DataKit's gRPC address, for example `http://datakit-endpoint:4317`. With HTTP, set per-signal endpoints instead, for example `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://datakit-endpoint:9529/otel/v1/traces`. |
OTEL_TRACES_EXPORTER (otel.traces.exporter) | Trace exporter; default is `otlp`. |
OTEL_LOGS_EXPORTER (otel.logs.exporter) | Log exporter; default is `otlp`. Note: Agent V1 requires explicit configuration; otherwise log export is disabled. |
Pass `-Dotel.javaagent.debug=true` to the agent to view debug logs. These logs are verbose, so use them with caution in production environments.
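`OTEL_RESOURCE_ATTRIBUTES` uses a comma-separated `key=value` format. The OpenTelemetry specification says values are percent-decoded. The Go sketch below parses that format and applies the precedence from the table, where `OTEL_SERVICE_NAME` wins over `service.name`.

- Both helpers are hypothetical.
- Skipping malformed pairs is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseResourceAttributes parses "k1=v1,k2=v2" into a map, percent-decoding
// values. Pairs without "=" or with an empty key are skipped (an assumption).
func parseResourceAttributes(raw string) map[string]string {
	attrs := make(map[string]string)
	for _, pair := range strings.Split(raw, ",") {
		key, value, ok := strings.Cut(pair, "=")
		key = strings.TrimSpace(key)
		if !ok || key == "" {
			continue
		}
		value = strings.TrimSpace(value)
		if dec, err := url.PathUnescape(value); err == nil {
			value = dec
		}
		attrs[key] = value
	}
	return attrs
}

// resolveServiceName gives OTEL_SERVICE_NAME priority over service.name.
func resolveServiceName(serviceNameEnv string, attrs map[string]string) string {
	if serviceNameEnv != "" {
		return serviceNameEnv
	}
	return attrs["service.name"]
}

func main() {
	attrs := parseResourceAttributes("service.name=App,project=app-a")
	fmt.Println(attrs["project"], resolveServiceName("", attrs), resolveServiceName("order-api", attrs))
}
```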
Trace Sampling
You can use head-based sampling or tail-based sampling. For details, refer to the two best practice documents:
- Tail-based sampling with collector: OpenTelemetry Sampling Best Practices
- Head-based sampling on the Agent side: OpenTelemetry Java Agent Sampling Strategy
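Head-based ratio sampling (for example `OTEL_TRACES_SAMPLER=traceidratio` with `OTEL_TRACES_SAMPLER_ARG`) decides deterministically from the trace ID. Every service therefore keeps or drops the same trace.

The Go sketch below is a simplified model loosely following the OpenTelemetry Java SDK's TraceIdRatioBased sampler, which compares the lower 64 bits of the trace ID against `ratio * Long.MAX_VALUE`. It is not DataKit's sampler or the SDK's exact code, and `shouldSample` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// shouldSample compares the lower 64 bits of a 32-hex-char trace ID (sign bit
// cleared) against ratio * MaxInt64, so the same trace ID and ratio always
// yield the same decision.
func shouldSample(traceID string, ratio float64) (bool, error) {
	if len(traceID) != 32 {
		return false, fmt.Errorf("trace id must be 32 hex chars, got %d", len(traceID))
	}
	if ratio <= 0 {
		return false, nil
	}
	if ratio >= 1 {
		return true, nil
	}
	low, err := strconv.ParseUint(traceID[16:], 16, 64)
	if err != nil {
		return false, err
	}
	bound := uint64(ratio * math.MaxInt64)
	return low&math.MaxInt64 < bound, nil
}

func main() {
	id := "4bf92f3577b34da6a3ce929d0e0e4736" // example trace ID from the W3C Trace Context spec
	for _, r := range []float64{0.1, 0.5} {
		ok, err := shouldSample(id, r)
		if err != nil {
			panic(err)
		}
		fmt.Printf("ratio=%.1f sampled=%v\n", r, ok)
	}
}
```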
Tag Extraction
Starting from DataKit 1.22.0, the blacklist function is deprecated. It is replaced by a fixed tag list: only attributes in this list are extracted as top-level tags. The fixed list is as follows:
Attributes | Tags | Description |
---|---|---|
http.url | http_url | Full HTTP request path |
http.hostname | http_hostname | Hostname |
http.route | http_route | Route |
http.status_code | http_status_code | Status code |
http.request.method | http_request_method | Request method |
http.method | http_method | Request method (same as above) |
http.client_ip | http_client_ip | Client IP |
http.scheme | http_scheme | Request protocol |
url.full | url_full | Full request URL |
url.scheme | url_scheme | Request protocol |
url.path | url_path | Request path |
url.query | url_query | Request parameters |
span_kind | span_kind | Span type |
db.system | db_system | Database system |
db.operation | db_operation | Database operation |
db.name | db_name | Database name |
db.statement | db_statement | Database statement |
server.address | server_address | Service address |
net.host.name | net_host_name | Requested host |
server.port | server_port | Service port number |
net.host.port | net_host_port | Service port number (same as above) |
network.peer.address | network_peer_address | Network address |
network.peer.port | network_peer_port | Network port |
network.transport | network_transport | Protocol |
messaging.system | messaging_system | Message queue system |
messaging.operation | messaging_operation | Message operation |
messaging.message | messaging_message | Message |
messaging.destination | messaging_destination | Message destination |
rpc.service | rpc_service | RPC service name |
rpc.system | rpc_system | RPC system |
error | error | Whether an error occurred |
error.message | error_message | Error message |
error.stack | error_stack | Stack trace |
error.type | error_type | Error type |
error.msg | error_message | Error message |
project | project | Project |
version | version | Version |
env | env | Environment |
host | host | host tag in Attributes |
pod_name | pod_name | pod_name tag in Attributes |
pod_namespace | pod_namespace | pod_namespace tag in Attributes |
To extract additional attributes as tags, add them to the `customer_tags` whitelist (environment variable `ENV_INPUT_OTEL_CUSTOMER_TAGS`).
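The fixed-list behavior works like an allow-list lookup. The Go sketch below uses an excerpt of the table.

- `extractFixedTags` and `fixedList` are hypothetical names.
- The table maps both `error.message` and `error.msg` to `error_message`, but it does not say which wins when both are present. The "later rule wins" order here is an assumption.

```go
package main

import "fmt"

// tagRule maps one span attribute to the top-level tag it is extracted into.
type tagRule struct{ attr, tag string }

// fixedList is an excerpt of the fixed list. It is ordered, so when two
// attributes map to the same tag, the later rule wins (an assumption).
var fixedList = []tagRule{
	{"http.url", "http_url"},
	{"http.route", "http_route"},
	{"http.status_code", "http_status_code"},
	{"db.system", "db_system"},
	{"error.msg", "error_message"},
	{"error.message", "error_message"},
}

// extractFixedTags returns only attributes on the fixed list, renamed to their
// tag names; everything else stays out of the top-level tags.
func extractFixedTags(attrs map[string]string) map[string]string {
	tags := make(map[string]string)
	for _, r := range fixedList {
		if v, ok := attrs[r.attr]; ok {
			tags[r.tag] = v
		}
	}
	return tags
}

func main() {
	attrs := map[string]string{
		"http.route":  "/users/{id}",
		"error.msg":   "timeout",
		"thread.name": "worker-1", // not on the list, so not extracted
	}
	fmt.Println(extractFixedTags(attrs))
}
```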
Span kind
All spans carry the `span_kind` tag, which takes one of six values:
- `unspecified`: not set.
- `internal`: internal span or child span.
- `server`: web service, RPC service, etc.
- `client`: client.
- `producer`: message producer.
- `consumer`: message consumer.
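On the wire, OTLP carries span kind as a numeric enum (`SpanKind` in `opentelemetry/proto/trace/v1/trace.proto`, values 0 through 5). The Go sketch below maps those numbers to the six `span_kind` values listed above. `spanKindTag` is a hypothetical helper, and falling back to `unspecified` for out-of-range values is an assumption.

```go
package main

import "fmt"

// SpanKind uses the numeric values of the OTLP protobuf SpanKind enum.
type SpanKind int32

const (
	SpanKindUnspecified SpanKind = 0
	SpanKindInternal    SpanKind = 1
	SpanKindServer      SpanKind = 2
	SpanKindClient      SpanKind = 3
	SpanKindProducer    SpanKind = 4
	SpanKindConsumer    SpanKind = 5
)

// spanKindTag returns the span_kind tag value; unknown values map to
// "unspecified" (an assumption).
func spanKindTag(k SpanKind) string {
	switch k {
	case SpanKindInternal:
		return "internal"
	case SpanKindServer:
		return "server"
	case SpanKindClient:
		return "client"
	case SpanKindProducer:
		return "producer"
	case SpanKindConsumer:
		return "consumer"
	default:
		return "unspecified"
	}
}

func main() {
	for k := SpanKindUnspecified; k <= SpanKindConsumer; k++ {
		fmt.Printf("%d -> %s\n", k, spanKindTag(k))
	}
}
```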
Metric Collection
The OpenTelemetry Java Agent obtains MBean metric information from applications via JMX. It reports selected JMX metrics through its internal SDK, which means all metrics are configurable.
You can enable or disable JMX metric reporting with `otel.jmx.enabled=true/false` (enabled by default). To control the interval between MBean discovery attempts, use `otel.jmx.discovery.delay`, which defines, in milliseconds, the interval between the first and subsequent discovery cycles.
In addition, the Agent has built-in collection configurations for some third-party software. For details, refer to: GitHub OTEL JMX Metric
We have implemented special handling for Histogram metrics:
- OpenTelemetry histogram buckets are mapped directly to Prometheus histogram buckets.
- The count of each bucket is converted to the Prometheus cumulative count format. For example, OpenTelemetry buckets with upper bounds 10, 50, and 100 are converted to Prometheus `_bucket` metrics tagged `le="10"`, `le="50"`, and `le="100"`, each holding the number of observations less than or equal to that bound.
- The total number of observations in the OpenTelemetry histogram is converted to the Prometheus `_count` metric.
- The sum of the OpenTelemetry histogram is converted to the Prometheus `_sum` metric, and `_max` and `_min` are also added.
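The cumulative conversion described in the list above can be sketched in Go.

- In OTLP, an explicit-bounds histogram carries one more count than it has bounds; the final count covers values above the last bound.
- `toCumulative` and `Bucket` are hypothetical names.
- Emitting the overflow bucket as `le="+Inf"` follows the Prometheus convention and is an assumption about DataKit's output.

```go
package main

import (
	"fmt"
	"strconv"
)

// Bucket is one Prometheus-style cumulative bucket.
type Bucket struct {
	Le    string // upper bound, rendered as the le tag value
	Count uint64 // cumulative count of observations <= Le
}

// toCumulative turns per-bucket OTLP counts into cumulative buckets.
// counts must have len(bounds)+1 entries; the last is the overflow bucket.
func toCumulative(bounds []float64, counts []uint64) ([]Bucket, error) {
	if len(counts) != len(bounds)+1 {
		return nil, fmt.Errorf("want %d counts, got %d", len(bounds)+1, len(counts))
	}
	out := make([]Bucket, 0, len(counts))
	var running uint64
	for i, c := range counts {
		running += c
		le := "+Inf"
		if i < len(bounds) {
			le = strconv.FormatFloat(bounds[i], 'f', -1, 64)
		}
		out = append(out, Bucket{Le: le, Count: running})
	}
	return out, nil
}

func main() {
	// Bounds 10, 50, 100 with per-bucket counts 3, 5, 2 and 1 overflow.
	buckets, err := toCumulative([]float64{10, 50, 100}, []uint64{3, 5, 2, 1})
	if err != nil {
		panic(err)
	}
	for _, b := range buckets {
		fmt.Printf("le=%s count=%d\n", b.Le, b.Count)
	}
}
```

The last cumulative value equals the histogram's total, which is why it matches the `_count` metric.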
All metrics ending with `_bucket` are histogram data, and each has corresponding metrics ending with `_max`, `_min`, `_count`, and `_sum`.
You can use the `le` (less than or equal) tag to categorize histogram data and filter on it. For all metrics and tags, refer to OpenTelemetry Metrics.
This conversion enables seamless integration of histogram data collected by OpenTelemetry into Prometheus, allowing you to leverage Prometheus' powerful query and visualization capabilities for analysis.
Log Collection
Currently, the Java Agent supports collecting `stdout` logs and sending them to DataKit over the `otlp` protocol.
By default, log collection is disabled for OTEL Agent V1. Explicit commands are required to enable it. The enabling methods are as follows:
```shell
# env
export OTEL_LOGS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_ENDPOINT=http://<DataKit Addr>:4317
java -javaagent:/path/to/agent.jar -jar app.jar

# command
java -javaagent:/path/to/agent.jar \
  -Dotel.logs.exporter=otlp \
  -Dotel.exporter.otlp.endpoint=http://<DataKit Addr>:4317 \
  -jar app.jar
```
By default, the maximum length of log content is 500KB. Content exceeding this limit will be split into multiple logs. The maximum length of log tags is 32KB (this field is not configurable), and content exceeding this limit will be truncated.
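The two limits in the paragraph above can be sketched in Go: split long content into chunks, and truncate long tag values.

- Treating KB as 1024 bytes is an assumption.
- So is cutting on UTF-8 character boundaries so no multi-byte character is broken. The document does not say how DataKit picks the cut point.
- `splitContent` and `truncateTag` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

const (
	maxContentBytes = 500 * 1024 // default content limit (assumes 1 KB = 1024 B)
	maxTagBytes     = 32 * 1024  // fixed tag-value limit
)

// splitContent cuts content into chunks of at most limit bytes, backing up to
// a UTF-8 rune start so multi-byte characters stay intact.
func splitContent(content string, limit int) []string {
	var parts []string
	for len(content) > limit {
		cut := limit
		for cut > 0 && !utf8.RuneStart(content[cut]) {
			cut--
		}
		if cut == 0 { // no rune start found (invalid UTF-8): cut at limit
			cut = limit
		}
		parts = append(parts, content[:cut])
		content = content[cut:]
	}
	return append(parts, content)
}

// truncateTag keeps at most limit bytes of a tag value, on a rune boundary.
func truncateTag(v string, limit int) string {
	if len(v) <= limit {
		return v
	}
	cut := limit
	for cut > 0 && !utf8.RuneStart(v[cut]) {
		cut--
	}
	return v[:cut]
}

func main() {
	fmt.Printf("%q\n", splitContent("héllo", 2))
	fmt.Println(len(splitContent(string(make([]byte, maxContentBytes+1)), maxContentBytes)))
	fmt.Println(len(truncateTag(string(make([]byte, 40*1024)), maxTagBytes)))
}
```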
The `source` of logs collected via OTEL is the service name. You can customize it by adding the `log.source` tag, for example: `-Dotel.resource.attributes="log.source=source_name"`.
Note: If the app runs in a container environment (e.g., k8s), DataKit will automatically collect logs by default. Enabling log collection again will result in duplicate collection. It is recommended to manually disable DataKit's independent log collection before enabling OTEL log collection.
For more languages, refer to the official documentation.
Collection Field Description
Tracing
opentelemetry
The tags and fields of tracing data are as follows:
Tags & Fields | Description |
---|---|
base_service (tag) | Span base service name |
container_host (tag) | Container hostname. Available in OpenTelemetry. Optional. |
db_host (tag) | DB host name: IP or domain name. Optional. |
db_name (tag) | Database name. Optional. |
db_system (tag) | Database system name: mysql, oracle, ... Optional. |
dk_fingerprint (tag) | DataKit fingerprint (always DataKit's hostname) |
endpoint (tag) | Endpoint info. Available in SkyWalking, Zipkin. Optional. |
env (tag) | Application environment info. Available in Jaeger. Optional. |
host (tag) | Hostname. |
http_method (tag) | HTTP request method name. Available in DDTrace, OpenTelemetry. Optional. |
http_route (tag) | HTTP route. Optional. |
http_status_code (tag) | HTTP response code. Available in DDTrace, OpenTelemetry. Optional. |
http_url (tag) | HTTP URL. Optional. |
operation (tag) | Span name |
out_host (tag) | Database host, equivalent to db_host (DDTrace-go only). Optional. |
project (tag) | Project name. Available in Jaeger. Optional. |
service (tag) | Service name. Optional. |
source_type (tag) | Tracing source type |
span_type (tag) | Span type |
status (tag) | Span status |
version (tag) | Application version info. Available in Jaeger. Optional. |
duration | Duration of span Type: int Unit: time,μs |
message | Original content of span Type: string Unit: N/A |
parent_id | Parent span ID of current span Type: string Unit: N/A |
resource | Resource name that produced the current span Type: string Unit: N/A |
span_id | Span ID Type: string Unit: N/A |
start | Start time of span Type: int Unit: timeStamp,usec |
trace_id | Trace ID Type: string Unit: N/A |
Metrics
otel_service
OpenTelemetry JVM Metrics
Tags & Fields | Description |
---|---|
action (tag) | GC action |
area (tag) | Heap or non-heap |
cause (tag) | GC cause |
container_id (tag) | Container ID |
db_host (tag) | DB host name: IP or domain name |
db_name (tag) | Database name |
db_system (tag) | Database system name: mysql, oracle, ... |
direction (tag) | received or sent |
exception (tag) | Exception information |
gc (tag) | GC type |
host (tag) | Hostname |
host_arch (tag) | Host architecture |
host_name (tag) | Hostname |
http.scheme (tag) | HTTP/HTTPS |
http_method (tag) | HTTP method |
http_request_method (tag) | HTTP method |
http_response_status_code (tag) | HTTP status code |
http_route (tag) | HTTP route |
id (tag) | JVM type |
instrumentation_name (tag) | Metric name |
jvm_gc_action (tag) | Action: end of major GC, end of minor GC |
jvm_gc_name (tag) | Name: PS MarkSweep, PS Scavenge |
jvm_memory_pool_name (tag) | Pool name: code cache, PS Eden Space, PS Old Gen, MetaSpace, ... |
jvm_memory_type (tag) | Memory type: heap, non_heap |
jvm_thread_state (tag) | Thread state: runnable, timed_waiting, waiting |
le (tag) | *_bucket: explicit bounds of histogram metrics |
level (tag) | Log level |
main-application-class (tag) | Main entry point |
method (tag) | HTTP method |
name (tag) | Thread pool name |
net_protocol_name (tag) | Network protocol name |
net_protocol_version (tag) | Network protocol version |
os_type (tag) | OS type |
outcome (tag) | HTTP outcome |
path (tag) | Disk path |
pool (tag) | JVM pool type |
scope_name (tag) | Scope name |
service_name (tag) | Service name |
spanProcessorType (tag) | Span processor type |
state (tag) | Thread state: idle, used |
status (tag) | HTTP status code |
type (tag) | Kafka broker type |
unit (tag) | Metric unit |
uri (tag) | HTTP request URI |
application.ready.time | Time taken (ms) for the application to be ready to service requests Type: float Unit: timeStamp,msec |
application.started.time | Time taken (ms) to start the application Type: float Unit: timeStamp,msec |
disk.free | Usable space for path Type: float Unit: digital,B |
disk.total | Total space for path Type: float Unit: digital,B |
executor.active | The approximate number of threads that are actively executing tasks Type: float Unit: count |
executor.completed | The approximate total number of tasks that have completed execution Type: float Unit: count |
executor.pool.core | The core number of threads for the pool Type: float Unit: count |
executor.pool.max | The maximum allowed number of threads in the pool Type: float Unit: count |
executor.pool.size | The current number of threads in the pool Type: float Unit: count |
executor.queue.remaining | The number of additional elements that this queue can ideally accept without blocking Type: float Unit: count |
executor.queued | The approximate number of tasks that are queued for execution Type: float Unit: count |
http.server.active_requests | The number of concurrent HTTP requests that are currently in-flight Type: float Unit: count |
http.server.duration | The duration of the inbound HTTP request Type: float Unit: time,ns |
http.server.request.duration | The count of HTTP request duration time in each bucket Type: float Unit: count |
http.server.requests | The http request count Type: float Unit: count |
http.server.requests.max | None Type: float Unit: digital,B |
http.server.response.size | The size of HTTP response messages Type: float Unit: digital,B |
http.server.tomcat.errorCount | The number of errors per second on all request processors Type: float Unit: count |
http.server.tomcat.maxTime | The longest request processing time Type: float Unit: timeStamp,msec |
http.server.tomcat.processingTime | Represents the total time for processing all requests Type: float Unit: timeStamp,msec |
http.server.tomcat.requestCount | The number of requests per second across all request processors Type: float Unit: count |
http.server.tomcat.sessions.activeSessions | The number of active sessions Type: float Unit: count |
http.server.tomcat.threads | Thread Count of the Thread Pool Type: float Unit: count |
http.server.tomcat.traffic | The number of bytes transmitted Type: float Unit: traffic,B/S |
jvm.buffer.count | An estimate of the number of buffers in the pool Type: float Unit: count |
jvm.buffer.memory.used | An estimate of the memory that the Java virtual machine is using for this buffer pool Type: float Unit: digital,B |
jvm.buffer.total.capacity | An estimate of the total capacity of the buffers in this pool Type: float Unit: digital,B |
jvm.classes.loaded | The number of classes that are currently loaded in the Java virtual machine Type: float Unit: count |
jvm.classes.unloaded | The total number of classes unloaded since the Java virtual machine has started execution Type: float Unit: count |
jvm.gc.live.data.size | Size of long-lived heap memory pool after reclamation Type: float Unit: digital,B |
jvm.gc.max.data.size | Max size of long-lived heap memory pool Type: float Unit: digital,B |
jvm.gc.memory.allocated | Incremented for an increase in the size of the (young) heap memory pool after one GC to before the next Type: float Unit: digital,B |
jvm.gc.memory.promoted | Count of positive increases in the size of the old generation memory pool before GC to after GC Type: float Unit: digital,B |
jvm.gc.overhead | An approximation of the percent of CPU time used by GC activities over the last look back period or since monitoring began, whichever is shorter, in the range [0..1] Type: int Unit: count |
jvm.gc.pause | Time spent in GC pause Type: float Unit: timeStamp,nsec |
jvm.gc.pause.max | Time spent in GC pause Type: float Unit: timeStamp,msec |
jvm.memory.committed | The amount of memory in bytes that is committed for the Java virtual machine to use Type: float Unit: digital,B |
jvm.memory.max | The maximum amount of memory in bytes that can be used for memory management Type: float Unit: digital,B |
jvm.memory.usage.after.gc | The percentage of long-lived heap pool used after the last GC event, in the range [0..1] Type: float Unit: percent,percent |
jvm.memory.used | The amount of used memory Type: float Unit: digital,B |
jvm.threads.daemon | The current number of live daemon threads Type: float Unit: count |
jvm.threads.live | The current number of live threads including both daemon and non-daemon threads Type: float Unit: count |
jvm.threads.peak | The peak live thread count since the Java virtual machine started or peak was reset Type: float Unit: count |
jvm.threads.states | The current number of threads in each thread state Type: float Unit: count |
kafka.controller.active.count | The number of controllers active on the broker Type: float Unit: count |
kafka.isr.operation.count | The number of in-sync replica shrink and expand operations Type: float Unit: count |
kafka.lag.max | The max lag in messages between follower and leader replicas Type: float Unit: count |
kafka.leaderElection.count | The leader election count Type: float Unit: count |
kafka.leaderElection.unclean.count | Unclean leader election count - increasing indicates broker failures Type: float Unit: count |
kafka.message.count | The number of messages received by the broker Type: float Unit: count |
kafka.network.io | The bytes received or sent by the broker Type: float Unit: digital,B |
kafka.partition.count | The number of partitions on the broker Type: float Unit: count |
kafka.partition.offline | The number of partitions offline Type: float Unit: count |
kafka.partition.underReplicated | The number of under replicated partitions Type: float Unit: count |
kafka.purgatory.size | The number of requests waiting in purgatory Type: float Unit: count |
kafka.request.count | The number of requests received by the broker Type: float Unit: count |
kafka.request.failed | The number of requests to the broker resulting in a failure Type: float Unit: count |
kafka.request.queue | Size of the request queue Type: float Unit: count |
kafka.request.time.50p | The 50th percentile time the broker has taken to service requests Type: float Unit: timeStamp,msec |
kafka.request.time.99p | The 99th percentile time the broker has taken to service requests Type: float Unit: timeStamp,msec |
kafka.request.time.total | The total time the broker has taken to service requests Type: float Unit: timeStamp,msec |
log4j2.events | Number of log events at each log level (see the level tag) Type: float Unit: count |
otlp.exporter.exported | OTLP exporter to remote Type: int Unit: count |
otlp.exporter.seen | OTLP exporter Type: int Unit: count |
process.cpu.usage | The "recent cpu usage" for the Java Virtual Machine process Type: float Unit: percent,percent |
process.files.max | The maximum file descriptor count Type: float Unit: count |
process.files.open | The open file descriptor count Type: float Unit: count |
process.runtime.jvm.buffer.count | The number of buffers in the pool Type: float Unit: count |
process.runtime.jvm.buffer.limit | Total capacity of the buffers in this pool Type: float Unit: digital,B |
process.runtime.jvm.buffer.usage | Memory that the Java virtual machine is using for this buffer pool Type: float Unit: digital,B |
process.runtime.jvm.classes.current_loaded | Number of classes currently loaded Type: float Unit: count |
process.runtime.jvm.classes.loaded | Number of classes loaded since JVM start Type: int Unit: count |
process.runtime.jvm.classes.unloaded | Number of classes unloaded since JVM start Type: float Unit: count |
process.runtime.jvm.cpu.utilization | Recent cpu utilization for the process Type: float Unit: percent,percent |
process.runtime.jvm.gc.duration | Duration of JVM garbage collection actions Type: float Unit: timeStamp,nsec |
process.runtime.jvm.memory.committed | Measure of memory committed Type: float Unit: digital,B |
process.runtime.jvm.memory.init | Measure of initial memory requested Type: float Unit: digital,B |
process.runtime.jvm.memory.limit | Measure of max obtainable memory Type: float Unit: digital,B |
process.runtime.jvm.memory.usage | Measure of memory used Type: float Unit: digital,B |
process.runtime.jvm.memory.usage_after_last_gc | Measure of memory used after the most recent garbage collection event on this pool Type: float Unit: digital,B |
process.runtime.jvm.system.cpu.load_1m | Average CPU load of the whole system for the last minute Type: float Unit: percent,percent |
process.runtime.jvm.system.cpu.utilization | Recent cpu utilization for the whole system Type: float Unit: percent,percent |
process.runtime.jvm.threads.count | Number of executing threads Type: float Unit: count |
process.start.time | Start time of the process since unix epoch Type: float Unit: digital,B |
process.uptime | The uptime of the Java virtual machine Type: int Unit: timeStamp,sec |
processedSpans | The number of spans processed by the BatchSpanProcessor Type: int Unit: count |
queueSize | The number of spans queued Type: int Unit: count |
system.cpu.count | The number of processors available to the Java virtual machine Type: int Unit: count |
system.cpu.usage | The "recent cpu usage" for the whole system Type: float Unit: percent,percent |
system.load.average.1m | The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time Type: float Unit: count |
tracing_metrics
Span count and span latency metrics computed from OpenTelemetry span data:
Tags & Fields | Description |
---|---|
env (tag) | Application environment info (if set in span). |
host (tag) | Hostname. |
http_status_class (tag) | HTTP response code class, such as 2xx/3xx/4xx/5xx |
http_status_code (tag) | HTTP response code |
operation (tag) | Span name |
pod_name (tag) | Pod name (if set in span). |
pod_namespace (tag) | Pod namespace (if set in span). |
project (tag) | Project name (if set in span). |
remote_ip (tag) | Remote IP. |
resource (tag) | Application resource name. |
service (tag) | Service name. |
source (tag) | Source, always opentelemetry |
status (tag) | Span status (ok/error) |
version (tag) | Application version info. |
apdex | Apdex score for each web service. The satisfaction threshold is currently 2 seconds. The tags of this metric are fixed: service/env/version/resource/source. The value range is 0~1. Type: float Unit: N/A |
errors | Count of error spans. Type: int Unit: count |
errors_by_http_status | Count of errors for a given span, grouped by HTTP status code. Type: int Unit: count |
hits | Count of spans. Type: int Unit: count |
hits_by_http_status | Count of hits for a given span, grouped by HTTP status code. Type: int Unit: count |
latency_bucket | Latency distribution for all services, resources, and versions across different environments and additional primary tags. Recommended for all latency measurement use cases. Use the le tag for filtering. Type: int Unit: count |
latency_count | Number of web-type spans. Type: int Unit: count |
latency_sum | Total latency of all web-type spans, corresponding to latency_count. Type: int Unit: time,μs |
Deleted Tags in Metrics
In the `otel_service` metric set, many tags in the originally reported metrics are of little use. Because these string tags consume a lot of memory and bandwidth, they are discarded. The discarded tags are as follows:
process.command_line
process.executable.path
process.runtime.description
process.runtime.name
process.runtime.version
telemetry.distro.name
telemetry.distro.version
telemetry.sdk.language
telemetry.sdk.name
telemetry.sdk.version
Examples
DataKit currently provides best practices for the following two languages: