OpenTelemetry


OpenTelemetry (hereinafter referred to as OTEL) is an observability project under CNCF (Cloud Native Computing Foundation). It aims to provide a standardized solution in the field of observability, addressing standardization issues related to the data model, collection, processing, and export of observability data.

OTEL is a collection of standards and tools designed to manage observability data such as traces, metrics, and logs. This document describes how to configure and enable OTEL data ingestion on DataKit, as well as best practices for Java and Go.

Configuration

Navigate to the conf.d/opentelemetry directory under the DataKit installation directory, copy opentelemetry.conf.sample and rename it to opentelemetry.conf. An example is as follows:

[[inputs.opentelemetry]]
  ## customer_tags works as a whitelist: only the listed attributes are extracted as tags and sent to the data center.
  ## Every "." in a tag name is replaced with "_". For example,
  ## "project.name" is sent to the data center as "project_name".
  # customer_tags = ["sink_project", "custom.otel.tag"]

  ## If set to true, all Attributes will be extracted and message.Attributes will be empty.
  # customer_tags_all = false

  ## Keep rare tracing resources list switch.
  ## If some resources are rare enough (not present within 1 hour), those resources are always sent
  ## to the data center regardless of samplers and filters.
  # keep_rare_resource = false

  ## By default, every error present in a span is sent to the data center, bypassing any filters or
  ## samplers. To exclude certain error statuses, list them here.
  # omit_err_status = ["404"]

  ## compatible ddtrace: make OTEL traces compatible with DDTrace traces
  # compatible_ddtrace=false

  ## Derive the service name from the xx.system attributes.
  ## see: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/database/database-spans.md
  split_service_name = true

  ## delete trace message
  # del_message = true

  ## Maximum length of logging message data, in KB; default is 500.
  log_max = 500

  ## JSON marshaler: set JSON marshaler. available marshaler are:
  ##   gojson/jsoniter/protojson
  ##
  ## For better performance, gojson and jsoniter are better than protojson;
  ## for compatibility reasons protojson is still the default.
  jmarshaler = "protojson"

  ## Clean the top-level fields out of message. Default: true
  clean_message = true

  ## tracing_metric_enable: trace_hits trace_hits_by_http_status trace_latency trace_errors trace_errors_by_http_status trace_apdex.
  ## Extract the above metrics from the collected traces.
  # tracing_metric_enable = true

  ## Blacklist of metric tags: the metric set "tracing_metrics" carries many tags.
  ## To remove certain tags, add them to the blacklist.
  ## By default, the tags include: source,span_name,env,service,status,version,resource,http_status_code,http_status_class
  ## as well as "customer_tags", k8s-related tags, and other service-related tags.
  # tracing_metric_tag_blacklist = ["resource", "operation", "tag_a", "tag_b"]

  ## Ignore tracing resources, configured as a map of service:[resources...].
  ## The service name is the full service name in the current application.
  ## The resource list contains regular expressions used to block resource names.
  ## To block some resources under all services, set the
  ## service name to "*". Note: the double quotes "" cannot be omitted.
  # [inputs.opentelemetry.close_resource]
    # service1 = ["resource1", "resource2", ...]
    # service2 = ["resource1", "resource2", ...]
    # "*" = ["close_resource_under_all_services"]
    # ...

  ## Sampler config sets the global sampling strategy.
  ## sampling_rate sets the global sampling rate.
  # [inputs.opentelemetry.sampler]
    # sampling_rate = 1.0

  # [inputs.opentelemetry.tags]
    # key1 = "value1"
    # key2 = "value2"
    # ...

  ## Threads config controls how many goroutines an agent can start to handle HTTP requests.
  ## buffer is the size of the worker channel's job buffer.
  ## threads is the total number of goroutines at run time.
  # [inputs.opentelemetry.threads]
    # buffer = 100
    # threads = 8

  ## Storage config sets up local storage space on the hard drive to cache trace data.
  ## path is the local file path used to cache data.
  ## capacity is the total space (MB) used to store data.
  # [inputs.opentelemetry.storage]
    # path = "./otel_storage"
    # capacity = 5120

  ## OTEL agent HTTP config for trace and metrics.
  ## When HTTP is enabled, trace and metrics are received on their respective paths. The defaults are:
  ## trace : /otel/v1/traces
  ## metric: /otel/v1/metrics
  ## and the client side should be configured properly with Datakit listening port(default: 9529)
  ## or custom HTTP request path.
  ## for example http://127.0.0.1:9529/otel/v1/traces
  ## The acceptable http_status_ok values will be 200 or 202.
  [inputs.opentelemetry.http]
   http_status_ok = 200
   trace_api = "/otel/v1/traces"
   metric_api = "/otel/v1/metrics"
   logs_api = "/otel/v1/logs"

  ## OTEL agent GRPC config for trace and metrics.
  ## GRPC services for trace and metrics can each be enabled by setting the corresponding option to true.
  ## addr is the listening address of the GRPC server.
  [inputs.opentelemetry.grpc]
   addr = "127.0.0.1:4317"
   max_payload = 16777216 # default 16MiB

  ## If 'expected_headers' is configured, clients must send the listed HTTP headers;
  ## otherwise HTTP status code 400 (bad request) is returned.
  ## Note: once set, expected_headers applies to both trace and metrics.
  # [inputs.opentelemetry.expected_headers]
  # ex_version = "1.2.3"
  # ex_name = "env_resource_name"
  # ...

After configuration, restart DataKit to take effect.

You can enable the collector by injecting collector configuration via ConfigMap or configuring ENV_DATAKIT_INPUTS.

You can also modify configuration parameters via environment variables (you need to add the collector to ENV_DEFAULT_ENABLED_INPUTS as a default collector):

  • ENV_INPUT_OTEL_CUSTOMER_TAGS

    Whitelist of tags

    Type: JSON

    input.conf: customer_tags

    Example: '["project_id", "custom.tag"]'

  • ENV_INPUT_OTEL_CUSTOMER_TAGS_ALL

    Extract all attributes to tags

    Type: Boolean

    input.conf: customer_tags_all

    Default: false

  • ENV_INPUT_OTEL_KEEP_RARE_RESOURCE

    Keep rare tracing resources list switch

    Type: Boolean

    input.conf: keep_rare_resource

    Default: false

  • ENV_INPUT_OTEL_COMPATIBLE_DD_TRACE

    Convert trace_id to decimal, compatible with DDTrace

    Type: Boolean

    input.conf: compatible_dd_trace

    Default: false

  • ENV_INPUT_OTEL_SPLIT_SERVICE_NAME

    Get xx.system from span.Attributes to replace service name

    Type: Boolean

    input.conf: split_service_name

    Default: false

  • ENV_INPUT_OTEL_TRACING_METRIC_ENABLE

    These metrics capture request counts, error counts, and latency measures.

    Type: Boolean

    input.conf: tracing_metric_enable

    Default: false

  • ENV_INPUT_OTEL_TRACING_METRIC_TAG_BLACKLIST

    Blacklist of tags in the metric: tracing_metrics

    Type: JSON

    input.conf: tracing_metric_tag_blacklist

    Example: '["tag_a", "tag_b"]'

  • ENV_INPUT_OTEL_DEL_MESSAGE

    Delete trace message

    Type: Boolean

    input.conf: del_message

    Default: false

  • ENV_INPUT_OTEL_OMIT_ERR_STATUS

    List of error statuses to omit

    Type: JSON

    input.conf: omit_err_status

    Example: '["404", "403", "400"]'

  • ENV_INPUT_OTEL_CLOSE_RESOURCE

    Ignore tracing resources per service (regular expressions)

    Type: JSON

    input.conf: close_resource

    Example: '{"service1":["resource1","other"],"service2":["resource2","other"]}'

  • ENV_INPUT_OTEL_SAMPLER

    Global sampling rate

    Type: Float

    input.conf: sampler

    Example: 0.3

  • ENV_INPUT_OTEL_THREADS

    Total number of threads and buffer

    Type: JSON

    input.conf: threads

    Example: '{"buffer":1000, "threads":100}'

  • ENV_INPUT_OTEL_STORAGE

    Local cache file path and size (MB)

    Type: JSON

    input.conf: storage

    Example: '{"storage":"./otel_storage", "capacity": 5120}'

  • ENV_INPUT_OTEL_HTTP

    HTTP agent config

    Type: JSON

    input.conf: http

    Example: '{"enable":true, "http_status_ok": 200, "trace_api": "/otel/v1/traces", "metric_api": "/otel/v1/metrics"}'

  • ENV_INPUT_OTEL_GRPC

    GRPC agent config

    Type: JSON

    input.conf: grpc

    Example: '{"addr": "127.0.0.1:4317", "max_payload": 16777216 }'

  • ENV_INPUT_OTEL_EXPECTED_HEADERS

    If expected_headers is configured, clients are required to send the listed HTTP headers

    Type: JSON

    input.conf: expected_headers

    Example: '{"ex_version": "1.2.3", "ex_name": "env_resource_name"}'

  • ENV_INPUT_OTEL_CLEAN_MESSAGE

    Clean the message to generate a smaller message field

    Type: Boolean

    input.conf: clean_message

    Example: true/false

  • ENV_INPUT_OTEL_TAGS

    Customize tags. If there is a tag with the same name in the configuration file, it will be overwritten

    Type: JSON

    input.conf: tags

    Example: '{"k1":"v1", "k2":"v2", "k3":"v3"}'
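
The JSON-typed variables above must hold valid JSON strings. As a minimal sketch, assuming the values are generated by a deployment script (the helper name `otel_env` is hypothetical and not part of DataKit), a few of the documented settings could be serialized like this:

```python
import json


def otel_env(customer_tags, close_resource, threads, sampling_rate):
    """Build a few ENV_INPUT_OTEL_* values; JSON-typed ones go through json.dumps."""
    return {
        "ENV_INPUT_OTEL_CUSTOMER_TAGS": json.dumps(customer_tags),
        "ENV_INPUT_OTEL_CLOSE_RESOURCE": json.dumps(close_resource),
        "ENV_INPUT_OTEL_THREADS": json.dumps(threads),
        # Float-typed variable: passed as a plain number string.
        "ENV_INPUT_OTEL_SAMPLER": str(sampling_rate),
    }


env = otel_env(
    customer_tags=["project_id", "custom.tag"],
    close_resource={"service1": ["resource1", "other"]},
    threads={"buffer": 1000, "threads": 100},
    sampling_rate=0.3,
)
for name, value in env.items():
    print(f"{name}={value}")
```

Generating the strings with `json.dumps` avoids quoting mistakes that are easy to make when the JSON is written by hand inside a YAML or shell file.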

Notes

  1. It is recommended to use the gRPC protocol, as gRPC offers advantages such as high compression rate, fast serialization, and higher efficiency.
  2. Starting from DataKit version 1.10.0, the routes for the HTTP protocol are configurable. The default request paths (for Trace/Log/Metric) are /otel/v1/traces, /otel/v1/logs, and /otel/v1/metrics respectively.
  3. For float/double type data, a maximum of two decimal places will be retained.
  4. Both HTTP and gRPC support gzip compression, which is disabled by default. To enable it, set the environment variable OTEL_EXPORTER_OTLP_COMPRESSION=gzip for the exporter.
  5. The HTTP protocol request format supports both JSON and Protobuf serialization formats. However, gRPC only supports the Protobuf format.
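
To illustrate notes 4 and 5, here is a minimal Python sketch that builds (but does not send) a gzip-compressed OTLP/HTTP request in JSON format. The payload follows the general OTLP JSON layout. The function name, the IDs, and the timestamps are made up for illustration, and the address assumes DataKit is listening on the default 127.0.0.1:9529.

```python
import gzip
import json
import urllib.request


def build_trace_request(url, service_name, span_name, trace_id, span_id, start_ns, end_ns):
    """Build a gzip-compressed OTLP/HTTP JSON trace request without sending it."""
    payload = {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "spans": [{
                    "traceId": trace_id,  # 32 hex characters
                    "spanId": span_id,    # 16 hex characters
                    "name": span_name,
                    "kind": 2,            # SPAN_KIND_SERVER
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }],
    }
    body = gzip.compress(json.dumps(payload).encode("utf-8"))
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Content-Encoding": "gzip"},
    )


req = build_trace_request(
    "http://127.0.0.1:9529/otel/v1/traces", "app", "GET /ping",
    "5b8efff798038103d269b633813fc60c", "eee19b7ec3c1b174",
    1_700_000_000_000_000_000, 1_700_000_000_050_000_000,
)
# urllib.request.urlopen(req) would send it to a running DataKit.
```

In practice an OpenTelemetry SDK exporter builds and compresses these requests for you; the sketch only makes the wire format visible.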

Warning
  • In DDTrace trace data, the service name is derived from the service name or from the referenced third-party library, whereas in OTEL data the service name is defined by otel.service.name.
  • To display service names separately, an additional configuration field is provided: split_service_name = true.
  • The service name is extracted from the tags in the trace data. For example, if the DB-type tag is db.system=mysql, the service name will be mysql. For message queue types (e.g., messaging.system=kafka), the service name will be kafka.
  • By default, the service name is extracted from these three tags: db.system/rpc.system/messaging.system.
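
The extraction rule described above can be sketched in Python as follows. This is an illustration, not DataKit's actual code: the function name is hypothetical, and the precedence among the three tags when several are present is an assumption.

```python
# The three attributes named in the text; the lookup order is an assumption.
SYSTEM_KEYS = ("db.system", "rpc.system", "messaging.system")


def resolve_service_name(service_name, attributes, split_service_name=True):
    """Return the first xx.system attribute value found, else the original service name."""
    if split_service_name:
        for key in SYSTEM_KEYS:
            value = attributes.get(key)
            if value:
                return value
    return service_name


print(resolve_service_name("app", {"db.system": "mysql"}))
print(resolve_service_name("app", {"messaging.system": "kafka"}))
print(resolve_service_name("app", {"http.method": "GET"}))
```

With the switch turned off, the span keeps the service name reported by the application.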

Pay attention to the environment variable configuration when using the OTEL HTTP exporter. Since the default configuration of DataKit uses /otel/v1/traces, /otel/v1/logs, and /otel/v1/metrics, you need to configure the trace, log, and metric endpoints separately if you want to use the HTTP protocol.

Agent V2 Version

The V2 version uses the otlp exporter by default and changes the default protocol from grpc to http/protobuf. You can switch back with the startup parameter -Dotel.exporter.otlp.protocol=grpc, or keep the default http/protobuf.

If using HTTP, the path for each exporter needs to be explicitly configured. For example:

java -javaagent:/usr/local/ddtrace/opentelemetry-javaagent-2.5.0.jar \
  -Dotel.exporter=otlp \
  -Dotel.exporter.otlp.protocol=http/protobuf \
  -Dotel.exporter.otlp.logs.endpoint=http://localhost:9529/otel/v1/logs \
  -Dotel.exporter.otlp.traces.endpoint=http://localhost:9529/otel/v1/traces \
  -Dotel.exporter.otlp.metrics.endpoint=http://localhost:9529/otel/v1/metrics \
  -Dotel.service.name=app \
  -jar app.jar

If using the gRPC protocol, explicit configuration is required; otherwise, the default HTTP protocol will be used:

java -javaagent:/usr/local/ddtrace/opentelemetry-javaagent-2.5.0.jar \
  -Dotel.exporter=otlp \
  -Dotel.exporter.otlp.protocol=grpc \
  -Dotel.exporter.otlp.endpoint=http://localhost:4317 \
  -Dotel.service.name=app \
  -jar app.jar
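
The two command lines above differ only in the protocol and endpoint flags. As a rough sketch, a launcher script could assemble them as shown below. The helper name `javaagent_args` and its parameters are hypothetical, and the ports are the DataKit defaults used in this document (9529 for HTTP, 4317 for gRPC).

```python
def javaagent_args(agent_jar, app_jar, service, datakit_host="localhost",
                   protocol="http/protobuf"):
    """Assemble a java command line for the OTEL Java Agent V2."""
    args = [
        "java",
        f"-javaagent:{agent_jar}",
        f"-Dotel.exporter.otlp.protocol={protocol}",
    ]
    if protocol == "grpc":
        # gRPC uses one endpoint for all signals.
        args.append(f"-Dotel.exporter.otlp.endpoint=http://{datakit_host}:4317")
    else:
        # HTTP needs an explicit path per signal to match DataKit's routes.
        base = f"http://{datakit_host}:9529/otel/v1"
        for signal in ("logs", "traces", "metrics"):
            args.append(f"-Dotel.exporter.otlp.{signal}.endpoint={base}/{signal}")
    args.append(f"-Dotel.service.name={service}")
    args += ["-jar", app_jar]
    return args


print(" \\\n  ".join(javaagent_args("/path/to/agent.jar", "app.jar", "app")))
```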

Logging is enabled by default. To disable log collection, set the logs exporter to none: -Dotel.logs.exporter=none.

For more major changes in the V2 version, refer to the official documentation or GitHub release notes: Github-v2.0.0

Common Commands

The following configurations are commonly used when starting an application:

ENV (Corresponding Command) Description
OTEL_SDK_DISABLED(otel.sdk.disabled) Disable the SDK; default is false. No traces or metrics will be generated after disabling.
OTEL_RESOURCE_ATTRIBUTES(otel.resource.attributes) Add global custom tags. These custom tags will be included in each span. Example: service.name=App,project=app-a
OTEL_SERVICE_NAME(otel.service.name) Set the service name; it has higher priority than custom tags.
OTEL_LOG_LEVEL(otel.log.level) Log level; default is info.
OTEL_PROPAGATORS(otel.propagators) Set the propagation protocol; default is tracecontext,baggage.
OTEL_TRACES_SAMPLER(otel.traces.sampler) Set the sampler type.
OTEL_TRACES_SAMPLER_ARG(otel.traces.sampler.arg) Argument for the sampler above; value range is 0 to 1.0; default is 1.0.
OTEL_EXPORTER_OTLP_PROTOCOL(otel.exporter.otlp.protocol) Set the transmission protocol; default is grpc for Agent V1 and http/protobuf for Agent V2; optional values are grpc,http/protobuf,http/json.
OTEL_EXPORTER_OTLP_ENDPOINT(otel.exporter.otlp.endpoint) Set the Trace upload address; it should be set to the DataKit address: http://datakit-endpoint:9529/otel/v1/traces.
OTEL_TRACES_EXPORTER(otel.traces.exporter) Trace exporter; default is otlp.
OTEL_LOGS_EXPORTER(otel.logs.exporter) Log exporter; default is otlp. Note: Explicit configuration is required for OTEL V1 version; otherwise, it is disabled by default.

You can pass the otel.javaagent.debug=true parameter to the Agent to view debug logs. Note that these logs are quite verbose; use them with caution in production environments.

Trace Sampling

You can use head-based sampling or tail-based sampling. For details, refer to the two best practice documents:
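
As a rough illustration of head-based sampling, the sketch below makes the keep/drop decision from the trace ID alone, so every service that sees the same trace reaches the same decision. It is a generic ratio-based approach written for this document, not the exact algorithm used by DataKit or the OTEL SDK, and the function name is hypothetical.

```python
def should_sample(trace_id_hex, sampling_rate):
    """Decide at the start of a trace whether to keep it, based only on its ID."""
    if sampling_rate >= 1.0:
        return True
    if sampling_rate <= 0.0:
        return False
    # Interpret the low 64 bits of the trace ID as a number in [0, 2**64).
    low64 = int(trace_id_hex[-16:], 16)
    return low64 < int(sampling_rate * (1 << 64))


for trace_id in ("0" * 32, "f" * 32):
    print(trace_id, should_sample(trace_id, 0.5))
```

Tail-based sampling, by contrast, decides after the whole trace has been collected, for example to always keep traces that contain errors.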

Tag Extraction

Starting from DataKit version 1.22.0, the blacklist function is deprecated. A fixed tag list is added, and only tags in this list will be extracted into top-level tags. The fixed list is as follows:

Attributes Tags Description
http.url http_url Full HTTP request path
http.hostname http_hostname Hostname
http.route http_route Route
http.status_code http_status_code Status code
http.request.method http_request_method Request method
http.method http_method Same as above
http.client_ip http_client_ip Client IP
http.scheme http_scheme Request protocol
url.full url_full Full request URL
url.scheme url_scheme Request protocol
url.path url_path Request path
url.query url_query Request parameters
span_kind span_kind Span type
db.system db_system Database system type
db.operation db_operation DB action
db.name db_name Database name
db.statement db_statement Detailed information
server.address server_address Service address
net.host.name net_host_name Requested host
server.port server_port Service port number
net.host.port net_host_port Same as above
network.peer.address network_peer_address Network address
network.peer.port network_peer_port Network port
network.transport network_transport Protocol
messaging.system messaging_system Message queue name
messaging.operation messaging_operation Message action
messaging.message messaging_message Message
messaging.destination messaging_destination Message details
rpc.service rpc_service RPC service address
rpc.system rpc_system RPC service name
error error Whether an error occurred
error.message error_message Error message
error.stack error_stack Stack trace information
error.type error_type Error type
error.msg error_message Error message
project project Project
version version Version
env env Environment
host host Host tag in Attributes
pod_name pod_name pod_name tag in Attributes
pod_namespace pod_namespace pod_namespace tag in Attributes
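
The naming rule behind this table (dots replaced by underscores, and only listed attributes promoted to top-level tags) can be sketched as follows. The set below is only an excerpt of the fixed list, and the function name and the handling of customer_tags are illustrative assumptions rather than DataKit's implementation.

```python
# Excerpt of the fixed list above; the real list is longer.
FIXED_ATTRIBUTES = {"http.url", "http.route", "db.system", "error.type"}


def extract_tags(attributes, customer_tags=()):
    """Split attributes into top-level tags (dots become underscores) and the rest."""
    allowed = FIXED_ATTRIBUTES | set(customer_tags)
    tags, rest = {}, {}
    for key, value in attributes.items():
        if key in allowed:
            tags[key.replace(".", "_")] = value
        else:
            rest[key] = value
    return tags, rest


tags, rest = extract_tags(
    {"http.route": "/users/{id}", "db.system": "mysql", "thread.name": "main",
     "project.name": "shop"},
    customer_tags=["project.name"],
)
print(tags)
print(rest)
```

Attributes that are not promoted stay in the span's message rather than becoming tags.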

To add custom tags, use the following startup parameter:

# Add custom tags via startup parameters
-Dotel.resource.attributes=username=myName,env=1.1.0

Span kind

All spans have the span_kind tag, which takes one of 6 values:

  • unspecified: Not set.
  • internal: Internal span or child span type.
  • server: WEB service, RPC service, etc.
  • client: Client type.
  • producer: Message producer.
  • consumer: Message consumer.
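
In OTLP payloads the span kind travels as an enum number. A minimal sketch of the mapping to the tag values listed above, using the numbering of the OTLP SpanKind enum (falling back to unspecified for unknown numbers is an assumption):

```python
# Index = OTLP SpanKind enum number, value = span_kind tag.
SPAN_KINDS = ("unspecified", "internal", "server", "client", "producer", "consumer")


def span_kind_tag(kind):
    """Map an OTLP SpanKind number (0-5) to the span_kind tag value."""
    return SPAN_KINDS[kind] if 0 <= kind < len(SPAN_KINDS) else "unspecified"


for number in range(6):
    print(number, span_kind_tag(number))
```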

Metric Collection

The OpenTelemetry Java Agent obtains MBean metric information from applications via the JMX protocol. The Java Agent reports selected JMX metrics through the internal SDK, which means all metrics are configurable.

You can enable or disable JMX metric reporting with the otel.jmx.enabled=true/false property (enabled by default). To control the interval between MBean detection attempts, use the otel.jmx.discovery.delay property, which defines the interval in milliseconds between the first and subsequent detection cycles.

In addition, the Agent has built-in collection configurations for some third-party software. For details, refer to: GitHub OTEL JMX Metric

We have implemented special handling for Histogram metrics:

  • OpenTelemetry histogram buckets are directly mapped to Prometheus histogram buckets.

  • The count of each bucket is converted to the Prometheus cumulative count format. For example, OpenTelemetry buckets with upper bounds 10, 50, and 100, that is (0, 10], (10, 50], (50, 100], are converted to Prometheus _bucket metrics with the le tag:

  my_histogram_bucket{le="10"} 100
  my_histogram_bucket{le="50"} 200
  my_histogram_bucket{le="100"} 250

  • The total number of observations in the OpenTelemetry histogram is converted to the Prometheus _count metric.

  • The sum of the OpenTelemetry histogram is converted to the Prometheus _sum metric, and _max and _min are also added.

  my_histogram_count 250
  my_histogram_max 100
  my_histogram_min 50
  my_histogram_sum 12345.67
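
The conversion described above can be sketched in a few lines of Python. Per-bucket counts are accumulated into cumulative counts, and the last cumulative value becomes _count. The function name and the input values are illustrative and not taken from DataKit's code; an extra trailing count, if present, is emitted as the +Inf bucket.

```python
from itertools import accumulate


def to_prometheus(name, bounds, bucket_counts, total_sum, minimum, maximum):
    """Convert per-bucket counts into Prometheus-style cumulative lines."""
    cumulative = list(accumulate(bucket_counts))
    # One extra count beyond the explicit bounds is the overflow (+Inf) bucket.
    labels = [str(b) for b in bounds] + ["+Inf"]
    lines = [f'{name}_bucket{{le="{le}"}} {count}' for le, count in zip(labels, cumulative)]
    lines.append(f"{name}_count {cumulative[-1]}")
    lines.append(f"{name}_max {maximum}")
    lines.append(f"{name}_min {minimum}")
    lines.append(f"{name}_sum {total_sum}")
    return lines


for line in to_prometheus("my_histogram", [10, 50, 100], [100, 100, 50],
                          total_sum=12345.67, minimum=1, maximum=100):
    print(line)
```

Here the per-bucket counts 100, 100, and 50 become the cumulative values 100, 200, and 250 shown in the example above.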

All metrics ending with _bucket are histogram data, and there must be corresponding metrics ending with _max, _min, _count, and _sum.

You can use the le (less than or equal) tag to categorize histogram data and filter based on tags. For all metrics and tags, refer to OpenTelemetry Metrics.

This conversion enables seamless integration of histogram data collected by OpenTelemetry into Prometheus, allowing you to leverage Prometheus' powerful query and visualization capabilities for analysis.

Log Collection

Version-1.33.0

Currently, the Java Agent supports collecting logs written to standard output (stdout) and sending them to DataKit via the otlp protocol.

By default, log collection is disabled for OTEL Agent V1. Explicit commands are required to enable it. The enabling methods are as follows:

# env
export OTEL_LOGS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_ENDPOINT=http://<DataKit Addr>:4317
java -javaagent:/path/to/agent.jar -jar app.jar

# command
java -javaagent:/path/to/agent.jar \
  -Dotel.logs.exporter=otlp \
  -Dotel.exporter.otlp.endpoint=http://<DataKit Addr>:4317 \
  -jar app.jar

By default, the maximum length of log content is 500KB. Content exceeding this limit will be split into multiple logs. The maximum length of log tags is 32KB (this field is not configurable), and content exceeding this limit will be truncated.
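
To make the splitting rule concrete, here is a simplified Python sketch that cuts oversized content into chunks of at most 500KB. How DataKit actually chooses the split points is not described here, so plain byte-offset chunking is an assumption (it may, for instance, cut through a multi-byte character), and the function name is hypothetical.

```python
def split_log(content: bytes, max_len: int = 500 * 1024):
    """Split log content into consecutive chunks of at most max_len bytes."""
    chunks = [content[i:i + max_len] for i in range(0, len(content), max_len)]
    return chunks or [b""]


parts = split_log(b"a" * (500 * 1024 + 1))
print(len(parts), [len(p) for p in parts])
```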

The source of logs collected via OTEL is the service name. You can also customize it by adding a tag: log.source. For example: -Dotel.resource.attributes="log.source=source_name".

Note: If the app runs in a container environment (e.g., k8s), DataKit will automatically collect logs by default. Enabling log collection again will result in duplicate collection. It is recommended to manually disable DataKit's independent log collection before enabling OTEL log collection.

For more languages, refer to the official documentation.

Collection Field Description

Tracing

opentelemetry

The following are the tags and fields of tracing data.

Tags & Fields Description
base_service
(tag)
Span base service name
container_host
(tag)
Container hostname. Available in OpenTelemetry. Optional.
db_host
(tag)
DB host name: ip or domain name. Optional.
db_name
(tag)
Database name. Optional.
db_system
(tag)
Database system name:mysql,oracle... Optional.
dk_fingerprint
(tag)
DataKit fingerprint(always DataKit's hostname)
endpoint
(tag)
Endpoint info. Available in SkyWalking, Zipkin. Optional.
env
(tag)
Application environment info. Available in Jaeger. Optional.
host
(tag)
Hostname.
http_method
(tag)
HTTP request method name. Available in DDTrace, OpenTelemetry. Optional.
http_route
(tag)
HTTP route. Optional.
http_status_code
(tag)
HTTP response code. Available in DDTrace, OpenTelemetry. Optional.
http_url
(tag)
HTTP URL. Optional.
operation
(tag)
Span name
out_host
(tag)
This is the database host, equivalent to db_host; DDTrace-go only. Optional.
project
(tag)
Project name. Available in Jaeger. Optional.
service
(tag)
Service name. Optional.
source_type
(tag)
Tracing source type
span_type
(tag)
Span type
status
(tag)
Span status
version
(tag)
Application version info. Available in Jaeger. Optional.
duration Duration of span
Type: int
Unit: time,μs
message Origin content of span
Type: string
Unit: N/A
parent_id Parent span ID of current span
Type: string
Unit: N/A
resource Name of the resource that produced the current span
Type: string
Unit: N/A
span_id Span id
Type: string
Unit: N/A
start start time of span.
Type: int
Unit: timeStamp,usec
trace_id Trace id
Type: string
Unit: N/A

Metrics

otel_service

OpenTelemetry JVM Metrics

Tags & Fields Description
action
(tag)
GC Action
area
(tag)
Heap or not
cause
(tag)
GC Cause
container_id
(tag)
Container ID
db_host
(tag)
DB host name: ip or domain name
db_name
(tag)
Database name
db_system
(tag)
Database system name:mysql,oracle...
direction
(tag)
received or sent
exception
(tag)
Exception Information
gc
(tag)
GC Type
host
(tag)
Host Name
host_arch
(tag)
Host arch
host_name
(tag)
Host Name
http.scheme
(tag)
HTTP/HTTPS
http_method
(tag)
HTTP Method
http_request_method
(tag)
HTTP Method
http_response_status_code
(tag)
HTTP status code
http_route
(tag)
HTTP Route
id
(tag)
JVM Type
instrumentation_name
(tag)
Metric Name
jvm_gc_action
(tag)
action:end of major,end of minor GC
jvm_gc_name
(tag)
name:PS MarkSweep,PS Scavenge
jvm_memory_pool_name
(tag)
pool_name:code cache,PS Eden Space,PS Old Gen,MetaSpace...
jvm_memory_type
(tag)
memory type:heap,non_heap
jvm_thread_state
(tag)
Thread state:runnable,timed_waiting,waiting
le
(tag)
*_bucket: histogram metric explicit bounds
level
(tag)
Log Level
main-application-class
(tag)
Main Entry Point
method
(tag)
HTTP Type
name
(tag)
Thread Pool Name
net_protocol_name
(tag)
Net Protocol Name
net_protocol_version
(tag)
Net Protocol Version
os_type
(tag)
OS Type
outcome
(tag)
HTTP Outcome
path
(tag)
Disk Path
pool
(tag)
JVM Pool Type
scope_name
(tag)
Scope name
service_name
(tag)
Service Name
spanProcessorType
(tag)
Span Processor Type
state
(tag)
Thread State:idle,used
status
(tag)
HTTP Status Code
type
(tag)
Kafka broker type
unit
(tag)
metrics unit
uri
(tag)
HTTP Request URI
application.ready.time Time taken (ms) for the application to be ready to service requests
Type: float
Unit: timeStamp,msec
application.started.time Time taken (ms) to start the application
Type: float
Unit: timeStamp,msec
disk.free Usable space for path
Type: float
Unit: digital,B
disk.total Total space for path
Type: float
Unit: digital,B
executor.active The approximate number of threads that are actively executing tasks
Type: float
Unit: count
executor.completed The approximate total number of tasks that have completed execution
Type: float
Unit: count
executor.pool.core The core number of threads for the pool
Type: float
Unit: count
executor.pool.max The maximum allowed number of threads in the pool
Type: float
Unit: count
executor.pool.size The current number of threads in the pool
Type: float
Unit: count
executor.queue.remaining The number of additional elements that this queue can ideally accept without blocking
Type: float
Unit: count
executor.queued The approximate number of tasks that are queued for execution
Type: float
Unit: count
http.server.active_requests The number of concurrent HTTP requests that are currently in-flight
Type: float
Unit: count
http.server.duration The duration of the inbound HTTP request
Type: float
Unit: time,ns
http.server.request.duration The count of HTTP request duration time in each bucket
Type: float
Unit: count
http.server.requests The http request count
Type: float
Unit: count
http.server.requests.max None
Type: float
Unit: digital,B
http.server.response.size The size of HTTP response messages
Type: float
Unit: digital,B
http.server.tomcat.errorCount The number of errors per second on all request processors
Type: float
Unit: count
http.server.tomcat.maxTime The longest request processing time
Type: float
Unit: timeStamp,msec
http.server.tomcat.processingTime Represents the total time for processing all requests
Type: float
Unit: timeStamp,msec
http.server.tomcat.requestCount The number of requests per second across all request processors
Type: float
Unit: count
http.server.tomcat.sessions.activeSessions The number of active sessions
Type: float
Unit: count
http.server.tomcat.threads Thread Count of the Thread Pool
Type: float
Unit: count
http.server.tomcat.traffic The number of bytes transmitted
Type: float
Unit: traffic,B/S
jvm.buffer.count An estimate of the number of buffers in the pool
Type: float
Unit: count
jvm.buffer.memory.used An estimate of the memory that the Java virtual machine is using for this buffer pool
Type: float
Unit: digital,B
jvm.buffer.total.capacity An estimate of the total capacity of the buffers in this pool
Type: float
Unit: digital,B
jvm.classes.loaded The number of classes that are currently loaded in the Java virtual machine
Type: float
Unit: count
jvm.classes.unloaded The total number of classes unloaded since the Java virtual machine has started execution
Type: float
Unit: count
jvm.gc.live.data.size Size of long-lived heap memory pool after reclamation
Type: float
Unit: digital,B
jvm.gc.max.data.size Max size of long-lived heap memory pool
Type: float
Unit: digital,B
jvm.gc.memory.allocated Incremented for an increase in the size of the (young) heap memory pool after one GC to before the next
Type: float
Unit: digital,B
jvm.gc.memory.promoted Count of positive increases in the size of the old generation memory pool before GC to after GC
Type: float
Unit: digital,B
jvm.gc.overhead An approximation of the percent of CPU time used by GC activities over the last look back period or since monitoring began, whichever is shorter, in the range [0..1]
Type: int
Unit: count
jvm.gc.pause Time spent in GC pause
Type: float
Unit: timeStamp,nsec
jvm.gc.pause.max Time spent in GC pause
Type: float
Unit: timeStamp,msec
jvm.memory.committed The amount of memory in bytes that is committed for the Java virtual machine to use
Type: float
Unit: digital,B
jvm.memory.max The maximum amount of memory in bytes that can be used for memory management
Type: float
Unit: digital,B
jvm.memory.usage.after.gc The percentage of long-lived heap pool used after the last GC event, in the range [0..1]
Type: float
Unit: percent,percent
jvm.memory.used The amount of used memory
Type: float
Unit: digital,B
jvm.threads.daemon The current number of live daemon threads
Type: float
Unit: count
jvm.threads.live The current number of live threads including both daemon and non-daemon threads
Type: float
Unit: count
jvm.threads.peak The peak live thread count since the Java virtual machine started or peak was reset
Type: float
Unit: count
jvm.threads.states The current number of threads having NEW state
Type: float
Unit: count
kafka.controller.active.count The number of controllers active on the broker
Type: float
Unit: count
kafka.isr.operation.count The number of in-sync replica shrink and expand operations
Type: float
Unit: count
kafka.lag.max The max lag in messages between follower and leader replicas
Type: float
Unit: timeStamp,msec
kafka.leaderElection.count The leader election count
Type: float
Unit: count
kafka.leaderElection.unclean.count Unclean leader election count - increasing indicates broker failures
Type: float
Unit: count
kafka.message.count The number of messages received by the broker
Type: float
Unit: count
kafka.network.io The bytes received or sent by the broker
Type: float
Unit: digital,B
kafka.partition.count The number of partitions on the broker
Type: float
Unit: count
kafka.partition.offline The number of partitions offline
Type: float
Unit: count
kafka.partition.underReplicated The number of under replicated partitions
Type: float
Unit: count
kafka.purgatory.size The number of requests waiting in purgatory
Type: float
Unit: count
kafka.request.count The number of requests received by the broker
Type: float
Unit: count
kafka.request.failed The number of requests to the broker resulting in a failure
Type: float
Unit: count
kafka.request.queue Size of the request queue
Type: float
Unit: count
kafka.request.time.50p The 50th percentile time the broker has taken to service requests
Type: float
Unit: timeStamp,msec
kafka.request.time.99p The 99th percentile time the broker has taken to service requests
Type: float
Unit: timeStamp,msec
kafka.request.time.total The total time the broker has taken to service requests
Type: float
Unit: timeStamp,msec
log4j2.events Number of fatal level log events
Type: float
Unit: count
otlp.exporter.exported OTLP exporter to remote
Type: int
Unit: count
otlp.exporter.seen OTLP exporter
Type: int
Unit: count
process.cpu.usage The "recent cpu usage" for the Java Virtual Machine process
Type: float
Unit: percent,percent
process.files.max The maximum file descriptor count
Type: float
Unit: count
process.files.open The open file descriptor count
Type: float
Unit: count
process.runtime.jvm.buffer.count The number of buffers in the pool
Type: float
Unit: count
process.runtime.jvm.buffer.limit Total capacity of the buffers in this pool
Type: float
Unit: digital,B
process.runtime.jvm.buffer.usage Memory that the Java virtual machine is using for this buffer pool
Type: float
Unit: digital,B
process.runtime.jvm.classes.current_loaded Number of classes currently loaded
Type: float
Unit: count
process.runtime.jvm.classes.loaded Number of classes loaded since JVM start
Type: int
Unit: count
process.runtime.jvm.classes.unloaded Number of classes unloaded since JVM start
Type: float
Unit: count
process.runtime.jvm.cpu.utilization Recent cpu utilization for the process
Type: float
Unit: digital,B
process.runtime.jvm.gc.duration Duration of JVM garbage collection actions
Type: float
Unit: timeStamp,nsec
process.runtime.jvm.memory.committed Measure of memory committed
Type: float
Unit: digital,B
process.runtime.jvm.memory.init Measure of initial memory requested
Type: float
Unit: digital,B
process.runtime.jvm.memory.limit Measure of max obtainable memory
Type: float
Unit: digital,B
process.runtime.jvm.memory.usage Measure of memory used
Type: float
Unit: digital,B
process.runtime.jvm.memory.usage_after_last_gc Measure of memory used after the most recent garbage collection event on this pool
Type: float
Unit: digital,B
process.runtime.jvm.system.cpu.load_1m Average CPU load of the whole system for the last minute
Type: float
Unit: percent,percent
process.runtime.jvm.system.cpu.utilization Recent cpu utilization for the whole system
Type: float
Unit: percent,percent
process.runtime.jvm.threads.count Number of executing threads
Type: float
Unit: count
process.start.time Start time of the process since unix epoch
Type: float
Unit: digital,B
process.uptime The uptime of the Java virtual machine
Type: int
Unit: timeStamp,sec
processedSpans The number of spans processed by the BatchSpanProcessor
Type: int
Unit: count
queueSize The number of spans queued
Type: int
Unit: count
system.cpu.count The number of processors available to the Java virtual machine
Type: int
Unit: count
system.cpu.usage The "recent cpu usage" for the whole system
Type: float
Unit: percent,percent
system.load.average.1m The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time
Type: float
Unit: count

tracing_metrics

Span count and span latency metrics derived from OpenTelemetry span data.

Tags & Fields Description
env
(tag)
Application environment info(if set in span).
host
(tag)
Hostname.
http_status_class
(tag)
HTTP response code class, such as 2xx/3xx/4xx/5xx
http_status_code
(tag)
HTTP response code
operation
(tag)
Span name
pod_name
(tag)
Pod name(if set in span).
pod_namespace
(tag)
Pod namespace(if set in span).
project
(tag)
Project name(if set in span).
remote_ip
(tag)
Remote IP.
resource
(tag)
Application resource name.
service
(tag)
Service name.
source
(tag)
Source, always opentelemetry
status
(tag)
Span status(ok/error)
version
(tag)
Application version info.
apdex Measures the Apdex score for each web service. The satisfaction threshold is currently set to 2 seconds. The tags for this metric are fixed: service/env/version/resource/source. The value range is 0~1.
Type: float
Unit: N/A
errors Represent the count of errors for spans.
Type: int
Unit: count
errors_by_http_status Represent the count of errors for a given span group by HTTP status code.
Type: int
Unit: count
hits Count of spans.
Type: int
Unit: count
hits_by_http_status Represent the count of hits for a given span group by HTTP status code.
Type: int
Unit: count
latency_bucket Represent the latency distribution for all services, resources, and versions across different environments and additional primary tags. Recommended for all latency measurement use cases. Use the 'le' tag for filtering
Type: int
Unit: count
latency_count Number of spans counted for latency; equal to the number of web-type spans.
Type: int
Unit: count
latency_sum The total latency of all web spans, corresponding to the 'latency_count'
Type: int
Unit: time,μs
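
For reference, the standard Apdex formula with the 2-second threshold mentioned for the apdex field can be sketched as follows. Requests at or below the threshold T count as satisfied, those at or below 4T as tolerating (weighted by half), and the rest as frustrated. This is the textbook formula; whether DataKit computes the score exactly this way is an assumption, and the function name is hypothetical.

```python
def apdex(durations_us, threshold_s=2.0):
    """Apdex = (satisfied + tolerating / 2) / total, for span durations in microseconds."""
    if not durations_us:
        return None
    t_us = threshold_s * 1_000_000
    satisfied = sum(1 for d in durations_us if d <= t_us)
    tolerating = sum(1 for d in durations_us if t_us < d <= 4 * t_us)
    return (satisfied + tolerating / 2) / len(durations_us)


# One span each at 1 s (satisfied), 3 s (tolerating), and 9 s (frustrated).
print(apdex([1_000_000, 3_000_000, 9_000_000]))
```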

Deleted Tags in Metrics

In the otel_service metric set, there are many useless tags in the originally reported metrics. These tags are of String type and are discarded due to high memory and bandwidth consumption. The discarded tags are as follows:

process.command_line
process.executable.path
process.runtime.description
process.runtime.name
process.runtime.version
telemetry.distro.name
telemetry.distro.version
telemetry.sdk.language
telemetry.sdk.name
telemetry.sdk.version

Examples

DataKit currently provides best practices for the following two languages:

More Documents
