Pyroscope
DataKit 从 Version-1.67.0 版本开始增加了 Pyroscope 采集器,支持接入 Grafana Pyroscope Agent 上报的数据,帮助用户定位应用程序中的 CPU、内存、IO 等的性能瓶颈。
采集器配置¶
进入 DataKit 安装目录下的 conf.d/samples 目录,复制 sample.conf.sample 并命名为 sample.conf。配置文件说明如下:
[[inputs.pyroscope]]
## pyroscope Agent endpoints register by version respectively.
## Endpoints can be skipped listen by remove them from the list.
## Default value set as below. DO NOT MODIFY THESE ENDPOINTS if not necessary.
endpoints = ["/ingest"]
## set true to enable election, pull mode only
election = true
## the max allowed size of http request body (of MB), 32MB by default.
body_size_limit_mb = 32 # MB
## set false to stop generating apm metrics from ddtrace output.
generate_metrics = true
## io_config is used to control profiling uploading behavior.
## cache_path set the disk directory where temporarily cache profiling data.
## cache_capacity_mb specify the max storage space (in MiB) that profiling cache can use.
## clear_cache_on_start set whether we should clear all previous profiling cache on restarting Datakit.
## upload_workers set the count of profiling uploading workers.
## send_timeout specify the http timeout when uploading profiling data to dataway.
## send_retry_count set the max retry count when sending every profiling request.
# [inputs.pyroscope.io_config]
# cache_path = "/usr/local/datakit/cache/pyroscope_inputs" # C:\Program Files\datakit\cache\pyroscope_inputs by default on Windows
# cache_capacity_mb = 10240 # 10240MB
# clear_cache_on_start = false
# upload_workers = 8
# send_timeout = "75s"
# send_retry_count = 4
## set custom tags for profiling data
# [inputs.pyroscope.tags]
# some_tag = "some_value"
# more_tag = "some_other_value"
配置好后,重启 DataKit ,开启 Pyroscope 采集器。
目前可以通过 ConfigMap 方式注入采集器配置来开启采集器。
设置采集器全局 Tag¶
可以在采集器配置中通过 [inputs.pyroscope.tags] 指定额外标签,该标签会统一应用到所有该采集器采到的数据:
客户端 SDK 接入¶
Pyroscope 采集器目前支持 Java,Python, Go 和 Rust 等语言的 Pyroscope Agent 接入,其他语言正在持续支持中:
从 Github 下载最新的 pyroscope.jar 包,作为 Java Agent 启动你的应用:
PYROSCOPE_APPLICATION_NAME="java-pyro-demo" \
PYROSCOPE_LOG_LEVEL=debug \
PYROSCOPE_FORMAT="jfr" \
PYROSCOPE_PROFILER_EVENT="cpu" \
PYROSCOPE_LABELS="host=$(hostname),service=java-pyro-demo,version=1.2.3,env=dev,some_other_tag=other_value" \
PYROSCOPE_UPLOAD_INTERVAL="60s" \
PYROSCOPE_JAVA_STACK_DEPTH_MAX=512 \
PYROSCOPE_PROFILING_INTERVAL="10ms" \
PYROSCOPE_PROFILER_ALLOC=128k \
PYROSCOPE_PROFILER_LOCK=10ms \
PYROSCOPE_ALLOC_LIVE=false \
PYROSCOPE_GC_BEFORE_DUMP=true \
PYROSCOPE_SERVER_ADDRESS="http://127.0.0.1:9529" \
java -javaagent:pyroscope.jar -jar your-app.jar
更多细节请参考 Grafana 官方文档
-
安装
pyroscope-io依赖包: -
代码引入
pyroscope-io包:import os import pyroscope import socket pyroscope.configure( server_address="http://127.0.0.1:9529", detect_subprocesses=True, oncpu=True, enable_logging=True, report_pid=True, report_thread_id=True, report_thread_name=True, tags={ "host": socket.gethostname(), "service": 'python-pyro-demo', "version": 'v1.2.3', "env": "testing", "process_id": os.getpid(), } ) -
启动应用:
-
添加
pyroscope-go模块: -
引入模块并初启动:
import ( "log" "os" "runtime" "strconv" "time" "github.com/grafana/pyroscope-go" ) func Must[T any](t T, _ error) T { return t } runtime.SetMutexProfileFraction(5) runtime.SetBlockProfileRate(5) profiler, err := pyroscope.Start(pyroscope.Config{ ApplicationName: "go-pyroscope-demo", // replace this with the address of pyroscope server ServerAddress: "http://127.0.0.1:9529", // you can disable logging by setting this to nil Logger: pyroscope.StandardLogger, // uploading interval period UploadRate: time.Minute, // you can provide static tags via a map: Tags: map[string]string{ "service": "go-pyroscope-demo", "env": "demo", "version": "1.2.3", "host": Must(os.Hostname()), "process_id": strconv.Itoa(os.Getpid()), "runtime_id": UUID, }, ProfileTypes: []pyroscope.ProfileType{ // these profile types are enabled by default: pyroscope.ProfileCPU, pyroscope.ProfileAllocObjects, pyroscope.ProfileAllocSpace, pyroscope.ProfileInuseObjects, pyroscope.ProfileInuseSpace, // these profile types are optional: pyroscope.ProfileGoroutines, pyroscope.ProfileMutexCount, pyroscope.ProfileMutexDuration, pyroscope.ProfileBlockCount, pyroscope.ProfileBlockDuration, }, }) if err != nil { log.Fatal("unable to bootstrap pyroscope profiler: ", err) } defer profiler.Stop()
Note
Pyroscope Rust agent 目前只能正常工作在 Linux 平台上。
-
添加
pyroscope和pyroscope_pprofrscrates 到项目依赖中: -
代码中初始化并启动 Pyroscope Rust profiling 任务:
use pyroscope::{PyroscopeAgent, Result}; use pyroscope_pprofrs::{pprof_backend, PprofConfig}; fn main() -> Result<()> { let pprof_config = PprofConfig::new().sample_rate(100).report_thread_id().report_thread_name(); // 采样率等基础配置 let backend_impl = pprof_backend(pprof_config); // Pyroscope agent 配置 let agent = PyroscopeAgent::builder("http://127.0.0.1:9529", "pyroscope-rust-app") .backend(backend_impl) .tags([("version", "1.23.4"), ("env", "demo"), ("host", "<your-hostname>")].to_vec()) .build()?; // start the Pyroscope agent let agent_running = agent.start()?; // your application code... // gracefully shutdown the Pyroscope agent let agent_ready = agent_running.stop()?; agent_ready.shutdown(); Ok(()) }
与 OpenTelemetry 链路数据进行关联¶
通过与链路数据之间的关联(Grafana 称之为 Span profiles),用户可以更容易的洞察到系统的性能瓶颈,Pyroscope 提供了相关的 OpenTelemetry 插件,可以让两者之间的数据关联起来,目前支持 Java,Python 和 Go 等语言,下面分别介绍。
Note
为了便于区分同一服务的不同实例,我们可以在进程启动时随机生成一个 UUID,然后在进程的整个生命周期内把该 UUID 作为 runtime_id 标签设置到所有的链路和 profiling 数据上,这样便能关联两者数据。 除了 runtime_id 标签,还建议所有应用添加
host(主机名),service(服务名),version(服务版本),env(部署环境), process_id(服务启动进程号)等标签,方便关联各类采集到的指标和数据。
- 下载 OpenTelemetry 官方提供的 Java agent 包 opentelemetry-javaagent.jar。
- 下载 Pyroscope OpenTelemetry 插件 pyroscope-otel.jar。
- 进行相应设置并启动你的 Java 应用
uuid=$(uuidgen); OTEL_SERVICE_NAME="java-pyro-demo" \ OTEL_RESOURCE_ATTRIBUTES="runtime_id=$uuid,host=$(hostname),service.name=java-pyro-demo,service.version=1.3.55,service.env=dev" \ OTEL_JAVAAGENT_EXTENSIONS=./pyroscope-otel.jar \ OTEL_PYROSCOPE_ADD_PROFILE_URL=false \ OTEL_PYROSCOPE_ADD_PROFILE_BASELINE_URL=false \ OTEL_PYROSCOPE_START_PROFILING=true \ OTEL_TRACES_EXPORTER=otlp \ OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf" \ OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://127.0.0.1:9529/otel/v1/traces" \ OTEL_EXPORTER_OTLP_METRICS_ENDPOINT="http://127.0.0.1:9529/otel/v1/metrics" \ OTEL_EXPORTER_OTLP_LOGS_ENDPOINT="http://127.0.0.1:9529/otel/v1/logs" \ OTEL_EXPORTER_OTLP_COMPRESSION=gzip \ PYROSCOPE_APPLICATION_NAME="java-pyro-demo" \ PYROSCOPE_LOG_LEVEL=debug \ PYROSCOPE_FORMAT="jfr" \ PYROSCOPE_PROFILER_EVENT="cpu" \ PYROSCOPE_LABELS="runtime_id=$uuid,service=java-pyro-demo,version=1.2.3,env=dev,host=$(hostname),other_tag=other_value" \ PYROSCOPE_UPLOAD_INTERVAL="60s" \ PYROSCOPE_JAVA_STACK_DEPTH_MAX=512 \ PYROSCOPE_PROFILING_INTERVAL="10ms" \ PYROSCOPE_PROFILER_ALLOC=128k \ PYROSCOPE_PROFILER_LOCK=10ms \ PYROSCOPE_ALLOC_LIVE=false \ PYROSCOPE_GC_BEFORE_DUMP=true \ PYROSCOPE_SERVER_ADDRESS="http://127.0.0.1:9529" \ java -javaagent:opentelemetry-javaagent.jar -jar your-app.jar
Tips
上述使用 uuidgen 命令随机生成了一个 UUID,并通过环境变量 OTEL_RESOURCE_ATTRIBUTES 和 PYROSCOPE_LABELS 分别为链路和 profiling 设置 runtime_id tag,其它一些环境变量的设置仅供参考,请根据实际情况或参考官方文档酌情增删。
-
安装
pyroscope-otel依赖库 -
引入
pyroscope-otel和opentelemetry库并进行相应配置import uuid import socket import os import pyroscope from opentelemetry import trace from opentelemetry.sdk.resources import Resource from opentelemetry.sdk.trace import TracerProvider from opentelemetry.sdk.trace.export import BatchSpanProcessor from pyroscope.otel import PyroscopeSpanProcessor from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter UUID = uuid.uuid1() // 进程启动时随机生成一个 UUID otelExporter = OTLPSpanExporter(endpoint='http://127.0.0.1:4317', insecure=True, timeout=30) tracerProvider = TracerProvider(resource=Resource(attributes={ "service.name": "python-pyro-demo", "service.version": "v3.5.7", "service.env": "dev", "host": socket.gethostname(), "process_id": os.getpid(), "runtime_id": str(UUID), // 为链路设置 runtime_id tag })) tracerProvider.add_span_processor(PyroscopeSpanProcessor()) tracerProvider.add_span_processor(BatchSpanProcessor(span_exporter=otelExporter, max_queue_size=100, max_export_batch_size=30)) trace.set_tracer_provider(tracerProvider) tracer = trace.get_tracer("python-pyro-demo") pyroscope.configure( server_address="http://127.0.0.1:9529", detect_subprocesses=True, oncpu=True, enable_logging=True, report_pid=True, report_thread_id=True, report_thread_name=True, tags={ "runtime_id": str(UUID), // 为 profiling 设置 runtime_id tag "host": socket.gethostname(), "service": 'python-pyro-demo', "version": 'v0.2.3', "env": "testing", "process_id": os.getpid(), } ) if __name__ == '__main__': // your app code pyroscope.shutdown()
-
添加
pyroscope-go库到项目依赖中 -
配置并启动 OpenTelemetry 和 Pyroscope
package main import ( "github.com/google/uuid" otelpyroscope "github.com/grafana/otel-profiling-go" "github.com/grafana/pyroscope-go" "go.opentelemetry.io/otel" "go.opentelemetry.io/otel/attribute" "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp" "go.opentelemetry.io/otel/propagation" "go.opentelemetry.io/otel/sdk/resource" tracesdk "go.opentelemetry.io/otel/sdk/trace" "go.opentelemetry.io/otel/trace" ) var ( UUID = uuid.NewString() // 生成一个全局的 UUID otelTracer trace.Tracer ) func hostname() string { host, _ := os.Hostname() return host } func main() { otelExporter, err := otlptracehttp.New(context.Background(), otlptracehttp.WithEndpointURL("http://127.0.0.1:9529/otel/v1/traces"), otlptracehttp.WithInsecure(), otlptracehttp.WithTimeout(time.Second*15), ) if err != nil { log.Fatal("unable to init otel tracing exporter: ", err) } tracerProvider := tracesdk.NewTracerProvider(tracesdk.WithBatcher(otelExporter, tracesdk.WithBatchTimeout(time.Second*3)), tracesdk.WithResource(resource.NewSchemaless( attribute.String("runtime_id", UUID), // 为链路设置 runtime_id tag attribute.String("service.name", "go-pyroscope-demo"), attribute.String("service.version", "v0.0.1"), attribute.String("service.env", "dev"), attribute.String("host", hostname()), attribute.String("process_id", strconv.Itoa(os.Getpid())) )), ) defer tracerProvider.Shutdown(context.Background()) otel.SetTracerProvider(otelpyroscope.NewTracerProvider(tracerProvider)) otelTracer = otel.Tracer("go-pyroscope-demo") log.Printf("otel tracing started....\n") runtime.SetMutexProfileFraction(5) runtime.SetBlockProfileRate(5) profiler, err := pyroscope.Start(pyroscope.Config{ ApplicationName: "go-pyroscope-demo", // replace this with the address of pyroscope server ServerAddress: "http://127.0.0.1:9529", // you can disable logging by setting this to nil Logger: pyroscope.StandardLogger, // uploading interval period UploadRate: time.Minute, // you can provide static tags via a map: Tags: map[string]string{ "runtime_id": UUID, // 为 profiling 设置 runtime_id tag "env": "demo", "version": "0.0.1", "host": hostname(), "process_id": strconv.Itoa(os.Getpid()), }, ProfileTypes: []pyroscope.ProfileType{ // these profile types are enabled by default: pyroscope.ProfileCPU, pyroscope.ProfileAllocObjects, pyroscope.ProfileAllocSpace, pyroscope.ProfileInuseObjects, pyroscope.ProfileInuseSpace, // these profile types are optional: pyroscope.ProfileGoroutines, pyroscope.ProfileMutexCount, pyroscope.ProfileMutexDuration, pyroscope.ProfileBlockCount, pyroscope.ProfileBlockDuration, }, }) if err != nil { log.Fatal("unable to bootstrap pyroscope profiler: ", err) } log.Printf("pyroscope profiler started....\n") defer profiler.Stop() // your app code... }