Profiling Python

DataKit Python profiling supports dd-trace-py and py-spy.

Requirements

Install DataKit and enable the profile input.

Use dd-trace-py

  • Install dd-trace-py library
Info

DataKit is currently compatible with dd-trace-py 1.14.x and earlier; later versions have not been tested.

pip3 install ddtrace
  • Profiling by launching the application with ddtrace-run
DD_PROFILING_ENABLED=true \
DD_ENV=dev \
DD_SERVICE=my-web-app \
DD_VERSION=1.0.3 \
DD_TRACE_AGENT_URL=http://127.0.0.1:9529 \
ddtrace-run python app.py
  • Profiling by writing code
import time
import ddtrace
from ddtrace.profiling import Profiler

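# Point the tracer at DataKit instead of the default agent address.
ddtrace.tracer.configure(
    https=False,
    hostname="localhost",
    port=9529,
)

# Start the profiler; stop it at interpreter exit and also profile child processes.
prof = Profiler()
prof.start(stop_on_exit=True, profile_children=True)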

# your code here ...
# while True:
#     time.sleep(1)

With this approach, you do not need the ddtrace-run command:

DD_ENV=testing DD_SERVICE=python-profiling-manual DD_VERSION=1.2.3 python3 app.py
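
A variation on the code-based approach is to start the profiler only when DD_PROFILING_ENABLED is set, so the same app.py also runs on machines where ddtrace is not installed. The sketch below is a minimal illustration and not part of DataKit or dd-trace-py: the helper names profiling_requested and start_profiler_if_requested are hypothetical, and it assumes the profiler reads the DataKit address from DD_TRACE_AGENT_URL, as in the ddtrace-run example.

```python
import os


def profiling_requested(environ=None):
    """Return True when DD_PROFILING_ENABLED is set to a truthy value."""
    environ = os.environ if environ is None else environ
    return environ.get("DD_PROFILING_ENABLED", "").strip().lower() in ("1", "true")


def start_profiler_if_requested(environ=None):
    """Start the ddtrace profiler if requested; return it, or None if not started."""
    if not profiling_requested(environ):
        return None
    try:
        # Imported lazily so the application still starts without ddtrace.
        from ddtrace.profiling import Profiler
    except ImportError:
        return None
    prof = Profiler()
    prof.start()
    return prof


# Call once, early in application startup.
profiler = start_profiler_if_requested()
```

Run it with DD_PROFILING_ENABLED=true and DD_TRACE_AGENT_URL=http://127.0.0.1:9529 set to turn profiling on; without them the application runs unprofiled.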

View Profile

After a minute or two, you can view your profiles under APM -> Profile.

Generated Metrics

Starting from Version-1.39.0, DataKit supports extracting a set of Python runtime-related metrics from dd-trace-py output. These metrics are placed under the profiling_metrics metric set. Below are some key metrics with explanations:

Tags & Fields | Description | Unit
language (tag) | Language of current profile | -
host (tag) | Hostname of current profile | -
service (tag) | Service name of current profile | -
env (tag) | Env settings of current profile | -
version (tag) | Version of current profile | -
prof_python_cpu_cores | Number of CPU cores consumed | core
prof_python_alloc_bytes_per_sec | Memory allocation rate per second | byte
prof_python_allocs_per_sec | Memory allocation operations per second | count
prof_python_alloc_bytes_total | Total memory allocated during a single profiling period (dd-trace defaults to 60-second collection cycles) | byte
prof_python_lock_acquisition_time | Total time spent waiting for locks during a profiling period | nanosecond
prof_python_lock_acquisitions_per_sec | Number of lock contentions per second | count
prof_python_lock_hold_time | Total time spent holding locks during a profiling period | nanosecond
prof_python_exceptions_per_sec | Number of exceptions thrown per second | count
prof_python_exceptions_total | Total number of exceptions thrown during a profiling period | count
prof_python_lifetime_heap_bytes | Total memory size occupied by heap objects | byte
prof_python_wall_time | Wall clock time duration | nanosecond
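
Several of these fields are reported in raw units (nanoseconds, bytes), which can be hard to read at a glance. The following sketch converts a few of them to seconds and MiB. The function name summarize_fields and the sample values are made up for illustration; only the field names and units come from the list above.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000
BYTES_PER_MIB = 1024 * 1024


def summarize_fields(fields):
    """Convert selected profiling_metrics fields to friendlier units."""
    return {
        # Nanosecond fields become seconds.
        "lock_wait_seconds": fields["prof_python_lock_acquisition_time"] / NANOSECONDS_PER_SECOND,
        "lock_hold_seconds": fields["prof_python_lock_hold_time"] / NANOSECONDS_PER_SECOND,
        # Byte fields become MiB.
        "alloc_mib_per_sec": fields["prof_python_alloc_bytes_per_sec"] / BYTES_PER_MIB,
        "heap_mib": fields["prof_python_lifetime_heap_bytes"] / BYTES_PER_MIB,
    }


# Hypothetical values for one profiling period.
sample = {
    "prof_python_lock_acquisition_time": 1_500_000_000,
    "prof_python_lock_hold_time": 250_000_000,
    "prof_python_alloc_bytes_per_sec": 2 * 1024 * 1024,
    "prof_python_lifetime_heap_bytes": 64 * 1024 * 1024,
}
summary = summarize_fields(sample)
print(summary)
```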

Use py-spy

py-spy is a non-invasive sampling profiler for Python provided by the open source community. It runs independently of the target program and has little impact on the target program's load. By default, py-spy writes sampling data to a local file, in a format selected by the specified parameters. To simplify the integration of py-spy and DataKit, a fork named py-spy-for-datakit is provided; it makes small modifications to the original version and automatically sends profiling data to DataKit.

  • Installation

Installing with pip is the recommended way.

pip3 install py-spy-for-datakit

You can also install a precompiled version. The steps below use the Linux x86_64 platform as an example; other platforms are similar:

# after downloading the precompiled package into the current directory

# use pip to install
pip3 install --force-reinstall --no-index --find-links . py-spy-for-datakit

# confirm successful installation
py-spy-for-datakit help

If Rust and cargo are installed on your machine, you can also install it with cargo.

cargo install py-spy-for-datakit
  • Usage

py-spy-for-datakit adds a datakit subcommand to the original py-spy subcommands, used specifically to send sampling data to DataKit. Run py-spy-for-datakit help datakit for usage help:

Option | Description | Default
-H, --host | DataKit listening host | 127.0.0.1
-P, --port | DataKit listening port | 9529
-S, --service | Your service name | unnamed-service
-E, --env | Your app deployment environment | unnamed-env
-V, --version | Your app version | unnamed-version
-p, --pid | Target process PID | None; you must set either this option or a command to launch
-d, --duration | Profiling duration | 60
-r, --rate | Profiling rate | 100
-s, --subprocesses | Whether to profile subprocesses | false
-i, --idle | Whether to profile inactive threads | false

To profile a Python program that is already running, pass its process ID to py-spy-for-datakit with --pid <PID> or -p <PID>.

Suppose your target process PID is 12345 and DataKit is listening at 127.0.0.1:9529:

py-spy-for-datakit datakit \
  --host 127.0.0.1 \
  --port 9529 \
  --service <your-service-name> \
  --env testing \
  --version v0.1 \
  --duration 60 \
  --pid 12345

If needed, prefix the command with sudo.

py-spy-for-datakit can also launch the Python program itself, so you do not need to specify a PID. Sampling then begins as soon as the program starts. The command is similar:

py-spy-for-datakit datakit \
  --host 127.0.0.1 \
  --port 9529 \
  --service your-service-name \
  --env testing \
  --version v0.1 \
  -d 60 \
  -- python3 server.py  # note the space between -- and python3
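
If you start the profiler from a deployment script, both invocation modes shown above can be assembled programmatically. The sketch below only builds the argument list, using the options and defaults from the option list; the function name build_datakit_command is hypothetical, and requiring exactly one of pid or program is a choice made for this example rather than documented behavior of the tool.

```python
import shlex


def build_datakit_command(service, env, version, pid=None, program=None,
                          host="127.0.0.1", port=9529, duration=60):
    """Build the argv for `py-spy-for-datakit datakit` in PID or launch mode."""
    if (pid is None) == (program is None):
        raise ValueError("pass exactly one of pid or program")
    cmd = [
        "py-spy-for-datakit", "datakit",
        "--host", host,
        "--port", str(port),
        "--service", service,
        "--env", env,
        "--version", version,
        "--duration", str(duration),
    ]
    if pid is not None:
        # Profile a process that is already running.
        cmd += ["--pid", str(pid)]
    else:
        # Launch the program under the profiler; "--" separates its arguments.
        cmd += ["--", *program]
    return cmd


attach_cmd = build_datakit_command("my-service", "testing", "v0.1", pid=12345)
launch_cmd = build_datakit_command("my-service", "testing", "v0.1",
                                   program=["python3", "server.py"])
print(shlex.join(attach_cmd))
print(shlex.join(launch_cmd))
```

To actually run the profiler, pass either list to subprocess.run(cmd, check=True).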

After a minute or two, you can view your profiles under APM -> Profile.
