Profiling Java
DataKit now supports two Java profiling tools: dd-trace-java and async-profiler.
dd-trace-java
Download dd-trace-java from the dd-trace-java page.
Note
DataKit currently supports dd-trace-java 1.47.x and lower versions. Higher versions have not been tested, and their compatibility is unknown. If you encounter any issues during use, please feel free to provide feedback to us.
Currently, dd-trace-java integrates two profiling engines: the Datadog Profiler and JFR (Java Flight Recorder), which is built into the JDK.
Both engines have their own platform and JDK version requirements, listed below.
The Datadog Profiler currently supports only Linux and requires one of the following JDK versions:
- OpenJDK 8u352+, 11.0.17+, 17.0.5+ (including the corresponding builds from Eclipse Adoptium, Amazon Corretto, Azul Zulu, etc.)
- Oracle JDK 8u352+, 11.0.17+, 17.0.5+
- OpenJ9 JDK 8u372+, 11.0.18+, 17.0.6+
JFR requires one of the following JDK versions:
- OpenJDK 11+
- Oracle JDK 11+
- OpenJDK 8 (version 1.8.0.262/8u262+)
- Oracle JDK 8 (commercial features need to be enabled)
Note
JFR is a commercial feature of Oracle JDK 8 and is disabled by default. If you need to enable it, you need to add the parameters -XX:+UnlockCommercialFeatures -XX:+FlightRecorder when starting the project. Since JDK 11, JFR has become an open-source project and is no longer a commercial feature of Oracle JDK.
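For example, a launcher script could add those flags only when they are needed. This is a minimal sketch that assumes the JDK vendor and major version are already known; the helper name jfr_unlock_flags is hypothetical and not part of DataKit or dd-trace-java.

```shell
# jfr_unlock_flags VENDOR MAJOR
# Print the extra JVM flags that JFR needs on this JDK, if any (hypothetical helper).
jfr_unlock_flags() {
  local vendor="$1" major="$2"
  # Only Oracle JDK 8 treats JFR as a commercial feature.
  if [ "$vendor" = "oracle" ] && [ "$major" -eq 8 ]; then
    echo "-XX:+UnlockCommercialFeatures -XX:+FlightRecorder"
  fi
}

jfr_unlock_flags oracle 8     # prints the two flags
jfr_unlock_flags openjdk 11   # prints nothing
```

The output can be spliced into the java command line ahead of the -javaagent option.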
Run Java Code
java -javaagent:/<your-path>/dd-java-agent.jar \
-XX:FlightRecorderOptions=stackdepth=256 \
-Ddd.agent.host=127.0.0.1 \
-Ddd.trace.agent.port=9529 \
-Ddd.service.name=profiling-demo \
-Ddd.env=dev \
-Ddd.version=1.2.3 \
-Ddd.profiling.enabled=true \
-Ddd.profiling.ddprof.enabled=true \
-Ddd.profiling.ddprof.cpu.enabled=true \
-Ddd.profiling.ddprof.wall.enabled=true \
-Ddd.profiling.ddprof.alloc.enabled=true \
-Ddd.profiling.ddprof.liveheap.enabled=true \
-Ddd.profiling.ddprof.memleak.enabled=true \
-jar your-app.jar
After a minute or two, you can view your profiles on the Profile page.
Explanation of some parameters:
| Parameter Name | Corresponding Environment Variable | Explanation |
|---|---|---|
| -Ddd.profiling.enabled | DD_PROFILING_ENABLED | Whether to enable profiling. |
| -Ddd.profiling.allocation.enabled | DD_PROFILING_ALLOCATION_ENABLED | Whether to enable JFR memory allocation analysis. This may affect the performance of high-load applications. On JDK 11 and above, the Datadog Profiler allocation analysis is recommended instead. |
| -Ddd.profiling.heap.enabled | DD_PROFILING_HEAP_ENABLED | Whether to enable JFR sampling of heap objects. |
| -Ddd.profiling.directallocation.enabled | DD_PROFILING_DIRECTALLOCATION_ENABLED | Whether to enable JFR sampling of JVM direct memory allocation. |
| -Ddd.profiling.ddprof.enabled | DD_PROFILING_DDPROF_ENABLED | Whether to enable the Datadog Profiler engine. |
| -Ddd.profiling.ddprof.cpu.enabled | DD_PROFILING_DDPROF_CPU_ENABLED | Whether to enable Datadog Profiler CPU analysis. |
| -Ddd.profiling.ddprof.wall.enabled | DD_PROFILING_DDPROF_WALL_ENABLED | Whether to enable Datadog Profiler wall-time collection. This option affects the accuracy of the association between Trace and Profile, so enabling it is recommended. |
| -Ddd.profiling.ddprof.alloc.enabled | DD_PROFILING_DDPROF_ALLOC_ENABLED | Whether to enable Datadog Profiler memory allocation analysis. It has been verified that this cannot currently be enabled on JDK 8. For JDK 8, use -Ddd.profiling.allocation.enabled as appropriate and pay attention to the impact on system performance. |
| -Ddd.profiling.ddprof.liveheap.enabled | DD_PROFILING_DDPROF_LIVEHEAP_ENABLED | Whether to enable Datadog Profiler analysis of the currently live heap. |
| -Ddd.profiling.ddprof.memleak.enabled | DD_PROFILING_DDPROF_MEMLEAK_ENABLED | Whether to enable Datadog Profiler memory leak analysis. |
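Every row in the table follows the same naming pattern: the environment variable is the system property without the -D prefix, with dots replaced by underscores and letters upper-cased. A minimal sketch of that mapping follows; the helper name prop_to_env is hypothetical, and the rule is only inferred from the rows listed here, so check any other property against the dd-trace-java documentation.

```shell
# prop_to_env PROPERTY
# Derive the environment variable name from a -D system property,
# following the pattern visible in the table (hypothetical helper).
prop_to_env() {
  local prop="${1#-D}"                        # drop a leading -D, if present
  printf '%s\n' "$prop" | tr '.a-z' '_A-Z'    # dots -> underscores, upper-case
}

prop_to_env -Ddd.profiling.ddprof.cpu.enabled
```

This is handy when the JVM command line cannot be changed (for example in a container image) and the options have to be passed through the environment instead.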
Generated Metrics
Starting from version 1.39.0, DataKit supports extracting a set of JVM runtime metrics from dd-trace-java output. These metrics are placed under the profiling_metrics metric set. Below are some key metrics with explanations:
| Tags & Fields | Description |
|---|---|
| language (tag) | Language of current profile |
| host (tag) | Hostname of current profile |
| service (tag) | Service name of current profile |
| env (tag) | Env settings of current profile |
| version (tag) | Version of current profile |
| prof_jvm_cpu_cores | Total CPU cores consumed by the application. Unit: core |
| prof_jvm_alloc_bytes_per_sec | Total memory allocated per second by the program. Unit: byte |
| prof_jvm_allocs_per_sec | Number of memory allocation operations per second. Unit: count |
| prof_jvm_alloc_bytes_total | Total memory allocated during a single profiling period. Unit: byte |
| prof_jvm_class_loads_per_sec | Number of class loading operations per second. Unit: count |
| prof_jvm_compilation_time | Total time spent on JIT compilation during a profiling period (dd-trace defaults to 60-second collection cycles). Unit: nanosecond |
| prof_jvm_context_switches_per_sec | Number of thread context switches per second. Unit: count |
| prof_jvm_direct_alloc_bytes_per_sec | Direct memory allocation size per second. Unit: byte |
| prof_jvm_throws_per_sec | Number of exceptions thrown per second. Unit: count |
| prof_jvm_throws_total | Total number of exceptions thrown during a profiling period. Unit: count |
| prof_jvm_file_io_max_read_bytes | Maximum bytes read in a single file operation during profiling. Unit: byte |
| prof_jvm_file_io_max_read_time | Maximum duration of a single file read operation during profiling. Unit: nanosecond |
| prof_jvm_file_io_max_write_bytes | Maximum bytes written in a single file operation during profiling. Unit: byte |
| prof_jvm_file_io_max_write_time | Maximum duration of a single file write operation during profiling. Unit: nanosecond |
| prof_jvm_file_io_read_bytes | Total bytes read from files during profiling. Unit: byte |
| prof_jvm_file_io_time | Total time spent on file I/O operations during profiling. Unit: nanosecond |
| prof_jvm_file_io_read_time | Total time spent on file read operations during profiling. Unit: nanosecond |
| prof_jvm_file_io_write_time | Total time spent on file write operations during profiling. Unit: nanosecond |
| prof_jvm_file_io_write_bytes | Total bytes written to files during profiling. Unit: byte |
| prof_jvm_avg_gc_pause_time | Average duration of GC-induced application pauses. Unit: nanosecond |
| prof_jvm_max_gc_pause_time | Maximum GC pause duration during profiling. Unit: nanosecond |
| prof_jvm_gc_pauses_per_sec | Number of GC pauses per second. Unit: count |
| prof_jvm_gc_pause_time | Total time spent in GC pauses during profiling. Unit: nanosecond |
| prof_jvm_lifetime_heap_bytes | Total memory occupied by live heap objects. Unit: byte |
| prof_jvm_lifetime_heap_objects | Total number of live heap objects. Unit: count |
| prof_jvm_locks_max_wait_time | Maximum lock contention wait time during profiling. Unit: nanosecond |
| prof_jvm_locks_per_sec | Number of lock contentions per second. Unit: count |
| prof_jvm_socket_io_max_read_time | Maximum socket read operation duration during profiling. Unit: nanosecond |
| prof_jvm_socket_io_max_write_bytes | Maximum bytes sent in a single socket operation during profiling. Unit: byte |
| prof_jvm_socket_io_max_write_time | Maximum socket write operation duration during profiling. Unit: nanosecond |
| prof_jvm_socket_io_read_bytes | Total bytes received via sockets during profiling. Unit: byte |
| prof_jvm_socket_io_read_time | Total time spent on socket read operations during profiling. Unit: nanosecond |
| prof_jvm_socket_io_write_time | Total time spent on socket write operations during profiling. Unit: nanosecond |
| prof_jvm_socket_io_write_bytes | Total bytes sent via sockets during profiling. Unit: byte |
| prof_jvm_threads_created_per_sec | Number of threads created per second. Unit: count |
| prof_jvm_threads_deadlocked | Number of threads in deadlock state. Unit: count |
| prof_jvm_uptime_nanoseconds | Application uptime duration. Unit: nanosecond |
Note
This feature is enabled by default. If not needed, you can disable it by modifying the collector configuration file <DATAKIT_INSTALL_DIR>/conf.d/profile/profile.conf and setting the generate_metrics option to false, then restart DataKit.
[[inputs.profile]]
## Set to false to stop generating APM metrics from dd-trace output.
generate_metrics = false
Async Profiler
async-profiler is an open source Java profiler based on the HotSpot API. It can collect information such as stacks and memory allocations while the program is running.
async-profiler can trace the following kinds of events:
- CPU cycles
- Hardware and software performance counters such as cache misses, branch misses, page faults, context switches, etc.
- Allocations in Java Heap
- Contended lock attempts, including both Java object monitors and ReentrantLocks
Install async-profiler
Requirements
DataKit is currently compatible with async-profiler v2.9 and below; compatibility with higher versions is unknown.
The official website provides binaries for different platforms:
- Linux x64 (glibc): async-profiler-2.8.3-linux-x64.tar.gz
- Linux x64 (musl): async-profiler-2.8.3-linux-musl-x64.tar.gz
- Linux arm64: async-profiler-2.8.3-linux-arm64.tar.gz
- macOS x64/arm64: async-profiler-2.8.3-macos.zip
- Format converter: converter.jar
Download the archive and extract it as below (Linux x64):
$ wget https://github.com/async-profiler/async-profiler/releases/download/v2.8.3/async-profiler-2.8.3-linux-x64.tar.gz
$ tar -zxf async-profiler-2.8.3-linux-x64.tar.gz
$ cd async-profiler-2.8.3-linux-x64 && ls
build CHANGELOG.md LICENSE profiler.sh README.md
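The archive names listed above share one pattern, so the download URL can be assembled from a version and a platform. The sketch below assumes that the release URL layout used in the wget command holds for the other platforms too; the helper name async_profiler_url is hypothetical.

```shell
# async_profiler_url VERSION PLATFORM
# PLATFORM is one of: linux-x64, linux-musl-x64, linux-arm64, macos
async_profiler_url() {
  local version="$1" platform="$2" ext="tar.gz"
  # The macOS build is listed as a .zip, the Linux builds as .tar.gz.
  if [ "$platform" = "macos" ]; then
    ext="zip"
  fi
  printf 'https://github.com/async-profiler/async-profiler/releases/download/v%s/async-profiler-%s-%s.%s\n' \
    "$version" "$version" "$platform" "$ext"
}

async_profiler_url 2.8.3 linux-x64
```

For linux-x64 this prints the same URL as the wget command above, so the result can be passed directly to wget or curl.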
Use async-profiler
- Set the Linux kernel option perf_events
As of Linux 4.6, capturing kernel call stacks using perf_events from a non-root process requires setting two runtime variables, kernel.perf_event_paranoid and kernel.kptr_restrict. You can set them using sysctl or by writing to the corresponding files under /proc/sys/kernel.
- Install debug symbols
If memory allocation (alloc) events need to be collected, debug symbols must be installed. Oracle JDK already has these symbols built in, so this step can be skipped. For OpenJDK, the debug symbols have to be installed separately, typically from the distribution's debug package for the JDK.
The gdb tool can be used to verify that the debug symbols are properly installed, for example on Linux by asking it for the address of the UseG1GC symbol in libjvm.so.
The command's output contains either Symbol "UseG1GC" is at 0xxxxx or No symbol "UseG1GC" in current context.
- Check the Java process PID
Before collection, you need to know the PID of the Java process (use the jps command).
- Profile the Java process
Run profiler.sh and specify the Java process PID.
After about 10 seconds, a file named profiling.html is generated in the current directory; you can open it in a browser.
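The four steps above can be sketched as one shell session. The target values (perf_event_paranoid=1, kptr_restrict=0), the debug package names and the gdb invocation follow the async-profiler README as best recalled. The helper names, the libjvm.so path and the <pid> placeholder are illustrative assumptions, so adjust them to your system.

```shell
# 1. Print the sysctl commands still needed for non-root kernel stack capture.
#    $1 is the settings directory (default /proc/sys/kernel); hypothetical helper.
perf_events_advice() {
  local dir="${1:-/proc/sys/kernel}" paranoid restrict
  [ -r "$dir/perf_event_paranoid" ] && [ -r "$dir/kptr_restrict" ] || {
    echo "kernel settings not readable under $dir" >&2
    return 0
  }
  paranoid=$(cat "$dir/perf_event_paranoid")
  restrict=$(cat "$dir/kptr_restrict")
  if [ "$paranoid" -gt 1 ]; then
    echo "sysctl kernel.perf_event_paranoid=1"
  fi
  if [ "$restrict" != "0" ]; then
    echo "sysctl kernel.kptr_restrict=0"
  fi
}

# 2. Decide from gdb's output (read on stdin) whether debug symbols are present.
has_debug_symbols() {
  grep -q 'Symbol "UseG1GC" is at'
}

perf_events_advice   # run the printed commands as root, if any

# Steps that need root or a real JVM are left commented out:
# apt install openjdk-11-dbg                  # Debian/Ubuntu; package name varies by JDK version
# debuginfo-install java-1.8.0-openjdk        # CentOS/RHEL
# gdb -batch -ex 'info address UseG1GC' "$JAVA_HOME/lib/server/libjvm.so" 2>&1 | has_debug_symbols
# jps                                         # 3. find the target PID
# ./profiler.sh -d 10 -f profiling.html <pid> # 4. profile for 10 seconds
```

Keeping the checks in small functions makes it possible to run them before every collection without needing root for the read-only part.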
Combine DataKit with async-profiler
Requirements:
- Set your service name (optional)
By default, the program name is used as the service name when reporting to Guance. To customize it, inject the service name when the program starts by passing the JVM option -Ddk.service=<service-name>.
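As a sketch, assuming the -Ddk.service JVM option that collect.sh looks for in the jps -v output: start the application with that option, and the script picks the value up as the service name. The helper service_from_jvm_args is hypothetical and only mirrors that lookup.

```shell
# Start the application with a custom service name (placeholders, not run here):
# java -Ddk.service=<service-name> -jar <your-app>.jar

# service_from_jvm_args ARG...
# Extract the service name from a JVM argument list, such as the words
# of one `jps -v` output line. Returns 1 if the option is absent.
service_from_jvm_args() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      -Ddk.service=*)
        printf '%s\n' "${arg#-Ddk.service=}"
        return 0
        ;;
    esac
  done
  return 1
}

service_from_jvm_args 1234 demo.jar -Xmx512m -Ddk.service=profiling-demo
```

When the option is missing, the caller can fall back to the program name, which matches the default behaviour described above.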
There are two integration methods:
Automate by script
An automated script makes it easy to integrate async-profiler with DataKit. Use it as follows:
- Create the shell script
Create a file named "collect.sh" in the current directory with the following content:
set -e

LIBRARY_VERSION=2.8.3

# maximum allowed size of the compressed jfr file, in bytes
MAX_JFR_FILE_SIZE=6000000

datakit_url=http://localhost:9529
if [ -n "$DATAKIT_URL" ]; then
    datakit_url=$DATAKIT_URL
fi

datakit_profiling_url=$datakit_url/profiling/v1/input

app_env=dev
if [ -n "$APP_ENV" ]; then
    app_env=$APP_ENV
fi

app_version=0.0.0
if [ -n "$APP_VERSION" ]; then
    app_version=$APP_VERSION
fi

host_name=$(hostname)
if [ -n "$HOST_NAME" ]; then
    host_name=$HOST_NAME
fi

service_name=
if [ -n "$SERVICE_NAME" ]; then
    service_name=$SERVICE_NAME
fi

# profiling duration, in seconds
profiling_duration=10
if [ -n "$PROFILING_DURATION" ]; then
    profiling_duration=$PROFILING_DURATION
fi

# profiling event
profiling_event=cpu
if [ -n "$PROFILING_EVENT" ]; then
    profiling_event=$PROFILING_EVENT
fi

# IDs of the java processes to profile. Customize this to choose which
# java processes are collected, for example by filtering on process name.
java_process_ids=$(jps -q -J-XX:+PerfDisableSharedMem)
if [ -n "$PROCESS_ID" ]; then
    java_process_ids=`echo $PROCESS_ID | tr "," " "`
fi

if [[ $java_process_ids == "" ]]; then
    printf "Warning: no java program found, exit now\n"
    exit 1
fi
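
# Returns 1 when the argument is a numeric PID and 0 otherwise.
# Note that this is the reverse of the usual shell convention;
# profile_collect relies on it.
is_valid_process_id() {
    if [ -n "$1" ]; then
        if [[ $1 =~ ^[0-9]+$ ]]; then
            return 1
        fi
    fi
    return 0
}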

profile_collect() {
    # disable -e
    set +e

    process_id=$1
    is_valid_process_id $process_id
    if [[ $? == 0 ]]; then
        printf "Warning: invalid process_id: %s, ignore\n" "$process_id"
        return 1
    fi

    uuid=$(uuidgen)
    jfr_file=$runtime_dir/profiler_$uuid.jfr
    event_json_file=$runtime_dir/event_$uuid.json

    # split the jps line of this process into words: PID, main class or jar, JVM args
    arr=($(jps -v | grep "^$process_id"))

    process_name="default"

    for (( i = 0; i < ${#arr[@]}; i++ ))
    do
        value=${arr[$i]}
        if [ $i == 1 ]; then
            process_name=$value
        elif [[ $value =~ "-Ddk.service=" ]]; then
            service_name=${value/-Ddk.service=/}
        fi
    done

    start_time=$(date +%FT%T.%N%:z)
    ./profiler.sh -d $profiling_duration --fdtransfer -e $profiling_event -o jfr -f $jfr_file $process_id
    end_time=$(date +%FT%T.%N%:z)

    if [ ! -f $jfr_file ]; then
        printf "Warning: generating profiling file failed for %s, pid %d\n" $process_name $process_id
        return
    else
        printf "generate profiling file successfully for %s, pid %d\n" $process_name $process_id
    fi

    jfr_zip_file=$jfr_file.gz
    gzip -qc $jfr_file > $jfr_zip_file

    zip_file_size=`ls -la $jfr_zip_file | awk '{print $5}'`

    if [ -z "$service_name" ]; then
        service_name=$process_name
    fi

    if [ $zip_file_size -gt $MAX_JFR_FILE_SIZE ]; then
        printf "Warning: the size of the jfr file generated is bigger than $MAX_JFR_FILE_SIZE bytes, now is $zip_file_size bytes\n"
    else
        tags="library_version:$LIBRARY_VERSION,library_type:async_profiler,process_id:$process_id,process_name:$process_name,service:$service_name,host:$host_name,env:$app_env,version:$app_version"
        if [ -n "$PROFILING_TAGS" ]; then
            tags="$tags,$PROFILING_TAGS"
        fi

        # the heredoc body and its END marker must stay unindented
        cat >$event_json_file <<END
{
    "tags_profiler": "$tags",
    "start": "$start_time",
    "end": "$end_time",
    "family": "java",
    "format": "jfr"
}
END

        res=$(curl -i $datakit_profiling_url \
            -F "main=@$jfr_zip_file;filename=main.jfr" \
            -F "event=@$event_json_file;filename=event.json;type=application/json" | head -n 1 )

        if [[ ! $res =~ 2[0-9][0-9] ]]; then
            printf "Warning: send profile file to datakit failed, %s\n" "$res"
        else
            printf "Info: send profile file to datakit successfully\n"
            rm -rf $event_json_file $jfr_file $jfr_zip_file
        fi
    fi

    set -e
}

runtime_dir=runtime
if [ ! -d $runtime_dir ]; then
    mkdir $runtime_dir
fi

# profile every process in the background, each with its own log file
for process_id in $java_process_ids; do
    printf "profiling process %d\n" $process_id
    profile_collect $process_id > $runtime_dir/$process_id.log 2>&1 &
done

wait

# print each log, then remove it
for process_id in $java_process_ids; do
    log_file=$runtime_dir/$process_id.log
    if [ -f $log_file ]; then
        echo
        cat $log_file
        rm $log_file
    fi
done
- Execute the script
Run the script with bash collect.sh. After it finishes, the collected profiling data is reported to the platform through DataKit and can be viewed later on the "APM" - "Profile" page.
Available environment variables:
- DATAKIT_URL: DataKit URL address, default: http://localhost:9529
- APP_ENV: current env, for example: dev/prod/test
- APP_VERSION: your application version
- HOST_NAME: hostname
- SERVICE_NAME: your service name
- PROFILING_DURATION: duration, in seconds
- PROFILING_EVENT: events, for example: cpu/alloc/lock
- PROFILING_TAGS: custom tags, separated by commas if there are several, e.g., key1:value1,key2:value2
- PROCESS_ID: target process PID, for example: 98789,33432
For example:
DATAKIT_URL=http://localhost:9529 APP_ENV=test APP_VERSION=1.0.0 HOST_NAME=datakit PROFILING_EVENT=cpu,alloc PROFILING_DURATION=60 PROFILING_TAGS="tag1:val1,tag2:val2" PROCESS_ID=98789,33432 bash collect.sh
Manually collect
Compared to the automated script, manual operation offers more freedom and can meet the needs of different scenarios.
- Generate a profiling file in "jfr" format (for example, by passing -o jfr -f profiling.jfr to profiler.sh)
- Prepare the "event.json" file
{
"tags_profiler": "library_version:2.8.3,library_type:async_profiler,process_id:16718,host:host_name,service:profiling-demo,env:dev,version:1.0.0",
"start": "2022-10-28T14:30:39.122688553+08:00",
"end": "2022-10-28T14:32:39.122688553+08:00",
"family": "java",
"format": "jfr"
}
Fields:
- tags_profiler: profiling tags
  - library_version: async-profiler version
  - library_type: profiler name
  - process_id: Java process PID
  - host: hostname
  - service: your service name
  - env: your service env
  - version: your app version
  - others
- start: profiling start time
- end: profiling end time
- family: language
- format: format
- Upload to DataKit
$ curl http://localhost:9529/profiling/v1/input \
-F "main=@profiling.jfr;filename=main.jfr" \
-F "event=@event.json;filename=event.json;type=application/json"
If the HTTP response body contains {"content":{"ProfileID":"xxxxxxxx"}}, the upload succeeded.
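Put together, the manual flow might look like the sketch below. It reuses the profiler.sh flags from collect.sh and the curl call above. The helper names write_event_json and upload_ok are hypothetical, and <pid> is a placeholder.

```shell
# write_event_json FILE TAGS START END
# Write the metadata file described above (hypothetical helper).
write_event_json() {
  cat > "$1" <<EOF
{
  "tags_profiler": "$2",
  "start": "$3",
  "end": "$4",
  "family": "java",
  "format": "jfr"
}
EOF
}

# upload_ok: succeed if the response body on stdin carries a ProfileID.
upload_ok() {
  grep -q '"ProfileID"'
}

# Steps that need a live JVM and a running DataKit are left commented out:
# start=$(date +%FT%T.%N%:z)
# ./profiler.sh -d 10 -o jfr -f profiling.jfr <pid>
# end=$(date +%FT%T.%N%:z)
# write_event_json event.json "service:profiling-demo,env:dev,version:1.0.0" "$start" "$end"
# curl -s http://localhost:9529/profiling/v1/input \
#   -F "main=@profiling.jfr;filename=main.jfr" \
#   -F "event=@event.json;filename=event.json;type=application/json" | upload_ok
```

Recording start and end around the profiler.sh call keeps the timestamps in event.json aligned with the actual sampling window.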