Profiling Java

Datakit now supports two Java profiling tools: dd-trace-java and async-profiler.

dd-trace-java

Download dd-trace-java from the dd-trace-java page.

Note

Datakit currently supports dd-trace-java 1.47.x and lower versions. Higher versions have not been tested, and their compatibility is unknown. If you encounter any issues during use, please feel free to provide feedback to us.
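
The version constraint above can be checked before deployment. The following is a minimal sketch, assuming the version string has the plain form `major.minor.patch` with no prefix; the function name `dd_trace_supported` is hypothetical and not part of Datakit or dd-trace-java.

```bash
# Hypothetical helper: succeeds when a dd-trace-java version is 1.47.x or lower.
# Assumes a plain "major.minor.patch" string such as 1.47.3 (no "v" prefix).
dd_trace_supported() {
    ver_major=${1%%.*}          # text before the first dot
    ver_rest=${1#*.}            # text after the first dot
    ver_minor=${ver_rest%%.*}   # second numeric component
    [ "$ver_major" -lt 1 ] || { [ "$ver_major" -eq 1 ] && [ "$ver_minor" -le 47 ]; }
}

# Example usage:
#   dd_trace_supported 1.47.3 || echo "untested dd-trace-java version"
```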

Currently, dd-trace-java integrates two analysis engines: the Datadog Profiler and the JFR (Java Flight Recorder) built into the JDK. Each engine has its own platform and JDK version requirements, listed below.

The Datadog Profiler currently supports only Linux and has the following JDK version requirements:

  • OpenJDK 8u352+, 11.0.17+, 17.0.5+ (including the corresponding versions built by Eclipse Adoptium, Amazon Corretto, Azul Zulu, etc.)
  • Oracle JDK 8u352+, 11.0.17+, 17.0.5+
  • OpenJ9 JDK 8u372+, 11.0.18+, 17.0.6+

JFR has the following JDK version requirements:

  • OpenJDK 11+
  • Oracle JDK 11+
  • OpenJDK 8 (version 1.8.0.262/8u262+)
  • Oracle JDK 8 (commercial features need to be enabled)
Note

JFR is a commercial feature of Oracle JDK 8 and is disabled by default. To enable it, add the parameters -XX:+UnlockCommercialFeatures -XX:+FlightRecorder when starting the application. Since JDK 11, JFR has been an open-source project and is no longer a commercial feature of Oracle JDK.
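
The rule in the note can be expressed as a small helper that decides which extra JVM flags are needed. This is only a sketch: the function name `jfr_extra_flags` and the vendor labels `oracle` and `openjdk` are hypothetical, and the caller is assumed to already know the JDK vendor and major version.

```bash
# Hypothetical helper: print the extra JVM flags JFR needs for a given JDK.
# $1 = vendor label ("oracle" or "openjdk", names chosen for this sketch)
# $2 = JDK major version (8, 11, 17, ...)
jfr_extra_flags() {
    if [ "$1" = "oracle" ] && [ "$2" = "8" ]; then
        # Only Oracle JDK 8 treats JFR as a commercial feature.
        echo "-XX:+UnlockCommercialFeatures -XX:+FlightRecorder"
    fi
}

# Example usage (word splitting of the flags is intentional here):
#   java $(jfr_extra_flags oracle 8) -javaagent:/<your-path>/dd-java-agent.jar ... -jar your-app.jar
```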

Run Java Code

java -javaagent:/<your-path>/dd-java-agent.jar \
    -XX:FlightRecorderOptions=stackdepth=256 \
    -Ddd.agent.host=127.0.0.1 \
    -Ddd.trace.agent.port=9529 \
    -Ddd.service.name=profiling-demo \
    -Ddd.env=dev \
    -Ddd.version=1.2.3  \
    -Ddd.profiling.enabled=true  \
    -Ddd.profiling.ddprof.enabled=true \
    -Ddd.profiling.ddprof.cpu.enabled=true \
    -Ddd.profiling.ddprof.wall.enabled=true \
    -Ddd.profiling.ddprof.alloc.enabled=true \
    -Ddd.profiling.ddprof.liveheap.enabled=true \
    -Ddd.profiling.ddprof.memleak.enabled=true \
    -jar your-app.jar 

After a minute or two, you can view your profiles on the Profile page.

Explanation of some parameters:

| Parameter Name | Corresponding Environment Variable | Explanation |
| --- | --- | --- |
| -Ddd.profiling.enabled | DD_PROFILING_ENABLED | Whether to enable the profiling function. |
| -Ddd.profiling.allocation.enabled | DD_PROFILING_ALLOCATION_ENABLED | Whether to enable JFR memory allocation analysis. It may affect the performance of high-load applications. For JDK 11 and above, the Datadog Profiler allocation function is recommended instead. |
| -Ddd.profiling.heap.enabled | DD_PROFILING_HEAP_ENABLED | Whether to enable sampling of JFR heap objects. |
| -Ddd.profiling.directallocation.enabled | DD_PROFILING_DIRECTALLOCATION_ENABLED | Whether to enable sampling of JFR JVM direct memory allocation. |
| -Ddd.profiling.ddprof.enabled | DD_PROFILING_DDPROF_ENABLED | Whether to enable the Datadog Profiler analysis engine. |
| -Ddd.profiling.ddprof.cpu.enabled | DD_PROFILING_DDPROF_CPU_ENABLED | Whether to enable Datadog Profiler CPU analysis. |
| -Ddd.profiling.ddprof.wall.enabled | DD_PROFILING_DDPROF_WALL_ENABLED | Whether to enable collection of Datadog Profiler wall time. This option affects the accuracy of the association between Trace and Profile, so enabling it is recommended. |
| -Ddd.profiling.ddprof.alloc.enabled | DD_PROFILING_DDPROF_ALLOC_ENABLED | Whether to enable memory allocation analysis in the Datadog Profiler engine. It has been verified that this cannot currently be enabled on JDK 8. For JDK 8, use -Ddd.profiling.allocation.enabled as appropriate and pay attention to the impact on system performance. |
| -Ddd.profiling.ddprof.liveheap.enabled | DD_PROFILING_DDPROF_LIVEHEAP_ENABLED | Whether to enable analysis of the currently live heap by the Datadog Profiler engine. |
| -Ddd.profiling.ddprof.memleak.enabled | DD_PROFILING_DDPROF_MEMLEAK_ENABLED | Whether to enable memory leak analysis in the Datadog Profiler engine. |
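
The parameters can also be supplied through the environment variables listed above. The sketch below picks the allocation-profiling variable according to the JDK major version, following the guidance in the explanations for the two allocation options. The function name `alloc_profiling_env` is hypothetical, and the caller is assumed to know the JDK major version.

```bash
# Hypothetical helper: print the allocation-profiling variable suited to a JDK.
# $1 = JDK major version (8, 11, 17, ...)
alloc_profiling_env() {
    if [ "$1" -ge 11 ]; then
        # JDK 11+: allocation analysis by the Datadog Profiler engine
        echo "DD_PROFILING_DDPROF_ALLOC_ENABLED=true"
    else
        # JDK 8: fall back to JFR allocation analysis (watch the overhead)
        echo "DD_PROFILING_ALLOCATION_ENABLED=true"
    fi
}

# Example usage:
#   env DD_PROFILING_ENABLED=true "$(alloc_profiling_env 8)" \
#       java -javaagent:/<your-path>/dd-java-agent.jar -jar your-app.jar
```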

Async Profiler

async-profiler is an open-source Java profiler. Based on the HotSpot API, it can collect information such as stack traces and memory allocations while the program is running.

async-profiler can trace the following kinds of events:

  • CPU cycles
  • Hardware and Software performance counters like cache misses, branch misses, page faults, context switches etc.
  • Allocations in Java Heap
  • Contended lock attempts, including both Java object monitors and ReentrantLocks

Install async-profiler

Requirements

Datakit is currently compatible with async-profiler v2.9 and below. Compatibility with higher versions is unknown.

The official website provides binaries for different platforms.

Download the archive and extract it as follows (Linux x64):

$ wget https://github.com/async-profiler/async-profiler/releases/download/v2.8.3/async-profiler-2.8.3-linux-x64.tar.gz 
$ tar -zxf async-profiler-2.8.3-linux-x64.tar.gz 
$ cd async-profiler-2.8.3-linux-x64 && ls

  build  CHANGELOG.md  LICENSE  profiler.sh  README.md

Use async-profiler

  • Set Linux kernel option perf_events

As of Linux 4.6, capturing kernel call stacks using perf_events from a non-root process requires setting two runtime variables. You can set them using sysctl as follows:

sudo sysctl kernel.perf_event_paranoid=1
sudo sysctl kernel.kptr_restrict=0 
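
To confirm that both variables already have suitable values, a check along these lines may help. It is a sketch only: the function name `perf_settings_ok` is hypothetical, and it assumes that any `perf_event_paranoid` value of 1 or lower is acceptable, since lower values are more permissive.

```bash
# Hypothetical helper: succeeds when the two kernel settings allow
# a non-root process to capture kernel call stacks.
# $1 = value of kernel.perf_event_paranoid
# $2 = value of kernel.kptr_restrict
perf_settings_ok() {
    [ "$1" -le 1 ] && [ "$2" -eq 0 ]
}

# Example usage:
#   perf_settings_ok "$(sysctl -n kernel.perf_event_paranoid)" \
#                    "$(sysctl -n kernel.kptr_restrict)" \
#       || echo "perf_events settings need to be adjusted"
```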
  • Install Debug Symbols

If memory allocation (alloc) events need to be collected, debug symbols must be installed. Oracle JDK already has these symbols built in, so this step can be skipped. For OpenJDK, install the debug symbols package as follows:

# Debian/Ubuntu
sudo apt install openjdk-8-dbg # OpenJDK 8
# Or
sudo apt install openjdk-11-dbg # OpenJDK 11

# CentOS, RHEL and other RPM-based distributions
sudo debuginfo-install java-1.8.0-openjdk

The gdb tool can be used to verify that the debug symbols are properly installed. For example, on Linux:

gdb $JAVA_HOME/lib/server/libjvm.so -ex 'info address UseG1GC'

The output of this command contains either Symbol "UseG1GC" is at 0xxxxx (debug symbols are installed) or No symbol "UseG1GC" in current context (debug symbols are missing).
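
The two possible outputs can be told apart in a script. The following sketch classifies captured gdb output. The function name `has_debug_symbols` is hypothetical, and the matched text is taken from the two messages quoted above.

```bash
# Hypothetical helper: succeeds when gdb output shows the UseG1GC symbol was found.
# $1 = captured output of the gdb command
has_debug_symbols() {
    case $1 in
        *'Symbol "UseG1GC" is at'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage (gdb -batch exits after running the -ex command):
#   out=$(gdb -batch "$JAVA_HOME/lib/server/libjvm.so" -ex 'info address UseG1GC' 2>&1)
#   has_debug_symbols "$out" || echo "debug symbols are missing"
```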

  • Check Java process PID

Before collecting, find the PID of the Java process (for example, with the jps command):

$ jps

9234 Jps
8983 Computey
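
When the main class name is known, the PID can be picked out of the jps listing automatically. This is a small sketch that assumes the default two-column jps output shown above; the function name `pid_of` is hypothetical.

```bash
# Hypothetical helper: read "PID name" lines on stdin and print the PID(s)
# whose second column equals the given name.
# $1 = main class or jar name as shown by jps
pid_of() {
    awk -v name="$1" '$2 == name { print $1 }'
}

# Example usage:
#   pid=$(jps | pid_of Computey)
```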
  • Profile Java process

Run profiler.sh and specify the PID of the Java process:

./profiler.sh -d 10 -f profiling.html 8983 

Profiling for 10 seconds
Done

After about 10 seconds, a file named profiling.html is generated in the current directory. You can open it in a browser.

Combine DataKit with async-profiler

Requirements:

By default, the program name is automatically used as the 'service' when reporting to Guance. To customize it, inject the service name when the program starts:

java -Ddk.service=<service-name> ... -jar <your-jar>
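
The injected value can later be recovered from the JVM arguments of the running process. The sketch below shows one way to extract it from a list of arguments such as a line of `jps -v` output. The function name `service_from_args` is hypothetical, and it assumes the service name contains no whitespace.

```bash
# Hypothetical helper: print the value of the first -Ddk.service=... argument.
# Arguments: the whitespace-separated fields of one "jps -v" line.
service_from_args() {
    for arg in "$@"; do
        case $arg in
            -Ddk.service=*)
                echo "${arg#-Ddk.service=}"   # strip the option prefix
                return 0
                ;;
        esac
    done
    return 1   # no service name was injected
}

# Example usage (word splitting of the jps line is intentional):
#   service_from_args $(jps -v | grep "^8983 ")
```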

There are two integration methods:

Automate by script

An automated script makes it easy to integrate async-profiler with DataKit. Use it as follows:

  • Create shell script

Create a file named "collect.sh" in the async-profiler directory (the script calls ./profiler.sh) with the following content:

set -e
LIBRARY_VERSION=2.8.3

MAX_JFR_FILE_SIZE=6000000

datakit_url=http://localhost:9529
if [ -n "$DATAKIT_URL" ]; then
    datakit_url=$DATAKIT_URL
fi

datakit_profiling_url=$datakit_url/profiling/v1/input


app_env=dev
if [ -n "$APP_ENV" ]; then
    app_env=$APP_ENV
fi

app_version=0.0.0
if [ -n "$APP_VERSION" ]; then
    app_version=$APP_VERSION
fi

host_name=$(hostname)
if [ -n "$HOST_NAME" ]; then
    host_name=$HOST_NAME
fi

service_name=
if [ -n "$SERVICE_NAME" ]; then
    service_name=$SERVICE_NAME
fi

# profiling duration, in seconds
profiling_duration=10
if [ -n "$PROFILING_DURATION" ]; then
    profiling_duration=$PROFILING_DURATION
fi

# profiling event
profiling_event=cpu
if [ -n "$PROFILING_EVENT" ]; then
    profiling_event=$PROFILING_EVENT
fi

# IDs of the Java processes to profile; customize this to select specific processes, e.g. filter by process name
java_process_ids=$(jps -q -J-XX:+PerfDisableSharedMem)
if [ -n "$PROCESS_ID" ]; then
    java_process_ids=`echo $PROCESS_ID | tr "," " "`
fi

if [[ $java_process_ids == "" ]]; then
    printf "Warning: no java program found, exit now\n"
    exit 1
fi

# Succeeds (returns 0) when $1 is a non-empty string of digits.
is_valid_process_id() {
    [[ -n "$1" && $1 =~ ^[0-9]+$ ]]
}

profile_collect() {
    # disable -e
    set +e

    process_id=$1
    if ! is_valid_process_id "$process_id"; then
        printf "Warning: invalid process_id: %s, ignore\n" "$process_id"
        return 1
    fi

    uuid=$(uuidgen)
    jfr_file=$runtime_dir/profiler_$uuid.jfr
    event_json_file=$runtime_dir/event_$uuid.json

    # match the whole PID (trailing space) so that e.g. 898 does not match 8983
    arr=($(jps -v | grep "^$process_id "))

    process_name="default"

    for (( i = 0; i < ${#arr[@]}; i++ ))
    do
        value=${arr[$i]}
        if [ $i == 1 ]; then
            process_name=$value
        elif [[ $value =~ "-Ddk.service=" ]]; then
            service_name=${value/-Ddk.service=/}
        fi
    done

    start_time=$(date +%FT%T.%N%:z)
    ./profiler.sh -d $profiling_duration --fdtransfer -e $profiling_event -o jfr -f $jfr_file $process_id
    end_time=$(date +%FT%T.%N%:z)

    if [ ! -f $jfr_file ]; then
        printf "Warning: generating profiling file failed for %s, pid %d\n" $process_name $process_id
        return
    else
        printf "generate profiling file successfully for %s, pid %d\n" $process_name $process_id
    fi

    jfr_zip_file=$jfr_file.gz

    gzip -qc $jfr_file > $jfr_zip_file

    zip_file_size=`ls -la $jfr_zip_file | awk '{print $5}'`

    if [ -z "$service_name" ]; then
        service_name=$process_name
    fi

    if [ $zip_file_size -gt $MAX_JFR_FILE_SIZE ]; then
        printf "Warning: the size of the jfr file generated is bigger than $MAX_JFR_FILE_SIZE bytes, now is $zip_file_size bytes\n"
    else
        tags="library_version:$LIBRARY_VERSION,library_type:async_profiler,process_id:$process_id,process_name:$process_name,service:$service_name,host:$host_name,env:$app_env,version:$app_version"
        if [ -n "$PROFILING_TAGS" ]; then
          tags="$tags,$PROFILING_TAGS"
        fi
        cat >$event_json_file <<END
{
        "tags_profiler": "$tags",
        "start": "$start_time",
        "end": "$end_time",
        "family": "java",
        "format": "jfr"
}
END

        res=$(curl -i $datakit_profiling_url \
            -F "main=@$jfr_zip_file;filename=main.jfr" \
            -F "event=@$event_json_file;filename=event.json;type=application/json" | head -n 1 )

        if [[ ! $res =~ 2[0-9][0-9] ]]; then
            printf "Warning: send profile file to datakit failed, %s\n" "$res"
        else
            printf "Info: send profile file to datakit successfully\n"
            rm -rf $event_json_file $jfr_file $jfr_zip_file
        fi
    fi

    set -e
}

runtime_dir=runtime
if [ ! -d $runtime_dir ]; then
    mkdir $runtime_dir
fi

for process_id in $java_process_ids; do
    printf "profiling process %d\n" $process_id
    profile_collect $process_id > $runtime_dir/$process_id.log 2>&1 &
done

wait

for process_id in $java_process_ids; do
    log_file=$runtime_dir/$process_id.log
    if [ -f $log_file ]; then
        echo
        cat $log_file
        rm $log_file
    fi
done

  • Execute script
bash collect.sh

After the script runs, the collected profiling data is reported to the GuanceCloud platform through DataKit and can be viewed on the "APM" - "Profile" page.

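Available environment variables:

  • DATAKIT_URL: DataKit URL address, default: http://localhost:9529
  • APP_ENV: current environment, for example dev/prod/test, default: dev
  • APP_VERSION: your application version, default: 0.0.0
  • HOST_NAME: hostname, default: the output of the hostname command
  • SERVICE_NAME: your service name, default: the -Ddk.service value if set, otherwise the process name
  • PROFILING_DURATION: profiling duration in seconds, default: 10
  • PROFILING_EVENT: events to collect, for example cpu/alloc/lock, default: cpu
  • PROFILING_TAGS: custom tags, separated by commas if there are several, e.g., key1:value1,key2:value2
  • PROCESS_ID: target process PIDs, separated by commas, for example 98789,33432, default: all Java processes found by jps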
DATAKIT_URL=http://localhost:9529 APP_ENV=test APP_VERSION=1.0.0 HOST_NAME=datakit PROFILING_EVENT=cpu,alloc PROFILING_DURATION=60 PROFILING_TAGS="tag1:val1,tag2:val2" PROCESS_ID=98789,33432 bash collect.sh

Manually collect

Compared with the automated script, manual collection offers more freedom and can meet the needs of different scenarios.

  • Generate a profiling file in "jfr" format
./profiler.sh -d 10 -o jfr -f profiling.jfr jps
  • Prepare the "event.json" file
{
    "tags_profiler": "library_version:2.8.3,library_type:async_profiler,process_id:16718,host:host_name,service:profiling-demo,env:dev,version:1.0.0",
    "start": "2022-10-28T14:30:39.122688553+08:00",
    "end": "2022-10-28T14:32:39.122688553+08:00",
    "family": "java",
    "format": "jfr"
}

Fields:

  • tags_profiler: profiling tags, as comma-separated key:value pairs
    • library_version: async-profiler version
    • library_type: profiler name
    • process_id: Java process PID
    • host: hostname
    • service: your service name
    • env: your service environment
    • version: your app version
    • other custom tags
  • start: profiling start time
  • end: profiling end time
  • family: programming language (java)
  • format: profiling file format (jfr)
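
The event file can be generated rather than written by hand. The following is a minimal sketch that prints the JSON with the fields described above. The function name `make_event_json` is hypothetical, and the tag and time values are assumed to contain no characters that need JSON escaping.

```bash
# Hypothetical helper: print an event.json document for a JFR profile.
# $1 = tags_profiler string, $2 = start time, $3 = end time
make_event_json() {
    cat <<END
{
    "tags_profiler": "$1",
    "start": "$2",
    "end": "$3",
    "family": "java",
    "format": "jfr"
}
END
}

# Example usage (the %N and %:z format specifiers require GNU date):
#   start=$(date +%FT%T.%N%:z)
#   ./profiler.sh -d 10 -o jfr -f profiling.jfr jps
#   end=$(date +%FT%T.%N%:z)
#   make_event_json "library_type:async_profiler,service:profiling-demo,env:dev" \
#       "$start" "$end" > event.json
```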

  • Upload to DataKit

$ curl http://localhost:9529/profiling/v1/input \
  -F "main=@profiling.jfr;filename=main.jfr" \
  -F "event=@event.json;filename=event.json;type=application/json"

If the HTTP response body contains {"content":{"ProfileID":"xxxxxxxx"}}, the upload succeeded.
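
A script can check for this response automatically. The sketch below extracts the ProfileID from a response body. The function name `profile_id_from` is hypothetical, and it assumes the compact JSON form shown above, with no whitespace around the colon.

```bash
# Hypothetical helper: print the ProfileID found in a response body,
# or nothing when the body does not contain one.
# $1 = HTTP response body
profile_id_from() {
    printf '%s\n' "$1" | sed -n 's/.*"ProfileID":"\([^"]*\)".*/\1/p'
}

# Example usage:
#   body=$(curl -s http://localhost:9529/profiling/v1/input \
#       -F "main=@profiling.jfr;filename=main.jfr" \
#       -F "event=@event.json;filename=event.json;type=application/json")
#   [ -n "$(profile_id_from "$body")" ] || echo "upload failed: $body"
```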
