Skip to content

Profiling Java

Datakit now supports two Java profiling tools: dd-trace-java and async-profiler.

dd-trace-Java

Firstly Download dd-trace-Java from https://github.com/DataDog/dd-trace-java/releases .

Requirements

Datakit is now compatible with dd-trace-Java 1.15.x and below, the compatibility with higher version is unknown. please feel free to send your feedback to us if you encounter any incompatibility.

Required JDK version:

  • OpenJDK 11.0.17+, 17.0.5+
  • Oracle JDK 11.0.17+, 17.0.5+
  • OpenJDK 8 version 8u352+

Run Java Code

java -javaagent:/<your-path>/dd-java-agent.jar \
    -Ddd.service.name=profiling-demo \
    -Ddd.env=dev \
    -Ddd.version=1.2.3  \
    -Ddd.profiling.enabled=true  \
    -XX:FlightRecorderOptions=stackdepth=256 \
    -Ddd.trace.agent.port=9529 \
    -jar your-app.jar 

After a minute or two, you can visualize your profiles on the profile.

Async Profiler

async-profiler is an open source Java profiler Based on HotSpot API, it can collect information such as stack and memory allocation during program operation.

async-profiler can trace the following kinds of events:

  • CPU cycles
  • Hardware and Software performance counters like cache misses, branch misses, page faults, context switches etc.
  • Allocations in Java Heap
  • Contented lock attempts, including both Java object monitors and ReentrantLocks

Install async-profiler

Requirements

Datakit is now compatible with async-profiler v2.9 and below, higher version compatibility is unknown.

The official website provides download for different platform binaries:

Download archive and extract as below(Linux x64):

$ wget https://github.com/async-profiler/async-profiler/releases/download/v2.8.3/async-profiler-2.8.3-linux-x64.tar.gz 
$ tar -zxf async-profiler-2.8.3-linux-x64.tar.gz 
$ cd async-profiler-2.8.3-linux-x64 && ls

  build  CHANGELOG.md  LICENSE  profiler.sh  README.md

Use async-profiler

  • Set Linux kernel option perf_events

As of Linux 4.6, capturing kernel call stacks using perf_events from a non-root process requires setting two runtime variables. You can set them using sysctl or as follows:

sudo sysctl kernel.perf_event_paranoid=1
sudo sysctl kernel.kptr_restrict=0 
  • Install Debug Symbols

If memory allocation (allocate) related events need to be collected, it is required to install Debug Symbols. Oracle JDK already has these symbols built-in, so this step can be skipped. OpenJDK needs to be installed, and the installation method is as follows:

sudo apt install openjdk-8-dbg # OpenJDK 8
# Or
sudo apt install openjdk-11-dbg # OpenJDK 11
sudo debuginfo-install java-1.8.0-openjdk

The gdb tool can be used to verify if the debug symbols are properly installed . For example on Linux:

gdb $JAVA_HOME/lib/server/libjvm.so -ex 'info address UseG1GC'

This command's output will either contain Symbol "UseG1GC" is at 0xxxxx or No symbol "UseG1GC" in current context.

  • Check Java process PID

Before collection, you need to know the Java process's PID(use jps command)

$ jps

9234 Jps
8983 Computey
  • Profile Java process

Run profiler.sh and specify Java process PID:

./profiler.sh -d 10 -f profiling.html 8983 

Profiling for 10 seconds
Done

After about 10s, there will generate a file named profiling.html in current dir, you can use browser to open it.

Combine DataKit with async-profiler

Requirements:

By default, the program name will be automatically obtained as a 'service' to report the Guance Cloud. If customization is needed, the service name can be injected when the program starts:

java -Ddk.service=<service-name> ... -jar <your-jar>

There are two integration methods:

automate by script

Automated scripts can easily integrate async profiler and DataKit, use as follows:

  • create shell script

Create a file named "collect.sh" in current dir, type follow text:

???- note "collect.sh"(click to expand)

set -e
LIBRARY_VERSION=2.8.3

MAX_JFR_FILE_SIZE=6000000

datakit_url=http://localhost:9529
if [ -n "$DATAKIT_URL" ]; then
    datakit_url=$DATAKIT_URL
fi

datakit_profiling_url=$datakit_url/profiling/v1/input


app_env=dev
if [ -n "$APP_ENV" ]; then
    app_env=$APP_ENV
fi

app_version=0.0.0
if [ -n "$APP_VERSION" ]; then
    app_version=$APP_VERSION
fi

host_name=$(hostname)
if [ -n "$HOST_NAME" ]; then
    host_name=$HOST_NAME
fi

service_name=
if [ -n "$SERVICE_NAME" ]; then
    service_name=$SERVICE_NAME
fi

# profiling duration, in seconds
profiling_duration=10
if [ -n "$PROFILING_DURATION" ]; then
    profiling_duration=$PROFILING_DURATION
fi

# profiling event
profiling_event=cpu
if [ -n "$PROFILING_EVENT" ]; then
    profiling_event=$PROFILING_EVENT
fi

# 采集的 java 应用进程 ID, 此处可以自定义需要采集的 java 进程,比如可以根据进程名称过滤
java_process_ids=$(jps -q -J-XX:+PerfDisableSharedMem)
if [ -n "$PROCESS_ID" ]; then
    java_process_ids=`echo $PROCESS_ID | tr "," " "`
fi

if [[ $java_process_ids == "" ]]; then
    printf "Warning: no java program found, exit now\n"
    exit 1
fi

is_valid_process_id() {
    if [ -n "$1" ]; then
        if [[ $1 =~ ^[0-9]+$ ]]; then
            return 1
        fi
    fi
    return 0
}

profile_collect() {
    # disable -e
    set +e

    process_id=$1
    is_valid_process_id $process_id
    if [[ $? == 0 ]]; then
        printf "Warning: invalid process_id: $process_id, ignore"
        return 1
    fi

    uuid=$(uuidgen)
    jfr_file=$runtime_dir/profiler_$uuid.jfr
    event_json_file=$runtime_dir/event_$uuid.json

    arr=($(jps -v | grep "^$process_id"))

    process_name="default"

    for (( i = 0; i < ${#arr[@]}; i++ ))
    do
        value=${arr[$i]}
        if [ $i == 1 ]; then
            process_name=$value
        elif [[ $value =~ "-Ddk.service=" ]]; then
            service_name=${value/-Ddk.service=/}
        fi
    done

    start_time=$(date +%FT%T.%N%:z)
    ./profiler.sh -d $profiling_duration --fdtransfer -e $profiling_event -o jfr -f $jfr_file $process_id
    end_time=$(date +%FT%T.%N%:z)

    if [ ! -f $jfr_file ]; then
        printf "Warning: generating profiling file failed for %s, pid %d\n" $process_name $process_id
        return
    else
        printf "generate profiling file successfully for %s, pid %d\n" $process_name $process_id
    fi

    jfr_zip_file=$jfr_file.gz

    gzip -qc $jfr_file > $jfr_zip_file

    zip_file_size=`ls -la $jfr_zip_file | awk '{print $5}'`

    if [ -z "$service_name" ]; then
        service_name=$process_name
    fi

    if [ $zip_file_size -gt $MAX_JFR_FILE_SIZE ]; then
        printf "Warning: the size of the jfr file generated is bigger than $MAX_JFR_FILE_SIZE bytes, now is $zip_file_size bytes\n"
    else
        tags="library_version:$LIBRARY_VERSION,library_type:async_profiler,process_id:$process_id,process_name:$process_name,service:$service_name,host:$host_name,env:$app_env,version:$app_version"
        if [ -n "$PROFILING_TAGS" ]; then
          tags="$tags,$PROFILING_TAGS"
        fi
        cat >$event_json_file <<END
{
        "tags_profiler": "$tags",
        "start": "$start_time",
        "end": "$end_time",
        "family": "java",
        "format": "jfr"
}
END

        res=$(curl -i $datakit_profiling_url \
            -F "main=@$jfr_zip_file;filename=main.jfr" \
            -F "event=@$event_json_file;filename=event.json;type=application/json" | head -n 1 )

        if [[ ! $res =~ 2[0-9][0-9] ]]; then
            printf "Warning: send profile file to datakit failed, %s\n" "$res"
            printf "$res"
        else
            printf "Info: send profile file to datakit successfully\n"
            rm -rf $event_json_file $jfr_file $jfr_zip_file
        fi
    fi

    set -e
}

runtime_dir=runtime
if [ ! -d $runtime_dir ]; then
    mkdir $runtime_dir
fi

for process_id in $java_process_ids; do
    printf "profiling process %d\n" $process_id
    profile_collect $process_id > $runtime_dir/$process_id.log 2>&1 &
done

wait

for process_id in $java_process_ids; do
    log_file=$runtime_dir/$process_id.log
    if [ -f $log_file ]; then
        echo
        cat $log_file
        rm $log_file
    fi
done

  • Execute script
bash collect.sh

After the script is executed, the collected profiling data will be reported to the GuanceCloud platform through DataKit, which can be viewed later in the "APM" - "Profile" page.

available env:

  • DATAKIT_URL :DataKit URL address, default: http://localhost:9529
  • APP_ENV :current env, for example: dev/prod/test
  • APP_VERSION :your application version
  • HOST_NAME :hostname
  • SERVICE_NAME :your service name
  • PROFILING_DURATION :duration, in seconds
  • PROFILING_EVENT :events, for example: cpu/alloc/lock
  • PROFILING_TAGS :set custom tags, split by comma if multiples, e.g., key1:value1,key2:value2
  • PROCESS_ID :target process PID, for example: 98789,33432
DATAKIT_URL=http://localhost:9529 APP_ENV=test APP_VERSION=1.0.0 HOST_NAME=datakit PROFILING_EVENT=cpu,alloc PROFILING_DURATION=60 PROFILING_TAGS="tag1:val1,tag2:val2" PROCESS_ID=98789,33432 bash collect.sh

manually collect

Compared to automated scripts, manual operations have higher degrees of freedom and can meet the needs of different scenarios

  • generate profiling file, format in "jfr"
./profiler.sh -d 10 -o jfr -f profiling.jfr jps
  • prepare "event.JSON" file
{
    "tags_profiler": "library_version:2.8.3,library_type:async_profiler,process_id:16718,host:host_name,service:profiling-demo,env:dev,version:1.0.0",
    "start": "2022-10-28T14:30:39.122688553+08:00",
    "end": "2022-10-28T14:32:39.122688553+08:00",
    "family": "java",
    "format": "jfr"
}

fields:

  • tags_profiler: profiling tags,
    • library_version: async-profiler version
    • library_type: profiler name
    • process_id: Java process PID
    • host: hostname
    • service: your service name
    • env: your service env
    • version: your app version
    • others
  • start: profiling start time
  • end: profiling end time
  • family: language
  • format: format

  • upload to DataKit

$ curl http://localhost:9529/profiling/v1/input \
  -F "main=@profiling.jfr;filename=main.jfr" \
  -F "event=@event.json;filename=event.json;type=application/json"

If the http response body contains {"content":{"ProfileID":"xxxxxxxx"}} indicate successfully uploading.

Feedback

Is this page helpful? ×