How to Configure Application Performance Monitoring Sampling¶
Guance's APM feature supports analyzing and managing trace data reported by collectors that conform to the OpenTracing protocol. By default, application performance data is collected in full, meaning every call generates data. Left unrestricted, this can produce a large volume of data and consume excessive storage. You can reduce storage usage and cost by setting a sampling rate for application performance data collection.
Below, we use the DDtrace collector as an example to show how to configure a 5% sampling rate for application performance data.
Prerequisites¶
Sampling Configuration¶
Before starting to collect application performance data, you need to configure DDtrace according to different languages.
Below, we take a commonly used Python web server, a Flask application, as an example to explain how to sample application performance data. In the example, SERVICE_A provides an HTTP service and calls the HTTP service provided by SERVICE_B.
Step 1: Install DDtrace Runtime Environment¶
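The exact installation depends on your environment; a typical approach is to install the ddtrace library with pip (this provides the ddtrace-run launcher used in Step 4):
# Install the ddtrace tracing library for Python
pip3 install ddtrace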
Step 2: Install flask Package¶
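Similarly, the demo services below need Flask, plus requests for the HTTP call from SERVICE_A to SERVICE_B:
# Install Flask for the demo web services and requests for the inter-service call
pip3 install flask requests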
Step 3: Configure Sampling¶
Sampling Configuration Description
Create SERVICE_A and SERVICE_B, configure 5% sampling for SERVICE_A, and use the default (full) collection for SERVICE_B.
When creating them, import DDtrace, set the service name and the service name mapping, and configure the project name, environment name, and version number through environment variables. Also configure the DataKit trace API service address: the host depends on where DataKit is deployed, and the port is 9529.
Refer to Python Flask Complete Example.
1. SERVICE_A¶
# -*- encoding: utf8 -*-
#--------- service_a.py ----------
from flask import Flask, request
import requests, os
from ddtrace import tracer, sampler

# Set service name
os.environ["DD_SERVICE"] = "SERVICE_A"
# Set service name mapping relationship
os.environ["DD_SERVICE_MAPPING"] = "postgres:postgresql,defaultdb:postgresql"
# Set project name, environment name, version number via environment variables
os.environ["DD_TAGS"] = "project:your_project_name,env:test,version:v1"

# Configure the DataKit trace API service address and a 5% sampling rate
tracer.configure(
    hostname="localhost",  # Depends on the specific DataKit address
    port="9529",
    sampler=sampler.RateSampler(0.05),
)

app = Flask(__name__)

def shutdown_server():
    func = request.environ.get('werkzeug.server.shutdown')
    if func is None:
        raise RuntimeError('Not running with the Werkzeug Server')
    func()

@app.route('/a', methods=['GET'])
def index():
    # Call service B so that one request to /a produces a trace spanning both services
    requests.get('http://127.0.0.1:54322/b')
    return 'OK', 200

@app.route('/stop', methods=['GET'])
def stop():
    shutdown_server()
    return 'Server shutting down...\n'

# Start service A: the HTTP service listens on port 54321
if __name__ == '__main__':
    app.run(host="0.0.0.0", port=54321, debug=True)
2. SERVICE_B¶
# -*- encoding: utf8 -*-
#--------- service_b.py ----------
from flask import Flask, request
import os, time, requests
from ddtrace import tracer

# Set service name
os.environ["DD_SERVICE"] = "SERVICE_B"
# Set service name mapping relationship
os.environ["DD_SERVICE_MAPPING"] = "postgres:postgresql,defaultdb:postgresql"
# Set project name, environment name, version number via environment variables
os.environ["DD_TAGS"] = "project:your_project_name,env:test,version:v1"

# Configure the DataKit trace API service address (no sampler: default full collection)
tracer.configure(
    hostname="localhost",  # Depends on the specific DataKit address
    port="9529",
)

app = Flask(__name__)

def shutdown_server():
    func = request.environ.get('werkzeug.server.shutdown')
    if func is None:
        raise RuntimeError('Not running with the Werkzeug Server')
    func()

@app.route('/b', methods=['GET'])
def index():
    # Simulate one second of work
    time.sleep(1)
    return 'OK', 200

@app.route('/stop', methods=['GET'])
def stop():
    shutdown_server()
    return 'Server shutting down...\n'

# Start service B: the HTTP service listens on port 54322
if __name__ == '__main__':
    app.run(host="0.0.0.0", port=54322, debug=True)
Step 4: Start and Call SERVICE_A and SERVICE_B¶
# Start both services in the background:
(ddtrace-run python3 service_a.py &> a.log &)
(ddtrace-run python3 service_b.py &> b.log &)
# Call service A, which triggers a call to service B, thus generating corresponding trace data (this can be executed multiple times).
curl http://localhost:54321/a
# Stop both services
curl http://localhost:54321/stop
curl http://localhost:54322/stop
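If the calls above do not behave as expected or no trace data appears later, check the log files written by the startup commands (a.log and b.log):
# Inspect the demo services' logs to confirm they started and handled the requests
tail -n 50 a.log b.log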
Step 5: View Results in Guance Workspace¶
Log in to the Guance workspace, where you can see the collected SERVICE_A and SERVICE_B trace data.
Warning
Application performance sampling is based on traces. If there are 100 traces and the sampling rate is set to 5%, then 5% of these traces will be randomly sampled, meaning 5 traces and all their Spans will be reported to the Guance workspace.
In this example, SERVICE_A provides an HTTP service and calls the HTTP service of SERVICE_B, which together form one trace. Assuming there are 100 traces, 5 of them will be randomly reported.
Further Reading¶
- The above method configures the application performance monitoring sampling rate on the client side. Alternatively, you can configure the sampling rate directly in DataKit by enabling the sampler in the application performance collector configuration:
## Sampler config used to set global sampling strategy.
## sampling_rate used to set global sampling rate.
[inputs.tracer.sampler]
sampling_rate = 1.0
- After enabling application performance sampling, some important traces might be missed. You can ensure critical traces are reported by configuring filters, such as setting keep_rare_resource = true; traces deemed rare will then be reported to Guance directly (see the sketch below).
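For reference, below is a minimal sketch of a collector-side configuration that combines both options. It assumes the same [inputs.tracer] input name used in the snippet above and that keep_rare_resource sits at the input's top level; the actual input name and configuration file depend on which tracing collector you enable in DataKit.
[inputs.tracer]
  ## Report rare traces to Guance even if sampling would otherwise drop them.
  keep_rare_resource = true

  ## Sampler config used to set global sampling strategy.
  ## A sampling_rate of 0.05 keeps roughly 5% of traces, matching the client-side example.
  [inputs.tracer.sampler]
    sampling_rate = 0.05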