---
title: 'OpenLLMetry'
summary: "OpenLLMetry is developed and maintained by the Traceloop team under the Apache 2.0 license. It provides specialized monitoring and debugging tools for LLM applications by extending OpenTelemetry capabilities. It leverages OpenTelemetry's standardized telemetry data format to standardize the output of key performance metrics and trace information from LLM applications."
tags:
  - 'OTEL'
  - 'APM'
__int_icon: 'icon/openllmetry'
dashboard:
  - desc: 'OpenLLMetry'
    path: 'dashboard/zh/openllmetry'
---

OpenLLMetry
OpenLLMetry is developed and maintained by the Traceloop team under the Apache 2.0 license. It provides specialized monitoring and debugging tools for LLM applications by extending OpenTelemetry capabilities. It leverages OpenTelemetry's standardized telemetry data format to standardize the output of key performance metrics and trace information from LLM applications.
Configuration
Before sending trace data to DataKit via OTEL, make sure the Collector has been configured. In addition, adjust the customer_tags setting in the configuration file as follows:
[[inputs.opentelemetry]]
  ## customer_tags acts as a whitelist: only the tags listed here are sent to the data center.
  ## Every "." in a tag key is replaced with "_", for example
  ## "project.name" is reported to the Guance center as "project_name".
  customer_tags = ["llm.request.type","traceloop.entity.path","llm.is_streaming","gen_ai.openai.api_base","gen_ai.prompt.1.content","gen_ai.response.model","gen_ai.completion.0.content","gen_ai.request.model","gen_ai.request.temperature","gen_ai.system","traceloop.workflow.name"]
...
After completing the adjustments, restart DataKit for the configuration to take effect.
Install OpenTelemetry SDK
pip install opentelemetry-api opentelemetry-instrumentation
pip install opentelemetry-instrumentation-flask
Install OpenLLMetry SDK
pip install traceloop-sdk
Initialize OpenLLMetry in your application
from traceloop.sdk import Traceloop
# Initialize OpenLLMetry
# Traceloop.init()
Traceloop.init(app_name="kimi_openllmetry_stream_flask")
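When verifying locally, it can help to export each span as soon as it ends rather than batching them. Below is a minimal sketch using the disable_batch option from the traceloop-sdk quickstart; the app_name value is simply the one used in the sample further down.

```python
from traceloop.sdk import Traceloop

# disable_batch=True flushes each span immediately, which makes it easier
# to confirm that data is reaching DataKit while debugging. Omit it in production.
Traceloop.init(app_name="kimi_openllmetry_stream_flask", disable_batch=True)
```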
OpenLLMetry Sample Code
import os
import httpx
from flask import Flask, request, Response, jsonify, stream_with_context
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow, task
from openai import OpenAI
from opentelemetry.instrumentation.flask import FlaskInstrumentor
app = Flask(__name__)
# Automatically instrument the Flask app using FlaskInstrumentor
FlaskInstrumentor().instrument_app(app)
# Initialize OpenLLMetry
Traceloop.init(app_name="kimi_openllmetry_stream_flask")
# Retrieving API Key from environment variable
api_key = os.getenv("MOONSHOT_API_KEY")
if not api_key:
raise ValueError("Please set the MOONSHOT_API_KEY environment variable")
client = OpenAI(
api_key=api_key,
base_url="https://api.moonshot.cn/v1",
)
def estimate_token_count(input_messages) -> int:
"""
Calculate the number of Tokens in the input messages.
"""
try:
header = {
"Authorization": f"Bearer {api_key}",
}
data = {
"model": "moonshot-v1-128k",
"messages": input_messages,
}
with httpx.Client() as client:
print("Calling API endpoint")
r = client.post("https://api.moonshot.cn/v1/tokenizers/estimate-token-count", headers=header, json=data)
r.raise_for_status()
response_data = r.json()
print(response_data["data"]["total_tokens"])
return response_data["data"]["total_tokens"]
except httpx.RequestError as e:
print(f"Request failed: {e}")
raise
except (KeyError, ValueError) as e:
print(f"Parsing response failed: {e}")
raise
def select_model(input_messages, max_tokens=1024) -> str:
"""
Select an appropriate model based on the provided context messages and the expected max_tokens value.
"""
if not isinstance(max_tokens, int) or max_tokens <= 0:
raise ValueError("max_tokens must be a positive integer")
prompt_tokens = estimate_token_count(input_messages)
total_tokens = prompt_tokens + max_tokens
if total_tokens <= 8 * 1024:
return "moonshot-v1-8k"
elif total_tokens <= 32 * 1024:
return "moonshot-v1-32k"
elif total_tokens <= 128 * 1024:
return "moonshot-v1-128k"
else:
raise ValueError("Token count exceeds limit 😢")
@app.route('/ask', methods=['POST'])
@workflow(name="ask_workflow")
def ask():
data = request.json
messages = data.get('messages')
max_tokens = data.get('max_tokens', 2048)
if not messages:
return jsonify({"error": "The messages field cannot be empty"}), 400
try:
model = select_model(messages, max_tokens)
completion = client.chat.completions.create(
model=model,
messages=messages,
max_tokens=max_tokens,
temperature=0.3,
stream=True # Enable streaming generation
)
def generate():
for chunk in completion:
# yield chunk.choices[0].delta.content or ''
delta = chunk.choices[0].delta
if delta.content:
print(delta.content, end="")
yield delta.content or ''
return Response(stream_with_context(generate()), content_type='text/event-stream')
except Exception as e:
return jsonify({"error": str(e)}), 500
if __name__ == '__main__':
app.run(debug=True,port=5001)
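Once the Flask app above is running (port 5001, as in app.run), you can exercise the /ask endpoint and watch the streamed reply. The following is a minimal client sketch using httpx, which the sample already depends on; the host, port, and message content are only illustrative.

```python
import httpx

# Hypothetical test request; adjust the URL if the sample runs elsewhere.
payload = {
    "messages": [{"role": "user", "content": "Hello, please introduce yourself."}],
    "max_tokens": 256,
}

# Stream the text/event-stream response chunk by chunk as the endpoint yields it.
with httpx.Client(timeout=60) as client:
    with client.stream("POST", "http://127.0.0.1:5001/ask", json=payload) as r:
        r.raise_for_status()
        for text in r.iter_text():
            print(text, end="", flush=True)
```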
Configure the environment variable so that data is reported to DataKit via OpenTelemetry:
export TRACELOOP_BASE_URL=http://localhost:9529/otel
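If you prefer not to rely on the shell environment, the same endpoint can be set in code before Traceloop.init() runs. The sketch below assumes the SDK reads TRACELOOP_BASE_URL at initialization time and that DataKit listens locally on its default port 9529.

```python
import os

# Point the OpenLLMetry exporter at the local DataKit OTEL endpoint
# before the SDK is initialized; an existing value is left untouched.
os.environ.setdefault("TRACELOOP_BASE_URL", "http://localhost:9529/otel")

from traceloop.sdk import Traceloop

Traceloop.init(app_name="kimi_openllmetry_stream_flask")
```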
Metric Details

| Metric Name | Description | Unit |
| --- | --- | --- |
| gen_ai.client.generation.choices | Number of choices generated by the client | count |
| gen_ai.client.operation.duration_bucket | Histogram bucket for duration of client operations | ms |
| gen_ai.client.operation.duration_count | Total number of client operations | times |
| gen_ai.client.operation.duration_max | Maximum duration of client operations | ms |
| gen_ai.client.operation.duration_min | Minimum duration of client operations | ms |
| gen_ai.client.operation.duration_sum | Total duration of client operations | ms |
| llm.openai.chat_completions.streaming_time_to_first_token_bucket | Histogram bucket for time taken to generate the first token during streaming chat completions in OpenAI | ms |
| llm.openai.chat_completions.streaming_time_to_first_token_count | Total number of times the first token was generated during streaming chat completions in OpenAI | times |
| llm.openai.chat_completions.streaming_time_to_first_token_max | Maximum time taken to generate the first token during streaming chat completions in OpenAI | ms |
| llm.openai.chat_completions.streaming_time_to_first_token_min | Minimum time taken to generate the first token during streaming chat completions in OpenAI | ms |
| llm.openai.chat_completions.streaming_time_to_first_token_sum | Total time taken to generate the first token during streaming chat completions in OpenAI | ms |
| llm.openai.chat_completions.streaming_time_to_generate_bucket | Histogram bucket for total time taken to generate content during streaming chat completions in OpenAI | ms |
| llm.openai.chat_completions.streaming_time_to_generate_count | Total number of times content was generated during streaming chat completions in OpenAI | times |
| llm.openai.chat_completions.streaming_time_to_generate_max | Maximum time taken to generate content during streaming chat completions in OpenAI | ms |
| llm.openai.chat_completions.streaming_time_to_generate_min | Minimum time taken to generate content during streaming chat completions in OpenAI | ms |
| llm.openai.chat_completions.streaming_time_to_generate_sum | Total time taken to generate content during streaming chat completions in OpenAI | ms |
References
- OpenLLMetry quickstart
- OpenLLMetry otel-collector
- OpenLLMetry GitHub