
OpenLLMetry

---
title: 'OpenLLMetry'
summary: 'OpenLLMetry is developed and maintained by the Traceloop team under the Apache 2.0 license. It provides specialized monitoring and debugging tools for LLM applications by extending OpenTelemetry, and uses OpenTelemetry''s standardized telemetry data format to report key performance metrics and trace information from LLM applications in a consistent way.'
tags:
  - 'OTEL'
  - 'APM'
__int_icon: 'icon/openllmetry'
dashboard:
  - desc: 'OpenLLMetry'
    path: 'dashboard/zh/openllmetry'
---


OpenLLMetry is developed and maintained by the Traceloop team under the Apache 2.0 license. It provides specialized monitoring and debugging tools for LLM applications by extending OpenTelemetry, and it uses OpenTelemetry's standardized telemetry data format to report key performance metrics and trace information from LLM applications in a consistent way.

Configuration

Before sending trace data to DataKit via OTEL, make sure the Collector is configured. In addition, add the LLM-related attributes to customer_tags in the opentelemetry input's configuration file, as follows:

[[inputs.opentelemetry]]
  ## customer_tags acts as a whitelist: only the tags listed here are sent to the data center.
  ## Every "." in a tag name is replaced with "_", for example:
  ## "project.name" is reported to the Guance center as "project_name"
  customer_tags = ["llm.request.type","traceloop.entity.path","llm.is_streaming","gen_ai.openai.api_base","gen_ai.prompt.1.content","gen_ai.response.model","gen_ai.completion.0.content","gen_ai.request.model","gen_ai.request.temperature","gen_ai.system","traceloop.workflow.name"]
  ...

After completing the adjustments, restart DataKit (for example with datakit service -R) so that the new configuration takes effect.

Install OpenTelemetry SDK

pip install opentelemetry-api opentelemetry-instrumentation
pip install opentelemetry-instrumentation-flask

Install OpenLLMetry SDK

pip install traceloop-sdk

Initialize OpenLLMetry in your application

from traceloop.sdk import Traceloop

# Initialize OpenLLMetry; calling Traceloop.init() with no arguments uses the default app name
# Traceloop.init()

Traceloop.init(app_name="kimi_openllmetry_stream_flask")

OpenLLMetry Sample Code

import os
import httpx
from flask import Flask, request, Response, jsonify, stream_with_context
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow, task
from openai import OpenAI

from opentelemetry.instrumentation.flask import FlaskInstrumentor


app = Flask(__name__)
# Automatically instrument the Flask app using FlaskInstrumentor
FlaskInstrumentor().instrument_app(app)

# Initialize OpenLLMetry
Traceloop.init(app_name="kimi_openllmetry_stream_flask")

# Retrieving API Key from environment variable
api_key = os.getenv("MOONSHOT_API_KEY")
if not api_key:
    raise ValueError("Please set the MOONSHOT_API_KEY environment variable")

client = OpenAI(
    api_key=api_key,
    base_url="https://api.moonshot.cn/v1",
)

def estimate_token_count(input_messages) -> int:
    """
    Calculate the number of Tokens in the input messages.
    """
    try:
        header = {
            "Authorization": f"Bearer {api_key}",
        }
        data = {
            "model": "moonshot-v1-128k",
            "messages": input_messages,
        }
        with httpx.Client() as http_client:  # use a separate name to avoid shadowing the OpenAI client above
            print("Calling the token estimation endpoint")
            r = http_client.post("https://api.moonshot.cn/v1/tokenizers/estimate-token-count", headers=header, json=data)
            r.raise_for_status()
            response_data = r.json()
            print(response_data["data"]["total_tokens"])
            return response_data["data"]["total_tokens"]
    except httpx.RequestError as e:
        print(f"Request failed: {e}")
        raise
    except (KeyError, ValueError) as e:
        print(f"Parsing response failed: {e}")
        raise

def select_model(input_messages, max_tokens=1024) -> str:
    """
    Select an appropriate model based on the provided context messages and the expected max_tokens value.
    """
    if not isinstance(max_tokens, int) or max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")

    prompt_tokens = estimate_token_count(input_messages)
    total_tokens = prompt_tokens + max_tokens

    if total_tokens <= 8 * 1024:
        return "moonshot-v1-8k"
    elif total_tokens <= 32 * 1024:
        return "moonshot-v1-32k"
    elif total_tokens <= 128 * 1024:
        return "moonshot-v1-128k"
    else:
        raise ValueError("Token count exceeds limit 😢")

@app.route('/ask', methods=['POST'])
@workflow(name="ask_workflow")
def ask():
    data = request.json
    messages = data.get('messages')
    max_tokens = data.get('max_tokens', 2048)

    if not messages:
        return jsonify({"error": "The messages field cannot be empty"}), 400

    try:
        model = select_model(messages, max_tokens)

        completion = client.chat.completions.create(
            model=model,
            messages=messages,
            max_tokens=max_tokens,
            temperature=0.3,
            stream=True  # Enable streaming generation
        )

        def generate():
            for chunk in completion:
                # yield chunk.choices[0].delta.content or ''
                delta = chunk.choices[0].delta
                if delta.content:
                    print(delta.content, end="")
                    yield delta.content or ''

        return Response(stream_with_context(generate()), content_type='text/event-stream')
    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == '__main__':
    app.run(debug=True, port=5001)
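
The full example imports task from traceloop.sdk.decorators but only applies the @workflow decorator. For finer-grained tracing, individual steps can also be wrapped with @task so they appear as child spans of the workflow span. The sketch below is illustrative only: the helper names are hypothetical, and it reuses estimate_token_count and select_model from the sample above.

from traceloop.sdk.decorators import workflow, task

# Hypothetical helpers; each @task-decorated call becomes a child span of the workflow span.
@task(name="estimate_tokens")
def estimate_tokens_step(messages):
    return estimate_token_count(messages)

@task(name="pick_model")
def pick_model_step(messages, max_tokens):
    return select_model(messages, max_tokens)

@workflow(name="ask_workflow_with_tasks")
def ask_with_tasks(messages, max_tokens=1024):
    estimate_tokens_step(messages)
    return pick_model_step(messages, max_tokens)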

Configure the TRACELOOP_BASE_URL environment variable to report data to DataKit via OpenTelemetry

export TRACELOOP_BASE_URL=http://localhost:9529/otel
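
With TRACELOOP_BASE_URL pointing at the DataKit OTEL endpoint, start the Flask app and send a request to /ask to generate traces and metrics. The script below is a hypothetical test client (it assumes the sample app above is listening on port 5001); it simply streams the completion to stdout.

import requests

# Hypothetical test client for the sample Flask app above.
payload = {
    "messages": [{"role": "user", "content": "Hello, what can you do?"}],
    "max_tokens": 256,
}

with requests.post("http://localhost:5001/ask", json=payload, stream=True) as resp:
    resp.raise_for_status()
    # Print the streamed completion chunks as they arrive
    for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
        if chunk:
            print(chunk, end="", flush=True)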

Metric Details

| Metric Name | Description | Unit |
| --- | --- | --- |
| gen_ai.client.generation.choices | Number of choices generated by the client | count |
| gen_ai.client.operation.duration_bucket | Histogram bucket for the duration of client operations | ms |
| gen_ai.client.operation.duration_count | Total number of client operations | times |
| gen_ai.client.operation.duration_max | Maximum duration of client operations | ms |
| gen_ai.client.operation.duration_min | Minimum duration of client operations | ms |
| gen_ai.client.operation.duration_sum | Total duration of client operations | ms |
| llm.openai.chat_completions.streaming_time_to_first_token_bucket | Histogram bucket for the time to first token during streaming chat completions in OpenAI | ms |
| llm.openai.chat_completions.streaming_time_to_first_token_count | Total number of times a first token was generated during streaming chat completions in OpenAI | times |
| llm.openai.chat_completions.streaming_time_to_first_token_max | Maximum time to first token during streaming chat completions in OpenAI | ms |
| llm.openai.chat_completions.streaming_time_to_first_token_min | Minimum time to first token during streaming chat completions in OpenAI | ms |
| llm.openai.chat_completions.streaming_time_to_first_token_sum | Total time to first token during streaming chat completions in OpenAI | ms |
| llm.openai.chat_completions.streaming_time_to_generate_bucket | Histogram bucket for the total generation time during streaming chat completions in OpenAI | ms |
| llm.openai.chat_completions.streaming_time_to_generate_count | Total number of times content was generated during streaming chat completions in OpenAI | times |
| llm.openai.chat_completions.streaming_time_to_generate_max | Maximum time to generate content during streaming chat completions in OpenAI | ms |
| llm.openai.chat_completions.streaming_time_to_generate_min | Minimum time to generate content during streaming chat completions in OpenAI | ms |
| llm.openai.chat_completions.streaming_time_to_generate_sum | Total time to generate content during streaming chat completions in OpenAI | ms |

