Langfuse
Introduction to Langfuse
Langfuse is an open-source observability platform designed specifically for LLM (Large Language Model) applications, offering the following core features:
- Full-Link Tracing
    - Records LLM call chains (Prompt → LLM → Output)
    - Supports tracking of multi-step complex workflows
- Metric Monitoring
    - Token usage statistics
    - Request latency monitoring
    - Cost calculation (based on model pricing)
- Data Annotation and Analysis
    - Manual annotation
    - Output quality scoring
    - A/B testing support
Langfuse Access Sites
| Site Name (Alias) | Domain Name |
|---|---|
| Hangzhou (CN1) | https://llm-openway.guance.com |
| Ningxia (CN2) | https://aws-llm-openway.guance.com |
| Beijing (CN3) | https://cn3-llm-openway.guance.com |
| Guangzhou (CN4) | https://cn4-llm-openway.guance.com |
| Hong Kong (CN6) | https://cn6-llm-openway.guance.com |
| Oregon (US1) | https://us1-llm-openway.guance.com |
| Frankfurt (EU1) | https://eu1-llm-openway.guance.com |
| Singapore (AP1) | https://ap1-llm-openway.guance.com |
| Jakarta (ID1) | https://id1-llm-openway.guance.com |
| Middle East (ME1) | https://me1-llm-openway.guance.com |
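The `host` you configure in the SDK (or the `LANGFUSE_HOST` environment variable used in the examples below) must be the endpoint of the site your workspace belongs to. For example, to point at the Oregon (US1) site:

```python
import os

# Select the endpoint from the table above that matches your workspace's site
os.environ["LANGFUSE_HOST"] = "https://us1-llm-openway.guance.com"
```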
Python Integration

Install Dependencies
# Core SDK
pip install langfuse

# Optional: async support (recommended for production environments)
pip install "langfuse[async]"

# Development tools (for testing)
pip install pytest langfuse-test
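To confirm the installation, you can query the installed version with the standard library (a quick check; the printed version depends on your environment):

```python
from importlib.metadata import version

# Print the installed Langfuse SDK version
print(version("langfuse"))
```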
Langfuse Integration Instructions

- Langfuse supports a wide range of LLM integrations. We have currently verified the following data ingestion methods; support for more models is pending further testing.
    - Dify
    - LangChain
    - Ollama
    - Gemini
    - OpenAI
- In the following text, `YOUR_LLM_APP_ID` and `YOUR_LLM_APP_TOKEN` correspond to Langfuse's public key and secret key respectively.
Python SDK Integration Example
Initialize the client, then verify the integration:
from langfuse import Langfuse

# Initialize with constructor arguments
langfuse = Langfuse(
    public_key="YOUR_LLM_APP_ID",
    secret_key="YOUR_LLM_APP_TOKEN",
    host="https://llm-openway.guance.com"
)

# Verify connection; do not use in production, as this is a synchronous call
if langfuse.auth_check():
    print("Langfuse client is authenticated and ready!")
If integration fails, an error similar to the following will occur:
langfuse.api.resources.commons.errors.unauthorized_error.UnauthorizedError: status_code: 401, body: {}
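Alternatively, the credentials and host can be supplied through the standard `LANGFUSE_*` environment variables (as the application examples below do), in which case the constructor needs no arguments:

```python
import os

from langfuse import Langfuse

os.environ["LANGFUSE_PUBLIC_KEY"] = "YOUR_LLM_APP_ID"
os.environ["LANGFUSE_SECRET_KEY"] = "YOUR_LLM_APP_TOKEN"
os.environ["LANGFUSE_HOST"] = "https://llm-openway.guance.com"

# With the environment variables set, no constructor arguments are needed
langfuse = Langfuse()
```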
Python Application Examples
The following are several simple examples.
Ollama Integration
If you have Ollama deployed locally, you can use Langfuse to track Ollama API calls:
import os

from langfuse.openai import OpenAI  # drop-in replacement for the OpenAI client

os.environ["LANGFUSE_PUBLIC_KEY"] = "YOUR_LLM_APP_ID"
os.environ["LANGFUSE_SECRET_KEY"] = "YOUR_LLM_APP_TOKEN"
os.environ["LANGFUSE_HOST"] = "https://llm-openway.guance.com"

# Configure the OpenAI client to use http://localhost:11434/v1 as base URL
client = OpenAI(
    base_url='http://localhost:11434/v1',  # locally deployed Ollama service
    api_key='ollama',  # required by the client, but unused by Ollama
)

stream = False  # set to True to use streaming responses

response = client.chat.completions.create(
    # model="llama3.1:latest",
    model="gemma3:4b",  # specify gemma3:4b
    stream=stream,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain the working principles of nuclear fusion and nuclear fission."},
    ]
)

if stream:
    for chk in response:
        content = chk.choices[0].delta.content
        if content is not None:
            print(content, end="", flush=True)
else:
    print(response)
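The wrapped OpenAI client sends trace data in background batches, so a short-lived script should flush before exiting to avoid losing events (the DeepSeek example below does the same):

```python
from langfuse import get_client

# Ensure all buffered trace events are delivered before the process exits
get_client().flush()
```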
DeepSeek Integration

import os

from langfuse import observe, get_client
from langfuse.openai import OpenAI

os.environ["LANGFUSE_PUBLIC_KEY"] = "YOUR_LLM_APP_ID"
os.environ["LANGFUSE_SECRET_KEY"] = "YOUR_LLM_APP_TOKEN"
os.environ["LANGFUSE_HOST"] = "https://llm-openway.guance.com"

# Your DeepSeek API key (get it from https://platform.deepseek.com/api_keys)
os.environ["DEEPSEEK_API_KEY"] = "YOUR_DEEPSEEK_API_KEY"  # Replace with your DeepSeek API key

client = OpenAI(
    base_url="https://api.deepseek.com",
    api_key=os.getenv("DEEPSEEK_API_KEY"),
)

langfuse = get_client()

@observe()
def my_llm_call(prompt):
    completion = client.chat.completions.create(
        name="story-generator",
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": "You are a creative storyteller."},
            {"role": "user", "content": prompt},
        ],
        metadata={"genre": "adventure"},
    )
    return completion.choices[0].message.content

with langfuse.start_as_current_span(name="my-ds-trace") as span:
    # Run your application here
    query = "Tell me a short story about a token that got lost on its way to the language model. Answer in 100 words or less."
    output = my_llm_call(query)

    # Pass additional attributes to the span
    span.update_trace(
        input=query,
        output=output,
        user_id="user_123",
        session_id="session_abc",
        tags=["agent", "my-trace"],
        metadata={"email": "user@langfuse.com"},
        version="1.0.0",
    )

# Flush events in short-lived applications
langfuse.flush()
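Scores can also be attached to this trace from Python, mirroring the JavaScript `score.create` call shown later. A minimal sketch, assuming the v3 SDK's `create_score()` and the `span` object from the block above:

```python
# Hypothetical follow-up to the example above: score the generated story.
# The field names match the `score` data fields documented below.
langfuse.create_score(
    name="accuracy",
    value=0.9,
    trace_id=span.trace_id,  # ID of the trace created in the `with` block above
    data_type="NUMERIC",
    comment="comment example",
)
langfuse.flush()
```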
For more Langfuse integration examples, refer to the official Langfuse documentation.
JavaScript Integration
The following is an example of using the Langfuse JavaScript SDK (v4) to call local Ollama.
Warning

Only v4 of the Langfuse JavaScript SDK is supported, as it reports data over the OpenTelemetry protocol, whose data format is more compliant with trace data specifications.
- Node.js/npm: Node.js (version 18+)
- A locally deployed Ollama service (http://localhost:11434/v1)
- Install dependencies:
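The package list below is inferred from the imports in demo.js; install whatever versions npm resolves:

```shell
npm install dotenv openai @langfuse/openai @langfuse/otel @langfuse/client @opentelemetry/sdk-trace-node @opentelemetry/api
```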
- Set up the `.env` file:
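A sketch of the `.env` file. The variable names come from the references in demo.js; `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` are the standard credentials read by the SDK, and `OLLAMA_HOST` is a host:port value because the code builds the URL as `http://${OLLAMA_HOST}/v1`:

```text
LANGFUSE_PUBLIC_KEY=YOUR_LLM_APP_ID
LANGFUSE_SECRET_KEY=YOUR_LLM_APP_TOKEN
LANGFUSE_BASEURL=https://llm-openway.guance.com
OLLAMA_HOST=localhost:11434
OLLAMA_MODEL=llama3
```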
- Create `demo.js`:
// --- demo.js ---
// An interactive command-line app using Langfuse v4 SDK with OpenTelemetry.

// -----------------------------------------------------------------------------
// PART 1: SETUP (Executed once at the start)
// -----------------------------------------------------------------------------
import 'dotenv/config';
import { observeOpenAI } from "@langfuse/openai";
import { LangfuseSpanProcessor } from "@langfuse/otel";
import { NodeTracerProvider } from "@opentelemetry/sdk-trace-node";
import { trace, context } from "@opentelemetry/api";
import { LangfuseClient } from "@langfuse/client";
import OpenAI from "openai";
import readline from "node:readline";

// --- Manual OpenTelemetry Provider Setup ---
const langfuseSpanProcessor = new LangfuseSpanProcessor();

const tracerProvider = new NodeTracerProvider({
  // Note: BatchSpanProcessor is used internally by LangfuseSpanProcessor for efficiency
  spanProcessors: [langfuseSpanProcessor],
});
tracerProvider.register();

const lfscore = new LangfuseClient();

console.log("OpenTelemetry provider configured and registered globally.");

const OLLAMA_HOST = process.env.OLLAMA_HOST;

// --- Tracer and OpenAI Client Setup ---
const tracer = trace.getTracer("my-llm-app", "1.0.0");

const openai = new OpenAI({
  baseURL: `http://${OLLAMA_HOST}/v1`,
  apiKey: "ollama",
});

// The tracedOpenAI client will automatically create child spans for any API call
const tracedOpenAI = observeOpenAI(openai);

// set stream mode or not
const streamMode = false;

// --- Readline Interface for CLI ---
const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout,
});

// -----------------------------------------------------------------------------
// PART 2: APPLICATION LOGIC & INTERACTIVE LOOP
// -----------------------------------------------------------------------------

/**
 * A dedicated shutdown function for the OpenTelemetry provider.
 * Ensures all buffered spans are sent to Langfuse.
 */
async function shutdown() {
  console.log("\nShutting down gracefully...");
  console.log("Flushing remaining traces to Langfuse...");
  await tracerProvider.shutdown();
  console.log("Shutdown complete. Goodbye!");
}

/**
 * The core function that processes a single user prompt.
 * It creates a trace and calls the instrumented OpenAI client.
 * @param {string} userPrompt - The input from the user.
 * @param {string} sessionId - The session ID for this run.
 */
async function processPrompt(userPrompt, sessionId) {
  // 1. Manually create the Root Span for this conversation turn.
  const traceSpan = tracer.startSpan("user-chat-turn");

  // 2. Use context.with to ensure the auto-instrumented LLM call
  //    becomes a child of our manual `traceSpan`.
  await context.with(trace.setSpan(context.active(), traceSpan), async () => {
    try {
      const spanContext = traceSpan.spanContext();
      const traceId = spanContext.traceId;

      console.log("\n--- Trace Details ---");
      console.log(`Trace ID: ${traceId}`);
      console.log(`You can view this trace in Langfuse at: ${process.env.LANGFUSE_BASEURL}/trace/${traceId}`);
      console.log("Raw Span Context:", spanContext);
      console.log("-----------------------");
      console.log("...thinking...");

      // 3. Set Langfuse-specific attributes on the root span.
      traceSpan.setAttributes({
        "langfuse.session.id": sessionId,
        "langfuse.input": userPrompt,
        "langfuse.tags": ["interactive-cli", "ollama"],
        "langfuse.hello": "world",
      });

      // 4. Make the instrumented LLM call. This automatically creates a child span.
      const resp = await tracedOpenAI.chat.completions.create({
        messages: [{ "role": "user", "content": userPrompt }],
        model: process.env.OLLAMA_MODEL || "llama3",
        stream: streamMode,
      });

      let fullResponse = "";
      if (streamMode) {
        process.stdout.write("\nOllama: ");
        for await (const chunk of resp) {
          const content = chunk.choices[0]?.delta?.content || "";
          fullResponse += content;
          process.stdout.write(content); // Write to console without newline
        }
        // Add a newline to the console after the stream is complete
        console.log();
      } else {
        fullResponse = resp.choices[0].message.content;
        console.log(`\nOllama: ${fullResponse}`);
      }

      // 5. Add the final result as the output of the root trace.
      traceSpan.setAttribute("langfuse.output", fullResponse);

      await lfscore.score.create({
        traceId: traceId,
        comment: "comment example",
        sessionId: sessionId,
        name: "accuracy",
        value: 0.9,
        tags: ["tag1", "tag2"],
      });
    } catch (e) {
      console.error("An error occurred:", e.message);
      traceSpan.recordException(e);
      traceSpan.setStatus({ code: 2, message: e.message }); // 2 = ERROR in OTEL
    } finally {
      // 6. End the root span for this turn.
      traceSpan.end();
    }
  });
}

/**
 * The main function to start and manage the interactive loop.
 */
function main() {
  // A unique session ID for this entire chat session
  const sessionId = `cli-session-${Date.now()}`;
  console.log(`Starting chat session: ${sessionId}`);
  console.log('Type your prompt and press Enter. Type "exit" to quit.');

  // This recursive function creates the interactive loop
  const askQuestion = () => {
    rl.question("\n> ", async (prompt) => {
      // Exit condition
      if (prompt.toLowerCase() === "exit") {
        rl.close();
        await shutdown();
        return;
      }

      // Process the user's prompt
      await processPrompt(prompt, sessionId);

      // Ask the next question
      askQuestion();
    });
  };

  // Start the conversation
  askQuestion();
}

// --- Graceful exit on Ctrl+C ---
process.on('SIGINT', async () => {
  rl.close();
  await shutdown();
  process.exit(0);
});

// --- Start the application ---
main();
- Run the example:
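Assuming the file above was saved as demo.js:

```shell
node demo.js
```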
After entering the interactive interface, you can start conversations; each conversation round generates the corresponding LLM observability data. Modify `streamMode` in the code to switch the response mode.
Data Fields
Data reported by Langfuse is divided into the following categories:
- `llm_trace`: LLM observability trace data
- `score`: score data attached to LLM observability trace data
llm_trace
| Tags & Fields | Description | Type | Unit |
|---|---|---|---|
| app_id | The LLM application ID. | string | - |
| app_name | The LLM application name. | string | - |
| completion_tokens | Completion tokens of the current generation. | int (gauge) | count |
| duration | Duration of the current span. | int (gauge) | μs |
| error_message | Error message of the current trace (if an exception was thrown). | string | - |
| error_stack | Call stack of the current trace (if an exception was thrown). | string | - |
| error_type | Error type of the current trace (if an exception was thrown). | string | - |
| input | Input prompt of the current generation. | string | - |
| input_cache_read_tokens | Cached tokens of the current generation. | int (gauge) | count |
| input_tokens | Input tokens of the current generation. | int (gauge) | count |
| llm_provider | Provider of the current generation. | string | - |
| message | JSON dump of the span. | string | - |
| model_name | Model name of the current generation. | string | - |
| model_parameters | Model parameters of the current generation. | string | - |
| observation_type | Observation type of the current span. | string | - |
| operation | Same as the span name of the current span. | string | - |
| output | Output of the current generation. | string | - |
| output_tokens | Output tokens of the current generation. | int (gauge) | count |
| prompt_tokens | Prompt tokens of the current generation. | int (gauge) | count |
| reasoning_tokens | Reasoning tokens of the current generation. | int (gauge) | count |
| resource | Span name of the current span. | string | - |
| scope_name | Langfuse SDK name. | string | - |
| scope_version | Langfuse SDK version. | string | - |
| sdk_language | SDK language, such as nodejs/python. | string | - |
| sdk_name | SDK name, such as opentelemetry. | string | - |
| sdk_version | OpenTelemetry SDK version. | string | - |
| service_name | Service name. | string | - |
| session_id | Session ID of the current generation. | string | - |
| span_id | Span ID of the current span. | string | - |
| start | Start time of the current span. | int | timestamp (μs) |
| status | Status of the span. | string | - |
| stream_mode | Whether the generation ran in stream mode. | int (gauge) | - |
| total_tokens | Total tokens of the current generation. | int (gauge) | count |
| trace_id | Trace ID of the current span. | string | - |
| ttft | For stream mode, the wait time until the first token. | int (gauge) | μs |
| user_id | User ID of the current generation. | string | - |
| vendor | The LLM tracing vendor; for Langfuse, it is always `langfuse`. | string | - |
In addition to the fields listed above, custom fields prefixed with `langfuse.` in the span are also extracted into top-level fields (`.` is replaced with `_`). For example, we can add custom fields to a span like this:
traceSpan.setAttributes({
    "langfuse.session.id": <your-session-id>, // => session_id: <your-session-id>
    "langfuse.input": <user-prompt>,          // => input: <user-prompt>
    "langfuse.tags": ["tag1", "tag2"],        // => tags: values:{string_value:"tag1"} values:{string_value:"tag2"}
});
score
| Tags & Fields | Description | Type | Unit |
|---|---|---|---|
| app_id | The LLM application ID. | string | - |
| app_name | The LLM application name. | string | - |
| comment | The score comment. | string | - |
| name | The score name. | string | - |
| score_data_type | Data type of the score; available types are NUMERIC/BOOLEAN/CATEGORICAL. | string | - |
| score_value | Value of a NUMERIC or BOOLEAN score. | float | - |
| score_value_str | Value of a CATEGORICAL score. | string | - |
| score_observation_id | Observation ID of the score. | string | - |
| trace_id | Trace ID of the score. | string | - |
| config_id | Config ID of the score. | string | - |
| session_id | Session ID of the score. | string | - |