
opentelemetry-llm

npx machina-cli add skill a5c-ai/babysitter/opentelemetry-llm --openclaw
Files (1): SKILL.md (1.3 KB)

OpenTelemetry LLM Skill

Capabilities

  • Configure OpenTelemetry SDK for LLM apps
  • Implement LLM-specific instrumentation
  • Set up trace exporters (Jaeger, OTLP)
  • Design semantic conventions for LLM calls
  • Configure span attributes for AI workloads
  • Implement context propagation

Target Processes

  • llm-observability-monitoring
  • agent-deployment-pipeline

Implementation Details

Core Components

  1. TracerProvider: SDK configuration
  2. SpanProcessor: Batch/simple processors
  3. Exporters: Jaeger, OTLP, Console
  4. Instrumentation: Auto and manual

LLM Semantic Conventions

  • gen_ai.system (e.g., OpenAI, Anthropic)
  • gen_ai.request.model
  • gen_ai.request.max_tokens
  • gen_ai.response.finish_reason
  • gen_ai.usage.prompt_tokens

Configuration Options

  • Exporter selection
  • Sampling strategies
  • Resource attributes
  • Span limits
  • Context propagation

Best Practices

  • Consistent attribute naming
  • Appropriate sampling
  • Record errors and exceptions on spans
  • Propagate context across services

Dependencies

  • opentelemetry-sdk
  • opentelemetry-exporter-*
  • openinference (optional)

Source

git clone https://github.com/a5c-ai/babysitter.git
Skill file: plugins/babysitter/skills/babysit/process/specializations/ai-agents-conversational/skills/opentelemetry-llm/SKILL.md

Overview

This skill enables full observability for LLM-based applications by configuring the OpenTelemetry SDK, applying LLM-specific instrumentation, and exporting traces to Jaeger, OTLP, or the Console. It defines semantic conventions for LLM calls and ensures context propagation across services to improve debugging and performance insights.

How This Skill Works

Set up a TracerProvider and SpanProcessor, attach exporters (Jaeger, OTLP, Console), and enable both auto and manual instrumentation for LLM interactions. Implement LLM semantic conventions using attributes like gen_ai.system, gen_ai.request.model, and gen_ai.usage.prompt_tokens, and propagate context across service boundaries for end-to-end tracing.

When to Use It

  • Instrument an AI chatbot or agent pipeline to collect end-to-end traces across multiple services
  • Enforce standardized LLM attributes with gen_ai.* conventions across teams
  • Deploy in production with Jaeger or OTLP exporters for centralized observability
  • Debug latency, retries, and errors in AI workloads with detailed traces
  • Coordinate tracing across orchestrators, workers, and post-processing steps

Quick Start

  1. Step 1: Install opentelemetry-sdk and exporter packages (Jaeger/OTLP/Console)
  2. Step 2: Initialize TracerProvider, add a SpanProcessor, and register the chosen exporter
  3. Step 3: Enable auto/manual LLM instrumentation and apply gen_ai.* attributes; ensure context propagation
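The steps above can be sketched as shell commands (the package names are real OpenTelemetry distributions; the service name and collector endpoint are placeholder assumptions):

```shell
# Step 1: SDK plus an exporter (OTLP here; swap for Jaeger/Console as needed)
pip install opentelemetry-sdk opentelemetry-exporter-otlp

# Steps 2-3 can also be driven by standard OTel environment variables
# instead of code-level configuration:
export OTEL_SERVICE_NAME=llm-chat                          # placeholder service name
export OTEL_TRACES_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317   # placeholder collector
```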

Best Practices

  • Consistent attribute naming for all LLM traces (gen_ai.*)
  • Appropriate sampling to balance visibility and overhead
  • Use error-focused traces to surface failures in LLM calls
  • Propagate context across services to maintain trace continuity
  • Design and enforce LLM semantic conventions (gen_ai.*) across the stack

Example Use Cases

  • Instrument a chatbot service with OpenTelemetry and export traces to Jaeger, using gen_ai.request.model and gen_ai.usage.prompt_tokens
  • Trace an LLM workflow that flows from orchestrator to worker to post-processor with OTLP exporter
  • Configure a Kubernetes deployment to send traces to a centralized collector via OTLP
  • Apply batch span processing for high-throughput LLM traffic while maintaining trace quality
  • Enforce gen_ai semantic attributes across all LLM calls in a multi-service platform

