opentelemetry-llm
npx machina-cli add skill a5c-ai/babysitter/opentelemetry-llm --openclaw
OpenTelemetry LLM Skill
Capabilities
- Configure OpenTelemetry SDK for LLM apps
- Implement LLM-specific instrumentation
- Set up trace exporters (Jaeger, OTLP)
- Design semantic conventions for LLM calls
- Configure span attributes for AI workloads
- Implement context propagation
Target Processes
- llm-observability-monitoring
- agent-deployment-pipeline
Implementation Details
Core Components
- TracerProvider: SDK configuration
- SpanProcessor: Batch/simple processors
- Exporters: Jaeger, OTLP, Console
- Instrumentation: Auto and manual
LLM Semantic Conventions
- gen_ai.system (OpenAI, Anthropic)
- gen_ai.request.model
- gen_ai.request.max_tokens
- gen_ai.response.finish_reason
- gen_ai.usage.prompt_tokens
Configuration Options
- Exporter selection
- Sampling strategies
- Resource attributes
- Span limits
- Context propagation
Best Practices
- Consistent attribute naming
- Appropriate sampling
- Error handling traces
- Propagate context across services
Dependencies
- opentelemetry-sdk
- opentelemetry-exporter-*
- openinference (optional)
Source
https://github.com/a5c-ai/babysitter/blob/main/plugins/babysitter/skills/babysit/process/specializations/ai-agents-conversational/skills/opentelemetry-llm/SKILL.md
Overview
This skill enables full observability for LLM-based applications by configuring the OpenTelemetry SDK, applying LLM-specific instrumentation, and exporting traces to Jaeger, OTLP, or the Console. It defines semantic conventions for LLM calls and ensures context propagation across services to improve debugging and performance insights.
How This Skill Works
Set up a TracerProvider and SpanProcessor, attach exporters (Jaeger, OTLP, Console), and enable both auto and manual instrumentation for LLM interactions. Implement LLM semantic conventions using attributes like gen_ai.system, gen_ai.request.model, and gen_ai.usage.prompt_tokens, and propagate context across service boundaries for end-to-end tracing.
When to Use It
- Instrument an AI chatbot or agent pipeline to collect end-to-end traces across multiple services
- Enforce standardized LLM attributes with gen_ai.* conventions across teams
- Deploy in production with Jaeger or OTLP exporters for centralized observability
- Debug latency, retries, and errors in AI workloads with detailed traces
- Coordinate tracing across orchestrators, workers, and post-processing steps
Quick Start
- Step 1: Install opentelemetry-sdk and exporter packages (Jaeger/OTLP/Console)
- Step 2: Initialize TracerProvider, add a SpanProcessor, and register the chosen exporter
- Step 3: Enable auto/manual LLM instrumentation and apply gen_ai.* attributes; ensure context propagation
Best Practices
- Consistent attribute naming for all LLM traces (gen_ai.*)
- Appropriate sampling to balance visibility and overhead
- Use error-focused traces to surface failures in LLM calls
- Propagate context across services to maintain trace continuity
- Design and enforce LLM semantic conventions (gen_ai.*) across the stack
Example Use Cases
- Instrument a chatbot service with OpenTelemetry and export traces to Jaeger, using gen_ai.request.model and gen_ai.usage.prompt_tokens attributes
- Trace an LLM workflow that flows from orchestrator to worker to post-processor with OTLP exporter
- Configure a Kubernetes deployment to send traces to a centralized collector via OTLP
- Apply batch span processing for high-throughput LLM traffic while maintaining trace quality
- Enforce gen_ai semantic attributes across all LLM calls in a multi-service platform