phoenix-arize-setup
npx machina-cli add skill a5c-ai/babysitter/phoenix-arize-setup --openclaw
Phoenix Arize Setup Skill
Capabilities
- Set up Phoenix local server
- Configure tracing instrumentation
- Design evaluation experiments
- Implement embedding visualizations
- Set up retrieval analysis
- Create custom evaluations with LLM-as-judge
Target Processes
- llm-observability-monitoring
- agent-evaluation-framework
Implementation Details
Core Features
- Tracing: OpenTelemetry-based LLM traces
- Evals: LLM-as-judge evaluations
- Embeddings: Visualization and drift detection
- Retrieval: RAG quality analysis
- Datasets: Experiment management
Instrumentation
- OpenAI auto-instrumentation
- LangChain instrumentation
- LlamaIndex instrumentation
- Custom span creation
Configuration Options
- Phoenix server setup
- Trace sampling
- Evaluation metrics
- Embedding models
- Export settings
Best Practices
- Comprehensive instrumentation
- Regular evaluation runs
- Monitor embedding drift
- Analyze retrieval quality
Dependencies
- arize-phoenix
- openinference-instrumentation-openai
Source
https://github.com/a5c-ai/babysitter/blob/main/plugins/babysitter/skills/babysit/process/specializations/ai-agents-conversational/skills/phoenix-arize-setup/SKILL.md
Overview
Configures a local Phoenix server with OpenTelemetry tracing, embedding visualizations, and evaluation workflows for LLM debugging and evaluation. It enables end-to-end observability from traces to embeddings and retrieval analysis, including LLM-as-judge evaluations.
How This Skill Works
Install and start a Phoenix server, then enable OpenTelemetry instrumentation across OpenAI, LangChain, LlamaIndex, and custom spans. Run experiments that collect traces, embeddings, and retrieval data, and use the Phoenix UI to analyze performance, drift, and RAG quality.
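The wiring described above might look like the following sketch. It assumes `arize-phoenix` and `openinference-instrumentation-openai` are installed and a Phoenix server is already running locally; the project name is illustrative, and the endpoint is Phoenix's default local collector address. Treat it as a setup fragment, not a definitive implementation.

```python
# Sketch: export OpenTelemetry traces from an OpenAI app to a local Phoenix server.
# Assumes a Phoenix server is running on localhost:6006.
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

# Register a tracer provider that sends spans to the local Phoenix collector.
tracer_provider = register(
    project_name="my-llm-app",                   # illustrative project name
    endpoint="http://localhost:6006/v1/traces",  # default local Phoenix endpoint
)

# Auto-instrument all OpenAI client calls.
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# Custom spans for non-LLM steps (retrieval, post-processing, etc.).
tracer = tracer_provider.get_tracer(__name__)
with tracer.start_as_current_span("retrieve-context") as span:
    span.set_attribute("retriever.top_k", 5)
    # ... run retrieval here ...
```

The LangChain and LlamaIndex instrumentors follow the same pattern: instantiate the instrumentor and call `instrument(tracer_provider=...)` once at startup.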
When to Use It
- Debug LLM runs with a Phoenix-based observability stack
- Design and run LLM-as-judge evaluation experiments
- Analyze embedding drift and visualize embeddings
- Perform retrieval quality and RAG analysis
- Manage experiments with datasets and export settings
Quick Start
- Step 1: Install and start the Phoenix server with arize-phoenix
- Step 2: Enable OpenTelemetry instrumentation for OpenAI, LangChain, and LlamaIndex
- Step 3: Configure evaluation metrics, embedding models, and export settings; run an experiment
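In a terminal, the three steps above might look like this (package names are taken from the Dependencies list, plus the companion LangChain/LlamaIndex instrumentors; the port is Phoenix's default):

```shell
# Step 1: install and start a local Phoenix server (UI on http://localhost:6006 by default)
pip install arize-phoenix
phoenix serve

# Step 2: install auto-instrumentation for the clients you use
pip install openinference-instrumentation-openai \
            openinference-instrumentation-langchain \
            openinference-instrumentation-llama-index

# Step 3: point your app's exporter at the local collector
export PHOENIX_COLLECTOR_ENDPOINT="http://localhost:6006"
```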
Best Practices
- Instrument comprehensively across tracing, embeddings, and retrieval
- Run evaluations regularly to benchmark models
- Monitor embedding drift to detect semantic changes
- Analyze retrieval quality to assess RAG performance
- Configure export settings and evaluation metrics explicitly for reproducibility
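Drift monitoring compares embedding distributions between a baseline window and the current one. As a rough, stdlib-only illustration of the idea (not Phoenix's actual drift metric), one can compare the centroids of two embedding batches with cosine distance:

```python
import math

def centroid(vectors):
    """Element-wise mean of a batch of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def drift_score(baseline, current):
    """Cosine distance between batch centroids as a crude drift signal."""
    return cosine_distance(centroid(baseline), centroid(current))

baseline = [[1.0, 0.0], [0.9, 0.1]]
shifted  = [[0.0, 1.0], [0.1, 0.9]]
assert drift_score(baseline, baseline) < 1e-9  # no drift against itself
assert drift_score(baseline, shifted) > 0.5    # large drift after a direction flip
```

A real pipeline would use the same embedding model for both windows and alert when the score crosses a threshold tuned on historical data.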
Example Use Cases
- Set up a local Phoenix server with OpenTelemetry to trace an LLM call
- Run LLM-as-judge evaluations to compare model outputs
- Visualize embeddings and detect drift across model versions
- Perform retrieval quality analysis to assess RAG performance
- Manage experiments with datasets and export results for reporting
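LLM-as-judge evaluations work by prompting a judge model with the question and candidate answer, then parsing a structured verdict from its reply. Phoenix ships its own evaluators; the stdlib-only sketch below shows just the prompt/parse halves of that loop, with the judge call mocked and the template wording entirely illustrative:

```python
import re

JUDGE_TEMPLATE = """You are grading an answer for correctness.
Question: {question}
Answer: {answer}
Reply with a line of the form "score: <0-10>" and a one-sentence rationale."""

def build_judge_prompt(question, answer):
    """Fill the judge template; in practice this prompt is sent to an LLM."""
    return JUDGE_TEMPLATE.format(question=question, answer=answer)

def parse_score(judge_reply):
    """Extract the 0-10 score from the judge's reply; None if no verdict found."""
    match = re.search(r"score:\s*(\d+)", judge_reply, re.IGNORECASE)
    if match is None:
        return None
    return min(int(match.group(1)), 10)  # clamp malformed out-of-range verdicts

# Mocked judge reply; in a real run this comes from the judge model.
reply = "Score: 8\nThe answer is mostly correct but omits one edge case."
assert parse_score(reply) == 8
assert parse_score("no verdict here") is None
```

Logging each prompt, reply, and parsed score as spans keeps the evaluation run itself visible in the same traces as the application under test.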