
RAG Audit

npx machina-cli add skill davicqueiroz/claude-rag-skills/rag-audit --openclaw

RAG Audit Skill

Analyze RAG (Retrieval-Augmented Generation) implementations for anti-patterns, performance issues, and best practices violations.

When to Use

Use /rag-audit when:

  • Reviewing existing RAG code for quality issues
  • Before deploying a RAG system to production
  • Debugging retrieval or generation problems
  • Optimizing RAG pipeline performance

What This Skill Does

  1. Code Analysis: Scans your codebase for RAG-related code (embeddings, vector stores, retrieval, generation)
  2. Anti-Pattern Detection: Identifies common mistakes and suboptimal patterns
  3. Best Practices Check: Validates against industry standards
  4. Recommendations: Provides actionable fixes with code examples

Audit Categories

1. Chunking Strategy

  • Chunk size appropriateness (too large loses precision, too small loses context)
  • Overlap configuration (recommended: 10-20% of chunk size)
  • Document-type specific chunking (code vs prose vs tables; see the sketch below)
  • Metadata preservation during chunking
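
A minimal sketch of document-type-aware chunking, assuming LangChain's langchain_text_splitters package is available; the size/overlap values and the doc.name/doc.text fields are illustrative, not prescribed by this skill:

from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

# Prose: recursive character splitting with ~15% overlap
prose_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)

# Code: split on language-aware boundaries (functions, classes)
code_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=1000, chunk_overlap=100
)

def split_document(doc):
    splitter = code_splitter if doc.name.endswith(".py") else prose_splitter
    return splitter.split_text(doc.text)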

2. Embedding Configuration

  • Model selection for use case (multilingual, code, general)
  • Dimension efficiency (smaller dims for speed, larger for accuracy)
  • Batch processing for large document sets
  • Embedding caching to avoid recomputation (see the sketch below)
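
A sketch of batch processing plus a content-hash cache; embed_batch is a hypothetical stand-in for whatever embedding client you use:

import hashlib

_cache = {}  # content hash -> embedding vector

def embed_with_cache(texts, batch_size=128):
    # Embed only the texts we have not seen before, in batches
    keys = [hashlib.sha256(t.encode()).hexdigest() for t in texts]
    missing = [(k, t) for k, t in zip(keys, texts) if k not in _cache]
    for i in range(0, len(missing), batch_size):
        batch = missing[i:i + batch_size]
        vectors = embed_batch([t for _, t in batch])  # one API call per batch (hypothetical client)
        for (k, _), v in zip(batch, vectors):
            _cache[k] = v
    return [_cache[k] for k in keys]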

3. Vector Store Setup

  • Index type selection (HNSW vs IVF vs flat)
  • Distance metric matching (cosine for normalized, L2 for raw; sketch below)
  • Collection/namespace organization
  • Metadata filtering capabilities
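
As one concrete example, Qdrant fixes the distance metric at collection creation and builds an HNSW index by default; a sketch (collection name and dimension are illustrative):

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")

# Cosine suits normalized embeddings; use Distance.EUCLID (L2) for raw vectors
client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)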

4. Retrieval Pipeline

  • Top-k selection (too few misses context, too many adds noise)
  • Score thresholding implementation
  • Hybrid search (dense + sparse/BM25)
  • Reranking stage presence
  • Query expansion/transformation

5. Generation Configuration

  • Context window utilization
  • System prompt quality
  • Source citation implementation
  • Hallucination guardrails
  • Temperature settings for factual tasks (see the sketch below)
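
A sketch of a generation call tuned for factual RAG, here using the OpenAI chat API; the model name and prompt wording are illustrative, not prescribed by this skill:

from openai import OpenAI

client = OpenAI()

def answer(query, context):
    system = (
        "Answer using ONLY the provided context. Cite sources as "
        "[source:page]. If the context is insufficient, say you don't know."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,  # keep generation deterministic for factual tasks
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return resp.choices[0].message.content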

6. Production Readiness

  • Error handling and fallbacks
  • Logging and observability
  • Rate limiting and caching (see the sketch below)
  • Cost optimization (model selection, caching)
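
A sketch covering two of these concerns, retries with exponential backoff and response caching; call_llm is a placeholder for your client:

import time
from functools import lru_cache

@lru_cache(maxsize=1024)  # serve repeated identical prompts from cache
def generate_cached(prompt):
    return generate_with_retry(prompt)

def generate_with_retry(prompt, retries=3):
    for attempt in range(retries):
        try:
            return call_llm(prompt)  # placeholder for your LLM client
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # backoff: 1s, 2s, 4s, ...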

How to Run an Audit

When the user invokes /rag-audit, follow this process:

Step 1: Discover RAG Code

Search for RAG-related patterns in the codebase:

Patterns to search:
- "embedding" OR "embeddings" OR "embed("
- "vector" OR "vectorstore" OR "vector_store"
- "qdrant" OR "pinecone" OR "chroma" OR "weaviate" OR "milvus"
- "chunk" OR "chunking" OR "split" OR "splitter"
- "retriev" OR "search" OR "query"
- "langchain" OR "llamaindex" OR "haystack"
- "openai.embed" OR "cohere.embed" OR "voyageai"

Step 2: Analyze Each Component

For each RAG component found, check against the audit categories above.

Step 3: Generate Report

Produce a structured audit report:

# RAG Audit Report

## Summary
- **Files Analyzed**: N
- **Issues Found**: M (X critical, Y warnings, Z suggestions)
- **Overall Score**: S/100

## Critical Issues
[Issues that will cause failures or severe degradation]

## Warnings
[Issues that impact quality or performance]

## Suggestions
[Optimizations and best practices]

## Detailed Findings

### [Component Name]
**Location**: `path/to/file.py:line`
**Issue**: [Description]
**Impact**: [What goes wrong]
**Fix**: [How to fix with code example]

Common Anti-Patterns to Flag

1. No Chunk Overlap

# BAD: No overlap causes context loss at boundaries
chunks = text_splitter.split(text, chunk_size=1000, overlap=0)

# GOOD: 10-20% overlap preserves context
chunks = text_splitter.split(text, chunk_size=1000, overlap=150)

2. Hardcoded Top-K

# BAD: Fixed top-k regardless of query complexity
results = vectorstore.search(query, k=5)

# GOOD: Dynamic or configurable with score threshold
results = vectorstore.search(query, k=10, score_threshold=0.7)

3. No Reranking

# BAD: Using raw vector similarity scores only
docs = vectorstore.similarity_search(query, k=5)
context = "\n".join([d.content for d in docs])

# GOOD: Rerank for relevance before using
docs = vectorstore.similarity_search(query, k=20)
reranked = reranker.rerank(query, docs, top_k=5)
context = "\n".join([d.content for d in reranked])

4. Ignoring Metadata

# BAD: Storing only text
vectorstore.add(texts=chunks)

# GOOD: Preserve source metadata for citations
vectorstore.add(
    texts=chunks,
    metadatas=[{"source": doc.name, "page": i, "chunk_id": j} for ...]
)

5. No Error Handling

# BAD: Unhandled failures
response = llm.generate(prompt)

# GOOD: Graceful degradation
try:
    response = llm.generate(prompt)
except RateLimitError:
    response = fallback_response(query)
except Exception as e:
    logger.error(f"Generation failed: {e}")
    response = "I couldn't process your request. Please try again."

6. Context Window Overflow

# BAD: Stuffing all retrieved docs without checking
context = "\n".join([doc.content for doc in all_docs])
prompt = f"Context: {context}\nQuestion: {query}"

# GOOD: Respect token limits
max_context_tokens = 3000
context = truncate_to_tokens(docs, max_context_tokens)
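
truncate_to_tokens is left undefined above; a sketch using tiktoken (the encoding name assumes an OpenAI-style model):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def truncate_to_tokens(docs, max_tokens):
    # Add whole documents until the token budget is spent
    parts, used = [], 0
    for doc in docs:
        n = len(enc.encode(doc.content))
        if used + n > max_tokens:
            break
        parts.append(doc.content)
        used += n
    return "\n".join(parts)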

7. Missing Hybrid Search

# BAD: Dense-only search misses keyword matches
results = vectorstore.similarity_search(query)

# GOOD: Combine dense + sparse for better recall
dense_results = vectorstore.similarity_search(query, k=10)
sparse_results = bm25.search(query, k=10)
results = reciprocal_rank_fusion(dense_results, sparse_results)
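
reciprocal_rank_fusion is only a few lines; a sketch that assumes each result object exposes a stable doc.id (k=60 is the constant from the original RRF paper):

def reciprocal_rank_fusion(*result_lists, k=60):
    # Each input list is ordered best-first; a document's fused score is
    # the sum of 1 / (k + rank) over every list it appears in
    scores, by_id = {}, {}
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            by_id[doc.id] = doc
            scores[doc.id] = scores.get(doc.id, 0.0) + 1.0 / (k + rank)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [by_id[doc_id] for doc_id in ranked]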

8. No Query Preprocessing

# BAD: Raw user query to embedding
embedding = embed(user_query)

# GOOD: Clean and optionally expand query
cleaned_query = preprocess(user_query)
# Optional: query expansion for better recall
expanded_queries = expand_query(cleaned_query)
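
A sketch of both steps; the expansion reuses the llm.generate placeholder from the earlier examples and is optional:

import re

def preprocess(query):
    # Normalize whitespace and strip stray punctuation from the edges
    return re.sub(r"\s+", " ", query).strip(" ?!.")

def expand_query(query, llm=None):
    # Optionally ask an LLM for paraphrases to widen recall
    if llm is None:
        return [query]
    prompt = f"Rewrite this search query in 3 different ways, one per line:\n{query}"
    variants = llm.generate(prompt).splitlines()
    return [query] + [v.strip("- ").strip() for v in variants if v.strip()]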

Output Format

Always end the audit with:

  1. A summary score (0-100)
  2. Top 3 priority fixes
  3. Links to relevant Ailog guides for deeper reading

Source

git clone https://github.com/davicqueiroz/claude-rag-skills

View on GitHub: https://github.com/davicqueiroz/claude-rag-skills/blob/main/rag-audit/SKILL.md

Overview

RAG Audit analyzes Retrieval-Augmented Generation implementations for anti-patterns, performance issues, and best-practice violations. It evaluates chunking, embedding, vector store, retrieval, and generation configurations, and it produces actionable recommendations to improve reliability and efficiency.

How This Skill Works

The skill scans your codebase for RAG components (embeddings, vector stores, retrieval, generation), flags anti-patterns, checks against best practices, and returns concrete fixes with code examples.

Best Practices

  • Chunking: use 10-20% overlap to preserve context
  • Embedding configuration: tune dimensions and enable batch processing
  • Vector store setup: align index type and distance metric with data
  • Retrieval pipeline: choose an appropriate top-k and score threshold, and consider hybrid search and reranking
  • Production readiness: robust error handling, logging/observability, rate limiting, caching, and source citations

Example Use Cases

  • Auditing a LangChain + Pinecone deployment to detect missing chunk overlap and suboptimal dimensions
  • Identifying hard-coded top-k and replacing with a dynamic configuration and score threshold
  • Enabling embedding caching for a large document set to reduce recomputation
  • Switching distance metric to cosine for normalized embeddings in a vector store
  • Adding a reranking stage and source citations to generation results to prevent hallucinations
