
RAG Audit

npx machina-cli add skill davicqueiroz/claude-rag-skills/rag-audit --openclaw

RAG Audit Skill

Analyze RAG (Retrieval-Augmented Generation) implementations for anti-patterns, performance issues, and best practices violations.

When to Use

Use /rag-audit when:

  • Reviewing existing RAG code for quality issues
  • Before deploying a RAG system to production
  • Debugging retrieval or generation problems
  • Optimizing RAG pipeline performance

What This Skill Does

  1. Code Analysis: Scans your codebase for RAG-related code (embeddings, vector stores, retrieval, generation)
  2. Anti-Pattern Detection: Identifies common mistakes and suboptimal patterns
  3. Best Practices Check: Validates against industry standards
  4. Recommendations: Provides actionable fixes with code examples

Audit Categories

1. Chunking Strategy

  • Chunk size appropriateness (too large loses precision, too small loses context)
  • Overlap configuration (recommended: 10-20% of chunk size)
  • Document-type specific chunking (code vs prose vs tables; see the sketch below)
  • Metadata preservation during chunking
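
A minimal sketch of document-type-aware chunking, assuming LangChain's langchain_text_splitters package is available; the size/overlap values and the doc.name/doc.text fields are illustrative, not prescribed by this skill:

from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

# Prose: recursive character splitting with ~15% overlap
prose_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)

# Code: split on language-aware boundaries (functions, classes)
code_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=1000, chunk_overlap=100
)

def split_document(doc):
    splitter = code_splitter if doc.name.endswith(".py") else prose_splitter
    return splitter.split_text(doc.text)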

2. Embedding Configuration

  • Model selection for use case (multilingual, code, general)
  • Dimension efficiency (smaller dims for speed, larger for accuracy)
  • Batch processing for large document sets
  • Embedding caching to avoid recomputation (see the sketch below)
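
A sketch of batch processing plus a content-hash cache; embed_batch is a hypothetical stand-in for whatever embedding client you use:

import hashlib

_cache = {}  # content hash -> embedding vector

def embed_with_cache(texts, batch_size=128):
    # Embed only the texts we have not seen before, in batches
    keys = [hashlib.sha256(t.encode()).hexdigest() for t in texts]
    missing = [(k, t) for k, t in zip(keys, texts) if k not in _cache]
    for i in range(0, len(missing), batch_size):
        batch = missing[i:i + batch_size]
        vectors = embed_batch([t for _, t in batch])  # one API call per batch (hypothetical client)
        for (k, _), v in zip(batch, vectors):
            _cache[k] = v
    return [_cache[k] for k in keys]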

3. Vector Store Setup

  • Index type selection (HNSW vs IVF vs flat)
  • Distance metric matching (cosine for normalized, L2 for raw; sketch below)
  • Collection/namespace organization
  • Metadata filtering capabilities
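
As one concrete example, Qdrant fixes the distance metric at collection creation and builds an HNSW index by default; a sketch (collection name and dimension are illustrative):

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")

# Cosine suits normalized embeddings; use Distance.EUCLID (L2) for raw vectors
client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)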

4. Retrieval Pipeline

  • Top-k selection (too few misses context, too many adds noise)
  • Score thresholding implementation
  • Hybrid search (dense + sparse/BM25)
  • Reranking stage presence
  • Query expansion/transformation

5. Generation Configuration

  • Context window utilization
  • System prompt quality
  • Source citation implementation
  • Hallucination guardrails
  • Temperature settings for factual tasks (see the sketch below)
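
A sketch of a generation call tuned for factual RAG, here using the OpenAI chat API; the model name and prompt wording are illustrative, not prescribed by this skill:

from openai import OpenAI

client = OpenAI()

def answer(query, context):
    system = (
        "Answer using ONLY the provided context. Cite sources as "
        "[source:page]. If the context is insufficient, say you don't know."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,  # keep generation deterministic for factual tasks
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return resp.choices[0].message.content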

6. Production Readiness

  • Error handling and fallbacks
  • Logging and observability
  • Rate limiting and caching (see the sketch below)
  • Cost optimization (model selection, caching)
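
A sketch covering two of these concerns, retries with exponential backoff and response caching; call_llm is a placeholder for your client:

import time
from functools import lru_cache

@lru_cache(maxsize=1024)  # serve repeated identical prompts from cache
def generate_cached(prompt):
    return generate_with_retry(prompt)

def generate_with_retry(prompt, retries=3):
    for attempt in range(retries):
        try:
            return call_llm(prompt)  # placeholder for your LLM client
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # backoff: 1s, 2s, 4s, ...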

How to Run an Audit

When the user invokes /rag-audit, follow this process:

Step 1: Discover RAG Code

Search for RAG-related patterns in the codebase:

Patterns to search:
- "embedding" OR "embeddings" OR "embed("
- "vector" OR "vectorstore" OR "vector_store"
- "qdrant" OR "pinecone" OR "chroma" OR "weaviate" OR "milvus"
- "chunk" OR "chunking" OR "split" OR "splitter"
- "retriev" OR "search" OR "query"
- "langchain" OR "llamaindex" OR "haystack"
- "openai.embed" OR "cohere.embed" OR "voyageai"

Step 2: Analyze Each Component

For each RAG component found, check against the audit categories above.

Step 3: Generate Report

Produce a structured audit report:

# RAG Audit Report

## Summary
- **Files Analyzed**: N
- **Issues Found**: M (X critical, Y warnings, Z suggestions)
- **Overall Score**: S/100

## Critical Issues
[Issues that will cause failures or severe degradation]

## Warnings
[Issues that impact quality or performance]

## Suggestions
[Optimizations and best practices]

## Detailed Findings

### [Component Name]
**Location**: `path/to/file.py:line`
**Issue**: [Description]
**Impact**: [What goes wrong]
**Fix**: [How to fix with code example]

Common Anti-Patterns to Flag

1. No Chunk Overlap

# BAD: No overlap causes context loss at boundaries
chunks = text_splitter.split(text, chunk_size=1000, overlap=0)

# GOOD: 10-20% overlap preserves context
chunks = text_splitter.split(text, chunk_size=1000, overlap=150)

2. Hardcoded Top-K

# BAD: Fixed top-k regardless of query complexity
results = vectorstore.search(query, k=5)

# GOOD: Dynamic or configurable with score threshold
results = vectorstore.search(query, k=10, score_threshold=0.7)

3. No Reranking

# BAD: Using raw vector similarity scores only
docs = vectorstore.similarity_search(query, k=5)
context = "\n".join([d.content for d in docs])

# GOOD: Rerank for relevance before using
docs = vectorstore.similarity_search(query, k=20)
reranked = reranker.rerank(query, docs, top_k=5)
context = "\n".join([d.content for d in reranked])

4. Ignoring Metadata

# BAD: Storing only text
vectorstore.add(texts=chunks)

# GOOD: Preserve source metadata for citations
vectorstore.add(
    texts=chunks,
    metadatas=[{"source": doc.name, "page": i, "chunk_id": j} for ...]
)

5. No Error Handling

# BAD: Unhandled failures
response = llm.generate(prompt)

# GOOD: Graceful degradation
try:
    response = llm.generate(prompt)
except RateLimitError:
    response = fallback_response(query)
except Exception as e:
    logger.error(f"Generation failed: {e}")
    response = "I couldn't process your request. Please try again."

6. Context Window Overflow

# BAD: Stuffing all retrieved docs without checking
context = "\n".join([doc.content for doc in all_docs])
prompt = f"Context: {context}\nQuestion: {query}"

# GOOD: Respect token limits
max_context_tokens = 3000
context = truncate_to_tokens(docs, max_context_tokens)
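
truncate_to_tokens is left undefined above; a sketch using tiktoken (the encoding name assumes an OpenAI-style model):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def truncate_to_tokens(docs, max_tokens):
    # Add whole documents until the token budget is spent
    parts, used = [], 0
    for doc in docs:
        n = len(enc.encode(doc.content))
        if used + n > max_tokens:
            break
        parts.append(doc.content)
        used += n
    return "\n".join(parts)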

7. Missing Hybrid Search

# BAD: Dense-only search misses keyword matches
results = vectorstore.similarity_search(query)

# GOOD: Combine dense + sparse for better recall
dense_results = vectorstore.similarity_search(query, k=10)
sparse_results = bm25.search(query, k=10)
results = reciprocal_rank_fusion(dense_results, sparse_results)
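
reciprocal_rank_fusion is only a few lines; a sketch that assumes each result object exposes a stable doc.id (k=60 is the constant from the original RRF paper):

def reciprocal_rank_fusion(*result_lists, k=60):
    # Each input list is ordered best-first; a document's fused score is
    # the sum of 1 / (k + rank) over every list it appears in
    scores, by_id = {}, {}
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            by_id[doc.id] = doc
            scores[doc.id] = scores.get(doc.id, 0.0) + 1.0 / (k + rank)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [by_id[doc_id] for doc_id in ranked]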

8. No Query Preprocessing

# BAD: Raw user query to embedding
embedding = embed(user_query)

# GOOD: Clean and optionally expand query
cleaned_query = preprocess(user_query)
# Optional: query expansion for better recall
expanded_queries = expand_query(cleaned_query)
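
A sketch of both steps; the expansion reuses the llm.generate placeholder from the earlier examples and is optional:

import re

def preprocess(query):
    # Normalize whitespace and strip stray punctuation from the edges
    return re.sub(r"\s+", " ", query).strip(" ?!.")

def expand_query(query, llm=None):
    # Optionally ask an LLM for paraphrases to widen recall
    if llm is None:
        return [query]
    prompt = f"Rewrite this search query in 3 different ways, one per line:\n{query}"
    variants = llm.generate(prompt).splitlines()
    return [query] + [v.strip("- ").strip() for v in variants if v.strip()]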

Output Format

Always end the audit with:

  1. A summary score (0-100)
  2. Top 3 priority fixes
  3. Links to relevant Ailog guides for deeper reading

Source

git clone https://github.com/davicqueiroz/claude-rag-skills

View on GitHub: https://github.com/davicqueiroz/claude-rag-skills/blob/main/rag-audit/SKILL.md

Overview

RAG Audit analyzes Retrieval-Augmented Generation implementations for anti-patterns, performance issues, and best-practice violations. It evaluates chunking, embedding, vector store, retrieval, and generation configurations, and it produces actionable recommendations to improve reliability and efficiency.

How This Skill Works

The skill scans your codebase for RAG components (embeddings, vector stores, retrieval, generation), flags anti-patterns, checks against best practices, and returns concrete fixes with code examples.

Best Practices

  • Chunking: use 10-20% overlap to preserve context
  • Embedding configuration: tune dimensions and enable batch processing
  • Vector store setup: align index type and distance metric with data
  • Retrieval pipeline: choose an appropriate top-k and score threshold, and consider hybrid search and reranking
  • Production readiness: robust error handling, logging/observability, rate limiting, caching, and source citations

Example Use Cases

  • Auditing a LangChain + Pinecone deployment to detect missing chunk overlap and suboptimal dimensions
  • Identifying hard-coded top-k and replacing with a dynamic configuration and score threshold
  • Enabling embedding caching for a large document set to reduce recomputation
  • Switching distance metric to cosine for normalized embeddings in a vector store
  • Adding a reranking stage and source citations to generation results to prevent hallucinations
