rag-research
npx machina-cli add skill docutray/docutray-claude-code-plugins/rag-research --openclaw

RAG Research Skill
Use this skill when users ask about RAG (Retrieval-Augmented Generation), semantic search, document indexing, embeddings, vector databases, or chunking strategies. This skill provides best practices for working with the rag-research plugin and optimizing document retrieval.
When to Use
Trigger this skill when users:
- Ask about RAG, embeddings, or semantic search concepts
- Want to optimize their document indexing strategy
- Need help with chunking or retrieval quality
- Ask "how does rag-research work?" or "how to improve search results?"
- Troubleshoot poor search results or missing information
Core Concepts
Document Indexing Pipeline
- Load Document: Extract text from PDF, Markdown, or Text files
- Chunk Text: Split into overlapping segments (default: 512 chars, 50 overlap)
- Generate Embeddings: Convert chunks to vectors using FastEmbed (BAAI/bge-small-en-v1.5)
- Store in Qdrant: Persist vectors with metadata for retrieval
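The chunking step above can be sketched as a sliding character window. This is a minimal illustration of the 512/50 defaults, not the plugin's actual implementation:

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size character chunks."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap  # advance 462 chars per chunk by default
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

sample = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(sample)
# consecutive chunks share their 50-char boundary region
```

Each chunk is then embedded and stored; the overlap ensures that text near a chunk boundary appears in two chunks.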
Embedding Models
The plugin uses FastEmbed with ONNX Runtime for efficient CPU inference:
| Model | Dimensions | Speed | Quality | Use Case |
|---|---|---|---|---|
| BAAI/bge-small-en-v1.5 | 384 | Fast | Good | Default, general use |
| BAAI/bge-base-en-v1.5 | 768 | Medium | Better | Higher accuracy needs |
| BAAI/bge-large-en-v1.5 | 1024 | Slow | Best | Maximum quality |
Chunking Strategies
Chunk size affects retrieval quality:
- Smaller chunks (256-512): More precise, may lose context
- Larger chunks (1024+): More context, may dilute relevance
- Overlap (10-20%): Prevents information loss at boundaries
Recommendation: Start with the defaults (512/50), then adjust based on results.
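Why overlap matters: a term that straddles a chunk boundary is split in half and becomes unsearchable unless neighboring chunks share it. A toy demonstration with sizes shrunk for readability:

```python
text = "The API key lives in .env. Qdrant stores the vectors on disk."

def split(text: str, size: int, step: int) -> list[str]:
    """Slice text into size-char chunks, advancing by step each time."""
    return [text[i:i + size] for i in range(0, len(text), step)]

no_overlap = split(text, size=30, step=30)         # 0 chars shared
with_overlap = split(text, size=30, step=30 - 10)  # 10 chars shared

# "Qdrant" is cut in half at the 30-char boundary without overlap,
# but survives intact inside an overlapping chunk.
found_without = any("Qdrant" in c for c in no_overlap)
found_with = any("Qdrant" in c for c in with_overlap)
```

The same effect at real scale is why the default keeps a 50-char overlap (~10% of a 512-char chunk).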
Search Quality Tips
- Use specific queries: "Mistral OCR API configuration" > "OCR"
- Check coverage: Run `/rag-research:list` to verify documents are indexed
- Increase limit: Use `--limit 20` for comprehensive research
- Review scores: Scores > 0.7 are highly relevant; < 0.5 may be tangential
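The score thresholds above can be applied when post-processing results. A sketch assuming hits arrive as (text, score) pairs; the plugin's actual result shape may differ:

```python
# Hypothetical search hits: (chunk text, similarity score).
results = [
    ("Set MISTRAL_API_KEY in .env to enable OCR", 0.82),
    ("Qdrant persists vectors under ~/.rag-research", 0.64),
    ("Unrelated changelog entry", 0.41),
]

HIGH = 0.7  # > 0.7: highly relevant
LOW = 0.5   # < 0.5: likely tangential

strong = [t for t, s in results if s > HIGH]
tangential = [t for t, s in results if s < LOW]
```

Hits between the two thresholds are worth a manual skim before citing them.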
Troubleshooting
Poor Search Results
- Check if document is indexed: `/rag-research:list --filter "keyword"`
- Re-index with different chunking: Adjust `CHUNK_SIZE` in `.env`
- Use more specific queries: Add domain-specific terms
- Verify embeddings: Check model compatibility
PDF Extraction Issues
- Enable Mistral OCR: Set `MISTRAL_API_KEY` in `.env` for scanned PDFs
- Fall back to pypdf: Use the `--no-ocr` flag for text-based PDFs
- Check file permissions: Ensure the PDF is readable
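Putting the `.env` settings mentioned in this troubleshooting section together (key names come from the tips above; values are the default and a placeholder):

```
CHUNK_SIZE=512
MISTRAL_API_KEY=your-key-here
```

See references/configuration.md for the full list of supported keys.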
Database Issues
- Reset database: `rm -rf ~/.rag-research` and re-index
- Check disk space: Qdrant needs space for vectors
- Verify installation: `uv run rag-research stats`
Configuration Reference
See references/configuration.md for detailed settings documentation.
Examples
See references/examples.md for common usage patterns.
Source
View on GitHub: https://github.com/docutray/docutray-claude-code-plugins/blob/main/plugins/rag-research/skills/rag-research/SKILL.md

Overview
RAG Research helps you design and optimize Retrieval-Augmented Generation workflows. It covers document indexing, embedding, vector databases, and chunking strategies, with practical tips for using the rag-research plugin to boost retrieval quality.
How This Skill Works
Load documents from PDFs, Markdown, or Text, then chunk into overlapping segments (default: 512 chars, 50 overlap). Generate embeddings with FastEmbed (ONNX Runtime) and store vectors in Qdrant with metadata for retrieval. Choose embedding models based on accuracy needs (BAAI/bge-small-en-v1.5, -base-en-v1.5, -large-en-v1.5) and tune chunk size/overlap to balance precision and context.
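Retrieval itself is nearest-neighbor search over embedding vectors. A toy cosine-similarity ranking, with made-up 3-dimensional vectors standing in for real 384-dim bge-small embeddings (Qdrant does this at scale with indexed search):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "index": chunk text -> embedding vector.
index = {
    "chunk about OCR":      [0.9, 0.1, 0.0],
    "chunk about chunking": [0.1, 0.9, 0.1],
    "chunk about Qdrant":   [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.1]  # pretend this embeds "OCR configuration"

ranked = sorted(index, key=lambda k: cosine(query, index[k]), reverse=True)
# most similar chunk first
```

The search score reported by the plugin is this kind of similarity, which is why the > 0.7 / < 0.5 thresholds are useful heuristics.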
When to Use It
- When you’re working with RAG, embeddings, or semantic search concepts
- When you want to optimize how documents are indexed for retrieval
- When you need to improve chunking or retrieval quality
- When asking 'how does rag-research work?' or 'how to improve search results?'
- When troubleshooting poor search results or missing information
Quick Start
- Step 1: Load documents (PDF/Markdown/Text) and chunk with default 512/50
- Step 2: Generate embeddings using FastEmbed (ONNX Runtime) and store in Qdrant
- Step 3: Run basic queries and verify results with /rag-research:list and score thresholds
Best Practices
- Start with default chunking (512 chars, 50 overlap) and adjust based on results
- Use the ONNX FastEmbed models for CPU inference and balance speed vs. accuracy
- Verify indexing with /rag-research:list and ensure documents are indexed
- Experiment with chunk sizes (256-512 vs 1024+) and measure retrieval quality
- Keep embeddings and database healthy: monitor Qdrant storage and model compatibility
Example Use Cases
- Index a corpus of PDFs and answer precise questions with chunked segments
- Tune chunking to improve retrieval for product manuals or technical docs
- Switch to a larger embedding model for higher accuracy when needed
- Audit result quality by checking scores and adjusting limit with --limit 20
- Re-index after corpus updates and verify with /rag-research:list