rag-reranking
npx machina-cli add skill a5c-ai/babysitter/rag-reranking --openclaw
RAG Reranking Skill
Capabilities
- Implement cross-encoder reranking models
- Configure Maximal Marginal Relevance (MMR) filtering
- Set up Cohere Rerank integration
- Design multi-stage retrieval pipelines
- Implement diversity-aware reranking
- Configure score normalization and thresholds
Target Processes
- advanced-rag-patterns
- rag-pipeline-implementation
Implementation Details
Reranking Methods
- Cross-Encoder Reranking: Sentence-transformer cross-encoders
- Cohere Rerank: Cohere rerank-v3 API
- MMR Reranking: Diversity-aware result filtering
- LLM Reranking: Using an LLM for relevance scoring
- Reciprocal Rank Fusion: Combining multiple retrievers
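Of the methods above, Reciprocal Rank Fusion is simple enough to sketch directly. The helper below is illustrative (its name and signature are not part of the skill's API); `k=60` is the smoothing constant commonly used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists from multiple retrievers.

    rankings: list of ranked lists of document IDs, best first.
    Each document earns 1 / (k + rank) per list it appears in.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Two retrievers disagree; RRF rewards documents ranked well by both.
bm25 = ["d1", "d2", "d3"]
dense = ["d3", "d1", "d4"]
print(reciprocal_rank_fusion([bm25, dense]))  # → ['d1', 'd3', 'd2', 'd4']
```

Documents appearing high in several lists (d1, d3) float to the top without any score normalization, which is why RRF is a common default for combining lexical and dense retrievers.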
Configuration Options
- Reranking model selection
- Top-k after reranking
- MMR lambda (relevance vs diversity)
- Score threshold filtering
- Batch size for reranking
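The MMR lambda option is easiest to understand in code. This is a minimal sketch (function and variable names are illustrative): `lam=1.0` ranks purely by relevance to the query, `lam=0.0` purely by dissimilarity to already-selected documents.

```python
import numpy as np

def mmr_rerank(query_vec, doc_vecs, doc_ids, lam=0.7, top_k=3):
    """Maximal Marginal Relevance over dense vectors."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    candidates, selected = list(range(len(doc_ids))), []
    while candidates and len(selected) < top_k:
        def mmr_score(i):
            relevance = cos(query_vec, doc_vecs[i])
            # Redundancy: worst-case similarity to anything already picked.
            redundancy = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return [doc_ids[i] for i in selected]

query = np.array([1.0, 0.0])
docs = [np.array([1.0, 0.0]),    # "a": on-topic
        np.array([0.95, 0.05]),  # "b": near-duplicate of "a"
        np.array([0.5, 0.5])]    # "c": related, different angle
print(mmr_rerank(query, docs, ["a", "b", "c"], lam=0.7, top_k=2))  # → ['a', 'b']
print(mmr_rerank(query, docs, ["a", "b", "c"], lam=0.3, top_k=2))  # → ['a', 'c']
```

With a relevance-leaning lambda the near-duplicate "b" survives; with a diversity-leaning lambda it is displaced by the more distinct "c". Tuning this trade-off is exactly what the "MMR lambda" configuration option controls.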
Best Practices
- Use cross-encoders for quality
- Balance relevance and diversity
- Set appropriate thresholds
- Monitor reranking latency
Dependencies
- sentence-transformers
- cohere (optional)
Source
https://github.com/a5c-ai/babysitter/blob/main/plugins/babysitter/skills/babysit/process/specializations/ai-agents-conversational/skills/rag-reranking/SKILL.md
Overview
RAG Reranking enhances retrieval quality by applying cross-encoder reranking, MMR diversity filtering, and optional Cohere Rerank integration within multi-stage pipelines. It supports several reranking strategies (cross-encoder, Cohere Rerank, MMR, LLM reranking, Reciprocal Rank Fusion) and provides score normalization and threshold controls to balance precision and diversity.
How This Skill Works
During retrieval, initial candidates are reranked using a cross-encoder or Cohere Rerank, producing higher relevance scores. MMR filtering and diversity-aware scoring adjust results to avoid redundancy, then optional LLM reranking or fusion methods can be applied. Configurable options such as top-k, MMR lambda, and batch size drive performance and latency in a multi-stage retrieval pipeline.
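The multi-stage flow can be sketched as a two-stage pipeline. The stub scorer below stands in for a real cross-encoder or Cohere Rerank call; function names, stage split, and parameters are illustrative, not the skill's actual API.

```python
def keyword_overlap(query: str, doc: str) -> float:
    """Placeholder scorer: fraction of query terms appearing in the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def rerank_pipeline(query, corpus, first_k=10, top_k=3, threshold=0.0):
    # Stage 1: cheap, recall-oriented candidate retrieval.
    candidates = [doc for doc in corpus if keyword_overlap(query, doc) > 0][:first_k]
    # Stage 2: precision-oriented rescoring, then threshold and top-k pruning.
    scored = sorted(candidates, key=lambda doc: keyword_overlap(query, doc),
                    reverse=True)
    return [doc for doc in scored if keyword_overlap(query, doc) >= threshold][:top_k]

corpus = [
    "cross encoder reranking improves precision",
    "mmr filtering improves diversity",
    "unrelated document about cats",
]
print(rerank_pipeline("reranking diversity", corpus, top_k=2, threshold=0.4))
```

In a real deployment, stage 1 would be a vector store or BM25 lookup and stage 2 a cross-encoder; the shape of the pipeline, first-stage top-k feeding a rescoring step with threshold and top-k pruning, is the same.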
When to Use It
- When you need higher-quality relevance in search results
- When results are too homogeneous and lack diversity
- When building multi-stage RAG pipelines that need structured reranking
- When integrating Cohere Rerank or LLM-based scoring
- When you must balance latency with precision via thresholds and batching
Quick Start
- Step 1: Enable a reranking method (Cross-Encoder, Cohere Rerank, or MMR) and set an initial top-k
- Step 2: Configure MMR lambda, score thresholds, and batch size for reranking
- Step 3: Run in test mode, monitor latency and precision, and iterate
Best Practices
- Use cross-encoders for quality while monitoring latency
- Tune MMR lambda to balance relevance and diversity
- Set top-k and score thresholds to prune noise
- Normalize scores consistently before fusion or filtering
- Validate with diverse query types and real user data
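Consistent normalization matters because different retrievers score on different scales (BM25 is unbounded, cosine similarity lives in [-1, 1]), so a shared threshold or fusion step would otherwise favor one source. A minimal min-max sketch (names illustrative):

```python
def min_max_normalize(scores):
    """Rescale one retriever's raw scores to [0, 1] before fusion/thresholding."""
    lo, hi = min(scores), max(scores)
    if hi == lo:  # degenerate case: every score identical
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

print(min_max_normalize([12.1, 7.3, 2.5]))    # BM25-style scores → [1.0, 0.5, 0.0]
print(min_max_normalize([0.91, 0.88, 0.70]))  # cosine scores, now on the same scale
```

Apply the same normalization per retriever, per query, before any cross-retriever comparison; mixing normalized and raw scores reintroduces the scale mismatch.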
Example Use Cases
- E-commerce search: boost product relevance in top results while maintaining option variety
- Customer support knowledge base: diversify recommended articles for a query
- Research search: combine sources with Reciprocal Rank Fusion for broader coverage
- News discovery: balance recency with topic diversity in top hits
- Internal documentation: use Cohere Rerank v3 for domain-specific terms