rag-reranking
npx machina-cli add skill a5c-ai/babysitter/rag-reranking --openclaw
RAG Reranking Skill
Capabilities
- Implement cross-encoder reranking models
- Configure Maximal Marginal Relevance (MMR) filtering
- Set up Cohere Rerank integration
- Design multi-stage retrieval pipelines
- Implement diversity-aware reranking
- Configure score normalization and thresholds
Target Processes
- advanced-rag-patterns
- rag-pipeline-implementation
Implementation Details
Reranking Methods
- Cross-Encoder Reranking: Sentence-transformer cross-encoders
- Cohere Rerank: Cohere rerank-v3 API
- MMR Reranking: Diversity-aware result filtering
- LLM Reranking: Using an LLM for relevance scoring
- Reciprocal Rank Fusion: Combining multiple retrievers
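Of the methods above, Reciprocal Rank Fusion is simple enough to sketch directly. The helper below is illustrative (its name and signature are not part of the skill's API); `k=60` is the smoothing constant commonly used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists from multiple retrievers.

    rankings: list of ranked lists of document IDs, best first.
    Each document earns 1 / (k + rank) per list it appears in.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Two retrievers disagree; RRF rewards documents ranked well by both.
bm25 = ["d1", "d2", "d3"]
dense = ["d3", "d1", "d4"]
print(reciprocal_rank_fusion([bm25, dense]))  # → ['d1', 'd3', 'd2', 'd4']
```

Documents appearing high in several lists (d1, d3) float to the top without any score normalization, which is why RRF is a common default for combining lexical and dense retrievers.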
Configuration Options
- Reranking model selection
- Top-k after reranking
- MMR lambda (relevance vs diversity)
- Score threshold filtering
- Batch size for reranking
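The MMR lambda option is easiest to understand in code. This is a minimal sketch (function and variable names are illustrative): `lam=1.0` ranks purely by relevance to the query, `lam=0.0` purely by dissimilarity to already-selected documents.

```python
import numpy as np

def mmr_rerank(query_vec, doc_vecs, doc_ids, lam=0.7, top_k=3):
    """Maximal Marginal Relevance over dense vectors."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    candidates, selected = list(range(len(doc_ids))), []
    while candidates and len(selected) < top_k:
        def mmr_score(i):
            relevance = cos(query_vec, doc_vecs[i])
            # Redundancy: worst-case similarity to anything already picked.
            redundancy = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return [doc_ids[i] for i in selected]

query = np.array([1.0, 0.0])
docs = [np.array([1.0, 0.0]),    # "a": on-topic
        np.array([0.95, 0.05]),  # "b": near-duplicate of "a"
        np.array([0.5, 0.5])]    # "c": related, different angle
print(mmr_rerank(query, docs, ["a", "b", "c"], lam=0.7, top_k=2))  # → ['a', 'b']
print(mmr_rerank(query, docs, ["a", "b", "c"], lam=0.3, top_k=2))  # → ['a', 'c']
```

With a relevance-leaning lambda the near-duplicate "b" survives; with a diversity-leaning lambda it is displaced by the more distinct "c". Tuning this trade-off is exactly what the "MMR lambda" configuration option controls.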
Best Practices
- Use cross-encoders for quality
- Balance relevance and diversity
- Set appropriate thresholds
- Monitor reranking latency
Dependencies
- sentence-transformers
- cohere (optional)
Source
https://github.com/a5c-ai/babysitter/blob/main/plugins/babysitter/skills/babysit/process/specializations/ai-agents-conversational/skills/rag-reranking/SKILL.md
Overview
RAG Reranking enhances retrieval quality by applying cross-encoder reranking, MMR diversity filtering, and optional Cohere Rerank integration within multi-stage pipelines. It supports several reranking strategies (cross-encoder, Cohere Rerank, MMR, LLM reranking, Reciprocal Rank Fusion) and provides score normalization and threshold controls to balance precision and diversity.
How This Skill Works
During retrieval, initial candidates are reranked using a cross-encoder or Cohere Rerank, producing higher relevance scores. MMR filtering and diversity-aware scoring adjust results to avoid redundancy, then optional LLM reranking or fusion methods can be applied. Configurable options such as top-k, MMR lambda, and batch size drive performance and latency in a multi-stage retrieval pipeline.
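The multi-stage flow can be sketched as a two-stage pipeline. The stub scorer below stands in for a real cross-encoder or Cohere Rerank call; function names, stage split, and parameters are illustrative, not the skill's actual API.

```python
def keyword_overlap(query: str, doc: str) -> float:
    """Placeholder scorer: fraction of query terms appearing in the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def rerank_pipeline(query, corpus, first_k=10, top_k=3, threshold=0.0):
    # Stage 1: cheap, recall-oriented candidate retrieval.
    candidates = [doc for doc in corpus if keyword_overlap(query, doc) > 0][:first_k]
    # Stage 2: precision-oriented rescoring, then threshold and top-k pruning.
    scored = sorted(candidates, key=lambda doc: keyword_overlap(query, doc),
                    reverse=True)
    return [doc for doc in scored if keyword_overlap(query, doc) >= threshold][:top_k]

corpus = [
    "cross encoder reranking improves precision",
    "mmr filtering improves diversity",
    "unrelated document about cats",
]
print(rerank_pipeline("reranking diversity", corpus, top_k=2, threshold=0.4))
```

In a real deployment, stage 1 would be a vector store or BM25 lookup and stage 2 a cross-encoder; the shape of the pipeline, first-stage top-k feeding a rescoring step with threshold and top-k pruning, is the same.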
When to Use It
- When you need higher-quality relevance in search results
- When results are too homogeneous and lack diversity
- When building multi-stage RAG pipelines that need structured reranking
- When integrating Cohere Rerank or LLM-based scoring
- When you must balance latency with precision via thresholds and batching
Quick Start
- Step 1: Enable a reranking method (Cross-Encoder, Cohere Rerank, or MMR) and set an initial top-k
- Step 2: Configure MMR lambda, score thresholds, and batch size for reranking
- Step 3: Run in test mode, monitor latency and precision, and iterate
Best Practices
- Use cross-encoders for quality while monitoring latency
- Tune MMR lambda to balance relevance and diversity
- Set top-k and score thresholds to prune noise
- Normalize scores consistently before fusion or filtering
- Validate with diverse query types and real user data
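Consistent normalization matters because different retrievers score on different scales (BM25 is unbounded, cosine similarity lives in [-1, 1]), so a shared threshold or fusion step would otherwise favor one source. A minimal min-max sketch (names illustrative):

```python
def min_max_normalize(scores):
    """Rescale one retriever's raw scores to [0, 1] before fusion/thresholding."""
    lo, hi = min(scores), max(scores)
    if hi == lo:  # degenerate case: every score identical
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

print(min_max_normalize([12.1, 7.3, 2.5]))    # BM25-style scores → [1.0, 0.5, 0.0]
print(min_max_normalize([0.91, 0.88, 0.70]))  # cosine scores, now on the same scale
```

Apply the same normalization per retriever, per query, before any cross-retriever comparison; mixing normalized and raw scores reintroduces the scale mismatch.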
Example Use Cases
- E-commerce search: boost product relevance in top results while maintaining option variety
- Customer support knowledge base: diversify recommended articles for a query
- Research search: combine sources with Reciprocal Rank Fusion for broader coverage
- News discovery: balance recency with topic diversity in top hits
- Internal documentation: use Cohere Rerank v3 for domain-specific terms