langchain-retriever
npx machina-cli add skill a5c-ai/babysitter/langchain-retriever --openclaw

LangChain Retriever Skill
Capabilities
- Implement various LangChain retriever types
- Configure vector store retrievers
- Set up multi-query retrievers for improved recall
- Implement contextual compression retrievers
- Design ensemble retrievers combining multiple strategies
- Configure self-query retrievers for structured filtering
Target Processes
- rag-pipeline-implementation
- advanced-rag-patterns
Implementation Details
Retriever Types
- VectorStoreRetriever: Basic similarity search
- MultiQueryRetriever: Generates query variations
- ContextualCompressionRetriever: Filters and compresses results
- EnsembleRetriever: Combines multiple retrievers
- SelfQueryRetriever: Structured metadata filtering
- ParentDocumentRetriever: Returns parent chunks
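The simplest of these, the VectorStoreRetriever, embeds the query and returns the k nearest stored documents. A minimal sketch of that idea, using a toy bag-of-words "embedding" as a stand-in for a real embedding model (the `VOCAB` list, `embed`, and `ToyVectorStore` are illustrative names, not LangChain API):

```python
import math

# Toy vector-store retrieval: embed the query, score every stored vector
# by cosine similarity, return the k best documents. The bag-of-words
# embedding is a stand-in for a real embedding model.
VOCAB = ["password", "reset", "billing", "invoice", "account", "guide"]

def embed(text: str) -> list[float]:
    words = text.lower().split()
    vec = [float(words.count(w)) for w in VOCAB]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

class ToyVectorStore:
    def __init__(self, docs):
        self.docs = docs
        self.vectors = [embed(d) for d in docs]

    def similarity_search(self, query, k=4):
        q = embed(query)
        ranked = sorted(
            zip(self.docs, self.vectors),
            key=lambda dv: cosine(q, dv[1]),
            reverse=True,
        )
        return [doc for doc, _ in ranked[:k]]

store = ToyVectorStore([
    "reset your password in account settings",
    "billing and invoices overview",
    "password reset troubleshooting guide",
])
print(store.similarity_search("how do I reset my password", k=2))
```

In LangChain itself this is typically obtained via a vector store's `as_retriever()` method rather than implemented by hand.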
Configuration Options
- Search type (similarity, mmr, similarity_score_threshold)
- Number of documents to retrieve (k)
- Score thresholds
- Metadata filtering
- Compression settings
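How these options interact is worth making concrete: metadata filters restrict the candidate set first, a score threshold prunes weak matches, and k caps the final count. A hedged sketch (not LangChain's API; `retrieve` and its tuple layout are invented for illustration):

```python
# Illustrative interaction of retriever options: filter by metadata,
# drop docs below the score threshold, then keep the top k by score.
def retrieve(scored_docs, k=4, score_threshold=0.0, metadata_filter=None):
    """scored_docs: list of (text, score, metadata) tuples, any order."""
    candidates = scored_docs
    if metadata_filter:
        candidates = [
            d for d in candidates
            if all(d[2].get(key) == val for key, val in metadata_filter.items())
        ]
    candidates = [d for d in candidates if d[1] >= score_threshold]
    candidates.sort(key=lambda d: d[1], reverse=True)
    return [text for text, _, _ in candidates[:k]]

docs = [
    ("setup guide", 0.91, {"team": "docs"}),
    ("api reference", 0.55, {"team": "docs"}),
    ("old changelog", 0.20, {"team": "eng"}),
]
print(retrieve(docs, k=2, score_threshold=0.5, metadata_filter={"team": "docs"}))
# -> ['setup guide', 'api reference']
```

In LangChain these knobs are usually passed through `search_kwargs` (e.g. `k`, `score_threshold`, `filter`) when creating a retriever.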
Dependencies
- langchain
- langchain-community
- Vector store client
Source
https://github.com/a5c-ai/babysitter/blob/main/plugins/babysitter/skills/babysit/process/specializations/ai-agents-conversational/skills/langchain-retriever/SKILL.md

Overview
Implements multiple LangChain retriever types for RAG applications, enabling flexible recall, filtering, and strategy experimentation. It includes vector store retrievers, multi-query variations, contextual compression, ensemble strategies, and self-query filtering to tailor results. Designed to plug into RAG pipelines and advanced patterns.
How This Skill Works
This skill exposes modular retriever types that can be configured via standard options such as k, search type, thresholds, metadata filters, and compression settings. It supports composing retrievers (EnsembleRetriever) and using SelfQueryRetriever for structured filtering, returning relevant docs or parent documents as needed. It relies on LangChain and vector store clients to fetch and rank documents.
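The composition step can be sketched with weighted Reciprocal Rank Fusion, the merging scheme LangChain's EnsembleRetriever is based on. The function name and the `c=60` smoothing constant here are the conventional RRF formulation, used illustratively:

```python
# Weighted Reciprocal Rank Fusion: each retriever contributes
# weight / (c + rank) per document; documents ranked well by several
# retrievers accumulate the highest fused score.
def rrf_merge(ranked_lists, weights=None, c=60):
    weights = weights or [1.0] * len(ranked_lists)
    scores: dict[str, float] = {}
    for ranking, w in zip(ranked_lists, weights):
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + w / (c + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_b", "doc_a", "doc_d"]   # e.g. a keyword/BM25 retriever
vector_hits = ["doc_a", "doc_c", "doc_b"]    # e.g. a vector retriever
print(rrf_merge([keyword_hits, vector_hits], weights=[0.4, 0.6]))
# -> ['doc_a', 'doc_b', 'doc_c', 'doc_d']
```

Note how `doc_a`, ranked highly by both retrievers, wins even though neither list put it first with full weight.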
When to Use It
- Building a RAG pipeline that needs baseline vector similarity search.
- Improving recall with multi-query retrievers that generate query variations.
- Filtering results through metadata using SelfQueryRetriever.
- Combining strategies with EnsembleRetriever for better coverage.
- Retrieving parent documents to preserve higher-level context.
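The multi-query case above boils down to running several phrasings of the same question and taking an order-preserving union of the hits. In LangChain an LLM generates the variations; in this sketch they are hard-coded and `FAKE_INDEX` stands in for a real search backend:

```python
# Multi-query retrieval sketch: search with each query variation and
# merge results, deduplicating while preserving first-seen order.
def multi_query_retrieve(queries, search_fn):
    seen, merged = set(), []
    for q in queries:
        for doc in search_fn(q):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

FAKE_INDEX = {
    "reset password": ["pw-reset-guide", "account-settings"],
    "recover account access": ["account-recovery", "pw-reset-guide"],
}

def search_fn(query):
    return FAKE_INDEX.get(query, [])

variations = ["reset password", "recover account access"]
print(multi_query_retrieve(variations, search_fn))
# -> ['pw-reset-guide', 'account-settings', 'account-recovery']
```

The recall gain comes from `account-recovery`, which only the rephrased query surfaces.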
Quick Start
- Step 1: Choose a retriever type (e.g., VectorStoreRetriever) and set up your vector store.
- Step 2: Configure k, search type, thresholds, and metadata filters for your data.
- Step 3: (Optional) Combine retrievers with EnsembleRetriever or SelfQueryRetriever and integrate into the RAG pipeline; test and iterate.
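As a worked example of the retriever choice in step 1, parent-document retrieval indexes small chunks for precise matching but returns each chunk's enclosing document for context. A toy sketch, where substring matching and the `parents`/`chunks` structures stand in for embedding search and a real document store:

```python
# Parent-document retrieval sketch: match the query against small chunks,
# but return each matching chunk's parent document, deduplicated.
parents = {
    "manual-ch1": "Chapter 1 covers installation and first-time setup.",
    "manual-ch2": "Chapter 2 covers password resets and account recovery.",
}
chunks = [
    ("installation and first-time setup", "manual-ch1"),
    ("password resets", "manual-ch2"),
    ("account recovery", "manual-ch2"),
]

def retrieve_parents(query, k=2):
    hits, seen = [], set()
    for chunk_text, parent_id in chunks:
        if chunk_text in query and parent_id not in seen:
            seen.add(parent_id)
            hits.append(parents[parent_id])
    return hits[:k]

print(retrieve_parents("I need help with password resets"))
# -> ['Chapter 2 covers password resets and account recovery.']
```

Even though two different chunks point at `manual-ch2`, the parent is returned once, which is the behavior that preserves higher-level context without duplication.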
Best Practices
- Start with a simple VectorStoreRetriever baseline and benchmark results.
- Tune k and choose a suitable search type (similarity vs mmr) for the task.
- Apply metadata filtering early to restrict candidate docs.
- Use ContextualCompressionRetriever to prune results before final ranking.
- Experiment with EnsembleRetriever and SelfQueryRetriever to balance recall and precision.
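The compression step recommended above can be pictured as keeping only the query-relevant parts of each retrieved document and dropping documents with nothing relevant. LangChain delegates that judgment to an LLM or embedding filter; in this sketch a simple word-overlap heuristic stands in for it:

```python
# Contextual compression sketch: keep only sentences sharing enough
# words with the query; drop documents left with no relevant sentence.
def compress(query, docs, min_overlap=2):
    q_words = set(query.lower().split())
    kept = []
    for doc in docs:
        relevant = [
            s.strip() for s in doc.split(".")
            if len(q_words & set(s.lower().split())) >= min_overlap
        ]
        if relevant:
            kept.append(". ".join(relevant))
    return kept

docs = [
    "Reset your password from settings. Our office dog is named Rex.",
    "Quarterly revenue grew steadily.",
]
print(compress("how to reset my password", docs))
# -> ['Reset your password from settings']
```

Pruning like this before final ranking keeps off-topic sentences out of the LLM's context window, which is the point of the best practice above.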
Example Use Cases
- Customer support bot indexing product manuals with a vector store.
- Academic research assistant using MultiQueryRetriever to explore ideas.
- Enterprise knowledge base with metadata filters for secure access.
- Compliance-focused search using contextual compression to remove noisy results.
- Chat app retrieving parent documents to retain conversation context.