haystack-pipeline
Install
npx machina-cli add skill a5c-ai/babysitter/haystack-pipeline --openclaw
Haystack Pipeline Skill
Capabilities
- Configure Haystack pipeline components
- Set up document stores and retrievers
- Implement reader/generator models
- Design custom pipeline graphs
- Configure preprocessing pipelines
- Implement evaluation pipelines
Target Processes
- rag-pipeline-implementation
- intent-classification-system
Implementation Details
Core Components
- DocumentStores: Elasticsearch, Weaviate, FAISS, etc.
- Retrievers: BM25, Dense, Hybrid
- Readers/Generators: Extractive and generative QA
- Preprocessors: Document cleaning and splitting
Pipeline Types
- Retrieval pipelines
- RAG pipelines
- Evaluation pipelines
- Indexing pipelines
Configuration Options
- Component selection
- Pipeline graph design
- Document store backend
- Model selection
- Preprocessing settings
Best Practices
- Modular pipeline design
- Proper preprocessing
- Evaluation integration
- Component versioning
Dependencies
- haystack-ai
- farm-haystack (legacy)
Source
https://github.com/a5c-ai/babysitter/blob/main/plugins/babysitter/skills/babysit/process/specializations/ai-agents-conversational/skills/haystack-pipeline/SKILL.md
Overview
This skill configures Haystack NLP pipelines for document processing and QA. It covers setting up document stores and retrievers, integrating readers or generators, and designing custom pipeline graphs with preprocessing and evaluation steps.
How This Skill Works
You select core components (DocumentStore, Retriever, Reader/Generator), design a pipeline graph, and configure preprocessing. Then you pick a backend (Elasticsearch, Weaviate, FAISS, etc.) and appropriate models for retrieval and QA, building retrieval, RAG, indexing, or evaluation pipelines as needed.
When to Use It
- When building a retrieval-augmented QA system over a knowledge base
- When indexing a large document corpus for fast search
- When evaluating QA models and retrievers with an integrated pipeline
- When designing custom multi-stage pipelines with modular components
- When switching document stores or backends to optimize performance or scale
Quick Start
- Step 1: Choose a DocumentStore backend (e.g., Elasticsearch, Weaviate, or FAISS) and select a retriever and QA model
- Step 2: Design a pipeline graph (retrieve -> reader/generator) and configure preprocessing (cleaning and splitting)
- Step 3: Run the pipeline, review results, and iterate on components and settings
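Step 2's preprocessing (cleaning and splitting) can be illustrated with a pure-Python sketch; this mirrors the idea behind Haystack's cleaning and splitting components but is not their API, just the concept:

```python
def clean(text: str) -> str:
    """Collapse runs of whitespace and strip the edges."""
    return " ".join(text.split())

def split(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split cleaned text into word chunks of `size` words with
    `overlap` words of context shared between neighbors.
    Assumes size > overlap."""
    words = clean(text).split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = words[start:start + size]
        if chunk:
            chunks.append(" ".join(chunk))
        if start + size >= len(words):
            break
    return chunks
```

Overlapping chunks help a retriever surface passages whose answer spans a chunk boundary, at the cost of some index redundancy.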
Best Practices
- Keep pipelines modular and swap components independently
- Apply proper preprocessing: cleaning and splitting documents
- Integrate evaluation steps to monitor QA accuracy over time
- Version-control component configurations and models
- Test pipelines locally before production deployment
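The "integrate evaluation" practice often starts with a retrieval metric such as recall@k. A stdlib sketch (hypothetical helper, not a Haystack API):

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: list[str], k: int = 5) -> float:
    """Fraction of relevant documents that appear in the top-k retrieved results."""
    top = set(retrieved_ids[:k])
    hits = sum(1 for doc_id in relevant_ids if doc_id in top)
    return hits / len(relevant_ids)
```

Tracking this metric across component swaps (e.g., BM25 vs. a dense retriever) turns "iterate on components" from guesswork into a measurable comparison.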
Example Use Cases
- Building a RAG QA system over a company's knowledge base using a BM25 retriever and a generative reader
- Setting up a FAISS-based indexing pipeline for fast similarity search over a large corpus
- Running extractive QA with BM25 or dense retrievers over product manuals
- Combining a retriever with a generator for open-ended, generative answers
- Comparing QA models and retrievers in an evaluation pipeline to track improvements
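The BM25 ranking referenced in these use cases can be made concrete with a compact stdlib implementation of the Okapi BM25 formula. This is a simplified sketch with naive whitespace tokenization, roughly the scheme BM25 retrievers implement, not Haystack's actual code:

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each document against the query.
    Assumes a non-empty document list; tokenization is naive whitespace splitting."""
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(tokenized)

    def idf(term: str) -> float:
        df = sum(1 for toks in tokenized if term in toks)
        return math.log((n - df + 0.5) / (df + 0.5) + 1)

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            f = tf[term]
            score += idf(term) * (f * (k1 + 1)) / (f + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores
```

Documents that contain the query terms, especially rare ones, score higher; `k1` damps term-frequency saturation and `b` controls length normalization.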