rag-embedding-generation
npx machina-cli add skill a5c-ai/babysitter/rag-embedding-generation --openclaw
RAG Embedding Generation Skill
Capabilities
- Generate embeddings with multiple providers
- Implement batch processing for large datasets
- Configure caching for embedding reuse
- Handle rate limiting and retries
- Support various embedding models
- Implement embedding quality validation
Target Processes
- rag-pipeline-implementation
- vector-database-setup
Implementation Details
Embedding Providers
- OpenAI Embeddings: text-embedding-ada-002, text-embedding-3-*
- HuggingFace: sentence-transformers models
- Cohere: embed-v3 models
- Voyage AI: voyage-2 models
- Local Models: GGUF/ONNX embedding models
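As a rough sketch, provider selection can be a simple registry keyed by provider/model name. The stub embed functions below are illustrative stand-ins for real provider calls, not this skill's actual API:

```python
# Hypothetical provider registry; names and dimensions mirror the list
# above, but the embed functions are fakes standing in for real API calls.
from typing import Callable, Dict, List

Vector = List[float]

def _fake_embed(dim: int) -> Callable[[List[str]], List[Vector]]:
    # Stand-in for a real provider client (OpenAI, HuggingFace, Cohere, ...).
    def embed(texts: List[str]) -> List[Vector]:
        return [[float(len(t) % 7)] * dim for t in texts]
    return embed

PROVIDERS: Dict[str, Callable[[List[str]], List[Vector]]] = {
    "openai/text-embedding-3-small": _fake_embed(1536),
    "huggingface/all-MiniLM-L6-v2": _fake_embed(384),
    "cohere/embed-v3": _fake_embed(1024),
}

def embed_texts(provider: str, texts: List[str]) -> List[Vector]:
    # Look up the configured provider and embed a batch of texts.
    return PROVIDERS[provider](texts)
```

Keeping providers behind one callable signature makes it cheap to swap or compare models later.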
Configuration Options
- Model selection and parameters
- Batch size optimization
- Cache backend configuration
- Rate limit settings
- Retry policies
- Dimensionality settings
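A configuration covering these options might look like the following; the key names and defaults are assumptions for illustration, not a documented schema:

```python
# Illustrative configuration shape; adapt keys to your workflow runner.
config = {
    "provider": "openai",
    "model": "text-embedding-3-small",
    "dimensions": 1536,                  # dimensionality setting (model permitting)
    "batch_size": 128,                   # texts per embedding request
    "cache": {"backend": "sqlite", "path": "embeddings.db"},
    "rate_limit": {"requests_per_minute": 3000},
    "retry": {"max_attempts": 5, "backoff_base_s": 1.0},
}
```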
Best Practices
- Use appropriate model for domain
- Implement caching for cost reduction
- Monitor embedding quality
- Handle API errors gracefully
Dependencies
- langchain-openai / langchain-huggingface
- numpy
- Caching backend (Redis, SQLite)
Source
git clone https://github.com/a5c-ai/babysitter
Path: plugins/babysitter/skills/babysit/process/specializations/ai-agents-conversational/skills/rag-embedding-generation/SKILL.md
Overview
This skill orchestrates batch embedding generation across multiple providers (OpenAI, HuggingFace, Cohere, Voyage AI, and local models) for scalable RAG workflows. It supports caching to reuse embeddings, rate limiting with retries, and embedding quality validation, enabling efficient, cost-aware embedding pipelines for processes such as rag-pipeline-implementation and vector-database-setup.
How This Skill Works
Inputs are batched, a provider and model are selected from configuration, and embeddings are generated with rate-limiting and retry logic. Results are cached in a backend (Redis or SQLite) to avoid recomputation and returned in a form ready for insertion into a vector database.
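The batch-then-cache flow can be sketched as follows, assuming a generic `embed_fn` provider callable and an in-memory dict standing in for a Redis or SQLite backend:

```python
# Sketch of batched embedding with a content-addressed cache; the
# function names and cache shape are illustrative, not this skill's API.
import hashlib
from typing import Callable, Dict, List

Vector = List[float]

def embed_with_cache(
    texts: List[str],
    embed_fn: Callable[[List[str]], List[Vector]],
    cache: Dict[str, Vector],
    batch_size: int = 64,
) -> List[Vector]:
    """Embed texts in batches, skipping anything already cached."""
    def key(t: str) -> str:
        # Hash the text so the cache key is stable and bounded in size.
        return hashlib.sha256(t.encode()).hexdigest()

    # Deduplicate while preserving order, then keep only cache misses.
    missing = [t for t in dict.fromkeys(texts) if key(t) not in cache]
    for i in range(0, len(missing), batch_size):
        batch = missing[i:i + batch_size]
        for t, vec in zip(batch, embed_fn(batch)):
            cache[key(t)] = vec
    # Every text is now cached; return vectors in input order.
    return [cache[key(t)] for t in texts]
```

In production the dict would be replaced by a Redis or SQLite client with the same get/set semantics.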
When to Use It
- Indexing or updating a vector database in rag-pipeline-implementation.
- Scaling embedding generation for large datasets via batching.
- Using multiple providers to compare embeddings or meet quotas.
- Caching embeddings to cut costs on repetitive queries.
- Validating embedding quality and dimensionality before storage.
Quick Start
- Step 1: Configure providers, models, batch size, cache backend, rate limits, and retries in your workflow.
- Step 2: Run the rag-embedding-generation batch job to generate + cache embeddings.
- Step 3: Validate embedding quality and store vectors in your vector database.
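Step 3's quality check might look like this minimal sketch, which flags vectors with the wrong dimensionality, zero norm, or NaN components; the specific checks are illustrative, not the skill's built-in validator:

```python
# Basic embedding sanity checks before vectors are written to storage.
import math
from typing import List

Vector = List[float]

def validate_embeddings(vectors: List[Vector], expected_dim: int) -> List[int]:
    """Return indices of vectors that fail basic sanity checks."""
    bad = []
    for i, v in enumerate(vectors):
        if len(v) != expected_dim:
            bad.append(i)                       # wrong dimensionality
        elif any(math.isnan(x) for x in v):
            bad.append(i)                       # NaN component
        elif math.sqrt(sum(x * x for x in v)) == 0.0:
            bad.append(i)                       # degenerate zero vector
    return bad
```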
Best Practices
- Choose domain-appropriate models for better semantic fidelity.
- Enable caching to reduce API calls and latency.
- Tune batch size and provider mix per workload.
- Monitor embedding quality and consistency across providers.
- Implement robust error handling and retry backoff.
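The retry-backoff point above can be implemented along these lines; this is a generic jittered exponential backoff sketch, not this skill's retry code:

```python
# Generic retry wrapper with jittered exponential backoff.
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Double the delay each attempt, with jitter to avoid
            # synchronized retries across workers.
            sleep(base_delay_s * (2 ** attempt) * (0.5 + random.random() / 2))
    raise RuntimeError("unreachable")
```

The injectable `sleep` makes the wrapper easy to test without real delays.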
Example Use Cases
- Batch-create OpenAI or HuggingFace embeddings for a product catalog and cache them for fast retrieval.
- Index a knowledge base by generating and caching embeddings, then populate a vector store.
- Compare embeddings from OpenAI vs. HuggingFace to select the best fit for a domain.
- Cache frequently queried embeddings to prevent repeated API calls in a customer support bot.
- Use local GGUF/ONNX models for offline embedding generation in restricted environments.