rag-embedding-generation
npx machina-cli add skill a5c-ai/babysitter/rag-embedding-generation --openclaw
RAG Embedding Generation Skill
Capabilities
- Generate embeddings with multiple providers
- Implement batch processing for large datasets
- Configure caching for embedding reuse
- Handle rate limiting and retries
- Support various embedding models
- Implement embedding quality validation
Target Processes
- rag-pipeline-implementation
- vector-database-setup
Implementation Details
Embedding Providers
- OpenAI Embeddings: text-embedding-ada-002, text-embedding-3-*
- HuggingFace: sentence-transformers models
- Cohere: embed-v3 models
- Voyage AI: voyage-2 models
- Local Models: GGUF/ONNX embedding models
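As a rough sketch, provider selection can be a simple registry keyed by provider/model name. The stub embed functions below are illustrative stand-ins for real provider calls, not this skill's actual API:

```python
# Hypothetical provider registry; names and dimensions mirror the list
# above, but the embed functions are fakes standing in for real API calls.
from typing import Callable, Dict, List

Vector = List[float]

def _fake_embed(dim: int) -> Callable[[List[str]], List[Vector]]:
    # Stand-in for a real provider client (OpenAI, HuggingFace, Cohere, ...).
    def embed(texts: List[str]) -> List[Vector]:
        return [[float(len(t) % 7)] * dim for t in texts]
    return embed

PROVIDERS: Dict[str, Callable[[List[str]], List[Vector]]] = {
    "openai/text-embedding-3-small": _fake_embed(1536),
    "huggingface/all-MiniLM-L6-v2": _fake_embed(384),
    "cohere/embed-v3": _fake_embed(1024),
}

def embed_texts(provider: str, texts: List[str]) -> List[Vector]:
    # Look up the configured provider and embed a batch of texts.
    return PROVIDERS[provider](texts)
```

Keeping providers behind one callable signature makes it cheap to swap or compare models later.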
Configuration Options
- Model selection and parameters
- Batch size optimization
- Cache backend configuration
- Rate limit settings
- Retry policies
- Dimensionality settings
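A configuration covering these options might look like the following; the key names and defaults are assumptions for illustration, not a documented schema:

```python
# Illustrative configuration shape; adapt keys to your workflow runner.
config = {
    "provider": "openai",
    "model": "text-embedding-3-small",
    "dimensions": 1536,                  # dimensionality setting (model permitting)
    "batch_size": 128,                   # texts per embedding request
    "cache": {"backend": "sqlite", "path": "embeddings.db"},
    "rate_limit": {"requests_per_minute": 3000},
    "retry": {"max_attempts": 5, "backoff_base_s": 1.0},
}
```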
Best Practices
- Use appropriate model for domain
- Implement caching for cost reduction
- Monitor embedding quality
- Handle API errors gracefully
Dependencies
- langchain-openai / langchain-huggingface
- numpy
- Caching backend (Redis, SQLite)
Source
git clone https://github.com/a5c-ai/babysitter
Path: plugins/babysitter/skills/babysit/process/specializations/ai-agents-conversational/skills/rag-embedding-generation/SKILL.md
Overview
This skill orchestrates batch embedding generation across multiple providers (OpenAI, HuggingFace, Cohere, Voyage AI, and local models) for scalable RAG workflows. It supports caching to reuse embeddings, rate limiting with retries, and embedding quality validation, enabling efficient, cost-aware embedding pipelines for processes such as rag-pipeline-implementation and vector-database-setup.
How This Skill Works
Inputs are batched, a provider and model are selected from configuration, and embeddings are generated with rate-limiting and retry logic. Results are cached in a backend (Redis or SQLite) to avoid recomputation and returned in a form ready for insertion into a vector database.
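The batch-then-cache flow can be sketched as follows, assuming a generic `embed_fn` provider callable and an in-memory dict standing in for a Redis or SQLite backend:

```python
# Sketch of batched embedding with a content-addressed cache; the
# function names and cache shape are illustrative, not this skill's API.
import hashlib
from typing import Callable, Dict, List

Vector = List[float]

def embed_with_cache(
    texts: List[str],
    embed_fn: Callable[[List[str]], List[Vector]],
    cache: Dict[str, Vector],
    batch_size: int = 64,
) -> List[Vector]:
    """Embed texts in batches, skipping anything already cached."""
    def key(t: str) -> str:
        # Hash the text so the cache key is stable and bounded in size.
        return hashlib.sha256(t.encode()).hexdigest()

    # Deduplicate while preserving order, then keep only cache misses.
    missing = [t for t in dict.fromkeys(texts) if key(t) not in cache]
    for i in range(0, len(missing), batch_size):
        batch = missing[i:i + batch_size]
        for t, vec in zip(batch, embed_fn(batch)):
            cache[key(t)] = vec
    # Every text is now cached; return vectors in input order.
    return [cache[key(t)] for t in texts]
```

In production the dict would be replaced by a Redis or SQLite client with the same get/set semantics.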
When to Use It
- Indexing or updating a vector database in rag-pipeline-implementation.
- Scaling embedding generation for large datasets via batching.
- Using multiple providers to compare embeddings or meet quotas.
- Caching embeddings to cut costs on repetitive queries.
- Validating embedding quality and dimensionality before storage.
Quick Start
- Step 1: Configure providers, models, batch size, cache backend, rate limits, and retries in your workflow.
- Step 2: Run the rag-embedding-generation batch job to generate + cache embeddings.
- Step 3: Validate embedding quality and store vectors in your vector database.
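Step 3's quality check might look like this minimal sketch, which flags vectors with the wrong dimensionality, zero norm, or NaN components; the specific checks are illustrative, not the skill's built-in validator:

```python
# Basic embedding sanity checks before vectors are written to storage.
import math
from typing import List

Vector = List[float]

def validate_embeddings(vectors: List[Vector], expected_dim: int) -> List[int]:
    """Return indices of vectors that fail basic sanity checks."""
    bad = []
    for i, v in enumerate(vectors):
        if len(v) != expected_dim:
            bad.append(i)                       # wrong dimensionality
        elif any(math.isnan(x) for x in v):
            bad.append(i)                       # NaN component
        elif math.sqrt(sum(x * x for x in v)) == 0.0:
            bad.append(i)                       # degenerate zero vector
    return bad
```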
Best Practices
- Choose domain-appropriate models for better semantic fidelity.
- Enable caching to reduce API calls and latency.
- Tune batch size and provider mix per workload.
- Monitor embedding quality and consistency across providers.
- Implement robust error handling and retry backoff.
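The retry-backoff point above can be implemented along these lines; this is a generic jittered exponential backoff sketch, not this skill's retry code:

```python
# Generic retry wrapper with jittered exponential backoff.
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Double the delay each attempt, with jitter to avoid
            # synchronized retries across workers.
            sleep(base_delay_s * (2 ** attempt) * (0.5 + random.random() / 2))
    raise RuntimeError("unreachable")
```

The injectable `sleep` makes the wrapper easy to test without real delays.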
Example Use Cases
- Batch-create OpenAI or HuggingFace embeddings for a product catalog and cache them for fast retrieval.
- Index a knowledge base by generating and caching embeddings, then populate a vector store.
- Compare embeddings from OpenAI vs. HuggingFace to select the best fit for a domain.
- Cache frequently queried embeddings to prevent repeated API calls in a customer support bot.
- Use local GGUF/ONNX models for offline embedding generation in restricted environments.