weaviate-integration
npx machina-cli add skill a5c-ai/babysitter/weaviate-integration --openclaw
Weaviate Integration Skill
Capabilities
- Set up Weaviate cluster (cloud or self-hosted)
- Define schemas with properties and vectorizers
- Implement GraphQL queries
- Configure hybrid search (vector + keyword)
- Set up multi-tenancy
- Implement batch import operations
Target Processes
- vector-database-setup
- rag-pipeline-implementation
Implementation Details
Core Operations
- Schema Management: Class definitions and properties
- Data Import: Single and batch object creation
- Vector Search: nearVector, nearText queries
- Hybrid Search: Combined vector and BM25 keyword search
- GraphQL: Flexible querying with Get and Aggregate
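The core query operations above can be illustrated by building raw GraphQL strings. This is a minimal sketch. The helpers build_get_query and build_aggregate_count are hypothetical and are not part of weaviate-client. The class name Article and its fields are assumptions. The nearText, nearVector, hybrid (with alpha), tenant, limit, _additional and Aggregate/meta syntax follows Weaviate's GraphQL API.

```python
import json


# Hypothetical helper (not part of weaviate-client): builds a raw GraphQL
# Get query string using Weaviate's search-operator syntax.
def build_get_query(class_name, fields, *, near_text=None, near_vector=None,
                    hybrid_query=None, alpha=0.5, limit=5, tenant=None):
    operators = [near_text, near_vector, hybrid_query]
    # This sketch accepts one search operator per query.
    if sum(op is not None for op in operators) > 1:
        raise ValueError("use at most one of near_text, near_vector, hybrid_query")
    args = []
    if near_text is not None:
        # json.dumps yields a double-quoted list, which is valid GraphQL.
        args.append(f"nearText: {{concepts: {json.dumps(near_text)}}}")
    if near_vector is not None:
        args.append(f"nearVector: {{vector: {json.dumps(near_vector)}}}")
    if hybrid_query is not None:
        # alpha=1.0 weights vector search only; alpha=0.0 weights BM25 only.
        args.append(f"hybrid: {{query: {json.dumps(hybrid_query)}, alpha: {alpha}}}")
    if tenant is not None:
        # Required when querying a class with multi-tenancy enabled.
        args.append(f"tenant: {json.dumps(tenant)}")
    args.append(f"limit: {limit}")
    selection = " ".join(fields)
    return f"{{ Get {{ {class_name}({', '.join(args)}) {{ {selection} }} }} }}"


# Hypothetical helper: an Aggregate query returning the object count of a class.
def build_aggregate_count(class_name):
    return f"{{ Aggregate {{ {class_name} {{ meta {{ count }} }} }} }}"


print(build_get_query("Article", ["title", "_additional { distance }"],
                      near_text=["vector databases"], limit=3))
print(build_aggregate_count("Article"))
```

In application code, the client libraries usually wrap these queries. Seeing the raw form helps when you debug queries in the Weaviate console or over plain HTTP.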
Configuration Options
- Vectorizer modules (text2vec-*, multi2vec-*)
- Replication factor
- Sharding configuration
- Multi-tenancy settings
- Module configuration
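The configuration options above all live in the class definition. The sketch below shows one way to assemble such a definition, in the JSON shape accepted by Weaviate's REST schema endpoint (POST /v1/schema). The helper make_class_definition and the title/body/url properties are hypothetical. The helper assumes a text2vec-* vectorizer module, whose settings include vectorizeClassName and a per-property skip flag.

```python
import json


# Hypothetical helper: returns a class definition dict for POST /v1/schema.
# Assumes a text2vec-* vectorizer; property names are illustrative only.
def make_class_definition(name, vectorizer="text2vec-openai", replicas=1,
                          shards=1, multi_tenant=False):
    definition = {
        "class": name,
        "vectorizer": vectorizer,
        # Module-level settings are keyed by the module name.
        "moduleConfig": {vectorizer: {"vectorizeClassName": False}},
        "properties": [
            {"name": "title", "dataType": ["text"]},
            {"name": "body", "dataType": ["text"]},
            # Exclude the URL from the text sent to the vectorizer.
            {"name": "url", "dataType": ["text"],
             "moduleConfig": {vectorizer: {"skip": True}}},
        ],
        # Number of copies of each shard kept across cluster nodes.
        "replicationConfig": {"factor": replicas},
    }
    if multi_tenant:
        # Each tenant is stored in its own shard, so no shard count is set here.
        definition["multiTenancyConfig"] = {"enabled": True}
    else:
        definition["shardingConfig"] = {"desiredCount": shards}
    return definition


print(json.dumps(make_class_definition("Article", replicas=3, multi_tenant=True), indent=2))
```

A replication factor above 1 only makes sense on a multi-node cluster. Both replication and multi-tenancy are easiest to set when the class is first created, so decide on them before you import data.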
Best Practices
- Design schema for query patterns
- Use appropriate vectorizer
- Enable hybrid search for better recall
- Configure proper backups
- Monitor resource usage
Dependencies
- weaviate-client
- langchain-weaviate
Source
https://github.com/a5c-ai/babysitter/blob/main/plugins/babysitter/skills/babysit/process/specializations/ai-agents-conversational/skills/weaviate-integration/SKILL.md
Overview
This skill enables setting up a Weaviate vector database (cloud or self-hosted), defining schemas with vectorizers, and implementing GraphQL queries plus hybrid search. It also covers multi-tenancy and batch data imports for scalable, semantically rich applications.
How This Skill Works
It defines schemas with class properties and vectorizers, imports data (single objects or batches), and runs vector search via nearVector and nearText. Hybrid search combines vector results with BM25 keyword ranking, while GraphQL Get and Aggregate queries provide flexible data access. Configuration options tailor the vectorizer, replication, sharding, and multi-tenancy settings.
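Hybrid search has to merge two differently scaled score lists: vector similarities and BM25 scores. The sketch below illustrates one way to do this, a relative-score style fusion. Each list is min-max normalized to [0, 1] and the two are blended with alpha (1.0 means vector only, 0.0 means BM25 only). This is a simplified, client-side illustration. Weaviate performs fusion on the server, and its exact implementation may differ. The document IDs and scores are made-up inputs.

```python
# Simplified sketch of relative-score fusion for hybrid search. Illustrative
# only; Weaviate computes fusion server-side and details may differ.

def normalize(scores):
    """Min-max scale an {object_id: score} map to [0, 1] (higher is better)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def relative_score_fusion(vector_scores, bm25_scores, alpha=0.5):
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    vec, kw = normalize(vector_scores), normalize(bm25_scores)
    fused = {
        # An object missing from one result set contributes 0 for that part.
        key: alpha * vec.get(key, 0.0) + (1 - alpha) * kw.get(key, 0.0)
        for key in set(vec) | set(kw)
    }
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


vector_hits = {"doc-a": 0.91, "doc-b": 0.74, "doc-c": 0.52}  # e.g. 1 - cosine distance
bm25_hits = {"doc-b": 11.3, "doc-d": 4.8}                      # raw BM25 scores
for doc_id, score in relative_score_fusion(vector_hits, bm25_hits, alpha=0.5):
    print(doc_id, round(score, 3))
```

Documents that score well on both signals (doc-b here) rise to the top. This is why hybrid search tends to help recall. Moving alpha toward 0 favors exact keyword matches such as product codes.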
When to Use It
- Building a RAG pipeline that relies on a semantic vector store for retrieval
- Setting up a Weaviate cluster (cloud or self-hosted) for enterprise search or knowledge base access
- Implementing GraphQL queries and aggregates to power analytics and dashboards
- Enabling hybrid search (vector + keyword) to improve recall in product, document, or support search
- Managing multi-tenant data isolation in a SaaS or multi-customer environment
Quick Start
- Step 1: Initialize and configure a Weaviate cluster (cloud or self-hosted) and install the required clients, such as weaviate-client
- Step 2: Define schemas with class definitions, properties, and vectorizers suitable for your data
- Step 3: Import data (single or batch), run nearVector/nearText searches, and set up hybrid search and GraphQL queries
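For the batch import in Step 3, records are sent in fixed-size chunks rather than one request per object. The sketch below splits records into request bodies shaped for Weaviate's REST batch endpoint (POST /v1/batch/objects). Each body carries an "objects" list, with a per-object "tenant" field for multi-tenant classes. The helper batch_payloads and the sample records are hypothetical. weaviate-client ships its own batching helpers, which are usually the better choice in practice.

```python
from itertools import islice


# Hypothetical helper: yields request bodies for POST /v1/batch/objects.
# "tenant" is only added when importing into a multi-tenant class.
def batch_payloads(class_name, records, batch_size=100, tenant=None):
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    records = iter(records)
    while True:
        # Pull up to batch_size records without loading everything into memory.
        chunk = list(islice(records, batch_size))
        if not chunk:
            return
        objects = []
        for properties in chunk:
            obj = {"class": class_name, "properties": properties}
            if tenant is not None:
                obj["tenant"] = tenant
            objects.append(obj)
        yield {"objects": objects}


docs = ({"title": f"Doc {i}", "body": f"Body text {i}"} for i in range(250))
for body in batch_payloads("Article", docs, batch_size=100, tenant="acme"):
    print(len(body["objects"]))
```

Because the helper consumes an iterator lazily, it can stream a large corpus from disk. When you send the bodies, check each batch response for per-object errors.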
Best Practices
- Design schema carefully around common query patterns and expected vectorizers
- Choose appropriate vectorizer modules (text2vec-*, multi2vec-*) for your data
- Enable hybrid search to improve recall, tuning the balance between vector and keyword scoring (the alpha parameter) to preserve precision
- Configure regular backups and restore tests for Weaviate data
- Monitor resource usage (CPU/memory) and optimize replication and sharding settings
Example Use Cases
- RAG-powered knowledge assistant using nearText queries over a document collection
- Product catalog search combining embeddings with BM25 keywords for fast, relevant results
- Document ingestion workflow with single and batch imports for large corpora
- Multi-tenant chatbot with an isolated tenant for each customer
- Analytics dashboards powered by GraphQL aggregates over semantically enriched data