Vector_dbs
npx machina-cli add skill muhammederem/chief/vector_dbs --openclaw
Vector Databases for Semantic Search
Overview
Vector databases store and query high-dimensional vector embeddings, enabling semantic search, recommendation systems, and RAG applications.
Key Concepts
Embeddings
Vector representations of text/images that capture semantic meaning:
- Text: 384-1536 dimensions (OpenAI, Sentence Transformers)
- Images: 512-2048 dimensions (CLIP, ResNet)
- Dense vectors vs sparse vectors
Similarity Metrics
- Cosine Similarity: Angle between vectors (range: -1 to 1)
- Euclidean Distance: Straight-line distance
- Dot Product: Unnormalized cosine similarity (see the sketch below)
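For reference, a minimal NumPy sketch of the three metrics (illustrative only; vector databases compute these internally over millions of vectors):
import numpy as np

a = np.array([0.1, 0.2, 0.3])
b = np.array([0.2, 0.1, 0.4])

dot = np.dot(a, b)                                      # dot product
cosine = dot / (np.linalg.norm(a) * np.linalg.norm(b))  # cosine similarity
euclidean = np.linalg.norm(a - b)                       # Euclidean distance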
Key Operations
- Index: Store vectors with metadata
- Search: Find nearest neighbors
- Delete: Remove vectors
- Update: Modify vectors or metadata
Pinecone
Setup
pip install pinecone-client
import pinecone

# Initialize (legacy pinecone-client API; v3+ of the SDK uses the Pinecone class instead)
pinecone.init(
    api_key="your-api-key",
    environment="us-west1-gcp"
)

# Create index
pinecone.create_index(
    name="my-index",
    dimension=1536,  # must match your embedding model's output size
    metric="cosine",
    pod_type="p1.x1"
)

# Connect
index = pinecone.Index("my-index")
Basic Operations
# Upsert vectors
index.upsert(
    vectors=[
        ("vec1", [0.1, 0.2, ...], {"category": "tech"}),
        ("vec2", [0.3, 0.4, ...], {"category": "news"})
    ]
)

# Query
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=10,
    include_metadata=True
)

# Delete
index.delete(ids=["vec1", "vec2"])
Filtering
results = index.query(
    vector=query_vector,
    filter={"category": {"$eq": "tech"}},
    top_k=10
)
Weaviate
Setup
pip install weaviate-client
import weaviate

# Connect (weaviate-client v3 API)
client = weaviate.Client("http://localhost:8080")

# Create class
client.schema.create_class({
    "class": "Document",
    "properties": [
        {"name": "text", "dataType": ["text"]},
        {"name": "category", "dataType": ["string"]}
    ],
    "vectorizer": "text2vec-openai"
})
Basic Operations
# Add data object
client.data_object.create(
    class_name="Document",
    data_object={
        "text": "Sample text",
        "category": "tech"
    }
)

# Query
results = client.query.get(
    "Document",
    ["text", "category"]
).with_near_vector({
    "vector": query_vector,
    "certainty": 0.7
}).with_limit(10).do()

# Delete
client.data_object.delete(
    uuid=obj_id,
    class_name="Document"
)
Hybrid Search
results = client.query.get(
    "Document",
    ["text"]
).with_hybrid(
    query="search terms",
    vector=query_vector,
    alpha=0.5  # 0 = pure keyword, 1 = pure vector
).with_limit(10).do()
Qdrant
Setup
pip install qdrant-client
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# Connect
client = QdrantClient("localhost", port=6333)

# Create collection
client.create_collection(
    collection_name="my_collection",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
)
Basic Operations
# Upsert points
client.upsert(
    collection_name="my_collection",
    points=[
        PointStruct(id=1, vector=[0.1, 0.2, ...], payload={"text": "sample"}),
        PointStruct(id=2, vector=[0.3, 0.4, ...], payload={"text": "example"})
    ]
)

# Search
results = client.search(
    collection_name="my_collection",
    query_vector=[0.1, 0.2, ...],
    limit=10,
    with_payload=True
)

# Delete
client.delete(
    collection_name="my_collection",
    points_selector=[1, 2]
)
Filtering
from qdrant_client.models import Filter, FieldCondition, MatchValue

results = client.search(
    collection_name="my_collection",
    query_vector=query_vector,
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="tech"))]
    )
)
Embedding Generation
OpenAI Embeddings
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(openai_api_key="your-key")
vector = embeddings.embed_query("Your text here")
Sentence Transformers
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
vectors = model.encode(["text1", "text2"])
Hugging Face
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
model = AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')

# Generate embeddings (simple mean pooling over token states;
# production code usually weights the pooling by the attention mask)
texts = ["text1", "text2"]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
embeddings = model(**inputs).last_hidden_state.mean(dim=1)
Best Practices
1. Embedding Strategy
- Domain-specific: Use models fine-tuned on your domain
- Multilingual: Use multilingual models for international content
- Batch processing: Embed in batches for efficiency (see the sketch below)
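A minimal batching sketch with Sentence Transformers (the texts and batch size are illustrative; encode() accepts a batch_size argument):
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
texts = ["doc one", "doc two", "doc three"]  # hypothetical corpus

# encode() processes the list in batches; tune batch_size to your hardware
vectors = model.encode(texts, batch_size=64, show_progress_bar=True)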
2. Chunking Strategies
# Fixed size
chunks = [text[i:i+1000] for i in range(0, len(text), 1000)]

# Semantic-aware splitting
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
chunks = splitter.split_text(text)
3. Metadata Design
- Store relevant filtering fields
- Include timestamps, sources, categories
- Keep metadata lightweight (see the example below)
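For illustration, a lightweight payload with filterable fields might look like this (field names are hypothetical):
metadata = {
    "source": "docs/faq.md",     # provenance, useful for citations
    "category": "tech",          # supports filtered queries
    "created_at": "2024-01-15",  # enables recency filtering
}
# e.g. index.upsert(vectors=[("vec1", vector, metadata)])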
4. Index Optimization
- Pinecone: Choose appropriate pod type (s1 vs p1)
- Weaviate: Use HNSW for fast approximate search
- Qdrant: Tune quantization for memory efficiency
5. Query Optimization
- Hybrid search (vector + keyword)
- Re-ranking of retrieved results
- Filtering before vector search
- Caching frequent queries (see the sketch below)
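As one example, query embeddings can be memoized so repeated searches skip the embedding call; a minimal sketch, assuming the embeddings and index objects from the Pinecone examples above:
from functools import lru_cache

@lru_cache(maxsize=1024)
def cached_embed(query: str):
    # Return a tuple so the cached value is immutable
    return tuple(embeddings.embed_query(query))

def cached_search(query, top_k=10):
    return index.query(vector=list(cached_embed(query)), top_k=top_k)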
RAG Integration
End-to-End Pipeline
from langchain_community.vectorstores import Pinecone
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA

# Vector store
vectorstore = Pinecone.from_documents(
    documents=documents,
    embedding=OpenAIEmbeddings(),
    index_name="rag-index"
)

# RAG chain
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4"),
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3})
)
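Querying the chain then looks like this (the question text is illustrative):
answer = qa.invoke({"query": "What do the indexed documents say about pricing?"})
print(answer["result"])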
Advanced RAG
# Multi-query retrieval
from langchain.retrievers import MultiQueryRetriever

retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),
    llm=ChatOpenAI()
)

# Contextual compression
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

compressor = LLMChainExtractor.from_llm(ChatOpenAI())
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever()
)
Performance Tuning
Pinecone
- Use namespaces for multi-tenancy
- Batch upsert (max 100 vectors per batch; see the loop sketch below)
- Choose pod type based on needs (s1 for storage, p1 for performance)
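A simple batched-upsert loop under that limit might look like this (vectors is assumed to be a list of (id, values, metadata) tuples):
BATCH_SIZE = 100

for start in range(0, len(vectors), BATCH_SIZE):
    # Slice the list so each request stays within the batch limit
    index.upsert(vectors=vectors[start:start + BATCH_SIZE])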
Weaviate
- Tune HNSW parameters (ef_construction, M, ef; see the schema sketch below)
- Enable replication for high availability
- Use sharding for large datasets
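With the v3 client used above, HNSW parameters can be set per class via vectorIndexConfig; a sketch with illustrative values, not tuned recommendations:
client.schema.create_class({
    "class": "Document",
    "vectorizer": "text2vec-openai",
    "vectorIndexConfig": {
        "efConstruction": 128,  # build-time accuracy/speed trade-off
        "maxConnections": 64,   # graph degree (the "M" parameter)
        "ef": 100,              # query-time accuracy/speed trade-off
    },
})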
Qdrant
- Enable quantization for memory savings (see the sketch below)
- Use optimizers for index building
- Tune search parameters (hnsw_ef, payload_index)
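For example, scalar quantization can be enabled when the collection is created (values shown are illustrative):
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams,
    ScalarQuantization, ScalarQuantizationConfig, ScalarType,
)

client = QdrantClient("localhost", port=6333)
client.create_collection(
    collection_name="quantized_collection",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,  # store 8-bit quantized copies of vectors
            always_ram=True,       # keep the quantized copies in RAM
        )
    ),
)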
Comparison
| Feature | Pinecone | Weaviate | Qdrant |
|---|---|---|---|
| Deployment | Managed | Self-hosted | Both |
| Open Source | ✗ | ✓ | ✓ |
| Hybrid Search | ✗ | ✓ | ✓ |
| Filtering | ✓ | ✓ | ✓ |
| Scalability | High | High | High |
| Setup | Easiest | Medium | Medium |
Common Patterns
Semantic Search
def semantic_search(query, top_k=5):
    query_vector = embeddings.embed_query(query)
    results = index.query(vector=query_vector, top_k=top_k)
    return results
Recommendation System
def find_similar_items(item_id, top_k=10):
    item_vector = get_item_vector(item_id)
    results = index.query(vector=item_vector, top_k=top_k)
    return results
Deduplication
def find_duplicates(text, threshold=0.95):
    vector = embeddings.embed_query(text)
    results = index.query(vector=vector, top_k=10)
    # Pinecone returns hits under results.matches, each with a score
    duplicates = [m for m in results.matches if m.score > threshold]
    return duplicates
Integration
- LangChain: All vector stores supported
- LlamaIndex: Vector store integrations
- Haystack: Document stores
- Embedding Models: OpenAI, Cohere, Sentence Transformers
Source
https://github.com/muhammederem/chief/blob/main/.claude/skills/ml-ai/vector_dbs/SKILL.md
Overview
Vector databases store and query high-dimensional embeddings, enabling semantic search, recommendations, and RAG applications. They support text and image embeddings and attach metadata to each vector, powering precise retrieval and personalized experiences.
How This Skill Works
Embeddings convert content into dense high-dimensional vectors that capture meaning. A vector DB indexes these vectors with metadata and uses similarity metrics such as cosine similarity, Euclidean distance, or dot product to find nearest neighbors during a search. Core operations include upsert (index), search, delete, and update to keep results fresh.
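To make the mechanics concrete, here is a brute-force nearest-neighbor sketch in NumPy; production databases replace this exhaustive scan with approximate indexes such as HNSW:
import numpy as np

def top_k_cosine(query, vectors, k=5):
    # Normalize so the dot product equals cosine similarity
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q
    idx = np.argsort(scores)[::-1][:k]  # indices of the k best matches
    return idx, scores[idx]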
When to Use It
- Implement semantic search over documents, FAQs, or knowledge bases.
- Build personalized recommendations using vector similarity.
- Power Retrieval-Augmented Generation (RAG) with LLMs by feeding relevant context.
- Perform multimodal search that combines text and image embeddings.
- Enable hybrid search that blends keyword filters with vector proximity.
Quick Start
- Step 1: Install and connect to a vector DB client (Pinecone, Weaviate, or Qdrant) and set up your environment.
- Step 2: Generate embeddings for your data (e.g., OpenAI embeddings) and prepare metadata.
- Step 3: Create an index/collection with the correct vector size, upsert vectors with metadata, then run a nearest-neighbor query (see the end-to-end sketch below).
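Putting the three steps together, a minimal end-to-end sketch using Qdrant's in-memory mode (collection name and texts are illustrative):
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # produces 384-dim embeddings
client = QdrantClient(":memory:")                # in-process instance for experiments

client.create_collection(
    collection_name="quickstart",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

docs = ["Vector databases store embeddings.", "Cats make great pets."]
client.upsert(
    collection_name="quickstart",
    points=[
        PointStruct(id=i, vector=model.encode(d).tolist(), payload={"text": d})
        for i, d in enumerate(docs)
    ],
)

hits = client.search(
    collection_name="quickstart",
    query_vector=model.encode("How do vector databases work?").tolist(),
    limit=1,
)
print(hits[0].payload["text"])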
Best Practices
- Choose embeddings that suit your data and language.
- Align the dimension and distance metric with your similarity goal.
- Store metadata to enable filtered or hybrid queries.
- Batch upserts and monitor indexing performance and cost.
- Validate results with end-to-end prompts and user feedback.
Example Use Cases
- Semantic search over enterprise docs using Pinecone with 1536-dim embeddings and cosine metric.
- Product recommendations driven by vector similarity in a recommender system.
- RAG pipelines using Weaviate with text2vec-openai vectorization.
- Image similarity search using Qdrant with 1536-dim vectors.
- Hybrid search combining keyword queries with vector proximity in a Weaviate setup.