npx machina-cli add skill Orchestra-Research/AI-Research-SKILLs/faiss --openclaw

FAISS - Efficient Similarity Search

Facebook AI's library for billion-scale vector similarity search.

When to use FAISS

Use FAISS when:

  • Need fast similarity search on large vector datasets (millions/billions)
  • GPU acceleration required
  • Pure vector similarity (no metadata filtering needed)
  • High throughput, low latency critical
  • Offline/batch processing of embeddings

Metrics:

  • 31,700+ GitHub stars
  • Meta/Facebook AI Research
  • Handles billions of vectors
  • C++ with Python bindings

Use alternatives instead:

  • Chroma/Pinecone: Need metadata filtering
  • Weaviate: Need full database features
  • Annoy: Simpler, fewer features

Quick start

Installation

# CPU only
pip install faiss-cpu

# GPU support (official GPU builds are distributed via conda; pip wheels are community-maintained)
pip install faiss-gpu

Basic usage

import faiss
import numpy as np

# Create sample data (1000 vectors, 128 dimensions)
d = 128
nb = 1000
vectors = np.random.random((nb, d)).astype('float32')

# Create index
index = faiss.IndexFlatL2(d)  # L2 distance
index.add(vectors)             # Add vectors

# Search
k = 5  # Find 5 nearest neighbors
query = np.random.random((1, d)).astype('float32')
distances, indices = index.search(query, k)

print(f"Nearest neighbors: {indices}")
print(f"Distances: {distances}")

Index types

1. Flat (exact search)

# L2 (Euclidean) distance
index = faiss.IndexFlatL2(d)

# Inner product (cosine similarity if normalized)
index = faiss.IndexFlatIP(d)

# Exact brute-force search: slowest at scale, but 100% accurate
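Conceptually, a flat index is just a brute-force scan over all stored vectors. The following NumPy sketch (illustrative only, not the FAISS implementation) shows what IndexFlatL2 computes; note that FAISS reports squared L2 distances for this index type:

```python
import numpy as np

def flat_l2_search(vectors, query, k):
    """Brute-force k-NN: what IndexFlatL2 does, conceptually."""
    # Squared L2 distance from the query to every stored vector
    dists = ((vectors - query) ** 2).sum(axis=1)
    idx = np.argsort(dists)[:k]  # indices of the k closest vectors
    return dists[idx], idx

rng = np.random.default_rng(0)
vecs = rng.random((1000, 128)).astype('float32')
q = rng.random((1, 128)).astype('float32')
D, I = flat_l2_search(vecs, q, k=5)
print(I, D)
```

This is O(n) per query, which is why flat indexes become the bottleneck beyond a few hundred thousand vectors.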

2. IVF (inverted file) - Fast approximate

# Create quantizer
quantizer = faiss.IndexFlatL2(d)

# IVF index with 100 clusters
nlist = 100
index = faiss.IndexIVFFlat(quantizer, d, nlist)

# Train on data
index.train(vectors)

# Add vectors
index.add(vectors)

# Search (nprobe = clusters to search)
index.nprobe = 10
distances, indices = index.search(query, k)
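The IVF idea can be sketched in plain NumPy (illustrative only: FAISS trains the centroids with k-means during index.train, whereas here they are simply sampled at random):

```python
import numpy as np

rng = np.random.default_rng(0)
d, nb, nlist, nprobe, k = 32, 2000, 20, 4, 5
vectors = rng.random((nb, d)).astype('float32')

# "Training": pick nlist centroids (FAISS uses k-means; random picks keep this short)
centroids = vectors[rng.choice(nb, nlist, replace=False)]

# "Adding": assign each vector to its nearest centroid (the inverted lists)
assign = np.argmin(((vectors[:, None] - centroids) ** 2).sum(-1), axis=1)

# "Searching": visit only the nprobe closest clusters, then scan their members
query = rng.random((1, d)).astype('float32')
probe = np.argsort(((centroids - query) ** 2).sum(-1))[:nprobe]
cand = np.flatnonzero(np.isin(assign, probe))  # candidate vector ids
dists = ((vectors[cand] - query) ** 2).sum(-1)
order = np.argsort(dists)[:k]
indices, distances = cand[order], dists[order]
print(indices, distances)
```

Only nprobe of the nlist clusters are scanned, which is where the speedup (and the approximation error) comes from: a true nearest neighbor living in an unvisited cluster is simply missed.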

3. HNSW (Hierarchical Navigable Small World) - Best quality/speed trade-off

# HNSW index
M = 32  # Number of connections per layer
index = faiss.IndexHNSWFlat(d, M)

# No training needed
index.add(vectors)

# Search (efSearch controls the accuracy/speed trade-off; higher = more accurate)
index.hnsw.efSearch = 64
distances, indices = index.search(query, k)

4. Product Quantization - Memory efficient

# PQ compresses each vector from d × 4 bytes (float32) to m × nbits / 8 bytes
m = 8       # Number of subquantizers
nbits = 8   # Bits per subquantizer code
index = faiss.IndexPQ(d, m, nbits)

# Train and add
index.train(vectors)
index.add(vectors)
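The memory saving is easy to check with back-of-the-envelope arithmetic. A minimal sketch for the parameters above (plain Python, no FAISS required):

```python
# Back-of-the-envelope PQ memory math
d = 128       # vector dimensionality
m = 8         # number of subquantizers
nbits = 8     # bits per subquantizer code

raw_bytes = d * 4            # float32 storage per vector: 512 B
pq_bytes = m * nbits // 8    # PQ code per vector: 8 B
ratio = raw_bytes / pq_bytes

print(f"raw: {raw_bytes} B, PQ: {pq_bytes} B, compression: {ratio:.0f}x")
```

The price of the compression is quantization error: distances are computed against approximate reconstructions, which is why PQ accuracy lands in the 90-95% range below.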

Save and load

# Save index
faiss.write_index(index, "large.index")

# Load index
index = faiss.read_index("large.index")

# Continue using
distances, indices = index.search(query, k)

GPU acceleration

# Single GPU
res = faiss.StandardGpuResources()
index_cpu = faiss.IndexFlatL2(d)
index_gpu = faiss.index_cpu_to_gpu(res, 0, index_cpu)  # GPU 0

# Multi-GPU
index_gpu = faiss.index_cpu_to_all_gpus(index_cpu)

# Often 10-100× faster than CPU, depending on index type and batch size

LangChain integration

from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Create FAISS vector store
vectorstore = FAISS.from_documents(docs, OpenAIEmbeddings())

# Save
vectorstore.save_local("faiss_index")

# Load
vectorstore = FAISS.load_local(
    "faiss_index",
    OpenAIEmbeddings(),
    allow_dangerous_deserialization=True  # docstore is pickled; only load files you created
)

# Search
results = vectorstore.similarity_search("query", k=5)

LlamaIndex integration

from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.faiss import FaissVectorStore
import faiss

# Create FAISS index (d must match your embedding model's output dimension)
d = 1536
faiss_index = faiss.IndexFlatL2(d)

# Wrap it as a LlamaIndex vector store and build an index over documents
vector_store = FaissVectorStore(faiss_index=faiss_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)

Best practices

  1. Choose right index type - Flat for <10K, IVF for 10K-1M, HNSW for quality
  2. Normalize for cosine - Use IndexFlatIP with normalized vectors
  3. Use GPU for large datasets - 10-100× faster
  4. Save trained indices - Training is expensive
  5. Tune nprobe/ef_search - Balance speed/accuracy
  6. Monitor memory - PQ for large datasets
  7. Batch queries - Better GPU utilization
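Practice 2 above rests on a simple identity: the inner product of L2-normalized vectors equals the cosine similarity of the originals, which is what normalizing (e.g. with faiss.normalize_L2) before using IndexFlatIP relies on. A NumPy sketch of the identity:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((4, 8)).astype('float32')  # stored vectors
b = rng.random((1, 8)).astype('float32')  # query vector

# L2-normalize rows (what faiss.normalize_L2 does, in place)
an = a / np.linalg.norm(a, axis=1, keepdims=True)
bn = b / np.linalg.norm(b, axis=1, keepdims=True)

# Inner product of normalized vectors == cosine similarity of the originals
ip = an @ bn.T
cos = (a @ b.T) / (np.linalg.norm(a, axis=1, keepdims=True) * np.linalg.norm(b))
assert np.allclose(ip, cos, atol=1e-6)
```

Remember to normalize both the stored vectors and the queries; normalizing only one side breaks the equivalence.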

Performance

Index Type   Build Time   Search Time   Memory   Accuracy
Flat         Fast         Slow          High     100%
IVF          Medium       Fast          Medium   95-99%
HNSW         Slow         Fastest       High     99%
PQ           Medium       Fast          Low      90-95%

Resources

Source

git clone https://github.com/Orchestra-Research/AI-Research-SKILLs

View on GitHub: https://github.com/Orchestra-Research/AI-Research-SKILLs/blob/main/15-rag/faiss/SKILL.md

Overview

FAISS is Facebook AI's library for billion-scale vector similarity search. It supports billions of vectors, GPU acceleration, and multiple index types (Flat, IVF, HNSW). It’s ideal for high-performance applications that require pure vector similarity without metadata.

How This Skill Works

You choose an index type (e.g., FlatL2, IVF, HNSW, PQ), train it if needed, add your dense vectors, and query for nearest neighbors. FAISS runs on CPU or GPU, returning distances and indices for the k nearest vectors, enabling fast, scalable similarity search on large datasets.

When to Use It

  • Need fast similarity search on large vector datasets (millions/billions)
  • GPU acceleration required
  • Pure vector similarity (no metadata filtering needed)
  • High throughput, low latency is critical
  • Offline/batch processing of embeddings

Quick Start

  1. Install FAISS (faiss-cpu or faiss-gpu) depending on your environment
  2. Build an index (e.g., index = faiss.IndexFlatL2(d)) and add vectors with index.add(vectors)
  3. Run a search (distances, indices = index.search(query, k)) and save/load with faiss.write_index / faiss.read_index

Best Practices

  • Choose index type based on accuracy vs. speed: Flat for exact search, IVF or HNSW for fast approximate search, PQ for memory efficiency
  • If using Inner Product (cosine similarity via IP), normalize vectors when appropriate
  • Train IVF/PQ indices on a representative subset before adding all vectors
  • Tune IVF nprobe to balance recall and latency
  • Leverage GPU acceleration when data fits GPU memory and latency is critical

Example Use Cases

  • RAG pipelines for retrieval-augmented generation with billions of document embeddings
  • E-commerce product similarity across billions of items for recommendations
  • Large-scale image or document retrieval using dense embeddings
  • Real-time similarity search in large multimedia catalogs
  • Offline embedding indexing for batch ranking and analysis


Related Skills

llamaindex

Orchestra-Research/AI-Research-SKILLs

Data framework for building LLM applications with RAG. Specializes in document ingestion (300+ connectors), indexing, and querying. Features vector indices, query engines, agents, and multi-modal support. Use for document Q&A, chatbots, knowledge retrieval, or building RAG pipelines. Best for data-centric LLM applications.

dspy

Orchestra-Research/AI-Research-SKILLs

Build complex AI systems with declarative programming, optimize prompts automatically, create modular RAG systems and agents with DSPy - Stanford NLP's framework for systematic LM programming

langchain

Orchestra-Research/AI-Research-SKILLs

Framework for building LLM-powered applications with agents, chains, and RAG. Supports multiple providers (OpenAI, Anthropic, Google), 500+ integrations, ReAct agents, tool calling, memory management, and vector store retrieval. Use for building chatbots, question-answering systems, autonomous agents, or RAG applications. Best for rapid prototyping and production deployments.

chroma

Orchestra-Research/AI-Research-SKILLs

Open-source embedding database for AI applications. Store embeddings and metadata, perform vector and full-text search, filter by metadata. Simple 4-function API. Scales from notebooks to production clusters. Use for semantic search, RAG applications, or document retrieval. Best for local development and open-source projects.

nemo-curator

Orchestra-Research/AI-Research-SKILLs

GPU-accelerated data curation for LLM training. Supports text/image/video/audio. Features fuzzy deduplication (16× faster), quality filtering (30+ heuristics), semantic deduplication, PII redaction, NSFW detection. Scales across GPUs with RAPIDS. Use for preparing high-quality training datasets, cleaning web data, or deduplicating large corpora.

qdrant-vector-search

Orchestra-Research/AI-Research-SKILLs

High-performance vector similarity search engine for RAG and semantic search. Use when building production RAG systems requiring fast nearest neighbor search, hybrid search with filtering, or scalable vector storage with Rust-powered performance.
