Get the FREE Ultimate OpenClaw Setup Guide →
npx machina-cli add skill Orchestra-Research/AI-Research-SKILLs/sentence-transformers --openclaw
Files (1)
SKILL.md
6.2 KB

Sentence Transformers - State-of-the-Art Embeddings

Python framework for sentence and text embeddings using transformers.

When to use Sentence Transformers

Use when:

  • Need high-quality embeddings for RAG
  • Semantic similarity and search
  • Text clustering and classification
  • Multilingual embeddings (100+ languages)
  • Running embeddings locally (no API)
  • Cost-effective alternative to OpenAI embeddings

Metrics:

  • 15,700+ GitHub stars
  • 5000+ pre-trained models
  • 100+ languages supported
  • Based on PyTorch/Transformers

Use alternatives instead:

  • OpenAI Embeddings: Need API-based, highest quality
  • Instructor: Task-specific instructions
  • Cohere Embed: Managed service

Quick start

Installation

pip install sentence-transformers

Basic usage

from sentence_transformers import SentenceTransformer

# Load model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate embeddings
sentences = [
    "This is an example sentence",
    "Each sentence is converted to a vector"
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 384)

# Cosine similarity
from sentence_transformers.util import cos_sim
similarity = cos_sim(embeddings[0], embeddings[1])
print(f"Similarity: {similarity.item():.4f}")

Popular models

General purpose

# Fast, good quality (384 dim)
model = SentenceTransformer('all-MiniLM-L6-v2')

# Better quality (768 dim)
model = SentenceTransformer('all-mpnet-base-v2')

# Best quality (1024 dim, slower)
model = SentenceTransformer('all-roberta-large-v1')

Multilingual

# 50+ languages
model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')

# 100+ languages
model = SentenceTransformer('paraphrase-multilingual-mpnet-base-v2')

Domain-specific

# Legal domain
model = SentenceTransformer('nlpaueb/legal-bert-base-uncased')

# Scientific papers
model = SentenceTransformer('allenai/specter')

# Code
model = SentenceTransformer('microsoft/codebert-base')

Semantic search

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

# Corpus
corpus = [
    "Python is a programming language",
    "Machine learning uses algorithms",
    "Neural networks are powerful"
]

# Encode corpus
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

# Query
query = "What is Python?"
query_embedding = model.encode(query, convert_to_tensor=True)

# Find most similar
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)
print(hits)

Similarity computation

# Cosine similarity
similarity = util.cos_sim(embedding1, embedding2)

# Dot product
similarity = util.dot_score(embedding1, embedding2)

# Pairwise cosine similarity
similarities = util.cos_sim(embeddings, embeddings)

Batch encoding

# Efficient batch processing
sentences = ["sentence 1", "sentence 2", ...] * 1000

embeddings = model.encode(
    sentences,
    batch_size=32,
    show_progress_bar=True,
    convert_to_tensor=False  # or True for PyTorch tensors
)

Fine-tuning

from sentence_transformers import InputExample, losses
from torch.utils.data import DataLoader

# Training data
train_examples = [
    InputExample(texts=['sentence 1', 'sentence 2'], label=0.8),
    InputExample(texts=['sentence 3', 'sentence 4'], label=0.3),
]

train_dataloader = DataLoader(train_examples, batch_size=16)

# Loss function
train_loss = losses.CosineSimilarityLoss(model)

# Train
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=10,
    warmup_steps=100
)

# Save
model.save('my-finetuned-model')

LangChain integration

from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)

# Use with vector stores
from langchain_chroma import Chroma

vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=embeddings
)

LlamaIndex integration

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-mpnet-base-v2"
)

from llama_index.core import Settings
Settings.embed_model = embed_model

# Use in index
index = VectorStoreIndex.from_documents(documents)

Model selection guide

ModelDimensionsSpeedQualityUse Case
all-MiniLM-L6-v2384FastGoodGeneral, prototyping
all-mpnet-base-v2768MediumBetterProduction RAG
all-roberta-large-v11024SlowBestHigh accuracy needed
paraphrase-multilingual768MediumGoodMultilingual

Best practices

  1. Start with all-MiniLM-L6-v2 - Good baseline
  2. Normalize embeddings - Better for cosine similarity
  3. Use GPU if available - 10× faster encoding
  4. Batch encoding - More efficient
  5. Cache embeddings - Expensive to recompute
  6. Fine-tune for domain - Improves quality
  7. Test different models - Quality varies by task
  8. Monitor memory - Large models need more RAM

Performance

ModelSpeed (sentences/sec)MemoryDimension
MiniLM~2000120MB384
MPNet~600420MB768
RoBERTa~3001.3GB1024

Resources

Source

git clone https://github.com/Orchestra-Research/AI-Research-SKILLs/blob/main/15-rag/sentence-transformers/SKILL.mdView on GitHub

Overview

Sentence Transformers is a Python framework for producing high-quality embeddings for sentences, text, and images using transformer models. It offers 5000+ pre-trained models for semantic similarity, clustering, and retrieval, with multilingual and multimodal options, suitable for production embedding generation. It’s ideal for RAG, semantic search, and similarity tasks.

How This Skill Works

The library uses PyTorch/Transformers to load pretrained models, encodes inputs into fixed-length vectors, and provides utilities for similarity measures and semantic search. Users can batch-encode data, compare embeddings with cosine or dot-product, and run semantic searches over a corpus or knowledge base.

When to Use It

  • Build a RAG pipeline for knowledge bases and documentation
  • Perform semantic similarity or fuzzy matching across large text corpora
  • Cluster or classify documents based on content
  • Generate multilingual embeddings for cross-language search and retrieval
  • Run embeddings locally in production to avoid API costs and maintain privacy

Quick Start

  1. Step 1: Install the library with pip install sentence-transformers
  2. Step 2: Load a model and encode text: model = SentenceTransformer('all-MiniLM-L6-v2'); embeddings = model.encode(['Example sentence'])
  3. Step 3: Use similarity or search utilities: from sentence_transformers import util; sim = util.cos_sim(embeddings[0], embeddings[0])

Best Practices

  • Select a model that balances speed, dimensionality, and quality for your task
  • Fine-tune or choose domain-specific models when possible
  • Use batch encoding to improve throughput on large datasets
  • Prefer local in-house embeddings for sensitive data and privacy
  • Regularly evaluate and refresh embeddings with updated pre-trained models

Example Use Cases

  • RAG over a corporate knowledge base to answer employee questions
  • Semantic search over an e-commerce catalog to improve product discovery
  • Clustering customer feedback to identify common themes
  • Multilingual retrieval for cross-language customer support articles
  • Multimodal retrieval by matching images with descriptive captions

Frequently Asked Questions

Add this skill to your agents

Related Skills

llamaindex

Orchestra-Research/AI-Research-SKILLs

Data framework for building LLM applications with RAG. Specializes in document ingestion (300+ connectors), indexing, and querying. Features vector indices, query engines, agents, and multi-modal support. Use for document Q&A, chatbots, knowledge retrieval, or building RAG pipelines. Best for data-centric LLM applications.

faiss

Orchestra-Research/AI-Research-SKILLs

Facebook's library for efficient similarity search and clustering of dense vectors. Supports billions of vectors, GPU acceleration, and various index types (Flat, IVF, HNSW). Use for fast k-NN search, large-scale vector retrieval, or when you need pure similarity search without metadata. Best for high-performance applications.

langchain

Orchestra-Research/AI-Research-SKILLs

Framework for building LLM-powered applications with agents, chains, and RAG. Supports multiple providers (OpenAI, Anthropic, Google), 500+ integrations, ReAct agents, tool calling, memory management, and vector store retrieval. Use for building chatbots, question-answering systems, autonomous agents, or RAG applications. Best for rapid prototyping and production deployments.

langsmith-observability

Orchestra-Research/AI-Research-SKILLs

LLM observability platform for tracing, evaluation, and monitoring. Use when debugging LLM applications, evaluating model outputs against datasets, monitoring production systems, or building systematic testing pipelines for AI applications.

llama-factory

Orchestra-Research/AI-Research-SKILLs

Expert guidance for fine-tuning LLMs with LLaMA-Factory - WebUI no-code, 100+ models, 2/3/4/5/6/8-bit QLoRA, multimodal support

chroma

Orchestra-Research/AI-Research-SKILLs

Open-source embedding database for AI applications. Store embeddings and metadata, perform vector and full-text search, filter by metadata. Simple 4-function API. Scales from notebooks to production clusters. Use for semantic search, RAG applications, or document retrieval. Best for local development and open-source projects.

Sponsor this space

Reach thousands of developers