Hugging Face Transformers
npx machina-cli add skill muhammederem/chief/huggingface --openclaw
Overview
Hugging Face Transformers is a library providing pre-trained models for Natural Language Processing (NLP), Computer Vision, and Audio tasks. It supports PyTorch, TensorFlow, and JAX.
Installation
pip install transformers datasets evaluate accelerate
# For specific model types
pip install transformers[sentencepiece] # For tokenizers like SentencePiece
Core Components
Model Loading
from transformers import AutoModel, AutoTokenizer
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
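With the model and tokenizer loaded, a forward pass returns contextual hidden states. A minimal sketch (the hidden size of 768 is specific to bert-base-uncased):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Hello, world!", return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)

# One hidden vector per input token: (batch, seq_len, hidden_size)
print(outputs.last_hidden_state.shape)
```

`AutoModel` returns the bare encoder; for a task-specific head, use the matching class (e.g. `AutoModelForSequenceClassification`), as shown in later sections.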
Pipeline API
from transformers import pipeline
# Text classification
classifier = pipeline("sentiment-analysis")
result = classifier("I love this product!")
# Question answering
qa = pipeline("question-answering")
result = qa(question="What is AI?", context="Artificial intelligence is...")
# Text generation
generator = pipeline("text-generation", model="gpt2")
result = generator("Once upon a time")
# Named entity recognition
ner = pipeline("ner", aggregation_strategy="simple")
result = ner("Apple is looking at buying U.K. startup")
Tokenization
Basic Usage
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Single text
tokens = tokenizer("Hello, world!")
print(tokens) # {'input_ids': [...], 'attention_mask': [...]}
# Multiple texts
tokens = tokenizer(["Hello", "World"], padding=True, truncation=True)
# Decode
text = tokenizer.decode(tokens["input_ids"][0])
Advanced Tokenization
# With return tensors
tokens = tokenizer(
"Text here",
padding="max_length",
truncation=True,
max_length=512,
return_tensors="pt" # Return PyTorch tensors
)
# Slow vs fast tokenizers
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=True)
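To see what the tokenizer actually produces, you can inspect the subword pieces and round-trip IDs back to tokens (output shown is for bert-base-uncased's WordPiece vocabulary):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Subword pieces the model actually consumes
pieces = tokenizer.tokenize("Tokenization splits rare words into subwords.")
print(pieces)

# Round-trip ids -> tokens shows the special [CLS]/[SEP] markers
ids = tokenizer("Hello!")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))
```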
Fine-Tuning
Prepare Dataset
from datasets import load_dataset
dataset = load_dataset("glue", "mrpc")
# Tokenize
def tokenize_function(examples):
    return tokenizer(
        examples["sentence1"],
        examples["sentence2"],
        padding="max_length",
        truncation=True,
        max_length=128,
    )
tokenized_datasets = dataset.map(tokenize_function, batched=True)
Training with Trainer API
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer
model = AutoModelForSequenceClassification.from_pretrained(
"bert-base-uncased",
num_labels=2
)
training_args = TrainingArguments(
output_dir="./results",
evaluation_strategy="epoch",
learning_rate=2e-5,
per_device_train_batch_size=16,
per_device_eval_batch_size=16,
num_train_epochs=3,
weight_decay=0.01,
logging_dir="./logs",
save_strategy="epoch",
load_best_model_at_end=True,
)
trainer = Trainer(
model=model,
args=training_args,
train_dataset=tokenized_datasets["train"],
eval_dataset=tokenized_datasets["validation"],
)
trainer.train()
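Trainer can also report metrics during evaluation if you pass a compute_metrics function (supply it as compute_metrics=compute_metrics when constructing the Trainer). A minimal numpy-only sketch, with a synthetic sanity check:

```python
import numpy as np

def compute_metrics(eval_pred):
    # Trainer passes (logits, labels); take the argmax class per example
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": float((preds == labels).mean())}

# Sanity check on fake logits for 3 examples, 2 classes
fake_logits = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
fake_labels = np.array([1, 0, 0])
print(compute_metrics((fake_logits, fake_labels)))  # accuracy = 2/3
```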
Training with Custom Loop
from torch.optim import AdamW  # transformers' own AdamW is deprecated
from torch.utils.data import DataLoader
from transformers import DataCollatorWithPadding, get_linear_schedule_with_warmup

optimizer = AdamW(model.parameters(), lr=2e-5)
# Keep only the columns the model expects, and collate with dynamic padding
train_dataset = tokenized_datasets["train"].remove_columns(["sentence1", "sentence2", "idx"])
train_dataset = train_dataset.rename_column("label", "labels")
train_dataset.set_format("torch")
dataloader = DataLoader(
    train_dataset,
    batch_size=16,
    shuffle=True,
    collate_fn=DataCollatorWithPadding(tokenizer),
)
num_epochs = 3
num_training_steps = num_epochs * len(dataloader)
scheduler = get_linear_schedule_with_warmup(
optimizer,
num_warmup_steps=0,
num_training_steps=num_training_steps
)
model.train()
for epoch in range(num_epochs):
    for batch in dataloader:
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
Parameter-Efficient Fine-Tuning (PEFT)
LoRA (Low-Rank Adaptation)
from peft import LoraConfig, get_peft_model
peft_config = LoraConfig(
task_type="SEQ_CLS",
inference_mode=False,
r=8,
lora_alpha=32,
lora_dropout=0.1,
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
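The idea behind LoRA can be sketched in plain PyTorch: freeze the original weight W and train only a low-rank update B @ A, so trainable parameters drop from d*d to 2*r*d. This toy module is illustrative only, not the peft implementation:

```python
import torch
import torch.nn as nn

class ToyLoRALinear(nn.Module):
    def __init__(self, d: int, r: int, alpha: float = 32.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d, d), requires_grad=False)  # frozen W
        self.lora_A = nn.Parameter(torch.randn(r, d) * 0.01)  # trainable, rank r
        self.lora_B = nn.Parameter(torch.zeros(d, r))         # trainable, zero-init
        self.scaling = alpha / r

    def forward(self, x):
        # Frozen path W x plus the scaled low-rank update B(Ax)
        return x @ self.weight.T + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

layer = ToyLoRALinear(d=768, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable} / {total}")  # 12,288 of 602,112 parameters
```

Because lora_B starts at zero, the module initially computes exactly the frozen layer's output, which is the same initialization trick peft uses.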
QLoRA (Quantized LoRA)
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Requires the bitsandbytes package: pip install bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # use the transformers-format ("-hf") checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
Model Architectures
BERT-Based Models
- BERT: Bidirectional Encoder Representations from Transformers
- RoBERTa: Optimized BERT training
- DistilBERT: Smaller, faster BERT
- ALBERT: A Lite BERT
GPT-Based Models
- GPT-2, GPT-3: Autoregressive language models
- Llama 2: Open-source LLM from Meta
- Mistral: Efficient open-source LLM
T5-Based Models
- T5: Text-to-Text Transfer Transformer
- FLAN-T5: Instruction-tuned T5
Vision Models
- ViT: Vision Transformer
- Swin: Swin Transformer
- CLIP: Contrastive Language-Image Pre-training
Common Tasks
Text Classification
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained(
"bert-base-uncased",
num_labels=3 # For 3-class classification
)
Question Answering
from transformers import AutoModelForQuestionAnswering
model = AutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
Summarization
from transformers import pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summary = summarizer(article_text, max_length=130, min_length=30)
Translation
translator = pipeline("translation_en_to_de", model="Helsinki-NLP/opus-mt-en-de")
result = translator("Hello, how are you?")
Text Generation
generator = pipeline("text-generation", model="gpt2")
generated = generator(
"The future of AI is",
max_length=100,
num_return_sequences=3,
temperature=0.7,
)
Model Hub Integration
Upload Model
from huggingface_hub import login, upload_folder
login(token="your_token_here")
model.push_to_hub("your-username/your-model-name")
tokenizer.push_to_hub("your-username/your-model-name")
Load from Hub
model = AutoModel.from_pretrained("username/model-name")
Model Cards
Always include a model card with:
- Model description
- Training data
- Intended uses
- Limitations
- Ethical considerations
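One way to draft a card programmatically is with huggingface_hub's ModelCard helpers, which render structured metadata as the YAML front matter the Hub expects. A minimal sketch (the title and section text are placeholders):

```python
from huggingface_hub import ModelCard, ModelCardData

# Structured metadata that renders as YAML front matter on the Hub
card_data = ModelCardData(
    language="en",
    license="apache-2.0",
    tags=["text-classification"],
)
content = f"""---
{card_data.to_yaml()}
---

# My Fine-Tuned Model

Description, training data, intended uses, limitations,
and ethical considerations go here.
"""
card = ModelCard(content)
print(card.data.license)
```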
Best Practices
1. Use the Right Model for the Task
- Classification: BERT, RoBERTa
- Generation: GPT, Llama, Mistral
- QA: BERT-large, RoBERTa-large
- Summarization: BART, T5
2. Handle Long Sequences
# Simplest option: truncate to the model's maximum length
from transformers import pipeline
classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
results = classifier(long_text, truncation=True, max_length=512)
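Truncation discards everything past max_length. The sliding-window alternative splits the token sequence into overlapping chunks and aggregates per-chunk predictions. A minimal sketch of the windowing itself, in pure Python over a list of token IDs (the stride value and the aggregation strategy are choices, not library defaults):

```python
def sliding_windows(token_ids, max_length=512, stride=128):
    """Split token_ids into chunks of max_length, overlapping by stride tokens."""
    if len(token_ids) <= max_length:
        return [token_ids]
    step = max_length - stride
    return [token_ids[i:i + max_length] for i in range(0, len(token_ids) - stride, step)]

chunks = sliding_windows(list(range(1000)), max_length=512, stride=128)
print([len(c) for c in chunks])  # [512, 512, 232] -- every token appears in some chunk
```

Classify each chunk separately, then combine the results (for example by averaging scores or majority vote).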
3. Dynamic Padding
from transformers import DataCollatorWithPadding
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
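The collator pads each batch only to that batch's longest sequence rather than a global max_length, which speeds up training on mostly short inputs. A quick sketch of the effect:

```python
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Two encodings of different lengths, padded together on the fly
features = [tokenizer("short"), tokenizer("a noticeably longer input sentence")]
batch = data_collator(features)
print(batch["input_ids"].shape)  # both rows padded to the longer length
```

Pass it to Trainer via data_collator=data_collator, together with a dataset tokenized without padding="max_length".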
4. Evaluation Metrics
import evaluate
accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")
import numpy as np
output = trainer.predict(tokenized_datasets["validation"])
preds = np.argmax(output.predictions, axis=-1)
metrics = {
    **accuracy.compute(predictions=preds, references=output.label_ids),
    **f1.compute(predictions=preds, references=output.label_ids),
}
5. Save and Load
# Save
model.save_pretrained("./my-model")
tokenizer.save_pretrained("./my-model")
# Load
model = AutoModel.from_pretrained("./my-model")
tokenizer = AutoTokenizer.from_pretrained("./my-model")
Performance Optimization
Flash Attention
import torch
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
)
BetterTransformer
# Requires the optimum package: pip install optimum
from optimum.bettertransformer import BetterTransformer
model = BetterTransformer.transform(model)
torch.compile (PyTorch 2.0+)
import torch
model = torch.compile(model)
Integration
- LangChain: Use Hugging Face models in LLM applications
- Vector Databases: Generate embeddings for semantic search
- MLflow: Track training experiments
- SageMaker: Deploy at scale
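For the vector-database use case, a common recipe is mean pooling over the last hidden states to get a fixed-size sentence embedding. A sketch assuming bert-base-uncased (dedicated sentence-embedding models usually give better retrieval quality):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (batch, seq_len, 768)
    # Mean-pool over real tokens only, using the attention mask
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

vecs = embed(["a cute kitten", "a small cat", "quarterly tax filings"])
sims = torch.nn.functional.cosine_similarity(vecs[0:1], vecs[1:], dim=-1)
print(sims)  # similarity to "a cute kitten"; the cat sentence should score higher
```

The resulting vectors can be stored in any vector database and queried by cosine similarity.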
Source
https://github.com/muhammederem/chief/blob/main/.claude/skills/ml-ai/huggingface/SKILL.md
Overview
Hugging Face Transformers is a library providing pre-trained models for Natural Language Processing (NLP), Computer Vision, and Audio tasks, with support for PyTorch, TensorFlow, and JAX. It offers model loading, tokenization, pipelines, and fine-tuning utilities to accelerate AI development.
How This Skill Works
Models are loaded with AutoModel and AutoTokenizer, then applied via the Pipeline API for tasks like sentiment analysis, question answering, text generation, or named entity recognition. You can fine-tune using the Trainer API or a custom training loop, and explore parameter-efficient tuning with PEFT such as LoRA.
When to Use It
- You need quick NLP tasks like sentiment analysis, QA, or NER using pre-trained models via simple pipelines
- You want to fine-tune a model on a custom dataset using the Trainer API
- You need tokenization and preprocessing with AutoTokenizer for consistent input
- You want end-to-end workflows for NLP, CV, or audio tasks using the Pipeline API
- You want to experiment with parameter-efficient fine-tuning using PEFT like LoRA
Quick Start
- Step 1: Install transformers along with datasets, evaluate, and accelerate
- Step 2: Load a model and tokenizer with AutoModel/AutoTokenizer or use a ready-made pipeline
- Step 3: Run a simple inference or fine-tune with Trainer on a tokenized dataset
Best Practices
- Choose the right pipeline for the task: sentiment-analysis, question-answering, text-generation, or ner
- Load models with AutoModel and AutoTokenizer to ensure compatibility with the chosen weights
- Prefer fast tokenizers when available to speed up preprocessing
- Use datasets.load_dataset to prepare data and apply a tokenization function during mapping
- Fine-tune with Trainer or a custom loop and save the best model at checkpoints
Example Use Cases
- Text classification with the sentiment-analysis pipeline on product reviews
- Question answering over documents using the question-answering pipeline
- Text generation with a GPT-2 style model for story prompts
- Named entity recognition on news articles with the ner pipeline
- Fine-tuning a BERT model on the GLUE MRPC dataset using the Trainer API