PyTorch

npx machina-cli add skill muhammederem/chief/pytorch --openclaw

PyTorch Deep Learning Framework

Overview

PyTorch is an open-source machine learning library developed by Facebook's AI Research lab. It provides tensor computation with GPU acceleration and deep neural networks built on a tape-based automatic differentiation system.

Key Features

Dynamic Computation Graphs

PyTorch uses dynamic computational graphs that are built on-the-fly, making debugging easier and enabling more flexible model architectures.
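As a minimal sketch of what "built on-the-fly" means (the DynamicNet model here is illustrative, not part of the skill), ordinary Python control flow in forward can depend on the data itself, and the graph is rebuilt on every call:

```python
import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        # The number of passes is decided at runtime from the input,
        # which is only possible because the graph is rebuilt each call.
        for _ in range(int(x.abs().sum().item()) % 3 + 1):
            x = torch.relu(self.linear(x))
        return x

model = DynamicNet()
out = model(torch.randn(2, 4))
```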

GPU Acceleration

Seamless CUDA integration for GPU-accelerated computing:

import torch

# Check if CUDA is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tensor = torch.randn(1000, 1000).to(device)

Automatic Differentiation

Autograd system for automatic computation of gradients:

x = torch.randn(3, requires_grad=True)
y = x * 2
while y.norm() < 1000:
    y = y * 2
# y is non-scalar, so autograd.grad needs grad_outputs for the vector-Jacobian product
gradients = torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y))

Model Design Patterns

Basic Model Structure

import torch.nn as nn
import torch.nn.functional as F

class NeuralNetwork(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNetwork, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

Convolutional Neural Networks

class CNN(nn.Module):
    def __init__(self, num_classes=10):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(64 * 8 * 8, 512)  # assumes 32x32 inputs: two 2x2 pools give 8x8 maps
        self.fc2 = nn.Linear(512, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 64 * 8 * 8)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

Transfer Learning

import torchvision.models as models

# Load pretrained weights (the boolean `pretrained` flag is deprecated in recent torchvision)
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Freeze early layers
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer; num_classes comes from your target dataset
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, num_classes)

Training Best Practices

Training Loop Template

def train_model(model, train_loader, val_loader, criterion, optimizer, num_epochs, device):
    model = model.to(device)
    best_val_loss = float('inf')

    for epoch in range(num_epochs):
        # Training phase
        model.train()
        train_loss = 0.0

        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            train_loss += loss.item()

        # Validation phase
        model.eval()
        val_loss = 0.0
        correct = 0
        total = 0

        with torch.no_grad():
            for inputs, labels in val_loader:
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs)
                loss = criterion(outputs, labels)
                val_loss += loss.item()

                _, predicted = outputs.max(1)
                total += labels.size(0)
                correct += predicted.eq(labels).sum().item()

        # Save best model
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            torch.save(model.state_dict(), 'best_model.pth')

        print(f'Epoch {epoch+1}/{num_epochs}')
        print(f'Train Loss: {train_loss/len(train_loader):.4f}')
        print(f'Val Loss: {val_loss/len(val_loader):.4f}')
        print(f'Val Acc: {100.*correct/total:.2f}%')

    return model

Optimizer Choice

  • Adam: Sensible default for most tasks (lr=0.001)
  • AdamW: Decoupled weight decay; common choice for transformers (lr=1e-4)
  • SGD with Momentum: Often generalizes better on vision tasks (lr=0.1, momentum=0.9)
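These choices can be sketched as follows; the nn.Linear model is a stand-in, and the hyperparameters mirror the list above:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in model for illustration

# Adam: common default
adam = torch.optim.Adam(model.parameters(), lr=0.001)

# AdamW: decoupled weight decay, common for transformers
adamw = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

# SGD with momentum: often better generalization on vision tasks
sgd = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
```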

Learning Rate Scheduling

# Reduce on plateau
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=5
)

# Cosine annealing
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=num_epochs
)

# One cycle learning
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.01, epochs=num_epochs, steps_per_epoch=len(train_loader)
)
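One detail the snippets above leave implicit is where scheduler.step() is called: ReduceLROnPlateau steps once per epoch and must be passed the monitored metric, while OneCycleLR steps once per batch. A sketch with a stand-in model and a made-up validation loss:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# ReduceLROnPlateau: call once per epoch, passing the metric being monitored
plateau = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=5
)
plateau.step(0.42)  # e.g. this epoch's validation loss

# OneCycleLR: call once per *batch*; total_steps must cover the whole run
onecycle = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.01, total_steps=100
)
onecycle.step()
```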

Data Loading

Custom Dataset

from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    def __init__(self, data, targets, transform=None):
        self.data = data
        self.targets = targets
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sample = self.data[idx]
        label = self.targets[idx]

        if self.transform:
            sample = self.transform(sample)

        return sample, label

# Create data loaders
train_dataset = CustomDataset(train_data, train_labels, transform=train_transform)
train_loader = DataLoader(
    train_dataset,
    batch_size=32,
    shuffle=True,
    num_workers=4,
    pin_memory=True
)

Data Augmentation

from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

Performance Optimization

Mixed Precision Training

from torch.cuda.amp import autocast, GradScaler  # newer releases: torch.amp

scaler = GradScaler()

for inputs, labels in train_loader:
    inputs, labels = inputs.to(device), labels.to(device)

    optimizer.zero_grad()  # gradients must be cleared each step
    with autocast():
        outputs = model(inputs)
        loss = criterion(outputs, labels)

    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

Gradient Accumulation

accumulation_steps = 4
optimizer.zero_grad()

for i, (inputs, labels) in enumerate(train_loader):
    inputs, labels = inputs.to(device), labels.to(device)

    with autocast():
        outputs = model(inputs)
        loss = criterion(outputs, labels) / accumulation_steps

    scaler.scale(loss).backward()

    if (i + 1) % accumulation_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()

Gradient Clipping

torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
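Clipping belongs between backward() and step(); when using a GradScaler, call scaler.unscale_(optimizer) first so the norm is computed on true gradients. A sketch of a single step, with a stand-in model and random data:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 1)  # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(4, 8), torch.randn(4, 1)
loss = nn.functional.mse_loss(model(x), y)

optimizer.zero_grad()
loss.backward()
# Clip after backward(), before step(); returns the (pre-clipping) total norm
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```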

Checkpointing

Save Checkpoint

checkpoint = {
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'scheduler_state_dict': scheduler.state_dict(),
    'loss': loss,
}
torch.save(checkpoint, 'checkpoint.pth')

Load Checkpoint

checkpoint = torch.load('checkpoint.pth', map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
scheduler.load_state_dict(checkpoint['scheduler_state_dict'])
start_epoch = checkpoint['epoch'] + 1

Common Issues and Solutions

Out of Memory

  • Reduce batch size
  • Use gradient accumulation
  • Enable gradient checkpointing: torch.utils.checkpoint (Hugging Face models expose model.gradient_checkpointing_enable())
  • Clear cache: torch.cuda.empty_cache()
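For plain PyTorch modules, activation checkpointing is available through torch.utils.checkpoint: activations inside the wrapped block are recomputed during backward instead of stored. A minimal sketch with a toy stand-in block:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Trade compute for memory: activations inside `block` are recomputed
# on the backward pass rather than kept in memory.
block = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16))

x = torch.randn(4, 16, requires_grad=True)
out = checkpoint(block, x, use_reentrant=False)
out.sum().backward()
```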

Slow Training

  • Use pin_memory=True in DataLoader
  • Increase num_workers in DataLoader
  • Enable mixed precision training
  • Use multiple GPUs with DistributedDataParallel (preferred over DataParallel)

Overfitting

  • Add data augmentation
  • Use dropout: nn.Dropout(0.5)
  • Add L2 regularization via weight decay in optimizer
  • Early stopping based on validation loss
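A sketch combining two of these remedies, dropout and L2 regularization via weight decay (the model is a toy stand-in):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(32, 64),
    nn.ReLU(),
    nn.Dropout(0.5),  # randomly zeroes activations, during training only
    nn.Linear(64, 10),
)
# weight_decay adds L2 regularization on the parameters
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

model.eval()  # dropout becomes a no-op in eval mode
```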

Best Practices Summary

  1. Always use model.eval() for inference and model.train() for training
  2. Use torch.no_grad() context manager during inference
  3. Pin memory (pin_memory=True) for faster GPU transfer
  4. Use mixed precision training for modern GPUs
  5. Save checkpoints regularly with validation metrics
  6. Use learning rate schedulers instead of manual decay
  7. Normalize data using dataset statistics
  8. Set random seeds for reproducibility:
    torch.manual_seed(42)
    torch.cuda.manual_seed_all(42)
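Points 1 and 2 above can be combined into a minimal inference sketch (the model is a stand-in for a trained network):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 3)  # stand-in for a trained model
model.eval()              # disables dropout/batchnorm training behavior

with torch.no_grad():     # no autograd graph is built, saving memory and time
    logits = model(torch.randn(5, 10))
    preds = logits.argmax(dim=1)
```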
    

Integration Points

  • Vector Databases: Store trained embeddings
  • Hugging Face: Load pretrained transformers
  • MLflow: Track experiments and metrics
  • SageMaker: Distributed training
  • FastAPI: Model serving endpoints

Source

git clone https://github.com/muhammederem/chief
# Skill file: .claude/skills/ml-ai/pytorch/SKILL.md

How This Skill Works

PyTorch uses dynamic computation graphs that are built on-the-fly, enabling flexible model designs and easier debugging. It integrates seamless CUDA support for GPU acceleration and includes the Autograd system for automatic gradient computation.

When to Use It

  • Prototype ideas quickly with dynamic graphs when model architectures are experimental.
  • Train models on GPUs to speed up computation with CUDA integration.
  • Fine-tune pretrained models via transfer learning to adapt to new tasks.
  • Build common architectures like feedforward nets and CNNs using nn modules.
  • Iterate with a clear training loop using a reusable template.

Quick Start

  1. Install PyTorch and set up CUDA if available.
  2. Define a model using nn.Module (for example a simple NeuralNetwork).
  3. Write a training loop with a loss function, optimizer, and device management.

Best Practices

  • Leverage dynamic graphs to debug and iterate model designs efficiently.
  • Move data and models to the correct device (CPU/GPU) for optimal performance.
  • Use the Autograd system to compute gradients and call backward() or autograd.grad.
  • Define models with nn.Module and explicit forward methods for readability.
  • Adopt a structured training loop and save the best model during training.

Example Use Cases

  • Implement a simple feedforward neural network using nn.Module and a forward pass.
  • Create a CNN with Conv2d, ReLU, pooling, and fully connected layers for image tasks.
  • Fine-tune a pretrained ResNet50 by freezing early layers and replacing the final layer.
  • Use a training loop template to train and validate a model with proper device handling.
  • Switch between CPU and CUDA contexts by moving tensors and models to the target device.
