DeepSpeed
(4 skills)
AI agent skills tagged “DeepSpeed” for Claude Code, Cursor, Windsurf, and more.
huggingface-accelerate
Orchestra-Research/AI-Research-SKILLs
A minimal distributed training API: roughly four lines of code add distributed support to any PyTorch script. One unified API covers DeepSpeed, FSDP, Megatron, and DDP, with automatic device placement and mixed precision (FP16/BF16/FP8). Interactive configuration and a single launch command. The HuggingFace ecosystem standard.
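For context, a minimal sketch of the pattern this entry describes. The toy model and random data are placeholders; only the Accelerator calls are Accelerate-specific.

```python
# Sketch: wrapping an ordinary PyTorch loop with HuggingFace Accelerate.
# The toy model and random data below are placeholders for an existing script.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()                      # 1. create the accelerator

model = nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
loader = DataLoader(dataset, batch_size=8)

model, optimizer, loader = accelerator.prepare(  # 2. wrap model/optimizer/dataloader
    model, optimizer, loader
)

for inputs, labels in loader:                    # no manual .to(device) needed
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(inputs), labels)
    accelerator.backward(loss)                   # 3. replaces loss.backward()
    optimizer.step()
```

The interactive config and single launch command mentioned above are the `accelerate config` and `accelerate launch train.py` CLI steps.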
deepspeed
Orchestra-Research/AI-Research-SKILLs
Expert guidance for distributed training with DeepSpeed: ZeRO optimization stages, pipeline parallelism, FP16/BF16/FP8 mixed precision, 1-bit Adam, and sparse attention.
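To illustrate the ZeRO and mixed-precision settings this entry lists, a hedged sketch of a DeepSpeed config passed as a Python dict. The batch size, learning rate, and stage choice are arbitrary examples, and the script assumes a CUDA device and is normally started with the `deepspeed` launcher, which sets up the distributed environment.

```python
# Sketch: ZeRO stage 2 + BF16 via a DeepSpeed config dict.
# Values here are illustrative assumptions, not a recommended production setup.
import torch
from torch import nn
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 1,
    "bf16": {"enabled": True},              # or "fp16": {"enabled": True}
    "zero_optimization": {
        "stage": 2,                         # partition optimizer state + gradients
        "overlap_comm": True,
    },
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))

# deepspeed.initialize returns an engine that owns the optimizer,
# mixed precision, and ZeRO partitioning.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

inputs = torch.randn(8, 16).to(engine.device)
labels = torch.randint(0, 2, (8,)).to(engine.device)
loss = nn.functional.cross_entropy(engine(inputs), labels)
engine.backward(loss)   # engine handles loss scaling / gradient partitioning
engine.step()
```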
moe-training
Orchestra-Research/AI-Research-SKILLs
Train Mixture of Experts (MoE) models using DeepSpeed or HuggingFace. Use when training large-scale models with limited compute (5× cost reduction vs dense models), implementing sparse architectures like Mixtral 8x7B or DeepSeek-V3, or scaling model capacity without proportional compute increase. Covers MoE architectures, routing mechanisms, load balancing, expert parallelism, and inference optimization.
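To make the routing knobs concrete, a toy Mixtral-style configuration built with HuggingFace Transformers (one of the two paths this entry mentions). All sizes are deliberately tiny illustrative values, not a training recipe.

```python
# Sketch: a tiny Mixtral-style sparse MoE model, showing the main routing
# knobs (experts per layer, top-k routing, load-balancing loss weight).
import torch
from transformers import MixtralConfig, MixtralForCausalLM

config = MixtralConfig(
    vocab_size=1000,
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=4,
    num_local_experts=8,        # experts per MoE layer (Mixtral 8x7B uses 8)
    num_experts_per_tok=2,      # top-2 routing: each token visits 2 experts
    router_aux_loss_coef=0.02,  # weight of the load-balancing auxiliary loss
    output_router_logits=True,  # include the auxiliary loss in the training loss
)
model = MixtralForCausalLM(config)

input_ids = torch.randint(0, 1000, (2, 16))
out = model(input_ids=input_ids, labels=input_ids)
print(out.loss)  # language-model loss + router load-balancing loss
```

Expert parallelism (sharding experts across GPUs) is handled by the launcher/backend, e.g. DeepSpeed-MoE, rather than by the model definition itself.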
pytorch-lightning
Orchestra-Research/AI-Research-SKILLs
High-level PyTorch framework with Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with same code. Use when you want clean training loops with built-in best practices.
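A minimal sketch of the Trainer pattern this entry describes; the module and data are toy placeholders. Switching the distributed backend is a matter of the Trainer's `strategy` argument (e.g. `strategy="deepspeed_stage_2"`), not of the module code.

```python
# Sketch: a small LightningModule plus a Trainer with a toy dataset.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class LitClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.cross_entropy(self.net(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=1e-3)

dataset = TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,)))
loader = DataLoader(dataset, batch_size=32)

# Same module scales out by changing Trainer arguments, e.g.
# pl.Trainer(devices=4, strategy="ddp") or strategy="deepspeed_stage_2".
trainer = pl.Trainer(max_epochs=1, accelerator="auto", devices=1)
trainer.fit(LitClassifier(), loader)
```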