Reinforcement Learning
(6 skills) AI agent skills tagged “Reinforcement Learning” for Claude Code, Cursor, Windsurf, and more.
grpo-rl-training
Orchestra-Research/AI-Research-SKILLs
Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training
fine-tuning-with-trl
Orchestra-Research/AI-Research-SKILLs
Fine-tune LLMs using reinforcement learning with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or want to train from human feedback. Works with HuggingFace Transformers.
torchforge-rl-training
Orchestra-Research/AI-Research-SKILLs
Provides guidance for PyTorch-native agentic RL using torchforge, Meta's library separating infra from algorithms. Use when you want clean RL abstractions, easy algorithm experimentation, or scalable training with Monarch and TorchTitan.
slime-rl-training
Orchestra-Research/AI-Research-SKILLs
Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.
verl-rl-training
Orchestra-Research/AI-Research-SKILLs
Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.
miles-rl-training
Orchestra-Research/AI-Research-SKILLs
Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.