optimization

(15 skills)

AI agent skills tagged “optimization” for Claude Code, Cursor, Windsurf, and more.

sql-optimization

chaterm/terminal-skills

SQL optimization and tuning

tuning

chaterm/terminal-skills

System tuning (version 1.0.0, author: terminal-skills; tags: performance, tuning, sysctl, kernel, optimization). Skills for kernel parameter, file system, and network optimization. Kernel parameter tuning starts with memory management in /etc/sysctl.d/99-memory.conf: vm.swappiness = 10 (reduce the tendency to swap), then dirty-page flushing with vm.dirty_ratio = 20 and vm.dirty_backg…

quantizing-models-bitsandbytes

Orchestra-Research/AI-Research-SKILLs

4.3k

Quantizes LLMs to 8-bit or 4-bit for 50-75% memory reduction with minimal accuracy loss. Use when GPU memory is limited, when you need to fit larger models, or when you want faster inference. Supports INT8, NF4, and FP4 formats, QLoRA training, and 8-bit optimizers. Works with HuggingFace Transformers.
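The 50-75% figure matches what the bit widths alone imply against a 16-bit baseline. The following is a rough sketch of that arithmetic, not part of the skill itself. It assumes the weights dominate memory and ignores quantization metadata, activations, and the KV cache. The 7B parameter count and both function names are hypothetical, chosen only for illustration.

```python
def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


def reduction_vs_fp16(bits_per_weight: int) -> float:
    """Fractional memory saving relative to 16-bit weights."""
    return 1 - bits_per_weight / 16


NUM_PARAMS = 7_000_000_000  # hypothetical 7B-parameter model

for bits in (16, 8, 4):
    print(
        f"{bits:>2}-bit: {weight_memory_gib(NUM_PARAMS, bits):.1f} GiB, "
        f"{reduction_vs_fp16(bits):.0%} smaller than FP16"
    )
```

Real savings are usually somewhat lower than this estimate, since quantized formats store per-block scaling factors and some layers are typically kept in higher precision.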

deepspeed

Orchestra-Research/AI-Research-SKILLs

4.3k

Expert guidance for distributed training with DeepSpeed - ZeRO optimization stages, pipeline parallelism, FP16/BF16/FP8, 1-bit Adam, sparse attention

awq-quantization

Orchestra-Research/AI-Research-SKILLs

4.3k

Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or for instruction-tuned and multimodal models. MLSys 2024 Best Paper Award winner.

unsloth

Orchestra-Research/AI-Research-SKILLs

4.3k

Expert guidance for fast fine-tuning with Unsloth - 2-5x faster training, 50-80% less memory, LoRA/QLoRA optimization

optimizing-attention-flash

Orchestra-Research/AI-Research-SKILLs

4.3k

Optimizes transformer attention with Flash Attention for 2-4x speedup and 10-20x memory reduction. Use when training or running transformers with long sequences (>512 tokens), when attention causes GPU memory issues, or when you need faster inference. Supports PyTorch native SDPA, the flash-attn library, H100 FP8, and sliding window attention.

gptq

Orchestra-Research/AI-Research-SKILLs

4.3k

Post-training 4-bit quantization for LLMs with minimal accuracy loss. Use for deploying large models (70B, 405B) on consumer GPUs, when you need 4× memory reduction with <2% perplexity degradation, or for faster inference (3-4× speedup) vs FP16. Integrates with transformers and PEFT for QLoRA fine-tuning.

hqq-quantization

Orchestra-Research/AI-Research-SKILLs

4.3k

Half-Quadratic Quantization for LLMs without calibration data. Use when quantizing models to 4/3/2-bit precision without needing calibration datasets, for fast quantization workflows, or when deploying with vLLM or HuggingFace Transformers.

smart-sourcing

athola/claude-night-market

197

… balancing accuracy with token efficiency.

gguf-quantization

Orchestra-Research/AI-Research-SKILLs

4.3k

GGUF format and llama.cpp quantization for efficient CPU/GPU inference. Use when deploying models on consumer hardware or Apple Silicon, or when you need flexible 2-bit to 8-bit quantization without GPU requirements.

performance-optimization

JanSzewczyk/claude-plugins

Performance optimization patterns for Next.js applications. Covers bundle analysis, React rendering optimization, database query optimization, Core Web Vitals, image optimization, and caching strategies.

response-compression

athola/claude-night-market

197

… hype, and unnecessary framing. Includes termination and directness guidelines.

Neon Egress Optimizer

openclaw/skills

Audit and optimize database queries to minimize egress (outbound data transfer) costs on Neon Postgres and other cloud databases. Use this skill whenever the user mentions high database costs, Neon billing, egress charges, slow queries, database optimization, query performance, SELECT *, overfetching, N+1 queries, caching database results, or wants to reduce data transfer from their database — even if they don't specifically say 'egress'.
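Two of the patterns named above, `SELECT *` overfetching and N+1 queries, can be illustrated with a small sketch. It is not taken from the skill. It uses Python's built-in `sqlite3` with an in-memory database as a stand-in for a remote Postgres instance, and the `authors` and `posts` tables and both function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a remote Postgres database
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT, bio TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER,
                        title TEXT, body TEXT);
""")
conn.executemany(
    "INSERT INTO authors VALUES (?, ?, ?)",
    [(i, f"author{i}", "x" * 1000) for i in range(1, 4)],
)
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?)",
    [(i, (i % 3) + 1, f"title{i}", "y" * 5000) for i in range(1, 10)],
)


def n_plus_one(conn):
    """SELECT * on posts, then one extra author query per post."""
    queries = 1
    posts = conn.execute("SELECT * FROM posts").fetchall()
    result = []
    for post in posts:
        # Each iteration pulls a full author row, including the large bio.
        author = conn.execute(
            "SELECT * FROM authors WHERE id = ?", (post[1],)
        ).fetchone()
        queries += 1
        result.append((post[2], author[1]))
    return sorted(result), queries


def single_join(conn):
    """One query that returns only the two columns the caller needs."""
    rows = conn.execute(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id"
    ).fetchall()
    return sorted(rows), 1


slow_rows, slow_queries = n_plus_one(conn)
fast_rows, fast_queries = single_join(conn)
print(f"N+1: {slow_queries} queries, join: {fast_queries} query")
```

Both functions return the same title and name pairs, but the first transfers every `body` and `bio` value and issues one round trip per post, which is the kind of traffic that drives up egress on a metered database.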

claude-md-optimizer

smith-horn/product-builder-starter

Optimize oversized CLAUDE.md files using progressive disclosure. Analyzes content tiers, detects encryption constraints, creates sub-documents, and rewrites the main file with a Sub-Documentation Table. Triggers: optimize CLAUDE.md, reduce CLAUDE.md size, CLAUDE.md too long, apply progressive disclosure to CLAUDE.md