huggingface
npx machina-cli add skill G1Joshi/Agent-Skills/huggingface --openclaw
Hugging Face
Hugging Face is the GitHub of AI, hosting 1M+ models. 2025 is seeing massive growth in multimodal models and robotics (LeRobot).
When to Use
- Model Discovery: Finding the SOTA open-source model for any task.
- Inference: The transformers library is the standard way to run models in Python.
- Datasets: Accessing standard datasets (load_dataset('squad')).
Core Concepts
Transformers Library
The API to download and run models, e.g. pipeline('sentiment-analysis').
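The pipeline call above can be sketched as follows (assumes `pip install transformers`; the first call downloads a default sentiment model from the Hub and caches it):

```python
from transformers import pipeline  # pip install transformers

# pipeline() picks a default model for the task and caches it locally.
classifier = pipeline("sentiment-analysis")

result = classifier("Hugging Face makes running models easy!")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```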
Hugging Face Hub (Hugging Face CLI)
Versioned, git-based storage for large model weights (via git lfs).
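A sketch of fetching a single file from a Hub repo with the `huggingface_hub` client (assumes `pip install huggingface_hub`; the repo id is a real public model used for illustration):

```python
from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Large weights live in git LFS on the Hub, but the client downloads files
# over HTTP into the local cache, so no git checkout is needed.
path = hf_hub_download(repo_id="bert-base-uncased", filename="config.json")
print(path)  # local path under the Hugging Face cache directory
```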
Spaces
Hosting simple Gradio/Streamlit apps for model demos.
Best Practices (2025)
Do:
- Use bitsandbytes: Load 70B models in 4-bit precision easily.
- Use accelerate: For multi-GPU training/inference distributed across devices.
- Push to Hub: Share your fine-tunes.
Don't:
- Don't hardcode paths: Use from_pretrained("repo/id") to auto-cache models.
References
Source
https://github.com/G1Joshi/Agent-Skills/blob/main/skills/ai-ml/huggingface/SKILL.md
Overview
Hugging Face provides access to a vast ecosystem of NLP models via the Hub and the transformers library. It enables model discovery, fast Python-based inference, and standardized datasets, making it central to modern AI workflows.
How This Skill Works
Install and import transformers to download and run models via simple APIs like pipeline. The Hub handles versioning and large weights with git LFS, while Spaces lets you deploy quick demos for models.
When to Use It
- Finding SOTA open-source models for a task
- Running models in Python with the transformers library
- Accessing standard datasets with load_dataset
- Prototyping model demos via Spaces (Gradio/Streamlit)
- Sharing fine-tuned models to the Hub
Quick Start
- Step 1: Install transformers, datasets, and huggingface_hub (and accelerate if needed)
- Step 2: Load a model with from_pretrained('repo/id') or use pipeline('sentiment-analysis')
- Step 3: Optional: push your fine-tuned model to the Hub with push_to_hub and explore datasets/spaces
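The steps above can be sketched end to end. The base model is a small real checkpoint; the push_to_hub calls are left commented because they need a write token (`huggingface-cli login`), and the target repo name is purely illustrative:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Step 2: load a base model and tokenizer from the Hub (auto-cached).
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# ...fine-tune here...

# Step 3: share the fine-tune (requires huggingface-cli login; repo name
# below is illustrative).
# model.push_to_hub("your-username/distilbert-reviews")
# tokenizer.push_to_hub("your-username/distilbert-reviews")
```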
Best Practices
- Use bitsandbytes to load 70B models in 4-bit precision
- Use accelerate for multi-GPU training/inference distributed across devices
- Push to Hub to share your fine-tunes
- Use from_pretrained("repo/id") to auto-cache models
- Leverage datasets via load_dataset to build reproducible data pipelines
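The accelerate practice above can be sketched with device placement handled automatically. A small model stands in for a large one here; `device_map="auto"` assumes `pip install accelerate` and falls back to CPU when no GPU is present:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # small stand-in; the same pattern applies to large models
tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" (via accelerate) spreads layers across available devices.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Hugging Face is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```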
Example Use Cases
- Discovering a SOTA sentiment-analysis model for product reviews
- Running a 70B model in 4-bit precision with bitsandbytes for inference
- Fine-tuning a model and pushing it to the Hub for sharing
- Loading SQuAD via load_dataset('squad') for QA tasks
- Building a quick demo app on Spaces (Gradio/Streamlit)