
ml-validate

npx machina-cli add skill nishide-dev/claude-code-ml-research/ml-validate --openclaw

ML Project Validation

Comprehensive validation of ML project structure, configurations, code quality, and training readiness.

Quick Start

# Run full validation
python scripts/validate_project.py

# Quick config check
python src/train.py --cfg job

# Fast dev run (1 batch train/val/test)
python src/train.py trainer.fast_dev_run=true

Validation Checks

1. Project Structure

Required directories:

  • src/ - Source code
  • src/models/ - Model implementations
  • src/data/ - DataModule implementations
  • configs/ - Hydra configuration files
  • tests/ - Unit tests (recommended)

Required files:

  • src/train.py - Training script
  • configs/config.yaml - Main config
  • pyproject.toml or pixi.toml - Package manager

Check manually:

# Verify structure
test -d src && test -d configs && echo "✓ Basic structure OK"
test -f src/train.py && echo "✓ Training script found"
test -f configs/config.yaml && echo "✓ Main config found"
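The same check can be sketched in Python, which is handy if you want it inside your own validation script. This is a minimal sketch; the path lists simply mirror the requirements above and should be adjusted to your layout.

```python
# Sketch: structure check in Python, mirroring the shell tests above.
from pathlib import Path

REQUIRED_DIRS = ["src", "src/models", "src/data", "configs"]
REQUIRED_FILES = ["src/train.py", "configs/config.yaml"]

def missing_paths(root="."):
    """Return required directories/files that are absent under root."""
    root = Path(root)
    missing = [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing

if __name__ == "__main__":
    problems = missing_paths()
    if problems:
        print("❌ Missing:", ", ".join(problems))
    else:
        print("✓ Basic structure OK")
```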

2. Configuration Validation

YAML syntax:

# Validate all YAML files
python -c "
import yaml
from pathlib import Path

for yaml_file in Path('configs').rglob('*.yaml'):
    try:
        yaml.safe_load(yaml_file.read_text())
        print(f'✓ {yaml_file}')
    except yaml.YAMLError as e:
        print(f'❌ {yaml_file}: {e}')
"

Config composition:

# Test Hydra config loads correctly
python src/train.py --cfg job

_target_ validation:

  • All _target_ paths must be importable
  • Check model, data, trainer, logger targets
  • Verify no typos in module paths

Use scripts/validate_project.py for automated checking.
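A manual sketch of that check, assuming your configs are plain YAML under configs/ (the bundled script may implement this differently):

```python
# Sketch: collect every _target_ from the configs and try to import it.
import importlib

def iter_targets(node):
    """Yield all _target_ strings found in a nested dict/list config."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "_target_" and isinstance(value, str):
                yield value
            else:
                yield from iter_targets(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_targets(item)

def target_importable(target):
    """True if 'pkg.module.Attr' resolves to an importable attribute."""
    module_path, _, attr = target.rpartition(".")
    try:
        return hasattr(importlib.import_module(module_path), attr)
    except (ImportError, ValueError):
        return False

if __name__ == "__main__":
    import yaml
    from pathlib import Path

    for yaml_file in Path("configs").rglob("*.yaml"):
        cfg = yaml.safe_load(yaml_file.read_text()) or {}
        for target in iter_targets(cfg):
            print(f"{'✓' if target_importable(target) else '❌'} {target}")
```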

3. Code Quality

Linting:

# Ruff checks
ruff check src/ tests/

# Auto-fix issues
ruff check --fix src/ tests/

Type checking:

# ty (type checker)
ty check src/

# mypy (alternative)
mypy src/ --ignore-missing-imports

Syntax validation:

# Check all files have valid Python syntax
import ast
from pathlib import Path

for py_file in Path("src").rglob("*.py"):
    try:
        ast.parse(py_file.read_text())
        print(f"✓ {py_file}")
    except SyntaxError as e:
        print(f"❌ {py_file}: {e}")

4. Dependencies

Required packages:

  • torch - PyTorch
  • pytorch_lightning - Lightning framework
  • hydra-core - Configuration management

Optional but recommended:

  • wandb - Experiment tracking
  • tensorboard - Visualization
  • torch_geometric - For GNNs
  • transformers - For NLP
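Optional packages can be probed without importing them (so a missing one doesn't raise) using importlib.util.find_spec. A small sketch, with the package list taken from above:

```python
# Sketch: report which optional packages are present.
# find_spec returns None for a missing top-level package instead of raising.
import importlib.util

OPTIONAL = ["wandb", "tensorboard", "torch_geometric", "transformers"]

def installed(name):
    """True if a top-level package is importable in this environment."""
    return importlib.util.find_spec(name) is not None

if __name__ == "__main__":
    for name in OPTIONAL:
        mark = "✓" if installed(name) else "–"
        print(f"{mark} {name}")
```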

Check installation:

python -c "
import torch
import pytorch_lightning
import hydra

print(f'PyTorch: {torch.__version__}')
print(f'Lightning: {pytorch_lightning.__version__}')
print(f'Hydra: {hydra.__version__}')
"

GPU availability:

python -c "
import torch

print(f'CUDA available: {torch.cuda.is_available()}')
if torch.cuda.is_available():
    print(f'CUDA version: {torch.version.cuda}')
    print(f'GPU count: {torch.cuda.device_count()}')
    for i in range(torch.cuda.device_count()):
        print(f'GPU {i}: {torch.cuda.get_device_name(i)}')
"

5. Data Pipeline

DataModule instantiation:

from hydra import compose, initialize_config_dir
from hydra.utils import instantiate
from pathlib import Path

# Load config
config_dir = Path.cwd() / "configs"
with initialize_config_dir(version_base=None, config_dir=str(config_dir)):
    cfg = compose(config_name="config")

# Instantiate DataModule
dm = instantiate(cfg.data)
print(f"✓ DataModule: {type(dm).__name__}")

# Test setup
dm.setup("fit")
print("✓ DataModule.setup() successful")

# Check dataloaders
train_loader = dm.train_dataloader()
print(f"✓ Train batches: {len(train_loader)}")

Data directory:

# Verify data path exists
python -c "
from omegaconf import OmegaConf
from pathlib import Path

cfg = OmegaConf.load('configs/config.yaml')
data_dir = Path(cfg.data.data_dir)

if data_dir.exists():
    print(f'✓ Data directory: {data_dir}')
    print(f'  Files: {len(list(data_dir.rglob(\"*\")))}')
else:
    print(f'⚠️  Data directory not found: {data_dir}')
"

6. Model Validation

Model instantiation:

from hydra import compose, initialize_config_dir
from hydra.utils import instantiate
from pathlib import Path

# Load config
config_dir = Path.cwd() / "configs"
with initialize_config_dir(version_base=None, config_dir=str(config_dir)):
    cfg = compose(config_name="config")

# Instantiate model
model = instantiate(cfg.model)
print(f"✓ Model: {type(model).__name__}")

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"  Total params: {total_params:,}")
print(f"  Trainable: {trainable_params:,}")

Forward pass test:

import torch

# Create dummy input (adjust for your model)
batch_size = 2
dummy_input = torch.randn(batch_size, 3, 224, 224)

# Test forward pass
model.eval()
with torch.no_grad():
    output = model(dummy_input)

print(f"✓ Forward pass OK")
print(f"  Input: {dummy_input.shape}")
print(f"  Output: {output.shape}")

7. Training Readiness

Fast dev run:

# Run 1 batch of train/val/test
python src/train.py trainer.fast_dev_run=true

# Expected output:
# - No errors
# - Completes in <1 minute
# - Shows train/val/test progress

Logger check:

from hydra import compose, initialize_config_dir
from pathlib import Path
import os

config_dir = Path.cwd() / "configs"
with initialize_config_dir(version_base=None, config_dir=str(config_dir)):
    cfg = compose(config_name="config")

if "logger" in cfg:
    print(f"✓ Logger: {cfg.logger.get('_target_', 'unknown')}")

    # Check W&B credentials if using wandb
    if "wandb" in str(cfg.logger.get("_target_", "")):
        if "WANDB_API_KEY" in os.environ:
            print("✓ W&B API key set")
        else:
            print("⚠️  W&B not logged in (run: wandb login)")

Validation Script

Use the automated validation script:

python scripts/validate_project.py

What it checks:

  • ✓ Project structure (directories & files)
  • ✓ Config YAML syntax
  • ✓ Config composition
  • ✓ _target_ paths are importable
  • ✓ Code quality (ruff)
  • ✓ Dependencies installed
  • ✓ GPU availability
  • ✓ Model instantiation
  • ✓ DataModule instantiation
  • ✓ Fast dev run

Example output:

INFO: Starting ML project validation...
INFO: ✓ Project structure valid
INFO: ✓ All configs valid
INFO: ✓ Code quality OK
INFO: ✓ All dependencies installed
INFO: ✓ Model instantiated successfully
INFO: ✓ DataModule instantiated successfully
INFO: ✓ Fast dev run completed
INFO: ✓ All validation checks passed!

See scripts/validate_project.py for implementation.

Quick Checks

One-line Validation

# Config only
python src/train.py --cfg job && echo "✓ Config OK"

# Full validation
python scripts/validate_project.py && echo "✓ All OK"

Pre-Training Checklist

# 1. Structure
test -d src && test -d configs && test -f src/train.py && echo "✓ Structure"

# 2. Config
python src/train.py --cfg job && echo "✓ Config"

# 3. Dependencies
python -c "import torch, pytorch_lightning, hydra" && echo "✓ Deps"

# 4. GPU
python -c "import torch; assert torch.cuda.is_available()" && echo "✓ GPU"

# 5. Fast dev run
python src/train.py trainer.fast_dev_run=true && echo "✓ Training"

CI/CD Integration

Add to .github/workflows/validate.yml:

name: Validate ML Project

on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: uv sync --all-extras

      - name: Validate project
        run: uv run python scripts/validate_project.py

      - name: Test config
        run: uv run python src/train.py --cfg job

      - name: Fast dev run
        run: uv run python src/train.py trainer.fast_dev_run=true

Common Issues

"Config composition failed"

Cause: Typo in defaults or invalid YAML.

Fix:

# Check YAML syntax
python -c "import yaml; yaml.safe_load(open('configs/config.yaml'))"

# Check defaults exist
ls configs/model/ configs/data/ configs/trainer/

"_target_ not found"

Cause: Module path incorrect or not installed.

Fix:

# Check import works
python -c "from src.models.my_model import MyModel"

# Verify path in config matches file structure

"DataModule setup failed"

Cause: Data directory missing or incorrect path.

Fix:

# Check data path in config
grep data_dir configs/data/*.yaml

# Create data directory
mkdir -p data/

"Fast dev run failed"

Cause: Various issues in training loop.

Fix:

# Run with verbose logging
python src/train.py trainer.fast_dev_run=true hydra.verbose=true

# Check logs for specific error

Success Criteria

  • Project structure valid
  • All YAML files valid
  • Config composes without errors
  • All _target_ paths importable
  • Code passes linting
  • Required deps installed
  • GPU available (if needed)
  • Model instantiates
  • DataModule instantiates
  • Fast dev run succeeds
  • Logger configured

✅ Project is ready for training!

Source

git clone https://github.com/nishide-dev/claude-code-ml-research
# Skill file: skills/ml-validate/SKILL.md

Overview

ML-validate provides a comprehensive suite to audit ML projects. It checks project structure, configuration loading, data pipelines, model architecture, and dependencies to ensure training readiness and easier debugging.

How This Skill Works

It runs a targeted validation suite across seven areas: project structure, configuration validation, code quality, dependencies, the data pipeline, model instantiation, and training readiness. Each step verifies presence, correctness, and importability, then reports issues to fix before training.

When to Use It

  • Starting a new ML project to verify required folders, files, and tooling.
  • Before any training run to catch config, dependency, or environment problems.
  • While debugging Hydra configs or _target_ imports to resolve setup errors.
  • After adding or changing the data pipeline or model modules to ensure integration.
  • In CI or local environments to confirm dependencies and GPU availability.

Quick Start

  1. Run full validation: python scripts/validate_project.py
  2. Quick config check: python src/train.py --cfg job
  3. Fast dev run: python src/train.py trainer.fast_dev_run=true

Best Practices

  • Run the full validation script (validate_project.py) first.
  • Validate all YAML configs and confirm Hydra composes with --cfg job.
  • Run linting (ruff) and type checks (ty, mypy) and fix syntax errors.
  • Verify dependencies and GPU availability in the target environment.
  • Test the DataModule setup and dataloaders with a quick fit/test.

Example Use Cases

  • Kickoff a new project by validating the basic structure and core scripts.
  • Before training, ensure configs, targets, and models are wired correctly.
  • CI failure due to YAML or import errors is quickly diagnosed with config validation.
  • Dependency version mismatches are surfaced by the installation checks.
  • DataModule instantiation confirms dataloaders and setup run without errors.
