
ml-model-export

npx machina-cli add skill nishide-dev/claude-code-ml-research/ml-model-export --openclaw

ML Model Export

Export trained PyTorch models to various formats for deployment and sharing.

Overview

Model export enables:

  • Cross-platform deployment (ONNX)
  • Production serving (TorchScript)
  • Optimized inference (TensorRT, OpenVINO)
  • Model sharing (Hugging Face Hub, MLflow)
  • Mobile deployment (TorchScript Mobile, ONNX)

Export Formats

1. ONNX (Open Neural Network Exchange)

Benefits:

  • Cross-framework compatibility (PyTorch → TensorFlow, etc.)
  • Hardware optimization (CPUs, GPUs, NPUs)
  • Industry standard for model interchange
  • Supported by ONNX Runtime, TensorRT, OpenVINO

Export to ONNX:

import torch
from pathlib import Path

def export_to_onnx(
    model: torch.nn.Module,
    output_path: Path,
    input_shape: tuple = (1, 3, 224, 224),
    opset_version: int = 17,
    dynamic_axes: dict | None = None,
):
    """Export PyTorch model to ONNX format."""
    model.eval()
    device = next(model.parameters()).device
    dummy_input = torch.randn(input_shape).to(device)

    # Default dynamic axes for batch size
    if dynamic_axes is None:
        dynamic_axes = {
            "input": {0: "batch_size"},
            "output": {0: "batch_size"},
        }

    torch.onnx.export(
        model,
        dummy_input,
        str(output_path),
        export_params=True,
        opset_version=opset_version,
        do_constant_folding=True,
        input_names=["input"],
        output_names=["output"],
        dynamic_axes=dynamic_axes,
    )

    print(f"✓ Model exported to ONNX: {output_path}")

Validate ONNX Export:

import onnx
import onnxruntime as ort
import numpy as np

def validate_onnx(onnx_path: Path, pytorch_model: torch.nn.Module):
    """Validate ONNX export matches PyTorch output."""
    # Load ONNX model
    onnx_model = onnx.load(str(onnx_path))
    onnx.checker.check_model(onnx_model)
    print("✓ ONNX model is valid")

    # Create inference session
    ort_session = ort.InferenceSession(str(onnx_path))

    # Compare outputs
    device = next(pytorch_model.parameters()).device
    dummy_input = torch.randn(1, 3, 224, 224, device=device)

    # PyTorch output
    pytorch_model.eval()
    with torch.no_grad():
        pytorch_output = pytorch_model(dummy_input).cpu().numpy()

    # ONNX output (ONNX Runtime takes host NumPy arrays)
    onnx_input = {ort_session.get_inputs()[0].name: dummy_input.cpu().numpy()}
    onnx_output = ort_session.run(None, onnx_input)[0]

    # Compare
    np.testing.assert_allclose(pytorch_output, onnx_output, rtol=1e-3, atol=1e-5)
    print("✓ ONNX output matches PyTorch output")

Optimize ONNX:

import onnx
import onnxoptimizer  # the optimizer was removed from onnx (>= 1.9) into its own package

def optimize_onnx(input_path: Path, output_path: Path):
    """Optimize ONNX model for inference."""
    model = onnx.load(str(input_path))

    # Apply optimizations (see onnxoptimizer.get_available_passes())
    passes = [
        "eliminate_deadend",
        "eliminate_identity",
        "eliminate_nop_dropout",
        "extract_constant_to_initializer",
        "eliminate_unused_initializer",
        "fuse_add_bias_into_conv",
        "fuse_bn_into_conv",
        "fuse_consecutive_concats",
        "fuse_matmul_add_bias_into_gemm",
        "fuse_pad_into_conv",
    ]

    optimized_model = onnxoptimizer.optimize(model, passes)
    onnx.save(optimized_model, str(output_path))

    # Report sizes
    original_size = input_path.stat().st_size / 1024**2
    optimized_size = output_path.stat().st_size / 1024**2
    reduction = (1 - optimized_size / original_size) * 100

    print(f"✓ Optimized ONNX saved: {output_path}")
    print(f"  Size reduction: {reduction:.1f}%")

2. TorchScript

Benefits:

  • Native PyTorch format
  • C++ deployment without Python
  • Mobile deployment (iOS, Android)
  • Optimized execution
  • No external dependencies

Export with Tracing:

def export_torchscript_trace(
    model: torch.nn.Module,
    output_path: Path,
    input_shape: tuple = (1, 3, 224, 224),
):
    """Export using torch.jit.trace (for fixed control flow)."""
    model.eval()
    device = next(model.parameters()).device
    example_input = torch.randn(input_shape).to(device)

    # Trace model
    traced_model = torch.jit.trace(model, example_input)

    # Optimize for inference
    traced_model = torch.jit.freeze(traced_model)

    # Save
    traced_model.save(str(output_path))
    print(f"✓ TorchScript (traced) saved: {output_path}")

Export with Scripting:

def export_torchscript_script(
    model: torch.nn.Module,
    output_path: Path,
):
    """Export using torch.jit.script (for dynamic control flow)."""
    model.eval()

    # Script model (analyzes Python code)
    scripted_model = torch.jit.script(model)

    # Optimize
    scripted_model = torch.jit.freeze(scripted_model)

    # Save
    scripted_model.save(str(output_path))
    print(f"✓ TorchScript (scripted) saved: {output_path}")

TorchScript Mobile:

from torch.utils.mobile_optimizer import optimize_for_mobile

def export_torchscript_mobile(
    model: torch.nn.Module,
    output_path: Path,
    input_shape: tuple = (1, 3, 224, 224),
):
    """Export for mobile deployment (iOS, Android)."""
    model.eval()
    example_input = torch.randn(input_shape)

    # Trace
    traced_model = torch.jit.trace(model, example_input)

    # Optimize for mobile
    optimized_model = optimize_for_mobile(traced_model)

    # Save
    optimized_model._save_for_lite_interpreter(str(output_path))
    print(f"✓ Mobile TorchScript saved: {output_path}")

3. TensorRT (NVIDIA GPUs)

Benefits:

  • Optimized inference on NVIDIA GPUs
  • Substantial inference speedups (workload-dependent, often cited up to 10x)
  • Automatic kernel fusion
  • Supports FP32, FP16, INT8

Convert ONNX to TensorRT:

import tensorrt as trt

def convert_onnx_to_tensorrt(
    onnx_path: Path,
    engine_path: Path,
    precision: str = "fp16",  # fp32, fp16, int8
):
    """Convert ONNX to TensorRT engine."""
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    # Parse ONNX
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = "\n".join(str(parser.get_error(i)) for i in range(parser.num_errors))
            raise RuntimeError(f"Failed to parse ONNX model:\n{errors}")

    # Builder config (TensorRT >= 8.4; older versions use config.max_workspace_size)
    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 4 * 1024**3)  # 4 GB

    # Set precision
    if precision == "fp16":
        config.set_flag(trt.BuilderFlag.FP16)
    elif precision == "int8":
        config.set_flag(trt.BuilderFlag.INT8)  # real INT8 builds also need a calibrator

    # Build serialized engine (build_engine was removed in TensorRT 10)
    print("Building TensorRT engine...")
    serialized_engine = builder.build_serialized_network(network, config)
    if serialized_engine is None:
        raise RuntimeError("Failed to build TensorRT engine")

    # Save
    with open(engine_path, "wb") as f:
        f.write(serialized_engine)

    print(f"✓ TensorRT engine saved: {engine_path}")

4. Hugging Face Hub Upload

Upload to Hub:

from huggingface_hub import HfApi, create_repo

def upload_to_huggingface_hub(
    model_path: Path,
    repo_name: str,
    token: str | None = None,
    model_card: str | None = None,
):
    """Upload model to Hugging Face Hub."""
    api = HfApi()

    # Create repo
    create_repo(repo_name, token=token, exist_ok=True)
    print(f"✓ Repository created: {repo_name}")

    # Upload model
    api.upload_file(
        path_or_fileobj=str(model_path),
        path_in_repo=model_path.name,
        repo_id=repo_name,
        token=token,
    )

    # Upload model card
    if model_card:
        api.upload_file(
            path_or_fileobj=model_card.encode(),
            path_in_repo="README.md",
            repo_id=repo_name,
            token=token,
        )

    print(f"✓ Uploaded to: https://huggingface.co/{repo_name}")

Generate Model Card:

def generate_model_card(
    model_name: str,
    task: str,
    metrics: dict,
    training_data: str,
) -> str:
    """Generate Hugging Face model card."""
    # Build the metrics list outside the f-string (backslashes are not
    # allowed inside f-string expressions before Python 3.12)
    metrics_section = "\n".join(f"- {k}: {v}" for k, v in metrics.items())
    return f"""---
language: en
tags:
- pytorch
- pytorch-lightning
- {task}
license: mit
---

# {model_name}

## Model Description

This model was trained for {task}.

## Training Data

{training_data}

## Performance Metrics

{metrics_section}

## Usage

```python
import torch

model = torch.load("model.pt")
model.eval()
output = model(input_tensor)
```
"""

5. MLflow Model Registry

Log Model to MLflow:

import mlflow
import mlflow.pytorch

def log_model_to_mlflow(
    model: torch.nn.Module,
    model_name: str,
    metrics: dict,
    artifacts: dict | None = None,
):
    """Log model and metrics to MLflow."""
    with mlflow.start_run():
        # Log metrics
        mlflow.log_metrics(metrics)

        # Log artifacts
        if artifacts:
            for name, path in artifacts.items():
                mlflow.log_artifact(path, name)

        # Log model
        mlflow.pytorch.log_model(
            model,
            "model",
            registered_model_name=model_name,
        )

        print(f"✓ Model logged to MLflow: {model_name}")
        print(f"  Run ID: {mlflow.active_run().info.run_id}")

Complete Export Script

Use the automated export script:

python scripts/export_model.py checkpoints/best.ckpt

See scripts/export_model.py for implementation.
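
The script itself lives in the repository; as a rough sketch of the command-line surface it appears to expose (argument names here are assumptions inferred from the usage examples below, not the actual implementation):

```python
# Hypothetical sketch of the CLI of scripts/export_model.py.
# Argument names are assumptions; see the repository for the real script.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Export a Lightning checkpoint to deployment formats."
    )
    parser.add_argument("checkpoint", help="Path to the .ckpt file")
    parser.add_argument(
        "--formats",
        nargs="+",
        choices=["onnx", "torchscript", "mobile"],
        default=["onnx", "torchscript"],
        help="Which export formats to produce",
    )
    parser.add_argument("--output-dir", default="exported_models")
    return parser

args = build_parser().parse_args(["checkpoints/best.ckpt", "--formats", "onnx"])
```

Each selected format would then dispatch to the matching helper above (export_to_onnx, export_torchscript_trace, and so on).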

Usage

# Export model from Lightning checkpoint
python scripts/export_model.py checkpoints/best.ckpt

# Export to specific formats
python scripts/export_model.py checkpoints/best.ckpt --formats onnx torchscript

# Upload to Hugging Face
python scripts/upload_to_hub.py \
    --model exported_models/model.onnx \
    --repo username/model-name \
    --token $HF_TOKEN

Deployment Examples

ONNX Runtime Inference:

import onnxruntime as ort
import numpy as np

# Load model
session = ort.InferenceSession("model.onnx")

# Inference
input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {"input": input_data})
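
For a classifier, the raw logits returned by the session can be turned into probabilities and a predicted class with a softmax; this post-processing is plain NumPy and independent of the runtime:

```python
import numpy as np

def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

# Example: logits as returned by session.run(...) for a 3-class model
logits = np.array([[2.0, 1.0, 0.1]])
probs = softmax(logits)
pred = int(probs.argmax(axis=-1)[0])  # index of the most probable class
```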

TorchScript Inference:

import torch

# Load model
model = torch.jit.load("model.pt")

# Inference
input_tensor = torch.randn(1, 3, 224, 224)
output = model(input_tensor)

Success Criteria

  • Model exported to required formats
  • Exported models validated (output matches PyTorch)
  • Models optimized for inference
  • Documentation generated
  • Models uploaded to registry (if needed)
  • Deployment examples provided


✅ Your models are ready for production!

Source

git clone https://github.com/nishide-dev/claude-code-ml-research
# Skill file: skills/ml-model-export/SKILL.md

Overview

ML Model Export converts trained PyTorch models into portable formats such as ONNX, TorchScript, and TensorRT, and publishes them to registries like Hugging Face Hub or MLflow. It supports cross-platform deployment, production serving, and mobile readiness.

How This Skill Works

The workflow exports PyTorch models to ONNX using torch.onnx.export with configurable inputs and dynamic axes, then validates the export against PyTorch outputs using ONNX Runtime. It also provides utilities to optimize the ONNX graph and to export TorchScript for native PyTorch/C++/mobile deployment, with options to publish artifacts to model registries.

When to Use It

  • Deploy a trained PyTorch model to production inference backends using ONNX, TorchScript, or TensorRT.
  • Share trained weights by exporting formats and uploading to registries like Hugging Face Hub or MLflow.
  • Enable cross-framework interoperability by exporting PyTorch models to ONNX for use in other frameworks.
  • Prepare mobile or edge deployment with TorchScript Mobile or lightweight TorchScript variants.
  • Validate exports end-to-end to ensure ONNX or TorchScript results match PyTorch outputs before release.

Quick Start

  1. Export to ONNX with export_to_onnx(model, output_path, input_shape=(1, 3, 224, 224))
  2. Validate the export with validate_onnx(onnx_path, model)
  3. Optimize with optimize_onnx(input_path, optimized_path), then optionally export TorchScript or publish to a registry

Best Practices

  • Always call model.eval() before exporting to ensure consistent behavior.
  • Specify dynamic_axes for the batch dimension to support variable input sizes.
  • Validate ONNX export by comparing outputs with PyTorch using ONNX Runtime on a representative input.
  • Keep opset_version and optimization passes up-to-date with target runtimes (ONNX Runtime, TensorRT, OpenVINO).
  • Test end-to-end deployment in the target runtime (server, mobile, or edge) prior to production.
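
The "matches" criterion in the validation bullet can be made concrete with a small tolerance helper mirroring the np.testing.assert_allclose call used earlier (a convenience sketch, not part of the skill's API):

```python
import numpy as np

def outputs_match(
    a: np.ndarray, b: np.ndarray, rtol: float = 1e-3, atol: float = 1e-5
) -> bool:
    """Return True when two output arrays agree within the given tolerances."""
    return bool(np.allclose(a, b, rtol=rtol, atol=atol))

# Tiny perturbations pass; genuine divergence fails
ok = outputs_match(np.array([1.0, 2.0]), np.array([1.0, 2.0000001]))
bad = outputs_match(np.array([1.0]), np.array([1.5]))
```

Returning a bool (rather than raising, as assert_allclose does) makes the check easy to wire into a release gate.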

Example Use Cases

  • Export a PyTorch CNN to ONNX and run inference with ONNX Runtime in a server endpoint.
  • Convert a PyTorch Transformer to TorchScript for a C++ inference service.
  • Publish a trained model to Hugging Face Hub for community access and reuse.
  • Push to MLflow Model Registry for experiment tracking and deployment.
  • Create a TorchScript Mobile artifact for iOS/Android edge inference.
