ml-model-export
ML Model Export
Export trained PyTorch models to various formats for deployment and sharing.
Overview
Model export enables:
- Cross-platform deployment (ONNX)
- Production serving (TorchScript)
- Optimized inference (TensorRT, OpenVINO)
- Model sharing (Hugging Face Hub, MLflow)
- Mobile deployment (TorchScript Mobile, ONNX)
Export Formats
1. ONNX (Open Neural Network Exchange)
Benefits:
- Cross-framework compatibility (PyTorch → TensorFlow, etc.)
- Hardware optimization (CPUs, GPUs, NPUs)
- Industry standard for model interchange
- Supported by ONNX Runtime, TensorRT, OpenVINO
Export to ONNX:
```python
import torch
from pathlib import Path


def export_to_onnx(
    model: torch.nn.Module,
    output_path: Path,
    input_shape: tuple = (1, 3, 224, 224),
    opset_version: int = 17,
    dynamic_axes: dict | None = None,
):
    """Export PyTorch model to ONNX format."""
    model.eval()
    device = next(model.parameters()).device
    dummy_input = torch.randn(input_shape).to(device)

    # Default dynamic axes for batch size
    if dynamic_axes is None:
        dynamic_axes = {
            "input": {0: "batch_size"},
            "output": {0: "batch_size"},
        }

    torch.onnx.export(
        model,
        dummy_input,
        str(output_path),
        export_params=True,
        opset_version=opset_version,
        do_constant_folding=True,
        input_names=["input"],
        output_names=["output"],
        dynamic_axes=dynamic_axes,
    )
    print(f"✓ Model exported to ONNX: {output_path}")
```
Validate ONNX Export:
```python
import onnx
import onnxruntime as ort
import numpy as np


def validate_onnx(onnx_path: Path, pytorch_model: torch.nn.Module):
    """Validate ONNX export matches PyTorch output."""
    # Load and structurally check the ONNX model
    onnx_model = onnx.load(str(onnx_path))
    onnx.checker.check_model(onnx_model)
    print("✓ ONNX model is valid")

    # Create inference session
    ort_session = ort.InferenceSession(str(onnx_path))

    # Compare outputs on the same random input (run both on CPU)
    dummy_input = torch.randn(1, 3, 224, 224)
    pytorch_model = pytorch_model.cpu().eval()
    with torch.no_grad():
        pytorch_output = pytorch_model(dummy_input).numpy()

    onnx_input = {ort_session.get_inputs()[0].name: dummy_input.numpy()}
    onnx_output = ort_session.run(None, onnx_input)[0]

    np.testing.assert_allclose(pytorch_output, onnx_output, rtol=1e-3, atol=1e-5)
    print("✓ ONNX output matches PyTorch output")
```
Optimize ONNX:
```python
import onnx
import onnxoptimizer  # the optimizer passes moved out of onnx into the onnxoptimizer package


def optimize_onnx(input_path: Path, output_path: Path):
    """Optimize ONNX model for inference."""
    model = onnx.load(str(input_path))

    # Apply graph-level optimization passes
    passes = [
        "eliminate_deadend",
        "eliminate_identity",
        "eliminate_nop_dropout",
        "extract_constant_to_initializer",
        "eliminate_unused_initializer",
        "fuse_add_bias_into_conv",
        "fuse_bn_into_conv",
        "fuse_consecutive_concats",
        "fuse_matmul_add_bias_into_gemm",
        "fuse_pad_into_conv",
    ]
    optimized_model = onnxoptimizer.optimize(model, passes)
    onnx.save(optimized_model, str(output_path))

    # Report sizes
    original_size = input_path.stat().st_size / 1024**2
    optimized_size = output_path.stat().st_size / 1024**2
    reduction = (1 - optimized_size / original_size) * 100
    print(f"✓ Optimized ONNX saved: {output_path}")
    print(f"  Size reduction: {reduction:.1f}%")
```
2. TorchScript
Benefits:
- Native PyTorch format
- C++ deployment without Python
- Mobile deployment (iOS, Android)
- Optimized execution
- No external dependencies
Export with Tracing:
```python
def export_torchscript_trace(
    model: torch.nn.Module,
    output_path: Path,
    input_shape: tuple = (1, 3, 224, 224),
):
    """Export using torch.jit.trace (for fixed control flow)."""
    model.eval()
    device = next(model.parameters()).device
    example_input = torch.randn(input_shape).to(device)

    # Trace model
    traced_model = torch.jit.trace(model, example_input)

    # Optimize for inference
    traced_model = torch.jit.freeze(traced_model)

    # Save
    traced_model.save(str(output_path))
    print(f"✓ TorchScript (traced) saved: {output_path}")
```
Export with Scripting:
```python
def export_torchscript_script(
    model: torch.nn.Module,
    output_path: Path,
):
    """Export using torch.jit.script (for dynamic control flow)."""
    model.eval()

    # Script model (compiles the Python code itself)
    scripted_model = torch.jit.script(model)

    # Optimize
    scripted_model = torch.jit.freeze(scripted_model)

    # Save
    scripted_model.save(str(output_path))
    print(f"✓ TorchScript (scripted) saved: {output_path}")
```
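The trace/script distinction matters whenever the forward pass has data-dependent control flow: tracing records only the branch taken for the example input, while scripting compiles the branching itself. A minimal illustration with a hypothetical toy module:

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Toy module with data-dependent control flow (illustration only)."""

    def forward(self, x):
        if x.sum() > 0:
            return x * 2
        return x + 10


model = TinyNet().eval()
# Tracing bakes in the branch taken for this (positive-sum) example input.
traced = torch.jit.trace(model, torch.ones(2, 2))
# Scripting preserves the if/else.
scripted = torch.jit.script(model)

neg = -torch.ones(2, 2)
print(torch.equal(traced(neg), model(neg)))    # traced model silently takes the wrong branch
print(torch.equal(scripted(neg), model(neg)))  # scripted model matches eager mode
```

If `torch.jit.trace` emits a TracerWarning about converting a tensor to a bool, treat it as a signal to use scripting for that model.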
TorchScript Mobile:
```python
from torch.utils.mobile_optimizer import optimize_for_mobile


def export_torchscript_mobile(
    model: torch.nn.Module,
    output_path: Path,
    input_shape: tuple = (1, 3, 224, 224),
):
    """Export for mobile deployment (iOS, Android)."""
    model.eval()
    example_input = torch.randn(input_shape)

    # Trace
    traced_model = torch.jit.trace(model, example_input)

    # Optimize for mobile
    optimized_model = optimize_for_mobile(traced_model)

    # Save for the lite interpreter
    optimized_model._save_for_lite_interpreter(str(output_path))
    print(f"✓ Mobile TorchScript saved: {output_path}")
```
3. TensorRT (NVIDIA GPUs)
Benefits:
- Optimized inference on NVIDIA GPUs
- Large, model-dependent inference speedups
- Automatic kernel fusion
- Supports FP32, FP16, INT8
Convert ONNX to TensorRT:
```python
import tensorrt as trt


def convert_onnx_to_tensorrt(
    onnx_path: Path,
    engine_path: Path,
    precision: str = "fp16",  # fp32, fp16, int8
):
    """Convert ONNX to a serialized TensorRT engine."""
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, logger)

    # Parse ONNX
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("Failed to parse ONNX model")

    # Builder config: 4 GB workspace
    # (max_workspace_size is deprecated since TensorRT 8.4)
    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 4 * 1024**3)

    # Set precision
    if precision == "fp16":
        config.set_flag(trt.BuilderFlag.FP16)
    elif precision == "int8":
        config.set_flag(trt.BuilderFlag.INT8)  # real INT8 builds also need a calibrator

    # Build and serialize the engine
    # (build_engine is deprecated; build_serialized_network is the 8.x API)
    print("Building TensorRT engine...")
    serialized_engine = builder.build_serialized_network(network, config)
    if serialized_engine is None:
        raise RuntimeError("TensorRT engine build failed")

    # Save
    with open(engine_path, "wb") as f:
        f.write(serialized_engine)
    print(f"✓ TensorRT engine saved: {engine_path}")
```
4. Hugging Face Hub Upload
Upload to Hub:
```python
from huggingface_hub import HfApi, create_repo


def upload_to_huggingface_hub(
    model_path: Path,
    repo_name: str,
    token: str | None = None,
    model_card: str | None = None,
):
    """Upload model to Hugging Face Hub."""
    api = HfApi()

    # Create repo (no-op if it already exists)
    create_repo(repo_name, token=token, exist_ok=True)
    print(f"✓ Repository created: {repo_name}")

    # Upload model weights
    api.upload_file(
        path_or_fileobj=str(model_path),
        path_in_repo=model_path.name,
        repo_id=repo_name,
        token=token,
    )

    # Upload model card
    if model_card:
        api.upload_file(
            path_or_fileobj=model_card.encode(),
            path_in_repo="README.md",
            repo_id=repo_name,
            token=token,
        )

    print(f"✓ Uploaded to: https://huggingface.co/{repo_name}")
```
Generate Model Card:
````python
def generate_model_card(
    model_name: str,
    task: str,
    metrics: dict,
    training_data: str,
) -> str:
    """Generate Hugging Face model card."""
    # Build the metrics list outside the f-string (backslashes are not
    # allowed inside f-string expressions before Python 3.12)
    metrics_block = "\n".join(f"- {k}: {v}" for k, v in metrics.items())
    return f"""---
language: en
tags:
- pytorch
- pytorch-lightning
- {task}
license: mit
---

# {model_name}

## Model Description

This model was trained for {task}.

## Training Data

{training_data}

## Performance Metrics

{metrics_block}

## Usage

```python
import torch

model = torch.load("model.pt")
model.eval()
output = model(input_tensor)
```
"""
````
5. MLflow Model Registry
Log Model to MLflow:
```python
import mlflow
import mlflow.pytorch


def log_model_to_mlflow(
    model: torch.nn.Module,
    model_name: str,
    metrics: dict,
    artifacts: dict | None = None,
):
    """Log model and metrics to MLflow."""
    with mlflow.start_run():
        # Log metrics
        mlflow.log_metrics(metrics)

        # Log artifacts
        if artifacts:
            for name, path in artifacts.items():
                mlflow.log_artifact(path, name)

        # Log model
        mlflow.pytorch.log_model(
            model,
            "model",
            registered_model_name=model_name,
        )

        # active_run() is only valid inside the run context
        print(f"✓ Model logged to MLflow: {model_name}")
        print(f"  Run ID: {mlflow.active_run().info.run_id}")
```
Complete Export Script
Use the automated export script:
```bash
python scripts/export_model.py checkpoints/best.ckpt
```
See scripts/export_model.py for implementation.
Usage
```bash
# Export model from Lightning checkpoint
python scripts/export_model.py checkpoints/best.ckpt

# Export to specific formats
python scripts/export_model.py checkpoints/best.ckpt --formats onnx torchscript

# Upload to Hugging Face
python scripts/upload_to_hub.py \
  --model exported_models/model.onnx \
  --repo username/model-name \
  --token $HF_TOKEN
```
Deployment Examples
ONNX Runtime Inference:
```python
import onnxruntime as ort
import numpy as np

# Load model
session = ort.InferenceSession("model.onnx")

# Inference
input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {"input": input_data})
```
TorchScript Inference:
```python
import torch

# Load model
model = torch.jit.load("model.pt")

# Inference
input_tensor = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    output = model(input_tensor)
```
Success Criteria
- Model exported to required formats
- Exported models validated (output matches PyTorch)
- Models optimized for inference
- Documentation generated
- Models uploaded to registry (if needed)
- Deployment examples provided
✅ Your models are ready for production!
Source
```bash
git clone https://github.com/nishide-dev/claude-code-ml-research.git
# skill definition: skills/ml-model-export/SKILL.md
```
Overview
ML Model Export converts trained PyTorch models into portable formats such as ONNX, TorchScript, and TensorRT, and publishes them to registries like Hugging Face Hub or MLflow. It supports cross-platform deployment, production serving, and mobile readiness.
How This Skill Works
The workflow exports PyTorch models to ONNX using torch.onnx.export with configurable inputs and dynamic axes, then validates the export against PyTorch outputs using ONNX Runtime. It also provides utilities to optimize the ONNX graph and to export TorchScript for native PyTorch/C++/mobile deployment, with options to publish artifacts to model registries.
When to Use It
- Deploy a trained PyTorch model to production inference backends using ONNX, TorchScript, or TensorRT.
- Share trained weights by exporting formats and uploading to registries like Hugging Face Hub or MLflow.
- Enable cross-framework interoperability by exporting PyTorch models to ONNX for use in other frameworks.
- Prepare mobile or edge deployment with TorchScript Mobile or lightweight TorchScript variants.
- Validate exports end-to-end to ensure ONNX or TorchScript results match PyTorch outputs before release.
Quick Start
- Step 1: Export to ONNX with export_to_onnx(model, output_path, input_shape=(1, 3, 224, 224))
- Step 2: Validate ONNX with validate_onnx(onnx_path, model)
- Step 3: Optimize ONNX with optimize_onnx(input_path, optimized_path) and optionally export TorchScript or publish to a registry
Best Practices
- Always call model.eval() before exporting to ensure consistent behavior.
- Specify dynamic_axes for the batch dimension to support variable input sizes.
- Validate ONNX export by comparing outputs with PyTorch using ONNX Runtime on a representative input.
- Keep opset_version and optimization passes up-to-date with target runtimes (ONNX Runtime, TensorRT, OpenVINO).
- Test end-to-end deployment in the target runtime (server, mobile, or edge) prior to production.
Example Use Cases
- Export a PyTorch CNN to ONNX and run inference with ONNX Runtime in a server endpoint.
- Convert a PyTorch Transformer to TorchScript for a C++ inference service.
- Publish a trained model to Hugging Face Hub for community access and reuse.
- Push to MLflow Model Registry for experiment tracking and deployment.
- Create a TorchScript Mobile artifact for iOS/Android edge inference.