Modal Development Guide
Overview
Modal is a serverless cloud platform for running Python code with minimal configuration. It excels at:
- GPU-accelerated AI/ML inference (supports T4, L4, A10, A100, L40S, H100, H200, B200)
- Serverless web APIs and endpoints
- Scheduled jobs (cron)
- High-performance batch processing
- Sandboxed code execution
Key benefits:
- Pay only for resources used (billed per second)
- Containers spin up in seconds
- No infrastructure management required
- Built-in autoscaling from zero to thousands of containers
Quick Start
Installation & Setup
```bash
pip install modal
modal setup  # Authenticate with Modal
```
Basic App Structure
```python
import modal

# Create an App (groups Functions for deployment)
app = modal.App("my-app")

# Define a container image with dependencies
image = modal.Image.debian_slim(python_version="3.12").pip_install("numpy", "pandas")

# Create a serverless function
@app.function(image=image)
def process_data(x: int) -> int:
    import numpy as np
    return int(np.square(x))

# Local entrypoint for running the app
@app.local_entrypoint()
def main():
    result = process_data.remote(5)
    print(f"Result: {result}")
```
Running Your App
```bash
# Run ephemeral (for development/testing)
modal run my_app.py

# Deploy persistently
modal deploy my_app.py

# Serve with hot-reload (for web endpoints)
modal serve my_app.py
```
Core Concepts
Apps and Functions
An App groups related Functions for atomic deployment. A Function is an independent unit that scales up/down automatically.
```python
import modal

app = modal.App("my-app")

@app.function()
def hello(name: str) -> str:
    return f"Hello, {name}!"

@app.function()
def goodbye(name: str) -> str:
    return f"Goodbye, {name}!"
```
Container Images
Define custom environments using method chaining:
```python
image = (
    modal.Image.debian_slim(python_version="3.12")
    .apt_install("git", "ffmpeg")            # System packages
    .pip_install("torch", "transformers")    # Python packages (pip)
    .uv_pip_install("numpy", "pandas")       # Python packages (uv - faster)
    .env({"MY_VAR": "value"})                # Environment variables
    .run_commands("echo 'setup complete'")   # Shell commands
    .add_local_python_source("my_module")    # Local Python code
)

@app.function(image=image)
def my_function():
    ...
```
Using Existing Images
```python
# From Docker registry
image = modal.Image.from_registry("pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime")

# From Dockerfile
image = modal.Image.from_dockerfile("./Dockerfile")

# From NVIDIA CUDA
image = modal.Image.from_registry("nvidia/cuda:12.8.1-devel-ubuntu24.04", add_python="3.12")
```
Handling Remote-Only Imports
When packages exist only in the container:
```python
image = modal.Image.debian_slim().pip_install("pandas")

# Option 1: Import inside function body
@app.function(image=image)
def my_function():
    import pandas as pd  # Import inside function
    return pd.DataFrame()

# Option 2: Use Image.imports() context manager
with image.imports():
    import pandas as pd
    import numpy as np

@app.function(image=image)
def my_function():
    return pd.DataFrame()
```
GPU Acceleration
Requesting GPUs
```python
# Single GPU
@app.function(gpu="A100")
def train_model():
    import torch
    assert torch.cuda.is_available()
    ...

# Multiple GPUs (same machine)
@app.function(gpu="H100:8")
def train_large_model():
    ...

# GPU fallbacks (tries in order)
@app.function(gpu=["H100", "A100-80GB", "A100-40GB"])
def flexible_inference():
    ...
```
Available GPU Types
| GPU | Memory | Best For |
|---|---|---|
| T4 | 16 GB | Budget inference |
| L4 | 24 GB | Inference |
| A10 | 24 GB | Inference (up to 4x) |
| A100-40GB | 40 GB | Training/Inference |
| A100-80GB | 80 GB | Large models |
| L40S | 48 GB | Best cost/performance |
| H100 | 80 GB | High-performance training |
| H200 | 141 GB | Large models |
| B200 | 192 GB | Largest models |
GPU Image Setup
For libraries requiring CUDA toolkit:
```python
image = (
    modal.Image.from_registry("nvidia/cuda:12.8.1-devel-ubuntu24.04", add_python="3.12")
    .entrypoint([])  # Clear the base image's default entrypoint
    .pip_install("torch", "transformers")
)

@app.function(gpu="A100", image=image)
def gpu_function():
    ...
```
Web Endpoints
FastAPI Endpoints
```python
import modal

image = modal.Image.debian_slim().pip_install("fastapi[standard]")
app = modal.App("web-app", image=image)

@app.function()
@modal.fastapi_endpoint()
def hello(name: str = "World") -> dict:
    return {"message": f"Hello, {name}!"}

# With custom configuration
@app.function()
@modal.fastapi_endpoint(method="POST", docs=True)
def process(data: dict) -> dict:
    return {"processed": data}
```
Full ASGI/WSGI Apps
```python
# ASGI (FastAPI, Starlette)
@app.function()
@modal.asgi_app()
def fastapi_app():
    from fastapi import FastAPI

    web_app = FastAPI()

    @web_app.get("/")
    def root():
        return {"status": "ok"}

    return web_app

# WSGI (Flask, Django)
@app.function()
@modal.wsgi_app()
def flask_app():
    from flask import Flask

    web_app = Flask(__name__)

    @web_app.route("/")
    def root():
        return {"status": "ok"}

    return web_app
```
Web Server (Custom Ports)
```python
@app.function()
@modal.web_server(port=8080)
def custom_server():
    import subprocess

    # Start the server in the background: the decorated function must return
    # while the server keeps listening on the declared port.
    subprocess.Popen(["python", "-m", "http.server", "8080"])
```
Custom Domains
```python
@app.function()
@modal.fastapi_endpoint(custom_domains=["api.example.com"])
def my_api():
    return {"message": "Hello from custom domain!"}
```
Persistent Storage
Volumes (High-Performance Storage)
```python
# Create or reference a Volume
volume = modal.Volume.from_name("my-volume", create_if_missing=True)

@app.function(volumes={"/data": volume})
def save_data():
    with open("/data/output.txt", "w") as f:
        f.write("Hello, Volume!")

@app.function(volumes={"/data": volume})
def read_data():
    with open("/data/output.txt", "r") as f:
        return f.read()
```
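Volume changes are committed in the background, so a container that is already running may briefly see a stale view. When cross-function reads must be deterministic, you can commit and reload explicitly; a minimal sketch:

```python
@app.function(volumes={"/data": volume})
def save_then_publish():
    with open("/data/output.txt", "w") as f:
        f.write("Hello, Volume!")
    volume.commit()  # Flush changes so other containers can fetch them

@app.function(volumes={"/data": volume})
def read_latest():
    volume.reload()  # Pick up the latest committed state
    with open("/data/output.txt", "r") as f:
        return f.read()
```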
Volume Operations
```python
# Upload files
with volume.batch_upload() as batch:
    batch.put_file("local.txt", "/remote.txt")
    batch.put_directory("./local_dir", "/remote_dir")

# Read files
for chunk in volume.read_file("output.txt"):
    print(chunk)

# List contents
for entry in volume.listdir("/"):
    print(entry)
```
Cloud Bucket Mounts
```python
bucket_mount = modal.CloudBucketMount(
    bucket_name="my-s3-bucket",
    secret=modal.Secret.from_name("aws-secret"),
)

@app.function(volumes={"/s3": bucket_mount})
def process_s3_data():
    import os

    files = os.listdir("/s3")
    ...
```
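If a function only ever reads from the bucket, mounting it read-only guards against accidental writes; a sketch using the `read_only` flag:

```python
readonly_mount = modal.CloudBucketMount(
    bucket_name="my-s3-bucket",
    secret=modal.Secret.from_name("aws-secret"),
    read_only=True,  # Reject writes at the mount level
)
```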
Secrets Management
Creating Secrets
```python
# From Modal dashboard (recommended for production)
secret = modal.Secret.from_name("my-secret")

# From dictionary (inline)
secret = modal.Secret.from_dict({"API_KEY": "xxx", "DB_PASSWORD": "yyy"})

# From .env file
secret = modal.Secret.from_dotenv()
```
Using Secrets
@app.function(secrets=[modal.Secret.from_name("openai-secret")])
def call_openai():
import os
api_key = os.environ["OPENAI_API_KEY"]
...
Scheduling (Cron Jobs)
```python
# Run every hour
@app.function(schedule=modal.Period(hours=1))
def hourly_task():
    print("Running hourly task")

# Run daily at a specific time
@app.function(schedule=modal.Cron("0 9 * * *"))  # 9 AM UTC daily
def daily_report():
    print("Generating daily report")

# Run every 5 minutes
@app.function(schedule=modal.Period(minutes=5))
def frequent_check():
    print("Checking...")
```
Deploy with modal deploy to activate schedules.
Parallel Processing
Using .map() for Batch Processing
```python
@app.function()
def process_item(item: int) -> int:
    return item * 2

@app.local_entrypoint()
def main():
    items = list(range(1000))
    # Process all items in parallel
    results = list(process_item.map(items))
    print(f"Processed {len(results)} items")
```
Using .starmap() for Multiple Arguments
```python
@app.function()
def add(x: int, y: int) -> int:
    return x + y

@app.local_entrypoint()
def main():
    pairs = [(1, 2), (3, 4), (5, 6)]
    results = list(add.starmap(pairs))
```
Fire-and-Forget with .spawn()
```python
@app.function()
def background_task(task_id: int):
    # Long-running task
    ...

@app.local_entrypoint()
def main():
    # Spawn without waiting for results
    for i in range(100):
        background_task.spawn(i)
    print("All tasks spawned")
```
Sandboxes (Dynamic Code Execution)
Execute arbitrary code in isolated containers:
```python
import modal

app = modal.App.lookup("sandbox-app", create_if_missing=True)

# Create a sandbox
sb = modal.Sandbox.create(
    image=modal.Image.debian_slim().pip_install("numpy"),
    app=app,
)

# Execute commands
p = sb.exec("python", "-c", "import numpy; print(numpy.__version__)")
print(p.stdout.read())

# Clean up
sb.terminate()
```
LLM Code Execution
```python
@app.function(
    restrict_modal_access=True,   # Security: restrict Modal API access
    single_use_containers=True,   # Fresh container per request
    timeout=30,
    block_network=True,           # No network access
)
def run_untrusted_code(code: str):
    try:
        exec_globals = {}
        exec(code, exec_globals)
        return exec_globals.get("result")
    except Exception as e:
        return f"Error: {e}"
```
Classes with Lifecycle
Use @app.cls() for stateful services with initialization:
@app.cls(gpu="A100", image=image)
class ModelService:
@modal.enter()
def load_model(self):
# Runs once when container starts
from transformers import pipeline
self.model = pipeline("text-generation", model="gpt2", device="cuda")
@modal.method()
def generate(self, prompt: str) -> str:
return self.model(prompt, max_length=100)[0]["generated_text"]
@modal.exit()
def cleanup(self):
# Runs when container shuts down
del self.model
# Usage
@app.local_entrypoint()
def main():
service = ModelService()
result = service.generate.remote("Hello, world!")
print(result)
Resource Configuration
CPU and Memory
```python
@app.function(
    cpu=4.0,      # 4 CPU cores
    memory=8192,  # 8 GB RAM
)
def heavy_computation():
    ...

# With limits
@app.function(
    cpu=(1.0, 4.0),       # Request 1 core, limit 4
    memory=(2048, 8192),  # Request 2 GB, limit 8 GB
)
def flexible_function():
    ...
```
Timeouts and Retries
```python
@app.function(
    timeout=600,  # 10 minute timeout
    retries=modal.Retries(
        max_retries=3,
        initial_delay=1.0,
        backoff_coefficient=2.0,
    ),
)
def reliable_function():
    ...
```
Container Configuration
```python
@app.function(
    concurrency_limit=10,        # Max containers
    allow_concurrent_inputs=5,   # Inputs per container
    container_idle_timeout=300,  # Keep warm for 5 min
)
def optimized_function():
    ...
```
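Note that recent Modal SDK releases renamed these parameters: `concurrency_limit` became `max_containers`, `container_idle_timeout` became `scaledown_window`, and per-container concurrency moved to the `@modal.concurrent` decorator. A sketch of the newer spelling (check which names your installed version supports):

```python
@app.function(
    max_containers=10,     # Formerly concurrency_limit
    scaledown_window=300,  # Formerly container_idle_timeout
)
@modal.concurrent(max_inputs=5)  # Formerly allow_concurrent_inputs
def optimized_function_v2():
    ...
```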
Common Patterns
AI Model Inference Service
```python
import modal

app = modal.App("llm-service")

image = (
    modal.Image.debian_slim(python_version="3.12")
    .pip_install("vllm", "torch")
)

# Web endpoints need FastAPI available in their container
web_image = modal.Image.debian_slim().pip_install("fastapi[standard]")

@app.cls(gpu="A100", image=image, container_idle_timeout=300)
class LLMService:
    @modal.enter()
    def load(self):
        from vllm import LLM

        self.llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

    @modal.method()
    def generate(self, prompt: str) -> str:
        from vllm import SamplingParams

        params = SamplingParams(temperature=0.7, max_tokens=256)
        outputs = self.llm.generate([prompt], params)
        return outputs[0].outputs[0].text

@app.function(image=web_image)
@modal.fastapi_endpoint(method="POST")
def inference(body: dict) -> dict:
    service = LLMService()
    result = service.generate.remote(body["prompt"])
    return {"response": result}
```
Data Processing Pipeline
```python
import modal

app = modal.App("data-pipeline")
volume = modal.Volume.from_name("pipeline-data", create_if_missing=True)
image = modal.Image.debian_slim().pip_install("pandas", "pyarrow")

@app.function(image=image, volumes={"/data": volume})
def extract(source: str) -> str:
    import pandas as pd

    df = pd.read_csv(source)
    output_path = f"/data/extracted_{source.split('/')[-1]}"
    df.to_parquet(output_path)
    return output_path

@app.function(image=image, volumes={"/data": volume})
def transform(input_path: str) -> str:
    import pandas as pd

    df = pd.read_parquet(input_path)
    # Transform logic
    df["processed"] = True
    output_path = input_path.replace("extracted", "transformed")
    df.to_parquet(output_path)
    return output_path

@app.function(image=image, volumes={"/data": volume})
def load(input_path: str):
    import pandas as pd

    df = pd.read_parquet(input_path)
    # Load to destination
    print(f"Loaded {len(df)} rows")

@app.local_entrypoint()
def run_pipeline():
    sources = ["data1.csv", "data2.csv", "data3.csv"]
    # Extract in parallel
    extracted = list(extract.map(sources))
    # Transform in parallel
    transformed = list(transform.map(extracted))
    # Load in parallel
    list(load.map(transformed))
```
Job Queue Pattern
```python
import modal

app = modal.App("job-queue")

@app.function()
def process_job(data: dict) -> dict:
    # Long-running processing
    import time

    time.sleep(10)
    return {"status": "complete", "data": data}

# Submit job and get call ID
def submit_job(data: dict) -> str:
    process_job_fn = modal.Function.from_name("job-queue", "process_job")
    call = process_job_fn.spawn(data)
    return call.object_id

# Poll for result
def get_result(call_id: str):
    call = modal.FunctionCall.from_id(call_id)
    try:
        return call.get(timeout=0)  # Non-blocking
    except TimeoutError:
        return {"status": "pending"}
```
Deployment Best Practices
Project Structure
```text
my-modal-app/
├── app.py            # Main Modal app
├── models/           # ML models
│   └── inference.py
├── utils/            # Shared utilities
│   └── helpers.py
├── requirements.txt
└── Dockerfile        # Optional
```
Image Optimization
# Order layers by change frequency (least changing first)
image = (
modal.Image.debian_slim(python_version="3.12")
.apt_install("ffmpeg") # Rarely changes
.pip_install("torch==2.1.0") # Pin versions
.pip_install("transformers==4.35.0") # Pin versions
.add_local_python_source("utils") # Changes more often
)
CI/CD Integration
```yaml
# .github/workflows/deploy.yml
name: Deploy to Modal

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      MODAL_TOKEN_ID: ${{ secrets.MODAL_TOKEN_ID }}
      MODAL_TOKEN_SECRET: ${{ secrets.MODAL_TOKEN_SECRET }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install modal
      - run: modal deploy app.py
```
Debugging & Monitoring
View Logs
```bash
# Stream logs from a deployed app
modal app logs my-app

# View logs for a specific container
modal container logs <container-id>
```
Interactive Debugging
```python
# Add a breakpoint for debugging
@app.function()
def debug_function():
    import pdb; pdb.set_trace()  # Run with `modal run -i` so the breakpoint gets an interactive console
    ...
```
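For ad-hoc inspection you can also open an interactive shell in a container built from a function's image and environment; a sketch:

```bash
# Open a shell with the same image, mounts, and secrets as the function
modal shell app.py::debug_function
```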
Force Image Rebuild
```bash
# Rebuild all images
MODAL_FORCE_BUILD=1 modal run app.py

# Ignore cache (doesn't break cache for others)
MODAL_IGNORE_CACHE=1 modal run app.py
```
Reference
CLI Commands
| Command | Description |
|---|---|
| `modal run app.py` | Run ephemeral app |
| `modal deploy app.py` | Deploy persistently |
| `modal serve app.py` | Serve with hot-reload |
| `modal app list` | List deployed apps |
| `modal app stop <name>` | Stop deployed app |
| `modal secret list` | List secrets |
| `modal volume list` | List volumes |
Pricing
- CPU: ~$0.192/core/hour
- Memory: ~$0.024/GB/hour
- GPUs: Varies by type (see modal.com/pricing)
- Billed per second, only for resources used
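As a rough worked example at the approximate rates above: a function running for 90 seconds with 2 CPU cores and 8 GB of RAM costs about 2 × $0.192/3600 × 90 + 8 × $0.024/3600 × 90 ≈ $0.0096 + $0.0048 ≈ $0.014, with any GPU charge added on top.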
Links
- Documentation: https://modal.com/docs
- Examples: https://github.com/modal-labs/modal-examples
- Pricing: https://modal.com/pricing
- Dashboard: https://modal.com/apps
Source
```bash
git clone https://github.com/samarth777/modal-skills
```
The skill file itself lives at skills/main/SKILL.md in that repository.

Overview
Modal provides a serverless cloud platform to run Python code with minimal configuration. It excels at GPU-accelerated AI/ML inference, serverless web APIs, cron jobs, and high-performance batch processing, all without infrastructure management.
How This Skill Works
Create an App to group Functions, define a container image with dependencies, and run or deploy them on Modal's serverless infrastructure. Functions scale automatically and can request GPUs; code runs in sandboxed containers and can be exposed as web endpoints or scheduled jobs.
When to Use It
- Deploy Python code to Modal's infrastructure
- Run GPU-accelerated AI/ML inference
- Create serverless web APIs and endpoints
- Process data at scale with batch processing
- Build AI-driven applications
Quick Start
- Step 1: Install Modal and authenticate: pip install modal; modal setup
- Step 2: Create an App, define a container image, and add a function
- Step 3: Run, deploy, or serve your app: modal run/deploy/serve my_app.py
Best Practices
- Structure apps with Apps and Functions, keeping functions small and cohesive.
- Choose a dedicated Image per function and pin dependencies with pip_install/uv_pip_install.
- Request GPUs explicitly for heavy workloads using @app.function(gpu=...).
- Use modal run for development and modal deploy/serve for production.
- Leverage environment variables, run_commands, and imports() to manage setup and code locality.
Example Use Cases
- GPU-accelerated model inference in a function using @app.function(gpu='A100') with PyTorch.
- A serverless web API endpoint served via modal serve my_app.py.
- Scheduled cron jobs running data pipelines on a defined schedule.
- High-throughput batch processing of large datasets with autoscaling containers.
- Sandboxed code execution for user-submitted AI tasks without managing infrastructure.