modal

npx machina-cli add skill samarth777/modal-skills/main --openclaw

Modal Development Guide

Overview

Modal is a serverless cloud platform for running Python code with minimal configuration. It excels at:

  • GPU-accelerated AI/ML inference (supports T4, L4, A10, A100, L40S, H100, H200, B200)
  • Serverless web APIs and endpoints
  • Scheduled jobs (cron)
  • High-performance batch processing
  • Sandboxed code execution

Key benefits:

  • Pay only for resources used (billed per second)
  • Containers spin up in seconds
  • No infrastructure management required
  • Built-in autoscaling from zero to thousands of containers

Quick Start

Installation & Setup

pip install modal
modal setup  # Authenticate with Modal

Basic App Structure

import modal

# Create an App (groups Functions for deployment)
app = modal.App("my-app")

# Define a container image with dependencies
image = modal.Image.debian_slim(python_version="3.12").pip_install("numpy", "pandas")

# Create a serverless function
@app.function(image=image)
def process_data(x: int) -> int:
    import numpy as np
    return int(np.square(x))

# Local entrypoint for running the app
@app.local_entrypoint()
def main():
    result = process_data.remote(5)
    print(f"Result: {result}")

Running Your App

# Run ephemeral (for development/testing)
modal run my_app.py

# Deploy persistently
modal deploy my_app.py

# Serve with hot-reload (for web endpoints)
modal serve my_app.py

Core Concepts

Apps and Functions

An App groups related Functions so they deploy together atomically. A Function is an independent unit that Modal scales up and down automatically.

import modal

app = modal.App("my-app")

@app.function()
def hello(name: str) -> str:
    return f"Hello, {name}!"

@app.function()
def goodbye(name: str) -> str:
    return f"Goodbye, {name}!"

Container Images

Define custom environments using method chaining:

image = (
    modal.Image.debian_slim(python_version="3.12")
    .apt_install("git", "ffmpeg")              # System packages
    .pip_install("torch", "transformers")       # Python packages (pip)
    .uv_pip_install("numpy", "pandas")          # Python packages (uv - faster)
    .env({"MY_VAR": "value"})                   # Environment variables
    .run_commands("echo 'setup complete'")      # Shell commands
    .add_local_python_source("my_module")       # Local Python code
)

@app.function(image=image)
def my_function():
    ...

Using Existing Images

# From Docker registry
image = modal.Image.from_registry("pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime")

# From Dockerfile
image = modal.Image.from_dockerfile("./Dockerfile")

# From NVIDIA CUDA
image = modal.Image.from_registry("nvidia/cuda:12.8.1-devel-ubuntu24.04", add_python="3.12")

Handling Remote-Only Imports

When a package is installed in the container image but not on your local machine, import it in one of two ways:

image = modal.Image.debian_slim().pip_install("pandas")

# Option 1: Import inside function body
@app.function(image=image)
def my_function():
    import pandas as pd  # Import inside function
    return pd.DataFrame()

# Option 2: Use Image.imports() context manager
with image.imports():
    import pandas as pd
    import numpy as np

@app.function(image=image)
def my_function():
    return pd.DataFrame()

GPU Acceleration

Requesting GPUs

# Single GPU
@app.function(gpu="A100")
def train_model():
    import torch
    assert torch.cuda.is_available()
    ...

# Multiple GPUs (same machine)
@app.function(gpu="H100:8")
def train_large_model():
    ...

# GPU fallbacks (tries in order)
@app.function(gpu=["H100", "A100-80GB", "A100-40GB"])
def flexible_inference():
    ...

Available GPU Types

| GPU | Memory | Best For |
|-----|--------|----------|
| T4 | 16 GB | Budget inference |
| L4 | 24 GB | Inference |
| A10 | 24 GB | Inference (up to 4x) |
| A100-40GB | 40 GB | Training/Inference |
| A100-80GB | 80 GB | Large models |
| L40S | 48 GB | Best cost/performance |
| H100 | 80 GB | High-performance training |
| H200 | 141 GB | Large models |
| B200 | 192 GB | Largest models |
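
As a rough illustration, the memory column of the table can be used to build a fallback list like the one passed to gpu=[...] earlier. The sketch below is plain Python and does not call Modal; the names GPU_MEMORY_GB and gpu_fallbacks are hypothetical, and ordering by memory is only one possible policy (it says nothing about price or availability).

```python
# Memory per GPU in GB, copied from the table above (not queried from Modal).
GPU_MEMORY_GB = {
    "T4": 16,
    "L4": 24,
    "A10": 24,
    "A100-40GB": 40,
    "A100-80GB": 80,
    "L40S": 48,
    "H100": 80,
    "H200": 141,
    "B200": 192,
}


def gpu_fallbacks(min_memory_gb: int) -> list[str]:
    """Return GPU names with at least `min_memory_gb`, smallest memory first."""
    candidates = [name for name, mem in GPU_MEMORY_GB.items() if mem >= min_memory_gb]
    # sorted() is stable, so GPUs with equal memory keep their table order.
    return sorted(candidates, key=GPU_MEMORY_GB.__getitem__)


if __name__ == "__main__":
    # A model needing about 45 GB of GPU memory.
    print(gpu_fallbacks(45))
```

The returned list could then be passed as the gpu argument of a function, assuming the names match the GPU identifiers your Modal version accepts.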

GPU Image Setup

For libraries requiring CUDA toolkit:

image = (
    modal.Image.from_registry("nvidia/cuda:12.8.1-devel-ubuntu24.04", add_python="3.12")
    .entrypoint([])
    .pip_install("torch", "transformers")
)

@app.function(gpu="A100", image=image)
def gpu_function():
    ...

Web Endpoints

FastAPI Endpoints

from fastapi import FastAPI
import modal

image = modal.Image.debian_slim().pip_install("fastapi[standard]")
app = modal.App("web-app", image=image)

@app.function()
@modal.fastapi_endpoint()
def hello(name: str = "World") -> dict:
    return {"message": f"Hello, {name}!"}

# With custom configuration
@app.function()
@modal.fastapi_endpoint(method="POST", docs=True)
def process(data: dict) -> dict:
    return {"processed": data}

Full ASGI/WSGI Apps

# ASGI (FastAPI, Starlette)
@app.function()
@modal.asgi_app()
def fastapi_app():
    from fastapi import FastAPI
    web_app = FastAPI()
    
    @web_app.get("/")
    def root():
        return {"status": "ok"}
    
    return web_app

# WSGI (Flask, Django)
@app.function()
@modal.wsgi_app()
def flask_app():
    from flask import Flask
    web_app = Flask(__name__)
    
    @web_app.route("/")
    def root():
        return {"status": "ok"}
    
    return web_app

Web Server (Custom Ports)

@app.function()
@modal.web_server(port=8080)
def custom_server():
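    import subprocess
    # Start the server as a background process so this function can return
    # while the server keeps listening on the port
    subprocess.Popen(["python", "-m", "http.server", "8080"])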

Custom Domains

@app.function()
@modal.fastapi_endpoint(custom_domains=["api.example.com"])
def my_api():
    return {"message": "Hello from custom domain!"}

Persistent Storage

Volumes (High-Performance Storage)

# Create or reference a Volume
volume = modal.Volume.from_name("my-volume", create_if_missing=True)

@app.function(volumes={"/data": volume})
def save_data():
    with open("/data/output.txt", "w") as f:
        f.write("Hello, Volume!")

@app.function(volumes={"/data": volume})
def read_data():
    with open("/data/output.txt", "r") as f:
        return f.read()

Volume Operations

# Upload files
with volume.batch_upload() as batch:
    batch.put_file("local.txt", "/remote.txt")
    batch.put_directory("./local_dir", "/remote_dir")

# Read files
for chunk in volume.read_file("output.txt"):
    print(chunk)

# List contents
for entry in volume.listdir("/"):
    print(entry)

Cloud Bucket Mounts

bucket_mount = modal.CloudBucketMount(
    bucket_name="my-s3-bucket",
    secret=modal.Secret.from_name("aws-secret"),
)

@app.function(volumes={"/s3": bucket_mount})
def process_s3_data():
    import os
    files = os.listdir("/s3")
    ...

Secrets Management

Creating Secrets

# From Modal dashboard (recommended for production)
secret = modal.Secret.from_name("my-secret")

# From dictionary (inline)
secret = modal.Secret.from_dict({"API_KEY": "xxx", "DB_PASSWORD": "yyy"})

# From .env file
secret = modal.Secret.from_dotenv()

Using Secrets

@app.function(secrets=[modal.Secret.from_name("openai-secret")])
def call_openai():
    import os
    api_key = os.environ["OPENAI_API_KEY"]
    ...
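
Because the example above reads the secret from os.environ, a missing or misnamed key surfaces as a bare KeyError. A minimal sketch of a friendlier lookup follows; require_env is a hypothetical helper, not part of Modal, and it only assumes that attached Secrets appear as environment variables, as in the example above.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; "
            "check that the Secret providing it is attached to the function."
        )
    return value
```

Inside a function, api_key = require_env("OPENAI_API_KEY") would replace the direct os.environ lookup.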

Scheduling (Cron Jobs)

# Run every hour
@app.function(schedule=modal.Period(hours=1))
def hourly_task():
    print("Running hourly task")

# Run daily at specific time
@app.function(schedule=modal.Cron("0 9 * * *"))  # 9 AM UTC daily
def daily_report():
    print("Generating daily report")

# Run every 5 minutes
@app.function(schedule=modal.Period(minutes=5))
def frequent_check():
    print("Checking...")

Deploy with modal deploy to activate schedules.
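
To make the cron expression "0 9 * * *" concrete, the sketch below computes the next firing time of a daily schedule in UTC using only the standard library. It is an illustration of what the expression means, not how Modal evaluates schedules; next_daily_run is a hypothetical helper and expects a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now: datetime, hour: int = 9, minute: int = 0) -> datetime:
    """Next UTC firing time strictly after `now` for a daily schedule.

    With the defaults this corresponds to the cron expression "0 9 * * *".
    `now` must be timezone-aware.
    """
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(next_daily_run(datetime.now(timezone.utc)))
```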


Parallel Processing

Using .map() for Batch Processing

@app.function()
def process_item(item: int) -> int:
    return item * 2

@app.local_entrypoint()
def main():
    items = list(range(1000))
    
    # Process all items in parallel
    results = list(process_item.map(items))
    print(f"Processed {len(results)} items")

Using .starmap() for Multiple Arguments

@app.function()
def add(x: int, y: int) -> int:
    return x + y

@app.local_entrypoint()
def main():
    pairs = [(1, 2), (3, 4), (5, 6)]
    results = list(add.starmap(pairs))
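
The difference between .map() and .starmap() is only in how arguments are passed: one value per call versus one tuple unpacked per call. The following local analog uses the standard library to show those semantics; it runs on your machine and is not Modal's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import starmap


def process_item(item: int) -> int:
    return item * 2


def add(x: int, y: int) -> int:
    return x + y


# map: one argument per call; pool.map yields results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    doubled = list(pool.map(process_item, range(5)))

# starmap: each tuple is unpacked into positional arguments.
sums = list(starmap(add, [(1, 2), (3, 4), (5, 6)]))

print(doubled)  # [0, 2, 4, 6, 8]
print(sums)     # [3, 7, 11]
```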

Fire-and-Forget with .spawn()

@app.function()
def background_task(task_id: int):
    # Long-running task
    ...

@app.local_entrypoint()
def main():
    # Spawn without waiting for results
    for i in range(100):
        background_task.spawn(i)
    print("All tasks spawned")

Sandboxes (Dynamic Code Execution)

Execute arbitrary code in isolated containers:

import modal

app = modal.App.lookup("sandbox-app", create_if_missing=True)

# Create a sandbox
sb = modal.Sandbox.create(
    image=modal.Image.debian_slim().pip_install("numpy"),
    app=app,
)

# Execute commands
p = sb.exec("python", "-c", "import numpy; print(numpy.__version__)")
print(p.stdout.read())

# Clean up
sb.terminate()

LLM Code Execution

@app.function(
    restrict_modal_access=True,  # Security: restrict Modal API access
    single_use_containers=True,   # Fresh container per request
    timeout=30,
    block_network=True,           # No network access
)
def run_untrusted_code(code: str):
    try:
        exec_globals = {}
        exec(code, exec_globals)
        return exec_globals.get("result")
    except Exception as e:
        return f"Error: {e}"

Classes with Lifecycle

Use @app.cls() for stateful services with initialization:

@app.cls(gpu="A100", image=image)
class ModelService:
    @modal.enter()
    def load_model(self):
        # Runs once when container starts
        from transformers import pipeline
        self.model = pipeline("text-generation", model="gpt2", device="cuda")
    
    @modal.method()
    def generate(self, prompt: str) -> str:
        return self.model(prompt, max_length=100)[0]["generated_text"]
    
    @modal.exit()
    def cleanup(self):
        # Runs when container shuts down
        del self.model

# Usage
@app.local_entrypoint()
def main():
    service = ModelService()
    result = service.generate.remote("Hello, world!")
    print(result)

Resource Configuration

CPU and Memory

@app.function(
    cpu=4.0,           # 4 CPU cores
    memory=8192,       # 8 GB RAM
)
def heavy_computation():
    ...

# With limits
@app.function(
    cpu=(1.0, 4.0),           # Request 1 core, limit 4
    memory=(2048, 8192),      # Request 2GB, limit 8GB
)
def flexible_function():
    ...

Timeouts and Retries

@app.function(
    timeout=600,              # 10 minute timeout
    retries=modal.Retries(
        max_retries=3,
        initial_delay=1.0,
        backoff_coefficient=2.0,
    ),
)
def reliable_function():
    ...
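
To see what the retry settings above imply, the sketch below computes the wait before each retry under plain exponential backoff. The retry_delays helper is hypothetical, and the max_delay cap of 60 seconds is an assumption of this sketch; the exact timing is up to Modal.

```python
def retry_delays(
    max_retries: int,
    initial_delay: float,
    backoff_coefficient: float,
    max_delay: float = 60.0,  # assumed cap, for illustration only
) -> list[float]:
    """Seconds to wait before each retry under exponential backoff."""
    delays = []
    delay = initial_delay
    for _ in range(max_retries):
        delays.append(min(delay, max_delay))
        delay *= backoff_coefficient
    return delays


if __name__ == "__main__":
    # Settings from the example above: waits of 1 s, 2 s, then 4 s.
    print(retry_delays(3, 1.0, 2.0))
```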

Container Configuration

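@app.function(
    max_containers=10,        # Max containers (formerly concurrency_limit)
    scaledown_window=300,     # Keep idle containers warm for 5 min (formerly container_idle_timeout)
)
@modal.concurrent(max_inputs=5)   # Inputs per container (formerly allow_concurrent_inputs)
def optimized_function():
    ...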

Common Patterns

AI Model Inference Service

import modal

app = modal.App("llm-service")

image = (
    modal.Image.debian_slim(python_version="3.12")
    .pip_install("vllm", "torch")
)

@app.cls(gpu="A100", image=image, scaledown_window=300)  # formerly container_idle_timeout
class LLMService:
    @modal.enter()
    def load(self):
        from vllm import LLM
        self.llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
    
    @modal.method()
    def generate(self, prompt: str) -> str:
        from vllm import SamplingParams
        params = SamplingParams(temperature=0.7, max_tokens=256)
        outputs = self.llm.generate([prompt], params)
        return outputs[0].outputs[0].text

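# The endpoint's image needs FastAPI installed; the vLLM image above does not include it
@app.function(image=modal.Image.debian_slim().pip_install("fastapi[standard]"))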
@modal.fastapi_endpoint(method="POST")
def inference(body: dict) -> dict:
    service = LLMService()
    result = service.generate.remote(body["prompt"])
    return {"response": result}

Data Processing Pipeline

import modal

app = modal.App("data-pipeline")
volume = modal.Volume.from_name("pipeline-data", create_if_missing=True)

image = modal.Image.debian_slim().pip_install("pandas", "pyarrow")

@app.function(image=image, volumes={"/data": volume})
def extract(source: str) -> str:
    from pathlib import Path
    import pandas as pd
    df = pd.read_csv(source)
    # Name the output after the source file, with a .parquet extension
    output_path = f"/data/extracted_{Path(source).stem}.parquet"
    df.to_parquet(output_path)
    volume.commit()  # Persist the write so other containers can see it
    return output_path

@app.function(image=image, volumes={"/data": volume})
def transform(input_path: str) -> str:
    import pandas as pd
    volume.reload()  # Pick up files committed by other containers
    df = pd.read_parquet(input_path)
    # Transform logic
    df["processed"] = True
    output_path = input_path.replace("extracted", "transformed")
    df.to_parquet(output_path)
    volume.commit()
    return output_path

@app.function(image=image, volumes={"/data": volume})
def load(input_path: str):
    import pandas as pd
    volume.reload()
    df = pd.read_parquet(input_path)
    # Load to destination
    print(f"Loaded {len(df)} rows")

@app.local_entrypoint()
def run_pipeline():
    sources = ["data1.csv", "data2.csv", "data3.csv"]
    
    # Extract in parallel
    extracted = list(extract.map(sources))
    
    # Transform in parallel
    transformed = list(transform.map(extracted))
    
    # Load in parallel
    list(load.map(transformed))

Job Queue Pattern

import modal

app = modal.App("job-queue")

@app.function()
def process_job(data: dict) -> dict:
    # Long-running processing
    import time
    time.sleep(10)
    return {"status": "complete", "data": data}

# Submit job and get call ID
def submit_job(data: dict) -> str:
    process_job_fn = modal.Function.from_name("job-queue", "process_job")
    call = process_job_fn.spawn(data)
    return call.object_id

# Poll for result
def get_result(call_id: str):
    call = modal.FunctionCall.from_id(call_id)
    try:
        return call.get(timeout=0)  # Non-blocking
    except TimeoutError:
        return {"status": "pending"}

Deployment Best Practices

Project Structure

my-modal-app/
├── app.py           # Main Modal app
├── models/          # ML models
│   └── inference.py
├── utils/           # Shared utilities
│   └── helpers.py
├── requirements.txt
└── Dockerfile       # Optional

Image Optimization

# Order layers by change frequency (least changing first)
image = (
    modal.Image.debian_slim(python_version="3.11")  # torch 2.1.0 has no Python 3.12 wheels
    .apt_install("ffmpeg")                    # Rarely changes
    .pip_install("torch==2.1.0")              # Pin versions
    .pip_install("transformers==4.35.0")      # Pin versions
    .add_local_python_source("utils")         # Changes more often
)
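
Why the ordering matters can be shown with a toy model. The sketch below assumes the usual layered-cache behavior, where changing one build step invalidates that step and every step after it; it is a simplification, and layers_to_rebuild is a hypothetical helper rather than anything Modal exposes.

```python
def layers_to_rebuild(layers: list[str], changed: str) -> list[str]:
    """Return the layers rebuilt when `changed` is modified.

    Assumes a simple layered cache: the changed layer and everything
    after it are rebuilt, everything before it is reused.
    """
    index = layers.index(changed)
    return layers[index:]


if __name__ == "__main__":
    layers = ["apt ffmpeg", "pip torch", "pip transformers", "local utils"]
    # Editing local code rebuilds only the last layer.
    print(layers_to_rebuild(layers, "local utils"))
    # Changing the first step rebuilds everything.
    print(layers_to_rebuild(layers, "apt ffmpeg"))
```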

CI/CD Integration

# .github/workflows/deploy.yml
name: Deploy to Modal
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      MODAL_TOKEN_ID: ${{ secrets.MODAL_TOKEN_ID }}
      MODAL_TOKEN_SECRET: ${{ secrets.MODAL_TOKEN_SECRET }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install modal
      - run: modal deploy app.py

Debugging & Monitoring

View Logs

# Stream logs from deployed app
modal app logs my-app

# View logs for a specific container
modal container logs <container-id>

Interactive Debugging

# Add a breakpoint, then run with `modal run -i app.py` so the debugger can attach to your terminal
@app.function()
def debug_function():
    import pdb; pdb.set_trace()
    ...

Force Image Rebuild

# Rebuild all images
MODAL_FORCE_BUILD=1 modal run app.py

# Ignore cache (doesn't break cache for others)
MODAL_IGNORE_CACHE=1 modal run app.py

Reference

CLI Commands

| Command | Description |
|---------|-------------|
| modal run app.py | Run ephemeral app |
| modal deploy app.py | Deploy persistently |
| modal serve app.py | Serve with hot-reload |
| modal app list | List deployed apps |
| modal app stop <name> | Stop deployed app |
| modal secret list | List secrets |
| modal volume list | List volumes |

Pricing

  • CPU: ~$0.192/core/hour
  • Memory: ~$0.024/GB/hour
  • GPUs: Varies by type (see modal.com/pricing)
  • Billed per second, only for resources used
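
As a worked example of per-second billing, the sketch below estimates the CPU and memory cost of one run from the approximate rates listed above. The rates are copied from this list and may be out of date, GPU charges are left out, and estimate_cost is a hypothetical helper that assumes the requested resources are billed for the full duration.

```python
CPU_RATE_PER_CORE_HOUR = 0.192   # approximate, from the list above
MEMORY_RATE_PER_GB_HOUR = 0.024  # approximate, from the list above


def estimate_cost(cpu_cores: float, memory_gb: float, seconds: float) -> float:
    """Estimated CPU + memory cost in dollars for one run (GPU excluded)."""
    hourly = cpu_cores * CPU_RATE_PER_CORE_HOUR + memory_gb * MEMORY_RATE_PER_GB_HOUR
    # Billing is per second, so scale the hourly rate by the fraction of an hour used.
    return hourly * seconds / 3600


if __name__ == "__main__":
    # 4 cores and 8 GB for 10 minutes: (0.768 + 0.192) * 600 / 3600, about $0.16.
    print(f"${estimate_cost(4, 8, 600):.2f}")
```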

Source

https://github.com/samarth777/modal-skills/blob/main/skills/main/SKILL.md

Overview

Modal provides a serverless cloud platform to run Python code with minimal configuration. It excels at GPU-accelerated AI/ML inference, serverless web APIs, cron jobs, and high-performance batch processing, all without infrastructure management.

How This Skill Works

Create an App to group Functions, define a container image with dependencies, and deploy or run them on Modal’s serverless infrastructure. Functions scale automatically and can request GPUs; code runs in sandboxed containers and can be exposed as web endpoints or scheduled jobs.

When to Use It

  • Deploy Python code to Modal's infrastructure
  • Run GPU-accelerated AI/ML inference
  • Create serverless web APIs and endpoints
  • Process data at scale with batch processing
  • Build AI-driven applications

Quick Start

  1. Install Modal and authenticate: pip install modal, then modal setup
  2. Create an App, define a container image, and add a function
  3. Run, deploy, or serve your app: modal run/deploy/serve my_app.py

Best Practices

  • Structure apps with Apps and Functions, keeping functions small and cohesive.
  • Choose a dedicated Image per function and pin dependencies with pip_install/uv_pip_install.
  • Request GPUs explicitly for heavy workloads using @app.function(gpu=...).
  • Use modal run and modal serve for development, and modal deploy for production.
  • Leverage environment variables, run_commands, and imports() to manage setup and code locality.

Example Use Cases

  • GPU-accelerated model inference in a function using @app.function(gpu='A100') with PyTorch.
  • A serverless web API endpoint served via modal serve my_app.py.
  • Scheduled cron jobs running data pipelines on a defined schedule.
  • High-throughput batch processing of large datasets with autoscaling containers.
  • Sandboxed code execution for user-submitted AI tasks without managing infrastructure.
