A secure code execution service for LLM-generated Python code using Modal sandboxes with strict security controls.

How are security and isolation enforced?

User code runs inside a dedicated sandbox image. The block_network option cuts off network access, while timeouts and memory limits cap resource usage.

Can I run an interactive session or install packages?

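Yes. Use the persistent REPLSession to execute code, install packages, and inspect files inside the sandbox. Note that the example creates the REPL sandbox with block_network=True, so pip cannot reach a package index unless that setting is relaxed.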

Code Exec

Flagged

{"isSafe":false,"isSuspicious":true,"riskLevel":"medium","findings":[{"category":"data_exfiltration","severity":"high","description":"Arbitrary user code could print sensitive data (environment variables, host files) and this stdout is returned to the caller, enabling potential data leakage.","evidence":"execute_code() returns stdout/stderr; REPLSession.read_file() exposes read access to arbitrary paths; OPENAI_API_KEY is read from the environment in llm_code_agent and could be printed by user code."},{"category":"prompt_injection","severity":"medium","description":"The LLM-based code generation relies on a system prompt; crafted user inputs could attempt to jailbreak or influence the LLM to emit code that bypasses sandbox constraints or reveals secrets.","evidence":"system prompt constrains the LLM to generate only Python code and print results; generated_code is then executed inside the sandbox."},{"category":"system_harm","severity":"medium","description":"Persistent sandbox and in-session package installation could be abused to consume resources or install malicious packages if access controls are insufficient.","evidence":"REPLSession maintains a persistent sandbox with idle_timeout and offers install_package using pip install inside the sandbox."},{"category":"data_exfiltration","severity":"medium","description":"The REPL APIs allow reading and writing arbitrary files within the sandbox, which could be used to exfiltrate sensitive data if misused.","evidence":"list_files/read_file/write_file methods expose arbitrary path access inside the sandbox."}],"summary":"The code provides a capable sandbox for executing user code with network control, but poses data-exfiltration and prompt-injection risks (stdout leakage, environment-variable exposure, and LLM-driven code generation). 
Mitigations include restricting environment-variable access, avoiding returning generated_code, hardening path restrictions for read/write, auditing sandbox isolation, and enforcing strict input/output filtering."}
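
One of the mitigations named in the scan summary, output filtering, could be sketched roughly as follows. This is a minimal illustration, not part of the skill: `redact_output`, `SECRET_PATTERNS`, and both regular expressions are hypothetical, and pattern matching of this kind only catches secrets that look the way the patterns expect.

```python
import re

# Hypothetical patterns; extend them for the secrets your deployment holds.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{16,}"),  # strings shaped like OpenAI-style keys
    re.compile(r"(?i)(api[_-]?key|token|password)\s*[=:]\s*\S+"),  # key=value pairs
]


def redact_output(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace anything that looks like a secret before returning output."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

A caller would apply it to the stdout and stderr fields of the result dictionary before returning them, for example `result["stdout"] = redact_output(result["stdout"])`.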

npx machina-cli add skill samarth777/modal-skills/code-exec --openclaw

Code Execution Sandbox Example

A complete example of a secure code execution service for LLM-generated code.

import modal

app = modal.App("code-executor")

# --- Sandboxed Execution Image ---
sandbox_image = (
    modal.Image.debian_slim(python_version="3.12")
    .pip_install(
        "numpy",
        "pandas",
        "matplotlib",
        "scipy",
        "sympy",
        "requests",  # for network-enabled sandbox
    )
)

# --- API Image ---
# openai is installed here because llm_code_agent imports it at call time.
api_image = modal.Image.debian_slim().pip_install("fastapi[standard]", "openai")

# --- Secure Code Executor ---
@app.function(
    image=api_image,
    timeout=120,
)
def execute_code(
    code: str,
    timeout: int = 30,
    allow_network: bool = False,
    memory_mb: int = 512,
) -> dict:
    """Execute arbitrary Python code in a secure sandbox."""
    
    # Get or create app reference
    sandbox_app = modal.App.lookup("code-executor", create_if_missing=True)
    
    # Create sandbox with security constraints
    sb = modal.Sandbox.create(
        image=sandbox_image,
        timeout=timeout,
        # (request, limit) in MiB: a bare int only requests memory, it does not cap it
        memory=(memory_mb, memory_mb),
        block_network=not allow_network,
        app=sandbox_app,
    )
    
    try:
        # Write code to file
        with sb.open("/tmp/user_code.py", "w") as f:
            f.write(code)
        
        # Execute code
        p = sb.exec("python", "/tmp/user_code.py", timeout=timeout)
        p.wait()
        
        stdout = p.stdout.read()
        stderr = p.stderr.read()
        
        return {
            "success": p.returncode == 0,
            "stdout": stdout,
            "stderr": stderr,
            "returncode": p.returncode,
        }
    except TimeoutError:
        return {
            "success": False,
            "stdout": "",
            "stderr": "Execution timed out",
            "returncode": -1,
        }
    except Exception as e:
        return {
            "success": False,
            "stdout": "",
            "stderr": str(e),
            "returncode": -1,
        }
    finally:
        sb.terminate()

# --- Interactive REPL Session ---
@app.cls(image=api_image, timeout=3600)
class REPLSession:
    """Maintain a persistent Python REPL session."""
    
    def __init__(self, session_id: str):
        self.session_id = session_id
        self.sb = None
    
    @modal.enter()
    def create_sandbox(self):
        app = modal.App.lookup("code-executor", create_if_missing=True)
        
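        # Create a long-lived sandbox shared by every call in this session
        self.sb = modal.Sandbox.create(
            image=sandbox_image,
            timeout=3600,  # 1 hour max
            idle_timeout=300,  # 5 min idle timeout
            # NOTE: with the network blocked, install_package() cannot reach
            # PyPI; preinstall packages in sandbox_image or relax this setting.
            block_network=True,
            app=app,
        )
        
        # Smoke-test the interpreter. This process exits immediately: each
        # execute() call starts a fresh Python process, so files and installed
        # packages persist between calls, but variables do not.
        self.sb.exec("python", "-c", "import sys; sys.ps1 = '>>> '").wait()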
    
    @modal.method()
    def execute(self, code: str) -> dict:
        """Execute code in the persistent session."""
        # Write code to temp file
        with self.sb.open("/tmp/code.py", "w") as f:
            f.write(code)
        
        # Execute
        p = self.sb.exec("python", "/tmp/code.py", timeout=30)
        p.wait()
        
        return {
            "stdout": p.stdout.read(),
            "stderr": p.stderr.read(),
            "success": p.returncode == 0,
        }
    
    @modal.method()
    def install_package(self, package: str) -> dict:
        """Install a pip package in the session."""
        p = self.sb.exec("pip", "install", package, timeout=120)
        p.wait()
        
        return {
            "success": p.returncode == 0,
            "output": p.stdout.read() + p.stderr.read(),
        }
    
    @modal.method()
    def list_files(self, path: str = "/tmp") -> list[str]:
        """List files in the sandbox."""
        return list(self.sb.ls(path))
    
    @modal.method()
    def read_file(self, path: str) -> str:
        """Read a file from the sandbox."""
        with self.sb.open(path, "r") as f:
            return f.read()
    
    @modal.method()
    def write_file(self, path: str, content: str) -> bool:
        """Write a file to the sandbox."""
        with self.sb.open(path, "w") as f:
            f.write(content)
        return True
    
    @modal.exit()
    def cleanup(self):
        if self.sb:
            self.sb.terminate()

# --- Web API ---
@app.function(image=api_image)
@modal.asgi_app()
def api():
    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel
    
    web_app = FastAPI(title="Code Execution API")
    
    class ExecuteRequest(BaseModel):
        code: str
        timeout: int = 30
        allow_network: bool = False
    
    class ExecuteResponse(BaseModel):
        success: bool
        stdout: str
        stderr: str
        returncode: int
    
    @web_app.post("/execute", response_model=ExecuteResponse)
    async def execute_endpoint(request: ExecuteRequest):
        # Await the async variant so the event loop is not blocked
        result = await execute_code.remote.aio(
            code=request.code,
            timeout=request.timeout,
            allow_network=request.allow_network,
        )
        return result
    
    class REPLRequest(BaseModel):
        session_id: str
        code: str
    
    @web_app.post("/repl/execute")
    async def repl_execute(request: REPLRequest):
        session = REPLSession(request.session_id)
        # Await the async variant so the event loop is not blocked
        result = await session.execute.remote.aio(request.code)
        return result
    
    # session_id and package arrive as query parameters on this endpoint
    @web_app.post("/repl/install")
    async def repl_install(session_id: str, package: str):
        session = REPLSession(session_id)
        result = await session.install_package.remote.aio(package)
        return result
    
    return web_app

# --- Batch Code Execution ---
@app.function(image=api_image, timeout=3600)
def execute_batch(code_snippets: list[dict]) -> list[dict]:
    """Execute multiple code snippets in parallel."""
    
    # Fan out with map: the three iterables are zipped into the positional
    # arguments (code, timeout, allow_network) of execute_code.
    results = list(execute_code.map(
        [s["code"] for s in code_snippets],
        [s.get("timeout", 30) for s in code_snippets],
        [s.get("allow_network", False) for s in code_snippets],
    ))
    
    return results

# --- LLM Integration Example ---
@app.function(
    image=api_image,
    secrets=[modal.Secret.from_name("openai-secret")],
)
def llm_code_agent(task: str) -> dict:
    """Use an LLM to generate and execute code."""
    import os
    from openai import OpenAI
    
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    
    # Generate code
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": """You are a Python code generator. 
                Generate only Python code that solves the given task.
                The code should print its result to stdout.
                Do not include any explanation, only code."""
            },
            {"role": "user", "content": task}
        ],
    )
    
    generated_code = response.choices[0].message.content
    
    # Clean up code (remove markdown if present)
    if "```python" in generated_code:
        generated_code = generated_code.split("```python")[1].split("```")[0]
    elif "```" in generated_code:
        generated_code = generated_code.split("```")[1].split("```")[0]
    
    # Execute code
    result = execute_code.remote(
        code=generated_code,
        timeout=60,
        allow_network=False,
    )
    
    return {
        "task": task,
        "generated_code": generated_code,
        "execution_result": result,
    }

# --- CLI ---
@app.local_entrypoint()
def main(code: str = "print('Hello from sandbox!')"):
    result = execute_code.remote(code)
    
    print("=== Execution Result ===")
    print(f"Success: {result['success']}")
    print(f"Return code: {result['returncode']}")
    print(f"\n--- stdout ---\n{result['stdout']}")
    if result['stderr']:
        print(f"\n--- stderr ---\n{result['stderr']}")
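
The markdown clean-up step in llm_code_agent splits on literal fence strings, which leaves surrounding whitespace in place and treats a `py` tag as code. A slightly more forgiving variant might look like the sketch below. `extract_code` and `FENCE_RE` are hypothetical names, and the fence marker is built with string multiplication only so that this example can itself sit inside a fenced block.

```python
import re

FENCE = "`" * 3  # three backticks, built indirectly so this block stays well-formed

# Matches an opening fence with an optional python/py tag, then the body,
# then the closing fence.
FENCE_RE = re.compile(
    FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def extract_code(reply: str) -> str:
    """Return the first fenced block's body, or the whole reply if unfenced."""
    match = FENCE_RE.search(reply)
    code = match.group(1) if match else reply
    return code.strip() + "\n"
```

Replies that contain no fence at all are passed through unchanged apart from whitespace trimming, which matches the behavior of the original branches.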

Usage

# Simple execution
modal run code_executor.py --code "print(sum(range(100)))"

# Deploy API
modal deploy code_executor.py

# Execute via API
curl -X POST https://your-workspace--code-executor-api.modal.run/execute \
  -H "Content-Type: application/json" \
  -d '{"code": "import numpy as np; print(np.random.rand(5))", "timeout": 30}'

# LLM agent
modal run code_executor.py::llm_code_agent --task "Calculate the first 20 Fibonacci numbers"
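
The curl call above can also be made from Python using only the standard library. The sketch below assumes the same placeholder URL as the curl example and the field names of ExecuteRequest; `build_execute_request` and `run_remote` are hypothetical helper names, and `run_remote` only works against an app you have actually deployed.

```python
import json
import urllib.request

# Placeholder URL copied from the curl example; substitute your own deployment.
BASE_URL = "https://your-workspace--code-executor-api.modal.run"


def build_execute_request(code: str, timeout: int = 30,
                          allow_network: bool = False) -> urllib.request.Request:
    """Build the POST /execute request; the fields mirror ExecuteRequest."""
    payload = {"code": code, "timeout": timeout, "allow_network": allow_network}
    return urllib.request.Request(
        f"{BASE_URL}/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_remote(code: str, timeout: int = 30) -> dict:
    """Send the request and decode the JSON body (needs a deployed app)."""
    request = build_execute_request(code, timeout)
    # Give the HTTP call some slack beyond the sandbox timeout.
    with urllib.request.urlopen(request, timeout=timeout + 30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from the network call makes the payload easy to inspect or unit-test without a live endpoint.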

Security Considerations

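Network isolation: block_network=True prevents outbound connections from the sandbox
Timeout limits: Sandbox and exec timeouts stop infinite loops and other runaway code
Memory limits: Prevent memory exhaustion
Fresh containers: Each execute_code call gets a new sandbox that is terminated afterwards; a REPLSession deliberately reuses one sandbox
No Modal access: Sandboxes can't access other Modal resources by default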
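
The list above does not cover the REPL file methods, which accept any path. One possible hardening step is to normalize each path and refuse anything outside a chosen root before calling read_file or write_file. The sketch below is an assumption-laden illustration: `resolve_sandbox_path` and `SANDBOX_ROOT` are hypothetical, `/tmp` is simply the directory the example already writes to, and a purely lexical check like this does not follow symlinks.

```python
import posixpath

SANDBOX_ROOT = "/tmp"  # assumed writable area; adjust to your layout


def resolve_sandbox_path(user_path: str, root: str = SANDBOX_ROOT) -> str:
    """Normalize user_path and reject anything that escapes root."""
    # Relative paths are interpreted relative to root; absolute paths
    # replace it and are then checked like any other path.
    joined = posixpath.join(root, user_path)
    normalized = posixpath.normpath(joined)
    if normalized != root and not normalized.startswith(root + "/"):
        raise ValueError(f"path escapes {root}: {user_path!r}")
    return normalized
```

The `root + "/"` comparison matters: a plain prefix check would wrongly accept sibling directories such as `/tmpfoo`.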

Source

git clone https://github.com/samarth777/modal-skills

The skill file is skills/code-exec/SKILL.md in that repository: https://github.com/samarth777/modal-skills/blob/main/skills/code-exec/SKILL.md

Overview

Code Exec provides a secure, sandboxed environment to run Python code generated by LLMs. It demonstrates separate Sandbox and API images, strict security controls (like block_network), and timeouts to prevent abuse. It also includes a persistent REPL session for iterative experimentation.

How This Skill Works

It builds a sandboxed execution image and an API image, then runs user code by writing it to a file inside the sandbox and invoking Python. It collects stdout, stderr, and the return code, applying resource limits and automatic sandbox termination. A persistent REPLSession supports ongoing coding, package installation, and file operations within a secured sandbox.
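
The write-then-run flow described above can be tried without Modal using a local stand-in. The sketch below is not the skill's implementation and provides no isolation at all: `run_locally` is a hypothetical helper that uses a temporary directory and a child Python process in place of the sandbox, and returns the same four keys as execute_code.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_locally(code: str, timeout: int = 30) -> dict:
    """Local stand-in for execute_code: no isolation, same result shape."""
    with tempfile.TemporaryDirectory() as workdir:
        # Step 1: write the code to a file, as the sandbox version does.
        script = Path(workdir) / "user_code.py"
        script.write_text(code, encoding="utf-8")
        try:
            # Step 2: run it in a separate interpreter with a time limit.
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True, text=True, timeout=timeout, cwd=workdir,
            )
        except subprocess.TimeoutExpired:
            return {"success": False, "stdout": "",
                    "stderr": "Execution timed out", "returncode": -1}
    # Step 3: report stdout, stderr, and the return code.
    return {"success": proc.returncode == 0, "stdout": proc.stdout,
            "stderr": proc.stderr, "returncode": proc.returncode}
```

This is useful for testing code that consumes the result dictionary, but untrusted code should only ever go through the real sandbox.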

When to Use It

Safely execute user-submitted Python code in AI-assisted tools
Prototype and test code snippets in a controlled, isolated sandbox
Run data-processing tasks with libraries like numpy or pandas under strict limits
Offer an interactive REPL for iterative experimentation during conversations
Test network-enabled sandboxes with explicit allow_network flags when needed

Quick Start

Step 1: Deploy the code-executor app and build the sandbox and API images
Step 2: Call execute_code with your Python code and desired constraints (timeout, memory, network)
Step 3: Optionally create a REPLSession to run code, install packages, and manage files

Best Practices

Use a purpose-built sandbox image and predictable library sets
Always enforce timeout and memory limits on execute_code
Double-check network access using block_network and allow_network flags
Return structured results by capturing stdout, stderr, and returncode
Leverage the REPL session for long-running exploration while keeping isolation
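
To make the limit-related practices above concrete, caller-supplied values can be clamped before they reach execute_code. In this sketch `clamp_limits`, `Limits`, and both ceilings are hypothetical; the only numbers taken from the skill are the 30 second and 512 MB defaults, and the 100 second cap is chosen merely to stay under the 120 second timeout of the execute_code function itself.

```python
from dataclasses import dataclass

# Illustrative ceilings, not values defined by the skill.
MAX_TIMEOUT_S = 100   # leaves headroom under execute_code's 120 s function timeout
MAX_MEMORY_MB = 2048
MIN_MEMORY_MB = 128


@dataclass(frozen=True)
class Limits:
    timeout: int
    memory_mb: int
    allow_network: bool


def clamp_limits(timeout=30, memory_mb=512, allow_network=False) -> Limits:
    """Coerce caller-supplied limits into a bounded range before use."""
    timeout = max(1, min(int(timeout), MAX_TIMEOUT_S))
    memory_mb = max(MIN_MEMORY_MB, min(int(memory_mb), MAX_MEMORY_MB))
    return Limits(timeout, memory_mb, bool(allow_network))
```

An API handler could then pass `limits.timeout`, `limits.memory_mb`, and `limits.allow_network` on to execute_code instead of the raw request fields.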

Example Use Cases

AI coding assistants that evaluate and debug code snippets
Educators offering safe environments for student submissions
Data-science prototyping with numpy, pandas, and matplotlib in a sandbox
Secure testbeds for API calls or network-enabled sandbox scenarios
Interactive tutoring where learners experiment in a persistent REPL
