slurm-job-script-generator
npx machina-cli add skill HeshamFS/materials-simulation-skills/slurm-job-script-generator --openclaw

SLURM Job Script Generator
Goal
Generate a correct, copy-pasteable SLURM job script (.sbatch) for running a simulation, and surface common configuration mistakes (bad walltime format, conflicting memory flags, oversubscription hints).
Requirements
- Python 3.8+
- No external dependencies (Python standard library only)
- Works on Linux, macOS, and Windows (script generation only)
Inputs to Gather
| Input | Description | Example |
|---|---|---|
| Job name | Short identifier for the job | phasefield-strong-scaling |
| Walltime | SLURM time limit | 00:30:00 |
| Partition | Cluster partition/queue (if required) | compute |
| Account | Project/account (if required) | matsim |
| Nodes | Number of nodes to allocate | 2 |
| MPI tasks | Total tasks, or tasks per node | 128 or 64 per node |
| Threads | CPUs per task (OpenMP threads) | 2 |
| Memory | --mem or --mem-per-cpu (cluster policy dependent) | 32G |
| GPUs | GPUs per node (optional) | 4 |
| Working directory | Where the run should execute | $SLURM_SUBMIT_DIR |
| Modules | Environment modules to load (optional) | gcc/12, openmpi/4.1 |
| Run command | The command to launch under SLURM | ./simulate --config cfg.json |
Decision Guidance
MPI vs MPI+OpenMP layout
```
Does the code use OpenMP / threading?
├── NO  → Use MPI-only: cpus-per-task=1
└── YES → Use hybrid: set cpus-per-task = threads per MPI rank
          and export OMP_NUM_THREADS = cpus-per-task
```
Rule of thumb: if you see diminishing strong-scaling efficiency at high MPI ranks, try fewer ranks with more threads per rank (and measure).
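The layout arithmetic behind the decision tree can be sketched in a few lines. `hybrid_layout` is a hypothetical helper for illustration, not part of the generator:

```python
def hybrid_layout(nodes, cores_per_node, threads_per_rank):
    """Derive an MPI+OpenMP layout that fills each node without oversubscribing."""
    if cores_per_node % threads_per_rank != 0:
        raise ValueError("threads per rank must divide cores per node")
    ranks_per_node = cores_per_node // threads_per_rank
    return {
        "ntasks-per-node": ranks_per_node,
        "cpus-per-task": threads_per_rank,  # also the OMP_NUM_THREADS value
        "total-ranks": nodes * ranks_per_node,
    }

# 2 nodes of 128 cores, 2 threads per rank -> 64 ranks/node, 128 ranks total
layout = hybrid_layout(nodes=2, cores_per_node=128, threads_per_rank=2)
```

Setting `threads_per_rank=1` recovers the MPI-only case; anything larger trades ranks for threads while keeping every core busy.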
Memory flag selection
- Use either `--mem` (per node) or `--mem-per-cpu` (per CPU), not both.
- Follow your cluster's documentation; some sites enforce one style.
- SLURM `--mem` units are integer MB by default, or an integer with suffix `K`/`M`/`G`/`T` (and `--mem=0` commonly means "all memory on the node").
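As a sketch of the unit rule above (integer MB by default, `K`/`M`/`G`/`T` suffixes), a hypothetical parser might normalize everything to MB; this is an illustration, not the tool's actual validation code:

```python
import re

def mem_to_mb(spec):
    """Normalize a SLURM --mem value to integer MB (default unit is MB)."""
    m = re.fullmatch(r"(\d+)([KMGT]?)", spec.strip().upper())
    if not m:
        raise ValueError(f"bad memory spec: {spec!r}")
    value, suffix = int(m.group(1)), m.group(2)
    scale = {"": 1, "K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}
    return int(value * scale[suffix])

# mem_to_mb("16G") == 16384; mem_to_mb("32") == 32
```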
Script Outputs (JSON Fields)
| Script | Key Outputs |
|---|---|
| `scripts/slurm_script_generator.py` | `results.script`, `results.directives`, `results.derived`, `results.warnings` |
Workflow
- Gather cluster constraints (partition/account, GPU policy, memory policy).
- Choose a process layout (MPI-only vs hybrid MPI+OpenMP).
- Generate the script with `slurm_script_generator.py`.
- Inspect warnings (conflicts, suspicious layouts).
- Save the generated script as `job.sbatch`.
- Submit with `sbatch job.sbatch` and monitor with `squeue`.
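The generated script is essentially a header of `#SBATCH` directives, followed by module loads and the run command. A minimal sketch of that rendering, using a hypothetical `render_sbatch` helper rather than the tool's actual code:

```python
def render_sbatch(directives, modules, run_cmd):
    """Render a minimal sbatch script: shebang, #SBATCH header, modules, command."""
    lines = ["#!/bin/bash"]
    lines += [f"#SBATCH --{key}={value}" for key, value in directives.items()]
    lines += [f"module load {mod}" for mod in modules]
    lines.append(run_cmd)
    return "\n".join(lines) + "\n"

script = render_sbatch(
    {"job-name": "phasefield", "time": "00:30:00", "nodes": 2},
    ["gcc/12", "openmpi/4.1"],
    "srun ./simulate --config cfg.json",
)
```

The real generator also derives values (e.g. total ranks) and cross-checks directives for conflicts before emitting the script.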
CLI Examples
```bash
# Preview a job script (prints to stdout)
python3 skills/hpc-deployment/slurm-job-script-generator/scripts/slurm_script_generator.py \
  --job-name phasefield \
  --time 00:10:00 \
  --partition compute \
  --nodes 1 \
  --ntasks-per-node 8 \
  --cpus-per-task 2 \
  --mem 16G \
  --module gcc/12 \
  --module openmpi/4.1 \
  -- \
  ./simulate --config config.json

# Write to a file and also emit structured JSON
python3 skills/hpc-deployment/slurm-job-script-generator/scripts/slurm_script_generator.py \
  --job-name phasefield \
  --time 00:10:00 \
  --nodes 1 \
  --ntasks 16 \
  --cpus-per-task 1 \
  --out job.sbatch \
  --json \
  -- \
  /bin/echo hello
```
Conversational Workflow Example
User: I need an sbatch script for my MPI simulation. I want 2 nodes, 64 ranks per node, 2 OpenMP threads per rank, and 2 hours.
Agent workflow:
- Confirm partition/account and whether GPUs are needed.
- Generate a hybrid job script:
  python3 scripts/slurm_script_generator.py --job-name run --time 02:00:00 --nodes 2 --ntasks-per-node 64 --cpus-per-task 2 -- ./simulate
- Explain the mapping:
  - Total ranks = 128
  - Threads per rank = 2 (`OMP_NUM_THREADS=2`)
- If the user provides node core counts, sanity-check oversubscription using `--cores-per-node`.
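That oversubscription sanity check amounts to comparing requested CPUs per node against physical cores; `check_oversubscription` below is a hypothetical illustration of the idea:

```python
def check_oversubscription(ntasks_per_node, cpus_per_task, cores_per_node):
    """Return a warning string when ranks x threads exceed a node's cores."""
    requested = ntasks_per_node * cpus_per_task
    if requested > cores_per_node:
        return (f"oversubscribed: {requested} CPUs requested "
                f"on a {cores_per_node}-core node")
    return None

# 64 ranks x 2 threads fits a 128-core node; 64 x 4 does not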
Error Handling
| Error | Cause | Resolution |
|---|---|---|
time must be HH:MM:SS or D-HH:MM:SS | Bad walltime format | Use 00:30:00 or 1-00:00:00 |
nodes must be positive | Non-positive nodes | Provide --nodes >= 1 |
Provide either --mem or --mem-per-cpu, not both | Conflicting memory directives | Choose one memory style |
Provide a run command after -- | Missing launch command | Add -- ./simulate ... |
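The walltime check in the first row can be expressed as a pattern over the two accepted formats. This is a sketch of the rule as documented; the tool's actual validation may differ:

```python
import re

# HH:MM:SS, optionally prefixed by days as D- (e.g. 00:30:00 or 1-00:00:00)
WALLTIME = re.compile(r"(?:\d+-)?\d{2}:\d{2}:\d{2}")

def validate_walltime(spec):
    if not WALLTIME.fullmatch(spec):
        raise ValueError("time must be HH:MM:SS or D-HH:MM:SS")
    return spec
```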
Limitations
- Does not query cluster hardware or site policies; it can only validate internal consistency.
- SLURM installations vary (GPU directives, QoS rules, partitions). Adjust directives for your site.
References
- `references/slurm_directives.md` - Common `#SBATCH` directives and mapping tips
Version History
- v1.0.0 (2026-02-25): Initial SLURM job script generator
Source

git clone https://github.com/HeshamFS/materials-simulation-skills.git

The skill definition lives at skills/hpc-deployment/slurm-job-script-generator/SKILL.md.

Overview
SLURM Job Script Generator creates correct, copy-pasteable sbatch scripts for simulations and surfaces common configuration mistakes like bad walltime formats, memory flag conflicts, and potential oversubscription. It guides you through gathering inputs (job name, walltime, partition, account, nodes, tasks, cpus per task, memory, GPUs, modules, and run command), decides between MPI-only or MPI+OpenMP layouts, and standardizes #SBATCH directives.
How This Skill Works
It runs on Python 3.8+ with the standard library to collect required inputs, validate resource requests, and generate an sbatch script via the slurm_script_generator.py tool. It then outputs structured JSON fields (results.script, results.directives, results.derived, results.warnings) to support debugging, auditing, and easy integration.
When to Use It
- Preparing a submission script for a new HPC simulation run.
- Deciding between MPI-only vs hybrid MPI+OpenMP layouts based on code threading behavior.
- Standardizing #SBATCH directives across multiple runs or projects.
- Debugging sbatch/srun configurations to catch issues like walltime formats or memory flag conflicts.
- Previewing or exporting a ready-to-run script in JSON/structure-friendly form for automation.
Quick Start
- Step 1: Run the generator with your inputs (job name, time, nodes, ntasks, cpus per task, mem, etc.).
- Step 2: Review directives and warnings; adjust resource requests or layout if needed.
- Step 3: Save as job.sbatch and submit with sbatch job.sbatch (or use --json for automated pipelines).
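For Step 3's automated path, the `--json` output can be consumed directly. The payload below is a hypothetical example shaped after the fields listed under Script Outputs (`results.script`, `results.directives`, `results.derived`, `results.warnings`); the generator's exact JSON shape may differ:

```python
import json

# Hypothetical --json payload using the documented result fields
raw = '''{"results": {"script": "#!/bin/bash\\n",
                      "directives": {"time": "00:10:00", "nodes": "1"},
                      "derived": {"total_ranks": 16},
                      "warnings": []}}'''
results = json.loads(raw)["results"]

# Refuse to submit while the generator reports warnings
assert not results["warnings"], "resolve generator warnings before submitting"
script_text = results["script"]  # write this out as job.sbatch
```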
Best Practices
- Gather cluster constraints (partition, account, GPU policy, memory policy) before scripting.
- Choose MPI-only or hybrid MPI+OpenMP early, and reflect the decision in cpus-per-task and OMP_NUM_THREADS.
- Use either --mem or --mem-per-cpu, not both; follow your cluster policy to avoid submission errors.
- Review generated warnings for conflicts, suspicious layouts, or inconsistent resource requests.
- Save the final script as job.sbatch and test with sbatch in a safe or preview mode before full runs.
Example Use Cases
- MPI-only run: 2 nodes, 128 total ranks, 1 CPU per task, 32G memory, walltime 00:30:00.
- Hybrid MPI+OpenMP: 4 nodes, 64 ranks per node, 2 OpenMP threads per rank, 64G memory, walltime 01:00:00.
- Memory flag conflict detected: user provides both --mem and --mem-per-cpu and the generator flags an error.
- GPU-enabled run: 2 nodes, 4 GPUs per node, cpus-per-task 4, mem 32G, walltime 02:00:00.
- Preview export: using --json to emit script and directives without writing a file, for CI workflow.