SLURM Job Script Generator

Goal

Generate a correct, copy-pasteable SLURM job script (.sbatch) for running a simulation, and surface common configuration mistakes (bad walltime format, conflicting memory flags, oversubscription hints).

Requirements

  • Python 3.8+
  • No external dependencies (Python standard library only)
  • Works on Linux, macOS, and Windows (script generation only)

Inputs to Gather

| Input | Description | Example |
|---|---|---|
| Job name | Short identifier for the job | phasefield-strong-scaling |
| Walltime | SLURM time limit | 00:30:00 |
| Partition | Cluster partition/queue (if required) | compute |
| Account | Project/account (if required) | matsim |
| Nodes | Number of nodes to allocate | 2 |
| MPI tasks | Total tasks, or tasks per node | 128, or 64 per node |
| Threads | CPUs per task (OpenMP threads) | 2 |
| Memory | --mem or --mem-per-cpu (cluster policy dependent) | 32G |
| GPUs | GPUs per node (optional) | 4 |
| Working directory | Where the run should execute | $SLURM_SUBMIT_DIR |
| Modules | Environment modules to load (optional) | gcc/12, openmpi/4.1 |
| Run command | The command to launch under SLURM | ./simulate --config cfg.json |

Decision Guidance

MPI vs MPI+OpenMP layout

Does the code use OpenMP / threading?
├── NO  → Use MPI-only: cpus-per-task=1
└── YES → Use hybrid: set cpus-per-task = threads per MPI rank
          and export OMP_NUM_THREADS = cpus-per-task

Rule of thumb: if you see diminishing strong-scaling efficiency at high MPI ranks, try fewer ranks with more threads per rank (and measure).
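
The two layouts translate into different #SBATCH directives. A minimal sketch with illustrative node and rank counts (adjust to your code and hardware):

# MPI-only layout: one CPU per rank
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=64
#SBATCH --cpus-per-task=1

# Hybrid MPI+OpenMP layout: 2 threads per rank, so half as many ranks per node
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=32
#SBATCH --cpus-per-task=2
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}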

Memory flag selection

  • Use either --mem (per node) or --mem-per-cpu (per CPU), not both.
  • Follow your cluster’s documentation; some sites enforce one style.
  • SLURM --mem units are integer MB by default, or an integer with suffix K/M/G/T (and --mem=0 commonly means “all memory on node”).
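
Illustrative directive forms (the values are placeholders; pick exactly one memory style per job):

# Per-node memory: 32 GiB for each allocated node
#SBATCH --mem=32G

# ...or per-CPU memory: 2 GiB per allocated CPU (never combined with --mem)
#SBATCH --mem-per-cpu=2G

# No suffix is interpreted as MB; --mem=0 commonly requests all memory on the node
#SBATCH --mem=4096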

Script Outputs (JSON Fields)

| Script | Key Outputs |
|---|---|
| scripts/slurm_script_generator.py | results.script, results.directives, results.derived, results.warnings |
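
A minimal sketch of consuming these outputs, assuming --json prints the structured result to stdout while --out writes the script file (mirroring the second CLI example below); the nesting inside each key is not specified here:

python3 scripts/slurm_script_generator.py \
  --job-name demo --time 00:05:00 --nodes 1 --ntasks 4 --cpus-per-task 1 \
  --out job.sbatch --json \
  -- ./simulate
# Inspect results.script, results.directives, results.derived, and
# results.warnings in the emitted JSON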

Workflow

  1. Gather cluster constraints (partition/account, GPU policy, memory policy).
  2. Choose a process layout (MPI-only vs hybrid MPI+OpenMP).
  3. Generate the script with slurm_script_generator.py.
  4. Inspect warnings (conflicts, suspicious layouts).
  5. Save the generated script as job.sbatch.
  6. Submit with sbatch job.sbatch and monitor with squeue.
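
For steps 5-6, a typical submit-and-monitor sequence (the job ID is a placeholder):

# Submit the generated script; SLURM prints the assigned job ID
sbatch job.sbatch

# Check queue state for your own jobs
squeue -u $USER

# ...or watch a specific job by ID
squeue -j 1234567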

CLI Examples

# Preview a job script (prints to stdout)
python3 skills/hpc-deployment/slurm-job-script-generator/scripts/slurm_script_generator.py \
  --job-name phasefield \
  --time 00:10:00 \
  --partition compute \
  --nodes 1 \
  --ntasks-per-node 8 \
  --cpus-per-task 2 \
  --mem 16G \
  --module gcc/12 \
  --module openmpi/4.1 \
  -- \
  ./simulate --config config.json
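
For reference, the script printed by an invocation like the one above should look roughly like the sketch below; the generator's exact formatting and launcher choice (srun is assumed here) may differ:

#!/bin/bash
#SBATCH --job-name=phasefield
#SBATCH --time=00:10:00
#SBATCH --partition=compute
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=2
#SBATCH --mem=16G

set -euo pipefail

module load gcc/12
module load openmpi/4.1

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

srun ./simulate --config config.json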

# Write to a file and also emit structured JSON
python3 skills/hpc-deployment/slurm-job-script-generator/scripts/slurm_script_generator.py \
  --job-name phasefield \
  --time 00:10:00 \
  --nodes 1 \
  --ntasks 16 \
  --cpus-per-task 1 \
  --out job.sbatch \
  --json \
  -- \
  /bin/echo hello

Conversational Workflow Example

User: I need an sbatch script for my MPI simulation. I want 2 nodes, 64 ranks per node, 2 OpenMP threads per rank, and 2 hours.

Agent workflow:

  1. Confirm partition/account and whether GPUs are needed.
  2. Generate a hybrid job script:
    python3 scripts/slurm_script_generator.py --job-name run --time 02:00:00 --nodes 2 --ntasks-per-node 64 --cpus-per-task 2 -- ./simulate
    
  3. Explain the mapping:
    • Total ranks = 128
    • Threads per rank = 2 (OMP_NUM_THREADS=2)
  4. If the user provides node core counts, sanity-check oversubscription using --cores-per-node.
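
For step 4, a sketch of the oversubscription check (the 128-core figure is illustrative; use the real core count for the target partition):

python3 scripts/slurm_script_generator.py \
  --job-name run --time 02:00:00 \
  --nodes 2 --ntasks-per-node 64 --cpus-per-task 2 \
  --cores-per-node 128 \
  -- ./simulate
# 64 ranks x 2 threads = 128 CPUs per node, which exactly fills the node;
# a larger product should surface an oversubscription warning.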

Error Handling

| Error | Cause | Resolution |
|---|---|---|
| time must be HH:MM:SS or D-HH:MM:SS | Bad walltime format | Use 00:30:00 or 1-00:00:00 |
| nodes must be positive | Non-positive nodes | Provide --nodes >= 1 |
| Provide either --mem or --mem-per-cpu, not both | Conflicting memory directives | Choose one memory style |
| Provide a run command after -- | Missing launch command | Add -- ./simulate ... |
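
For example, a walltime given as minutes and seconds is rejected by the strict format check:

# 30:00 does not match HH:MM:SS or D-HH:MM:SS, so generation should fail
python3 scripts/slurm_script_generator.py \
  --job-name bad --time 30:00 --nodes 1 --ntasks 1 --cpus-per-task 1 \
  -- ./simulate
# Fix: spell out the full format, e.g. --time 00:30:00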

Security

Input Validation

  • --time is validated against strict HH:MM:SS or D-HH:MM:SS format via regex
  • --nodes, --ntasks, --ntasks-per-node, --cpus-per-task, --gpus are validated as positive integers with upper bounds
  • --mem and --mem-per-cpu are validated against SLURM's accepted format (<int>[K|M|G|T]); providing both simultaneously is rejected
  • --job-name is validated against [a-zA-Z0-9_.-]+ (no shell metacharacters)
  • --partition and --account are validated against safe-character allowlists
  • --module values are validated to prevent shell injection (no ;, |, &, backticks, or $)

File Access

  • The script reads no external files; all inputs are provided via CLI arguments
  • --out writes the generated sbatch script to a single specified file path
  • The generated script is a plain-text shell script with #SBATCH directives; it contains no dynamically generated code

Tool Restrictions

  • Read: Used to inspect script source, references, and existing job scripts
  • Bash: Used to execute slurm_script_generator.py with explicit argument lists; the generated script itself is NOT executed by the agent
  • Write: Used to save the generated .sbatch file; writes are scoped to the user's working directory
  • Grep/Glob: Used to locate existing scripts, configs, and cluster documentation

Safety Measures

  • No eval(), exec(), or dynamic code generation
  • All subprocess calls use explicit argument lists (no shell=True)
  • The run command (after --) is included verbatim in the generated script but is never executed by the skill itself
  • Module names are sanitized to prevent injection into module load directives
  • Generated scripts use set -euo pipefail for safe shell execution on the cluster

Limitations

  • Does not query cluster hardware or site policies; it can only validate internal consistency.
  • SLURM installations vary (GPU directives, QoS rules, partitions). Adjust directives for your site.

References

  • references/slurm_directives.md - Common #SBATCH directives and mapping tips

Version History

  • v1.0.0 (2026-02-25): Initial SLURM job script generator