Generating batch scripts with shell scripting


Suppose you are conducting a performance study of your code. You may be interested in its performance across many combinations of parameters, and perhaps also across varying numbers of nodes and processes per node. For each run of your program, you'll need a slightly different submission script. Creating these scripts manually would be tedious. On this page, we demonstrate how to generate them from a master script, using Bash.

Make sure you've read the tutorial for C programs first, to understand the basics of serial and parallel programming on tara.

Script generating example

In this example, we will launch an executable named "myexecutable", which takes three numerical parameters: param1, param2, and param3. We want to use the following levels: param1 = 5, 10, 20; param2 = 2, 3, 4; and param3 = 1, 2, 3. For each parameter combination, we would like to capture performance as the number of processes grows: first on one node with 1, 2, 4, and 8 processes, and then with 8 processes per node on 2, 4, 8, 16, and 32 nodes. Each script should look something like this.
#!/bin/bash
#SBATCH --job-name=expi20j3k3
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=parallel

srun /home/araim1/myexecutable --param1 20 --param2 3 --param3 3 

Notice that --nodes and --ntasks-per-node have been set so that a single process runs on a single node. The parameters to the executable are filled in as well. The job has also been given a descriptive name, based on our parameter settings.
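The descriptive job name can be built with printf. The following is a minimal sketch, assuming the same name format that the generator script on this page uses; the helper name make_job_name is hypothetical and is only here for illustration.

```shell
#!/bin/bash

# Hypothetical helper: build a job name from the three parameter values.
make_job_name() {
    local i=$1 j=$2 k=$3
    # %02d zero-pads param1 to two digits, so that names sort consistently
    printf 'expi%02dj%dk%d' "$i" "$j" "$k"
}

make_job_name 20 3 3; echo    # prints expi20j3k3
make_job_name 5 3 2; echo     # prints expi05j3k2
```

Zero-padding keeps expi05... ahead of expi10... in a directory listing, which makes the results easier to scan.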

To allow many jobs like this to coexist (and to run at the same time), we'll put each job in its own directory. When a job runs, it writes its stdout and stderr to the files slurm.out and slurm.err in that directory. This structure also makes it easy to organize output files, should our executable write any to the current working directory. To generate scripts like this, we can use the following.


#!/bin/bash

# Settings shared by all of the generated scripts
EXECUTABLE=/home/araim1/myexecutable
QUEUE=parallel

# This function writes a Slurm submission script. We can call it with different
# parameter settings to create different experiments
function write_script
{
    JOB_NAME=$(printf 'expi%02dj%dk%d' ${I} ${J} ${K})
    DIR_NAME=$(printf '%s/nodes%03dnper%d' ${JOB_NAME} ${NODES} ${NPERNODE})

    if [ -d $DIR_NAME ] ; then
        echo "$DIR_NAME already exists, skipping..."
        return 0
    else
        echo "Creating job $DIR_NAME"
    fi

    mkdir -p $DIR_NAME

    # Write the submission script. Variables are expanded now, at generation
    # time. The closing _EOF_ must start at the beginning of its line.
    cat << _EOF_ > ${DIR_NAME}/slurm-exp.bash
#!/bin/bash
#SBATCH --job-name=${JOB_NAME}
#SBATCH --nodes=${NODES}
#SBATCH --ntasks-per-node=${NPERNODE}
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=${QUEUE}

srun ${EXECUTABLE} --param1 ${I} --param2 ${J} --param3 ${K}

_EOF_

    chmod 775 ${DIR_NAME}/slurm-exp.bash

    # Append a submission command for this job to the launcher script
    echo "cd ${DIR_NAME}; sbatch slurm-exp.bash; cd \$BASEDIR" >> run_all_experiments.bash
}

# Create a script to submit all of our experiments to the scheduler.
# BASEDIR is stored as an absolute path, so that "cd $BASEDIR" really returns
# to the top-level directory after each submission
echo '#!/bin/bash' > run_all_experiments.bash
echo 'BASEDIR=$(cd "$(dirname "$0")" && pwd)' >> run_all_experiments.bash
chmod 775 run_all_experiments.bash

# Loop through all the parameter combinations
# For each combination, we'll run the experiment on one node with 1, 2, 4, and 8 processes
# Then we'll run the experiment with 8 processes per node on 2, 4, 8, 16, and 32 nodes

for I in 5 10 20
do
    for (( J=2; J<=4; J++ ))
    do
        for K in 1 2 3
        do
            NODES=1
            for NPERNODE in 1 2 4 8
            do
                write_script
            done

            NPERNODE=8
            for NODES in 2 4 8 16 32
            do
                write_script
            done
        done
    done
done

Notice that the function write_script is responsible for generating a single script, and the loop at the bottom determines which parameter combinations will be used. Also notice the special "cat << _EOF_ >" syntax used in write_script, to write a text block to a file. Because we're writing a Bash script from within a Bash script, certain characters in the block need to be escaped. For example, to get a literal dollar sign ($) in the generated file, we need to use "\$" in this script. Otherwise, Bash resolves "$JOB_NAME" (for example) to its variable value. We also generate a script "run_all_experiments.bash", which will launch all of the jobs we've generated.
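The escaping rule is easiest to see in isolation. The following is a small sketch, not part of the generator itself; the variable value and the function names render_demo and render_literal are hypothetical. It shows an unquoted delimiter, where variables are expanded unless the dollar sign is escaped, next to a quoted delimiter, where nothing is expanded.

```shell
#!/bin/bash

# Hypothetical value, used only for this demonstration
JOB_NAME=demo

# Unquoted delimiter: ${JOB_NAME} is expanded now, while \$HOME stays literal
render_demo() {
    cat << _EOF_
name=${JOB_NAME}
home=\$HOME
_EOF_
}

# Quoted delimiter: the whole block is copied literally, nothing is expanded
render_literal() {
    cat << '_EOF_'
name=${JOB_NAME}
_EOF_
}

render_demo
render_literal
```

The first function prints "name=demo" followed by "home=$HOME", and the second prints "name=${JOB_NAME}" unchanged. The generator needs the unquoted form, because most variables in the block must be filled in at generation time.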

The result is that we get the following directory structure.

[araim1@tara-fe1 create-experiments]$ ls
create-exps.bash     expi05j2k3  expi05j4k1  expi10j2k2  expi10j3k3  expi20j2k1  expi20j3k2  expi20j4k3
example-script.bash  expi05j3k1  expi05j4k2  expi10j2k3  expi10j4k1  expi20j2k2  expi20j3k3  run_all_experiments.bash
expi05j2k1           expi05j3k2  expi05j4k3  expi10j3k1  expi10j4k2  expi20j2k3  expi20j4k1
expi05j2k2           expi05j3k3  expi10j2k1  expi10j3k2  expi10j4k3  expi20j3k1  expi20j4k2
[araim1@tara-fe1 create-experiments]$ ls expi05j3k2
nodes001nper1  nodes001nper4  nodes002nper8  nodes008nper8  nodes032nper8
nodes001nper2  nodes001nper8  nodes004nper8  nodes016nper8
[araim1@tara-fe1 create-experiments]$ ls expi05j3k2/nodes002nper8
slurm-exp.bash
[araim1@tara-fe1 create-experiments]$ cat expi05j3k2/nodes002nper8/slurm-exp.bash
#!/bin/bash
#SBATCH --job-name=expi05j3k2
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=parallel

srun /home/araim1/myexecutable --param1 5 --param2 3 --param3 2

[araim1@tara-fe1 create-experiments]$
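As a sanity check on the listing above, it can help to know how many scripts to expect. The following sketch repeats the loop structure of the generator and only counts the combinations; the function name count_experiments is hypothetical.

```shell
#!/bin/bash

# Count the (parameter, node, process) combinations visited by the loops
count_experiments() {
    local count=0 I J K NPERNODE NODES
    for I in 5 10 20; do
        for (( J=2; J<=4; J++ )); do
            for K in 1 2 3; do
                # One node with 1, 2, 4, and 8 processes per node
                for NPERNODE in 1 2 4 8; do count=$((count + 1)); done
                # 8 processes per node on 2, 4, 8, 16, and 32 nodes
                for NODES in 2 4 8 16 32; do count=$((count + 1)); done
            done
        done
    done
    echo "$count"
}

count_experiments    # prints 243
```

There are 27 parameter combinations with 9 node and process configurations each, giving 243 scripts. Running "find . -name slurm-exp.bash | wc -l" in the top-level directory should report the same number.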
To submit a single job, we can now run (as usual)
[araim1@tara-fe1 create-experiments]$ cd expi05j3k2/nodes002nper8
[araim1@tara-fe1 nodes002nper8]$ sbatch slurm-exp.bash
We can also submit all of our jobs at once. Make sure to test a few individual submission scripts before doing this!
[araim1@tara-fe1 create-experiments]$ ./run_all_experiments.bash
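Before a bulk submission, a quick static check of the generated files can complement the manual tests. The following is a sketch under stated assumptions, not a replacement for test submissions: the helper check_scripts is hypothetical, and it only verifies that each slurm-exp.bash starts with a "#!" line and parses as Bash. It does not contact the scheduler.

```shell
#!/bin/bash

# Hypothetical helper: report generated scripts that look malformed.
# For each slurm-exp.bash under the given directory, check that the first
# line starts with "#!" and that the file parses as Bash. "bash -n" reads
# a script without executing it.
check_scripts() {
    local root=${1:-.} script first_line status=0
    while IFS= read -r script; do
        IFS= read -r first_line < "$script"
        if [[ $first_line != '#!'* ]]; then
            echo "missing interpreter line: $script"
            status=1
        elif ! bash -n "$script" 2>/dev/null; then
            echo "syntax error: $script"
            status=1
        fi
    done < <(find "$root" -name slurm-exp.bash | sort)
    return $status
}

check_scripts . && echo "all generated scripts look well-formed"
```

Slurm's sbatch also has a --test-only option, which validates a script without submitting it. Whether that is practical for hundreds of scripts on tara is not something this page confirms.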