UMBC High Performance Computing Facility
Generating batch scripts with shell scripting
Introduction
Suppose you are conducting a performance study using your code. You may be
interested in the performance under many combinations of parameters, and
perhaps also for varying numbers of nodes and processes per node. For each
run of your program, you'll need a slightly different submission script. It
would be a tedious job to create these scripts manually. On this page, we
will demonstrate how to generate these scripts from a master script,
using Bash.
Make sure you've read the
tutorial for C programs
first, to understand the basics of serial and parallel programming on tara.
Script generation example
In this example, we will launch an executable named "myexecutable", which takes
three numerical parameters: param1, param2, and param3. We want to use the
following levels:
param1 = 5, 10, 20
param2 = 2, 3, 4
param3 = 1, 2, 3
For each parameter combination, we would like to capture performance with the
number of processes = 1, 2, 4, ..., 256, as follows (the sketch after this
list enumerates the configurations):
One node with 1, 2, 4, or 8 cores
Nodes = 2, 4, 8, 16, 32, with 8 cores per node
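To see exactly which runs this produces, here is a minimal Bash sketch
(nothing scheduler-specific) that enumerates the nine (nodes,
processes-per-node) configurations and the total process count for each:
#!/bin/bash
# Enumerate the nine process configurations used in this study.
for NPERNODE in 1 2 4 8
do
    echo "nodes=1 ntasks-per-node=${NPERNODE} total=$(( 1 * NPERNODE ))"
done
for NODES in 2 4 8 16 32
do
    echo "nodes=${NODES} ntasks-per-node=8 total=$(( NODES * 8 ))"
done
Running this prints totals of 1, 2, 4, 8, 16, 32, 64, 128, and 256.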
Each script should look something like this.
#!/bin/bash
#SBATCH --job-name=expi20j3k3
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=parallel
srun /home/araim1/myexecutable --param1 20 --param2 3 --param3 3
Download:
../code-2010/create-experiments/example-script.bash
Notice that --nodes and --ntasks-per-node have been set so that we'll have a
single process running on one node, one of the nine configurations in our
study. The parameters to the executable are filled in as well, and the job
has been given a descriptive name based on our parameter settings.
To allow many jobs like this to coexist (and be able to run at the same time),
we'll put each job in its own directory. When a job runs, it will write its
stdout and stderr to the files slurm.out and slurm.err in that directory.
This structure will also make it easy to organize output
files, should our executable write any to its current working directory.
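As an aside, the job directory names we'll generate below zero-pad the node
count (printf's %03d format) so that directory listings sort in numerical
order; for example:
[araim1@tara-fe1 ~]$ printf 'nodes%03dnper%d\n' 2 8
nodes002nper8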
To generate scripts like this, we can use the following master script.
#!/bin/bash
# This function writes a SLURM batch script. We can call it with different
# parameter settings to create different experiments.
function write_script
{
    JOB_NAME=$(printf 'expi%02dj%dk%d' ${I} ${J} ${K})
    DIR_NAME=$(printf '%s/nodes%03dnper%d' ${JOB_NAME} ${NODES} ${NPERNODE})
    if [ -d "$DIR_NAME" ] ; then
        echo "$DIR_NAME already exists, skipping..."
        return 0
    else
        echo "Creating job $DIR_NAME"
    fi
    mkdir -p "$DIR_NAME"
    cat << _EOF_ > ${DIR_NAME}/slurm-exp.bash
#!/bin/bash
#SBATCH --job-name=${JOB_NAME}
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=${QUEUE}
#SBATCH --nodes=${NODES}
#SBATCH --ntasks-per-node=${NPERNODE}
srun ${EXECUTABLE} --param1 ${I} --param2 ${J} --param3 ${K}
_EOF_
    chmod 775 ${DIR_NAME}/slurm-exp.bash
    echo "cd ${DIR_NAME}; sbatch slurm-exp.bash; cd \$BASEDIR" >> run_all_experiments.bash
}
# Create a script to submit all of our experiments to the scheduler. BASEDIR
# is resolved to an absolute path, so that "cd $BASEDIR" returns us to the top
# level even after we've descended into a job directory.
echo "#!/bin/bash" > run_all_experiments.bash
echo "BASEDIR=\$(cd \$(dirname \$0) && pwd)" >> run_all_experiments.bash
chmod 775 run_all_experiments.bash
# Loop through all the parameter combinations. For each combination, we'll run
# the experiment on one node with 1, 2, 4, and 8 processes, and then with 8
# processes per node on 2, 4, 8, 16, and 32 nodes.
QUEUE=parallel
EXECUTABLE=/home/araim1/myexecutable
for I in 5 10 20
do
    for (( J=2; J<=4; J++ ))
    do
        for K in 1 2 3
        do
            NODES=1
            for NPERNODE in 1 2 4 8
            do
                write_script
            done
            NPERNODE=8
            for NODES in 2 4 8 16 32
            do
                write_script
            done
        done
    done
done
Download:
../code-2010/create-experiments/create-exps.bash
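Assuming the master script above is saved as create-exps.bash, we can generate
the entire experiment tree by running it from the directory where the job
directories should live:
[araim1@tara-fe1 create-experiments]$ chmod +x create-exps.bash
[araim1@tara-fe1 create-experiments]$ ./create-exps.bash
Creating job expi05j2k1/nodes001nper1
Creating job expi05j2k1/nodes001nper2
...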
Notice that the function write_script is responsible for generating a single
script, while the loops at the bottom determine which parameter combinations
are used. Also notice the here document syntax ("cat << _EOF_ >") used in
write_script to write a block of text to a file. Because we're writing a Bash
script from within a Bash script, certain characters in the block need to be
escaped. For example, to get a literal dollar sign ($) in the generated file,
we need to write "\$" in this script; otherwise, Bash expands an expression
like "${JOB_NAME}" to its value as the block is written (which is exactly
what we want for the #SBATCH lines).
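As a quick standalone illustration of this escaping rule (demo.bash here is
just a scratch file, not part of the experiment setup):
#!/bin/bash
NAME=expi05j3k2
cat << _EOF_ > demo.bash
echo ${NAME}
echo \${NAME}
_EOF_
The first line of the generated demo.bash reads "echo expi05j3k2", because
${NAME} was expanded as the block was written; the second reads
"echo ${NAME}", because the backslash preserved a literal dollar sign.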
We also generate a script, run_all_experiments.bash, which will submit all of
the jobs we've generated to the scheduler.
The result is that we get the following directory structure.
[araim1@tara-fe1 create-experiments]$ ls
create-exps.bash expi05j2k3 expi05j4k1 expi10j2k2 expi10j3k3 expi20j2k1 expi20j3k2 expi20j4k3
example-script.bash expi05j3k1 expi05j4k2 expi10j2k3 expi10j4k1 expi20j2k2 expi20j3k3 run_all_experiments.bash
expi05j2k1 expi05j3k2 expi05j4k3 expi10j3k1 expi10j4k2 expi20j2k3 expi20j4k1
expi05j2k2 expi05j3k3 expi10j2k1 expi10j3k2 expi10j4k3 expi20j3k1 expi20j4k2
[araim1@tara-fe1 create-experiments]$ ls expi05j3k2
nodes001nper1 nodes001nper4 nodes002nper8 nodes008nper8 nodes032nper8
nodes001nper2 nodes001nper8 nodes004nper8 nodes016nper8
[araim1@tara-fe1 create-experiments]$ ls expi05j3k2/nodes002nper8
slurm-exp.bash
[araim1@tara-fe1 create-experiments]$ cat expi05j3k2/nodes002nper8/slurm-exp.bash
#!/bin/bash
#SBATCH --job-name=expi05j3k2
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=parallel
srun /home/araim1/myexecutable --param1 5 --param2 3 --param3 2
[araim1@tara-fe1 create-experiments]$
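With three levels for each of the three parameters and nine process
configurations per combination, there should be 3 x 3 x 3 x 9 = 243 generated
scripts in total; a quick sanity check:
[araim1@tara-fe1 create-experiments]$ find . -name slurm-exp.bash | wc -l
243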
To submit a single job, we can now run (as usual)
[araim1@tara-fe1 create-experiments]$ cd expi05j3k2/nodes002nper8
[araim1@tara-fe1 nodes002nper8]$ sbatch slurm-exp.bash
We can also submit all of our jobs at once. Make sure to test a few individual
submission scripts before doing this!
[araim1@tara-fe1 create-experiments]$ ./run_all_experiments.bash
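Once the jobs have been submitted, their status can be checked with squeue,
for example:
[araim1@tara-fe1 create-experiments]$ squeue -u araim1
which lists that user's pending and running jobs.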