Make sure you've read the tutorial for C programs first, to understand the basics of serial and parallel programming on tara.
As a running example, suppose we want to run the same executable over many combinations of three parameters (and several node/process counts). A single run with param1=20, param2=3, and param3=3 on one process might use a batch script like this:

#!/bin/bash
#SBATCH --job-name=expi20j3k3
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=parallel

srun /home/araim1/myexecutable --param1 20 --param2 3 --param3 3
To allow many jobs like this to coexist (and run at the same time), we'll put each job in its own directory. When a job runs, its stdout and stderr will be written to the files slurm.out and slurm.err in that directory. This structure also makes it easy to organize output files, should our executable write any to its current working directory. To generate scripts like this, we can use the following.
#!/bin/bash

# This function writes an sbatch script. We can call it with different parameter
# settings to create different experiments
function write_script
{
    JOB_NAME=$(printf 'expi%02dj%dk%d' ${I} ${J} ${K})
    DIR_NAME=$(printf '%s/nodes%03dnper%d' ${JOB_NAME} ${NODES} ${NPERNODE})

    if [ -d $DIR_NAME ] ; then
        echo "$DIR_NAME already exists, skipping..."
        return 0
    else
        echo "Creating job $DIR_NAME"
    fi

    mkdir -p $DIR_NAME

    cat << _EOF_ > ${DIR_NAME}/slurm-exp.bash
#!/bin/bash
#SBATCH --job-name=${JOB_NAME}
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=${QUEUE}
#SBATCH --nodes=${NODES}
#SBATCH --ntasks-per-node=${NPERNODE}

srun ${EXECUTABLE} --param1 ${I} --param2 ${J} --param3 ${K}
_EOF_

    chmod 775 ${DIR_NAME}/slurm-exp.bash
    echo "cd ${DIR_NAME}; sbatch slurm-exp.bash; cd \$BASEDIR" >> run_all_experiments.bash
}

# Create a script to submit all of our experiments to the scheduler
echo "#!/bin/bash" > run_all_experiments.bash
echo "BASEDIR=\$(dirname \$0)" >> run_all_experiments.bash
chmod 775 run_all_experiments.bash

# Loop through all the parameter combinations
# For each combination, we'll run the experiment on one node with 1, 2, 4, and 8 processors
# Then we'll run the experiment with 8 processors on 2, 4, 8, ..., 32 nodes
QUEUE=parallel
EXECUTABLE=/home/araim1/myexecutable

for I in 5 10 20
do
    for (( J=2; J<=4; J++ ))
    do
        for K in 1 2 3
        do
            NODES=1
            for NPERNODE in 1 2 4 8
            do
                write_script
            done

            NPERNODE=8
            for NODES in 2 4 8 16 32
            do
                write_script
            done
        done
    done
done
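Assuming the generator above is saved as create-exps.bash in a top-level directory such as create-experiments (the names that appear in the listing below), it can be run directly with bash; a minimal sketch:

cd ~/create-experiments          # hypothetical location of create-exps.bash
bash create-exps.bash            # writes one slurm-exp.bash per experiment
cat run_all_experiments.bash     # inspect the generated submission script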
The result is that we get the following directory structure.
[araim1@tara-fe1 create-experiments]$ ls
create-exps.bash     expi05j2k3  expi05j4k1  expi10j2k2  expi10j3k3  expi20j2k1  expi20j3k2  expi20j4k3
example-script.bash  expi05j3k1  expi05j4k2  expi10j2k3  expi10j4k1  expi20j2k2  expi20j3k3  run_all_experiments.bash
expi05j2k1           expi05j3k2  expi05j4k3  expi10j3k1  expi10j4k2  expi20j2k3  expi20j4k1
expi05j2k2           expi05j3k3  expi10j2k1  expi10j3k2  expi10j4k3  expi20j3k1  expi20j4k2
[araim1@tara-fe1 create-experiments]$ ls expi05j3k2
nodes001nper1  nodes001nper4  nodes002nper8  nodes008nper8  nodes032nper8
nodes001nper2  nodes001nper8  nodes004nper8  nodes016nper8
[araim1@tara-fe1 create-experiments]$ ls expi05j3k2/nodes002nper8
slurm-exp.bash
[araim1@tara-fe1 create-experiments]$ cat expi05j3k2/nodes002nper8/slurm-exp.bash
#!/bin/bash
#SBATCH --job-name=expi05j3k2
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=parallel

srun /home/araim1/myexecutable --param1 5 --param2 3 --param3 2
[araim1@tara-fe1 create-experiments]$
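Before submitting anything, it is worth spot-checking what was generated. The commands below are a sketch, not part of the tutorial's scripts; the expected count follows from the loops above (3 values each of I, J, and K, times 9 node/task configurations per combination, i.e. 27 x 9 = 243 scripts):

# Count the generated batch scripts, run from the top-level directory
# (27 parameter combinations x 9 configurations = 243)
find . -name slurm-exp.bash | wc -l

# Spot-check a few of the srun lines that will actually be executed
grep "^srun" expi*/nodes*/slurm-exp.bash | head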
For example, to submit a single experiment, change into its directory and run sbatch:

[araim1@tara-fe1 create-experiments]$ cd expi05j3k2/nodes002nper8
[araim1@tara-fe1 nodes002nper8]$ sbatch slurm-exp.bash
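Once a job has been submitted, its progress can be followed with standard SLURM commands; the lines below are only a sketch (the username araim1 and the output file names come from the scripts above):

squeue -u araim1        # check whether the job is pending, running, or finished
cat slurm.out           # standard output written by the job
cat slurm.err           # standard error written by the job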
Alternatively, to submit every experiment at once, run the generated script from the top-level directory:

[araim1@tara-fe1 create-experiments]$ ./run_all_experiments.bash
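Once the jobs have finished, the per-experiment directories make it easy to walk through the results. The loop below is a sketch under that assumption; it simply prints the slurm.out file from each experiment directory:

#!/bin/bash
# Sketch: print each experiment's standard output, one directory at a time.
# Assumes the directory layout produced by the generator script above.
for OUTFILE in expi*/nodes*/slurm.out
do
    echo "==== ${OUTFILE} ===="
    cat ${OUTFILE}
done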