UMBC High Performance Computing Facility
How to run Bash programs on tara
Introduction
Now we'll see how to run a Bash script on the cluster. Before
proceeding, make sure you've read the How To Run tutorial.
We now know that we should run our jobs on the compute nodes,
rather than on the front end node. However, we need to be careful with
scripting, and make sure that the scheduler always has control over our job.
We'll see some examples of how to do this correctly, along with some
counterexamples. Use of other scripting languages and shells should be
very similar.
Simple Bash example
Let's start with a simple script called pause.bash. It initiates a one
minute sleep, to allow the job to run for a little while. This is such a
simple example that we could have included it directly in the batch script.
In practice though, we'll usually want to keep our functional code separate
from our batch job submission code.
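The script might look something like the following minimal sketch; its exact contents are assumed here, based on the output shown later on this page.
#!/bin/bash
# Report when the script starts, pause for one minute, then report when it ends
echo "Script started at $(date)"
sleep 60
echo "Script ended at $(date)"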
Here is the SLURM batch script we will use to launch it
#!/bin/bash
#SBATCH --job-name=pause
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop
./pause.bash
Download:
../code/bash_pause/run.slurm
Now we launch the job
[araim1@tara-fe1 bash_pause]$ sbatch run.slurm
sbatch: Submitted batch job 2618
[araim1@tara-fe1 bash_pause]$ squeue
JOBID PARTITION NAME USER ST TIME NODES QOS NODELIST(REASON)
2620 serial pause araim1 R 0:00 1 normal (Resources)
[araim1@tara-fe1 bash_pause]$
After about a minute, we get the following output
[araim1@tara-fe1 bash_pause]$ cat slurm.err
[araim1@tara-fe1 bash_pause]$ cat slurm.out
Script started at Thu Aug 20 18:12:36 EDT 2009
Script ended at Thu Aug 20 18:13:36 EDT 2009
[araim1@tara-fe1 bash_pause]$
If we had cancelled the job during its execution, the scheduler
would have stopped it cleanly, and no pieces of it would have
continued to run on the compute node.
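For example, a running job can be cancelled through the scheduler with the scancel command, giving the job ID that sbatch reported (here, the job ID from the session above):
[araim1@tara-fe1 bash_pause]$ scancel 2618
After the cancellation, squeue should no longer list the job.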
As a note to users familiar with these mechanisms: it would not be a
good idea to run the pause.bash script as a background job or through
nohup. Such processes could potentially run outside of the scheduler's
control. If this happens, you would lose control of your job and would
need to contact HPC Support to stop it. If such a job is left running,
other users' jobs could be scheduled on your busy processors, which could
interfere with their execution.
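As an illustration of the kind of counterexample to avoid, a batch script along the lines of the sketch below detaches pause.bash from the batch job, so that, depending on the cluster's configuration, the process may keep running on the compute node after the scheduler considers the job finished:
#!/bin/bash
#SBATCH --job-name=pause
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop

# DO NOT do this: nohup plus & detaches pause.bash from the batch job,
# and the batch script exits immediately without waiting for it
nohup ./pause.bash &
Launching the program directly in the foreground, as in the run.slurm script above, keeps it under the scheduler's control for its entire lifetime.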