UMBC High Performance Computing Facility
How to run Bash programs on tara
Introduction
Now we'll see how to run a Bash script on the cluster. Before
proceeding, make sure you've read the
How To
Run tutorial first.
We now know that we should be running our job on the compute nodes,
rather than the front end node. However, we need to be careful with
scripting, and make sure that the scheduler always has control over our job.
We'll see some examples of how to do this correctly, along with some
counterexamples. Use of other scripting languages and shells should be
very similar.
Simple Bash example
Let's start with the following script, which initiates a one minute sleep
so the job runs for a little while. This is such a simple example that we
could have included it directly in the batch script. In practice, though,
we'll usually want to keep our functional code separate from our
batch submission code.
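A minimal pause.bash consistent with the output we'll see below might look like the following: it records the start time, sleeps for one minute, and then records the end time.
#!/bin/bash
# Record when the script started
echo "Script started at $(date)"
# Pause for one minute so the job runs long enough to observe
sleep 60
# Record when the script finished
echo "Script ended at $(date)"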
Here is the SLURM batch script we will use to launch it
#!/bin/bash
#SBATCH --job-name=pause
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop
./pause.bash
Download:
../code-2010/bash_pause/run.slurm
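Depending on the cluster's default settings, you may also want to request resources explicitly. For example, the following standard SLURM directives (shown here as an optional addition, not part of the script above) would request a single task on a single node:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1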
Now we launch the job
[araim1@tara-fe1 bash_pause]$ sbatch run.slurm
sbatch: Submitted batch job 2618
[araim1@tara-fe1 bash_pause]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2618 develop pause araim1 PD 0:00 1 (Resources)
[araim1@tara-fe1 bash_pause]$
After about a minute, we get the following output
[araim1@tara-fe1 bash_pause]$ cat slurm.err
[araim1@tara-fe1 bash_pause]$ cat slurm.out
Script started at Thu Aug 20 18:12:36 EDT 2009
Script ended at Thu Aug 20 18:13:36 EDT 2009
[araim1@tara-fe1 bash_pause]$
If we had killed the job during its execution, the scheduler
would have been able to stop it cleanly, and no pieces of it would
continue to run on the compute node.
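For example, the job could have been cancelled with scancel, using the job ID reported by sbatch:
[araim1@tara-fe1 bash_pause]$ scancel 2618
The scheduler then terminates the script on the compute node and releases the allocated resources.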
As a note to users familiar with these mechanisms: it would not be a good
idea to run the pause.bash script as a background job or through nohup.
Such processes can end up running outside of the scheduler's control. If
this happens, you would lose control of your job and would need to
contact HPC Support to stop it. If such a job is left running, other
users' jobs could be scheduled onto the processors it occupies, which
could interfere with their execution.
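For illustration only, a command of this kind is what should be avoided on the front end node:
[araim1@tara-fe1 bash_pause]$ nohup ./pause.bash &
A process started this way runs on the front end node, outside of the scheduler's control, which is exactly the situation described above.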