UMBC High Performance Computing Facility
How to run OpenMP programs on tara
Introduction
On this page we'll see how to run OpenMP programs on the cluster. Before
proceeding, make sure you've read the How To Run tutorial first.
OpenMP is a parallel programming model for shared memory systems. In this
model, the user creates worker threads that are coordinated by a master thread,
and marks sections of code as parallel using special compiler directives. The
nodes on tara do not share memory, so OpenMP by itself cannot be used for jobs
that need to utilize multiple cluster nodes; it can, however, make use of the
multiple cores on a single node. For this reason, we recommend MPI as the more
general programming model. (For multi-node jobs, hybrid programs using both
MPI and OpenMP are also possible, but we won't get into that at this time.)
OpenMP is available in several programming languages, such as C and FORTRAN.
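For example, in C a for loop can be parallelized by placing a single directive
above it. Here is a minimal sketch (the array and its size are made up for
illustration and are not part of the tara examples); like the examples below,
it must be compiled with the -fopenmp flag.

#include <omp.h>
#include <stdio.h>

#define N 1000

int main(void)
{
    double x[N];
    int i;

    /* Ask OpenMP to divide the loop iterations among the worker threads */
    #pragma omp parallel for
    for (i = 0; i < N; i++)
    {
        x[i] = 2.0 * i;
    }

    printf("x[%d] = %f\n", N - 1, x[N - 1]);
    return 0;
}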
Hello World example C
Let's start with a simple Hello World program written in C (taken from
an example at Purdue)
#include <omp.h>
#include <stdio.h>

int main (int argc, char *argv[])
{
    int nthreads, thread_id;

    /* Each thread gets its own private copy of nthreads and thread_id */
    #pragma omp parallel private(nthreads, thread_id)
    {
        thread_id = omp_get_thread_num();
        printf("Thread %d says: Hello World\n", thread_id);

        if (thread_id == 0)
        {
            nthreads = omp_get_num_threads();
            printf("Thread %d reports: the number of threads are %d\n",
                thread_id, nthreads);
        }
    }
    return 0;
}
Download:
../code-2010/hello_openmp_c/hello_openmp.c
Here is the batch script we will use to launch it
#!/bin/bash
#SBATCH --job-name=OMP_hello
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
export OMP_NUM_THREADS=8
./hello_openmp
Download:
../code-2010/hello_openmp_c/run.slurm
Notice that we set the environment variable OMP_NUM_THREADS to 8; this
controls how many OpenMP threads will be used for the job. Setting it higher
than 8 will generally not improve performance, since each node has 8 cores.
If you don't require 8 threads, you can also decrease --ntasks-per-node
accordingly; OMP_NUM_THREADS and --ntasks-per-node should be kept equal.
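If you want to check at runtime how many threads your program will use, you
can query OpenMP directly. Here is a minimal sketch; omp_get_max_threads()
returns the number of threads a parallel region would use, which reflects the
OMP_NUM_THREADS setting.

#include <omp.h>
#include <stdio.h>

int main(void)
{
    /* Report the thread count OpenMP will use for parallel regions,
       as determined by settings such as OMP_NUM_THREADS */
    printf("OpenMP will use up to %d threads\n", omp_get_max_threads());
    return 0;
}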
Another important thing to note: if we change --nodes to 2, the job will
simply be duplicated on two nodes, not parallelized across them as we would
probably want. So it's recommended to leave --nodes=1.
Now we will compile and launch the job
[araim1@tara-fe1 hello_openmp_c]$ gcc -fopenmp hello_openmp.c -o hello_openmp -lm
[araim1@tara-fe1 hello_openmp_c]$ ls
hello_openmp.c run.slurm
[araim1@tara-fe1 hello_openmp_c]$ sbatch run.slurm
Submitted batch job 37532
[araim1@tara-fe1 hello_openmp_c]$ cat slurm.out
Thread 1 says: Hello World
Thread 5 says: Hello World
Thread 6 says: Hello World
Thread 2 says: Hello World
Thread 7 says: Hello World
Thread 0 says: Hello World
Thread 3 says: Hello World
Thread 0 reports: the number of threads are 8
Thread 4 says: Hello World
[araim1@tara-fe1 hello_openmp_c]$
Hello World example FORTRAN
Now let's see a similar program in FORTRAN. Begin by downloading the Hello
World FORTRAN example from here.
Then grab the following batch script, which is the same as the one for the C
code above, except that it runs the hello_open_mp executable
#!/bin/bash
#SBATCH --job-name=OMP_hello
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
export OMP_NUM_THREADS=8
./hello_open_mp
Download:
../code-2010/hello_openmp_f90/run.slurm
Now we can compile and run the code in the same way as the C example
[araim1@tara-fe1 hello_openmp_f90]$ gfortran -fopenmp hello_open_mp.f90 -o hello_open_mp
[araim1@tara-fe1 hello_openmp_f90]$ sbatch run.slurm
Submitted batch job 37537
[araim1@tara-fe1 hello_openmp_f90]$ cat slurm.out
HELLO_OPEN_MP
FORTRAN90/OpenMP version
The number of processors available = 8
The number of threads available = 8
OUTSIDE the parallel region.
HELLO from process 0
Going INSIDE the parallel region:
HELLO from process 0
HELLO from process 4
HELLO from process 5
HELLO from process 3
HELLO from process 6
HELLO from process 2
HELLO from process 7
HELLO from process 1
Back OUTSIDE the parallel region.
HELLO_OPEN_MP
Normal end of execution.
Elapsed wall clock time = 0.131280E-01
[araim1@tara-fe1 hello_openmp_f90]$