UMBC High Performance Computing Facility
How to run OpenMP programs on maya
Introduction
On this page we'll see how to run OpenMP programs on the cluster. Before
proceeding, make sure you've read the How To Run tutorial first.
OpenMP is a parallel programming model for shared memory systems. In this
model, the user creates worker threads which are coordinated by a master thread.
The user marks sections of code as parallel using special compiler
directives. The nodes on maya do not share memory, so OpenMP by itself
cannot be used to coordinate multi-node jobs, but it can be used
across multiple cores on a single node. For this reason, we recommend
MPI as the more general programming model.
For multi-node jobs, hybrid programs using both MPI and OpenMP are also
possible.
OpenMP is available from several programming languages such as
C and FORTRAN.
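As a minimal sketch of the model described above (not taken from the original tutorial), the function below marks one loop as parallel with a directive. The name parallel_sum is hypothetical and chosen only for illustration. The reduction clause gives each worker thread its own partial sum and combines the partial sums when the loop finishes.

```c
/* Sum the integers 1..n.
   The pragma asks OpenMP to divide the loop iterations among the threads.
   reduction(+:total) gives every thread a private partial sum and adds the
   partial sums together at the end of the loop.
   If the code is compiled without OpenMP support, the pragma is ignored
   and the loop simply runs serially. */
long parallel_sum(long n)
{
    long total = 0;
    long i;

    #pragma omp parallel for reduction(+:total)
    for (i = 1; i <= n; i++)
    {
        total += i;
    }

    return total;
}
```

Calling parallel_sum(100) returns 5050 no matter how many threads are used, because the reduction combines the per-thread results before the function returns.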
Hello World example C
Let's start with a simple Hello World program written in C (taken from
an example at Purdue):
#include <omp.h>
#include <stdio.h>

int main (int argc, char *argv[])
{
    int nthreads, thread_id;

    #pragma omp parallel private(nthreads, thread_id)
    {
        thread_id = omp_get_thread_num();
        printf("Thread %d says: Hello World\n", thread_id);

        if (thread_id == 0)
        {
            nthreads = omp_get_num_threads();
            printf("Thread %d reports: the number of threads are %d\n",
                thread_id, nthreads);
        }
    }

    return 0;
}
Download:
../code/hello_openmp_c/hello_openmp.c
Here is the batch script we will use to launch it:
#!/bin/bash
#SBATCH --job-name=OMP_hello
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
export OMP_NUM_THREADS=8
./hello_openmp
Download:
../code/hello_openmp_c/run.slurm
Notice that the environment variable OMP_NUM_THREADS is set to 8; this
controls how many OpenMP threads will be used for the job. Setting it to a
higher number will generally not improve performance, since the maya2009 and
maya2010 nodes have 8 cores each.
If you don't require 8 threads, you can decrease "--ntasks-per-node"
accordingly. You should make OMP_NUM_THREADS match the total number of
allocated CPUs, which can be calculated by multiplying
--ntasks-per-node and --cpus-per-task.
Another important thing to note: if we change "--nodes" to 2, the job
will be duplicated on two nodes, not parallelized across them as we would
probably want. It is therefore recommended to leave --nodes=1.
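The multiplication described above can be sketched in C as follows. This is only an illustration: the function names are hypothetical, and it assumes the job environment contains the Slurm variables SLURM_NTASKS_PER_NODE and SLURM_CPUS_PER_TASK, which Slurm sets only when the corresponding options are given. A missing or unparsable value is treated as 1.

```c
#include <stdlib.h>

/* Parse a positive integer from a string such as the value of an
   environment variable. Return fallback if the string is NULL or does
   not start with a positive number. */
static int positive_or(const char *text, int fallback)
{
    char *end;
    long value;

    if (text == NULL)
        return fallback;

    value = strtol(text, &end, 10);
    if (end == text || value <= 0)
        return fallback;

    return (int) value;
}

/* Total CPUs allocated on one node: tasks per node times CPUs per task. */
int allocated_cpus(const char *ntasks_per_node, const char *cpus_per_task)
{
    return positive_or(ntasks_per_node, 1) * positive_or(cpus_per_task, 1);
}

/* Same calculation, reading the values from the job environment
   (assumption: the variables are set by Slurm as described above). */
int allocated_cpus_from_env(void)
{
    return allocated_cpus(getenv("SLURM_NTASKS_PER_NODE"),
                          getenv("SLURM_CPUS_PER_TASK"));
}
```

With the batch script on this page, allocated_cpus("8", NULL) gives 8, which matches OMP_NUM_THREADS=8. A program could pass the result to omp_set_num_threads() rather than relying on a hard-coded value in the batch script.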
Now we will compile the program and launch the job. Only one of the two
compile commands is needed, depending on which compiler you use:
[araim1@maya-usr1 hello_openmp_c]$ gcc -fopenmp hello_openmp.c -o hello_openmp -lm # For GNU compiler
[araim1@maya-usr1 hello_openmp_c]$ icc -openmp hello_openmp.c -o hello_openmp -lm # For Intel compiler
[araim1@maya-usr1 hello_openmp_c]$ ls
hello_openmp.c run.slurm
[araim1@maya-usr1 hello_openmp_c]$ sbatch run.slurm
Submitted batch job 37532
[araim1@maya-usr1 hello_openmp_c]$ cat slurm.out
Thread 1 says: Hello World
Thread 5 says: Hello World
Thread 6 says: Hello World
Thread 2 says: Hello World
Thread 7 says: Hello World
Thread 0 says: Hello World
Thread 3 says: Hello World
Thread 0 reports: the number of threads are 8
Thread 4 says: Hello World
[araim1@maya-usr1 hello_openmp_c]$
Hello World example FORTRAN
Now let's see a similar program in FORTRAN. Begin by downloading the hello
world FORTRAN example from here.
Then grab the following batch script (which is the same as for the C code
above, except for the name of the executable):
#!/bin/bash
#SBATCH --job-name=OMP_hello
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
export OMP_NUM_THREADS=8
./hello_open_mp
Download:
../code/hello_openmp_f90/run.slurm
Now we can compile and run the code in the same way as in the C example:
[araim1@maya-usr1 hello_openmp_f90]$ gfortran -fopenmp hello_open_mp.f90 -o hello_open_mp # For GNU compiler
[araim1@maya-usr1 hello_openmp_f90]$ ifort -openmp hello_open_mp.f90 -o hello_open_mp # For Intel Compiler
[araim1@maya-usr1 hello_openmp_f90]$ sbatch run.slurm
Submitted batch job 37537
[araim1@maya-usr1 hello_openmp_f90]$ cat slurm.out
HELLO_OPEN_MP
FORTRAN90/OpenMP version
The number of processors available = 8
The number of threads available = 8
OUTSIDE the parallel region.
HELLO from process 0
Going INSIDE the parallel region:
HELLO from process 0
HELLO from process 4
HELLO from process 5
HELLO from process 3
HELLO from process 6
HELLO from process 2
HELLO from process 7
HELLO from process 1
Back OUTSIDE the parallel region.
HELLO_OPEN_MP
Normal end of execution.
Elapsed wall clock time = 0.131280E-01
[araim1@maya-usr1 hello_openmp_f90]$
MPI/OpenMP Hybrid in C
It may be useful to consider hybrid programming using both MPI and OpenMP. For
example, MPI can be used for communication between nodes, and OpenMP can be
used for shared memory programming within a node.
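To illustrate this division of labor, here is a rough sketch of how work might be split in a hybrid program. It deliberately contains no MPI calls so that it stands alone: rank and np are passed in as plain integers, whereas a real program would obtain them from MPI_Comm_rank and MPI_Comm_size. The function names block_range and local_sum are hypothetical.

```c
/* Compute the half-open index range [*lo, *hi) owned by process `rank`
   out of `np` processes when n items are divided as evenly as possible.
   The first (n % np) processes receive one extra item. */
void block_range(long n, int rank, int np, long *lo, long *hi)
{
    long base = n / np;
    long extra = n % np;

    *lo = rank * base + (rank < extra ? rank : extra);
    *hi = *lo + base + (rank < extra ? 1 : 0);
}

/* Sum the values lo, lo+1, ..., hi-1 using the OpenMP threads of one
   process. This is the shared memory part of the hybrid scheme. */
long local_sum(long lo, long hi)
{
    long total = 0;
    long i;

    #pragma omp parallel for reduction(+:total)
    for (i = lo; i < hi; i++)
    {
        total += i;
    }

    return total;
}
```

Each process would call block_range with its own rank and then local_sum on the resulting range. The per-process results would then be combined between nodes with an MPI reduction such as MPI_Reduce.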
The following Hello World program launches a predefined number of OpenMP
threads in each MPI process (we will use 8, the number of processor cores on
a node) and prints a message from each thread. The thread with thread ID 0
also reports the number of threads in its group.
#include <stdio.h>
#include <omp.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
    int nthreads, thread_id;
    int id, np;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int processor_name_len;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Get_processor_name(processor_name, &processor_name_len);

    #pragma omp parallel private(nthreads, thread_id)
    {
        thread_id = omp_get_thread_num();
        printf("Hello World from thread %d, process %d of %d, hostname %s\n",
            thread_id, id, np, processor_name);

        if (thread_id == 0)
        {
            nthreads = omp_get_num_threads();
            printf("Thread %d on process %d of %d reports: nthreads = %d\n",
                thread_id, id, np, nthreads);
        }
    }

    MPI_Finalize();
    return 0;
}
Download:
../code/hello-omp-mpi/hello_omp_mpi.c
The following batch script launches two MPI processes which will each run on
their own node. We set the environment variable OMP_NUM_THREADS=8 to tell the
OpenMP framework that there should be eight threads per process. The
"--exclusive" flag lets the scheduler know that we will be using the entire
nodes, and that no other jobs should run alongside ours. As usual, launching
the executable with "srun" ensures that the MPI framework is used.
#!/bin/bash
#SBATCH --job-name=OMP_hello
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --exclusive
export OMP_NUM_THREADS=8
srun ./hello_omp_mpi
Download:
../code/hello-omp-mpi/run.slurm
The first line below shows how to compile the program with the GNU compiler,
and the third line shows how to compile it with the Intel compiler.
Notice that mpicc is used for both, but a different MPI module must be loaded
for each compiler, which is what the "module swap" command on the second line
does.
[araim1@maya-usr1 hello-omp-mpi]$ mpicc -fopenmp hello_omp_mpi.c -o hello_omp_mpi # For GNU Compiler
[araim1@maya-usr1 hello-omp-mpi]$ module swap mvapich2/gcc/4.8.1/1.9 mvapich2/intel/composer_xe_2013_sp1.1.106/1.9
[araim1@maya-usr1 hello-omp-mpi]$ mpicc -openmp hello_omp_mpi.c -o hello_omp_mpi # For Intel compiler
[araim1@maya-usr1 hello-omp-mpi]$ sbatch run.slurm
Submitted batch job 1381632
[araim1@maya-usr1 hello-omp-mpi]$ cat slurm.err
[araim1@maya-usr1 hello-omp-mpi]$ cat slurm.out
Hello World from thread 7, process 0 of 2, hostname n3
Hello World from thread 4, process 0 of 2, hostname n3
Hello World from thread 2, process 0 of 2, hostname n3
Hello World from thread 3, process 0 of 2, hostname n3
Hello World from thread 6, process 0 of 2, hostname n3
Hello World from thread 5, process 0 of 2, hostname n3
Hello World from thread 1, process 0 of 2, hostname n3
Hello World from thread 0, process 1 of 2, hostname n4
Thread 0 on process 1 of 2 reports: nthreads = 8
Hello World from thread 5, process 1 of 2, hostname n4
Hello World from thread 6, process 1 of 2, hostname n4
Hello World from thread 7, process 1 of 2, hostname n4
Hello World from thread 3, process 1 of 2, hostname n4
Hello World from thread 4, process 1 of 2, hostname n4
Hello World from thread 1, process 1 of 2, hostname n4
Hello World from thread 2, process 1 of 2, hostname n4
Hello World from thread 0, process 0 of 2, hostname n3
Thread 0 on process 0 of 2 reports: nthreads = 8
[araim1@maya-usr1 hello-omp-mpi]$
More OpenMP programming