UMBC High Performance Computing Facility
How to Compile C programs on tara

Introduction

In this tutorial we will demonstrate how to compile C source code on tara. We start with a simple serial example, then work our way up to compiling parallel code. Once the code is compiled, we will see how to run it. We will assume that you know some basic C, so the code will not be explained in much detail. Working on a multicore cluster like tara is fundamentally different from working on a standard server such as gl.umbc.edu, so please make sure to read and understand this material. More details can be found in the manual pages on the system (e.g. try the command "man mpicc").

A convenient way to save the example code to your account is as follows. There is a "download" link under each code example. You can copy this link from your browser and issue the following command in your tara terminal session.

[araim1@tara-fe1 ~]$ wget <paste_the_link_here>

For example

[araim1@tara-fe1 ~]$ wget http://www.umbc.edu/hpcf/code/hello_serial/hello_serial.c
--16:08:24--  http://www.umbc.edu/hpcf/code/hello_serial/hello_serial.c
Resolving www.umbc.edu... 130.85.12.11
Connecting to www.umbc.edu|130.85.12.11|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 183 [text/plain]
Saving to: `hello_serial.c'

100%[======================================================================================>] 183         --.-K/s   in 0s     

16:08:24 (29.1 MB/s) - `hello_serial.c' saved [183/183]

[araim1@tara-fe1 ~]$ ls
hello_serial.c
[araim1@tara-fe1 ~]$

We have shown the prompt in the examples above to emphasize that a command is being issued. When you follow along, your prompt may look a bit different (for one thing, it will contain your own username). Be careful to type only the command itself, not the prompt or the example output.

Serial Hello World

We will write a simple "hello world" program that prints the name of the host machine. Here is the code
#include <stdio.h>
#include <unistd.h>

int main(int argc, char* argv[])
{
    char hostname[256];
    gethostname(hostname, 256);

    printf("Hello world from %s\n", hostname);

    return 0;
}

Download: ../code-2010/hello_serial/hello_serial.c

Once you have saved this code to your account, try to compile it. There are several C compilers on the system; here we demonstrate the Portland Group C compiler, pgcc.

[araim1@tara-fe1 hello_serial]$ pgcc hello_serial.c -o hello_serial
[araim1@tara-fe1 hello_serial]$

If the compilation is successful, no warnings will appear and an executable "hello_serial" will be created.

[araim1@tara-fe1 hello_serial]$ ls
hello_serial  hello_serial.c
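
The Portland Group compiler is not the only option; the GNU compiler is also installed on tara (it appears in the compiler/MPI combinations discussed later on this page). Assuming gcc is in your path, the analogous command would look like the following. The rest of this tutorial sticks with pgcc.

[araim1@tara-fe1 hello_serial]$ gcc hello_serial.c -o hello_serial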

To see how to run your serial executable on the cluster, jump to how to run serial programs.


Parallel Hello World

Now we will compile a "hello world" program which can be run in parallel on multiple processors. Save the following code to your account.
#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
    int id, np;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int processor_name_len;

    MPI_Init(&argc, &argv);

    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Get_processor_name(processor_name, &processor_name_len);

    printf("Hello world from process %03d out of %03d, processor name %s\n", 
        id, np, processor_name);

    MPI_Finalize();
    return 0;
}

Download: ../code-2010/hello_parallel/hello_parallel.c

This version of the "hello world" program collects several pieces of information at each MPI process: the MPI processor name (i.e., the hostname), the process ID, and the number of processes in our job. Notice that we needed a new header file mpi.h to get access to the MPI commands. We also needed to call MPI_Init before using any of them, and MPI_Finalize at the end to clean up. Try to compile the code with the following command.

[araim1@tara-fe1 hello_parallel]$ mpicc hello_parallel.c -o hello_parallel
hello_parallel.c:
[araim1@tara-fe1 hello_parallel]$

After a successful compilation with no warnings, an executable "hello_parallel" should have been created.

[araim1@tara-fe1 hello_parallel]$ ls
hello_parallel  hello_parallel.c
[araim1@tara-fe1 hello_parallel]$ 

To see how to run your parallel executable on the cluster, jump to how to run parallel programs.

In this example, we've written output from our MPI program to stdout. As a general guideline, stdout and stderr should be used for reporting status information, and not for returning large datasets. If your program does need to write out a lot of data, it would be more appropriate to use file I/O instead.
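
As a rough sketch of this guideline (this program is not part of the tutorial code, and the file names are made up), each MPI process could write its data to its own file, named after its rank, instead of printing it to stdout:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[])
{
    int id;
    char filename[64];
    FILE* fp;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);

    /* Hypothetical file name, e.g. "results_003.dat" for process 3 */
    sprintf(filename, "results_%03d.dat", id);

    fp = fopen(filename, "w");
    if (fp != NULL)
    {
        /* A real program would write its large dataset here */
        fprintf(fp, "data from process %03d\n", id);
        fclose(fp);
    }

    MPI_Finalize();
    return 0;
}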

Choosing a Compiler and MPI Implementation

In the parallel code example, we used a special compilation command, "mpicc", which knows how to generate a parallel executable. "mpicc" is actually not a new compiler, but a generic command that invokes one of the underlying C compilers (such as pgcc or gcc). It also relies on a specific MPI implementation, several of which are available on tara.
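
Since mpicc just drives one of these underlying compilers, ordinary compiler flags can be passed straight through it. For instance, to compile the parallel hello world with optimization turned on (the -O3 flag here is only an illustration):

[araim1@tara-fe1 hello_parallel]$ mpicc -O3 hello_parallel.c -o hello_parallel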

By default, your account is set up to use the Portland Group Compiler and the MVAPICH2 1.4 rc2 implementation. To verify this, issue the following command.

[araim1@tara-fe1 results]$ switcher --show mpi
user:default=pgi-9.0-mvapich2-1.4rc2
user:exists=1
[araim1@tara-fe1 ~]$ 

The "user:default" line tells you which compiler/MPI combination you're using, while "user:exists=1" simply indicates you've been assigned a compiler/MPI combination. pgi-9.0-mvapich2-1.4rc2 should be a good default for many users, so you may not need to change it. There are several other possibilities though - these also involve the GNU gcc compiler and the OpenMPI implementation. Try the following command to see what is available.

[araim1@tara-fe1 results]$ switcher --list mpi
pgi-9.0-mvapich2-1.4rc2
gcc-mvapich2-1.4rc2
pgi-9.0-openmpi-1.3.3-p1
gcc-openmpi-1.3.3
gcc-openmpi-1.3.3-p1
pgi-9.0-openmpi-1.3.3
[araim1@tara-fe1 ~]$ 

At this point, we should say more about the Switcher. When you compile MPI programs, the compiler needs information about where to find the MPI libraries and which libraries to link to. Fortunately you don't have to worry about this, since the MPI implementations provide wrapper scripts which call the compiler for you. These scripts are mpicc (for C), mpicxx (for C++), mpif90 (for Fortran 90), and mpif77 (for Fortran 77). In order to successfully compile or run any MPI program, your PATH, LD_LIBRARY_PATH, and other pieces of your environment must be set correctly so that your shell can find the wrapper scripts and the MPI libraries. The Switcher program handles most of this work for you.
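
Because the wrapper scripts are located through your PATH, you can check which installation your environment currently points to with standard shell commands; the paths printed will depend on your switcher setting, so no output is shown here.

[araim1@tara-fe1 ~]$ which mpicc
[araim1@tara-fe1 ~]$ echo $LD_LIBRARY_PATH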

Above we saw that there were six compiler/MPI combinations. The compiler and MPI implementation are combined because the MPI libraries your code uses at runtime should have been built with the same compiler you're now using to compile your code. That means you have to use Switcher to pick one of the combinations before both compiling and running your program. It also means that if you change to a different Switcher setting, you'll need to recompile your code before running it.

In the tutorials, we always assume you are using the default switcher setting, PGI + MVAPICH2. MVAPICH2 and OpenMPI are about equally easy to use, but we've found in preliminary benchmarks that MVAPICH2 consistently performs better for jobs with larger numbers of processes.

Suppose we wish to switch to a different compiler/MPI combination, say from the current setting "pgi-9.0-mvapich2-1.4rc2" to "gcc-openmpi-1.3.3-p1". The following command will make the change.

[araim1@tara-fe1 ~]$ switcher mpi = gcc-openmpi-1.3.3-p1
Warning: mpi:default already has a value:
  pgi-9.0-mvapich2-1.4rc2
Replace old attribute value (y/N)? y
Attribute successfully set; new attribute setting will be effective for
future shells
[araim1@tara-fe1 ~]$ 

Now make sure to log out and log back in to all open sessions for the new settings to take effect. Alternatively, you can use the switcher_reload command to change the settings within your current session.

[araim1@tara-fe1 ~]$ switcher_reload
[araim1@tara-fe1 ~]$ 

Another useful thing to mention is the "-show" flag for the MPI compiler commands. This will display which options are currently in use. For example:

[araim1@tara-fe1 ~]$ mpicc -show
pgcc -g -I/usr/cluster/mvapich2/1.4rc2-2/pgi/9.0/include -L/usr/cluster/mvapich2
/1.4rc2-2/pgi/9.0/lib -Wl,-rpath,/usr/cluster/mvapich2/1.4rc2-2/pgi/9.0/lib -lmp
ich -lpmi -lpthread -lrdmacm -libverbs -libumad -lrt
[araim1@tara-fe1 ~]$ 

We can see that a call to "mpicc" will invoke the pgcc compiler with MVAPICH2 options set, as we would expect from our switcher setting.

Logging which nodes are used

For a parallel program, it's always a good idea to log which compute nodes you've used. We can modify our parallel hello world a little bit to accomplish this. Running this example on the cluster works the same way as for the Parallel Hello World program; see how to run parallel jobs.

Logging which nodes are used - Version 1

We'll start by logging some information to stdout. If you ran the Parallel Hello World example above, you probably noticed that the processes reported back in a random order. This is a bit difficult to read, so let's sort the responses before printing them out. To do this, we'll have the process with ID 0 receive greeting messages from every other process, in order by process ID. Process 0 will handle writing the messages to stdout.
#include <stdio.h>
#include <mpi.h>
#include <string.h>

int main(int argc, char* argv[])
{
    int id, np, processor_name_len;
    int j;
    int dest;
    int tag = 0;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    char message[100];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Get_processor_name(processor_name, &processor_name_len);

    sprintf(message, 
        "Process %03d out of %03d running on processor %4s", 
        id, np, processor_name);

    if (id == 0)
    {
        printf("%s\n", message);

        for (j = 1; j < np; j++)
        {
            MPI_Recv(message, 100, MPI_CHAR, j, tag, MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }
    else
    {
        dest = 0;
        MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}

Download: ../code-2010/hello_send_recv/hello_send_recv.c
Message sending is accomplished using the MPI_Send function, and receiving with the MPI_Recv function. Each process prepares its own message, then execution varies depending on the current process. Process 0 writes its own message first, then receives and writes the others in order by process ID. All other processes simply send their message to process 0.

Logging which nodes are used - Version 2

Now we'll make a few improvements to the first version: (1) make it look more like a real project, and (2) create a useful utility function. Download the following files.
#include <stdio.h>
#include <mpi.h>
#include "nodes_used.h"

int main(int argc, char* argv[])
{
    int id, np, processor_name_len;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Get_processor_name(processor_name, &processor_name_len);

    FILE* log_fp = fopen("nodes_used.log", "w");

    /* Log our MPI processes to the log file. We could also have
       passed the standard streams stdout or stderr here */
    log_processes(log_fp, id, np, processor_name);

    fclose(log_fp);

    MPI_Finalize();
    return 0;
}


Download: ../code-2010/hello_send_recv-2/hello_send_recv.c
#ifndef NODES_USED_H
#define NODES_USED_H

#include <stdio.h>
#include <string.h>
#include <mpi.h>

void log_processes(FILE* fp, int id, int np, char* processor_name);

#endif

Download: ../code-2010/hello_send_recv-2/nodes_used.h
#include "nodes_used.h"

void log_processes(FILE* fp, int id, int np, char* processor_name)
{
    int j, dest;
    char message[100];
    int tag = 0;
    MPI_Status status;

    sprintf(message,
        "Process %03d out of %03d running on processor %4s",
        id, np, processor_name);

    if (id == 0)
    {
        fprintf(fp, "%s\n", message);

        for (j = 1; j < np; j++)
        {
            MPI_Recv(message, 100, MPI_CHAR, j, tag, MPI_COMM_WORLD, &status);
            fprintf(fp, "%s\n", message);
        }
    }
    else
    {
        dest = 0;
        MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    }
}

Download: ../code-2010/hello_send_recv-2/nodes_used.c
OBJS := nodes_used.o hello_send_recv.o

EXECUTABLE := hello_send_recv

DEFS := 
CFLAGS := -g -O3 

INCLUDES :=
LDFLAGS := -lm 

CC := mpicc

%.o: %.c %.h
    $(CC) $(CFLAGS) $(INCLUDES) -c $< -o $@

$(EXECUTABLE): $(OBJS)
    $(CC) $(CFLAGS) $(INCLUDES) $(OBJS) -o $@ $(LDFLAGS)

clean:
    rm -f *.o $(EXECUTABLE)


Download: ../code-2010/hello_send_recv-2/Makefile
Warning: if you copy and paste the Makefile text from your browser, you will lose some of the formatting. Namely, some of the lines need to begin with tabs. The easiest way to avoid this problem is to download the file using the link.
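
If you are unsure whether the tabs survived, one way to check (assuming the GNU version of cat found on Linux systems) is its -A option, which prints each tab as "^I"; the command lines under each target should begin with that marker.

[araim1@tara-fe1 hello_send_recv-2]$ cat -A Makefile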

Now it should be a simple matter to compile the program.

[araim1@tara-fe1 hello_send_recv-2]$ make
mpicc -g -O3   -c nodes_used.c -o nodes_used.o
mpicc -g -O3    -c -o hello_send_recv.o hello_send_recv.c
mpicc -g -O3   nodes_used.o hello_send_recv.o -o hello_send_recv -lm 
[araim1@tara-fe1 hello_send_recv-2]$ ls
Makefile                                nodes_used.c
hello_send_recv                          nodes_used.h
hello_send_recv.c                        nodes_used.o
hello_send_recv.o
[araim1@tara-fe1 hello_send_recv-2]$

It's also simple to clean up the project (object files and executables).

[araim1@tara-fe1 hello_send_recv-2]$ make clean
rm -f *.o hello_send_recv
[araim1@tara-fe1 hello_send_recv-2]$ ls
Makefile  hello_send_recv.c  nodes_used.c  nodes_used.h
[araim1@tara-fe1 hello_send_recv-2]$

Now the nodes_used.h and nodes_used.c files can be copied to other projects and used as a utility.
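
For instance, in a hypothetical project whose main program is my_solver.c (a made-up name), the Makefile shown above could be reused by listing nodes_used.o among the objects and changing only the first two variables:

OBJS := nodes_used.o my_solver.o

EXECUTABLE := my_solver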