UMBC logo
UMBC High Performance Computing Facility
How to run MapReduce-MPI on tara

Introduction

MapReduce-MPI (MR-MPI) is a library written at Sandia National Lab which implements the MapReduce parallel programming paradigm on an MPI system. MapReduce was popularized by Google. Its most widely used implementation is probably the open source Java library called Hadoop. On this page we will show how to get started with MR-MPI on tara.

Currently MR-MPI is built only for the "GCC + OpenMPI 1.3.3" switcher combination, so we must first change to that setting. We also define an environment variable MRMPI_HOME for convenience.

[araim1@tara-fe1 ~]$ switcher mpi = gcc-openmpi-1.3.3-p1
[araim1@tara-fe1 ~]$ switcher_reload
[araim1@tara-fe1 ~]$ export MRMPI_HOME=/usr/cluster/contrib/gcc-openmpi-1.3.3/mrmpi-20Jun11/
We can now compile and run the examples that come with MR-MPI. Let's focus on the "wordfreq" example.
[araim1@tara-fe1 ~]$ cp /examples ~/
[araim1@tara-fe1 ~]$ cd ~/examples
[araim1@tara-fe1 examples]$ make
[araim1@tara-fe1 examples]$ ls -l wordfreq
-rwxrwx--- 1 araim1 contrib 375462 Oct  5 19:36 wordfreq
We can use a standard batch script. However, note that "mpirun" must be used to launch programs which are compiled to use OpenMPI.
#!/bin/bash
#SBATCH --job-name=mrmpi
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

mpirun ./wordfreq $MRMPI_HOME/doc/*.txt

Download: ../code/mrmpi-wordfreq/run.slurm
In the example above, we count word frequencies in the MRMPI documentation. Running this script yields the following.
[araim1@tara-fe1 examples]$ sbatch run.slurm 
Submitted batch job 447272
[araim1@tara-fe1 examples]$ cat slurm.err
[araim1@tara-fe1 examples]$ cat slurm.out
Map time (secs) = 0.0131121
Map KV =   KV pairs:   24277 ave 24277 max 24277 min
  Histogram:  1 0 0 0 0 0 0 0 0 0
  Kdata (Mb): 0.149896 ave 0.149896 max 0.149896 min
  Histogram:  1 0 0 0 0 0 0 0 0 0
  Vdata (Mb): 0 ave 0 max 0 min
  Histogram:  1 0 0 0 0 0 0 0 0 0
Collate time (secs) = 0.0173779
Collate KMV =   KMV pairs:  3568 ave 3568 max 3568 min
  Histogram:  1 0 0 0 0 0 0 0 0 0
  Kdata (Mb): 0.0345516 ave 0.0345516 max 0.0345516 min
  Histogram:  1 0 0 0 0 0 0 0 0 0
  Vdata (Mb): 0 ave 0 max 0 min
  Histogram:  1 0 0 0 0 0 0 0 0 0
...
[araim1@tara-fe1 examples]$
New MR-MPI programs can be written by using the header files in $MRMPI_HOME/include and linking to the library "libmrmpi_mpicc.a" in $MRMPI_HOME/lib. See the documentation page for MR-MPI for more details.