UMBC High Performance Computing Facility
How to run MapReduce-MPI on tara
Introduction
MapReduce-MPI (MR-MPI) is a library
written
at Sandia National Lab which implements the MapReduce parallel programming
paradigm on an MPI system. MapReduce was popularized by
Google. Its most widely used implementation
is probably the open source Java library called
Hadoop. On this page we will show how
to get started with MR-MPI on tara.
Currently MR-MPI is built only for the "GCC + OpenMPI 1.3.3" switcher
combination, so we must first change to that setting. We also define an
environment variable MRMPI_HOME for convenience.
[araim1@tara-fe1 ~]$ switcher mpi = gcc-openmpi-1.3.3-p1
[araim1@tara-fe1 ~]$ switcher_reload
[araim1@tara-fe1 ~]$ export MRMPI_HOME=/usr/cluster/contrib/gcc-openmpi-1.3.3/mrmpi-20Jun11/
We can now compile and run the examples that come with MR-MPI. Let's focus
on the "wordfreq" example.
[araim1@tara-fe1 ~]$ cp /examples ~/
[araim1@tara-fe1 ~]$ cd ~/examples
[araim1@tara-fe1 examples]$ make
[araim1@tara-fe1 examples]$ ls -l wordfreq
-rwxrwx--- 1 araim1 contrib 375462 Oct 5 19:36 wordfreq
We can use a standard batch script. However, note that "mpirun" must be used
to launch programs which are compiled to use OpenMPI.
#!/bin/bash
#SBATCH --job-name=mrmpi
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
mpirun ./wordfreq $MRMPI_HOME/doc/*.txt
Download:
../code/mrmpi-wordfreq/run.slurm
In the example above, we count word frequencies in the MRMPI documentation.
Running this script yields the following.
[araim1@tara-fe1 examples]$ sbatch run.slurm
Submitted batch job 447272
[araim1@tara-fe1 examples]$ cat slurm.err
[araim1@tara-fe1 examples]$ cat slurm.out
Map time (secs) = 0.0131121
Map KV = KV pairs: 24277 ave 24277 max 24277 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Kdata (Mb): 0.149896 ave 0.149896 max 0.149896 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Vdata (Mb): 0 ave 0 max 0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Collate time (secs) = 0.0173779
Collate KMV = KMV pairs: 3568 ave 3568 max 3568 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Kdata (Mb): 0.0345516 ave 0.0345516 max 0.0345516 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Vdata (Mb): 0 ave 0 max 0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
...
[araim1@tara-fe1 examples]$
New MR-MPI programs can be written by using the header files in
$MRMPI_HOME/include and linking to the library "libmrmpi_mpicc.a"
in $MRMPI_HOME/lib. See the
documentation
page for MR-MPI for more details.