UMBC High Performance Computing Facility
How to run MrBayes on maya
Introduction
MrBayes
is a program for Bayesian inference and model choice across a wide
range of phylogenetic and evolutionary models.
To run the MrBayes software interactively on the front-end node of
maya, use the command
We will now demonstrate running MrBayes on the compute nodes. This example was adapted
from
MbWiki.
First, make sure to update your switcher settings as follows
[araim1@maya-usr1 ~]$ switcher mpi = gcc-mvapich2-1.4rc2
[araim1@maya-usr1 ~]$ switcher_reload
This is necessary to match the compiler and MPI implementation originally
used to configure MrBayes. Next create a .nex file as follows, with your
contents instead of the placeholders.
Now create a small script with commands to execute the .nex file
Note that there are other ways to set up a call to MrBayes; for example,
MCMC and likelihood options can be specified in a file separate from the data.
The last step is to create a standard SLURM batch script. This script
runs the MrBayes program on the compute nodes; the program runs in
parallel if multiple processes are requested. MrBayes will then run batch.txt,
which will in turn execute our .nex file.
#!/bin/bash
#SBATCH --job-name=mrbayes
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
srun mb batch.txt
Download:
../code/mrbayes-example/run.slurm
Make sure you have read the
how to run
tutorial before attempting to use the batch system.
Setting checkpoints for long runs
Users who need very long runs of MrBayes should break the work into
several shorter jobs rather than one very long job. Long jobs have
a higher probability of being interrupted by maintenance windows or unforeseen
problems. Fortunately, MrBayes has a built-in mechanism for creating
checkpoints, so that progress saved by one job can be continued in a
subsequent job.
To demonstrate this, consider the "primates.nex" example that comes with
the MrBayes software. We will create two MB run scripts to analyze this data.
The first script represents the initial run.
execute primates.nex;
mcmc ngen=10000000 nruns=2 temp=0.02 mcmcdiag=yes samplefreq=1000
stoprule=yes stopval=0.005 relburnin=yes burninfrac=0.1 printfreq=1000
checkfreq=1000;
Download:
../code/mrbayes-checkpoint/cmds1.nex
Notice that we set "checkfreq", the number of generations between
checkpoints. The second script continues where the first script
left off.
execute primates.nex;
mcmc ngen=20000000 nruns=2 temp=0.02 mcmcdiag=yes samplefreq=1000
stoprule=yes stopval=0.005 relburnin=yes burninfrac=0.1 printfreq=1000
append=yes checkfreq=1000;
Download:
../code/mrbayes-checkpoint/cmds2.nex
The only differences are the addition of the option "append=yes", which tells
MrBayes to continue from the checkpoint, and that "ngen" has been increased
to request additional generations.
In this case, we want MrBayes to read the checkpoint from our 10-million-generation run
and stop after 20 million generations. Note that all values, other than "ngen"
and "append", must match between the two scripts; the run may fail otherwise.
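Because every option other than "ngen" and "append" has to match, one way to avoid copy-and-paste mistakes is to generate the continuation script from the initial one. The following is a minimal sketch, not part of MrBayes: make_continuation is a hypothetical helper name, and it assumes that the initial script contains exactly one "ngen=" and one "checkfreq=" option and does not already set "append".

```shell
#!/bin/bash
# Sketch: derive a continuation script from the initial MrBayes script so
# that every mcmc option except "ngen" and "append" stays identical.
# make_continuation is a hypothetical helper, not part of MrBayes.
make_continuation() {
    local src="$1" dst="$2" new_ngen="$3"
    # Replace the ngen value and put append=yes in front of checkfreq;
    # all other lines are copied unchanged.
    sed -e "s/ngen=[0-9][0-9]*/ngen=${new_ngen}/" \
        -e "s/checkfreq=/append=yes checkfreq=/" \
        "$src" > "$dst"
}

# Example: continue the 10-million-generation run up to 20 million.
# The guard keeps the script harmless when cmds1.nex is not present.
if [ -f cmds1.nex ]; then
    make_continuation cmds1.nex cmds2.nex 20000000
fi
```

Running "diff cmds1.nex cmds2.nex" afterwards is a quick way to confirm that only the "ngen" and "append" settings differ.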
Running these two scripts directly from the command line yields the following
[araim1@maya-usr1 mrbayes-checkpoint]$ ls
cmds1.nex cmds2.nex primates.nex
[araim1@maya-usr1 mrbayes-checkpoint]$ mb cmds1.nex
MrBayes v3.2.1 x64
...
Executing file "cmds1.nex"
...
Executing file "primates.nex"...
...
Returning execution to calling file ...
...
Chain results (10000 generations requested):
...
[araim1@maya-usr1 mrbayes-checkpoint]$ ls
cmds1.nex primates.nex primates.nex.mcmc primates.nex.run2.p
cmds2.nex primates.nex.ckp primates.nex.run1.p primates.nex.run2.t
primates.nex.ckp~ primates.nex.run1.t run.slurm
[araim1@maya-usr1 mrbayes-checkpoint]$ mb cmds2.nex
MrBayes v3.2.1 x64
...
Executing file "cmds2.nex"
...
Executing file "primates.nex"...
...
Returning execution to calling file ...
...
Executing file "primates.nex.ckp"...
[araim1@maya-usr1 mrbayes-checkpoint]$
This mechanism can be used to plan for very long runs, by setting "ngen"
accordingly, and also to recover from unexpected failures.
When carrying out real runs on maya, the user should place these calls into
batch scripts as illustrated in the previous section.
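As an illustration, the sketch below combines the batch script from the previous section with the two command files from this section. It is one possible arrangement rather than an official recipe: choose_cmds is a hypothetical helper, the job name is arbitrary, and the script assumes that the checkpoint file is named primates.nex.ckp as in the listing above. The same script can then be submitted repeatedly; it starts with cmds1.nex and switches to cmds2.nex once a checkpoint exists.

```shell
#!/bin/bash
#SBATCH --job-name=mrbayes-ckp
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8

# choose_cmds is a hypothetical helper: it prints the continuation script
# when the given checkpoint file exists, and the initial script otherwise.
choose_cmds() {
    local ckp="$1"
    if [ -f "$ckp" ]; then
        echo cmds2.nex
    else
        echo cmds1.nex
    fi
}

cmds=$(choose_cmds primates.nex.ckp)
echo "Running MrBayes with ${cmds}"

# Launch only where srun is available, i.e. inside a SLURM job.
if command -v srun >/dev/null 2>&1; then
    srun mb "$cmds"
fi
```

Note that "ngen" in cmds2.nex still has to be raised by hand (or regenerated) whenever more generations are wanted than the continuation script currently requests.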