Using BEAST2/BEAGLE on HPC


Juan Manuel Cabrera

Dec 12, 2017, 1:58:13 PM
to beast-users
Hi everyone, I'm having trouble running BEAST2/BEAGLE on a Unix HPC cluster. The cluster's sysadmin managed to install BEAST and BEAGLE correctly, but no matter which options I choose (-beagle_SSE, -beagle_CPU, -instances, -threads), it will not use more than one thread/core (the system reserves the resources for me, but I can't use them all).

Here is one of the batch files I tried:

**********************************************************************************
#!/bin/bash 
#SBATCH --job-name=beast
#SBATCH --ntasks=4 
#SBATCH --tasks-per-node=4 
#SBATCH --output=trabajo-%j-salida.txt
#SBATCH --error=trabajo-%j-error.txt 

export OMP_NUM_THREADS=$SLURM_CPUS_ON_NODE
module load beast
module load beagle-lib/171124-avx 
beast -threads 4 -instances 4 -beagle_SSE sb2.xml
***********************************************************************************

In the output file I got these BEAGLE messages:

 Using BEAGLE version: 2.1.2 resource 0: CPU
    with instance flags:  PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL SCALING_MANUAL SCALERS_RAW VECTOR_SSE THREADING_NONE PROCESSOR_CPU FRAMEWORK_CPU
  Using BEAGLE version: 2.1.2 resource 0: CPU
    with instance flags:  PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL SCALING_MANUAL SCALERS_RAW VECTOR_SSE THREADING_NONE PROCESSOR_CPU FRAMEWORK_CPU
  Using BEAGLE version: 2.1.2 resource 0: CPU
    with instance flags:  PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL SCALING_MANUAL SCALERS_RAW VECTOR_SSE THREADING_NONE PROCESSOR_CPU FRAMEWORK_CPU
  Using BEAGLE version: 2.1.2 resource 0: CPU
    with instance flags:  PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL SCALING_MANUAL SCALERS_RAW VECTOR_SSE THREADING_NONE PROCESSOR_CPU FRAMEWORK_CPU
  Using BEAGLE version: 2.1.2 resource 0: CPU
    with instance flags:  PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL SCALING_MANUAL SCALERS_RAW VECTOR_SSE THREADING_NONE PROCESSOR_CPU FRAMEWORK_CPU




Remco Bouckaert

Dec 12, 2017, 2:13:15 PM
to beast...@googlegroups.com
Hi Juan,

Judging from the XML file name, I assume you are running a StarBEAST2 analysis. Unfortunately, the BEAUti template is not yet set up to benefit from threading (I opened an issue for that: https://github.com/genomescale/starbeast2/issues/13). You can easily change the XML by opening it in a text editor and replacing

<distribution spec="CompoundDistribution" id="likelihood">

with

<distribution spec="CompoundDistribution" id="likelihood" useThreads="true">
so that tree likelihoods for various genes are calculated in parallel, and 

spec="TreeLikelihood" with spec="ThreadedTreeLikliehood".

so that tree likelihoods for single genes are calculated in parallel.
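
The two replacements described in this message could also be scripted rather than done by hand. This is only a minimal sketch: it assumes the XML contains the exact strings quoted above (the attribute order in a real BEAUti-generated file may differ, in which case the patterns need adjusting), and it uses the spelling ThreadedTreeLikelihood. The function name enable_threads and the file names are hypothetical.

```bash
#!/bin/bash
# Sketch only: apply both text replacements and write the result to a new file.
# enable_threads is a hypothetical helper name, not part of BEAST2.
# usage: enable_threads IN.xml OUT.xml
enable_threads() {
    # 1st expression: add useThreads="true" to the likelihood CompoundDistribution.
    # 2nd expression: switch every TreeLikelihood to ThreadedTreeLikelihood.
    sed -e 's/\(<distribution spec="CompoundDistribution" id="likelihood"\)>/\1 useThreads="true">/' \
        -e 's/spec="TreeLikelihood"/spec="ThreadedTreeLikelihood"/g' \
        "$1" > "$2"
}
```

For example, `enable_threads sb2.xml sb2-threaded.xml` leaves the original file untouched, so the single-threaded and threaded versions can be compared side by side.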

Note that the -instances flag sets how many threads are used for an individual (gene) alignment, and when you have many short genes the threading overhead may be larger than the benefit of parallelisation, so you have to experiment a bit with a combination of -threads and -instances to get optimal performance.
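
One way to organise that experiment is to list every -threads/-instances combination up front and time each one on a short test run. The sketch below only prints the candidate command lines; the helper name thread_grid and the choice of powers of two are assumptions, while the flags themselves are the ones used in the original batch file.

```bash
#!/bin/bash
# Sketch only: print one BEAST command per -threads/-instances combination.
# thread_grid is a hypothetical helper; it does not run BEAST itself.
# usage: thread_grid XML MAX   (tries 1, 2, 4, ... up to MAX for both flags)
thread_grid() {
    local xml="$1" max="$2" t=1 i
    while [ "$t" -le "$max" ]; do
        i=1
        while [ "$i" -le "$max" ]; do
            echo "beast -threads $t -instances $i -beagle_SSE $xml"
            i=$((i * 2))
        done
        t=$((t * 2))
    done
}
```

Each printed line can then be run with `time` on a shortened chain, and the fastest combination kept for the full analysis.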

Cheers,

Remco




Remco Bouckaert

Dec 12, 2017, 2:15:08 PM
to beast...@googlegroups.com
Sorry for the typo: it should be "ThreadedTreeLikelihood", not "ThreadedTreeLikliehood".

Remco

Huw A. Ogilvie

Dec 12, 2017, 11:44:12 PM
to beast-users
StarBEAST2 XML files are not set up for multithreading the phylogenetic likelihood function, because it will typically be even SLOWER than single threading, so please double-check the performance impact before wasting a whole lot of electricity. I just checked with one of the data sets I am currently working with, and enabling multithreading reduced performance by about 25%.

This is because of how StarBEAST2 works and the kind of data it is used with. Most MCMC operations will only change a few branches of a single gene tree, for which it is pretty fast to update the phylogenetic and multispecies coalescent likelihoods (a lot of credit goes to Remco for implementing those functions). StarBEAST2 is typically used with short nucleotide sequences, making the phylogenetic likelihood calculation even faster.

The very quick time taken by each MCMC step leaves little room to accelerate those calculations, and the overhead of multithreading will typically be larger than the performance gain, resulting in worse performance than single threading.

The order of MCMC operations in BEAST2 is strictly serialized, so the StarBEAST2 algorithm, as currently implemented, is pretty much the worst-case scenario for multithreading.

- Huw

Huw A. Ogilvie

Dec 12, 2017, 11:54:23 PM
to beast-users
I should clarify that the 25% reduction in performance refers to wall time. Multithreading uses more wall time AND more CPU time, so for my data set multithreading used about 50% more CPU hours (and hence would use up about 50% more of your HPC quota). For large data sets I am running multiple chains with different random seeds, and concatenating the post-burnin samples. This is a MUCH more effective way of multithreading StarBEAST2. Another option could be enabling MCMCMC (aka MC3), but I haven't yet tested it with StarBEAST2 (or used it with BEAST2 at all).

Juan Manuel Cabrera

Dec 13, 2017, 12:08:39 PM
to beast-users
Thanks for all the info. I will run some tests to see if I can improve my run times without wasting HPC resources.

Huw, I would like to try the multiple-chain/concatenation approach you suggest. Could you explain how I can do this?

Huw A. Ogilvie

Dec 13, 2017, 5:52:03 PM
to beast-users
Hi Juan,

The easiest way to do this is to create a separate folder for each chain, and put a copy of your XML file in each folder. Then run BEAST for each XML file to initiate each chain, but make sure to use a different random seed for each chain. If different chains are started at roughly the same time, they will often get the same random seed, so set the seed manually for each chain.
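
The folder-per-chain setup described above might look like the following sketch. The helper name run_chains, the chainN folder names, the LAUNCH variable and the seed values in the usage example are all hypothetical; the only BEAST-specific part is passing an explicit seed with -seed.

```bash
#!/bin/bash
# Sketch only: one folder, one copy of the XML and one explicit seed per chain.
# run_chains, chainN and LAUNCH are hypothetical names chosen for illustration.
# usage: run_chains XML SEED [SEED...]
run_chains() {
    local xml="$1"; shift
    local name n=0 seed dir
    name="$(basename "$xml")"
    for seed in "$@"; do
        n=$((n + 1))
        dir="chain${n}"
        mkdir -p "$dir"
        cp "$xml" "$dir/$name"
        # LAUNCH defaults to "echo", a dry run that prints the command instead
        # of running it. Set LAUNCH="" to run BEAST directly in each folder.
        ( cd "$dir" && ${LAUNCH-echo} beast -seed "$seed" "$name" )
    done
}
```

For example, `run_chains sb2.xml 101 202 303 404` creates chain1 to chain4 and prints the four commands. With LAUNCH="" the chains would run one after another in the foreground, so on a cluster each printed command would more likely go into its own job.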

Then you can use "logcombiner" (included with BEAST2) to strip burn-in and concatenate each chain into a single chain.
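
A sketch of that combining step follows. It assumes logcombiner accepts repeated -log options, -b for the burn-in percentage and -o for the output file, which is my understanding of the BEAST2 command-line tool; check the usage message of your installed version before relying on it. The helper name combine_logs, the LAUNCH variable and the log file names are hypothetical (the real log names are set by the loggers in your XML).

```bash
#!/bin/bash
# Sketch only: build one logcombiner call from any number of per-chain logs.
# combine_logs and LAUNCH are hypothetical names; the -log/-b/-o options are
# assumed from the BEAST2 logcombiner tool and should be verified locally.
# usage: combine_logs BURNIN_PERCENT OUTPUT LOG [LOG...]
combine_logs() {
    local burnin="$1" out="$2"; shift 2
    local args=() log
    for log in "$@"; do
        args+=(-log "$log")
    done
    # LAUNCH defaults to "echo" (dry run); set LAUNCH="" to actually run it.
    ${LAUNCH-echo} logcombiner "${args[@]}" -b "$burnin" -o "$out"
}
```

For example, `combine_logs 10 combined.log chain1/sb2.log chain2/sb2.log` prints a single logcombiner command that would discard the first 10% of each chain and write the rest to combined.log.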

- Huw

Juan Manuel Cabrera

Dec 17, 2017, 3:31:39 AM
to beast-users
You were right! The multi-threaded approach didn't work (it was much slower than single-threaded).

Thanks, everyone, for the help.