compiling ExaML without MPI

Mark Howison

Aug 9, 2016, 4:27:45 PM
to raxml
Hello,

We have started using ExaML in place of single-threaded RAxML in the genetree stage of our phylogenomic pipeline Agalma (https://bitbucket.org/caseywdunn/agalma). We found that in practice it offers about a 10X speed-up over RAxML for the mixture of trees we are estimating (typically on the order of 7,000 genes each with 100 bootstraps = 700,000 individual ML searches). Even using full ML searches for bootstraps with ExaML is faster than RAxML in fast bootstrapping mode.

The problem we are running into is that we use GNU parallel to load balance the 700,000 individual calls across a set of compute nodes on our cluster. However, the way the MPI library interacts with multiple examl invocations on the same node causes performance degradation (all the invocations get pinned to a single core). I've been able to compile my own MPI library and circumvent this problem, but it would be more portable and maintainable if we could compile ExaML without any MPI support, since we are only using it with a single task anyway.
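
As an illustration of the per-core load balancing described above (not Agalma's code, not GNU parallel, and not the MPI rebuild Mark used), here is a minimal Python sketch that runs many independent single-task jobs on one node while pinning each job to its own free core. It assumes Linux, since `os.sched_setaffinity` is Linux-only; `run_all` and `_pin_to` are hypothetical names, and an MPI library that applies its own binding when the program starts may still override the inherited mask.

```python
import os
import subprocess
import time


def _pin_to(core):
    """Return a preexec_fn that restricts the child process to one core (Linux only)."""
    def pin():
        # Runs in the child between fork() and exec(), so the job starts pinned.
        os.sched_setaffinity(0, {core})
    return pin


def run_all(commands, cores, poll_interval=0.05):
    """Run independent commands, at most len(cores) at once, never two on one core.

    Sketch only (hypothetical helper). Returns exit codes in the order of `commands`.
    """
    pending = list(enumerate(commands))[::-1]  # reversed so pop() yields job 0 first
    free = list(cores)                         # cores not currently running a job
    if commands and not free:
        raise ValueError("need at least one core")
    running = {}                               # Popen -> (job index, core)
    results = [None] * len(commands)

    while pending or running:
        # Fill every idle core with the next waiting job.
        while pending and free:
            idx, cmd = pending.pop()
            core = free.pop()
            proc = subprocess.Popen(cmd, preexec_fn=_pin_to(core))
            running[proc] = (idx, core)

        # Reap finished jobs and give their cores back to the free list.
        for proc in list(running):
            if proc.poll() is not None:
                idx, core = running.pop(proc)
                results[idx] = proc.returncode
                free.append(core)

        if running:
            time.sleep(poll_interval)

    return results
```

Usage would look like `run_all(jobs, sorted(os.sched_getaffinity(0)))`, where `jobs` is a list of argument lists, one per independent search. The scheduler polls instead of using worker threads because the Python documentation warns that `preexec_fn` is not safe in multithreaded programs.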

Is this an easy modification that could be made using a compiler directive, like -DDISABLE_MPI?

Thanks,
Mark Howison

Alexey Kozlov

Aug 11, 2016, 11:56:33 AM
to ra...@googlegroups.com
Hello Mark,

> We have started using ExaML in place of single-threaded RAxML in the genetree stage of our phylogenomic pipeline Agalma
> (https://bitbucket.org/caseywdunn/agalma). We found that in practice it offers about a 10X speed-up over RAxML for the
> mixture of trees we are estimating (typically on the order of 7,000 genes each with 100 bootstraps = 700,000 individual
> ML searches). Even using full ML searches for bootstraps with ExaML is faster than RAxML in fast bootstrapping mode.

That's pretty weird, since both codes use the same search algorithm. ExaML is expected to be faster for big parallel
runs thanks to its more efficient parallelization scheme, but not by much on single-threaded gene-tree runs.
Another subtle ExaML/RAxML difference I'm aware of is that ExaML doesn't perform some final refinements (model parameter
optimizations, switch from CAT to GAMMA). This might lead to some performance differences in certain cases, but it still
doesn't explain a 10x difference on average...

Could you perhaps send me the exact RAxML/ExaML command lines you were using, and maybe one example of gene alignment
where you observed 10x runtime difference?

> The problem we are running into is that we use GNU parallel to load balance the 700,000 individual calls across a set of
> compute nodes on our cluster. However, the way the MPI library interacts with multiple examl invocations on the same
> node causes performance degradation (all the invocations get pinned to a single core). I've been able to compile my own
> MPI library and circumvent this problem, but it would be more portable and maintainable if we could compile ExaML
> without any MPI support, since we are only using it with a single task anyway.

Hm, could it be that you're running into the same problem with RAxML, i.e. all threads are pinned to the same core?

> Is this an easy modification that could be made using a compiler directive, like -DDISABLE_MPI?

Not really, and I assume it wouldn't be a very useful feature in general, since ExaML was never meant to be used in
sequential mode. So we had better try to figure out the reasons for the poor RAxML performance in your case.

Cheers,
Alexey

Mark Howison

Aug 14, 2016, 9:09:22 AM
to ra...@googlegroups.com, Casey Dunn, Felipe Zapata
Dear Alexey,

Thanks for your offer to look at this. It will take a few days for me to pull together the examples that show the 10X speed-up for ExaML (I'm out of the office right now). RAxML was running correctly (utilizing all cores on the node). However, now that I'm thinking about it, I'm wondering if we are using the AVX version of ExaML, but only the SSE3 version of RAxML. Maybe that could explain the performance difference?

Best
Mark



--
You received this message because you are subscribed to a topic in the Google Groups "raxml" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/raxml/pTxSBrcxXKE/unsubscribe.
To unsubscribe from this group and all its topics, send an email to raxml+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Alexandros Stamatakis

Aug 15, 2016, 2:49:33 AM
to ra...@googlegroups.com
Hi Mark,

SSE3 versus AVX would not explain a 10x speedup. Could you let us know
how large the alignments are (#taxa, #sites, #site patterns) and how
exactly you are invoking the two codes?
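
For reference, the three numbers asked for above can be computed directly from an aligned file. This is a minimal Python sketch, assuming the alignment is already parsed into a dict mapping taxon name to aligned sequence (`alignment_stats` is a hypothetical helper). A site pattern is a distinct alignment column, so identical columns are counted once. The count RAxML/ExaML report may differ, for example if they compress patterns per partition or treat gap and ambiguity characters specially.

```python
def alignment_stats(alignment):
    """Return (#taxa, #sites, #distinct site patterns) for a dict name -> aligned sequence.

    Sketch only (hypothetical helper): patterns are exact column matches, with no
    special handling of gaps, ambiguity codes, case, or partitions.
    """
    seqs = list(alignment.values())
    if not seqs:
        return 0, 0, 0
    n_sites = len(seqs[0])
    if any(len(s) != n_sites for s in seqs):
        raise ValueError("sequences are not all the same length")
    # zip(*seqs) yields column i as a tuple of the characters at position i in every taxon.
    n_patterns = len(set(zip(*seqs)))
    return len(seqs), n_sites, n_patterns
```

For example, `alignment_stats({"a": "ACGTA", "b": "ACGTA", "c": "AAGTA"})` returns `(3, 5, 4)`: the first and last columns are identical, so five sites collapse into four patterns.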

Alexis


--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson

www.exelixis-lab.org