Hello Mark,
> We have started using ExaML in place of single-threaded RAxML in the genetree stage of our phylogenomic pipeline Agalma
> (https://bitbucket.org/caseywdunn/agalma). We found that in practice it offers about a 10X speed-up over RAxML for the
> mixture of trees we are estimating (typically on the order of 7,000 genes each with 100 bootstraps = 700,000 individual
> ML searches). Even using full ML searches for bootstraps with ExaML is faster than RAxML in fast bootstrapping mode.
That's pretty weird, since both codes use the same search algorithm. ExaML is expected to be faster for big parallel
runs thanks to its more efficient parallelization scheme, but not by much on single-threaded runs with gene trees.
Another subtle ExaML/RAxML difference I'm aware of is that ExaML doesn't perform some final refinements (model parameter
optimizations, switch from CAT to GAMMA). This might lead to some performance differences in certain cases, but still
doesn't explain 10x on average...
Could you perhaps send me the exact RAxML/ExaML command lines you were using, and maybe one example of gene alignment
where you observed 10x runtime difference?
> The problem we are running into is that we use GNU parallel to load balance the 700,000 individual calls across a set of
> compute nodes on our cluster. However, the way the MPI library interacts with multiple examl invocations on the same
> node causes performance degradation (all the invocations get pinned to a single core). I've been able to compile my own
> MPI library and circumvent this problem, but it would be more portable and maintainable if we could compile ExaML
> without any MPI support, since we are only using it with a single task anyway.
Hm, could it be that you're running into the same problem with RAxML, i.e. all threads are pinned to the same core?
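One quick way to check for pinning is to print the set of cores each job is allowed to run on. Below is a minimal sketch, assuming a Linux node with Python available; the helper name allowed_cpus is made up for illustration and is not part of RAxML, ExaML, or GNU parallel.

```python
import os


def allowed_cpus(pid=0):
    """Return the sorted CPU ids the process may run on (pid 0 = this process).

    Returns None where the affinity query is unavailable (e.g. non-Linux systems).
    """
    if not hasattr(os, "sched_getaffinity"):
        return None
    return sorted(os.sched_getaffinity(pid))


if __name__ == "__main__":
    cpus = allowed_cpus()
    if cpus is None:
        print("CPU affinity query is not supported on this platform")
    else:
        print(f"pid {os.getpid()} may run on {len(cpus)} core(s): {cpus}")
```

Launching this script the same way as the RAxML or ExaML jobs (in each GNU parallel slot, with and without the MPI launcher) should show whether all invocations report the same single core.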
> Is this an easy modification that could be made using a compiler directive, like -DDISABLE_MPI?
Not really, and I assume it wouldn't be a very useful feature in general, since ExaML was never meant to be used in
sequential mode. So we'd better try to figure out the reasons for the poor RAxML performance in your case.
Cheers,
Alexey