RAxML-NG vs. 8.2.11


Sergios-Orestis Kolokotronis


Dec 20, 2017, 6:07:11 PM

to raxml

A comparison using a few threads on a Mac. I saw in a discussion thread a note from the developers asking for updates on comparisons, so here's one. I noticed the searches using NG were taking a bit longer than I would have expected and, given the warning on thread usage pasted below, I thought I'd look into it. Of course, this is merely an isolated case. Maybe it has to do with the phylogenetic informativeness of the dataset. Trees estimated with v8.2.11 all exhibited low support for deep and mid-level nodes.

Dataset: AA, 27 taxa, 251 positions

Gaps: 2.24 %

Invariant sites: 24.70 %

ML search with 20 MP start trees

Alignment patterns: 199

Model: LG+Γ4+F

Run in NG 0.5.1b

run mode: ML tree search

start tree(s): parsimony (20)

random seed: 1513791424

tip-inner: ON

pattern compression: ON

per-rate scalers: OFF

site repeats: OFF

fast spr radius: AUTO

spr subtree cutoff: 1.000000

branch lengths: ML estimate (linked)

SIMD kernels: SSE3

parallelization: NONE/sequential

raxmlng051b --msa myALN.fasta --model LG+G4+F --tree pars{20} --threads 1

then --threads 2, and so on.

When I try running 4 threads in NG, it exits with a "Too few patterns per thread" error.

Data distribution: partitions/thread: 1-1, patterns/thread: 49-50

WARNING: You are using too many threads (4) for your alignment with 199 unique patterns.

NOTE: Please consider using 1 threads ('--threads 1' option) for the optimal performance.

NOTE: As a general rule-of-thumb, please assign at least 200-1000 alignment patterns per thread.

ERROR: Too few patterns per thread! RAxML-NG will terminate now to avoid wasting resources.

NOTE: Please reduce the number of threads (see guidelines above).

NOTE: This check can be disabled with the '--force' option.

I forced 4 and 10 threads and it worked, yielding a time improvement.

So, how are more threads negatively impacting the overall search duration here? Please let me know if I'm missing something.

Run in v8.2.11 SSE3 Pthreads

raxml8211 -T 1 -s myALN.fasta -m PROTGAMMALGF -p12345 -N 20 -n myALN.rx8.T1

then -T 2 and so on.

Run	Alpha	Final Log-Lik	Time per tree search (min-max, s)	Total time (s)
NG T=1	1.336438	-3043.663069	46-59	1048.327
NG T=2	1.333151	-3043.663041	29-40	701.46
NG T=3	1.334465	-3043.663039	26-37	598.236
NG T=4	1.334411	-3043.663057	21-30	537.825
NG T=10	1.333093	-3043.663067	18-35	541.841
R8 T=1	1.335617	-3043.662875	14.005111-14.680397	293.123228
R8 T=2	1.335617	-3043.662875	13.983387-14.625870	292.992224
R8 T=3	1.335617	-3043.662875	9.93641-10.775842	207.280066
R8 T=4	1.335617	-3043.662875	7.69699-8.064719	160.570449
R8 T=10	1.335617	-3043.662875	4.371018-9.155381	169.905267

NG: RAxML-NG v0.5.1b

R8: RAxML v8.2.11 SSE3 Pthreads
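
To make the scaling easier to read off, here is a minimal Python sketch that turns the total times in the table above into speedup and parallel efficiency relative to each program's own T=1 run. The function name `speedup_and_efficiency` is made up for illustration, and each "T=1" row is taken at face value as the baseline, which may not correspond to a truly single-threaded run.

```python
# Total wall-clock times (s) copied from the table above, keyed by thread count.
total_time = {
    "NG": {1: 1048.327, 2: 701.46, 3: 598.236, 4: 537.825, 10: 541.841},
    "R8": {1: 293.123228, 2: 292.992224, 3: 207.280066,
           4: 160.570449, 10: 169.905267},
}


def speedup_and_efficiency(times):
    """Return {threads: (speedup, efficiency)} relative to the T=1 entry.

    speedup    = time(T=1) / time(T=n)
    efficiency = speedup / n
    """
    base = times[1]
    return {t: (base / s, base / s / t) for t, s in sorted(times.items())}


for program, times in total_time.items():
    for threads, (speedup, eff) in speedup_and_efficiency(times).items():
        print(f"{program} T={threads:<2d} speedup={speedup:.2f}x "
              f"efficiency={eff:.0%}")
```

Speedup is only a relative measure: it says nothing about which program is faster in absolute terms, so it should be read together with the total times.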

Alexey Kozlov


Dec 20, 2017, 8:07:30 PM

to ra...@googlegroups.com

Hi again,

thanks for this evaluation! Do you mind sending your alignment and output files to my e-mail?

In general, RAxML-NG could be slower than RAxML for some specific datasets and settings, although those are really rare
individual cases. I observed it just a couple of times among dozens of diverse datasets I tested. In particular, it can
happen on very small alignments like the one you've tested, since improved flexibility of NG comes at a (small) constant
cost.

With respect to your results, I have several comments:

- we put limited effort into SSE3 optimization in NG, since nowadays there are rather few machines without AVX support,
and their number will only decrease with time (sorry)

- raxml8-pthreads will always use *at least* 2 threads, even if you specify "-T 1"

- regarding the optimal number of threads: unfortunately, it is very difficult to estimate it accurately given different
data types and models, partitioning, CPU architectures, RAM latency/bandwidth etc. etc. In your case, even
oversubscribing physical CPU cores and using as few as 20 AA sites/thread doesn't seem to result in performance
degradation; however, in many other cases it does, and sometimes very badly. So I added this error/warning to prevent
the wasting of resources.
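
The patterns-per-thread arithmetic behind the warning can be sketched in a few lines of Python. This is only an illustration under two assumptions: that patterns are split as evenly as possible across threads, and that the lower end of the quoted rule of thumb (200 patterns per thread) is used as the threshold. Both helper names are hypothetical and this is not RAxML-NG's actual check.

```python
def patterns_per_thread(n_patterns, n_threads):
    """Return (min, max) patterns per thread, assuming an even split."""
    base, extra = divmod(n_patterns, n_threads)
    # If the split is uneven, some threads get one extra pattern.
    return base, base + (1 if extra else 0)


def max_threads_by_rule(n_patterns, min_patterns_per_thread=200):
    """Largest thread count that keeps at least `min_patterns_per_thread`
    patterns on every thread, never going below 1 thread.
    The default of 200 is the lower end of the quoted rule of thumb."""
    return max(1, n_patterns // min_patterns_per_thread)


n_patterns = 199  # unique alignment patterns reported for this dataset
for threads in (1, 2, 3, 4, 10):
    lo, hi = patterns_per_thread(n_patterns, threads)
    print(f"threads={threads:<2d} patterns/thread: {lo}-{hi}")

print("threads suggested by the 200-pattern rule:",
      max_threads_by_rule(n_patterns))
```

For 199 patterns this gives 49-50 patterns per thread at 4 threads, which matches the "Data distribution" line in the log, and about 20 per thread at 10 threads.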

There is also another aspect of it: we might want to optimize time-to-solution (your example), or we might want to
optimize throughput/efficiency. According to your results, using 4 threads yields ~2x speedup, so parallel efficiency is
only 50%. Now imagine you want to analyze multiple datasets, or you want to run multiple tree searches/bootstraps in
parallel. Obviously, in this scenario it will be more efficient (and faster!) to use just a single thread for each search.
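
As a rough illustration of the throughput argument, the Python sketch below compares two ways of running several independent analyses on a 4-core machine, using the NG total times reported in the original post (1048.327 s with 1 thread, 537.825 s with 4 threads). It assumes that concurrent single-threaded runs do not slow each other down (no contention for memory bandwidth or cache), which is an idealization, and the function names are hypothetical.

```python
import math


def makespan_one_at_a_time(n_jobs, time_with_all_cores):
    """Run jobs one after another, each using every core."""
    return n_jobs * time_with_all_cores


def makespan_single_threaded(n_jobs, n_cores, time_single_thread):
    """Run up to n_cores single-threaded jobs concurrently, in waves.
    Assumes concurrent jobs do not interfere with each other."""
    waves = math.ceil(n_jobs / n_cores)
    return waves * time_single_thread


# NG total times (s) from the original post.
T1, T4 = 1048.327, 537.825
n_jobs, n_cores = 4, 4

multi = makespan_one_at_a_time(n_jobs, T4)
single = makespan_single_threaded(n_jobs, n_cores, T1)
print(f"4 threads per job, one job at a time: {multi:.1f} s")
print(f"1 thread per job, 4 jobs at once:     {single:.1f} s")
print(f"ratio: {multi / single:.2f}x")
```

Under these assumptions the single-threaded layout finishes the batch in roughly half the time, which is just the 50% parallel efficiency seen from the other side.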

Hope this helps!

Thanks,
Alexey

