RAxML-NG vs. 8.2.11

Sergios-Orestis Kolokotronis

Dec 20, 2017, 6:07:11 PM
to raxml
A comparison using a few threads on a Mac. I saw in a discussion thread a note from the developers asking for updates on comparisons, so here's one. I noticed the searches using NG were taking a bit longer than I would have expected and, given the warning on thread usage pasted below, I thought I'd look into it. Of course, this is merely an isolated case. Maybe it has to do with the phylogenetic informativeness of the dataset. Trees estimated with v8.2.11 all exhibited low support for deep and mid-level nodes.

Dataset: AA, 27 taxa, 251 positions
Gaps: 2.24 %
Invariant sites: 24.70 %
ML search with 20 MP start trees
Alignment patterns: 199
Model: LG+Γ4+F

Run in NG 0.5.1b
  run mode: ML tree search
  start tree(s): parsimony (20)
  random seed: 1513791424
  tip-inner: ON
  pattern compression: ON
  per-rate scalers: OFF
  site repeats: OFF
  fast spr radius: AUTO
  spr subtree cutoff: 1.000000
  branch lengths: ML estimate (linked)
  SIMD kernels: SSE3
  parallelization: NONE/sequential

raxmlng051b --msa myALN.fasta --model LG+G4+F --tree pars{20} --threads 1
then --threads 2, and so on.

When I try running 4 threads in NG, it exits with a "Too few patterns per thread" error.
Data distribution: partitions/thread: 1-1, patterns/thread: 49-50
WARNING: You are using too many threads (4) for your alignment with 199 unique patterns.
NOTE: Please consider using 1 threads ('--threads 1' option) for the optimal performance.
NOTE: As a general rule-of-thumb, please assign at least 200-1000 alignment patterns per thread.
ERROR: Too few patterns per thread! RAxML-NG will terminate now to avoid wasting resources.
NOTE: Please reduce the number of threads (see guidelines above).
NOTE: This check can be disabled with the '--force' option.
I forced 4 and 10 threads and it worked, yielding a time improvement.
So, how are more threads negatively impacting the overall search duration here? Please let me know if I'm missing something.
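
The "patterns/thread: 49-50" line and the rule-of-thumb quoted in the log above can be reproduced with simple arithmetic. The Python sketch below is only an illustration: the function names are hypothetical, and the plain floor-division heuristic (using the lower bound, 200, of the quoted 200-1000 guideline) is an assumption, not RAxML-NG's actual check.

```python
from typing import Tuple


def patterns_per_thread(n_patterns: int, n_threads: int) -> Tuple[int, int]:
    """Return (min, max) patterns per thread for an even split of the patterns."""
    base, extra = divmod(n_patterns, n_threads)
    # If the split is uneven, some threads receive one extra pattern.
    return base, base + (1 if extra else 0)


def max_threads_by_rule_of_thumb(n_patterns: int, min_per_thread: int = 200) -> int:
    """Largest thread count that keeps >= min_per_thread patterns per thread.

    Assumption: a simple floor division, never going below one thread.
    """
    return max(1, n_patterns // min_per_thread)


if __name__ == "__main__":
    n_patterns = 199  # unique alignment patterns reported for this dataset
    for threads in (1, 2, 3, 4, 10):
        lo, hi = patterns_per_thread(n_patterns, threads)
        print(f"--threads {threads}: {lo}-{hi} patterns/thread")
    print("suggested threads:", max_threads_by_rule_of_thumb(n_patterns))
```

With 199 patterns, even two threads fall below 200 patterns per thread, which is consistent with the log suggesting '--threads 1'.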

Run in v8.2.11 SSE3 Pthreads
raxml8211 -T 1 -s myALN.fasta -m PROTGAMMALGF -p12345 -N 20 -n myALN.rx8.T1
then -T 2 and so on.

 

Run      Alpha     Final Log-Lik  Time per tree search (min-max, s)  Total time (s)
NG T=1   1.336438  -3043.663069   46-59                              1048.327
NG T=2   1.333151  -3043.663041   29-40                              701.46
NG T=3   1.334465  -3043.663039   26-37                              598.236
NG T=4   1.334411  -3043.663057   21-30                              537.825
NG T=10  1.333093  -3043.663067   18-35                              541.841
R8 T=1   1.335617  -3043.662875   14.005111-14.680397                293.123228
R8 T=2   1.335617  -3043.662875   13.983387-14.625870                292.992224
R8 T=3   1.335617  -3043.662875   9.93641-10.775842                  207.280066
R8 T=4   1.335617  -3043.662875   7.69699-8.064719                   160.570449
R8 T=10  1.335617  -3043.662875   4.371018-9.155381                  169.905267


NG: RAxML-NG v0.5.1b
R8: RAxML v8.2.11 SSE3 Pthreads
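
To make the timings easier to compare, the total times reported above can be converted into speedups relative to each program's own T=1 run. This is a minimal Python sketch; the dictionary layout and the function name are chosen here for illustration, and the numbers are copied from the "Total time (s)" values in this post.

```python
# Total wall-clock times in seconds, keyed by program and thread count.
TOTAL_TIME_S = {
    "NG": {1: 1048.327, 2: 701.46, 3: 598.236, 4: 537.825, 10: 541.841},
    "R8": {1: 293.123228, 2: 292.992224, 3: 207.280066,
           4: 160.570449, 10: 169.905267},
}


def speedups(times_by_threads):
    """Speedup of each run relative to the run labelled T=1."""
    baseline = times_by_threads[1]
    return {t: baseline / seconds for t, seconds in times_by_threads.items()}


if __name__ == "__main__":
    for program, times in TOTAL_TIME_S.items():
        for threads, factor in sorted(speedups(times).items()):
            print(f"{program} T={threads}: {factor:.2f}x vs. T=1")
```

Both programs show diminishing returns beyond 4 threads on this dataset: the T=10 totals are slightly higher than the T=4 totals.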

Alexey Kozlov

Dec 20, 2017, 8:07:30 PM
to ra...@googlegroups.com
Hi again,

thanks for this evaluation! Do you mind sending your alignment and output files to my e-mail?

In general, RAxML-NG can be slower than RAxML for some specific datasets and settings, although those are really rare
individual cases. I observed it just a couple of times among dozens of diverse datasets I tested. In particular, it can
happen on very small alignments like the one you've tested, since the improved flexibility of NG comes at a (small)
constant cost.

With respect to your results, I have several comments:

- we put limited effort into SSE3 optimization in NG, since nowadays there are rather few machines without AVX support,
and their number will only decrease with time (sorry)

- raxml8-pthreads will always use *at least* 2 threads, even if you specify "-T 1"

- regarding the optimal number of threads: unfortunately, it is very difficult to estimate it accurately given different
data types and models, partitioning, CPU architectures, RAM latency/bandwidth etc. etc. In your case, even
oversubscribing physical CPU cores and using as few as 20 AA sites/thread doesn't seem to result in performance
degradation; however, in many other cases it does, and sometimes very badly. So I added this error/warning to prevent
the wasting of resources.

There is also another aspect of it: we might want to optimize time-to-solution (your example), or we might want to
optimize throughput/efficiency. According to your results, using 4 threads yields ~2x speedup, so parallel efficiency is
only 50%. Now imagine you want to analyze multiple datasets, or you want to run multiple tree searches/bootstraps in
parallel. Obviously, in this scenario it will be more efficient (and faster!) to use just a single thread for each search.
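
The efficiency and throughput argument above can be checked against the reported NG totals. The Python sketch below is a back-of-the-envelope estimate only: the function names are hypothetical, and it assumes that independent single-threaded searches running side by side do not slow each other down (no contention for memory bandwidth or cache) and that every search takes the mean single-thread time.

```python
import math


def parallel_efficiency(t_single_s, t_parallel_s, n_threads):
    """Speedup divided by the number of threads used."""
    return (t_single_s / t_parallel_s) / n_threads


def concurrent_wall_time(t_single_total_s, n_searches, n_cores):
    """Estimated wall time when single-threaded searches run n_cores at a time.

    Assumption: each search takes the mean single-thread time and concurrent
    searches do not interfere with each other.
    """
    mean_search_s = t_single_total_s / n_searches
    rounds = math.ceil(n_searches / n_cores)
    return rounds * mean_search_s


if __name__ == "__main__":
    ng_t1_total_s = 1048.327  # NG, 20 searches, --threads 1
    ng_t4_total_s = 537.825   # NG, 20 searches, --threads 4 (forced)

    eff = parallel_efficiency(ng_t1_total_s, ng_t4_total_s, 4)
    est = concurrent_wall_time(ng_t1_total_s, n_searches=20, n_cores=4)
    print(f"efficiency with 4 threads: {eff:.0%}")
    print(f"20 searches, one at a time with 4 threads: {ng_t4_total_s:.0f} s")
    print(f"20 searches, 4 single-threaded at a time:  {est:.0f} s (estimate)")
```

Under these assumptions, four concurrent single-threaded searches would finish the 20 searches in roughly half the time of one 4-thread search at a time, which is the throughput point made above.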

Hope this helps!

Thanks,
Alexey
