RAxML NG issues - 'MSA::remove_sites error' && runtime v0.2 vs. v0.4

moos

Jul 17, 2017, 10:32:42 AM

to raxml

Dear all,

First of all, a big thank you for your continuous efforts to improve RAxML! I am currently exploring the efficiency of RAxML NG and have a few queries that can hopefully be resolved easily.

My dataset consists of 700 alignments, each ~10K bp and up to 46 species. I aim to infer ML gene trees for each of these 10K windows and initially used RAxML NG v0.2 (MPI version) on our local supercomputer (raxml-ng-mpi --msa path/to/aln --model GTR+G --prefix path/to/out). Overall this worked very well, except for ~5% of the alignments, where no 'bestTree' is generated and the following error is raised instead:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

raxml-ng-mpi: /sw/apps/bioinfo/RAxML-NG/0.2.0b-mpi/src/src/MSA.cpp:264: void MSA::remove_sites(const std::vector<long unsigned int>&): Assertion `pos == new_length' failed.

[m84:20589] *** Process received signal ***

[m84:20589] Signal: Aborted (6)

[m84:20589] Signal code: (-6)

[m84:20589] [ 0] /lib64/libpthread.so.0[0x300140f7e0]

[m84:20589] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x3000832495]

[m84:20589] [ 2] /lib64/libc.so.6(abort+0x175)[0x3000833c75]

[m84:20589] [ 3] /lib64/libc.so.6[0x300082b60e]

[m84:20589] [ 4] /lib64/libc.so.6(__assert_perror_fail+0x0)[0x300082b6d0]

[m84:20589] [ 5] raxml-ng-mpi(_ZN3MSA12remove_sitesERKSt6vectorImSaImEE+0x230)[0x4bf288]

[m84:20589] [ 6] raxml-ng-mpi(_Z9check_msaR13RaxmlInstance+0x512)[0x49c7b3]

[m84:20589] [ 7] raxml-ng-mpi(_Z8load_msaR13RaxmlInstance+0x2ca)[0x49d245]

[m84:20589] [ 8] raxml-ng-mpi(_Z11master_mainR13RaxmlInstanceR17CheckpointManager+0x41)[0x49ffa3]

[m84:20589] [ 9] raxml-ng-mpi(main+0x33a)[0x4a0510]

[m84:20589] [10] /lib64/libc.so.6(__libc_start_main+0xfd)[0x300081ed1d]

[m84:20589] [11] raxml-ng-mpi[0x48bfa9]

[m84:20589] *** End of error message ***

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I visually inspected the 35 alignments for which this error is generated, but at first glance I couldn't identify any obvious differences between these specific alignments and the alignments for which an ML gene tree was inferred. I then downloaded the most recent precompiled version of RAxML NG (v0.4; Linux) and ran the same alignments on both my local workstation and the cluster.

The above error was no longer generated for the same alignments and an ML gene tree was successfully inferred. However, the runtimes increased dramatically (both for the alignments that ran successfully with v0.2 and for those that did not). Whereas version 0.2 inferred the ML gene trees in approximately 40-60 seconds, version 0.4 took well over an hour to infer the same gene trees. To my surprise, even running the old MPI version (i.e. 0.2) in sequential mode was still much faster than running the precompiled Linux version (i.e. 0.4) using 12 threads. For example:
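A visual check can be complemented with a small script. The following is only a minimal sketch, assuming the alignments are stored as FASTA (the thread does not state the format) and treating `-`, `?` and `N` as "undetermined" characters, which is also an assumption. The function names are hypothetical. It reports the number of taxa, the alignment length, and how many columns and sequences consist only of undetermined characters, which makes it easier to compare failing and working alignments.

```python
# Assumption: these characters count as "undetermined" in a DNA alignment.
GAP_CHARS = set("-?Nn")


def read_fasta(text):
    """Parse FASTA text into a dict of name -> sequence.

    Note: a repeated sequence name overwrites the earlier entry.
    """
    parts = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def summarize_alignment(seqs):
    """Return taxon count, site count, and all-gap columns/sequences."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError(f"expected one sequence length, got: {sorted(lengths)}")
    n_sites = lengths.pop()
    # zip(*...) walks the alignment column by column.
    all_gap_columns = sum(
        1 for column in zip(*seqs.values()) if set(column) <= GAP_CHARS
    )
    all_gap_taxa = [n for n, s in seqs.items() if set(s) <= GAP_CHARS]
    return {
        "taxa": len(seqs),
        "sites": n_sites,
        "all_gap_columns": all_gap_columns,
        "all_gap_taxa": all_gap_taxa,
    }


if __name__ == "__main__":
    demo = ">a\nAC-T\n>b\nAG-N\n>c\n----\n"
    print(summarize_alignment(read_fasta(demo)))
```

Running this over both the failing and the working alignments and comparing the summaries may show whether the failures correlate with all-gap columns or sequences; it does not by itself explain the assertion.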

MPI version v0.2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Analysis options:

run mode: ML tree search

start tree(s): random

random seed: 1498862490

tip-inner: ON

pattern compression: ON

fast spr radius: AUTO

spr subtree cutoff: 1.000000

branch lengths: ML estimate (linked)

SIMD kernels: AVX

parallelization: NONE/sequential

......

[00:00:00] Generating random starting tree(s) with 42 taxa

[00:00:00] Data distribution: partitions/thread: 1-1, patterns/thread: 2166-2166

Starting ML tree search with 1 distinct starting trees

[00:00:00 -56939.015638] Initial branch length optimization

[00:00:00 -40390.007126] Model parameter optimization (eps = 10.000000)

[00:00:03 -35608.144783] AUTODETECT spr round 1 (radius: 5)

[00:00:04 -28729.196633] AUTODETECT spr round 2 (radius: 10)

[00:00:06 -27848.241373] AUTODETECT spr round 3 (radius: 15)

[00:00:06 -27848.239491] SPR radius for FAST iterations: 10 (autodetect)

[00:00:06 -27848.239491] Model parameter optimization (eps = 3.000000)

[00:00:08 -27824.795209] FAST spr round 1 (radius: 10)

[00:00:10 -26865.434938] FAST spr round 2 (radius: 10)

[00:00:12 -26850.920949] FAST spr round 3 (radius: 10)

[00:00:14 -26850.833057] Model parameter optimization (eps = 1.000000)

[00:00:14 -26850.368612] SLOW spr round 1 (radius: 5)

[00:00:20 -26849.122231] SLOW spr round 2 (radius: 5)

[00:00:25 -26849.122230] SLOW spr round 3 (radius: 10)

[00:00:30 -26849.122230] SLOW spr round 4 (radius: 15)

[00:00:32 -26849.122230] SLOW spr round 5 (radius: 20)

[00:00:32 -26849.122230] SLOW spr round 6 (radius: 25)

[00:00:33 -26849.122230] Model parameter optimization (eps = 0.100000)

[00:00:33] ML tree search #1, logLikelihood: -26849.108258

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

vs.

Pre-compiled Linux version v0.4
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Analysis options:

run mode: ML tree search

start tree(s): random

random seed: 1500285992

tip-inner: ON

pattern compression: ON

fast spr radius: AUTO

spr subtree cutoff: 1.000000

branch lengths: ML estimate (linked)

SIMD kernels: AVX2

parallelization: PTHREADS (12 threads)

......

[00:00:00] Generating random starting tree(s) with 41 taxa

[00:00:00] Data distribution: partitions/thread: 1-1, patterns/thread: 235-236

Starting ML tree search with 1 distinct starting trees

[00:00:50 -58409.934083] Initial branch length optimization

[00:00:50 -41679.274181] Model parameter optimization (eps = 10.000000)

[00:08:34 -36834.857750] AUTODETECT spr round 1 (radius: 5)

[00:38:30 -32027.542415] AUTODETECT spr round 2 (radius: 10)

[00:38:33 -28594.635624] AUTODETECT spr round 3 (radius: 15)

[00:38:33 -28594.634607] SPR radius for FAST iterations: 10 (autodetect)

[00:38:33 -28594.634607] Model parameter optimization (eps = 3.000000)

[00:38:37 -28575.091169] FAST spr round 1 (radius: 10)

[00:38:38 -27888.093417] FAST spr round 2 (radius: 10)

[00:58:30 -27671.034789] FAST spr round 3 (radius: 10)

[01:08:32 -27671.034788] Model parameter optimization (eps = 1.000000)

[01:10:32 -27668.123127] SLOW spr round 1 (radius: 5)

[01:38:31 -27668.122992] SLOW spr round 2 (radius: 10)

[01:48:32 -27668.122992] SLOW spr round 3 (radius: 15)

[02:08:34 -27668.122992] SLOW spr round 4 (radius: 20)

[02:08:34 -27668.122992] SLOW spr round 5 (radius: 25)

[02:08:34 -27668.122992] Model parameter optimization (eps = 0.100000)

[02:10:30] ML tree search #1, logLikelihood: -27668.121497

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Given that this is just an exploratory dataset and my final dataset will contain many more alignments, I would kindly like to ask whether you could provide some pointers on A) how best to deal with the initial 'MSA::remove_sites' error when using the older, but faster, version and/or B) what might cause the increase in computation time between the two analyses.

Please let me know if any further information is required and once more, many thanks for your help; it's much appreciated!!

Best,

Moos

Alexey Kozlov

Jul 17, 2017, 12:58:48 PM

to ra...@googlegroups.com

Dear Moos,

Many thanks for your feedback!

Could you please send me the alignment(s) in question so that I can test them on my side?

In general:

- In your comparison below, the v0.2 log shows 42 taxa and the v0.4 log shows 41 taxa. Are you sure that the alignment is the same in both runs?

- Please always use the most recent version (v0.4 as of now). Even if the older version appears to be "faster", this is most probably due to a bug/artifact.

- What happens if you run v0.4 with just 1 thread? In a cluster environment, configuring pthreads or hybrid MPI/pthreads might be tricky. Please consult your cluster documentation and make sure that all 12 threads are not running on the same CPU core -- this is a common cause of a major slowdown like the one you observe.
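One quick way to test for the last point is to ask the operating system which CPU cores the job is allowed to use. This is a minimal sketch, assuming a Linux node and that it is run from inside the same job script that launches raxml-ng (a child process normally inherits the affinity mask of its parent). The function names are hypothetical. It only detects the case where the whole job is restricted to fewer cores than the requested thread count; it does not show where individual threads end up.

```python
import os


def allowed_cores():
    """Return the sorted CPU cores this process may run on.

    os.sched_getaffinity exists on Linux only; elsewhere we fall back to
    assuming that all cores are available.
    """
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return list(range(os.cpu_count() or 1))


def pinning_warning(n_threads, cores):
    """Return a warning string if more threads are requested than cores allowed."""
    if n_threads > len(cores):
        return (
            f"{n_threads} threads requested but only {len(cores)} core(s) "
            f"allowed ({cores}); threads would have to share cores"
        )
    return None


if __name__ == "__main__":
    cores = allowed_cores()
    print("allowed cores:", cores)
    print(pinning_warning(12, cores) or "thread count fits the allowed cores")
```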

Best,
Alexey


moos

Jul 24, 2017, 8:48:07 AM

to raxml

Dear Alexey,

Many thanks for your prompt reply!

Using v0.4 (rather than v0.2) resolved most of the problem! It reduced the number of alignments that did not run successfully (i.e. the MSA::remove_sites error) to four. On visual inspection, these alignments contained a number of gaps: insertions that represent misalignments. After I removed these alignment errors, they ran fine.

Following your advice, I ran v0.4 with a single thread rather than multiple threads (even though I had previously allocated multiple cores on a single cluster node), and this greatly improved run times. It therefore seems that I have not compiled the program correctly for our cluster environment? For example, assuming I allocate 6 out of 12 cores on a cluster node, can I not simply specify '--threads 6' to improve run times?

Many thanks for your help and development of RAxML ng, it is much appreciated!!

Best,

Moos

Alexey Kozlov

Jul 24, 2017, 12:06:16 PM

to ra...@googlegroups.com

Dear Moos,

> Using v0.4 (rather than v0.2) resolved most of the problem! It reduced the number of alignments that did not run
> successfully (i.e. the MSA::remove_sites error) to four. On visual inspection, these alignments contained a number of
> gaps: insertions that represent misalignments. After I removed these alignment errors, they ran fine.

Great! Could you please nevertheless send me those problematic alignments (before correction) to my personal e-mail?
Even with misalignments, RAxML-NG should handle them properly and/or print a sensible error message.

> Following your advice, I ran v0.4 with a single thread rather than multiple threads (even though I had previously
> allocated multiple cores on a single cluster node), and this greatly improved run times. It therefore seems that I have
> not compiled the program correctly for our cluster environment? For example, assuming I allocate 6 out of 12 cores on a
> cluster node, can I not simply specify '--threads 6' to improve run times?

Well, as I wrote before, there might be problems with thread pinning. You might need to specify in your submission script
that you're running a pthreads/OpenMP program (this really depends on your cluster configuration). Another option would
be to compile the MPI version of RAxML-NG on your cluster.

On the other hand, if all of your alignments are that small (~40 taxa x ~2000 sites), there's not much sense in using
the parallel version anyway. So you can just allocate 1 CPU core and run raxml-ng with "--threads 1".
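With many small alignments, one way to still use several allocated cores is to run several single-threaded searches side by side, one alignment per core. This is a minimal sketch under stated assumptions: the binary is called `raxml-ng` and is on the PATH, only the options already mentioned in this thread are used (`--msa`, `--model`, `--prefix`, `--threads`), and the helper names, directory layout and `*.fasta` file pattern are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(msa, out_dir, binary="raxml-ng", model="GTR+G"):
    """Build the command line for one single-threaded tree search."""
    msa = Path(msa)
    prefix = Path(out_dir) / msa.stem  # output files are named after the alignment
    return [
        binary,
        "--msa", str(msa),
        "--model", model,
        "--prefix", str(prefix),
        "--threads", "1",
    ]


def run_all(msa_files, out_dir, n_jobs):
    """Run one raxml-ng process per alignment, at most n_jobs at a time.

    Returns a dict mapping each alignment path to the process exit code.
    The worker threads only wait on child processes, so a thread pool
    is enough here.
    """
    msa_files = [str(m) for m in msa_files]
    commands = [build_command(m, out_dir) for m in msa_files]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        codes = list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
    return dict(zip(msa_files, codes))


# Hypothetical usage, with 6 cores allocated on the node:
# exit_codes = run_all(sorted(Path("alignments").glob("*.fasta")), "trees", n_jobs=6)
# failed = [msa for msa, code in exit_codes.items() if code != 0]
```

Collecting the exit codes makes it easy to list the alignments that still fail, so they can be inspected or sent on separately.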

> Many thanks for your help and development of RAxML ng, it is much appreciated!!

Thanks for testing!

Best,
Alexey
