GTRCAT in RAxML-NG

Jeff Bowman

Jul 1, 2020, 11:53:26 AM
to raxml

I'm adapting some old RAxML wrapper scripts for raxml-ng.  Previously I was building large (~8000 taxa) trees as:
raxmlHPC-PTHREADS-AVX2 -T 6 -m GTRGAMMA -s test.fasta -n ref.tre -f d -p 12345

The input alignment has grown too large for GTRGAMMA (nearing 10000 taxa), and raxml-ng is still running after 60 hours (using coarse-grained parallelization for multiple starting trees) when run with --search --tree pars{1} --threads 6.  I used the --parse command to initialize the model as GTR+G.  As previous runs using the GTRGAMMA model under RAxML v8 were much faster, I suspect that I've reached the limit and need to switch to GTRCAT.  Searching the forum, the most recent reference to GTRCAT in raxml-ng is from quite some time ago and indicates it was not available then.  I've been going back and forth between the manuals trying to understand whether one of the model options now available in raxml-ng is equivalent to GTRCAT, but it isn't clear to me.  Any suggestions would be much appreciated.  As an additional constraint, the tree will be used with epa-ng.

A second question related to GTRCAT concerns selecting the best tree.  I understand from the v8 manual that likelihood values can't be used for this, so I hope to use AIC or BIC.  Is there a preferred approach?
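
For reference, the textbook AIC and BIC formulas mentioned in the question can be sketched as follows. This is a minimal illustration, not raxml-ng output: the free-parameter count k and the sample size n (taken here as the number of alignment sites, which is an assumption) must be supplied by the caller, and the numbers in the usage lines are made up.

```python
import math


def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion, 2k - 2 lnL (lower is better)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian information criterion, k ln(n) - 2 lnL (lower is better).

    n is the sample size; using the number of alignment sites is an assumption.
    """
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical values: two trees scored under the same model on the same alignment.
lnl_a, lnl_b, k, n = -1000.0, -990.0, 10, 500
print(aic(lnl_a, k), aic(lnl_b, k))  # 2020.0 2000.0
# With identical k and n the penalty terms cancel, leaving -2 * (lnl_a - lnl_b).
print(bic(lnl_a, k, n) - bic(lnl_b, k, n))
```

Because the penalty terms cancel when k and n are equal, AIC and BIC rank such trees exactly as the raw log-likelihoods do; they only add information when the compared models differ in their number of free parameters.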

Alexey Kozlov

Jul 2, 2020, 7:41:30 AM
to ra...@googlegroups.com
Hi Jeff,

GTRCAT is not yet implemented in RAxML-NG.

Going from 8k to 10k sequences should not result in a major slowdown though.

Would you mind sharing your raxml-ng log file?

Best,
Alexey

Jeff Bowman

Jul 2, 2020, 5:27:50 PM
to raxml
Thanks Alexey.  Unfortunately I deleted all the original files while conducting some further experimentation, but hopefully the output of a new in-progress run will be helpful.  One correction to my previous post: I was using 2 threads, not 6.  As I'm building 24 trees concurrently and have 36 physical cores available, for the current run I'm using just a single thread per tree.  I've attached two files here: one for a random starting tree and one for a parsimony starting tree.  The random-start run in particular is going very slowly, but perhaps that's expected for a single thread.  If I eliminate the coarse-grained parallelization and build a single tree using 6 cores, however, it still takes surprisingly long compared to -f d on raxmlHPC-PTHREADS-AVX2 (in fact I have not seen it complete; I gave up after ~60 hours).  I can reproduce that run as well if it would be helpful.  Any guidance you can provide would be much appreciated!

Jeff
test6.raxml.log
test19.raxml.log
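
The thread arithmetic above can be made explicit with a small helper. split_cores is a hypothetical name used only for illustration (it is not part of RAxML or raxml-ng); it assumes an even split of physical cores across independent searches. With 36 cores and 24 concurrent searches, that gives one thread per search and leaves 12 cores unused.

```python
def split_cores(physical_cores: int, concurrent_searches: int) -> tuple[int, int]:
    """Hypothetical helper: evenly divide cores among independent tree searches.

    Returns (threads per search, cores left idle).
    """
    if concurrent_searches <= 0:
        raise ValueError("need at least one search")
    threads = physical_cores // concurrent_searches
    if threads == 0:
        raise ValueError("more concurrent searches than physical cores")
    idle = physical_cores - threads * concurrent_searches
    return threads, idle


print(split_cores(36, 24))  # (1, 12): the setup described above
print(split_cores(36, 6))   # (6, 0): fewer searches, more threads each
```

The trade-off is that more concurrent searches generally means fewer threads per search, so each individual search takes longer to finish.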

Alexey Kozlov

Jul 3, 2020, 12:38:27 PM
to ra...@googlegroups.com
Hi Jeff,

ok so let's compare apples to apples :)

Do you observe that

raxml-ng --model GTR+G --search --tree pars{1} --threads 6

is much slower than:

raxmlHPC-PTHREADS-AVX2 -T 6 -m GTRGAMMA -f d

If yes, this is something worth investigating, and I would like to see your inputs and outputs.

Otherwise, it is expected that using 1 thread and/or a random starting tree will make the tree search slower.

Also please note that your alignment has a shape typical of difficult datasets (many taxa, few sites, lots of gaps and
invariant sites), on which standard ML inference tools will struggle to find the best tree due to low
phylogenetic signal and, as a result, what we call a "rough likelihood surface". We discussed
this topic recently in this group in the context of Coronavirus analyses.

Best,
Alexey

Jeff Bowman

Jul 6, 2020, 11:18:53 PM
to raxml
Alexey,
Okay, so I re-ran both for a fair comparison:

raxml-ng --redo --search --msa test.raxml.rba --tree pars{1} --prefix raxml-ng --seed 1 --threads 6

vs.

raxmlHPC-PTHREADS-AVX2 -T 6 -m GTRGAMMA -s test.fasta -n classic.ref.tre -f d -p 12345

Classic raxml finished in about 8 hours, while raxml-ng finished in about 30 hours.  Let me know if I've missed something, and happy to provide the analysis files if they can help diagnose (too large for email).  Thanks also for the thoughts on insufficient phylogenetic signal.  This is definitely a concern and I'm way overdue on implementing a concatenated alignment, which I hope will solve or at least minimize that particular issue.

Cheers,
Jeff
raxml-ng.raxml.log

Alexey Kozlov

Jul 7, 2020, 6:54:30 PM
to ra...@googlegroups.com
Hi Jeff,

that's interesting. What about the final likelihood values: did raxml-ng at least find a better-scoring
tree?

And yes, I'd need RAxML8 log file and your input files for further investigation.

Best,
Alexey

Jeff Bowman

Jul 7, 2020, 7:19:25 PM
to raxml
Alex,
Yes, it did return a higher maximum likelihood score (-1069878.284601 for classic vs. -1067948.995690 for raxml-ng).  I've attached the RAxML info file, and also the alignment as a gzipped fasta file.  Many thanks for taking a look.

Jeff

Jeff Bowman

Jul 7, 2020, 7:20:21 PM
to raxml
Neglected to attach...
test.align.fasta.gz
raxml-ng.raxml.log

Wayne Pfeiffer

Jul 7, 2020, 9:24:33 PM
to ra...@googlegroups.com
Hi Jeff,

The default treatment of the stationary frequencies is different in RAxML-NG from standard RAxML. To compare the likelihood scores, you need to use the same treatment.

  Stationary frequencies   standard RAxML       RAxML-NG
  empirical                GTRGAMMA (default)   GTR+F+G
  ML estimate              GTRGAMMAX            GTR+G (default)

Since standard RAxML is faster, I suggest that you rerun it with GTRGAMMAX.

Best regards,  Wayne
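
Wayne's table above can be encoded as a small lookup, for example when porting v8 wrapper scripts to raxml-ng. This is a sketch under assumptions: it covers only the two GTR+Gamma rows in the table, ng_model_for is a hypothetical helper name, and the command line is only assembled, not run.

```python
# RAxML v8 -m name -> equivalent raxml-ng --model string (GTR + Gamma rows only).
V8_TO_NG = {
    "GTRGAMMA": "GTR+F+G",  # empirical base frequencies (v8 default)
    "GTRGAMMAX": "GTR+G",   # ML-estimated base frequencies (raxml-ng default)
}


def ng_model_for(v8_model: str) -> str:
    """Return the matching raxml-ng model string; raises KeyError if unmapped."""
    return V8_TO_NG[v8_model]


# Assemble (but do not run) a raxml-ng search matching a v8 GTRGAMMA run.
cmd = [
    "raxml-ng", "--search",
    "--msa", "test.fasta",
    "--model", ng_model_for("GTRGAMMA"),
    "--tree", "pars{1}",
    "--seed", "12345",
    "--threads", "6",
]
print(" ".join(cmd))
```

Using the matching frequency treatment on both sides is what makes the final likelihoods of the two programs directly comparable.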

Jeff Bowman

Jul 8, 2020, 9:32:41 AM
to raxml
Okay, I reran with RAxML v8 using GTRGAMMAX.  RAxML-ng still wins (ML -1068519.089387 for the new run).  Any idea why RAxML-ng is 3x slower?

Thanks!
Jeff

Alexey Kozlov

Jul 8, 2020, 9:54:47 AM
to ra...@googlegroups.com
@Wayne: thanks for pointing out the difference in default models!

> Okay, I reran with RAxML v8 using GTRGAMMAX.  RAxML-ng still wins (ML -1068519.089387 for the new
> run).  Any idea why RAxML-ng is 3x slower?

Because the raxml-ng search heuristic has a couple of small modifications that make it a bit more
thorough. Typically, the difference in runtime is not that large, and it is often even compensated
by technical performance improvements. Here we have an extreme case, where raxml-ng apparently
performed many more SPR rounds (this can be confirmed by checking the RAxML_log.<NAME> file), but also found
a much better-scoring tree (a difference of 570 log-likelihood units is substantial). By the way, from the log file you can see
that after ~5 hours raxml-ng had already reached a likelihood better than the final likelihood of the RAxML v8
tree.
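
The 570-unit figure can be checked against the final log-likelihoods reported earlier in this thread. The sketch below only does that subtraction, and also shows the earlier, larger gap that came from mixing the two frequency treatments Wayne describes; the constant names are illustrative.

```python
# Final log-likelihoods reported in this thread (less negative is better).
NG_GTR_G = -1067948.995690       # raxml-ng, GTR+G (ML-estimated frequencies)
V8_GTRGAMMA = -1069878.284601    # RAxML v8, GTRGAMMA (empirical frequencies)
V8_GTRGAMMAX = -1068519.089387   # RAxML v8, GTRGAMMAX (ML-estimated frequencies)


def lnl_gain(better: float, worse: float) -> float:
    """Log-likelihood units by which `better` exceeds `worse`."""
    return better - worse


# Same frequency treatment on both sides: a like-for-like comparison.
matched = lnl_gain(NG_GTR_G, V8_GTRGAMMAX)
# Mixed treatments (ML-estimated vs. empirical): not a like-for-like comparison.
mismatched = lnl_gain(NG_GTR_G, V8_GTRGAMMA)

print(f"{matched:.6f}")     # 570.093697
print(f"{mismatched:.6f}")  # 1929.288911
```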

(Technically, it seems that RAxML v8 chose a smaller FAST SPR radius here; you can check the
RAxML_info file to confirm.)

Best,
Alexey

Jeff Bowman

Jul 8, 2020, 2:03:08 PM
to raxml
Thanks, that makes sense!

Cheers,
Jeff