Running RAxML-ng with GTDBTk MSA file

Kevin Myers

Jul 19, 2021, 12:05:41 PM
to raxml
I am trying to construct a tree with RAxML-ng using the results from a GTDBTk analysis of my metagenomic samples. GTDBTk has generated an MSA file that I am using.

I started with the following command to check the MSA for errors:

raxml-ng --check --msa gtdbtk.bac120.msa.edited2.fasta --model LG+G8+F --prefix T1

Then I ran the Parse command to determine the memory requirements and thread recommendations:

raxml-ng --parse --msa gtdbtk.bac120.msa.forRAxML.fasta --model LG+G8+F --prefix T2

I then ran RAxML-ng on the RBA file created during Parse. Since I'm running on a machine with 300 GB RAM and 16 cores (Parse recommended 14 cores), I decided to increase the number of starting trees:

raxml-ng --msa T2.raxml.rba --model LG+G8+F --prefix T3 --threads 14 --seed 2 --tree pars{25},rand{25}

The program has been running for over a week at this point. Here is the Log information:

RAxML-NG v. 0.9.0 released on 20.05.2019 by The Exelixis Lab.

Developed by: Alexey M. Kozlov and Alexandros Stamatakis.

Contributors: Diego Darriba, Tomas Flouri, Benoit Morel, Sarah Lutteropp, Ben Bettisworth.

Latest version: https://github.com/amkozlov/raxml-ng

Questions/problems/suggestions? Please visit: https://groups.google.com/forum/#!forum/raxml

RAxML-NG was called at 12-Jul-2021 10:34:23 as follows:

raxml-ng --msa T2.raxml.rba --model LG+G8+F --prefix T3 --threads 14 --seed 2 --tree pars{25},rand{25}

Analysis options:

  run mode: ML tree search

  start tree(s): random (25) + parsimony (25)

  random seed: 2

  tip-inner: OFF

  pattern compression: ON

  per-rate scalers: OFF

  site repeats: ON

  fast spr radius: AUTO

  spr subtree cutoff: 1.000000

  branch lengths: proportional (ML estimate, algorithm: NR-FAST)

  SIMD kernels: AVX2

  parallelization: PTHREADS (14 threads), thread pinning: OFF

WARNING: The model you specified on the command line (LG+G8+F) will be ignored 

         since the binary MSA file already contains a model definition.

         If you want to change the model, please re-run RAxML-NG 

         with the original PHYLIP/FASTA alignment and --redo option.

[00:00:00] Loading binary alignment from file: T2.raxml.rba

[00:00:02] Alignment comprises 23478 taxa, 1 partitions and 5040 patterns

Partition 0: noname

Model: LG+FC+G8m

Alignment sites / patterns: 5040 / 5040

Gaps: 11.56 %

Invariant sites: 0.00 %

NOTE: Per-rate scalers were automatically enabled to prevent numerical issues on taxa-rich alignments.

NOTE: You can use --force switch to skip this check and fall back to per-site scalers.

[00:00:03] Generating 25 random starting tree(s) with 23478 taxa

[00:00:04] Generating 25 parsimony starting tree(s) with 23478 taxa

It looks like there are 23,478 taxa, including the 20 test taxa and all the GTDBTk reference taxa. The MSA is amino acid sequences for the 37 protein sequences GTDBTk uses for classification, concatenated for each sample in FASTA format.
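
A quick way to cross-check the taxon count against the log is to count the header lines in the FASTA file. This is only a minimal sketch: `count_taxa` is a hypothetical helper name, and it assumes a well-formed FASTA file in which every record starts with a `>` line.

```bash
#!/usr/bin/env bash
# count_taxa FILE.fasta
# Prints the number of FASTA records, i.e. lines whose first character is '>'.
count_taxa() {
    grep -c '^>' "$1"
}
```

Calling `count_taxa gtdbtk.bac120.msa.forRAxML.fasta` should then print the same number of taxa that RAxML-NG reports when it loads the alignment.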

I've never used RAxML before, so I'm not sure whether this time frame is expected. Any advice would be appreciated.

Thanks!

Kevin

Alexey Kozlov

Jul 23, 2021, 4:43:00 AM
to ra...@googlegroups.com
Hi Kevin,

(Sorry for the late response, your mail ended up in my spam folder.)

First of all, please upgrade to the latest raxml-ng 1.0.3, which contains many bug fixes and improvements
compared to v0.9.0:

https://github.com/amkozlov/raxml-ng/releases/tag/1.0.3
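
To confirm which version produced an existing run, the banner line at the top of the log (for example `T3.raxml.log` for `--prefix T3`) can be parsed. A minimal sketch, assuming the banner has the same form as in the log above ("RAxML-NG v. X.Y.Z released on ..."); `raxml_log_version` is a hypothetical helper name, and other releases may format the banner differently.

```bash
#!/usr/bin/env bash
# raxml_log_version LOGFILE
# Prints the version number from the first banner line of the form
# "RAxML-NG v. 0.9.0 released on ..." (format assumed from the log above).
raxml_log_version() {
    sed -n 's/^RAxML-NG v\. \([0-9][0-9.]*\) released.*/\1/p' "$1" | head -n 1
}
```

For the log quoted above, `raxml_log_version T3.raxml.log` would print `0.9.0`.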

Second, you'd probably be better off using raxml-ng to build a backbone tree from the reference
sequences only, and then using EPA-NG to place the metagenomic samples on that tree:

https://github.com/Pbdas/epa-ng
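
The backbone-plus-placement approach needs the combined MSA split into a reference alignment and a query alignment. The following is a rough sketch of that step, not a tested pipeline: `split_msa` and the ID list file are hypothetical, the FASTA is assumed to be well-formed, and the `raxml-ng` and `epa-ng` invocations in the trailing comments use placeholder file names and should be checked against each tool's help output.

```bash
#!/usr/bin/env bash
# split_msa COMBINED.fasta QUERY_IDS.txt REF_OUT.fasta QUERY_OUT.fasta
# QUERY_IDS.txt (hypothetical) lists one query sequence name per line.
# Records whose name is listed go to QUERY_OUT, all others to REF_OUT.
split_msa() {
    awk -v ref="$3" -v qry="$4" '
        FILENAME == ARGV[1] {            # first operand: load the query names
            if ($0 != "") ids[$0] = 1
            next
        }
        /^>/ {                           # header: pick the output file
            name = substr($1, 2)         # name = header text up to first blank
            out = (name in ids) ? qry : ref
        }
        { print > out }                  # header and sequence lines follow it
    ' "$2" "$1"
}

# Afterwards, roughly (placeholder names; adjust model and threads):
#   raxml-ng --msa reference.fasta --model LG+G8+F --prefix backbone \
#            --threads 14 --seed 2
#   epa-ng --ref-msa reference.fasta --tree backbone.raxml.bestTree \
#          --query query.fasta --model backbone.raxml.bestModel
```

For example, `split_msa gtdbtk.bac120.msa.forRAxML.fasta my_samples.txt reference.fasta query.fasta`, where `my_samples.txt` is an assumed file holding the names of the metagenomic samples. The tree search then runs only on the reference taxa rather than on the full alignment.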

Best,
Alexey


Kevin Myers

Aug 20, 2021, 10:23:13 AM
to raxml
Thank you! This response also ended up in my spam folder, so apologies for my late reply.

I was able to get the newest version of RAxML-ng running smoothly to build the backbone tree as you suggested!

Thanks again for your help!

Kevin

Alexey Kozlov

Aug 25, 2021, 8:15:55 AM
to ra...@googlegroups.com
Perfect, you are welcome!

Alexey
