raxml - bootstrapping taking too long


Brintha V P cs18d017

Sep 25, 2024, 10:57:42 AM
to raxml
Hi,

I am using the below command to generate the phylogenetic tree.

/raxml-ng --all --model GTR+G --msa align.fasta --prefix result


It has been running for more than 24 hours, as there are around 6402 sequences (including duplicates). The ML tree search has completed and it is now generating bootstrap replicates. But after the 50th bootstrap tree, it has been stuck for more than 5 hours. Is this expected, or should I restart the run? Any thoughts would be helpful, as I am using raxml for the first time. The goal is to generate the best tree, which will be used as input for the picrust2 tool.

Thanks !!


Oleksiy Kozlov

Sep 25, 2024, 12:10:04 PM
to ra...@googlegroups.com
No, this is not normal. Are you using the latest version of raxml-ng?

However, the first question to ask is whether picrust2 can use (bootstrap) branch support values at all.

If not, you should rather re-run with "--search" instead of "--all", and it will be much faster.

Best,
Oleksiy
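
To make the difference between the two run modes concrete, here is a minimal sketch that builds both command lines from the flags already mentioned in this thread. The helper name `raxml_ng_command` and the default binary name `raxml-ng` are assumptions made for illustration; the file names are the ones from the original post. It only assembles and prints the commands and does not launch anything.

```python
def raxml_ng_command(mode, msa="align.fasta", model="GTR+G",
                     prefix="result", binary="raxml-ng"):
    """Build a raxml-ng argument list (hypothetical helper, not part of raxml-ng).

    mode is "--all" (ML search plus bootstrapping) or "--search"
    (ML tree search only, no bootstrap replicates).
    """
    if mode not in ("--all", "--search"):
        raise ValueError("mode must be '--all' or '--search'")
    return [binary, mode, "--msa", msa, "--model", model, "--prefix", prefix]


# The original run: ML search followed by bootstrap replicates.
all_cmd = raxml_ng_command("--all")

# The suggested run: ML search only, which skips the bootstrap stage.
search_cmd = raxml_ng_command("--search")

print(" ".join(all_cmd))
print(" ".join(search_cmd))
```

The resulting list can be passed to `subprocess.run(search_cmd, check=True)` on a machine where the binary is installed and on the PATH.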
> --
> You received this message because you are subscribed to the Google Groups "raxml" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to
> raxml+un...@googlegroups.com <mailto:raxml+un...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/raxml/d906afe2-abe0-4b51-8921-c00b263295aan%40googlegroups.com
> <https://groups.google.com/d/msgid/raxml/d906afe2-abe0-4b51-8921-c00b263295aan%40googlegroups.com?utm_medium=email&utm_source=footer>.

Brintha V P cs18d017

Sep 25, 2024, 12:23:54 PM
to raxml
Hi,

I am using the latest version v1.2.2.

Thank you for the suggestion to use the --search option. I hope Picrust2 does not require the branch support values.

Regards,
Brintha

*

Sep 27, 2024, 2:26:52 AM
to ra...@googlegroups.com
Hi,

I am afraid I am not the one you should ask. I am also using this tool for the first/second time. In addition, I have never tried any sequence data as input, so I doubt my advice would be helpful. Nevertheless, 6402 sequences are a hell of a number! I've only tried sequence-like data (01.. or 0102... etc, i.e. binary or coded SSR genotypes) for 10-20 individuals. When I used thorough bootstrap analysis, the calculations were also pretty slow. In the end, I used ML+rapid bootstrap (GAMMA/ORDERED model) with 500 replicates on 9 populations (using the most frequent genotype to represent a population, i.e. 9 "sequences"), and it was fast (same as in PhyloNetworks). I think these programs just can't handle such extensive data, especially when you are dealing with population genetics and like 500+ individuals... How many of them do you have? By the way, I have the RAxML 2.0 GUI version, so I don't understand the commands.

Sorry I can't be of more help.
  



Grimm

Sep 30, 2024, 2:44:14 AM
to raxml
Hi Brintha,

Your problem is likely that you are feeding the wrong data set into RAxML-ng, i.e. an MSA with low internal variation and a huge number of near-identical sequences. I guess you are looking at either population-level data sets or bulk target sequencing data.

Please check out the flow chart for picrust2: what you need from RAxML-ng is not a total tree including all your data, but a reference tree on which to place your entire sample using EPA-ng and GAPPA.


A reference tree and data set include only representatives of (substantially) distinct alleles/gene variants/gene types. By definition it has no duplicates, and you wouldn't want the reference tree to have too many flat terminal subtrees, because otherwise EPA-ng cannot do its magic to the full extent.

Cheers, Guido
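
As a rough illustration of the "no duplicates" point, the sketch below collapses exact duplicate sequences in a FASTA alignment and keeps the first record of each. The function names and the tiny example alignment are made up for illustration. It handles only exact matches (ignoring case), so near-identical sequences would still need clustering or manual selection of representatives.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def collapse_duplicates(records):
    """Keep the first record for each distinct sequence.

    Returns (unique, dropped): unique is a list of (header, sequence),
    dropped maps each kept header to the headers of its exact duplicates.
    """
    first_seen = {}  # upper-cased sequence -> header that was kept
    unique, dropped = [], {}
    for header, seq in records:
        key = seq.upper()
        if key in first_seen:
            dropped.setdefault(first_seen[key], []).append(header)
        else:
            first_seen[key] = header
            unique.append((header, seq))
    return unique, dropped


# Made-up three-sequence alignment: "b" duplicates "a" apart from case.
example = """>a
ACGT
>b
acgt
>c
ACGA
""".splitlines()

unique, dropped = collapse_duplicates(read_fasta(example))
for header, seq in unique:
    print(f">{header}\n{seq}")
```

For a real file, `read_fasta(open("align.fasta"))` can replace the example lines; the `dropped` mapping records which samples each kept representative stands for.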

Brintha V P cs18d017

Sep 30, 2024, 3:00:57 AM
to ra...@googlegroups.com
Hi,

You are right, Grimm. After removing the duplicates and keeping only the conserved regions of the sequences, I am able to run RAxML without any issue. I now run raxml with the "--search" option (which is also the default) instead of "--all".

Thank you for the suggestions.

Regards,
Brintha
