Best command to run RAxML-NG for a large alignment

Alejandro Petroni

Feb 4, 2025, 2:01:17 AM
to raxml
Hi all,
I need to infer an ML tree for an alignment of 680 taxa x 503,934 sites.
This is a SNP-sites alignment obtained from the core genes of bacterial whole genomes.
All bacterial samples belong to a single species complex comprising 5 phylogroups, which share 92.4 to 96.3% ANI.

Features of this alignment are: 223,533 distinct alignment patterns and 0.25% of gaps and completely undetermined characters.

I would like to get bootstrapping with 100 replicates.

I have already tried several RAxML executables, i.e., raxmlHPC, raxmlHPC-AVX and raxmlHPC-PTHREADS, but in all cases the LSF scheduler of the server I'm using killed the job for exceeding the time limit (> 12 h). I tried using more threads, but without success.

Command used (with vectorized versions):
raxmlHPC-AVX -f a -p 76076 -s <my_alignment>.aln -x 76076 -N 100 -m GTRCAT -V -n <assay_number>.tree (requesting 16 threads and 8 GB of memory in the LSF submission)

and

raxmlHPC-PTHREADS -f a -p 76076 -T 16 -s <my_alignment>.aln -x 76076 -N 100 -m GTRCAT -V -n <assay_number>.tree (same thread and memory request as above)

So, I would like to know whether it would be better to use RAxML-NG instead of the various RAxML executables. Is RAxML-NG faster?

If so, could someone tell me the best RAxML-NG command option to use considering the features of my alignment, please?

I don't need a "very deep and formal" phylogeny of this sample set, just to get a "suitable" clusterization of phylogroups.

Many thanks in advance!
Alejandro

Oleksiy Kozlov

Feb 4, 2025, 9:13:19 AM
to ra...@googlegroups.com
Hi Alejandro,

your alignment is pretty large, so it is not surprising that tree inference with bootstrapping takes
many hours.

Although RAxML-NG is generally faster, v1.x does not implement rapid bootstrapping, which means that
for your analysis it could even be slower.

However, you can try the development version

https://github.com/amkozlov/raxml-ng/wiki/Installation#building-development-branch

with the following options:

--model GTR+G --all --bs-trees 100 --bs-metric rbs
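As a rough sketch, the options above might be combined into a full command line as follows. The alignment name, output prefix, and thread count are placeholders and not from this thread; the seed simply reuses the one from the original RAxML commands. Note that `--bs-metric rbs` assumes the development branch linked above. The helper `build_rbs_cmd` is hypothetical and only prints the command so it can be checked before being pasted into a job script.

```shell
#!/bin/sh
# Sketch only: assembles a RAxML-NG command line and prints it (nothing is run).
# Alignment name, prefix, and thread count are placeholders (assumptions).
# --bs-metric rbs requires the development branch of RAxML-NG.

build_rbs_cmd() {
    msa="$1"      # path to the alignment
    prefix="$2"   # prefix for all output files
    threads="$3"  # number of threads to use
    echo "raxml-ng --all --msa $msa --model GTR+G" \
         "--bs-trees 100 --bs-metric rbs" \
         "--seed 76076 --threads $threads --prefix $prefix"
}

# Print the command for inspection.
build_rbs_cmd my_alignment.aln assay_rbs 16
```

The thread count should match whatever is requested from the scheduler.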

Finally, no matter which RAxML or RAxML-NG version you use, bootstrapping will be the main
computational bottleneck. So if you can forgo the support values, or use a cheaper alternative such
as SH-aLRT instead (--bs-metric sh), this would substantially accelerate the analysis.
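The two cheaper variants could look roughly like this. This is a sketch under assumptions: file names, prefix, and thread count are placeholders, and I am assuming that `--bs-metric sh` slots into the same `--all` invocation in place of `rbs` (development branch only). The tree-only variant uses the plain `--search` command with no support values. The helper `build_cheap_cmd` is hypothetical and only prints the command.

```shell
#!/bin/sh
# Sketch only: prints a RAxML-NG command for one of two cheaper analyses.
#   tree-only : ML tree search without any branch support
#   sh-alrt   : ML tree search plus SH-aLRT support (assumed usage, dev branch)
# All file names and the thread count are placeholders (assumptions).

build_cheap_cmd() {
    mode="$1"; msa="$2"; prefix="$3"; threads="$4"
    common="--msa $msa --model GTR+G --seed 76076 --threads $threads --prefix $prefix"
    case "$mode" in
        tree-only) echo "raxml-ng --search $common" ;;
        sh-alrt)   echo "raxml-ng --all --bs-metric sh $common" ;;
        *)         echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Print both variants for comparison.
build_cheap_cmd tree-only my_alignment.aln assay_tree 16
build_cheap_cmd sh-alrt my_alignment.aln assay_sh 16
```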

Best,
Oleksiy

Alejandro Petroni

Feb 4, 2025, 10:38:06 AM
to raxml
Thank you very much, Oleksiy!
I will try these options.

Best wishes,
Alejandro

Grimm

Feb 5, 2025, 3:34:51 AM
to raxml
Hi Alejandro,

why do you need bootstrap support at all, when your objective is "just to get a suitable clusterisation of phylogroups" and you already know there are five? You only need a tree for that. What's the purpose of bootstrapping these data? Keep in mind that 100 BS pseudoreplicates hardly extract anything from a character matrix with 223k distinct alignment patterns (DAP) other than signal-wise trivial splits, so they add very little information beyond the unbootstrapped ML tree.

Cheers, Guido