Re: RAxML-NG running extremely slowly

Grimm

Dec 7, 2023, 10:28:55 AM
to raxml
Hi Kevin,

Then it's most likely the signal in the alignment: typically a very flat tree space, or a matrix that does not provide any unambiguous, tree-decisive signal, e.g. one with a lot of near-identical tips. You can run your matrix through Phytia to get a treeability score.

Hopeless data will greatly inflate computation time: RAxML cannot know whether the data you feed it actually inform a tree, so it always tries.

Cheers, Guido
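
One crude proxy for how much tree-decisive signal a matrix carries is the number of parsimony-informative sites, i.e. columns in which at least two character states each occur in at least two sequences. This is not the Phytia treeability score, and a high count does not guarantee unambiguous signal; it is only a quick sanity check. The sketch below assumes the alignment is already loaded into a dict mapping taxon names to aligned sequences (e.g. via Biopython's `SeqIO.parse`) and that `-`, `?` and `X` denote missing data; the function name is hypothetical.

```python
from collections import Counter

# Assumption: '-', '?' and 'X' mark gaps or unknown residues (protein data).
MISSING = set("-?X")


def parsimony_informative_sites(seqs):
    """Count alignment columns in which at least two character states
    each occur in at least two sequences (missing symbols ignored).

    seqs: dict mapping taxon name -> aligned sequence string.
    Returns (informative_columns, total_columns).
    """
    rows = [s.upper() for s in seqs.values()]
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("sequences are not all the same length")
    informative = 0
    for col in range(length):
        counts = Counter(r[col] for r in rows if r[col] not in MISSING)
        # Informative if >= 2 states are each shared by >= 2 taxa.
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            informative += 1
    return informative, length


# Toy example: only the K/R column splits the taxa into two groups of two.
toy = {"a": "MKVL", "b": "MKVL", "c": "MRVA", "d": "MRVL"}
print(parsimony_informative_sites(toy))  # → (1, 4)
```

A matrix in which only a tiny fraction of columns is informative, or in which most informative columns conflict with each other, is the kind of input that tends to leave tree searches wandering across a flat likelihood surface.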

On Thursday, December 7, 2023 at 05:54:59 UTC+1, Kevin Myers wrote:
I am using RAxML-NG to generate a tree with 159 taxa. I generated the alignment file from GTDB-Tk and ran the parse command before setting up the main command:

raxml-ng --all --msa T1.raxml.rba --model LG+G8+F --prefix RAXML_full_tree --threads 14 --seed 2

RAxML-NG has been running for more than 19 days and has completed only the first two bootstrap trees. This is much slower than any other tree I've made with RAxML-NG. I used the same command on a 140-taxon alignment and it finished in a couple of days. Both runs used the same command on the same server cluster; the only change was the alignment file.

I'm attaching the output (so far) from the slow-running analysis. Any advice would be greatly appreciated. Thanks in advance.

Grimm

Dec 7, 2023, 10:28:55 AM
to raxml
Hi Kevin,

Most commonly, the reason for this is that you're feeding RAxML data that struggle to inform a tree, e.g. a dataset with many near-identical tips, or a multigene sample with a lot of embedded reticulate signal.
You can run your matrix through Phytia to see whether it's treeable or not.

Data with a "hopeless" score will greatly inflate bootstrapping and tree-inference time. RAxML cannot know whether the data you feed it actually inform a tree; it will always try.

Cheers, Guido.

Kevin Myers

Dec 7, 2023, 11:16:11 AM
to raxml
Thanks. Would the addition of 19 new samples really impact the signal that much? The 19 are also from more distantly related taxa, if that helps.

Guido

Dec 7, 2023, 11:26:31 AM
to ra...@googlegroups.com
Hi Kevin,

On 07.12.2023 at 17:11, 'Kevin Myers' via raxml wrote:
> Thanks. Would the addition of 19 new samples really impact the signal
> that much?

Normally not, if from the same group...


> The 19 are from more distant related taxa as well if that helps?

...it could be that they are too distantly related and are interfering
with the original tip sets/ingroup topology. What kind of matrix is it?
In some cases, taxonomic level, organismal group, and gene sample can
make a huge difference even when only a few samples are added to an
otherwise unproblematic matrix.

/G

Kevin Myers

Dec 7, 2023, 11:29:09 AM
to raxml
I used GTDB-Tk to identify and align 120 genes, and I used that alignment in RAxML. They are at the organism level. The 140 taxa in the original tree are all from the same genus and that run finished fine, which is why I'm confused that it takes so long with additional samples from outside the genus.

Guido

Dec 7, 2023, 11:59:46 AM
to ra...@googlegroups.com



It could indeed be that the added, more distant tips are too distant for a good number of the genes, because the aligned sites do not provide sorted phylogenetic patterns. If your focal genus is an isolated one, genetically highly coherent and distant from the added tips, the bootstrap replicates in particular may struggle to place the new tips effectively within the phylogeny that forms the framework for the smaller, more focused taxon set. The problem may be exacerbated by missing data in crucial genes. For example, an outgroup that is only covered for genes in which it shows no consistent splitting pattern with any distinct part of the ingroup will lead to extreme topological ambiguity and may inflate computation time.
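
Missing genes of this kind can be checked directly if you know where each gene starts and ends in the concatenated matrix. A minimal sketch, assuming you supply the gene boundaries yourself (the `partitions` list and the function name are hypothetical; the alignment file alone does not record them) and that `-`, `?` and `X` denote missing data:

```python
# Assumption: '-', '?' and 'X' mark gaps or unknown residues (protein data).
MISSING = set("-?X")


def missing_genes(seqs, partitions):
    """For each taxon, list the genes whose aligned block is entirely missing.

    seqs: dict mapping taxon name -> aligned (concatenated) sequence.
    partitions: list of (gene_name, start, end), 0-based, end exclusive.
    The gene boundaries are a hypothetical, user-supplied input.
    """
    report = {}
    for taxon, seq in seqs.items():
        absent = []
        for gene, start, end in partitions:
            if end > len(seq) or start >= end:
                raise ValueError(f"bad partition {gene}: {start}-{end}")
            block = seq[start:end].upper()
            if all(ch in MISSING for ch in block):
                absent.append(gene)
        report[taxon] = absent
    return report


# Toy example: the outgroup has no data at all for geneB.
toy = {"in1": "MKVLAR", "in2": "MKVLSR", "out": "MKV---"}
genes = [("geneA", 0, 3), ("geneB", 3, 6)]
print(missing_genes(toy, genes))
# → {'in1': [], 'in2': [], 'out': ['geneB']}
```

Taxa, particularly outgroups, that lack many of the genes are the first candidates to inspect before rerunning the analysis.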

Have you looked at the new alignment? Does it look as clean as the one without the added tips? Another quick assessment is to infer a simple pairwise distance matrix for the full data set and visualise it as a Neighbor-Net splits graph or a heat map.
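
The pairwise-distance check can be sketched in plain Python. This is a minimal illustration, assuming the alignment is already loaded into a dict of aligned sequences; it uses uncorrected p-distances (no substitution model) and skips sites missing in either sequence. The output is a square, relaxed-PHYLIP-style matrix; whether a particular network or heat-map tool accepts it as-is is an assumption to verify against that tool's import options. All function names are hypothetical.

```python
from itertools import combinations

# Assumption: '-', '?' and 'X' mark gaps or unknown residues (protein data).
MISSING = set("-?X")


def p_distance(a, b):
    """Uncorrected p-distance over sites present in both sequences."""
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in MISSING or y in MISSING:
            continue
        compared += 1
        diffs += x != y
    # No overlapping sites: return NaN, itself a warning sign.
    return diffs / compared if compared else float("nan")


def distance_matrix(seqs):
    """Return (names, square matrix) of pairwise p-distances."""
    names = list(seqs)
    n = len(names)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = p_distance(seqs[names[i]], seqs[names[j]])
    return names, dist


def to_phylip(names, dist):
    """Format the matrix as a square, relaxed-PHYLIP-style text block."""
    lines = [str(len(names))]
    for name, row in zip(names, dist):
        lines.append(name + " " + " ".join(f"{d:.5f}" for d in row))
    return "\n".join(lines)


toy = {"a": "MKVL", "b": "MKVL", "c": "MRVA"}
print(to_phylip(*distance_matrix(toy)))
# 3
# a 0.00000 0.00000 0.50000
# b 0.00000 0.00000 0.50000
# c 0.50000 0.50000 0.00000
```

Distances at or near zero point to near-identical tips, while a block of uniformly large distances between the added taxa and the focal genus would be consistent with the distant-tips problem described above.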

Kevin Myers

Dec 8, 2023, 2:29:05 AM
to raxml
Thanks. The alignment looks good to my eye, but I will try some of the other tips you suggest!

I also downloaded the most recent version of RAxML-NG; that seems to have done the trick, and it's running much faster now.

Jaimie West

Dec 12, 2024, 3:14:06 AM
to raxml
I have a side question. I am trying to use the alignment from GTDB-Tk results on bacterial genomes, and I cannot get --check to pass, since the GTDB-Tk output is in amino-acid format. I'm curious how you accomplished this.

raxml-ng-mpi --check --msa gtdbtk.bac120.msa.fasta --model GTR+G --data-type aa --prefix T1


Oleksiy Kozlov

Dec 12, 2024, 5:22:34 AM
to ra...@googlegroups.com
For AA data, you should use one of the protein models, such as LG, instead of GTR; please see:

https://github.com/amkozlov/raxml-ng/wiki/Input-data#evolutionary-model
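
The `--check` failure here comes from pairing the nucleotide model GTR with an amino-acid alignment. As a hedged illustration only, the sketch below guesses whether an alignment is nucleotide or protein from its character composition and picks a model string to match. The 90% cutoff and the helper names are hypothetical, the model strings are simply GTR+G (from the command above) and LG+G (a simplified form of the LG+G8+F model used earlier in this thread), and this is not how RAxML-NG itself detects data types.

```python
# Assumption: a simplified alphabet check; IUPAC ambiguity codes other than
# N are counted as non-nucleotide characters here.
NUCLEOTIDES = set("ACGTUN")
IGNORED = set("-?.")


def guess_data_type(seqs, dna_cutoff=0.9):
    """Return 'DNA' if at least dna_cutoff of the non-gap characters are
    nucleotide symbols, else 'AA'."""
    total = nuc = 0
    for seq in seqs.values():
        for ch in seq.upper():
            if ch in IGNORED:
                continue
            total += 1
            nuc += ch in NUCLEOTIDES
    if total == 0:
        raise ValueError("alignment contains no residues")
    return "DNA" if nuc / total >= dna_cutoff else "AA"


def suggest_model(data_type):
    """Map the guessed data type to a model string used in this thread."""
    return "GTR+G" if data_type == "DNA" else "LG+G"


toy = {"s1": "MKVLAR", "s2": "MKILSR"}
dtype = guess_data_type(toy)
print(dtype, suggest_model(dtype))  # → AA LG+G
```

In practice the fix is simply to replace `--model GTR+G` in the `--check` command with a protein model such as `--model LG+G`.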
