Outgroup assignment question

Noé Reyna

Jun 2, 2023, 2:49:29 AM

to raxml

Hello!

I'm an undergrad researcher and have questions regarding raxml-ng's outgroup assignment. While finding the bestTree and visualizing it, we noticed that some of our population-level samples (the Hops samples) were automatically assigned as outgroups, as seen in the attached image. This is somewhat interesting, since we expected Hops to fall under the 94%-supported node; hence, we suspect the outgroup assignment is at play here. The general pipeline for this tree is: identify SNPs across samples --> create a population-level .vcf file --> use vcf2phylip to convert our multi-sample VCF SNPs into a .phylip file --> input this .phy alignment into raxml-ng.

- How does raxml-ng assign the outgroup if it's not specified via the --outgroup argument? I've tried looking at the GitHub documentation and Google and couldn't find the answer.

- Is there a way to run raxml-ng without specifying an outgroup, i.e., to infer an unrooted tree? Or do we have to introduce an outgroup regardless for raxml-ng to run?

- I assume there's a way to undo the outgroup assignment with other tools, since it's just a reassignment of the root? Would the best approach be to reassign the outgroup after running the analysis with raxml-ng?

Please let me know if you have any clarifying questions, and thanks in advance for the help. Much appreciated!

Best,

Noe

Noé Reyna

Jun 2, 2023, 2:57:23 AM

to raxml

I'd like to note that when I use is.rooted() from the R package ape, it returns FALSE, i.e., the attached tree isn't rooted. :/

Grimm

Jun 2, 2023, 9:00:09 AM

to raxml

Hi Noé, and any other newcomer to phylogenetics,

Alexi answered the basic question, which was double-posted; here's the cross-link:

https://groups.google.com/g/raxml/c/UwDebvBD5a0, including the reference to the now-classic paper, a must-read for anyone taking their first steps in phylogenetics (and for any lecturers teaching phylogenetics at universities!)

But I'd like to take the opportunity to point out once more (the question pops up regularly on the RAxML forum) that most tree inference implementations do not infer rooted trees (and many lectures still get this wrong). Whether or not we use the --outgroup option, we (typically) optimise unrooted trees, built up from a set of compatible taxon (tip) bipartitions such as A + B | C + D + E and A + B + C | D + E.
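To make the bipartition view concrete, here is a minimal Python sketch (the helper names are hypothetical, and it only handles topology-only Newick strings without branch lengths or support values) that collects the non-trivial bipartitions of a tree. Feeding it a top-level trifurcation and three differently rooted encodings of the five-taxon example yields the same two splits every time.

```python
def parse_newick(text):
    """Parse a topology-only Newick string (no branch lengths or support
    values) into nested tuples; tips are returned as plain strings."""
    s = "".join(text.split()).rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while s[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
            return tuple(children)
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]

    return node()


def bipartitions(tree):
    """Non-trivial splits of the tree, each stored as an unordered pair of
    taxon sets, so the result does not depend on where the root is drawn."""
    clades = []

    def collect(n):
        if isinstance(n, str):
            return frozenset([n])
        taxa_below = frozenset().union(*(collect(c) for c in n))
        clades.append(taxa_below)
        return taxa_below

    all_taxa = collect(tree)
    splits = set()
    for clade in clades:
        rest = all_taxa - clade
        if len(clade) > 1 and len(rest) > 1:   # skip tip and whole-tree "splits"
            splits.add(frozenset([clade, rest]))
    return splits


def show(split):
    """Format a split as e.g. 'AB | CDE'."""
    return " | ".join(sorted("".join(sorted(side)) for side in split))


encodings = [
    "(A,B,(C,(D,E)));",      # top-level trifurcation
    "((A,B),(C,(D,E)));",    # root between AB and CDE
    "(((A,B),C),(D,E));",    # root between ABC and DE
    "(A,(B,(C,(D,E))));",    # A as outgroup
]
split_sets = [bipartitions(parse_newick(t)) for t in encodings]
print(all(s == split_sets[0] for s in split_sets))   # True
for line in sorted(show(sp) for sp in split_sets[0]):
    print(line)                                      # AB | CDE, then ABC | DE
```

Storing each split as an unordered pair of taxon sets is what makes the comparison root-independent: the root only decides which side is drawn "above" the other.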

Unfortunately, the NEWICK format we use to encode an inferred tree cannot represent unrooted trees: it is node-based, not internode- (branch-) based. Hence, any NEWICK tree is intrinsically rooted (by default in RAxML at the top-level trifurcation, or at the predefined outgroup-ingroup internode when using the --outgroup option). Programs such as R's ape treat a top-level trifurcation as unrooted, which is why is.rooted() returns FALSE for such a tree.

For the above case, we can encode the same tree with different NEWICK strings: ((A,B),(C,(D,E))); places the tree's root between AB and CDE, (((A,B),C),(D,E)); places it between ABC and DE, and (A,(B,(C,(D,E)))); designates A as the outgroup. These are different rooted versions of the same inferred (unrooted), optimised tree.

Outgroup rooting (as invoked with the --outgroup option) is hence merely a post-inference graphical modification, a graphical interpretation of the tree under the assumption that our outgroup is not part of the ingroup. So, indeed, if we reroot a tree (really, the classic paper in Alexi's answer is a must-read!), it's just a reassignment; it doesn't change the tree.
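Rerooting as a pure reassignment can be sketched like this: store the unrooted tree once, as a list of branches, and only decide where the root goes when writing NEWICK. This is a minimal Python sketch; the internal node names x, y and z are hypothetical, and branch lengths and support values are ignored (in practice an established tree library should do this). The same branch list yields a trifurcated string and three rooted strings of the five-taxon example; only the text changes, not the tree.

```python
from collections import defaultdict

# Hypothetical unrooted tree for the five-taxon example: x, y and z are
# made-up names for the three internal nodes; branch lengths are omitted.
EDGES = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"),
         ("y", "z"), ("D", "z"), ("E", "z")]

adj = defaultdict(list)
for u, v in EDGES:
    adj[u].append(v)
    adj[v].append(u)


def subtree(node, parent):
    """Newick for the part of the tree hanging off `node`, away from `parent`."""
    children = [n for n in adj[node] if n != parent]
    if not children:                      # a tip
        return node
    return "(" + ",".join(subtree(c, node) for c in children) + ")"


def newick_at_node(node):
    """Write the tree starting from an internal node (top-level trifurcation)."""
    return subtree(node, None) + ";"


def newick_on_edge(u, v):
    """Write the tree with its root placed on the branch u-v."""
    return "(" + subtree(u, v) + "," + subtree(v, u) + ");"


print(newick_at_node("x"))        # (A,B,(C,(D,E)));
print(newick_on_edge("x", "y"))   # ((A,B),(C,(D,E)));
print(newick_on_edge("y", "z"))   # (((A,B),C),(D,E));
print(newick_on_edge("A", "x"))   # (A,(B,(C,(D,E))));
```

Rooting on the branch between a tip and its neighbour (here A-x) is exactly what outgroup rooting with a single outgroup taxon amounts to.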

Some inferred trees are explicitly rooted, for example when we use asymmetrical (directional) substitution models. The simplest case is a model that only allows Dollo mutations from 0 to 1 for a binary data set: such a tree assumes the root is an all-zero ancestor from which all tips evolved by changing 0s to 1s.
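Why a directional model roots the tree can be seen on the smallest possible case. This is a minimal Python sketch under simplifying assumptions (one binary character, two tips with states 1 and 0, a path of total length 1 between them, unit rate; all function names hypothetical). Under a symmetric, reversible 0/1 model the likelihood is the same wherever the root sits on that path (Felsenstein's pulley principle). Under an irreversible 0-to-1 model with an all-zero root it changes with the root position, so the data carry information about where the root is.

```python
import math


def p_symmetric(i, j, t, mu=1.0):
    """Reversible two-state model: 0->1 and 1->0 both at rate mu."""
    p_same = 0.5 + 0.5 * math.exp(-2.0 * mu * t)
    return p_same if i == j else 1.0 - p_same


def p_irreversible(i, j, t, mu=1.0):
    """Directional two-state model: only 0->1 changes, at rate mu."""
    if i == 1:                      # state 1 can never revert to 0
        return 1.0 if j == 1 else 0.0
    stay = math.exp(-mu * t)
    return stay if j == 0 else 1.0 - stay


def likelihood_two_tips(p, root_freqs, x, y, f, total=1.0):
    """Likelihood of tip states x and y when the root splits the path of
    length `total` into f*total (towards x) and (1-f)*total (towards y)."""
    return sum(pi * p(r, x, f * total) * p(r, y, (1.0 - f) * total)
               for r, pi in enumerate(root_freqs))


for f in (0.1, 0.5, 0.9):
    rev = likelihood_two_tips(p_symmetric, (0.5, 0.5), 1, 0, f)
    irr = likelihood_two_tips(p_irreversible, (1.0, 0.0), 1, 0, f)  # all-zero root
    print(f"root at f={f}: reversible L={rev:.4f}, irreversible L={irr:.4f}")
```

The reversible column stays constant, while the irreversible one increases as the root moves towards the tip still in state 0, the position most compatible with an all-zero ancestor.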

Chronograms are also intrinsically rooted, being ultrametric: the optimised clock model implicitly roots the inferred tree even in the absence of an outgroup. Clock-rooting is a valid, although much underused, alternative to outgroup-rooting. It can be an especially good option for population-level data or any data set whose ingroup lacks a meaningfully close outgroup, i.e. where any sampled outgroup taxon is very different in sequence from all ingroup tips and may be prone to ingroup-outgroup long-branch attraction (LBA; note that ML only escapes LBA in 50% of the cases in the Felsenstein Zone, while MP and distance-based approaches will always be wrong).
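Fitting a clock model is an ML optimisation that is too involved for a short example. As a simpler stand-in (a different technique, not what a clock analysis does), midpoint rooting puts the root halfway along the longest tip-to-tip path; on a perfectly clock-like tree that point is equidistant from all tips, i.e. it is the clock root. This is a minimal Python sketch on a hypothetical, perfectly clock-like tree; on real data, rate variation among lineages can pull the midpoint away from the true root.

```python
from collections import defaultdict

# Hypothetical clock-like tree ((A:1,B:1):2,(C:2,D:2):1) with its root removed:
# internal nodes u and v are joined by one branch of length 2 + 1 = 3.
EDGES = [("A", "u", 1.0), ("B", "u", 1.0), ("u", "v", 3.0),
         ("C", "v", 2.0), ("D", "v", 2.0)]

adj = defaultdict(dict)
for a, b, length in EDGES:
    adj[a][b] = length
    adj[b][a] = length
TIPS = [n for n in adj if len(adj[n]) == 1]


def distances_from(src):
    """Path length from src to every node, plus the previous node on each path."""
    dist, prev, stack = {src: 0.0}, {src: None}, [src]
    while stack:
        node = stack.pop()
        for nbr, length in adj[node].items():
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                prev[nbr] = node
                stack.append(nbr)
    return dist, prev


def midpoint_root():
    """Return (p, q, offset): the root lies on branch p-q, `offset` away from p,
    halfway along the longest tip-to-tip path."""
    best_len, best_a, best_b = -1.0, None, None
    for a in TIPS:
        dist, _ = distances_from(a)
        for b in TIPS:
            if dist[b] > best_len:
                best_len, best_a, best_b = dist[b], a, b
    half = best_len / 2.0
    dist, prev = distances_from(best_a)
    node = best_b
    while dist[prev[node]] > half:      # walk back towards best_a
        node = prev[node]
    p = prev[node]
    return p, node, half - dist[p]


def root_to_tip(p, q, offset):
    """Distance from a root placed on branch p-q to every tip."""
    dp, _ = distances_from(p)
    dq, _ = distances_from(q)
    length = adj[p][q]
    return {t: min(offset + dp[t], length - offset + dq[t]) for t in TIPS}


p, q, offset = midpoint_root()
print(p, q, offset)                   # u v 2.0
print(root_to_tip(p, q, offset))      # {'A': 3.0, 'B': 3.0, 'C': 3.0, 'D': 3.0}
```

Equal root-to-tip distances are the ultrametric property a chronogram has by construction; no outgroup is needed to find this root.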

A rarely used but very quick and handy test of outgroup-triggered roots is to use the outgroup samples as queries and optimise their positions on an ingroup-only tree using RAxML's evolutionary placement algorithm (EPA; example below, a test of a genus root, from Liede-Schumann et al. 2020, open access). If they do not map to the same internode, either outgroup rooting is problematic, or the outgroup sample includes both better-suited outgroups and outgroups that may trigger ingroup-outgroup branching artefacts.
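Once the outgroup samples have been placed, the "same internode?" check is a small post-processing step. This is a minimal Python sketch, assuming the placement results come in the jplace format (the JSON format written by EPA-style placement tools such as EPA-ng) and taking each query's best placement as the one with the highest like_weight_ratio. The function names, query names and numbers below are invented for illustration.

```python
import json


def best_placements(jplace):
    """Map each query name to the edge_num of its highest-LWR placement."""
    fields = jplace["fields"]
    i_edge = fields.index("edge_num")
    i_lwr = fields.index("like_weight_ratio")
    best = {}
    for placement in jplace["placements"]:
        top = max(placement["p"], key=lambda row: row[i_lwr])
        # Query names are listed under "n", or under "nm" as [name, multiplicity].
        names = placement.get("n") or [nm[0] for nm in placement.get("nm", [])]
        for name in names:
            best[name] = top[i_edge]
    return best


def outgroups_agree(jplace, outgroups):
    """True if all outgroup queries land on the same ingroup branch."""
    best = best_placements(jplace)
    edges = sorted({best[name] for name in outgroups})
    return len(edges) == 1, edges


# Made-up placement result for three hypothetical outgroup queries; a real
# file would be read with json.load(open(path)) and also contains "tree",
# "version" and "metadata" entries.
example = json.loads("""{
  "fields": ["edge_num", "likelihood", "like_weight_ratio",
             "distal_length", "pendant_length"],
  "placements": [
    {"p": [[12, -1523.4, 0.91, 0.01, 0.20], [13, -1525.7, 0.09, 0.02, 0.21]],
     "n": ["out1"]},
    {"p": [[12, -1498.2, 0.88, 0.00, 0.35]], "n": ["out2"]},
    {"p": [[7, -1510.9, 0.64, 0.03, 0.18]], "n": ["out3"]}
  ]
}""")

print(outgroups_agree(example, ["out1", "out2"]))          # (True, [12])
print(outgroups_agree(example, ["out1", "out2", "out3"]))  # (False, [7, 12])
```

A split result like the second one is the warning sign described above; looking at which queries land on which branch (and how low their like_weight_ratio values are) helps separate well-suited outgroups from problematic ones.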
