Best a posteriori tree rooting options


Kenta Renard

May 6, 2024, 12:52:30 PM
to raxml
Dear All,

I have a RAxML-NG ML tree and I didn't specify an outgroup option when I generated the ML tree (I understand it would just be a drawing option).

I used EPA-NG to place a more distant sequence and used Gappa to generate a Newick tree file and then I re-rooted the tree using the placed sequence. I'm not sure how this impacts the interpretation of the tree since the bootstrap replicates were done without the outgroup included (there might be an issue with what branch support is associated with what branch). Is there another way to add an outgroup to an unrooted tree?

Best wishes,
Kenta

Grimm

May 7, 2024, 12:13:47 PM
to raxml
Hi Kenta,

by using EPA-ng with outgroup taxa as queries, we perform an outgroup-signal sensitivity rooting test for our ingroup tree. In this case, the LWRs (likelihood weight ratios) for each query give you the probability for the position of the outgroup-inferred ingroup root. There are no further issues with how to interpret branch supports, as they are connected to the internodes of the ingroup tree.

However, it may change what we interpret to be a clade (commonly synonymised with "monophyly"), depending on where the outgroup was placed.

Assume you have a four-taxon tree: A + B | C + D with unambiguous support (BS = 100) for the only taxon bipartition. The tree we used for EPA-ng has five branches, one internode (for which we have support, the split between AB and CD) and four tips (which we call trivial splits) A-tip, B-tip, C-tip and D-tip.

Let's say 50% of your outgroup queries are connected to the internode with high LWRs; an accordingly rooted tree then has two clades of sister tips, A+B and C+D, mutually supported by BS=100.
The other 50% of queried outgroups are connected to the A-tip branch with high LWRs; the equally probable alternative ingroup root would then have A as sister to a B+C+D clade, and the BS=100 supports a C+D clade in such a rooted tree. For the hypothesis that B is sister to C+D, i.e. a BCD clade, we cannot establish support because it corresponds to a trivial split (A | BCD), not a bipartition. A "monophyly" of B, C, and D would be just an interpretation because of how we rooted the tree, not an actual inferred result.
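The four-taxon example can be traced in a few lines of Python. This is only a minimal sketch: the inner-node labels "u" and "v" and all function names are made up for illustration and do not come from EPA-ng, gappa or RAxML-NG. It lists the multi-tip clades obtained for each of the two root positions and marks which of them correspond to a real bipartition of the unrooted tree.

```python
from collections import defaultdict

# Unrooted four-taxon tree A + B | C + D; "u" and "v" are hypothetical
# labels for the two inner nodes joined by the only internode.
EDGES = [("A", "u"), ("B", "u"), ("u", "v"), ("C", "v"), ("D", "v")]
LEAVES = frozenset("ABCD")

def build_adjacency(edges):
    """Undirected adjacency map of the unrooted tree."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def tips_behind(adj, node, parent):
    """Tips reachable from node without stepping back to parent."""
    if node in LEAVES:
        return frozenset([node])
    tips = frozenset()
    for nxt in adj[node]:
        if nxt != parent:
            tips |= tips_behind(adj, nxt, node)
    return tips

def clades_when_rooted_on(adj, root_edge):
    """Clades with two or more tips if the root sits on root_edge."""
    clades = set()

    def walk(node, parent):
        tips = tips_behind(adj, node, parent)
        if len(tips) > 1:
            clades.add(tips)
        for nxt in adj[node]:
            if nxt != parent:
                walk(nxt, node)

    a, b = root_edge
    walk(a, b)
    walk(b, a)
    return clades

def is_trivial_split(clade):
    """True if clade | rest separates a single tip from everything else."""
    return len(clade) == 1 or len(LEAVES - clade) == 1

ADJ = build_adjacency(EDGES)
for root_edge in [("u", "v"), ("A", "u")]:
    for clade in sorted(clades_when_rooted_on(ADJ, root_edge), key=sorted):
        kind = "trivial split" if is_trivial_split(clade) else "bipartition"
        print(root_edge, "".join(sorted(clade)), kind)
```

Rooting on the internode yields the clades AB and CD, both backed by the one bipartition; rooting on the A-tip branch yields CD (same bipartition) and BCD, which only mirrors the trivial split A | BCD.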

In my experience, using EPA is the best possible way to establish and test outgroup-inferred roots. It should be standard in the phylogenetic literature whenever rooting an ingroup is still a matter of debate (there are ingroup roots that are signal-wise obvious and wouldn't even require inferring a tree at all).

The alternatives are
  1. to use as large an outgroup sample as possible and reinfer the total tree; the larger the outgroup, the lower the risk of introducing branching artefacts that can lead to a wrongly rooted tree (e.g. in the case of ingroup-outgroup long-branch attraction), or
  2. clock rooting or accordingly adapted non-symmetric substitution models; neither method requires adding outgroups at all.
Re Alternative 1, the one most commonly used to infer ingroup roots, a quick but essential test: if the ingroup(-only) branch supports are changed by adding outgroups to the matrix, then this is a direct indication of ingroup-outgroup branching artefacts; any outgroup changing ingroup branching patterns is highly problematic and should be removed from the outgroup sample.
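The quick test for Alternative 1 could be sketched roughly as follows. Everything here is an assumption for illustration: the taxon names and support values are invented, splits are given as one side of the bipartition mapped to its bootstrap support, and the `tolerance` of 10 points is an arbitrary allowance for bootstrap sampling noise. When two branches of the combined tree collapse onto the same ingroup split, the sketch simply keeps the higher support; a more careful analysis would prune the outgroups from the bootstrap trees and recount.

```python
def project_split(side, ingroup):
    """Reduce a split (given by one side) to the ingroup taxa.

    Returns None if the reduced split is trivial or empty.
    """
    a = frozenset(side) & ingroup
    b = ingroup - a
    if len(a) < 2 or len(b) < 2:
        return None
    return frozenset([a, b])

def project_supports(supports, ingroup):
    """Map each non-trivial ingroup split to its (highest) support."""
    projected = {}
    for side, support in supports.items():
        key = project_split(side, ingroup)
        if key is not None:
            projected[key] = max(support, projected.get(key, 0))
    return projected

def changed_supports(ingroup_only, combined, ingroup, tolerance=10):
    """Ingroup splits whose support differs by more than tolerance."""
    before = project_supports(ingroup_only, ingroup)
    after = project_supports(combined, ingroup)
    changed = {}
    for key in before.keys() | after.keys():
        old, new = before.get(key, 0), after.get(key, 0)
        if abs(old - new) > tolerance:
            changed[key] = (old, new)
    return changed

# Invented example: five ingroup taxa, two outgroups O1 and O2.
INGROUP = frozenset("ABCDE")
INGROUP_ONLY = {frozenset("AB"): 95, frozenset("DE"): 90}
COMBINED = {
    frozenset(["O1", "O2"]): 100,
    frozenset(["O1", "O2", "A"]): 60,
    frozenset(["O1", "O2", "A", "B"]): 70,
    frozenset("DE"): 88,
}

CHANGED = changed_supports(INGROUP_ONLY, COMBINED, INGROUP)
for key, (old, new) in CHANGED.items():
    label = " | ".join("".join(sorted(s)) for s in sorted(key, key=sorted))
    print(label, "support", old, "->", new)
```

In this made-up case only AB | CDE is flagged (95 versus 70), which would be the cue to inspect or drop the outgroups responsible.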

If you need a paper that used EPA to root a tree in a non-trivial case, check out fig. 3 in Liede-Schumann et al.: https://doi.org/10.7717/peerj.8999/fig-3
(PS: There's a related post on the former Genealogical World of Networks blog by D. Morrison for more background: https://phylonetworks.blogspot.com/2019/12/trees-informing-networks-explaining.html)

Cheers, Guido

Kenta Renard

May 7, 2024, 3:59:35 PM
to raxml
Dear Guido,

Thank you very much for your detailed and helpful response. If my understanding is correct, I will not need to bootstrap the new tree (with outgroup)? Would it be fine to leave the tree with 1 trivial split as we acknowledge that this does not reflect the true inferred result? 

Additionally, I assume we do not have to worry about any ingroup-outgroup long-branch attraction if the outgroup was placed a posteriori since it is not part of the originally inferred tree.

Best wishes,
Kenta

Grimm

May 8, 2024, 4:01:56 AM
to raxml
Mostly the answers to these questions are yes. 

For the visualisation as a rooted phylogram, one just roots the ingroup tree according to the root preferred by most queries and with the highest cumulative LWR across queries. If there are two alternatives, one could visualise them both in a collapsed form, i.e. collapse all subtrees that are independent of where we place the root, to focus on the aspects that differ. The branch support of a trivial split (a tip branch) is by definition 100.
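Picking the root branch from the placement result could look roughly like this. It is a sketch under assumptions: it reads a jplace-style structure (the JSON format EPA-ng writes) that has already been loaded into a Python dict, relies on the standard "edge_num" and "like_weight_ratio" field names, counts each placement entry as one query, and uses invented example values. The function name is hypothetical.

```python
from collections import defaultdict

def summarise_root_candidates(jplace):
    """Cumulative LWR per edge, and how many queries prefer each edge."""
    fields = jplace["fields"]
    edge_i = fields.index("edge_num")
    lwr_i = fields.index("like_weight_ratio")
    cumulative = defaultdict(float)
    preferred_by = defaultdict(int)
    for query in jplace["placements"]:
        rows = query["p"]
        for row in rows:
            cumulative[row[edge_i]] += row[lwr_i]
        # The edge with the highest LWR is this query's preferred root.
        best = max(rows, key=lambda row: row[lwr_i])
        preferred_by[best[edge_i]] += 1
    return dict(cumulative), dict(preferred_by)

# Invented example with three outgroup queries and two candidate edges.
EXAMPLE = {
    "fields": ["edge_num", "likelihood", "like_weight_ratio",
               "distal_length", "pendant_length"],
    "placements": [
        {"n": ["out1"], "p": [[4, -1000.0, 0.9, 0.01, 0.2],
                              [0, -1002.2, 0.1, 0.02, 0.2]]},
        {"n": ["out2"], "p": [[4, -1010.0, 0.6, 0.01, 0.3],
                              [0, -1010.4, 0.4, 0.02, 0.3]]},
        {"n": ["out3"], "p": [[0, -1020.0, 0.7, 0.02, 0.4],
                              [4, -1020.8, 0.3, 0.01, 0.4]]},
    ],
}

CUMULATIVE, PREFERRED_BY = summarise_root_candidates(EXAMPLE)
ROOT_EDGE = max(CUMULATIVE, key=CUMULATIVE.get)
print("cumulative LWR per edge:", CUMULATIVE)
print("queries preferring each edge:", PREFERRED_BY)
print("candidate root edge:", ROOT_EDGE)
```

If the edge with the highest cumulative LWR is not also the one preferred by most queries, that disagreement is itself worth reporting, and the two alternatives can be shown side by side as described above.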

IO-LBA (ingroup-outgroup long-branch attraction) will express itself in the EPA result for some queries: the most obvious case would be a queried outgroup that includes very distant and less distant taxa, where the very distant ones are placed on a long-branched ingroup tip, while the less distant ones are placed internally on neighbouring internodes.

There is one situation in which we might not escape or exclude IO-LBA, not even in the EPA result: when the ingroup includes one lineage or tip X that is genetically much drifted from the rest of the ingroup (I), and all available outgroups (O) are distant from X+I. Irrespective of whether X is nested in I or sister to I (evolutionarily speaking), EPA will preferably place the Os on the X root/tip, and a combined ingroup-outgroup tree will always favour an O + X | I split. In such a case, only clock rooting or asymmetric substitution models may reveal an alternative root, but I don't think anyone has explicitly tested that (I only know of this because I once asked Joe Felsenstein about it, a good decade ago).

Alexandros Stamatakis

May 11, 2024, 5:45:03 AM
to ra...@googlegroups.com
Dear Kenta,

Just to add to this:

> Thank you very much for your detailed and helpful response. If my
> understanding is correct, I will not need to bootstrap the new tree
> (with outgroup)? Would it be fine to leave the tree with 1 trivial split
> as we acknowledge that this does not reflect the true inferred result?

What I would do is to just remove the outgroup again and just put the
root on the branch of the ingroup tree to which the outgroup attached
with the highest placement probability. In addition to that, you may
also want to show (in a separate figure) how the outgroup placement
probabilities are distributed over the tree.

Finally, be careful when interpreting support values on rooted trees,
since how they are displayed can depend on the tree viewer; this paper
discusses this in great detail:

https://pubmed.ncbi.nlm.nih.gov/28369572/

>
> Additionally, I assume we do not have to worry about any
> ingroup-outgroup long-branch attraction if the outgroup was placed a
> posteriori since it is not part of the originally inferred tree.

Exactly.

Alexis

--
Alexandros (Alexis) Stamatakis

ERA Chair, Institute of Computer Science, Foundation for Research and
Technology - Hellas
Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.biocomp.gr (Crete lab)
www.exelixis-lab.org (Heidelberg lab)