Backbone constraint analysis


Nick

Apr 4, 2022, 5:03:05 PM
to raxml
There is a poorly supported node in my backbone constraint tree. Is it possible to specify that there is a three-way polytomy at this node and then do a constrained analysis in raxml?

Grimm

Apr 5, 2022, 8:52:10 AM
to raxml
Hi Nick,

first, you cannot have a poorly supported "node" in your backbone tree, only a poorly supported branch. When we establish e.g. bootstrap support, we count taxon bipartitions, which are represented by internodes in phylogenetic trees. When using a backbone constraint in RAxML, we do the same: we define taxon bipartitions.

Phylogenetically, it hence makes no sense to constrain a hard polytomy, because a phylogenetic tree cannot have one: it's a dichotomous graph in which a node is, by definition, the connective (vertex) of three internodes, not four.

Let's say you have four groups of taxa, A, B, C and D, the latter three making up your polytomy. The only possible backbone constraint would be A | B + C + D. RAxML would then infer only trees with the A | BCD split and optimise either AB | CD, AC | BD or AD | BC as the subsequent split. Rooting with A, following the prior constraint, either B, C or D would be sister to the other two.

In other words, if you want to resolve the BCD polytomy, you define each of the three splits as a backbone constraint in turn, and then use, e.g., a topology test to see whether one outperforms the other two or whether they are equally probable.
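As a minimal sketch of the three alternatives (an illustration only, not RAxML output): the taxon names a1 ... d2 and their group assignment are hypothetical placeholders. Each Newick string encodes the A | BCD split plus one of the three subsequent splits and leaves everything else unresolved, so each could serve as the backbone constraint for one separate run.

```python
from itertools import combinations

# Hypothetical taxon sets; replace with the tip labels of your own alignment.
groups = {
    "A": ["a1", "a2"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2"],
    "D": ["d1", "d2"],
}


def alternative_constraints(groups, outgroup="A"):
    """Return one Newick constraint per resolution of the remaining three groups."""
    inner = [name for name in groups if name != outgroup]
    constraints = {}
    for pair in combinations(inner, 2):
        # The group left over is sister to the constrained pair.
        sister = [name for name in inner if name not in pair][0]
        out_taxa = ",".join(groups[outgroup])
        sister_taxa = ",".join(groups[sister])
        pair_taxa = ",".join(groups[pair[0]] + groups[pair[1]])
        label = outgroup + sister + "|" + pair[0] + pair[1]
        # Unrooted reading: {pair} | rest and {sister + pair} | outgroup.
        constraints[label] = "(" + out_taxa + ",(" + sister_taxa + ",(" + pair_taxa + ")));"
    return constraints


constraints = alternative_constraints(groups)
for label, newick in constraints.items():
    print(label, newick)
```

Writing each string to its own file gives three constraint trees whose resulting best trees can then be compared in a topology test.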

Note that there are two kinds of poorly supported branches:
  1. indecisiveness: there's little signal in your data to resolve this part of the tree. Constraining one alternative wouldn't hurt, but wouldn't help either: there's simply no signal in the data to resolve this soft polytomy.
  2. internal signal conflict: e.g. some part of the data prefers an AB | CD split, another part an AC | BD split. You get low support because it's split between the two alternatives.
Depending on the signal in your matrix, your inferred tree may actually show an AD | BC split, although it's the worst-supported alternative of all three, because internal signal conflict may induce branching artefacts when you force incongruent genealogies into a single tree.

To see which is the case for your three-way polytomy, just read the RAxML bootstrap sample into SplitsTree and check out the consensus network.
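For a quick numerical look at the same question, here is a rough sketch (not a replacement for the consensus network) that counts how often each of the three competing splits occurs in a set of bootstrap trees. The tiny Newick reader is a simplification that assumes plain, unquoted tip labels, and the four toy trees stand in for a real bootstrap file.

```python
import re
from collections import Counter


def bipartitions(newick):
    """Return the non-trivial bipartitions of one Newick tree as frozensets."""
    tokens = re.findall(r"[(),;]|[^(),;]+", newick)
    stack, clades, leaves = [], [], set()
    prev = None
    for tok in tokens:
        tok = tok.strip()
        if not tok:
            continue
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            done = stack.pop()
            clades.append(frozenset(done))
            if stack:
                stack[-1] |= done
        elif tok not in (",", ";") and prev != ")":
            # A tip label, possibly followed by ":branch_length".
            name = tok.split(":")[0]
            stack[-1].add(name)
            leaves.add(name)
        # Tokens directly after ")" are support values or lengths: ignored.
        prev = tok
    splits = set()
    for clade in clades:
        other = frozenset(leaves - clade)
        if len(clade) > 1 and len(other) > 1:
            splits.add(frozenset((clade, other)))
    return splits


def split_support(trees, candidates):
    """Fraction of trees that contain each candidate split."""
    counts = Counter()
    for newick in trees:
        found = bipartitions(newick)
        for label, (left, right) in candidates.items():
            if frozenset((frozenset(left), frozenset(right))) in found:
                counts[label] += 1
    return {label: counts[label] / len(trees) for label in candidates}


# Toy stand-in for a bootstrap sample (one Newick string per replicate).
toy_trees = [
    "((A,B),(C,D));",
    "((A:0.1,B:0.2)90:0.05,C:0.1,D:0.3);",
    "((A,C),(B,D));",
    "(A,(C,(B,D)));",
]
candidates = {
    "AB|CD": ({"A", "B"}, {"C", "D"}),
    "AC|BD": ({"A", "C"}, {"B", "D"}),
    "AD|BC": ({"A", "D"}, {"B", "C"}),
}
support = split_support(toy_trees, candidates)
print(support)
```

Roughly speaking, two alternatives sharing most of the support would point to internal signal conflict, while support spread thinly over all three would point to indecisiveness.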

An example from bears: (a), (b) and (c) are genuine genealogies based on the Y chromosome (paternal lineages), mitochondrial coding gene regions (maternal lineages) and nuclear-encoded autosomal introns (inherited from both parents). Bears are solitary; the females tend not to migrate, the males like to. The species are only semi-biological species and their ancestors may not have been biological species at all (e.g. polar bears and brown bears produce interfertile offspring). All these trees are true trees but incompatible in some aspects.

If I combine the data, the support collapses for some branches (clades in Giant Panda-rooted trees), and I get the poorly supported false red clade: a sister relationship between Sloth Bear and Sun Bear that is not supported by any of the three individual data sets that were combined.


SchliepEtAl2017_fig4ad.jpg
And here's what the consensus network of the RAxML bootstrap replicate sample looks like for the combined data, with the individual supports from (a), (b) and (c) mapped.

SchliepEtAl2017_fig4.jpg

Nick

Apr 6, 2022, 2:32:48 PM
to raxml
My question is whether I can include the three-way polytomy in the constraint tree and then do a constrained analysis in RAxML.

Alexandros Stamatakis

Apr 6, 2022, 2:47:17 PM
to ra...@googlegroups.com
yes, constraint trees can be multi-furcating if you use the -g option.

However, I would strongly advise you to use the constraint tree option
implemented in RAxML-NG (https://github.com/amkozlov/raxml-ng), as there
was one rare bug in the RAxML constraint tree option that I never found
the time to fix.
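
A hedged sketch of what the two calls might look like, assembled in Python so they can be inspected before anything is run. The file names, run name, seed and substitution models are placeholders, and the executable name `raxmlHPC` depends on how RAxML was built on your machine; please check the flags against the manual of the version you actually use.

```python
import shlex


def raxml_constraint_cmd(alignment, constraint, run_name,
                         model="GTRGAMMA", seed=12345):
    """Standard RAxML: -g passes a (possibly multifurcating) constraint tree."""
    return ["raxmlHPC", "-s", alignment, "-g", constraint,
            "-m", model, "-p", str(seed), "-n", run_name]


def raxml_ng_constraint_cmd(alignment, constraint, prefix,
                            model="GTR+G", seed=12345):
    """RAxML-NG: --tree-constraint passes the constraint tree."""
    return ["raxml-ng", "--msa", alignment, "--model", model,
            "--tree-constraint", constraint,
            "--prefix", prefix, "--seed", str(seed)]


# Placeholder file names; nothing is executed here, the commands are only printed.
old_cmd = raxml_constraint_cmd("alignment.phy", "constraint.tre", "constrained")
ng_cmd = raxml_ng_constraint_cmd("alignment.phy", "constraint.tre", "constrained")
print(shlex.join(old_cmd))
print(shlex.join(ng_cmd))
```

Once the printed command looks right, it can be pasted into a shell or handed to `subprocess.run`.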

Alexis

--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Affiliated Scientist, Evolutionary Genetics and Paleogenomics (EGP) lab,
Institute of Molecular Biology and Biotechnology, Foundation for
Research and Technology Hellas

www.exelixis-lab.org

Grimm

Apr 7, 2022, 9:22:24 AM
to raxml
I think there's a semantic problem here (between biology and bioinformatics) :D

The multi-furcating constraint allows you to divide the taxon set a priori into n groups. I.e. you essentially start the analysis with an n-tipped star tree and reject all topologies that contradict this star tree. But it's not a hard polytomy we constrain, rather a shortcut: instead of constraining four compatible bipartitions (bifurcating constraints) A | BCD, B | ACD, C | ABD and D | ABC, I put them in one, A | B | C | D. Very handy if I know in which subtree most tips should end up but some should be free to move (as the constraint doesn't need to be comprehensive).
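
To make the shortcut concrete, a small sketch with made-up tip labels (a1 ... d2 are placeholders): it builds the star-shaped constraint as a Newick string and lists the bipartitions it stands for, one per group.

```python
# Hypothetical groups; tips left out of the constraint would stay free to move.
groups = {
    "A": ["a1", "a2"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2"],
    "D": ["d1", "d2"],
}


def star_constraint(groups):
    """One multifurcating Newick tree with each group as an unresolved clade."""
    clades = ["(" + ",".join(taxa) + ")" for taxa in groups.values()]
    return "(" + ",".join(clades) + ");"


def implied_bipartitions(groups):
    """The split each group defines against all other constrained tips."""
    all_taxa = {tip for taxa in groups.values() for tip in taxa}
    return {name: (set(taxa), all_taxa - set(taxa))
            for name, taxa in groups.items()}


star = star_constraint(groups)
splits = implied_bipartitions(groups)
print(star)
for name, (inside, outside) in splits.items():
    print(name, sorted(inside), "|", sorted(outside))
```

The root of that star is where the search is still free: the order in which the four groups join each other is left to the ML search.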

But if I understand Nick's need, he wants to constrain a hard trichotomy: a topology where three (or more) clades root in the same polytomy. Picturing it:

Constraints.png

This is something -g cannot do, at least it says so in the manual :) "Finally also note that, any multi-furcations in the input tree will be resolved via a maximum likelihood search."

And (sorry to repeat myself): unless you have actual ancestors and their descendants in your data set (such as a sequence of mutating viruses sampled at day 1, 10, 100 etc.), constraining a hard polytomy normally makes no bio-phylogenetic sense. This follows not only from the basic concept of ML tree inference (modelling dichotomous evolution: each ancestor is replaced by exactly two descendants) but also from the fact that poor branch support is no criterion for concluding that you are looking at an evolutionary scenario like the one in the bottommost part: a fast-radiating, widespread, genetically increasingly heterogeneous ancestral species ('Precursor Z') splitting up because of range fragmentation into a series of descendant species (A, Z', G, H). Such an event would need to be represented in the evolutionary tree as a hard polytomy, but I wouldn't be able to infer this directly using ML tree inference, only indirectly (and only if the modern day is still close to that polytomous speciation event).

NoAncestorsNoFun2.png


@Nick, towards what end do you feel it is necessary to constrain?

There may be workarounds (within the obligatory dichotomy of phylogenetic trees), but whether a constraint is really worth pondering depends on the purpose (e.g. character mapping, evidence for non-dichotomous splitting) and also on whether the unsupported branches that you would like to collapse are of near-zero length or not. Maybe you can post the unconstrained tree. It doesn't need to be the full one or have labelled tips, just a phylogram (i.e. branches with proportional length) marking the branches you would like to collapse into a polytomy and giving the branch support values for the critical branches (the phylogenetic neighbourhood).

/G