Soft topological constraints


Ziv Lieberman

Oct 6, 2024, 9:56:12 AM
to revbayes-users
Hi all,
Are clade constraints as passed to dnConstrainedTopology "hard" or "soft" constraints? That is, does defining a clade only mean that the group is monophyletic (so that an unconstrained terminal could be sampled from within it), or that it is monophyletic and exclusive (unconstrained terminals cannot be sampled within it)? The context is including fossil taxa in FBD that can only be assigned to the total group. I would like to incorporate these to help inform fossil sampling rates, but allow them to occur anywhere in the tree, not just at the stem or crown nodes. If this isn't the way clade() and dnConstrainedTopology work, is there a way to implement it?
Thanks!
-Ziv

David Černý

Nov 5, 2024, 4:19:29 PM
to revbayes-users
Hi Ziv,

Apologies for the late reply! dnConstrainedTopology() actually takes two types of arguments: constraints, which should be a vector of Clade objects, and backbone, which should be a Tree object. Clade constraints are hard, so if you enforce a clade consisting of A, B, C, and D, no other taxon from your taxon set can ever end up inside of it. On the other hand, backbone constraints are soft, so if you enforce the topology (A, (B, (C, D))), you're making sure that C and D will always be more closely related to each other than to B, and more closely related to B than to A, but taxa other than those four (E, F, G, ...) can end up in any position.
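To make the hard/soft distinction concrete, here is a small Python sketch. It is not RevBayes code and not how RevBayes implements either check. It assumes a rooted, fully resolved backbone, with trees written as nested tuples of tip labels, and all function names are hypothetical. A clade constraint is met only if some node has exactly the listed tips. A backbone is met if pruning the tree down to the backbone taxa reproduces the backbone's clades.

```python
def leaves(tree):
    """Tip labels below a node; a str is a tip, a tuple is an internal node."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def clusters(tree):
    """Tip sets of every internal node (including the root)."""
    if isinstance(tree, str):
        return []
    found = [frozenset(leaves(tree))]
    for child in tree:
        found.extend(clusters(child))
    return found


def satisfies_clade(tree, clade):
    """Hard constraint: some node has exactly these tips and no intruders."""
    return frozenset(clade) in set(clusters(tree))


def restrict(tree, keep):
    """Prune tips not in `keep` and collapse the resulting one-child nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [r for r in (restrict(child, keep) for child in tree) if r is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]
    return tuple(kids)


def satisfies_backbone(tree, backbone):
    """Soft constraint: the tree pruned to the backbone taxa matches the backbone."""
    pruned = restrict(tree, leaves(backbone))
    if pruned is None:
        return False
    return set(clusters(pruned)) == set(clusters(backbone))


if __name__ == "__main__":
    backbone = ("A", ("B", ("C", "D")))
    # E has slipped inside the (C, D) group.
    tree = ("A", ("B", (("C", "E"), "D")))
    print(satisfies_backbone(tree, backbone))         # True
    print(satisfies_clade(tree, {"B", "C", "D"}))     # False: E intrudes
```

Pruning before comparing is what makes the backbone soft. E can sit anywhere without changing the pruned tree, but it breaks the exclusive clade {B, C, D}.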

In practice, it is unfortunately quite difficult to get FBD+backbone analyses running: see issue #384. However, if you are able to craft a viable starting tree, you should still be able to do it.


Best,
--
David Černý 

Ziv Lieberman

Nov 5, 2024, 6:08:50 PM
to David Černý, revbayes-users
Hi David,
Thanks for the information! This definitely helps. I'm curious how the backbone constraint approach works when I am using a fixed, previously inferred tree for extant taxa with several fossils assigned to subclades and others to the total clade only. Would I basically take the Newick representation of my extant topology, add fossil tips to the subclades, and then leave the total clade fossils out of the backbone string entirely?
Thanks!
-Ziv


David Černý

Nov 5, 2024, 11:24:45 PM
to revbayes-users
Hi Ziv,

In the situation you're describing, my instinct would be to use both the constraints and backbone arguments. You'd use your extant tree as a backbone to fix the relationship among the living taxa, and then add individual clade constraints to further constrain the positions of your fossils. For example, if a particular subclade contains extant taxa A, B, C, and fossils F and G, you would enforce the monophyly of a group containing all 5 of these taxa. The extant taxa within each subclade would have to be listed exhaustively – otherwise, you'd get a conflict between your clade constraints and your backbone constraint.
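The requirement that extant taxa be listed exhaustively can be expressed as a simple check: the extant members of each clade constraint must form exactly one clade of the backbone. If a clade constraint has at most one backbone taxon, it cannot conflict, because the fossils can attach along that taxon's branch. Below is a minimal Python sketch. It assumes a rooted backbone written as nested tuples, the function names are hypothetical, and RevBayes does not provide this check.

```python
def tips(tree):
    """Tip labels below a node; a str is a tip, a tuple is an internal node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(tips(child) for child in tree))


def clusters(tree):
    """Tip sets of every internal node of the backbone, including the root."""
    if isinstance(tree, str):
        return set()
    found = {tips(tree)}
    for child in tree:
        found |= clusters(child)
    return found


def conflicts_with_backbone(clade, backbone):
    """True if the clade's backbone (extant) taxa are not a backbone clade."""
    extant = frozenset(clade) & tips(backbone)
    if len(extant) <= 1:
        return False  # zero or one extant taxon: fossils can attach there
    return extant not in clusters(backbone)


if __name__ == "__main__":
    backbone = ((("A", "B"), "C"), ("D", "E"))
    print(conflicts_with_backbone({"A", "B", "C", "F", "G"}, backbone))  # False
    print(conflicts_with_backbone({"A", "C", "F"}, backbone))            # True
```

The second constraint conflicts because it would force A and C together without B, while the backbone places A and B closer to each other than to C.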

Finally, those fossils that can only be assigned to the total clade really complicate things! You definitely want to leave them out of the backbone string, but one disadvantage of the solution above is that they would no longer be allowed to slip inside the constrained subclades: if a clade is specified to only contain A, B, C, F, and G, then total-group fossils X and Y can never become part of it. One thing that might help here (though I've never used it myself) is the optional_match argument. You should be able to do the following to specify alternative compositions for a clade constraint, though it might be a bit laborious:

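# Alternative compositions for subclade 0: the core taxa A, B, C, F, G
# alone, plus total-group fossil X, plus Y, or plus both X and Y
constr0_0 = clade( "A", "B", "C", "F", "G" )
constr0_1 = clade( "A", "B", "C", "F", "G", "X" )
constr0_2 = clade( "A", "B", "C", "F", "G", "Y" )
constr0_3 = clade( "A", "B", "C", "F", "G", "X", "Y" )
# Combine the alternatives so that matching any one of them satisfies the constraint
constr0 = clade( constr0_0, constr0_1, constr0_2, constr0_3, optional_match=true )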


Here, only the final variable, constr0, would make it into the list of clade constraints that you pass to dnConstrainedTopology. This should let you optionally include one or both of X, Y in your constrained subclade.

There are two problems I can foresee:

(1) If you have many such total-group fossils, then the number of alternative constraints explodes and this quickly becomes impractical;
(2) Even if that proves not to be an issue, I can see such a heavily constrained analysis being pretty fragile, though again, you should hopefully be able to get it running if you specify a good starting tree.
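Problem (1) is easy to quantify. With k total-group fossils that may or may not fall inside a given subclade, every subset of them is a separate alternative, so that one constraint needs 2^k alternative clades. For 20 such fossils, that is 1,048,576 alternatives. The Python sketch below (a hypothetical helper, not part of RevBayes) writes out the Rev statements in the same form as the optional_match snippet above. For fossils X and Y it reproduces constr0_0 through constr0.

```python
from itertools import combinations


def optional_match_constraint(name, core, rogues):
    """Rev lines for one clade constraint that may optionally absorb any
    subset of the `rogues` (total-group fossils) on top of the `core` taxa."""
    lines, alt_names = [], []
    k = 0
    for size in range(len(rogues) + 1):
        for subset in combinations(rogues, size):
            alt = f"{name}_{k}"
            taxa = ", ".join(f'"{t}"' for t in list(core) + list(subset))
            lines.append(f"{alt} = clade( {taxa} )")
            alt_names.append(alt)
            k += 1
    # Final statement: the constraint that is passed to dnConstrainedTopology.
    lines.append(f"{name} = clade( {', '.join(alt_names)}, optional_match=true )")
    return lines


if __name__ == "__main__":
    for line in optional_match_constraint("constr0", ["A", "B", "C", "F", "G"], ["X", "Y"]):
        print(line)
    # Number of alternative clades per constraint grows as 2**k.
    for k in (2, 10, 20):
        print(k, 2 ** k)
```

This generator only removes the typing. It does not address problem (2), and it has not been tested whether RevBayes copes with constraints this large.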


Best,
--
David Černý 

Ziv Lieberman

Nov 6, 2024, 5:45:11 PM
to David Černý, revbayes-users
Hi David,
I see, thank you for the detailed explanation. Unfortunately I have a tree of about 130 extant terminals, and around 100 fossils, many of which can only be assigned to the total clade (or total subclades). It sounds like there may not be a way to implement this analysis in Rev without severely pruning the fossils or accepting exclusive monophyly.
Best,
Ziv

David Černý

Nov 6, 2024, 6:13:52 PM
to revbayes-users
Yes, I'm afraid so. It'd be easy to run an analysis with a fixed extant backbone and fossils that are free to move within specified constraints; it would also be easy to run an analysis with a fixed extant backbone where the fossils are completely unconstrained and free to attach to the backbone anywhere. But the fact that you have both classes of fossils complicates things greatly – at that point, you are basically dealing with three different levels of phylogenetic imprecision.

You might be able to implement your analysis in BEAST 2 with its MRCAPriorWithRogues distribution, or perhaps in MrBayes, but I wouldn't bet on it.

Benjamin Redelings

Nov 6, 2024, 6:40:51 PM
to revbaye...@googlegroups.com

I'm thinking about adding pseudodata for tree constraints, so that instead of making the constraint part of the distribution, you would multiply the likelihood by a factor F for each clade that is violated. If F = 0, it would be a hard constraint.

Under this paradigm we could (in theory) add the ability to handle clades that don't mention all taxa: for example, separate A B | E F but allow C and D to go on either side. These are called "partial splits".

The question then would be how to recalculate quickly whether or not the clade applies. One way of doing that would be to create a fake character with A = 0, B = 0, E = 1, F = 1, C = ?, D = ?; if the parsimony score is > 1, then the constraint is violated.
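Here is a rough Python sketch of this proposal. It assumes rooted binary trees written as nested tuples and interprets F as a multiplicative likelihood factor applied once per violated split, so each violation adds log F to the log-likelihood. Each partial split is a 0/1 character with ? for unmentioned taxa. Fitch parsimony counts the minimum number of changes, and a score above 1 marks a violation. None of this exists in RevBayes; the names are hypothetical.

```python
import math


def fitch(tree, states):
    """Fitch parsimony on a rooted binary tree of nested tuples.
    `states` maps tip -> 0 or 1; tips not listed are '?' (either state).
    Returns (possible root states, minimum number of changes)."""
    if isinstance(tree, str):
        s = states.get(tree, "?")
        return ({0, 1} if s == "?" else {s}), 0
    left, right = tree
    ls, lc = fitch(left, states)
    rs, rc = fitch(right, states)
    common = ls & rs
    if common:
        return common, lc + rc
    return ls | rs, lc + rc + 1


def log_constraint_factor(tree, splits, factor):
    """Add log(factor) per violated partial split; factor = 0 makes it hard."""
    n_violated = sum(1 for split in splits if fitch(tree, split)[1] > 1)
    if n_violated == 0:
        return 0.0
    if factor == 0:
        return -math.inf  # log(0): the tree is excluded outright
    return n_violated * math.log(factor)


if __name__ == "__main__":
    split = {"A": 0, "B": 0, "E": 1, "F": 1}  # A B | E F, C and D free
    ok = ((("A", "C"), "B"), (("E", "D"), "F"))
    bad = (("A", "E"), (("B", "C"), ("D", "F")))
    print(fitch(ok, split)[1], fitch(bad, split)[1])   # 1 2
    print(log_constraint_factor(bad, [split], 0.0))    # -inf
```

Because the parsimony score does not depend on where the tree is rooted, this check treats partial splits as unrooted. A faster implementation would update the Fitch state sets incrementally after each tree move rather than recomputing them from scratch.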

So that is kind of how it could be implemented in theory, but in practice programming this will take more time.  Possibly 6 months to a year.

-BenRI
