Pruning algorithm in FBD with sampled ancestors

67 views
Skip to first unread message

Taraka Rama K.

unread,
Apr 16, 2018, 1:59:48 AM4/16/18
to revbayes-users
Hi,

I am computational linguist working in applying and trying to understand phylogenetic methods to language.

I was trying to understand how to apply pruning algorithm to a FBD tree when an internal node is sampled. In case of languages, a sampled ancestor is an ancient language like Latin which has modern Romance languages as extant descendants. In linguistic phylogenetics, we typically work with binary data. 

I am writing my understanding of pruning algorithm applied to a FBD tree. If Latin is an internal node with one incoming edge and only one outgoing edge, then, the likelihood vector at the Latin node is calculated as usual as if the site's state is unobserved. Then, the likelihood vector is multiplied by the observed vector of Latin. Observed means the observed sequence in the nexus file.

For instance, lets say the observed Latin value at a site (or column in the alignment matrix) is 1. This means that the character state vector for Latin at that point is [0, 1].

Lets say, the computed likelihood vector at the Latin node for the states 0, 1 is [x, y] where x and y are real numbers usually less than 1. Then, an element-wise multiplication of both the observed and the modified likelihood vectors is [0*x, 1*y] which is [0, y]. If the site has missing character (encoded as [1, 1]) for Latin, then, the elementwise multiplication would be [x, y] since multiplying by [1, 1] does not change the computed likelihood vector. Is this the way the pruning algorithm is applied to a FBD tree with sampled ancestors?

Best,
Taraka.

Sebastian Höhna

unread,
Apr 19, 2018, 8:28:49 AM4/19/18
to Taraka Rama K., revbayes-users
Hi Taraka,

our implementation in RevBayes goes around this problem using a simple trick: we do not actually treat sampled ancestors as that but instead as fossil tip nodes with a branch lengths of 0.0. Therefore, all of our trees in RevBayes are still binary and we never have observed sequences for "ancestral nodes". So the standard pruning algorithm from Felsenstein can be applied.

If you really want an algorithm that uses sampled ancestor nodes which have a degree of 2 (only one incoming and one outgoing branch) and observed sequences for these nodes, then your computation has to break the likelihood computation separate, independent components, for example, one component is the tree on one side of the latin node and the second component is the tree at the other side of the latin node. For each component you perform the standard Felsenstein pruning algorithm separately and multiply the overall probabilities together at the end. The important part is that you treat the observed values of the latin node as the root frequencies for the subtree that is tip-wards from the latin node. For the subtree that is root-wards of the latin you initialize the likelihood vector of the latin node as one usually does for tips, i.e., setting the likelihoods to [0,1] for observed or [1,1] for unobserved characters in your case. 

This is of course just a very rough sketch of how one could implement such an algorithm but I hope it helps you to get an idea. As I said above, we get around the problem by enforcing trees to be strictly binary (with the possible exception of the root node) and simply using branch lengths of 0.0. 


Best,
Sebastian


--
You received this message because you are subscribed to the Google Groups "revbayes-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to revbayes-users+unsubscribe@googlegroups.com.
To post to this group, send email to revbayes-users@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/revbayes-users/bbfde028-3ae7-4dd3-81ce-ad5a575a3e45%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply all
Reply to author
Forward
0 new messages