Conditioning on rho and/or the root for a wholly extinct phylogeny

David Černý

Jul 31, 2018, 6:13:39 PM
to beast-users
Hi everyone,

I'm trying to estimate divergence times within a clade that includes no extant species (the youngest representatives are 66 Ma old) using BEAST 2.4.8 plus the MM and SA packages, and I've been wondering how to best assign hyperpriors to the Fossilized Birth-Death model. Most studies (in fact, all I've been able to find so far) seem to condition the model on rho, but these usually include a mix of extinct and extant species. There also appear to be several conflicting interpretations of rho in use. Going back to Gavryushkina et al. 2014 (p. 4, eq. 2), rho is described as the probability of sampling at t = 0. However, should the t here be interpreted as root height, or rather as absolute time? Currently, I'm considering the following options:

(1) Condition on sampling rather than rho and estimate the latter (then again, what would that estimate mean?)

(2) Condition on rho and set it to 0, since the clade in question has no extant representatives

(3) Condition on rho and set it to my best estimate of the probability of sampling at 66 Ma

Which one of these would be the most appropriate?

A related question is whether I should condition the FBD process on the root or on the origin, since this is often discussed in terms of allowing fossils to be placed in the stem group of the clade formed by all extant tips included in the analysis. However, my analysis doesn't have any extant taxa in it, so the stem/crown distinction doesn't apply. I'd like to be able to allow for the possibility that the root is a sampled ancestor, but on the other hand, I have no good prior to place on the time of origin other than the age of the oldest fossil in the dataset.

Thanks,

David

David Černý

Aug 1, 2018, 5:13:23 AM
to beast-users
I did some more research on this, and it seems that the problem has been tackled in different ways in the literature:

-- Cau (2017, PeerJ) set rho equal to 0, which corresponds to option (2) above.
-- Matzke & Irmis (2018, PeerJ) don't mention any criteria for selecting rho, but they do say in the supplement that they subtracted the age of the youngest tip from all other tip ages in the XML, which corresponds to option (3).

Again, which of these is preferable? 

There is an old article on the BEAST 2 website by Remco Bouckaert that seems to describe an early version of the SA-FBD model (the parameterization differs from what BEAUti currently offers) and says that setting rho to 0 makes the FBD parameters unidentifiable. The article also says that rho should be set close to 1 for fossil phylogenies but "is not specified" for serially sampled data, which is confusing: aren't fossils serially sampled, too?

Any help would be much appreciated.

Alexandra Gavryushkina

Aug 1, 2018, 10:32:06 PM
to beast-users
In the current implementation, I would agree with Remco: do not specify rho, do not specify the origin time, and set conditionOnRoot to true. A better way (not currently available) would be to set rho to one and also include the time of the most recent sample in the inference. Currently, even though the input file contains the times of all fossil samples, the ages of the nodes in the estimated tree are shifted for internal calculations by the time of the most recent sample, so that the age of the most recent sample becomes zero and its true value is not actually used.
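The shift described above can be illustrated with a minimal Python sketch. This is not BEAST code; the function name `shift_tip_ages` and the taxon names and ages are hypothetical and serve only to show the arithmetic of subtracting the youngest sample's age from every tip age.

```python
def shift_tip_ages(tip_ages):
    """Subtract the age of the youngest sample from every tip age.

    tip_ages maps a taxon name to its age (e.g. in Ma before present).
    After the shift, the youngest sample has age 0 and all other ages
    are expressed relative to it.
    """
    youngest = min(tip_ages.values())
    return {name: age - youngest for name, age in tip_ages.items()}


# Hypothetical wholly extinct clade whose youngest fossil is 66 Ma old.
ages = {"taxon_A": 66.0, "taxon_B": 72.5, "taxon_C": 80.0}
shifted = shift_tip_ages(ages)
print(shifted)
```

Once the ages are shifted, the absolute age of the youngest fossil (66 Ma in this made-up example) no longer enters the calculation, which is the point made in the post.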

Alexandra Gavryushkina

Aug 1, 2018, 10:36:08 PM
to beast-users
Also, do not use the old version of the MM package, as it has a bug. You will need BEAST 2.5 and MM 1.1.1.

David Černý

Aug 2, 2018, 3:13:43 AM
to beast-users
Thanks a lot, Alexandra, that's very helpful! Just to make sure, would I be right in thinking that if rho is not specified, I should condition on sampling instead to ensure parameter identifiability?

David Černý

Aug 2, 2018, 6:25:47 AM
to beast-users
And one last question: when you say "do not specify rho", in terms of the BEAUti settings, would that mean:

(1) Do not fix it, do not condition on it, estimate it;

(2) Fix it, do not condition on it, do not estimate it;

(3) Manually remove all references to rho from the resulting XML?

Alexandra Gavryushkina

Aug 2, 2018, 6:18:58 PM
to beast...@googlegroups.com
Hi David, 

Conditioning on sampling will not make the parameters identifiable, so you will still need a good prior on at least one of them or, better, on more than one. Conditioning on sampling is most important for simulation studies. For a real data analysis, you should decide for yourself whether to condition on sampling or not (see a similar discussion here: http://bayesiancook.blogspot.com/2014/11/should-we-condition-on-non-extinction.html). In your case, instead of conditioning on non-extinction, you want to condition on at least one fossil being preserved.

To leave rho unspecified, follow option (3): remove any mention of rho from the XML.
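Option (3) could be automated along the following lines. This is only a rough sketch using Python's standard `xml.etree.ElementTree`, and it rests on assumptions that are not confirmed in this thread: that rho is passed to the tree prior as a `rho="..."` attribute, and that the parameter, its prior, its operators, and its log entries all have ids or references beginning with `rho`. The function name `strip_rho` and the ids in the example are hypothetical, so check them against your own file.

```python
import xml.etree.ElementTree as ET


def strip_rho(xml_text, prefix="rho"):
    """Return xml_text with rho inputs and rho-related elements removed."""
    root = ET.fromstring(xml_text)

    # Pass 1: drop the rho="..." input from any element that carries it,
    # so that the tree prior itself is kept.
    for elem in root.iter():
        elem.attrib.pop("rho", None)

    def refers_to_rho(elem):
        # True for <... name="rho">, or for any attribute value such as
        # id="rho...", idref="rho..." or parameter="@rho...".
        if elem.get("name") == "rho":
            return True
        return any(value.lstrip("@").startswith(prefix)
                   for value in elem.attrib.values())

    # Pass 2: remove child elements that define or reference rho.
    for parent in list(root.iter()):
        for child in list(parent):
            if refers_to_rho(child):
                parent.remove(child)

    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily simplified fragment (not a complete BEAST XML).
example = """
<beast>
  <state>
    <parameter id="rhoFBD.t:tree" name="stateNode">1.0</parameter>
    <parameter id="diversificationRateFBD.t:tree" name="stateNode">1.0</parameter>
  </state>
  <distribution id="FBD.t:tree" rho="@rhoFBD.t:tree" conditionOnRoot="true"/>
  <operator id="rhoScaler" parameter="@rhoFBD.t:tree"/>
  <log idref="rhoFBD.t:tree"/>
</beast>
"""
print(strip_rho(example))
```

The prefix match is deliberately broad, so it is worth diffing the output against the original file and confirming by hand that only rho-related lines were removed before running the analysis.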

Cheers,
Sasha. 

--
Best regards,
Dr. Alexandra "Sasha" Gavryushkina
Postdoctoral fellow
Department of Biochemistry
University of Otago
Dunedin, New Zealand