coefficient of variation 1.008; too non-clock-like?


Keir Wefferling

Mar 6, 2017, 10:48:43 AM
to beast-users
Hi all, 

I'm estimating divergence times for an order-level phylogeny (Ranunculales) using both nuclear ribosomal and chloroplast data. I have 9 data partitions, linked clocks and trees across the chloroplast regions, bModelTest for the site models, a lognormal relaxed clock model, a birth-death tree prior, uniform (0,1) ucldMean priors, and 10 uniform monophyletic node calibrations.
I'm getting ESSs well over 200 for almost all parameters in a single 2 million generation run (sampling every 3000). The ESS for the chloroplast clock ucldMean is between 100 and 200 for individual runs, and, most concerning, the rate coefficient of variation for the chloroplast clock is 1.008 (0.924 for nuclear ribosomal 26S).
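For concreteness, these summaries can be recomputed directly from the .log file; here is a rough sketch (the log file name and the column names are hypothetical and would need to be checked against the actual log header):

    import numpy as np
    import pandas as pd

    def ess(x):
        # Crude effective sample size: N / (1 + 2 * sum of autocorrelations),
        # truncated at the first non-positive lag (a standard initial-positive-
        # sequence estimator; Tracer's exact method may differ slightly).
        x = np.asarray(x, dtype=float)
        n = len(x)
        xc = x - x.mean()
        acf = np.correlate(xc, xc, mode="full")[n - 1:] / (np.arange(n, 0, -1) * xc.var())
        rho_sum = 0.0
        for rho in acf[1:n // 2]:
            if rho <= 0:
                break
            rho_sum += rho
        return n / (1.0 + 2.0 * rho_sum)

    # BEAST logs are tab-separated with '#' comment lines before the header
    log = pd.read_csv("Unl9pUni5.log", sep="\t", comment="#")
    post = log.iloc[len(log) // 10:]   # discard the first 10% as burn-in
    for col in ["ucldMean.cp", "coefficientOfVariation.cp"]:   # hypothetical labels
        print(col, "mean =", post[col].mean(), "ESS =", round(ess(post[col])))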

Can I place any confidence in these estimates, or are the data too non-clock-like to provide reliable estimates of divergence times?
I will appreciate any input on how to proceed.

Thanks so much, and best wishes, 

Keir
Attachments: Screen Shot 2017-03-06 at 9.27.39 AM.png, Unl9pUni5.xml

Santiago Sánchez

Mar 6, 2017, 11:04:17 AM
to beast-users
Hi Keir,

I think you mean 20 million generations, right? Even if your data are non-clock-like, divergence times can be estimated accurately; in fact, that is the whole point of using a relaxed clock model. On the other hand, if your data were very non-clock-like and you were using a strict clock model, divergence times would certainly be unreliable. In my experience, parameters with 100 < ESS < 200 don't make much of a difference. However, you could try doubling your chain length to 40 million if you want to be absolutely thorough.
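If you want to extend the chain without redoing the BEAUti setup, something like this should work (an untested sketch; it assumes the usual BEAUti-generated layout where chainLength sits as an attribute on the <run> element):

    import xml.etree.ElementTree as ET

    tree = ET.parse("Unl9pUni5.xml")
    run = tree.getroot().find("run")   # the MCMC element in BEAUti-generated files
    run.set("chainLength", str(2 * int(run.get("chainLength"))))
    tree.write("Unl9pUni5_long.xml")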

Cheers,
Santiago


--
==========================
Santiago Sanchez-Ramirez, PhD
Postdoctoral Associate
Ecology and Evolutionary Biology
University of Toronto
==========================

Keir Wefferling

Mar 6, 2017, 11:28:13 AM
to beast-users
Hi Santiago, 

Thanks so much for the reply!
Good catch on the number of generations... and I actually ran it for 200,000,000 generations per run. I agree it couldn't hurt to run it longer.

I realize now that I should have given more background on my concerns about the high (>1.0) coefficient of variation for the chloroplast clock.
In Drummond and Bouckaert's (2015) book "Bayesian Evolutionary Analysis with BEAST," they state that "If S or cv is greater than 1, then data are very non-clock-like and probably shouldn't be used for estimating divergence times" (p. 144). I suppose I'm just wondering what the current thinking is on using 1.0 as a cut-off... or if I'm simply misinterpreting what they said.
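For what it's worth, if the branch rates really do follow a lognormal, the two quantities in that quote are tied together by the standard lognormal identity cv = sqrt(exp(S^2) - 1), so cv = 1 corresponds to S of about 0.83 rather than S = 1. A quick check:

    import math

    def cv_from_S(S):
        # CV of a lognormal whose log has standard deviation S
        return math.sqrt(math.exp(S * S) - 1.0)

    print(cv_from_S(math.sqrt(math.log(2.0))))  # S ~ 0.833 -> cv = 1.0
    print(cv_from_S(1.0))                       # S = 1.0   -> cv ~ 1.31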

Thanks again, and best, 
Keir

Alexei Drummond

Mar 6, 2017, 6:28:55 PM
to beast...@googlegroups.com
Dear Keir,

I wouldn’t characterise S > 1.0 as a cutoff value per se. If that is the only gene you have, then it is the best you can do. But be mindful that values of S >> 1.0 suggest that the time signal in your data is quite noisy, and therefore your estimated divergence times will probably be quite sensitive to prior assumptions (such as which type of relaxed molecular clock you use, which tree prior you use, what your calibrations are, et cetera).

Santiago is correct that the point of relaxed clock models is to allow analysis of data that do not conform to a strict molecular clock. However, the extent to which you can trust the results depends on your belief that the relaxed clock model you use adequately accounts for the evolutionary process that generated your data. The caution we gave in the book is that when the data get very far from a strict molecular clock (e.g., in my experience, S >> 1.0), any discrepancies between the model assumptions and the actual process may be amplified, and we think one is more likely to make inference errors since the priors will come into play more. It is hard to be concrete about this sort of caution/advice, since there are many reasons that data can be non-clock-like.

But we did not mean to suggest that if S > 1.0 you shouldn’t use the data. Instead, I would suggest being cautious about your conclusions and investigating their sensitivity to your priors.
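One concrete way to do that is to rerun with alternative priors (e.g. a different tree prior) and compare the marginal posteriors of the node ages you care about. A sketch (the file and column names are placeholders):

    import numpy as np
    import pandas as pd

    def hpd95(samples):
        # Narrowest interval containing 95% of the posterior samples
        s = np.sort(np.asarray(samples, dtype=float))
        k = int(np.ceil(0.95 * len(s)))
        widths = s[k - 1:] - s[:len(s) - k + 1]
        i = int(np.argmin(widths))
        return s[i], s[i + k - 1]

    for f in ["birthdeath_prior.log", "yule_prior.log"]:   # placeholder file names
        log = pd.read_csv(f, sep="\t", comment="#")
        age = log["mrcatime(Ranunculales)"].iloc[len(log) // 10:]   # placeholder column
        lo, hi = hpd95(age)
        print(f, "mean age =", round(age.mean(), 2), "95% HPD =", (round(lo, 2), round(hi, 2)))

If the node-age posteriors barely move across these runs, the data are driving the estimates; if they shift substantially, the priors are.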

Cheers
Alexei

Keir Wefferling

Mar 6, 2017, 8:00:31 PM
to beast-users
Hi Alexei, 

Thanks so much for the reply! 
This is good news, and I appreciate the suggestion of investigating the effect of different priors. I'll play with these data some more.

All the best, and thanks again, 
Keir