Convergence and ESS values with DPPDIV


Alexandre Pedro

May 25, 2015, 11:02:21 PM
to dppdiv...@googlegroups.com
Dear Tracy and all,

DPPDIV is really awesome, but I have some questions about its performance on large datasets. I am using DPPDIV to date a tree with over 1000 species and almost 15 kb. The program is really fast and the results agree with the expected dates (in tune with recent independent sources), but some of the values in the .out file do not reach ESS > 200; in fact, some of them are below 100. (I ran with the default configuration in CIPRES: 1 million cycles, gamma shape 2, gamma omrates 4, hyperprior gamma 2, cbd tree prior, and three uniform, soft-bounded calibrations at deeper nodes.)

I understand that for a dataset this size it might be too much to ask for high ESS values across the board, but how much of an issue is this in such cases? I assume that at least the likelihood, the root age, and the priors on the calibrated nodes should always be > 200 (which is true in my case), but what about the other (uncalibrated) nodes? Is it more important to have all values greater than 200, or to check that multiple runs converge to the same overall dates for the tree (or, ideally, both)?

In the paper by Arcila et al. (2015, Mol. Phylo. Evol.), the authors comment that ESS values for the DPPDIV analyses were under 200 for many nodes, but the DPPDIV dates were quite consistent with the results of the other programs used in that study. My concern with my data (from my modest experience with large datasets) is that it may not reach 'good' ESS values in any other Bayesian dating program either, since they were not designed for this scale.

I wondered whether switching the tree prior to Yule, or adopting minimum exponential calibration priors, could help the ESS values. I am running these alternative analyses right now, and I will report back as soon as I have the results. But I would really like to hear from you about these ESS issues; your answers would help not only with these analyses but with my training as a whole.

Thanks a lot!!

Cheers!

Alex

Tracy Heath

May 26, 2015, 12:08:35 PM
to dppdiv...@googlegroups.com
Hi Alexandre,

First off, the default number of cycles (1 million) is extremely low for a dataset with over 1000 species. That is because each step in the MCMC for DPPDiv (similar to MrBayes or BEAST) is only one proposal (or one set of proposals). Thus, for larger problems you have to compensate with a much longer chain length. I recommend setting the number of generations to something extremely high (e.g., 10 billion). This way you can check the MCMC samples during the run and kill the run once you have sampled sufficiently at stationarity. (The default 1 million generations might be reasonable for a dataset of 5 species with 1000 bases under a strict clock model, but even then I would still recommend a longer chain length.)

Second, the Dirichlet process prior does not really scale well to large numbers of data elements, which in this case means the number of branches in the tree. A tree with 1000 species has 1998 branches, which will induce long mixing times. Additionally, given that the molecular data carry no information about absolute times, you are getting dates consistent with your calibrations because the calibrations contribute the entirety of that information.
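The branch count above follows from a standard fact about rooted, fully bifurcating trees: n tips imply n - 1 internal nodes, hence 2n - 1 nodes in total and 2n - 2 branches. A minimal Python sketch of that arithmetic (the function name is made up for illustration and is not part of DPPDiv):

```python
def rooted_binary_tree_branches(n_tips: int) -> int:
    """Number of branches in a rooted, fully bifurcating tree with n_tips tips."""
    if n_tips < 2:
        raise ValueError("a bifurcating tree needs at least 2 tips")
    # n_tips tips + (n_tips - 1) internal nodes = 2n - 1 nodes;
    # every node except the root sits at the end of exactly one branch.
    return 2 * n_tips - 2


print(rooted_binary_tree_branches(1000))
```

Each of those branches is one element that the Dirichlet process prior has to assign to a rate category, which is why the problem grows quickly with the number of species.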

Ultimately, I do not recommend putting much stock in any MCMC analysis, using any program, on a dataset this large that was only run for 1 million iterations. So regardless of whether you run this in DPPDiv, MrBayes, or BEAST, you should run it much, much longer. If you do, you will notice better ESS values for some parameters. You should also be running multiple independent runs and evaluating whether they have sampled the same stationary distribution.
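For readers who want to see what these two checks amount to, here is a rough Python sketch using only the standard library. It is an illustration under stated assumptions, not what DPPDiv or any trace viewer actually computes: the ESS estimator truncates the autocorrelation sum at the first non-positive lag (other tools may use a different cutoff, so numbers will differ), the between-run check is the Gelman-Rubin potential scale reduction factor (one common diagnostic, not one prescribed in this thread), and `ar1_chain` is a hypothetical stand-in for a sampled parameter trace such as a node age.

```python
import math
import random
from statistics import fmean, variance


def effective_sample_size(trace):
    """ESS = N / (1 + 2 * sum of autocorrelations), truncated at the
    first lag whose estimated autocorrelation is not positive."""
    n = len(trace)
    mean = fmean(trace)
    centered = [x - mean for x in trace]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        return float("nan")  # a constant trace carries no usable information
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


def potential_scale_reduction(chains):
    """Gelman-Rubin statistic for equal-length chains of one parameter.
    Values close to 1 suggest the runs sampled the same distribution."""
    n = len(chains[0])
    within = fmean([variance(c) for c in chains])
    between_over_n = variance([fmean(c) for c in chains])
    pooled = (n - 1) / n * within + between_over_n
    return math.sqrt(pooled / within)


def ar1_chain(n, phi, rng, shift=0.0):
    """Hypothetical trace: phi = 0 mixes well, phi near 1 mixes slowly."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x + shift)
    return out


rng = random.Random(1)
run_a = ar1_chain(2000, 0.0, rng)
run_b = ar1_chain(2000, 0.0, rng)
sticky = ar1_chain(2000, 0.9, rng)

print("ESS, well-mixed run:", effective_sample_size(run_a))
print("ESS, slowly mixing run:", effective_sample_size(sticky))
print("PSRF, two runs:", potential_scale_reduction([run_a, run_b]))
```

In practice you would read the sampled values for each parameter from the output of each independent run (after discarding burn-in) rather than simulate them. The slowly mixing trace should report a much lower ESS than the well-mixed one even though both contain the same number of samples.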

The Arcila paper specifically addressed the use of the fossilized birth-death process, which is a much better approach to calibrating nodes with fossils than calibration densities. I assume, though, that if they also had low ESS values, they also did not run their chains long enough. The model is discussed here: http://www.pnas.org/content/111/29/E2957.abstract. I do not recommend the Yule process unless you have a strong prior belief that extinction has never occurred in your group.

Cheers!
Tracy

--
You received this message because you are subscribed to the Google Groups "dppdiv-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dppdiv-users...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Alexandre Pedro

May 26, 2015, 3:44:28 PM
to dppdiv...@googlegroups.com
Hi Tracy,

Thank you so much for explaining this to me in such detail. I will set the runs longer and keep checking the ESS for better values. I will also consider reducing the number of taxa in my dataset if the ESS values don't show any sign of stabilizing (or if it proves impractical with my computational resources).
I read about FBD-DPPDIV when it first came out, and I'm very interested in exploring this method once it is available at CIPRES.

Thanks once again for your time.

Best,

Alex



Mark Miller

May 26, 2015, 5:05:51 PM
to dppdiv...@googlegroups.com
Hi all,

 

I wanted to check whether the FBD-DPPDIV is available in the FDPPDIV code yet.

For CIPRES to expose/support it, we will need to have a parallel version of the code.


Mark
