Help with dating analysis.

David Maddison

Mar 12, 2023, 2:48:27 PM
to beast-users
I'm trying my first Bayesian dating analysis, and am struggling with choosing priors etc.  I've documented my confusions in a blog post, and would appreciate help resolving my queries (see questions in blue):


Thanks in advance!
David


Patrice Showers Corneli

Mar 12, 2023, 10:32:28 PM
to beast-users
Dr. Maddison,

First, thank you so much for the old MacClade and the continuing wonderful Mesquite.

Second, thank you for a very entertaining request for help.

Your trials with Bayesian analysis remind me that, although I use ‘Bayesian’ analysis in phylogenetics, I have never felt quite comfortable with the various biases that arise from all sorts of sources, from exploring tree space with a Markov chain to the difficulty of priors. I appreciated Ziheng Yang’s discussion of priors in his 2014 text “Molecular Evolution: A Statistical Approach” and in several review papers he authored, but I assume that you are familiar with those papers.

Anyway, some years ago I decided that I am not a Bayesian, having been trained by Simon Tavaré in maximum likelihood, which has always made sense to me statistically. Struggling with some of the issues you mention, I decided that life is too short to take Bayesian methods too seriously. I have asked all sorts of people to answer some of my questions and finally, in desperation, wrote to AWF Edwards, who wrote the wonderful book “Likelihood” many years ago.

Here, for what it is worth, is his rather entertaining response to my questions. He, too, was mostly unable to make me feel better about Bayesian methods, which I use grudgingly.

Perhaps some Bayesian phylogeneticist can answer the questions I have posed to Dr. Edwards and other ML folks.


Regards,
Patrice

Patrice Showers Corneli, M.Stat., Ph.D
Associate Research Professor, Retired.
Department of Biology
University of Utah 

On Apr 13, 2015, at 7:47 AM, A.W.F. Edwards <aw...@cam.ac.uk> wrote:

"Dear Patrice,
 I am delighted to engage in correspondence with you though you may be disappointed in what I write not because I am too busy (always time for evolutionary trees) but because I have no direct experience of other people's tree-building programs. Simon is an academic grandchild of mine so I guess that makes you a great-grandchild. And the death of my friend Ryk Ward was a great shock.
 The best I can do is to ramble on with some thoughts engendered by your letter. I can see from your questions that you are extremely well-informed about the field. I start with the meaning of 'Bayesian'. I have been accused of introducing Bayesian methods into phylogenetic analysis by virtue of my 1970 paper (attached). This I deny! The word Bayesian was coined by R.A.Fisher in 1950 to describe methods of estimation which assume prior distributions for parameters for which there is no stochastic model or other justification. Nowadays it often seems to be assumed to mean just using Bayes's Theorem, which is absurd. A strong multiparameter probability model like I described in 1970 (actually 1969 at the Royal Statistical Society) is not Bayesian unless you give all the parameters prior distributions. My first question is: in which sense are the self-styled 'Bayesian' analyses truly Bayesian? If they are Bayesian in Fisher's sense then the mathematical and computing problems are soluble but the doubts about inference remain, and I continue to share them.
 That leaves us working with likelihoods, and the first clarification is to see the difference between the true parameters of the model (some of which may be nuisance parameters) and the incidental parameters. Section 3 of my paper describes the problem. I expect I learnt of this important distinction from Joe Felsenstein - if so I should have been more explicit and not just thanking him along with others. Certainly Joe solved my problem of singularities in the 'likelihood' surface by pointing out that the positions of the nodes belong to the second class and therefore the singularities appear in (continuous) probability distributions.
 Now to the questions you ask which I can't answer because I don't know the logical basis of MCMC. If it generates posterior probabilities then there must have been prior probabilities and the approach is truly Bayesian. By 'marginalisation over all parameters' I assume you mean integrating out the nuisance parameters after assuming prior distributions for them. When you ask 'Why would I care about any model parameters that are not associated with the ML tree?' I assume these are the incidental 'parameters', which are not really parameters at all but have probability distributions indexed by the true parameters. So, as Joe said, you should not care about them.
 One last word about nested models and BIC. Why Bayesian IC? I have my own likelihood version in the second attached paper which I am sure you won't have seen.
 If all this is any help I shall be very happy.
           Best wishes, Anthony Edwards”
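For readers following Edwards's question "Why Bayesian IC?": Schwarz's criterion carries the name because it comes from a large-sample approximation to the log marginal likelihood of a model M with k free parameters, maximized likelihood \hat{L}, and n observations. The standard relation is:

```latex
\ln p(D \mid M) = \ln \hat{L} - \frac{k}{2}\ln n + O(1),
\qquad
\mathrm{BIC} = k \ln n - 2 \ln \hat{L} \approx -2 \ln p(D \mid M)
```

A lower BIC therefore corresponds, approximately, to a higher marginal likelihood. The k ln n penalty is what allows a nested model with fewer parameters to be preferred even though its maximized likelihood can never exceed that of the larger model.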

My letter to Dr. Edwards, a most charming fellow:

"Please do not feel compelled to answer this if you are terribly busy. 

As a former student of Simon Tavaré in statistics here at the University of Utah, later as a statistician, and still later as a Ph.D. candidate in evolutionary biology under the late Ryk Ward, I find likelihood methods to be mathematically and especially intuitively sensible, with very, very nice statistical attributes for phylogenetic trees.

Then along comes Bayesian analysis. Having encountered it only in passing during my statistics training in Utah, I admit I do not know as much about it as about likelihood. Over the years I have frequently become frustrated with the concept. I routinely add it to my likelihood phylogenetic analyses, primarily because journals generally want both these days. And yet, when every couple of years I have to teach it to graduate students, I run into all sorts of questions about Bayesian analysis for which I have not found answers.

My recent concern comes from developing examples for teaching molecular clock techniques, which I do with MrBayes and with PAUP. I am very puzzled by an example in the MrBayes manual that finds much higher likelihoods (that is, harmonic means of the -lnLs) for a strict molecular clock than for the unconstrained clock model. Of course, the PAUP results agree with my intuition and training: the strict model is nested within the unconstrained model and so cannot have a higher likelihood than the larger model.

In trying to understand why the results should differ so much between the Bayesian and likelihood analyses, and with comments from Joe Felsenstein, I realize that it boils down to the different criteria for support, and to the fact that, although MCMC travels through tree space using the likelihood score to guide its path, the support values do not necessarily identify the maximum likelihood tree as optimal. All of that comes from “marginalizing” over model-parameter space along with tree space and the probability under the curve, and then using BICs to justify the strict clock model estimate over the unconstrained model estimate.

Which invariably leads to the question I’ve posed to some very good phylogeneticists: Why is it desirable to marginalize over the model parameters and the tree space at once? Why would I care about any model parameters that are not associated with the ML tree? Why do I care about suboptimal trees and their parameters?

I realize that the MCMC search provides a real confidence interval (I think) on the posterior probabilities and the branch lengths, and I like that, but I can’t get my head around all the interest in nuisance parameters that are not associated with a truly optimal likelihood score for the tree. Never mind all the problems with correlation in the MCMC search and the inflated posterior probabilities; I can account for that simply by recognizing that PPs will generally be too optimistic.

Is the reason for marginalization over all parameters simply to get confidence limits? And how do I explain the apparently non-nested quality of the strict clock model in MrBayes to my students?

Thank you very much for any insight you can provide."

Patrice 
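The puzzle about the "non-nested" strict clock can be reproduced with a toy model. The numbers in the MrBayes example are harmonic-mean estimates of marginal likelihoods (an often unreliable estimator), which average the likelihood over the prior rather than maximize it. The sketch below is an assumption-laden illustration, not a phylogenetic or MrBayes calculation: data x_i ~ Normal(mu, 1), a restricted model M0 with mu fixed at 0 (playing the role of the strict clock), and an unrestricted model M1 with a hypothetical Normal(0, tau^2) prior on mu. The names `compare_nested` and `tau` are illustrative only.

```python
import math

def log_normal_pdf(x, mean, var):
    """Log density of Normal(mean, var) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def compare_nested(data, tau=1.0):
    """Toy nested-model comparison for data x_i ~ Normal(mu, 1).

    M0 (restricted): mu fixed at 0, no free parameters.
    M1 (unrestricted): mu free, with prior mu ~ Normal(0, tau^2).
    """
    n = len(data)
    xbar = sum(data) / n
    ss_mean = sum((x - xbar) ** 2 for x in data)  # sum of squares about xbar
    ss_zero = sum(x ** 2 for x in data)           # sum of squares about 0
    const = -0.5 * n * math.log(2 * math.pi)

    # With no free parameters, M0's likelihood is also its marginal likelihood.
    loglik_m0 = const - 0.5 * ss_zero
    # M1 is maximized at mu = xbar; M0 is the special case mu = 0,
    # so this value can never be smaller than loglik_m0.
    max_loglik_m1 = const - 0.5 * ss_mean
    # M1's marginal likelihood integrates the likelihood over the prior on mu;
    # for this Normal-Normal model the integral has a closed form.
    log_marg_m1 = (max_loglik_m1
                   + 0.5 * math.log(2 * math.pi / n)
                   + log_normal_pdf(xbar, 0.0, 1.0 / n + tau ** 2))
    # BIC = k ln n - 2 ln L_hat (lower is better); k = 0 for M0, k = 1 for M1.
    return {
        "loglik_m0": loglik_m0,
        "max_loglik_m1": max_loglik_m1,
        "log_marg_m1": log_marg_m1,
        "bic_m0": -2 * loglik_m0,
        "bic_m1": math.log(n) - 2 * max_loglik_m1,
    }

# Data centred close to zero, n = 20.
result = compare_nested([0.25, -0.5, 0.125, 0.375, -0.125] * 4)
for key, value in result.items():
    print(f"{key:>14}: {value:.3f}")
```

With data near zero, M1's maximized log-likelihood is slightly higher than M0's, as nesting requires, yet both the log marginal likelihood and BIC favour M0: the extra parameter spreads prior mass over values of mu the data do not support. Marginal likelihoods are therefore not constrained by nesting the way maximized likelihoods are.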




David Maddison

Mar 13, 2023, 8:56:54 PM
to beast-users
Thank you, Patrice, for your very entertaining response, and the interesting emails to and from Edwards!   

Alexei Drummond

Mar 15, 2023, 7:04:18 PM
to beast-users
Hi David,

That is a very big data set :) Your analysis has uncovered a missing feature in BEAUti that we would like to add; right now it would require editing the XML file by hand. We are making that change today and should be able to release a new version of the relevant package sometime next week. We will also respond in full to the other questions in your blog at the same time. Mostly the decisions you made look good, though in one or two cases we would make alternative suggestions. The error you encountered comes from the way you specified the MRCAPriors for your three fossils. To fix that error properly without editing the XML, we need to add a different way of setting priors for the case where you want both (i) an FBD prior and (ii) uncertainty in the geological age of the fossils. This seems like a common case, so we think it should be directly supported in BEAUti.

Thanks for taking the time to write your blog post. I hope our answers to your questions will shortly make it an even more useful resource.

Cheers
Alexei 

--
You received this message because you are subscribed to the Google Groups "beast-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beast-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beast-users/7e958432-bff9-48b9-abcd-55bb4de7e82en%40googlegroups.com.

David Maddison

Mar 16, 2023, 12:06:00 AM
to beast...@googlegroups.com
Thank you so much, Alexei!  This is great.  I look forward to hearing more!

Best wishes,
David

Jordan Douglas

Mar 22, 2023, 8:49:23 PM
to beast-users
Hi David,

The team in the Centre for Computational Evolution at the University of Auckland has prepared a response, which Remco posted at the bottom of your blog. This was a joint effort by Kylie Chen, Alexei Drummond, Remco Bouckaert, and Walter Xie.

Cheers,
Jordan

David Maddison

Mar 26, 2023, 1:14:30 PM
to beast-users
Thanks again, Auckland team, for all of your help with this! I updated the blog post, interspersing your answers into the post to make it easier for readers to follow (your responses are indicated in brown). I've also added my responses and the ways I have changed my choices. Any thoughts on those would be appreciated. However, there are three things in particular I would like to highlight:

(1) I've defined my taxon sets in the NEXUS file. After loading the alignment, one MRCA Prior conveniently shows up for each taxon set in the Priors panel (although they are not labeled as MRCA priors; it would be convenient if that somehow appeared there, perhaps in response to choices made within the prior GUI). However, when using the Add Prior button to add the second prior for each taxon set, the Sampled Ancestors MRCA Prior, there is no option to choose one of the existing taxon sets. I have to recreate each of the existing taxon sets using the BEAUti interface, which would be rather painful if I had more than three (and I have another project with dozens of taxon sets, some with many taxa). I realize I could create these within the XML file directly, but would it be possible to have an option to create Sampled Ancestors MRCA Priors for already existing taxon sets within the BEAUti interface?

(2) I presume the FBD model assumes equiprobable sampling of taxa. However, in my case (and I suspect in most people's research), the taxa were not sampled equiprobably. I very specifically chose a dispersed sampling strategy, choosing one or two species from each major lineage and attempting to span the basal split of each major lineage. What would be the anticipated effect of this violation of the sampling model? Does this mean we shouldn't be using the FBD?

(3) Most importantly, I still get that exact same exception when running this new xml file with the split MRCA priors. I'll send the updated XML file to Alexei off-line to see what might be amiss.

Thank you!
David
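Point (1) is partly a scripting problem: the taxon sets already exist in the NEXUS file, so they can be read back out rather than retyped. The sketch below assumes simple NEXUS `TAXSET name = taxon taxon ...;` statements with taxon names that contain no spaces; numeric ranges such as `1-5`, NEXUS `[comments]`, and quoted names containing spaces are not handled. `parse_taxsets` is a hypothetical helper, not part of BEAUti or Mesquite, and the BEAST XML for each Sampled Ancestors MRCA Prior would still need to follow a BEAUti-generated file.

```python
import re

# Matches simple NEXUS statements such as: TAXSET Eupetedromus = taxonA taxonB;
# The set name may be a plain word or 'single-quoted'.
TAXSET_RE = re.compile(r"\bTAXSET\s+('[^']*'|[^\s=;]+)\s*=\s*([^;]*);", re.IGNORECASE)

def parse_taxsets(nexus_text):
    """Return {taxset_name: [taxon, ...]} for every TAXSET statement in the text."""
    taxsets = {}
    for match in TAXSET_RE.finditer(nexus_text):
        name = match.group(1).strip("'")
        taxa = [t.strip("'") for t in match.group(2).split()]
        taxsets[name] = taxa
    return taxsets

# Hypothetical taxon names, for illustration only.
example = """#NEXUS
BEGIN SETS;
    TAXSET Eupetedromus = fossil_A extant_B extant_C;
    TAXSET outgroup = taxon_X taxon_Y;
END;
"""
for name, taxa in parse_taxsets(example).items():
    print(name, len(taxa), "taxa:", ", ".join(taxa))
```

The resulting dictionary could then drive whatever templating is used to write the extra priors, so dozens of taxon sets need to be defined only once.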

Alexei Drummond

Mar 26, 2023, 4:05:13 PM
to beast...@googlegroups.com
Hi David,

You are welcome!

You get the same error because, I think, there is still one fossil age calibration prior that has extant taxa in it.

Here is what Remco diagnosed from the XML you sent:

The Eupetedromus.prior is a monophyletic MRCA prior with 3 taxa.

The EupetedromusSampled.prior is a non-monophyletic SAMRCAPrior with a uniform age distribution and tipsonly="true". The problem is that this last calibration has the same three taxa as Eupetedromus.prior, but it should only contain the sampled taxon.

This produces the same error as before because it is essentially the same problem as in the previous version: including taxa in a tips-only MRCA prior whose ages fall outside the range of the age distribution.

Cheers
Alexei
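Remco's diagnosis amounts to a simple consistency rule: a tips-only prior applies its age distribution to each tip in the set, so any tip whose age is fixed (extant taxa sit at age 0) must lie inside that distribution's range. The sketch below checks that rule. `AgePrior` and `tips_outside_range` are hypothetical stand-ins, not BEAST classes or BEAST's own validation code, and the taxon names and the 20 to 30 bounds are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class AgePrior:
    """Hypothetical stand-in for one calibration in a BEAST XML file."""
    name: str
    taxa: list
    lower: float      # lower bound of the age distribution
    upper: float      # upper bound of the age distribution
    tips_only: bool   # True: the distribution applies to each tip's age, not the MRCA

def tips_outside_range(priors, fixed_tip_ages):
    """List (prior, taxon, age) for every taxon with a fixed age that sits in a
    tips-only prior whose range excludes that age."""
    problems = []
    for prior in priors:
        if not prior.tips_only:
            continue  # MRCA (e.g. monophyly) priors constrain the ancestor, not the tips
        for taxon in prior.taxa:
            age = fixed_tip_ages.get(taxon)  # None for fossils whose age is being sampled
            if age is not None and not (prior.lower <= age <= prior.upper):
                problems.append((prior.name, taxon, age))
    return problems

extant = {"extant_B": 0.0, "extant_C": 0.0}
before = [AgePrior("EupetedromusSampled.prior", ["fossil_A", "extant_B", "extant_C"], 20.0, 30.0, True)]
after = [AgePrior("EupetedromusSampled.prior", ["fossil_A"], 20.0, 30.0, True)]
print(tips_outside_range(before, extant))  # both extant taxa are flagged
print(tips_outside_range(after, extant))   # → []
```

Keeping the fossil alone in the tips-only prior, as Remco suggested, is what makes the check come back empty.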

David Maddison

Mar 26, 2023, 9:51:50 PM
to beast...@googlegroups.com
Ah!  I wasn't paying attention to Remco's instructions.  I will try again, report back, and update the blog post.


David Maddison

Mar 26, 2023, 9:51:51 PM
to beast-users
Hi Alexei et al.,
OK, now that I actually paid attention to what Remco wrote, I fixed the problem and all runs well!   I've updated the blog post appropriately - feel free to suggest improvements. Hopefully it will now serve as a history of my confusions and help others who might have also struggled.  
Thanks again,
David

Patrice Showers Corneli

Mar 27, 2023, 1:28:48 PM
to beast-users
What a terrific series of posts. Thanks!
