"Dear Patrice,
I am delighted to engage in correspondence with you, though you may be disappointed in what I write, not because I am too busy (always time for evolutionary trees) but because I have no direct experience of other people's tree-building programs. Simon is an academic grandchild of mine, so I guess that makes you a great-grandchild. And the death of my friend Ryk Ward was a great shock.
The best I can do is to ramble on with some thoughts engendered by your letter. I can see from your questions that you are extremely well-informed about the field. I start with the meaning of 'Bayesian'. I have been accused of introducing Bayesian methods into phylogenetic analysis by virtue of my 1970 paper (attached). This I deny! The word Bayesian was coined by R. A. Fisher in 1950 to describe methods of estimation which assume prior distributions for parameters for which there is no stochastic model or other justification. Nowadays it often seems to be assumed to mean just using Bayes's Theorem, which is absurd. A strong multiparameter probability model like the one I described in 1970 (actually 1969, at the Royal Statistical Society) is not Bayesian unless you give all the parameters prior distributions. My first question is: in which sense are the self-styled 'Bayesian' analyses truly Bayesian? If they are Bayesian in Fisher's sense then the mathematical and computing problems are soluble, but the doubts about inference remain, and I continue to share them.
That leaves us working with likelihoods, and the first clarification is to see the difference between the true parameters of the model (some of which may be nuisance parameters) and the incidental parameters. Section 3 of my paper describes the problem. I expect I learnt of this important distinction from Joe Felsenstein - if so, I should have been more explicit and not just thanked him along with the others. Certainly Joe solved my problem of singularities in the 'likelihood' surface by pointing out that the positions of the nodes belong to the second class and therefore the singularities appear in (continuous) probability distributions.
Now to the questions you ask, which I can't answer because I don't know the logical basis of MCMC. If it generates posterior probabilities then there must have been prior probabilities, and the approach is truly Bayesian. By 'marginalisation over all parameters' I assume you mean integrating out the nuisance parameters after assuming prior distributions for them. When you ask 'Why would I care about any model parameters that are not associated with the ML tree?' I assume these are the incidental 'parameters', which are not really parameters at all but have probability distributions indexed by the true parameters. So, as Joe said, you should not care about them.
One last word about nested models and BIC. Why a Bayesian IC? I have my own likelihood version in the second attached paper, which I am sure you won't have seen.
If all this is any help I shall be very happy.
Best wishes, Anthony Edwards"
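(As a concrete aside on Edwards' last point about nested models and information criteria: his own likelihood-based criterion is in the attached paper, which I cannot reproduce here, but the little sketch below, using the generic BIC formula and entirely made-up numbers, shows how an information criterion can prefer a nested model that the raw likelihood never would. The sample size, log-likelihoods, and parameter counts are all hypothetical.)

```python
# A small sketch with made-up numbers (nothing from the attached papers) of the generic
# BIC comparison for two nested models: the larger model can never have a lower maximized
# log-likelihood, but BIC's ln(n) penalty per extra parameter can still favour the
# smaller one. Edwards' own likelihood-based criterion is not reproduced here.
import math

n = 200                            # number of sites/observations (hypothetical)
lnL_small, k_small = -1523.4, 10   # constrained (e.g. strict-clock-like) model, hypothetical
lnL_large, k_large = -1519.1, 18   # unconstrained model that contains it, hypothetical

def bic(lnL, k, n):
    """Schwarz's Bayesian Information Criterion; smaller values are preferred."""
    return k * math.log(n) - 2.0 * lnL

# The likelihood comparison alone can only favour (or tie) the larger model...
print("delta lnL (large - small):", round(lnL_large - lnL_small, 2))
# ...but the extra penalty of (k_large - k_small) * ln(n) can reverse the preference.
print("BIC, constrained model  :", round(bic(lnL_small, k_small, n), 2))
print("BIC, unconstrained model:", round(bic(lnL_large, k_large, n), 2))
```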
My letter to Dr. Edwards, a most charming fellow:
"Please do not feel compelled to answer this if you are terribly busy.
As a former student of Simon Tavaré in statistics here at the University of Utah, later as a statistician, and still later as a Ph.D. candidate in evolutionary biology under the late Ryk Ward, I find likelihood methods to be mathematically and especially intuitively sensible, with very, very nice statistical attributes for phylogenetic trees.
Then along comes Bayesian analysis. With only a passing reference to it during my statistics training in Utah, I admit not knowing as much about it as likelihood. But over the years I have frequently become frustrated with the concept. I routinely add it to my likelihood phylogenetic analysis primarily because journals generally want both these days. And yet, when every couple of years I have to teach it to graduate students, I run into all sorts of questions about Bayes analysis for which I have not found answers.
My recent concern comes from developing examples for teaching molecular clock techniques, which I do with MrBayes and with PAUP. I am very puzzled by an example in the MrBayes manual that finds much higher likelihoods (that is, harmonic means of the -lnLs) for a strict molecular clock than for the unconstrained clock model. Of course the PAUP results agree with my intuition and training that the strict model is nested within the unconstrained model and so cannot have a higher likelihood than the larger model.
In trying to get a grasp on why the results should be so different for the Bayes analysis compared to likelihood, and with comments from Joe Felsenstein, I realize that it boils down to the different criteria for support and the fact that, although MCMC travels through tree space using the likelihood score to guide the path, the support values do not necessarily identify the maximum likelihood tree as optimal. And that all comes from "marginalizing" over model-parameter space along with tree space (the probability under the curve rather than at its peak) and then using BICs to justify the strict clock model estimate over the unconstrained model estimate.
This invariably leads to the questions I've posed to some very good phylogeneticists: Why is it desirable to marginalize over the model parameters and the tree space at once? Why would I care about any model parameters that are not associated with the ML tree? Why do I care about suboptimal trees and their parameters?
I realize that the MCMC search provides a real confidence interval - I think - on the posterior probabilities and the branch lengths (and I like that), but I can't get my head around all the interest in nuisance parameters that are not associated with a truly optimal likelihood score for the tree. Never mind all the problems with the correlation in the MCMC search and the inflated posterior probabilities - I can account for that simply by recognizing that PPs will generally be too optimistic.
Is the reason for marginalization over all parameters simply to get confidence limits? And how do I explain the apparently non-nested quality of the strict clock model in MrBayes to my students?
Thank you very much for any insight you can provide."
Patrice
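For readers who want to see the disagreement in miniature, here is a small, self-contained sketch of how a nested model can "win" the Bayesian comparison while necessarily losing the maximized-likelihood one. It uses a toy normal-mean model rather than a clock model, a deliberately diffuse prior, and the exact marginal likelihood alongside the harmonic-mean estimator; the model, prior, and all numbers are my own assumptions for illustration, not anything from MrBayes or PAUP.

```python
# A minimal sketch (NumPy only, not MrBayes or PAUP) of the puzzle above: a constrained
# model nested inside a larger one can never have a higher *maximized* likelihood, yet
# its *marginal* likelihood -- the quantity behind Bayesian model comparison and the
# harmonic means MrBayes reports -- can still come out ahead when the larger model
# carries a diffuse prior. The normal-mean model, the prior, and all numbers below are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Data rescaled to have sample mean exactly 0.1 and sample sd exactly 1,
# so the comparison below does not depend on the random seed.
z = rng.normal(size=50)
x = 0.1 + (z - z.mean()) / z.std()
n, xbar = x.size, x.mean()

def loglik(mu):
    """Log-likelihood of the model x_i ~ N(mu, 1)."""
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum((x - mu) ** 2)

# 1) Maximized log-likelihoods: the constrained model (mu fixed at 0) is nested in the
#    unconstrained model (mu free), so it can never score higher here.
print("max logL, unconstrained:", loglik(xbar))
print("logL, constrained      :", loglik(0.0))

# 2) Marginal likelihoods. The constrained model has no free parameter, so its marginal
#    likelihood is simply L(0). The unconstrained model averages L(mu) over a N(0, tau^2)
#    prior; the wide prior spreads mass over poor values of mu and drags its marginal
#    likelihood below L(0), even though its best single value of mu beats mu = 0.
tau = 10.0  # deliberately diffuse prior standard deviation (an assumption of this sketch)
log_marg_unconstrained = (
    -0.5 * n * np.log(2 * np.pi)
    - 0.5 * np.log(1 + n * tau**2)
    - 0.5 * (np.sum(x**2) - tau**2 / (1 + n * tau**2) * np.sum(x) ** 2)
)
print("log marginal, unconstrained:", log_marg_unconstrained)
print("log marginal, constrained  :", loglik(0.0))

# 3) Harmonic-mean estimate of the unconstrained marginal likelihood from posterior
#    draws (the kind of number MrBayes reports). The normal-normal posterior has a
#    closed form, so we draw from it directly instead of running MCMC. The estimator
#    is notoriously unstable and usually lands well above the exact value from step 2.
post_var = 1.0 / (n + 1.0 / tau**2)
post_mean = post_var * n * xbar
draws = rng.normal(post_mean, np.sqrt(post_var), 200_000)
ll = -0.5 * n * np.log(2 * np.pi) - 0.5 * (np.sum(x**2) - 2 * draws * np.sum(x) + n * draws**2)
neg = -ll
log_mean_inv_lik = neg.max() + np.log(np.mean(np.exp(neg - neg.max())))  # log E_post[1/L]
print("harmonic-mean estimate, unconstrained:", -log_mean_inv_lik)
```

The point of the sketch is only that "marginalizing" averages the likelihood over the prior rather than taking it at its peak, so a diffuse prior on the extra parameters of the larger model can hand the Bayesian comparison to the nested model, even though the nested model can never win on maximized likelihood.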