Bayes Factor Calculation Useless?

Cramer Rao

Feb 22, 2012, 5:07:56 PM

to beast-users

I had a quick question on the Bayes Factor calculations as implemented
by Tracer that I was hoping someone had some insight into. It looked
to me (though I could be wrong) that Tracer is implementing the Kass
and Raftery harmonic mean estimator, and assessing the error on this
using bootstraps from the MCMC. However, I feel that this can't
possibly work for calculating a Bayes Factor or marginal likelihood
for the reasons outlined in Radford's blog post linked below, and that
therefore this estimate must be somewhat meaningless.

http://radfordneal.wordpress.com/2008/08/17/the-harmonic-mean-of-the-likelihood-worst-monte-carlo-method-ever/

The issues mentioned there seem particularly relevant to a typical
BEAST analysis, as the posterior is almost certainly concentrated at a
very small fraction of locations in parameter space and so using the
posterior from any finite MCMC run to get samples would fail to
integrate over the parameter space correctly. But, the given
reference is to Suchard 2001, and Suchard strikes me as a very sharp
guy, and I still see this used in papers quite a lot, so I feel I may
be missing something. Anyone know what is going on?

I gather whatever quantity is being estimated is probably related to
the average deviance of the different models, which does seem like a
good tool for comparison, but I don't quite believe it is related to
the marginal likelihood.
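
For concreteness, here is a minimal Python sketch of the estimator under discussion: the harmonic mean of the sampled likelihoods, with a bootstrap standard error. It is not Tracer's code. The function names and the trace values are made up for illustration, and the naive bootstrap ignores autocorrelation in the chain.

```python
import math
import random


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_harmonic_mean(log_liks):
    """log of the harmonic mean of the likelihoods, from log-likelihoods.

    log HME = log N - log sum_i exp(-loglik_i)
    """
    negated = [-x for x in log_liks]
    return math.log(len(log_liks)) - log_sum_exp(negated)


def bootstrap_se(log_liks, replicates=200, seed=1):
    """Naive bootstrap standard error of the log HME (assumes independent draws)."""
    rng = random.Random(seed)
    n = len(log_liks)
    estimates = []
    for _ in range(replicates):
        resample = [log_liks[rng.randrange(n)] for _ in range(n)]
        estimates.append(log_harmonic_mean(resample))
    mean = sum(estimates) / replicates
    var = sum((e - mean) ** 2 for e in estimates) / (replicates - 1)
    return math.sqrt(var)


if __name__ == "__main__":
    # Hypothetical trace: 999 samples at -1000 and a single sample at -1030.
    trace = [-1000.0] * 999 + [-1030.0]
    print("log HME      :", log_harmonic_mean(trace))
    print("bootstrap SE :", bootstrap_se(trace))
```

In the made-up trace, one sample out of a thousand with a log-likelihood 30 units lower pulls the estimate to about -1023, which is the kind of sensitivity to rarely visited low-likelihood states that the blog post describes.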

David Swofford

Feb 24, 2012, 9:00:23 AM

to beast...@googlegroups.com

First, you're right that the harmonic mean estimator (HME) isn't a very good way to approximate marginal likelihoods. But it is too strong to say that it is not "related to the marginal likelihood." In principle, you *can* estimate the marginal likelihood by sampling only from the posterior. The problem isn't simply that the "posterior is ... concentrated at a very small fraction of locations in parameter space." Explicitly attempting to "integrate over the [entire] parameter space" wouldn't be a tractable way to estimate the marginal likelihood either.
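
To spell out why posterior samples alone can, in principle, recover the marginal likelihood, here is the standard identity behind the HME. The notation is an assumption for this note: data y, parameters θ, a proper prior p(θ), and posterior draws θ_1, ..., θ_N.

```latex
\begin{align*}
\mathbb{E}_{\theta \mid y}\!\left[\frac{1}{p(y \mid \theta)}\right]
  &= \int \frac{1}{p(y \mid \theta)} \,
          \frac{p(y \mid \theta)\, p(\theta)}{p(y)} \, d\theta
   = \frac{1}{p(y)} \int p(\theta)\, d\theta
   = \frac{1}{p(y)}, \\
\hat{p}_{\mathrm{HME}}(y)
  &= \left[ \frac{1}{N} \sum_{i=1}^{N} \frac{1}{p(y \mid \theta_i)} \right]^{-1},
  \qquad \theta_i \sim p(\theta \mid y).
\end{align*}
```

The identity is exact. The trouble is that 1/p(y | θ) can have very large or infinite variance under the posterior, so the sample average converges extremely slowly.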

The problem with the HME is that it is both biased and has high (even infinite) variance. Lartillot implemented a method called "thermodynamic integration" (path sampling) that gives much better estimates of the marginal likelihood, but it is computationally intensive and has not been widely implemented in Bayesian phylogenetics packages (it's in his PhyloBayes package). More recently, Paul Lewis, Ming-Hui Chen, and collaborators have published a couple of papers describing methods that seem to match the accuracy of thermodynamic integration with less computation (available in the Phycas package).
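
As a rough illustration of thermodynamic integration, the Python sketch below applies it to a toy conjugate normal model whose marginal likelihood is known in closed form. Everything here is an assumption made for the example (the data, the prior, the cubic β schedule, the function names); it is not the PhyloBayes or BEAST implementation. The idea is to average the log-likelihood under "power posteriors" proportional to prior × likelihood^β and integrate that average over β from 0 to 1.

```python
import math
import random

# Toy model (illustrative assumptions): y_i ~ Normal(mu, SIGMA^2) with SIGMA
# known, and prior mu ~ Normal(M0, S0^2).
Y = [0.3, 1.2, 0.7, -0.4, 1.9, 0.8, 0.1, 1.4, 0.6, 1.0, -0.2, 1.1]
SIGMA, M0, S0 = 1.0, 0.0, 1.0


def log_likelihood(mu):
    """Log-likelihood of the data Y at mean mu."""
    return sum(-0.5 * math.log(2 * math.pi * SIGMA ** 2)
               - (y - mu) ** 2 / (2 * SIGMA ** 2) for y in Y)


def exact_log_marginal():
    """Closed-form log marginal likelihood for this conjugate model."""
    n = len(Y)
    resid = [y - M0 for y in Y]
    ss, s = sum(r * r for r in resid), sum(resid)
    shrink = S0 ** 2 / (SIGMA ** 2 + n * S0 ** 2)
    return (-0.5 * n * math.log(2 * math.pi * SIGMA ** 2)
            - 0.5 * math.log(1 + n * S0 ** 2 / SIGMA ** 2)
            - (ss - shrink * s * s) / (2 * SIGMA ** 2))


def sample_power_posterior(beta, rng):
    """Draw mu from prior * likelihood^beta, which is normal for this model."""
    precision = 1 / S0 ** 2 + beta * len(Y) / SIGMA ** 2
    mean = (M0 / S0 ** 2 + beta * sum(Y) / SIGMA ** 2) / precision
    return rng.gauss(mean, math.sqrt(1 / precision))


def path_sampling_log_ml(num_steps=32, draws=1000, seed=1):
    """Trapezoid-rule estimate of the integral of E_beta[log L] over [0, 1]."""
    rng = random.Random(seed)
    # Cubic schedule puts more grid points near beta = 0 (an arbitrary choice).
    betas = [(k / num_steps) ** 3 for k in range(num_steps + 1)]
    means = []
    for beta in betas:
        total = sum(log_likelihood(sample_power_posterior(beta, rng))
                    for _ in range(draws))
        means.append(total / draws)
    return sum(0.5 * (betas[k + 1] - betas[k]) * (means[k] + means[k + 1])
               for k in range(num_steps))


if __name__ == "__main__":
    print("exact log marginal likelihood:", exact_log_marginal())
    print("path sampling estimate       :", path_sampling_log_ml())
```

In a real phylogenetic analysis each power posterior needs its own MCMC run rather than an exact draw, which is where the computational cost comes from.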

I share your concern about the HME and, given its bias toward selecting overly complex models in Bayes factor comparisons, I don't really think it should be trusted. I imagine that the newer methods will become more widely available soon, but until then I think Bayes factors computed from HMEs of the marginal likelihood should be treated with a great deal of suspicion. Just my opinion; others may disagree.

References:

Lartillot et al. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics (2009) vol. 25 (17) pp. 2286-8

Xie et al. Improving marginal likelihood estimation for Bayesian phylogenetic model selection. Systematic Biology (2011) vol. 60 (2) pp. 150-60

Fan et al. Choosing among partition models in Bayesian phylogenetics. Molecular Biology and Evolution (2011) vol. 28 (1) pp. 523-32

Dave

--
David L. Swofford david.s...@duke.edu

Center for Evolutionary Genomics
Institute for Genome Sciences & Policy
Box 90338
Duke University
Durham, NC 27708 USA

National Evolutionary Synthesis Center (NESCent)
Suite A200
2024 W. Main Street
Durham, NC 27705 USA

(919)613-7458 (Duke)
(919)668-4591 (Nescent)

Marc Suchard

Feb 24, 2012, 10:01:58 AM

to beast-users

Guy Baele from KU Leuven and Alex Alekseyenko from NYU recently
completed a nice comparison of the harmonic mean estimator, path
sampling and stepping-stone sampling, all implemented in BEAST. One
nice aspect of this implementation is that they allow users to take
the tree topology as random and integrate over phylogenetic
uncertainty while computing marginal likelihoods. There's a
manuscript under review:

Baele et al. (under review) Improving the accuracy of demographic and
molecular clock model comparison while accommodating phylogenetic
uncertainty. Molecular Biology and Evolution.

Either Guy or Alex can provide interested users with example XML for
use with BEAST 1.7.

best, Marc

Andrew Rambaut

Feb 24, 2012, 10:07:30 AM

to beast...@googlegroups.com

Thanks, Dave, for giving a comprehensive reply. I agree with your appraisal that HME should be avoided.

The good news is that path sampling is implemented in BEAST. We have a paper in review that explores its use for comparing coalescent models and molecular clock models. The gist is that the HME does indeed perform badly but is cheap to compute, whereas path sampling performs well at considerable computational cost (on top of the original analysis). We also compare these results to the AICM (another model comparison metric, in the AIC family), which can be computed cheaply after the BEAST analysis and has statistical properties intermediate between the other two. The AICM is implemented in the next version of Tracer (and the choice of HME will be discouraged).
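
As a rough sketch of how cheap the AICM is to compute: as I understand the definition (following Raftery and colleagues), it needs only the sample mean and variance of the posterior log-likelihood trace, AICM = 2 * variance - 2 * mean, with lower values preferred. The Python below assumes that sign convention and uses made-up traces; it is not Tracer's code.

```python
import statistics


def aicm(log_liks):
    """AICM from a posterior sample of log-likelihoods (lower is better).

    Assumed definition: 2 * sample variance - 2 * sample mean.
    """
    mean = statistics.mean(log_liks)
    var = statistics.variance(log_liks)  # sample variance (n - 1 denominator)
    return 2.0 * var - 2.0 * mean


if __name__ == "__main__":
    # Hypothetical log-likelihood traces for two models.
    model_a = [-1002.1, -1000.4, -1001.7, -999.8, -1001.0]
    model_b = [-998.9, -1003.5, -996.2, -1004.8, -1000.1]
    print("AICM model A:", aicm(model_a))
    print("AICM model B:", aicm(model_b))
```

The variance term acts as the penalty: a model whose log-likelihood fluctuates more across the posterior sample is treated as having more effective parameters.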

Andrew

Tommi

Apr 16, 2012, 7:25:13 AM

to beast-users

Hi,

I have a simple question. When I want to compare two different BEAST
runs using Bayes factors, which trace should I select? The result
changes depending on the trace I choose (likelihood, prior, posterior,
etc.): the model with the higher score is not always the same.
Thanks in advance for the answers, and sorry for the basic question!
Greetings,

Tommi

Guy Baele

Apr 16, 2012, 11:04:26 AM

to beast...@googlegroups.com

You have to select the likelihood trace to calculate Bayes factors using the harmonic mean estimator (HME).
However, as David and Andrew said in this thread, the HME should be avoided.
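
Whichever estimator is used, the Bayes factor itself is just the ratio of the two marginal likelihoods, so on the log scale it is a difference. The Python sketch below shows that step together with the commonly cited Kass and Raftery (1995) interpretation scale for 2 ln(BF). The function names and the log marginal likelihood values are made up for illustration.

```python
def log_bayes_factor(log_ml_a, log_ml_b):
    """ln BF of model A over model B from log marginal likelihood estimates."""
    return log_ml_a - log_ml_b


def kass_raftery_label(log_bf):
    """Evidence category for |2 ln BF| on the Kass and Raftery (1995) scale."""
    two_ln_bf = 2.0 * abs(log_bf)
    if two_ln_bf < 2.0:
        return "not worth more than a bare mention"
    if two_ln_bf < 6.0:
        return "positive"
    if two_ln_bf < 10.0:
        return "strong"
    return "very strong"


if __name__ == "__main__":
    # Hypothetical log marginal likelihood estimates for two models.
    log_ml_constant = -1024.5
    log_ml_exponential = -1020.0
    ln_bf = log_bayes_factor(log_ml_exponential, log_ml_constant)
    print("ln BF:", ln_bf, "->", kass_raftery_label(ln_bf))
```

The label is only as trustworthy as the marginal likelihood estimates fed into it, which is the reason for avoiding HME-based values.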

In the meantime, the advance access version of our paper is out, which illustrates this:

Baele G, Lemey P, Bedford T, Rambaut A, Suchard MA and Alekseyenko AV (2012) 'Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty' Molecular Biology and Evolution (in press)

On top of that, you may also want to check out the references provided by David.

Best regards,

Guy

Andrés Parada

Apr 27, 2012, 12:01:47 PM

to beast...@googlegroups.com

Can I add a minor question?

I think the traces being compared should show good mixing and good ESS values for all parameters. It's pointless to compare traces with low ESS since they have not converged, right? But suppose a trace has low ESS for some parameters yet a higher likelihood (so it seems like the better model): will it underperform against another model with a lower likelihood once good mixing is obtained?
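
For reference, here is a minimal Python sketch of one common way to estimate ESS from a single trace: divide the number of samples by 1 + 2 × (sum of autocorrelations), truncating the sum at the first non-positive lag. This is an assumption about a reasonable estimator, not necessarily the calculation Tracer uses, and the function name is made up.

```python
def effective_sample_size(trace):
    """Rough ESS estimate: n / (1 + 2 * sum of positive-lag autocorrelations).

    The autocorrelation sum stops at the first lag whose estimate is <= 0.
    """
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        return float(n)  # constant trace: no autocorrelation can be estimated
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


if __name__ == "__main__":
    # A steadily drifting trace (poor mixing) versus a rapidly alternating one.
    drifting = [float(i) for i in range(1000)]
    alternating = [1.0 if i % 2 == 0 else -1.0 for i in range(1000)]
    print("ESS drifting   :", effective_sample_size(drifting))
    print("ESS alternating:", effective_sample_size(alternating))
```

A trace that is still drifting gets a tiny ESS even with many samples, so its likelihood values say little about where the chain will settle.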

thanks for your help

Best

nilofar alaie

May 4, 2016, 6:24:52 PM

to beast-users

Hi,

I want to compare the log files from two models: the coalescent constant-size model and the coalescent exponential model.

I need to calculate the Bayes factor between these two in Tracer.

Does anybody know which option should be used for the likelihood trace once these two models are selected in Tracer?

Thanks in advance,

Niloo

Guy Baele

May 5, 2016, 5:52:38 AM

to beast-users

Comparing models based only on the log files of a standard MCMC run is very inaccurate.
Please read the replies earlier in this thread and use either path sampling (PS), stepping-stone sampling (SS) or generalized stepping-stone sampling (GSS) instead.

No matter which of these you pick, you'll have to run an additional analysis for each model.
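
To illustrate why an additional analysis is needed, here is a minimal Python sketch of stepping-stone sampling on a toy conjugate normal model with a known marginal likelihood. The estimator uses samples from a series of power posteriors (prior × likelihood^β for β between 0 and 1), which the standard posterior log file does not contain. The data, prior, β schedule and function names are all assumptions for the example; this is not the BEAST implementation.

```python
import math
import random

# Toy model (illustrative assumptions): y_i ~ Normal(mu, SIGMA^2) with SIGMA
# known, and prior mu ~ Normal(M0, S0^2).
Y = [0.3, 1.2, 0.7, -0.4, 1.9, 0.8, 0.1, 1.4, 0.6, 1.0, -0.2, 1.1]
SIGMA, M0, S0 = 1.0, 0.0, 1.0


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(mu):
    """Log-likelihood of the data Y at mean mu."""
    return sum(-0.5 * math.log(2 * math.pi * SIGMA ** 2)
               - (y - mu) ** 2 / (2 * SIGMA ** 2) for y in Y)


def exact_log_marginal():
    """Closed-form log marginal likelihood for this conjugate model."""
    n = len(Y)
    resid = [y - M0 for y in Y]
    ss, s = sum(r * r for r in resid), sum(resid)
    shrink = S0 ** 2 / (SIGMA ** 2 + n * S0 ** 2)
    return (-0.5 * n * math.log(2 * math.pi * SIGMA ** 2)
            - 0.5 * math.log(1 + n * S0 ** 2 / SIGMA ** 2)
            - (ss - shrink * s * s) / (2 * SIGMA ** 2))


def sample_power_posterior(beta, rng):
    """Draw mu from prior * likelihood^beta, which is normal for this model."""
    precision = 1 / S0 ** 2 + beta * len(Y) / SIGMA ** 2
    mean = (M0 / S0 ** 2 + beta * sum(Y) / SIGMA ** 2) / precision
    return rng.gauss(mean, math.sqrt(1 / precision))


def stepping_stone_log_ml(num_steps=32, draws=1000, seed=1):
    """Sum of log ratio estimates between neighbouring power posteriors."""
    rng = random.Random(seed)
    betas = [(k / num_steps) ** 3 for k in range(num_steps + 1)]
    log_ml = 0.0
    for k in range(1, num_steps + 1):
        delta = betas[k] - betas[k - 1]
        # Samples come from the power posterior at the previous beta.
        terms = [delta * log_likelihood(sample_power_posterior(betas[k - 1], rng))
                 for _ in range(draws)]
        log_ml += log_sum_exp(terms) - math.log(draws)
    return log_ml


if __name__ == "__main__":
    print("exact log marginal likelihood:", exact_log_marginal())
    print("stepping-stone estimate      :", stepping_stone_log_ml())
```

In this toy model the power posteriors can be sampled directly; in a real analysis each one requires its own MCMC run, hence the extra analysis per model.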

Best regards,

Guy
