Bayes factors

Gerald M. Schneeweiss

Mar 5, 2008, 6:47:00 AM

to beast-users

Hi,

I am using Bayes factors to compare different rooting positions on a
tree. I am using both estimation methods, and they give qualitatively
similar results, but the harmonic mean estimator not only generally
gives worse marginal likelihoods, but also usually larger log-Bayes
factors than the second method (sorry for my ignorance, but I failed
to find out what precisely the second [bootstrap] method really
does). The largest difference I encountered was a logBF of 8 from the
harmonic mean estimator, but only 1.3 from the second method. As this
affects the interpretation, I would like to ask whether you have any
recommendations concerning which method might be better or more
reliable. Thanks in advance.

Best,
Gerald

Marc Suchard

Mar 6, 2008, 10:10:08 AM

to beast-users

Dear Gerald,

Things may have changed since I provided the source code to Alexei and
Andrew for Bayes Factor estimates, but originally there was only one
method of estimation. This method is based on a harmonic mean
estimator first proposed by Newton and Raftery.
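For readers unfamiliar with the estimator, here is a minimal sketch of how a harmonic mean estimate of the log marginal likelihood can be computed from the log-likelihood values sampled by an MCMC run. It is not the BEAST/Tracer source code; the function names `log_harmonic_mean` and `log_bayes_factor` are hypothetical, and the result is in natural-log units.

```python
import math


def log_harmonic_mean(log_likelihoods):
    """Harmonic mean estimate of the log marginal likelihood.

    Uses log p(D) ~= log N - logsumexp(-l_i), where l_i are the
    log-likelihoods of the posterior samples. Working in log space
    avoids overflow for the large negative values typical of
    phylogenetic likelihoods.
    """
    neg = [-l for l in log_likelihoods]
    m = max(neg)  # shift by the maximum for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in neg))
    return math.log(len(neg)) - log_sum


def log_bayes_factor(logliks_a, logliks_b):
    """Natural-log Bayes factor of model A over model B (hypothetical helper)."""
    return log_harmonic_mean(logliks_a) - log_harmonic_mean(logliks_b)
```

Because the sum is dominated by the samples with the lowest likelihood, a single unusually poor sample can shift the estimate noticeably, which is one way to see why the estimator can be unstable between runs.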

The code first computed the harmonic mean estimate from the raw MCMC
output, yielding a single number -- this is probably what you are
calling method #1. However, harmonic mean estimators have some bad
properties (e.g., sometimes they have infinite variance), so
re-running your MCMC chain and re-estimating the harmonic mean can
give you pretty big differences; this is called Monte Carlo error.
(What "pretty big" means is very problem-dependent, unfortunately.)
Re-running your analysis over and over again to get a handle on the
Monte Carlo error in your original estimate would be very
time-consuming. A former student of mine, Ben Redelings, suggested
that one could use the bootstrap (introduced by Efron) on a single
MCMC chain to adjust the harmonic mean estimator for Monte Carlo
error -- this is probably what you are calling method #2. The
bootstrapping is a little more difficult than normal because the
samples from the MCMC chain are highly autocorrelated, so we employ
some techniques from time-series analysis.
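The message does not say which time-series technique is used. As an illustration only, the sketch below uses a moving-block bootstrap, one common way to resample autocorrelated series; this is an assumption and not necessarily what the BEAST code does. The names `block_bootstrap_hme`, `block_len`, and `n_reps` are hypothetical.

```python
import math
import random


def log_harmonic_mean(log_likelihoods):
    """Harmonic mean estimate of the log marginal likelihood (log space)."""
    neg = [-l for l in log_likelihoods]
    m = max(neg)
    return math.log(len(neg)) - (m + math.log(sum(math.exp(x - m) for x in neg)))


def block_bootstrap_hme(log_likelihoods, block_len=10, n_reps=200, seed=0):
    """Approximate 95% range of the harmonic mean estimate.

    Assumption: a moving-block bootstrap. Contiguous blocks of the chain
    are resampled with replacement so that short-range autocorrelation
    within each block is preserved.
    """
    n = len(log_likelihoods)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and the chain length")
    rng = random.Random(seed)
    n_blocks = math.ceil(n / block_len)
    estimates = []
    for _ in range(n_reps):
        resample = []
        for _ in range(n_blocks):
            start = rng.randint(0, n - block_len)  # inclusive bounds
            resample.extend(log_likelihoods[start:start + block_len])
        # Trim to the original chain length before re-estimating.
        estimates.append(log_harmonic_mean(resample[:n]))
    estimates.sort()
    lower = estimates[int(0.025 * (n_reps - 1))]
    upper = estimates[int(0.975 * (n_reps - 1))]
    return lower, upper
```

The block length should be long relative to the autocorrelation time of the chain; the default of 10 here is arbitrary and would need tuning for a real analysis.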

Anyway, the bootstrapped harmonic mean estimates provide a range in
which you might expect independent estimates from additional MCMC
chains to fall. If you are curious, try running a couple more
independent simulations and see how much variance you see in the
single point estimates (what you are calling method #1).

A log_{10} BF of 1.3 means that the data favor one model over the
other by a factor of about 20 (10^1.3 is roughly 20) ... that seems
like fairly good evidence.
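The arithmetic behind that figure depends on the base of the logarithm, and the thread does not state which base the reported logBF values use. The small sketch below (with a hypothetical helper name, `evidence_ratio`) shows the conversion for both base 10 and the natural log, so the two readings can be compared.

```python
import math


def evidence_ratio(log_bf, base=10.0):
    """Convert a log Bayes factor to a ratio of marginal likelihoods."""
    return base ** log_bf


print(evidence_ratio(1.3))               # about 19.95 if the value is log10
print(evidence_ratio(1.3, base=math.e))  # about 3.67 if the value is a natural log
```

If the reported value were a natural log, 1.3 would correspond to a ratio of under 4 and the evidence would look considerably weaker, so it is worth checking which base the software reports.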

If you use these methods, don't forget to cite the appropriate
references. The first use of BFs in phylogenetics is Sinsheimer, Lake
and Little (1996) Biometrics, but most people reference Suchard,
Sinsheimer and Weiss (2001) MBE because the methods are much easier.
The first use of the harmonic mean estimator in phylogenetics is
Suchard (2005) Genetics. The bootstrap addition comes from Redelings
and Suchard (2005) SystBiol.

best, Marc

Gerald M. Schneeweiss

Mar 7, 2008, 3:40:53 AM

to beast-users

Dear Marc,

thanks for these clarifications and the references, which I will
gladly incorporate. When employing the bootstrap method (i.e., using
harmonicOnly=false bootstrap=yes), the output after running the
analysis indicates that the harmonic mean is smoothed, while otherwise
(harmonicOnly=true) this is not the case. So I guess that this
smoothing is causing the differences in the harmonic mean estimates.
The "confidence interval" from bootstrapping spans only +/- 0.105 to
0.135 log-units (compared to differences of 0.3 to 8.8 log-units
between the presumably non-smoothed and the smoothed estimator).

Thanks again,
best,
Gerald
