Positive Marginal Likelihood values for Path-Sampling and Stepping Stone

Samantha Strickland

Jul 11, 2013, 4:57:48 PM

to beast...@googlegroups.com

I obtained positive values for the log marginal likelihood using both path sampling and stepping stone. Initially, I ran the BEAST analyses with the HKY+G nucleotide model and the results were as expected. However, when I changed the nucleotide model to a non-standard nested GTR model, I obtained positive values. Are there any suggestions as to why this occurred or how to avoid it? The log file of the MCMC and the MCC tree were as expected.

Below is a sample of the BEAST output:

PathParameter   MeanPathLikelihood
.
.
1.2633e-06      -22350
1.5284e-06      -28013
1.8303e-06      -53563
2.1715e-06      -4771.6
2.5550e-06      -34945
2.9836e-06      -4786.8
3.4601e-06      -13150
3.9875e-06      6.2766e+05
4.5688e-06      1.2562e+06
5.2069e-06      1.2328e+06
5.9049e-06      1.2973e+06
6.6659e-06      1.4727e+06
7.4931e-06      1.9172e+06
8.3895e-06      2.0179e+06
9.3585e-06      1.7118e+06
1.0403e-05      1.7363e+06
.

log marginal likelihood (using path sampling) from pathLikelihood.delta = 225422.81943952467

Thank you,

Samantha

chervin...@gmail.com

Jul 15, 2013, 4:03:26 AM

to beast...@googlegroups.com, Jean-Luc BAILLY

Hello,

I ran into the same problem when I used the path sampling and stepping-stone tests with my datasets.

To compare the SRD06 and GTR substitution models, and the BSP and GMRF BS tree models, I ran four analyses with the PS and SS tests.

Both tests return positive log marginal likelihoods (294,263 and 298,299) for the analyses with the GTR model. For the analyses with the SRD06 model, the log marginal likelihoods look fine.

Inspecting the log file in Tracer shows that the pathLikelihood.delta statistic jumps to values higher than 1.25e+08 from state 8.5e+07 until the end of the run.

What is wrong with the analysis?

Thank you for your help,

Chervin

Marc Suchard

Jul 15, 2013, 12:52:12 PM

to beast...@googlegroups.com

While positive log marginal likelihoods are theoretically possible, it is much more likely that your analysis uses an improper prior. The marginal likelihood may not exist if you use improper priors; see Baele et al. (2012) -- http://mbe.oxfordjournals.org/content/early/2012/04/10/molbev.mss084.full
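
To illustrate why the marginal likelihood may not exist under an improper prior, here is a minimal Python sketch using a toy model that has nothing to do with BEAST itself: a single observation y ~ Normal(theta, 1) with a Uniform(-w, w) prior on theta. The function names and the toy model are assumptions made purely for illustration. As w grows, the prior approaches an improper flat prior, and the log marginal likelihood keeps falling roughly like -log(2w) instead of settling on a value.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def log_marginal_likelihood(y, w):
    """log m(y) for y ~ Normal(theta, 1) and theta ~ Uniform(-w, w).

    m(y) = (1 / (2w)) * integral over [-w, w] of Normal(y; theta, 1) d theta
         = (Phi(w - y) - Phi(-w - y)) / (2w)
    """
    mass = normal_cdf(w - y) - normal_cdf(-w - y)
    return math.log(mass) - math.log(2.0 * w)


# Widening the prior never converges: the value depends entirely on w.
for w in (10.0, 1e3, 1e6):
    print(w, log_marginal_likelihood(0.5, w))
```

Because the answer is driven by the arbitrary width w, there is no meaningful limit to report, which is the sense in which the marginal likelihood is undefined for the flat improper prior.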

best, Marc

Guy Baele

Aug 15, 2013, 6:26:03 AM

to beast...@googlegroups.com

Note that it is perfectly possible to log the parameters of your model(s) during the marginal likelihood estimation.
While this may take up quite a bit of disk space, it should tell you which parameter suddenly takes off and yields this large positive contribution to the likelihood. Looking at the MCMC log will not help much since the problem occurs at a power posterior close to 0, i.e. close to the prior.
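
Once the parameters have been logged, finding the one that takes off can be partly automated. The following Python sketch is only an illustration: it assumes the log is tab-delimited with a header row and optional comment lines starting with "#", and the column names in the sample (kappa, gtr.rateAC) are hypothetical. It ranks each numeric column by the ratio of its largest absolute value to its median absolute value, so a parameter with a sudden huge excursion ends up at the top.

```python
import csv
import io
import statistics


def rank_runaway_columns(log_text):
    """Rank numeric columns of a tab-delimited log by max|x| / median|x|."""
    # Drop blank lines and '#' comment lines before parsing the table.
    lines = [ln for ln in log_text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    columns = {}
    for row in reader:
        for name, value in row.items():
            try:
                columns.setdefault(name, []).append(float(value))
            except (TypeError, ValueError):
                pass  # ignore non-numeric or missing cells
    scores = {}
    for name, values in columns.items():
        magnitudes = [abs(v) for v in values]
        median = statistics.median(magnitudes)
        scores[name] = max(magnitudes) / (median + 1e-12)
    # Largest ratio first: the most suspicious parameter leads the list.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical four-sample log in which one rate parameter explodes.
sample_log = (
    "# hypothetical log, for illustration only\n"
    "state\tkappa\tgtr.rateAC\n"
    "0\t2.0\t1.0\n"
    "1000\t2.1\t1.2\n"
    "2000\t1.9\t0.9\n"
    "3000\t2.0\t5.0e6\n"
)
for name, score in rank_runaway_columns(sample_log):
    print(name, score)
```

The ratio is a crude heuristic; plotting the top few columns in Tracer remains the more reliable check.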

Also, try running the PS/SS with fewer path steps and a longer chain per path step first.

Best regards,

Guy

On Thursday, 11 July 2013 at 22:57:48 UTC+2, Samantha Strickland wrote:

Maude Jacquot

Jun 17, 2015, 6:25:45 AM

to beast...@googlegroups.com

Hi all,

Regarding the number of path steps and the chain length per path step, how does one choose the right settings a priori?

Moreover, I have read that it is impossible to (re)compute PS/SS independently of the BEAST run because it uses power posteriors and not only the posterior. However, it remains unclear to me what exactly that means. Could you please explain?

Also, given the results of a very long MCMC run, is there a correct way to perform a new, quicker analysis and compute PS/SS?

Sorry about all these naive questions; I hope they make sense!

Many thanks for your answers,

Maude

Guy Baele

Jun 18, 2015, 9:34:38 AM

to beast...@googlegroups.com

Hi Maude,

Unfortunately, there's no one setting that will work for every data set, just like there's no one run length for your BEAST run that will give you adequate ESS values for every data set (unless it's billions of iterations).

So try something reasonable based on the time it would take to compute.

If, for example, you can do 5 million iterations per hour, then run 50 path steps of 1 million iterations each, and you should have a first result after approximately 10 hours.

To check if your result is stable or has converged, run it a bit longer, i.e. 100 path steps for 1 million iterations each.

If the two results are close to one another, you can stop and use that estimate as your final one.

A standard BEAST run gathers samples from the power posterior with power = 1, i.e. the posterior itself.

To perform marginal likelihood estimation, we need to gather samples from a whole series of power posteriors, i.e. power = 1, 0.99, 0.98, ..., 0.01, 0.0 (here the powers are evenly spaced between 1 and 0, just to make my point).
So while you may have a large number of samples from one power posterior, you still need samples from the other 100 power posteriors, which is why the samples from your BEAST run alone will be insufficient.
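
The need for samples at every power can be made concrete with a toy example. The sketch below is not BEAST code: it assumes a conjugate model (data y_i ~ Normal(theta, 1), prior theta ~ Normal(0, 1)) chosen only because each power posterior can then be sampled exactly, and all function names are hypothetical. It averages the log likelihood under each power posterior, integrates those averages over evenly spaced powers with the trapezoidal rule (the path sampling estimate), and compares the result with the exact log marginal likelihood available for this model.

```python
import math
import random


def log_likelihood(theta, data):
    """Log likelihood of data under Normal(theta, 1)."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (y - theta) ** 2
               for y in data)


def sample_power_posterior(beta, data, n_samples, rng):
    """Exact draws from p_beta(theta), proportional to L(theta)^beta * prior."""
    precision = 1.0 + beta * len(data)      # prior precision 1 plus beta * n
    mean = beta * sum(data) / precision
    sd = math.sqrt(1.0 / precision)
    return [rng.gauss(mean, sd) for _ in range(n_samples)]


def path_sampling_estimate(data, n_steps=50, n_samples=1000, seed=1):
    """Trapezoidal path sampling estimate of the log marginal likelihood."""
    rng = random.Random(seed)
    betas = [k / n_steps for k in range(n_steps + 1)]  # evenly spaced powers
    means = []
    for beta in betas:
        draws = sample_power_posterior(beta, data, n_samples, rng)
        means.append(sum(log_likelihood(t, data) for t in draws) / n_samples)
    return sum(0.5 * (means[k] + means[k + 1]) * (betas[k + 1] - betas[k])
               for k in range(n_steps))


def exact_log_marginal(data):
    """Closed form for this conjugate model: y ~ Normal(0, I + 11^T)."""
    n, s, ss = len(data), sum(data), sum(y * y for y in data)
    return (-0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(1 + n)
            - 0.5 * (ss - s * s / (1 + n)))


toy_data = [0.3, -0.5, 1.2, 0.8, -0.1]  # made-up observations
print("path sampling:", path_sampling_estimate(toy_data))
print("exact:        ", exact_log_marginal(toy_data))
```

Only the final entry of `betas` (power = 1) corresponds to an ordinary posterior run; every other term in the sum requires its own set of samples, which cannot be recovered from the posterior samples alone.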

There are no real shortcuts I'm afraid, although we are continuing to work on new methods to make the computations less demanding and hence faster.

Best regards,

Guy

On Wednesday, 17 June 2015 at 12:25:45 UTC+2, Mj wrote:

Mj

Jun 18, 2015, 12:30:15 PM

to beast...@googlegroups.com

Hi Guy,

Thank you very much for your answer. Now that makes more sense.
It would definitely be useful to have new, faster methods; however, I suppose that developing them is not easy. Good luck and thanks again.

Best wishes,

Maude

Josh Sealy

Feb 28, 2017, 1:26:37 PM

to beast-users

Hi All,

I know this is an old post, but I wanted to share something that helped me solve the issue of implausible positive log marginal likelihoods from path sampling in BEAST2.

When setting up an XML file in BEAUti, I erroneously imported a nucleotide alignment that contained ambiguous site calls, i.e. anything other than A, G, T or C (e.g. r, -). An early indicator that I was doing something wrong, which I did not pick up on, was that when importing the alignment the default format selected was "aminoacid" and not "nucleotide".

After ensuring that everything in my nucleotide alignment file was in fact a nucleotide, I managed to get the log marginal likelihood down from +125,000 to -2681.
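
For anyone wanting to run the same check before importing, here is a small Python sketch. It assumes the alignment is in FASTA format, and the function name and sample sequences are made up for illustration. It lists, per sequence, every character other than A, C, G and T so they can be inspected by hand.

```python
from collections import Counter


def non_acgt_characters(fasta_text):
    """Map each sequence name to counts of characters other than A, C, G, T."""
    report = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()          # header line: start a new record
            report.setdefault(name, Counter())
        elif name is not None:
            # Count anything that is not a plain nucleotide (case-insensitive).
            report[name].update(ch for ch in line.upper() if ch not in "ACGT")
    # Keep only the sequences that actually contain unexpected characters.
    return {seq: dict(counts) for seq, counts in report.items() if counts}


# Made-up two-sequence alignment; seq1 contains an 'r' and a gap.
example_fasta = ">seq1\nACGTrACG-T\n>seq2\nACGTACGT\n"
print(non_acgt_characters(example_fasta))
```

Which flagged characters matter will depend on the data set; the point is simply to see them before BEAUti guesses the data type.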