*BEAST species coalescent convergence issues


Jim McGuire

Feb 2, 2016, 2:05:08 PM
to beast-users
Hi All,

I am running calibrated *BEAST analyses on two multilocus data sets (one with 9 loci, 134 tips, and 43 species; the other with 50 loci, 166 tips, and 51 species). I have typically run these analyses for 1 billion (or more) generations, with a calibrated Yule tree model and a log-normal calibration prior. For the multi-species coalescent prior, I've tried both 'linear with constant root' and 'linear'.

These analyses return pretty sensible trees with realistic dates of divergence and appropriate posterior probability estimates for nodes, but the convergence properties are poor and ESS values low for many parameters. Most notably, the trace plot for the likelihood has excellent convergence properties and ESS values, but the traces for both the species coalescent and the posterior converge poorly, rise and fall heterogeneously, and have identical shapes and ESS values to one another. This is evident in the three attached plots. 
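
As a rough illustration of why a slowly waving trace yields a low ESS, the sketch below estimates ESS from the autocorrelation of a trace. It is a simplified estimator, not the one Tracer uses, and the two synthetic traces (`well_mixed`, `slow_wave`) and all helper names are made up for the example.

```python
import random


def autocorr(xs, lag):
    """Sample autocorrelation of the sequence xs at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return 0.0
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return cov / var


def ess(xs):
    """Rough ESS: N / (1 + 2 * sum of autocorrelations).

    The sum stops at the first non-positive autocorrelation, which is a
    common but crude truncation rule.
    """
    n = len(xs)
    total = 0.0
    for lag in range(1, n // 2):
        rho = autocorr(xs, lag)
        if rho <= 0:
            break
        total += rho
    return n / (1 + 2 * total)


def ar1(n, phi, seed):
    """Synthetic trace: AR(1) process; phi near 1 gives a slow wave."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0, 1)
        out.append(x)
    return out


well_mixed = ar1(2000, 0.0, seed=1)   # independent draws
slow_wave = ar1(2000, 0.99, seed=1)   # strongly autocorrelated draws
print(f"ESS well mixed: {ess(well_mixed):.0f}")
print(f"ESS slow wave:  {ess(slow_wave):.0f}")
```

Both traces have the same number of samples, but the strongly autocorrelated one carries far fewer effectively independent draws.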

My question for the beast-users group is whether the source of the problem is something known or obvious, and whether there is a recommendation on what I should adjust to improve performance. I apologize if this issue has already been covered on the user group; I looked pretty carefully without finding anything that specifically addressed this problem.

Thanks!

Jim     

posterior.pdf
likelihood.pdf
speciescoalescent.pdf

Alexei Drummond

Feb 2, 2016, 2:53:53 PM
to beast...@googlegroups.com
Hi Jim,

Possibly you have a 1/X prior on the species tree populationMean parameter?

Cheers
Alexei


Jim McGuire

Feb 2, 2016, 3:01:31 PM
to beast-users
Yes, you are right! Can you suggest an alternative? 

Jim

Alexei Drummond

Feb 2, 2016, 3:24:40 PM
to beast...@googlegroups.com
Hi Jim,

Perhaps you could consider a diffuse log normal distribution (S=2?) and an M parameter that is sensible for the species you are analyzing (and the units you are using).
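
To get a feel for how diffuse S=2 is, here is a small sketch that computes the central 95% interval of a log-normal prior. It assumes M is the mean in log space (a "mean in real space" setting would change the numbers), and the median of 0.01 is an arbitrary placeholder, not a recommendation for any data set.

```python
import math
from statistics import NormalDist


def lognormal_interval(m, s, mass=0.95):
    """Central interval of a log-normal whose logarithm has mean m and sd s."""
    tail = (1 - mass) / 2
    log_dist = NormalDist(mu=m, sigma=s)
    return math.exp(log_dist.inv_cdf(tail)), math.exp(log_dist.inv_cdf(1 - tail))


# Hypothetical choice: median population size of 0.01 in the analysis units.
m = math.log(0.01)
low, high = lognormal_interval(m, 2.0)
print(f"median = {math.exp(m):.4g}")
print(f"95% interval = ({low:.3g}, {high:.3g})")
print(f"upper/lower ratio = {high / low:.0f}")
```

With S=2 the 95% interval spans more than three orders of magnitude, so the prior stays diffuse while still being proper.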

Alexei

Jim McGuire

Feb 2, 2016, 3:52:02 PM
to beast-users
Thanks for the advice, Alexei - will do!

Jim

Jim McGuire

Feb 2, 2016, 4:19:31 PM
to beast-users
I think I spoke too soon. Were you suggesting that I use a diffuse log normal distribution for the popMean parameter? When I set the popMean parameter to log normal and then attempt to set the parameters, I see windows for 'Value', 'lower' and 'upper', and 'Dimension' and 'Minordimension'. I looked through the Beast Book and the user-group but didn't see explanations for these settings and I didn't see M and S settings such as the ones that appear when setting the calibration priors. So, my confident reply was premature and I'm not sure how to implement your suggestion. I feel like I'm learning to crawl here - apologies for not being up to speed on these issues!

Jim   

Huw A. Ogilvie

Feb 3, 2016, 5:05:42 AM
to beast-users
Hi Jim,

Could you attach the trace log for popMean? If that parameter is mixing well, it may be premature to change the popMean prior. On "large" data sets *BEAST often mixes poorly. Given your data, it may be necessary to sample from 10-20 billion states. This can be achieved by running 10-20 chains of 1 billion states each in parallel, then concatenating the resulting log and tree files using LogCombiner.
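
The idea of pooling independent chains can be sketched as follows. This is not LogCombiner itself, only an illustration of what combining does conceptually: drop a burn-in fraction from each chain, then concatenate the remaining samples. It assumes tab-separated logs with `#` comment lines and a header row; the function names and the fake data are hypothetical.

```python
def combine_logs(log_texts, burnin_fraction=0.1):
    """Concatenate the post-burn-in samples of several trace logs.

    log_texts: list of strings, each the full content of one log file.
    Returns the combined log as a string with samples renumbered 0, 1, 2, ...
    """
    header = None
    rows = []
    for text in log_texts:
        lines = [ln for ln in text.splitlines()
                 if ln.strip() and not ln.startswith("#")]
        if not lines:
            raise ValueError("empty log")
        if header is None:
            header = lines[0]
        elif lines[0] != header:
            raise ValueError("log files have different columns")
        samples = lines[1:]
        skip = int(len(samples) * burnin_fraction)  # burn-in rows to drop
        rows.extend(samples[skip:])
    out = [header]
    for i, row in enumerate(rows):
        fields = row.split("\t")
        fields[0] = str(i)  # renumber so the sample column stays increasing
        out.append("\t".join(fields))
    return "\n".join(out) + "\n"


def fake_chain(offset, n=10):
    """Build a tiny made-up log with a 'Sample' and a 'posterior' column."""
    lines = ["# hypothetical BEAST-style log", "Sample\tposterior"]
    lines += [f"{i * 1000}\t{offset - i}" for i in range(n)]
    return "\n".join(lines)


combined = combine_logs([fake_chain(-100.0), fake_chain(-200.0)],
                        burnin_fraction=0.2)
print(combined)
```

Pooling only makes sense if the chains have each reached the same stationary distribution, so it is worth comparing the individual traces before combining them.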

It may also be worth applying *BEAST to a reduced data set with a single representative individual (using both phased haplotypes for that individual) per species. This is very likely to have better mixing than the full data set with multiple individuals per species.

- Huw

Graham

Feb 3, 2016, 6:08:05 AM
to beast-users
Jim,

On the problem of setting the prior for popMean, I think that's a BEAUti bug. You should see M and S. Someone else reported something similar when using STACEY:

"When I tried to adjust the lognormal priors as you suggested, the window that opens when you click on the box to enter various values (mean, sd, for example) doesn’t have these parameters listed.  I has upper, lower, starting value and something about whether the values will be in real space and some dimensions.  It looks more like the options for a uniform prior.  I tried changing it to a normal distribution, but I get the same window and options as before.  Perhaps a Beauti bug?  Have you seen that one? I am using v2.3.1"

I have been working on improving convergence in STACEY (which has a lot in common with *BEAST). The slowly waving, and matching, traces for posterior and coalescent are familiar sights to me! I'd say *BEAST should not have huge problems converging with the 9 loci data set (here Alexei's suggestion may help) but mainly I agree with Huw. I'd expect the 50 loci analysis to take a very long time. I will be releasing a new version of STACEY very soon (next week?) and that will have a template which allows you to use *BEAST, but with MCMC operators from STACEY. That should improve convergence, but whether it does remains to be seen.

Graham Jones, www.indriid.com.

Alexei Drummond

Feb 3, 2016, 6:14:28 AM
to beast...@googlegroups.com
Hi Jim,

Yes, it might be a bug in BEAUti, so you would have to change the values of M and S for the popMean prior directly in the XML.
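
A minimal sketch of such an edit, done with Python's standard XML library rather than by hand. The fragment below only mimics the general shape of a log-normal prior block in a BEAST 2 XML file; the ids, the starting values, and the new M and S are assumptions for illustration, so check the element names against your own file before adapting it.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the ids and values in a real file will differ.
FRAGMENT = """
<prior id="popMean.prior" name="distribution" x="@popMean">
    <LogNormal id="LogNormalDistributionModel.popMean" name="distr">
        <parameter id="RealParameter.M" estimate="false" name="M">1.0</parameter>
        <parameter id="RealParameter.S" estimate="false" name="S">1.25</parameter>
    </LogNormal>
</prior>
"""


def set_lognormal(prior, m, s):
    """Overwrite the M and S parameters of a log-normal prior element."""
    dist = prior.find("LogNormal")
    if dist is None:
        raise ValueError("prior has no LogNormal distribution")
    for param in dist.findall("parameter"):
        if param.get("name") == "M":
            param.text = str(m)
        elif param.get("name") == "S":
            param.text = str(s)


prior = ET.fromstring(FRAGMENT)
set_lognormal(prior, m=-4.6, s=2.0)  # placeholder values, not a recommendation
print(ET.tostring(prior, encoding="unicode"))
```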

Graham and Huw make good points. But I would anyway argue that you should not use improper priors (like 1/X) unless you really know what you are doing. Improper priors, for instance, render attempts to calculate Bayes factors using path sampling meaningless, because improper priors can't be sampled.
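
The reason 1/X is improper can be checked with a few lines of arithmetic: its integral over [a, b] is ln(b/a), which grows without limit as the bounds widen, so no normalizing constant exists on (0, ∞). The helper name below is made up for this sketch.

```python
import math


def mass_one_over_x(lower, upper):
    """Integral of 1/x from lower to upper, i.e. ln(upper) - ln(lower)."""
    return math.log(upper) - math.log(lower)


# Widen the bounds symmetrically on the log scale and watch the mass grow.
for k in (1, 10, 100):
    total = mass_one_over_x(10.0 ** -k, 10.0 ** k)
    print(f"bounds 1e-{k} .. 1e{k}: total mass = {total:.1f}")
```

Putting finite lower and upper bounds on the parameter makes the prior proper, but the result then depends on where those bounds are placed.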

Huw is also working on StarBEAST2 which is significantly faster and has better support for relaxed clocks. I think he plans to release it in the next month or two.

Cheers
Alexei

Graham

Feb 3, 2016, 7:26:11 AM
to beast-users


On Wednesday, 3 February 2016 11:14:28 UTC, Alexei Drummond wrote:
> I would anyway argue that you should not use improper priors...

So would I (but I doubt this is Jim's problem).

Graham

Alexei Drummond

Feb 3, 2016, 8:15:12 AM
to beast...@googlegroups.com
Hey Graham,

You and Huw have convinced me :) Great to hear that a new version of STACEY is on the way!

Alexei

Remco Bouckaert

Feb 3, 2016, 1:47:59 PM
to beast...@googlegroups.com
Hi Graham, Jim,

The prior of the population mean can be found in the priors panel, not in the “Multi Species Coalescent” panel.

When you double click the little pencil icon next to “Population Mean” in the “Multi Species Coalescent” panel, a dialog pops up that allows you to set properties of the Population Mean parameter, like upper and lower bounds, but it does not allow you to change the prior.

Cheers,

Remco



Jim McGuire

Feb 3, 2016, 7:28:37 PM
to beast-users
Hi All,

Thanks for your replies. I have a bit more info now and a few more questions. First, in reply to Huw, I guess I'm confused enough about the nomenclature that I'm not certain which tracelog is for popMean. I don't actually see popMean in the Traces list. I do see 'prior' and 'speciescoalescent'. My original post had the speciescoalescent trace appended. Can you clarify for me?

I set up another analysis with the log-normal distribution for the popMean prior. The behavior thus far is no different than before, so I think this confirms the opinions of Graham and Huw that an improper prior was not the root cause of the unusual convergence behavior. More appropriate log-normal prior settings might still matter, though, since there does seem to be a bug in BEAUti preventing me from using the settings suggested by Alexei.

Any chance I can take an advance run at StarBEAST2? ;) Just a thought. I'll certainly be watching for the STACEY release.

Finally, I would not be surprised if Huw's answer is correct - that I just need to run these analyses for a lot longer (15-20 billion generations versus 1 billion generations) to obtain sufficient ESS values. As I mentioned in my first post, the species tree estimates, divergence time estimates, and posterior support values all actually look quite good despite the odd traces and low ESS values.

Thanks to all of you for your time!

Jim     

Graham

Feb 6, 2016, 5:19:10 AM
to beast-users
Jim said


> I don't actually see popMean in the Traces list.

It should be there, but it wasn't there when I tried using StarBEAST either. I have opened issue #502 at https://github.com/CompEvol/beast2/issues. You could add it to the XML for new runs.
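
A minimal sketch of adding it, again using Python's standard XML library. The logger fragment is hypothetical and only mimics the usual shape of a trace logger in a BEAST 2 XML file; in particular, the assumption that the parameter's id is `popMean` should be checked against the state section of your own file.

```python
import xml.etree.ElementTree as ET

# Hypothetical trace logger; file name, ids, and logEvery are placeholders.
LOGGER = """
<logger id="tracelog" fileName="run.log" logEvery="5000">
    <log idref="posterior"/>
    <log idref="likelihood"/>
    <log idref="speciescoalescent"/>
</logger>
"""


def add_logged_item(logger, idref):
    """Append <log idref="..."/> unless it is already logged.

    Returns True if an entry was added, False if it was already present.
    """
    if any(item.get("idref") == idref for item in logger.findall("log")):
        return False
    ET.SubElement(logger, "log", {"idref": idref})
    return True


logger = ET.fromstring(LOGGER)
add_logged_item(logger, "popMean")  # assumed id of the population mean parameter
print(ET.tostring(logger, encoding="unicode"))
```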

Remco Bouckaert

Feb 7, 2016, 2:23:35 PM
to beast...@googlegroups.com
Hi Graham,

Thanks for pointing this out — it will be fixed in the next release.

Remco