Convergence in Stacey (BEAST2)

375 views
Skip to first unread message

mariheilertsen

unread,
Feb 2, 2017, 8:01:34 AM2/2/17
to beast-users
Hi all,
I am struggling to get convergence in my Stacey analysis. I have two datasets with four markers (two nuclear and two mitochondrial which I have linked the trees for), one with 63 specimens and one with 27. For both I have set the substitution models to HKY+G and used strict clock to simplify the model. After 2 billion generations the ESS values for the first dataset (63 specimens) is getting close to 100 for all parameters (some are much higher), but for the second dataset (27 specimens) there are still ESS values below 50 after 2 bill. gen. I am running several independent runs of 500 mill generations and looking at the combined logfiles in Tracer
Several questions:
1 - How high does the ESS values need to be for me to trust the tree? I am only interested in tree topology and species delimitation. Which parameters are most important to have high ESS values?
2 - For the smallest dataset it does not look like the ESS from the different runs are "adding up". For each run the ESS for a given parameter can be between 20-40 (four runs of 500 mill generations), but when I look at the combined values in Tracer it is only 45. Why is this? 3 - If the run has not converged by 2 bill. gen., will it never converge?
4 - I have some missing data in my dataset, could it help the analysis if I remove some of the specimens with missing data? That will mean less data in total, but less questionmarks in the matrix, which is more important?
5 - Are there ways I can further simplify the model? I have two partitions for COI (1 and 2 codon partition), should I link these (the trees are linked, and with the other mitochondrial marker)? Partition finder suggested to keep these as separate partitions. How do I know if I have simplified too much?

Thanks in advance for any input/ideas!

Mari

Alexei Drummond

unread,
Feb 2, 2017, 5:13:41 PM2/2/17
to beast...@googlegroups.com
Dear Mari,

What parameters are not mixing?

Alexei

--
You received this message because you are subscribed to the Google Groups "beast-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beast-users...@googlegroups.com.
To post to this group, send email to beast...@googlegroups.com.
Visit this group at https://groups.google.com/group/beast-users.
For more options, visit https://groups.google.com/d/optout.

mariheilertsen

unread,
Feb 3, 2017, 1:46:30 AM2/3/17
to beast-users
Hi Alexei,
the posterior, prior, coalescent, growth rate, origin height, poppriorscale, the tree-height parameters, mutation rate parameters and clock rates are all well below 100 ESS after 2 billion generations.

Mari

Graham

unread,
Feb 4, 2017, 6:15:54 AM2/4/17
to beast-users
Mari,

You should check your priors are sensible for the growth rate, poppriorscale and other rates.

For the size of your data sets, the mixing seems exceptionally slow. Perhaps there is a model violation, eg introgression or paralogy affecting one of your loci?

If you post your data and XML, someone might spot something.

Graham

mariheilertsen

unread,
Feb 4, 2017, 6:42:28 AM2/4/17
to beast-users
Hi Graham,
here are my XMLs and alignments, I would be very grateful for any input! I tried setting the model up following the suggestions in the Stacey manuals, I think they are sensible, but I do not know if they might be to diffuse. I have some taxa that give very long branches in MrBayes/RAxML analyses, I do not know if that might be causing trouble somehow? 

Mari
CladeC_stacey_simpl.xml
16S_CladeA.nex
CladeA_stacey_simple.xml
16S_CladeC.nex
18S_CladeC.nex
28S_CladeC.nex
COI_CladeC.nex
COI_CladeA.nex
28S_CladeA.nex
18S_CladeA.nex

Graham

unread,
Feb 4, 2017, 8:46:27 AM2/4/17
to beast-users
Mari,

I attach an edited version of CladeC_stacey_simpl-popprior.xml. I changed three things. The first two were big problems, the third probably harmless.

You were estimating individual substitution rates (Site Model tab) for the loci, as well as estimating relative clock rates for all but the first (Clock Model tab). Estimating both makes these parameters almost unidentifiable, like trying to estimate the width and height of a rectangle, when the data only gives information about the area. You should not estimate the individual rates.

You had an inverse gamma (2,1) prior for popPriorScale.This will make popPriorScale have unrealistically large values. I used lognormal(-7,2).

Your priors for relative clock rates were lognormal(1,1) in log space. These are relative rates so it makes more sense to be centred around 1, unless you know something about the relative rates. I used lognormal(0,1).

The edited XML runs fine, all ESS above 200 after 30M generations.

Graham
CladeC_stacey_simpl-noestsubs-crates-popprior.xml

Mari Heggernes Eilertsen

unread,
Feb 4, 2017, 9:10:02 AM2/4/17
to beast...@googlegroups.com
That is fantastic, I am glad it was that easy to solve. 
Thank you for your help! 

Mari

You received this message because you are subscribed to a topic in the Google Groups "beast-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/beast-users/y6TkSln1H0o/unsubscribe.
To unsubscribe from this group and all its topics, send an email to beast-users+unsubscribe@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages