Question about Star-BEAST analyses

Klaus-Peter Koepfli

Sep 25, 2013, 3:51:55 PM
to beast...@googlegroups.com
Hello everyone,

I'm using *BEAST 2.0.2 for the first time to reconstruct a species tree using sequences from 20 nuclear loci (phased and unphased) for a group of species that are closely related and diverged rapidly. All loci show low variability, but contain substitutions that differentiate most of the species in the taxon set.

I've set up the analysis in BEAUti and have conducted several trial runs to experiment with various parameter and prior settings. I have run into a couple of issues that raised questions I'm not sure about, and I was wondering if I could get some guidance from the users group.

The first question concerns linking or unlinking site models and clock models. Most of the loci have the same substitution model (HKY or HKY+G), as selected by AICc in jModelTest. Does it make more sense to link the site models for these loci, thereby reducing the number of parameters that have to be estimated, or is it better to keep them unlinked for the *BEAST analyses? I realize that it may be preferable to estimate a gamma shape for each locus, but the variability of most loci is so low that I'm not sure this parameter can be estimated accurately. In any case, does it make sense to link or unlink the site models of loci that all share the same substitution model?
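To make the trade-off concrete, here is a rough sketch of how the number of site-model parameters scales with linking. It assumes, for illustration only, that every locus uses HKY+G with estimated base frequencies (the post says this holds for most loci, not all), and the function name is hypothetical rather than anything provided by BEAST.

```python
def site_model_parameter_count(n_loci, linked, gamma=True, estimate_frequencies=True):
    """Count free site-model parameters under an assumed HKY(+G) setup.

    Illustrative assumption: all loci use the same model family.
    """
    per_model = 1  # kappa (transition/transversion ratio)
    if gamma:
        per_model += 1  # gamma shape for among-site rate variation
    if estimate_frequencies:
        per_model += 3  # four base frequencies summing to 1 -> 3 free
    # Linking means a single shared site model; unlinking means one per locus.
    n_models = 1 if linked else n_loci
    return per_model * n_models
```

Under these assumptions, linking 20 loci leaves one shared set of site-model parameters, while unlinking multiplies that set by 20. Whether the reduction is worth the stronger assumption is the question asked above.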

I have a similar question about the clock models: link or unlink across the 20 loci? The low variability of the loci and some testing of the molecular clock suggest that all loci are best modeled with a strict molecular clock. I assume that linking the clock models of all loci amounts to a strong assumption that the loci share the same substitution rate, which may not be valid. Nonetheless, I have not been able to find any "rules of thumb" concerning this issue.

Lastly, in preliminary runs with chain lengths of 100 million generations, the ESS values for many of the parameters, including the likelihoods of the tree and coalescent model, are quite low (<<100). Are there any ways to remedy this situation? I realize that my data set is complex in terms of the number of loci and taxa, and that with all the parameters to be estimated it may require a longer chain length to obtain good convergence and mixing.
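For readers unfamiliar with the ESS figure quoted above, the following is a minimal sketch of one common way to approximate it from a logged trace: divide the number of samples by the integrated autocorrelation time, truncating the sum at the first non-positive autocorrelation. This is a generic estimator and is not claimed to match what Tracer or BEAST compute; the function name is an assumption for illustration.

```python
def effective_sample_size(samples):
    """Approximate ESS as n / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first lag whose autocorrelation is <= 0.
    """
    n = len(samples)
    if n < 2:
        return float(n)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        # Constant trace: the ESS is undefined; reported as 0 in this sketch.
        return 0.0
    rho_sum = 0.0
    for lag in range(1, n):
        # Biased autocovariance estimate at this lag (divides by n, not n - lag).
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)
```

A strongly autocorrelated trace yields an ESS far below the number of logged samples, which is the symptom described here. Burn-in samples would normally be discarded before passing a trace to a function like this.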

Thanks in advance for any advice and guidance.

Klaus

pepster

Sep 28, 2013, 1:56:42 PM
to beast...@googlegroups.com
Hi Klaus

If you send me the XML file (compressed) I can have a look.

-Joseph

Klaus-Peter Koepfli

Feb 20, 2014, 2:20:19 PM
to beast...@googlegroups.com
Dear Joseph,

I'm finally returning to this issue that I first posted about in September. I've collected some mitochondrial data since then to add to the *BEAST analyses.

Can I still send you the .xml that I created using BEAUti from (now) the BEAST 2.1.1 package? I've done some test runs, and the ESS values for the posterior and prior are low and the runs seem to take a long time to reach convergence.

Best regards,

Klaus