Hello everyone,
I'm using *BEAST 2.0.2 for the first time to reconstruct a species tree using sequences from 20 nuclear loci (phased and unphased) for a group of species that are closely related and diverged rapidly. All loci show low variability, but contain substitutions that differentiate most of the species in the taxon set.
I've set up the analysis in BEAUti and have conducted several trial runs to experiment with various parameter and prior settings. I've run into a couple of issues that raised questions I'm not sure how to answer, and I was wondering if I could get some guidance from the users group.
The first question concerns linking or unlinking site models and clock models. Most of the loci have the same substitution model (HKY or HKY+G), as estimated by AICc in jModelTest. Does it make more sense to link the site models for these loci, thereby reducing the number of parameters that have to be estimated, or is it better to keep them unlinked for the *BEAST analyses? I realize that it may be preferable to estimate a gamma shape for each locus, but the variability of the loci is so low in most cases that I'm not sure this parameter can be estimated accurately. In short, should the site models be linked or unlinked when the loci all share the same substitution model?
I have a similar question about the clock models: link or unlink across the 20 loci? The low variability of the loci and some testing of the molecular clock suggest that all loci are best modeled with a strict molecular clock. I assume that linking the clock models across all loci amounts to a strong assumption that the loci share the same substitution rate, which may not be valid. However, I have not been able to find any "rules of thumb" concerning this issue.
Lastly, in preliminary runs with chain lengths of 100 million generations, the ESS values for many of the parameters, including the likelihoods of the tree and coalescent model, are quite low (<<100). Are there any ways to remedy this situation? I realize that my data set is complex in terms of the number of loci and taxa, and that, with all the parameters that need to be estimated, it may require a longer chain length to obtain good convergence and mixing.
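For context on what a low ESS indicates, here is a minimal Python sketch of one common simplified estimator: the number of samples divided by (1 + 2 x the sum of positive-lag autocorrelations), with the sum cut off at the first non-positive autocorrelation. This is an illustration only and is not claimed to be the exact calculation used by Tracer or BEAST; the function names and the AR(1) test chain are hypothetical and not part of my analysis.

```python
import random


def effective_sample_size(trace):
    """Simplified ESS: n / (1 + 2 * sum of autocorrelations).

    The autocorrelation sum is truncated at the first lag whose
    autocorrelation is <= 0 (an assumption of this sketch; real
    tools may use a different truncation rule).
    """
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        # Constant trace: no autocorrelation can be estimated.
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


def ar1_chain(n, phi, seed=0):
    """Hypothetical stand-in for a poorly mixing MCMC trace (AR(1) process)."""
    rng = random.Random(seed)
    x = 0.0
    chain = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain


if __name__ == "__main__":
    rng = random.Random(1)
    independent = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    sticky = ar1_chain(2000, 0.95)
    print("ESS, independent draws:", effective_sample_size(independent))
    print("ESS, autocorrelated chain:", effective_sample_size(sticky))
```

Under this estimator, a strongly autocorrelated chain yields far fewer effective samples than its nominal length, which is how I read the low ESS values in my runs.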
Thanks in advance for any advice and guidance.
Klaus