I've recently encountered a problem with extremely low ESS values for
the coefficient of variation and ucld.stdev while analysing a
nucleotide dataset. The ESS for these two statistics is about 5 (with a chain
length of 10 million), while the mean is around 1.1. All the other ESS
values are high (above 400, and many in the thousands). Also, the
histogram shows a bimodal distribution, with one peak between 0 and 0.5
and another between 2.5 and 3.
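As background on why these two statistics move together: if the branch rates were exactly lognormal with log-scale standard deviation σ, their coefficient of variation would be sqrt(exp(σ²) − 1), whatever the mean rate. The sketch below assumes that ucld.stdev is this log-scale σ. BEAST's reported coefficient of variation is computed from the sampled branch rates, so it will not match the formula exactly. The point is only that the mapping from ucld.stdev to CV is steep and nonlinear.

```python
import math


def lognormal_cv(sigma: float) -> float:
    """Coefficient of variation of a lognormal whose log has standard deviation sigma.

    Assumes ucld.stdev is this log-scale sigma. The result does not depend on
    the log-scale mean, so fixing the mean rate at 1 does not change it.
    """
    # expm1 keeps precision for small sigma, where exp(sigma**2) - 1 is tiny
    return math.sqrt(math.expm1(sigma ** 2))


if __name__ == "__main__":
    # Representative values from the two modes described above
    for sigma in (0.25, 0.5, 2.5, 3.0):
        print(f"ucld.stdev = {sigma:4.2f} -> lognormal CV = {lognormal_cv(sigma):.3g}")
```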
I'm running an analysis with the mean substitution rate fixed at 1 and
using tree.prior for the root height. I'm not calibrating the dates, as
I'm only interested in relative distances between taxa.
Can anyone tell me what the reasons for this would be?
Thanks,
Muri
Botany Department,
University of Cape Town,
South Africa
How many unique site patterns do you have? One possibility is that you
have overparameterized your model and the data is being spread too
thinly across your parameters, so that it can't even tell whether things
are very clocklike (0-0.5) or very non-clocklike (2.5-3). I generally
recommend starting with a simple analysis (say, strict clock + HKY)
just to get a feel for your data. Of course the clock is usually
violated, but at least you have a baseline for how well things are
mixing and what sort of tree heights and so forth you should be
expecting...
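To answer the site-pattern question without any particular software, you can count distinct alignment columns directly. The sketch below is a minimal illustration that assumes the alignment is already loaded as equal-length strings. It treats gaps and ambiguity codes as ordinary characters, which is a simplification. The function name and toy alignment are made up for illustration.

```python
from typing import Sequence


def count_site_patterns(alignment: Sequence[str]) -> tuple[int, int]:
    """Return (number of sites, number of unique site patterns).

    A site pattern is one alignment column read across all taxa; identical
    columns share a pattern. Gaps/ambiguity codes are not treated specially.
    """
    if not alignment:
        return 0, 0
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must all have the same length")
    # zip(*rows) turns the list of sequences into a sequence of columns
    columns = zip(*(seq.upper() for seq in alignment))
    return length, len(set(columns))


if __name__ == "__main__":
    toy = ["ACGTACGT", "ACGTACGA", "ACGAACGA", "TCGAACGA"]  # hypothetical 4 taxa x 8 sites
    sites, patterns = count_site_patterns(toy)
    print(f"{sites} sites, {patterns} unique site patterns")
```

A pattern count that is small relative to the number of free parameters (relaxed-clock rates, substitution model, tree) is one warning sign for the overparameterization described above.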
However, whether or not that is the problem, it appears that your MCMC
chain is jumping between two quite different explanations of
your data. In one explanation the rates vary quite extremely among
branches; in the other the rate variation among
branches is much more clock-like. Since your chain is bimodal
and mixing very slowly, you could probably look at the trace of
ucld.stdev and visually pick out the parts of your chain that
are in each mode. Then you could extract those parts of the
chain from the tree file and use TreeAnnotator to summarize them
separately. This will allow you to look at the trees and rates
associated with each of the two modes. It may be that one of the modes
makes no biological sense and can be excluded by changing your priors
(for example, one mode might place the root in the wrong position,
in which case you can force your ingroup to be monophyletic to
ensure this doesn't happen). If one of the modes doesn't make sense,
you could also exclude it by putting limits on the ucld.stdev
parameter...
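The extraction step can be scripted. The sketch below swaps Alexei's "pick contiguous parts by eye" for a simpler rule: it assigns each logged sample to a mode by thresholding ucld.stdev, then copies the trees file keeping only the tree lines from those states. It assumes a tab-delimited log with a `state` column (as in BEAST 1.x), tree lines that begin with `tree STATE_<n>`, and that the log and trees were written at the same frequency. All function names are hypothetical, and the output file would then be summarized with TreeAnnotator as usual.

```python
import csv
import re
from typing import Iterable, Iterator

# Assumption: tree lines in the .trees file look like "tree STATE_<n> ..."
TREE_LINE = re.compile(r"^\s*tree\s+STATE_(\d+)\b")


def states_in_mode(log_lines: Iterable[str], column: str,
                   low: float, high: float) -> set[int]:
    """Return the MCMC states whose value in `column` lies in [low, high].

    Assumes a tab-delimited log: lines starting with '#' are comments, the
    first remaining line is the header, and it contains a 'state' column.
    """
    rows = (line for line in log_lines if line.strip() and not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return {
        int(row["state"])
        for row in reader
        if low <= float(row[column]) <= high
    }


def filter_trees(tree_lines: Iterable[str], keep: set[int]) -> Iterator[str]:
    """Yield a copy of a NEXUS trees file, dropping tree lines not in `keep`.

    Non-tree lines (header, taxa/translate blocks, 'End;') pass through
    unchanged, so the output stays a valid trees file.
    """
    for line in tree_lines:
        match = TREE_LINE.match(line)
        if match is None or int(match.group(1)) in keep:
            yield line
```

For example, with hypothetical files `run.log` and `run.trees`, you would pass the open log to `states_in_mode(log, "ucld.stdev", 0.0, 0.5)` and write `filter_trees(trees, low_states)` to `low_mode.trees`, then repeat with the other range (say 2.5 to 3). Near the boundary between modes a per-sample threshold is noisier than picking whole blocks by eye, so it is worth checking the split against the trace in Tracer.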
Without actually seeing the Tracer output I probably can't be more
help than that.
Cheers
Alexei
Thanks for the quick reply!
I had analysed the same dataset before without running into this
problem. What changed from the previous analysis was fixing the mean
substitution rate to 1 and setting the treeModel.rootHeight prior to
Treeprior. In this current run, the trace pattern is very distinct for
both statistics: each starts out at 3 and then drops to an average
of about 0.4 after 4 million states. In essence, when I choose a burn-in of
4 million states, the ESS goes up to about 900 for both statistics and
the trace plot looks absolutely random.
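Why the burn-in makes such a difference: the ESS is roughly the number of samples divided by the autocorrelation time, and a trace that sits near 3 for the first 40% of the run and near 0.4 afterwards is strongly autocorrelated over hundreds of lags. The sketch below uses a simple estimator that sums autocorrelations until the first non-positive lag. Tracer's own estimator may differ in detail. The demo trace is synthetic, not data from this analysis: 1,000 samples, as if 10 million states were logged every 10,000, so a 4-million-state burn-in drops the first 400 samples.

```python
import random
from typing import Sequence


def ess(trace: Sequence[float]) -> float:
    """Effective sample size n / tau of a 1-D MCMC trace.

    tau = 1 + 2 * sum of autocorrelations, truncated at the first lag whose
    autocorrelation is <= 0 (a simple rule, not necessarily Tracer's).
    """
    n = len(trace)
    mean = sum(trace) / n
    dev = [v - mean for v in trace]
    var = sum(d * d for d in dev) / n
    if var == 0.0:
        return float(n)  # constant trace: no autocorrelation to correct for
    tau = 1.0
    for lag in range(1, n):
        rho = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return n / tau


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic trace (NOT real data): near 3 for 400 samples, then near 0.4
    trace = [rng.gauss(3.0, 0.1) for _ in range(400)] + [rng.gauss(0.4, 0.1) for _ in range(600)]
    print(f"ESS without burn-in: {ess(trace):.1f}")
    print(f"ESS after dropping the first 400 samples: {ess(trace[400:]):.1f}")
```

The level shift alone drives the full-trace ESS down to a handful, while the post-shift samples, being independent noise here, give an ESS close to their count.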
Cheers,
Muri