Very low ESS for coefficient of variation and standard deviation

Muri

Aug 21, 2007, 5:12:08 AM
to beast-users
Hi all,

I've recently encountered a problem with extremely low ESS values for
the coefficient of variation and ucld.stdev, while analysing a
nucleotide dataset. The ESS for those values is about 5 (with a chain
length of 10 million), while the mean is around 1.1. All the other ESS
values are high (above 400 and many in the thousands). Also, the
histogram shows a bimodal distribution, with a peak between 0 and 0.5
and another between 2.5 and 3.
I'm running an analysis with fixed mean substitution rates (at 1) and
using tree.prior for the root height. I'm not calibrating the dates as
all I'm interested in are relative distances between taxa.

Can anyone tell me what the reasons for this would be?

Thanks,
Muri

Botany Department,
University of Cape Town,
South Africa

alexei....@gmail.com

Aug 21, 2007, 5:37:43 AM
to beast-users
Hey Muri,

How many unique site patterns do you have? One possibility is that you
have overparameterized your model and the data is being spread too
thinly across your parameters, so that it can't even tell if things
are very clocklike (0-0.5) or very non-clocklike (2.5-3). I generally
recommend starting with a simple analysis (say strict clock + HKY)
just to get a feel for your data. Of course the clock is usually
violated, but at least you have a baseline to see how well things are
mixing and what sort of tree heights and so forth you should be
expecting...
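
As a rough way to answer the site-pattern question, here is a minimal
sketch that counts the distinct alignment columns in a set of aligned
sequences. The function name and the toy alignment are made up for
illustration; this is not how BEAST itself reports the number.

```python
from collections import Counter


def unique_site_patterns(sequences):
    """Count how often each distinct alignment column (site pattern) occurs."""
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # zip(*sequences) yields one tuple per alignment column.
    return Counter(zip(*sequences))


# Toy alignment of four taxa (made-up data, for illustration only).
alignment = ["ACGTAC", "ACGTAC", "ATGTAT", "ATGCAT"]
patterns = unique_site_patterns(alignment)
n_patterns = len(patterns)
```

Comparing n_patterns with the number of free parameters in the model
gives a first impression of whether the data are being spread too
thinly.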

However whether or not that is the problem, it appears that your MCMC
chain is jumping between two quite different possible explanations of
your data. In one explanation the rates vary quite extremely among
branches; in the other the rate variation among
branches is much more clocklike. Since your chain has this bimodality
and it's mixing very slowly, you could probably look at the trace of
ucld.stdev and visually pick out different parts of your chain that
are in the different modes. Then you could extract these parts of the
chain from the tree file and use TreeAnnotator to summarize them
separately. This will allow you to look at the trees and rates
associated with each of the two modes. It may be that one of the modes
makes no biological sense and can be excluded by changing your priors
(for example one mode might have the root position in the wrong place
-- in which case you can force your ingroup to be monophyletic to
ensure this doesn't happen). If one of the modes doesn't make sense
you could also exclude it by putting limits on the ucld.stdev
parameter...
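
Picking out the two modes can also be sketched in code rather than by
eye. The example below assumes a tab-delimited log with '#' comment
lines, a "state" column and a "ucld.stdev" column; the threshold of 1.5
is an arbitrary cut between the two peaks, and the function name and
toy log are hypothetical.

```python
import csv
import io


def split_states_by_mode(log_text, column="ucld.stdev", threshold=1.5):
    """Return (low_states, high_states): sampled states on either side of threshold."""
    # Skip blank lines and '#' comment lines before handing rows to csv.
    rows = (
        line
        for line in io.StringIO(log_text)
        if line.strip() and not line.startswith("#")
    )
    reader = csv.DictReader(rows, delimiter="\t")
    low, high = [], []
    for row in reader:
        state = int(row["state"])
        if float(row[column]) < threshold:
            low.append(state)
        else:
            high.append(state)
    return low, high


# Toy log (made-up values, for illustration only).
toy_log = (
    "# toy log\n"
    "state\tucld.stdev\n"
    "0\t2.9\n"
    "1000\t2.7\n"
    "2000\t0.4\n"
    "3000\t0.3\n"
)
low_states, high_states = split_states_by_mode(toy_log)
```

The two lists of state numbers can then be used to pull the matching
trees out of the tree file, so that each mode is summarized on its own.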


Without actually seeing the Tracer output I probably can't be of more
help than that.

Cheers
Alexei

Muri

Aug 21, 2007, 5:53:17 AM
to beast-users
Hi Alexei,

Thanks for the quick reply!
I had analysed the same dataset before without running into this
problem. What changed from the previous analysis was fixing the mean
substitution rate to 1 and setting the treeModel.rootHeight prior to
Treeprior. In this current run, the trace pattern is very distinct for
both statistics. It starts out at 3 and then drops to about an average
of 0.4 after 4 million states. In essence, when I choose a burn-in of
4 million states, the ESS goes up to about 900 for both statistics and
the trace plot looks absolutely random.
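
The effect of the burn-in on the ESS can be reproduced with a small
sketch. It uses a simple autocorrelation-based estimator (autocorrelations
summed up to the first non-positive lag), which may differ in detail
from what Tracer computes, and a synthetic trace that sits near 3 for
the first 400 samples and near 0.4 afterwards. All names and numbers
here are made up for illustration.

```python
import random


def effective_sample_size(trace):
    """Estimate ESS as N / (1 + 2 * sum of positive-lag autocorrelations)."""
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0.0:
            # Stop at the first non-positive autocorrelation.
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


# Synthetic trace: a level shift from about 3.0 to about 0.4 (made-up data).
rng = random.Random(0)
trace = [3.0 + rng.gauss(0, 0.1) for _ in range(400)]
trace += [0.4 + rng.gauss(0, 0.1) for _ in range(600)]

ess_full = effective_sample_size(trace)
ess_after_burnin = effective_sample_size(trace[400:])
```

With the level shift included, the whole trace is strongly
autocorrelated and the ESS is tiny; discarding the first segment leaves
samples that are close to independent.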

Cheers,
Muri

Botany Department,
University of Cape Town,
South Africa

alexei....@gmail.com

Aug 21, 2007, 6:16:35 AM
to beast-users
Okay, then it's probably just extremely poor convergence -- you should
be careful of the starting values that you use for some of the
parameters -- if the starting values are extreme it can be quite hard
for the chain to get through the burn-in. Are you fixing the ucld.mean
parameter to 1 or the meanRate to 1? If you fix the meanRate to 1 then
you should expect absolutely abysmal (terrible) mixing -- I don't
recommend this at all. Setting meanRate to 1 will make it almost
impossible to sample the rates and divergence times, because no change
to either an individual rate, or an individual divergence time will
keep the meanRate constant, so moves will always be rejected. I don't
think we currently have a way to efficiently keep the meanRate fixed
to 1 for the UCLN model (we should probably fix this). An alternative
(if you know where the root is and can use a starting tree that has
the correct root position) is to fix the root height to 1.0 and then
you can work out the relative heights quite easily. However we don't
usually recommend that, so if you are really only interested in the
relative heights, then fixing the ucld.mean parameter to 1.0 should be
fine. If that is what you did, then the next thing to do is to put an
upper limit on the ucld.stdev (say at 2) to avoid getting stuck in
that part of parameter space early on in the run.
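
The point about meanRate can be illustrated with a toy calculation. The
sketch assumes that meanRate is the branch-length-weighted average of
the branch rates; the function name and the numbers are made up for
illustration and are not taken from BEAST.

```python
def mean_rate(rates, durations):
    """Branch-length-weighted mean rate: sum(rate * time) / sum(time)."""
    total_time = sum(durations)
    return sum(r * t for r, t in zip(rates, durations)) / total_time


# Four branches with a weighted mean rate of exactly 1 (made-up values).
rates = [0.5, 1.0, 1.5, 1.0]
durations = [1.0, 1.0, 1.0, 1.0]
baseline = mean_rate(rates, durations)

# A proposal that changes a single branch rate shifts the mean away from 1.
after_rate_move = mean_rate([0.5, 1.0, 1.5, 1.2], durations)

# A proposal that changes a single branch duration does the same here.
after_time_move = mean_rate(rates, [1.0, 1.0, 2.0, 1.0])
```

In both proposals the mean moves away from 1, so with meanRate held
fixed such single-parameter moves would be rejected.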

Cheers
Alexei
