covariance

Regina

Apr 17, 2008, 8:42:04 AM

to beast-users

Hi all,

I've got a negative value for the covariance-of-rates parameter, which
I cannot explain very well. I'm doing an analysis at superfamily
level, so I was expecting a very low positive value, considering that
these are distant lineages, which would indicate that rates of
substitution are not autocorrelated across the tree.
But does a negative value mean that there is no autocorrelation? Does
it mean that one of the lineages of the clade evolves at a higher rate
than expected and the other at a lower rate than expected?

cheers

Regina

Simon Ho

Apr 17, 2008, 9:18:01 PM

to beast-users

Hi Regina,

I'm guessing that the 95% HPD interval for the covariance includes
zero? In that case, you would not be able to reject the hypothesis of
non-autocorrelation.
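
A minimal sketch of that check, assuming the covariance column of the BEAST log has already been read into an array with burn-in removed. The function names and the synthetic samples are illustrative assumptions, not part of BEAST or Tracer, and the shortest-window method assumes a unimodal posterior.

```python
import numpy as np

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the samples (unimodal posterior assumed)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(mass * n))          # number of samples inside the interval
    widths = x[k - 1:] - x[:n - k + 1]  # width of every window of k consecutive samples
    i = int(np.argmin(widths))          # index where the narrowest window starts
    return x[i], x[i + k - 1]

def includes_zero(samples, mass=0.95):
    """True if the HPD interval of the samples covers zero."""
    lo, hi = hpd_interval(samples, mass)
    return lo <= 0.0 <= hi

# Synthetic stand-in for the covariance column of a log file (not real data).
rng = np.random.default_rng(1)
cov_samples = rng.normal(loc=-0.04, scale=0.13, size=10_000)
lo, hi = hpd_interval(cov_samples)
print(f"95% HPD: ({lo:.3f}, {hi:.3f}); includes zero: {includes_zero(cov_samples)}")
```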

I did a series of analyses of simulated data sets (in an unpublished
section of my PhD thesis, 2006), and found that the covariance
statistic in BEAST was a very weak measure of rate autocorrelation.
The covariance did increase as the actual rate autocorrelation in the
simulated data increased, but the 95% HPD interval always included
zero for individual analyses, even for high simulated levels of rate
autocorrelation (if I remember correctly). Perhaps my simulated data
set was too small (64 sequences, 2,000 bp). But I analysed 100
replicates and there was certainly a trend in the mean covariance.

See also this previous post by Alexei:
http://groups.google.com/group/beast-users/browse_thread/thread/8f59d51640ff2786/07e5cfea5920264c?lnk=gst&q=covariance#07e5cfea5920264c

Cheers,
Simon

Regina

Apr 18, 2008, 11:49:05 AM

to beast-users

Hi Simon,

Yes, you're right! The 95% HPD limits were: lower -0.3; upper 0.221,
so it looks like I don't have autocorrelation.
I was confused because with Multidivtime I got a value of 0.14839
(0.04494, 0.35390), which indicates some degree of autocorrelation. But
considering what you said, maybe the covariance is not a good measure.

thanks a lot

Regina

Marc Suchard

Apr 19, 2008, 9:57:28 AM

to beast-users

Regina, Simon and all Beastie Boys and Girls,

I've been thinking a bit about this hypothesis test H_0:
autocorrelation rho = 0. In a Bayesian context, just checking the HPD
does not give you the whole picture. For example, if I take a prior
on rho that puts a huge amount of mass on or near zero, then whatever
my data say, the posterior HPD will most likely cover zero.

Simon, I am guessing this is what might have happened in your
dissertation chapter and with Regina's data here.

To perform a formal hypothesis test, one needs to compare the
posterior to prior and ask "how much did the data change my belief?"
To do a Bayes factor test on a sharp hypothesis (i.e. some parameter =
0), one can easily use the Savage-Dickey ratio (see Suchard et al 2001
for its first use in phylogenetics; Verdinelli and Wasserman 199X for
a discussion). The idea is to compare the height of the posterior
density to the height of the prior density at the restriction point 0.
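
As a rough illustration of the idea above, the Bayes factor in favour of rho = 0 can be estimated as BF_01 = p(rho = 0 | data) / p(rho = 0), with both density heights taken from samples. The sketch below assumes two arrays of covariance samples are available: one from the ordinary run and one from a run that samples from the prior only. The function names, the hand-rolled Gaussian kernel density estimate, and the synthetic numbers are all assumptions made for illustration; nothing here is BEAST or Tracer code.

```python
import numpy as np

def density_at(samples, point=0.0):
    """Gaussian kernel density estimate at `point` (Silverman's rule-of-thumb bandwidth)."""
    x = np.asarray(samples, dtype=float)
    n = x.size
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    sigma = min(x.std(ddof=1), iqr / 1.34)
    h = 0.9 * sigma * n ** (-0.2)       # bandwidth
    z = (point - x) / h
    return np.exp(-0.5 * z**2).sum() / (n * h * np.sqrt(2 * np.pi))

def savage_dickey_bf01(posterior_samples, prior_samples, point=0.0):
    """Posterior density height divided by prior density height at `point`."""
    return density_at(posterior_samples, point) / density_at(prior_samples, point)

# Synthetic samples standing in for real output (assumed values, not real data).
rng = np.random.default_rng(42)
prior = rng.normal(0.0, 0.20, size=20_000)          # run sampling from the prior only
post_near_zero = rng.normal(-0.04, 0.13, size=20_000)
post_shifted = rng.normal(0.15, 0.05, size=20_000)

bf_near_zero = savage_dickey_bf01(post_near_zero, prior)
bf_shifted = savage_dickey_bf01(post_shifted, prior)
print(f"BF_01, posterior near zero: {bf_near_zero:.2f}")
print(f"BF_01, posterior shifted:   {bf_shifted:.3f}")
```

A BF_01 above 1 means the data increased the density at rho = 0 relative to the prior; a value below 1 means the data moved belief away from rho = 0. The estimate is only as good as the density estimate at that single point, so it gets noisy when zero sits far out in the tail of the posterior.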

For a while now, the "uncorrelated rates" model has been bothering me
a bit; well, actually the name has been bothering me. Since each rate
in the model is set equal to one of the quantiles of the underlying
distribution, shouldn't the rates actually be quite negatively
correlated (at least in the prior)? Think of it this way: if one of
the branches wants one of the larger quantiles, then the other
branches will necessarily get the remaining smaller quantiles.
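
The quantile argument above can be checked with a toy simulation. This is a sketch under strong assumptions and not BEAST's implementation: it assumes one quantile per branch, handed out by a uniformly random permutation, with a lognormal rate distribution whose spread (`sigma`) is made up. Under exactly those assumptions the correlation between the rates on any two branches is -1/(n - 1) for n branches, so it is clearly negative for a handful of branches and very close to zero for a tree with many branches (126 branches would correspond to a rooted tree of 64 tips).

```python
import numpy as np
from statistics import NormalDist

def quantile_rates(n_branches, sigma=0.5):
    """Rates at the midpoint quantiles of a lognormal(0, sigma) distribution."""
    probs = (np.arange(n_branches) + 0.5) / n_branches
    z = np.array([NormalDist().inv_cdf(p) for p in probs])
    return np.exp(sigma * z)

def prior_pair_correlation(n_branches, n_draws=20_000, seed=0):
    """Correlation between the rates on two fixed branches across prior draws."""
    rng = np.random.default_rng(seed)
    rates = quantile_rates(n_branches)
    # Each draw hands every quantile to exactly one branch (a random permutation).
    draws = np.array([rng.permutation(rates) for _ in range(n_draws)])
    return np.corrcoef(draws[:, 0], draws[:, 1])[0, 1]

for n in (4, 10, 126):
    simulated = prior_pair_correlation(n)
    print(f"{n:>3} branches: simulated {simulated:+.3f}, theory {-1 / (n - 1):+.3f}")
```

Note that this measures correlation between two branches across prior draws, which is not the same quantity as the covariance statistic logged for a single tree; it only illustrates the direction and rough size of the effect.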

I guess this is mostly a question for Simon: Has someone looked at
the correlation statistic rho under the prior? Maybe the prior puts a
large amount of mass on negative values. If so, those negative values
in the prior would be pulling the posterior HPD towards zero.

This negative correlation in the prior (if it even exists) does not
imply that the relaxed clock model is bad in any way (I think it's
great). It just means that one should be more careful in doing the
rho = 0 hypothesis test. The Savage-Dickey ratio will appropriately
control for the asymmetric prior.

Let me know if you want help using BEAST output to use the SD ratio to
calculate your Bayes factor. Ultimately, I could put some code into
Tracer to estimate density ordinates (density heights at one specific
value).

best, Marc
