
corr & corrcoef


Lorenzo Guerrasio
Jun 24, 2008, 12:49:02 PM
Dear all,

I have a very stupid question: what is the difference
between

[r,p]=corr(X)

and

[r,p]=corrcoef(X)?

It seems that corr calculates the Bravais-Pearson
correlation coefficient R: why then is the diagonal 0?
Shouldn't it be 1?
R = cov(x,y)/(std(x)*std(y))
cov(x,x) = var(x) = std(x)^2 = std(x)*std(x) --> R = 1

Finally, how do I decide which one to use to test whether
there is a linear correlation between two variables?

P.S.
I have also calculated the regression line, if that can be
used here.

cheers


Peter Perkins
Jun 24, 2008, 1:25:24 PM
Lorenzo Guerrasio wrote:

> I have a very stupid question: what is the difference
> between
>
> [r,p]=corr(X)
>
> and
>
> [r,p]=corrcoef(X)?

The difference is mostly:

'type' 'Pearson' (the default) to compute Pearson's linear
correlation coefficient, 'Kendall' to compute Kendall's
tau, or 'Spearman' to compute Spearman's rho.

i.e., corrcoef _only_ computes the linear correlation, while corr computes rank
correlations as well.
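
For example, a minimal sketch (assuming the Statistics Toolbox's corr is on the path) of where the two functions diverge, using a monotonic but nonlinear relationship:

```matlab
% corrcoef measures only linear association; corr can also measure
% rank (monotonic) association via Kendall's tau or Spearman's rho.
x = (1:20)';
y = x.^3;                                     % monotonic, but not linear
rPearson  = corr([x y], 'type', 'Pearson');   % off-diagonal < 1
rSpearman = corr([x y], 'type', 'Spearman');  % off-diagonal exactly 1
```

Here Spearman's rho is exactly 1 because the ranks of y match the ranks of x, while the Pearson coefficient falls below 1 because the relationship is not a straight line.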


> It seems that corr calculates the Bravais-Pearson
> correlation coefficient R: why then is the diagonal 0?
> Shouldn't it be 1?
> R = cov(x,y)/(std(x)*std(y))
> cov(x,x) = var(x) = std(x)^2 = std(x)*std(x) --> R = 1

Under what conditions is the diagonal of r from [r,p]=corr(X) zero? Certainly
the diagonal of p is zero.

Hope this helps.

Lorenzo Guerrasio
Jun 24, 2008, 1:46:02 PM
So the only difference is in how P is calculated?
x=0:0.1:10;
y=x.^2+rand(1,101);
plot(x,y)
X=[x(:),y(:)];
[a,b]=corrcoef(X)
[c,d]=corr(X)

b and d are different: diag(b)=1 1, diag(d)=0 0

Why is that?

Peter Perkins <Peter.Perki...@mathworks.com> wrote
in message <g3ram4$muh$1...@fred.mathworks.com>...

Peter Perkins
Jun 25, 2008, 10:07:10 AM
Lorenzo Guerrasio wrote:
> So the only difference is in how P is calculated?
> x=0:0.1:10;
> y=x.^2+rand(1,101);
> plot(x,y)
> X=[x(:),y(:)];
> [a,b]=corrcoef(X)
> [c,d]=corr(X)
>
> b and d are different: diag(b)=1 1, diag(d)=0 0
>
> Why is that?

Lorenzo, I'm curious what you think the right answer is, and why it would be useful.

CORRCOEF only accepts a single input, thus the diagonal elements necessarily
correspond to a correlation of a variable with itself, which necessarily is 1,
and therefore the p-value is 0. CORR allows "cross correlation" (although you
aren't using that syntax), and so the diagonal elements could be correlations
between two variables, and if the two vectors of values happen to be identical,
then the correlation is 1, but the p-value is 0.
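
The two-input "cross correlation" syntax can be sketched like this (a minimal illustrative example, not from the original post):

```matlab
% With two inputs, corr correlates each column of X with each column
% of Y, so the result need not be square and its diagonal need not
% be a variable's correlation with itself.
X = rand(50, 3);
Y = rand(50, 2);
[r, p] = corr(X, Y);    % r and p are both 3-by-2
```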

Lorenzo Guerrasio
Jun 25, 2008, 6:36:02 PM
I don't know what the answer is; I know that in the example
I gave, the p-values obtained with corrcoef aren't 0 but 1.
Secondly, the other values are also slightly different.
I just want to understand why that is, and which of the two
tests I should use to see whether two variables are
significantly correlated.

Peter Perkins <Peter.Perki...@mathworks.com> wrote
in message <g3tjee$nbm$1...@fred.mathworks.com>...

Peter Perkins
Jun 26, 2008, 10:19:01 AM
Lorenzo Guerrasio wrote:
> I don't know what the answer is; I know that in the example
> I gave, the p-values obtained with corrcoef aren't 0 but 1.
> Secondly, the other values are also slightly different.
> I just want to understand why that is, and which of the two
> tests I should use to see whether two variables are
> significantly correlated.

Lorenzo, we're talking about the _diagonal_ of the outputs, are we not? What
test are you using to see if a variable is significantly correlated with itself,
and what would the result tell you?

Lorenzo Guerrasio
Jun 26, 2008, 12:29:02 PM
Yes, we are talking about the diagonal of the p-values: I
think it should test the probability of the variables not
being correlated, both with the corr and with the corrcoef
function, so I'm expecting 0 from both corr and corrcoef;
instead the corrcoef function outputs 1 (for the p-value).
Since the two functions seem to calculate the same thing
(the Pearson correlation index, and in fact the correlation
indices are the same), there must be a difference in the
test, since the resulting p-values are different. I cannot
understand what this difference is.

Thanks

Peter Perkins <Peter.Perki...@mathworks.com> wrote
in message <g408gl$soq$1...@fred.mathworks.com>...

Peter Perkins
Jun 26, 2008, 12:44:26 PM
Lorenzo Guerrasio wrote:
> Yes, we are talking about the diagonal of the p-values: I
> think it should test the probability of the variables not
> being correlated, both with the corr and with the corrcoef
> function, so I'm expecting 0 from both corr and corrcoef;
> instead the corrcoef function outputs 1 (for the p-value).
> Since the two functions seem to calculate the same thing
> (the Pearson correlation index, and in fact the correlation
> indices are the same), there must be a difference in the
> test, since the resulting p-values are different. I cannot
> understand what this difference is.

Lorenzo, again, why do you want to test if a variable is correlated with itself?

For the reason why the two functions return different values _along the
diagonal_ of P, see my previous post.

Tom Lane
Jun 26, 2008, 2:49:28 PM
>> Yes, we are talking about the diagonal of the p-values ...

Let me try:

>> x = rand(10,2);
>> [r,p] = corrcoef(x)
r =
1.0000 0.2682
0.2682 1.0000
p =
1.0000 0.4537
0.4537 1.0000
>> [r,p] = corr(x)
r =
1.0000 0.2682
0.2682 1.0000
p =
0 0.4537
0.4537 0

Both corr and corrcoef produce the same correlation between the x columns
(0.2682) and the same p-value for it (0.4537).

It generally doesn't make sense to test whether a variable is correlated
with itself. So corrcoef gives 1 along the diagonal of the correlation
matrix, and it also gives 1 as the corresponding p-value. That way, if you
search for significant correlations, you'll never flag the diagonal as
significant.
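
A minimal sketch of such a search (assuming a 0.05 significance level; not part of the original post):

```matlab
% Because corrcoef puts p = 1 on the diagonal, thresholding the
% p-value matrix can never flag a variable's correlation with itself.
X = rand(30, 4);
[r, p] = corrcoef(X);
[i, j] = find(p < 0.05);   % any hits are necessarily off-diagonal
```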

The corr function is a little more general and can compute correlations
between pairs of columns from two different inputs. In that case the result
won't necessarily be square, it won't necessarily have 1's along the
diagonal, and a correlation of 1 is highly significant. So it gives the
p-value as 0 when the correlation is 1. But still, if you compute a
correlation matrix for a single input, the entries along the diagonal are
necessarily 1 and the p-value is not meaningful.

So it's perhaps unfortunate that these two functions have adopted different
conventions for what to put along the diagonal for the matrix of p-values
when there's a single input matrix, but you don't want to use those diagonal
values anyway.

-- Tom

