Calculating Covariance Matrix


Sebastian Stabinger

Jun 20, 2013, 2:42:20 PM
to scala-...@googlegroups.com
Hi

Maybe I am just too stupid to find it, but how do I calculate a covariance matrix in Breeze?

Thanks, Sebastian

David Hall

Jun 20, 2013, 4:37:55 PM
to scala-...@googlegroups.com
Nope, it's just not implemented... Clearly it's something we should
have; you're just the first person to ask for it.

Something like

import breeze.linalg._

// Rows of `data` are samples, columns are variables.
def cov(data: DenseMatrix[Double], unbiased: Boolean = false): DenseMatrix[Double] = {
  val n = data.rows
  // column means as a DenseVector
  val m = DenseVector.tabulate(data.cols)(j => sum(data(::, j)) / n)
  val cv = DenseMatrix.zeros[Double](data.cols, data.cols)
  for (i <- 0 until n) {
    val x = data(i, ::).t
    cv += x * x.t // do we need to worry about overflow here?
  }
  cv /= math.max(n, 1).toDouble
  cv -= m * m.t
  if (unbiased && n > 1)
    cv *= n / (n - 1.0)
  cv
}

? (Didn't test.)

-- David
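For reference, the sketch above rests on the standard one-pass identity below, written for the biased estimator and then rescaled when `unbiased` is set. This assumes rows of `data` are samples and columns are variables; $x_i$ is the $i$-th sample taken as a column vector and $n$ is the number of samples:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
C_{\text{biased}} = \frac{1}{n}\sum_{i=1}^{n} x_i x_i^{\top} - \bar{x}\,\bar{x}^{\top}
                  = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{\top},

C_{\text{unbiased}} = \frac{n}{n-1}\, C_{\text{biased}}
                    = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{\top}.
```

On the overflow question in the code comment: with `Double` entries overflow is rarely the practical problem, but the one-pass form subtracts two nearly equal terms when the means are large compared with the spread, which can lose a lot of precision (catastrophic cancellation). Centering the data first, as in the right-hand forms, avoids that.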

Devin Shields

Jul 2, 2016, 1:20:34 AM
to Scala Breeze
I'm also looking for Scala covariance tools, but haven't found any. It does, however, look like there's something available in the Apache Commons Math library:

import org.apache.commons.math3.stat.correlation.Covariance
// rows are observations, columns are variables; the result is bias-corrected (divides by n-1)
val matrix = Array(Array(1.0, 2.0, 3.0), Array(3.0, 4.0, 5.0))
val cov = new Covariance(matrix).getCovarianceMatrix.getData

'cov' is an array of arrays, and that shouldn't be too hard to load back into Breeze.

Cheers!
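Loading the result back into Breeze can be sketched as follows. This is a minimal sketch assuming a rectangular, row-major input such as the `getData` result; `ToBreeze.toDenseMatrix` is a hypothetical helper name, and it relies only on Breeze's `DenseMatrix.tabulate`.

```scala
import breeze.linalg.DenseMatrix

// Hypothetical helper: copy a rectangular, row-major Array[Array[Double]]
// (e.g. the output of getCovarianceMatrix.getData) into a Breeze DenseMatrix.
object ToBreeze {
  def toDenseMatrix(rows: Array[Array[Double]]): DenseMatrix[Double] = {
    val nRows = rows.length
    val nCols = if (nRows == 0) 0 else rows(0).length
    require(rows.forall(_.length == nCols), "input must be rectangular")
    // Entry (i, j) of the result is rows(i)(j).
    DenseMatrix.tabulate(nRows, nCols)((i, j) => rows(i)(j))
  }
}
```

With `cov` from the snippet above, `ToBreeze.toDenseMatrix(cov)` would give a `DenseMatrix[Double]`. `tabulate` is used rather than handing over a flattened array because Breeze stores `DenseMatrix` data in column-major order, whereas `getData` returns rows.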

absalom_hicks

Jul 3, 2016, 4:46:13 PM
to Scala Breeze

import breeze.linalg.{Axis, DenseMatrix, DenseVector, sum}

/**
 * @param A data matrix: each column is one observation (sample) of the random vector,
 *          each row holds the samples of one (coordinate) random variable
 * @return empirical covariance matrix of the data
 */
def cov(A: DenseMatrix[Double]): DenseMatrix[Double] = {
  // center the data: subtract the mean vector mu from every column, so each row has mean zero
  val n = A.cols
  val D: DenseMatrix[Double] = A.copy
  val mu: DenseVector[Double] = sum(D, Axis._1) * (1.0 / n) // sum along rows --> col vector
  (0 until n).foreach(i => D(::, i) -= mu)
  val C = (D * D.t) * (1.0 / n)
  // make exactly symmetric
  (C + C.t) * 0.5
}
 

The last line may seem superfluous, but I think there is a bug in Breeze: when I first used this without it, the resulting matrix failed a test for symmetry. Note also that statisticians usually divide by (n-1), not by n.
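Both closing remarks (exact symmetry and dividing by n-1) can also be built in directly. Below is a minimal, dependency-free Scala sketch, not Breeze code: `CovSketch.cov` is a hypothetical name, it assumes the same layout as the function above (rows are variables, columns are observations), and it fills only the upper triangle and mirrors it, so the result is symmetric by construction rather than after averaging with the transpose. `unbiased = true` divides by n-1.

```scala
// Hypothetical stdlib-only sketch (not a Breeze API): rows are variables,
// columns are observations, matching the layout of the function above.
object CovSketch {
  def cov(a: Array[Array[Double]], unbiased: Boolean = true): Array[Array[Double]] = {
    val p = a.length // number of variables
    require(p > 0, "need at least one variable")
    val n = a(0).length // number of observations
    require(a.forall(_.length == n), "all rows must have the same length")
    require(n > (if (unbiased) 1 else 0), "not enough observations")

    // Center every row by subtracting its own mean.
    val d = a.map { row =>
      val mu = row.sum / n
      row.map(_ - mu)
    }

    val denom = if (unbiased) n - 1.0 else n.toDouble
    val c = Array.ofDim[Double](p, p)
    // Fill the upper triangle only and mirror it, so c(i)(j) == c(j)(i) exactly.
    for (i <- 0 until p; j <- i until p) {
      var s = 0.0
      var k = 0
      while (k < n) {
        s += d(i)(k) * d(j)(k)
        k += 1
      }
      c(i)(j) = s / denom
      c(j)(i) = c(i)(j)
    }
    c
  }
}
```

For example, `CovSketch.cov(Array(Array(1.0, 2.0, 3.0), Array(3.0, 4.0, 5.0)))` gives a 2x2 matrix whose entries are all 1.0; with `unbiased = false` every entry is 2/3.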

Devin Shields

Jul 6, 2016, 1:20:33 AM
to Scala Breeze
This works beautifully.

After changing `n` to `n-1` in `val C = ...`, some REPL sanity checks show full agreement between your implementation and numpy.cov:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.cov.html

Thanks so much! Can you point me towards the Breeze contributor docs? I'd be happy to try to write something up and submit it as a PR.

Cheers,
Devin
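When comparing against numpy.cov, the orientation conventions in this thread matter. By default numpy.cov treats each row as a variable (`rowvar=True`), as absalom's function does, while Commons Math's `Covariance` treats each column as a variable. Working Devin's 2x3 example matrix through by hand, both with the n-1 normalization, gives results of different shapes:

```latex
M = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 4 & 5 \end{pmatrix}

% Rows as variables (numpy.cov default, absalom's layout): row means (2, 4),
% both centered rows equal (-1, 0, 1), n = 3 observations
C_{\text{rows}} = \frac{1}{3-1}\begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix}
                = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}

% Columns as variables (Commons Math Covariance): column means (2, 3, 4),
% every centered column equals (-1, 1), n = 2 observations
C_{\text{cols}} = \frac{1}{2-1}\begin{pmatrix} 2 & 2 & 2 \\ 2 & 2 & 2 \\ 2 & 2 & 2 \end{pmatrix}
                = \begin{pmatrix} 2 & 2 & 2 \\ 2 & 2 & 2 \\ 2 & 2 & 2 \end{pmatrix}
```

So a mismatch in shape or values against numpy can simply mean the input needs transposing.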