Very large, sparse covariance matrix in multilevel GP


James Savage

Aug 14, 2016, 6:17:43 PM
to Stan users mailing list
Hi all - 

Let's say I have a data generating process for some observation i in group j: 

y(i, j) ~ GP(mean process(X(i, j)), covariance function(X(i, j) | group-level parameters(j)))

where the group-level parameters in the covariance function come from some hyperdistribution.
This parameter distribution _is_ the research question. Each group has a different number of
observations.

I'm currently implementing this by building up each group's covariance matrix with its own
group-level parameters, and then using a very large block-diagonal covariance matrix to store
all the group-specific covariance matrices. I am doing this in the transformed parameters block.

My estimation model is then

y ~ GP(mean process, enormous block-diagonal covariance matrix) 

This works well for a small number of groups (10), but obviously doesn't scale well to the full 
dataset. 

In the full dataset, there are 65 groups with around 15 observations each, so the resulting large
covariance matrix would be roughly 99% zero entries. I'm not using any sparse structures.
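
For concreteness, here's roughly what I'm doing now, assuming a squared-exponential kernel;
alpha, rho, sigma_noise, the position array pos, and the group-size array n_g are all
hypothetical stand-ins for my actual names:

transformed parameters {
  matrix[N, N] Sigma;              // N = total number of observations

  Sigma <- rep_matrix(0, N, N);    // entries off the diagonal blocks stay zero
  for (g in 1:G) {
    for (i in 1:n_g[g]) {
      for (j in i:n_g[g]) {
        int r;
        int c;
        r <- pos[g] + i - 1;       // global index of observation i in group g
        c <- pos[g] + j - 1;
        Sigma[r, c] <- square(alpha[g])
                       * exp(-squared_distance(X[r], X[c])
                             / (2 * square(rho[g])));
        Sigma[c, r] <- Sigma[r, c];   // symmetry
      }
    }
  }
  for (i in 1:N)
    Sigma[i, i] <- Sigma[i, i] + square(sigma_noise);   // observation noise
}

(X here is an array of input vectors declared in data.)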

How should I go about using the sparse data structures to help with the scaling? In particular, given
that I'm generating the giant covariance matrix in transformed parameters, is there a way to
call `csr_extract_w`, `csr_extract_v`, and `csr_extract_u` in that block, so that I can reassemble
the dense covariance matrix in the model block? Or should I be constructing the enormous matrix as
a local variable (to my mind, this will be very slow, it being evaluated every leapfrog step)? Or
something else?

Thoughts? 

Jim






Aki Vehtari

Aug 15, 2016, 2:39:17 AM
to Stan users mailing list
On Monday, August 15, 2016 at 1:17:43 AM UTC+3, James Savage wrote:
y ~ GP(mean process, enormous block-diagonal covariance matrix) 


If you don't have correlation between the blocks, you can use a separate, independent GP for each group and avoid the enormous block-diagonal matrix.
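
Something like this in the model block, say; pos and n_g are hypothetical arrays holding each
group's start index and size, and I'm reusing the assumed squared-exponential kernel with
group-level alpha[g] and rho[g] from the sketch above:

model {
  for (g in 1:G) {
    matrix[n_g[g], n_g[g]] Sigma_g;    // local, one small matrix per group
    for (i in 1:n_g[g]) {
      for (j in i:n_g[g]) {
        Sigma_g[i, j] <- square(alpha[g])
                         * exp(-squared_distance(X[pos[g] + i - 1],
                                                 X[pos[g] + j - 1])
                               / (2 * square(rho[g])));
        Sigma_g[j, i] <- Sigma_g[i, j];
      }
      Sigma_g[i, i] <- Sigma_g[i, i] + square(sigma_noise);
    }
    // each group's slice of y gets its own multivariate normal
    segment(y, pos[g], n_g[g])
      ~ multi_normal(segment(mu, pos[g], n_g[g]), Sigma_g);
  }
}

This never materializes the full N x N matrix, and each multi_normal only has to factor
an n_g[g] x n_g[g] matrix instead of the whole thing.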

Aki

James Savage

Aug 15, 2016, 7:35:25 AM
to Stan users mailing list
The only correlation between blocks comes from the hyperdistributions of the GP parameters,
so yes, there's no need for the big block-diagonal structure. As you say, I could declare a
covariance matrix for every group, but each group has a different size, which makes it
tough.

As it happens, I did get the sparse matrix stuff working, but it is _very_ slow (6 hrs for 200
iterations on 1100 data points). I suspect there's a folk-theorem problem there. More digging needed.

Mike Lawrence

Aug 15, 2016, 8:32:33 AM
to stan-...@googlegroups.com
Can't say I really follow this thread, but might Seth's work apply here? http://sethrf.com/files/fast-hierarchical-GPs.pdf


--
Mike Lawrence
Graduate Student
Department of Psychology & Neuroscience
Dalhousie University

~ Certainty is (possibly) folly ~

Aki Vehtari

Aug 15, 2016, 8:35:40 AM
to Stan users mailing list
What's the problem with different sizes? The diagonal blocks have different sizes, too.

Aki

James Savage

Aug 15, 2016, 9:45:01 AM
to Stan users mailing list
Mike - thanks, that looks precisely like what I'm looking for. 

Aki - I'm just confused about the implementation. Take Sigma as an array of covariance matrices. Do you mean something like this:

transformed parameters {
  matrix[size_of_largest_group, size_of_largest_group] Sigma[number_of_groups];

  for (g in 1:number_of_groups) {
    // define the top-left corner of Sigma[g] ...
  }
}

And then refer only to the top-left corner of Sigma[g] for each group g in the model block?
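
That is, something like this in the model block (again with hypothetical pos and group_size
index arrays):

model {
  for (g in 1:number_of_groups)
    segment(y, pos[g], group_size[g])
      ~ multi_normal(segment(mu, pos[g], group_size[g]),
                     block(Sigma[g], 1, 1, group_size[g], group_size[g]));
}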

Pardon the confusion. 

Jim

Aki Vehtari

Aug 15, 2016, 10:00:06 AM
to Stan users mailing list
Oh, you're referring to the problem that when the number and sizes of the groups aren't fixed in the program itself, it's not so easy to create a collection of different-sized matrices in Stan!
Sorry, I forgot that. Your proposal is probably the easiest way to handle it.

Aki

James Savage

Aug 15, 2016, 10:13:30 AM
to Stan users mailing list
Yep, and thanks Aki! 

Bob Carpenter

Aug 15, 2016, 10:27:15 AM
to stan-...@googlegroups.com
Really, as soon as I can squash all the outstanding
bugs assigned to me and deal with some little features,
I'll work on ragged arrays. As is, you'd have to declare
all the blocks separately.
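
That is, with the sizes read in as data, something like (hypothetical names):

transformed parameters {
  matrix[n1, n1] Sigma_1;    // one declaration per group,
  matrix[n2, n2] Sigma_2;    // sizes n1, n2, ... known from data
  // ... and so on for each group
}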

Whatever you do, don't overdeclare parameters and only
use some of them. It's hugely inefficient for computation
per step and can cause problems in the posterior if there
aren't proper priors on each of the unused variables (and
even with priors, it can slow down sampling based on how
the no-U-turn sampler works).

- Bob

Jonah Gabry

Aug 17, 2016, 2:14:58 AM
to Stan users mailing list
On Monday, August 15, 2016 at 7:35:25 AM UTC-4, James Savage wrote:
As it happens, I did get the sparse matrix stuff working, but it is _very_ slow (6 hrs for 200
iterations on 1100 data points). I suspect there's a folk-theorem problem there. More digging needed.

As far as I know, the csr stuff isn't really guaranteed to speed anything up (if anything it will probably slow things down), but it's useful if memory is a concern.
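
For reference, the pattern the csr functions are aimed at is a big, mostly-zero *data* matrix
multiplying a parameter vector, where the extraction happens once in transformed data. A sketch,
with hypothetical X, beta, and sigma (not your parameter-dependent covariance case):

transformed data {
  vector[num_elements(csr_extract_w(X))] w;   // non-zero values
  int v[num_elements(csr_extract_v(X))];      // column indices
  int u[num_elements(csr_extract_u(X))];      // row start positions
  w <- csr_extract_w(X);
  v <- csr_extract_v(X);
  u <- csr_extract_u(X);
}
model {
  // sparse matrix-vector product in place of X * beta
  y ~ normal(csr_matrix_times_vector(rows(X), cols(X), w, v, u, beta), sigma);
}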

Jonah

Bob Carpenter

Aug 17, 2016, 5:57:19 AM
to stan-...@googlegroups.com
You should get the same answer with sparse operations,
but as Jonah says, they may be slower unless the matrix
is quite sparse (somewhere under 10% or so non-zero
entries). That gap's going to get bigger in 2.12 with Rob's
improvements to dense matrix multiplication.

But it shouldn't slow down mixing---it should implement
the same model.

- Bob

James Savage

Aug 17, 2016, 9:14:52 AM
to Stan users mailing list
Yep - memory was the issue; there was no noticeable difference
in speed. 