Very large, sparse covariance matrix in multilevel GP


James Savage

Aug 14, 2016, 6:17:43 PM
to Stan users mailing list
Hi all - 

Let's say I have a data generating process for some observation i in group j: 

y(i, j) ~ GP(mean process(X(i, j)), covariance function(X(i, j) | group-level parameters(j)))

where the group-level parameters in the covariance function come from some hyperdistribution.
This parameter distribution _is_ the research question. Each group has a different number of
observations.

I'm currently implementing this by building up each group's covariance matrix with its own
group-level parameters, and then using a very large block-diagonal covariance matrix to store
all the group-specific covariance matrices. I am doing this in the transformed parameters block.

My estimation model is then

y ~ GP(mean process, enormous block-diagonal covariance matrix) 

This works well for a small number of groups (10), but obviously doesn't scale well to the full 
dataset. 

In the full dataset, there are 65 groups with around 15 observations each, so the resulting large
covariance matrix would be roughly 99% zero entries. I'm not using any sparse structures.
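
For concreteness, here's roughly what I'm doing now, assuming a squared-exponential kernel;
alpha, rho, sigma_noise, the position array pos, and the group-size array n_g are all
hypothetical stand-ins for my actual names:

transformed parameters {
  matrix[N, N] Sigma;              // N = total number of observations

  Sigma <- rep_matrix(0, N, N);    // entries off the diagonal blocks stay zero
  for (g in 1:G) {
    for (i in 1:n_g[g]) {
      for (j in i:n_g[g]) {
        int r;
        int c;
        r <- pos[g] + i - 1;       // global index of observation i in group g
        c <- pos[g] + j - 1;
        Sigma[r, c] <- square(alpha[g])
                       * exp(-squared_distance(X[r], X[c])
                             / (2 * square(rho[g])));
        Sigma[c, r] <- Sigma[r, c];   // symmetry
      }
    }
  }
  for (i in 1:N)
    Sigma[i, i] <- Sigma[i, i] + square(sigma_noise);   // observation noise
}

(X here is an array of input vectors declared in data.)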

How should I go about using the sparse data structures to help with the scaling? In particular, given
that I'm generating the giant covariance matrix in transformed parameters, is there a way to
call `csr_extract_w`, `csr_extract_v`, and `csr_extract_u` in that block, so that I can reassemble
the dense covariance matrix in the model block? Or should I be constructing the enormous matrix as
a local variable (to my mind, this will be very slow, it being evaluated every leapfrog step)? Or
something else?

Thoughts? 

Jim






Aki Vehtari

Aug 15, 2016, 2:39:17 AM
to Stan users mailing list
On Monday, August 15, 2016 at 1:17:43 AM UTC+3, James Savage wrote:
y ~ GP(mean process, enormous block-diagonal covariance matrix) 


If you don't have correlation between the blocks, you can use a separate, independent GP for each group and avoid the enormous block-diagonal matrix.
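
Something like this in the model block, say; pos and n_g are hypothetical arrays holding each
group's start index and size, and I'm reusing the assumed squared-exponential kernel with
group-level alpha[g] and rho[g] from the sketch above:

model {
  for (g in 1:G) {
    matrix[n_g[g], n_g[g]] Sigma_g;    // local, one small matrix per group
    for (i in 1:n_g[g]) {
      for (j in i:n_g[g]) {
        Sigma_g[i, j] <- square(alpha[g])
                         * exp(-squared_distance(X[pos[g] + i - 1],
                                                 X[pos[g] + j - 1])
                               / (2 * square(rho[g])));
        Sigma_g[j, i] <- Sigma_g[i, j];
      }
      Sigma_g[i, i] <- Sigma_g[i, i] + square(sigma_noise);
    }
    // each group's slice of y gets its own multivariate normal
    segment(y, pos[g], n_g[g])
      ~ multi_normal(segment(mu, pos[g], n_g[g]), Sigma_g);
  }
}

This never materializes the full N x N matrix, and each multi_normal only has to factor
an n_g[g] x n_g[g] matrix instead of the whole thing.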

Aki

James Savage

Aug 15, 2016, 7:35:25 AM
to Stan users mailing list
The only correlation between blocks comes from the hyperdistributions of the GP parameters,
so yes, there's no need for the big block-diagonal structure. As you say, I could declare a
covariance matrix for every group, but each group has a different size, which makes it
tough.

As it happens, I did get the sparse matrix stuff working, but it is _very_ slow (6 hrs for 200
iterations on 1100 data points). I suspect there's a folk-theorem problem there. More digging needed.

Mike Lawrence

Aug 15, 2016, 8:32:33 AM
to stan-...@googlegroups.com
Can't say I really follow this thread, but might Seth's work apply here? http://sethrf.com/files/fast-hierarchical-GPs.pdf


--
Mike Lawrence
Graduate Student
Department of Psychology & Neuroscience
Dalhousie University

~ Certainty is (possibly) folly ~

Aki Vehtari

Aug 15, 2016, 8:35:40 AM
to Stan users mailing list
What's the problem with different sizes? The diagonal blocks have different sizes, too.

Aki

James Savage

Aug 15, 2016, 9:45:01 AM
to Stan users mailing list
Mike - thanks, that looks precisely like what I'm looking for. 

Aki - I'm just confused about the implementation. Take Sigma as an array of covariance matrices. Do you mean something like this:

transformed parameters {
  matrix[size_of_largest_group, size_of_largest_group] Sigma[number_of_groups];

  for (g in 1:number_of_groups) {
    // define the top-left corner of Sigma[g] ...
  }
}

And then refer only to the top-left corner of Sigma[g] for each group g in the model block?
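
That is, something like this in the model block (again with hypothetical pos and group_size
index arrays):

model {
  for (g in 1:number_of_groups)
    segment(y, pos[g], group_size[g])
      ~ multi_normal(segment(mu, pos[g], group_size[g]),
                     block(Sigma[g], 1, 1, group_size[g], group_size[g]));
}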

Pardon the confusion. 

Jim

Aki Vehtari

Aug 15, 2016, 10:00:06 AM
to Stan users mailing list
Oh, you're referring to the problem that when the number and sizes of the groups aren't fixed in the program itself, it's not so easy to create a collection of different-sized matrices in Stan!
Sorry, I forgot that. Your proposal is probably the easiest way to handle it.

Aki

James Savage

Aug 15, 2016, 10:13:30 AM
to Stan users mailing list
Yep, and thanks Aki! 

Bob Carpenter

Aug 15, 2016, 10:27:15 AM
to stan-...@googlegroups.com
Really, as soon as I can squash all the outstanding
bugs assigned to me and deal with some little features,
I'll work on ragged arrays. As is, you'd have to declare
all the blocks separately.
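
That is, with the sizes read in as data, something like (hypothetical names):

transformed parameters {
  matrix[n1, n1] Sigma_1;    // one declaration per group,
  matrix[n2, n2] Sigma_2;    // sizes n1, n2, ... known from data
  // ... and so on for each group
}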

Whatever you do, don't overdeclare parameters and only
use some of them. It's hugely inefficient for computation
per step and can cause problems in the posterior if there
aren't proper priors on each of the unused variables (and
even with priors, it can slow down sampling based on how
the no-U-turn sampler works).

- Bob

Jonah Gabry

Aug 17, 2016, 2:14:58 AM
to Stan users mailing list
On Monday, August 15, 2016 at 7:35:25 AM UTC-4, James Savage wrote:
As it happens, I did get the sparse matrix stuff working, but it is _very_ slow (6 hrs for 200
iterations on 1100 data points). I suspect there's a folk-theorem problem there. More digging needed.

As far as I know, the csr stuff isn't really guaranteed to speed anything up (if anything it will probably slow things down), but it's useful if memory is a concern.
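
For reference, the pattern the csr functions are aimed at is a big, mostly-zero *data* matrix
multiplying a parameter vector, where the extraction happens once in transformed data. A sketch,
with hypothetical X, beta, and sigma (not your parameter-dependent covariance case):

transformed data {
  vector[num_elements(csr_extract_w(X))] w;   // non-zero values
  int v[num_elements(csr_extract_v(X))];      // column indices
  int u[num_elements(csr_extract_u(X))];      // row start positions
  w <- csr_extract_w(X);
  v <- csr_extract_v(X);
  u <- csr_extract_u(X);
}
model {
  // sparse matrix-vector product in place of X * beta
  y ~ normal(csr_matrix_times_vector(rows(X), cols(X), w, v, u, beta), sigma);
}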

Jonah

Bob Carpenter

Aug 17, 2016, 5:57:19 AM
to stan-...@googlegroups.com
You should get the same answer with sparse operations,
but as Jonah says, they may be slower unless the matrix
is quite sparse (somewhere under 10% or so non-zero
entries). That gap's going to get bigger in 2.12 with Rob's
improvements to dense matrix multiplication.

But it shouldn't slow down mixing---it should implement
the same model.

- Bob

James Savage

Aug 17, 2016, 9:14:52 AM
to Stan users mailing list
Yep - memory was the issue; there was no noticeable difference
in speed. 