Gaussian predictive process models for large-ish data


Maxwell Joseph

Aug 14, 2016, 1:31:59 PM
to Stan users mailing list
First, big thanks to the stan-dev team for creating and continuing to improve Stan.

In trying to fit some spatial Gaussian process models with hundreds to thousands of spatial locations, I ran into prohibitively long run times, and came across Gaussian predictive process models as described in the following:

Finley, Andrew O., et al. 2009. "Improving the performance of predictive process modeling for large datasets." (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2743161/)

Banerjee, Sudipto, et al. 2008. "Gaussian predictive process models for large spatial data sets." (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2741335/)

For certain spatial models, these reduced-rank predictive process models should scale much better to large-n datasets. In case this is useful for others running into similar problems with spatial models or other Gaussian process models, here's a little write-up: http://mbjoseph.github.io/2016/08/14/gpp.html, and here's the source code for a speed-test experiment comparing run times and efficiency between the full and reduced-rank GP models: https://github.com/mbjoseph/gpp-speed-test. For an example of this approach in action, also see the following paper from Cameron Bracken, which uses a reduced-rank GP representation to model spatial dependence in the parameters of the generalized extreme value distribution with over 2000 point locations: http://arxiv.org/pdf/1512.08560
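The core of the approach is a low-rank projection: the latent field at the n sites is approximated as w = C(s, s*) C*^{-1} w*, where w* lives only at the m knots. A tiny pure-Python sketch of that projection (1-D locations and an exponential covariance; all numbers are toy values of my own choosing, not from either paper):

```python
import math

# toy example: m = 2 knots, n = 4 sites on a 1-D line
sigma_sq, phi = 1.5, 2.0
knots = [0.0, 1.0]
sites = [0.0, 0.3, 0.7, 1.0]

def cov(d):
    # exponential covariance: C(d) = sigma^2 * exp(-phi * d)
    return sigma_sq * math.exp(-phi * d)

# knot covariance C* (m x m) and its analytic 2 x 2 inverse
Cstar = [[cov(abs(a - b)) for b in knots] for a in knots]
det = Cstar[0][0] * Cstar[1][1] - Cstar[0][1] ** 2
inv_Cstar = [[ Cstar[1][1] / det, -Cstar[0][1] / det],
             [-Cstar[0][1] / det,  Cstar[0][0] / det]]

# cross-covariance between sites and knots (n x m)
C_site_star = [[cov(abs(s - k)) for k in knots] for s in sites]

# reduced-rank interpolation: w = C(s, s*) C*^{-1} w*
w_star = [0.8, -0.4]  # latent GP values at the two knots
w = [sum(C_site_star[i][l] * inv_Cstar[l][j] * w_star[j]
         for l in range(2) for j in range(2)) for i in range(4)]
```

Only the m-dimensional w* needs to be sampled; at a site that coincides with a knot the projection reproduces w* exactly, and in between it kriges toward the knots.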

Looking forward, many approaches for modeling large spatial data that have emerged in recent years rely heavily on sparse covariance matrix representations. When more sparse matrix functionality (particularly Cholesky decomposition) is added to Stan, that will be exciting for spatial applications.

Bob Carpenter

Aug 14, 2016, 3:44:10 PM
to stan-...@googlegroups.com
Thanks. We're also working on Toeplitz matrices and Kronecker
products, which I understand will also speed up a bunch of GP code.

- Bob

Craig Wang

Mar 9, 2017, 11:34:38 AM
to Stan users mailing list
Hi Maxwell and Stan users,

Thanks for your summary post and your blog article about the GPP implementation in Stan. I followed your code and compared the results with spBayes for a simple linear model. spLM() is almost 8 times more efficient than my implementation, and I am interested in why this is the case.

The modified code is:

data {
  int<lower = 1> n; // number of sites
  int<lower = 1> p; // number of coefficients
  matrix[n, p] X; // design matrix
  int<lower = 1, upper = n> m; // number of knots
  vector[n] y; // response at sites
  matrix[m, m] D_star; // distance matrix of knots
  matrix[n, m] D_site_star; // distance matrix between sites and knots
}

parameters {
  vector[p] beta; // coefficients
  real<lower = 0> sigma_sq; // variance component
  real<lower = 0> tau_sq; // variance component
  real<lower = 0> phi; // range/decay parameter
  vector[m] w_z; // spatial latent process
}

transformed parameters {
  vector[n] sigma_e_tilde; // bias-corrected variance
  vector[m] w_star;
  vector[n] w;
  cov_matrix[m] Cstar; // covariance matrix at knots
  matrix[m, m] inv_Cstar; // inverse of Cstar
  matrix[n, m] C_site_star; // covariance matrix between sites and knots
  matrix[n, m] C_ss_inv_Cstar;

  // latent gp at knots
  for (i in 1:(m - 1)) {
    for (j in (i + 1):m) {
      Cstar[i, j] = sigma_sq * exp(-D_star[i, j] * phi); // covariance at knots
      Cstar[j, i] = Cstar[i, j];
    }
  }
  for (k in 1:m) Cstar[k, k] = sigma_sq + tau_sq; // C(s, s)
  inv_Cstar = inverse(Cstar);
  w_star = cholesky_decompose(Cstar) * w_z;

  // latent gp at sample locations
  C_site_star = sigma_sq * exp(-D_site_star * phi); // covariance between sites and knots
  C_ss_inv_Cstar = C_site_star * inv_Cstar;
  w = C_site_star * inv_Cstar * w_star; // transformed latent process at sites

  // bias adjustment from Finley et al. 2009
  sigma_e_tilde = sigma_sq + tau_sq - rows_dot_product(C_ss_inv_Cstar, C_site_star);
}

model {
  beta ~ normal(0, 10);
  sigma_sq ~ inv_gamma(2, 1);
  tau_sq ~ inv_gamma(2, 1);
  phi ~ uniform(0.1, 10);
  w_z ~ normal(0, 1);
  y ~ normal(X * beta + w, sqrt(sigma_e_tilde));
}

I'm simulating 500 locations with 30 knots for the GPP. The two implementations give very similar (though not identical) results for the latent w field, which is what I'm interested in. I've assigned the same priors.

Running 4 chains, my Stan implementation yields an ESS of 328 and takes 174 s per chain, while the spBayes spLM() function yields an ESS of 139 in 37 s. So per unit time, spBayes is almost 8 times more efficient.

Am I missing some tricks in the model implementation? Attached are my R simulation script and Stan file. Any suggestions are appreciated.

Best,
Craig
google_GPP.R
GPP_v2.stan

Craig Wang

Mar 9, 2017, 11:45:03 AM
to Stan users mailing list
A few additional questions that came up while I was browsing similar problems:

1. Why is scaling the spatial domain to [0,1] x [0,1] a good idea?

2. Would using cov_matrix[m] Cstar be slower than plain matrix[m, m] Cstar because of the validity checks?

Thanks very much for any input!



Maxwell Joseph

Mar 9, 2017, 3:50:09 PM
to Stan users mailing list
Hi Craig,

I'm not familiar with the spBayes package, but I can make some guesses as to why you might expect differences in efficiency. First, spBayes uses custom MCMC samplers written in C++ that are designed for spatial/spatiotemporal models (e.g., https://github.com/cran/spBayes/tree/master/src). Second, I'm not sure whether spBayes implements the bias-adjusted predictive process that I was talking about (described in Finley et al. 2009: "Improving the performance of predictive process modeling for large datasets." Computational Statistics & Data Analysis 53(8): 2873-2884). spBayes::spLM() may be implementing the predictive process model from an earlier paper that did not include the bias adjustment (Banerjee et al. 2008: "Gaussian predictive process models for large spatial data sets").

As for your other questions:

1. Why is scaling the spatial domain to [0,1] x [0,1] a good idea?

This was easy to simulate - but is not necessary in practice. You should check to see whether your priors make sense given the range of spatial coordinates. 

2. Would using cov_matrix[m] Cstar be slower than plain matrix[m, m] Cstar because of the validity checks?

As far as I know yes, but someone else may have a better sense for how costly these checks are. 

In terms of other tricks - if your likelihood is normal you can marginalize over the spatial random effect w, which may or may not be helpful (see section 2.5 in Banerjee et al. 2008: Gaussian predictive process models for large spatial data sets). I would also expect that Bob might not approve of the prior for phi, which does not match the support of the parameter defined in the parameters block. If you do manage to increase efficiency - please do share here again!
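To sketch what that marginalization looks like: with a normal likelihood you can integrate w out analytically, giving y ~ N(X * beta, C C*^{-1} C' + diag(sigma_e_tilde)). A small pure-Python check of that construction (toy 1-D numbers of my own choosing; note that here the knot covariance uses C(0) = sigma_sq, with the nugget tau_sq entering only through sigma_e_tilde):

```python
import math

# toy 1-D example: m = 2 knots, n = 4 sites, exponential covariance
sigma_sq, tau_sq, phi = 1.0, 0.25, 2.0
knots = [0.0, 1.0]
sites = [0.0, 0.3, 0.7, 1.0]

def cov(d):
    # C(d) = sigma^2 * exp(-phi * d), so C(0) = sigma^2
    return sigma_sq * math.exp(-phi * d)

# knot covariance C* (2 x 2) and its analytic inverse
Cstar = [[cov(abs(a - b)) for b in knots] for a in knots]
det = Cstar[0][0] * Cstar[1][1] - Cstar[0][1] ** 2
inv_Cstar = [[ Cstar[1][1] / det, -Cstar[0][1] / det],
             [-Cstar[0][1] / det,  Cstar[0][0] / det]]

# cross-covariance between sites and knots (n x 2)
C = [[cov(abs(s - k)) for k in knots] for s in sites]

def bil(a, b):
    # bilinear form a' C*^{-1} b
    return sum(a[i] * inv_Cstar[i][j] * b[j]
               for i in range(2) for j in range(2))

# variance captured by the low-rank projection at each site
q = [bil(c, c) for c in C]
# bias-adjusted residual variance (Finley et al. 2009)
sigma_e_tilde = [sigma_sq + tau_sq - qi for qi in q]

# marginal covariance of y with w integrated out:
# Sigma = C C*^{-1} C' + diag(sigma_e_tilde)
Sigma = [[bil(C[i], C[j]) + (sigma_e_tilde[i] if i == j else 0.0)
          for j in range(4)] for i in range(4)]
```

The projection captures all of sigma_sq at the knots and strictly less in between, which is exactly the variance deficit the bias adjustment puts back: every diagonal entry of Sigma comes out to sigma_sq + tau_sq.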

Bob Carpenter

Mar 10, 2017, 1:39:28 AM
to stan-...@googlegroups.com
I'm not an expert in computational linear algebra,
but rather than inverting and taking the Cholesky,
you want to take the Cholesky and then use matrix
division for triangular matrices. Should be a lot faster.
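To sketch the pattern in pure Python (toy 3x3 matrix; in Stan the corresponding functions are cholesky_decompose and mdivide_left_tri_low): factor C = L L' once, then apply C^{-1} to a vector with one forward and one backward triangular solve, never forming the inverse.

```python
# solve C x = b via Cholesky: C = L L', then L z = b (forward), L' x = z (backward)
C = [[4.0, 2.0, 0.6],
     [2.0, 5.0, 1.5],
     [0.6, 1.5, 3.0]]
b = [1.0, 2.0, 3.0]
n = 3

# Cholesky factor L (lower triangular), Cholesky-Banachiewicz order
L = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1):
        s = sum(L[i][k] * L[j][k] for k in range(j))
        if i == j:
            L[i][j] = (C[i][i] - s) ** 0.5
        else:
            L[i][j] = (C[i][j] - s) / L[j][j]

# forward substitution: L z = b
z = [0.0] * n
for i in range(n):
    z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]

# backward substitution: L' x = z
x = [0.0] * n
for i in reversed(range(n)):
    x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]

# check: C x should reproduce b
residual = [sum(C[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
```

Each triangular solve is O(n^2) per right-hand side, versus O(n^3) to form the inverse, and it's also better conditioned numerically.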

> inv_Cstar = inverse(Cstar);
> w_star = cholesky_decompose(Cstar) * w_z;
>
> // latent gp at sample locations
> C_site_star = sigma_sq * exp(-D_site_star * phi); # covariance matrix at sites and knots
> C_ss_inv_Cstar = C_site_star * inv_Cstar;
> w = C_site_star * inv_Cstar * w_star; # transformed latent process at sites

Also, you don't ever want to do the same computation twice,
as you do when multiplying C_site_star * inv_Cstar.

Yes, declaring a transformed parameter as a cov_matrix
introduces overhead. You probably don't want to save all
those transformed parameters, so you can just make them local
variables in the model block with no constraints.

Finally, we have some built-ins for computing squared exponential
covariance functions---see the manual.

- Bob




Craig Wang

Mar 10, 2017, 10:33:58 AM
to Stan users mailing list
Hi Maxwell,

The spBayes package does have the modified GPP implemented. Their JSS paper mentions that "A heavy reliance on BLAS and LAPACK functions for matrix operations allows us to leverage multi-processor/core machines via threaded implementations of BLAS and LAPACK," so I am not sure whether they automatically use multiple cores for computation, and whether that uses resources the same way as running multiple chains with one chain per core in Stan.

I agree that marginalizing can improve the speed, but I am experimenting with the normal case first; later I will need to switch to other likelihoods.

I'll continue working on this and will definitely post updates!

Craig Wang

Mar 10, 2017, 10:53:49 AM
to Stan users mailing list
Hi Bob,

Thanks for your reply. 

On Friday, March 10, 2017 at 7:39:28 AM UTC+1, Bob Carpenter wrote:
I'm not an expert in computational linear algebra,
but rather than inverting and taking the Cholesky,
you want to take the Cholesky and then use matrix
division for triangular matrices.  Should be a lot faster.

Do you mean changing

  inv_Cstar = inverse(Cstar);

to

  D = diag_matrix(rep_vector(1, m));
  Cstar_L = cholesky_decompose(Cstar);
  inv_Cstar = crossprod(mdivide_left_tri_low(Cstar_L, D));

?

>  inv_Cstar = inverse(Cstar);
>   w_star = cholesky_decompose(Cstar) * w_z;
>  
>   // latent gp at sample locations
>   C_site_star = sigma_sq * exp(-D_site_star * phi); # covariance matrix at sites and knots
>   C_ss_inv_Cstar = C_site_star * inv_Cstar;
>   w = C_site_star * inv_Cstar * w_star; # transformed latent process at sites

Also, you don't ever want to do the same computation twice
as you do multiplying C_site_start * inv_Cstar.

Yes, of course!

Yes, declaring a transformed parameter as a cov_matrix
introduces overhead.  You probably don't want to save all
those transformed parameters, so you can just make them local
variables in the model block with no constraints.

I also thought about moving those into the model block, but I need to save w, and all of those matrices are required to compute w. Alternatively, I could move them into the model block and recompute w in generated quantities? I'm not sure whether that would be more efficient.

Finally, we have some built-ins for computing squared exponential
covariance functions---see the manual.

I will try cov_exp_quad and see; thanks for the tips.

- Bob




Craig Wang

Mar 14, 2017, 10:48:01 AM
to Stan users mailing list
Just an update: I did manage to improve the speed by about 30% by replacing


  inv_Cstar = inverse(Cstar); 
  w_star = cholesky_decompose(Cstar) * w_z;

with

  Cstar_L = cholesky_decompose(Cstar);
  w_star = Cstar_L * w_z;
  inv_Cstar = crossprod(mdivide_left_tri_low(Cstar_L, diag_matrix(rep_vector(1, m))));


In addition, I fixed the prior for phi by constraining the parameter directly, without assigning an explicit prior:

  real<lower = 0.25, upper = 10> phi; 



Bob Carpenter

Mar 14, 2017, 3:50:27 PM
to stan-...@googlegroups.com
I meant don't compute the inverse at all---you only use
it as an intermediate quantity.

Best place to define something is in transformed data if
it doesn't depend on parameters.

Where to put a quantity that depends on parameters:

  if it's used in the model:
    if you want to save it:
      * transformed parameters
    else:
      * local variable in model
  else:
    * generated quantities

It's not always possible to satisfy this if there are complex
dependencies among the parameters. Then it's usually more
efficient to make everything a local variable in the model and
redefine quantities to save as generated quantities. The generated
quantities are very cheap as there are no gradients and they're only
calculated once per iteration rather than once per leapfrog step.

- Bob

Bob Carpenter

Mar 14, 2017, 3:52:30 PM
to stan-...@googlegroups.com

> On Mar 14, 2017, at 10:48 AM, Craig Wang <craigw...@gmail.com> wrote:
>
> Just an update: I did manage to improve the speed by about 30% by replacing
>
>
> inv_Cstar = inverse(Cstar);
> w_star = cholesky_decompose(Cstar) * w_z;
>
> with
>
> Cstar_L = cholesky_decompose(Cstar);
> w_star = Cstar_L * w_z;
> inv_Cstar = crossprod(mdivide_left_tri_low(Cstar_L, diag_matrix(rep_vector(1, m))));
>
>
> In addition, I fixed the prior for phi by constraining the parameter directly without prior assignment,
>
> real<lower = 0.25, upper = 10> phi;

We recommend leaving variables unconstrained up to physical constraints
(such as phi > 0 being a requirement for the model to be well-defined).
Then just use a prior that concentrates most of the mass where you think
it has to be: here, say, a normal(5, 1) or normal(5, 2) if you want
support at zero, or something zero-avoiding like a gamma otherwise. This
will be more robust in the face of data that tries to force phi up against
one of its boundaries.

- Bob