Re: "BYM" Spatial Car Model: Slow Sampling (relative to BUGS) + non-symmetric warnings + avoid lp_


Kyle Foreman

Mar 30, 2014, 8:57:48 PM
to stan-...@googlegroups.com
Hi MT,

Why are you imposing the restriction of no direct manipulation of lp__? I've had success in speeding up my CAR and MCAR models by using sparse covariance matrices for the MVN, but it requires you to write out the MVN probability explicitly by manipulating lp__ instead of using the built-in probability functions.

Best
Kyle   

On Sunday, March 30, 2014 4:50:06 PM UTC-7, MT wrote:
Hi all,

I've been looking at past threads regarding Conditional Auto Regressive (CAR) models.

(In order of relevance)
https://groups.google.com/forum/#!searchin/stan-users/spatial/stan-users/S8dhFpn0Lg8/F91hZXBJmnMJ
https://groups.google.com/forum/#!searchin/stan-users/spatial/stan-users/63m5A5ATzs8/LII8RuX2jI8J
https://groups.google.com/forum/#!searchin/stan-users/spatial/stan-users/he7BxxHfBfk/V6s1K14nRfgJ
https://groups.google.com/forum/#!searchin/stan-users/spatial/stan-users/iddShgJVwTQ/Ihes0SOy5yQJ

I've tried to adapt the examples to my fake toy example with little success.
(By adapting, I mean a naive specification that does not resort to working with lp__ directly.)

I've finally hit a wall and would appreciate some help.
Is working with lp__ the only way to practically implement this model?

Problems:
  • The practical problem I face is that sampling is extremely slow compared to WinBUGS.
I already have results from WinBUGS, whereas my attempt in Stan doesn't even complete 1000 iterations in 8+ hours of wall-clock time.
Further, I'd like to keep the restriction of avoiding direct use of lp__.
  • Possibly related, I get multiple Stan warnings about non-symmetric covariance matrices, when I believe they are intentionally symmetric:
covariance matrix[0,2] is 105097511.634409, but
covariance matrix[2,0] is 1.05098e+008

Background and Details:
The model is a two-stage model: the data model is Poisson, and the log Poisson rate is linear in fixed effects and random effects.

Past Stan examples used a "small" number of spatial units, say 50.
My toy example has 480 areas.

Past examples used a single random effect, the spatial conditional autoregressive (CAR) effect.
My example uses two random effects (as in the Besag-York-Mollié model): one spatial CAR and the other normal.

Past examples estimated the spatial dependence parameter, giving it a uniform prior.
My example fixes (doesn't estimate) the spatial dependence parameter at the reciprocal of the largest eigenvalue of a particular matrix (given below).

Model:
In short, I've tried to follow the more general model specification in Cressie and Wikle's Statistics for Spatio-Temporal Data.

A stylized Model Specification:

O ~ poisson(mu)
log_mu <- log(E) + alpha0 + alpha1 * x1 + ( beta1-mean(beta1) ) + beta2
beta1 ~ MVN(0,Cov_spatial)
beta2 ~ MVN(0,Cov_normal)
Cov_spatial <- tau^2 * inverse(ONES - phi * H ) * DELTA
Cov_normal <- sigma^2 * ONES

Where:
O is the response count
mu is the poisson rate
E is an offset
alpha0 is the uninteresting intercept
alpha1 is a fixed effect
beta1 is the spatial random effect using a MVN with a spatially structured covariance (Cov_spatial)
beta2 is the nonspatial random effect using a MVN with  covariance (Cov_normal)

tau is the standard deviation for the spatial term
sigma is the standard deviation for the non spatial term

ONES is the N x N identity matrix
DELTA is a diagonal matrix whose entries are delta_ii = 1 / (# of neighbors of area i)
H is a row-standardized binary adjacency matrix (i.e., each row sums to 1), where h_ij = b_ij / (# of neighbors of area i) and b_ij = 1 if i is a neighbor of j, 0 otherwise.

phi = 0.3 = 1 / max{ eigenvalue( -DELTA^(-1/2) * H * DELTA^(1/2) ) }

I believe my model specification is correct and my data manipulations (in R) should be correct, but I question whether my Stan implementation is correct.

Attachments
For your consideration, I've attached 4 files:

All of the necessary data to feed into stan
fake_spatial_bym_stan.RData  

The two files that manipulate the data
setup_stan_maf_data.R (which also calls Stan)
setup_stan_maf_init.R

The stan model file
maf_convolution_model_sfstd.stan

Specific Stan Model File

data {
    int<lower=0> N;        // number of areas
    matrix[N,N] H;         // area adjacency matrix, row-standardized (each row sums to 1)
    matrix[N,N] Delta;     // diagonal matrix in BYM 1991, sfstd model B1
    matrix[N,N] Ones;      // identity matrix
    real phi;              // phi = 1/maxeigval(Delta*H*Delta)
    int<lower=0> O[N];     // observed cases
    vector[N] E;           // expected cases
    vector[N] x1;          // covariate
}

transformed data {
    vector[N] zeros;       // zero mean for the MVN specification
    for (i in 1:N)
        zeros[i] <- 0.0;
}

parameters {
    real alpha0;                // intercept
    real alpha1;                // coefficient on covariate

    vector[N] beta1;            // random effect (CAR)
    vector[N] beta2;            // random effect (normal)

    real<lower=1e-5> prec_sre;  // precision of CAR
    real<lower=1e-5> prec_nre;  // precision of normal
}

transformed parameters {
    // turn precisions into variances
    real sigma_sq;
    real tau_sq;
    sigma_sq <- 1.0 / (prec_nre * prec_nre);
    tau_sq <- 1.0 / (prec_sre * prec_sre);
}

model {
    matrix[N,N] Sigma_sre;  // covariance matrix for CAR
    matrix[N,N] Sigma_nre;  // covariance matrix for normal RE
    real beta1_mean;
    vector[N] beta1_stz;    // centered spatial RE
    vector[N] log_mu;

    prec_sre ~ gamma(1, 1);          // precision of spatial RE
    prec_nre ~ gamma(0.5, 0.0005);   // precision of normal RE

    // spatial RE
    Sigma_sre <- (tau_sq * inverse(Ones - phi * H)) * Delta;
    beta1 ~ multi_normal(zeros, Sigma_sre);
    beta1_mean <- mean(beta1);
    beta1_stz <- beta1 - beta1_mean;

    // nonspatial RE
    Sigma_nre <- sigma_sq * Ones;
    beta2 ~ multi_normal(zeros, Sigma_nre);

    // fixed effects
    // no explicit prior on alpha0 (implicit improper flat prior)
    alpha1 ~ normal(0.0, sqrt(1e5));

    // process model
    log_mu <- log(E) + alpha0 + alpha1 * x1 + beta1_stz + beta2;

    // data model
    O ~ poisson_log(log_mu);
}





Ben Goodrich

Mar 30, 2014, 9:59:53 PM
to stan-...@googlegroups.com
A few general points. I cannot think of a reason why anyone should ever call one of the multi_normal functions when the thing to the left of the ~ is a parameter. Instead, you should do one of the following:
  • Use the stochastic representation and multivariate Matt trick (beta <- mu + L * z where z ~ normal(0,1) and L is the Cholesky factor of the covariance matrix)
  • Use direct incrementation of the log-posterior with calls to increment_log_prob()
  • Use the univariate normal density when the covariance matrix is diagonal
As it applies to your situation,
  • You won't need to create the vector of zeros in the transformed parameter block, however ...
  • You should definitely construct the constant matrix L_sre <- cholesky_decompose(inverse_spd(Ones - phi * H) * Delta) inside the transformed data block, since it depends only on data and need only be computed once
  • Substitute beta1_z for beta1 in the parameters block, and in the model block write beta1 <- sqrt(tau_sq) * (L_sre * beta1_z)
  • Also, in the model block do beta2 ~ normal(0, sqrt(sigma_sq))
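Putting those pieces together, here is a minimal, untested sketch (since phi, H, and Delta are all data in this model, L_sre can be built once in transformed data, and beta1_z ~ normal(0, 1) then implies beta1 ~ multi_normal(zeros, tau_sq * Sigma_sre)):

transformed data {
    matrix[N,N] L_sre;  // constant Cholesky factor of the unscaled CAR covariance
    L_sre <- cholesky_decompose(inverse_spd(Ones - phi * H) * Delta);
}
parameters {
    vector[N] beta1_z;  // standard normal innovations replacing beta1
    // ... other parameters unchanged ...
}
model {
    vector[N] beta1;
    beta1_z ~ normal(0, 1);
    beta1 <- sqrt(tau_sq) * (L_sre * beta1_z);  // implied: beta1 ~ MVN(0, tau_sq * Sigma_sre)
    beta2 ~ normal(0, sqrt(sigma_sq));          // diagonal covariance, so univariate normal
    // ... rest of the model as before ...
}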
Ben

Herra Huu

Mar 31, 2014, 4:27:59 AM

I think there is a bug in the current implementation of the symmetry check in the cholesky_decompose() function, and I have already reported it here: https://github.com/stan-dev/stan/issues/609 Bob is taking a look at it, so it should probably be fixed in the next release of Stan.

One quick fix would be to change the function validate_symmetric() to check_symmetric() in the file cholesky_decompose.hpp. But I'm not sure how easy it would be to recompile RStan, etc.

Bob Carpenter

Mar 31, 2014, 5:03:42 AM
to stan-...@googlegroups.com

On Mar 31, 2014, at 9:20 AM, MT <micha...@gmail.com> wrote:

> Specifically, I have lingering confusion about when and how to use increment_log_prob correctly.
> This was one of the reasons I wanted to shy away from it, mostly due to my lack of familiarity.
>
> Reading the documentation, it says increment_log_prob can be useful for two general purposes:
>
> 1) when one wishes to sample after a change of variables (noting that lp__ is not needed for transformations)
> 2) for custom distributions, e.g., the triangle distribution
>

The basic idea is that lp__ is the variable in which the total
log probability defined by the model is accumulated. A sampling
statement:

y ~ foo(theta);

is equivalent in terms of sampling to

increment_log_prob(foo_log(y,theta));

and also to

lp__ <- lp__ + foo_log(y,theta);

though direct manipulation of lp__ is deprecated (we'll eventually
remove it). The point of a Stan model is to specify the log probability
function and the sampling statements are just syntactic sugar plus an
optimization that drops constant terms (and is hence a bit more efficient
where available than incrementing with the _log function).
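For a concrete instance with a built-in density, the three forms below are equivalent up to constant terms:

y ~ normal(mu, sigma);                          // drops terms constant in the parameters
increment_log_prob(normal_log(y, mu, sigma));   // keeps every term of the log density
// lp__ <- lp__ + normal_log(y, mu, sigma);     // deprecated direct manipulation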

Yes, we need to fix the symmetry tests. And we need to add more data
types (e.g., Cholesky factor for correlation matrix) and
multivariate distributions (e.g., the LKJ density over such factors)
so that we can efficiently use the sampling notation.

What you want to compare is not so much time to 1000 iterations
as time to convergence and number of effective samples per second
after convergence. JAGS/BUGS/Stan may run fast per iteration in some
models but have low numbers of effective samples per iteration. Suggestions
like reparameterizing (different strategies for Gibbs vs. HMC) often
speed up effective samples per iteration, not speed per iteration.

- Bob

Kyle Foreman

Mar 31, 2014, 11:48:44 AM
to stan-...@googlegroups.com
I've successfully run 2100 spatial units with the sparse method - it still takes a while to sample, but it's significantly faster than using multi_normal_prec() for my use case. Results will vary depending on sparsity, etc., of course. And as Bob notes, it's the effective sample size that's important, not the number of iterations. I find that with these tweaks I get better convergence and a much lower time per effective sample than BUGS/JAGS.

I've attached a little tutorial on how to use increment_log_prob() to sample the sparse matrix. It uses the same GeoBUGS Scotland lip cancer example as my original CAR thread. It's pretty thoroughly documented, so hopefully you can get the gist just by reading through. 

One sacrifice it makes is that it uses an estimate of the log determinant of the precision matrix instead of computing it directly. To scale it up beyond the 56 spatial units in the example, I recommend computing the log determinant samples in R instead of Stan, because that step (which fortunately only has to happen once) is a couple orders of magnitude slower in Stan than in R and can take quite a while once you're up to hundreds or thousands of spatial units.

There may be some tricks to replicate the CAR distribution using something like the multivariate Matt trick that Ben mentions - I haven't tried that, but would be very interested to hear how it works if you test it out.

Agreed that a built-in distribution would be easier, but before that becomes feasible in a case like this, I would think a sparse matrix object for Stan would be required.
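To give the gist here as well, the core of the sparse trick looks roughly like the sketch below (illustrative only: node1/node2/n_nbrs are hypothetical names for an edge list passed as data, and because MT fixes phi, the log-determinant term is a constant that I simply drop; this replaces the multi_normal statement for beta1):

data {
    int<lower=0> N_edges;    // number of neighbor pairs, each pair listed once
    int node1[N_edges];      // node1[e] and node2[e] are neighbors
    int node2[N_edges];
    vector[N] n_nbrs;        // number of neighbors of each area (diagonal of D)
}
model {
    real quad;  // beta1' * (D - phi*B) * beta1, accumulated sparsely
    quad <- dot_product(n_nbrs, beta1 .* beta1);
    for (e in 1:N_edges)
        quad <- quad - 2 * phi * beta1[node1[e]] * beta1[node2[e]];
    // proper CAR log density up to a constant; the precision matrix is (D - phi*B) / tau_sq
    increment_log_prob(-0.5 * N * log(tau_sq) - 0.5 * quad / tau_sq);
}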


On Sun, Mar 30, 2014 at 6:16 PM, MT <micha...@gmail.com> wrote:
Kyle,

I was thinking of going the lp__ route next, and your comment reassures me that it is a fruitful endeavor. Do you expect that the lp__ approach might solve the "larger" 480-area problem, if scaling up the data is indeed the problem? In my browsing, I think I read a tangential comment about a Stan bottleneck involving the 'differentiation tree'.

As a new user of Stan, I would think successful reliance on '~' and built-in distributions would be a great pedagogical aid for newcomers, since users would not have to entertain two alternative ways of implementing their model.

I was hoping Stan would sample successfully without relying on the lp__ mechanism.
It looks like, for niche models such as these, one is forced to use lp__.


scotland_lip_cancer.RData.txt
sparse_MVN.R.txt

Bob Carpenter

Apr 1, 2014, 7:07:37 PM
to stan-...@googlegroups.com
That's great news. Thanks for sharing.

And of course, to dwell on the negative, what computation
is slow in Stan compared to R? log_determinant? And
you're talking about doing this in the generated quantities block
and not with parameters?

And we definitely have sparse matrix operations fairly high up
on our to-do list (which is still beyond the immediate future,
I'm afraid).

- Bob

Kyle Foreman

Apr 1, 2014, 8:06:17 PM
to stan-...@googlegroups.com
> And of course, to dwell on the negative, what computation
> is slow in Stan compared to R? log_determinant? And
> you're talking about doing this in the generated quantities block
> and not with parameters?

Yes, the log_determinant of the dense matrix was substantially slower in Stan than in R. The last time I tried it in Stan was in December, so maybe things have changed, but at that time the log determinant of a 2000x2000 matrix took on the order of 100x longer to compute in Stan than in R.

I was computing it in the transformed data block, at 1000 samples of p (which is the only parameter the log determinant depends on, and which is constrained to [0,1)). That way I only have to do it once (well, 1000 times, but just during initialization), and in the model block I can then simply do a linear interpolation of the log determinant based on the value of the parameter p.

Now to speed things up I have just computed those 1000 samples of the log determinant once in R and pass them to Stan as data.

It's hacky, but since the log determinant is such a tiny contributor to the overall log probability and takes so long to compute it seems to be a worthwhile compromise.
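The interpolation step might look something like this sketch (my guess at a minimal version, assuming a hypothetical data vector ldet_grid of K log determinants precomputed in R at evenly spaced values of p):

data {
    int<lower=2> K;          // grid size, e.g. 1000
    vector[K] ldet_grid;     // log determinant precomputed in R at p_k = (k-1)/(K-1)
}
parameters {
    real<lower=0, upper=1> p;
}
model {
    real pos;    // fractional position of p on the grid, in [1, K]
    int lo;      // index of the grid point at or below pos
    real ldet;   // linearly interpolated log determinant
    pos <- 1 + p * (K - 1);
    lo <- 1;
    while (lo + 1 < pos)
        lo <- lo + 1;
    ldet <- ldet_grid[lo] + (pos - lo) * (ldet_grid[lo + 1] - ldet_grid[lo]);
    increment_log_prob(0.5 * ldet);   // determinant term of the MVN log density
    // ... rest of the model ...
}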

> And we definitely have sparse matrix operations fairly high up
> on our to-do list (which is still beyond the immediate future,
> I'm afraid).

That's exciting to hear!

Bob Carpenter

Apr 1, 2014, 8:26:10 PM
to stan-...@googlegroups.com
Thanks for the report. I created a new issue:

https://github.com/stan-dev/stan/issues/613

to remind us to look into the log_determinant speed issue.
No reason we shouldn't be able to make it as fast as R's, assuming
the way to do it is available in Eigen.

- Bob

Kyle Foreman

Apr 1, 2014, 8:59:40 PM
to stan-...@googlegroups.com
Thanks Bob. I just wrote a quick script to replicate it and posted it to the GitHub issue tracker.

