1,956 views

Skip to first unread message

Mar 30, 2014, 8:57:48 PM3/30/14

to stan-...@googlegroups.com

Hi MT,

Why are you imposing the restriction of no direct manipulation of lp__? I've had success in speeding up my CAR and MCAR models by using sparse covariance matrices for the MVN, but it requires you to write out the MVN probability explicitly by manipulating lp__ instead of using the built in probability functions.

Best

Kyle

On Sunday, March 30, 2014 4:50:06 PM UTC-7, MT wrote:

On Sunday, March 30, 2014 4:50:06 PM UTC-7, MT wrote:

Hi all,

I've been looking at past threads regarding Conditional Auto Regressive (CAR) models.

(In order of relevance)

https://groups.google.com/forum/#!searchin/stan-users/spatial/stan-users/S8dhFpn0Lg8/F91hZXBJmnMJ

https://groups.google.com/forum/#!searchin/stan-users/spatial/stan-users/63m5A5ATzs8/LII8RuX2jI8J

https://groups.google.com/forum/#!searchin/stan-users/spatial/stan-users/he7BxxHfBfk/V6s1K14nRfgJ

https://groups.google.com/forum/#!searchin/stan-users/spatial/stan-users/iddShgJVwTQ/Ihes0SOy5yQJ

I've tried to adapt the examples to my fake toy example with little success.

(by adapting, i mean a naive specification without resorting to working with the lp_ directly)

I've finally hit my wall, and would appreciate some help.

Is working with lp_ the only way to practically implement this model?Problems:

- The practical problem I face is the sampling is extremely slow compared to WinBUGS.
I already have results from WinBUGS, where my attempt in Stan doesn't even return 1000 iterations when I allow 8+ hours for the running clock time.

Further, i'd like the restriction of avoiding the use of lp_ directly

- Possibly related, I get multiple stan warnings about non-symmetric covariance matrices, when I believe they are intentionally symmetric.
covariance matrix[0,2] is 105097511.634409:0, but

covariance matrix[2,0] is 1.05098e+008:0Background and Details:

The model is a two stage model, where the data model is Poisson, and the log Poisson rate is linear with fixed effects and random effects

Past stan examples used a "small" number of spatial neighbors, say 50.

My toy example has 480 neighbors.

Past examples used a single random effect, the spatial conditional autoregressive (CAR) effect

My example uses two random effects (as in the Besag-York-Mollie model) one spatial CAR and the other normal

Past examples estimated the spatial dependence parameter and also giving it a uniform prior

My example fixes (doesn't estimate) the spatial dependence parameter using the reciprocal of the largest eigenvalue of some matrix.Model:

In short, i've tried to follow Cressie and Wikle's (statistics for spatiao-temporal data) more general model specification

A stylized Model Specification:

O ~ poisson(mu)

log_mu <- log(E) + alpha0 + alpha1 * x1 + ( beta1-mean(beta1) ) + beta2

beta1 ~ MVN(0,Cov_spatial)

beta2 ~ MVN(0,Cov_normal)

Cov_spatial <- tau^2 * inverse(ONES - phi * H ) * DELTA

Cov_normal <- sigma^2 * ONES

Where:

O is the response count

mu is the poisson rate

E is an offset

alpha0 is the uninteresting intercept

alpha1 is a fixed effect

beta1 is the spatial random effect using a MVN with a spatially structured covariance (Cov_spatial)

beta2 is the nonspatial random effect using a MVN with covariance (Cov_normal)

tau is the standard deviation for the spatial term

sigma is the standard deviation for the non spatial term

ONES is the identity matrix

DELTA is a diagonal matrix, whose entries delta_ii = 1 / the # of neighbors of row i

H is a binary then row standardized adjacency matrix (eg each row sums to 1) where h_ij = ( b_ij / # of neighbors of row i) and b_ij = 1 if i is a neighbor of j; 0 otherwise.

phi = 0.3 = 1 / max{ eigenvalue(-DELTA^(-1/2) * H * DELTA ^ (1/2) }

I believe my model specification is correct and my data manipulations (via [R]) should be correct, but I question if my stan implementation is correct.Attachments

For your consideration, I've attached 4 files:

All of the necessary data to feed into stanfake_spatial_bym_stan.RData

The two files that manipulated the datasetup_stan_maf_data.R (and calls stan)

setup_stan_maf_init.R

The stan model filemaf_convolution_model_sfstd.stanSpecific Stan Model File

data {

int<lower=0> N; // number of areas

matrix[N,N] H; // area adjacency matrix row standardized (ea row sums to 1)

matrix[N,N] Delta; // diagonal matrix in BYM 1991, sfstd model B1

matrix[N,N] Ones; // diag matrix of ones for identity

real phi; // phi = 1/maxeigval(Delta*H*Delta)

int<lower=0> O[N]; // observed cases

vector[N] E; // expected cases

vector[N] x1; // covariate

}

transformed data {

vector[N] zeros; // zeros for mean of MVN specification

for (i in 1:N)

zeros[i] <- 0.0;

}

parameters {

real alpha0; // intercept

real alpha1; // coefficient on covariates

vector[N] beta1; // random effect (CAR)

vector[N] beta2; // random effect normal

real<lower=1e-5> prec_sre; // precision of CAR

real<lower=1e-5> prec_nre; // precision of normal

}

transformed parameters {

// turn precisions to variances

real sigma_sq;

real tau_sq;

sigma_sq <- 1.0 / (prec_nre * prec_nre);

tau_sq <- 1.0 / (prec_sre * prec_sre);

}

model {

matrix[N,N] Sigma_sre; // covariance matrix for CAR

matrix[N,N] Sigma_nre; // covariance matrix for normal RE

real beta1_mean;

vector[N] beta1_stz; // centered spatial re

vector[N] log_mu;

prec_sre ~ gamma(1,1); // precision of spatial re

prec_nre ~ gamma(0.5, 0.0005); // precision of normal re

// spatial re

Sigma_sre <- (tau_sq * inverse(Ones - phi * H)) * Delta;

beta1 ~ multi_normal(zeros, Sigma_sre);

beta1_mean <- mean(beta1);

beta1_stz <- beta1 - beta1_mean;

// nonspatial re

Sigma_nre <- sigma_sq*Ones;

beta2 ~ multi_normal(zeros, Sigma_nre);

// fixed effects

// no explicit prior on alpha0 (implicit improper flat prior)

alpha1 ~ normal(0.0, sqrt(1e5));

// process model

log_mu <- log(E) + alpha0 + alpha1 * x1 + beta1_stz + beta2;

// data model

O ~ poisson_log(log_mu);

}

Mar 30, 2014, 9:59:53 PM3/30/14

to stan-...@googlegroups.com

A few general points. I cannot think of a reason why anyone should ever call one of the multi_normal functions when the thing to the left of the ~ is a parameter. Instead, you should either

- Use the stochastic representation and multivariate Matt trick (beta <- mu + L * z where z ~ normal(0,1) and L is the Cholesky factor of the covariance matrix)
- Use direct incrementation of the log-posterior with calls to increment_log_prob()
- Use the univariate normal density when the covariance matrix is diagonal

- You won't need to create the vector of zeros in the transformed parameter block, however ...
- You should definitely construct the constant matrix L_sre <-
`cholesky_decompose(spd_inverse(Ones - phi * H)*Delta)`

inside the transformed parameters block - Substitute beta1_z for beta1 in the parameters block and write this in the model block beta1 <- sqrt(tau_sq) * (L_sre * beta1_z)
- Also, in the model block do beta2 ~ normal(0, sqrt(sigma_sq))

Mar 31, 2014, 4:27:59 AM3/31/14

to

I think there is a bug in current implementation of the symmetric check in the cholesky_decompose() function and I have already reported in here: https://github.com/stan-dev/stan/issues/609 Bob is taking a look at it so it probably should be fixed in next release of Stan.

One quick fix would be to change function validate_symmetric() to check_symmetric() in the file cholesky_decompose.hpp. But I'm not sure how easy it would be to recompile RStan etc.

Mar 31, 2014, 5:03:42 AM3/31/14

to stan-...@googlegroups.com

On Mar 31, 2014, at 9:20 AM, MT <micha...@gmail.com> wrote:

> Specifically, I have the lingering confusion of when and how to use increment_log_prob correctly.

> This was one of the reasons I wanted to shy away from it, mostly due to my lack of familiarity.

>

> Reading the documentation, it says increment_log_prob can be useful for two general purposes

>

> 1) when one wishes to sample after a change of variables (making note that lp_ is not needed for transformations)

> 2) for custom distributions eg triangle dist.

>

The basic idea is that lp__ is the variable in which the total

log probability returned by the model is manipulated. A sampling

statement:

y ~ foo(theta);

is equivalent in terms of sampling to

increment_log_prob(foo_log(y,theta));

and also to

lp__ <- lp__ + foo_log(y,theta);

though direct manipulation of lp__ is deprecated (we'll eventually

remove it). The point of a Stan model is to specify the log probability

function and the sampling statements are just syntactic sugar plus an

optimization that drops constant terms (and is hence a bit more efficient

where available than incrementing with the _log function).

Yes, we need to fix the symmetry tests. And we need to add more data

types (e.g., Cholesky factor for correlation matrix) and

multivariate distributions (e.g., the LKJ density over such factors)

so that we can efficiently use the sampling notation.

What you want to compare is not so much time to 1000 iterations

as time to convergence and number of effective samples per second

after convergence. JAGS/BUGS/Stan may run fast per iteration in some

models but have low numbers of effective samples per iteration. Suggestions

like reparameterizing (different strategies for Gibbs vs. HMC) often

speed up effective samples per iteration, not speed per iteration.

- Bob

Mar 31, 2014, 11:48:44 AM3/31/14

to stan-...@googlegroups.com

I've successfully run 2100 spatial units with the sparse method - it still takes a while to sample, but it's significantly faster than using multi_normal_prec() for my use case. Results will vary depending on sparsity etc of course. And as Bob notes, it's the effective sample size that's important, not the number of iterations. I find that with these tweaks I get better convergence and much lower time per effective sample than BUGS/JAGS.

I've attached a little tutorial on how to use increment_log_prob() to sample the sparse matrix. It uses the same GeoBUGS Scotland lip cancer example as my original CAR thread. It's pretty thoroughly documented, so hopefully you can get the gist just by reading through.

One sacrifice it makes is that it uses an estimate of the log determinant of the precision matrix instead of computing it directly. To scale it up beyond the 56 spatial units in the example, I recommend sampling the log determinant in R instead of Stan, because that step (which fortunately only has to happen once) is a couple orders of magnitude slower in Stan than R and can take quite a while once you're up to hundreds or thousands of spatial units.

There may be some tricks to replicate the CAR distribution using something like the multivariate Matt trick that Ben mentions - I haven't tried that, but would be very interested to hear how it works if you test it out.

Agreed that a built in distribution would be easier, but before that becomes feasible in a case like this I would think a sparse matrix object for Stan would be required.

On Sun, Mar 30, 2014 at 6:16 PM, MT <micha...@gmail.com> wrote:

Kyle,

I was thinking of going the lp_ route next, and your comment reassures it is a fruitful endeavor. Do you speculate that the lp_ might solve the "larger" 480 spatial neighbors problem, if scaling the data up is indeed the problem? Amongst my browsing, i think i read a tangent comment about a stan bottleneck about the 'differentiation tree'

As a new user of stan, i would think successful reliance on ' ~ ' and built in distributions would be a great pedagogical aid for newcomers, as users would not have to entertain two alternatives to implement their model.

I was hoping stan would successfully sample without relying on the lp_ mechanism.

It looks like, for niche models such as these, one is forced to use lp_.

--

You received this message because you are subscribed to a topic in the Google Groups "stan users mailing list" group.

To unsubscribe from this topic, visit https://groups.google.com/d/topic/stan-users/M7T7EIlyhoo/unsubscribe.

To unsubscribe from this group and all its topics, send an email to stan-users+...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Apr 1, 2014, 7:07:37 PM4/1/14

to stan-...@googlegroups.com

That's great news. Thanks for sharing.

And of course, to dwell on the negative, what computation

is slow in Stan compared to R? log_determinant? And

you're talking about doing this in the generated quantities block

and not with parameters?

And we definitely have sparse matrix operations fairly high up

on our to-do list (which is still beyond the immediate future,

I'm afraid).

- Bob

> You received this message because you are subscribed to the Google Groups "stan users mailing list" group.

> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.

And of course, to dwell on the negative, what computation

is slow in Stan compared to R? log_determinant? And

you're talking about doing this in the generated quantities block

and not with parameters?

And we definitely have sparse matrix operations fairly high up

on our to-do list (which is still beyond the immediate future,

I'm afraid).

- Bob

> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.

> For more options, visit https://groups.google.com/d/optout.

> <scotland_lip_cancer.RData.txt><sparse_MVN.R.txt>
Apr 1, 2014, 8:06:17 PM4/1/14

to stan-...@googlegroups.com

And of course, to dwell on the negative, what computation

is slow in Stan compared to R? log_determinant? And

you're talking about doing this in the generated quantities block

and not with parameters?

Yes, the log_determinant of the dense matrix was substantially slower in Stan than R. Last time I tried it in Stan was in December so maybe things have changed, but at that time the log determinant of a 2000x2000 matrix took on the order of 100x longer to compute in Stan than R.

I was computing it in the transformed data block, at 1000 samples of p (which is the only parameter that the log determinant varies by, and is constrained [0,1)). That way I only have to do it once (well, 1000 times, but just during initialization), and in the model block can then simply do a linear interpolation of the log determinant based on the value of parameter p.

Now to speed things up I have just computed those 1000 samples of the log determinant once in R and pass them to Stan as data.

It's hacky, but since the log determinant is such a tiny contributor to the overall log probability and takes so long to compute it seems to be a worthwhile compromise.

And we definitely have sparse matrix operations fairly high up

on our to-do list (which is still beyond the immediate future,

I'm afraid).

That's exciting to hear!

Apr 1, 2014, 8:26:10 PM4/1/14

to stan-...@googlegroups.com

Thanks for the report. I created a new issue:

https://github.com/stan-dev/stan/issues/613

to remind us to look into the log_determinant speed issue.

No reason we shouldn't be able to make it as fast as R's assuming

the way to do it is available in Eigen.

- Bob

https://github.com/stan-dev/stan/issues/613

to remind us to look into the log_determinant speed issue.

No reason we shouldn't be able to make it as fast as R's assuming

the way to do it is available in Eigen.

- Bob

Apr 1, 2014, 8:59:40 PM4/1/14

to stan-...@googlegroups.com

Thanks Bob. Just wrote a quick script to replicate and posted it to github issue tracker.

You received this message because you are subscribed to a topic in the Google Groups "stan users mailing list" group.

To unsubscribe from this topic, visit https://groups.google.com/d/topic/stan-users/M7T7EIlyhoo/unsubscribe.

To unsubscribe from this group and all its topics, send an email to stan-users+...@googlegroups.com.

Reply all

Reply to author

Forward

Message has been deleted

Message has been deleted

Message has been deleted

0 new messages

Search

Clear search

Close search

Google apps

Main menu