GeoBUGS distributions


Marcel Jonker

Feb 14, 2013, 11:40:48 AM
to stan-...@googlegroups.com
I've read the posts related to Kyle Foreman's attempts to implement the CAR distribution in Stan, but as a more general note, are there any plans to implement the GeoBUGS distributions directly in Stan?
I know there is a long list of things to do, including further vectorization of distributions and perhaps support for discrete parameters, but where do the CAR (and particularly the MV.CAR) distributions rank on this list?
Thanks!

Bob Carpenter

Feb 14, 2013, 2:56:09 PM
to stan-...@googlegroups.com
I'm afraid they're not even on the list yet.  At least not until I understand them better or someone else is willing to code them.  

If you could send us a definition of the density function that you're interested in, it'd help (me at least) immensely. Or even a link to the Wikipedia page that defines it precisely. I'm always a bit leery of trying to unpack narrative descriptions like the one in the GeoBUGS manual, especially when they're given as conditional specifications and we just need the joint density.

It would also help me understand the obstacle to just implementing them the way I implemented Gaussian processes.

- Bob

P.S.  Sorry for being off the list.  My mailer is blocking posts from the list for some @#$%^&* reason and I didn't even realize they were missing until today since I was getting reports through the new GitHub issue tracker.

Ben Goodrich

Feb 14, 2013, 3:46:13 PM
to stan-...@googlegroups.com
On Thursday, February 14, 2013 11:40:48 AM UTC-5, Marcel Jonker wrote:
I've read the posts related to Kyle Foreman's attempts to implement the CAR distribution in Stan, but as a more general note, are there any plans to implement the GeoBUGS distributions directly in Stan?
I know there is a long list of things to do, including further vectorization of distributions and perhaps support for discrete parameters, but where do the CAR (and particularly the MV.CAR) distributions rank on this list?

I would say that *conditional* auto-regressive priors are not on the list and can't be on the list as Stan is formulated, because Stan can't do conditional priors. By that I mean

beta_1 | beta_2, beta_3, ..., beta_K ~ normal()
beta_2 | beta_1, beta_3, ..., beta_K ~ normal()
...
beta_K | beta_1, beta_2, ..., beta_{K-1} ~ normal()

is not appropriate for an algorithm that evaluates the joint (log) posterior like Stan does.

However, if you can infer the joint distribution that a set of full-conditional distributions implies --- which is doable for Gaussians --- then you can use that prior in Stan today, particularly via multi_normal_prec(), which exists but unfortunately was not documented for v1.1.1. Basically, you just pass a precision matrix (the inverse of the joint covariance matrix) instead of the covariance matrix. There are probably some things that we can do to make spatial models more efficient in Stan, but that shouldn't stop anyone from trying them out now.
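For anyone who wants to try this route, here is a minimal sketch of the proper CAR model discussed later in the thread, written in joint form. All names are illustrative; W is a symmetric 0/1 adjacency matrix, D the diagonal matrix of neighbor counts, and the (0,1) bound on alpha is the usual diagonal-dominance simplification rather than the exact propriety condition:

data {
  int<lower=1> N;
  matrix[N,N] W;                  // symmetric 0/1 adjacency matrix
  vector[N] zeros;                // N-vector of zeros for the mean
}
transformed data {
  matrix[N,N] D;                  // diagonal matrix of neighbor counts
  D <- diag_matrix(W * rep_vector(1.0, N));
}
parameters {
  vector[N] phi;                  // spatial random effects
  real<lower=0> tau;              // overall precision
  real<lower=0,upper=1> alpha;    // "make-it-proper" spatial dependence parameter
}
model {
  // joint form of the proper CAR prior: precision tau * (D - alpha * W)
  phi ~ multi_normal_prec(zeros, tau * (D - alpha * W));
  // ... likelihood involving phi goes here ...
}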

Ben

Jeffrey Arnold

Feb 14, 2013, 4:14:39 PM
to stan-...@googlegroups.com
Just curious, but what would happen if you were to try to write a conditionally defined model in Stan?

beta_1 ~ normal( f(beta_2, beta_3), ... )
beta_2 ~ normal( f(beta_3, beta_1), ... )
beta_3 ~ normal( f(beta_1, beta_2), ... ) 

What is the posterior that Stan would be sampling from? Or would it be nonsense?



John

Feb 14, 2013, 5:43:53 PM
to stan-...@googlegroups.com
Well, the conditional CAR is a rather weird improper distribution, which requires a behind-the-scenes sum-to-zero constraint to make it identifiable in BUGS. It is, however, fast to implement. The proper CAR model (also implemented in GeoBUGS) is, as the name implies, a proper statistical distribution, which contains a special make-it-proper parameter in the covariance matrix. There are many papers that describe this distribution, such as http://biostatistics.oxfordjournals.org/content/4/1/11.full.pdf. (See page 15.)

John

M.F. Jonker

Feb 14, 2013, 6:58:03 PM
to stan-...@googlegroups.com
I came across section 2.2 of this paper:
 http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.147.3554&rep=rep1&type=pdf
It reviews several specifications and also proposes a more generalized version - not sure how to implement any of this in Stan though ;)

Marcel



Ben Goodrich

Feb 14, 2013, 7:06:49 PM
to stan-...@googlegroups.com
On Thursday, February 14, 2013 6:58:03 PM UTC-5, Marcel Jonker wrote:
I came across section 2.2 of this paper:
 http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.147.3554&rep=rep1&type=pdf
It reviews several specifications and also proposes a more generalized version - not sure how to implement any of this in Stan though ;)
 


John wrote:

Well, the conditional CAR is a rather weird improper distribution, which requires a behind-the-scenes sum-to-zero constraint to make it identifiable in BUGS. It is, however, fast to implement. The proper CAR model (also implemented in GeoBUGS) is, as the name implies, a proper statistical distribution, which contains a special make-it-proper parameter in the covariance matrix. There are many papers that describe this distribution, such as http://biostatistics.oxfordjournals.org/content/4/1/11.full.pdf. (See page 15.)
 
In both of the linked papers,

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.147.3554&rep=rep1&type=pdf
http://biostatistics.oxfordjournals.org/content/4/1/11.full.pdf

a Stan user would start with equation 2 and possibly generalizations thereof, i.e., with a joint multivariate normal distribution whose covariance matrix depends on a small number of scalars and is built within the transformed parameters {} block or the model {} block. A GeoBUGS user would start with equation 1 and go from there. But one implies the other mathematically.

To emphasize, it is the conditionality that is problematic for a non-Gibbs sampler like Stan. The improperness of a prior is not really a problem as long as it yields a proper posterior. If the joint covariance matrix is singular by construction, it will be a problem for Stan if you call multi_normal_prec() because it will take the log of a zero determinant. But if you implement the multivariate normal distribution manually in your .stan file and omit the (constant) log-determinant calculation, it will be fine.
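A minimal sketch of that manual route, with hypothetical names (phi the spatially distributed effects, Q = D - W the singular intrinsic-CAR precision matrix passed in as data, so that its log determinant is a constant):

model {
  // kernel of the improper multivariate normal; the constant log|Q| is omitted
  lp__ <- lp__ - 0.5 * dot_product(phi, Q * phi);
  // some identification device (e.g. the sum-to-zero constraint discussed
  // later in the thread) is still needed, since Q is singular
}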

Ben

Andrew Gelman

Feb 14, 2013, 7:08:20 PM
to stan-...@googlegroups.com
The conditionality of the model is no problem with Stan.  All that is necessary is to write the joint density that has the desired conditionals (assuming such a joint density exists).
A



Ben Goodrich

Feb 14, 2013, 7:22:36 PM
to stan-...@googlegroups.com, gel...@stat.columbia.edu
I should have said it more precisely: expressing the priors in full-conditional form --- as is typical for Gibbs samplers --- is a problem for a non-Gibbs sampler. In the abstract, the model may have equivalent representations in joint or conditional form, and the papers linked give both forms. So, if anyone has an example of a GeoBUGS model that cannot be converted to joint form and run in Stan (or that runs too slowly in Stan), let us know.

Ben

Ben Goodrich

Feb 14, 2013, 7:50:25 PM
to stan-...@googlegroups.com
On Thursday, February 14, 2013 4:14:39 PM UTC-5, Jeffrey Arnold wrote:
Just curious, but what would happen if you were to try to write a conditionally defined model in Stan?

beta_1 ~ normal( f(beta_2, beta_3), ... )
beta_2 ~ normal( f(beta_3, beta_1), ... )
beta_3 ~ normal( f(beta_1, beta_2), ... ) 

What is the posterior that Stan would be sampling from? Or would it be nonsense?

It isn't quite nonsense, but it wouldn't be right. If you look at

http://www.stats.ox.ac.uk/~steffen/papers/fetal.ps

or some of the papers it references from when BUGS was first being developed, there are some relevant theorems involving DAGs. Basically, if you wrote down all the full conditionals in Stan and manually backed out the part on page 10 where it multiplies by p(w | parents_of(w)) for all w that are children of v, then you would be left with something proportional to p(v | parents_of(v)). And then, according to the theorem expressed in equation (1), the product of p(v | parents_of(v)) over all v is equal to the joint density.
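In symbols, that theorem is the usual DAG factorization

p(x_1, \ldots, x_K) \;=\; \prod_{v=1}^{K} p\!\left(x_v \mid x_{\mathrm{pa}(v)}\right),

where \mathrm{pa}(v) denotes the parents of node v; the full conditionals over-count the child terms, which is exactly what the backing-out described above removes.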

But, to be clear, we are not encouraging anyone to manually back that stuff out, nor does Stan have the capability of doing so automatically, nor could it easily be added to Stan. Just use the joint form of the prior from the outset.

Ben

Andrew Gelman

Feb 14, 2013, 9:13:20 PM
to stan-...@googlegroups.com
Yup, the joint prior is what you want.  If the model has any hyperparameters to be estimated, it would be necessary in any case to write down the joint density in order to work out the normalizing function.

John

Feb 15, 2013, 12:44:50 AM
to stan-...@googlegroups.com
Just to clarify: GeoBUGS implements a sum-to-zero constraint for the conditionally specified "intrinsic CAR".  Without it, there are identifiability problems with the intercept term in the model:

"Since the CAR model defined above is improper (the overall mean of the S is not defined), it can only be used as a prior distribution for spatially distributed random effects, and not as a likelihood for data. It is often convenient to assume that suchrandom effects have zero mean. Besag and Kooperberg (1995) show that constraining the random effects to sum to zero and specifying a separate intercept term with a location invariant Uniform(-infty, +infty) prior is equivalent to the unconstrained parameterisation with no separate intercept. WinBUGS 1.4 includes a distribution called dflat() which corresponds to an improper (flat) prior on the whole real line. This prior must always be used for the intercept term in a model including CAR random effects." - GeoBUGS manual: http://mathstat.helsinki.fi/openbugs/Manuals/GeoBUGS/Manual.html#Intrinsic

It is my understanding that even if you specify the intrinsic CAR jointly, the identifiability issues still exist. I believe the proper CAR, with the extra rho parameter described on page 15 of the Biostat paper, is likely better in most cases. This is especially true if you want to model components of the spatial distribution, i.e., make them random parameters. This is something I did a while back for a genetics paper where genetic distance was used in place of physical distance.
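(For reference, the joint form of that proper CAR prior is, up to parameterization details in the paper,

\phi \sim \mathcal{N}\!\left(\mathbf{0},\; \left[\tau\,(D - \rho W)\right]^{-1}\right),

with W the 0/1 adjacency matrix, D the diagonal matrix of neighbor counts, and \rho restricted so that D - \rho W is positive definite; \rho = 1 recovers the singular precision of the intrinsic CAR.)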

Incidentally, I'm excited about the possibility of using spatial distributions in Stan.  Hopefully there will be large speed benefits.

John

Andrew Gelman

Feb 15, 2013, 12:53:01 AM
to stan-...@googlegroups.com
I love talking about these models.  They were in my PhD thesis!

On Feb 15, 2013, at 12:44 AM, John wrote:

Just to clarify: GeoBUGS implements a sum-to-zero constraint for the conditionally specified "intrinsic CAR".  Without it, there are identifiability problems with the intercept term in the model:

"Since the CAR model defined above is improper (the overall mean of the S is not defined), it can only be used as a prior distribution for spatially distributed random effects, and not as a likelihood for data. It is often convenient to assume that suchrandom effects have zero mean. Besag and Kooperberg (1995) show that constraining the random effects to sum to zero and specifying a separate intercept term with a location invariant Uniform(-infty, +infty) prior is equivalent to the unconstrained parameterisation with no separate intercept. WinBUGS 1.4 includes a distribution called dflat() which corresponds to an improper (flat) prior on the whole real line. This prior must always be used for the intercept term in a model including CAR random effects." - GeoBUGS manual: http://mathstat.helsinki.fi/openbugs/Manuals/GeoBUGS/Manual.html#Intrinsic

For any application, I think the way to go is to specify a proper prior on the hyperparameters, even for the (nonstationary) intrinsic CAR models.


It is my understanding that even if you specify the intrinsic CAR jointly, the identifiability issues still exist. I believe the proper CAR, with the extra rho parameter described on page 15 of the Biostat paper, is likely better in most cases. This is especially true if you want to model components of the spatial distribution, i.e., make them random parameters. This is something I did a while back for a genetics paper where genetic distance was used in place of physical distance.

Yes, but this is less of an issue with proper priors.


Incidentally, I'm excited about the possibility of using spatial distributions in Stan.  Hopefully there will be large speed benefits.

I expect this will depend on the matrix calculations in the normalizing factor being efficiently implemented.  If it's slow, please let us know and we'll look into it!

John

Feb 15, 2013, 1:16:23 AM
to stan-...@googlegroups.com
Just a final bit: I'm not saying that one shouldn't or couldn't use the improper intrinsic CAR in Stan. It's just that one must be careful regarding issues related to identifiability and/or singularity of the covariance matrix.

Marcel Jonker

Apr 23, 2013, 11:59:47 AM
to stan-...@googlegroups.com
I'm trying to implement the MVCAR distribution in Stan; see the attached code, which is still very much work-in-progress (!)

The MVCAR specification is based on formula 7 in the previously mentioned paper ( http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.147.3554&rep=rep1&type=pdf ). However, without a Kronecker-product function and without 'Matlab-like' slicing of matrices, I'm not entirely sure what the best and/or easiest way would be to implement the required Kronecker products. Perhaps someone else has solved this before? Thanks!

STAN LE model.txt

Bob Carpenter

Apr 23, 2013, 2:48:48 PM
to stan-...@googlegroups.com
There's a block() operation in Stan that lets you pull out submatrices given an upper-left (i,j) and num rows and num cols. Is that what you mean by "slicing of matrices"?

- Bob

M.F. Jonker

Apr 23, 2013, 7:28:09 PM
to stan-...@googlegroups.com
I guess slicing isn't the right term then ;) In Matlab, you can calculate a Kronecker product of matrices A and B by first creating the right indices for the input matrices; for example:

[ia, ib] = meshgrid(1:nrRowsA,1:nrRowsB);
[ja, jb] = meshgrid(1:nrColsA,1:nrColsB);

The Kronecker product of A and B is then A(ia,ja) .* B(ib,jb). Is there perhaps a function to implement something similar in Stan?


M.F. Jonker

Apr 23, 2013, 7:47:49 PM
to stan-...@googlegroups.com
Or, without special Matlab functions, you could do something like this:

for i = 1:nrRowsA
  for j = 1:nrColsA
    Kron(nrRowsB*(i-1)+1:nrRowsB*i, nrColsB*(j-1)+1:nrColsB*j) = A(i,j)*B;
  end
end

Anyway, I'm just hoping that someone can explain to me how to compute a Kronecker product in Stan ;) Thanks!



Ben Goodrich

Apr 23, 2013, 8:14:47 PM
to stan-...@googlegroups.com
On Tuesday, April 23, 2013 7:47:49 PM UTC-4, Marcel Jonker wrote:
Anyway, I'm just hoping that someone can explain to me how to compute a Kronecker product in Stan ;) Thanks!

At the moment, I think you would have to do it one element of the resulting matrix at a time.

Ben


Bob Carpenter

Apr 23, 2013, 8:49:55 PM
to stan-...@googlegroups.com


On 4/23/13 7:28 PM, M.F. Jonker wrote:
> I guess slicing isn't the right term then ;)

I have no idea what MATLAB calls anything!

> In Matlab, you can calculate a Kronecker product of matrices A and B by
> first creating the right indices for the input matrices; for example;
>
> [ia, ib] = meshgrid(1:nrRowsA,1:nrRowsB);
> [ja, jb] = meshgrid(1:nrColsA,1:nrColsB);
>
> The Kronecker product of A and B is then A(ia,ja) .* B(ib,jb). Is there perhaps a function to implement something
> similar in Stan?

Though I don't know what that's doing (I've never
used Kronecker products), we don't have anything like
it in Stan.

> Or, without special Matlab functions you could do something like this:
>
> for i = 1:nrRowsA
> for j = 1:nrColsA
> Kron(nrRowsB*(i-1)+1:nrRowsB*i,nrColsB*(j-1)+1:nrColsB*j) = A(i,j)*B;
> end
> end
>
> Anyway, I'm just hoping that someone can explain to me how to compute a Kronecker product in Stan ;) Thanks!

Can you get away without building up the matrix Kron? It's
going to be huge here (size(A) * size(B)).

- Bob

M.F. Jonker

Apr 24, 2013, 7:27:39 PM
to stan-...@googlegroups.com
Hi Bob, Ben,

The distribution indeed requires the Kronecker product, so I've followed Ben's suggestion and programmed it element by element. It's not fast, but will have to do for the time being ;) Thank you for taking a look!

Best,
Marcel


Ben Goodrich

Apr 25, 2013, 1:03:52 PM
to stan-...@googlegroups.com
On Wednesday, April 24, 2013 7:27:39 PM UTC-4, Marcel Jonker wrote:
The distribution indeed requires the Kronecker product, so I've followed Ben's suggestion and programmed it element by element. It's not fast, but will have to do for the time being ;) Thank you for taking a look!

Glad it is somewhat functional. In a case like a multivariate normal, you might be able to speed it up a little bit by being clever with the products. So, if you need (in R notation)

t(y) %*% ( (D - a * W) %x% Lambda ) %*% y

you end up repeatedly post-multiplying Lambda by segments of y and scaling those by elements of D - a * W. Particularly if D - a * W is sparse, you could squeeze out some efficiency gains by using temporaries.
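Written out, the saving comes from the standard blockwise identity (partition y into segments y_i whose length is the order of Lambda):

y^\top \left[(D - aW) \otimes \Lambda\right] y \;=\; \sum_{i}\sum_{j} (D - aW)_{ij}\, y_i^\top \Lambda\, y_j,

so only the non-zero entries of D - aW contribute terms, and the small products \Lambda y_j can each be computed once and reused.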

This is one reason why Stan would need a really good implementation of Kronecker products to realize performance improvements, as opposed to merely having a convenient syntax for using the Kronecker product in Eigen.

Ben

M.F. Jonker

Apr 25, 2013, 5:48:12 PM
to stan-...@googlegroups.com
Hi Ben,

Having a "dense" Kronecker product would already be very convenient (!) On the other hand, sparse products are indeed an order of magnitude faster (especially with spatial distributions) and optimizing the order of the areas can further improve computation speed; see the attached graph of the pre- and post-optimized spatial matrix (using Matlab's -symrcm- function).

Marcel



Optimized spatial matrix.png

M.F. Jonker

Apr 29, 2013, 3:09:43 PM
to stan-...@googlegroups.com
Hi Bob, Ben,

With your help I've implemented the MVCAR distribution in two different models. One estimates age-specific mortality rates for a group of areas (e.g. neighborhoods in a city), and the other correlates observed life expectancy with several area-specific characteristics. Both models compile, but apparently there is still something wrong. One model throws a std::bad_alloc at runtime, and the other throws an error that the covariance matrix is at some point no longer symmetric. The latter doesn't happen with a smaller number of iterations. Apologies for such an open question, but do you perhaps have an idea what is going wrong? I really do not know where to look (anymore).

Thanks,
Marcel


MVCAR models.zip

Bob Carpenter

Apr 29, 2013, 4:24:59 PM
to stan-...@googlegroups.com
OK, I can see why you want a Kronecker product function after having to write:

// MVCAR_prec <- Kronecker_product( D - A , MVCAR_scale );
temp <- (D - A);
for (a in 1:I) {
  for (b in 1:4) {
    rowNr <- (a-1)*4 + b;
    for (c in 1:I) {
      for (d in 1:4) {
        colNr <- (c-1)*4 + d;
        MVCAR_prec[rowNr,colNr] <- temp[a,c] * MVCAR_scale[b,d];
      }
    }
  }
}
f_star ~ multi_normal_prec( zeros, MVCAR_prec );


I didn't follow all the indexing closely, but after you construct MVCAR_prec, you can check yourself whether it's symmetric:

for (m in 1:rows(MVCAR_prec))
  for (n in (m+1):cols(MVCAR_prec))
    if (fabs(MVCAR_prec[m,n] - MVCAR_prec[n,m]) > 1e-8)
      print("ASYMMETRY FOUND: m=", m, " n=", n);

If it's close but not exact, you can enforce symmetry by doing this:

for (m in 1:rows(MVCAR_prec))
  for (n in (m+1):cols(MVCAR_prec))
    MVCAR_prec[m,n] <- MVCAR_prec[n,m];

As to the second issue, you get a std::bad_alloc when the runtime can't allocate enough memory. Can you check how your memory grows as you run? I'm really hoping it's not a leak on our side!

One other issue is that you can vectorize (and simplify) code that looks like this:

HC1 ~ normal( HC_const[1] + HC_loading[1] * F[1] , HC_var[1]);
HC2 ~ normal( HC_const[2] + HC_loading[2] * F[1] , HC_var[2]);
HC3 ~ normal( HC_const[3] + HC_loading[3] * F[1] , HC_var[3]);

I'd rather see a loop with HC made into an array of vectors:

for (i in 1:3)
  HC[i] ~ normal(HC_const[i] + HC_loading[i] * F[1], HC_var[i]);

Although our code hasn't caught up with our designs, we plan to vectorize further so that the above loop could be written as:

HC ~ normal(HC_const + HC_loading * F[1], HC_var);

You should get some feedback from those with more knowledge of stats about this choice of prior:

HC_var ~ gamma(0.001, 0.001);

It may be pushing variances to zero a little too firmly.

Also note that the naming is off or you're missing a sqrt, because Stan parameterizes basic normal() with standard deviation (i.e., scale), not variance.

- Bob


M.F. Jonker

Apr 30, 2013, 5:56:41 AM
to stan-...@googlegroups.com
Hi Bob,

1) I ran the factor model on a workstation with 36 GB of RAM, and Stan throws a std::bad_alloc error without ever reaching more than 2.5 GB.
2) I implemented the asymmetry check in the mortality rate model, and in the first 20-30 iterations there are some asymmetries found and some samples rejected. Afterwards, everything sampled fine (I specified a total of 4000 iterations). The model doesn't properly converge in 4000 iterations, and when I specify 30,000 iterations, I again end up with asymmetry warnings after approximately 8000 iterations. Given that there are several thousand iterations in between without errors, I'm not entirely sure what to do - strictly enforcing symmetry would probably not be necessary if I had programmed it correctly, right? But on the other hand, there are thousands of iterations without asymmetry problems.

Thanks,
Marcel


> Date: Mon, 29 Apr 2013 16:24:59 -0400
> From: ca...@alias-i.com

Bob Carpenter

Apr 30, 2013, 4:58:32 PM
to stan-...@googlegroups.com


On 4/30/13 5:56 AM, M.F. Jonker wrote:
> Hi Bob,
>
> 1) I ran the factor model on a workstation with 36 GB of RAM, and Stan throws a std::bad_alloc error without ever reaching more than 2.5 GB.

Is this through R or on the command line?

Are you sure whatever you're running is running in 64-bit mode rather than 32-bit mode? Often with 32-bit mode, you hit memory restrictions around 2 GB (what you can address with 31 bits [32 = 31 + sign]).

> 2) I implemented the asymmetry check in the mortality rate model, and in the first 20-30 iterations there are some asymmetries found and some samples rejected. Afterwards, everything sampled fine (I specified a total of 4000 iterations). The model doesn't properly converge in 4000 iterations, and when I specify 30,000 iterations, I again end up with asymmetry warnings after approximately 8000 iterations. Given that there are several thousand iterations in between without errors, I'm not entirely sure what to do - strictly enforcing symmetry would probably not be necessary if I had programmed it correctly, right? But on the other hand, there are thousands of iterations without asymmetry problems.

Probably the best thing to do is to enforce symmetry anyway. If you compute the entries only for (i,j) with i <= j, you can use those to set the values for i > j. This is both more efficient computationally (re-using auto-diff vars) and also avoids lack of symmetry.

Floating point operations are tricky. Even if the algebra over real numbers is symmetric, it's not necessarily going to be the case in a computer implementation. There are very simple cases where (a + (b + c)) is not equal to ((a + b) + c), or where a, b > 0 and (a + b) == a.

M.F. Jonker

May 1, 2013, 7:03:18 PM
to stan-...@googlegroups.com
Hi Bob,

I used the command line (not R) and made sure that the model is compiled as 64-bit. The model itself uses much less than 1 GB during runtime (the rest of the 2.5 GB is used by Windows and other programs). The precision matrix in the model is approx. 5600 by 5600 - is that too large for Stan? Or perhaps you have another suggestion of where to look?

Marcel


Bob Carpenter

May 1, 2013, 8:09:38 PM
to stan-...@googlegroups.com


On 5/1/13 7:03 PM, M.F. Jonker wrote:
> Hi Bob,
>
> I used the command line (not R) and made sure that the model is compiled as 64-bit. The model itself uses much less than 1 GB during runtime (the rest of the 2.5 GB is used by Windows and other programs). The precision matrix in the model is approx. 5600 by 5600 - is that too large for Stan? Or perhaps you have another suggestion of where to look?

That could certainly blow out memory. Just the double values in a 5600 x 5600 matrix take up 250 MB. If they're auto-diff vars, it's around triple that. (We could cut this down with a good symmetric matrix implementation, but we don't have one, and neither does the Eigen lib we're using.)

Maybe Ben has some idea on how to reformulate this more efficiently?

Even if you can get this model off the ground, it's going to be slow. It'll require on the order of O(N^3) operations to run solvers on it, which is on the order of 2e+11. And that'll be per leapfrog step per iteration.

- Bob

M.F. Jonker

May 2, 2013, 3:32:58 AM
to stan-...@googlegroups.com
Hi Bob,

I programmed the MVCAR in Stan because the OpenBUGS MVCAR implementation only accepts a Wishart prior for the scale matrix. The required variance-of-one constraint in the model demands a unit (or at least a constant) diagonal. And Stan has this really great (positive-definite) correlation matrix type with a unit diagonal and flat priors on the off-diagonal elements. Hence, if you or Ben have an idea how to get the model off the ground, that would be absolutely great!

Best,
Marcel


> Date: Wed, 1 May 2013 20:09:38 -0400

Ben Goodrich

May 2, 2013, 10:12:23 AM
to stan-...@googlegroups.com
On Thursday, May 2, 2013 3:32:58 AM UTC-4, Marcel Jonker wrote:
I programmed the MVCAR in Stan because the OpenBUGS MVCAR implementation only accepts a Wishart prior for the scale matrix. The required variance-of-one constraint in the model demands a unit (or at least a constant) diagonal. And Stan has this really great (positive-definite) correlation matrix type with a unit diagonal and flat priors on the off-diagonal elements. Hence, if you or Ben have an idea how to get the model off the ground, that would be absolutely great!

Clarification: The prior on a correlation matrix is jointly uniform (iff the shape hyperparameter is 1.0) but not marginally uniform on the correlations.

As for using such a prior with a really large correlation matrix, doing so would be more feasible if it were possible to work directly with its Cholesky factor and thus avoid some O(K^3) operations. It is technically possible to do so with Stan syntax, but if it isn't urgent, it would probably be better to wait until we can make Cholesky factors primitive types.
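(As one small illustration of what having the factor buys: once \Lambda = L L^\top is available, the log determinant reduces to \log\det\Lambda = 2 \sum_i \log L_{ii}, essentially for free, instead of requiring a fresh factorization.)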

Ben

M.F. Jonker

May 2, 2013, 12:52:18 PM
to stan-...@googlegroups.com
Hi Ben,

If there is a way to estimate the model with the current version of Stan, that would be great - even if it samples very slowly - because it's somewhat urgent by now. The problematic matrix is the Kronecker product of a 4x4 correlation matrix and a large adjacency matrix (loaded as data), so the correlation matrix itself is pretty small. Only the precision matrix of the multi_normal_prec is large. Perhaps that makes a difference?



Ben Goodrich

May 2, 2013, 1:16:49 PM
to stan-...@googlegroups.com
On Thursday, May 2, 2013 12:52:18 PM UTC-4, Marcel Jonker wrote:
If there is a way to estimate the model with the current version of Stan, that would be great - even if it samples very slowly - because it's somewhat urgent by now. The problematic matrix is the Kronecker product of a 4x4 correlation matrix and a large adjacency matrix (loaded as data), so the correlation matrix itself is pretty small. Only the precision matrix of the multi_normal_prec is large. Perhaps that makes a difference?

It wouldn't help much, in part because the correlation matrix is small and in part because even if you had the Cholesky factor, our matrix multiplication operations generally wouldn't exploit the fact that it is triangular.

If you have something like

Sigma^-1 = (D - a * W) %x% Lambda

where Lambda is a correlation matrix, D - a * W is sparse, and you are going to multiply that from both sides by a vector y, then you can be more efficient without explicitly constructing the Kronecker product. Basically, make an array of 4-vectors that holds the product of Lambda post-multiplied by each 4-element segment of y. Scale those by the appropriate (if non-zero) elements of D - a * W and sum up to get a single long vector, premultiply that by transpose(y), multiply that sum by 0.5, and subtract it from lp__.
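A minimal Stan sketch of that recipe, inside the model block (hypothetical names: p areas, 4 variables per area, y of length 4*p, DmaW holding D - a * W, and the data adjacency matrix W used to skip the structural zeros):

{
  vector[4] Ly[p];     // Lambda post-multiplied by each 4-element segment of y
  real quad;           // accumulates y' * ((D - a*W) %x% Lambda) * y
  for (j in 1:p)
    Ly[j] <- Lambda * segment(y, 4*(j-1)+1, 4);
  quad <- 0.0;
  for (i in 1:p) {
    vector[4] u;
    u <- rep_vector(0.0, 4);
    for (j in 1:p) {
      if (i == j || W[i,j] == 1)   // only the non-zero entries of D - a*W
        u <- u + DmaW[i,j] * Ly[j];
    }
    quad <- quad + dot_product(segment(y, 4*(i-1)+1, 4), u);
  }
  lp__ <- lp__ - 0.5 * quad;
}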

Ben


Ben Goodrich

May 2, 2013, 1:47:59 PM
to stan-...@googlegroups.com

Attached is an R script that demonstrates this idea of circumventing the Kronecker product. It could be adapted to Stan syntax fairly easily, but it would probably be worth the effort to avoid multiplying by elements of D - a * W that are zero.

Ben


quadratic_kronecker.txt

Marcus Brubaker

May 3, 2013, 5:44:24 PM
to stan-...@googlegroups.com
Hi Marcel,

I've been working on something related to this, and I realized that perhaps the easiest way to do what you need in Stan in a reasonably efficient way is to implement a matrix_normal(_prec) distribution. I believe the distribution you're computing can be formulated in terms of a matrix normal distribution; see http://en.wikipedia.org/wiki/Matrix_normal_distribution for details.
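The connection is presumably the standard vec identity for the matrix normal distribution (U the among-row and V the among-column covariance matrix):

\operatorname{vec}(X) \sim \mathcal{N}\!\left(\operatorname{vec}(M),\; V \otimes U\right) \iff X \sim \mathcal{MN}_{n \times p}(M, U, V),

so a multivariate normal whose covariance (or precision) is a Kronecker product, like the MVCAR here, is exactly a matrix normal with the corresponding row and column matrices.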

If this is the case, let me know and I should be able to get a fairly efficient version of that into Stan pretty quickly.

Cheers,
Marcus

M.F. Jonker

May 4, 2013, 3:42:06 AM
to stan-...@googlegroups.com
Hi Marcus,

You're right! :) A matrix_normal_prec distribution would be a great solution (!)
So, if/once you have a working version, I would love to use it.

Thanks,
Marcel


M.F. Jonker

May 6, 2013, 11:33:25 AM
to stan-...@googlegroups.com
Dear all,

Ben has been so kind as to explain where the relationship between F ~ multi_normal_prec(zeros, prec_matrix) and his previously posted multiplication y' * prec_matrix * y comes from. For those interested (I thought it was pretty cool), I've attached it here.


Let Sigma_inv be the precision matrix defined by Kronecker_product( D - a*W , corr_matrix ). Then the log density of F ~ multi_normal_prec(zeros, Sigma_inv) is proportional to (see http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Density_function )

0.5 * log(det(Sigma_inv)) - 0.5 * (F - zeros)' * Sigma_inv * (F - zeros)

or, since the vector of zeros is irrelevant,

0.5 * log(det(Sigma_inv)) - 0.5 * F' * Sigma_inv * F

So, it is worth looking into optimized ways to calculate F' * Sigma_inv * F where you don't have to explicitly form the big, sparse Sigma_inv matrix via Kronecker_product( D - a*W , corr_matrix ).
I neglected to mention the part about log(det(Sigma_inv)) but again you would want to exploit the properties of the Kronecker product (see http://en.wikipedia.org/wiki/Kronecker_product#Properties ) and write

log(det(Sigma_inv))

as

n * log(det(D - a*W)) + p * log(det(corr_matrix))

where D - a*W is of order p and corr_matrix is of order n; each determinant is raised to the order of the other factor in the Kronecker product.
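In display form, this is the Kronecker determinant identity

\log\det\left[(D - aW) \otimes \Lambda\right] \;=\; n \log\det(D - aW) \;+\; p \log\det(\Lambda),

for D - aW of order p and the correlation matrix \Lambda of order n.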

All together, instead of doing

F ~ multi_normal_prec( zeros, Kronecker_product( D - a*W , corr_matrix ) );

do the equivalent two steps

lp__ <- lp__ + 0.5 * (n * log_determinant(DmaW) + p * log_determinant(corr_matrix)); // determinant part
lp__ <- lp__ - 0.5 * that_trick_I_posted; // kernel part

It may be possible to speed this up further. For example, instead of doing log(det(corr_matrix)), I think it would be faster to take the Cholesky factor of corr_matrix and calculate twice the sum of the log diagonal elements, but for a 4x4 matrix, it probably won't make much difference. Another thing is to not literally compute D - a * W but instead exploit the fact that the elements of W are either 0 or 1 and do

matrix[p,p] DmaW;
for (i in 1:p) {
  DmaW[i,i] <- diagD[i];       // i'th diagonal element
  for (j in 1:p) {
    if (j != i) {
      if (W[i,j] == 1)
        DmaW[i,j] <- -a;       // neighbors
      else
        DmaW[i,j] <- 0;        // non-neighbors must be set explicitly, or they stay undefined
    }
  }
}

Again, it may be faster to take the Cholesky factor of DmaW and use that to calculate the log determinant than to do log(det(DmaW)), which uses a QR factorization.

Ben

Kyle Foreman

May 9, 2013, 3:55:22 PM
to stan-...@googlegroups.com
I just caught up on this thread and realized I should've been following it more closely since we're trying to do very similar things, Marcel! 

Re: using lp__ += instead of multi_normal_prec(), I found about a 2x speedup in my own case (where W is 51x51). Would be curious as to how that speedup scales to your larger matrix. 

And I'll have to try your trick for computing DmaW more efficiently, Ben - will report back on its effectiveness.

Best
Kyle

Kyle Foreman

May 9, 2013, 10:51:00 PM
to stan-...@googlegroups.com
In case anyone is interested, just wanted to update that in my case using the loop to fill in the DmaW matrix made no difference in terms of speed. Maybe for something bigger than a [51,51] matrix it'll have a bigger impact though?

tds151

Dec 18, 2013, 5:17:41 PM
to stan-...@googlegroups.com
Hi, while one could certainly directly update lp__ for the log_det and matrix-variate kernel associated with a separable covariance matrix (e.g. a covariance matrix formulated as a tensor product of lower-order covariance matrices), a matrix-variate normal (that would input precision matrices) would be very nice, as it would presumably provide an easy-to-use implementation. I didn't find it in the manual for Stan 2.0.1. Did this ever happen?

thank you, terrance.

M.F. Jonker

Dec 19, 2013, 6:01:11 AM
to stan-...@googlegroups.com
Hi Terrance,

Marcus has been so kind as to send me a pre-final version of the matrix_normal_prec distribution, and it works like a charm for small spatial matrices. However, with more areas (e.g. 100+), Stan quickly becomes orders of magnitude slower than GeoBUGS.

Cheers,
Marcel



terrance savitsky

Dec 19, 2013, 10:26:35 AM
to stan-...@googlegroups.com
Thank you for your reply and input on the relative performance. The dimension of each GP trajectory would be about 300 for me, so I take note. In my case, however, I'd like to additively chain together multiple covariance kernels, while also modeling the dependence among a set of units where each expresses a function (drawn from a GP). So GeoBUGS isn't going to cut it. In theory, Stan has the flexibility to model something like this.
 
Re: speed, I know the best approach would be to project an N-dimensional GP to a lower dimension N_1 < N. The resulting approximation would also numerically stabilize the covariance matrix. This is what Aki implements in GPstuff in a variety of flavors. Though Stan is a wonderful tool, I've never achieved reasonable chain mixing in Stan under a variety of research-calibre models due to scale differences across the parameter space. A black-box HMC with a sparse/diagonal mass matrix just doesn't seem to work for anything I've ever tried to implement. So before I invest a lot of time to perform the projection to a lower dimension myself (by directly updating lp__) in Stan, I'd like to have a quicker solution that I can test to see if I achieve good mixing. My model is less hierarchical than is typical for my work, so I'm hoping this time I will achieve good mixing.
 
So Marcus, may I have a copy of your matrix_normal() distribution? I also found a post about a multi_gp() distribution. Is this available as well?
 
Thank you, Terrance


Bob Carpenter

Dec 19, 2013, 2:52:39 PM
to stan-...@googlegroups.com
Does anyone know why GeoBUGS is faster? I barely understand these models, but I'm wondering if it's something we could code into Stan.

- Bob

Andrew Gelman

Dec 19, 2013, 2:55:48 PM
to stan-...@googlegroups.com
Yah, things shouldn't be that way!

Marcus Brubaker

Dec 19, 2013, 8:05:42 PM
to stan-...@googlegroups.com
I don't know GeoBUGS very well but I can pretty easily guess two of the major culprits:

- Stan has no support for sparse matrices and some of these are very sparse.  Alternatively, we don't support the specific spatial distributions which could be specified in a more restricted form without requiring the use of sparse matrices.
- A smart implementation of these distributions would cache the Cholesky factorizations, log determinants, etc of matrices which are only a function of data.  Stan has no capacity to do this, making some models *very* inefficient because the same large matrix must be refactored at every log_prob evaluation.

I've actually been thinking about how we could solve the second issue if we were so inclined by adding an optional "cache" structure where the *_log functions could store and reuse information about data parameters.  I don't really have time to do it, but it could significantly speed up GP prediction models as well as these kinds of spatial distributions.

Cheers,
Marcus

terrance savitsky

Dec 19, 2013, 8:39:53 PM
to stan-...@googlegroups.com
Thanks for providing more insight into how Stan works. I'd appreciate your defining 'refactoring'. Let's say there is a single predictor used to generate a squared exponential GP covariance matrix, so it would look something like:

G = sigma^2 * exp(-rho^-1 * Omega)

where Omega is solely a function of the data and sigma and rho are parameters. So G must be re-built and the Cholesky re-computed. Is this the refactoring? Of course, it all gets worse if there are multiple predictors that each index a length-scale parameter. Then Omega is a dense function of parameters and data that is not separable and must be re-built on every sampling iteration.
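As a concrete picture of that rebuild cost, here is a sketch of such a model in Stan (hypothetical names; Omega holds the squared distances as data, and in practice a small jitter on the diagonal of G is often needed for numerical stability):

data {
  int<lower=1> N;
  matrix[N,N] Omega;    // squared distances between inputs (data only)
  vector[N] y;
}
parameters {
  real<lower=0> sigma;
  real<lower=0> rho;
}
model {
  matrix[N,N] G;
  for (i in 1:N)
    for (j in 1:N)
      G[i,j] <- sigma^2 * exp(-Omega[i,j] / rho);  // rebuilt at every log_prob evaluation
  y ~ multi_normal(rep_vector(0.0, N), G);         // and re-factored here every time
}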

I tried to find something about the sampler GeoBUGS uses for the squared exponential, but I couldn't find anything. Ironically, I wondered if the sampler was HMC, but with analytic derivatives. I thought this because I think GeoBUGS focuses on a single-predictor, squared exponential GP.

terrance.

terrance savitsky

Dec 20, 2013, 8:07:20 AM
to stan-...@googlegroups.com
Marcus, may I have a copy of the matrix_normal() distribution you wrote? I will have one of the two covariance matrices formulated as a sum of squared exponential GP formulations. Although it will be 300 x 300, I'd like to see if I'm able to achieve good mixing in Stan.
 
thank you, Terrance.

Bob Carpenter

Dec 20, 2013, 1:39:58 PM
to stan-...@googlegroups.com
On 12/19/13, 8:39 PM, terrance savitsky wrote:
> Thanks for providing more insight into how Stan works. I'd appreciate your defining 'refactoring'.

We talk about this a lot, so it's worth mentioning with its own subject. You could do worse than reading this:

http://en.wikipedia.org/wiki/Code_refactoring

The first paragraph is:

Code refactoring is a "disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior",[1] undertaken in order to improve some of the nonfunctional attributes of the software. Advantages include improved code readability and reduced complexity to improve the maintainability of the source code, as well as a more expressive internal architecture or object model to improve extensibility.

It's a grey line as to when something moves from being "refactoring"
into more general code development.

Fowler's book is great, by the way, even if it is in Java.

- Bob

Marcus Brubaker

Dec 20, 2013, 7:25:39 PM
to stan-...@googlegroups.com
I don't think this was quite what he was after.  :)  

The 'refactoring' in this case was the need to repeatedly compute the Cholesky factorization of the same matrix at every log_prob call.

Cheers,
Marcus






Marcus Brubaker

Dec 20, 2013, 7:30:42 PM
to stan-...@googlegroups.com
Terrance,

You can find the version of matrix_normal_prec in this branch of my personal repo: https://github.com/mbrubake/stan/tree/improper_matrix_normal_prec. Note that it's based on an ancient (by Stan standards) version of Stan, so it's likely to have issues, and it uses the old command line interface. Also, note that the only thing it really does is skip the positive-definiteness check of matrices which are data. This avoids some unnecessary computation and lets you use a semi-definite matrix; however, you need to make sure the result isn't an improper posterior, so use at your own risk.

Cheers,
Marcus