random effects models with known covariance matrix


f....@gmx.net

Sep 10, 2012, 12:22:18 PM
to stan-...@googlegroups.com
Hello,

I wanted to try using Stan to fit more complex random effects models, where the random effects have covariance matrices that are known up to a constant.

I tried to fit the basic "animal model" (used for computing breeding values for selection purposes) to get started. In this model, each individual is assigned a random effect, which has the following prior


data { 
  matrix[N,N] A ;
         ...
}

parameters { 
  vector[N] u;
  real <lower=0> sigmasq_u;
 ...
}

... 
model{
u ~ multi_normal(0, A * sigmasq_u);
}

where A is the "numerator relationship matrix" and is known from pedigree, theory or from genomic marker data. 

Even for a relatively moderate size of N = 500 individuals, Stan takes extremely long to get through just 500 iterations, and for N = 1000 it uses up all of my 16 GB of memory.
This can't be right, even if Stan is inverting A over and over again.


The code is attached. Any suggestion would be appreciated.


Regards,

Frank

animal_model.txt

Marcus Brubaker

Sep 10, 2012, 12:46:52 PM
to stan-...@googlegroups.com

One thing you can try is to use multi_normal_cholesky.

I.e., add

transformed data {
matrix[N,N] A_chol;
A_chol <- cholesky_decompose(A);
}

then in the model

u ~ multi_normal_cholesky(0,sqrt(sigmasq_u)*A_chol);

Also, you should be able to use the vectorized version of the normal distribution. I.e., instead of the for loop over the ys, write

y ~ normal(beta0 + u, sqrt(sig2_e));
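
Putting the two suggestions together, a sketch of the whole model (I'm guessing at the names N, y, beta0, sigmasq_u and sig2_e from your attachment, so adjust to match):

data {
  int<lower=1> N;
  vector[N] y;
  matrix[N,N] A;
}
transformed data {
  matrix[N,N] A_chol;
  vector[N] zeros;
  A_chol <- cholesky_decompose(A);
  for (i in 1:N)
    zeros[i] <- 0.0; // explicit zero mean vector for multi_normal_cholesky
}
parameters {
  real beta0;
  vector[N] u;
  real<lower=0> sigmasq_u;
  real<lower=0> sig2_e;
}
model {
  u ~ multi_normal_cholesky(zeros, sqrt(sigmasq_u) * A_chol);
  y ~ normal(beta0 + u, sqrt(sig2_e)); // vectorized, no loop needed
  // plus whatever priors you have on beta0, sigmasq_u and sig2_e
}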

Cheers,
Marcus

Bob Carpenter

Sep 10, 2012, 2:27:38 PM
to stan-...@googlegroups.com


On 9/10/12 12:22 PM, f....@gmx.net wrote:
>...
> data {
> matrix[N,N] A ;
> ...
> }
>
> parameters {
> vector[N] u;
> real <lower=0> sigmasq_u;
> ...
> }
>
> ...
> model{
> u ~ multi_normal(0, A * sigmasq_u);
> }
>
> where A is the "numerator relationship matrix" and is known from pedigree, theory or from genomic marker data.
>
> Even for a relatively moderate size of N = 500 individuals, Stan takes extremely long to get through just 500 iterations,
> and for N = 1000 it uses up all of my 16 GB of memory.
> This can't be right, even if Stan is inverting A over and over again.

Unfortunately, it can be. Because sigmasq_u is a parameter,
A * sigmasq_u is implicitly a transformed parameter. That means
when the solver's run on it in this basic parameterization, it needs
to run an O(N^3) algorithm on it. That generates O(N^3) subexpressions,
each of which requires approximately 12 bytes of storage.
The whole expression graph gets stored in order to evaluate
gradients.

With N = 500, that's 500^3 = 1.25e8 subexpressions (on the
order of 1.5 GB), and with N = 1000 it's 1e9 (on the order
of 12 GB), so I'm surprised you could get Stan off the
ground at all.

As Marcus suggested, the Cholesky parameterization will be
MUCH more efficient.

- Bob

Ben Goodrich

Sep 10, 2012, 8:30:41 PM
to stan-...@googlegroups.com
On Monday, September 10, 2012 12:46:53 PM UTC-4, Marcus wrote:

[...]

Another thing to try (especially if RAM is the binding constraint) is to do the multivariate normal part yourself in the .stan file with

lp__ <- lp__ + whatever;

statements. If A_chol is fixed but sigmasq_u is a parameter, then sqrt(sigmasq_u)*A_chol is a parameter, and multi_normal_cholesky() is doing a giant triangular solve every time the function gets called, which also spews out a lot of autodiff calculations. If you invert A once at the outset (to get A_inv) and are careful not to divide by sigmasq_u too soon, it will go faster. Something like

lp__ <- lp__ - 0.5 * N * log(sigmasq_u); // non-constant part of log(det(sigmasq_u * A)^-0.5)
lp__ <- lp__ - (u' * A_inv * u) / ( 2 * sigmasq_u); // log of kernel of multivariate normal
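
In context, the blocks would look something like this (a sketch only, with A inverted once as transformed data and N, A, u, sigmasq_u as in your original post):

transformed data {
  matrix[N,N] A_inv;
  A_inv <- inverse(A); // A is data, so this happens once, with no autodiff
}
model {
  lp__ <- lp__ - 0.5 * N * log(sigmasq_u); // non-constant part of log(det(sigmasq_u * A)^-0.5)
  lp__ <- lp__ - (u' * A_inv * u) / (2 * sigmasq_u); // log of kernel of multivariate normal
  // rest of the likelihood and priors as before
}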

Ben

Frank

Sep 11, 2012, 4:32:24 AM
to stan-...@googlegroups.com
The custom-made multivariate normal sampler suggested by Ben really did the trick! Thanks a lot.

Memory is not an issue anymore, even with N = 1000. 

I think that Stan is now at least not slower than JAGS/MCMCglmm on a per-iteration basis. Considering that Stan is supposed to need far fewer total iterations for a given ESS, it may have quite an edge.

Best regards,

Frank

Ben Domingue

Sep 11, 2012, 7:12:23 AM
to stan-...@googlegroups.com
Frank,
Would you mind sharing your updated code? I'm not quite clear on how the accumulator trick is altering the sampling part for u.

This has been a tremendously helpful thread.

Ben

Frank

Sep 11, 2012, 7:18:06 AM
to stan-...@googlegroups.com
Here it is. From simulated data, it seems that the results are very sensible.

Frank
animal_model.txt

Frank

Sep 11, 2012, 7:19:54 AM
to stan-...@googlegroups.com
I replied to the wrong post. So here is the code. From simulations it seems that the results produced are very sensible.

Frank
animal_model.txt

Ben Goodrich

Sep 11, 2012, 9:44:15 AM
to stan-...@googlegroups.com
If you want a little more juice, you can exploit the fact that

u' * A_inv * u = u' * ( LA_inv * LA_inv' ) * u = (u' * LA_inv) * (LA_inv' * u) = v' v

where LA_inv is the Cholesky factor of A_inv and v = LA_inv' * u. Since LA_inv is lower triangular, the lower triangle of LA_inv' is all zeros. Stan is smart enough to ignore multiplication and addition involving zeros and also knows not to carry through a bunch of constant autodiffs.

So, the blocks of your .stan file would look like

transformed data {
  matrix[N,N] A_inv;
  matrix[N,N] t_LA_inv;
  A_inv <- inverse(A);
  t_LA_inv <- transpose(cholesky(A_inv));
}

model {
  vector[N] v;
  real sum_of_squares;

  v <- t_LA_inv * u;

  sum_of_squares <- 0.0; // or sum(pow(v, 2)) may / should work
  for (i in 1:N) {
    sum_of_squares <- sum_of_squares + pow(v[i], 2);
  }

  lp__ <- lp__ - 0.5 * N * log(sigmasq_u); // non-constant part of log(det(sigmasq_u * A)^-0.5)
  lp__ <- lp__ - sum_of_squares / (2 * sigmasq_u); // log of kernel of multivariate normal

  // rest of your likelihood and priors
}


Ben

Marcus Brubaker

Sep 11, 2012, 11:04:57 AM
to stan-...@googlegroups.com
With this latest version you should be able to write:

v ~ normal(0,sqrt(sigmasq_u));

And get almost exactly the same performance without having to muck
about with the log probability directly.
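
To be concrete, a sketch of how that would look (assuming t_LA_inv from Ben's transformed data block and u, sigmasq_u declared as in Frank's model):

model {
  vector[N] v;
  v <- t_LA_inv * u;
  v ~ normal(0, sqrt(sigmasq_u)); // vectorized; replaces the lp__ bookkeeping
  // rest of the likelihood and priors
}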

Or, alternatively, if you really care about getting the most speed
possible leave the lp__ statements as they are and replace the loop

> sum_of_squares <- 0.0; // or sum(pow(v, 2)) may / should work
> for(i in 1:N) {
> sum_of_squares <- sum_of_squares + pow(v[i], 2);
>
> }

with

sum_of_squares <- dot_product(v,v);

which will be a fair bit faster than the loop, particularly for large N.

Cheers,
Marcus

Bob Carpenter

Sep 11, 2012, 4:46:54 PM
to stan-...@googlegroups.com
Glad to hear it worked.

The basic idea is to just generate values for the
missing data using the same model as for the non-missing
data. In this situation, zero predictors is just a very
small set of predictors. So you just sample unconditionally
from the prior rather than conditioning on predictors.

- Bob


On 9/11/12 7:19 AM, Frank wrote:
> I replied to the wrong post. So here is the code. From simulations it seems that the results produced are very sensible.
>
> Frank

Ben Domingue

Sep 12, 2012, 5:32:03 PM
to stan-...@googlegroups.com
Hello all,
I've been trying to get estimation along these lines to work for a few
days now with no success. My attempts based on Frank's and Ben's
suggestions are attached (with errors at bottom). I also tried to use
Marcus's suggestion but got the same error regarding the Cholesky
decomposition. It looks to me like there are some type problems with
how it's currently set up, but the details are a little fuzzy. Any
advice?

I know at least one of the solutions (only the last suggestion from
Marcus, I believe) was only supposed to work in 1.0.1. I'm still
running 1.0.0, as I get the error at bottom when I attempt to install
1.0.1.

Thanks again!
Ben


r<T, std::allocator<_CharT> >&) [with T = double]':
chains.cpp:150: instantiated from here
../inst/include/stansrc/stan/prob/autocovariance.hpp:63: error:
return-statement with a value, in function returning 'void'
make: *** [chains.o] Error 1
ERROR: compilation failed for package 'rstan'
* removing '/home/domingue/R/x86_64-redhat-linux-gnu-library/2.15/rstan'
Warning in install.packages("rstan", type = "source") :
installation of package 'rstan' had non-zero exit status
frank_solution.txt
ben_solution.txt

Ben Goodrich

Sep 12, 2012, 9:27:12 PM
to stan-...@googlegroups.com
On Wednesday, September 12, 2012 5:32:25 PM UTC-4, Ben wrote:
[...]
ERROR: compilation failed for package 'rstan'

Our bad.

## TRANSLATING MODEL 'anon_model' FROM Stan CODE TO C++ CODE NOW.

## Error in stanc(file = file, model_code = model_code, model_name = model_name,  :
##   failed to parse Stan model 'anon_model' and error message provided as:
## LOCATION:  file=input; line=11, column=40

##   t_LA_inv <- transpose(cholesky(A_inv));
##                                        ^-- here

## DIAGNOSTIC(S) FROM PARSER:

## no matches for function name="cholesky"
##     arg 0 type=matrix
## expression is ill formed
## Parser expecting: <eps>

Contrary to what I said, the function in question is called cholesky_decompose() rather than cholesky().
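
For anyone copying from above, the corrected transformed data block is:

transformed data {
  matrix[N,N] A_inv;
  matrix[N,N] t_LA_inv;
  A_inv <- inverse(A);
  t_LA_inv <- transpose(cholesky_decompose(A_inv));
}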

Ben

 

Ben Goodrich

Sep 12, 2012, 9:28:56 PM
to stan-...@googlegroups.com
On Wednesday, September 12, 2012 5:32:25 PM UTC-4, Ben wrote:
[...]
ERROR: compilation failed for package 'rstan'

BTW, can you tell us the output of

g++ --version

?

Thanks,
Ben

Ben Domingue

Sep 12, 2012, 9:50:22 PM
to stan-...@googlegroups.com
~>g++ --version
g++ (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Ben Domingue

Sep 12, 2012, 10:47:12 PM
to stan-...@googlegroups.com
That does explain it. Just a note, but I'm finding a reference to
cholesky() on page 119 but no cholesky_decompose().

With that change, the solution using the Cholesky decomposition starts
producing the same error as the version produced by Frank's code
(which used just the inverse):

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Error in sampler$call_sampler(args_list[[i]]) :
No acceptably small step size could be found. Perhaps the posterior
is not continuous?

What is strange about this error is that if I restrict the data to a
small enough sample things run just fine. This makes me think I'm at
least not doing something too dumb, but clearly I'm still out in left
field.

Thanks!
Ben

Ben Domingue

Sep 12, 2012, 10:48:29 PM
to stan-...@googlegroups.com
Sorry to complicate things. That output (the 4.6.3 version of g++)
was from my laptop. The server that is refusing to update Stan is older:
g++ (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4)
Copyright (C) 2010 Free Software Foundation, Inc.

I'll ask to have it upgraded as that could definitely be part of the
problem (2010!).

Ben

Ben Goodrich

Sep 12, 2012, 11:11:52 PM
to stan-...@googlegroups.com
On Wednesday, September 12, 2012 10:47:33 PM UTC-4, Ben wrote:
That does explain it. Just a note, but I'm finding a reference to
cholesky() on page 119 but no cholesky_decompose().

That got me too. The documentation is wrong.
 
With that change, the solution using the Cholesky decomposition starts
producing the same error as the version produced by Frank's code
(which used just the inverse):

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Error in sampler$call_sampler(args_list[[i]]) :
  No acceptably small step size could be found. Perhaps the posterior
is not continuous?

Can you attach the code that simulates the data?

Ben

Daniel Lee

Sep 12, 2012, 11:16:33 PM
to stan-...@googlegroups.com
Please grab the latest version of the documentation. This and other
typos have been corrected for v1.0.1.

Ben Goodrich

Sep 12, 2012, 11:26:04 PM
to stan-...@googlegroups.com
Good. But, in general, would it be easy to make stanc --functions spit out all the exposed functions / signatures?

Daniel Lee

Sep 12, 2012, 11:32:27 PM
to stan-...@googlegroups.com
On Wed, Sep 12, 2012 at 11:26 PM, Ben Goodrich <goodri...@gmail.com> wrote:
> Good. But, in general, would it be easy to make stanc --functions spit out
> all the exposed functions / signatures?

I'm not sure how easy / hard that would be, but I think viewing it
through the command line isn't the way to go. We already have 200+
separate function signatures.

Ben Domingue

Sep 13, 2012, 9:49:45 AM
to stan-...@googlegroups.com
In preparing a simulated example, I found the problem. Thanks for the nudge.

Ben

Ben Goodrich

Sep 13, 2012, 10:59:43 AM
to stan-...@googlegroups.com
That is certainly good. It would still help if you could attach the simulation code and the .stan file you ultimately ended up using, so when someone else asks about this in the future, we can just link to your post. Or we might want to incorporate it into the manual in the future or something.

Ben

Bob Carpenter

Sep 13, 2012, 2:41:52 PM
to stan-...@googlegroups.com
On 9/12/12 11:26 PM, Ben Goodrich wrote:
> Good. But, in general, would it be easy to make
> stanc --functions spit out all the exposed functions / signatures?

It wouldn't be difficult in the sense that it'd just
be a big block of prints within an if statement in the
command code.

The nice part of such a list is that it would be
guaranteed to match the list of functions for the release
the user was using. The not so nice part of such a list
is that we'd have to keep it in synch. So far we have:

1. actual function definitions in C++
2. function_signatures specification for Stan parser
3. LaTeX manual documentation

and we want to add

4. LaTeX manual cheat-sheet short lists

and perhaps

5. a list spit out using --functions on stanc

The only difficulty is in maintenance -- making sure the
list printed by stanc using a --functions flag matches
what we really have.

I'm also not convinced that the ASCII output would
be that readable, but hey, neither is the output for
"man g++" or even "man tar".

Andrew's already requested a few such tables for the manual.
I'm going to look at auto-generating the table as
a kind of index structure in LaTeX once I figure out how to format
the argument and result types. If at the same time I can
generate a matching ASCII file of functions, then adding
a flag to stanc to spit out the functions would make sense.

One issue in such a list is how to list vectorized function
signatures. I'm just using the pseudo-type "reals" in the manual,
which can match simple scalars, scalar arrays, and Eigen
vectors or row vectors.

- Bob

Ben Domingue

Sep 14, 2012, 11:34:25 AM
to stan-...@googlegroups.com
Howdy all,
I've wrapped the few examples I've been working with into the attached
monstrosity. To offer a little context, this paper is the jumping-off
point:
http://www.gllamm.org/BIOMtwins_08.pdf

I'm trying to demonstrate that Stan makes the tricks explored in that
paper unnecessary, since one can structure the covariance matrix of the
random effects directly. To that end, the attached code comes in two
big blocks:
1. Simulate family data to correspond to Section 5.2 of the referenced paper.
2. Estimate the model directly based on the structure of the A covariance matrix.

If anyone has thoughts on this (or finds errors), I'd be interested in
hearing them. Thanks again for all the help.

Ben

PS: While I'm writing, I have two additional questions:
1. I'm wondering if there are plans to add the skew-normal
distribution to the distributions one can access.
2. This block of Stan code:

  vector[ni] vA;
  real sum_of_squaresA;
  vA <- t_LA_inv * uA;

  sum_of_squaresA <- dot_product(vA,vA);
  lp__ <- lp__ - 0.5 * ni * log(var_l2A); // non-constant part of log(det(var_l2A * A)^-0.5)
  lp__ <- lp__ - sum_of_squaresA / (2 * var_l2A); // log of kernel of multivariate normal
magically leads to sampling of the uA random effects. I'd be really
interested in knowing more about how this happens.
sim_problems.R

Ben Goodrich

Sep 14, 2012, 2:39:57 PM
to stan-...@googlegroups.com
On Friday, September 14, 2012 11:34:46 AM UTC-4, Ben wrote:
Howdy all,
I've wrapped the few examples I've been working with into the attached
monstrousity. To offer a little context, this paper is the jumping off
point:
http://www.gllamm.org/BIOMtwins_08.pdf

Thank you.

PS-While I'm writing, I have two add'l questions:
1. I'm wondering if there are plans to add the skew-normal
distribution to the distributions one can access.

To the extent there are "plans", they are usually referenced in TO-DO.txt in the top level of the Stan directory, which can also be read at

https://code.google.com/p/stan/source/browse/TO-DO.txt

In general, we plan to add more distributions, although I have not heard of any mention of the skew-normal. However, adding a distribution to Stan is probably one of the easiest ways to contribute, even if you know very little C++. Just copy

src/stan/prob/distributions/univariate/continuous/normal.hpp

to

src/stan/prob/distributions/univariate/continuous/skew_normal.hpp

Then, hack skew_normal.hpp to add the skewness parameter (say alpha). Finally, you need to expose the skew_normal distribution to the parser, which we can help you with when the time comes.

Alternatively, it is possible to use any (log) density that can be written in closed form inside the .stan file using the

lp__ <- lp__ + whatever;

syntax. In the case of the skew-normal,

https://en.wikipedia.org/wiki/Skew_normal_distribution#Definition

you can just utilize the (log) normal density and add this line to the model{} block

lp__ <- lp__ + log(normal_cdf(alpha * x));

provided that alpha is defined in the parameters{} block. But you presumably want an informative prior on alpha to prevent alpha * x from going into a region of the parameter space where log(normal_cdf(alpha * x)) cannot be evaluated accurately in floating point arithmetic.

2. This block of Stan code:

  vector[ni] vA;
  real sum_of_squaresA;
  vA <- t_LA_inv * uA;

  sum_of_squaresA <- dot_product(vA,vA);
  lp__ <- lp__ - 0.5 * ni * log(var_l2A); // non-constant part of log(det(var_l2A * A)^-0.5)
  lp__ <- lp__ - sum_of_squaresA / (2 * var_l2A); // log of kernel of multivariate normal
 
magically leads to sampling of the uA random effects. I'd be really
interested in knowing more about how this happens.

This was primarily intended to ease the computational burden that Frank was up against when the A matrix was fixed but huge. In general, one could simply use the non-magic

u ~ multi_normal(mean_vector, var_l2A * A);

However, this was wasteful in Frank's case: var_l2A is an unknown parameter, so var_l2A * A is an unknown parameter too, and Stan needs to invert var_l2A * A and carry thousands of auto-differentiation sub-expressions to calculate the partial derivative with respect to var_l2A. We can avoid this by noting that

(var_l2A * A)^-1 = (1/var_l2A) * A^-1

and further noting that A^-1 is a constant iff A is a constant. Unfortunately, at the moment Stan does not have a parameterization of the multivariate normal in terms of the precision matrix; it is only parameterized in terms of the covariance matrix. Thus, we have to implement the desired parameterization of the (log) multivariate normal density within the .stan file. See

https://en.wikipedia.org/wiki/Multivariate_normal#Non-degenerate_case

There are several mathematically equivalent ways of proceeding that differ in computation time. Note that Frank's prior mean vector is zero, so we can ignore that part. The simplest way is to write

sum_of_squares <- u' * A_inv * u;

However, this calculation does not exploit the fact that A_inv is symmetric, nor does it exploit the fact that the gradient wrt u is known to be A_inv * u. So, Stan would have to eventually arrive at this fact via the chain rule.

It is better to exploit the fact that

A_inv = L * L'

where L is a Cholesky factor of A_inv, in which case

u' * A_inv * u = u' * (L * L') * u = v' * v

where v = L' * u. This formulation is helpful to Stan in two ways. First, L' has all zeros in its upper triangle so when it is post-multiplied by u, Stan does not waste (much) time multiplying elements of u by zero and more importantly does not add constant branches to the expression tree that it uses to calculate the gradient wrt u. Second, Marcus implemented a special version of dot_product(v,v) that is faster and uses less memory. It would be even better if we exposed the dot_self(v) function to the parser, but at the moment we don't.

At this point, the elements of the transformed vector v (or vA as you call it) are each distributed univariate normal with variance var_l2A, so you could just write

vA ~ normal(0, sqrt(var_l2A));

but Marcus says it is slightly faster not to do that and instead add the lines

sum_of_squaresA <- dot_product(vA,vA);
lp__ <- lp__ - 0.5 * ni * log(var_l2A); // non-constant part of log(det(var_l2A * A)^-0.5)
lp__ <- lp__ - sum_of_squaresA / (2 * var_l2A); // log of kernel of multivariate normal

Note that

- sum_of_squaresA / (2 * var_l2A) = - 0.5 * u' * (var_l2A * A)^-1 * u

which is the kernel of the (log) multivariate normal density. But we also have to remember that det(var_l2A * A) is a function of the unknown parameter var_l2A, which is reflected in the second line. But we do not want to literally calculate that determinant (or any determinant of a dense matrix if we can avoid it). Thus, we use the identity for the determinant of a scalar times a matrix given at

https://en.wikipedia.org/wiki/Determinant#Properties_of_the_determinant

and take the log to get - 0.5 * ni * log(var_l2A).
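
Spelled out for our case, with A being ni x ni:

det(var_l2A * A) = var_l2A^ni * det(A)

so

log(det(var_l2A * A)^-0.5) = -0.5 * ni * log(var_l2A) - 0.5 * log(det(A))

and the second term is constant, so it can be dropped.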

Ben

Bob Carpenter

Sep 14, 2012, 3:52:15 PM
to stan-...@googlegroups.com


On 9/14/12 11:34 AM, Ben Domingue wrote:
> Howdy all,
> I've wrapped the few examples I've been working with into the attached
> monstrousity. To offer a little context, this paper is the jumping off
> point:
> http://www.gllamm.org/BIOMtwins_08.pdf
>
> I'm trying to demonstrate that stan makes the tricks explored in that
> paper unnecessary since one can structure the covariance matrix of
> random effects directly. To that end, the attached code comes in two
> big blocks:
> 1. Simulate family data to correspond to Section 5.2 of the referenced paper.
> 3. Estimate model directly based on structure of A covariance matrix.
>
> If anyone has thoughts on this (or finds errors), I'd be interested in
> hearing them. Thanks again for all the help.

I hope someone else who's better at multivariate modeling
than me can take a look at your code.

We're going to be adding more data types to make
definining covariance structures easier. (Specifically
a Cholesky factor type.)

> PS-While I'm writing, I have two add'l questions:
> 1. I'm wondering if there are plans to add the skew-normal
> distribution to the distributions one can access.

We're waiting for user feedback on what to add. It's
pretty easy at this point for us to add vectorized log
probability functions -- around a person day or less with
testing. Is there an R function somewhere you trust we
can use to generate unit tests? (Once we get some basic
values we trust, then we can use the function itself to
test the gradients with finite differences.)

The CDFs for truncation can be much trickier, because
we want to be able to differentiate w.r.t. the parameters
as well as the outcome.

> 2. This block of stan code:
> vector[ni] vA;
> real sum_of_squaresA;
> vA <- t_LA_inv * uA;
> #
> sum_of_squaresA <- dot_product(vA,vA);
> lp__ <- lp__ - 0.5 * ni * log(var_l2A); // non-constant part of log(det(var_l2A * A)^-0.5)
> lp__ <- lp__ - sum_of_squaresA / (2 * var_l2A); // log of kernel of multivariate normal
> magically leads to sampling of the uA random effects. I'd be really
> interested in knowing more about how this happens.

I'm not sure what exactly you're asking. There's
a description of how Stan works in the manual.

The basic idea is that lp__ is the log probability accumulator.
You can write an entire Stan model without ~ by incrementing
lp__ directly. We then automatically differentiate the function
defined by the model and use it in NUTS. There's a transform
step for constrained variables, too. And like other Metropolis
algorithms we only need the log probability function up to a
proportion.
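
For example, these two statements have the same effect on sampling (the ~ form just drops additive constants):

y ~ normal(mu, sigma);

lp__ <- lp__ + normal_log(y, mu, sigma);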

- Bob

Bob Carpenter

Sep 14, 2012, 4:17:00 PM
to stan-...@googlegroups.com


On 9/14/12 2:39 PM, Ben Goodrich wrote:
> [...]
> adding a distribution to Stan is probably one of the easiest ways to contribute, even if you know very little C++. Just copy
>
> src/stan/prob/distributions/univariate/continuous/normal.hpp
>
> to
>
> src/stan/prob/distributions/univariate/continuous/skew_normal.hpp

It might be easier to start with an unvectorized distribution like
Student-t. The vectorized distributions like the normal
add the complexity of expression templates and heavily
templated base auto-dif classes to the already metaprogramming heavy
probability function code (which uses trait-based metaprogramming for
return types and intermediate value types and to optimize calcs up
to a proportion by eliminating constraint terms).

> [...]
> you can just utilize the (log) normal density and add this line to the model{} block
>
> lp__ <- lp__ + log(normal_cdf(alpha * x));

There is no unary normal_cdf() function. Looking
at the definition, I think you meant

x ~ normal(0, 1); // adds log(normal(x,0,1)) to lp__
lp__ <- lp__ + log(Phi(alpha * x));

You can add appropriate offsets and divide by the scale
if not centered and unit scaled.
if not centered and unit scaled.

This is additional motivation for CDFs on the log scale
in addition to numerical stability of truncation.

- Bob

Ben Goodrich

Sep 14, 2012, 5:04:16 PM
to stan-...@googlegroups.com

For the general Wikipedia parameterization of the skew-normal, I think it would be

// some priors on alpha, xi, and omega
x ~ normal(xi, omega);
lp__ <- lp__ + log(normal_cdf(alpha * (x - xi), 0, omega));

but if xi = 0 and omega = 1, then

// some prior on alpha
x ~ normal(0, 1);
lp__ <- lp__ + log(Phi(alpha * x));
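
As a self-contained sketch for the standard case (the prior on alpha here is just an illustration, not a recommendation):

data {
  int<lower=1> N;
  vector[N] x;
}
parameters {
  real alpha;
}
model {
  alpha ~ normal(0, 5); // keeps alpha * x out of regions where log(Phi()) underflows
  x ~ normal(0, 1);
  for (n in 1:N)
    lp__ <- lp__ + log(Phi(alpha * x[n]));
}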
This is additional motivation for CDFs on the log scale
in addition to numerical stability of truncation.

Indeed.

Ben

Andrew Gelman

Sep 15, 2012, 9:39:12 PM
to stan-...@googlegroups.com
Ben Domingue:
If/when you get this working, please let me know and I can tell Sophia Rabe-Hesketh about it, as she and I have been talking about the possibility of fitting various psychometrics models in Stan. Thanks.
Andrew
> <sim_problems.R>


Bob Carpenter

Oct 19, 2016, 11:37:56 AM
to stan-...@googlegroups.com, gel...@stat.columbia.edu
I didn't see the code, just colons followed by nothing.

Text in the body or a plain text attachment is best for reading.

- Bob

> On Oct 17, 2016, at 5:37 PM, Robert Corty <rco...@gmail.com> wrote:
>
> Hi all,
>
> I'm working with a model similar to the one Frank asked about in 2012 -- multilevel model with known covariance matrix.
>
> I started inefficiently with:
>
>
>
> Based on the recommendations of the STANual, I improved that to:
>
>
>
>
> Now I'm looking at the section on "Multivariate Reparametrizations" (page 233 in version 2.12.0-2) and seeing some more maths relating to the Cholesky decomposition and the multivariate normal.
> Have I already gotten the benefits of the Cholesky trick described here by using "multi_normal_cholesky"?
> Are there any other ways I can/should improve this model?
>
> Robert Corty
