Help with multivariate ordered logistic model


terrance savitsky

Sep 29, 2012, 7:30:21 PM
to stan-...@googlegroups.com
I have 2 sets of multivariate ordinal responses, y1[i,d] and y2[i,d], modeled as ordered_logistic with intrinsic traits c1[i,d] and c2[i,d] for d = 1, ..., D.   The intrinsic traits are connected by sharing hierarchical multivariate random effects.  For example, there are D x M matrices Delta[T] and Phi[S], where d = 1, ..., D indexes the multivariate dimensions and m = 1, ..., (M=2) indexes the alternate measurement modes.   Then c1[i,d] = Delta[teacher1[i]][d,1] + Phi[section1[i]][d,1] and c2[i,d] = Delta[teacher2[i]][d,2] + Phi[section2[i]][d,2].  Each latent response has its own D x ncut set of ordered cutpoints, lim1 and lim2.
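(For concreteness, a minimal sketch in Stan of the likelihood just described - this assumes the data and parameter declarations from the attached modelFAcum_lkj.stan, so treat it as an illustration rather than the actual file:)

```stan
model {
  // y1/y2, teacher*/section*, Delta/Phi, lim1/lim2 assumed declared as in the post
  for (i in 1:N1)
    for (d in 1:D)
      y1[i,d] ~ ordered_logistic(Delta[teacher1[i]][d,1]
                                 + Phi[section1[i]][d,1], lim1[d]);
  for (i in 1:N2)
    for (d in 1:D)
      y2[i,d] ~ ordered_logistic(Delta[teacher2[i]][d,2]
                                 + Phi[section2[i]][d,2], lim2[d]);
}
```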

The model crashes during sampling, with the following error message:

SAMPLING FOR MODEL 'modelFAcum_lkj' NOW (CHAIN 1).
Error in sampler$call_sampler(args_list[[i]]) : 
  Error in function stan::prob::ordered_logistic(N4stan5agrad3varE): Location parameter is 1.#QNAN:0, but must be finite!

This indicates to me that one or more of the ordered cutpoint values spins off to infinity.   In the attached model, the cutpoints are defined such that the first is fixed to 0 as a hard constraint, which is necessary to establish location identifiability.  (A soft constraint via an informative prior isn't sufficient.)   That said, I ran a simpler version with this constraint removed to see if the problem went away, but it didn't - I received the exact same error message.
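(One way to encode that hard constraint - a sketch, not necessarily what the attached file does, and assuming D and ncut are declared as data - is to sample only the free cutpoints and pin the first one in transformed parameters; positive_ordered, if your Stan version has it, keeps the remaining cutpoints increasing and above the pinned zero:)

```stan
parameters {
  positive_ordered[ncut - 1] lim1_free[D];  // cutpoints above the one fixed at 0
}
transformed parameters {
  vector[ncut] lim1[D];
  for (d in 1:D) {
    lim1[d][1] <- 0.0;              // hard constraint: first cutpoint pinned at 0
    for (j in 2:ncut)
      lim1[d][j] <- lim1_free[d][j - 1];
  }
}
```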

The model section includes statement blocks for each of the hierarchical multivariate random effects.  lp__ is incremented using a matrix normal distribution, which is equivalent to vectorizing the parameters and using a structured Kronecker product of D x D and M x M covariance matrices.   The first set of random effect parameters, Delta, composes the D x D covariance under a factor analytic structure, where cov(Delta) = Lambda * Lambda' + Theta, for D x K Lambda with K < D.   A lower triangular structure is enforced on Lambda (for rotation identification) by reading a binary "pick" matrix, B, filled with 0's and 1's, that multiplies elementwise against a D x K matrix of parameters, Lambda_s, with each element given a standard normal prior.
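(Concretely, the pick-matrix construction might read as follows in Stan - a sketch, with B passed in as data; the variable names are the ones from the post:)

```stan
data {
  int<lower=1> D;
  int<lower=1> K;
  matrix[D,K] B;            // binary pick matrix: 1 on/below the diagonal, 0 above
}
parameters {
  matrix[D,K] Lambda_s;     // unconstrained loadings, each element N(0,1) a priori
}
transformed parameters {
  matrix[D,K] Lambda;       // lower triangular loadings
  for (d in 1:D)
    for (k in 1:K)
      Lambda[d,k] <- B[d,k] * Lambda_s[d,k];  // elementwise (Hadamard) product
}
```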

My intuition is that placing informative priors on the cutpoints won't fix the problem, but perhaps all parameters need relatively informative priors, given the model complexity, to accomplish joint sampling.  I let Stan generate initial parameter values, which should be fine (under the declared constraints).  That said, I'm now experimenting with inputting my own initial values.

This model is a first step to see if I'm able to get Stan to work.   I intend to replace the ordered_logistic construction with a latent response.  I've created another script that employs tricks generously offered by Stan developers to work around the lack of functionality for by-element truncation points.   Such a construction will eventually allow me to generalize the logit link function to something semi-parametric (e.g. multimodal).   I have successfully implemented similar models in JAGS, but only for M = 1 mode.   That said, I have combined datasets in JAGS under this type of model that share portions of multivariate random effect parameters in a similar fashion.   Because JAGS requires conjugacy for covariance matrices (and doesn't allow direct updates to the joint log probability), I would have to define D*M x D*M unstructured covariance matrices to use JAGS.

Thanks for your time and help.  Terrance


-- 
Thank you, Terrance Savitsky
modelFAcum_lkj.stan
datstan.Rdump

Ben Goodrich

Sep 29, 2012, 8:33:00 PM
to stan-...@googlegroups.com
Hi Terrance,

For help debugging models --- particularly for models as complicated as this --- it would help if you could narrow the problem down yourself as much as possible before posting. Here is what I did to figure it out. If you haven't already done so, definitely install version 1.0.2 because it introduces the print() function.

I added these print lines to your .stan file:
        /* Formulate c1[i,d] and c2[i,d], mean of latent response or multi-variate trait in ordered_logistic model */
        print("Delta= ", Delta[teacher1[1]][1,1]);
        print("Phi= ", Phi[section1[1]][1,1]);
        print("Theta= ", Theta[session1[1]][1,1]);
        print("Zeta= ", Zeta[session_rater1[1]][1,1]);
        print("Beta_mean= ", Beta_mean[rater1[1]][1,1]);
        for( i in 1:N1 ) for( d in 1:D ) c1[i,d] <- Delta[teacher1[i]][d,1] + Phi[section1[i]][d,1] + Theta[session1[i]][d,1] + Zeta[session_rater1[i]][d,1] + Beta_mean[rater1[i]][d,1];
        for( i in 1:N2 ) for( d in 1:D ) c2[i,d] <- Delta[teacher2[i]][d,2] + Phi[section2[i]][d,2] + Theta[session2[i]][d,2] + Zeta[session_rater2[i]][d,2] + Beta_mean[rater2[i]][d,2];


I then compiled with -g and O=0 (you may have to add those in the Makefile) and ran it under gdb:

goodrich@CYBERPOWERPC:/tmp/Terrance$ gdb --args ./modelFAcum_lkj --data=datstan.Rdump
GNU gdb (GDB) 7.4.1-debian
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /tmp/Terrance/modelFAcum_lkj...done.
(gdb) catch throw
Catchpoint 1 (throw)
(gdb) run
Starting program: /tmp/Terrance/modelFAcum_lkj --data=datstan.Rdump
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Delta= 1.04147
Phi= -1.2006
Theta= 1.17
Zeta= 1.49939
Beta_mean= nan
Catchpoint 1 (exception thrown), 0x00007ffff791db90 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6


At this point, you could fool around in gdb to investigate further; Google shows lots of HOWTOs for gdb.

But the print() statements in the .stan file indicate that the elements of c1 (and c2) are nan because the elements of Beta_mean are nan, which happens because you attempted to use it before filling it:

goodrich@CYBERPOWERPC:/tmp/Terrance$ grep -n Beta_mean modelFAcum_lkj.stan
79:     matrix[D,M] Beta_mean[Rr];
97:     print("Beta_mean= ", Beta_mean[rater1[1]][1,1]);
98:     for( i in 1:N1 ) for( d in 1:D ) c1[i,d] <- Delta[teacher1[i]][d,1] + Phi[section1[i]][d,1] + Theta[session1[i]][d,1] + Zeta[session_rater1[i]][d,1] + Beta_mean[rater1[i]][d,1];
99:     for( i in 1:N2 ) for( d in 1:D ) c2[i,d] <- Delta[teacher2[i]][d,2] + Phi[section2[i]][d,2] + Theta[session2[i]][d,2] + Zeta[session_rater2[i]][d,2] + Beta_mean[rater2[i]][d,2];
118:    for( r in 1:Rr ) Beta_mean[r] <- Beta[r] + mean_Beta;
228:    /* functions of multivariate hierarchical parameters, Delta, Phi, Theta, Zeta, Beta_mean */


So, I think line 118 (maybe 113 in your original .stan file) needs to be moved farther up or something.
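In other words, a sketch of the fix in the transformed parameters block, built from the lines in the grep output above (the assignment on line 118 has to come before the loops on lines 98-99 that read Beta_mean):

```stan
transformed parameters {
  // ... other quantities ...
  for( r in 1:Rr ) Beta_mean[r] <- Beta[r] + mean_Beta;   // fill Beta_mean FIRST
  // only now is it safe to use Beta_mean in c1 and c2:
  for( i in 1:N1 ) for( d in 1:D )
    c1[i,d] <- Delta[teacher1[i]][d,1] + Phi[section1[i]][d,1] + Theta[session1[i]][d,1]
               + Zeta[session_rater1[i]][d,1] + Beta_mean[rater1[i]][d,1];
  for( i in 1:N2 ) for( d in 1:D )
    c2[i,d] <- Delta[teacher2[i]][d,2] + Phi[section2[i]][d,2] + Theta[session2[i]][d,2]
               + Zeta[session_rater2[i]][d,2] + Beta_mean[rater2[i]][d,2];
}
```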

Ben

Andrew Gelman

Sep 29, 2012, 8:37:55 PM
to stan-...@googlegroups.com
We should definitely have an example in the manual using the print statement to debug (if we don't already have one in the manual).
A

Daniel Lee

Sep 29, 2012, 8:48:11 PM
to stan-...@googlegroups.com

Bob has written it into the latest version of the manual: Section 5, starting on page 29.

For anyone who hasn't downloaded the latest versions of the manual or the software, please do. There have been many improvements to both.

Daniel

terrance savitsky

Sep 29, 2012, 8:55:55 PM
to stan-...@googlegroups.com
Ben,  Thanks a lot for the coaching, and sorry for just dropping the problem on you.  To tell the truth, I've struggled with how to use gdb for debugging C++ code in an R environment.   I typically use Rcpp and wind up debugging by inputting C++ code files with inline.   Your pointers help me get started with gdb.  Thanks for taking the time with this and getting back to me so quickly.

I now have my opportunity to try using gdb and print as I fixed the point you noted and now receive:

SAMPLING FOR MODEL 'modelFAcum_lkj' NOW (CHAIN 1).
Error in sampler$call_sampler(args_list[[i]]) : 
  Error in function stan_fit15383be23b0c_modelFAcum_lkj_namespace::write_array(d): y is not positive definite. y(0,0) is 0.27558073566315999.

That seems like something really bad.   I will come back to the list, if necessary, after doing some debugging.

Terrance.

Ben Goodrich

Sep 29, 2012, 9:00:47 PM
to stan-...@googlegroups.com
Daniel or Bob,

I think we should also compile a version of the manual that highlights the changes relative to the previous version of the manual so that people can quickly find where user-relevant changes are discussed. Maybe something like

https://eothred.wordpress.com/2010/11/07/latexdiff-and-git/

Ben

Ben Goodrich

Sep 29, 2012, 9:04:25 PM
to stan-...@googlegroups.com
On Saturday, September 29, 2012 8:55:57 PM UTC-4, tds151 wrote:
I now have my opportunity to try using gdb and print as I fixed the point you noted and now receive:

SAMPLING FOR MODEL 'modelFAcum_lkj' NOW (CHAIN 1).
Error in sampler$call_sampler(args_list[[i]]) : 
  Error in function stan_fit15383be23b0c_modelFAcum_lkj_namespace::write_array(d): y is not positive definite. y(0,0) is 0.27558073566315999.

That seems like something really bad.   I will come back to the list, if necessary, after performing a debugging.

Off the top of my head, I would guess that maybe one of your unique variances underflowed to zero so that a covariance matrix with factor structure would only be positive semi-definite. But it could also be a typo somewhere.

Ben

Ben Goodrich

Sep 29, 2012, 9:13:01 PM
to stan-...@googlegroups.com
On Saturday, September 29, 2012 8:55:57 PM UTC-4, tds151 wrote:
Ben,  Thanks a lot for the coaching and sorry for just dropping the problem on you.  To tell the truth, I've struggled with how to use gdb for debugging C++ code in an R environment.   I typically use Rcpp and wind up debugging through inputting C++ code files with inline.   Your points help me to get started with gdb.  Thanks for taking the time with this and getting back to me so quickly.

You can utilize R (plain R, probably not one of the GUIs) in conjunction with gdb, see

http://cran.r-project.org/doc/manuals/R-exts.html#Debugging-compiled-code

And in rstan, see the new set_cppo(mode =  "debug") functionality.

Ben

terrance savitsky

Sep 29, 2012, 9:48:49 PM
to stan-...@googlegroups.com
Ben,  A quick question.  I'm including print statements in my .stan file, but am not receiving output.   Of course, maybe I am not making it through even a leapfrog step within an iteration (as my print statements are in the transformed parameters block).  So I went back and re-created the error you found, since you indicate that you received a result from an included print statement.   I will also go the gdb route, but I would typically start with print() as it is quicker.  Is there something else to do to receive the printed output?




terrance savitsky

Sep 29, 2012, 10:12:51 PM
to stan-...@googlegroups.com
I guess the print output does not migrate to the R layer if executing via rstan.  Does one have to run from the command line?

Ben Goodrich

Sep 29, 2012, 11:05:21 PM
to stan-...@googlegroups.com, Jiqiang Guo
On Saturday, September 29, 2012 10:12:52 PM UTC-4, tds151 wrote:
I guess the print output does not migrate to the R layer if executing using rstan.  one has to run from the command line??

I think it was intended to be visible from R. @Mav, do you know what's up?

Ben

Jiqiang Guo

Sep 29, 2012, 11:27:05 PM
to stan-...@googlegroups.com
Yes, RStan should support the print statement.

But in this case, I guess the error was thrown before print got called.  I am not sure, though, since you got that output in gdb.  I cannot get it in command-line Stan either.  I added a print to the model and ran it in Stan:

mmac2:Downloads jq$ grep print modelFAcum_lkj.stan 
    print("Beta_mean=", Beta_mean[rater1[1]][1,1]);
mmac2:Downloads jq$ ./modelFAcum_lkj --data=datstan.Rdump 

Exception: Error in function stan::prob::ordered_logistic(N4stan5agrad3varE): Location parameter is nan:0, but must be finite!
Diagnostic information: 
Throw location unknown (consider using BOOST_THROW_EXCEPTION)
Dynamic exception type: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::domain_error> >
std::exception::what: Error in function stan::prob::ordered_logistic(N4stan5agrad3varE): Location parameter is nan:0, but must be finite!

Jiqiang

terrance savitsky

Sep 29, 2012, 11:39:31 PM
to stan-...@googlegroups.com
Hi Jiqiang, I wonder how Ben was able to obtain output from the print statement on the same .stan file.  I thought maybe the difference was compiling, linking, and running from the command line instead of through R.  Also, I corrected the error and eventually received another saying the sampler is unable to find a small enough step size, suggesting maybe I have a discrete likelihood.   Anyway, it seems the sampler should be stepping through some leapfrog steps, which should generate print() output.

On a related point, I've used stanc() in R to write the C++ code to a file.  I wanted to avoid installing a duplicate copy of Stan, since one is installed with R.   So I updated the #include <stan/model/model_header.hpp> statement to #include <R/R-2.15.1/library/rstan/include/stansrc/stan/model/model_header.hpp>.   I'm working in Windows and also updated the PATH variable.   I'm not able to compile from the command line, however, as the compiler still can't find the model_header.hpp file (which is there), even after I update the #include statement.  Any ideas?

terrance.

Ben Goodrich

Sep 29, 2012, 11:49:30 PM
to stan-...@googlegroups.com
I can see the print() statements within or outside gdb:

goodrich@CYBERPOWERPC:/tmp/Terrance$ ./modelFAcum_lkj --data=datstan.Rdump
Delta= 0.371263
Phi= 0.803654
Theta= -0.220654
Zeta= 0.873778
Beta_mean= nan

Exception: Error in function stan::prob::ordered_logistic(N4stan5agrad3varE): Location parameter is nan:0, but must be finite!
Diagnostic information:
Throw location unknown (consider using BOOST_THROW_EXCEPTION)
Dynamic exception type: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::domain_error> >
std::exception::what: Error in function stan::prob::ordered_logistic(N4stan5agrad3varE): Location parameter is nan:0, but must be finite!


goodrich@CYBERPOWERPC:/tmp/Terrance$


Ben

terrance savitsky

Sep 30, 2012, 11:47:36 AM
to stan-...@googlegroups.com
Thanks.  3 things:

1. Working from Windows 7, I'm unable to compile the C++ code file generated by stanc from the command line.  The compiler is unable to find model_header.hpp, despite my changing the subdirectory in the #include statement to the correct location (as noted in an earlier post).  So I'm unable to get launched with gdb just now.

2. You note that the print statements appear under command-line runs - either with or without gdb.  That says they should also appear in R to the extent that Rcpp is used to migrate the output.  The print outputs don't appear in R for me.  Do they for you?

3. I fixed the syntax error, which you helpfully uncovered, where a parameter was used before being defined, by just moving the statement.  Now I receive 2 alternating error messages.  One notes that some variable is not positive definite.  The other error message (that replaces this one) is that the sampler is unable to find a sufficiently small leapfrog step size.  Although my model looks complicated, it's actually a relatively straightforward parametric multivariate sampler wrapped in a GLM of a type I've successfully implemented in JAGS.  It looks complicated because I've employed tips learned from the Stan team to deliver better speed and memory management under automatic differentiation by taking inverses once and directly updating lp__.
I hope to eventually sort out how to compile from the command line so that I may view the print statements for further debugging, but I wonder whether NUTS breaks down under multivariate hierarchical models specified with relatively non-informative priors.

Ben Goodrich

Sep 30, 2012, 12:55:34 PM
to stan-...@googlegroups.com
On Sunday, September 30, 2012 11:47:37 AM UTC-4, tds151 wrote:
1. working from windows 7, i'm unable to compile the c++ code file generated from stanc from the command line.  the compiler is unable to find module_header.hpp, despite my changing the subdirectory in the #include statement to the correct location (as noted in an earlier post).  so i'm unable to get launched with gdb, just now.

We have a Windows 7 computer that sits in an office and runs the Stan tests every day, but none of us actually uses Windows. Maybe someone on the internet knows what to do. My guess is that the easiest thing is to delete the old Stan and download the new Stan. If you really, really need to conserve hard disk space, it is possible to install rstan through R and then make command-line Stan use the Boost headers from rstan. To do this on my laptop, I would

# execute this in R to find out where rstan lives to utilize below
file.path(dirname(system.file(package = "rstan")), "rstan", "include", "stanlib")

# download Stan and change this line in STAN_HOME/makefile from
BOOST ?= lib/boost_1.51.0

# to
BOOST ?= /path/to/rstan/include/stanlib/boost_1.51.0

Then you should be able to delete the STAN_HOME/lib/boost_1.51.0 directory.
 
2. you note that the print statements appear under command line runs - either with or without gdb.  that says they should also appear in R to the extent that the Rcpp command is used to migrate the output.  The print outputs don't appear in R for me.  Do they for you?

No, I am not seeing them inside R for some reason that we'll have to get to the bottom of for the next (R)Stan release.
 
 
3. so i fixed the syntax error where a parameter was used before defined that you helpfully uncovered by just moving the statement.  now i receive 2 error alternating error messages.  one notes that some variable is not positive definite.  the other error message (that replaces this one) is that the sampler is unable to find a sufficiently small leapfrog step size.  although my model looks complicated, it's actually a relatively straightforward parametric multivariate sampler wrapped in a glm of a type i've successfuly implemented in JAGS.  it looks complicated because i've employed tips learned from the stan team to deliver better speed and memory management under automated differentiation by taking inverses once and directly updating lp__.
i hope to eventually sort out how to compile from the command line so that i may view the print statements for further debugging, but i wonder whether NUTS breaks down under multivariate hierarchical models specified with relatively non-informative priors.

Can you attach the new .stan file? I will look at it under gdb. To work well, any model may need good priors, but Stan shouldn't completely break down as long as the posterior is proper / continuous / not too heavy-tailed.

Ben

terrance savitsky

Sep 30, 2012, 4:48:24 PM
to stan-...@googlegroups.com
Thanks a lot, Ben, for running gdb on my code.   I attach 2 Stan script files.   The first, "...lkj.stan", is my focus.   The second, "...lkj2.stan", is the same as the first except for the specification of the sets of cutpoints used in the ordered_logistic distribution: lkj.stan fixes one member of each vector to 0 for identifiability; lkj2.stan does not.   The multivariate latent trait, c, of the ordered logistic model is composed from the addition of multivariate nested objects, so there are exchangeable multivariate draws at each level.   The structures are identical, except for the highest / first level, where a factor analytic construction is employed for one of the two covariances.  This may be where the issue arises.   I have also tried increasing the informativeness of the LKJ priors on the correlation matrices by increasing the value of the shape parameter, but no change.

Thanks, too, for your tips on configuring my Windows set-up to help the compiler find the dependent libraries.   Since I know exactly where Stan is loaded within the R/libraries/rstan folder structure, I hoped that pointing the #include statement in the generated stanc C++ code to the location of model_header.hpp would make everything go, but the compiler still couldn't find model_header.hpp.  Anyway, I was trying to save hard disk space, but I'm not that desperate.   So I'm using rstan 1.0.2, which I want to keep using, in general.  Currently, the only installation of Stan I have is Stan 1.0.2 through rstan, so I don't think I need to delete anything.    If I separately download Stan, is there a potential conflict with the Stan load that comes with the rstan folders?   It just seemed logical to me to use the Stan installation made with rstan, since I will most frequently interact with Stan via the R layer.   Maybe, since you've provided me such detailed instructions, I'll just stick with the Stan 1.0.2 that loads with rstan.  I attach another file, "mystanmodel.cpp", so that you may see the code returned by running stanc() within an R environment.

What computational set-up do most Stan developers use?

Terrance.
modelFAcum_lkj2.stan
modelFAcum_lkj.stan
datstan.Rdump
mystanmodel.cpp

Jiqiang Guo

Sep 30, 2012, 4:55:55 PM
to stan-...@googlegroups.com
I would strongly recommend using the Stan downloaded from mc-stan.org rather than the one included in RStan.  The space you save is not worth all of our time to make it work.

And there is no conflict between Stan and RStan if both are installed.

Last, the C++ code generated using RStan is different from the one generated by stanc in Stan, since the latter generates a main function.

--
Jiqiang

terrance savitsky

Sep 30, 2012, 5:16:21 PM
to stan-...@googlegroups.com
Okay, that really helps.  Thanks, Jiqiang.  I didn't understand the distinction between how stanc() in rstan and stanc in Stan generate the C++ code.
After downloading Stan and following the Windows instructions for unpacking the tarball, using

tar -xzf --no-same-owner stan-src-1.0.2.tgz

I receive errors that it can't find the file and that it doesn't recognize the option --no-same-owner.  So I take that part out (since it is noted as optional) and it runs, but I receive a bunch of error messages that it "Cannot change ownership to uid 501, gid 20".  (Do I have permission issues on my machine?)   It then exits with a "failure status due to previous errors".

terrance.

Ben Goodrich

Sep 30, 2012, 5:58:26 PM
to stan-...@googlegroups.com, Bob Carpenter
On Sunday, September 30, 2012 4:48:26 PM UTC-4, tds151 wrote:
Thanks a lot, Ben, for running gdb on my code.   I attach 2 stan script files.   The first, "...lkj.stan" is my focus.   The second, "...lkj2.stan" is the same as the first, except that for specification of the sets of cutpoints used in the ordinal_logistic distribution.   lkj.stan fixes one member of each vector to 0 for identifiabiilty; lkj2.stan does not.   the multivariate latent trait, c, of the ordinal logistic model is composed from from the addition of multivariate nested objects.  So there are exchangeable multivariate draws at each level.   The structures are identical, except for the highest / first level where a factor analytic construction is employed for one of the two covariances.  this may be where the issue arises.   I have also tried increasing the informativeness of the lkj priors on correlation matrices by increasing the value of the shape parameter, but no change.

There is a problem before it really gets into the model. The elements of Lambda_s start out reasonable but become nan for some reason that I don't understand, because it is just a matrix of parameters. There also seems to be a problem with Stan itself, because --test_grad seems to hang. Bob will have to look into this whenever he gets back. Original attachments are at:

https://groups.google.com/d/msg/stan-users/MYdrB11_S5g/lKVzTcPjRtwJ
 
Thanks, too, for your tips on configuring my windows set-up to help the compiler find the dependent libraries.   since i know exactly where stan is loaded with the R/libraries/rstan folder structure, I hoped pointing the #include statement in the generated stanc c++ code to the location of module_header.hpp i would make everything go, but the compiler still couldn't find module_header.hpp.  Anyway, I was trying to save harddisk space, but i'm not that desperate.   So I'm using rstan 1.0.2, which I want to keep using, in general.  Currently, the only installation of stan i have is stan 1.0.2 through rstan, so i don't think i need to delete anything.    if i separately download stan, is there a potential conflict with the stan load that comes with the rstan folders?   It just seemed logical to me that to use the stan installation made with rstan since i will most frequently interact with stan via the R layer.   Maybe since you've provided me such detailed instructions, I'll just stick with the stan 1.0.2 that loads with rstan.  I attach another file, "mystanmodel.cpp", so that you may see the code returned by running stanc() within in an R environment.

What Jiqiang said.

what computational set-up do most stan developers use?

Mac, except for me.

Ben

tgs

Oct 1, 2012, 8:23:39 AM
to stan-...@googlegroups.com
Hi Terrance,

This is off topic, but I saw your very complicated model, and my model has some elements of yours (maybe 5%), yet my model runs slowly in RStan. I am wondering: your model must take days or weeks to run, or else you may have set up Stan in a way that makes it run faster by an order of magnitude, in which case I would be curious to know, if you are willing to share.

Thanks
Toby

terrance savitsky

Oct 1, 2012, 1:02:53 PM
to stan-...@googlegroups.com
Hi Ben, Thanks a lot for taking the time to work with my model.  I appreciate it.  My model was a very planned expedition into Stan.   I've been reading documentation and asking questions on the list along the way, both to understand whether Stan was possible and to focus the effort.  You guys have been very generous in providing tips and tricks.

Regarding the parameter matrix Lambda_s: it represents the loadings of a factor analytic construction for one of my covariance matrices.   To rotationally identify this model, I must restrict the structure of Lambda_s in some way; that is, set certain entries to a fixed value.  You may know that a typical approach (Muirhead) restricts the loadings matrix to lower triangular.   I accomplish this restriction by term-by-term multiplying (Hadamard or Schur product) the D x K Lambda_s by a D x K "pick" matrix, B, filled with 0's and 1's, to enforce the lower triangular restriction.   I then place an informative N(0,1) prior on the elements of Lambda_s.  The idea is that the likelihood doesn't inform the upper triangular elements of Lambda_s, but the proper prior would anyway produce a proper posterior, so all should be well.  I ran this idea by Bob and he thought it should work fine.   Of course, this construction probably doesn't explain why the elements of Lambda_s go to nan.  I will have to think about that.

Bob Carpenter

Oct 1, 2012, 1:16:17 PM
to stan-...@googlegroups.com
I believe Marcus Brubaker is also on linux (must have something
to do with Ben and Marcus having coded most of the intense
linear algebra and multivariate distribution code).

Our integration (buildbot) server runs on Windows, with Daniel
Lee in the driver's seat.

Making Stan run on the various versions of g++ and clang++
on the various platforms has been a major time sink. We've
all learned a lot about it, so it should get easier, but we'll
never get over the fact that the compilers vary by platform
and there are lots of platform/compiler combos.

- Bob

terrance savitsky

Oct 1, 2012, 1:16:42 PM
to stan-...@googlegroups.com
Hi Toby, I've not yet been able to get my model to run, though it does compile relatively quickly (even under optimization level -O3).  In my view, my model is not nearly as complicated as it looks.   The unique twist is that the parameter objects in this model are multivariate under unknown covariance structures.   Ben explained that the multivariate Gaussian is parameterized in terms of the covariance matrix, so the automatic differentiation that composes the gradient of the log joint probability has to invert this matrix on every sampling step.   Further, one of the covariance matrices in my model is composed from a factor analytic structure.  So if my loadings matrix is D x K with K < D, I can use the Sherman-Morrison-Woodbury matrix identity and accomplish a D x D inversion of a covariance matrix through a smaller K x K inversion (with some associated matrix products).   All to say, it should be faster to perform the inversions once and then directly update lp__.
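(For reference, writing out the identity being used, in the notation of this thread and assuming Theta is the diagonal matrix of unique variances, as in a standard factor analytic model:

  (Lambda*Lambda' + Theta)^{-1} = Theta^{-1} - Theta^{-1}*Lambda*(I_K + Lambda'*Theta^{-1}*Lambda)^{-1}*Lambda'*Theta^{-1}

so the only nontrivial inversion is of the K x K matrix I_K + Lambda'*Theta^{-1}*Lambda, while Theta^{-1} is just the elementwise reciprocal of the diagonal.)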

If I get the model to work, I will write back about runtimes.   That said, I'm unsure how the speed will compare to JAGS for similar models.   JAGS is relatively slow, and I don't expect Stan to be any faster - even after accounting for the greater informativeness of each Stan iteration.   The graphical-model nodal parameterization of JAGS, however, is a function of the number of observations in my data, so as that goes up, the runtimes go up (by an amount depending, in part, on the relationship between the number of observations and the number of model parameters).   I was hoping that Stan would prove much more scalable in the number of data observations.  If so, I wouldn't have to code my own models for larger datasets.  After all, 2 days per run is worth it if it saves me development time (unless I'm running Monte Carlo simulations).

terrance.

Daniel Lee

Oct 1, 2012, 1:20:22 PM
to stan-...@googlegroups.com
On Mon, Oct 1, 2012 at 1:16 PM, Bob Carpenter <ca...@alias-i.com> wrote:
> On 9/30/12 5:58 PM, Ben Goodrich wrote:
>> On Sunday, September 30, 2012 4:48:26 PM UTC-4, tds151 wrote:
>> what computational set-up do most stan developers use?
>>
>> Mac, except for me.
>
>
> I believe Marcus Brubaker is also on linux (must have something
> to do with Ben and Marcus having coded most of the intense
> linear algebra and multivariate distribution code).
>
> Our integration (buildbot) server runs on Windows, with Daniel
> Lee in the driver's seat.

I'm running in both Windows 7 with g++ 4.6.3 (installed with RTools
2.15) and Mac OS X 10.7 with clang++ 3.1 and g++ 4.6.3. But I'm not
actively running anything through RStan.


> Making Stan run on the various versions of g++ and clang++
> on the various platforms has been a major time sink. We've
> all learned a lot about it, so it should get easier, but we'll
> never get over the fact that the compilers vary by platform
> and there are lots of platform/compiler combos.

True.

Marcus Brubaker

Oct 1, 2012, 1:23:28 PM
to stan-...@googlegroups.com
On Mon, Oct 1, 2012 at 1:16 PM, Bob Carpenter <ca...@alias-i.com> wrote:
>
>
> On 9/30/12 5:58 PM, Ben Goodrich wrote:
>>
>> On Sunday, September 30, 2012 4:48:26 PM UTC-4, tds151 wrote:
>
>
>> what computational set-up do most stan developers use?
>>
>>
>> Mac, except for me.
>
>
> I believe Marcus Brubaker is also on linux (must have something
> to do with Ben and Marcus having coded most of the intense
> linear algebra and multivariate distribution code).

I use both Mac (laptop) and Linux (servers) with both gcc and clang.
However, perhaps somewhat revealingly, I have never even run R and
always feel like I'm reading the result of an obfuscated code contest
when I see it ;)

Cheers,
Marcus

Ben Goodrich

unread,
Oct 1, 2012, 1:43:29 PM10/1/12
to stan-...@googlegroups.com
On Monday, October 1, 2012 1:16:13 PM UTC-4, Bob Carpenter wrote:
On 9/30/12 5:58 PM, Ben Goodrich wrote:
> On Sunday, September 30, 2012 4:48:26 PM UTC-4, tds151 wrote:

>     what computational set-up do most stan developers use?
>
>
> Mac, except for me.

I believe Marcus Brubaker is also on linux (must have something
to do with Ben and Marcus having coded most of the intense
linear algebra and multivariate distribution code).

Let's give Marcus half-credit for OS choice. Anyway, since we have a small army of Mac people, has anyone tried LLDB? Supposedly, it is better than gdb.

Ben

Bob Carpenter

unread,
Oct 1, 2012, 1:55:43 PM10/1/12
to stan-...@googlegroups.com


On 10/1/12 1:16 PM, terrance savitsky wrote:
>..
>
> If I get the model to work, I will write back about runtimes. That said, I'm confused on how the speed will compare to
> JAGS for similar models. JAGS is relatively slow and I don't expect stan to be any faster - even after accounting for
> the greater informativeness of each stan iteration.

When JAGS can use conjugate priors, it applies
proper Gibbs sampling, with independent draws from
the conditional. In those cases, JAGS will be scale
invariant.

NUTS without varying stepsize adaptation is rotation
invariant. NUTS with varying stepsize adaptation (the
default sampler in Stan) is not technically either, but
has a bit of the advantages of both.

If you have a problem that does not have a lot of
correlation among the posterior parameters AND is
conjugate, JAGS will almost certainly be faster than
Stan and continue to be so in the future.

In both JAGS and Stan, it can often help to reparameterize
to remove some of the direct dependence. There's a discussion
of this process for JAGS in both of Andrew's books,
the regression book with Jennifer and Bayesian Data Analysis.
The next edition of BDA will discuss doing this with Stan,
as will our next manual.

We've had a lot of success so far in pulling out the means,
as I do in the item-response models in the current user's guide.
I'm writing a new chapter for the user's guide on variable
transforms and reparameterizations that will go into more
detail.
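Bob's "pulling out the means" trick can be sketched outside Stan (a hypothetical Python illustration, not from the thread): instead of sampling x ~ Normal(mu, sigma) directly, sample a standardized z ~ Normal(0, 1) and set x = mu + sigma * z. The implied distribution of x is identical, but the sampled quantity no longer depends directly on the hierarchical location and scale:

```python
import random
import statistics

def centered(rng, mu, sigma, n):
    # Draw x ~ Normal(mu, sigma) directly.
    return [rng.gauss(mu, sigma) for _ in range(n)]

def non_centered(rng, mu, sigma, n):
    # Draw standardized z ~ Normal(0, 1), then shift and scale.
    # This is the reparameterization: z is what the sampler would see.
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [mu + sigma * zi for zi in z]

rng = random.Random(42)
x = non_centered(rng, 3.0, 2.0, 50_000)
# The transformed draws recover the target location and scale.
print(statistics.mean(x), statistics.stdev(x))  # approximately 3.0 and 2.0
```

In a Stan model this corresponds to declaring the standardized quantity as a parameter and constructing the hierarchical quantity from it in a transformed parameters block, so the posterior over the sampled parameters is closer to a standard normal.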

Ben's been outlining code transforms that can help for
multivariates. We also plan to introduce some new data
structures (triangular, symmetric, and cholesky factor
matrices) and vectorize some more operations there to make
Stan faster (we probably have at least a factor of 5--50 or so
combining better partial-eval auto-dif and vectorization).

> ... I was hoping that stan would
> prove much more scalable in the number of data observations. If so, I wouldn't have to code my own models for larger
> datasets. After all, 2 days per run is worth it if it save me development time (unless i'm running monte carlo
> simulations).

I haven't done controlled comparisons, but Stan should
scale much better in terms of how much memory it takes
per variable. That will depend heavily for Stan on
what kinds of probability distributions are used. If
they need iterative linear algebra algorithms, Stan
can wind up taking up a lot of memory, too.

The reimplementations with vectorization will help with
scaling as will partial evals in auto-dif (fewer intermediate
expression template classes to store).

- Bob

Jiqiang Guo

unread,
Oct 1, 2012, 2:16:33 PM10/1/12
to stan-...@googlegroups.com
Back to problems: I got the same problem with tar on Windows as well. 

It seems the following command works:

tar --no-same-owner -xzf stan-src-1.0.2.tgz

Actually, as mc-stan.org/windows-install.html points out, --no-same-owner is not strictly necessary, though without it tar will report warnings and say `exits with failure ...'


Jiqiang

terrance savitsky

unread,
Oct 1, 2012, 2:22:19 PM10/1/12
to stan-...@googlegroups.com
It was noted that my model fails with the elements of Lambda_s taking on NaN values.   I have been able to rule out my term-by-term multiplication of this product by a matrix composed of 0's and 1's to enforce a lower triangular structure as the cause of the problem.   I just replaced this approach with a hack that directly builds a lower triangular matrix and received the same error.  I will wait for Bob's feedback on whether there is a stan issue that causes --test_grad to hang.  I'm able to find nothing in my model script that would explain my problem.

Ben Goodrich

unread,
Oct 1, 2012, 2:46:53 PM10/1/12
to stan-...@googlegroups.com, Bob Carpenter
On Sunday, September 30, 2012 5:58:26 PM UTC-4, Ben Goodrich wrote:
On Sunday, September 30, 2012 4:48:26 PM UTC-4, tds151 wrote:
Thanks a lot, Ben, for running gdb on my code.   I attach 2 stan script files.   The first, "...lkj.stan", is my focus.   The second, "...lkj2.stan", is the same as the first except for the specification of the sets of cutpoints used in the ordinal_logistic distribution.   lkj.stan fixes one member of each vector to 0 for identifiability; lkj2.stan does not.   The multivariate latent trait, c, of the ordinal logistic model is composed from the addition of multivariate nested objects, so there are exchangeable multivariate draws at each level.   The structures are identical, except for the highest / first level, where a factor analytic construction is employed for one of the two covariances.  This may be where the issue arises.   I have also tried increasing the informativeness of the lkj priors on correlation matrices by increasing the value of the shape parameter, but no change.

There is a problem before it really gets into the model. The elements of Lambda_s start out reasonable but become nan for some reason that I don't understand, because it is just a matrix of parameters. But there seems to be a problem with Stan also because --test_grad seems to hang. Bob will have to look into this whenever he gets back. Original attachments are at:

https://groups.google.com/d/msg/stan-users/MYdrB11_S5g/lKVzTcPjRtwJ

I learned from

http://stackoverflow.com/questions/3615724/how-to-trace-nan-in-c

that if you add

#include <fenv.h>

to the includes and add

feenableexcept(-1);

to main(), then in gdb you can do

(gdb) handle SIGFPE stop nopass
(gdb) run

then it should stop when something goes NaN. However, when I do that, I get

Exception: std::bad_alloc
Diagnostic information:
Dynamic exception type: std::bad_alloc
std::exception::what: std::bad_alloc

And when I do


(gdb) catch throw
(gdb) run

then I don't make it to the bad_alloc and instead hit the check_nan() failure.

So, something is very, very wrong, but I don't know what.

Ben

Bob Carpenter

unread,
Oct 1, 2012, 2:48:50 PM10/1/12
to stan-...@googlegroups.com


On 10/1/12 2:16 PM, Jiqiang Guo wrote:
> Back to problems, I got the same problem with tar on windows as well.
>
> It seems the following order would work:
>
> tar --no-same-owner -xzf stan-src-1.0.2.tgz
>
> Actually, as mc-stan.org/windows-install.html <http://mc-stan.org/windows-install.html> points out that --no-same-owner
> is not necessary though it will report warnings and say `exits with failure ...'
>
>
> Jiqiang
>
>
> On Sun, Sep 30, 2012 at 5:16 PM, terrance savitsky <tds...@gmail.com <mailto:tds...@gmail.com>> wrote:
>
> okay, that really helps. Thanks, Jiqiang. I didn't understand the distinction between how stanc() in rstan and
> stanc() in stan generate the cpp code.
> After downloading stan and following the windows instructions for unpacking the tarball, using
>
> tar -xzf --no-same-owner stan-src-1.0.2.tgz
>
> I receive errors that it can't find the file and it also doesn't recognize the command --no-same-owner. so i take
> part out (since it is noted as optional) and it runs, but i receive a bunch of error messages that "Cannot change
> ownership to uid 501, gid 20. (Do I have permission issues on my machine?) It then exits with a "failure status
> due to previous errors".

I'm a bit confused about the context of the error
reports above. We really need to know:

- which version of Windows
- what path you're trying to unpack at
- what the permissions are on that path
- the exact error output you get

Without context, I'd guess the issue is that you need
admin privileges to unpack the tarball where
you're trying to unpack it. If you're running Windows
Vista or later, I believe you need to start the
DOS shell in admin mode (I think you can right click on
the icon).

If you try it as admin and it still fails, could you
please post back with the above information? I believe
the only Windows platform we have in house is Windows 7,
so we may need some external help to get the instructions
ironed out for other versions.

- Bob

Bob Carpenter

unread,
Oct 1, 2012, 2:52:20 PM10/1/12
to stan-...@googlegroups.com


On 10/1/12 2:22 PM, terrance savitsky wrote:
> It was noted that my model fails with the elements of Lambda_s achieving nan values. I have been able to rule out my
> term-by-term multiplication of this product by a matrix composed of 0's and 1's to enforce a lower triangular structure
> as the cause of the problem. I just replaced this approach with a hack that employs a hack to directly build a lower
> triangular matrix and receive the same error. I will wait for Bob's feedback on whether there is a stan issue that
> causes --test_grad to hang. I'm able to find nothing in my model script that would explain my problem.

Sorry -- didn't realize the ball was in my court.

I'd guess that test_grad is not technically hanging
(for instance, check your activity monitor or equivalent
on your system), but rather grinding away calculating
finite differences. We've found it to be rather slow
for large numbers of parameters with complex ops.

It should get a factor of four or five faster as soon as
I get around to templating out the log_prob function
in the generated models -- then we can use the version without
calculating the gradients as well.

My global suggestion is to try simpler models first and
gradually add structure. For instance, try a diagonal
covariance matrix first and see if you can get that working.
Or fix the variance component at something reasonable and
fit the rest of the model. The specifics, of course, depend
on the model.

- Bob

terrance savitsky

unread,
Oct 1, 2012, 2:53:09 PM10/1/12
to stan-...@googlegroups.com
Jiqiang,  This works for me under Windows 7, and the command unpacks the folders into stan-src-1.0.2.   So I go to that folder and then % make bin/libstan.a 
I receive some warnings, but it seems to work.  Then I issue the command 
% make bin/stanc

but then I receive an error message that there is no rule to make target bin/stanc and it stops.  Any ideas?

terrance

Jiqiang Guo

unread,
Oct 1, 2012, 2:55:06 PM10/1/12
to stan-...@googlegroups.com
On Windows, in the stan home directory, if you just execute make, it will give you a help page telling you that you should do 

make bin/stanc.exe

Jiqiang

Bob Carpenter

unread,
Oct 1, 2012, 2:59:34 PM10/1/12
to stan-...@googlegroups.com


On 10/1/12 2:55 PM, Jiqiang Guo wrote:
> On windows, in the stan home directory, if you just execute make, which will give you a help page, it will tell you that
> you should do
>
> make bin/stanc.exe

Yet another annoying way in which Windows is
slightly different for C++. (I actually like
the Windows convention better, all else being
equal. But the varying behaviors across platforms
make these things an absolute nightmare to doc
and test.)

Any suggestions on how to make the doc clearer
would be greatly appreciated. I tried very hard
to point out the Windows variants of all the command-line
behavior, but I probably missed some important
qualifications somewhere given all the trouble people
seem to be having with Windows installs.

- Bob


terrance savitsky

unread,
Oct 1, 2012, 2:59:34 PM10/1/12
to stan-...@googlegroups.com
Okay, thanks, Jiqiang.  I've been following the instructions on the web and in the manual.  Both have a couple of little errors - some obvious to me, others not.  When you write "stan home directory", I assume you mean the stan-src-1.0.2 directory.   

terrance savitsky

unread,
Oct 1, 2012, 3:10:24 PM10/1/12
to stan-...@googlegroups.com
Okay, thanks.  I will simplify and see what happens.  That said, the model isn't very complex.  It looks complex because I employ statements to promote better memory management and speed under the automatic differentiator, which I learned from Ben.   Alternatively, I could just employ multivariate Gaussian priors, but the runtime would probably be unacceptably slow.   This model is an update of one I've run in JAGS.   The extension of the JAGS model structures the covariance matrices by effectively using Kronecker products.  (Actually, I just use a matrix variate prior, which is the equivalent.)  JAGS doesn't recognize the matrix variate normal, so this model would be non-conjugate for the covariance matrices, which JAGS cannot handle.

I have tried increasing the value of the shape parameter under the lkj prior on correlation matrices - with no change in failure mode.

I'm hoping the error is on my end because stan should be able to handle something like this.

Bob Carpenter

unread,
Oct 1, 2012, 3:12:48 PM10/1/12
to stan-...@googlegroups.com


On 10/1/12 2:59 PM, terrance savitsky wrote:
> okay, thanks jiqiang. I've been following the instructions on the web and in the manual. both have a couple of little
> errors - some obvious to me, others not.

Please let us know where you think there are errors.
We really do want to make this as easy as possible
for users. There's nothing I find more frustrating than not
being able to (re)install software and would rather not
inflict that misery on others.

> when you write "stan home directory" , I assume you mean stan-src-1.0.2
> directory.

We mean the one with the makefile in it. Here's the relevant
section from the 1.0.2 manual:

-------------------
2.3. Building Stan

Building Stan itself works the same way across platforms.
To build Stan, first open a command-line terminal application.
Then change directories to the directory in which Stan is
installed (i.e., the directory containing the file named makefile).

% cd <stan-home>
---------------------

I then repeat that <stan-home> is the directory with the makefile
in it, but apparently not enough!

By the way, I found this by searching for "home" in the PDF.

- Bob


Bob Carpenter

unread,
Oct 1, 2012, 3:16:48 PM10/1/12
to stan-...@googlegroups.com


On 10/1/12 3:05 PM, Ben Goodrich wrote:
> On Mon, Oct 1, 2012 at 2:56 PM, Bob Carpenter <ca...@alias-i.com <mailto:ca...@alias-i.com>> wrote:
>
> One often sees different behavior in debug code
> than in live code, especially under optimization
> or multi-threading situations.
>
>
> I compiled it with -O0.
>
>
> > There is a problem before it really gets into the model. The elements of Lambda_s start out reasonable but become
> > nan for some reason that I don't understand, because it is just a matrix of parameters. But there seems to be a
> > problem with Stan also because --test_grad seems to hang. Bob will have to look into this whenever he gets back.
>
> Could you try, for instance, recording what the
> matrix of params is and whether it's positive
> definite outside of Stan?
>
>
> The Lambda_s is not a covariance matrix. It is just a matrix. The covariance matrix is a function of Lambda_s.
>
> Lambda_s * Lambda_s' + ADiagonalMatrix

Same issue. Can you do something simpler that does
work and build up to the more complex model? It's
the easiest way to debug these models and I don't really
have enough understanding of your models or enough time
to do it for you.

> And Lambda_s starts out fine. Here is one random init of it
>
> Lambda_s= [ -0.481748:0 -1.86276:0 0.634311:0, -0.239152:0 0.532343:0 -1.97019:0, -1.23689:0 1.60545:0
> -1.14489:0, 1.37424:0 -0.580043:0 1.20517:0, -0.11951:0 1.06162:0 1.20734:0, -1.87962:0 -1.49004:0
> 0.916584:0, -0.283664:0 -0.554244:0 -0.418359:0, -1.11854:0 -0.190424:0 -0.667752:0, 0.823483:0
> 1.583:0 0.816803:0, -0.661565:0 -1.62819:0 0.473345:0]
>
> I don't have enough experience with covariance matrix
> algebra myself to have a good idea of what circumstances
> induce NaN values. In general, NaN comes up if you
> take logs of negative numbers, divide zero by zero or
> divide two infinite values.
>
>
> I don't think there are any logs or divisions involved here.

There are logs implicit everywhere in Stan because we use log
probabilities. There are probably divisions in the solvers
used over the covariance matrices you produce that are based
on Lambda_s.
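Bob's point about NaN sources can be demonstrated in a few lines (a hypothetical Python sketch, not from the thread); once one NaN appears it propagates through every subsequent arithmetic operation, which is consistent with an entire parameter matrix suddenly reading as NaN:

```python
import math

# Two classic NaN sources: inf / inf and inf - inf.
a = math.inf / math.inf
b = math.inf - math.inf
# (In Python, math.log(-1.0) and 0.0 / 0.0 raise exceptions rather than
# returning NaN, but in C++ -- as inside Stan's log-probability code --
# both silently yield NaN.)

# NaN's defining property: it is not equal to itself ...
assert a != a and b != b
# ... and it poisons anything it touches, so one bad term in lp__ or a
# gradient contaminates everything downstream.
assert math.isnan(0.5 * a + 100.0)
print("both values are NaN:", math.isnan(a), math.isnan(b))
```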

> I am guessing that the gradient goes NaN somehow and that
> propagates. That guess is based on the fact that --test_grad hangs.

That won't cause --test_grad to hang. It'll
report NaN. That's why we wrote it.

As I said before, I'm guessing test_grad is just
slow, not hanging.

It can take as much time to run test_grad as to run
10s, 100s or even 1000s of iterations of Stan depending
on the posterior geometry and the number of params.

- Bob

terrance savitsky

unread,
Oct 1, 2012, 3:25:31 PM10/1/12
to stan-...@googlegroups.com
Page 5 of the manual, under Building Stan, the command suggested is % make bin/stanc (rather than % make bin/stanc.exe).   The on-line Windows-specific instructions incorrectly list the stan tarball as stan-src.1.m.p.tgz (instead of stan-src-1.m.p.tgz).  The manual is correct.  

More in the way of confusion: the manual refers to the stan home directory.   The directory I have after extracting the .tgz file is stan-src-1.0.2, in which I make libstan.a and stanc.  I interpreted "home directory" to mean that some type of installation would take place that produces a directory named "stan"; that's apparently not how the process works.   While not in the directions for Windows, I assume I need to update the PATH environment variable in order to compile a .stan file (when not in stan-src-1.0.2).

It also bears noting that the new "print" command doesn't print to the R layer for those compiling and running from there.  

Ben Goodrich

unread,
Oct 1, 2012, 4:27:23 PM10/1/12
to stan-...@googlegroups.com, Bob Carpenter

I think I forgot to recompile fully before. Now when I do this, the floating point exception seems to show up in generate_uniform_real()

goodrich@CYBERPOWERPC:/tmp/Terrance$ gdb --args ./modelFAcum_lkj --data=datstan.Rdump
GNU gdb (GDB) 7.4.1-debian
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /tmp/Terrance/modelFAcum_lkj...done.
(gdb) run
Starting program: /tmp/Terrance/modelFAcum_lkj --data=datstan.Rdump
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGFPE, Arithmetic exception.
0x0000000000476a5e in boost::random::detail::generate_uniform_real<boost::random::additive_combine_engine<boost::random::linear_congruential_engine<unsigned int, 40014u, 0u, 2147483563u>, boost::random::linear_congruential_engine<unsigned int, 40692u, 0u, 2147483399u> >, double> (eng=..., min_value=-2, max_value=2)
    at lib/boost_1.51.0/boost/random/uniform_real_distribution.hpp:61
61              T result = numerator / divisor * (max_value - min_value) + min_value;
(gdb) bt
#0  0x0000000000476a5e in boost::random::detail::generate_uniform_real<boost::random::additive_combine_engine<boost::random::linear_congruential_engine<unsigned int, 40014u, 0u, 2147483563u>, boost::random::linear_congruential_engine<unsigned int, 40692u, 0u, 2147483399u> >, double> (eng=..., min_value=-2, max_value=2)
    at lib/boost_1.51.0/boost/random/uniform_real_distribution.hpp:61
#1  0x000000000046d0d1 in boost::random::detail::generate_uniform_real<boost::random::additive_combine_engine<boost::random::linear_congruential_engine<unsigned int, 40014u, 0u, 2147483563u>, boost::random::linear_congruential_engine<unsigned int, 40692u, 0u, 2147483399u> >, double> (eng=..., min_value=-2, max_value=2)
    at lib/boost_1.51.0/boost/random/uniform_real_distribution.hpp:71
#2  0x0000000000458227 in boost::random::uniform_real_distribution<double>::operator()<boost::random::additive_combine_engine<boost::random::linear_congruential_engine<unsigned int, 40014u, 0u, 2147483563u>, boost::random::linear_congruential_engine<unsigned int, 40692u, 0u, 2147483399u> > > (this=0x7fffffffdda8, eng=...)
    at lib/boost_1.51.0/boost/random/uniform_real_distribution.hpp:191
#3  0x0000000000447143 in boost::random::variate_generator<boost::random::additive_combine_engine<boost::random::linear_congruential_engine<unsigned int, 40014u, 0u, 2147483563u>, boost::random::linear_congruential_engine<unsigned int, 40692u, 0u, 2147483399u> >&, boost::random::uniform_real_distribution<double> >::operator() (this=0x7fffffffdda0)
    at lib/boost_1.51.0/boost/random/variate_generator.hpp:73
#4  0x000000000043911e in stan::gm::nuts_command<modelFAcum_lkj_model_namespace::modelFAcum_lkj_model> (argc=2, argv=0x7fffffffe2c8) at src/stan/gm/command.hpp:351
#5  0x000000000040f31e in main (argc=2, argv=0x7fffffffe2c8) at /tmp/Terrance/modelFAcum_lkj.cpp:1575
(gdb) frame 0
#0  0x0000000000476a5e in boost::random::detail::generate_uniform_real<boost::random::additive_combine_engine<boost::random::linear_congruential_engine<unsigned int, 40014u, 0u, 2147483563u>, boost::random::linear_congruential_engine<unsigned int, 40692u, 0u, 2147483399u> >, double> (eng=..., min_value=-2, max_value=2)
    at lib/boost_1.51.0/boost/random/uniform_real_distribution.hpp:61
61              T result = numerator / divisor * (max_value - min_value) + min_value;
(gdb) print numerator
$1 = 402877394
(gdb) print divisor
$2 = 2147483562
(gdb) print max_value
$3 = 2
(gdb) print min_value
$4 = -2
(gdb) print numerator / divisor
$5 = 0.18760441343019693


But everything looks okay here. How can this be?

Ben

Bob Carpenter

unread,
Oct 1, 2012, 4:33:52 PM10/1/12
to stan-...@googlegroups.com
Thanks! Specific comments inline.

On 10/1/12 3:25 PM, terrance savitsky wrote:
> Page 5 of the manual, under Building stan, the command suggested is %make bin/stanc (rather than %make bin/stanc.exe).

Fixed.

> The on-line windows-specific instructions incorrectly lists the stan tarball as - stan-src.1.m.p.tgz (instead of
> stan-src-1.m.p.tgz). The manual is correct.

Fixed.

> More in the way of confusion, the manual refers to the stan home directory. the directory i have after extracting from
> the .tgz file is stan-src-1.0.2

The directory into which something unpacks is
usually the "home" directory.

> and making libstan.a and stanc.

These should be in <stan-home>/bin.

> I interpret a "home directory" as some type of
> installation would take place that produces a directory named "stan". that's apparently not how the process works.

This is done differently by different systems.
Some pull out just the name (as R does), some pull
out name/version and some just name-version.

We decided to go the name-version route.

This is a common strategy that's employed by
both Boost and Eigen (as well as all of the Java
Apache projects and Java itself).

> While not in the directions for Windows, I assume I need to perform an update to the PATH environment variable in order
> to compile a .stan file (when not in stan-src-1.0.2).

I'd suggest building from <stan-home> using the makefile.
Our makefile doesn't work from other paths (though we
hope to fix that in the future).

If you want to build without the makefile, then
yes, you need to put <stan-home>/bin/stanc on the PATH
to translate a .stan file to C++ using stanc.
And you also need to change the library paths or
create environment variables for those relative
to STAN_HOME.

A common approach is to define a variable
STAN_HOME that points to the directory containing
the makefile into which Stan unpacks.

Then add %STAN_HOME%\bin to the path in Windows.

You can also define an EIGEN_LIB variable as
%STAN_HOME%/lib/..., etc.

> It also bears noting that the new "print" command doesn't print to the R layer for those compiling and running from there.

This should work in Stan 1.0.2. Jiqiang just
plumbed in the R stdout and stderr.

How are you running R?

- Bob

Marcus Brubaker

unread,
Oct 1, 2012, 4:45:03 PM10/1/12
to stan-...@googlegroups.com
On Mon, Oct 1, 2012 at 4:33 PM, Bob Carpenter <ca...@alias-i.com> wrote:
> Thanks! Specific comments inline.
>
>
> On 10/1/12 3:25 PM, terrance savitsky wrote:
>>
>> Page 5 of the manual, under Building stan, the command suggested is %make
>> bin/stanc (rather than %make bin/stanc.exe).
>
>
> Fixed.

This is, of course, platform specific. Does it make sense to have a
generic make target (all is traditional, but anything really) which
chooses the right platform specific files?

I'm not sure how useful it is, but I've attached a really basic
makefile that someone could use to build a model outside of the stan
source directory. It's all very simple make syntax so it should be
straightforward for people to modify it for their uses.

Cheers,
Marcus
makefile

terrance savitsky

unread,
Oct 1, 2012, 4:49:01 PM10/1/12
to stan-...@googlegroups.com
Thanks for fixing so rapidly, Bob.  Some comments:

1. So stan_src_1.0.2 is my STAN-HOME directory.

2. When I want to compile a .stan file located in another directory, I interpret your comment as suggesting I copy the .stan file over to STAN-HOME.  Fair enough.

terrance.

Daniel Lee

unread,
Oct 1, 2012, 5:07:09 PM10/1/12
to stan-...@googlegroups.com
On Mon, Oct 1, 2012 at 4:45 PM, Marcus Brubaker
<marcus.a...@gmail.com> wrote:
> This is, of course, platform specific. Does it make sense to have a
> generic make target (all is traditional, but anything really) which
> chooses the right platform specific files?
>
> I'm not sure how useful it is, but I've attached a really basic
> makefile that someone could use to build a model outside of the stan
> source directory. It's all very simple make syntax so it should be
> straightforward for people to modify it for their uses.

Right now, if your current directory is the Stan home directory, you
can build a stan model in any (writable) location. If the stan file is
in foo/bar/model.stan, you can build the model by typing:

In Linux / mac:
make foo/bar/model

Windows:
make foo/bar/model.exe

When I have time, I'll incorporate some of what you're suggesting. We
already have it on our TO-DO list. I am reluctant to have users set
environment variables. That adds another step when installing /
updating Stan.

Ideally, when I do get to rewriting the makefile, I want to do as Bob
originally wrote it into the manual. That is, from the directory with
the stan model file, be able to type:
make -f <stan-home>/makefile model

That requires some makefile craziness to pull off (automatic variable
setting, determining the path of the makefile called, probably have to
deal with secondary targets, etc). I would prefer to do this without
having to depend on gnu make specific functions.

Hopefully we can do that sometime soon.



Daniel

Bob Carpenter

unread,
Oct 1, 2012, 5:22:36 PM10/1/12
to stan-...@googlegroups.com

On 10/1/12 4:49 PM, terrance savitsky wrote:
> Thanks for fixing so rapidly, Bob.

I should say that I haven't pushed the new web
page out yet and I'll wait for our next release
to push out a new manual.

> Some comments:
>
> 1. So stan_src_1.0.2 is my STAN-HOME directory.

Yes. The directory in which the makefile resides
is the STAN-HOME directory. There's nothing special
about that name, I just needed some way to refer to it.

> 2.When I want to compile a .stan file located in another directory, I interpret your comment as suggesting I copy the
> .stan file over to STAN-HOME. fair enough.

Daniel beat me to the punch with a much more
coherent answer here.

- Bob

terrance savitsky

unread,
Oct 1, 2012, 5:51:41 PM10/1/12
to stan-...@googlegroups.com
I have taken Bob's advice and successively whittled down the model to center on the elements that are causing the sampling errors.  I attach a much-reduced model - with nearly all of the computational efficiencies removed for quick readability.   Associated data are also attached.  This reduced model produces the same errors as we've previously encountered; in particular, the sampler is not able to find a sufficiently small step size.  I have occasionally also received the error message that a matrix is not symmetric.  I would sincerely appreciate your having one more look ...

The model employs 2 sets of multivariate ordinal responses, y1[N1,D] and y2[N2,D], one for each of two measurement modes.   These data are modeled in an ordinal logistic model with ordered cutpoints, gamma1[D,ncut] and gamma2[D,ncut], and a common set of intrinsic traits, matrix[D,2] Delta[T].   Each column represents a mode.   The Delta[t] are modeled as zero-mean matrix variate normal with density kernel (up to a proportionality constant) |S_m|^{-D/2} |S_d|^{-M/2} exp(-0.5 trace[ S_d^{-1} Delta[t] S_m^{-1} Delta[t]' ]).   Covariance matrix S_d is D x D and S_m is (M=2) x (M=2). Finally, S_d = Lambda * Lambda' + diag(the_del) is composed under a factor analytic structure.

Here's what I think is happening:   the sampler is unable to retain numerical stability for a covariance matrix constructed in a factor analytic manner, and the resulting computed covariance matrix, S_d, ends up violating symmetry and positive definiteness.   I already have informative N(0,1) priors on Lambda.  I tried an alternative that restricts the signs to positive, with no effect.  The 2 x 2 S_m covariance matrix receives an lkj prior.  I tried replacing that with an inv_wishart, with no change in results.
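In exact arithmetic the factor-analytic construction cannot violate symmetry or positive definiteness on its own: S_d = Lambda * Lambda' + diag(the_del) is symmetric by construction, and x' S_d x = |Lambda' x|^2 + sum_i the_del[i] * x[i]^2 > 0 whenever the diagonal entries are positive. A small check (a hypothetical Python sketch with made-up values, not the attached model) illustrates this, and shows how a single upstream NaN, rather than the algebra itself, would make the computed S_d look non-symmetric or non-positive-definite:

```python
# S = Lambda * Lambda' + diag(theta): symmetric by construction and
# positive definite whenever theta > 0. Lambda and theta here are
# made-up illustrative values, not the model's parameters.

def fa_cov(Lambda, theta):
    D, K = len(Lambda), len(Lambda[0])
    S = [[sum(Lambda[i][k] * Lambda[j][k] for k in range(K))
          for j in range(D)] for i in range(D)]
    for i in range(D):
        S[i][i] += theta[i]
    return S

def quad_form(S, x):
    D = len(x)
    return sum(x[i] * S[i][j] * x[j] for i in range(D) for j in range(D))

Lambda = [[1.0, 0.0], [0.5, 0.8], [-0.3, 1.2]]  # D=3, K=2
theta = [0.1, 0.2, 0.3]
S = fa_cov(Lambda, theta)

assert all(S[i][j] == S[j][i] for i in range(3) for j in range(3))  # symmetric
assert quad_form(S, [1.0, -2.0, 0.5]) > 0.0                         # PD check

# But one NaN in Lambda silently contaminates the covariance:
bad = fa_cov([[float("nan"), 0.0], [0.5, 0.8], [-0.3, 1.2]], theta)
assert bad[0][0] != bad[0][0]  # NaN is the only value unequal to itself
```

So a reported non-symmetric or non-positive-definite S_d would point at a NaN or overflow upstream in Lambda or the_del rather than at the factor-analytic form itself.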

If the problem is what I think it is, we're seeing a curse-of-dimensionality problem expressing itself as numerical instability.   Conjugacy restricts posterior objects to readily computed algebraic forms, which helps stability.   Similarly, sequential scan sampling - block-by-block - probably also produces a more numerically stable sampler than jointly sampling the entire parameter space in each Metropolis step.

Thanks for your help,

Terrance.

datstan_test.Rdump
modelFAcum_test2.stan

Bob Carpenter

Oct 1, 2012, 6:37:29 PM
to stan-...@googlegroups.com
I'll take a look when I get some time. Thanks
for trimming down the model.

We do have other users and other things to do, though!

Some comments below.

On 10/1/12 5:51 PM, terrance savitsky wrote:
> I have taken Bob's advice and successively whittled down the model to center around the elements that are causing the
> sampling errors. I attach a much-reduced model - with nearly all of the computational efficiencies removed for quick
> readability. Associated data are also attached. This reduced model produces the same errors as we've previously
> encountered; in particular, the sampler is not able to find a sufficiently small step size. I have occasionally also
> received the error message that a matrix is not symmetric. I would sincerely appreciate your having one more look ...
>
> The model employs 2 sets of multivariate ordinal responses, y1[N1,D] and y2[N2,D] for each of two measurement modes.
> These data are modeled in an ordinal logistic model with ordered cutpoints, gamma1[D,ncut], gamma2[D,ncut] and a common
> set of intrinsic traits, matrix[D,2] Delta[T]. Each column represents a mode. The Delta[T] are modeled as zero mean
> matrix variate normal with density kernel (up to a proportionality constant), |S_m|^{-D} |S_d|^{-M} exp(-0.5 trace[
> S_d^{-1} Delta[t] S_m^{-1} Delta[t]'). Covariance matrices S_d are D x D and S_m are (M=2) x (M=2). Finally, S_d =
> Lambda * Lambda' + diag(the_del), is composed under a factor analytic structure.
>
> Here's what I think is happening: the sampler is unable to retain numerical stability for a covariance matrix
> constructed in a factor analytic manner and the resulting computed covariance matrix, S_d, is violating symmetric,
> positive definiteness.

Symmetry should be under your direct control if
you construct the covariance matrix.

One option is to copy one side into the other:

for (m in 1:rows(S_d))
  for (n in (m+1):cols(S_d))
    S_d[m,n] <- S_d[n,m];

though I don't see how defining

S_d = Lambda * Lambda' + diag(the_del)

is going to break symmetry.
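
[Editor's note: for what it's worth, the triangle-copy loop above does guarantee bitwise symmetry. A quick NumPy sketch of the same idea - illustrative only, not Stan code:]

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5))
S = A @ A.T                      # symmetric up to floating-point roundoff

# Copy the lower triangle into the upper, as in the Stan loop above.
for m in range(S.shape[0]):
    for n in range(m + 1, S.shape[1]):
        S[m, n] = S[n, m]

assert np.array_equal(S, S.T)    # now exactly symmetric, bit for bit
```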


> I already have informative N(0,1) priors on Lambda. I tried an alternative to restrict the
> signs to positive, but no effect.

I don't know what you mean by "restrict". Just
adding truncation won't do it -- you need to declare
the variables as constrained. Also, by "no effect"
do you mean "didn't solve the problem" or "didn't
restrict to positive"?

> The 2 x 2 S_m covariance matrix receives an lkj prior. I tried replacing that with
> an inv_wishart with no change in results.
>
> If the problem is what I think it is, we're seeing a curse of dimensionality problem expressing itself in numerical
> instability. Conjugacy restricts posterior objects to readily computed algebraic forms which helps stability.
> Similarly, sequential scan sampling - block-by-block - probably also produces a more numerically stable sampler than
> jointly sampling the entire parameter space in each metropolis step.

My own speculation on why things break is almost always wrong.

We haven't run into any critical numerical issues we haven't
been able to solve yet, but then we haven't really hammered
on covariance matrices either.

I should change the test_grad to print out incrementally.
As is, it computes all the gradients then returns the answer.
At least this would let people know it's slow, not hung,
as well as providing some useful feedback.

- Bob

Bob Carpenter

Oct 1, 2012, 7:19:36 PM
to stan-...@googlegroups.com
I played around a bit with the new version and it's
indeed very unstable.

I can get it to run as follows:

/Users/carp/temp2/savitsky2/model --data=/Users/carp/temp2/savitsky2/data.dump.R --epsilon=0.01 --leapfrog_steps=10
--iter=100 --seed=10

(I renamed your model and data file).

By setting leapfrog_steps=10, I'm turning off the
NUTS adaptation. If I take that out, it takes many
more steps to turn around than 10. By setting --epsilon=0.01
I'm providing a reasonable step size (discovered by trial
and error).

Suspiciously, many of the parameters are 0, which should almost
never happen in a well-defined model. There are so many params
in this model (a single draw takes several dense pages to print) that
it's still very hard to debug. I would try to figure out what these
parameters are -- if they're things that aren't being used, that
can mess up HMC and NUTS, and also our adaptation.

If I try larger step sizes or to let NUTS adapt, positive definiteness
begins to fail due to zero entries on the diagonal of the covariance
matrices. This may have to do with all the zeroes I see in the
parameters.

I would suggest running as above and trying to track
down why some parameters are zero.

On our side, I need to figure out why that failure-to-adapt
message shows up so quickly. It doesn't look like that should
be happening from the code. I'll try to find some time to
look at this tomorrow.

Sorry this is so harrowing, but you have a really complex
model (I know you think it's relatively simple, but I'm counting
number of parameters, which are in the thousands, and model complexity,
which involves fiddling with covariance matrices).

I would again urge you to try to think about building up
from simpler models rather than down from something this
complex.

- Bob

On 10/1/12 5:51 PM, terrance savitsky wrote:
> https://groups.google.com/d/msg/stan-users/MYdrB11_S5g/lKVzTcPjRtwJ
> http://stackoverflow.com/questions/3615724/how-to-trace-nan-in-c
>
> that if you add
>
> #include <fenv.h>
>
> to the includes and add
>
> feenableexcept(-1);
>
> to main(), then in gdb you can do
>
> (gdb) handle SIGFPE stop nopass
> (gdb) run
>
> then it should stop when something goes NaN. However, when I do that, I get
>
> Exception: std::bad_alloc
> Diagnostic information:
> Dynamic exception type: std::bad_alloc
> std::exception::what: std::bad_alloc
>
> And when I do
>
> (gdb) catch throw
> (gdb) run
>
> then I don't make it to the bad_alloc and instead hit the check_nan() failure.
>
> So, something is very, very wrong, but I don't know what.
>
> I think I forgot to recompile fully before. Now when I do this, the floating point exception seems to show up in
> generate_uniform_real()
>
> goodrich@CYBERPOWERPC:/tmp/Terrance$ gdb --args ./modelFAcum_lkj --data=datstan.Rdump
> GNU gdb (GDB) 7.4.1-debian
> Copyright (C) 2012 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> $2 = 2147483562

terrance savitsky

Oct 1, 2012, 7:55:15 PM
to stan-...@googlegroups.com
Bob,  I'm really sorry for bugging you guys so much.   The reason I'm persisting here is that my hunch tells me the problem is not with the specification of my probability model.   I understand you will need to let this go relatively soon, however.

I think the problem is with this step:   S_d =  Lambda * Lambda' + diag(the_del).    My sense from stripping away everything is that the Lambda*Lambda' construction becomes numerically imprecise under joint sampling such that asymmetry is introduced under a relatively low magnitude diag(the_del).

Now, I can't simply hard-set S_d to be symmetric if I want to implement my model (e.g., a factor analytic structure).

When I tried to sample Lambda_s as truncated, I performed a restricted declaration to conform with the truncation.  I wanted to see if restricting the sign to be positive would matter.  It didn't.

Ben Goodrich

Oct 1, 2012, 8:43:30 PM
to stan-...@googlegroups.com
On Monday, October 1, 2012 6:37:25 PM UTC-4, Bob Carpenter wrote:
Symmetry should be under your direct control if
you construct the covariance matrix.

One option is to copy one side into the other:

   for (m in 1:rows(S_d))
     for (n in (m+1):cols(S_d))
        S_d[m,n] <- S_d[n,m];

though I don't see how defining

   S_d =  Lambda * Lambda' + diag(the_del)

is going to break symmetry.

It can happen numerically, but it probably means the check for symmetry doesn't have enough tolerance. Since Lambda is triangular, you could use the multiply_lower_tri_self_transpose(Lambda) function or (in git's master branch) tcrossprod(Lambda). Either of those should preserve symmetry.

Ben

Ben Goodrich

Oct 1, 2012, 9:00:43 PM
to stan-...@googlegroups.com
 
Never mind. multiply_lower_tri_self_transpose() seems to require a square matrix. But tcrossprod() from master works.

Ben

Ben Goodrich

Oct 1, 2012, 9:43:39 PM
to stan-...@googlegroups.com
 
With tcrossprod(), I can get it to run 2000 iterations eventually, but it doesn't move.

> samples <- read.csv("samples.csv", comment.char="#")
> unique(samples[,1])
[1] -130083
> nrow(unique(samples))
[1] 1

Ben


Bob Carpenter

Oct 2, 2012, 3:34:01 AM
to stan-...@googlegroups.com


On 10/1/12 7:55 PM, terrance savitsky wrote:
> Bob, I'm really sorry for bugging you guys so much. The reason I'm persisting here is that my hunch tells me the
> problem is not with specification of my probability model.

It's often hard to tell if it's the model or the implementation.
Often the model's conceptually right, but unimplementable in some
software, in which case you can pretty much only change the
model (or the software to implement it, of course).

> I understand you will need to let this go relatively soon,
> however.
>
> I think the problem is with this step: S_d = Lambda * Lambda' + diag(the_del). My sense from stripping away
> everything is that the Lambda*Lambda' construction becomes numerically imprecise under joint sampling such that
> asymmetry is introduced under a relatively low magnitude diag(the_del).

The asymmetry problem we should be able to solve.

For a start, how about just implementing the following assignment
to S_d in a loop:

Lambda <- Lambda_s .* B; // element-by-element multiplication to lower triangular
S_d <- Lambda * Lambda' + diag_matrix(the_del);

You'll be guaranteed symmetry that way if you just calculate
the entry S_d[m,n] then copy it to S_d[n,m] following the approach
in the C++ code for lower triangular self-multiply in:
src/stan/agrad/matrix.hpp.

> Now, I can't simply hard set S_d to be symmetric if I want to implement my model (e.g. a factor analytic structure).

I don't see why not. It should be OK for the derivatives
because of how everything would get constructed. You'd
lose some terms in the expression tree, but you'd have their
copies replace them.

As Ben pointed out, we're going to implement some triangular
self-multiplies. I should generalize the multiply-self-transpose
function to non-square matrices -- I was only thinking about
Cholesky factors when I implemented it the first time.

> When I tried to sample Lambda_s as truncated, I performed a restricted declaration to conform with the truncation. I
> wanted to see if restricting the sign to be positive would matter. It didn't.

OK. Good. (Well not good, but the right thing
to do.)

- Bob

Marcus Brubaker

Oct 2, 2012, 11:17:12 AM
to stan-...@googlegroups.com
To symmetrize a matrix it is generally best/easiest to do:

S_d <- 0.5*(S_d + S_d')

That said, the expressions being used to form S_d should always result
in a symmetric matrix. However, the symmetry test could be failing if
some of the entries of S_d were not finite.
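
[Editor's note: the averaging form above is exact in floating point, since a_ij + a_ji and a_ji + a_ij round identically. A quick NumPy sketch - illustrative only, with an artificially injected asymmetry:]

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
S = A @ A.T + 4.0 * np.eye(4)
S[0, 1] += 1e-13                 # inject a tiny floating-point asymmetry

S_sym = 0.5 * (S + S.T)          # the symmetrization suggested above

# (S + S')[i,j] = s_ij + s_ji = (S + S')[j,i] exactly, so:
assert np.array_equal(S_sym, S_sym.T)
```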

I'll try to take a closer look at this later if I have a moment.

Cheers,
Marcus

terrance savitsky

Oct 2, 2012, 5:12:10 PM
to stan-...@googlegroups.com
I'd appreciate one more pass with your input.   I'd previously sent you a simplified version of my model that continued to demonstrate the same runtime errors.   I'd suspected the factor analytic construction for one of the covariance matrices was the culprit.  So I completely removed it from this attached, even more simplified version, which replaces the factor analytic covariance composition with an lkj prior on the correlation matrix.   I *still* get runtime errors about lack of symmetry.  A typical error message reads:

Error in function stan_fit198c4de07c3f_modelSimpleOrd_test_namespace::write_array(d): y is not symmetric. y[0,1] is -320905.95642820111, but y[1,0] element is -320906

It now (unlike earlier) seems just outside of tolerance, so I then hard-restrict the covariances to be symmetric and receive an error that no acceptably small stepsize may be found.  I've tried more informative prior formulations for everything and nothing changes.  (I thought maybe the model was under-powered and the relatively broad priors might produce a nearly improper posterior, but this doesn't seem to be the case).

The only unusual remaining facet of my model is employment of the matrix-variate normal prior for D x 2 parameter matrices, Delta[t], t = 1,...,T, which I employ to update lp__.   I use it all the time in my own model coding, but in a conjugate fashion.   I will later replace the matrix-variate construction with a multivariate normal prior by vectorizing Delta and hand-cranking a Kronecker product of the 2 covariances.   That said, these two formulations should produce the exact same increments to lp__.

terrance.
datstan_test.Rdump
modelSimpleOrd_test.stan

Marcus Brubaker

Oct 2, 2012, 7:08:32 PM
to stan-...@googlegroups.com
I've tracked the issue down. There are two issues at play.

One is that there appears to be a bug with the step size selection code.
I'm not sure what's going on exactly, but turning it off (setting a
fixed step size with epsilon=0.0001 for instance) allows sampling to
proceed. (I'm not sure if this is exposed through the R interface or
not.) Matt, do you have time to check what's happening with the
stepsize adaptation in this case? The fixed epsilon works just fine.

The other issue is that inverse is not certain to return a symmetric
matrix when called on a symmetric matrix. As a result Sd_inv (and
maybe Sm_inv) would become asymmetric after a point due to numerical
issues. This is a numerical precision issue which could be worked
around explicitly if you wanted, but the easy fix is to remove the
declarations of Sd_inv and Sm_inv from the transformed parameters
block and put them in the model block as local variables. There may
also be an issue with symmetry testing not selecting thresholds
appropriately, but the fact that this is being hit likely reflects that
the covariance matrices S_d (and/or S_m) are becoming singular.

Hope this helps.

As an aside, Bob, how hard would it be to catch transformation
exceptions when they're thrown in the model code and report the
variable name they arose on? Alternatively, could we propagate the
model level expressions and/or variable names down through the calls
in case exceptions are thrown? The hardest part of debugging this was
figuring out exactly which matrix was failing the symmetry test which
required the use of gdb.

Cheers,
Marcus

terrance savitsky

Oct 2, 2012, 7:36:47 PM
to stan-...@googlegroups.com
Very helpful, Marcus.  Thanks much.   My grand opus monster model does define the inverses as local (unrestricted) parameter matrices.   I purposely moved them into the transformed parameters block to trip an error message if symmetry were violated.   Once I move them back into the model block (in my reduced model), the asymmetry error is replaced by the sampler indicating it is not able to find a sufficiently small step size to gain acceptance of the proposal.   I believe you address this latter point in your comment on the step size selection code.

All said, what do you think causes the numerical imprecision in the inverse computations?   The two covariance matrices in my model are relatively low dimension - one is 10 x 10 and the other, 2 x 2.   Is there some drift introduced in the repeated gradient computations in the automated differentiator?  I've built Gaussian process models before, where it is standard to add a diagonal "nugget" to stabilize inversion under high dimensions, but that doesn't seem to be the case here.   (I had also used a simpler version of HMC based on analytic differentiation).

It would be super helpful to have variable names in error messages.  That said, the "print" statement will help, though that is not yet implemented to propagate print outputs to the R layer.

terrance.

Ben Goodrich

Oct 2, 2012, 7:59:25 PM
to stan-...@googlegroups.com
On Tuesday, October 2, 2012 7:08:34 PM UTC-4, Marcus wrote:
One is that there appears to be a bug with the step size selection code.
I'm not sure what's going on exactly, but turning it off (setting a
fixed step size with epsilon=0.0001 for instance) allows sampling to
proceed.  (I'm not sure if this is exposed through the R interface or
not.)  Matt, do you have time to check what's happening with the
stepsize adaptation in this case?  The fixed epsilon works just fine.

Those things can be set from R by passing them through the ... in the stan() function.

Did you get the sampler to actually move? When I fixed the symmetry thing and ran it with a fixed epsilon and fixed number of leapfrog steps, the sampler proceeded, but never moved.

Ben

Bob Carpenter

Oct 2, 2012, 8:17:09 PM
to stan-...@googlegroups.com


On 10/2/12 7:36 PM, terrance savitsky wrote:
> very helpful, Marcus. Thanks much. My grand opus monster model does define the inverses as local (unrestricted)
> parameter matrices. I purposely moved them into the transformed parameters block to trip an error message if symmetry
> were violated. Once I move them back into the models block (in my reduced model), the asymmetry error is replaced by
> the sampler indicating it is not able to find a sufficiently small step size to gain acceptance of the proposal.

That sounds like we're not testing symmetry everywhere
we should be.

> I
> believe you address this latter point in your comment on step size selection code.
>
> All said, what do you think causes the numerical imprecision in the inverse computations?

Are we sure that's what it is?

> The two covariance matrices
> in my model are relatively low dimension - one is 10 x 10 and the other, 2 x 2. Is there some drift introduced in the
> repeated gradient computations in the automated differentiator?

There could be, but so far we haven't seen it in any
of our tests. We're moving toward implementing gradients
by hand.

Anything you can do to avoid actually doing an inverse
might help. We have the MATLAB-style left and right
residual operators, which you can often use instead, or
you can sometimes express models in terms of Cholesky
factors.

> I've built gaussian process models, before, where it is
> standard to add a diagonal "nugget" to stabilize inversion under high dimensions, but that doesn't seem the case here.
> (I had also used a simpler version of HMC based on analytic differentiation).



> It would be super helpful to have variable names in error messages.

It would be, but it's not trivial to implement because
the errors arise in functions with their own local variables. We'd
need to change all of our distributions to look like this:

normal_log(T1 y, T2 mu, T3 sigma,
const std::string& y_name, const std::string& mu_name, ...);

in order to have them be accessible. We may do that, because
I agree it'd be very helpful.

> That said, the "print" statement will help, though

That's what it's there for. It was way easier to implement
than passing argument names everywhere.

> that is not yet implemented to propagate print outputs to the R layer.

Actually, it IS implemented to print output to R.
Are you using RStan 1.0.2? If it's not working for you,
there's some difference in your setup and everyone else's.
We'd like to diagnose these issues too.

Are you using RStudio, command-line or R GUI or something
else? I don't know how RStudio deals with stdin and stdout.
And which OS?

- Bob

Marcus Brubaker

Oct 2, 2012, 8:33:21 PM
to stan-...@googlegroups.com
I left the NUTS criterion on and fixed the stepsize and it certainly
moved in terms of log prob. I didn't look at parameter values closely
though.

Cheers,
Marcus

Marcus Brubaker

Oct 2, 2012, 8:40:35 PM
to stan-...@googlegroups.com
On Tue, Oct 2, 2012 at 7:36 PM, terrance savitsky <tds...@gmail.com> wrote:
> very helpful, Marcus. Thanks much. My grand opus monster model does
> define the inverses as local (unrestricted) parameter matrices. I
> purposely moved them into the transformed parameters block to trip an error
> message if symmetry were violated. Once I move them back into the models
> block (in my reduced model), the asymmetry error is replaced by the sampler
> indicating it is not able to find a sufficiently small step size to gain
> acceptance of the proposal. I believe you address this latter point in
> your comment on step size selection code.
>
> All said, what do you think causes the numerical imprecision in the inverse
> computations? The two covariance matrices in my model are relatively low
> dimension - one is 10 x 10 and the other, 2 x 2. Is there some drift
> introduced in the repeated gradient computations in the automated
> differentiator? I've built gaussian process models, before, where it is
> standard to add a diagonal "nugget" to stabilize inversion under high
> dimensions, but that doesn't seem the case here. (I had also used a
> simpler version of HMC based on analytic differentiation).

It shouldn't really have anything to do with the size of the matrix.
Rather, if I had to guess, the values of s_m or s_d were getting very
small. If they become zero, the resulting covariance matrices are
singular but nearby the inverse matrix will be unstable. Since
inverse() doesn't know that the matrix is symmetric, it's not able to
ensure that the resulting matrix is also symmetric. As Bob noted, you
could compute the inverse using the cholesky decomposition which is
numerically more stable. Even better though is to avoid computing the
inverse unless you absolutely must. Again, as Bob mentioned, there are
functions which avoid the inverse and are much more numerically
stable. mdivide_left(A,b) and mdivide_right(b,A) will give you A^-1
*b and b*A^-1 respectively. See also mdivide_left_tri if you're using
the cholesky factors. I'm not sure if they're exposed to the
modelling language yet. Let me know if you need them and they're not
and I can maybe get that cleaned up.
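
[Editor's note: in NumPy terms, these functions correspond to linear solves rather than explicit inverses. A sketch of the equivalence described above - illustrative only; np.linalg.solve plays the role of the mdivide functions:]

```python
import numpy as np

rng = np.random.default_rng(3)
A0 = rng.normal(size=(4, 4))
A = A0 @ A0.T + 4.0 * np.eye(4)     # well-conditioned SPD matrix
b = rng.normal(size=(4, 3))

# mdivide_left(A, b) is A^-1 * b, computed via a solve, no explicit inverse
left = np.linalg.solve(A, b)

# mdivide_right(c, A) is c * A^-1; for c = b' this is a transposed solve
right = np.linalg.solve(A.T, b).T   # (A^-T b)' = b' A^-1

assert np.allclose(left, np.linalg.inv(A) @ b)
assert np.allclose(right, b.T @ np.linalg.inv(A))
```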

> It would be super helpful to have variable names in error messages. That
> said, the "print" statement will help, though that is not yet implemented to
> propagate print outputs to the R layer.

As noted, print should be there in the latest version.

Cheers,
Marcus

Marcus Brubaker

Oct 2, 2012, 8:53:48 PM
to stan-...@googlegroups.com
On Tue, Oct 2, 2012 at 8:17 PM, Bob Carpenter <ca...@alias-i.com> wrote:
>
>
> On 10/2/12 7:36 PM, terrance savitsky wrote:
>>
>> very helpful, Marcus. Thanks much. My grand opus monster model does
>> define the inverses as local (unrestricted)
>> parameter matrices. I purposely moved them into the transformed
>> parameters block to trip an error message if symmetry
>> were violated. Once I move them back into the models block (in my
>> reduced model), the asymmetry error is replaced by
>> the sampler indicating it is not able to find a sufficiently small step
>> size to gain acceptance of the proposal.
>
>
> That sounds like we're not testing symmetry everywhere
> we should be.

We're not testing it when transformed parameters get set, I expect, but
that seems reasonable to me.

>
>> I
>>
>> believe you address this latter point in your comment on step size
>> selection code.
>>
>> All said, what do you think causes the numerical imprecision in the
>> inverse computations?
>
>
> Are we sure that's what it is?

I'm reasonably certain. It's not hard to find matrices which are
numerically unstable for inverses. In particular, in this model it is
very easy for the covariance matrix to become singular because s_m and
s_d are unconstrained. Ideally they should be positive. The only
other thing I could figure would be if there was a bug in the inverse
code, but that would imply a bug in Eigen which seems highly unlikely.


>> It would be super helpful to have variable names in error messages.
>
>
> It would be, but it's not trivial to implement because
> the errors arise in functions with their own local variables. We'd
> need to change all of our distributions to look like this:
>
> normal_log(T1 y, T2 mu, T3 sigma,
> const std::string& y_name, const std::string& mu_name, ...);
>
> in order to have them be accessible. We may do that, because
> I agree it'd be very helpful.

For exceptions in the model this would be problematic. However,
parameters which fail their constraint checks in write_data/write_csv
could be handled more easily. Actually, I just looked and it's already
kind of implemented but the check_cov_matrix function doesn't pass
along the variable name and just substitutes "y". I just pushed a fix
for this for symmetry and positive definiteness but fixing the size
checks is more involved.

That said, I was thinking that more generally when the constraint
check code is generated it could be wrapped in a try {} catch {} block
which would add the variable name to the error message avoiding the
need to propagate things down at all.

Cheers,
Marcus

terrance savitsky

Oct 2, 2012, 9:40:15 PM
to stan-...@googlegroups.com
> It shouldn't really have anything to do with the size of the matrix.

My experience is that covariance matrices tend to become nearly singular in high dimensions due to numerical instability, as it's hard to maintain SPD in high dimensions.   That's why a multivariate Gaussian with an unstructured covariance (not generated from lower dimensional constructs) becomes impractical in high dimensions.  Performing lower-dimensional projections and employing a nugget is common in the Gaussian process literature.

> Rather, if I had to guess, the values of s_m or s_d were getting very
> small.

Maybe this is the problem.  I did impose more informative priors on these objects, but with no improvement.

> mdivide_left(A,b) and mdivide_right(b,A) will give you A^-1
> *b and b*A^-1 respectively.  Let me know if you need them and they're not
> and I can maybe get that cleaned up.

I didn't find these functions in the Stan manual.  I've used just such constructions (Gaussian elimination with a Cholesky decomposition to compute a quadratic product) - again in Gaussian process models.  In my original Stan script (before I extracted a simpler version), I avoided direct computation of a D x D inverse by using an idea from Ben to project to a lower-dimensional K x K matrix, where in my case K = 3.   I had the same issue where the sampler couldn't find an acceptably small stepsize.

Big picture, your comment about singularity causing the stability problems may be quite correct.  Yet the root cause of the singularity may not have anything to do with the inversion method and its stability properties.  Unless there's something unusual going on behind the scenes in the automated differentiator, computing an inversion of a 3 x 3 SPD matrix in each sampling iteration wouldn't seem like it should be a problem.   When I take Marcus's tip on constraining the matrices to be symmetric, the symmetry errors - not surprisingly - go away.   They are replaced by the sampler being unable to find a sufficiently small stepsize.   It seems possible that there's some other sampling problem going on that creates the singularity, which is the symptom, but not the cause, of the problem.


Marcus Brubaker

Oct 2, 2012, 10:58:56 PM
to stan-...@googlegroups.com
On Tue, Oct 2, 2012 at 9:40 PM, terrance savitsky <tds...@gmail.com> wrote:
>> It shouldn't really have anything to do with the size of the matrix.
>
> my experience is that covariance matrices tend to become nearly singular in
> high dimensions due to numerical instability as it's hard to maintain SPD in
> high dimensions. that's why a multivariate Gaussian with an unstructured
> covariance (not generated from lower dimensional constructs) becomes
> impractical under high dimensions. Performing lower-dimensional projections
> and employing a nugget is common in the gaussian process literature.
>
>> Rather, if I had to guess, the values of s_m or s_d were getting very
> small.
>
> Maybe this is the problem. I did impose more informative priors on these
> objects, but with no improvement.

You don't need informative priors and certainly some informative
priors will not help. (E.G., N(0,s) for small s will make matters
worse.) Currently your parametrization admits singular covariance
matrices and makes near singular matrices easy to reach. Constraining
the elements of s_m and s_d to be positive will fix this.
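
[Editor's note: the point above is easy to see numerically. With a rank-deficient Lambda * Lambda', the diagonal terms are all that keeps S_d invertible, so letting them approach zero sends the condition number through the roof. A made-up NumPy sketch, not the actual model:]

```python
import numpy as np

# A D x 2 loading matrix: Lambda * Lambda' has rank 2, so for D > 2 the
# diagonal term diag(the_del) is all that keeps S_d non-singular.
Lambda = np.array([[1.0,  0.0],
                   [0.8,  1.2],
                   [0.5, -0.7],
                   [0.3,  0.9]])

def S_d(the_del):
    return Lambda @ Lambda.T + np.diag(the_del)

well = S_d(np.full(4, 1.0))      # positive diagonal: well conditioned
near = S_d(np.full(4, 1e-12))    # diagonal collapsing to zero: near singular

assert np.linalg.cond(near) > 1e6 * np.linalg.cond(well)
```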

>
>> mdivide_left(A,b) and mdivide_right(b,A) will give you A^-1
> *b and b*A^-1 respectively. Let me know if you need them and they're not
>
> and I can maybe get that cleaned up.
>
> I didn't find these functions in the stan manual. I've used just such
> constructions (gaussian elimination with a cholesky decomposition to compute
> a quadratic product) - again in gaussian process models. In my original
> stan script (before I extracted a simpler version), I avoided direct
> computation of a D x D inverse by using an idea from Ben to project to a
> lower-dimensional K x K matrix, where in my case, K = 3. I had the same
> issue where the sampler couldn't find an acceptably small stepsize.
>
> Big picture, your comment about singularity causing the stability problems
> may be quite correct. Yet, the root cause of the singularity may not have
> anything to do with the inversion method and its stability properties.
> Unless there's something unusual going on behind the scenes in the automated
> differentiator, computing an inversion in each sampling iteration of a 3 x 3
> SPD matrix wouldn't seem like it should be a problem. When I take Marcus'
> tip on constraining the matrices to be symmetric, the symmetry errors - not
> surprisingly - go away. They are replaced by the sampler being unable to
> find a sufficiently small stepsize. It seems possible that there's some
> other sampling problem going on that creates singularity, which is the
> symptom, but not the cause of the problem.

The symmetry errors also go away (for me) when you just don't worry
about symmetry of the inverses. Symmetry of the constructed
covariances is fine on my end. You don't need to enforce it
explicitly.

There is definitely a sampler issue going on as it should be able to
find an appropriate stepsize but is giving up too early. See my note
about using a fixed step size as an intermediate solution. This
sampler problem will be harder to track down and may not have an
immediate fix. You could play with the parameters of the Nesterov
dual averaging a bit (--delta and --gamma are the relevant parameter
names) but you should probably read up on the algorithm to have an
idea of what they do.

Cheers,
Marcus

Bob Carpenter

Oct 3, 2012, 1:49:03 PM
to stan-...@googlegroups.com


On 10/2/12 10:58 PM, Marcus Brubaker wrote:

> There is definitely a sampler issue going on as it should be able to
> find an appropriate stepsize but is giving up too early. See my note
> about using a fixed step size as an intermediate solution. This
> sampler problem will be harder to track down and may not have an
> immediate fix. You could play with the parameters of the nesterov
> dual averaging a bit (--delta and --gamma are the relevant parameter
> names) but you should probably read up on the algorithm to have an
> idea of what they do.

The key thing that I didn't realize until Matt explained
in a meeting was that the target acceptance rate is for
steps in the final half of the tree generated by NUTS.

All you need in order to move is for one of the steps to be
accepted. So the relation between the target rejection rate and
duplicated samples (overall rejection in NUTS at an
iteration) will depend on the size of the tree (i.e., the
number of leapfrog steps).

- Bob

Ben Goodrich

Oct 3, 2012, 2:07:57 PM
to stan-...@googlegroups.com

Has anyone gotten movement? When using tcrossprod(Lambda) in master to fix the symmetry issue, --epsilon=0.01 (or something) and --leapfrog_steps=10, I got no errors but no movement.

Ben
 

Bob Carpenter

Oct 3, 2012, 2:14:33 PM
to stan-...@googlegroups.com, stan...@googlegroups.com


On 10/2/12 8:40 PM, Marcus Brubaker wrote:
> ... mdivide_left(A,b) and mdivide_right(b,A) will give you A^-1
> *b and b*A^-1 respectively. See also mdivide_left_tri if you're using
> the cholesky factors. I'm not sure if they're exposed to the
> modelling language yet. Let me know if you need them and they're not
> and I can maybe get that cleaned up.

We used Matlab notation,

mdivide_left(A,b) is (A \ b)

mdivide_right(b,A) is (b / A)
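
The same left/right division distinction can be sketched in NumPy (an analogy, not Stan code): `np.linalg.solve` performs the division without ever forming the explicit inverse, which is the point of providing these functions.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) + 3.0 * np.eye(3)  # a well-conditioned square matrix
b = rng.standard_normal(3)

# mdivide_left(A, b), i.e. (A \ b): solve A x = b
x_left = np.linalg.solve(A, b)

# mdivide_right(b, A), i.e. (b / A): solve x A = b via the transposed system
x_right = np.linalg.solve(A.T, b)

# Same results as the explicit-inverse forms, with better numerical behavior
assert np.allclose(x_left, np.linalg.inv(A) @ b)
assert np.allclose(x_right, b @ np.linalg.inv(A))
```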

We didn't add mdivide_left_tri yet, but I'll add that
to the function_signatures and doc as soon as I finish
this e-mail.

- Bob

Marcus Brubaker

Oct 3, 2012, 2:32:52 PM
to stan-...@googlegroups.com

I got movement with epsilon=0.0001 and leapfrog_steps=-1 (i.e., with NUTS).  The reported tree depth was usually around 2 or 3, though, meaning most trajectories were much shorter than 10.  If it's not using a window acceptance function for a fixed leapfrog_steps, that would explain why you're not seeing movement in that case.

Cheers,
Marcus

Matt Hoffman

Oct 3, 2012, 3:01:45 PM
to stan-...@googlegroups.com
Sorry for taking so long to weigh in; I've been ignoring my email lately.

The step size adaptation assumes that the accuracy of the Hamiltonian
simulation can always be increased to an acceptable range by
decreasing the step size. If this assumption is violated, then the
adaptation absolutely will not converge. Models with reasonably smooth
posteriors pretty much always satisfy this assumption. Numerical
problems (e.g. from trying to invert near-singular matrices) and
domain issues (e.g. from failing to constrain non-negative variables
or positive-definite matrices) can cause it to be violated.
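
As a toy illustration of the numerical-problems point (made-up numbers, nothing from the actual model): as a covariance matrix approaches singularity, the quadratic form appearing in a Gaussian log density diverges, and with it the gradients, so no step size keeps the simulated Hamiltonian accurate.

```python
import numpy as np

def quad_form(rho):
    """x' S^-1 x for a 2x2 correlation matrix S as rho -> 1."""
    S = np.array([[1.0, rho], [rho, 1.0]])
    x = np.array([1.0, -1.0])
    return x @ np.linalg.inv(S) @ x

# Analytically this equals 2 / (1 - rho): it diverges as S becomes
# singular, no matter how the inverse is computed
for rho in (0.9, 0.999, 0.999999):
    print(rho, quad_form(rho))
```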

My guess is that the adaptation failures will go away if the domain
issues (s_m and s_d being unconstrained) and the numerical issues
(using matrix inversion instead of mdivide) people have brought up are
resolved.

Matt

On Tue, Oct 2, 2012 at 7:58 PM, Marcus Brubaker

Bob Carpenter

Oct 3, 2012, 3:04:35 PM
to stan-...@googlegroups.com


On 10/3/12 2:32 PM, Marcus Brubaker wrote:
> On Wed, Oct 3, 2012 at 2:07 PM, Ben Goodrich <goodri...@gmail.com> wrote:

> Has anyone gotten movement? When using tcrossprod(Lambda) in master to fix the symmetry issue, --epsilon=0.01 (or
> something) and --leapfrog_steps=10, I got no errors but no movement.

I found it moved with epsilon=0.0001 and any number of
leapfrog steps (I tried 10 and 100). Depending on the init,
it would also move with higher values of epsilon.

> I got movement with epsilon=0.0001 and leapfrog_steps=-1 (ie, with NUTS). The reported tree depth was usually around 2
> or 3 though meaning most trajectories were much shorter than 10. If it's not using a window acceptance function for a
> fixed leapfrog_steps then that would explain why you're not seeing movement in that case.

Right -- basic HMC is NOT using a windowed acceptance.
We could change it so that it does. The code's
pretty modular around HMC. (Unlike the parser/generator.)

- Bob



Matt Hoffman

Oct 3, 2012, 3:16:08 PM
to stan-...@googlegroups.com
I'll reiterate that I think the step size is a red herring (or
possibly a canary, but definitely some kind of metaphorical animal).
My guess is that the simulation is running off the rails quickly,
which is causing the adaptation to drive the step size to 0. In that
scenario, using small fixed step sizes allows for some movement, but
not nearly enough to get decent inferences.

Matt

Bob Carpenter

Oct 3, 2012, 3:21:30 PM
to stan-...@googlegroups.com


On 10/3/12 3:16 PM, Matt Hoffman wrote:
> I'll reiterate that I think the step size is a red herring (or
> possibly a canary, but definitely some kind of metaphorical animal).
> My guess is that the simulation is running off the rails quickly,
> which is causing the adaptation to drive the step size to 0. In that
> scenario, using small fixed step sizes allows for some movement, but
> not nearly enough to get decent inferences.

Agreed. I was just trying to make sure it wasn't
a bug in the basic operations.

- Bob

terrance savitsky

Oct 3, 2012, 7:22:05 PM
to stan-...@googlegroups.com
I had tried many of the steps you've suggested on my full model, but have now also tried your suggestions on the simplified model (which still produces the same errors as the full model).  In short, the very same runtime errors persist.  Here's some of what I did:

1. I restricted the scale parameters (s_d, s_m) to be positive reals when composing the covariance matrices (S_d, S_m) from the correlation matrices (R_d, R_m), which receive LKJ priors.

1.5.  I placed informative inv_gamma priors on scale parameters, s_d and s_m.

2. I replaced the above formulation with inverse Wishart priors placed directly on the covariance matrices.  Part of my reasoning was that strongly smoothing priors might do more (in a weak sense) to ensure smoothness of the posterior and help me get around the stepsize selection problem (even in a situation where conjugacy isn't used for sampling).  I recognize it shouldn't matter under a proper probability model, so long as my model is identified, but I decided to try anyway.  As noted, it didn't help.

3. Next, I added back one piece of complexity by building the D x D (for D = 10) covariance matrix, S_d, in a factor analytic / sparse construction with K = 3 factors.  Ben pointed out that the Sherman-Morrison-Woodbury matrix identity may be applied to recover the inverse of the D x D matrix, S_d, by inverting a lower-dimensional K x K matrix.  Since the inverse of the K x K matrix is used in a quadratic form with the D x K (loadings) matrix, I first performed the Cholesky decomposition of the K x K matrix, inverted it, multiplied it by the transpose of the D x K loadings matrix, and then took the quadratic product.  The other covariance matrix, S_m, is 2 x 2, and I just compute the direct inverse.
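
The identity in question can be checked numerically.  This NumPy sketch (with arbitrary made-up loadings and residual variances, not the actual model's parameters) inverts the D x D factor-analytic covariance using only a K x K solve:

```python
import numpy as np

rng = np.random.default_rng(1)
D, K = 10, 3
Lambda = rng.standard_normal((D, K))            # D x K loadings
theta = rng.uniform(0.5, 1.5, size=D)           # diagonal of Theta
S_d = Lambda @ Lambda.T + np.diag(theta)        # factor-analytic covariance

# Sherman-Morrison-Woodbury:
# (Theta + L L')^-1 = Th^-1 - Th^-1 L (I_K + L' Th^-1 L)^-1 L' Th^-1
Th_inv = np.diag(1.0 / theta)                   # cheap: Theta is diagonal
M = np.eye(K) + Lambda.T @ Th_inv @ Lambda      # only a K x K system to solve
S_d_inv = Th_inv - Th_inv @ Lambda @ np.linalg.solve(M, Lambda.T @ Th_inv)

# Agrees with the direct D x D inverse
assert np.allclose(S_d_inv, np.linalg.inv(S_d))
```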

You've noted that the sampling goes off the rails pretty quickly.  It's counter-intuitive that inverting a 3 x 3 lower triangular matrix and a 2 x 2 covariance matrix should cause such rapid failure.  Even if the inverse computation were numerically unstable, the error would likely accrue over iterations, yet the sampler appears to get stuck right away.  Of course, I really have no idea; maybe there's something else incorrectly specified in my script that I'm not seeing.

I've scripted a version of this model in JAGS with an unstructured (more general) D*M x D*M covariance matrix, rather than one built from the Kronecker product of lower-order D x D and M x M covariances.  The model is entirely conjugate (as I also employ a latent response for the ordered outcomes), so JAGS handles it without issue.  I have a nagging doubt, however, about this data set, because the signal is relatively weak, so the prior overly influences the posterior.  That's why I wanted to employ a more parsimonious, structured covariance formulation in Stan.  I don't think a lack of signal would ordinarily cause Stan to fail under smooth priors - e.g. a proper probability model; it should just return the prior, but maybe that's not the case.

I have other data under this project with a very strong signal for which I've performed similar modeling in JAGS.  When I have time, I will write a Stan script for those data (the model would be less complex, though still heavily parameterized) and see if I achieve a better outcome.  I confess I'm wondering whether HMC is able to handle joint sampling of the parameter space under a heavily parameterized (though identified and well-powered) model formulation.  I've always used HMC with analytic differentiation targeted to a subset of parameters inside a larger sequential-scan sampling scheme.
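
The structured formulation referred to here rests on the matrix-normal fact that the vec of a D x M effect with row covariance S_d and column covariance S_m has covariance kron(S_m, S_d).  A quick NumPy sketch, with arbitrary example matrices standing in for S_d and S_m, shows the construction and the parameter savings over the unstructured version:

```python
import numpy as np

rng = np.random.default_rng(2)
D, M = 10, 2  # dimensions from the model: D traits, M measurement modes

# Arbitrary SPD matrices standing in for S_d and S_m
A = rng.standard_normal((D, D)); S_d = A @ A.T + D * np.eye(D)
B = rng.standard_normal((M, M)); S_m = B @ B.T + M * np.eye(M)

# Covariance of the vectorized effects: Kronecker product, still SPD
S = np.kron(S_m, S_d)
assert S.shape == (D * M, D * M)
assert np.all(np.linalg.eigvalsh(S) > 0)

# Free covariance parameters: unstructured vs Kronecker-structured
# (one subtracted because a scale can be traded between S_d and S_m)
n_unstructured = (D * M) * (D * M + 1) // 2
n_structured = D * (D + 1) // 2 + M * (M + 1) // 2 - 1
print(n_unstructured, n_structured)
```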

I would welcome any other suggestions, if any occur to you.

Thanks for your time and help.  Terrance.