beta regression model parses but does not compile

262 views
Skip to first unread message

Simon Blomberg

unread,
Dec 5, 2012, 9:34:00 PM12/5/12
to stan-...@googlegroups.com
Hi,

I'm trying to implement a beta regression model in Stan. The model parses correctly, but does not compile. My C++ compiler is: g++ (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2. I'm not (currently) using rstan. The compiler message is below.

g++ -I src -I lib/eigen_3.1.1 -I lib/boost_1.51.0 -O3 -Wall   -c -o src/models/myModels/rodents.o src/models/myModels/rodents.cpp
In file included from lib/eigen_3.1.1/Eigen/Core:286:0,
                 from lib/eigen_3.1.1/Eigen/Dense:1,
                 from src/stan/math/matrix.hpp:11,
                 from src/stan/meta/traits.hpp:7,
                 from src/stan/math/error_handling.hpp:14,
                 from src/stan/agrad/error_handling.hpp:4,
                 from src/stan/agrad/special_functions.hpp:7,
                 from src/stan/model/model_header.hpp:17:
lib/eigen_3.1.1/Eigen/src/Core/Matrix.h: In instantiation of ‘Eigen::Matrix<_Scalar, _Rows, _Cols, _Options, _MaxRows, _MaxCols>::Matrix(const Eigen::MatrixBase<OtherDerived>&) [with OtherDerived = Eigen::MatrixWrapper<const Eigen::CwiseUnaryOp<Eigen::internal::scalar_add_op<stan::agrad::var>, const Eigen::CwiseUnaryOp<Eigen::internal::scalar_opposite_op<stan::agrad::var>, const Eigen::ArrayWrapper<Eigen::Matrix<stan::agrad::var, -1, 1, 0, -1, 1> > > > >; _Scalar = double; int _Rows = -1; int _Cols = 1; int _Options = 0; int _MaxRows = -1; int _MaxCols = 1]’:
src/stan/agrad/matrix.hpp:1101:53:   required from ‘Eigen::Matrix<typename boost::math::tools::promote_args<T1, T2>::type, R, C> stan::agrad::subtract(const T1&, const Eigen::Matrix<T2, R, C>&) [with T1 = int; T2 = double; int R = -1; int C = 1; typename boost::math::tools::promote_args<T1, T2>::type = double]’
src/models/myModels/rodents.cpp:288:44:   required from here
lib/eigen_3.1.1/Eigen/src/Core/Matrix.h:277:7: error: ‘YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY’ is not a member of ‘Eigen::internal::static_assertion<false>’
In file included from lib/boost_1.51.0/boost/random/detail/large_arithmetic.hpp:19:0,
                 from lib/boost_1.51.0/boost/random/detail/const_mod.hpp:23,
                 from lib/boost_1.51.0/boost/random/linear_congruential.hpp:30,
                 from lib/boost_1.51.0/boost/random/additive_combine.hpp:27,
                 from src/stan/gm/command.hpp:13,
                 from src/stan/model/model_header.hpp:20:
lib/boost_1.51.0/boost/random/detail/integer_log2.hpp:71:35: warning: always_inline function might not be inlinable [-Wattributes]
In file included from src/stan/agrad/special_functions.hpp:8:0,
                 from src/stan/model/model_header.hpp:17:
src/stan/math/special_functions.hpp:591:19: warning: ‘double stan::math::log_sum_exp(const std::vector<double>&)’ defined but not used [-Wunused-function]
In file included from src/stan/model/model_header.hpp:16:0:
src/stan/agrad/agrad.hpp:2215:17: warning: ‘void stan::agrad::free_memory()’ defined but not used [-Wunused-function]
make: *** [src/models/myModels/rodents.o] Error 1

Also, is it possible to have vectorised link functions? Perhaps in the next version? :-)

Cheers,

Simon.

Simon Blomberg, BSc (Hons), PhD, MAppStat, AStat.
Lecturer and Consultant Statistician
School of Biological Sciences
The University of Queensland
St. Lucia Queensland 4072
Australia
T: +61 7 3365 2506
email: S.Blomberg1_at_uq.edu.au
http://www.evolutionarystatistics.org

Policies:
1.  I will NOT analyse your data for you.
2.  Your deadline is your problem.

Statistics is the grammar of science - Karl Pearson.



rodents.txt

Ben Goodrich

unread,
Dec 6, 2012, 12:47:26 AM12/6/12
to stan-...@googlegroups.com
On Wednesday, December 5, 2012 9:34:00 PM UTC-5, Simon Blomberg wrote:
I'm trying to implement a beta regression model in Stan. The model parses correctly, but does not compile.

This looks like the parser generated the wrong C++ code. For me, it compiles with the master branch which will hopefully soon become v1.0.4. There is an off-chance that the attached .cpp file might compile and run.

But I think a better route would be to optimize the .stan file and hopefully circumvent this bug. For a fixed correlation matrix, multi_normal is unnecessarily slow. You could do

data {
   
int<lower=1> N; // Number of rows
   
int<lower=1> K; // Number of beta parameters
    matrix
[N, K] mmat; // model matrix
    vector
[N] decline; // dependent variable
    corr_matrix
[N] Sigma; // phylogenetic correlation matrix
}

transformed data
{
    matrix
[N,N] L;
   
Lt <- transpose(cholesky_decompose(inverse(Sigma)));
}

parameters
{
    vector
[K] beta; // regression parameters
    real
<lower=0> g; // dispersion parameter
    vector
[N] species; // random effect for species    
}

transformed parameters
{
    vector
[N] a; // parameter for beta distn
    vector
[N] b; // parameter for beta distn
    vector
<lower=0,upper=1>[N] mvals; // mean values for species
    mvals
<- mmat*beta + species;
    a
<- mvals*g; // reparameterise for beta dist
    b  
<- (1.0-mvals)*g; // reparameterise
}

model
{
//priors
    beta
~ normal(0, 1000);
    g
~ gamma(0.001, 0.001); // dispersion parameter
    lp
<- lp - 0.5 * dot_self(Lt * species); // MVN

//likelihood

    decline
~ beta(a, b);
}

where -0.5 * dot_self(Lt * species) is proportional to a multivariate normal density with correlation matrix Sigma. Also, I put the required bounds on mvals, which will cause the proposal to be rejected if any are outside the unit interval and thus may not be efficient. You could also use a loop with a logistic or normal link function here. Although these are not vectorized, loops are pretty cheap unless they are updating the log-posterior.

Ben


rodents.cpp

Simon Blomberg

unread,
Dec 6, 2012, 1:32:04 AM12/6/12
to stan-...@googlegroups.com

On 06/12/12 15:47, Ben Goodrich wrote:
On Wednesday, December 5, 2012 9:34:00 PM UTC-5, Simon Blomberg wrote:
I'm trying to implement a beta regression model in Stan. The model parses correctly, but does not compile.

This looks like the parser generated the wrong C++ code. For me, it compiles with the master branch which will hopefully soon become v1.0.4. There is an off-chance that the attached .cpp file might compile and run.

But I think a better route would be to optimize the .stan file and hopefully circumvent this bug. For a fixed correlation matrix, multi_normal is unnecessarily slow.
Hi Ben,

I realised multi_normal would be slow, but I thought I'd use it to get the code working, and then optimise later. Thanks for the advice on how to do that! I fixed the typos in the code you sent me below, and in Stan 1.0.3, it still parses but doesn't compile (same error). And the .cpp file you sent me would also not compile with my g++. Also, I take your point about using a link function. I was going to use a logit, but since it isn't vectorised, I couldn't figure out how to index the rows of mmat. I RTFM and realised to get row i, use mmat[i] NOT mmat[i,] , as I was used to from R. I incorporated the logit link but still the same problem. I'm looking forward to Stan version 1.0.4!

Thanks again,

Simon.

You could do

data {
    int<lower=1> N; // Number of rows
    int<lower=1> K; // Number of beta parameters
    matrix[N, K] mmat; // model matrix
    vector[N] decline; // dependent variable
    corr_matrix[N] Sigma; // phylogenetic correlation matrix
}

transformed data {
    matrix[N,N] L;
    Lt <- transpose(cholesky_decompose(inverse(Sigma)));
}

parameters {
    vector[K] beta; // regression parameters
    real<lower=0> g; // dispersion parameter
    vector[N] species; // random effect for species    
}

transformed parameters {
    vector[N] a; // parameter for beta distn
    vector[N] b; // parameter for beta distn

    vector<lower=0,upper=1>[N] mvals; // mean values for s!
 pecies

    mvals <- mmat*beta + species;
    a <- mvals*g; // reparameterise for beta dist
    b  <- (1.0-mvals)*g; // reparameterise
}

model {
//priors
    beta ~ normal(0, 1000);
    g ~ gamma(0.001, 0.001); // dispersion parameter
    lp <- lp - 0.5 * dot_self(Lt * species); // MVN

//likelihood

    decline ~ beta(a, b);
}

where -0.5 * dot_self(Lt * species) is proportional to a multivariate normal density with correlation matrix Sigma. Also, I put the required bounds on mvals, which will cause the proposal to be rejected if any are outside the unit interval and thus may not be efficient. You could also use a loop with a logistic or normal link function here. Although these are not vectorized, loops are pretty cheap unless they are updating the log-posterior.

Ben


--
 
 

-- 
Simon Blomberg, BSc (Hons), PhD, MAppStat, AStat.
Lecturer and Consultant Statistician
School of Biological Sciences
The University of Queensland
St. Lucia Queensland 4072
Australia
T: +61 7 3365 2506

Ben Goodrich

unread,
Dec 6, 2012, 1:43:17 AM12/6/12
to stan-...@googlegroups.com
On Thursday, December 6, 2012 1:32:04 AM UTC-5, Simon Blomberg wrote:
I realised multi_normal would be slow, but I thought I'd use it to get the code working, and then optimise later. Thanks for the advice on how to do that! I fixed the typos in the code you sent me below, and in Stan 1.0.3, it still parses but doesn't compile (same error). And the .cpp file you sent me would also not compile with my g++. Also, I take your point about using a link function. I was going to use a logit, but since it isn't vectorised, I couldn't figure out how to index the rows of mmat. I RTFM and realised to get row i, use mmat[i] NOT mmat[i,] , as I was used to from R. I incorporated the logit link but still the same problem. I'm looking forward to Stan version 1.0.4!

Try passing in (or creating in transformed data) a vector of ones; i.e. instead of (1 - mvals) do (ones  - mvals).

Ben

Simon Blomberg

unread,
Dec 6, 2012, 1:47:59 AM12/6/12
to stan-...@googlegroups.com
Woohoo! That worked! Parses and compiles!

Thanks again,

Simon.
--
 
 
Reply all
Reply to author
Forward
0 new messages