template <typename T1, int R1, int C1, typename T2, int R2, int C2>
inline
const std::pair<Eigen::Matrix<T1,R1,C1>,
                Eigen::Matrix<T2,R2,C2> >
tuple(const Eigen::Matrix<T1,R1,C1>& A, const Eigen::Matrix<T2,R2,C2>& B) {
  return std::make_pair(A, B);
}
y ~ poisson_log_log(tuple(X,beta));
// PoissonLog(n|X,beta) [n >= 0] = Poisson(n|exp(X*beta))
template <bool propto,
          typename T_n, typename T_X, int N, int K, typename T_beta>
typename return_type<T_X,T_beta>::type
poisson_log_log(const T_n& n,
                const std::pair<Eigen::Matrix<T_X,N,K>,
                                Eigen::Matrix<T_beta,K,1> >& Xbeta) {
  typedef typename stan::partials_return_type<T_n,T_X,T_beta>::type
    T_partials_return;
  static const char* function("stan::prob::poisson_log_log");
  // check if any vectors are zero length
  if (!(stan::length(n) && N))
    return 0.0;
  // set up return value accumulator
  T_partials_return logp(0.0);
  // FIXME: implement
  return logp;
}
I don't understand why you're suggesting using
a pair rather than two separate arguments.
And by "user", do you mean a client of the C++ API
or an end user of RStan?
And which doc are we talking about --- the API doc in
doxygen? We've been very inconsistent with that, but
I'm trying to add proper doc to all my commits now.
> lookup(dpois)
        StanFunction               Arguments  ReturnType  Page
443      poisson_log  (ints n, reals lambda)        real   342
444  poisson_log_log   (ints n, reals alpha)        real   342
445      poisson_log                       ~        real   342
447          poisson                       ~        real   341
> lookup(dpois)
        StanFunction               Arguments  ReturnType  Page
443      poisson_log  (ints n, reals lambda)        real   342
444  poisson_log_log   (ints n, reals alpha)        real   342
???  poisson_log_log     (ints n, pair Xbeta)       real   342
445      poisson_log                       ~        real   342
447          poisson                       ~        real   341
But my biggest worry (despite being the one who suggested
to Rob this would be a good idea) is that it's not
general enough. For example, it's not going to work
for hierarchical/multilevel models unless we pad out X lme4 style,
because there's more index fiddling beyond a simple product.
It's not super-general, but it might be general enough for Rob's sponsor.
I think the likelihoods of many lme4-style models would be better
implemented in Stan as a loop over grouped data, in which case they
could use a construction like this for each group.
data {
  int<lower=2> J;
  int<lower=1> N[J];
  int<lower=1> K;
  matrix[sum(N),K] X;
  vector[sum(N)] y;
}
transformed data {
  vector y_j[J];  // parses as std::vector<Eigen::Matrix<T,Eigen::Dynamic,1> > y_j(J);
  matrix X_j[J];  // parses as std::vector<Eigen::Matrix<T,Eigen::Dynamic,Eigen::Dynamic> > X_j(J);
  int mark;       // where T is double in transformed data / generated quantities and var otherwise
  mark <- 1;
  for (j in 1:J) {
    X_j[j] <- block(X, mark, 1, N[j], K);
    y_j[j] <- segment(y, mark, N[j]);
    mark <- mark + N[j];
  }
}
...
model {
  for (j in 1:J)
    likelihood_lp(theta[j], y_j[j], X_j[j]);
}
The main reason all the sizing is there is error checking, so
that we don't segfault at runtime.
#include <vector>
#include <Eigen/Dense>
int main() {
  std::vector<Eigen::MatrixXd> x(2);
  Eigen::MatrixXd y = x[0] + x[1];  // both elements default to 0 x 0, so this compiles and runs
  return y.rows();
}
For what you want to do, I'd suggest ... to do the work in the model:
block_view(X, mark, 1, N[j], K);
// returns an N[j] by K Eigen::Map that points to the (mark, 1) element of X
So I would suggest something like
normal_lm(y, x, beta, sigma);
which would be equal to
normal(y, x * beta, sigma);
But I think that only works for (matrix x, vector beta),
or were you imagining also having (row_vector x, vector beta)?
What would happen to the other arguments and vectorization?
> But it would be good to create an idiom that was common across many of the distributions that are used for GLMs.
OK, that's some motivation. But then we'd need to
introduce pair() into the language to use it, right? I probably
misunderstood what you were suggesting.
Calling it lookup rather than stan_lookup may
cause namespace clashes. Will we be able to do
lookup(poisson), too? And maybe "translate" would be a better name
for mapping R functions to Stan functions?
Maybe we should work out some kind of extensions package?
> If someone tried to multiply x[0] by an actual matrix, they would get Stan's non-conformability error, if someone tried to access an element of x[0], they would get Stan's out of bounds error, etc.
Oh, I think you're right --- I think we check the actual size,
not the declared size, because they're the same. At least in
assignment.
> Oh, I think you're right --- I think we check the actual size,
> not the declared size, because they're the same. At least in
> assignment.
But right now it won't work because everything gets sized when
it's defined along with being declared.
model {
  matrix[0,0] X[2];
  vector[4] ones;
  ones[1] <- 1;
  ones[2] <- 1;
  ones[3] <- 1;
  ones[4] <- 1;
  X[1] <- diag_matrix(ones);
  theta ~ normal(0,1);
}
/**
* Copy the right-hand side's value to the left-hand side
* variable.
*
* The <code>assign()</code> function is overloaded. This
* instance will be called for arguments that are both
* <code>Eigen::Matrix</code> types and whose shapes match. The
* shapes are specified in the row and column template parameters.
*
* @tparam LHS Type of left-hand side matrix elements.
* @tparam RHS Type of right-hand side matrix elements.
* @tparam R Row shape of both matrices.
 * @tparam C Column shape of both matrices.
* @param x Left-hand side matrix.
* @param y Right-hand side matrix.
* @throw std::invalid_argument if sizes do not match.
*/
template <typename LHS, typename RHS, int R, int C>
inline void
assign(Eigen::Matrix<LHS,R,C>& x,
       const Eigen::Matrix<RHS,R,C>& y) {
  if (x.rows() == 0 && x.cols() == 0) {
    x = y;
    return;
  }
  stan::math::check_matching_dims("assign",
                                  "x", x,
                                  "y", y);
  for (int i = 0; i < x.size(); ++i)
    assign(x(i), y(i));
}
matrix X[2];
// parses to std::vector<Eigen::Matrix<T__,Eigen::Dynamic,Eigen::Dynamic> > X(2);
// i.e., no fill value after the 2
matrix[0,0] X[2];
> Iff so, then I really think we should do this in the language. Being able to, e.g., return a std::vector of two Eigen things where the first was eigenvectors and the second was eigenvalues would be much less dumb than calling eigenvectors_sym() and eigenvalues_sym() separately and similarly for the other matrix decompositions.
I'm not sure how we'd do that because the eigenvectors form a matrix
and the eigenvalues a vector.
#include <vector>
#include <Eigen/Dense>
int main() {
  std::vector<Eigen::MatrixXd> x(2);
  Eigen::MatrixXd m(2,2);
  m(0,0) = 3;
  m(1,0) = 2.5;
  m(0,1) = -1;
  m(1,1) = m(1,0) + m(0,1);
  x[0] = m;
  x[1] = m.col(1);
  return 0;
}
> And Seth / Aki could get going on this idea of representing a Kronecker product of p covariance matrices of different orders as a cov_matrix KP[p].
This I know from nothing.
transformed data {
  vector[4] ones;
  ones[1] <- 1;
  ones[2] <- 1;
  ones[3] <- 1;
  ones[4] <- 1;
}
parameters {
  real theta;
}
model {
  matrix[4,4] X[2];
  X[1] <- diag_matrix(ones);
  theta ~ normal(0,1);
}
parameters {
  real theta;
}
model {
  matrix[-1,-1] X[2];
  theta ~ normal(0,1);
}
Let me work on turning some of these into issues, then we
can continue the discussion there.