Array of vectors/matrices

Ted

Nov 1, 2012, 6:52:21 PM
to stan-...@googlegroups.com
Suppose we have

data {
    int<lower=1> N;
    int<lower=1> J;
    int<lower=1> Q;
    matrix[N, J] data_matrix[Q];
    vector[N] data_vector[Q];
}

From the reference manual I surmise that data_matrix[1] is an N x J matrix, and data_vector[1] is a vector of dimension N.  Is this correct?  Also, what form should the data take in R using rstan?  Should data_matrix be an array with dimensions c(Q, N, J), an array with dimensions c(N, J, Q), a list of NxJ matrices, or something else?  Should data_vector be a matrix with dimensions Q x N,  N x Q, or something else?

Thanks,
Ted


Jiqiang Guo

Nov 1, 2012, 9:15:22 PM
to stan-...@googlegroups.com
I think in terms of reading data, matrix[N, J] data_matrix[Q] is equivalent to data_matrix[Q, N, J].  We can see this using print statements in Stan.

Say I have this model:

$ more array.stan 

data {
  int<lower=1> N;
  int<lower=1> J;
  int<lower=1> Q;
  matrix[N, J] data_matrix[Q];
  int data_matrix2[Q, N, J]; // specifies the same data as data_matrix
  vector[N] data_vector[Q];
}

parameters {
  real y;
}

model {
  y ~ normal(0, 1);
  print("dm=", data_matrix);
  print("dm[3,2,1]=", data_matrix[3,2,1]);
  print("dm2[3,2,1]=", data_matrix2[3,2,1]);
  print("dm[2,1,2]=", data_matrix[2,1,2]);
  print("dm2[2,1,2]=", data_matrix2[2,1,2]);
  print("dm2=", data_matrix2);
  print("dv=", data_vector);
}


This is some of the output in RStan:

> library(rstan)
>
> dat <- list(N = 2, J = 3, Q = 4,
+             data_matrix = array(1:24, dim = c(4, 2, 3)),
+             data_matrix2 = array(1:24, dim = c(4, 2, 3)),
+             data_vector = array(1:8, dim = c(4, 2)));
>
> fit <- stan(file = 'array.stan', data = dat, iter = 1, chains = 1)

TRANSLATING MODEL 'array' FROM Stan CODE TO C++ CODE NOW.
COMPILING THE C++ CODE FOR MODEL 'array' NOW.
SAMPLING FOR MODEL 'array' NOW (CHAIN 1).
dm=[ 1  9 17
 5 13 21, 2 10 18
 6 14 22, 3 11 19
 7 15 23, 4 12 20
 8 16 24]
dm[3,2,1]=7
dm2[3,2,1]=7
dm[2,1,2]=10
dm2[2,1,2]=10
dm2=[[[1,9,17],[5,13,21]],[[2,10,18],[6,14,22]],[[3,11,19],[7,15,23]],[[4,12,20],[8,16,24]]]
dv=[1
5,2
6,3
7,4
8]
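
The printed values match plain R indexing on the array that was passed in; for example:

a <- array(1:24, dim = c(4, 2, 3))  # same layout as data_matrix above
a[3, 2, 1]  # 7, matching dm[3,2,1] and dm2[3,2,1]
a[2, 1, 2]  # 10, matching dm[2,1,2] and dm2[2,1,2]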

Ben Goodrich

Nov 1, 2012, 9:35:37 PM
to stan-...@googlegroups.com
On Thursday, November 1, 2012 9:15:23 PM UTC-4, Jiqiang Guo wrote:
I think in terms of reading data, matrix[N, J] data_matrix[Q] is equivalent to data_matrix[Q, N, J].  We can see this using print statements in Stan.

Thanks for asking this question, Ted. I, too, had to figure out the answer with print() statements, because it is not currently documented and because it is counter-intuitive to how R users would expect it to work.

Perhaps this is a good time to mention that it can't hurt to put some print statements into the transformed data {} block (even if you do not actually transform any data) just to verify that your data look sane. Such print statements will only be executed once.
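
On the R side, a complementary sanity check before calling stan() can catch dimension mistakes early; a sketch using the dat list from Jiqiang's example:

str(dat)  # confirm the types and dimensions of everything being passed
stopifnot(all(dim(dat$data_matrix) == c(dat$Q, dat$N, dat$J)))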

Ben

Ben Goodrich

Nov 1, 2012, 10:09:16 PM
to stan-...@googlegroups.com

Maybe we didn't fully answer your questions. Stan's behavior is somewhat like this R code

list_of_matrices <- lapply(1:Q, FUN = function(...) matrix(rnorm(N * J), nrow = N, ncol = J))

But I don't think you can pass a list of matrices to Stan from R currently, so you would then need to do

data_matrix <- array(NA_real_, dim = c(Q,N,J))
for(q in 1:Q) data_matrix[q,,] <- list_of_matrices[[q]]

and pass data_matrix to Stan.
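
Equivalently, base R can do the reshaping in one line (a sketch assuming the list_of_matrices above):

data_matrix <- aperm(simplify2array(list_of_matrices), c(3, 1, 2))  # N x J x Q -> Q x N x J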

To me, it is confusing to do

vector[N] data_vector[Q];

so I would probably just make that a single QxN matrix so that it looks the same in R.
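
For example, assuming a hypothetical list of per-quarter vectors, the Q x N matrix could be built as

list_of_vectors <- lapply(1:Q, FUN = function(...) rnorm(N))  # stand-in per-quarter vectors
data_vector <- do.call(rbind, list_of_vectors)                # Q x N; row q holds quarter q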

Ben

Ted

Nov 2, 2012, 11:38:28 AM
to stan-...@googlegroups.com
Thanks for all the help.  The reason I use vector[N] data_vector[Q] instead of real data_vector[Q,N] or matrix[Q,N] data_vector is because I want to loop through Q column vectors for ease of matrix multiplication later.  That is also why I use matrix[N, J] data_matrix[Q] rather than data_matrix[Q,N,J].

The reason I asked all of this is because I have a large model that is running quite slowly - it has been running on a cluster computer for nearly 48 hours and is at iteration 4000/40000 in the first chain.  The main part of the model specifies a Poisson distribution, so I was hoping that instead of looping through all 50,000 observations I could just loop through Q=108 quarters and have the Poisson distribution vectorized within each quarter q (see below).  But it appears from page 205 of the 1.0.3 reference manual that the Poisson distribution is not vectorized yet.  Does anyone know if this is on the to-do list, and if so, when it might be done?  Are there any work-arounds I should consider?  Thanks.

My stan code:

data {
    int<lower=1> N; // number of observations per quarter
    int<lower=1> Q; // number of quarters
    int<lower=1> J; // number of variables
    int<lower=1> D; // number of district-seasons
    // injuries
    vector[N] traum_injuries[Q];
    // log total hours worked
    vector[N] log_total_hours_worked[Q];
    // main covariates: Q quarters of NxJ matrices
    matrix[N, J] data_matrix[Q];
    // district-season indicators: Q quarters of a Nx(D-1) matrix
    matrix[N, D-1] district_season[Q];
}

parameters {
    vector[J] beta[Q + 1];
    real<lower=-40,upper=20> intercept;
    vector[D-1] district_season_eff;
    real<lower=.01, upper=1.4142> sig[J];
}

transformed parameters {
    vector[N] intercept_vector;
    for (i in 1:N) intercept_vector[i] <- intercept;
}

model {
    // implicit: sig ~ uniform(.01,1.4142);
    // implicit: intercept ~ uniform(-40,20);
    beta[1] ~ normal(0,100);
    district_season_eff ~ normal(0, 1);
    for(q in 2:(Q + 1))
        beta[q] ~ normal(beta[q-1],sig);
    for (q in 1:Q) {
        traum_injuries[q] ~ poisson(exp(log_total_hours_worked[q]
                                + intercept_vector
                                + data_matrix[q] * beta[q]
                                + district_season[q] * district_season_eff));
    }
}

Ben Goodrich

Nov 2, 2012, 2:43:30 PM
to stan-...@googlegroups.com
On Friday, November 2, 2012 11:38:28 AM UTC-4, Ted wrote:
Thanks for all the help.  The reason I use vector[N] data_vector[Q] instead of real data_vector[Q,N] or matrix[Q,N] data_vector is because I want to loop through Q column vectors for ease of matrix multiplication later.  That is also why I use matrix[N, J] data_matrix[Q] rather than data_matrix[Q,N,J].

That makes sense for matrix[N, J] data_matrix[Q]. To loop through the columns of a matrix,

for(q in 1:Q) col(matrix,q) ~ something;

is fast.
 
But it appears from page 205 of the 1.0.3 reference manual that the poisson distribution is not vectorized yet.  Does anyone know if this is on the to-do list, and if so, when it might be done?  Are there any work-arounds I should consider?

I think all the continuous distributions are vectorized to some degree, but not the truncated continuous or discrete ones. You can vectorize the Poisson distribution yourself with something like

model {
  vector[Q] log_lambda;
  ...
  log_lambda <- intercept + log_total_hours_worked + data_matrix * beta + district_season * district_season_eff;
  lp__ <- lp__ + dot_product(traum_injuries, log_lambda) - sum(exp(log_lambda)); # ignore factorial thing
}

I don't know if you can do a dot_product of an integer vector and a real vector. If not, just do sum(traum_injuries .* log_lambda) but note the period for element-wise multiplication.

Ben

Bob Carpenter

Nov 2, 2012, 2:56:22 PM
to stan-...@googlegroups.com

On 11/2/12 2:43 PM, Ben Goodrich wrote:
> On Friday, November 2, 2012 11:38:28 AM UTC-4, Ted wrote:
>
> Thanks for all the help. The reason I use vector[N] data_vector[Q] instead of real data_vector[Q,N] or matrix[Q,N]
> data_vector is because I want to loop through Q column vectors for ease of matrix multiplication later. That is
> also why I use matrix[N, J] data_matrix[Q] rather than data_matrix[Q,N,J].
>
>
> That makes sense for matrix[N, J] data_matrix[Q]. To loop through the columns of a matrix,
>
> for(q in 1:Q) col(matrix,q) ~ something;

They should be comparable in the context of a model.

We're doing an end-around on the expression templates
in Eigen, so the array may be a bit faster.

> is fast.
>
> But it appears from page 205 of the 1.0.3 reference manual that the poisson distribution is not vectorized yet.
> Does anyone know if this is on the to-do list, and if so, when it might be done? Are there any work-arounds I
> should consider?
>
>
> I think all the continuous distributions are vectorized to some degree, but not the truncated continuous or discrete
> ones.

I can hear the tap of Daniel's keys at the next desk banging
out the vectorized discrete distributions. But even with that,
they won't necessarily be efficient until another pass of computing
all the gradients by hand -- that's the big saving of vectorization,
not the basic matrix ops.

> You can vectorize the Poisson distribution yourself with something like
>
> model {
>   vector[Q] log_lambda;
>   ...
>   log_lambda <- intercept + log_total_hours_worked + data_matrix * beta + district_season * district_season_eff;
>   lp__ <- lp__ + dot_product(traum_injuries, log_lambda) - sum(exp(log_lambda)); # ignore factorial thing
> }
>
> I don't know if you can do a dot_product of an integer vector and a real vector.

No -- there are no integer vectors. But given that it's
data, you could always convert to real.

> If not, just do sum(traum_injuries .*
> log_lambda) but note the period for element-wise multiplication.

- Bob

Ben Goodrich

Nov 2, 2012, 2:59:53 PM
to stan-...@googlegroups.com
On Friday, November 2, 2012 11:38:28 AM UTC-4, Ted wrote:
Are there any work-arounds I should consider?  Thanks.

There is another thing about these betas:

model {
    beta[1] ~ normal(0,100);
    district_season_eff ~ normal(0, 1);
    for(q in 2:(Q + 1))
        beta[q] ~ normal(beta[q-1],sig);

This is unnecessarily slow, and I believe it will also yield the wrong answer in Stan because you have specified a bunch of conditional distributions for the betas whose product is not equal to the joint distribution of the betas. It would be better to specify beta as a transformed parameter that is a function of the initial beta and noise, as in

parameters {
  real beta_init;
  vector[Q-1] beta_noise;
  ...
}
transformed parameters {
  vector[Q] beta;
  beta[1] <- beta_init;
  for (q in 2:Q)
    beta[q] <- beta[q-1] + beta_noise[q-1];
}
model {
  beta_init ~ normal(0.0, 100.0);
  beta_noise ~ normal(0.0, sig);
  ...
}

Ben

Bob Carpenter

Nov 2, 2012, 3:22:47 PM
to stan-...@googlegroups.com
Are you running in R or from the command line?

Do you know if everything fits in memory without
swapping?

Do you have anything like a reasonable way to
initialize? It can make a big difference in
a model like this. You might also try the zero
init option because that'll make all the unbounded
coefficients zero, which is usually an easier place
to start than drawing a value for beta[1] and then
chaining the rest of the betas off of it.

Did you run fewer than 40K iterations to see
if you're converging? The default is to use the
entire first half of each chain for adaptation,
so you can speed up nearly a factor of 2 if you
are converging earlier.

Also, I'd recommend running for shorter periods
first just to see how things are going.

What would also really help here is a built-in

poisson_log(y|alpha) = poisson(y|exp(alpha))


> model {
> // implicit: sig ~ uniform(.01,1.4142);

This will be initialized between (.14,1.3), so low values could be
problematic given the hierarchical model. It looks like you
really wanted variances on (0.0001,2) -- that's what you get
in terms of range, but it's not quite uniform because of the square
transform (you need to add the Jacobian adjustment if you want it
to be uniform on variances).

> // implicit: intercept ~ uniform(-40,20);

Initial values will be in the range (-34,14).

In general, Stan's default is to use uniform inits between [-2,2]
on the unconstrained params. For bounded params, that amounts
to a range of

(inv_logit(-2), inv_logit(2)) = (.12,.88)

So you sample from the central 80% interval, but non-uniformly on
the transformed space because of the logit transform. I should write
this into the manual as an example along with positive-only inits,
which are also common.
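
In R, that mapping is easy to check (the ranges quoted above are approximate):

bounded_init_range <- function(lower, upper) lower + (upper - lower) * plogis(c(-2, 2))
bounded_init_range(-40, 20)        # about (-32.8, 12.8) for the intercept
bounded_init_range(0.01, 1.4142)   # about (0.18, 1.25) for sig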

> beta[1] ~ normal(0,100);

Initialized on range (-2,2) uniformly.

> district_season_eff ~ normal(0, 1);
> for(q in 2:(Q + 1))
> beta[q] ~ normal(beta[q-1],sig);

Here you can apply the Matt trick, which would look
as follows, and should significantly speed up sampling
(despite adding some extra named variables -- if you
don't want to write out the beta[] values, you can define
them as local variables in the model instead of as
transformed params).


parameters {
vector[J] beta_raw[Q+1];
}
transformed parameters {
vector[J] beta[Q+1];

beta[1] <- 100 * beta_raw[1];
for (q in 2:(Q + 1))
beta[q] <- beta[q-1] + sigma .* beta_raw[q]; // sigma: sig as a vector[J]
}
model {
beta_raw ~ normal(0,1);
}

I haven't compiled it, but I'm pretty sure all the vector
ops here are kosher.

This also conveniently vectorizes the normal draw
for the entire vectored time series including the initial point.

- Bob

Bob Carpenter

Nov 2, 2012, 3:27:33 PM
to stan-...@googlegroups.com


On 11/2/12 2:59 PM, Ben Goodrich wrote:
> On Friday, November 2, 2012 11:38:28 AM UTC-4, Ted wrote:
>
> Are there any work-arounds I should consider? Thanks.
>
>
> There is another thing about these betas:
>
> |
> model {
> beta[1] ~ normal(0,100);
> district_season_eff ~ normal(0, 1);
> for(q in 2:(Q + 1))
> beta[q] ~ normal(beta[q-1],sig);
> |
>
>
> This is unnecessarily slow, and I believe it will also yield the wrong answer in Stan because you have specified a bunch
> of conditional distributions for the betas whose product is not equal to the joint distribution of the betas.

Why not? Isn't this the usual way to do a time series?
Isn't the joint just as follows:

p(beta[1]) * PROD_{q in 2:(Q+1)} p(beta[q]|beta[q-1])


> It would
> be better to specify beta as a transformed parameter that is a function of the initial beta and noise, as in

Right -- that's what I said in much longer form,
though I suggested going all the way to a full unit
normal that's both re-centered and re-scaled.

> parameters {
> real beta_init;
> vector[Q-1] beta_noise;
> ...
> }
> transformed parameters {
> vector[Q] beta;
> beta[1] <- beta_init;
> for (q in 2:Q)
> beta[q] <- beta[q-1] + beta_noise[q-1];
> }
> model {
> beta_init ~ normal(0.0, 100.0);
> beta_noise ~ normal(0.0, sig);
> ...
> }

This seems like it'll lead to exactly the same joint
probability on the beta. What am I missing?

- Bob

Bob Carpenter

Nov 2, 2012, 3:52:25 PM
to stan-...@googlegroups.com


On 11/1/12 9:35 PM, Ben Goodrich wrote:
> On Thursday, November 1, 2012 9:15:23 PM UTC-4, Jiqiang Guo wrote:
>
> I think in terms of reading data, matrix[N, J] data_matrix[Q] is equivalent to data_matrix[Q, N, J]. We can see
> this using print statement in Stan.
>
>
> Thanks for asking this question, Ted. I, too, had to figure out the answer with print() statements because it is not
> currently documented and because it is counter-intuitive to the way R users would implement it.

I'll add some clarifying examples to the doc on
arrays and on I/O.

I didn't even think about this as being ambiguous
because it's the "natural" C++ ordering for an
array of matrices to take the array index first.

To make matters more confusing, for compatibility, we took the reads
from R dump format to be column (or last index) major,
just like in R, JAGS and Fortran (but unlike C/C++/Java/Python and
BUGS). So the loop to read

real x[I,J,K];
matrix[J,K] m[I];

is

for (k in 1:K)
  for (j in 1:J)
    for (i in 1:I)
      x[i,j,k] = next...
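
You can see the same ordering in R, where as.vector traverses an array with the first index varying fastest:

a <- array(1:24, dim = c(4, 2, 3))
as.vector(a)[1:4]  # 1 2 3 4: a[1,1,1], a[2,1,1], a[3,1,1], a[4,1,1]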

- Bob

Ben Goodrich

Nov 2, 2012, 5:03:54 PM
to stan-...@googlegroups.com
On Friday, November 2, 2012 3:27:36 PM UTC-4, Bob Carpenter wrote:
On 11/2/12 2:59 PM, Ben Goodrich wrote:
> This is unnecessarily slow, and I believe it will also yield the wrong answer in Stan because you have specified a bunch
> of conditional distributions for the betas whose product is not equal to the joint distribution of the betas.

Why not?  Isn't this the usual way to do a time series?
Isn't the joint just as follows:

    p(beta[1]) * PROD_{q in 2:(Q+1)} p(beta[q]|beta[q-1])

You're right. The joint distribution is improper because of the unit root, so I was having trouble wrapping my head around it being the product of proper distributions. Usually for a random walk, there would be an improper prior on beta[1]. It is fine to put a prior with finite variance on beta[1]; it is just somewhat odd for it to have a different variance than the rest of the series.

Anyway, we're in agreement that the fastest way is to parameterize the noise along the lines of what Bob said, so that

beta[q] <- beta[q - 1] + sig * beta_noise[q];

with iid standard normal priors on beta_noise.
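
In R terms, the implied random walk is just a cumulative sum of scaled noise; a quick sketch with made-up values:

Q <- 5; sig <- 0.3
beta_init <- rnorm(1, 0, 100)
beta_noise <- rnorm(Q - 1)
beta <- c(beta_init, beta_init + sig * cumsum(beta_noise))
diff(beta) / sig  # recovers beta_noise, so each increment is Normal(0, sig)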

Ben

Ted

Nov 2, 2012, 7:27:05 PM
to stan-...@googlegroups.com
Thanks for all the help.  I am really liking Stan's language so far - more intuitive and clear than BUGS but with more functionality and also similar enough to make translation not too bad.

Re Bob - I'm running from R, currently using Poisson regression output as initial parameters. I will try it with fewer iterations to see if it converges.

The code posted below compiled and is running with one chain at 1000 iterations now.  However, I don't understand the lp__ stuff conceptually so I'm not sure if the code that I adapted from Ben's reply is correct.  Would you mind checking it out for me?

Thanks again,
Ted

data {
  int<lower=1> N; // number of observations (mines) per quarter
  int<lower=1> Q; // number of quarters
  int<lower=1> J; // number of variables
  int<lower=1> D; // number of district-seasons
  // injuries
  int<lower=0> traum_injuries[Q,N];
  // log total hours worked
  vector[N] log_total_hours_worked[Q];
  // main covariates: Q quarters of NxJ matrices
  matrix[N, J] data_matrix[Q];
  // district-season indicators: Q quarters of a Nx(D-1) matrix
  matrix[N, D-1] district_season[Q];
}

transformed data {
  vector[N] traum_injuries_vec[Q];
  for (i in 1:N)
    for (q in 1:Q)
      traum_injuries_vec[q,i] <- traum_injuries[q,i];
}

parameters {
  vector[J] beta_init;
  vector[J] beta_noise[Q];
  real<lower=-40,upper=20> intercept;
  vector[D-1] district_season_eff;
  real<lower=.01,upper=1.4142> sig[J];
}

transformed parameters {
  vector[J] sig_vec;
  vector[J] beta[Q+1];
  for (j in 1:J) sig_vec[j] <- sig[j];
  beta[1] <- beta_init;
  for (q in 2:(Q+1)) beta[q] <- beta[q-1] + sig_vec .* beta_noise[q-1];
}

model {
  vector[N] log_lambda;
  // implicit: sig ~ uniform(.01,1.4142);
  // implicit: intercept ~ uniform(-40.0,20.0);
  beta_init ~ normal(0.0, 100.0);
  district_season_eff ~ normal(0.0, 1.0);
  for (q in 1:Q) {
    beta_noise[q] ~ normal(0.0,1.0);
    log_lambda <- intercept + log_total_hours_worked[q] + data_matrix[q] * beta[q] + district_season[q] * district_season_eff;
    lp__ <- lp__ + sum(traum_injuries_vec[q] .* log_lambda) - sum(exp(log_lambda));
  }
}


Ben Goodrich

Nov 2, 2012, 10:00:53 PM
to stan-...@googlegroups.com
On Friday, November 2, 2012 7:27:06 PM UTC-4, Ted wrote:
The code posted below compiled and is running with one chain at 1000 iterations now.  However, I don't understand the lp__ stuff conceptually so I'm not sure if the code that I adapted from Ben's reply is correct.  Would you mind checking it out for me?

Attached is how I would do it with a few simplifying tweaks. There is stuff on the 

lp__ <- lp__ + whatever;

syntax in the manual and also an entry in the FAQ


It is primarily intended for distributions that Stan doesn't have yet, manual Jacobian determinants, and odd things like that. But sometimes it is just faster to use that syntax than the log density that Stan has, as with the Poisson, which isn't currently vectorized and entails an unnecessary logarithm.

Since the Poisson density is

f(y | lambda) = lambda^y * exp(-lambda) / y!

its log is

log(f(y | lambda)) = y * log(lambda) - lambda - log(y!)

and the log(y!) term is a constant, so you don't need to account for it. All you need to do is sum that over the observations and add it to the log priors to get the log-posterior (lp__).
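
A quick numerical check of that identity in R (made-up data):

y <- rpois(10, 3)
log_lambda <- log(runif(10, 1, 5))
shortcut <- sum(y * log_lambda) - sum(exp(log_lambda))
full <- sum(dpois(y, exp(log_lambda), log = TRUE))
all.equal(shortcut, full + sum(lfactorial(y)))  # TRUE: they differ only by the constant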

Ben

ted.txt

Ted Westling

Nov 3, 2012, 3:17:51 PM
to stan-...@googlegroups.com
Thanks Ben! That seems to be working, and the lp__ trick is good to know.


