how to constrain the sum of variables to stay positive


Sebastian Weber

unread,
Feb 24, 2013, 6:29:10 AM2/24/13
to stan-...@googlegroups.com
Hi,

I am wondering whether it is possible, within Stan, to constrain the sum of two variables to stay positive. Consider the eight schools example pasted below, and assume the theta variables have to be positive for some reason: how do you best express this in such a model? One way is to model log(theta), but that is numerically inadvisable if some of the parameters are almost 0, which yields very large negative log-values. Can I just declare real<lower=0> theta and that's it? How does this affect sampling performance, given that many draws of eta will get rejected?

Thank you very much for any advice.

Best,
Sebastian Weber



schools_code <- '
data {
  int<lower=0> J;          // number of schools
  real y[J];               // estimated treatment effects
  real<lower=0> sigma[J];  // s.e. of effect estimates
}
parameters {
  real mu;
  real<lower=0> tau;
  real eta[J];
}
transformed parameters {
  real theta[J];
  for (j in 1:J)
    theta[j] <- mu + tau * eta[j];
}
model {
  eta ~ normal(0, 1);
  y ~ normal(theta, sigma);
}
'

schools_dat <- list(J = 8,
                    y = c(28, 8, -3, 7, -1, 1, 18, 12),
                    sigma = c(15, 10, 16, 11, 9, 11, 10, 18))

fit <- stan(model_code = schools_code, data = schools_dat,
            iter = 1000, chains = 4)
--
Sebastian Weber


Ben Goodrich

unread,
Feb 24, 2013, 8:54:09 AM2/24/13
to stan-...@googlegroups.com
On Sunday, February 24, 2013 6:29:10 AM UTC-5, Sebastian Weber wrote:
I am wondering whether it is possible, within Stan, to constrain the sum of two variables to stay positive. Consider the eight schools example pasted below, and assume the theta variables have to be positive for some reason: how do you best express this in such a model? One way is to model log(theta), but that is numerically inadvisable if some of the parameters are almost 0, which yields very large negative log-values. Can I just declare real<lower=0> theta and that's it? How does this affect sampling performance, given that many draws of eta will get rejected?

I think if you declare a transformed parameter to be positive and that constraint fails to hold, the sampler will stop. So I guess the best way to do it is to declare the sum of the two numbers as a positive parameter along with one unconstrained parameter; the other can then be obtained via subtraction. Like:

parameters {
  real a;
  real<lower=0> c; // a + b
}
transformed parameters {
  real b;
  b <- c - a;
}

Ben

Andrew Gelman

unread,
Feb 24, 2013, 4:31:06 PM2/24/13
to stan-...@googlegroups.com
You can also do it as a soft constraint by adding a prior on theta_1 + theta_2. For realistic modeling, I think a soft constraint would make more sense anyway.
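For concreteness, a minimal sketch of what such a soft constraint might look like in the eight-schools model (the normal(1, 1) prior is a hypothetical choice for illustration, not a recommendation):

```stan
model {
  eta ~ normal(0, 1);
  y ~ normal(theta, sigma);
  // hypothetical soft constraint: a prior that favors a positive sum
  // without forbidding negative values outright; the sum is a linear
  // function of the parameters, so no Jacobian adjustment is needed
  theta[1] + theta[2] ~ normal(1, 1);
}
```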
A
> --
> You received this message because you are subscribed to the Google Groups "stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>

Bob Carpenter

unread,
Feb 24, 2013, 4:47:20 PM2/24/13
to stan-...@googlegroups.com
> I am wondering if it is possible to constrain within Stan
> the sum of two variables to stay positive.

To model variables x, y in (-inf,inf) s.t. (x + y) > 0,
you could use:

parameters {
  real<lower=0> x_plus_y;
  real x;
}
transformed parameters {
  real y;
  y <- x_plus_y - x;
}

You can use scaled simplexes if you want them to sum to
a specific value.
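A sketch of the scaled-simplex idea (the total of 5.0 is an arbitrary illustration; note this makes every component positive, which is stronger than just constraining the sum):

```stan
parameters {
  simplex[2] w;    // components positive, w[1] + w[2] == 1
}
transformed parameters {
  vector[2] xy;
  xy <- 5.0 * w;   // xy[1] + xy[2] == 5 exactly
}
```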

> One way to do this is to model log(theta),

Declaring a variable with type real<lower=0> does
a log transform under the hood.

> but this is numerically not advisable
> if some of the parameters are almost 0; resulting in very
> small log-values.

It's really the other way around that's problematic.
Even for a tiny parameter like 1e-300, the log is only
about -691, which is not extreme. But when the log-scale
parameter gets very negative, the exp() transform underflows
to 0. In double precision the underflow point is around
1e-308 (subnormals extend down to about 5e-324).

If you need accuracy down to below 1E-100 or so on the
linear scale, you are going to need to rescale.

- Bob

Sebastian Weber

unread,
Feb 26, 2013, 3:14:26 PM2/26/13
to stan-...@googlegroups.com
Hmm,

the approach of declaring x_plus_y sounds nice, but it misses the structure of how the sum is arranged. I mean, x is at the group level while y is per subject (and we have more than one subject per group). So how does this trick work across two hierarchies in a model? This is basically a consequence of the Matt trick, where one effectively samples on two levels, a group-level mean and the nested one, but at the end one places a constraint on the overall sum.

Andrew's approach sounds very interesting, but I am not sure how to implement a soft prior on the sum of two variables in Stan. It is straightforward to put priors on individual variables, but on their sum? Any reference where I could read up on that?

Thanks a lot!

Best,
Sebastian

Andrew Gelman

unread,
Feb 26, 2013, 4:21:03 PM2/26/13
to stan-...@googlegroups.com
I think you could literally write, 

x + y ~ normal (1, 1);

(for example), but I'm not sure.
A

Bob Carpenter

unread,
Feb 26, 2013, 8:55:10 PM2/26/13
to stan-...@googlegroups.com
Let me go back to the beginning.

>> >> I am wondering if it is possible to constrain within Stan the sum of two variables to stay positive.

Yes, but this isn't really what you're asking. The answer
to the above question is what's come earlier in this thread.

>> consider the eight schools example pasted below. Let's assume that the theta variables have to be positive for
>> some reason - then how do you express this in the best way in such a model?

Here's the relevant declarations and definition:

parameters {
  real mu;
  real<lower=0> tau;
  real eta[J];
}
transformed parameters {
  real theta[J];
  for (j in 1:J)
    theta[j] <- mu + tau * eta[j];
}

So the condition is:

mu + tau * eta[j] > 0

tau * eta[j] > - mu

eta[j] > - mu / tau

In this case, you can enforce the constraint by
declaring eta as follows.

real<lower = - mu / tau> eta[J];

If you add another predictor, this will get very hairy and
go beyond what you can declare in Stan. Stan requires every
variable in an array to have the same constraint.

You can always define the transform yourself as long as you
add the log Jacobian determinant to lp__.
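As a sketch of that manual approach in the Stan of this era (eta_raw is a hypothetical name; lp__ could be incremented directly in the model block), the per-element lower bound -mu/tau can be enforced with an exp() transform:

```stan
parameters {
  real mu;
  real<lower=0> tau;
  real eta_raw[J];  // unconstrained
}
transformed parameters {
  real eta[J];
  for (j in 1:J)
    eta[j] <- -mu / tau + exp(eta_raw[j]);  // guarantees eta[j] > -mu / tau
}
model {
  // log Jacobian of the transform: log d(eta[j])/d(eta_raw[j]) = eta_raw[j]
  for (j in 1:J)
    lp__ <- lp__ + eta_raw[j];
}
```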

>> One way to do this is to model
>> log(theta), but this is numerically not advisable if some of the parameters are almost 0; resulting in very small
>> log-values.

I think this is the traditional thing to do.
It's how something like logistic regression works
to model something in [0,1]. It's just another kind
of GLM.

>> Can I just declare real theta<lower=0> and thats it? How does this affect the sampling performance as
>> many draws of eta will get rejected.

You'd need to initialize to a vector of parameter
values in the support. Then your mileage would vary
depending on the data and distributions. In general, it's
not a good idea to have parameters that satisfy declared
constraints but have 0 probability (-inf log probability).

- Bob


Andrew Gelman

unread,
Feb 26, 2013, 8:59:19 PM2/26/13
to stan-...@googlegroups.com
I think there's a big difference between a parameter that is "physically" restricted to be positive and a parameter that you "believe" is positive. The best model would depend on context.

Bob Carpenter

unread,
Feb 26, 2013, 9:07:34 PM2/26/13
to stan-...@googlegroups.com


On 2/26/13 8:59 PM, Andrew Gelman wrote:
> I think there's a big difference between a parameter that is "physically" restricted to be positive,
> as compared to a parameter that you "believe" is positive.
> To think of the best model for this, it would depend on context.

Absolutely.

I was just taking Sebastian Weber's question at face
value as a question about the expressiveness of Stan's
constraint language. He wasn't seriously suggesting
constraining 8 schools to positive effects!

We could start writing a book of Stan brain teasers :-)

- Bob

Sebastian Weber

unread,
Feb 27, 2013, 1:35:04 PM2/27/13
to stan-...@googlegroups.com
Well, of course, I just used the schools example, which is similar to my model in structure, but in my application the respective parameter is always positive for physical reasons.

I did not know that Stan can handle lower bounds that depend on other parameters that get sampled, cool! However, in my case it is even worse, since I have C different mu's and eta is structured such that each eta belongs to one of the C groups, so I would need:

real<lower= -mu[SC]/tau[SC]> eta[S];

referring to S different eta values, where SC maps subjects to cases. I guess this is now definitely outside the range of Stan syntax...

Best,

Sebastian




Bob Carpenter

unread,
Feb 27, 2013, 2:03:24 PM2/27/13
to stan-...@googlegroups.com


On 2/27/13 1:35 PM, Sebastian Weber wrote:
> Well, of course, I just used the school example which is similar to my model in its structure, but in my application the
> respective parameter is for physical reasons always positive.
>
> I did not know that Stan can handle lower bounds that depend on other parameters that get sampled, cool! However, in my
> case it is even worse, since I have C different mu's and eta is structured such that each eta belongs to one of the C groups,
> so I would need:
>
> real<lower= -mu[SC]/tau[SC]> eta[S];
>
> referring to S different eta values, where SC maps subjects to cases. I guess this is now definitely
> outside the range of Stan syntax...

Yes, that's outside of Stan's syntax.

You could do it yourself by declaring unconstrained
parameters and transforming them --- you just need to
account for the change of variables on lp__.

I should put an example of that in the manual.
A future version of Stan will have the transformations
accessible as functions with automatic lp__ adjustments.

- Bob

Katherine Grace Jonas

unread,
Aug 28, 2014, 2:14:04 PM8/28/14
to stan-...@googlegroups.com
Hi!

I've been playing around with this technique.  What would you suggest doing with regard to priors?  For example, in this model:


On Sunday, February 24, 2013 3:47:20 PM UTC-6, Bob Carpenter wrote:

To model variables x, y in (-inf,inf) s.t. (x + y) > 0,
you could use:

   parameters {
     real<lower=0> x_plus_y;
     real x;
   }
   transformed parameters {
     real y;
     y <- x_plus_y - x;
   }

You might have an idea of what prior you would want on y and x, but not on x_plus_y.  Or the sum of x and y might not fit a particular form.  What would you suggest doing?  Could you place a prior on y and leave x_plus_y with the default prior? 

Bob Carpenter

unread,
Aug 28, 2014, 2:28:16 PM8/28/14
to stan-...@googlegroups.com
You can use other parameters in the bounds. So you could
also do this:

parameters {
real x;
real<lower=-x> y;
}

which will ensure x + y > 0. Then you can put whatever priors
you want on x and y.

I'm not sure whether you then want to do this:

y ~ normal(0,1);

or use the truncation to have the support in the model exactly
match the constraints:

y ~ normal(0,1) T[-x,];

They'll be different models because the variable x isn't constant.
The latter will include

log(1 - normal_cdf(-x,0,1))

= normal_ccdf_log(-x,0,1);

as a term in the log probability function.

- Bob

Katherine Grace Jonas

unread,
Aug 29, 2014, 11:38:29 AM8/29/14
to stan-...@googlegroups.com
Hi Bob,


On Thu, Aug 28, 2014 at 1:28 PM, Bob Carpenter <ca...@alias-i.com> wrote:
You can use other parameters in the bounds.  So you could
also do this:

  parameters {
    real x;
    real<lower=-x> y;
  }

That is a really nice solution.  My data is similar to the original poster's, in that x and y are both vectors.  Can vector parameters be used as constraints?

Katherine

Katherine Grace Jonas

unread,
Aug 29, 2014, 2:46:00 PM8/29/14
to stan-...@googlegroups.com
Hmm... it looks like that won't quite work. RStan prints (translated into x and y, as in the example):

vector x;
vector<lower=

EXPECTED: <expression, precedence 7, binary +, -> BUT FOUND:

-x>[N] y;
}


Daniel Lee

unread,
Aug 29, 2014, 3:07:03 PM8/29/14
to stan-...@googlegroups.com
Yes, that doesn't work.

We currently can't bound things with vectors.


Bob Carpenter

unread,
Aug 29, 2014, 4:20:01 PM8/29/14
to stan-...@googlegroups.com
If you have varying constraints, like a vector x lower-bounded
elementwise by a vector y, then you have to perform the transform,
with its Jacobian, yourself in the Stan model.

- Bob

Andrew Gelman

unread,
Aug 29, 2014, 5:38:04 PM8/29/14
to stan-...@googlegroups.com
Remember the folk theorem: if it’s hard to program, the model itself might not make sense. I return to the question of where the constraint is coming from.
A

Bob Carpenter

unread,
Aug 29, 2014, 6:08:00 PM8/29/14
to stan-...@googlegroups.com
It's only hard to program because we don't allow vectors
in the declaration. Adding them is on the to-do list.
That would allow:

vector[K] a;
vector<lower=-a>[K] b;

and it's no longer hard to program :-)

That doesn't answer Andrew's question, though, of where the constraint
comes from.

- Bob

Katherine Grace Jonas

unread,
Aug 30, 2014, 3:02:34 PM8/30/14
to stan-...@googlegroups.com
The constraint is essentially an ordering constraint.  The model is a graded response IRT model, in which category thresholds are decomposed into an item-specific and a person-specific component.  The item thresholds can be declared as an ordered vector.  The person components are not necessarily ordered, but the sum of the person and item components is an ordered vector.  So each person component depends on the distance between item thresholds and the adjacent person component.  For example, if the minimum distance between two item thresholds is z, and the person components of those two thresholds are x and y, respectively, then x < z + y.

I've tried declaring a (person x item x category) array of ordered vectors, but it becomes prohibitively large.  Declaring the array as a local variable would work, but it looks like local variables need to be basic variable types.  It's tricky, but I think the constraint is coming from a reasonable assumption: that individuals may differ in the extent to which they are willing to endorse "good" versus "very good," (or some similar categories), but that those tendencies do not change the rank ordering of categories.



Jonathan Gilligan

unread,
Aug 30, 2014, 4:55:53 PM8/30/14
to stan-...@googlegroups.com
I have a related problem:

I have a parameter, gamma, that must be greater than zero. I can fit the model to the data very nicely with something like:

theta <- 0.0;
for (i in 1:N) {
  theta <- theta + ((gain[i]^alpha - 1.0) / alpha - (lambda * loss[i]^beta - 1.0) / beta)
                   * p^gamma / (p^gamma + (1.0 - p)^gamma)^(1.0 / gamma);
}

data ~ bernoulli_logit(scale_factor * theta);

where gain, loss, and p are data and alpha, beta, gamma, and scale_factor are parameters. This converges nicely and I get a sensible posterior for gamma. Then I try to explain gamma in terms of demographic covariates:

gamma <- b_sex * sex + b_age * age + b_household_size * household_size

where b_xxx are parameters and sex, age, etc. are data. I expect that some of the b_ parameters will be positive and some will be negative, so I don't want to constrain them to all be positive; but I would need to constrain the dot product of b with the demographic variables to be positive.

I didn't see a way to constrain the sum of the b_ coefficients times the data to be positive and everything I did led to horrible performance and failure to converge.

I can get this to converge if I put very restrictive priors on the individual coefficients, but if I do that, then my results would just be fine-tuning my priors and I don't have enough actual prior knowledge of these parameters to justify that.

The folk theorem has led me to try using a different model for probability weighting whose parameters don't need to be constrained this way, but that introduces its own complications: the original model is the Kahneman-Tversky probability-weighting model from cumulative prospect theory, so switching models would mean breaking with most of the literature of the last two decades for purely computational reasons, and that makes me uncomfortable.

Bob Carpenter

unread,
Sep 2, 2014, 2:59:03 AM9/2/14
to stan-...@googlegroups.com

On Aug 30, 2014, at 4:55 PM, Jonathan Gilligan <jonathan...@gmail.com> wrote:

> I have a related problem:
>
> I have a parameter, gamma, that must be greater than zero. I can fit the model to the data very nicely with something like:
>
> theta <- 0.0;
>
> for (i in 1:N) {
> theta <- theta + ( ( gain[i] ^alpha - 1.0) / alpha - (lambda * loss[i]^beta - 1.0 ) / beta ) * p^gamma / (p^gamma + (1.0-p)^gamma)^( 1.0 / gamma);
> }
>
> data ~ bernoulli_logit( scale_factor * theta )
>
> where gain, loss, and p are data and alpha, beta, gamma, and scale_factor are parameters. This converges nicely and I a sensible posterior for gamma. Then I try to explain gamma in terms of demographic covariates:
>
> gamma <- b_sex * sex + b_age * age + b_household_size * household_size
>
> where b_xxx are parameters and sex, age, etc. are data. I expect that some of the b_ parameters will be positive and some will be negative, so I don't want to constrain them to all be positive; but I would need to constrain the dot product of b with the demographic variables to be positive.


> I didn't see a way to constrain the sum of the b_ coefficients times the data to be positive and everything I did led to horrible performance and failure to converge.

So you want to select coefficients that guarantee gamma is positive?
If household_size > 0 in all cases, you can do it with a minimum
on b_household_size. You can write a function to compute the minimum
and then use it in the constraint.
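Assuming household_size is strictly positive data, one might sketch that bound directly as an expression (this construction is my illustration, not from the thread; it relies on elementwise division and max() being accepted inside a bound expression):

```stan
data {
  int<lower=1> N;
  vector[N] sex;
  vector[N] age;
  vector<lower=0>[N] household_size;
}
parameters {
  real b_sex;
  real b_age;
  // gamma[i] = b_sex * sex[i] + b_age * age[i] + b_household_size * household_size[i] > 0
  // holds iff b_household_size exceeds every -(b_sex * sex[i] + b_age * age[i]) / household_size[i]
  real<lower=max(-(b_sex * sex + b_age * age) ./ household_size)> b_household_size;
}
```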

> I can get this to converge if I put very restrictive priors on the individual coefficients, but if I do that, then my results would just be fine-tuning my priors and I don't have enough actual prior knowledge of these parameters to justify that.
>
> The folk theorem has led me to try a using different model for probability weighting, whose parameters don't need to be constrained this way, but that introduces its own complications because the original model is the Kahneman-Tversky model of probability weighting for cumulative prospect theory, so using a different model for probability weighting would mean my analysis would break with most of the literature from the last two decades simply for computational reasons, and that makes me uncomfortable.

How do people usually deal with this kind of constraint?

I'm used to seeing link functions in GLMs that enforce the
right signs on everything (e.g., inv_logit for logistic regression
and exp for Poisson).

- Bob

Bob Carpenter

unread,
Sep 2, 2014, 1:40:05 PM9/2/14
to stan-...@googlegroups.com
The usual thing to do here is have
On Aug 30, 2014, at 3:01 PM, Katherine Grace Jonas <katherine....@gmail.com> wrote:

> The constraint is essentially an ordering constraint. The model is a graded response IRT model, in which category thresholds are decomposed into an item-specific and a person-specific component. The item thresholds can be declared as an ordered vector. The person components are not necessarily ordered, but the sum of the person and item components is an ordered vector. So each person component depends on the distance between item thresholds and the adjacent person component. For example, if the minimum distance between two item thresholds is z, and the person components of those two thresholds are x and y, respectively, then x < z + y.
>
> I've tried declaring a (person x item x category) array of ordered vectors, but it becomes prohibitively large. Declaring the array as a local variable would work, but it looks like local variables need to be basic variable types.

In transformed data and transformed parameters, the constraints are just consistency tests.
For parameters, they implicitly perform a transform to unconstrained representations
and include the log Jacobian in the log probability function.

> It's tricky, but I think the constraint is coming from a reasonable assumption: that individuals may differ in the extent to which they are willing to endorse "good" versus "very good," (or some similar categories), but that those tendencies do not change the rank ordering of categories.

Does each person use the same regression coefficients to produce
a latent "quality" score and then just have different cutpoints?

Otherwise, I don't see how you could maintain the same ordering across
items.

I don't even see how that'd make sense in a context like grading stage of
tumor --- two different doctors might order two tumor images differently
in terms of cancer stage.

And if you look at something like the Netflix challenge data
with customer ratings, there was wildly different orderings among
customers of movies.

Andrew Gelman and Jennifer Hill discuss models with cutpoints varying
by individual in their regression book. And I just happen to know Andrew's
been thinking more about priors --- we'll probably be discussing next week
in the Stan meeting and then coming out with some recommendations in
the next manual for hierarchical priors on the cutpoints.

- Bob

Katherine Grace Jonas

unread,
Sep 2, 2014, 2:28:42 PM9/2/14
to stan-...@googlegroups.com
On Tue, Sep 2, 2014 at 12:38 PM, Bob Carpenter <ca...@alias-i.com> wrote:
The usual thing to do here is have
On Aug 30, 2014, at 3:01 PM, Katherine Grace Jonas <katherine....@gmail.com> wrote:

> The constraint is essentially an ordering constraint.  The model is a graded response IRT model, in which category thresholds are decomposed into an item-specific and a person-specific component.  The item thresholds can be declared as an ordered vector.  The person components are not necessarily ordered, but the sum of the person and item components is an ordered vector.  So each person component depends on the distance between item thresholds and the adjacent person component.  For example, if the minimum distance between two item thresholds is z, and the person components of those two thresholds are x and y, respectively, then x < z + y.
>
> I've tried declaring a (person x item x category) array of ordered vectors, but it becomes prohibitively large.  Declaring the array as a local variable would work, but it looks like local variables need to be basic variable types.

In transformed data and transformed parameters, the constraints are just consistency tests.
For parameters, they implicitly perform a transform to unconstrained representations
and include the log Jacobian in the log probability function.

Thank you!  That helps me to understand what's going on a little better.
 
> It's tricky, but I think the constraint is coming from a reasonable assumption: that individuals may differ in the extent to which they are willing to endorse "good" versus "very good," (or some similar categories), but that those tendencies do not change the rank ordering of categories.

Does each person use the same regression coefficients to produce
a latent "quality" score and then just have different cutpoints?

Yes, so the coefficient is constant in this model, but the cutpoints vary.
 
Otherwise, I don't see how you could maintain the same ordering across
items.

I don't even see how that'd make sense in a context like grading stage of
tumor --- two different doctors might order two tumor images differently
in terms of cancer stage.

And if you look at something like the Netflix challenge data
with customer ratings, there was wildly different orderings among
customers of movies.

Sorry, I was unclear.  I meant that the categories themselves maintain an order (i.e., all doctors agree that the stage II label should indicate a more advanced cancer than the stage I label, and all Netflix raters agree that a three-star rating indicates they liked a movie better than a two-star rating).  So the raters agree on the appropriate ordering of the categories, but not necessarily on the ordering of the things being rated.

Bob Carpenter

unread,
Sep 2, 2014, 3:01:06 PM9/2/14
to stan-...@googlegroups.com
That should be no problem, then. The easiest way to code
this in Stan is using the ordered logistic distribution.
And you want to make the cutpoints an ordered vector.

There's no need to keep the latent value (eta in the formula
in the manual) positive for this.

- Bob

Katherine Grace Jonas

unread,
Sep 3, 2014, 10:54:28 AM9/3/14
to stan-...@googlegroups.com
The tricky part is that the cutpoints are the sum of two components: the item cutpoints, which form an ordered vector, and a person component (i.e., the extent to which one doctor is more likely than the average doctor to rate a cancer as stage II versus stage I, or the extent to which a discriminating Netflix viewer is loath to hand out 5-star reviews), which is not necessarily ordered.  So the cutpoints in the ordered logistic distribution are the sum of an ordered vector and a vector of normally distributed person components.

Bob Carpenter

unread,
Sep 3, 2014, 12:53:07 PM9/3/14
to stan-...@googlegroups.com
Is there a reason you can't make a hierarchical model where
there's a population distribution of cutpoints then an
individual varies from that? Then each cutpoint gets a prior
in the ordered vector. As in:

parameters {
  ordered[K] mu;
  ordered[K] c[J];
  real<lower=0> sigma[K];
  ...
}
model {
  mu ~ some prior;
  for (j in 1:J)
    for (k in 1:K)
      c[j,k] ~ normal(mu[k], sigma[k]);
  ...
}

This essentially makes c[j,k] the sum of mu[k] and an epsilon[j,k]. In
regression terms:

c[j,k] = mu[k] + epsilon[j,k];
epsilon[j,k] ~ normal(0,sigma[k]);

I think this does the same thing, puts it in the context of a hierarchical
model, and maintains all the appropriate ordering constraints.

I'd be inclined to start with a single sigma, then have varying sigma,
then try to move to a hierarchical model for the sigma, as well.

I'm a little unclear on what a good prior for mu is. It should
be strongly identified for location and scale, though, whatever it is.

- Bob

Katherine Grace Jonas

unread,
Sep 5, 2014, 12:16:56 PM9/5/14
to stan-...@googlegroups.com
Bob,

Thanks for taking the time to read and advise me on this model.

I ran something similar to what you suggested, but incorporating multiple items (where i indexes items, n indexes respondents, and k indexes categories):

parameters {
ordered
[K-1] c_i[I];
ordered
[K-1] c_ni[N,I];
...other parameters...
}
model
{
...prior on item cutpoints...

for (n in 1:N) {
 
for (i in 1:I) {
   
for (k in 1:(K-1)) {
    c_ni
[n,i,k] ~ normal(c_i[i,k],c_n_sigma[k]);
   
}
 
}
}
c_n_sigma
~ cauchy(0,2.5);

As soon as I started running this, though, I realized the model would draw a new person parameter for each category for each item (N x I x K parameters).  How can I write this to sample N x K parameters, while maintaining the ordering constraint on the N x I x K array?  I tried:

for (n in 1:N) {
  for (k in 1:(K-1)) {
    segment(c_ni[n], 1, I-1)[k] ~ normal(0, c_n_sigma[k]) + col(c_i, k);
  }
}

But arithmetic on the right side of the sampling statement throws an error.  

Bob Carpenter

unread,
Sep 5, 2014, 8:01:09 PM9/5/14
to stan-...@googlegroups.com
I items, N respondents, K categories (so K - 1 cutpoints).

parameters {
  ordered[K-1] mu_c;
  ordered[K-1] c[N];
  real<lower=0> sigma_c;
}
model {
  mu_c ~ normal(0, 5);
  sigma_c ~ cauchy(0, 2.5);
  for (n in 1:N)
    c[n] ~ normal(mu_c, sigma_c);
}

mu_c : population cutpoints

c[n] : cutpoints for respondent n

Then presumably the score for the item would be based
on a regression and the observed response would use
the cutpoints c[n]. The regression coefficients can also
be modeled hierarchically, as in:

data {
  row_vector[M] x[I];          // predictors for item i
  int<lower=1,upper=K> y[N,I]; // respondent n's category for item i
}
parameters {
  ...
  vector[M] mu_beta; // prior mean coefficients
  vector[M] beta[N]; // coefficients for respondent n
}
model {
  ...
  mu_beta ~ normal(0, 5);
  sigma_beta ~ cauchy(0, 2.5);
  for (n in 1:N)
    beta[n] ~ normal(mu_beta, sigma_beta);
  for (i in 1:I)
    for (n in 1:N)
      y[n,i] ~ ordered_logistic(x[i] * beta[n], c[n]);
}
I'm not sure that's well enough identified, though.

- Bob


On Sep 5, 2014, at 12:16 PM, Katherine Grace Jonas <katherine....@gmail.com> wrote:

> Bob,
>
> Thanks for taking the time to read and advise me on this model.
>
> I ran something similar to what you suggested, but incorporating multiple items (where i indexes items, n indexes respondents, and k indexes categories):
>
> parameters {
> ordered[K-1] c_i[I];
> ordered[K-1] c_ni[N,I];
> ...other parameters...
> }
> model {
> ...prior on item cutpoints...
>
> for (n in 1:N) {
> for (i in 1:I) {
> for (k in 1:(K-1)) {
> c_ni[n,i,k] ~ normal(c_i[i,k],c_n_sigma[k]);
> }
> }
> }
> c_n_sigma ~ cauchy(0,2.5);
>




> As soon as I started running this, though, I realized the model would draw a new person parameter for each category for each item (N x I x K parameters). How can I write this to sample N x K parameters, while maintaining the ordering constraint on the N x I x K array? I tried:
>
> for (n in 1:N) {
> for (k in 1:(K-1)) {
> segment(c_ni[n],1,I-1)[k] ~ normal(0,c_n_sigma[k]) + col(c_i,k);
> }
> }
>

Why not:

c_

Katherine Grace Jonas

unread,
Sep 6, 2014, 5:02:56 PM9/6/14
to stan-...@googlegroups.com
Hi!

Thanks for the idea!  I hope I'm understanding your code and indexing correctly.  I'm running into a problem with the dimensions of mu_c and c.

  for (n in 1:N)
    c[n] ~ normal(mu_c, sigma_c);

The cutpoints differ between items, so mu_c is declared as ordered[K-1] mu_c[I].  That makes the right-hand side of the sampling statement I x (K-1) while the left-hand side is N x (K-1), causing an error.  I'd like to find a way to sample N vectors of length K-1 from normal(0, sigma_c), subject to the constraint that c[n] + mu_c[i] is an ordered vector.  That's how I came to try constraining the sum of two variables to be less than a real: (c[n,k] - c[n,k+1]) must always be less than the minimum distance between item cutpoints to ensure that c[n] + mu_c[i] is always properly ordered.  Any ideas?

Bob Carpenter

unread,
Sep 6, 2014, 5:24:08 PM9/6/14
to stan-...@googlegroups.com
If two vectors are both ordered and you add them, the result's
still ordered.

- Bob

Katherine Grace Jonas

unread,
Sep 6, 2014, 7:19:41 PM9/6/14
to
Right.  The trick is the item cutpoints are ordered, and (item cutpoints + person components) are ordered, but the person cutpoints themselves are not necessarily ordered (i.e. c[n,1] could be positive while c[n,3] is negative, resulting in a greater than average tendency to rate something as either 1 or 4, as opposed to 2 or 3).  So it has to be some constraint that results in ordered + vector = ordered.


On Sat, Sep 6, 2014 at 4:23 PM, Bob Carpenter <ca...@alias-i.com> wrote:
If two vectors are both ordered and you add them, the result's
still ordered.

- Bob

Jan

unread,
Aug 7, 2016, 9:59:07 PM8/7/16
to Stan users mailing list


On Saturday, August 30, 2014 at 8:08:00 AM UTC+10, Bob Carpenter wrote:
It's only hard to program because we don't allow vectors
in the declaration.  Adding them is on the to-do list.
That would allow:

  vector[K] a;
  vector<lower=-a>[K] b;

I agree, Bob; it is an important issue every time we try, for example, to do a regression with bounded priors:

y_vector - a_parameter * x_parameter ~ lognormal(mu, sigma)

where we want to write

parameters {
  real<lower=0> a;
  vector<lower=0, upper=y_vector / a>[N] x_parameter;
}
 
Was this implemented eventually?

Thanks

Bob Carpenter

unread,
Aug 8, 2016, 10:30:40 PM8/8/16
to stan-...@googlegroups.com
No, it'll be a while before that's implemented.

We'd never code a regression like this:

> y_vector - a_parameter * x_parameter ~ lognormal(mu, sigma)

It's saying, modulo the Jacobian, that we have

log(y_vector - a_parameter * x_parameter) ~ normal(mu, sigma);

We'd instead want to roll a_parameter * x_parameter in with mu.

If we do regressions that are constrained to be positive,
we typically use a link function like log, so we deal
with an exp() inverse transform to constrain to positive. That
does change the scales.
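A generic sketch of that link-function idiom (hypothetical names; the log link keeps the location well-defined for any real coefficients, so none of them needs a sign constraint):

```stan
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  // exp(alpha + beta * x[n]) > 0 for any real alpha and beta,
  // so positivity comes from the link, not from bounds on coefficients
  for (n in 1:N)
    y[n] ~ lognormal(alpha + beta * x[n], sigma);
}
```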

- Bob