Multilevel Ordered Logistic Regression


Antonio Martini

unread,
Jul 28, 2016, 5:24:52 AM
to stan-...@googlegroups.com

I'm trying to create a multilevel ordinal logistic regression model in Stan, and the following code seems to work, in the sense that Stan seems to converge to sensible answers:


   stanmodel <- '
data {
  int<lower=2> K;  // ordinal response with 4 values, 3 cutpoints
  int<lower=0> N;  // number of measurements
  int<lower=1,upper=K> y[N]; // response

  int<lower=0> Ntests;         // number of groups
  int<lower=1,upper=Ntests> tests[N];  // groups
}

parameters {
  // population cutpoints and associated variance.
  ordered[K-1]  CutpointsMean;
  real<lower=0> CutpointsSigma[K-1];   

  ordered[K-1]  Cutpoints[Ntests];   // ordinal response cutpoints for groups
}

model {

  CutpointsSigma ~ exponential(1);
  CutpointsMean  ~ normal(2, 3);

  for (i in 1:Ntests) {
    Cutpoints[i][1] ~ normal(CutpointsMean[1] , CutpointsSigma[1]);
    Cutpoints[i][2] ~ normal(CutpointsMean[2] , CutpointsSigma[2]);
    Cutpoints[i][3] ~ normal(CutpointsMean[3] , CutpointsSigma[3]);
  }


  for (i in 1:N)
    y[i] ~ ordered_logistic(0, Cutpoints[tests[i]]);

}

I have removed the part relating to the covariates for clarity.


'CutpointsMean' and 'CutpointsSigma' define the population global ordinal response while Cutpoints[i][1:3] is the ordinal response for group i.

The idea is that for example the first 'cutpoint' of each group is generated from a normal distribution centered on the first 'cutpoint' of the overall population. 

The second 'cutpoint' of each group is generated from a normal distribution centered on the second 'cutpoint' of the overall population and so on.
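As a quick sanity check of this generative story, here is a small NumPy simulation (the population values are made up for illustration, not taken from my data) showing that drawing each group's cutpoints independently around the population cutpoints does not by itself guarantee the within-group ordering:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population cutpoints and between-group spread.
cutpoints_mean = np.array([-1.0, 0.5, 2.0])   # ordered population cutpoints
cutpoints_sigma = np.array([1.0, 1.0, 1.0])   # between-group sd per cutpoint
n_tests = 1000                                # number of groups to simulate

# Draw each group's three cutpoints independently around the population ones.
cutpoints = rng.normal(cutpoints_mean, cutpoints_sigma, size=(n_tests, 3))

# Fraction of simulated groups whose cutpoints come out already ordered.
ordered = np.all(np.diff(cutpoints, axis=1) > 0, axis=1)
frac_ordered = ordered.mean()
print(f"fraction ordered: {frac_ordered:.2f}")  # noticeably below 1 with sigma = 1
```

With these spreads a nontrivial share of independently drawn cutpoint triples would violate the ordering, which is why declaring the parameter as `ordered` (or reparameterizing) matters.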


As Cutpoints[i] is an ordered vector of 3 elements, what happens when I write directly into Cutpoints[i][2] ?

Is the write operation rejected if the constraints are not satisfied, or is the entry simply written into the vector and the result sorted?

Is this the correct way of modelling a multilevel ordinal response in Stan?


Thanks,

Antonio

Bob Carpenter

unread,
Jul 31, 2016, 6:40:01 PM
to stan-...@googlegroups.com
I'm not sure what the intention is here with a constant 0 predictor.
The cutpoints won't be identified:

> for (i in 1:N)
> y[i] ~ ordered_logistic(0, Cutpoints[tests[i]]);


I don't think this is what you want, either:


> parameters {
> ordered[K-1] Cutpoints[Ntests];
...

> for (i in 1:Ntests) {
> Cutpoints[i][1] ~ normal(CutpointsMean[1] , CutpointsSigma[1]);
> Cutpoints[i][2] ~ normal(CutpointsMean[2] , CutpointsSigma[2]);
> Cutpoints[i][3] ~ normal(CutpointsMean[3] , CutpointsSigma[3]);
> }

From a purely coding perspective, these can be vectorized
as:

Cutpoints[, 1] ~ normal(CutpointsMean[1], CutpointsSigma[1]);
...

But then I'm pretty sure if you have hierarchical priors here, you'll
need to deal with the ordering constraint so that everything normalizes
properly.

- Bob



Antonio Martini

unread,
Aug 1, 2016, 7:18:56 AM
to stan-...@googlegroups.com
Hi Bob,

Thanks for looking into this. As I wrote, "I have removed the part relating to the covariates for clarity." This is in order to focus on the part that is more difficult to model without cluttering the view. In the final model, predictors are included.

Regarding:

  for (i in 1:Ntests) {
    Cutpoints[i][1] ~ normal(CutpointsMean[1], CutpointsSigma[1]);
    Cutpoints[i][2] ~ normal(CutpointsMean[2], CutpointsSigma[2]);
    Cutpoints[i][3] ~ normal(CutpointsMean[3], CutpointsSigma[3]);
  }



Efficiency aside, is the above a correct way of modelling a hierarchical ordinal response in Stan? I'm new to Stan and I'm not quite sure how the internals of the ordered vector work. That said, the results I get seem sensible.

In order to enforce the ordering constraint more explicitly, I have also created another model where the first cutpoint is modelled with a normal distribution, while the other two cutpoints are modelled as half-normal distributions representing positive offsets relative to the previous cutpoint. The final cutpoints are computed by adding the offsets to the first cutpoint as appropriate:

 parameters {
    real          Cutpoints1[Ntests]; // cutpoint 1 group i
    real<lower=0> Cutpoints2[Ntests]; // offset cutpoint 2 group i
    real<lower=0> Cutpoints3[Ntests]; // offset cutpoint 3 group i


    real          CutpointsMean1;     // global cutpoint 1
    real<lower=0> CutpointsDMean2;    // global cutpoint 2 offset
    real<lower=0> CutpointsDMean3;    // global cutpoint 3 offset
 
    real<lower=0> CutpointsSigma[3];  // global cutpoints sigma values


    ...
 }


 model {


    CutpointsSigma   ~ exponential(1);
    CutpointsMean1   ~ normal(1, 1);   // these priors are problem specific
    CutpointsDMean2  ~ normal(1.5, 1.5);
    CutpointsDMean3  ~ normal(2, 2);  
   
    for (i in 1:Ntests) {
      Cutpoints1[i] ~ normal(CutpointsMean1 , CutpointsSigma[1]); // cutpoint 1 group i
      Cutpoints2[i] ~ normal(CutpointsDMean2, CutpointsSigma[2]); // delta cutpoint 2                  
      Cutpoints3[i] ~ normal(CutpointsDMean3, CutpointsSigma[3]); // delta cutpoint 3
    }


  for (i in 1:N) {
   
      vector[3]  cuts;
      vector[3]  ocuts;
     
      cuts[1]    <-  Cutpoints1[tests[i]]; // from normal
      cuts[2]    <-  Cutpoints2[tests[i]]; // from normal > 0 (half normal)
      cuts[3]    <-  Cutpoints3[tests[i]]; // from normal > 0 (half normal)
     
      // recover cutpoints from offsets
      ocuts[1] <- cuts[1];                            // can take any value  
      ocuts[2] <- cuts[1] + cuts[2];                  // ocuts[2] > ocuts[1]
      ocuts[3] <- cuts[1] + cuts[2] + cuts[3];        // ocuts[3] > ocuts[2] > ocuts[1]
     
      y[i] ~ ordered_logistic("predictors go here", ocuts);
 
  }
 }
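The cumulative-offset construction above can be checked outside Stan. A small NumPy sketch (with made-up draws mimicking the second model's parameterization) confirms the recovered cutpoints are always ordered as long as the offsets are positive:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up draws standing in for the second model's parameters.
cut1 = rng.normal(1.0, 1.0, size=100)          # first cutpoint, any value
off2 = np.abs(rng.normal(1.5, 1.5, size=100))  # positive offset (half-normal)
off3 = np.abs(rng.normal(2.0, 2.0, size=100))  # positive offset (half-normal)

# Recover the cutpoints from the offsets, as in the model block:
# ocuts[1] = cut1, ocuts[2] = cut1 + off2, ocuts[3] = cut1 + off2 + off3.
ocuts = np.stack([cut1, cut1 + off2, cut1 + off2 + off3], axis=1)

# The ordering holds for every draw, by construction.
assert np.all(np.diff(ocuts, axis=1) > 0)
```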


Both of the above models produce very similar and sensible-looking results; however, as I'm new to Stan, I may have missed something?

Thanks,
Antonio

 

Bob Carpenter

unread,
Aug 1, 2016, 9:02:51 AM
to stan-...@googlegroups.com
The details are all in the manual chapter on constraint
transformations near the end. Basically, the first
element is unconstrained and the others are exp(base) + previous
element.
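Concretely, the transform described above can be sketched in a few lines of Python (a paraphrase of the manual's description, not Stan's actual implementation):

```python
import numpy as np

def ordered_constrain(raw):
    """Map an unconstrained vector to a strictly ordered one:
    y[0] = raw[0], and y[k] = y[k-1] + exp(raw[k]) for k > 0."""
    y = np.empty_like(raw, dtype=float)
    y[0] = raw[0]
    for k in range(1, len(raw)):
        y[k] = y[k - 1] + np.exp(raw[k])
    return y

def ordered_unconstrain(y):
    """Inverse map: raw[0] = y[0], raw[k] = log(y[k] - y[k-1])."""
    raw = np.empty_like(y, dtype=float)
    raw[0] = y[0]
    raw[1:] = np.log(np.diff(y))
    return raw

raw = np.array([0.3, -1.2, 0.7])
y = ordered_constrain(raw)
assert np.all(np.diff(y) > 0)                    # result is always ordered
assert np.allclose(ordered_unconstrain(y), raw)  # round trip recovers the input
```

Because `exp` is positive, any unconstrained vector maps to a strictly increasing one, which is what lets Stan sample the ordered parameter freely.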

I don't think your more explicit model will do that.

Yes, that'll put a hierarchical prior on them, but I think
you need the truncations, so I'm not sure it's well formed.
You are following our recommendations here:

https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations

but I'm pretty sure those need to be revised to indicate
that we need the truncation points. I will make a fresh post
to try to get an answer. I have very little time over the next
week or two, so I don't have time to dig deeper on my own, but
I'm sure someone else will know.

- Bob


Antonio Martini

unread,
Aug 1, 2016, 1:20:22 PM
to stan-...@googlegroups.com
Hi Bob,

Thanks a lot for the pointers. I have rewritten the model using the ordered inverse transform as follows:

parameters {
  vector[K-1]   Cutpoints[Ntests]; // no longer ordered
  vector[K-1]   CutpointsMean;
  real<lower=0> CutpointsSigma[K-1];

  ...
}

model {

  CutpointsSigma ~ exponential(1);
  CutpointsMean  ~ normal(0, 10);

  // The vectorized version didn't work for some reason; I will double-check later on.
  for (i in 1:Ntests) {
    Cutpoints[i][1] ~ normal(CutpointsMean[1], CutpointsSigma[1]);
    Cutpoints[i][2] ~ normal(CutpointsMean[2], CutpointsSigma[2]);
    Cutpoints[i][3] ~ normal(CutpointsMean[3], CutpointsSigma[3]);
  }

  for (i in 1:N) {
    vector[3] cuts;

    // map to an ordered sequence by the ordered inverse transform
    cuts    <- Cutpoints[tests[i]];
    cuts[2] <- cuts[1] + exp(cuts[2]);
    cuts[3] <- cuts[2] + exp(cuts[3]);

    y[i] ~ ordered_logistic("predictors go here", cuts);
  }
}
'


As you can see, the cutpoints are no longer declared as ordered variables. The distribution of the ordinal response is computed after the model fit by applying the ordered inverse transform to the posterior of the unordered cutpoints.
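That post-fit step can be done in vectorized NumPy; in this sketch the draws are fabricated stand-ins for the samples extracted from the fit:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for posterior draws of the unordered cutpoints, shape (draws, K-1).
raw = rng.normal(0.0, 1.0, size=(4000, 3))

# Ordered inverse transform applied row-wise:
# cuts[0] = raw[0]; cuts[k] = cuts[k-1] + exp(raw[k]) for k > 0.
cuts = np.concatenate(
    [raw[:, :1], raw[:, :1] + np.cumsum(np.exp(raw[:, 1:]), axis=1)],
    axis=1,
)

assert np.all(np.diff(cuts, axis=1) > 0)  # every transformed draw is ordered
```

Summaries (means, quantiles) of the ordered cutpoints are then taken over the `cuts` array rather than over the raw parameters.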

Would this work, or is there still something missing? :)

Thanks,
Antonio

Bob Carpenter

unread,
Aug 1, 2016, 1:47:45 PM
to stan-...@googlegroups.com
That will work, but with the exp transform, the priors will
be on different scales. You don't need the Jacobians
because you're using them on the right side of ~.

> parameters {
> vector[K-1] Cutpoints[Ntests]; // no longer ordered
> vector[K-1] CutpointsMean;
> real<lower=0> CutpointsSigma[K-1];


You can use

vector<lower=0>[K - 1] CutpointsSigma;

if you want to use vector for all of the variables (won't matter
for efficiency --- it just makes everything consistent).


> for (i in 1:Ntests) {
> Cutpoints[i][1] ~ normal(CutpointsMean[1] , CutpointsSigma[1]);
> Cutpoints[i][2] ~ normal(CutpointsMean[2] , CutpointsSigma[2]);
> Cutpoints[i][3] ~ normal(CutpointsMean[3] , CutpointsSigma[3]);
> }

You can vectorize this as:

Cutpoints[ , 1] ~ normal(CutpointsMean[1], CutpointsSigma[1]);
...

and it will be much more efficient.

- Bob

Antonio Martini

unread,
Aug 1, 2016, 2:08:15 PM
to stan-...@googlegroups.com

Regarding the priors on different scales, I suppose that scale-correct priors can be found by starting from the "ordered space" and using the ordered transform to find the corresponding mean and variance in the "unordered space".

Further, my understanding is that applying the ordered inverse transform to 'CutpointsMean' to recover the overall population response is the correct thing to do here.

I will also look into the efficiency improvement you suggested.

Thanks, Antonio

Bob Carpenter

unread,
Aug 1, 2016, 2:27:20 PM
to stan-...@googlegroups.com
It should be OK as is; you just want to check the
distribution of the posteriors you get. I just
wanted to warn you that they were on different scales,
so they can't all be interpreted the same way.

- Bob


Antonio Martini

unread,
Aug 1, 2016, 2:31:46 PM
to stan-...@googlegroups.com
That's fantastic, thanks a lot for your help!
Antonio