Ordinal Negative Binomial Hurdle

328 views
Skip to first unread message

Mehdi Hosseini

unread,
Sep 16, 2016, 12:36:56 PM9/16/16
to stan-...@googlegroups.com

Hi there,

I am working on an ordinal regression problem.

The idea is to model ordinal responses of a survey’s question by using a linear or non-linear combination of a set of predictors, e.g. respondent’s age and gender, under the assumption that the underlying distribution of the response variable is Hurdle Negative Binomial. The survey questions are how often you do something, and the ordinal answers are e.g. every day, once week, once a month, or never.

I have treated the problem as an ordinal regression, and so mapped the survey answers to a set of cut-points. 

I have a maximum likelihood solution for this problem. But now I want to use stan to solve the same problem in a Bayesian fashion. The motivation is to put a prior on the cut-points, because respondents don’t literally mean ‘6 months’ but the model treats them like they do.

I can get the hurdle negative binomial to work (thanks BRMS) but I cannot figure out how to do the ordinal negative binomial bit. Below is the code of the stan model.

Thanks in advance for any help.

 Likelihood for Ordinal Negative Binomial (doesn’t always work, Log probability evaluates to log(0), i.e. negative infinity. )

 data {
 
int<lower=1> N;                // total number of observations
 
int<lower=2> N_cat;            // number of categories
 
int<lower=1,upper=N_cat> y[N]; // response variable
 
int<lower=1> K;                // number of population-level effects
  vector
[K] x_means;             // column means of X before centering
  row_vector
[K] x[N];            // pop level variables by row (for looping)
 
int<lower=0> bound_lo[N_cat];  // lower bound of cut off points
}

transformed data
{
}

parameters
{
  vector
[K] beta;  // population-level effects
  real
<lower=0> phi; // neg binomial shape parameter
}

transformed parameters
{

}

model
{
   vector
[N_cat] theta;
 
// prior specifications
 
// likelihood contribution
 
for (n in 1:N) {
    real eta
;
    eta
= exp(x[n] * beta);
    theta
[1] = neg_binomial_2_cdf(0 , eta, phi);
   
for (i in 2:N_cat-1) {
          theta
[i] = neg_binomial_2_cdf(bound_lo[i] , eta, phi) - neg_binomial_2_cdf(bound_lo[i-1] , eta, phi);
   
}
    theta
[N_cat] = 1-neg_binomial_2_cdf(bound_lo[N_cat-1] , eta, phi);
 /*
if (is_inf(target())){
print("beta params: ",beta);
print("phi: ",phi);
print("theta is: ", theta);
print("log-posterior = ", target()); 
}*/
    y[n] ~ categorical(theta);
   
}
 
}


Mehdi Hosseini

unread,
Sep 20, 2016, 11:12:16 AM9/20/16
to stan-...@googlegroups.com
I also included the error messages as below: 

[296] "  Log probability evaluates to log(0), i.e. negative infinity."                                       
[297] "  Stan can't start sampling from this initial value."                                                 
[298] "Rejecting initial value:"                                                                             
[299] "  Log probability evaluates to log(0), i.e. negative infinity."                                       
[300] "  Stan can't start sampling from this initial value."                                                 
[301] "Initialization between (-2, 2) failed after 100 attempts. "                                           
[302] " Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model."
[303] "Error in eval(substitute(expr), envir, enclos) : "                                                    
[1] "error occurred during calling the sampler; sampling not done"


On Friday, September 16, 2016 at 5:36:56 PM UTC+1, Mehdi Hosseini wrote:

Hi there,

I am working on an ordinal regression problem.

The idea is to model ordinal responses of a survey’s question by using a linear or non-linear combination of a set of predictors, e.g. respondent’s age and gender, under the assumption that the underlying distribution of the response variable is Hurdle Negative Binomial. The survey questions are how often you do something, and the ordinal answers are e.g. every day, once week, once a month, or never.

I have treated the problem as an ordinal regression, and so mapped the survey answers to a set of cut-points. Each cut-point is the number of days a person goes for a run, plays tennis etc a year, e.g. the cut-point of “once a month” is 12, and the cut-point of every day is 365 etc.

I have a maximum likelihood solution for this problem. But now I want to use stan to solve the same problem in a Bayesian fashion. The motivation is to put a prior on the cut-points, because respondents don’t literally mean ‘6 months’ but the model treats them like they do.

I can get the hurdle negative binomial to work (thanks BRMS) but I cannot figure out how to do the ordinal negative binomial bit. Also my stan model is too slow and not sure how can I make it faster. Below is the code of the stan model.

Thanks in advance for any help.


(1) Negative binomial hurdle …. works but ignores cut points

functions

  /* hurdle negative binomial log-PDF of a single response 

   * logit parameterization for the hurdle part

   * Args: 

   *   y: the response value 

   *   eta: linear predictor for negative binomial part 

   *   eta_hu: linear predictor of hurdle part 

   *   shape: shape parameter of negative binomial distribution 

   * Returns:  

   *   a scalar to be added to the log posterior 

   */ 

   real hurdle_neg_binomial_logit_lpmf(int y, real eta, real eta_hu, real shape) { 

     if (y == 0) { 

       return bernoulli_logit_lpmf(1 | eta_hu); 

     } else { 

       return bernoulli_logit_lpmf(0 | eta_hu) +  

              neg_binomial_2_log_lpmf(y | eta, shape) - 

              log(1 - (shape / (exp(eta) + shape))^shape); 

     } 

   }  

data

  int<lower=1> N;                // total number of observations 

  // int<lower=2> N_cat;            // number of categories 

  int y[N]; // response variable 

  int<lower=1> K;                // number of population-level effects 

  matrix[N, K] X;             // centered population-level design matrix 

  vector[K] X_means;             // column means of X before centering 

  //row_vector[K] x[N];            // pop level variables by row (for looping)

  //int<lower=0> bound_lo[N_cat];  // lower bound of cut off points

transformed data

parameters { 

  vector[K] b;               // population-level effects 

  real temp_Intercept;       // temporary Intercept 

  real<lower=0> shape;       // shape parameter 

  real<lower=0,upper=1> hu;  // hurdle probability 

transformed parameters

model

  vector[N] eta; 

  eta = X * b + temp_Intercept; 

// prior specifications 

  shape ~ gamma(0.01, 0.01); 

  hu ~ beta(1, 1); 

  to_vector(b) ~ normal(0,2);

// likelihood contribution 

    for (n in 1:N) { 

      y[n] ~ hurdle_neg_binomial(eta[n], hu, shape); 

    }    

generated quantities

  real b_Intercept;  // population-level intercept 

  b_Intercept = temp_Intercept - dot_product(X_means, b); 


1.    Likelihood for Ordinal Negative Binomial (doesn’t work, get a sum of theta = NAN objection)

data

  int<lower=1> N;                // total number of observations 

  int<lower=2> N_cat;            // number of categories 

  int<lower=1,upper=N_cat> y[N]; // response variable 

  int<lower=1> K;                // number of population-level effects 

  // matrix[N, K] X;             // centered population-level design matrix 

  vector[K] x_means;             // column means of X before centering 

  row_vector[K] x[N];            // pop level variables by row (for looping)

  int<lower=0> bound_lo[N_cat];  // lower bound of cut off points

transformed data

parameters

  vector[K] beta;  // population-level effects 

  //ordered[N_cat-1] c;  // temporary thresholds 

  real<lower=0> phi; // neg binomial shape parameter

transformed parameters

  // vector[N] eta;  // linear predictor 

  // compute linear predictor 

  // eta = X * b; 

model

   vector[N_cat] theta;

  // prior specifications 

  // to_vector(c) ~ normal(0,5);

  // likelihood contribution 

  for (n in 1:N) {

    real eta;

    eta = exp(x[n] * beta);

    theta[1] = 1 - neg_binomial_2_lcdf(0 | eta, phi);

      for (i in 2:N_cat-1) {

          theta[i] = log_diff_exp(neg_binomial_2_lcdf(bound_lo[i] | eta, phi),

                                 neg_binomial_2_lcdf(bound_lo[i-1] | eta, phi));

                        }

      theta[N_cat] = neg_binomial_2_lcdf(bound_lo[N_cat-1] | eta, phi);

      y[n] ~ categorical(theta);

    }

  }

  

Bob Carpenter

unread,
Sep 20, 2016, 12:53:41 PM9/20/16
to stan-...@googlegroups.com
I don't understand the pmf you're trying to define or how ordinal
models relate to the negative binomial. Specifically what the bit
after what looks like the negative binomial hurdle.

It sounds like what you might
have is a measurement error model, where someone says six months,
but the true value is somewhere between 3 months and 9 months with
some model of variation in the population.

If you get errors during initialization, it's from evaluating
the transformed parameters or model blocks. Any value of
parameters meeting declared constraints should have finite log density.
It's almost always a problem with illegal data, wrong sizes, or parameters
which are constrained but not declared with the proper constraints.

We don't recommend those gamma(0.01, 0.01) priors. Can you use
something more informative?

Otherwise, the efficiency of these models often comes down to
parameterization.

- Bob

> On Sep 16, 2016, at 12:36 PM, Mehdi Hosseini <mehd...@gmail.com> wrote:
>
> Hi there,
>
> I am working on an ordinal regression problem.
>
> The idea is to model ordinal responses of a survey’s question by using a linear or non-linear combination of a set of predictors, e.g. respondent’s age and gender, under the assumption that the underlying distribution of the response variable is Hurdle Negative Binomial. The survey questions are how often you do something, and the ordinal answers are e.g. every day, once week, once a month, or never.
>
>
>
> I have treated the problem as an ordinal regression, and so mapped the survey answers to a set of cut-points. Each cut-point is the number of days a person goes for a run, plays tennis etc a year, e.g. the cut-point of “once a month” is 12, and the cut-point of every day is 365 etc.
>
>
>
> I have a maximum likelihood solution for this problem. But now I want to use stan to solve the same problem in a Bayesian fashion. The motivation is to put a prior on the cutpoints, because respondents don’t literally mean ‘6 months’ but the model treats them like they do.
>
>
>
> I can get the hurdle negative binomial to work (thanks BRMS) but I cannot figure out how to do the ordinal negative binomial bit. Also my stan model is too slow and not sure how can I make it faster. Below is the code of the stan model.
>
> Thanks in advance for any help.
>
>
>
> (1) Negative binomial hurdle …. works but ignores cut points
>
>
>
> functions {
>
>
>
> /* hurdle negative binomial log-PDF of a single response
>
> * Args:
>
> * y: the response value
>
> * eta: linear predictor for negative binomial part
>
> * hu: hurdle probability
>
> * shape: shape parameter of negative binomial distribution
>
> * Returns:
>
> * a scalar to be added to the log posterior
>
> */
>
> real hurdle_neg_binomial_lpmf(int y, real eta,
>
> real hu, real shape) {
>
> if (y == 0) {
>
> return bernoulli_lpmf(1 | hu);
>
> } else {
>
> return bernoulli_lpmf(0 | hu) +
>
> neg_binomial_2_log_lpmf(y | eta, shape) -
>
> log(1 - (shape / (exp(eta) + shape))^shape);
>
> }
>
> }
>
> /* hurdle negative binomial log-PDF of a single response
>
> * logit parameterization for the hurdle part
>
> * Args:
>
> * y: the response value
>
> * eta: linear predictor for negative binomial part
>
> * eta_hu: linear predictor of hurdle part
>
> * shape: shape parameter of negative binomial distribution
>
> * Returns:
>
> * a scalar to be added to the log posterior
>
> */
>
> real hurdle_neg_binomial_logit_lpmf(int y, real eta,
>
> real eta_hu, real shape) {
>
> if (y == 0) {
>
> return bernoulli_logit_lpmf(1 | eta_hu);
>
> } else {
>
> return bernoulli_logit_lpmf(0 | eta_hu) +
>
> neg_binomial_2_log_lpmf(y | eta, shape) -
>
> log(1 - (shape / (exp(eta) + shape))^shape);
>
> }
>
> }
>
> }
>
>
>
>
>
> data {
>
> int<lower=1> N; // total number of observations
>
> // int<lower=2> N_cat; // number of categories
>
> int y[N]; // response variable
>
> int<lower=1> K; // number of population-level effects
>
> matrix[N, K] X; // centered population-level design matrix
>
> vector[K] X_means; // column means of X before centering
>
> //row_vector[K] x[N]; // pop level variables by row (for looping)
>
> //int<lower=0> bound_lo[N_cat]; // lower bound of cut off points
>
> }
>
>
>
>
>
> transformed data {
>
> }
>
>
>
> parameters {
>
> vector[K] b; // population-level effects
>
> real temp_Intercept; // temporary Intercept
>
> real<lower=0> shape; // shape parameter
>
> real<lower=0,upper=1> hu; // hurdle probability
>
> }
>
>
>
> transformed parameters {
>
> }
>
>
>
> model {
>
> vector[N] eta;
>
> eta = X * b + temp_Intercept;
>
>
> // prior specifications
>
> shape ~ gamma(0.01, 0.01);
>
> hu ~ beta(1, 1);
>
> to_vector(b) ~ normal(0,2);
>
>
> // likelihood contribution
>
> for (n in 1:N) {
>
> y[n] ~ hurdle_neg_binomial(eta[n], hu, shape);
>
> }
>
>
> }
>
>
>
> generated quantities {
>
> real b_Intercept; // population-level intercept
>
> b_Intercept = temp_Intercept - dot_product(X_means, b);
>
> }
>
>
>
>
> 1. Likelihood for Ordinal Negative Binomial (doesn’t work, get a sum of theta = NAN objection)
>
> data {
>
> int<lower=1> N; // total number of observations
>
> int<lower=2> N_cat; // number of categories
>
> int<lower=1,upper=N_cat> y[N]; // response variable
>
> int<lower=1> K; // number of population-level effects
>
> // matrix[N, K] X; // centered population-level design matrix
>
> vector[K] x_means; // column means of X before centering
>
> row_vector[K] x[N]; // pop level variables by row (for looping)
>
> int<lower=0> bound_lo[N_cat]; // lower bound of cut off points
>
> }
>
>
>
> transformed data {
>
> }
>
>
>
> parameters {
>
> vector[K] beta; // population-level effects
>
> //ordered[N_cat-1] c; // temporary thresholds
>
> real<lower=0> phi; // neg binomial shape parameter
>
> }
>
>
>
> transformed parameters {
>
> // vector[N] eta; // linear predictor
>
> // compute linear predictor
>
> // eta = X * b;
>
> }
>
>
>
> model {
>
> vector[N_cat] theta;
>
> // prior specifications
>
> // to_vector(c) ~ normal(0,5);
>
>
> // likelihood contribution
>
> for (n in 1:N) {
>
> real eta;
>
> eta = exp(x[n] * beta);
>
> theta[1] = 1 - neg_binomial_2_lcdf(0 | eta, phi);
>
> for (i in 2:N_cat-1) {
>
> theta[i] = log_diff_exp(neg_binomial_2_lcdf(bound_lo[i] | eta, phi),
>
> neg_binomial_2_lcdf(bound_lo[i-1] | eta, phi));
>
> }
>
> theta[N_cat] = neg_binomial_2_lcdf(bound_lo[N_cat-1] | eta, phi);
>
> y[n] ~ categorical(theta);
>
> }
>
> }
>
>
>
> --
> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
> To post to this group, send email to stan-...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Mehdi Hosseini

unread,
Sep 21, 2016, 6:41:42 AM9/21/16
to stan-...@googlegroups.com
Hi Bob,
let me explain the problem with further details.
Let's say there are a set of predictors, e.g. age and gender, and a response variable y which is censored and accepts one of the values from {never, once a year, once a month, once a week, etc}. There is also a latent variable y* with a hurdle negative binomial distribution that underlies the censored variable y using a set of cut-points θ as below 



The predictors as well as the censored variable are our observable. The latent variable, expressed by its mean and dispersion parameters, and the outpoints are our unknowns. We try to relate the predictors to the response variable via the latent variable and the cut-points. The following figure (copyright(c), 2014 by John Kruschke and Elsevier) resembles our problem. The only difference is the distribution of the latent variable which in our case is hurdle negative binomial.  

As I said before, I have a maximum likelihood solution, where the cut-points are fixed and assumed to be known, following the paper of James S. McGinley et.al, http://www.ncbi.nlm.nih.gov/pubmed/25857717. However, in our case, the cut-points are based on a set of censored answers to a survey question. Someone says once a month, but the true answer is somewhere between twice a month and once every two months.
I am hoping to use stan to estimate the cut-points along with the latent variable's parameters using Bayesian inference.

Bob Carpenter

unread,
Sep 21, 2016, 4:35:48 PM9/21/16
to stan-...@googlegroups.com
Are you all set or was there a further question? If
you want to estimate true values given censoring, you
need to either integrate out the possible true values or
declare a parameter with appropriate constraints and sample.

- Bob

> On Sep 21, 2016, at 6:41 AM, Mehdi Hosseini <mehd...@gmail.com> wrote:
>
> Hi Bob,
> let me explain the problem with further details.
> Let's say there are a set of predictors, e.g. age and gender, and a response variable y which is censored and accepts one of the values from {never, once a year, once a month, once a week, etc}. There is also a latent variable y* with a hurdle negative binomial distribution that underlies the censored variable y using a set of cut-points θ as below
>
>
> The predictors as well as the censored variable are our observable. The latent variable, expressed by its mean and dispersion parameters, and the outpoints are our unknowns. We try to relate the predictors to the response variable via the latent variable and the cut-points. The following figure (copyright(c), 2014 by John Kruschke and Elsevier) resembles our problem. The only difference is the distribution of the latent variable which in our case is hurdle negative binomial.
>
>

Mehdi Hosseini

unread,
Sep 21, 2016, 5:50:02 PM9/21/16
to Stan users mailing list
Hi Bob,
no unfortunately, my stan code is still failing. 
I think what you suggested below is what I am trying to do in my stan code (please see the second code snippet in my first post). 

Daniel Lee

unread,
Sep 22, 2016, 1:00:23 AM9/22/16
to stan-...@googlegroups.com
Hi Medhi,

The original error messages look pretty clear. It's trying to tell you that the simplex you're providing to the categorical distribution is not valid. 

Since you created the simplex theta by hand, you'll need to figure out if that construction guarantees a simplex. You could use print() statements to see if that's true. 


Daniel

Mehdi Hosseini

unread,
Sep 22, 2016, 2:10:48 PM9/22/16
to Stan users mailing list
Hi Daniel,
thanks a lot for the hint! I have revised the code (the second snippet in my first post) to return a valid simplex. I now get a new error as below. Any idea?

CHECKING DATA AND PREPROCESSING FOR MODEL 'ONB_bayes_1' NOW.

COMPILING MODEL 'ONB_bayes_1' NOW.

STARTING SAMPLER FOR MODEL 'ONB_bayes_1' NOW.

SAMPLING FOR MODEL 'ONB_bayes_1' NOW (CHAIN 1).
 [1] "Rejecting initial value:"                                                                                                   
 [2] "  Log probability evaluates to log(0), i.e. negative infinity."                                                             
 [3] "  Stan can't start sampling from this initial value."                                                                       .
.
.

[299] "  Log probability evaluates to log(0), i.e. negative infinity."                                                             
[300] "  Stan can't start sampling from this initial value."                                                                       
[301] "Initialization between (-2, 2) failed after 100 attempts. "                                                                 
[302] " Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model."                      
[303] "Error in eval(substitute(expr), envir, enclos) : "                                                                          
[1] "error occurred during calling the sampler; sampling not done"

Daniel Lee

unread,
Sep 22, 2016, 2:52:34 PM9/22/16
to stan-...@googlegroups.com
Hi Mehdi,

It's really hard to help without knowing the actual Stan program. 

I know what the error message means. You should read it cause I can't describe it any better. 


Daniel

Mehdi Hosseini

unread,
Sep 29, 2016, 1:33:34 PM9/29/16
to Stan users mailing list
Hi Daniel,
thanks for the comments! It seems my error message is very similar to the one that you already raised as a bug on https://groups.google.com/forum/#!msg/stan-users/Q7zADp0Ho9E/swNO0CQBOS8J

I wonder if the bug is fixed? 
Mehdi

Bob Carpenter

unread,
Sep 29, 2016, 4:50:26 PM9/29/16
to stan-...@googlegroups.com
Your log density is evaluating to log(0) or -infinity.
Stan programs need to return finite log densities. So
you need to debug your program and figure out why your
model doesn't have support at the initialization values.

This isn't related to the bug you link.

- Bob
Reply all
Reply to author
Forward
0 new messages