Zero-altered negative binomial regression model in Stan


Mark Carty

Aug 18, 2014, 6:25:43 PM8/18/14
to stan-...@googlegroups.com
Hi Stan users,

I"m new to Stan. I'd like to speedup a Bayesian zero-altered negative binomial regression model originally written in JAGS code. I've written a stan version of the code. However,
I get warnings when lambda is below 1e-05. The posterior estimates of my parameter values are not even close to JAGS version.

data {
  int<lower=0> N;
  int<lower=0> K;
  int<lower=0> y[N];
  vector[K] x[N];
  vector[2] u[N];
}
parameters {
  vector[2] alpha;
  vector[K] beta;
  real<lower=0.00001> delta;
}
transformed parameters {
  vector<lower=0.00001>[N] lambda;
  vector<lower=0,upper=1>[N] theta;
  for(k in 1:N){
    lambda[k] <- exp(dot_product(x[k],beta));
    theta[k] <- inv_logit(dot_product(u[k],alpha));
  }
}
model {
  #priors
  for(j in 1:K){
    beta[j] ~ normal(0,10);
  }
  alpha[1] ~ normal(0,10)T[0,15];
  alpha[2] ~ normal(0,10)T[-5,0];
  delta ~ uniform(0,25);
  for(n in 1:N){
    if(y[n]==0)
      increment_log_prob(bernoulli_log(1,theta[n]));
    else
      increment_log_prob(neg_binomial_2_log(y[n],lambda[n],delta));
  }
}
generated quantities {
  vector[N] mu;
  vector[N] z;
  vector[N] y_rep;
  vector[N] T;
  for (n in 1:N){
    z[n] <- bernoulli_rng(theta[n]);
    mu[n] <- lambda[n]*z[n] + 0.0001;
    y_rep[n] <- neg_binomial_2_rng(mu[n],delta);
    T[n] <- step(y_rep[n]-y[n]);
  }
}

Here is the JAGS version:

# Bayesian Hurdle Negative binomial regression model
    for(i in 1:N){
     
      y[i] ~ dnegbin(theta[i],phi)
      y.rep[i] ~ dnegbin(theta[i],phi)
      pvalue[i] <- step(y.rep[i]-y[i])
      theta[i] <- phi/(mu1[i] + phi)
      mu1[i] <- mu[i]*z[i] + 0.00001
      z[i] ~ dbern(q[i])
     
      log(mu[i]) <- inprod(beta[],x[i,])
      logit(q[i]) <- inprod(alpha[],u[i,])
    }
    for(j in 1:n){
      beta[j] ~ dnorm(0,0.001)
    }
    alpha[1] ~ dnorm(0,0.001);T(0,b)
    alpha[2] ~ dnorm(0,0.001);T(-2,0)
   
    #phi ~ dgamma(0.0001,0.0001)
    phi ~ dlnorm(0,0.001)
    dispersion <- pow(phi,-1)
  }


Bob Carpenter

Aug 19, 2014, 12:07:29 PM8/19/14
to stan-...@googlegroups.com
You need to constrain the parameters to the support of the
model. More details inline.


On Aug 18, 2014, at 6:25 PM, Mark Carty <mac...@cornell.edu> wrote:

> Hi Stan users,
>
> I"m new to Stan. I'd like to speedup a Bayesian zero-altered negative binomial regression model originally written in JAGS code. I've written a stan version of the code. However,
> I get warnings when lambda is below 1e-05. The posterior estimates of my parameter values are not even close to JAGS version.
>
> data {
> int<lower=0> N;
> int<lower=0> K;
> int<lower=0> y[N];
> vector[K] x[N];
> vector[2] u[N];

I'd change vector[2] to row_vector[2] so that
u[n] behaves like it would in a matrix and returns
a row vector.

> }
> parameters {
> vector[2] alpha;
> vector[K] beta;

alpha[1] needs <lower=0, upper=15> and
alpha[2] needs <lower=-5, upper=0>, matching the truncations below.



> real<lower=0.00001> delta;

delta needs <lower=0, upper=25> (you can make the lower bound 0.0001, but
it's not going to change much --- Stan should be more stable than
JAGS or BUGS at boundaries).
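
Since the two components of alpha have different bounds, one way is to split
them into two scalars (a sketch; alpha1 and alpha2 are just illustrative names
standing in for alpha[1] and alpha[2]):

parameters {
  real<lower=0, upper=15> alpha1;   // matches the T[0,15] truncation on alpha[1]
  real<lower=-5, upper=0> alpha2;   // matches the T[-5,0] truncation on alpha[2]
  vector[K] beta;
  real<lower=0, upper=25> delta;
}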

> }
> transformed parameters {
> vector<lower=0.00001>[N] lambda;
> vector<lower=0,upper=1>[N] theta;
> for(k in 1:N){
> lambda[k] <- exp(dot_product(x[k],beta));
> theta[k] <- inv_logit(dot_product(u[k],alpha));
> }
> }
> model {
> #priors
> for(j in 1:K){
> beta[j] ~ normal(0,10);
> }
> alpha[1] ~ normal(0,10)T[0,15];
> alpha[2] ~ normal(0,10)T[-5,0];

You can remove the truncations here because the boundaries
are fixed as are the parameters to normal.

> delta ~ uniform(0,25);

You can remove this altogether --- the default prior is uniform.


> for(n in 1:N){
> if(y[n]==0)
> increment_log_prob(bernoulli_log(1,theta[n]));

This isn't the usual way to do zero-inflation, by the way. There's
a standard mixture approach in the manual.

No need to even define theta, and it's faster and more stable to do it
on the inverse logit scale. And you can just use the faster sampling
statement notation which drops constants:

1 ~ bernoulli_logit(u[k] * alpha);

You can use regular product because u[k] is a row vector.

> else
> increment_log_prob(neg_binomial_2_log(y[n],lambda[n],delta));

y[n] ~ neg_binomial_2(lambda[n], delta);

Again, you don't need to define lambda externally. It'll just cause
more writes and a larger print statement.

If you want to stick with this model, it could be helpful to count the
y[k] == 0 cases in transformed data and break the two cases apart to
enable vectorization. Then you can skip the conditionals and vectorize
everything into two statements:

1 ~ bernoulli_logit(u_zero * alpha);

y_non_zero ~ neg_binomial_2(exp(x_non_zero * beta), delta);

where u_zero is the matrix of predictors for n such that y[n] == 0 and
x_non_zero is the matrix of predictors for y[n] > 0.

For this you'd have to declare u_zero and x_non_zero as matrices.
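
A rough sketch of that layout (assuming the counts and the split matrices are
computed outside Stan and passed in as data; N_zero and N_non_zero are
illustrative names):

data {
  int<lower=0> K;
  int<lower=0> N_zero;                  // number of n with y[n] == 0
  int<lower=0> N_non_zero;              // number of n with y[n] > 0
  matrix[N_zero, 2] u_zero;             // rows of u where y[n] == 0
  matrix[N_non_zero, K] x_non_zero;     // rows of x where y[n] > 0
  int<lower=1> y_non_zero[N_non_zero];  // the positive counts
}
model {
  1 ~ bernoulli_logit(u_zero * alpha);
  y_non_zero ~ neg_binomial_2(exp(x_non_zero * beta), delta);
}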

- Bob

Bob Carpenter

Aug 19, 2014, 12:09:13 PM8/19/14
to stan-...@googlegroups.com
Also, your constraints on alpha aren't the same in both cases.

And you don't define theta with inverse-logit in your JAGS version.

So the models aren't the same at all here.

You should also check the negative binomial parameterizations.
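
(For what it's worth: JAGS's dnegbin(p, r) is parameterized by a success
probability and size, with mean r * (1 - p) / p, whereas Stan's
neg_binomial_2(mu, phi) is parameterized directly by the mean, with variance
mu + mu^2 / phi. The JAGS line theta[i] <- phi/(mu1[i] + phi) is exactly the
conversion p = phi / (mu + phi) between the two.)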

- Bob

Andrew Gelman

Aug 19, 2014, 1:27:59 PM8/19/14
to stan-...@googlegroups.com
I have not looked in detail but my quick thought is that I get a bit worried with constraints such as <lower=0.00001>.  If you don’t want a parameter to be close to zero, I recommend a soft constraint rather than a hard constraint.
A

Daniel Lee

Aug 19, 2014, 1:32:24 PM8/19/14
to stan-...@googlegroups.com
Andrew, what do you mean by a soft constraint?

Bob Carpenter

Aug 19, 2014, 1:39:39 PM8/19/14
to stan-...@googlegroups.com
Some kind of zero-avoiding prior.

I think in this case it's just a holdover from BUGS/JAGS.
You often see such constraints in their models to avoid
arithmetic issues at boundaries. You used to see this in
Andrew's models for JAGS and BUGS, by the way --- it was
one of the motivating cases for his "Pinocchio principle" if
I remember correctly:

http://andrewgelman.com/2009/05/24/handy_statistic/

- Bob

Mark Carty

Aug 19, 2014, 1:47:13 PM8/19/14
to stan-...@googlegroups.com
Hi all,

Thanks for the responses. I've gotten pretty good results with both the JAGS and Stan versions of the zero-altered negative binomial regression model. I was concerned about the warnings Stan was throwing; I'm new to Stan programming, so I was trying to put constraints on the parameters to avoid the problem.
I wanted to know what Andrew meant by using soft constraints.

Best,
Mark





--
---------------------------------------------------
Mark A. Carty, M.A
Ph.D Candidate
Elemento & Leslie Labs
Tri-Institutional Training Program in Computational Biology and Medicine
Cornell University, Weill Cornell Medical College, Memorial Sloan Kettering Cancer Center

Office location:
Zuckerman Research Center
415-417 East 68th Street, 11th Floor (ZRC-1145)
New York, New York 10065
(P) 646.888.2773 | (F)  646.422.0717

Bob Carpenter

Aug 19, 2014, 2:04:17 PM8/19/14
to stan-...@googlegroups.com
Those warnings are typically due to numerical issues in the
integrator for the Hamiltonian simulation. If they go away
early in warmup, there's nothing to worry about --- Stan just
needs to find the right mass matrix (scales of variables) and
step sizes for stability. They may still occur, but if they're
not frequent, it's not a problem.

I'm pretty sure Andrew just means using lower=0 with a prior
that assigns zero density to a value of 0 with enough curvature
near 0 to keep estimates away from the boundaries (such as a
gamma distribution with shape >> 1). But this still won't prevent the
kinds of warning messages you're seeing until Stan's estimated reasonable
scales and step sizes during warmup.
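
For example, something like this (a sketch; the gamma(2, 0.1) shape and rate
are just illustrative):

parameters {
  real<lower=0> delta;      // hard bound only at the true support boundary
}
model {
  delta ~ gamma(2, 0.1);    // density is zero at delta = 0, which keeps estimates off the boundary
}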

- Bob

Marco Inacio

Aug 19, 2014, 4:51:07 PM8/19/14
to stan-...@googlegroups.com
I worked with hurdle negative binomial (and also zero-inflated) models a while ago, and if that's the model you want to implement (I assume this from the comment "# Bayesian Hurdle Negative binomial regression model" before the JAGS code), I think you are missing the "normalizing constants" of the zero-truncation model, plus the Bernoulli part when y != 0.

It also seems that I used a different notation for the likelihood, where Bernoulli(0, theta) is used when y == 0, the opposite of zero-inflated models (though confusing, I think that's the standard in the literature, including R's pscl package).

Here's my code without the priors:

data {
  int <lower=0> nrow; //number of rows of y
  int <lower=0> ncol; //number of cols of the covariates matrix
  int <lower=0> yy[nrow]; //observed outcome
  matrix [nrow,ncol] bbgg; //matrix of covariates
  int <lower=0> nZeros; //number of rows where y==0
  int <lower=0> nOnes; //number of rows where y!=0
}
transformed data { //separate the zeros from the ones of yy and the corresponding elements of bbgg
  matrix [nZeros,ncol] bbgg0;
  matrix [nOnes,ncol] bbgg1;
  int <lower=0> yy1[nOnes];
  int jj;
  int kk;
  jj<-1;
  kk<-1;
  for (i in 1:nrow) {
    if (yy[i]==0) {
      bbgg0[jj] <- bbgg[i];
      jj<-jj+1;
    }
    else {
      yy1[kk] <- yy[i];
      bbgg1[kk] <- bbgg[i];
      kk<-kk+1;
    }
  }
}
parameters {
  vector [ncol] beta;
  vector [ncol] gamma;
  real <lower=0> phi;
}
model {
  vector [nZeros] zeta0;
  vector [nOnes] eta1;
  vector [nOnes] zeta1;
  zeta0 <- bbgg0 * gamma;
  eta1 <- bbgg1 * beta;
  zeta1 <- bbgg1 * gamma;

  0 ~ bernoulli_logit(zeta0); //you can use bbgg0 * gamma directly here, instead of zeta0
  1 ~ bernoulli_logit(zeta1); //and also bbgg1 * gamma instead of zeta1 here

  yy1 ~ neg_binomial_2_log(eta1, phi);
  // vectorized _log functions return a summed scalar, so the
  // zero-truncation normalizer has to be applied per observation
  for (i in 1:nOnes)
    increment_log_prob(-log1m_exp(neg_binomial_2_log_log(0, eta1[i], phi)));
}

For zero-inflated, see https://groups.google.com/d/msg/stan-users/xgK42dCNu4Y/21pr0-B6CnQJ

Bob Carpenter

Aug 19, 2014, 5:01:33 PM8/19/14
to stan-...@googlegroups.com

On Aug 19, 2014, at 4:51 PM, Marco Inacio <marcoig...@gmail.com> wrote:

> I worked with Hurdle negative binomial (and also zero-inflated) models a while ago and if that's the model you want to implement (I assume this from the comment "# Bayesian Hurdle Negative binomial regression model" before the JAGS code), I think you are missing the "normalizing constants" of the zero truncation model plus the Bernoulli part when y!=0.

Thanks for sharing the code --- this is exactly what I meant by the
coding of a zero-inflated model being non-standard. I don't know
the hurdle model.

But the model you have below doesn't address the zero-inflation (or
deflation?) does it?

> It also seems that I also used a different notation for the likelihood where Bernoulli(0, theta) is used when y==0, the opposite of zero-inflated models (despite confusing, I think that's the standard in the literature, including R's pscl package).

Unless you care about the value of theta, the polarity of
the Bernoulli shouldn't matter.

More inline on the model.
Was this supposed to be a mixture of some kind? If so, I can't see
it in the code. I'm also unclear why there's a negation in the log1m_exp
expression, but that's probably just because I don't see what the model's
doing.

- Bob


Marco Inacio

Aug 19, 2014, 5:25:20 PM8/19/14
to stan-...@googlegroups.com

On 14-08-19 06:01 PM, Bob Carpenter wrote:
> On Aug 19, 2014, at 4:51 PM, Marco Inacio <marcoig...@gmail.com> wrote:
>
>> I worked with Hurdle negative binomial (and also zero-inflated) models a while ago and if that's the model you want to implement (I assume this from the comment "# Bayesian Hurdle Negative binomial regression model" before the JAGS code), I think you are missing the "normalizing constants" of the zero truncation model plus the Bernoulli part when y!=0.
> Thanks for sharing the code --- this is exactly what I meant by the
> coding of a zero-inflated model being non-standard. I don't know
> the hurdle model.
>
> But the model you have below doesn't address the zero-inflation (or
> deflation?) does it?
It does; it's actually a model that allows both inflation and deflation of
zeros, which is probably why Mark Carty named this thread zero-altered
instead of zero-inflated.



> Was this supposed to be a mixture or some kind? If so, I can't see
> it in the code. I'm also unclear why there's a negation in the log1m_exp
> expression, but that's probably just because I don't see what the model's
> doing.
A zero-inflated Poisson random variable is the multiplication of a
Poisson random variable and a Bernoulli random variable.

A Hurdle Poisson random variable is the multiplication of a
zero-truncated Poisson random variable and a Bernoulli random variable.
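
In symbols, with $\theta$ the Bernoulli probability and $\lambda$ the Poisson mean:

$p_{\mathrm{ZIP}}(y) = \begin{cases} (1-\theta) + \theta e^{-\lambda} & y = 0 \\ \theta \, \mathrm{Poisson}(y \mid \lambda) & y > 0 \end{cases}$

$p_{\mathrm{hurdle}}(y) = \begin{cases} 1-\theta & y = 0 \\ \theta \, \mathrm{Poisson}(y \mid \lambda) / (1 - e^{-\lambda}) & y > 0 \end{cases}$

The $1 - e^{-\lambda}$ denominator is the zero-truncation normalizer; its negative binomial analogue is where the negated log1m_exp term in my code comes from.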

If you wish, take a quick look at section 3.2 of the attached article.


monografia - final identificado.pdf

Bob Carpenter

Aug 20, 2014, 3:48:25 PM8/20/14
to stan-...@googlegroups.com
Thanks --- equations (9) and (10) make the terminology clear.
Feel free to update the manual to discuss both. The "zero inflated"
case is just an ordinary two-way mixture of an all-0 distribution and
a Poisson.

Does anyone ever use this close relative of the hurdle?

p(y | theta, lambda) = theta                                   if y = 0
                       (1 - theta) * Poisson(y - 1 | lambda)   if y > 0

This is the kind of thing that would be more efficient to code
directly as a zero-inflated Poisson pmf with two parameters, theta (mixture)
and lambda (Poisson mean). Ditto the hurdle model. Both are rather
error prone to code by hand.
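
With the new functions block, a direct coding of the zero-inflated Poisson
log pmf might look like this sketch (zip_log is a made-up name; theta is the
zero-inflation probability):

functions {
  real zip_log(int y, real theta, real lambda) {
    // log pmf of the mixture of a point mass at 0 (prob theta)
    // and a Poisson(lambda) (prob 1 - theta)
    if (y == 0)
      return log_sum_exp(log(theta),
                         log1m(theta) + poisson_log(0, lambda));
    else
      return log1m(theta) + poisson_log(y, lambda);
  }
}

which could then be used in the model block as
increment_log_prob(zip_log(y[n], theta, lambda)).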

I'm really not sure how many such distributions we should be adding,
though. I think there are also a couple of standard approaches to
overdispersed Poisson (negative binomial) models, which could themselves
also be zero inflated or hurdled, and so on.

- Bob

Marco Inacio

Aug 20, 2014, 6:01:49 PM8/20/14
to stan-...@googlegroups.com

On 14-08-20 04:48 PM, Bob Carpenter wrote:
> Thanks --- equations (9) and (10) make the terminology clear.
> Feel free to update the manual to discuss both.

Yes, I will.

> The "zero inflated" case is just an ordinary two-way mixture of an
> all-0 distribution and a Poisson.
>
> Does anyone ever use this close relative of the hurdle?
>
>    p(y | theta, lambda) = theta                                   if y = 0
>                           (1 - theta) * Poisson(y - 1 | lambda)   if y > 0

I've never seen this one before.

> This is the kind of thing that would be more efficient to code
> directly as a zero-inflated Poisson pmf with two parameters, theta (mixture)
> and lambda (Poisson mean).  Ditto the hurdle model.  Both are rather
> error prone to code by hand.

Yes, the hurdle model would be easier if Stan had a truncated Poisson distribution (or does it already?).

> I'm really not sure how many such distributions we should be adding,
> though.  I think there are also a couple of standard approaches to
> overdispersed Poisson (negative binomial) models, which could themselves
> also be zero inflated or hurdled, and so on.

Sure. I've read a critique of zero-inflated models claiming that the negative binomial might already be enough to model the excess of zeros by accounting for overdispersion, but I'm not sure how far that goes.

There's also a bunch of other ways of generalizing a Poisson in the literature. But anyway, these zero-inflated Poisson and NB models seem to be popular these days and might be worth implementing.

Note that I implemented the zero-inflated Poisson and negative binomial for Stan a while ago (they are in my fork), but they might fit in the shitty-implementations family, since I didn't make the gradients all that efficient (the derivatives aren't so friendly*). Also, I've only implemented the versions for regressions (that is, analogous to poisson_log_log and bernoulli_logit_log).


(*)
Derivatives for the ZI Poisson
{count-model component with log link, inflation component with logit link}:

$\left\{ -\frac{e^{\eta}}{e^{\zeta + e^{\eta}} + 1},\quad \frac{1}{e^{\zeta} + 1} - \frac{1}{e^{\zeta + e^{\eta}} + 1} \right\}$

Derivatives for the ZI negative binomial
{count-model component with log link, overdispersion component with identity link, inflation component with logit link}:

$\left\{ -\frac{e^{\eta} \left( \frac{\phi}{e^{\eta} + \phi} \right)^{\phi + 1}}{e^{\zeta} + \left( \frac{\phi}{e^{\eta} + \phi} \right)^{\phi}},\quad \frac{\left( \frac{\phi}{e^{\eta} + \phi} \right)^{\phi + 1} \left( \left( e^{\eta} + \phi \right) \log\left( \frac{\phi}{e^{\eta} + \phi} \right) + e^{\eta} \right)}{\phi \left( e^{\zeta} + \left( \frac{\phi}{e^{\eta} + \phi} \right)^{\phi} \right)},\quad e^{\zeta} \left( \frac{1}{e^{\zeta} + \left( \frac{\phi}{e^{\eta} + \phi} \right)^{\phi}} - \frac{1}{e^{\zeta} + 1} \right) \right\}$

Mark Carty

Aug 20, 2014, 7:19:40 PM8/20/14
to stan-...@googlegroups.com
Hi Bob,

I made some changes to the hurdle negative binomial regression model. I've attached a folder with some data, the R code to run the hurdle negative binomial regression model, and the Stan file. To run the algorithm:

  install.packages("pscl")
  install.packages("VGAM")
  install.packages("boot")
  install.packages("Matrix")
  install.packages("Hmisc")
  install.packages("plyr")

  setwd('.../Hurdle_Negative_binomal/')
  files <- dir(pattern = "input")
  stanfile <- dir(pattern = ".stan")
  source('GLIMCHStan.R')
  res <- GLIMCH.stan(files, stanfile, iter = 100)

I've included the model fit for the hurdle negative binomial model using the hurdle() function from the pscl package. To get the summary of that fit:

  summary(res$mod)




Hurdle_Negative_binomal.zip

Mark Carty

Aug 20, 2014, 7:52:03 PM8/20/14
to stan-...@googlegroups.com
Hi Bob,

I had mistakenly sent you the wrong file.


Hurdle_Negative_binomal.zip

Marco Inacio

Aug 20, 2014, 8:31:21 PM8/20/14
to stan-...@googlegroups.com
You can change your conditional to:

        if (y[i]==0)
          0 ~ bernoulli(psi0[i]);
        else {
          1 ~ bernoulli(psi[i]);
          increment_log_prob(-log1m_exp(neg_binomial_2_log_log(0, theta[i], phi)));
          y[i] ~ neg_binomial_2_log(theta[i],phi);
        }

These are more efficient since they drop constant terms, unless you want to keep the exact value of lp__ for some reason.

I also don't think this psi0 makes sense since:

0 ~ bernoulli(psi[i]); //adds log(1-psi[i]) to the log probability
1 ~ bernoulli(psi[i]); //adds log(psi[i]) to the log probability

Then, it would be:


        if (y[i]==0)
          0 ~ bernoulli(psi[i]);
        else {
          1 ~ bernoulli(psi[i]);
          increment_log_prob(-log1m_exp(neg_binomial_2_log_log(0, theta[i], phi)));
          y[i] ~ neg_binomial_2_log(theta[i],phi);
        }



You are also applying exp twice to dot_product(x[i],beta), since you're doing it here:

        theta[i] <- exp(dot_product(x[i],beta));

and neg_binomial_2_log_log also does it (see the documentation and compare neg_binomial_log with neg_binomial_log_log, poisson_log with poisson_log_log, and bernoulli_log with bernoulli_logit_log).


So an even better solution:

transformed parameters {
      vector[nrow] psi; // now on the logit scale, so drop the <lower=0,upper=1> bounds
      vector[nrow] theta;
      real<lower=0> dispersion;
      for(i in 1:nrow){
        psi[i] <- dot_product(u[i],alpha);
        theta[i] <- dot_product(x[i],beta);
      }
      dispersion <- pow(phi,-1);
}
model {
...
      for(i in 1:nrow) {
        if (y[i]==0)
          0 ~ bernoulli_logit(psi[i]); //calls bernoulli_logit_log
        else {
          1 ~ bernoulli_logit(psi[i]); //calls bernoulli_logit_log
          increment_log_prob(-log1m_exp(neg_binomial_2_log_log(0, theta[i], phi)));
          y[i] ~ neg_binomial_2_log(theta[i],phi); //calls neg_binomial_2_log_log
        }
      }
...
}

One last improvement: since the dispersion parameter is not used in your model, it's more efficient to declare it in the generated quantities block.
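
That is, a sketch:

generated quantities {
  real<lower=0> dispersion;
  dispersion <- pow(phi, -1);
}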

Bob Carpenter

Aug 20, 2014, 8:32:08 PM8/20/14
to stan-...@googlegroups.com
Was there something you were expecting me to do with it?
Sorry, but I'm having trouble keeping up with all the threads.
I get the basic hurdle model now.

- Bob
> <Hurdle_Negative_binomal.zip>

Mark Carty

Aug 21, 2014, 1:15:00 AM8/21/14
to stan-...@googlegroups.com

Hi all,

 

I'm now able to get the Stan version of the hurdle negative binomial model working properly. I want to thank everyone for the assistance.

 

Best,

Mark
