help with hierarchical random coefficient model

348 views
Skip to first unread message

Haoyan Sun

unread,
Apr 12, 2016, 5:49:31 PM4/12/16
to Stan users mailing list
Hi all,

I am working on a hierarchical random coefficient(random slope) model using Stan. Since I am new to Stan, I take baby steps by adding levels to the model step by step.For the base model, I assign random coefficient beta a distribution, and it works fine. However, when I try to add individual level predictors to model beta, I got error. 

Here is my base model code which works fine, although quite slow(several hours)

data {
 
int<lower=0> N; // total observation
 
int<lower=0> I; // subjects
 
int<lower=0> sellerid[N]; // ID variable
 
int<lower=0> y[N]; // number of return customer for seller i
  real
<lower=0,upper=1> x1[N];
  real x2
[N];
}


parameters
{
  vector
[3] beta[I];  //I*3 matrix
  real mu_beta
[3];  //means for beta's normal distribution
  real
<lower=0,upper=100> sigma_b[3];
  real
<lower=0, upper=100> sigma;  
  real
<lower=0> phi; // neg. binomial dispersion parameter
  real
<lower=0> mu_y[I]; // mean Y for each individual
}


model
{

 
//prior  
  phi
~ cauchy(0, 20);
 
 
for (k in 1:3){
  mu_beta
[k] ~ normal(0, 10);
  sigma_b
[k] ~ uniform(0,100);
 
 
for (i in 1:I)
  beta
[i,k] ~ normal(mu_beta[k], sigma_b[k]);
 
}


  sigma
~ uniform(0,100);
 
 
 
//likelihood
 
for(n in 1:N){
  y
[n] ~ neg_binomial_2_log(mu_y[sellerid[n]], phi);
 
}
 
 
for(i in 1:I){
   mu_y
[i]~ normal( beta[sellerid[i],1] + beta[sellerid[i],2] * x1[i] + beta[sellerid[i],3]* x2[i],sigma);
 
}
}

Then I want to add individual level attributes to model random coefficient beta, I got warning message: 
Warning (non-fatal):
Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If so, you need to call increment_log_prob() with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
    to_vector(sigma_b) ~ cauchy(...)
    beta[k] ~ normal(...)
    mu_y[i] ~ normal(...)error message 
and also error messgaes:
  [1] "Error : "                                                                                                                              
  [2] "Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:"                        
  [3] "validate transformed params: mu_y[k0__] is 1.#QNAN, but must be greater than or equal to 0"                                            
  [4] "If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,"
  [5] "but if this warning occurs often then your model may be either severely ill-conditioned or misspecified."                              
  [6] "Rejecting initial value:"                                                                                                              
  [7] "  Error evaluating the log probability at the initial value."                   

Here is my stan code which doesn't work:

data {
 
int<lower=0> N; // total observation
 
int<lower=0> I; // subjects
 
int<lower=0> sellerid[N]; // ID variable
 
int<lower=0> y[N]; // number of return customer for seller i
  vector
[3] Z[I]; //an i by 3 matrix of coefficient predictors
  real x1
[N];
  real x2
[N];
 
int<lower=1> nvar; // Number of beta coefficients nvar=3
 
int<lower=1> nz; // Number of z coefficients for beta nz=3
}


parameters
{
  matrix
[3,3] delta;
  real
<lower=0, upper=100> sigma;  
  real
<lower=0> phi; // neg. binomial dispersion parameter
}


transformed parameters
{
  vector
[3] beta[I]; //I*3 matrix
  vector
[N] y_hat;
  real mu_beta
[3];
  real sigma_b
[3];
  real
<lower=0> mu_y[I]; // mean Y for each individual
 
 
for (n in 1:N){
    y_hat
[n] <- beta[sellerid[n],1] + beta[sellerid[n],2] * x1[n] + beta[sellerid[n],3]* x2[n];
 
}
   
 
for (i in 1:I){
 
for (k in 1:3){
  mu_beta
[k] <- Z[i,1]*delta[1,k] + Z[i,2]*delta[2,k] + Z[i,3]*delta[3,k];
 
}
 
}
 
}

model
{
  to_vector
(sigma_b) ~ cauchy(0,5);
 
 
for (k in 1:3){
  beta
[k] ~ normal(mu_beta[k], sigma_b[k]);
 
}

 
//prior  
  phi
~ cauchy(0, 5);
  sigma
~ normal(0,100);

  to_vector
(delta) ~ cauchy(0,5);
 
 
 
//likelihood
 
for(n in 1:N){
  y
[n] ~ neg_binomial_2_log(mu_y[sellerid[n]], phi);
 
}
 
 
for(i in 1:I){
   mu_y
[i]~ normal( y_hat[i],sigma);
 
}

}

I appreciate any advices on fixing the model. Also, how should I code it in a way that will make Stan run faster? 

Thanks!!

Bob Carpenter

unread,
Apr 13, 2016, 12:39:02 AM4/13/16
to stan-...@googlegroups.com
You never define mu_y. The ~ statements don't define a variable,
they're just shorthand for incrementing the log density. The
only variables you don't need to define in your Stan programs
are data and parameters.

- Bob
> --
> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
> To post to this group, send email to stan-...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Haoyan Sun

unread,
Apr 13, 2016, 1:57:01 PM4/13/16
to Stan users mailing list
Hi Bob,

Thanks a lot for your quick reply. I don't quite get what you mean by mu_y is not defined? I declared mu_y in the transformed parameter block, and I basically model it as yhat + error term. What should else should I do to define it?
Also,  if mu_y is not defined, why the base model worked, but not the random coefficient model?

Bob Carpenter

unread,
Apr 13, 2016, 4:34:31 PM4/13/16
to stan-...@googlegroups.com
Declaration:

real mu;

Definition:

mu <- alpha + beta * x;

There is no "modeling". Stan just defines a log density.

Every variable must be *defined* before it is used. Data variables
and parameters are automatically defined externally. All other variables,
including transformed data and transformed parameters and local variables
and generated quantities, must be defined before being used.

You don't define mu_y before you use it on the left-hand-side of ~.

What you probably want to do is declare mu_y as a parameter,
not as a transformed parameter.

- Bob

P.S. This misunderstanding by our users is what's prompting our devs
to want to remove ~ from Stan!

Haoyan Sun

unread,
Apr 18, 2016, 6:30:50 PM4/18/16
to Stan users mailing list
Hi Bob, thanks a lot for your explanation, makes a lot of sense to me. 
Now I get the model running with simulated data, but it seems the posterier distribution has a wider variance, it does cover the true value of the simulated parameter, but the n_eff is small and the sd is larger than what I expect.
Does that mean my model is not well specified? Or it's something common in Stan?

Thanks!! 

Bob Carpenter

unread,
Apr 18, 2016, 6:35:18 PM4/18/16
to stan-...@googlegroups.com
How small is n_eff and what are the R-hat values?

If the R-hats don't all converge to 1, it means the model's
going to need to be reparameterized or rewritten.

If the n_eff is just very small compared to the number of
iterations, that's sometimes workable and sometimes not.
It can often help to reparameterize.

How did you set your expectations for posterior standard deviation?
Assuming the model converges, they should be correct (to within
MCMC sampling error). There's no inherent overestimation bis
for posterior variance!

- Bob

Haoyan Sun

unread,
Apr 18, 2016, 6:42:35 PM4/18/16
to stan-...@googlegroups.com
Thanks Bob for your quick response!
I set chain =4 and iteration=3000, and the n_eff is about 300-400 for each parameter. but the R-hat are from 1.01-1.03, so that means it does converge, right?
since the data are simulated, I simply compare the posterior distribution with the simulated value, and it seems the posterior distribution are much wider, and also the mean are different for the true value and posterior.
> You received this message because you are subscribed to a topic in the Google Groups "Stan users mailing list" group.
> To unsubscribe from this topic, visit https://groups.google.com/d/topic/stan-users/RLuPtoWOwD0/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to stan-users+...@googlegroups.com.

Haoyan Sun

unread,
Apr 19, 2016, 1:17:56 AM4/19/16
to Stan users mailing list
Hi Bob, I revised the code based on your suggestions and now I just got an new error as below
"Rejecting initial value:"                                                                                                              
"Error evaluating the log probability at the initial value."                                                                          
"Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:"                        
"validate transformed params: mu_y[k0__] is -6.6022, but must be greater than or equal to 0"                                      

I am running a negative binomial regression model, mu_y is the mean of the negative binomial distribution, so it must be larger than 0, and the error seems telling me that Stan draws negative value for mu_y ?
here is my revised code if you can get a chance to look at it. Thanks!!




data
{
 
int<lower=0> N; // total observation
 
int<lower=0> I; // subjects

 
int<lower=0> id[N]; // ID variable
 
int<lower=0> y[N];
 
int<lower=1> nvar; // Number of beta coefficients nvar=3
 
int<lower=1> nz; // Number of z coefficients for beta nz=3
  matrix
[I,nz] Z; //an i by 3 matrix of coefficient predictors

  real x1
[N];
  real x2
[N];
}



parameters
{
 
  matrix
[nvar,nz] delta;                          
  matrix
[nvar,I] error_beta;                
  vector
<lower=0>[nvar] sigma_beta;            
 
  real
<lower=0> phi; // neg. binomial dispersion parameter
 
}


transformed parameters
{
  real
<lower=0> mu_y[I]; // mean Y for each individual
  matrix
[I,nvar] beta;
 
  beta
<- Z*delta' + (diag_matrix(sigma_beta)*error_beta)';


 
for (n in 1:N){

  mu_y
[id[n]] <- beta[id[n],1] + beta[id[n],2] * x1[n] + beta[id[n],3]* x2[n];
 
}
}


model
{
 
  sigma_beta
~ cauchy(0,5);          
 
  to_vector
(delta) ~ normal(0,3);
  to_vector
(error_beta) ~ normal(0,1);
 
  phi
~ uniform(0, 20);
 
 
 
//likelihood
   increment_log_prob
(neg_binomial_2_log(y,mu_y,phi));
}

Bob Carpenter

unread,
Apr 19, 2016, 1:25:18 AM4/19/16
to stan-...@googlegroups.com
Yes, you should be fine. You aren't getting divergent transition
warnings?

A simulated value is a single value. How do you compare that
to the width of a posterior?

And the posterior mean isn't going to be exactly the same.
Consider the case where I have a normal(0, 1) and make 5 draws.
What do I get for the mean?

> y <- rnorm(10)

> mean(y)
[1] -0.1355454

That's way off the expected value of 0! It's just sampling variation
and that's what confidence intervals and posterior intervals are
quantifying (in different ways).

What you want to check is calibration. If you have 100 parameters,
are roughly half of them in posterior 50% intervals?

- Bob

Bob Carpenter

unread,
Apr 19, 2016, 1:30:17 AM4/19/16
to stan-...@googlegroups.com
Usually you need a log link function, which means an exp() wrapped
around mu in that expression --- it has to be positive.

- Bob

Haoyan Sun

unread,
Apr 19, 2016, 1:36:47 AM4/19/16
to Stan users mailing list
Yes I did get divergent transition warning even with adapt_delta above 0.99, but I thought that's why my n_eff is small, so what should I do with it?
what I mean by comparing simulated value and posterier is, for example I know I simulated beta~ N(0,2), but what I get the posterier beta is something like beta~N(0,4), I simply just made histogram plots to compare, and the posterier is wider....
Thanks!

Haoyan Sun

unread,
Apr 19, 2016, 1:46:35 AM4/19/16
to stan-...@googlegroups.com
Hi Bob,

So I change my likelihood function part of the model to 

   for (n in 1:N){
    mu_y
[id[n]] <- beta[id[n],1] + beta[id[n],2] * x1[n] + beta[id[n],3]* x2[n];

 
   increment_log_prob
(neg_binomial_2_log_log(y[n],exp(mu_y[id[n]]),phi));
}

Then the sampling finishes okay but with numerical problem. I only try with 200 iteration and 1 chain, the count number 46 seems large.

[1] "The following numerical problems occured the indicated number of times after warmup on chain 1"
                                                                                                       count
Exception thrown at line 47: stan::prob::neg_binomial_2_log_log: Precision parameter is 0, but must     46
Exception thrown at line 47: stan::prob::neg_binomial_2_log_log: Log location parameter is 1.#INF, b     5

Michael Betancourt

unread,
Apr 19, 2016, 4:32:41 AM4/19/16
to stan-...@googlegroups.com
You’re double exponentiating — neg_binomial_2_log_log
already applies the inverse log link function to the first
parameter.  Please see the manual.

On Apr 19, 2016, at 6:46 AM, Haoyan Sun <haoy...@gmail.com> wrote:

Hi Bob,

So I change my likelihood function part of the model to 

   for (n in 1:N){
    mu_y[sellerid[n]] <- beta[sellerid[n],1] + beta[sellerid[n],2] * x1[n] + beta[sellerid[n],3]* x2[n];
 
   increment_log_prob(neg_binomial_2_log_log(y[n],exp(mu_y[sellerid[n]]),phi));
}

Haoyan Sun

unread,
Apr 19, 2016, 1:40:47 PM4/19/16
to Stan users mailing list
Hi Michael, Thanks for your note.
I double checked the Stan manual, and you are right, I should still use the sampling statement: 
y ~ neg_binomial_2(mu, phi);

I think I got confused with the log alternative parameterization, since Bob suggests me to take exp() of the mean mu. However,  now my sampling statement becomes:
y ~ neg_binomial_2_log(exp(mu), phi); 

would that still be alright to parametrize this way? my model runs so slow after I change the sampling statement this way. 

Thanks for all your suggestions!

Bob Carpenter

unread,
Apr 19, 2016, 2:51:58 PM4/19/16
to stan-...@googlegroups.com
If foo(...) is the distribution (used on the right side of ~),
then foo_log(y, ...) is the log density function. Some of the
distribution names end in _log so you get _log_log on the density
function.

Look at the definitions and use the appropriate one.
The _log version of negative binomial builds in the exp()
operation, so you don't want to do it twice.

In terms of "slow", you want to measure effective sample
size / second, and then only once everything's converged
and you know you are getting the right answers.

- Bob

Haoyan Sun

unread,
Apr 19, 2016, 3:14:19 PM4/19/16
to Stan users mailing list
Hi Bob, thanks for your reply. Just to make sure I understand it correctly, do you mean that in the sampling statement(alternative parametrization in the manual):
y ~ neg_binomial_2(mu, phi); 
if mu has a functional form, mu should be something like exp(beta*X) in order to make sure mu>0 ?

Thanks!!

Bob Carpenter

unread,
Apr 19, 2016, 3:31:58 PM4/19/16
to stan-...@googlegroups.com

> On Apr 19, 2016, at 3:14 PM, Haoyan Sun <haoy...@gmail.com> wrote:
>
> Hi Bob, thanks for your reply. Just to make sure I understand it correctly, do you mean that in the sampling statement(alternative parametrization in the manual):
> y ~ neg_binomial_2(mu, phi);
> if mu has a functional form, mu should be something like exp(beta*X) in order to make sure mu>0 ?

Right. But you wouldn't compute it that way.
You'd replace

mu <- exp(beta * X);
y ~ neg_binomial_2(mu, phi);

with

log_mu <- beta * X;
y ~ neg_binomial_2_log(log_mu, phi);

because it's faster and more arithmetically stable.

- Bob

Haoyan Sun

unread,
Apr 19, 2016, 7:18:56 PM4/19/16
to Stan users mailing list
Hi Bob, it really helps! Thanks a lot! 
however, it is still very slow. I run 1 chain with only 100 iterations, and it takes 4895.27 seconds (Total), and the n_eff is very small around 3, and Rhat is not converged. 

I also get warning as below, i feel the "count" is quite large compared to 100 iteration. So what are the things I can do to fix these problems? 

[1] "The following numerical problems occured the indicated number of times after warmup on chain 1"
                                                                                                     count
Exception thrown at line 48: stan::prob::neg_binomial_2_log_log: Precision parameter is 0, but must     47
Exception thrown at line 48: stan::prob::neg_binomial_2_log_log: Log location parameter is 1.#INF, b     3
[1] "When a numerical problem occurs, the Metropolis proposal gets rejected."
[1] "However, by design Metropolis proposals sometimes get rejected  even when there are no numerical problems."
[1] "Thus, if the number in the 'count' column is small,  do not ask about this message on stan-users."

Haoyan Sun

unread,
Apr 20, 2016, 3:16:43 AM4/20/16
to Stan users mailing list
Hi Bob,

I just found a small error in my code, after I fix that, I don't get any error messages or warnings now! Thanks a lot for your continuous help!
It took me 7676 seconds in total to finish the sampling for 100 iterations. the n_eff is between 40-50. 

Haoyan

Tommaso Guerrini

unread,
Feb 20, 2017, 5:42:25 AM2/20/17
to Stan users mailing list
Hi Bob, let me ask you another question, concerning the dispersion parameter phi of the negative binomial distribution: 

I have:

phi ~ cauchy(0,2.5);
           for(i in 1:N){
  for(l in 1:47){
real lambda;
lambda = X[i,]*beta+
Zeta[i,l]*eta[l]+
Z[i,]*(b1[brand[l]]+
b2[sugar[l]]);

Y[i,l] ~ neg_binomial_2_log(lambda,phi[l]);
}
}


and phi was declared in the parameters as 
vector<lower=0>[I] phi;


One for each 'group'. Unfortunately, when I remove the prior for phi and put a fixed value it becomes much slower. Is there something in my code which is not optimized? 

Thank you

Bob Carpenter

unread,
Feb 22, 2017, 1:08:47 PM2/22/17
to stan-...@googlegroups.com
You mean put in a fixed value for phi? My guess is you're putting
in a bad value for phi. If you are just comparing putting in some
prior vs. no prior, then the no prior version is probably not getting
enough data to control inference.

- Bob
Reply all
Reply to author
Forward
0 new messages