Help with error: "Rejecting initial value" "Gradient evaluated at the initial value is not finite"

1,256 views
Skip to first unread message

Tim Ballard

unread,
Aug 4, 2016, 10:17:34 PM8/4/16
to Stan users mailing list
Hello all,

First, let me say that I'm very how excited I am about Stan. I have recently started using Stan after finding that JAGS grinds to a halt when trying to fit some of my more complex models with larger datasets. I'm particularly excited about what appears to be an extremely active support network.

I'm currently trying to fit a cognitive decision-making model. The model compiles just fine, but when running the model I'm repeatedly getting the same error:

[1] "Rejecting initial value:"                                 "  Gradient evaluated at the initial value is not finite."
[3] "  Stan can't start sampling from this initial value."     "Error : "  

Having done quite a lot of googling in my attempts to figure out what is going wrong, I've found that some of the usual suspects for triggering this error are incorrect constraints on priors, or NaN/infinities in the likelihood function. I don't believe any of those are causing the issue in this instance. My two priors are both gamma distributed, and I have used what I think is the correct constraint in their declaration ( <lower=0> ). I've also used the 'print' command to make sure that there are no NaNs or infinities feeding into the likelihood function. Here is my model:

modelString = "
data {
    int<lower=0> Ntotal;
    int y[Ntotal];
    vector[Ntotal] a_f;
    vector[Ntotal] b_f;
    vector[Ntotal] a_ta_m_tr;
    vector[Ntotal] b_ta_m_tr;
    vector[Ntotal] a_val;
    vector[Ntotal] b_val;
    vector[Ntotal] a_qm_a; 
    vector[Ntotal] a_qm_b;
    vector[Ntotal] b_qm_a;
    vector[Ntotal] b_qm_b;
    vector[Ntotal] a_qv_a;
    vector[Ntotal] a_qv_b;
    vector[Ntotal] b_qv_a;
    vector[Ntotal] b_qv_b;

 }  
 parameters {
    real<lower=0> time_sensitivity;
    real<lower=0> threshold;
 }

 transformed parameters {
    real threshold2;
    vector[Ntotal] a_raw_exp;
    vector[Ntotal] b_raw_exp;
    vector[Ntotal] a_exp;
    vector[Ntotal] b_exp;
    vector[Ntotal] a_util;
    vector[Ntotal] b_util;
    vector[Ntotal] a_util_sq;
    vector[Ntotal] b_util_sq;
    vector[Ntotal] mv_diff_m;
    vector[Ntotal] mv_diff_v;
    vector[Ntotal] mv_diff_sd;
    vector[Ntotal] ilogit_p_a;
    vector[Ntotal] p_a;   

    threshold2 = threshold*2;
    a_raw_exp = time_sensitivity * a_ta_m_tr;
    b_raw_exp = time_sensitivity * b_ta_m_tr;

    for(n in 1:Ntotal){
      a_exp[n] = inv_logit(a_raw_exp[n]);
      b_exp[n] = inv_logit(b_raw_exp[n]); 
    }
    
    a_util = (a_f).*(a_exp).*a_val +  (1-a_f).*(1-a_exp).*a_val;
    b_util = (b_f).*(b_exp).*b_val +  (1-b_f).*(1-b_exp).*b_val;

    a_util_sq = (a_util).*a_util;
    b_util_sq = (b_util).*b_util;

    mv_diff_m =  ( (a_util).*a_qm_a + (b_util).*b_qm_a) - ((a_util).*a_qm_b + (b_util).*b_qm_b);
    mv_diff_v = (a_util_sq).*a_qv_a + (b_util_sq).*b_qv_a + (a_util_sq).*a_qv_b + (b_util_sq).*b_qv_b;      
  
    for(n in 1:Ntotal){
      mv_diff_sd[n] = sqrt(mv_diff_v[n]);       
    }
 }

 model {
    time_sensitivity ~ gamma( .1 , .1 );
    threshold ~ gamma( .1 , .1);
    y ~ bernoulli_logit( (mv_diff_m)./( mv_diff_sd +0.0001 )*threshold2 );
  }
"

I've compiled the model and run the sampling in rstan using the following commands:

stanDso <- stan_model( model_code=modelString )

stanFit <- sampling( object=stanDso , 
                       data = dataList , 
                       pars = c( "time_sensitivity","threshold"),
                       chains = 4,
                       iter = 1000, 
                       warmup = 500, 
                       thin = 1 ,
                       init = list(list(time_sensitivity=1,threshold=1),
                                   list(time_sensitivity=1,threshold=1), 
                                   list(time_sensitivity=1,threshold=1),
                                   list(time_sensitivity=1,threshold=1)))  

I've attached the dataList. I've set the initial values in this model to 1 for all chains and parameters, because I know the model can evaluate at these parameter values. However, I'm still getting the error. I would really appreciate if anyone has any insights about what could be going wrong.

Also, is there anything I could be doing to make this model run faster? I believe I've vectorised as much as I can. One question I had regarding optimisation is whether it is possible to multiply 1-dimension arrays of real values with other 1-dimensional arrays of real values (e.g., as you would in R), where the result is an array of the same size as x and y with the element-wise products. For example, if x and y are both one dimensional arrays, it appears that the model won't compile if I include the statement: z = x*y. To make it work, I have to declare x and y as vectors and do element-wise multiplication via the statement: z = x.*y. Is there a better way to optimise these sorts of operations? Or does it not matter?

Thanks,
Tim





dataList.Rda

Andrew Gelman

unread,
Aug 4, 2016, 10:22:10 PM8/4/16
to stan-...@googlegroups.com
All I can say from a quick look is:
(a) That 0.0001 looks weird to me.  This sort of thing can cause problems.  I'm suspicious of those gamma(.1,.1)'s too.
(b) All those inv_logits look like they could be slowing things down; is it possible to write things more directly?
(c) Whassup with fixing those starting values.  This shouldn't be necessary.
A

--
You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
To post to this group, send email to stan-...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
<dataList.Rda>

Tim Ballard

unread,
Aug 4, 2016, 10:49:11 PM8/4/16
to Stan users mailing list, gel...@stat.columbia.edu
Hi Andrew,

Thanks for the reply.

(a) That 0.0001 looks weird to me.  This sort of thing can cause problems.  I'm suspicious of those gamma(.1,.1)'s too.

I added the 0.0001 simply because there are there are certain elements  'mv_diff_sd' that equal 0. Without the small added constant, mv_diff_m / mv_diff_sd would be NaN for those elements. When I remove the constant, the model gives a different error:

"Exception thrown at line 68: stan::math::bernoulli_logit_log: Logit transformed probability parameter[149] is nan, but must not be nan!"

If I change the value of the constant to 0.1, I still get the original error.

When I run the model with a more reasonable parameterisation of the gamma priors (e.g., gamma(1,1) ), or if I change the priors to a uniform distribution (and add the appropriate constraints), I still get the same error.

(b) All those inv_logits look like they could be slowing things down; is it possible to write things more directly?

I've tried models with:

     a_exp[n] = inv_logit(a_raw_exp[n])

reexpressed as

    a_exp[n] = 1/(1+pow(2.71828,a_raw_exp[n])

or

  a_exp[n] = 1/(1+2.71828^a_raw_exp[n])

I still get the error. Incidentally, is there a difference in efficiency among those three different statements?

(c) Whassup with fixing those starting values.  This shouldn't be necessary.

Ok, good to know. I did it originally so that I knew exactly which starting values the model would be using, so that I might be able to eliminate certain possible causes of the error. It doesn't seem to be a problem with the starting values. Although it's interesting that if I don't fix the initial values, I get a different error:

[301] "Initialization between (-2, 2) failed after 100 attempts. "                                           
[302] " Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model."
[303] "Error : " 

Thanks again for the feedback!

Daniel Lee

unread,
Aug 4, 2016, 11:33:44 PM8/4/16
to stan-users mailing list
Hi Tim,

The issue you're having is that the model you've set up isn't differentiable. The problem you have is the last for loop in the transformed parameters block:
  for(n in 1:Ntotal){
    mv_diff_sd[n] = sqrt(mv_diff_v[n]);       
  }

For Stan to have a chance to work, it has to have finite gradients. Just getting back to calc for a minute, d / dx sqrt(x) = 1 / (2 * sqrt(x)). When x = 0, the derivative isn't well defined.

It's way too complicated for me to follow the logic, but I'm not sure why you've created mv_diff_v the way you have. Perhaps there's something you can do to bound it away from zero? Or really think about why you need that expression or the expression going into bernoulli_logit?



Daniel



To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+unsubscribe@googlegroups.com.

Tim Ballard

unread,
Aug 5, 2016, 2:01:47 AM8/5/16
to Stan users mailing list
Thanks! I changed that line to mv_diff_sd[n] = sqrt(mv_diff_v[n] + 1);  e tocce and removed the constant from the bernoulli_logit function and the model ran without any errors.

In general, is there anything else I can be doing to make the model run faster?

Kind regards,
Tim 

Bob Carpenter

unread,
Aug 6, 2016, 9:03:32 PM8/6/16
to stan-...@googlegroups.com, gel...@stat.columbia.edu
(a) is usually a sign that you're not modeling the data
with the appropriate model.

(b) inv_logit is itself more efficient (and also more accurate than
using that pow(); there's a built-in exp() function in every
programming language so you should never be typing in e as a constant!
Built-ins will always be faster.

(c) that's a sign that your constraints don't match your model. if
a value of the parameters satisfies the declared constraints, it should
have positive density (finite log density).

For efficiency, define all values that only depend on data as
transformed data.

For numerical stability, keep everything on log scale and only
apply inv_logit, exp(), etc. at last possible moment (preferably
in one of the built-ins, like bernoulli_logit.

- Bob

Tim Ballard

unread,
Aug 10, 2016, 5:33:08 PM8/10/16
to Stan users mailing list, gel...@stat.columbia.edu
Hi all,

Thanks for all the advice! My model is working great now.

Tim
Reply all
Reply to author
Forward
0 new messages