Hello all,
First, let me say that I'm very how excited I am about Stan. I have recently started using Stan after finding that JAGS grinds to a halt when trying to fit some of my more complex models with larger datasets. I'm particularly excited about what appears to be an extremely active support network.
I'm currently trying to fit a cognitive decision-making model. The model compiles just fine, but when running the model I'm repeatedly getting the same error:
[1] "Rejecting initial value:" " Gradient evaluated at the initial value is not finite."
[3] " Stan can't start sampling from this initial value." "Error : "
Having done quite a lot of googling in my attempts to figure out what is going wrong, I've found that some of the usual suspects for triggering this error are incorrect constraints on priors, or NaN/infinities in the likelihood function. I don't believe any of those are causing the issue in this instance. My two priors are both gamma distributed, and I have used what I think is the correct constraint in their declaration ( <lower=0> ). I've also used the 'print' command to make sure that there are no NaNs or infinities feeding into the likelihood function. Here is my model:
modelString = "
data {
int<lower=0> Ntotal;
int y[Ntotal];
vector[Ntotal] a_f;
vector[Ntotal] b_f;
vector[Ntotal] a_ta_m_tr;
vector[Ntotal] b_ta_m_tr;
vector[Ntotal] a_val;
vector[Ntotal] b_val;
vector[Ntotal] a_qm_a;
vector[Ntotal] a_qm_b;
vector[Ntotal] b_qm_a;
vector[Ntotal] b_qm_b;
vector[Ntotal] a_qv_a;
vector[Ntotal] a_qv_b;
vector[Ntotal] b_qv_a;
vector[Ntotal] b_qv_b;
}
parameters {
real<lower=0> time_sensitivity;
real<lower=0> threshold;
}
transformed parameters {
real threshold2;
vector[Ntotal] a_raw_exp;
vector[Ntotal] b_raw_exp;
vector[Ntotal] a_exp;
vector[Ntotal] b_exp;
vector[Ntotal] a_util;
vector[Ntotal] b_util;
vector[Ntotal] a_util_sq;
vector[Ntotal] b_util_sq;
vector[Ntotal] mv_diff_m;
vector[Ntotal] mv_diff_v;
vector[Ntotal] mv_diff_sd;
vector[Ntotal] ilogit_p_a;
vector[Ntotal] p_a;
threshold2 = threshold*2;
a_raw_exp = time_sensitivity * a_ta_m_tr;
b_raw_exp = time_sensitivity * b_ta_m_tr;
for(n in 1:Ntotal){
a_exp[n] = inv_logit(a_raw_exp[n]);
b_exp[n] = inv_logit(b_raw_exp[n]);
}
a_util = (a_f).*(a_exp).*a_val + (1-a_f).*(1-a_exp).*a_val;
b_util = (b_f).*(b_exp).*b_val + (1-b_f).*(1-b_exp).*b_val;
a_util_sq = (a_util).*a_util;
b_util_sq = (b_util).*b_util;
mv_diff_m = ( (a_util).*a_qm_a + (b_util).*b_qm_a) - ((a_util).*a_qm_b + (b_util).*b_qm_b);
mv_diff_v = (a_util_sq).*a_qv_a + (b_util_sq).*b_qv_a + (a_util_sq).*a_qv_b + (b_util_sq).*b_qv_b;
for(n in 1:Ntotal){
mv_diff_sd[n] = sqrt(mv_diff_v[n]);
}
}
model {
time_sensitivity ~ gamma( .1 , .1 );
threshold ~ gamma( .1 , .1);
y ~ bernoulli_logit( (mv_diff_m)./( mv_diff_sd +0.0001 )*threshold2 );
}
"
I've compiled the model and run the sampling in rstan using the following commands:
stanDso <- stan_model( model_code=modelString )
stanFit <- sampling( object=stanDso ,
data = dataList ,
pars = c( "time_sensitivity","threshold"),
chains = 4,
iter = 1000,
warmup = 500,
thin = 1 ,
init = list(list(time_sensitivity=1,threshold=1),
list(time_sensitivity=1,threshold=1),
list(time_sensitivity=1,threshold=1),
list(time_sensitivity=1,threshold=1)))
I've attached the dataList. I've set the initial values in this model to 1 for all chains and parameters, because I know the model can evaluate at these parameter values. However, I'm still getting the error. I would really appreciate if anyone has any insights about what could be going wrong.
Also, is there anything I could be doing to make this model run faster? I believe I've vectorised as much as I can. One question I had regarding optimisation is whether it is possible to multiply 1-dimension arrays of real values with other 1-dimensional arrays of real values (e.g., as you would in R), where the result is an array of the same size as x and y with the element-wise products. For example, if x and y are both one dimensional arrays, it appears that the model won't compile if I include the statement: z = x*y. To make it work, I have to declare x and y as vectors and do element-wise multiplication via the statement: z = x.*y. Is there a better way to optimise these sorts of operations? Or does it not matter?
Thanks,
Tim