I'm currently estimating discrete choice models using Stan via RStan 2.14 and R 3.3.2., and I'm having an issue with two generated quantities I'm trying to use to calculate a Bayesian p-value for the model. I receive an error that "The following variables have undefined values: chis_obs,The following variables have undefined values: chis_sim"
I've pasted the model below. Since the two quantities having problems are sums of the chi-square expected and observed values, I'm thinking the problem lies in calculating the "expected[a]" vector, since this is used in both sums. From looking into the topic, I've gathered some ideas about the problem. I think it could be related to numerical over/underflow in the softmax function?
I should note is that using a slightly altered version of this model, a hierarchical version where the beta[J][K] (for j individuals in a population) were ~normal(mu[K], sigma[K]), this calculation worked. I did not have the log_lik computation in that version, but figured that wouldn't affect these generated quantities in question. Is that a poor assumption?
Any comments on improving efficiency or other practical suggestions are certainly welcomed. Thank you!
data{
int C; // the number of choices per set
int K; // the number of slope parameters per individual
int N; // the number of observations
matrix[C, K] x[N]; // variables in design matrix format
int y[N]; // index of which alternative was selected
vector[C] obs; //variable used in test-stat calculation
vector[C] pos[C];
}
parameters{
vector[K] beta; // rsf coefficients
}
model{
for(l in 1:K){
beta[K] ~ normal(0, 10); // prior distribution for slope parameter mean
}
for(i in 1:N){
y[i] ~ categorical_logit(softmax(x[i] * beta));
}
}
generated quantities{
simplex[C] expected[N]; //the probabilities of use from dc_model
vector[N] chis_obs_i; //chi-square value for each choice set using observed data
vector[N] chis_sim_i; // chi-square value from simulated data
real chis_obs; //sum of chi square values across all choice sets
real chis_sim; //sum of chi square values across all choice sets
vector[C] rch[N]; //simulated random choice of used alt. within obs. data sets
vector[N] log_lik; // log likelihood
int rcat[N];
for(a in 1:N){
expected[a] = softmax(x[a] * beta);
rcat[a] = categorical_rng(expected[a]);
rch[a] = pos[rcat[a]];
chis_obs_i[a] = sum(((obs - expected[a]) .* (obs - expected[a])) ./
expected[a]);
chis_sim_i[a] = sum(((rch[a] - expected[a]) .* (rch[a] - expected[a])) ./
expected[a]);
log_lik[a] = categorical_logit_lpmf(y[a]| x[a] * beta);
}
chis_obs = sum(chis_obs_i);
chis_sim = sum(chis_sim_i);
}