Can you attach the whole Stan program you're using?
If you marginalize out the mixture responsibilities, Stan
should mix much better than JAGS. It has in every problem
I've looked at in the past.
I'd also suggest using a simple beta prior rather than a
Dirichlet if there are only two outcomes.
- Bob
> On Mar 2, 2016, at 2:48 PM, Stephen Martin <
hwki...@gmail.com> wrote:
>
> I'm attempting to fit a fairly complicated model.
>
> It's a way I came up with to assess the probability of differential item functioning (DIF) in irt (and other) models in a bayesian framework, with known OR unknown groups.
> I initially created the model in JAGS, and it worked, but I want to convert most of my models to stan.
>
> I'm mostly interested in assessing the probability of each item's DIF in the unknown or latent group scenario.
>
> In BUGS syntax, it's something like:
>
> for(i in 1:N){
> for(j in 1:J){
> y[i,j] ~ dbern(p[i,j,d])
> logit(p[i,j,1]) <- alpha[j,1] * (theta[i,k[i] - diff[j,1])
> logit(p[i,j,2]) <- alpha[j,k[i]] * (theta[i,k[i]] - diff[j,k[i]]
> }
> k[i] ~ dcat(alpha)
> }
> for(j in 1:J){
> d ~ dcat(overallDIF)
> }
> overallDIF ~ ddirch(1,1)
> alpha ~ ddirich(1,1)
>
> Basically, the model simultaneously estimates the IRT parameters, group memberships, and the probability that each item exhibits DIF.
> More simply, it's finding a combination of items that best splits the data into two clusters.
>
> I translated this model into Stan in the best way I know how, and that's at the bottom.
>
> The problem is that warmup doesn't seem to actually *occur*, or it's taking WAY too long.
> That is, I start 4 chains on the model, one or two of the chains might make progress in warmup and will finish sampling thereafter.
> The other chains will never progress past 0%, or at least after an hour they made no progress.
>
> This seems random. Sometimes, 1 will work or 2 will work.
> *Once* all four worked, and the inference was correct. The issue seems to be in the very initial warmup stage.
>
> What does this suggest about the model? How could I improve sampling?
> Things I've tried:
> 1) Parameterizing in terms of beta and real<lower=0,upper=1> p2, instead of using simplex and dirichlet. This made no difference.
> 2) Estimating alpha and diff from the ltm irt package as initial values. This made no difference.
>
> The code:
>
> data {
> int N; //Sample size
> int J; //Number of items
> int y[N,J]; //Dichotomous responses
> int jOrder; //Which item difficulty to order-constrain, for ident
> }
>
> parameters {
> simplex[2] delta[J]; //DIF or not
> simplex[2] p2[N]; //Mixture component
>
> vector[J] alpha_s[2,2]; //[Delta, Kappa, Item]
> vector[J] diff_s[2,2]; //[Delta, Kappa, Item]
> ordered[2] diff_s1; //Ordered difficulty constraint, for mixture ident
> vector[N] theta[2]; //Theta varies across mixture components
>
> simplex[2] Delta; //Overall DIF
> }
> transformed parameters{
> vector[J] alpha[2,2];
> vector[J] diff[2,2];
> alpha <- alpha_s;
> alpha[1,1] <- alpha[1,2]; //Constrain non-DIF alphas to be equal across components
>
> diff <- diff_s;
> diff[1,1] <- diff[1,2]; //Constrain non-DIF Difficulties to be equal across components
> diff[2,1,jOrder] <- diff_s1[1]; //Replace jOrderth item with ordered difficulty
> diff[2,2,jOrder] <- diff_s1[2];
> }
>
> model {
> //Decl
> real yhat[J,N,2,2]; //Estimate of Jth item, nth person on non-DIF or DIF version, in cluster 1 or 2
> real lp_jndk[J,N,2,2]; //Auxillary variables for holding likelihood parts
> real lp_jnd[J,N,2];
> real lp_jn[J,N];
> real lp_j[J];
>
> //Priors
> ////IRT params
> alpha[1,1] ~ normal(0,1);
> alpha[2,1] ~ normal(0,1);
> alpha[2,2] ~ normal(0,1);
>
> diff[1,1] ~ normal(0,1);
> diff[2,1] ~ normal(0,1);
> diff[2,2] ~ normal(0,1);
>
> ////Mixture probabilities
> for(n in 1:N){
> p2[N] ~ dirichlet(rep_vector(.5,2));
> }
>
> ////Latent abilities, per cluster
> for(k in 1:2){
> theta[k] ~ normal(0,1);
> }
>
> ////Deltas
> for(j in 1:J){
> //delta[j] ~ dirichlet(rep_vector(.5,2));
> delta[j] ~ dirichlet(Delta);
> }
>
> Delta ~ dirichlet(rep_vector(1.0,2)); //May be unnecessary
>
> //Likelihoods
> for(j in 1:J){
> for(n in 1:N){
> for(d in 1:2){
> for(k in 1:2){
> yhat[j,n,d,k] <- alpha[d,k,j]*(theta[k,n] - diff[d,k,j]);
> }
> }
> }
> }
>
> for(j in 1:J){
> for(n in 1:N){
> for(d in 1:2){
> for(k in 1:2){
> //Likelihood on item j of person n on non-DIF or DIF version of item, weighted by cluster membership
> lp_jndk[j,n,d,k] <- log(p2[n,k]) + bernoulli_logit_log(y[n,j],yhat[j,n,d,k]);
> }
> //Weighted likelihood across cluster membership for person n
> lp_jnd[j,n,d] <- log(delta[j,d]) + log_sum_exp(lp_jndk[j,n,d]);
> }
> //Weighted likelihood across cluster membership for person n in each cluster
> lp_jn[j,n] <- log_sum_exp(lp_jnd[j,n]);
> }
> //Sum of N weighted likelihoods for item j
> lp_j[j] <- sum(lp_jn[j]);
> }
> //Increment
> increment_log_prob(sum(lp_j));
>
> R Code:
>
> #Libraries#
> library(sirt)
> library(rstan)
> library(ltm)
> options(mc.cores=parallel::detectCores())
> rstan_options(auto_write=TRUE)
>
> #Simulate heterogenous irt data
> N <- 200
> J <- 40
>
> thetas.hetero <- list(rnorm(N/2),rnorm(N/2))
> bs.hetero <- list(rnorm(J),rnorm(J))
> as.hetero <- list(runif(J,.5,1.2),runif(J,.5,1.2))
> ds.hetero <- sim.raschtype(thetas.hetero[[1]],b=bs.hetero[[1]],fixed.a=as.hetero[[1]])
> ds.hetero <- rbind(ds.hetero,sim.raschtype(thetas.hetero[[2]],b=bs.hetero[[2]],fixed.a=as.hetero[[2]]))
> colnames(ds.hetero) <- paste0('item',1:J)
>
> #Start stan with initial values of alpha/difficulty. This doesn't make a difference.
> #Find initial values for some parameters -- in this case, discrimination/alpha
> ltmCoef <- coef(ltm(ds.hetero ~ z1)) ## Obtain ltm's irt estimates.
> alphaInit <- array(dim = c(2,2,J))
> alphaInit[1,1,1:J] <- ltmCoef[,1]
> alphaInit[1,2,1:J] <- ltmCoef[,1]
> alphaInit[2,1,1:J] <- ltmCoef[,1]
> alphaInit[2,2,1:J] <- ltmCoef[,1]
> diffInit <- array(dim = c(2,2,J))
> diffInit[1,1,1:J] <- ltmCoef[,2]
> diffInit[1,2,1:J] <- ltmCoef[,2]
> diffInit[2,1,1:J] <- ltmCoef[,2]
> diffInit[2,2,1:J] <- ltmCoef[,2]
>
> dif.heteroOut <- stan(model_code=irt.2pl.DIF,data = list(y=ds.hetero,N=N,J=J,jOrder=1),pars=c('theta','alpha','diff','delta','Delta'),
> init=function(){
> list(alpha_s=alphaInit,diff_s=diffInit)
> },verbose=TRUE)