Thanks for your quick response, Bob!
I want to clarify one thing: I did not mean to propose fitting a random subset of the imputed data, at least not if "subset" means using only some of the cases for which imputation was done.
Let’s say we have a dataset with 900 complete cases and 100 cases with missing data. Then we perform multiple imputation to get 5 imputed data sets, each consisting of the 900 originally complete cases plus the 100 cases with their missing values imputed (with some variation in the imputed values across the 5 data sets).
Ben’s suggestion was to fit the 5 data sets (1000 cases each) in independent Stan calls and then merge the chains.
My proposal was to make one Stan call to which I submit the 900 complete cases plus the 5 versions of the 100 imputed cases. For each likelihood evaluation I would then use all 900 originally complete cases plus one randomly chosen set (out of the 5) of the 100 cases with imputed data.
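To make the data layout for this single-call approach concrete, here is a minimal sketch of a Stan data block. The size names (Ncomp, Nimp, impN, K) are my own hypothetical choices, not from the thread:

```stan
// Sketch only: data layout for one Stan call holding the complete cases
// plus all imputation versions of the incomplete cases.
data {
  int<lower=0> Ncomp;            // complete cases (e.g. 900)
  int<lower=0> Nimp;             // cases with imputed values (e.g. 100)
  int<lower=1> impN;             // number of imputed data sets (e.g. 5)
  int<lower=1> K;                // number of predictors
  vector[Ncomp] Ycomp;           // criterion, complete cases
  matrix[Ncomp, K] Xcomp;        // design matrix, complete cases
  vector[Nimp] Yimp[impN];       // criterion, one vector per imputation
  matrix[Nimp, K] Xsimp[impN];   // design matrix, one per imputation
}
```

The imputed parts are stored as arrays indexed by imputation version, so the model block can select (or average over) a version without re-reading data.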
I understand the potential problem during adaptation (of the sampler parameters, I assume). Still, I think this is not necessarily a problem, because the imputed values should be similar across imputation versions.
I guess I’ll give it a try and report back.
Cheers - Guido
My first idea was to do the imputation and the estimation of the regression model of interest in the same Stan model. I did a simple test with simulated data, which worked fine (see code below), but while doing the test I realized that scaling this up to my own dataset would be extremely time-consuming.
The alternative then is to do the imputation first (e.g. with mi) and to fit the final regression model in Stan. Here, Ben proposed in an earlier post to
(a) run the Stan model repeatedly with the different versions of the imputed data sets and
(b) merge the chains from these models.
I’d like to hear your opinion about an alternative way to use the different versions of the imputed data sets:
(a) submit the originally complete part of the dataset and the different versions of the imputed data to the same model, and
(b) specify the model such that the likelihood is computed for the originally complete data plus one randomly chosen imputed data set.
Specifically, with

Ycomp = criterion values for complete data
Yimp = criterion values for imputed data
Xcomp = design matrix for complete data
Xsimp = array with design matrices for imputed data
alpha = intercept parameter
beta = vector of regression weights
sigma = regression error

the crucial part of the model specification would be:

  Ycomp ~ normal(alpha + Xcomp * beta, sigma);
  Yimp ~ normal(alpha + Xsimp[k] * beta, sigma); // where k is a random number

It seems to me this would be more efficient than running multiple Stan models on different imputed data sets when (in my case) only ca. 25% of the cases differ between the imputed data sets. Does this make sense?

If yes, the problem to solve is how to get the random number k. In Stan, random numbers can only be generated in the generated quantities block. Hence, would this (I fear) trick work? First I specify a parameter real_k with an (implicit) uniform prior in the parameters block:

  real<lower=1, upper=impN + 1> real_k; // impN = number of versions of imputed data

Then I convert it to an integer in the model block (following a proposal from Bob in another post):

  int k;
  k <- 1;
  while ((k + 1) <= real_k)  // k becomes floor(real_k), i.e. 1..impN
    k <- k + 1;

I have a bad feeling about specifying a parameter that will be updated even though it has no influence on the likelihood (I don’t know enough about the sampler to know whether this is problematic), but I don’t have a better solution at the moment.
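Putting the pieces together, a minimal sketch of the full model could look like the following. The size declarations (Ncomp, Nimp, impN, K) are assumed names I made up; the rest follows the definitions above. One caveat worth flagging: the target depends on floor(real_k), which is discontinuous in real_k, and HMC's gradient-based sampler and adaptation may handle that poorly.

```stan
// Sketch, not a tested model: single Stan call, likelihood uses the
// complete cases plus one imputed set indexed by floor(real_k).
data {
  int<lower=0> Ncomp;            // complete cases
  int<lower=0> Nimp;             // imputed cases
  int<lower=1> impN;             // number of imputed data sets
  int<lower=1> K;                // number of predictors
  vector[Ncomp] Ycomp;
  matrix[Ncomp, K] Xcomp;
  vector[Nimp] Yimp[impN];
  matrix[Nimp, K] Xsimp[impN];
}
parameters {
  real alpha;
  vector[K] beta;
  real<lower=0> sigma;
  real<lower=1, upper=impN + 1> real_k;  // implicit uniform "random index"
}
model {
  int k;
  k <- 1;
  while ((k + 1) <= real_k)      // k becomes floor(real_k), i.e. 1..impN
    k <- k + 1;
  Ycomp ~ normal(alpha + Xcomp * beta, sigma);
  Yimp[k] ~ normal(alpha + Xsimp[k] * beta, sigma);
  // caveat: the log density is discontinuous in real_k, which can
  // defeat HMC's step-size adaptation
}
```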
Thanks for your input Ben!
I also thought about weighting the imputed parts, but got stuck in the implementation of the weighting (I couldn't find a solution in the forum or the Stan reference).
Can you point me to an example or documentation where this is described?
Thanks - Guido