Hello all,
I have some problems incorporating multiply imputed data sets in lavaan.
I've created 5 imputed data sets with mice. The data consist of both numeric items and factors (representing ordered-categorical variables), all of which were imputed.
I’ve converted the mids-object into a list:
library(mice)  # for complete()
m <- 5
mice.imp0 <- vector("list", m)  # pre-allocate a list of m completed data frames
for (i in 1:m) {
  mice.imp0[[i]] <- complete(imp.res0, action = i, include = FALSE)
}
That works fine, but when I try to incorporate the imputed data sets in lavaan, an error occurs. (I use WLSMV; my initial model contains only categorical variables.)
fit.mi <- runMI(model, data = mice.imp0, fun = sem, ordered = categoricalvars, parameterization = "theta", estimator = "WLSMV")
Error in as.character(mc[[1L]]) :
cannot coerce type 'closure' to vector of type 'character'
I've done a lot of research on the internet, but I can't find a solution. I would be very glad if someone could help me.
Thank you very much in advance!
Error in as.character(mc[[1L]]) :
cannot coerce type 'closure' to vector of type 'character'
is.character(categoricalvars)
as.character(categoricalvars)
lapply(mice.imp0, head)
sapply(mice.imp0, class)
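If those checks come back clean, one more thing worth ruling out (I can't confirm it's the cause here): if I remember right, runMI() in this semTools version expects fun to be the function's name as a character string, and passing the function object itself can produce exactly this kind of "closure" coercion error. A sketch of the adjusted call:
fit.mi <- runMI(model, data = mice.imp0, fun = "sem",  # note: "sem" as a string, not the function object
                ordered = categoricalvars,
                parameterization = "theta", estimator = "WLSMV")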
Is there another way, besides runMI, to incorporate a set of imputed data sets into an analysis with lavaan?
Yes, you can use runMI from the semTools package.
fit.mi <- runMI(model, data = mice.imp0, fun = sem, ordered = categoricalvars, parameterization = "theta", estimator = "WLSMV")
Error in as.character(mc[[1L]]) :
cannot coerce type 'closure' to vector of type 'character'
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] semTools_0.4-14 lavaan_0.5-23.1044

loaded via a namespace (and not attached):
[1] MASS_7.3-45 tools_3.3.2 mnormt_1.5-5 pbivnorm_0.6.0 stats4_3.3.2 quadprog_1.5-5
install.packages("lavaan", repos = "http://www.da.ugent.be", type = "source")
> fit.mi <- sem.mi(clp, data = mice.imp2, ordered = cat, parameterization = "theta", estimator = "WLSMV")
     1. Convergence 2. Proper SE 3. Proper Variance Estimate Used for pooling
[1,] TRUE          TRUE         FALSE                       FALSE
[2,] TRUE          TRUE         FALSE                       FALSE
[3,] TRUE          TRUE         FALSE                       FALSE
[4,] TRUE          TRUE         FALSE                       FALSE
[5,] TRUE          TRUE         FALSE                       FALSE
Error in runMI(model = model, data = data, m = m, miArgs = miArgs, chi = chi, :
  Please increase the number of imputations. The number of convergent replications is less than or equal to 1. See the details above.
In addition: There were 50 or more warnings (use warnings() to see the first 50)
** WARNING ** lavaan (0.5-23.1044) model has NOT been fitted
** WARNING ** Estimates below are simply the starting values
This is also the case if I reduce the model size (e.g., only 2 measurement occasions, removing the mean structure, etc.) or if I try MLR estimation. As suggested by Mauricio, I even re-ran the imputation, both with mice and Amelia. This time I created 10 data frames. Although there are no longer improper solutions, no negative variances, and the chi-square is good, the warnings still appear, even with (unconstrained) CFA models. That's odd, because my model fits with the original data. I've checked the convergence of the imputations and the distribution of the imputed data ... I can't find anything conspicuous.
Do you have any idea what could be wrong?
These warnings are (here) completely harmless. For once, you can ignore them.
fit <- semList(model, dataList = mice.imp0)
Then, do this (this is unofficial, just experimental):
fit@meta$lavMultipleImputation <- TRUE
and then request the summary():
summary(fit)
As far as I understand it, the pooled chi-square should be the average of the chi-squares from the m analyses (Van Buuren, 2012, Flexible Imputation of Missing Data, eq. 6.11);
the test statistic for the chi-square test, however, requires an adjustment. The correction for the analysis of categorical data is done according to Li et al. (1991).
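For reference, a minimal R sketch of that pooling rule (the D2 statistic of Li et al., 1991); chisq and k are placeholders for the m chi-square values and the model df, and the numbers are made up purely for illustration:
chisq <- c(210.4, 198.7, 225.1, 203.9, 215.6)   # chi-squares from m = 5 imputed analyses (made up)
k <- 120                                        # model degrees of freedom (made up)
m <- length(chisq)
dbar <- mean(chisq)                             # average chi-square across imputations
r <- (1 + 1/m) * var(sqrt(chisq))               # relative increase in variance due to missingness
D2 <- (dbar/k - (m + 1)/(m - 1) * r) / (1 + r)  # pooled test statistic
nu <- k^(-3/m) * (m - 1) * (1 + 1/r)^2          # denominator df of the reference F distribution
pf(D2, df1 = k, df2 = nu, lower.tail = FALSE)   # pooled p-value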
So it seems that what has been printed as the "robust chi-square" is the product of D and df (see the formula for `results` above), not the mean chi-square across m imputations. For completeness, I would also add that I found the same for the unscaled results (e.g. see lines 183-5).
I'm still a little bit confused.
My pooled chi-square and the corresponding p-value (see my last post above) seem to contradict the single chi-squares and p-values from the separate analyses. The pooled chi-square value is much smaller and it's nonsignificant, in contrast to all other analyses (both separate analyses and initial analyses with incomplete data with missing = "pairwise").
Would it be better to forgo multiple imputation and proceed with the incomplete data and missing = "pairwise"? What would you recommend in this special case of SEM with categorical variables and WLSMV?
In addition, I wonder about the appropriate chi-square difference test for nested models:
Given the pooled results in the lavaanStar object from runMI, which chi-square difference test would be appropriate in the case of categorical variables?
anova() does not seem to work:
Error in if (D < 0) D <- 0 : missing value where TRUE/FALSE needed
In addition: Warning message:
In sqrt(chis) : NaNs produced
Alternatively, how is it possible to compare fit? compareFit() also gives an error:
Error in fitDiff["Chisq diff"][2, 1] : incorrect number of dimensions
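For what it's worth, in later semTools versions (0.5+), where runMI() returns a lavaan.mi object, nested models can be compared with lavTestLRT.mi(); a minimal sketch, assuming fit1 and fit0 are lavaan.mi fits of the free and constrained model (the names are placeholders):
# pooled chi-squared difference test; D2 pools the per-imputation statistics
lavTestLRT.mi(fit1, h1 = fit0, test = "D2", pool.robust = TRUE)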
> F_fit1_CFA <- data.frame(predict(fit)); tail(F_fit1_CFA)
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "c('lavaan.mi', 'lavaanList')"
My first question is, is there at all a way to retrieve the pooled factor scores when using imputed data sets?
devtools::install_github("simsem/semTools/semTools")
library(semTools) # 0.5-1.926
?plausibleValues
I would get x data sets containing the plausible values (i.e., putative factor scores). Is that right?
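For concreteness, a minimal sketch of that workflow, assuming fit.mi is a lavaan.mi object fitted to the imputed data (argument names as documented in ?plausibleValues):
library(semTools)  # >= 0.5-1
PV <- plausibleValues(fit.mi, nDraws = 20)  # list of data sets of plausible factor scores
head(PV[[1]])                               # first set of plausible values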
As I don't see how I can circumvent that, I'll need to go forward with one set of scores.
out <- runMI(..., FUN=lavPredict)
out@funList[[1]] # see factor scores from first imputation
## average across m imputations (sum, then divide by m)
meanFS <- Reduce("+", out@funList) / length(out@funList)
ci_model <- runMI(fun = "cfa", model, data = impdata, miPackage = "mice", m = 100,
estimator="WLSMV", parameterization = "theta", parallel = "multicore", ncpus = 4L,
ordered = c("var1",...,"var61"))
ci_model <- cfa.mi(model, data = impdata, miPackage = "mice", m = 100,
estimator="WLSMV", parameterization = "theta", FUN = lavaanList(parallel = "multicore", ncpus = 4L),
ordered = c("var1",...,"var61"))
parallel = "multicore",
ncpus = 4L),
parallel::detectCores()
ncpus = parallel::detectCores() - 1
ci_model <- runMI(fun = "cfa", model, data = impdata, miPackage = "mice", m = 100,
estimator="WLSMV", parameterization = "theta", parallel = "snow", ncpus = 3L,
ordered = c("var1",...,"var61"))
ci_model <- cfa.mi(model, data = impdata, miPackage = "mice", m = 100,
estimator="WLSMV", parameterization = "theta", FUN = lavaanList(parallel = "snow", ncpus = 3L),
ordered = c("var1",...,"var61"))
Error in lavaanList(parallel = "snow", ncpus = 3L) :
lavaan ERROR: (generated) data is not a data.frame (or a matrix)
Are you aware of a way to resolve this error?
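For what it's worth, a plausible fix, assuming the error comes from calling lavaanList() directly inside FUN (FUN is meant for post-processing each fitted model, as in the lavPredict example above): pass the parallelization arguments to cfa.mi() itself, which forwards them to lavaanList() internally. A sketch, with catVars as a stand-in for the real vector of 61 item names:
catVars <- paste0("var", 1:61)  # stand-in for the actual ordered item names
ci_model <- cfa.mi(model, data = impdata, miPackage = "mice", m = 100,
                   estimator = "WLSMV", parameterization = "theta",
                   parallel = "snow", ncpus = 3L,  # passed through to lavaanList()
                   ordered = catVars)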
miceOut <- mice::parlmice(data = data_final,
                          m = 100,             # total number of imputations
                          method = methVec,    # imputation method per variable
                          maxit = 100,
                          cluster.seed = 1414, # for reproducibility across workers
                          n.core = 2,
                          n.imp.core = 50,     # n.core * n.imp.core = m = 100
                          cl.type = "PSOCK")
nImputations <- 100
impList <- list()
for (i in 1:nImputations) {
impList[[i]] <- complete(miceOut, action = i)
}
impList
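Side note: if I remember correctly, with mice >= 3.0 the loop can be replaced by a one-liner, because complete() can return all imputations at once:
impList <- mice::complete(miceOut, action = "all")  # list of 100 completed data frames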
ci_fit <- runMI(fun = "cfa", model = ci_model, data = impList, estimator="WLSMV", parameterization = "theta", ordered = c("var1",...,"var61"))
lavaan WARNING: starting values imply a correlation larger than 1; variables involved are:
var19_o1
var19_o2
I did not receive this warning when I ran my model on the single (un-imputed) data set. I've tried to find info about this warning, but I am not sure I entirely understand it. Can I do something like forcing all loadings of both factors to be positive:
lambda.1_1 > 0
lambda.2_1 > 0
...
lambda.1_2 > 0
lambda.2_2 > 0
...
or
lambda.1_1 ≥ 0
lambda.2_1 ≥ 0
...
lambda.1_2 ≥ 0
lambda.2_2 ≥ 0
...
I would be very grateful for any explanation or advice.
Thank you,
Jes
lavaan WARNING: starting values imply a correlation larger than 1; variables involved are:
var19_o1
var19_o2
I am wondering (a) what exactly this warning means, (b) whether it implies that I cannot trust my model, and (c) whether there is a way to circumvent/correct for this problem?
I should note that this warning appears only for a subset of the imputed data sets (at least, I don't get the warning as often as I have imputed data sets). So I assume this is a problem for some, but not all, of the imputed data sets?
I originally had a loading for item 1 of 0.63 at occasion 1 and of 0.61 at occasion 2. Now I would have a value of -0.15 at occasion 1 and a value of 0.059 at occasion 2.
Can I do something like forcing all loadings of both factors to be positive:
lambda.1_1 ≥ 0 ... and
lambda.1_2 ≥ 0
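For concreteness, a minimal sketch of how such constraints look in lavaan model syntax; the item names (x1...x6) and labels are made up, not from the actual model, and note that lavaan's constraint syntax uses > rather than ≥:
model.pos <- '
  # hypothetical two-factor model with labeled loadings
  eta1 =~ l1_1*x1 + l2_1*x2 + l3_1*x3
  eta2 =~ l1_2*x4 + l2_2*x5 + l3_2*x6

  # inequality constraints forcing all loadings to be positive
  l1_1 > 0
  l2_1 > 0
  l3_1 > 0
  l1_2 > 0
  l2_2 > 0
  l3_2 > 0
'
Since the warning concerns starting values, an alternative that avoids constrained optimization would be to supply positive starting values for the offending loadings, e.g. eta1 =~ start(0.5)*x1.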
Forcing the loadings to be positive seemed to solve the starting value error; I now get a plausible summary output. However, when I request fit measures, I get:
Negative pooled test statistic was set to zero, so fit will appear to be arbitrarily perfect. Robust corrections uninformative, not returned.
From what I have inspected so far, it looks like one of the imputed data sets is a Heywood case (and I am wondering whether this may cause the above error).
Therefore, I was wondering whether I could remove the coefficients of this imputed data set from the lavaan S4 object and feed the remaining solutions into fitMeasures(), to see whether that technically works and it is indeed only this one imputation data set that behaves strangely and prevents the fit from being computable.
Can you advise me on a way to remove one of the ci_fit@coefList solutions from the fit object?
I tried to subset the object myself, but that was not quite so successful ... (amateur speaking here, as you probably are aware).
Or, if you have any other idea about what causes the above error and how to prevent it, I would be very happy to hear it!
Many many thanks for all the help and apologies for all the questions,
Jes
seemed to solve the starting value error
I have now one additional error to solve, which I get when I try to request fit measures:
Negative pooled test statistic was set to zero, so fit will appear to be arbitrarily perfect. Robust corrections uninformative, not returned.
one of the imputed data sets is a Heywood case (and I am wondering whether this may cause the above error).
Can you advise me on a way to remove one of the ci_fit@coefList solutions from the fit object?
lavTestLRT.mi(fit_ci, omit.imps = c("no.conv","no.se","no.npd"))
#"D3" only available using maximum likelihood estimation. Changed test to "D2".
#Robust correction can only be applied to pooled chi-squared statistic, not F statistic. "asymptotic" was switched to TRUE.
#Negative pooled test statistic was set to zero, so fit will appear to be arbitrarily perfect. Robust corrections uninformative, not returned.
# chisq df pvalue npar ntotal
# 0 7185 1 562 1188
lavTestLRT.mi(generaldistress_fit_ci, omit.imps = c("no.conv", "no.se", "no.npd"), test = "D2", pool.robust = TRUE)
#Error in out[["F"]] : subscript out of bounds
lavTestLRT.mi(generaldistress_fit_ci, omit.imps = c("no.conv", "no.se", "no.npd"), test = "D2", asymptotic = TRUE, pool.robust = TRUE)
#Negative pooled test statistic was set to zero, so fit will appear to be arbitrarily perfect. Robust corrections uninformative, not returned.
# chisq df pvalue npar ntotal chisq.scaled df.scaled pvalue.scaled
# 0 7185 1 562 1188 0 7185 1
lavTestLRT.mi(generaldistress_fit_ci, test = "D2", asymptotic = TRUE, pool.robust = TRUE)
#Negative pooled test statistic was set to zero, so fit will appear to be arbitrarily perfect. Robust corrections uninformative, not returned.
# chisq df pvalue npar ntotal chisq.scaled df.scaled pvalue.scaled
# 0 7185 1 562 1188 0 7185 1
fitMeasures(syntax.fit.GeneralDistress_Model_ci)
npar fmin chisq df pvalue
562.000 6.995 16619.815 7185.000 0.000
chisq.scaled df.scaled pvalue.scaled chisq.scaling.factor baseline.chisq
11141.971 7185.000 0.000 3.066 3916648.022
baseline.df baseline.pvalue baseline.chisq.scaled baseline.df.scaled baseline.pvalue.scaled
7381.000 0.000 359756.869 7381.000 0.000
baseline.chisq.scaling.factor cfi tli nnfi rfi
11.094 0.998 0.998 0.998 0.996
nfi pnfi ifi rni cfi.scaled
0.996 0.969 0.998 0.998 0.989
tli.scaled cfi.robust tli.robust nnfi.scaled nnfi.robust
0.988 NA NA 0.988 NA
rfi.scaled nfi.scaled ifi.scaled rni.scaled rni.robust
0.968 0.969 0.989 0.989 NA
rmsea rmsea.ci.lower rmsea.ci.upper rmsea.pvalue rmsea.scaled
0.033 0.033 0.034 1.000 0.022
rmsea.ci.lower.scaled rmsea.ci.upper.scaled rmsea.pvalue.scaled rmsea.robust rmsea.ci.lower.robust
0.021 0.022 1.000 NA NA
rmsea.ci.upper.robust rmsea.pvalue.robust rmr rmr_nomean srmr
NA NA 0.046 0.046 0.046
srmr_bentler srmr_bentler_nomean crmr crmr_nomean srmr_mplus
0.046 0.046 0.046 0.046 0.046
srmr_mplus_nomean cn_05 cn_01 gfi agfi
0.046 528.321 534.285 0.996 0.996
pgfi mfi
0.924 0.019
This is not, as I first thought, a question of one imputation set being odd. I still do not understand what is happening.
Unfortunately, the model takes quite some time to run (approximately 23 hours on the imputed data sets).
Is this the point where you give up and tell the reviewers you were unable to fit your models on imputation data sets?
This is not, as I first thought, a question of one imputation set being odd. I still do not understand what is happening.
There hasn't been a systematic investigation of this issue that I am aware of. I forgot that it could happen with D2 as well as D3. From what I understand, it happens when there is a lot of between-imputation variability, and intuitively I think it would be less likely to occur the worse a model fits.
Unfortunately, the model takes quite some time to run (approximately 23 hours on the imputed data sets).
Oh, that's awful.
Is this the point where you give up and tell the reviewers you were unable to fit your models on imputation data sets?
If you have time to run it once more, add the argument runMI(..., FUN = fitMeasures) to save the fit indices from each imputation. Then you can extract (e.g.) chisq, rmsea, and cfi from each imputation and report the range across imputations in lieu of a pooled test. You should still be able to obtain SRMR from fitMeasures(), because it is not based on chi-squared. If the range is as good as your first imputation's output, I think you can make a reasonable argument that approximate fit is acceptable enough to continue. Hopefully, any chi-squared difference tests using lavTestLRT.mi() on 2 nested models will not also yield negative pooled tests.
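To make that concrete, a sketch of pulling the indices out afterwards, assuming the refit was run as fit <- runMI(..., FUN = fitMeasures), so that each element of the funList slot holds one imputation's fitMeasures() vector (cf. the lavPredict example earlier in this thread):
# assumes every imputation converged and returned a full fitMeasures vector
fitMat <- do.call(rbind, fit@funList)   # one row of fit indices per imputation
apply(fitMat[, c("chisq", "rmsea", "cfi", "srmr")], 2, range)  # min/max across imputations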
Is there literature on that?
Would it make sense to pool these plausible values and use them as factor scores (indicators) in a subsequent analysis (another CFA model)?