lavaan error: analysis of imputed data with runMI and lavaan


jonas...@gmail.com

Dec 19, 2016, 8:37:08 AM
to lavaan

Hello all,

 

I have some problems with the incorporation of imputed data sets in lavaan.

 

I’ve created 5 imputed data sets with mice. The data consists of both numeric items and factors (representing ordered-categorical variables) which were imputed.

 

I’ve converted the mids-object into a list:

 

mice.imp0 <- NULL
m <- 5
for (i in 1:m) {
  mice.imp0[[i]] <- complete(imp.res0, action = i, include = FALSE)
}

 

That works fine, but when I try to incorporate the imputed data sets in lavaan, an error occurs. (I use WLSMV; my initial model contains only categorical variables.)

 

fit.mi<-runMI(model, data=mice.imp0, fun=sem, ordered= categoricalvars, parameterization="theta", estimator="WLSMV")

 

Error in as.character(mc[[1L]]) :

  cannot coerce type 'closure' to vector of type 'character'

 

I’ve done a lot of research on the internet, but I can’t find a solution. I would be very glad if someone could help me.

 

Thank you very much in advance!

 

Terrence Jorgensen

Dec 20, 2016, 6:29:40 AM
to lavaan

Error in as.character(mc[[1L]]) :

  cannot coerce type 'closure' to vector of type 'character'


That's the type of message you see when you try to use a named object (that is not a character vector, like a function) in place of a character vector.  It's hard to know without having a full script and some data to try it on, but my first guess would be to check the object categoricalvars.  Are you sure it's a character vector?

is.character(categoricalvars)

When you print it to the console, do you see what you expect?

as.character(categoricalvars)

What about the data sets, did you check the structure?

lapply(mice.imp0, head)
sapply(mice.imp0, class)

If everything is as expected, you can send me just enough data and R script to reproduce the error, and I will try to sort it out.  The next version of semTools will deal with multiple imputation differently (at least internally), but I will try to help you work with it before the next version is ready.

Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam






jonas...@gmail.com

Dec 20, 2016, 10:12:28 AM
to lavaan
Thank you very much for the quick response!

Checking the data sets, everything is as expected. "categoricalvars" is a character vector, the list contains 5 data.frames, and each of these data frames consists of both imputed factors and numerical variables. I've tried to convert all variables in the data frames into characters, but the same error occurred. It would be very nice if you could reproduce the error. I've sent you the data directly via mail.

Apart from that, is there any other way to analyze imputed data simultaneously and pool the results with lavaan? 

Kamran Afzali

Dec 20, 2016, 11:31:29 PM
to lavaan

is there any other way to analyze imputed data simultaneously and pool the results with lavaan?

jonas...@gmail.com

Dec 21, 2016, 6:01:37 AM
to lavaan
yes you can use runMI from semTools package.


I've used runMI from semTools, but - as described above - it doesn't work. My question was whether there is another way, instead of runMI, to incorporate a set of imputed data in the analysis with lavaan.

Terrence Jorgensen

Dec 21, 2016, 7:56:32 AM
to lavaan
whether there is another way, instead of runMI, to incorporate a set of imputed data in the analysis with lavaan.

The only other package in the lavaan ecosystem that I am aware of that incorporates multiply imputed data sets is lavaan.survey, which is for complex sampling designs.  I suppose you might be able to pass it a survey-design object that specifies nothing special about the sampling (e.g., give everyone weights of 1), but don't take my word for that.  It would not work in your case anyway, because you have categorical outcomes, which lavaan.survey does not currently handle.

Terrence Jorgensen

Dec 21, 2016, 9:18:55 AM
to lavaan

fit.mi<-runMI(model, data=mice.imp0, fun=sem, ordered= categoricalvars, parameterization="theta", estimator="WLSMV")

Error in as.character(mc[[1L]]) :

  cannot coerce type 'closure' to vector of type 'character'


Hi Jonas,

Thanks for sending your data (privately of course).  I think this must be a version issue, because I ran your script and did not get this error.  I am using R 3.3.2, lavaan 0.5-23 (development version), and semTools 0.4-14

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] semTools_0.4-14    lavaan_0.5-23.1044

loaded via a namespace (and not attached):
[1] MASS_7.3-45    tools_3.3.2    mnormt_1.5-5   pbivnorm_0.6.0 stats4_3.3.2   quadprog_1.5-5

If you need to install the development version of lavaan, here is the syntax: 

install.packages("lavaan", repos = "http://www.da.ugent.be", type = "source")

The model converges for all 5 imputations (despite the error message below), although all of them yielded improper solutions, which by default prevents pooling of results (a rule of thumb imposed by the author of the function).

> fit.mi <- sem.mi(clp, data=mice.imp2, ordered= cat, parameterization="theta", estimator="WLSMV")
     1. Convergence 2. Proper SE 3. Proper Variance Estimate Used for pooling
[1,]           TRUE         TRUE                       FALSE            FALSE
[2,]           TRUE         TRUE                       FALSE            FALSE
[3,]           TRUE         TRUE                       FALSE            FALSE
[4,]           TRUE         TRUE                       FALSE            FALSE
[5,]           TRUE         TRUE                       FALSE            FALSE
Error in runMI(model = model, data = data, m = m, miArgs = miArgs, chi = chi,  : 
  Please increase the number of imputations. The number of convergent replications is less than or equal to 1. See the details above.
In addition: There were 50 or more warnings (use warnings() to see the first 50)

You can override that default rule-of-thumb by setting the argument "includeImproper = TRUE" when you run sem.mi() or runMI().  But I would not ignore the improper solutions in your case.  Although sampling error is a potential explanation for a negative variance estimate, the CI for the residual variance of sp62 is entirely negative in your first imputed data set (as well as in the pooled results across all 5 data sets), so mere sampling error is unlikely.  Your sample size is moderate at best for such a large model, and your approximate fit appears decent, so maybe it is just a fluke, but it might be worth testing whether (for example) an invariance constraint on that item at that occasion is actually tenable.

Mauricio Garnier-Villarreal

Dec 21, 2016, 12:15:28 PM
to lavaan
In cases like this, when the imputations are giving improper results, it is also a good idea to go back and check the imputations. For example, when using mice, check that the 5 imputations have converged properly before running them with runMI. You can also try another imputation method, using Amelia or mi.
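For concreteness, a minimal sketch of such checks, assuming the mids object is called imp.res0 as in the first post; these are the standard mice diagnostic plots:

library(mice)

plot(imp.res0)         # trace plots of the mean and SD of imputed values per iteration
densityplot(imp.res0)  # compare densities of observed vs. imputed values
stripplot(imp.res0)    # imputed values plotted against observed, per imputation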

jonas...@gmail.com

Dec 21, 2016, 9:07:13 PM
to lavaan
Thank you very much for your efforts, Terrence! sem.mi() and cfa.mi() work; runMI gives the same error as described above. I've installed the development version of lavaan and I use the same versions of R and semTools, but the error still occurs. Apart from that, I could use sem.mi() and cfa.mi(). But each time the output begins with:


** WARNING ** lavaan (0.5-23.1044) model has NOT been fitted

** WARNING ** Estimates below are simply the starting values


This is also the case if I reduce the model size (e.g. only 2 measurement occasions, removing the mean structure, etc.) or if I try to use MLR estimation. As suggested by Mauricio, I even re-ran the imputation, both with mice and Amelia. This time I've created 10 data frames. Although there are no longer improper solutions, no negative variances, and the chi-square is good, the warnings still appear - even with (unconstrained) CFA models. That's odd, because my model fits with the original data. I've checked the convergence of the imputations and the distribution of the imputed data ... I can't find anything conspicuous.


Do you have any idea what could be wrong?

jonas...@gmail.com

Dec 21, 2016, 9:41:58 PM
to lavaan
I forgot to add that the model fits all of the singly imputed data frames when fitted with a normal cfa() or sem() call.

Yves Rosseel

Dec 22, 2016, 2:50:20 AM
to lav...@googlegroups.com
> sem.mi() and cfa.mi(). But each time the output begins with:
>
>
> ** WARNING ** lavaan (0.5-23.1044) model has NOT been fitted
>
> ** WARNING ** Estimates below are simply the starting values

These warnings are (here) completely harmless. For once, you can ignore
them.

Yves.

Yves Rosseel

Dec 22, 2016, 9:18:39 AM
to lav...@googlegroups.com
On 12/21/2016 05:31 AM, Kamran Afzali wrote:
>
> is there any other way to analyze imputed data simultaneously and pool
> the results with lavaan?

You could try using the semList() [or lavaanList()] function, which simply fits the same model to a list of data sets.

Something like:

fit <- semList(model, dataList = mice.imp0)

Then, do this (this is unofficial, just experimental):

fit@meta$lavMultipleImputation <- TRUE

and then request the summary():

summary(fit)

and you will get the point estimates and (pooled) standard errors. But
no fit indices.

Yves.

jonas...@gmail.com

Dec 23, 2016, 6:55:16 PM
to lavaan
Thank you very much, Yves! I will try it out!

jonas...@gmail.com

Jan 2, 2017, 6:36:27 PM
to lavaan
Hello, 

unfortunately I have some additional questions concerning the pooled fit measures produced by sem.mi(), cfa.mi() and runMI(): I wonder why the pooled chi-square values are much smaller and nonsignificant (≈ 500, df = 300, p > .05), while the chi-square values of the single imputed data sets are all much higher and significant (≈ 700-800, df = 300, p < .001). Do you have any ideas or suggestions why there are such large differences? Are the pooled chi-squares (e.g. lmrr) under the WLSMV estimator with categorical variables the appropriate ones to report? Or do I have to calculate the pooled chi-square differently in this case? I've tried to use miPoolChi() from simsem, but R doesn't find this function. In contrast, the pooled SRMR is much higher (≈ .250) than the single values (≈ .030-.040). How could I compute the appropriate pooled fit measures (especially chi-square, MFI, SRMR, WRMR)?


Furthermore, I've got some queries concerning your previous answers:

 
These warnings are (here) completely harmless. For once, you can ignore 
them. 

Just to ensure that I've understood correctly: The estimates are NOT "simply the starting values" and it would be appropriate to report them?

 fit <- semList(model, dataList = mice.imp0) 
Then, do this (this is unofficial, just experimental): 
fit@meta$lavMultipleImputation <- TRUE 
and then request the summary(): 
summary(fit) 

Although lavaanList reported that all data sets converged, I am afraid this code produced the following error: 

Error in `$<-.data.frame`(`*tmp*`, "est", value = c(0.434594172636161,  : 
  replacement has 346 rows, data has 316

Do you have an idea what could be wrong?


Thank you very much again for your help and efforts!

Conor Goold

Jan 3, 2017, 3:36:32 AM
to lavaan
I think the reason for the original error ("cannot coerce type 'closure' to vector of type 'character'") is simply that you are not using runMI(..., fun = "sem", ...), i.e. "sem" within inverted commas (as a character string rather than the bare function object). At least, I get the same error with runMI when I don't. Otherwise, it is working fine.
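For concreteness, a sketch of the corrected call from the first post under that diagnosis, assuming the same objects (model, mice.imp0, categoricalvars); the only change is passing "sem" as a character string:

fit.mi <- runMI(model, data = mice.imp0, fun = "sem",
                ordered = categoricalvars,
                parameterization = "theta", estimator = "WLSMV")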

Fabrício Fialho

Jan 3, 2017, 5:03:46 AM
to lavaan
Hi all,

Regarding the pooled chi-square value for analysis of categorical variables, I have wondered about it too. I have been analyzing 50 imputed data sets, and the pooled chi-square is much smaller than the average chi-square for the 50 analyses. For instance, I found chi-squares ranging from ~380 to ~420 (mean ~399), and the pooled chi-square from runMI is 330ish. I have also found big differences in the fit indices, which does not come as a surprise, as most fit indices are calculated using the chi-square as an input.

As far as I understand it, the pooled chi-square should be the average of the chi-squares for m analyses (Van Buuren, 2012, Flexible Imputation of Missing Data, eq. 6.11); the test statistic for the chi-square test, however, demands adjustment. The correction for the analysis of categorical data is done according to Li et al. (1991).


I checked the runMI code (just downloaded it from the website) and found what may be an issue.

The function `lmrrPooledChi` in runMI calculates the corrected test statistic using Li et al.'s eqs. 2.16 and 2.17 (see Li, Meng, Raghunathan, and Rubin 1991, p. 78; see also Van Buuren 2012, p. 159).

`lmrrPooledChi` can be found in the source code, lines 419-439. Line 436 gives the elements to be returned by `lmrrPooledChi`:

result <- c(D, df, aw, p)

D is the adjusted/corrected test statistic; df and aw are the degrees of freedom for the F-test (df is the number of degrees of freedom for the analysis, being zero for saturated models; aw is given by formulas 2.16 and 2.17 in Li et al. 1991); p is the p-value for the F-test with (df, aw) degrees of freedom (Van Buuren, 2012, eq. 6.14). So far so good.


Here comes what I think might be an issue:

If we call lavaan::lavInspect(object, what="test")[[2]]$stat, it will return the scaled chi-square.

If I correctly understand lines 191 and 194, it means the robust chi-square for the pooled results in the runMI output is given by:

191: lmrrScaled <- lmrrPooledChi(chiScaled1, dfScaled)
194: fit[[2]]$stat <- as.numeric(lmrrScaled[1] * lmrrScaled[2])

[[2]] indicates this number is to be saved as part of the "robust results" (and to be printed in the "robust" column in the output); $stat indicates it should be saved (and printed) as the scaled chi-square.

So it seems that what has been printed as the "robust chi-square" is the product of D and df (see the formula for `results` above), not the mean chi-square across m imputations.

For completeness, I would also add that I found the same for the unscaled results (e.g. see lines 183-5).

This perhaps explains the difference between the pooled chi-square in the output and what we find if we average the chi-squares for m analyses. If I understand what is in the code, it is just printing something other than the average chi-square.

Otherwise, the function seems to be working fine (and helping us to save time and tons of lines! Thanks, folks!)

-----

I put a bunch of dirty code lines together in a script to extract the scaled chi-squares from the m imputations, calculate the mean chi-square, then calculate a few fit indexes from this average chi-square.

Not sure if this is of any help, but here is a link: https://www.dropbox.com/s/r84s87fp1oicos2/miRunPooledX2.R

It is very crude code. You'd need to run your analysis using lavaan()/cfa()/sem() (not lavaan.mi/cfa.mi/sem.mi) in a loop over your m complete data sets, saving the fitted objects in a list; you then use this list of outputs as the input to my function.

First of all, before your analysis, you need to either (1) open and run this in R (or RStudio), or (2) source it using a command like source("pathtofindthefile/miRunPooledX2.R"), so the function will be available in your workspace.

Here is an example of how to proceed:

Ex:
Suppose you created 5 "complete" datasets using MICE (external to runMI) and saved your MICE output as `miceoutput`.

miceoutput <- mice(mydataforimputation, m=5, etc etc etc)


You then create a list with your five datasets running this:


CompleteDatasets <- list()
for (i in 1:5){
  CompleteDatasets[[i]] <- complete(miceoutput, action=i)
}


Now you run this to analyze each dataset and save results in another list:


listresults <- list()
for(i in 1:length(CompleteDatasets)){
  listresults[[i]] <-
    cfa(data=CompleteDatasets[[i]],
        model=MyModel,
        etc etc etc)
}


Finally, you run my interim function to get the pooled chi-square and some fit indices (fitIndexesPooled() is how you call my function):

fitIndexesPooled(listresults)

I recommend wrapping it in round() to get intelligible numbers:
round(fitIndexesPooled(listresults),3)

----------

I am sure there are smarter ways to address this issue but this has been working for me in a provisional manner.

Terrence Jorgensen

Jan 7, 2017, 9:23:06 AM
to lavaan
As far as I understand it, the pooled chi-square should be the average of the chi-squares for m analyses (Van Buuren, 2012, Flexible Imputation of Missing Data, eq. 6.11);

No, the average of the m chi-squareds in 6.11 is just one part of the pooled chi-squared.  And that is only for the pooled likelihood-ratio test (so-called D3 statistic; see the Meng & Rubin, 2002, reference in the ?runMI help page). 

the test statistic for the chi-square test, however, demands adjustment. The correction for the analysis of categorical data is done according to Li et al. (1991).

That has nothing to do with categorical or continuous data.  Li et al. provided the so-called D1 and D2 methods (multiparameter Wald tests and combined Wald statistics, respectively), as opposed to the pooled likelihood-ratio statistic (D3).  D1 can be applied for any kind of data because it only involves parameter estimates and their asymptotic covariance matrix.  semTools does not currently implement D2.  Because categorical outcomes do not utilize maximum likelihood estimation, I would presume that the pooled likelihood-ratio statistic (D3) is not applicable. But because the likelihood-ratio statistic in ML is the chi-squared statistic, currently semTools does pool the chi-squared from DWLS (even though it is not a likelihood ratio test).

It is noteworthy that Mplus (read the reference from the ?runMI help page) only pools chi-squared when using (nonrobust) ML estimation.  Otherwise, it only reports the distribution of chi-squareds across imputations (I think the Mean and a CI), because it has not been worked out how to pool robust test statistics.  semTools currently applies the naïve formulas using the robust chi-squareds, but a truly appropriate method should probably pool the naïve statistics, then apply a robust adjustment afterward (e.g., using some kind of pooled scaling factor, and shift factor if applicable).   I am unaware of any research on this topic as of yet, or of any "appropriate" method to pool fit indices across imputations, robust or otherwise.  Mplus reports the distribution of fit indices across imputations (like their robust chi-squareds), but semTools currently calculates them using the scaled chi-squared (I don't know about SRMR though).

There will be big changes to runMI in the next version of semTools, and I may not offer D3 for categorical outcomes.

So it seems that what has been printed as the "robust chi-square" is the product of D and df (see the formula for `results` above), not the mean chi-square across m imputations. For completeness, I would also add that I found the same for the unscaled results (e.g. see lines 183-5).

Correct, that is what is currently happening.  Multiplying an F statistic by its numerator df makes it asymptotically chi-squared distributed (with that same numerator df), which Li et al. use in the D2 statistic.  I forget exactly what it is doing in the current source code (since D2 is not a current feature), and the link to the SAS macro the author adapted to write lmrrPooledChi() is now a dead link.  As I said, there will be big changes in the next version, including a dedicated function to calculate pooled chi-squared statistics with an argument to choose the pooling method (D1, D2, D3), and better explanation in the help page.

For more information, you can read Ch. 8 of Enders' (2010) Applied Missing Data Analysis, which discusses the three pooling methods.  An article (link) is also available that discusses these same topics, but in an ANOVA context. The same formulas apply, but the simulation results and their recommendations can't be expected to generalize beyond univariate regression models.

In the meantime, since you are using DWLS, I would recommend requesting chi = "lmrr" to avoid using the possibly inappropriate likelihood-ratio pooling method (chi = "mr" or "all").
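A sketch of that suggestion with the semTools 0.4-x interface used in this thread, again assuming the objects from the first post; the chi argument (visible in the runMI() error message earlier in the thread) selects the pooling method:

fit.mi <- runMI(model, data = mice.imp0, fun = "sem",
                ordered = categoricalvars,
                estimator = "WLSMV", parameterization = "theta",
                chi = "lmrr")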

jonas...@gmail.com

Jan 9, 2017, 5:57:47 PM
to lavaan
Thank you very much for your prompt responses, Fabrício and Terrence!

I'm still a little bit confused. My pooled chi-square and the corresponding p-value (see my last post above) seem to contradict the single chi-squares and p-values from the separate analyses. The pooled chi-square value is much smaller and it's nonsignificant, in contrast to all other analyses (both separate analyses and initial analyses with incomplete data with missing = "pairwise"). Since the chi-square value and the fit statistics (CFI, RMSEA, etc.) are crucial for model rejection and evaluation, I'm unsure whether they are trustworthy and whether it would still be appropriate to report these pooled values from the runMI output. Would it be better to renounce multiple imputation and to proceed with incomplete data and missing = "pairwise"? What would you recommend in the special case of SEM with categorical variables with WLSMV?

In addition, I wonder about the appropriate chi-square difference test for nested models: Given the pooled results from lavaanStar object from runMI, which chi-square difference test would then be appropriate in the case of categorical variables? anova() does not seem to work: "Error in if (D < 0) D <- 0 : missing value where TRUE/FALSE needed. In addition: Warning message:In sqrt(chis) : NaNs produced"
Alternatively how is it possible to compare fit (compareFit() gives also an error: "Error in fitDiff["Chisq diff"][2, 1] : incorrect number of dimensions")?


I would be very grateful if you could give me some advice. Thank you very much in advance!

Terrence Jorgensen

Jan 11, 2017, 5:45:37 AM
to lavaan
I'm still a little bit confused.

Us too.  That is the state of the field right now, until more research is done on this topic.  The only research I am aware of comparing different chi-squared pooling methods has not focused on categorical data or robust (scaled/shifted) test statistics.

My pooled chi-square and the corresponding p-value (see my last post above) seem to contradict the single chi-squares and p-values from the separate analyses. The pooled chi-square value is much smaller and it's nonsignificant, in contrast to all other analyses (both separate analyses and initial analyses with incomplete data with missing = "pairwise").

The pooling formulas are not just a simple average of within-imputation chi-squared values (see my last post above).  The "lmrr" method is not even based on the within-imputation chi-squared values at all.  Regardless of the method, pooled values are heavily weighted by between-imputation variances, and the formulas found in the references I suggested are not intuitive enough for me to make a quick prediction about what would cause a pooled chi-squared to be lower than the individual within-imputation chi-squared values.  The semTools developers would love to have read a paper explaining all this so we could program best practices as sensible defaults, but we are as in-the-dark as you are because that is the state of the field.

Would it be better to renounce multiple imputation and to proceed with incomplete data and missing = "pairwise"? What would you recommend in the special case of SEM with categorical variables with WLSMV?

No idea.  Since there isn't much guidance, good wisdom would be to report results from multiple approaches ("lmrr" pooled chi-squared from MI, also the robust chi-squared from pairwise deletion) and discuss which results differ and which conclusions are constant across methods.  Also note in your report that pairwise deletion makes much more restrictive assumptions (MCAR) than multiple imputation (MAR).  Best case scenario, future researchers will be able to see the information necessary to draw firmer conclusions in the future, once methodologists can figure out what the best approach is.  

In addition, I wonder about the appropriate chi-square difference test for nested models:

Me too, that's a good question.  Someone needs to write a methods paper on this, proposing a method, assessing it in a simulation, and validating it on a real data set.

Given the pooled results from lavaanStar object from runMI, which chi-square difference test would then be appropriate in the case of categorical variables?

If the difference between the models is just constraining some parameters to fixed values, then the D1 or D2 statistic can be calculated using the information from the unconstrained model (see Ch. 8 of Enders, 2010).  If you are making equality constraints on estimated parameters, D3 would be straight-forward, but that is the likelihood-ratio approach, and you have categorical data (WLS, not ML), so I'm not sure whether that is appropriate.

anova() does not seem to work: "Error in if (D < 0) D <- 0 : missing value where TRUE/FALSE needed. In addition: Warning message:In sqrt(chis) : NaNs produced" 

You would need to send me your data and full R script if you want me to track down the source of the error.

Alternatively how is it possible to compare fit (compareFit() gives also an error: "Error in fitDiff["Chisq diff"][2, 1] : incorrect number of dimensions")?

compareFit() doesn't work on lavaanStar objects, but all you have to do is print out the fit indices from each model that you want to compare.

Jessica Fritz

Jun 6, 2019, 7:36:19 AM
to lavaan
Dear lavaaners,

I have a dataset with missingness on several variables. I imputed the data using the mice package. I then used the resulting data set to run several analyses on it. One is a factor model for which I use several variables from the imputed data, and for which I would like to save the pooled factor scores. However, when doing so I get the following error:

> F_fit1_CFA <-  data.frame(predict(fit)); tail(F_fit1_CFA)
Error in UseMethod("predict") : 
  no applicable method for 'predict' applied to an object of class "c('lavaan.mi', 'lavaanList')"

To give you a bit of background on my analyses, I am running a longitudinal CFA for 2 time points (with different levels of invariance) and categorical indicators.

My first question is, is there at all a way to retrieve the pooled factor scores when using imputed data sets?

If not, I think there are three other options: First, I could use my original data set with missingness and run the (longitudinal/invariant) CFAs with WLSMV (given all variables are ordered categorical) and pairwise present (excluding the variables that are entirely missing at time 1 and/or time 2) in Mplus 8. Second, I could use my original data set with missingness and run the (longitudinal/invariant) CFAs with MLR (although my variables are ordered categorical) and FIML (excluding the variables that are entirely missing at time 1 and/or time 2), which I could do in lavaan. This way incidental missingness would be accounted for, but not attrition. At first glance the WLSMV option gives a better fit than the FIML option. However, a third option may be to use stochastic regression imputation (one single imputed data set) and to just feed this imputed data set into the longitudinal CFA. This should take the attrition into account, but would probably have other limitations, e.g. attenuated standard errors.

Therefore, my second question is, if multiple imputation is not an option, which of those three options would you recommend (I apologize in advance, I know this is not particularly a lavaan question...)?

My third and last question is whether there would be any other way to deal with the missing data in lavaan, so that I could calculate my factor scores while accounting for attrition?

Thank you very much for your help,

Jes

Terrence Jorgensen

Jun 6, 2019, 7:54:20 AM
to lavaan
My first question is, is there at all a way to retrieve the pooled factor scores when using imputed data sets? 

You don't want to.  Averaging over imputations throws away all that uncertainty due to missing data, so you couldn't trust your results if you used factor scores as observed variables in a later analysis.

Even with complete data, treating factor scores as observed ignores the uncertainty due to their estimation.  To resolve that issue, you can draw random "plausible values" from the distribution around the estimated factor scores, effectively "imputing" them as though they were missing data (actually, that's exactly what they are, if your factor model is correct).  You can read some details and find further references here:


You can account for both sources of uncertainty (estimation of factor scores AND missing data) by drawing some plausible values from each imputation.  This is available from the plausibleValues() function in the development version of semTools:

devtools::install_github("simsem/semTools/semTools")
library(semTools)  # 0.5-1.926
?plausibleValues


The function is conveniently designed to work with lavaan.mi objects too, but note that the number of plausible values to draw will be drawn from each imputation.  So if you have at least m=20 imputations, I think it is sufficient to specify nDraws=5 plausible values from each, giving you m*nDraws=100 (or more) "imputed data sets" for further analysis on factor scores.
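A minimal sketch, assuming fit.mi is a lavaan.mi object returned by cfa.mi() for m = 20 imputations; nDraws follows the suggestion above:

pvList <- plausibleValues(fit.mi, nDraws = 5)
length(pvList)     # m * nDraws = 100 data sets of plausible values
head(pvList[[1]])  # plausible factor scores (with case IDs) from the first draw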

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Jessica Fritz

Jun 6, 2019, 9:02:39 AM
to lavaan
Dear Terry,

Thanks so much. This makes a lot of sense!

From screening the script on GitHub (https://github.com/simsem/semTools/blob/master/semTools/R/plausibleValues.R) and from your explanation, I understand that I would get x data sets containing the plausible values (i.e. putative factor scores). Is that right?

The approach sounds theoretically perfect. But practically, I am going to run a follow-up analysis that unfortunately cannot deal with multiple data sets. Therefore, I'll need to go forward with 1 set of factor scores (NB: the reason I am using factor scores is that I assume this to be neater than simple sum scores, so the main purpose here was to be able to, e.g., allow for differing item loadings; on top of that I'd like to take advantage of running a CFA, allowing to some degree to account for missingness). As I don't see how I can circumvent the fact that I'll need to go forward with one set of scores, I assume that the options are down to the three options I outlined above, or using 1 set of plausible values? Would you recommend that? That would probably be very similar to using one set of factor scores?

Thank you for helping me think this through; that is really much appreciated!

Terrence Jorgensen

Jun 7, 2019, 5:41:35 AM
to lavaan
I would get x data sets containing the plausible values (i.e. putative factor scores). Is that right?

Yes.  And they are returned with IDs so that you can merge them back to your original (imputed) data sets (as shown on the ?lavPredict help page for a single complete data set), if you need to include other variables in the analysis.  For imputations, you just apply the same example, but looping over the imputed data.frames.

As I don't see how I can circumvent the fact that I'll need to go forward with one set of scores

Why do you think that?  What do you plan to do with the factor scores?  If you want to use them in a subsequent analysis, you should just continue to fit models to each imputed data set and pool results (and quite feasibly continue using sem.mi() to do so).  

If you only need to assign factor scores to individuals (e.g., for the sake of diagnosis), the average across imputations would be the best estimate, so you could specify runMI(..., FUN=lavPredict) and average across imputations the results stored in the @funList slot.

out <- runMI(..., FUN=lavPredict)
out@funList[[1]] # see factor scores from first imputation
## average across m imputations (sum, then divide by m)
meanFS <- Reduce("+", out@funList) / length(out@funList)

Jessica Fritz

Jun 14, 2019, 10:32:03 PM
to lavaan
Dear Terry,

"If you only need to assign factor scores to individuals (e.g., for the sake of diagnosis) the average across imputations would be the best estimate"

Yes something like this is exactly what I need.

I'll spare you the details, as I think I got the answer I was looking for.

A million thanks. That was really very helpful.

All the best,

Jes

Jessica Fritz

Jun 28, 2019, 3:52:53 PM
to lavaan
Dear lavaaners,

I've been running a model with the below command on several imputed data sets (imputed with mice beforehand).

The computations are, however, quite heavy, and I'd like to parallelize the estimation of my model, e.g. across the multiple imputation sets (something like estimating my model on 25 imputation data sets per core). Any other way of parallelizing would of course be fine as well.

I've tried to edit some options in my command to incorporate parallelization, but everything I've tried either did not work or continued to run on only one core:

e.g.:
ci_model <- runMI(fun = "cfa", model, data = impdata, miPackage = "mice", m = 100,
                  estimator = "WLSMV", parameterization = "theta",
                  parallel = "multicore", ncpus = 4L,
                  ordered = c("var1",...,"var61"))

ci_model <- cfa.mi(model, data = impdata, miPackage = "mice", m = 100,
                   estimator = "WLSMV", parameterization = "theta",
                   FUN = lavaanList(parallel = "multicore", ncpus = 4L),
                   ordered = c("var1",...,"var61"))


I am not that familiar with using multiple cores, so I was wondering whether anyone could advise on a way to incorporate parallelization in my analyses?

Or whether someone perhaps would have an example he/she would be happy to share?

Thank you very much for your help,

Jes


Terrence Jorgensen

Jun 29, 2019, 2:12:17 PM
to lavaan
parallel = "multicore",

This option is only available for Unix-like operating systems (Mac OSX or anything Linux).  If you use Windows, your only option is parallel = "snow"
 
ncpus = 4L),

Check how many (virtual) cores you actually have:

parallel::detectCores()


Only use all of the available ones if you plan on leaving your computer alone to do its processing until completion.  If you need to do anything else while it runs, use 

ncpus = parallel::detectCores() - 1

so that a core is left over to process other applications (word processor, web browser, etc.)

Jessica Fritz

Jun 29, 2019, 11:05:08 PM
to lavaan
Hi Terry,

Thanks so much for your quick response.

If I use the same code and replace "multicore" with "snow", as I am on a windows server (code pasted in below), I get the following:

ci_model <- runMI(fun = "cfa", model, data = impdata, miPackage = "mice", m = 100,
                  estimator = "WLSMV", parameterization = "theta",
                  parallel = "snow", ncpus = 3L,
                  ordered = c("var1",...,"var61"))

ci_model <- cfa.mi(model, data = impdata, miPackage = "mice", m = 100,
                   estimator = "WLSMV", parameterization = "theta",
                   FUN = lavaanList(parallel = "snow", ncpus = 3L),
                   ordered = c("var1",...,"var61"))

a) the runMI command just ignores my input and keeps going on 1 core/ncpu (I checked beforehand that I have the specified number of cores available)
b) the cfa.mi() command gives me the following error:

Error in lavaanList(parallel = "snow", ncpus = 3L) : 
  lavaan ERROR: (generated) data is not a data.frame (or a matrix)

Are you aware of a way to resolve this error?

Thank you very much,

Jes

Terrence Jorgensen

Jun 30, 2019, 7:18:05 AM
to lavaan
ci_model <- runMI(fun = "cfa", model, data = impdata, miPackage = "mice", m = 100,

If "impData" is already a list of imputed data sets, leave out the miPackage and m arguments, and make sure all(sapply(impData, is.data.frame)).  I would highly recommend imputing your data first.  

Also, make model your first argument, or you should explicitly name the argument (model = model).  This syntax only runs because R guessed that the first unnamed argument was the first unused argument from formals(runMI)

ci_model <- cfa.mi(model, data = impdata, miPackage = "mice", m = 100,
                   FUN = lavaanList(parallel = "snow", ncpus = 3L),
                   ordered = c("var1",...,"var61"))

What are you doing with the FUN argument?  Just pass the parallel and ncpus argument, like you do to runMI.  The only thing cfa.mi(...) does is call runMI(..., fun = "cfa") 

Error in lavaanList(parallel = "snow", ncpus = 3L) :
  lavaan ERROR: (generated) data is not a data.frame (or a matrix)

Are you aware of a way to resolve this error?

FUN is supposed to be a function that gets applied to each fitted lavaan model.  Trying to pass a lavaan object to lavaanList() already doesn't make sense, but you don't even pass that function.  You pass the result of calling that function, which is what causes the error:

lavaanList(parallel = "snow", ncpus = 3L)
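For reference, a sketch of the corrected call based on this advice; impdata is assumed to already be a list of imputed data.frames, and orderedVars is a hypothetical character vector holding the names of the ordinal items:

ci_model <- cfa.mi(model, data = impdata,
                   estimator = "WLSMV", parameterization = "theta",
                   ordered = orderedVars,
                   parallel = "snow", ncpus = 3L)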


Jessica Fritz

Jul 1, 2019, 11:47:27 AM
to lavaan
Hi Terry,

Thanks so much for correcting my input! That was very much needed.

I indeed ran mice first and intended to only feed my data into the runMI() command. I was under the impression I had to tell runMI() how I created my data, but if I understand it correctly, this resulted in me telling runMI() to create that data for me (which I at the same time fed into the command...).

I've now corrected the code as follows:

miceOut <- mice::parlmice(data = data_final,
                          m = 100,
                          method = methVec,
                          maxit = 100,
                          cluster.seed = 1414,
                          n.core = 2,
                          n.imp.core = 50,
                          cl.type = "PSOCK")
nImputations <- 100
impList <- list()
for (i in 1:nImputations) {
  impList[[i]] <- complete(miceOut, action = i)
}
impList

ci_fit <- runMI(fun = "cfa", model = ci_model, data = impList,
                estimator = "WLSMV", parameterization = "theta",
                ordered = c("var1",...,"var61"))

I now ran into two additional problems.

Firstly, I now get the following warnings:

lavaan WARNING starting values imply a correlation larger than 1; variables involved are: var19_o1 var19_o2

I did not receive this warning when I ran my model on the single (un-imputed) data set. I've tried to find info about this warning, but I am not sure I entirely understand:
(a) what it means (e.g. is there a chance the starting value for the correlation was higher than 1, but eventually the correlation converged below 1, resulting in a plausible result?),
(b) whether it implies that I cannot trust my model, and
(c) whether there is a way to circumvent/correct for this problem?

I should note that this warning appears only for a subset of the imputation sets (at least, I don't get the warning as often as I have imputed data sets). So I assume this is a problem for some but not all of the imputed data sets?


The second problem is as follows. I have two factors, one for occasion 1 and one for occasion 2. Originally all items loaded pretty similarly on both factors, and I could apply some invariance constraints. Now, however, all items have flipped in sign for occasion 1, but not for occasion 2. To illustrate, here is an arbitrary example: I originally had a loading for item 1 of 0.63 for occasion 1 and of 0.61 for occasion two. Now I would have a value of -0.15 for occasion 1 and a value of 0.059 for occasion two. Importantly, it looks like the sign flipped from positive to negative for the first (but not the second) latent factor only in some of the imputed data sets; for other imputed data sets, the signs seemed to stay positive for both factors. I guess this renders the entire pooling "incorrect". Hence, I would have to force the loadings to be positive on both factors (to make sure that flipping the signs of the first factor to positive does not flip the signs of the second factor to negative), which would then be applied to all imputation sets, so that the pooling remains valid.

Has anyone ever come across something similar and can advise on how to ensure that I retain the same loading signs for both occasions?

Can I do something like forcing all loadings of both factors to be positive:

lambda.1_1 >0
lambda.2_1 >0
...
lambda.1_2 >0
lambda.2_2 >0
...

or

...
lambda.1_1 0
lambda.2_1 0
...
lambda.1_2 0
lambda.2_2 0
...


I would be very grateful for any explanation or advice.

Thank you,

Jes

Terrence Jorgensen

Jul 2, 2019, 4:51:24 AM
to lavaan
lavaan WARNING starting values imply a correlation larger than 1; variables involved are: var19_o1 var19_o2

Weird, not sure why it occurs, but those are just starting values, so hopefully the optimizer still found the best solution that does not have Heywood cases.

(b) whether it implies that I cannot trust my model, and

No, only the solution matters.
 
(c) whether there is a way to circumvent/correct for this problem?

I don't know whether lavaanList() generates a single set of starting values that gets stored in the main parameter table, then applied to all data sets.  I wouldn't worry about it, but you could put custom starting values in your model syntax, e.g., a covariance of 0.1 between foo and bar: 

foo ~~ 0.1?bar

That starting value would then be applied for all data sets.

I should note that this warning appears only for a subset of the imputation sets (at least, I don't get the warning as often as I have imputed data sets). So I assume this is a problem for some but not all of the imputed data sets?

Yes, but I think it is only a problem if the solution is sensitive to the starting values.  Using different starting values might get rid of the warning but not change your results.  But it is worth trying, especially if Heywood cases are common across imputations.  Read the message when you print ci_fit to the Console; it counts the number of nonconverged solutions, imputations with Heywood cases, etc.

I originally had a loading for item 1 of 0.63 for occasion 1 and of 0.61 for occasion two. Now I would have a value of -0.15 for occasion 1 and a value of 0.059 for occasion two.

If they also change in absolute magnitude, that is a problem.  Perhaps the weird starting values are throwing things off, but lavaan has a good algorithm for choosing defaults.  Another problem might be your imputations.  Not sure what your methVec looks like, but if you have methods that don't put bounds on imputed values (e.g., normal regression imputation instead of predictive mean matching), that could lead to some serious outliers when the imputation model hasn't converged on a stable posterior (not that MICE is fully Bayesian).  It is worth going through the diagnostics, which mice has functions to help with.

Can I do something like forcing all loadings of both factors to be positive:

You only need to constrain 1 loading to be positive to ensure the rest are, if the data are really consistent with your measurement model.  Using a marker variable to identify the model resolves this problem in Bayes, where bimodal posteriors are quite frequent in smaller samples because switching all loadings' signs leads to the same model-implied covariance matrix.
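A minimal sketch of that idea in lavaan model syntax, with hypothetical factor and item names; only the first (labeled) loading of each factor is constrained to be positive:

syntax <- '
  F_t1 =~ L1*item1_t1 + item2_t1 + item3_t1
  F_t2 =~ L2*item1_t2 + item2_t2 + item3_t2
  ## inequality constraints on the labeled marker loadings
  L1 > 0
  L2 > 0
'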

Jessica Fritz

Jul 2, 2019, 1:36:20 PM
to lavaan
Hi Terry,

Again, thanks very much for all your help.

Using starting values and forcing the first factor loading of each factor to be larger than zero (lambda.1_1 > 0 and lambda.1_2 > 0) seemed to solve the starting value error. I now get a plausible summary output.

However, I have now one additional error to solve, which I get when I try to request fit measures:

Negative pooled test statistic was set to zero, so fit will appear to be arbitrarily perfect. Robust corrections uninformative, not returned.

From what I have inspected so far, it looks like one of the imputed data sets yields a Heywood case (and I am wondering whether this may cause the above error).

Therefore, I was wondering whether I could remove the coefficients of this imputed data set from the lavaan S4 object, and try to feed the other solutions into the fitMeasures() command, to see whether it would technically work fine and whether it is indeed only this one imputation data set that behaves strangely and prevents the fit from being computed?

Can you advise me on a way to remove one of the ci_fit@coefList solutions from the fit object?

I tried to subset the object myself, but that was not quite so successful ... (amateur speaking here.. as you probably are aware of).

Or if you have any other idea what causes the above error, and would thus prevent it, I am very happy to hear that!

Many many thanks for all the help and apologies for all the questions,

Jes

Terrence Jorgensen

Jul 2, 2019, 5:51:51 PM
to lavaan
seemed to solve the starting value error

It was not an error, just a warning. 

I have now one additional error to solve, which I get when I try to request fit measures:

Negative pooled test statistic was set to zero, so fit will appear to be arbitrarily perfect. Robust corrections uninformative, not returned.

Also not an error, just a warning to not interpret chisq=0 as perfect fit.  You could try setting test = "D2" to avoid this issue.


one of the imputed data sets yields a Heywood case (and I am wondering whether this may cause the above error).

Probably unrelated, but there is not a lot of information out there about when the D3 method yields negative pooled test statistics.  Just an unfortunate side-effect of between-imputation variability: in some imputations, the hypothesized model fits better than the saturated model (which is just silly) when the parameters are constrained to the pooled estimates.

Can you advise me on a way to remove one of the ci_fit@coefList solutions from the fit object?

I really don't think that is necessary, but you are welcome to compare the results with and without the Heywood case.  Just add it to the criteria in the omit.imps argument

lavTestLRT.mi(fit, omit.imps = c("no.conv","no.se","no.npd"))
lavTestLRT.mi(fit, omit.imps = c("no.conv","no.se","no.npd"), test = "D2")
lavTestLRT.mi(fit, test = "D2", asymptotic = TRUE)
lavTestLRT.mi(fit, test = "D2", pool.robust = TRUE)

Jessica Fritz

Jul 3, 2019, 7:04:17 AM
to lavaan
Hi Terry,

Thanks for your suggestions. I tried all the below options and they all eventually resulted in the same warning (I also tried other combinations but the result was basically either of the three below):

lavTestLRT.mi(fit_ci, omit.imps = c("no.conv","no.se","no.npd"))
# "D3" only available using maximum likelihood estimation. Changed test to "D2".
# Robust correction can only be applied to pooled chi-squared statistic, not F statistic. "asymptotic" was switched to TRUE.
# Negative pooled test statistic was set to zero, so fit will appear to be arbitrarily perfect. Robust corrections uninformative, not returned.
# chisq     df pvalue   npar ntotal
#     0   7185      1    562   1188

lavTestLRT.mi(generaldistress_fit_ci, omit.imps = c("no.conv", "no.se", "no.npd"), test = "D2", pool.robust = TRUE)
# Error in out[["F"]] : subscript out of bounds

lavTestLRT.mi(generaldistress_fit_ci, omit.imps = c("no.conv", "no.se", "no.npd"), test = "D2", asymptotic = TRUE, pool.robust = TRUE)
# Negative pooled test statistic was set to zero, so fit will appear to be arbitrarily perfect. Robust corrections uninformative, not returned.
#        chisq            df        pvalue          npar        ntotal  chisq.scaled     df.scaled pvalue.scaled
#            0          7185             1           562          1188             0          7185             1

lavTestLRT.mi(generaldistress_fit_ci, test = "D2", asymptotic = TRUE, pool.robust = TRUE)
# Negative pooled test statistic was set to zero, so fit will appear to be arbitrarily perfect. Robust corrections uninformative, not returned.
#        chisq            df        pvalue          npar        ntotal  chisq.scaled     df.scaled pvalue.scaled
#            0          7185             1           562          1188             0          7185             1


I then inspected the lavTestLRT.mi() function on GitHub and manually adapted it, so that I could directly specify a subset of imputation data sets to pool over (for example, 2 or 3 imputation sets which I thought looked good). However, when I did this I got exactly the same result. So this is not, as I first thought, a question of one imputation set being odd. I still do not understand what is happening.

Unfortunately the model runs quite some time (on the imputed data sets approximately 23 hours, on either my desktop PC or the cloud), and I guess re-running the model on all imputation data sets separately would probably take as long. I tried it now for just the first data set, and in fact the fit looks pretty good, but I guess too good? Interestingly the .robust indices were NA - I am not sure this is the default or is already cause of concern? See output below:

fitMeasures(syntax.fit.GeneralDistress_Model_ci)
                         npar                          fmin                         chisq                            df                        pvalue 
                      562.000                         6.995                     16619.815                      7185.000                         0.000 
                 chisq.scaled                     df.scaled                 pvalue.scaled          chisq.scaling.factor                baseline.chisq 
                    11141.971                      7185.000                         0.000                         3.066                   3916648.022 
                  baseline.df               baseline.pvalue         baseline.chisq.scaled            baseline.df.scaled        baseline.pvalue.scaled 
                     7381.000                         0.000                    359756.869                      7381.000                         0.000 
baseline.chisq.scaling.factor                           cfi                           tli                          nnfi                           rfi 
                       11.094                         0.998                         0.998                         0.998                         0.996 
                          nfi                          pnfi                           ifi                           rni                    cfi.scaled 
                        0.996                         0.969                         0.998                         0.998                         0.989 
                   tli.scaled                    cfi.robust                    tli.robust                   nnfi.scaled                   nnfi.robust 
                        0.988                            NA                            NA                         0.988                            NA 
                   rfi.scaled                    nfi.scaled                    ifi.scaled                    rni.scaled                    rni.robust 
                        0.968                         0.969                         0.989                         0.989                            NA 
                        rmsea                rmsea.ci.lower                rmsea.ci.upper                  rmsea.pvalue                  rmsea.scaled 
                        0.033                         0.033                         0.034                         1.000                         0.022 
        rmsea.ci.lower.scaled         rmsea.ci.upper.scaled           rmsea.pvalue.scaled                  rmsea.robust         rmsea.ci.lower.robust 
                        0.021                         0.022                         1.000                            NA                            NA 
        rmsea.ci.upper.robust           rmsea.pvalue.robust                           rmr                    rmr_nomean                          srmr 
                           NA                            NA                         0.046                         0.046                         0.046 
                 srmr_bentler           srmr_bentler_nomean                          crmr                   crmr_nomean                    srmr_mplus 
                        0.046                         0.046                         0.046                         0.046                         0.046 
            srmr_mplus_nomean                         cn_05                         cn_01                           gfi                          agfi 
                        0.046                       528.321                       534.285                         0.996                         0.996 
                         pgfi                           mfi 
                        0.924                         0.019 

I also triple checked the imputations with scatter plots, trace plots, and density plots, and I did not detect anything that would be cause for concern (I have on average 9% missingness per person and about 10% per item, ranging from 0 to 27% missingness for individual items; thus not an extreme amount of missingness, I'd say). I also re-ran the imputation model with more iterations, but to me it looks like the imputation model (btw, predictive mean matching) converged fine.

Is this the point where you give up and tell the reviewers you were unable to fit your models on imputation data sets?

I am really very very grateful for any other solution I can try.

All the best,

Jes
lavTestLRT.miADAPTED.R

Terrence Jorgensen

Jul 3, 2019, 1:01:39 PM
to lavaan
this is not, as I first thought, a question of one imputation set being odd. I still do not understand what is happening.

There hasn't been a systematic investigation of this issue, that I am aware of.  I forgot that it could happen with D2 as well as D3.  From what I understand, it happens when there is a lot of between-imputation variability, and intuitively I think it would be less likely to occur the worse a model fits.  

Unfortunately the model runs quite some time (on the imputed data sets approximately 23 hours

oh, that's awful. 

Is this the point where you give up and tell the reviewers you were unable to fit your models on imputation data sets?

 If you have time to run it once more, add the argument runMI(..., FUN = fitMeasures) to save the fit indices from each imputation.  Then you can extract (e.g.) chisq, rmsea, and cfi from each imputation, and report the range across imputations in lieu of being able to obtain a pooled test.  You should still be able to obtain SRMR from fitMeasures() because it is not based on chi-squared.  If the range is as good as your first imputation's output, I think you can make a reasonable argument that approximate fit is acceptable enough to continue.  Hopefully, any chi-squared difference tests using lavTestLRT.mi() on 2 nested models will not also yield negative pooled tests.
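A sketch of that approach, assuming the model is refitted with FUN = fitMeasures (the object and variable names here are placeholders); runMI() stores the result of FUN for each imputation in the @funList slot, as mentioned earlier in the thread:

ci_fit <- runMI(model, data = impList, fun = "cfa",
                estimator = "WLSMV", parameterization = "theta",
                ordered = orderedVars, FUN = fitMeasures)

## range of selected fit indices across imputations
sapply(c("chisq.scaled", "rmsea.scaled", "cfi.scaled", "srmr"),
       function(fm) range(sapply(ci_fit@funList, "[", fm)))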

Jessica Fritz

Jul 4, 2019, 6:56:27 PM
to lavaan


On Wednesday, July 3, 2019 at 6:01:39 PM UTC+1, Terrence Jorgensen wrote:
this is not, as I first thought, a question of one imputation set being odd. I still do not understand what is happening.

There hasn't been a systematic investigation of this issue, that I am aware of.  I forgot that it could happen with D2 as well as D3.  From what I understand, it happens when there is a lot of between-imputation variability, and intuitively I think it would be less likely to occur the worse a model fits.

--> Thank you very much. That is very helpful to know.
  

Unfortunately the model runs quite some time (on the imputed data sets approximately 23 hours

oh, that's awful. 

Is this the point where you give up and tell the reviewers you were unable to fit your models on imputation data sets?

 If you have time to run it once more, add the argument runMI(..., FUN = fitMeasures) to save the fit indices from each imputation.  Then you can extract (e.g.) chisq, rmsea, and cfi from each imputation, and report the range across imputations in lieu of being able to obtain a pooled test.  You should still be able to obtain SRMR from fitMeasures() because it is not based on chi-squared.  If the range is as good as your first imputation's output, I think you can make a reasonable argument that approximate fit is acceptable enough to continue.  Hopefully, any chi-squared difference tests using lavTestLRT.mi() on 2 nested models will not also yield negative pooled tests.

--> Brilliant. That is much better than the solutions I had in mind. Very many thanks for all the fantastic hands-on help and the thoughtful advice. I appreciate that a lot.

All the best,

Jes

david white

Jul 10, 2019, 11:28:29 AM
to lavaan
Hello. I am trying to run cfa.mi as a parallel solution, but I am using Amelia to generate the imputed data sets, and I am a bit confused as to whether I can. Generating the imputed data sets is very efficient, and I am not using parallel processing in that step. The bottleneck appears to be running the cfa.mi() function on the model object (imp.model) for each imputed data object (imputedDat). I am a newbie to these packages and may not understand what is actually occurring.

Here is a limited version of my code.

# doMC
co <- detectCores()-1
cl <- makeCluster(co)
registerDoMC(cl)


#Amelia Imputations
ameliaOut <- amelia(x = dat1, m = nImps, empri = .1*dim(dat1)[1],
                    idvars = c("Form"),
                     noms = c("Gender", "Ethnicity", "Mat_Status"),
                    ords = c("Income", "Age"), p2s = 1)

# Select the first slot in the output object. These are the imputed data sets.
imputedDat <- as.data.frame(ameliaOut[[1]])

# Specify the model syntax for lavaan:
require(lavaan)
imp.model <- '

Pathways_Autonomy =~ Path_Aut_1 + Path_Aut_2 + Path_Aut_3 + Path_Aut_4
...
...
...
etc.

# Calculate Lavaaan.mi - class - object
miLvOut_2_test <- cfa.mi(model = imp.model, data=imputedDat, lavaan::cfaList(parallel = c("multicore"), cl = cl))

Error in lavaan::lavaanList(parallel = c("multicore"), cl = cl, cmd = "cfa") : 
  lavaan ERROR: (generated) data is not a data.frame (or a matrix)

Terrence Jorgensen

Jul 13, 2019, 12:18:03 PM
to lavaan
cfa.mi() already calls cfaList() internally, as explained on the class?lavaan.mi help page (and the previous posts above).  You simply pass the arguments as normal (via ...).

miLvOut_2 <- cfa.mi(model = imp.model, data = imputedDat,
                    parallel = "snow", cl = cl)


Maria

Mar 10, 2023, 10:06:16 AM
to lavaan
Hi there,

I also first tried to extract factor scores from a lavaan.mi object (with multiple imputation) with lavPredict(). The plausibleValues() function is really nice, but I do not understand what is happening there mathematically. Is there literature on that?

Another question is whether it makes sense to pool these plausible values and use them as factor scores (indicators) in a subsequent analysis (another CFA model)?

Kind regards,
Maria

Terrence Jorgensen

Mar 13, 2023, 8:12:51 AM
to lavaan
Is there literature on that?

See the Reference I provided on the ?plausibleValues help page, which has further references (see the ones with Mislevy as (co)author).



makes sense to pool these plausible values and use them as factor scores (indicators) in a subsequent analysis (another CFA model)?

No, they are not indicators of a factor.  They are estimates of subjects' level on the latent common factor, so they do not include measurement error.  They still include factor indeterminacy and sampling error.  The way to use them is shown in the first ?plausibleValues help page example (with complete data).  The additional complication of estimating them with multiple imputations of incomplete data is that there are more imputations (i.e., several samples of plausible values drawn from each of multiple imputations).  The last help-page example has 10 imputations of missing data, estimated using cfa.mi(), from which 5 plausible values are sampled from each of 10 imputations.  You would then just analyze your path model using all 10 * 5 = 50 sets of plausible values.
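A minimal sketch of that workflow, where fit.mi is assumed to be the cfa.mi() fit to the 10 imputations and pathModel is a hypothetical lavaan syntax string for the subsequent model; the 10 * 5 = 50 plausible-value data sets are analyzed and pooled just like imputations:

pvList <- plausibleValues(fit.mi, nDraws = 5)  # 50 data sets of plausible values
pathFit <- sem.mi(pathModel, data = pvList)
summary(pathFit)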

Maria

Apr 17, 2023, 2:35:12 AM
to lavaan
Thank you for your reply!