runMI with multiple imputation + exo and endo factors


Lilja Kristín Jónsdóttir

Mar 15, 2022, 9:02:23 AM3/15/22
to lavaan
I am a total beginner to both SEM and R, so some of the things I ask about might be weird, sorry about that! Here is my issue: 

I am using runMI for a path analysis because I have a good deal of missing data in a mixed dataset (so I cannot use FIML), and for several reasons I am using 20 imputed datasets (created with mice; using Amelia leaves me with the same issue). The built-in imputation in runMI would not solve my issue either (see below).
  
For the imputation to go through correctly, variable classes need to be specified correctly (factor or numeric). The imputations work well, but now I am faced with the issue of using categorical data in lavaan.
If I understand this correctly, lavaan allows for binary and ordinal variables, albeit in different forms for endogenous and exogenous variables. I have one binary endogenous variable, and that can be specified via an argument to runMI (I have done so with ordered = "myVar", and it seems to work fine). I also have three exogenous variables that are ordered factors. For them to work in the model, I would have to respecify them as numeric. HOWEVER, since they are tied up in a list of 20 imputed datasets, I have not found a way to force them to be numeric AFTER imputation (it has to be afterward, or the imputation will not be performed correctly). Therefore, lavaan returns an error notifying me of the factors that need to be dealt with.
 
I realize this might not be a specific lavaan issue - if there is some way to respecify the variables post-imputation then there is no problem. Is there anything I can do?

Pat Malone

Mar 21, 2022, 10:48:16 AM3/21/22
to lav...@googlegroups.com
Lilja,

Assuming you are running the imputation with mice in a prior step, you should be able to do this with the imputed datasets in the mids object, using as.numeric(). It might take some fiddling with [ ] and/or [[ ]] subscripts to specify the places in the mids object to change, though.
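One thing to watch out for when doing this: as.numeric() on a factor returns the underlying level codes, not the printed labels. A minimal base-R illustration with toy values (not your data):

```r
# Hypothetical ordered factor whose labels happen to look numeric
# (e.g., a Likert item recorded as "0", "2", "4"):
f <- factor(c("0", "2", "4", "2"), levels = c("0", "2", "4"), ordered = TRUE)

as.numeric(f)                # level codes:      1 2 3 2
as.numeric(as.character(f))  # labelled values:  0 2 4 2
```

For equally spaced ordinal categories the level codes are usually what you want, but if the labels carry the intended scores, go through as.character() first.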

Can you show us a str() and/or View() of the mids that comes out of mice?

Pat 



--
Patrick S. Malone, PhD
Sr Research Statistician, FAR HARBΦR
This message may contain confidential information; if you are not the intended recipient please notify the sender and delete the message.

Lilja Kristín Jónsdóttir

Mar 24, 2022, 6:06:25 AM3/24/22
to lavaan
Thanks Pat, I managed to get there with the help of a statistician at my institution. Now there is a different problem: because I am leaving one variable as a factor (the endogenous variable), I need to specify it as ordered in the runMI call. If I do, lavaan returns an error saying that several pairs of variables in the dataset have a correlation of (nearly) 1.0 (it lists each pair). This is not the case (I inspected it using lavCor and lavInspect; some pairs have correlations of about 0.40 or 0.60, but certainly not close to 1.0). The same thing happened when I tried the analysis on the original, non-imputed dataset; there the error lists a pair of variables with a correlation of about 0.60.

However, if I change this endogenous variable to numeric and don't specify it as ordered, the analysis works fine. 

I have no idea why lavaan would think my variables are correlated at 1.0. 

Terrence Jorgensen

Mar 24, 2022, 8:11:13 AM3/24/22
to lavaan
I have no idea why lavaan would think my variables are correlated at 1.0

We are unlikely to be able to help without seeing syntax and output, perhaps even data to reproduce the problem.
 
if there is some way to respecify the variables post-imputation then there is no problem. Is there anything I can do?

You can use the within() method in the mitml package.

foo <- mice(...)             # your imputation call, as before
library(mitml)
bar <- mids2mitml.list(foo)  # convert mice's mids object to a mitml.list
?within.mitml.list           # see Example 1 and adapt to your needs
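If you would rather not add a package, the same kind of post-imputation transformation can be applied across a plain list of imputed data sets with lapply(). A base-R sketch using a toy list and a placeholder variable name (myVar):

```r
# Toy stand-in for a list of imputed data sets (replace with your own list,
# e.g. the output of mice::complete(foo, action = "all")):
impList <- list(
  data.frame(myVar = factor(c("low", "mid", "high"),
                            levels = c("low", "mid", "high"), ordered = TRUE)),
  data.frame(myVar = factor(c("mid", "mid", "low"),
                            levels = c("low", "mid", "high"), ordered = TRUE))
)

# Recode the ordered factor to its integer level codes in every data set:
impList <- lapply(impList, function(d) {
  d$myVar <- as.numeric(d$myVar)
  d
})

sapply(impList, function(d) class(d$myVar))  # "numeric" "numeric"
```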

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Lilja Kristín Jónsdóttir

Mar 25, 2022, 9:26:20 AM3/25/22
to lavaan
Thank you very much for your reply. 
I am attaching a text file with my code and data (I used the output from dput(); I hope that works, but I can try again if it is not sufficient).

reprex_data

Terrence Jorgensen

Mar 28, 2022, 8:53:51 AM3/28/22
to lavaan
I am attaching a text file with my code and data

Thanks.  I tried running the model on the first imputation, which reproduces the warning about correlations near 1 (x3 with x5, and x3 with x6).  Indeed, lavCor() shows correlations no higher than .65, but those are unconditional correlations.  The default setting with categorical data is conditional.x=TRUE, which estimates residual correlations after partialing out exogenous covariates.  When I look at your estimates for the first imputation, I can see a big problem:

fit1 <- sem(paths_1, data = impList[[1]], ordered = "x3")
lavInspect(fit1, "est")$psi
        x4      x3      x5      x6    
x4   4.060                        
x3   0.000   1.000                
x5   0.000   0.000   0.834        
x6   0.000   0.000 -26.229   0.830

Beyond the out-of-bounds residual covariance (partial correlation = -31.521), neither x3 nor x4 is correlated with x5 or x6 after x1, x2, and the 3 covariates are partialed out.  That would rule out your mediation hypothesis, but of course this result is suspect because the x5-x6 correlation is impossibly large.
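For reference, that out-of-bounds value follows directly from the printed psi block: standardizing the residual covariance matrix with cov2cor() turns the x5-x6 entry into the impossible "correlation" (re-entering the printed values by hand here):

```r
# Residual (co)variance matrix as printed above, symmetrized:
psi <- matrix(c( 4.060,  0,      0,       0,
                 0,      1.000,  0,       0,
                 0,      0,      0.834, -26.229,
                 0,      0,    -26.229,   0.830),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("x4", "x3", "x5", "x6"),
                              c("x4", "x3", "x5", "x6")))

cov2cor(psi)["x5", "x6"]  # about -31.5, far outside the [-1, 1] bound
```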

When I tried fitting the model without first partialing out exogenous effects, other warnings appear:

fit1 <- sem(paths_1, data = impList[[1]], ordered = "x3", conditional.x = FALSE)
Warning messages:
1: In muthen1984(Data = X[[g]], wt = WT[[g]], ov.names = ov.names[[g]],  :
  lavaan WARNING: trouble constructing W matrix; used generalized inverse for A11 submatrix
2: In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
  lavaan WARNING:
    The variance-covariance matrix of the estimated parameters (vcov)
    does not appear to be positive definite! The smallest eigenvalue
    (= -4.133521e-19) is smaller than zero. This may be a symptom that
    the model is not identified.


I think you might not have enough data to support estimating this model.  Or this might simply be consistent with the result above: That there are no correlations to model after accounting for the covariates, which causes estimation problems when you try.  When only 20 of 92 observed x3 values are in one category, it is not likely that the combination of predictors (nearly) completely accounts for those individual differences.
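That kind of category sparseness is easy to check before fitting. A toy binary variable standing in for x3, with the same 20-of-92 split:

```r
# Hypothetical binary outcome: 20 of 92 observations in one category
x3 <- factor(rep(c("no", "yes"), times = c(72, 20)))

table(x3)
prop.table(table(x3))  # roughly 0.78 / 0.22
```

With so few observations in the smaller category, thresholds and regressions on that variable rest on very little information.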

As a last resort, I even tried setting all your exogenous variables as.numeric in the original data to use estimator="PML" with pairwise deletion, while treating only x3 as categorical.

fit1 <- sem(paths_1, data = data_rep, ordered = "x3",
            estimator = "PML", missing = "pairwise",
            fixed.x = FALSE, conditional.x = FALSE)


This yielded a result, but neither of the mediators has a significant effect (and there are also problems with the model-fit statistics).  Rather than trying to get this to work in the most ideal way, I would consider whether the obtainable results make it seem unreasonable to pursue this mediation model further, at least until/unless more data can be gathered.  I say that while being completely naive about the nature or stakes of your study, so salt is needed.

Lilja Kristín Jónsdóttir

Mar 30, 2022, 8:11:55 AM3/30/22
to lavaan
Thank you Terrence, I really appreciate this detailed and informative answer, very valuable!

Terrence Jorgensen

Apr 8, 2022, 6:38:22 AM4/8/22
to lavaan
I noticed a typo in my response:

When only 20 of 92 observed x3 values are in one category, it is not likely that the combination of predictors (nearly) completely accounts for those individual differences.

The "not" should *not* be there.  I was attempting to describe a confounding or multicollinearity issue, where little or no variance in x3 remains after accounting for the other covariates.