Errors and warnings produced using runMI for analysis with multiply imputed datasets.


Thalia Theodoraki

Jun 26, 2018, 10:58:07 AM
to lavaan
Hi everyone,
I have recently had to switch from FIML to multiple imputation for addressing missing data in lavaan, because my dependent variable is ordinal and the ML estimator cannot currently be used for this kind of data.
I am therefore very new to multiple imputation and would really appreciate your help with some problems I have run into using the runMI command.

I have managed to develop the imputed datasets and combine them in a list as shown below:
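      library(mice)

      m <- 20
      imppathscienceS4 <- mice(DF.2, m = m, maxit = 30, predictorMatrix = predictors)

      # Collect the m completed datasets in a list to pass to runMI()
      mice.imp <- vector("list", m)
      for (i in 1:m) {
        mice.imp[[i]] <- complete(imppathscienceS4, action = i, include = FALSE)
      }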


but I have two questions regarding running my models:

1) I want to look at the correlations between my variables, using the multiple datasets:
So I define my model in lavaan and use the runMI command to fit the model to my data and get the pooled results. The model converges, but I get these warnings:
 In pf(D, df, aw) : NaNs produced
 In pf(D, df, aw) : NaNs produced
I suspect that these warnings are related to the fact that I do not get any standard errors or p-values for my coefficient estimates, but I do not really understand what exactly the warning means. Can I change something in order to get the full output of sem (including standard errors and p-values)?
Here is the code and some of the output I get.


## running the model
> fitcorrelationsimp <- runMI(correlations, data=mice.imp, fun="sem", estimator="WLSMV", meanstructure=TRUE)
Warning messages:
1: In lav_data_full(data = data, group = group, cluster = cluster,  :
  lavaan WARNING: some observed variances are (at least) a factor 1000 times larger than others; use varTable(fit) to investigate
2: In muthen1984(Data = X[[g]], ov.names = ov.names[[g]], ov.types = ov.types,  :
  lavaan WARNING: trouble constructing W matrix; used generalized inverse for A11 submatrix
3: In pf(D, df, aw) : NaNs produced
4: In pf(D, df, aw) : NaNs produced
> summary(fitcorrelationsimp, standardized=TRUE)
lavaan 0.6-2.1268 did not run (perhaps do.fit = FALSE)?
** WARNING ** Estimates below are simply the starting values
  Optimization method                           NLMINB
  Number of free parameters                          0
  Number of observations                            93

  Estimator                                       DWLS      Robust
  Model Fit Test Statistic                         NaN         NaN
  Degrees of freedom                                 0           0
...


2) My second question regards a mediation analysis I want to do with the same imputed datasets. Again I specified my model and fit it, but I get a warning that the subscript is out of bounds. I know this means that I am asking R to calculate something that isn't there, but I do not know where exactly the problem is. Is something wrong in my coding, or is it a more serious problem of not being able to carry out mediation analysis with multiply imputed datasets?
Here is my code for this part:

> path.fullmodel <- '
normed_Number ~ a1*normed_CWI3 + b1*normed_Ssort + c1*normed_DigitB + normed_CWI1 + normed_Digit.F + Condition
normed_Matrix ~ a2*normed_CWI3 + b2*normed_Ssort + c2*normed_DigitB + normed_CWI1 + normed_Digit.F + Condition
Science ~ a*normed_CWI3 + b*normed_Ssort + c*normed_DigitB + d*normed_Number + e*normed_Matrix
normed_Number ~~ normed_Matrix

indirectCWI3 := a1*d + a2*e
indirectSsort := b1*d + b2*e
indirectDigitB := c1*d + c2*e

totalCWI3 := a + a1*d + a2*e
totalSsort := b + b1*d + b2*e
totalDigitB := c + c1*d + c2*e
'
> fit.path.fullmodelimp <- runMI(path.fullmodel, data=mice.imp, fun="sem", estimator="WLSMV", meanstructure=TRUE)
Error in `[<-`(`*tmp*`, rhs, lhs, value = coef[i]) : subscript out of bounds


I would really appreciate any help with these two issues.
Many thanks in advance
Thalia

Terrence Jorgensen

Jun 29, 2018, 11:01:20 AM
to lavaan
1) I want to look at the correlations between my variables, using the multiple datasets:

Easiest way to do that is to use the fmi() function in semTools, which returns the pooled summary statistics along with the fraction of missing information for each statistic.
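
As for the pf(D, df, aw) warnings quoted in the question, one possible reading (an assumption, not a diagnosis of this particular model) is that the pooled test statistic is compared against an F distribution while the model has 0 degrees of freedom, as the summary output reports. A model that contains only correlations is typically saturated. Base R's pf() returns NaN with a warning whenever a degrees-of-freedom argument is not positive. The values in this minimal sketch are made up for illustration:

```r
# Illustrative values only; they are not taken from the model in this thread.
D  <- 1.5   # hypothetical pooled test statistic
df <- 0     # numerator degrees of freedom of a saturated model
aw <- 50    # hypothetical denominator degrees of freedom

# pf() cannot evaluate an F distribution with df = 0, so it returns NaN
# and signals a warning (suppressed here to keep the output clean).
p_saturated <- suppressWarnings(pf(D, df, aw, lower.tail = FALSE))
print(is.nan(p_saturated))

# With positive degrees of freedom the same call returns a valid p-value.
p_testable <- pf(D, 3, aw, lower.tail = FALSE)
print(p_testable)
```

If that reading applies, the NaN concerns only the model test and would not by itself explain missing standard errors.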

2) I get a warning that the subscript is out of bounds. 

Error in `[<-`(`*tmp*`, rhs, lhs, value = coef[i]) : 
  subscript out of bounds

Is this with the latest version of semTools (0.4-15.931)?  The newest version (0.5-0) should actually be available on CRAN today or sometime this week.

It is hard to tell what part of the process is causing this error. If it still occurs with the latest version, please privately email me your syntax and enough data to reproduce the error, so I can see what is going wrong.  I suspect it has something to do with all the user-defined parameters, but I don't know why that would be a problem (since I have run other examples with user-defined parameters that did not give an error).  Can you also try fitting this in lavaan::sem() using missing="pairwise", just to check whether the error occurs there too?  

Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Thalia Theodoraki

Jul 11, 2018, 4:23:09 PM
to lavaan


Hi Terrence, and thanks for all your help. I managed to get things working using R 3.4.4, semTools 0.5-0 and lavaan 0.6-2, and my model ran. But today, once again, I get an error:

 > fitcorrelationsimp.3<-runMI(correlations, data=mice.imp.3, fun="sem", estimator="WLSMV", meanstructure=TRUE)
Error in getMethod("coef", "lavaan.mi") : 
  no method found for function 'coef' and signature lavaan.mi

When I tried rerunning the same model that I ran successfully yesterday, I still got the above message, which suggests that nothing is wrong with the code. I have also tried restarting R and running it on another PC, but today it just doesn't seem to want to work.
Any further suggestions?  

Best 
Thalia 

Terrence Jorgensen

Jul 15, 2018, 7:26:06 AM
to lavaan
Error in getMethod("coef", "lavaan.mi") : 
  no method found for function 'coef' and signature lavaan.mi

This looks like the package is not loaded, because it can't find the coef() method for lavaan.mi objects.  When you restarted R and ran library(semTools), are you sure you saw the versions were semTools 0.5-0 and lavaan 0.6-2?  I just ran the help-page example (using R 3.5.0 -- maybe it's a version issue?) and the coef() method works fine on the output.
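
A quick way to confirm which versions a fresh R session actually sees is sketched below. The helper check_versions() is hypothetical (it is not part of semTools or lavaan) and relies only on base R; the minimum versions passed to it are the ones mentioned in this thread.

```r
# Hypothetical helper: report the installed version of each package and
# whether it meets the required minimum. Packages are not loaded.
check_versions <- function(required) {
  pkgs <- names(required)
  out <- data.frame(package = pkgs, required = unname(required),
                    installed = NA_character_, ok = FALSE,
                    stringsAsFactors = FALSE)
  for (i in seq_along(pkgs)) {
    # system.file() returns "" when the package is not installed
    if (nzchar(system.file(package = pkgs[i]))) {
      v <- utils::packageVersion(pkgs[i])
      out$installed[i] <- as.character(v)
      out$ok[i] <- v >= required[[i]]
    }
  }
  out
}

print(check_versions(c(semTools = "0.5-0", lavaan = "0.6-2")))
```

A row with installed = NA means that the package is not installed in the library this session uses. After library(semTools), sessionInfo() additionally lists which versions are attached.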

Thalia Theodoraki

Jul 17, 2018, 8:19:05 AM
to lavaan


Hi again Terrence,
thanks so much for your reply! I think I have now got the packages under control (hopefully) and my models work (most of the time) :-). However, I would like to ask you one last thing regarding an issue with my models.
The issue concerns a variable I include in mice as an auxiliary variable for my imputations. I include this variable (English.teacher) because it explains missingness in one of my ordinal response variables (English). However, when I run my imputations and then ask for the predictor matrix, I get this output:

> impdata$predictorMatrix
                English.teacher English Maths Arts Mod.Languages Social.studies Tech
English.teacher               0       1     1    1             1              1    1
English                       0       0     1    1             1              1    1
Maths                         1       1     0    1             1              1    1
Arts                          1       1     1    0             1              1    1
Mod.Languages                 1       1     1    1             0              1    1
Social.studies                1       1     1    1             1              0    1
Tech                          1       1     1    1             1              1    0


I am a bit confused as to how to interpret the underlined parts. Am I correct to believe that what the output is saying is that
1. English was not used to predict English.teacher (green)
2. English.teacher was used for predicting English (blue)

Also, is there a way to specify that I want the English.teacher variable to be used as a predictor only for missing data on the English variable, but not for missing data on any of the other variables?


Many thanks
Thalia


Terrence Jorgensen

Jul 18, 2018, 8:43:57 AM
to lavaan
Am I correct to believe that what the output is saying is that
1. English was not used to predict English.teacher (green)
2. English.teacher was used for predicting English (blue)

No, it is the other way around.  The columns represent the predictors, and the rows represent the outcomes.  A cell with 0 means that the variable in the column was not used to impute the outcome in that row, and a 1 means it was used to impute it.  So the green 0 means English.teacher was not used to impute English, but the blue 1 means English was used to impute English.teacher.
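
To make the row/column reading concrete, here is a minimal sketch in base R on a three-variable excerpt of the matrix above. The helper predictors_of() is hypothetical and only illustrates how a row is read; it is not a mice function.

```r
# Three-variable excerpt of the predictor matrix shown in the question.
vars <- c("English.teacher", "English", "Maths")
pred <- matrix(1, nrow = 3, ncol = 3, dimnames = list(vars, vars))
diag(pred) <- 0                           # a variable never predicts itself
pred["English", "English.teacher"] <- 0   # the 0 in the English row

# Hypothetical helper: the predictors of a variable are the columns
# that hold a 1 in that variable's ROW.
predictors_of <- function(pred, target) {
  colnames(pred)[pred[target, ] == 1]
}

print(predictors_of(pred, "English"))          # English.teacher is absent
print(predictors_of(pred, "English.teacher"))  # English is present
```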

Also, is there a way to specify that I want the English.teacher variable to be used as a predictor only for missing data on the English variable, but not for missing data on any of the other variables?

Yes, you can make your own matrix to specify which variables should be used to impute others.  Read about the predictorMatrix= argument on the ?mice help page.  You can also see examples of how to construct it in my recent APS workshop materials (in the Workshops tab on my faculty web page).
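
As a rough sketch of the idea (not taken from the workshop materials), the matrix can be built in base R using the variable names from this thread. The data frame name DF in the final comment is a placeholder, and the rows and columns are assumed to follow the column order of the data.

```r
vars <- c("English.teacher", "English", "Maths", "Arts",
          "Mod.Languages", "Social.studies", "Tech")

# Start from "every variable predicts every other variable".
pred <- matrix(1, nrow = length(vars), ncol = length(vars),
               dimnames = list(vars, vars))
diag(pred) <- 0

# Columns are predictors: switch English.teacher off for all outcomes ...
pred[, "English.teacher"] <- 0
# ... then switch it back on only in the row of the outcome English.
pred["English", "English.teacher"] <- 1

print(pred)

# Assumed usage, with DF standing in for the data frame to be imputed:
# imp <- mice::mice(DF, m = 20, predictorMatrix = pred)
```

mice may still alter the matrix it actually uses, for example when it flags a variable as constant or collinear, so it is worth inspecting imp$predictorMatrix and imp$loggedEvents after the run.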

Thalia Theodoraki

Jul 18, 2018, 6:12:13 PM
to lav...@googlegroups.com

Am I correct to believe that what the output is saying is that
1. English was not used to predict English.teacher (green)
2. English.teacher was used for predicting English (blue)

No, it is the other way around.  The columns represent the predictors, and the rows represent the outcomes.  A cell with 0 means that the variable in the column was not used to impute the outcome in that row, and a 1 means it was used to impute it.  So the green 0 means English.teacher was not used to impute English, but the blue 1 means English was used to impute English.teacher.


Oh no, that's disappointing, since I planned on using 'English.teacher' to impute the 'English' variable.
Why is R not using it as a predictor? I didn't specify anything like this in the code.

Could it be due to the fact that English.teacher is a categorical variable? Can you include categorical variables as predictors for variables to be imputed in mice?

Or is it due to the nature of the relationship between English.teacher and the variable to be imputed? As I said, the missingness pattern in the English variable is directly influenced by the English.teacher variable, since all the cases who had a specific teacher in the subject of English ('English.teacher') are the ones with missing data on the outcome ('English'). So is this the reason that R cannot use 'English.teacher' to impute data on 'English'?

Best
Thalia

Terrence Jorgensen

Jul 20, 2018, 4:30:12 AM
to lavaan
Could it be due to the fact that English.teacher is a categorical variable? Can you include categorical variables as predictors for variables to be imputed in mice?

Yes, you can.  Again, you can read through my workshop materials to see how to get more control over the process, or just read the ?mice help page to find out how it works before trying to use it.

Or is it due to the nature of the relationship between English.teacher and the variable to be imputed? As I said, the missingness pattern in the English variable is directly influenced by the English.teacher variable, since all the cases who had a specific teacher in the subject of English ('English.teacher') are the ones with missing data on the outcome ('English'). So is this the reason that R cannot use 'English.teacher' to impute data on 'English'?

Yes, that sounds like an explanation.