any other estimator instead of DWLS / dealing with missing data

sc21...@uw.edu

Jun 20, 2018, 2:16:49 PM
to lavaan
I have a data set with 20 variables (items) as follows:

# x1 - x12 are from Study 1
# x1 - x18 are from Study 2, excluding x2  (i.e. x1, x3, x4, ..., x18)  .... (*)
# We'd like to put these two studies together


# x1 - x3, x18 are numeric
# x4 - x17 are ordinal (ordered categorical)


# temp.DATA is a data matrix of dimension 1600 x 18.
# there are 700 subjects from Study1, 900 subjects from Study2.
#           so, x13-x18 are NA for those from Study1, and x2 is NA for those from Study2.



When I fit a model on Study2 variables only, or Study1 variables only, I don't have any error message. They work fine.
However, when I tried to fit a model with all items (from both studies), I get an error message.
(Using two studies together is simply adding x2 (just one more variable) to Study2.)

Error in if (abs(rho.init) >= 1) { :
  missing value where TRUE/FALSE needed




Here are the models and commands I used:


model.temp = "
 Time1 =~ NA*x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 + x12 + x13 + x14 + x15 + x16 + x17 + x18
 Time1 ~~ 1*Time1
 "



sem(model.temp, data=temp.DATA, ordered=names(temp.DATA)[4:17], missing = "pairwise")




Error in if (abs(rho.init) >= 1) { :
  missing value where TRUE/FALSE needed
In addition: Warning message:
In lav_data_full(data = data, group = group, cluster = cluster,  :
  lavaan WARNING: due to missing values, some pairwise combinations have less than 10% coverage

 
> traceback()
8: ps_cor_TS(fit.y1 = UNI[[j]], fit.y2 = UNI[[i]])
7: lav_samplestats_step2(UNI = FIT, ov.names = ov.names, zero.add = zero.add,
       zero.keep.margins = zero.keep.margins, zero.cell.warn = zero.cell.warn,
       zero.cell.tables = zero.cell.tables, optim.method = optim.method)
6: muthen1984(Data = X[[g]], ov.names = ov.names[[g]], ov.types = ov.types,
       ov.levels = ov.levels, ov.names.x = NULL, eXo = NULL, group = g,
       missing = missing, WLS.W = WLS.W, optim.method = optim.method,
       zero.add = zero.add, zero.keep.margins = zero.keep.margins,
       zero.cell.warn = FALSE, zero.cell.tables = TRUE, verbose = debug)
5: lav_samplestats_from_data(lavdata = lavdata, missing = lavoptions$missing,
       rescale = (lavoptions$estimator %in% c("ML", "REML", "NTRLS") &&
           lavoptions$likelihood == "normal"), estimator = lavoptions$estimator,
       mimic = lavoptions$mimic, meanstructure = lavoptions$meanstructure,
       conditional.x = lavoptions$conditional.x, fixed.x = lavoptions$fixed.x,
       group.w.free = lavoptions$group.w.free, missing.h1 = (lavoptions$missing !=
           "listwise"), WLS.V = WLS.V, NACOV = NACOV, gamma.n.minus.one = lavoptions$gamma.n.minus.one,
       se = lavoptions$se, information = lavoptions$information,
       ridge = lavoptions$ridge, optim.method = lavoptions$optim.method.cor,
       zero.add = lavoptions$zero.add, zero.keep.margins = lavoptions$zero.keep.margins,
       zero.cell.warn = lavoptions$zero.cell.warn, debug = lavoptions$debug,
       verbose = lavoptions$verbose)
4: lavaan::lavaan(model = model.temp, data = temp.DATA, ordered = names(temp.DATA)[4:17],
       missing = "pairwise", model.type = "sem", int.ov.free = TRUE,
       int.lv.free = FALSE, auto.fix.first = TRUE, auto.fix.single = TRUE,
       auto.var = TRUE, auto.cov.lv.x = TRUE, auto.cov.y = TRUE,
       auto.th = TRUE, auto.delta = TRUE)
3: eval(mc, parent.frame())
2: eval(mc, parent.frame())
1: sem(model.temp, data = temp.DATA, ordered = names(temp.DATA)[4:17],
       missing = "pairwise")



# Considering (*), when I fit a model to the Study 2 variables only (i.e., all variables except x2), I did not get any error message.


model.temp.wo.x2 = "
 Time1 =~ NA*x1 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 + x12 + x13 + x14 + x15 + x16 + x17 + x18
 Time1 ~~ 1*Time1
 "

sem(model.temp.wo.x2, data=temp.DATA[,-2], ordered=names(temp.DATA)[4:17], missing = "pairwise")


lavaan (0.6-1) converged normally after  43 iterations

  Number of observations                          1617
  Number of missing patterns                         4

  Estimator                                       DWLS      Robust
  Model Fit Test Statistic                     218.958     230.845
  Degrees of freedom                               119         119
  P-value (Chi-square)                           0.000       0.000
  Scaling correction factor                                  1.182
  Shift parameter                                           45.603
    for simple second-order correction (Mplus variant)


I am wondering whether the error message from the model with all 20 variables is due to DWLS with missing="pairwise".

For example, 
table(temp.DATA[,2], temp.DATA[,18], exclude=NULL)

gives
   
 
      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 30 <NA>
0     0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    3
1     0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    1
2     0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    8
3     0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    9
4     0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   18
5     0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   24
6     0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   33
7     0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   45
8     0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   73
9     0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   73
10    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   74
11    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   75
12    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   65
13    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   58
14    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   64
15    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   46
16    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   29
17    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   20
18    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   20
19    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   18
20    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    6
21    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    9
22    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    2
23    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    3
25    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    5
30    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    1
<NA>  3  3  6 10  9 28 29 38 48 59 53 68 69 71 66 59 55 43 34 29 17 12  9  5  4  1  2  3  1    1

(please recall that x2 is from Study1 only, and there are 700 subjects from Study1. In other words, the other 900 subjects have x2 as NA.)

I think this missingness gave the warning, not the error; however I am not sure...
I am also wondering if there is any way to implement MLR or any other estimator instead of DWLS for this case with several ordinal variables.
Or, should I use different option for missing= ?

I truly appreciate your help/advice.

Terrence Jorgensen

Jun 24, 2018, 10:07:16 AM
to lavaan
(please recall that x2 is from Study1 only, and there are 700 subjects from Study1. In other words, the other 900 subjects have x2 as NA.)

That is indeed part of the problem. The actual cause of the error is in the message: "rho.init" is the starting value for estimating polychoric/polyserial correlations, and at least one of them is a missing value (so the syntax testing whether abs(rho.init) >= 1 fails because you can't test the magnitude of NA). The starting values (rho.init) are taken from the observed correlations using cor(x, y, use = "pairwise.complete.obs"), so the missing starting values occur because you have several pairs of variables that are never jointly observed in either sample (i.e., x2 with x13-x18).
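The mechanism described above can be reproduced in base R without lavaan. This is only a minimal sketch on made-up toy data (the data frame, its sizes, and the name r_ok are hypothetical; rho.init is reused here purely to mirror the error message, not to reproduce lavaan's internal code):

```r
# Toy data mimicking the pooled design (all values are simulated):
# x2 is observed only in "Study 1", x13 only in "Study 2".
set.seed(1)
n1 <- 7
n2 <- 9
toy <- data.frame(
  x1  = rnorm(n1 + n2),
  x2  = c(rnorm(n1), rep(NA, n2)),
  x13 = c(rep(NA, n1), rnorm(n2))
)

# x1 and x2 overlap in the first n1 rows, so this correlation is estimable.
r_ok <- cor(toy$x1, toy$x2, use = "pairwise.complete.obs")

# x2 and x13 share no complete pairs, so the pairwise correlation is NA.
rho.init <- cor(toy$x2, toy$x13, use = "pairwise.complete.obs")

# Testing the magnitude of NA inside if() raises an error; capture its message.
msg <- tryCatch(
  {
    if (abs(rho.init) >= 1) "out of range" else "in range"
  },
  error = function(e) conditionMessage(e)
)
print(is.na(rho.init))
print(msg)
```

The first correlation is a usable number, while the second is NA, and the if() test on it fails in the same way as in the traceback.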

I think this missingness gave the warning, not the error; however I am not sure...

The warning about coverage < 10% for some correlations is not fatal, unless it is as low as 0%, in which case this error gets thrown.   Perhaps a more informative error message if (is.na(rho.init)) is in order.  I'll add a pull request on GitHub to see if it can be in the next version.
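Before fitting, it may help to check which pairs of variables have 0% coverage. The following is a rough base-R sketch, not a lavaan function: pairwise_coverage() is a hypothetical helper and the toy data are made up for illustration.

```r
# Hypothetical helper: proportion of rows in which each pair of
# variables is jointly observed (the diagonal gives per-variable coverage).
pairwise_coverage <- function(data) {
  observed <- (!is.na(data)) + 0      # 1 where a value is present, 0 where NA
  crossprod(observed) / nrow(data)    # joint-observation rate for every pair
}

# Made-up toy data: x2 and x13 are never observed in the same row.
toy <- data.frame(
  x1  = c(1, 2, 3, 4, 5, 6),
  x2  = c(1, 2, 3, NA, NA, NA),
  x13 = c(NA, NA, NA, 4, 5, 6)
)

cov_mat <- pairwise_coverage(toy)

# List the pairs with zero coverage (upper triangle only, to avoid duplicates).
zero_pairs <- which(cov_mat == 0 & upper.tri(cov_mat), arr.ind = TRUE)
zero_table <- data.frame(
  var1 = rownames(cov_mat)[zero_pairs[, "row"]],
  var2 = colnames(cov_mat)[zero_pairs[, "col"]]
)
print(cov_mat)
print(zero_table)
```

Applied to a data set like temp.DATA, any pair listed in zero_table would be one for which no pairwise starting value can be computed.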

I am also wondering if there is any way to implement MLR or any other estimator instead of DWLS for this case with several ordinal variables.

Well, that would require using MML estimation, which will run quite slow for categorical outcome variables.  But you can try it (patiently) by setting estimator = "mml" and missing = "fiml".  I think robust SEs and tests will be automatic, but if not, try adding se = "robust.huber.white" and test = "yuan.bentler".

Or, should I use different option for missing= ?

FIML requires (M)ML estimation, but you can also use multiple imputation.  The semTools package has a runMI() function that assists with the fitting of models to multiply imputed data sets and pooling the results across imputations.  Make sure to install the development version (although I will send the new version to CRAN soon):

devtools::install_github("simsem/semTools/semTools")


Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

sc21...@uw.edu

Jun 26, 2018, 1:51:11 PM
to lavaan
Thank you so much for your help!!

Sincerely,
Seo-Eun