growth model with missing time variant covariates


Shimon Sarraf

Nov 8, 2018, 6:52:48 AM
to lavaan
Dear Lavaan colleagues,

What is the best way to deal with missing data for time-varying covariates (TVCs) when running a growth model with FIML in lavaan? Currently, I am losing almost all cases in my data set because the TVCs have considerable missing data in any given year. FIML doesn't appear to address this issue. I've read suggestions on the Mplus website to either fill in the missing TVC data with valid values (leaving the dependent variables missing) or explicitly specify the variances of each TVC and rerun the model.

Currently, I am not using a covariance/variance matrix to run this growth model, and I'm not sure if that's even possible while using FIML to address missing data for other model variables. I tried the first suggestion (replacing missing TVC values with valid ones), but my model is not converging. Does lavaan have any effective way to handle this issue? After some searching, it doesn't appear that this topic has come up before in this Google group, so I thought I would double-check.

Thanks in advance for any guidance. 

Shimon

Terrence Jorgensen

Nov 9, 2018, 6:07:14 AM
to lavaan
> What is the best way to deal with missing data for time-varying covariates (TVCs) when running a growth model with FIML in lavaan?

I don't understand the question, because you seem to pose the answer in the question. FIML is a solution to the problem of missing data.

> FIML doesn't appear to address this issue.

How so?

> I've read suggestions on the Mplus website to either fill in the missing TVC data with valid values (leaving the dependent variables missing) or explicitly specify the variances of each TVC and rerun the model.

If you are going to replace missing values, you should do it multiple times to account for the additional uncertainty. Read about multiple imputation, which is asymptotically equivalent to FIML but is more flexible in accounting for other sources of information that can explain missingness.
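To make "do it multiple times" concrete: after fitting the same model to each of the m imputed data sets, the m estimates of a parameter are combined with Rubin's rules. Below is a minimal base-R sketch of that arithmetic for a single parameter. The function name `pool_rubin` and the example numbers are hypothetical. In practice, semTools pools the estimates for you, so this only shows where the extra between-imputation uncertainty enters the standard error.

```r
# Rubin's rules for pooling one parameter across m imputed data sets.
# est: vector of m point estimates; se: vector of m standard errors.
pool_rubin <- function(est, se) {
  m <- length(est)
  q_bar <- mean(est)             # pooled point estimate
  w <- mean(se^2)                # average within-imputation variance
  b <- var(est)                  # between-imputation variance
  t_var <- w + (1 + 1 / m) * b   # total sampling variance
  list(estimate = q_bar,
       se = sqrt(t_var),
       # share of total variance attributable to missing data
       prop_missing_var = (1 + 1 / m) * b / t_var)
}

# Hypothetical estimates of one slope from 5 imputed data sets
res <- pool_rubin(est = c(0.50, 0.55, 0.48, 0.52, 0.45),
                  se  = c(0.10, 0.11, 0.10, 0.10, 0.12))
print(res)
```

If every imputation gave the same estimate, `b` would be 0 and the pooled standard error would reduce to the average within-imputation value. Otherwise, it is inflated by the spread across imputations.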


> Currently, I am not using a covariance/variance matrix to run this growth model, and I'm not sure if that's even possible while using FIML to address missing data for other model variables.

No, you need raw data in order to use FIML. That is what "full-information" means.
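Why raw data are required: under FIML, each case contributes a likelihood term computed from only the variables that case actually has observed. The estimator therefore has to see every row's missingness pattern, which a single summary covariance matrix cannot provide. Below is a minimal base-R sketch of that casewise (observed-data) multivariate-normal log-likelihood for fixed values of the mean vector and covariance matrix. The function name `fiml_loglik` and the toy values are hypothetical. lavaan's own FIML (requested with `missing = "ml"`) is implemented differently and maximizes this kind of likelihood over the model parameters rather than simply evaluating it.

```r
# Casewise (observed-data) log-likelihood for multivariate-normal data
# with missing values. Each row uses only the entries of mu and Sigma
# that belong to its observed variables.
fiml_loglik <- function(X, mu, Sigma) {
  X <- as.matrix(X)
  total <- 0
  for (i in seq_len(nrow(X))) {
    obs <- !is.na(X[i, ])              # this case's observed variables
    k <- sum(obs)
    if (k == 0) next                   # a fully missing row contributes nothing
    d <- X[i, obs] - mu[obs]           # deviations from the observed means
    S <- Sigma[obs, obs, drop = FALSE] # covariance block for observed variables
    log_det <- as.numeric(determinant(S, logarithm = TRUE)$modulus)
    quad <- sum(d * solve(S, d))       # Mahalanobis term
    total <- total - 0.5 * (k * log(2 * pi) + log_det + quad)
  }
  total
}

# Toy example: 3 cases, 2 variables, second case missing variable 2
X <- rbind(c(1.0, 2.0),
           c(0.5, NA),
           c(-1.0, 0.0))
mu <- c(0, 1)
Sigma <- matrix(c(1, 0.5, 0.5, 2), nrow = 2)
fiml_loglik(X, mu, Sigma)
```

The second case is not dropped. It contributes a univariate normal term for variable 1 alone, and this is only possible because the raw row is available.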

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Shimon Sarraf

Nov 10, 2018, 7:43:46 PM
to lavaan
Prof. Jorgensen:

Thank you for your guidance, and my apologies for not being clear about the issue I was facing. I did not understand why my model was dropping many records when I included several dichotomous variables that had missing data. I incorrectly believed that FIML would handle missing data for dichotomous variables, but as I've learned from various readings, this is not the case. Following your advice, I spent some time looking at multiple imputation procedures, referencing various exchanges in this Google group. By using Amelia and identifying certain variables as dichotomous, I now have multiple imputed data sets to analyze. My models are running without losing any records. I am now struggling to obtain model fit statistics from the analyses of the imputed data sets. Any references you can recommend would be much appreciated.

I've run the following and get a long message along with a few chi-square-related statistics. Despite multiple attempts based on what I've found online, I cannot retrieve RMSEA, SRMR, TLI, CFI, or any other fit statistics besides the ones below.

anova(RR.fit7)

"D3" only available using maximum likelihood estimation. Changed test to "D2".
Robust correction can only be applied to pooled chi-squared statistic, not F statistic. "asymptotic" was switched to TRUE.
Robust corrections are made by pooling the naive chi-squared statistic across 100 imputations for which the model converged, then applying the average (across imputations) scaling factor and shift parameter to that pooled value. 
To instead pool the robust test statistics, set test = "D2" and pool.robust = TRUE. 

                 chisq                     df                 pvalue 
               155.546                206.000                  0.996 
                  npar                 ntotal           chisq.scaled 
                28.000               1062.000                217.717 
             df.scaled          pvalue.scaled   chisq.scaling.factor 
               206.000                  0.274                  1.537 
chisq.shift.parameters 
               116.494 

Here's the rest of the R code I used to get the above results:

RR.model7 <- '
# intercept
i =~ 1*RR10 + 1*RR11 + 1*RR12 + 1*RR13 + 1*RR14 + 1*RR15 + 1*RR16 + 1*RR17 + 1*RR18
# linear slope (time scores 0 to 8)
s =~ 0*RR10 + 1*RR11 + 2*RR12 + 3*RR13 + 4*RR14 + 5*RR15 + 6*RR16 + 7*RR17 + 8*RR18
# quadratic slope (squared time scores: 0, 1, 4, ..., 64)
s2 =~ 0*RR10 + 1*RR11 + 4*RR12 + 9*RR13 + 16*RR14 + 25*RR15 + 36*RR16 + 49*RR17 + 64*RR18
# time-invariant covariates predicting the intercept
i ~ Public + Size + FT + Female + AA + LAT + SR
# time-varying covariates
RR10~Incent10
RR11~Incent11
RR12~Incent12
RR13~Incent13
RR14~Incent14
RR15~Incent15
RR16~Incent16
RR17~Incent17
RR18~Incent18
RR15~LMS15
RR16~LMS16
RR17~LMS17
RR18~LMS18
# residual variances (held equal over time via the shared label r)
RR10~~r*RR10
RR11~~r*RR11
RR12~~r*RR12
RR13~~r*RR13
RR14~~r*RR14
RR15~~r*RR15
RR16~~r*RR16
RR17~~r*RR17
RR18~~r*RR18'

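library(lavaan)
library(semTools)  # provides growth.mi() for multiply imputed data
library(Amelia)
set.seed(12345)
# 100 imputations; columns 17-29 are imputed as ordinal (here, dichotomous)
RRdata.amelia <- amelia(RR.data, m = 100, p2s = FALSE, ords = 17:29)
RR.data.amelia.imps <- RRdata.amelia$imputations
# lavaan's "ordered" argument expects variable names, not column numbers
RR.fit7 <- growth.mi(RR.model7, data = RR.data.amelia.imps,
                     ordered = names(RR.data)[17:29])
summary(RR.fit7)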

Thank you once again for your guidance!

Shimon

Terrence Jorgensen

Nov 12, 2018, 8:45:26 AM
to lavaan
> I incorrectly believed that FIML would handle missing data for dichotomous variables, but as I've learned from various readings, this is not the case.

Right, because (FI)ML estimation is not available with categorical data in lavaan.

> Following your advice, I spent some time looking at multiple imputation procedures, referencing various exchanges in this Google group. By using Amelia and identifying certain variables as dichotomous, I now have multiple imputed data sets to analyze. My models are running without losing any records.

That is because the data are now complete (although they vary across imputations).

> I am now struggling to obtain model fit statistics from the analyses of the imputed data sets. Any references you can recommend would be much appreciated.

When you print your model to the screen, you will see the following message:

See class?lavaan.mi help page for available methods

Specifically, use the fitMeasures() method to obtain approximate fit indices, just as you would for a lavaan object.

> "D3" only available using maximum likelihood estimation. Changed test to "D2".

This is because D3 is a pooled likelihood-ratio test statistic, which requires a likelihood-based estimator. DWLS estimation is used for categorical outcomes, so only D2 is available.
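For readers new to D2: it pools the m chi-square statistics themselves, one per imputed data set, so it does not need a likelihood. Below is a minimal base-R sketch of the D2 formula as presented in Enders (2010, Chapter 8), cited in this reply. The function name `pool_d2` and the input values are hypothetical, and semTools' internal implementation may differ in details. The messages above say "asymptotic" was switched to TRUE. We assume this corresponds to referring k × D2 to a chi-square distribution with k df instead of referring D2 to an F distribution, and the sketch reports both.

```r
# D2: pool m chi-square statistics (each with df = k) from imputed data sets.
pool_d2 <- function(chisq, k) {
  m <- length(chisq)
  d_bar <- mean(chisq)
  # relative increase in variance, based on the spread of sqrt(chi-square)
  r <- (1 + 1 / m) * var(sqrt(chisq))
  D2 <- (d_bar / k - ((m + 1) / (m - 1)) * r) / (1 + r)
  df2 <- k^(-3 / m) * (m - 1) * (1 + 1 / r)^2  # denominator df of the F reference
  list(D2 = D2, df1 = k, df2 = df2,
       p_F = pf(D2, k, df2, lower.tail = FALSE),
       # asymptotic version (assumed): k * D2 referred to chi-square(k)
       chisq = k * D2,
       p_chisq = pchisq(k * D2, k, lower.tail = FALSE))
}

# Hypothetical chi-squares for one model (df = 10) fitted to 5 imputations
pool_d2(chisq = c(12.1, 14.3, 11.8, 13.5, 12.9), k = 10)
```

When the chi-squares barely differ across imputations, `r` is near 0 and the pooled statistic is close to their average. Greater disagreement between imputations pulls D2 down and lowers its denominator df.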

> Robust correction can only be applied to pooled chi-squared statistic, not F statistic. "asymptotic" was switched to TRUE.
> Robust corrections are made by pooling the naive chi-squared statistic across 100 imputations for which the model converged, then applying the average (across imputations) scaling factor and shift parameter to that pooled value.
> To instead pool the robust test statistics, set test = "D2" and pool.robust = TRUE.

See the ?lavTestLRT help page for details about these arguments.  The message is indeed cryptic, and probably only informative if you have read about pooling test statistics:

Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford. (Read the second half of Chapter 8, on the D1, D2, and D3 statistics.)

Grund, S., Lüdtke, O., & Robitzsch, A. (2016). Pooling ANOVA results from multiply imputed datasets. Methodology, 12, 75-88. https://doi.org/10.1027/1614-2241/a000111

Basically, there is no established guidance on how to pool robust test statistics, so I made the best guess I could with the information I had, and I provide options for other ideas (e.g., set test = "D2" and pool.robust = TRUE). But we still need a simulation study to show which idea works better.
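The "pool the naive statistic, then correct it" approach can be checked against the anova(RR.fit7) output posted earlier in this thread. Assuming the usual scaled-and-shifted form (the naive statistic divided by the scaling factor, plus the shift parameter), the printed chisq.scaled and pvalue.scaled are reproduced up to rounding from the pooled naive chi-square, the average scaling factor, and the average shift. The numbers are copied from that output; the variable names are only illustrative.

```r
# Values copied from the anova(RR.fit7) output in this thread
chisq_pooled <- 155.546   # pooled naive chi-square (D2)
scaling      <- 1.537     # average scaling factor across imputations
shift        <- 116.494   # average shift parameter across imputations
df           <- 206

# Scaled-and-shifted correction: divide by the scaling factor, then add the shift
chisq_scaled <- chisq_pooled / scaling + shift
p_scaled <- pchisq(chisq_scaled, df, lower.tail = FALSE)
round(c(chisq.scaled = chisq_scaled, pvalue.scaled = p_scaled), 3)
```

This gives about 217.695 rather than the printed 217.717. The small gap comes from the scaling factor being printed to only three decimals. The pool.robust = TRUE alternative would instead apply D2 directly to the m robust statistics.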

Shimon Sarraf

Nov 12, 2018, 10:41:47 PM
to lavaan
Thank you once again, Prof. Jorgensen. I really appreciate your assistance. I have managed to produce model fit results and feel as though I've made good progress.
All the best,
Shimon