How to obtain the 'imputed data' from the FIML estimator? (needed for making QQ plots)


Amonet

Jul 4, 2018, 3:13:16 PM
to lavaan

I would like to check whether the distributions of my observed variables (the items of the measurement model) are approximately normal, even though they are ordinal variables. I have a longitudinal design with 4 time points, and the amount of missing data is quite high at the 4th time point: 30 of the 125 people did not respond to any of the items at time point 4. Because of the missing data, I would like to make a Q-Q plot both for a complete-case analysis (where I delete everyone with at least 1 non-response) and for a full information maximum likelihood (FIML) analysis.

For the FIML analysis: is it possible to extract the imputed data that lavaan creates implicitly when FIML is used (missing = "fiml")? If so, I could make a Q-Q plot with all 125 cases instead of only about 90.

Another question about the Q-Q plot: I would make it with the psych package's mardia() function. When I run this function on the data set, it prints the following output:

Call: mardia(x = data[, 51:74])

Mardia tests of multivariate skew and kurtosis
Use describe(x) the to get univariate tests
n.obs = 89   num.vars =  24 
b1p =  221.43   skew =  3284.58  with probability =  0
 small sample skew =  3404.37  with probability =  0
b2p =  665.76   kurtosis =  5.58  with probability =  2.5e-08

As you can see, it automatically drops everyone with at least 1 missing observation (n.obs = 89 instead of 125). However, skew = 3284.58 seems extremely high and does not seem consistent with what the Q-Q plot shows (please see the plot below). To me, the Q-Q plot shows some departure from normality in both tails, but the data do not look extremely non-normal. How should I interpret this skewness value? The kurtosis seems 'regular'...



Thanks,
Amonet 
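The numbers in the mardia() output above follow from b1p and b2p through Mardia's standard formulas: the printed skew is the test statistic n * b1p / 6, referred to a chi-square distribution with p(p + 1)(p + 2)/6 degrees of freedom, and the printed kurtosis is a z statistic. The sketch below (Python/numpy; the function names are hypothetical and not psych's API) computes b1p and b2p from raw data after listwise deletion and converts them into these statistics. It uses the divide-by-n covariance matrix; implementations differ on that detail, which shifts the coefficients slightly in small samples.

```python
import numpy as np


def mardia(x):
    """Mardia's multivariate skewness (b1p) and kurtosis (b2p), complete cases only."""
    x = np.asarray(x, dtype=float)
    x = x[~np.isnan(x).any(axis=1)]          # listwise deletion, as in n.obs = 89
    n, p = x.shape
    xc = x - x.mean(axis=0)                  # center each variable
    s = xc.T @ xc / n                        # covariance with divisor n (assumption)
    d = xc @ np.linalg.solve(s, xc.T)        # d[i, j] = (x_i - m)' S^-1 (x_j - m)
    b1p = (d ** 3).sum() / n ** 2
    b2p = (np.diag(d) ** 2).sum() / n
    return n, p, b1p, b2p


def mardia_tests(b1p, b2p, n, p):
    """Convert b1p and b2p into the skew and kurtosis test statistics."""
    df = p * (p + 1) * (p + 2) / 6           # df of the chi-square skew test
    skew = n * b1p / 6                       # chi-square statistic, not a coefficient
    k = (p + 1) * (n + 1) * (n + 3) / (n * ((n + 1) * (p + 1) - 6))
    small_skew = k * skew                    # small-sample corrected skew statistic
    kurtosis = (b2p - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / n)  # z statistic
    return {"df": df, "skew": skew, "small_skew": small_skew, "kurtosis": kurtosis}
```

Plugging in the values printed above, mardia_tests(221.43, 665.76, 89, 24) gives df = 2600, skew ≈ 3284.5, small-sample skew ≈ 3404.3 and kurtosis ≈ 5.58, matching the output up to the rounding of b1p and b2p. So skew = 3284.58 is not a skewness coefficient on the scale you would read off a Q-Q plot: it is a chi-square statistic whose expected value under multivariate normality equals its df, 2600 here (SD = sqrt(2 * 2600) ≈ 72.1), and it lies about 9.5 SDs above that. Its size is driven largely by the 24 variables. The kurtosis z of 5.58 (p = 2.5e-08) also signals a departure from multivariate normality, and with n = 89 and 24 variables both tests rest on large-sample approximations.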



Mauricio Garnier-Villarreal

Jul 5, 2018, 9:02:09 PM
to lavaan
FIML does not impute. It doesn't fill in the blanks with data in any form. It uses the available data from each subject for the analysis.

SEM makes no assumption about the individual variables. The assumption is that the residuals are multivariate normal.
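To make "uses the available data from each subject" concrete: under FIML each person contributes the multivariate normal log-density of only the variables he or she actually answered, evaluated at the model-implied mean vector and covariance matrix, and estimation maximizes the sum of these contributions. Nothing is filled in, so there is no imputed data set to extract. A minimal numpy sketch, assuming mu and sigma are simply given (in lavaan they would be the model-implied moments being estimated); fiml_loglik is a hypothetical name, not a lavaan function.

```python
import numpy as np


def fiml_loglik(data, mu, sigma):
    """Casewise (FIML) normal log-likelihood; each row uses only its observed entries."""
    data = np.asarray(data, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    total = 0.0
    for row in data:
        obs = ~np.isnan(row)                 # variables this person answered
        k = int(obs.sum())
        if k == 0:
            continue                         # a fully empty row carries no information
        r = row[obs] - mu[obs]               # deviations on observed variables only
        s = sigma[np.ix_(obs, obs)]          # matching sub-block of the covariance matrix
        _, logdet = np.linalg.slogdet(s)
        total += -0.5 * (k * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(s, r))
    return total
```

A person who skipped every item at time point 4 but answered at time points 1 to 3 still contributes through those earlier responses, which is why FIML can use all 125 cases while listwise deletion keeps only about 90.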

Amonet

Jul 6, 2018, 5:32:50 PM
to lavaan
Thanks for your reply. 

I thought I had read somewhere that CFA / SEM requires the indicators to be multivariate normal, but now that I have tried to find it, I cannot. Are you sure of this? It would be in line with ordinary least squares (OLS) regression, so you're probably right. How am I able to assess the (non)normality of the residuals using lavaan (possibly in combination with another package, like psych)?

Appreciate all the input, 

Amonet 

Terrence Jorgensen

Jul 9, 2018, 11:17:37 AM
to lavaan
> Are you sure of this? It would be in line with ordinary least squares (OLS) regression, so you're probably right.

OLS regression makes an exogeneity assumption about the predictors, and no distributional assumptions are made about them. The only random variable is the outcome, and its only random component is the residual, so that is what the distributional assumptions are about (IID normal). The same holds in SEM if you have exogenous observed variables (and set fixed.x = TRUE in lavaan). But whereas MIMIC models have exogenous predictors, regular CFA models do not.

The exogenous variables in CFA are latent, and those are also assumed to be normally distributed (the same holds for structural regression models, or any other models without exogenous observed variables, like latent growth curve models without predictors). Thus, each observed indicator is a weighted sum of 2 random components: the common factor (weighted by its loading) and the unique factor. Both are assumed normal and independent of each other, and a sum of independent normal random variables is also normal. That implies the observed endogenous variables in CFA are themselves multivariate normal, so yes, you can investigate their empirical distributions.
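A quick simulation illustrates this argument: draw a normal common factor and independent normal unique factors, form each indicator as loading * factor + unique factor, and the indicators come out (approximately, in a finite sample) with zero skewness and with the covariance matrix implied by the CFA model, Lambda Psi Lambda' + Theta. The loadings and unique variances below are made-up values chosen so that each indicator has variance 1.

```python
import numpy as np

rng = np.random.default_rng(2018)
n = 200_000
lam = np.array([0.8, 0.7, 0.6, 0.5])           # hypothetical loadings
psi = 1.0                                       # factor variance
theta = np.array([0.36, 0.51, 0.64, 0.75])      # hypothetical unique variances

eta = rng.normal(0.0, np.sqrt(psi), size=n)            # common factor: latent, normal
eps = rng.normal(0.0, np.sqrt(theta), size=(n, 4))     # unique factors: normal, independent of eta
y = eta[:, None] * lam + eps                            # indicator = loading * factor + unique factor

implied = psi * np.outer(lam, lam) + np.diag(theta)    # model-implied Lambda Psi Lambda' + Theta
sample = np.cov(y, rowvar=False)                        # sample covariance of the simulated indicators
z = (y - y.mean(axis=0)) / y.std(axis=0)
skewness = (z ** 3).mean(axis=0)                        # univariate skewness, 0 for a normal variable
```

Comparing sample with implied (and checking that skewness is near 0) shows that the normality of the observed indicators is inherited from the normality of the common and unique factors.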

> How am I able to assess the (non)normality of the residuals

You can't; they are latent, just like in OLS regression. In OLS regression, though, we at least get estimates of them by subtracting the predicted values from the observed values. In CFA we do not have that option because we have not observed the predictors (i.e., the common factors are latent).
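For contrast, this is the OLS step described above: with an observed predictor, residual estimates are simply observed minus predicted values, and those estimates can be inspected for normality. A numpy sketch on simulated data (the intercept 2.0 and slope 0.5 are arbitrary). In a CFA the predictor column would be the common factor, which is never observed, so this subtraction is not available.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)                         # observed predictor
y = 2.0 + 0.5 * x + rng.normal(size=n)         # arbitrary true intercept and slope

X = np.column_stack([np.ones(n), x])           # design matrix with an intercept column
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS estimates
resid = y - X @ beta                           # estimated residuals: observed minus predicted
```

The array resid could then go into an ordinary normal Q-Q plot.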

Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Amonet

Jul 11, 2018, 6:24:37 AM
to lavaan
Thank you for your elaborate explanation, Terrence.
Just to check that I understand correctly: I can examine whether the distributions of the observed endogenous indicators approach a normal distribution, so I can, for example, make a Q-Q plot of the observed indicators (i.e., use the raw data sample) and assess the (departure from) normality? I would probably do this both for the univariate distributions (i.e., each observed indicator separately) and for the multivariate distribution (i.e., all observed indicators together). In the latter (multivariate) case, the Q-Q plot probably gives a false indication, since it does not take into account the correlation over time. Do you agree?

Kind regards,
Amonet 

Terrence Jorgensen

Jul 15, 2018, 6:38:40 AM
to lavaan
> so I can, for example, make a Q-Q plot of the observed indicators (i.e., use the raw data sample) and assess the (departure from) normality?

Yes
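A univariate normal Q-Q plot pairs each indicator's sorted, standardized values with standard normal quantiles at matching plotting positions. A minimal sketch; normal_qq is a hypothetical helper, and the (i - 0.5)/n plotting positions are one common choice (R's qqnorm uses them for n > 10). Each indicator uses all of its own non-missing responses. With ordinal items the sample side shows flat runs of tied values, one per response category, so judge the overall shape rather than the steps.

```python
from statistics import NormalDist

import numpy as np


def normal_qq(values):
    """(theoretical, sample) quantile pairs for a normal Q-Q plot of one indicator."""
    v = np.asarray(values, dtype=float)
    v = np.sort(v[~np.isnan(v)])               # this indicator's non-missing responses
    n = v.size
    probs = (np.arange(1, n + 1) - 0.5) / n    # plotting positions (i - 0.5) / n
    theo = np.array([NormalDist().inv_cdf(q) for q in probs])
    sample = (v - v.mean()) / v.std(ddof=1)    # standardize so the reference line is y = x
    return theo, sample
```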

> I would probably do this both for the univariate distributions (i.e., each observed indicator separately) and for the multivariate distribution (i.e., all observed indicators together). In the latter (multivariate) case, the Q-Q plot probably gives a false indication, since it does not take into account the correlation over time. Do you agree?

No, the parameters of a multivariate normal distribution are a mean vector and covariance matrix, so observed variables can covary (be correlated) without violating the normality assumption.  
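The usual multivariate Q-Q plot compares the squared Mahalanobis distances of the complete cases, computed with the sample covariance matrix, against chi-square quantiles with p degrees of freedom. Because the inverse covariance matrix enters every distance, correlated indicators (e.g., the same item across time points) do not by themselves make the plot look non-normal. A numpy/stdlib sketch with hypothetical helper names; to avoid a SciPy dependency it swaps in the Wilson-Hilferty approximation for exact chi-square quantiles (such as R's qchisq), which is adequate for moderate df like 24 but crude for very small df. psych's mardia() documents a similar plot of Mahalanobis distances against their expected values via its plot argument.

```python
from statistics import NormalDist

import numpy as np


def chi2_quantile_wh(prob, df):
    """Wilson-Hilferty approximation to a chi-square quantile (avoids needing SciPy)."""
    z = NormalDist().inv_cdf(prob)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * np.sqrt(a)) ** 3


def mahalanobis_qq(x):
    """(theoretical, sample) pairs for a chi-square Q-Q plot of squared Mahalanobis distances."""
    x = np.asarray(x, dtype=float)
    x = x[~np.isnan(x).any(axis=1)]            # complete cases only
    n, p = x.shape
    xc = x - x.mean(axis=0)
    s = np.cov(xc, rowvar=False)               # full covariance: correlations over time enter here
    d2 = np.einsum("ij,ij->i", xc, np.linalg.solve(s, xc.T).T)  # (x_i - m)' S^-1 (x_i - m)
    d2 = np.sort(d2)
    probs = (np.arange(1, n + 1) - 0.5) / n
    theo = np.array([chi2_quantile_wh(q, p) for q in probs])
    return theo, d2
```

With correlated but multivariate normal data (e.g., 6 variables that all correlate 0.8), the points fall near the line y = x; the correlation alone does not bend the plot. With n = 89 and p = 24, the exact reference distribution of sample-based distances is a scaled beta rather than chi-square, so some bending in the upper tail is expected even for normal data.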