How to evaluate if the residuals are normally distributed with lavaan() or cfa() models?


Amonet

Apr 15, 2018, 6:39:25 AM
to lavaan
Hi all,

As the question / topic states, how can I evaluate whether the residuals of a model fitted with lavaan are normally distributed? Or, more meaningfully phrased: how can I assess whether the departure from normality is not severe? I didn't know I needed to assess this until I came across the notion in Todd D. Little's book, Longitudinal Structural Equation Modeling (2013) ( https://www.guilford.com/books/Longitudinal-Structural-Equation-Modeling/Todd-Little/9781462510160 ).

I am estimating longitudinal CFA models, with 4 time points and 5 items / observed variables.

Thanks,

Amonet

Terrence Jorgensen

Apr 17, 2018, 5:19:50 AM
to lavaan
As the question / topic states, how can I evaluate whether the residuals of a model fitted with lavaan are normally distributed? Or, more meaningfully phrased: how can I assess whether the departure from normality is not severe?

You can find covariance residuals with resid(fit), but it sounds like you are referring to casewise residuals (observed scores minus predicted values).  SEM in a frequentist framework is an analysis of (mean and) covariance structure.  Casewise residuals are not part of standard output because predicted values are not calculated for anything, and predicted values are not calculated because we have not observed the predictors (the common factors).  You could use predict(fit) to extract estimated factor scores and calculate predicted values (then residuals) manually, but estimated factor scores can differ across methods used to estimate them (even holding the estimation method for model parameters constant).
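A minimal sketch of the manual route described above, assuming a model with a mean structure so that item intercepts are available. The object names (est, lambda, nu, eta, y_hat, case_resid) are illustrative only, and the resulting residuals will change with the factor-score method passed to lavPredict(), which is exactly the caveat raised above.

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

# meanstructure = TRUE so that item intercepts (nu) are estimated
fit <- cfa(HS.model, data = HolzingerSwineford1939, meanstructure = TRUE)

est    <- lavInspect(fit, "est")   # list of estimated model matrices
lambda <- est$lambda               # factor loadings (items x factors)
nu     <- as.numeric(est$nu)       # item intercepts

# Estimated factor scores; method = "Bartlett" is the other built-in option
eta <- lavPredict(fit, method = "regression")

# Predicted item scores for every case: nu + Lambda %*% eta
y_hat <- sweep(eta %*% t(lambda), 2, nu, "+")

# Observed data as used by lavaan (columns in the same order as lambda's rows)
y <- lavInspect(fit, "data")

# Casewise residuals: observed minus predicted
case_resid <- y - y_hat
head(round(case_resid, 3))
```

Rerunning the sketch with method = "Bartlett" gives a direct impression of how much the casewise residuals depend on the factor-score estimator.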

I didn't know I needed to assess this, until I came across the notion in the book of Todd D. Little: Longitudinal structural equation modeling (2013) ( https://www.guilford.com/books/Longitudinal-Structural-Equation-Modeling/Todd-Little/9781462510160 ). 

What method did he recommend?  Or was he simply stating assumptions (regardless of whether they can be verified)?

Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Amonet

Apr 17, 2018, 2:40:35 PM
to lavaan
Thanks for replying.
 
What method did he recommend?  Or was he simply stating assumptions (regardless of whether they can be verified)?

He did not recommend anything. To cite him: "As seen in table 5.5, this model shows acceptable levels of model fit. If you download the output file from this model, you'll see that the fitted residuals from this model are normally distributed and that the modification indices do not ..." 

The problem is that it's an Mplus script / output, which requires the program to load it. I don't have that.

Sorry for being unclear about which residuals I meant, but your inference was right. It sounds like computing them manually and interpreting them is not straightforward, because of the uncertainty around it. I thought it was a necessary check, but if it isn't, then I'd rather keep the analysis as it is. I am having enough difficulty with it already :) 

Best wishes,
Amonet 

kma...@aol.com

Apr 17, 2018, 11:44:30 PM
to lavaan
Amonet,
I just went to the book's companion site and both the Mplus input files and Mplus output files were text files that I was able to open in a text editor.

I am not convinced that Todd is referring to case residuals based on the quotation that you provided.  Which file does the text refer to?

I like to put residuals in a stem plot.

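library(lavaan)
## The famous Holzinger and Swineford (1939) example
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit <- cfa(HS.model, data=HolzingerSwineford1939)
summary(fit, fit.measures=TRUE)

## Residual correlations (observed minus model-implied)
## (in lavaan 0.6 and later, the matrix may be stored as $cov instead of $cor)
HS.resid <- resid(fit, type='cor')$cor
HS.resid
## Stem-and-leaf plot of the non-redundant off-diagonal residuals
stem(HS.resid[lower.tri(HS.resid, diag=FALSE)])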

Keith
------------------------
Keith A. Markus
John Jay College of Criminal Justice, CUNY
http://jjcweb.jjay.cuny.edu/kmarkus
Frontiers of Test Validity Theory: Measurement, Causation and Meaning.
http://www.routledge.com/books/details/9781841692203/


Amonet

Apr 18, 2018, 1:44:04 PM
to lavaan
Dear Keith,

I would think he refers to the file ch5.tab5.5.configl.2by3.out, but I haven't been able to find his specific reference. The reason I think it's this one is that he talks about Table 5.5 in the text and about the configural model. I am, however, not sure.

Amonet 

kma...@aol.com

Apr 19, 2018, 8:15:50 AM
to lavaan
Amonet,
I looked at several of the Mplus and LISREL output files and could not find any output for Table 5.5 that included residual moments in any form (except residual variances, which are entirely different).  So, it remains something of a mystery to me what Todd was referring to. Let me know if you find something that I missed.  However, the LISREL output very helpfully includes the sample moments.  So, you could read those into lavaan to reproduce the analysis and extract the residual moments if you desired.
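A minimal sketch of fitting a model from sample moments alone, using lavaan's sample.cov and sample.nobs arguments. Because the book's moments are not reproduced in this thread, the covariance matrix here is computed from the Holzinger and Swineford data purely as a stand-in; with a published output file you would type the matrix in instead (lavaan's getCov() can help with lower-triangle input).

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

items <- paste0("x", 1:9)

# Stand-in for moments copied from an output file (assumption for illustration)
S <- cov(HolzingerSwineford1939[, items])
n <- nrow(HolzingerSwineford1939)

# Fit from the covariance matrix and sample size only; no raw data needed
fit_moments <- cfa(HS.model, sample.cov = S, sample.nobs = n)

# Residual moments expressed as correlations
resid(fit_moments, type = "cor")
```

Casewise quantities such as factor scores are not available from a fit like this, since no raw data were supplied.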

Given the information that you have provided, it appears that Todd may have been arguing that the configural model has adequate fit despite a statistically significant goodness-of-fit chi-square.  Such an argument runs counter to the frequentist logic for using tests of statistical significance as I understand it.  So, I would advise caution in emulating that practice with your own data, given the caveat that I do not have the full context.

The reasoning of the passage you quoted may be that if the model fits then residual moments differ from zero by sampling error and will follow a normal distribution centered around zero.  (The chi-square goodness-of-fit test treats each residual moment like a normal random deviate.)  However, if there is a localized source of poor fit due to a specific misspecification in the model, then you might expect a subset of residual moments to deviate from this pattern.  Based on the information that I have, I incline toward interpreting Todd's statement as referring to residual moments, not to individual case residuals.  If you are interested in the latter, I recommend taking a look at Tenko Raykov's work on the topic.
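One way to look at this pattern, sketched under the assumption that lavaan's type = "standardized" residuals are an acceptable stand-in for "residual moments treated as normal deviates": extract the non-redundant off-diagonal values and compare them with a normal distribution in a Q-Q plot. The names z and z_vec are illustrative.

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit <- cfa(HS.model, data = HolzingerSwineford1939)

# Standardized residuals for the covariance elements
z <- resid(fit, type = "standardized")$cov

# Keep each off-diagonal element once
z_vec <- z[lower.tri(z, diag = FALSE)]

# Points far from the reference line flag residuals larger than
# sampling error alone would suggest
qqnorm(z_vec)
qqline(z_vec)
```

A cluster of large values involving the same few variables would point to a localized misspecification rather than general sampling noise.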

Amonet

Apr 21, 2018, 6:40:37 AM
to lavaan
Dear Keith,

Thanks a lot for your efforts and advice, much appreciated. If I come across anything that may shed more light on this, I will post it here and let you know.

Kind regards,
Amonet 

Amonet

May 6, 2018, 9:05:17 AM
to lavaan
Hi,

I just found another comment while going through T. D. Little's book. He says: "The fitted residuals are the differences between the estimated values of the reproduced matrix and the observed value for each element of the observed matrix. These residuals should be normally distributed with a mean of 0. Any evidence of skew, platykurtosis (fat tails), or outliers suggests local misfit" (p. 118). To me this sounds like he means the residuals you get when using resid(fit) in lavaan, but I am still not sure.

Kind regards,
Amonet 

kma...@aol.com

May 7, 2018, 7:59:45 AM
to lavaan

Amonet,
  Yes, these are the residuals for the sample moments, not individual values of variables.  This is why I like to put the non-redundant off-diagonal values in a stem-and-leaf plot.  If you have a small model with few observed variables, there is less to go on.  With larger models, you have more residuals to work with.

  When Todd cautions about heavy tails, he is referring to a pattern of residuals that are larger than would be expected by random normal deviations.  Skewness also suggests a pattern of larger-than-expected residuals in the tail.  Likewise an outlier is a larger-than-expected residual that appears outside the distribution.  The describe() function in the psych package offers a convenient way to obtain numeric summary statistics for univariate distributions.
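A short sketch of that numeric summary, assuming the psych package is installed (the call is skipped otherwise). The residual matrix is looked up under either $cov or $cor because the element name may differ between lavaan versions; r_list, R_mat, and r_vec are illustrative names.

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit <- cfa(HS.model, data = HolzingerSwineford1939)

# Residual correlations; the element name may depend on the lavaan version
r_list <- resid(fit, type = "cor")
R_mat  <- if (!is.null(r_list$cov)) r_list$cov else r_list$cor

# Non-redundant off-diagonal residuals as a plain vector
r_vec <- R_mat[lower.tri(R_mat, diag = FALSE)]

# Mean, skew, kurtosis, and range of the residual distribution
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describe(r_vec))
}
```

Note that describe() is applied here to the vector of residual moments, not to the raw item responses.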

Amonet

May 12, 2018, 3:51:12 AM
to lavaan
Hi Keith,

Thank you for replying. I didn't see that you had; sorry for the late reply.

On Monday, May 7, 2018 at 13:59:45 UTC+2, kma...@aol.com wrote:

Amonet,
  Yes, these are the residuals for the sample moments, not individual values of variables.  This is why I like to put the non-redundant off-diagonal values in a stem-and-leaf plot.  If you have a small model with few observed variables, there is less to go on.  With larger models, you have more residuals to work with.


So if I understand you correctly, you take the residual variance-covariance matrix and use the covariances in your stem-and-leaf plot? When you do this, do you split up the time points or take them all together? For example, I have 4 periods in which the same variables were measured for the same individuals. I wonder whether I should put all the residual covariances (6 variables times 4 periods) in one plot or not. I thought it was not possible to put the residuals in a plot in lavaan. Do you have a trick for that?
 
  When Todd cautions about heavy tails, he is referring to a pattern of residuals that are larger than would be expected by random normal deviations.  Skewness also suggests a pattern of larger-than-expected residuals in the tail.  Likewise an outlier is a larger-than-expected residual that appears outside the distribution.  The describe() function in the psych package offers a convenient way to obtain numeric summary statistics for univariate distributions.

 Thanks for pointing out the psych package and the describe() function. This you would use to get a 'feel' of the raw data, right? So for each observed variable see what values were given by the respondents? (e.g. in case of a questionnaire). 

kma...@aol.com

May 13, 2018, 9:26:30 AM
to lavaan
Amonet,

I have not worked out the best way to point to a lavaan groups post.  See if this link works for you.

https://groups.google.com/forum/#!searchin/lavaan/stem%28/lavaan/3jT8AP4UgO8/H-454PShAAAJ

Edineusa

Nov 13, 2018, 7:53:57 PM
to lavaan
Hi Terrence,

I also need to get the casewise residuals to evaluate spatial correlation (Moran's I) and redefine the sample size. I am following this tutorial: http://byrneslab.net/classes/sem_notes/11_Advanced_Topics.pdf . However, when I run predict(fit) and lavPredict(fit), the output shows the row names only. Is there another way to get these residuals? Does this indicate a problem in my model?

Thanks,
Edineusa

Terrence Jorgensen

Nov 22, 2018, 9:02:18 AM
to lavaan
when I run the predict(fit) and lavPredict (fit)

Those return factor scores, not residuals.  

the output shows the row names only. 

You can use lavInspect(fit, "case.idx") to see which row in the original data= argument each observation came from.

Is there another way to get these residuals? 

You can use the estimates to calculate predicted values manually.  It is tedious, but not difficult.  SEM estimates are just a bunch of regression equations.  

Jarrett Byrnes (whose tutorial you refer to) wrote some functions to return casewise residuals and predicted values in the case of path analyses with observed variables.  I'm sure he would be willing to share them with you.
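For the observed-variable case, the manual calculation can be sketched as follows. This is not Jarrett Byrnes's code; it is a minimal illustration with a one-equation path model chosen for the example (x9 regressed on x7 and x8), and the names path.model, b, idx, dat, y_hat, and case_resid are hypothetical.

```r
library(lavaan)

# A single regression equation among observed variables
path.model <- ' x9 ~ x7 + x8 '

# meanstructure = TRUE so that the intercept ("x9~1") is estimated
fit <- sem(path.model, data = HolzingerSwineford1939, meanstructure = TRUE)

b <- coef(fit)   # named vector: "x9~x7", "x9~x8", "x9~~x9", "x9~1"

# Rows of the original data that were actually used in the analysis
idx <- lavInspect(fit, "case.idx")
dat <- HolzingerSwineford1939[idx, ]

# Predicted values from the estimated regression equation
y_hat <- b["x9~1"] + b["x9~x7"] * dat$x7 + b["x9~x8"] * dat$x8

# Casewise residuals: observed minus predicted
case_resid <- dat$x9 - y_hat
head(round(case_resid, 3))
```

With several equations, the same steps are repeated per outcome; the case.idx lookup keeps the residuals aligned with the original rows, which matters when attaching spatial coordinates.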

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Mauricio Garnier-Villarreal

Nov 23, 2018, 4:16:30 PM
to lavaan

At least the casewise residuals are already among the options in lavaan. What I am unsure about is whether this method is sensitive to the estimation method, as the factor scores are.

## The famous Holzinger and Swineford (1939) example
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit <- cfa(HS.model, data=HolzingerSwineford1939)
summary(fit, fit.measures=TRUE)
residuals(fit, type="casewise")

Hi Edineusa

Nov 29, 2018, 9:15:05 AM
to lav...@googlegroups.com

Hi Terrence,

 

Thank you so much =)

I found an easy way to get new model estimates after a spatial correction (for spatial dependence): the “spatialCorrect” function. However, this function works only with a deprecated version of “semTools”. Do you recommend another function with a similar role in the new version (semTools v 1.1)?

 

Thank you again,

Edineusa

 

 


 

Terrence Jorgensen

Nov 30, 2018, 5:15:09 AM
to lavaan

the “spatialCorrect” function. But, this function works only with the deprecated “semTools” version. Do you recommend another function with similar role in the new version (semTools v 1.1)?


I'm not sure why this function was removed, but it was gone before I became the maintainer.  If it was removed, it might have become available in another package, or it might not have worked as expected, so I would contact its author Jarrett Byrnes about whether it should be used.  But you can install an archived version of semTools 0.4-12 by downloading it


And installing from the source:

install.packages("semTools_0.4-12.tar.gz", repos = NULL, type = "source")
