Extracting "residual terms" in CFA results

Jinghui Liang

Aug 29, 2023, 12:22:17 PM

to lavaan

Hi folks,

I am interested in the residuals/variances in CFA results, but I am confused about the difference between residuals in regression and in factor analysis. My question is: can I extract the residual/variance terms from CFA results and then recover or "predict" a new dataset containing scores for each observed variable (question)? Here's an example to make my question clear:

library(lavaan)

dat <- HolzingerSwineford1939

model <- '
visual =~ 1*x1 + x2 + x3
textual =~ 1*x4 + x5 + x6

visual ~~ visual + textual
textual ~~ textual

fe1 =~ 1*x1; fe1 ~~ fe1
fe2 =~ 1*x2; fe2 ~~ fe2
fe3 =~ 1*x3; fe3 ~~ fe3
fe4 =~ 1*x4; fe4 ~~ fe4
fe5 =~ 1*x5; fe5 ~~ fe5
fe6 =~ 1*x6; fe6 ~~ fe6
'

fit <- lavaan(model, data = dat)

In this case, can I use

residuals <- lavInspect(fit, what = "cov.lv")

to extract the variances of "fe1" to "fe6", then combine them with the factor loadings (or anything else) to obtain a "model-implied dataset", and finally calculate the differences between the raw dataset and the "model-implied dataset" to compile a so-called "residual dataset"? Or is this just nonsense?

I know I am trying to do something unusual, but can someone help me with this? I would hugely appreciate any help.

-Jinghui

Terrence Jorgensen

Sep 15, 2023, 5:32:19 AM

to lavaan

The criterion for choosing model parameters in OLS regression is minimizing (sum of squared) casewise residuals, but in SEM the criterion is minimizing residuals of summary statistics (covariance matrix: Sigma), not casewise/raw data.

Predicted / model-implied Sigma: lavInspect(fit, "cov.ov") or fitted(fit)
Observed Sigma(-hat) or S: lavInspect(fit, "sampstat")
Residuals = observed S minus predicted Sigma: resid(fit) or more information from lavResiduals(fit)
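
As a minimal sketch of how these three pieces fit together, the snippet below fits a small two-factor CFA to the built-in HolzingerSwineford1939 data (an illustrative model, not the one from the question) and compares a hand-computed residual matrix, observed minus model-implied, with what resid() returns. The object names sigma_hat, S, and manual_resid are made up for the example.

```r
library(lavaan)

# Two-factor CFA on the built-in Holzinger-Swineford data (illustrative model)
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
'
fit <- cfa(model, data = HolzingerSwineford1939)

sigma_hat <- lavInspect(fit, "cov.ov")        # model-implied covariance matrix
S         <- lavInspect(fit, "sampstat")$cov  # observed (sample) covariance matrix

# Hand-computed residual matrix: observed minus model-implied
manual_resid <- S - sigma_hat

# Compare with lavaan's own raw residuals of the covariance matrix
all.equal(unclass(manual_resid), unclass(resid(fit)$cov),
          check.attributes = FALSE)
```

Note that every object here is a 6 x 6 matrix of summary statistics; none of them has one row per case.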

You can use the lavPredictY() function to obtain casewise expected values, and you can manually calculate casewise residuals as the difference between the casewise expected values and your raw data. Jarrett Byrnes posted some example syntax in his feature request:

https://github.com/yrosseel/lavaan/issues/269
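
A rough sketch of that idea for continuous indicators, assuming a lavaan version recent enough to include lavPredictY(). This is not the syntax from the linked issue. The split into y-variables (x3, x6) and x-variables is an arbitrary choice for illustration, and meanstructure = TRUE is set as a precaution so that model-implied means are available.

```r
library(lavaan)

dat <- HolzingerSwineford1939

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
'
# meanstructure = TRUE so the fitted model includes means (a precaution)
fit <- cfa(model, data = dat, meanstructure = TRUE)

# Predict x3 and x6 for each case from the remaining indicators
# (this y/x split is hypothetical, chosen only for the example)
y_names <- c("x3", "x6")
x_names <- c("x1", "x2", "x4", "x5")
pred <- lavPredictY(fit, ynames = y_names, xnames = x_names)

# Casewise residuals: raw data minus casewise expected values
casewise_res <- as.matrix(dat[, y_names]) - as.matrix(pred)[, y_names]
summary(casewise_res)
```

The residuals obtained this way depend on which variables are treated as predictors, so they are not a unique "residual dataset" for the model.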

Terrence D. Jorgensen

Assistant Professor, Methods and Statistics

Research Institute for Child Development and Education, the University of Amsterdam

http://www.uva.nl/profile/t.d.jorgensen

Jinghui Liang

Oct 27, 2023, 1:06:56 PM

to lavaan

Hi Terrence,

Many thanks for your reply. I tried to post there, but for some reason the repo seems to have disappeared...

I tried the lavPredictY() function, but it didn't work for me because all my variables are categorical and ordinal (from a 5-point Likert scale). Instead, I came up with the following, which gets me close to a solution:

Let's say we have a data frame raw_dat containing nine observed variables x1 - x9, with each set of three variables measuring one of the latent variables F1 - F3. So the CFA model would be

cfa_model <- '
F1 =~ x1 + x2 + x3
F2 =~ x4 + x5 + x6
F3 =~ x7 + x8 + x9
'

To calculate the casewise residual data casewise_res, I did:

fit <- cfa(cfa_model,
           data = raw_dat,
           ordered = TRUE,
           std.lv = TRUE)

predict_dat <- lavPredict(fit, type = "ov")
std_dat <- as.data.frame(scale(raw_dat))

casewise_res <- std_dat - predict_dat

In this code, I standardized the scores for each variable (std_dat) because lavPredict() seems to return data in that form. I hope I am on the right track.

Subsequent work will start from casewise_res, since I would like to run some regressions on the residuals, as we would in OLS regression.

Does this make any sense to you? Or is calculating casewise residuals for categorical/ordinal data incorrect? Many thanks in advance for any help.

Cheers

-J

Terrence Jorgensen

Nov 1, 2023, 11:04:49 AM

to lavaan

all my variables were categorical

lavPredictY() does not apply to that. The model parameters are on the latent-response scale, not the data's observed discrete scale.
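
To illustrate what "latent-response scale" means here, the following is a small sketch on simulated data. The population model, sample size, seed, and cut points are all made up for the example. With ordered indicators, the estimated thresholds and the model-implied covariance matrix describe continuous latent responses underlying each item, not the observed 1-5 codes.

```r
library(lavaan)

# Simulate continuous data from a hypothetical two-factor population model
pop_model <- '
  F1 =~ 0.7*x1 + 0.7*x2 + 0.7*x3
  F2 =~ 0.7*x4 + 0.7*x5 + 0.7*x6
  F1 ~~ 0.3*F2
'
cont_dat <- simulateData(pop_model, sample.nobs = 500, seed = 123)

# Discretize every item into five ordered categories (arbitrary cut points)
cuts <- c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf)
ord_dat <- as.data.frame(lapply(cont_dat, function(x) {
  cut(x, breaks = cuts, labels = FALSE)
}))

cfa_model <- '
  F1 =~ x1 + x2 + x3
  F2 =~ x4 + x5 + x6
'
fit <- cfa(cfa_model, data = ord_dat, ordered = names(ord_dat), std.lv = TRUE)

# Thresholds: cut points on the latent-response scale, four per 5-category item
th <- lavInspect(fit, "th")

# Model-implied covariance matrix of the latent responses; under the default
# delta parameterization its diagonal is 1, regardless of the observed coding
implied <- lavInspect(fit, "cov.ov")
diag(implied)
```

Because the implied moments refer to these latent responses, subtracting predictions based on them from standardized integer codes mixes two different scales.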
