Latent scores from a CFA model.


David Giofrè

Aug 26, 2016, 4:05:49 PM
to lavaan

Hello everyone,

I need to calculate factor scores. I have tried many times, but I am a little confused about what the functions predict() and lavPredict() are actually doing.


Here is an example.

library(lavaan)

HS.model <- '
visual  =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed   =~ x7 + x8 + x9 '

fit <- cfa(HS.model, data = HolzingerSwineford1939)

summary(fit, standardized = TRUE)

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual ~~
    textual           0.408    0.074    5.552    0.000    0.459    0.459
    speed             0.262    0.056    4.660    0.000    0.471    0.471
  textual ~~
    speed             0.173    0.049    3.518    0.000    0.283    0.283

# The latent correlation between visual and textual is .459

# Now I tried to calculate latent scores

a <- data.frame(lavPredict(fit))

a <- data.frame(predict(fit))  # predict() on a lavaan object gives the same scores

Then, I correlated the visual and textual factors.

> cor(a$visual,a$textual)
[1] 0.5516272

 

# Now the correlation between visual and textual is .552, which is considerably different from the previous one (i.e., .459).

 

In fact, I do not understand why these two correlations are different from each other.


Any help would be greatly appreciated.


Best,

David

Terrence Jorgensen

Aug 29, 2016, 4:08:58 AM
to lavaan

> In fact, I do not understand why these two correlations are different from each other.


Factor scores are estimates based on estimates.  Conceptually, the regression equations from the CFA are reversed to have the outcomes (indicators) predict the explanatory variables (common factors).  I think factor indeterminacy (distinct but related to rotational indeterminacy) prevents factor scores from having exactly the same summary statistics as the CFA parameter estimates.  You can find a lot of discussion about factor scores on SEMNET, and Ed Rigdon's posts (and articles he has written) provide discussion about factor indeterminacy in particular.
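A minimal base-R sketch of the reversed-regression idea, computed at the population level for a hypothetical two-factor model (the loadings and residual variances are made up; only the latent correlation of .459 echoes David's output). The regression-method weights are A = Psi Lambda' Sigma^-1, so the scores have covariance A Sigma A' rather than Psi: their variances shrink below the factor variances, and in this example their correlation is inflated, as in David's comparison. The last step computes factor determinacy, the correlation between each factor and its own score.

```r
# Hypothetical model: 2 correlated factors, 3 indicators each (all values assumed)
lambda <- matrix(c(0.8, 0.7, 0.6, 0,   0,   0,
                   0,   0,   0,   0.8, 0.7, 0.6), ncol = 2)
psi    <- matrix(c(1, 0.459, 0.459, 1), 2, 2)  # latent covariance (= correlation)
theta  <- diag(1 - rowSums(lambda^2))         # residual variances (standardized items)

# Model-implied covariance of the indicators
Sigma <- lambda %*% psi %*% t(lambda) + theta

# Regression-method score weights: A = Psi Lambda' Sigma^-1
A <- psi %*% t(lambda) %*% solve(Sigma)

# Covariance matrix of the factor scores: A Sigma A' (not equal to psi)
fs_cov <- A %*% Sigma %*% t(A)
fs_cor <- cov2cor(fs_cov)

# Factor determinacy: correlation between each factor and its regression score
determinacy <- sqrt(diag(fs_cov) / diag(psi))

print(round(diag(fs_cov), 3))  # score variances, below the factor variances of 1
print(round(fs_cor[1, 2], 3))  # score correlation, compare with latent .459
print(round(determinacy, 3))
```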


Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Edward Rigdon

Aug 29, 2016, 9:20:10 AM
to lav...@googlegroups.com
Thanks for the shout-out.
When you actually solve for the common factors (or the specific factors / residual variances)--papers by Guttman or Schönemann & Steiger are among the most readable--there is an extra term that appears, but this term is not included in the formulas that most packages use when providing factor scores. This extra term is arbitrary, apart from having the right variance, but it is constrained to be orthogonal to all variables within the factor model. You could use white noise, in the right amount, and you would get "A" set of factor scores that performed exactly like the factors in the model. (This is a trivial problem compared to the outside-the-model problems with factor scores--those are "yuge.") But if you use Bartlett's method, the regression method, or any other variant that ignores the extra term, then your factor scores are missing a variance component--they are a bit too purified, like a single imputation approach for modeling missing data.
I believe the LISREL package now generates scores in such a way as to account for this extra variance component, so their scores will not show this specific problem. It would not be hard to modify lavaan to provide scores that include the extra variance component, at least as an option. But then I would want the software to warn that this set of factor scores is only one of an infinite variety of equally correct factor scores. Or you can correct the factor scores from lavaan by adding the extra variance component yourself. This really is an easy fix.
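Rigdon's "easy fix" can be sketched in base R. Assumptions: a hypothetical two-factor model with made-up parameters treated as known (in practice they would be estimates), simulated data, and regression-method scores. Independent normal noise with covariance Psi - Cov(scores), which equals (Psi^-1 + Lambda' Theta^-1 Lambda)^-1, is added to the scores. Because the noise is unrelated to every indicator, the result is just one of the infinitely many equally valid score sets whose covariance reproduces Psi.

```r
set.seed(1)
n <- 1e5

# Hypothetical population model (all values assumed for illustration)
lambda <- matrix(c(0.8, 0.7, 0.6, 0,   0,   0,
                   0,   0,   0,   0.8, 0.7, 0.6), ncol = 2)
psi    <- matrix(c(1, 0.459, 0.459, 1), 2, 2)
theta  <- diag(1 - rowSums(lambda^2))
Sigma  <- lambda %*% psi %*% t(lambda) + theta

# Simulate factors and indicators: X = F Lambda' + E
F_true <- matrix(rnorm(n * 2), n, 2) %*% chol(psi)
X      <- F_true %*% t(lambda) + matrix(rnorm(n * 6), n, 6) %*% sqrt(theta)

# Regression-method scores, using weights from the (known) model
A  <- psi %*% t(lambda) %*% solve(Sigma)
fs <- X %*% t(A)

# Missing variance component: Psi - A Sigma A'
resid_cov <- psi - A %*% Sigma %*% t(A)
resid_cov <- (resid_cov + t(resid_cov)) / 2   # guard against rounding asymmetry

# White noise "in the right amount", independent of all indicators
noise      <- matrix(rnorm(n * 2), n, 2) %*% chol(resid_cov)
fs_guttman <- fs + noise

print(round(cov(fs), 3))          # shrunken variances, inflated covariance
print(round(cov(fs_guttman), 3))  # close to psi
```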

Guttman, L. (1955). The determinacy of factor score matrices with implications for five other basic problems of common-factor theory. British Journal of Statistical Psychology, 8, 65-81.

Schönemann, P. H., & Steiger, J. H. (1976). Regression component analysis. British Journal of Mathematical and Statistical Psychology, 29, 175-189.

Schönemann, P. H., & Steiger, J. H. (1978). On the validity of indeterminate factor scores. Bulletin of the Psychonomic Society, 12, 287-290.




--
You received this message because you are subscribed to the Google Groups "lavaan" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+unsubscribe@googlegroups.com.
To post to this group, send email to lav...@googlegroups.com.
Visit this group at https://groups.google.com/group/lavaan.
For more options, visit https://groups.google.com/d/optout.

Yves Rosseel

Aug 29, 2016, 9:54:07 AM
to lav...@googlegroups.com
There are many ways to compute factor scores. I believe LISREL (at least
the LISREL-8 family) uses an adaptation of the Anderson & Rubin (1956)
method, where the factor scores are shifted and rotated in such a way
that their empirical covariance matrix (and mean vector) corresponds
to the estimated factor covariances (and means) as they are implied by
the SEM. That does not make them 'better', or more useful. It is perhaps
a matter of taste.

(See: http://www.ssicentral.com/lisrel/techdocs/lvscores.pdf)
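The general idea of shifting and transforming scores until their empirical means and covariance reproduce the model-implied ones can be sketched with a Cholesky-based linear transformation in base R. This illustrates the principle only and is not LISREL's documented algorithm; the target Psi and the stand-in "scores" are made-up values.

```r
set.seed(2)

# Target: model-implied factor covariance (hypothetical values)
psi <- matrix(c(1, 0.459, 0.459, 1), 2, 2)

# Stand-in for factor scores with the "wrong" covariance (hypothetical)
fs <- matrix(rnorm(500 * 2), 500, 2) %*%
  chol(matrix(c(0.78, 0.43, 0.43, 0.78), 2, 2))

# 1. Shift: center the scores so their means equal the model-implied means (0 here)
fs_c <- scale(fs, center = TRUE, scale = FALSE)

# 2. Transform: whiten with the empirical Cholesky factor, then impose psi
U_emp   <- chol(cov(fs_c))   # cov(fs_c) = t(U_emp) %*% U_emp
U_psi   <- chol(psi)         # psi       = t(U_psi) %*% U_psi
fs_star <- fs_c %*% solve(U_emp) %*% U_psi

print(round(cov(fs_star), 6))  # equals psi up to rounding error
```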

An alternative approach to deal with factor scores is to 'correct' them
using a method described by Marcel Croon, so that their empirical
covariance matrix approaches the population covariance matrix of the
latent variables. From here, we can compute unbiased regression
coefficients, correlations, and so on. To compute a proper standard
error for these regression coefficients, we need to take the uncertainty
(as a function of the residual item variances) into account. A first
paper on this is now available:

http://epm.sagepub.com/content/early/2015/09/29/0013164415607618.abstract

An extension to the full SEM framework is under way.
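The matrix form of Croon's correction follows from decomposing the covariance of scores fs = A X: Cov(fs) = (A Lambda) Psi (A Lambda)' + A Theta A'. Removing the error part and undoing the A Lambda mixing gives Psi. The base-R sketch below checks this at the population level for a hypothetical two-factor model with regression-method weights (all parameter values are assumptions). In practice A, Lambda, and Theta are estimates, and the standard-error adjustment mentioned above is not covered here.

```r
# Hypothetical measurement model (all values assumed)
lambda <- matrix(c(0.8, 0.7, 0.6, 0,   0,   0,
                   0,   0,   0,   0.8, 0.7, 0.6), ncol = 2)
psi    <- matrix(c(1, 0.459, 0.459, 1), 2, 2)
theta  <- diag(1 - rowSums(lambda^2))
Sigma  <- lambda %*% psi %*% t(lambda) + theta

# Regression-method weight matrix and the (naive) factor-score covariance
A      <- psi %*% t(lambda) %*% solve(Sigma)
fs_cov <- A %*% Sigma %*% t(A)

# Croon-type correction: remove the error part A Theta A',
# then undo the A Lambda mixing on both sides
AL      <- A %*% lambda
psi_hat <- solve(AL) %*% (fs_cov - A %*% theta %*% t(A)) %*% solve(t(AL))

print(round(cov2cor(fs_cov)[1, 2], 3))   # naive score correlation
print(round(cov2cor(psi_hat)[1, 2], 3))  # corrected correlation, matches psi[1, 2]
```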

Yves.

David Giofrè

Aug 29, 2016, 2:05:24 PM
to lavaan
Thank you very much. I calculated them with LISREL.

All the best,
David

Matthias Vorberg

Nov 16, 2020, 9:56:48 AM
to lavaan
Dear Yves, dear all,

I would like to use bias-corrected factor scores in a subsequent analysis. (The fsr() function does not seem to be an option for me, as I have to combine factor scores and raw data in the model.)
Can I use lavPredict() to apply Croon's correction in some way, so that I can add bias-corrected factor scores to my data set?
If so, how should I modify the code? (I used the HolzingerSwineford1939 data to test the function before using my own data.)

library(lavaan)

HS.model <- '
visual  =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed   =~ x7 + x8 + x9 '

fit <- cfa(HS.model, data = HolzingerSwineford1939)

# show factor scores
head(lavPredict(fit, type = "lv"))

# add factor scores to data set
fitPredict <- as.data.frame(predict(fit))
dataUpdated <- cbind(HolzingerSwineford1939, fitPredict)

Matthias Vorberg

Nov 27, 2020, 10:18:15 AM
to lavaan

Since I have not received an answer so far, let me restate the question more concisely:

(How) Can I apply the correction by Croon (2002) to get bias-corrected factor scores for a subsequent analysis?


Best regards,
Matthias

Terrence Jorgensen

Nov 28, 2020, 3:07:22 PM
to lavaan

> (How) Can I apply the correction by Croon (2002) to get bias-corrected factor scores for a subsequent analysis?

Yves has implemented the Croon correction in the experimental (still hidden) function fsr(), but it has not been implemented for categorical indicators.  


The Croon correction requires the covariance matrix of factor scores, which is currently only available for continuous indicators; see ?lavPredict

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Ale Dea

May 2, 2022, 12:35:07 PM
to lavaan
Hi Terrence,

I was searching the forum for a similar problem and came across this post. May I ask whether fsr() has been implemented for categorical data yet?

Many thanks

Terrence Jorgensen

May 4, 2022, 12:16:12 PM
to lavaan
> May I please ask you if fsr has been implemented for categorical data?

No, and it won't be, because fsr() has been abandoned in favor of sam(); read the reference on that help page. SAM has not yet been implemented for categorical data.

Yves Rosseel

May 18, 2022, 6:39:20 AM
to lav...@googlegroups.com
The fsr() function is indeed replaced by the sam() function. If you add
the ordered= argument in the sam() function, you can fit the measurement
models with categorical indicators.

We ran a simulation study to see how well this works, and the results
were very positive. However, this work has not been published yet (and
has not even been written up). So in that sense, there is no 'official'
support of categorical data yet.

Yves.


Jungyeong Heo

Jul 5, 2022, 11:36:39 AM
to lavaan
Hello, everyone!

I am new to factor score regression with Croon's correction, which I want to use to build my mediation models.
I am recently going through the discussion about factor score regression with Croon's correction in this group, and I have learned a lot. Thank you all! 

I have a follow-up question to Dr. Rosseel's answer (5/18/2022). Since my indicator variables are all ordinal (Likert-scale items), I tried adding the ordered= argument to the sam() function, as he suggested in his last comment. However, I got an error message indicating, I suppose, that my variables need to be numeric.

## sam function code ##
fit.sam1 <- sam(RP01, data = Diss02, ordered= c("perf_1sop", "perf_10sop", "perf_27sop", "perf_29sop", "perf_42sop","perf_16swc", "perf_20swc", "perf_23swc", "perf_38swc",  "perf_40swc", "SDc_1", "SDc_2", "SDc_3", "SDc_4", "SDc_5", "SDp_1", "SDp_2", "SDp_3", "SDp_4", "SDp_5", "SDf_1", "SDf_2", "SDf_3", "SDf_4", "SDf_5", "SB_1a", "SB_2a", "SB_3r", "SB_4r", "SB_5a", "SB_6r", "SB_7r", "SB_8a", "SB_9r", "SB_10a", "SB_11a", "SB_12r", "SCS_1", "SCS_2", "SCS_3", "SCS_4", "SCS_5", "SCS_6", "SCS_7", "SCS_8", "SCS_9", "SCS_10", "CESD_1", "CESD_2", "CESD_3", "CESD_4r",
"CESD_5", "CESD_6", "CESD_7", "CESD_8r", "CESD_9", "CESD_10", "CESD_11","CESD_12r","CESD_13","CESD_14","CESD_15",
"CESD_16r","CESD_17","CESD_18","CESD_19","CESD_20"), orthogonal = TRUE)


## error message ##
Warning in lav_partable_check(lavpartable, categorical = lavoptions$categorical,  :
  lavaan WARNING: parameter table does not contain thresholds
Warning in lav_partable_check(lavpartable, categorical = lavoptions$categorical,  :
  lavaan WARNING: parameter table does not contain thresholds
Error in base::.colMeans(Y, m = NROW(Y), n = NCOL(Y)) :
  'x' must be numeric


I am wondering whether my code is correct. If the sam() function does not support ordinal indicators, is there any other method (package, function, or calculator) with which I can conduct factor score regression using Croon's correction?
I looked at the formula for Croon's method, but I am not able to calculate Croon's corrected variance-covariance matrix by hand. I would be very grateful if you could share your wisdom with me!

Thank you!!! 



On Wednesday, May 18, 2022 at 6:39:20 AM UTC-4, yros...@gmail.com wrote:

Jungyeong Heo

Jul 7, 2022, 3:00:01 PM
to lavaan
I want to share Dr. Rosseel's response with everyone here.

Errors like the one I posted above can occur with a small dataset containing ordered variables.
You can work around the bug in the sam() function by setting se = "none"; however, you then get only partial results.

In my case, the sample size is 346. With se = "none", I was only able to get the parameter estimates.
Thank you so much for your response, Dr. Rosseel again :) 

------ follow-up question 
I have some follow-up questions about making factor score path analyses work.

It seems that Croon's correction with ordinal variables does not yet provide complete results.
Also, I found it hard to use Croon's correction for factor score path analyses when the measurement models are somewhat complex, for example, bi-factor models or models with cross-loading indicators. I also tried treating my ordinal variables as continuous (since they are measured on 4- to 6-point Likert scales) so that I could use the sam() function, but the model fit was horrible, presumably because of cross-loading indicators and unused factors in the model.

All factor score determinacies of my variables (which are coded as ordinal) are > .80, so they seem pretty good.
In this case, would it be okay to use the factor scores extracted from lavaan for path analyses without factor score correction?
Or is there any other simple way to correct factor scores (although it may not be as effective as Croon's) to take factor score uncertainty into account?

Thank you so much in advance. 

Sincerely,
Jun

On Tuesday, July 5, 2022 at 11:36:39 AM UTC-4, Jungyeong Heo wrote:

Wen L.

Oct 4, 2022, 1:06:57 AM
to lavaan
> I also tried to treat my ordinal variables as continuous (since it is 4-6 Likert scale) to use sam()function, but the model fit was horrible because of (I assumed) cross-loading indicators and unused factors in the model.

Have you considered using measurement blocks with more than one factor in each block, and specifying the cross-loadings in the measurement model (for each block)?

> In this case, would it be okay to use the factor scores extracted from lavaan for path analyses without factor score correction?

The path coefficient estimates are biased if you simply use the factor scores in place of the latent factors.
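A small base-R simulation makes this concrete. Assumptions: a hypothetical two-factor model in which factor 2 is regressed on factor 1 with a true slope of .459, made-up loadings, measurement parameters treated as known, and regression-method scores. Each factor's determinacy is about .88, above the .80 mentioned in the question, yet the slope from regressing one set of scores on the other is noticeably too large, while the same regression on the true factors recovers .459.

```r
set.seed(3)
n <- 1e5

# Hypothetical model: f2 = beta * f1 + disturbance (all values assumed)
lambda <- matrix(c(0.8, 0.7, 0.6, 0,   0,   0,
                   0,   0,   0,   0.8, 0.7, 0.6), ncol = 2)
beta   <- 0.459
psi    <- matrix(c(1, beta, beta, 1), 2, 2)   # Var(f1) = Var(f2) = 1
theta  <- diag(1 - rowSums(lambda^2))
Sigma  <- lambda %*% psi %*% t(lambda) + theta

# Simulate factors and indicators
f1     <- rnorm(n)
f2     <- beta * f1 + rnorm(n, sd = sqrt(1 - beta^2))
F_true <- cbind(f1, f2)
X      <- F_true %*% t(lambda) + matrix(rnorm(n * 6), n, 6) %*% sqrt(theta)

# Regression-method scores using the (here known) measurement parameters
A  <- psi %*% t(lambda) %*% solve(Sigma)
fs <- X %*% t(A)

# Empirical determinacy: correlation of each factor with its own score
determinacy <- diag(cor(F_true, fs))

# Path analysis on the scores vs. on the true factors
b_scores <- unname(coef(lm(fs[, 2] ~ fs[, 1]))[2])
b_true   <- unname(coef(lm(f2 ~ f1))[2])

print(round(determinacy, 3))
print(round(c(true_factors = b_true, factor_scores = b_scores), 3))
```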

> Or is there any other simple way to correct factor scores (although it may not as effective as Croon's) to take factor score uncertainty into account? 

IMO, sam() is probably the most straightforward. An approach that allows for categorical indicators is described in the paper below:

Lai, M. H. C., & Hsiao, Y.-Y. (2022). Two-stage path analysis with definition variables: An alternative framework to account for measurement error. Psychological Methods, 27(4), 568–588. https://doi.org/10.1037/met0000410


Wen Wei Loh
Assistant Professor | Department of Quantitative Theory and Methods, Emory University

dirkp...@gmail.com

Nov 10, 2023, 6:03:59 AM
to lavaan
Dear all,

This thread has been very helpful, as I am also trying to get factor scores with Croon's correction. My question is how this works in practice. I am able to fit my model using the SAM approach like this:

fit_sam <- sam(wave8_model, data = train_data, missing = 'ML', sam.method = "local", output = "lavaan")

But how do I then proceed to actually extract the factor scores for use in subsequent analyses?

Would I just do the following?

fscores <- lavPredict(fit_sam, method = 'regression', type = 'lv')

Help is much appreciated!

Best,
Dirk

Terrence Jorgensen

Nov 15, 2023, 9:54:32 AM
to lavaan
> how will I then proceed with actually extracting the factor scores

If you read the SAM paper (cited in the help-page references), you can learn that factor scores are not estimated in SAM.  The idea is to fit the "subsequent" model right away, so you can actually trust your SEs.
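A rough base-R sketch of why no scores are needed, following the local-SAM idea: with measurement estimates Lambda and Theta from step 1, a mapping matrix M turns the observed indicator covariance S into an estimate of the latent covariance, M (S - Theta) M', and the structural model is fit to that matrix. Assumptions: a Bartlett-type M, Lambda and Theta treated as known stand-ins for step-1 estimates, simulated data from a hypothetical model with a true slope of .459, and no standard errors. The real sam() function also estimates the measurement part and provides standard errors.

```r
set.seed(4)
n <- 1e5

# Hypothetical measurement model (stand-in for step-1 estimates; values assumed)
lambda <- matrix(c(0.8, 0.7, 0.6, 0,   0,   0,
                   0,   0,   0,   0.8, 0.7, 0.6), ncol = 2)
theta  <- diag(1 - rowSums(lambda^2))
beta   <- 0.459

# Simulate data from the structural model f2 = beta * f1 + zeta
f1 <- rnorm(n)
f2 <- beta * f1 + rnorm(n, sd = sqrt(1 - beta^2))
X  <- cbind(f1, f2) %*% t(lambda) + matrix(rnorm(n * 6), n, 6) %*% sqrt(theta)
S  <- cov(X)

# Bartlett-type mapping matrix: M = (Lambda' Theta^-1 Lambda)^-1 Lambda' Theta^-1
Ti <- solve(theta)
M  <- solve(t(lambda) %*% Ti %*% lambda) %*% t(lambda) %*% Ti

# Latent covariance implied by the data, with measurement error removed
veta <- M %*% (S - theta) %*% t(M)

# Structural step: slope of f2 on f1 from the latent covariance matrix
b_sam <- veta[2, 1] / veta[1, 1]
print(round(b_sam, 3))
```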