R-squared attenuation correction in factor score regression


Rita Pereira

Nov 7, 2025, 4:53:06 PM
to lavaan
Hello! 

I'm trying to use SEM to predict latent variables, which I would like to use in a standard OLS regression.
The goal is to use two noisy proxies for each latent variable, as a measurement error correction technique. The proxies are PGIs and the latent variable is the true genetic effect of the individual, mom and dad. I am mostly interested in obtaining the correct R-squared.

For simplicity I am tackling the case of only one PGI - the individual. I have tried predicting the latent variable after a SEM with the outcome.

outcome <- PGI_latent
PGI_latent <- pgi_1 pgi_2 (R-squared of outcome ~24% )

predict PGI_latent
reg outcome PGI_latent (result: overfitting of R-squared ~30% ) 


The other alternative is to fit the measurement model with only the PGIs (no outcome) and then predict the latent variable. This avoids overfitting; however, regardless of the correction I apply (Bartlett or Croon's), I consistently get lower R-squared values of about 16%. (The coefficients are unbiased compared with the SEM.)

PGI_latent <- pgi_1 pgi_2 

predict PGI_latent (regression, bartlett, croon)
reg outcome PGI_latent (result: R-squared ~16% ) 

Is there any way to correct both the coefficients and the R-squared in factor score regression? In the simple case and in the more complex case with 3 latent variables? 

Thank you so much :) 
Rita

Below is my code:

  library(lavaan)

  # SEM: full model (measurement part plus outcome)
  model_full <- '
    # measurement part
    PGI_latent =~ pgs_ea_child_ukb_std + pgs_ea_child_23me_std

    # structural part
    ks4_avg ~ PGI_latent
  '
  fit_full <- sem(model_full, data = data_alspac_R)
  summary(fit_full, fit.measures = TRUE, standardized = FALSE, rsquare = TRUE)
  # R-squared: 23%

  # SEM: measurement model only (no outcome)
  model_latent_only <- '
    PGI_latent2 =~ pgs_ea_child_ukb_std + pgs_ea_child_23me_std
  '
  fit_latent_only <- sem(model_latent_only, data = data_alspac_R)
  summary(fit_latent_only, fit.measures = TRUE, standardized = FALSE, rsquare = TRUE)

  # lavPredict with its default method
  latent_scores_2 <- lavPredict(fit_latent_only)
  lm_2 <- lm(data_alspac_R$ks4_avg ~ latent_scores_2)
  summary(lm_2)
  # R-squared: 16.41%

  # regression method
  latent_scores_2reg <- lavPredict(fit_latent_only, method = "regression")
  lm_2reg <- lm(data_alspac_R$ks4_avg ~ latent_scores_2reg)
  summary(lm_2reg)
  # R-squared: 16.41%

  # Bartlett method
  latent_scores_2b <- lavPredict(fit_latent_only, method = "Bartlett")
  lm_2b <- lm(data_alspac_R$ks4_avg ~ latent_scores_2b)
  summary(lm_2b)
  # R-squared: 16.41%

  # Croon's correction
  # method 1
  fit_sam <- lavaan:::sam(model_latent_only, data = data_alspac_R, sam.method = "local")
  summary(fit_sam, standardized = TRUE)
  latent_scores_croon <- lavPredict(fit_sam, method = "regression")
  head(latent_scores_croon)
  data_alspac_R$PGI_latent2_croon <- latent_scores_croon[, "PGI_latent2"]
  lm_2croon <- lm(ks4_avg ~ PGI_latent2_croon, data = data_alspac_R)
  summary(lm_2croon)
  # R-squared: 16.55%
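
The drop from roughly 24% to roughly 16% is the pattern one would expect from ordinary attenuation: a factor score is itself a noisy proxy for the latent variable, so the OLS R-squared shrinks by the squared correlation between score and factor. The following is a minimal base-R sketch of that mechanism, not the ALSPAC analysis. Every name and parameter value in it (`eta`, `pgi_1`, `pgi_2`, `y`, the loadings of 1, the error variances) is a hypothetical choice made for illustration.

```r
# Hypothetical simulation: one latent variable, two noisy proxies, one outcome.
set.seed(1)
n   <- 1e5
eta <- rnorm(n)                              # true latent score (known only in a simulation)
pgi_1 <- eta + rnorm(n)                      # noisy proxy 1 (assumed loading 1, error variance 1)
pgi_2 <- eta + rnorm(n)                      # noisy proxy 2 (same assumptions)
y <- 0.5 * eta + rnorm(n, sd = sqrt(0.75))   # population R-squared of y on eta is 0.25

# With equal loadings and equal error variances, regression and Bartlett
# factor scores are both proportional to the simple sum of the proxies.
score <- pgi_1 + pgi_2

r2_true  <- summary(lm(y ~ eta))$r.squared    # regression on the true latent score
r2_score <- summary(lm(y ~ score))$r.squared  # regression on the factor-score proxy
rel      <- cor(score, eta)^2                 # squared score-factor correlation (2/3 in the population)
r2_corrected <- r2_score / rel                # disattenuated R-squared

round(c(true = r2_true, score = r2_score,
        reliability = rel, corrected = r2_corrected), 3)
```

In real data `eta` is unobserved, so `rel` cannot be computed this way and would have to be estimated from the fitted measurement model.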




Rita Pereira

Nov 10, 2025, 6:53:31 AM
to lavaan
For clarification, I am running a simulation where I know the true latent effect. The R-squared of the latent component is 24% (first regression). I cannot seem to retrieve factor scores that give the same R-squared with existing measurement error corrections. Thank you! 
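
For reference, the size of the gap can be worked out under a simple set of assumptions: a single latent variable $\eta$, an outcome $y$, and a factor score $\hat f$ that is a linear combination of the proxies. The symbols below ($\beta$, $\varepsilon$, $\lambda$, $\delta$, $w$, $\rho$) are notation introduced here, not taken from the thread, and the result relies on $\varepsilon$ and $\delta$ being uncorrelated with $\eta$ and with each other.

```latex
\begin{align}
y &= \beta\,\eta + \varepsilon, \qquad
x = \lambda\,\eta + \delta, \qquad
\hat f = w^{\top} x \\
\operatorname{Cov}(y, \hat f) &= \beta\,\operatorname{Cov}(\eta, \hat f) \\
R^2_{\mathrm{FS}} = \operatorname{Cor}(y, \hat f)^2
  &= \operatorname{Cor}(y, \eta)^2 \,\operatorname{Cor}(\eta, \hat f)^2
   = R^2_{\eta}\,\rho^2
\end{align}
```

Here $\rho^2$ is the squared correlation between the factor score and the factor. If attenuation of this kind were the only thing at work, the figures reported in this thread would imply $\rho^2 \approx 0.1641 / 0.24 \approx 0.68$, and dividing the factor-score R-squared by an estimate of $\rho^2$ would recover the latent R-squared.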

Edward Rigdon

Nov 10, 2025, 7:32:00 AM
to lav...@googlegroups.com
"Factor scores" will not reproduce the factor covariance matrix without some correction. That is because what are known as "factor scores" are generally only the predicted part of the common factors, leaving out the random part. See for example Rigdon, Becker, Sarstedt (2019):
Rigdon, E. E., Becker, J. M., & Sarstedt, M. (2019). Factor indeterminacy as metrological uncertainty: Implications for advancing psychological measurement. Multivariate Behavioral Research, 54(3), 429-443.
There are a variety of corrections which will give you adjusted "factor scores" that will mimic the factor covariance matrix. Or you could take your current set of scores and apply a covariance matrix transplant, replacing the incorrect covariance matrix of the existing scores with the covariance matrix from your lavaan output. I can share some basic code, if this is the way that you want to go.


Rita Pereira

Nov 10, 2025, 8:53:01 AM
to lavaan
Dear Edward, 

Thank you so much for your kind response. It's great to know that there might be a possibility of using factor scores in my setting with unbiased R-squares! 

I would be interested in correcting the scores themselves, so that I have one variable that I can use subsequently, rather than applying a measurement error correction to the R-squared, which I think would become rather complex in my final model.

I believe that your suggestion, of replacing the covariance structure, would correct my predicted latent variable, and if so it would be a great help if you don't mind sharing the code! If I am interpreting your suggestion wrong, and this would be an R-squared correction, I would be keen to know whether other options exist that correct the predicted latent variable. 

Many thanks again for your response. 
Best wishes, 
Rita


Edward Rigdon

Nov 10, 2025, 12:32:05 PM
to lav...@googlegroups.com
Rita--
     (Catching my breath) If you are getting factor scores from lavPredict, just add the argument transform = TRUE to the function call and you will get scores that match the factor covariance matrix. That is the simplest way to do it.

Rita Pereira

8:07 AM
to lavaan
Hi Edward, 

Many thanks for your suggestion. Unfortunately, adding transform = TRUE in lavPredict did not increase the R-squared in any of my specifications (not even by a decimal). Let me know if you have any other ideas. I have pasted my updated code below.

Many thanks, 
Rita


  # SEM: measurement model only (no outcome)
  model_latent_only <- '
    PGI_latent2 =~ pgs_ea_child_ukb_std + pgs_ea_child_23me_std
  '
  fit_latent_only <- sem(model_latent_only, data = data_alspac_R)
  summary(fit_latent_only, fit.measures = TRUE, standardized = FALSE, rsquare = TRUE)

  # lavPredict with its default method
  latent_scores_2 <- lavPredict(fit_latent_only, transform = TRUE)
  lm_2 <- lm(data_alspac_R$ks4_avg ~ latent_scores_2)
  summary(lm_2)
  # R-squared: 16.41% (stayed the same)

  # regression method
  latent_scores_2reg <- lavPredict(fit_latent_only, method = "regression", transform = TRUE)
  lm_2reg <- lm(data_alspac_R$ks4_avg ~ latent_scores_2reg)
  summary(lm_2reg)
  # R-squared: 16.41% (stayed the same)

  # Bartlett method
  latent_scores_2b <- lavPredict(fit_latent_only, method = "Bartlett", transform = TRUE)
  lm_2b <- lm(data_alspac_R$ks4_avg ~ latent_scores_2b)
  summary(lm_2b)
  # R-squared: 16.41% (stayed the same)

  # Croon's correction
  # method 1
  fit_sam <- lavaan:::sam(model_latent_only, data = data_alspac_R, sam.method = "local")
  summary(fit_sam, standardized = TRUE)
  latent_scores_croon <- lavPredict(fit_sam, method = "regression", transform = TRUE)
  head(latent_scores_croon)
  data_alspac_R$PGI_latent2_croon <- latent_scores_croon[, "PGI_latent2"]
  lm_2croon <- lm(ks4_avg ~ PGI_latent2_croon, data = data_alspac_R)
  summary(lm_2croon)
  # R-squared: 16.55% (stayed the same)
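
One possible reading of the identical R-squared values: in a one-factor model, regression and Bartlett scores are the same weighted sum of the indicators up to a multiplicative constant, and matching the score variance to the factor variance is again only a rescaling. A one-predictor OLS R-squared is invariant to rescaling the predictor, so none of these choices can move it. The base-R sketch below illustrates this with hypothetical population parameters (`lambda`, `theta`, `phi`) and simulated data. It does not call lavaan, and the variance-matching line is a stand-in for what a covariance-matching transform would do in this special case, not lavaan's implementation.

```r
# Hypothetical one-factor setup with known (assumed) parameters.
set.seed(2)
n      <- 5000
lambda <- c(1.0, 0.8)          # assumed loadings
theta  <- diag(c(0.6, 0.9))    # assumed error variances
phi    <- 1                    # assumed factor variance

eta <- rnorm(n, sd = sqrt(phi))
X   <- cbind(lambda[1] * eta + rnorm(n, sd = sqrt(0.6)),
             lambda[2] * eta + rnorm(n, sd = sqrt(0.9)))
y   <- 0.5 * eta + rnorm(n)

# Regression-method weights: Sigma^{-1} %*% lambda * phi
Sigma <- phi * tcrossprod(lambda) + theta
w_reg <- solve(Sigma, phi * lambda)

# Bartlett weights: Theta^{-1} lambda / (lambda' Theta^{-1} lambda)
th_inv_l <- solve(theta, lambda)
w_bart   <- th_inv_l / sum(lambda * th_inv_l)

fs_reg      <- drop(X %*% w_reg)
fs_bart     <- drop(X %*% w_bart)
fs_rescaled <- fs_reg * sqrt(phi) / sd(fs_reg)   # variance matched to phi

r2 <- function(f) summary(lm(y ~ f))$r.squared
c(regression = r2(fs_reg), bartlett = r2(fs_bart), rescaled = r2(fs_rescaled))
cor(fs_reg, fs_bart)   # the two score types are perfectly correlated
```

With several correlated latent predictors in the same regression, the score types are no longer simple rescalings of one another, so this invariance should not be expected to carry over to the three-factor case.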

