Hello!
I'm trying to use SEM to predict latent variables, which I would like to use in a standard OLS regression.
The goal is to use two noisy proxies for each latent variable, as a measurement error correction technique. The proxies are PGIs and the latent variables are the true genetic effects of the individual, mom and dad. I am mostly interested in obtaining the correct R-squared.
For simplicity, I am tackling the case of only one PGI, the individual's. First, I tried predicting the latent variable after fitting a SEM that includes the outcome:
outcome <- PGI_latent
PGI_latent <- pgi_1 pgi_2 (R-squared of outcome ~24% )
predict PGI_latent
reg outcome PGI_latent (result: overfitting of R-squared ~30% )
The other alternative is to use only the PGIs to estimate the latent variable, and then regress the outcome on the predicted scores. This avoids the overfitting; however, regardless of the correction I apply (Bartlett or Croon's), I consistently get a lower R-squared of about 16%. (The coefficients are unbiased compared with the SEM.)
PGI_latent <- pgi_1 pgi_2
predict PGI_latent (regression, bartlett, croon)
reg outcome PGI_latent (result: R-squared ~16% )
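A note on why the two R-squared values might differ, as a sketch under assumptions that are not stated above: a single factor, a factor score that is a linear combination of the two PGIs, and measurement errors uncorrelated with both the latent variable and the outcome residual. The symbols $\eta$ (latent PGI), $\hat\eta$ (factor score), $y$ (outcome), $x_j$ (observed PGIs), $w_j$, $\lambda_j$, $\varepsilon_j$ and $\zeta$ are notation introduced here for illustration only.

```latex
\begin{aligned}
x_j &= \lambda_j \eta + \varepsilon_j \quad (j = 1, 2), \qquad
y = \beta \eta + \zeta, \qquad
\hat\eta = w_1 x_1 + w_2 x_2, \\
\operatorname{Cov}(y, \hat\eta) &= \beta \operatorname{Cov}(\eta, \hat\eta), \\
R^2_{y \sim \hat\eta} = \operatorname{Corr}(y, \hat\eta)^2
  &= \frac{\beta^2 \operatorname{Var}(\eta)}{\operatorname{Var}(y)}
     \cdot \frac{\operatorname{Cov}(\eta, \hat\eta)^2}{\operatorname{Var}(\eta)\operatorname{Var}(\hat\eta)}
   = R^2_{y \sim \eta} \cdot \operatorname{Corr}(\eta, \hat\eta)^2 .
\end{aligned}
```

If these assumptions hold, the factor-score R-squared is the latent R-squared scaled down by the squared correlation between the score and the factor. Rescaling the score does not change that correlation, and with a single factor the regression and Bartlett weights are proportional to each other, which would be consistent with getting the same R-squared from both methods. Taking the figures above at face value, 0.16 / 0.24 would imply a squared correlation of roughly 0.67.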
Is there any way to correct both the coefficients and the R-squared in factor score regression? In the simple case and in the more complex case with 3 latent variables?
Thank you so much :)
Rita
My code is below:
# SEM full model
model_full <- '
# measurement part
PGI_latent =~ pgs_ea_child_ukb_std + pgs_ea_child_23me_std
# structural part
ks4_avg ~ PGI_latent
'
fit_full <- sem(model_full, data = data_alspac_R)
summary(fit_full, fit.measures = TRUE, standardized = FALSE, rsquare = TRUE)
# R-squared 23
#Sem no outcome
model_latent_only <- '
PGI_latent2 =~ pgs_ea_child_ukb_std + pgs_ea_child_23me_std
'
fit_latent_only <- sem(model_latent_only, data = data_alspac_R)
summary(fit_latent_only, fit.measures = TRUE, standardized = FALSE, rsquare = TRUE)
#lavaan
latent_scores_2 <- lavPredict(fit_latent_only)
lm_2 <- lm(data_alspac_R$ks4_avg ~ latent_scores_2)
summary(lm_2)
#R squared 16.41
#regression method
latent_scores_2reg <-lavPredict(fit_latent_only, method = "regression")
lm_2reg <- lm(data_alspac_R$ks4_avg ~ latent_scores_2reg)
summary(lm_2reg)
#R squared 16.41
# Bartlett
latent_scores_2b <-lavPredict(fit_latent_only, method = "Bartlett")
lm_2b <- lm(data_alspac_R$ks4_avg ~ latent_scores_2b)
summary(lm_2b)
#R squared 16.41
#croon's correction
#method 1
fit_sam <- lavaan:::sam(model_latent_only, data = data_alspac_R, sam.method = "local")
summary(fit_sam, standardized = TRUE)
latent_scores_croon <- lavPredict(fit_sam, method = "regression")
head(latent_scores_croon)
data_alspac_R$PGI_latent2_croon <- latent_scores_croon[, "PGI_latent2"]
lm_2croon <- lm(ks4_avg ~ PGI_latent2_croon, data = data_alspac_R)
summary(lm_2croon)
# R-squared 16.55