The most defensible way to save factor scores for future use? And does sam help with that?


Toni Saari

Aug 23, 2023, 4:02:48 AM
to lavaan
Hey all,

I have a few questions about saving factor scores for future studies, not all of which are even known yet. This might sound a bit odd, so here is the context. We are collecting cognitive test data along with a lot of other data. Our cognitive test battery should represent three larger domains with 3-5 tests/indicators per domain. Instead of using individual tests as our variables of interest, we would like to use factor scores to avoid multiple testing issues and to increase the reliability of our measures. Often, studies similar to ours derive z-scores from either previous data of the same participants or some normative reference values, and then average the z-scores of the tests belonging to each domain to get a domain z-score. We do not have such previous data or good normative data for calculating z-scores, so we thought factor scores would be the next best option. Most likely we would use these factor scores as both independent and dependent variables in various regression scenarios, and the task now is to get the best possible factor scores into the data set for future use. To make matters a bit more tricky, our observations are not completely independent, so we have to use clustered standard errors, which can be implemented with the estimatr package.

In Devlieger et al. (2016; 10.1177/0013164415607618), regression FSR, Bartlett FSR, the bias avoiding method of Skrondal & Laake (2001), the bias correcting method of Croon (2002) and SEM were compared. The result in my understanding was that both regression and Bartlett were biased, and, outside of SEM, the best option would be the Croon bias correcting method with the new SE developed in the paper. 

However, as I understand it, to get the factor scores in lavaan, I would use the lavPredict function on a lavaan object, such as a CFA of our cognitive test model? lavPredict offers Bartlett and regression as the methods, which seem to have their own problems. The silver lining in the Devlieger et al. paper was that if the factor score indeterminacy is low, the differences between factor score regression methods will be small, and I might get comparable results with the Bartlett and regression options, too.

1) My first question is how to analyze factor score indeterminacy to see if it is low?

I also read the paper describing the background and use of the new sam function (Rosseel & Loh, 2022; 10.1037/met0000503) and it again emphasizes that conventional factor score regression approaches are not ideal. I saw Widaman & Revelle use the sam function in their Thinking thrice about sum scores paper (2022; 10.3758/s13428-022-01849-w) and they describe it as such: "We used the “local” option with the default ML mapping matrix in the sam fitting function (see Rosseel & Loh, 2021), which is equivalent to the Skrondal and Laake approach with Croon’s correction and the recently derived SEs of parameter estimates."

Corresponding to this in their code:

meth_03b = '
         f1 =~ r06_paracomp + r07_sentcomp + r09_wordmean
         f2 =~ r10_addition + r12_countdot + r13_sccaps

         f1 ~  school
         f2 ~  school
         '
fitm03b = sam(model = meth_03b, data=all, sam.method = 'global', estimator = 'ML')
summary               (fitm03b, fit.measures = TRUE, ci = TRUE)
standardizedSolution  (fitm03b, type = 'std.nox')   # like STDY  in Mplus
standardizedSolution  (fitm03b, type = 'std.all')   # like STDYX in Mplus

2) My second question is two-fold: is the use of the sam function in this manner really equivalent to Skrondal & Laake with Croon correction and the new SEs? If so, is there any way to get factor scores using the Croon correction out of the sam function using just the measurement part? I have read the sam function's description and tried it with Holzinger-Swineford data but I feel like I do not really understand what happens under the hood.

So in all, I am looking for a way to get the best-performing factor scores for future use when not all scenarios of their use are even known yet. With lavPredict I have Bartlett and regression as the options, and their performance in the simulation studies seems a bit discouraging. Furthermore, not everyone using these factor scores will necessarily be using R, so the goal is to compute factor scores in the data set that can be used later on in other programs, too.

Yves Rosseel

Aug 29, 2023, 9:26:41 AM
to lav...@googlegroups.com
Hello Toni,

On 8/23/23 10:02, Toni Saari wrote:
> the task now is to get
> the best possible factor scores into the data set for future use.

I would recommend fitting a CFA with all factors together, and then using
lavPredict() to compute factor scores, but with the (new) transform =
TRUE argument. For example:

library(lavaan)
example(cfa)
FS <- lavPredict(fit, transform = TRUE)
round(cor(FS), 3)
# compare to:
lavInspect(fit, "cor.lv")

This will 'transform' the (by default 'regression') factor scores in
such a way that the variance-covariance matrix of the factor scores will
coincide with the model-implied variance-covariance matrix of the latent
variables that you get from the CFA fit. In theory, when you use these
transformed factor scores later in a regression, you should get unbiased
point estimates (although the standard errors will be underestimated, as
they ignore the uncertainty that stems from the CFA model).

> In Devlieger et al. (2016; 10.1177/0013164415607618), regression FSR,
> Bartlett FSR, the bias avoiding method of Skrondal & Laake (2001), the
> bias correcting method of Croon (2002) and SEM were compared. The result
> in my understanding was that both regression and Bartlett were biased,
> and, outside of SEM, the best option would be the Croon bias correcting
> method with the new SE developed in the paper.

Indeed. Although the SE formula that we described in that 2016 paper was
somewhat limited. The better version is reported in the SAM paper (and
is also what lavaan is using in the sam() function).

> 1) My first question is how to analyze factor score indeterminacy to see
> if it is low?

If you look at the standardized factor loadings, then factor score
indeterminacy will be low if the standardized factor loadings are close
to 1.
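
One way to put a number on this, offered as a rough sketch rather than a built-in lavaan feature: inspect the standardized loadings, and compute the determinacy coefficient of the regression factor scores (the model-implied correlation between each factor and its regression score). The determinacy formula below is the standard textbook one and is an addition for illustration; the Holzinger-Swineford model is only a stand-in for your own CFA.

```r
library(lavaan)

# Stand-in CFA (the classic Holzinger-Swineford three-factor model)
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
fit <- cfa(HS.model, data = HolzingerSwineford1939)

# Standardized loadings: values close to 1 suggest low indeterminacy
std.lambda <- unclass(lavInspect(fit, "std")$lambda)
round(std.lambda, 3)

# Unstandardized model matrices
est    <- lavInspect(fit, "est")
Lambda <- unclass(est$lambda)   # factor loadings
Phi    <- unclass(est$psi)      # latent (co)variances
Theta  <- unclass(est$theta)    # residual (co)variances

# Model-implied covariance matrix of the indicators
Sigma <- Lambda %*% Phi %*% t(Lambda) + Theta

# Squared determinacy of regression factor scores:
# diag(Phi Lambda' Sigma^-1 Lambda Phi) / diag(Phi)
rho2 <- diag(Phi %*% t(Lambda) %*% solve(Sigma) %*% Lambda %*% Phi) / diag(Phi)
determinacy <- sqrt(rho2)
round(determinacy, 3)
```

Values near 1 mean the factor scores are close to being determined by the indicators, so the choice between regression and Bartlett scores should matter less.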

> I saw Widaman & Revelle use the sam function in their Thinking
> thrice about sum scores paper (2022; 10.3758/s13428-022-01849-w) and
> they describe it as such: /"We used the “local” option with the default
> ML mapping matrix in the sam fitting function (see Rosseel & Loh, 2021),
> which is equivalent to the Skrondal and Laake approach with Croon’s
> correction and the recently derived SEs of parameter estimates."/

I am not sure what Widaman & Revelle mean by "the Skrondal and Laake
approach". If they just refer to the generic idea of using factor scores
in a regression, all is fine. But if they think that sam() uses Bartlett
factor scores for 'endogenous' factors, and regression factor scores for
'exogenous' factors, then they are mistaken. The sam() function will in
fact not even compute factor scores (explicitly). Local sam directly
computes an estimate of the variance-covariance matrix of the latent
variables, and uses that as input for a path-analysis or regression
analysis in a second step.

> fitm03b = sam(model = meth_03b, data=all, sam.method = 'global',
> estimator = 'ML')

Note that this is global sam, not local sam! Global sam first fits the
CFA part, then plugs in all the parameters in the full model, and then
estimates the remaining parameters of the structural part while keeping
the measurement parameters fixed.
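
To make the difference concrete, here is a small sketch that fits the same model with both options. It uses the PoliticalDemocracy data that ships with lavaan as a stand-in (not the data from the paper), and leaves all other sam() arguments at their defaults.

```r
library(lavaan)

# Stand-in model: three latent variables with a structural part
model <- '
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  dem65 =~ y5 + y6 + y7 + y8

  dem60 ~ ind60
  dem65 ~ ind60 + dem60
'

# Local sam: the structural part is estimated from the estimated
# mean vector and covariance matrix of the latent variables
fit.local <- sam(model, data = PoliticalDemocracy, sam.method = "local")

# Global sam: measurement parameters are estimated first, then kept
# fixed while the structural parameters of the full model are estimated
fit.global <- sam(model, data = PoliticalDemocracy, sam.method = "global")

# Compare the structural (regression) estimates side by side
pe.local   <- parameterEstimates(fit.local)
pe.global  <- parameterEstimates(fit.global)
reg.local  <- pe.local[pe.local$op == "~", c("lhs", "op", "rhs", "est", "se")]
reg.global <- pe.global[pe.global$op == "~", c("lhs", "op", "rhs", "est", "se")]
reg.local
reg.global
```

In neither case are factor scores computed along the way.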

> 2) My second question is two-fold: is the use of the sam function in
> this manner really equivalent to Skrondal & Laake with Croon correction
> and the new SEs?

No, that is a misunderstanding. The sam() function does not work with
factor scores... It works with 'summary statistics' (i.e., the mean
vector and especially the variance-covariance matrix) of the latent
variables.

> If so, is there any way to get factor scores using the
> Croon correction out of the sam function using just the measurement
> part?

No.

But you can follow the method I described at the beginning to get what
(I believe) you want: fit a CFA model with all factors included, and
then use lavPredict(fit, transform = TRUE).
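
A minimal sketch of that workflow, including a plain CSV export so the scores can be used outside R. The model, the output file name and the use of tempdir() are placeholders for illustration; the transform argument requires a lavaan version that supports it.

```r
library(lavaan)

# Stand-in data and model; replace with your own cognitive test CFA
dat <- HolzingerSwineford1939
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
fit <- cfa(HS.model, data = dat)

# Transformed (by default 'regression') factor scores
FS <- lavPredict(fit, transform = TRUE)

# Rows of 'dat' that were actually used in the analysis; this keeps
# scores and cases aligned if some rows were dropped (e.g. listwise deletion)
idx <- lavInspect(fit, "case.idx")

# Attach the scores to the matching rows of the original data
out <- cbind(dat[idx, ], as.data.frame(unclass(FS)))

# Write a plain CSV that other programs can read
# (hypothetical file name, written to a temporary directory here)
out.file <- file.path(tempdir(), "data_with_factor_scores.csv")
write.csv(out, out.file, row.names = FALSE)
```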


Yves.

Toni Saari

Sep 8, 2023, 5:46:32 AM
to lavaan
Thank you Yves for the comprehensive answer and all the work you have done!

I was a bit occupied with other work but finally got around to testing this bit of code today. This definitely helped me a lot!

It is nice to hear that using the lavPredict function with these options gets me to my goal using CFA. I have seen some people save factor scores from EFA, too (I guess it is an option in SPSS, heh). I imagine this would be possible with lavaan by using the efa function and having output="lavaan"?

This is something of an aside, but I was wondering what the "Empirical Bayes Modal" approach does in lavPredict. As I understand it, it is the default for lavPredict, and when testing with the Holzinger-Swineford data, it returns the same result as method="regression".
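
For reference, this is roughly how I compared the two on the Holzinger-Swineford data (just a quick check with continuous indicators, not a statement about how lavPredict works internally):

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
fit <- cfa(HS.model, data = HolzingerSwineford1939)

# Default method (documented as "EBM") versus explicit regression scores
fs.default <- unclass(lavPredict(fit))
fs.reg     <- unclass(lavPredict(fit, method = "regression"))

# Largest absolute difference between the two sets of scores
max.diff <- max(abs(fs.default - fs.reg))
max.diff
```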

Regarding sam, I have to correct my earlier post regarding the Widaman & Revelle analysis using the "global" option as they ran both local and global analyses and I copied the wrong part.

Here is the code for their analysis using the local option (full code: https://osf.io/swef8):
#-------------------------------------------------------------------------------------------------#
#--- Method 3: Factor score regression with Bartlett and Croon's correction on ROUNDED scores ----#
#---           Using the LOCAL option with the SAM fitting function                               #
#-------------------------------------------------------------------------------------------------#
meth_03aa = '

         f1 =~ r06_paracomp + r07_sentcomp + r09_wordmean
         f2 =~ r10_addition + r12_countdot + r13_sccaps

         f1 ~  school
         f2 ~  school
         '
fitm03aa = sam(model = meth_03aa, data=all, sam.method = 'local', estimator = 'MLR')

But I believe this does not detract from your general point about what the sam function does. It is nice to hear more details about sam, definitely need to keep that in mind for more specific future analyses.