Hi everyone!
After searching the internet and the lavaan group for a while, I concluded I wasn't going to find the answer to this question, and I imagine other people have had or will have the same one. It's a somewhat lengthy post, so apologies in advance; I've tried to highlight the key points of the issue as clearly as I could. My two-part question in short: a) how can the (non-standardised) scales of latent variables (with the factor loading of the first indicator fixed at 1) be interpreted, and b) what information does lavPredict() use to compute factor scores?
I have run a longitudinal SEM (a cross-lagged panel model, N = 4389) with a mix of latent variables (with ordinal indicators) and observed continuous variables. The model runs without errors in lavaan and fits well, with equality constraints on factor loadings, intercepts, thresholds and regression coefficients. Now I want to take the coefficients from the model and use them for some very simple simulation runs.
However, I found that I don't understand two important things. The first is how the scales of the latent variables should be interpreted. I fixed the factor loadings of the first indicators at 1, which I understood to give each latent variable the scale of its first indicator; however, I have found that they do not share that scale. For example, the five indicators of the first latent variable are all on a scale from 0 to 5, with averages somewhere between about 3 and 4. The factor scores for that latent variable, estimated with lavPredict() (method = "EBM", type = "lv"), range from a minimum of about -3 to a maximum of about 3.5, with a mean of about 0.28. This is quite stable across time points.
How, then, can the scale of a latent variable be interpreted when it is determined by its first indicator, which in this case is ordinal? It should be relative to that first indicator somehow, but I can't quite work out in what way exactly. Is it simply that, because the latent variable is continuous, it does not fit exactly within that six-point scale, so parts of the tails of its distribution land outside it? And how are the factor scores themselves interpreted: relative to what? If an individual in my data scores -1 on the LV, for example, relative to what should I interpret that -1?
My second question concerns the lavPredict() function and how it works. Because I have ordinal indicators, I have to use either ML or EBM. They give me vastly different output, however. I decided the ranges given by EBM made much more sense, but I don't know whether that is a valid conclusion. ML gives wildly fluctuating estimates (-680 to 435, with mean 10.34 in the first year; -11 to 54, with mean 0.58 in the last year) that don't really seem to make any sense. Is there an apparent reason why this would be the case? And is simply using EBM (as this seems to work) a sensible way to go?
Another thing is that EBM computes latent variable scores for all cases, even for cases with no data at all on the indicators of a particular LV. That made little sense to me, and made me wonder how EBM actually produces its factor scores. Is any of the original data on the observed indicators used to obtain them? Or does lavPredict() work in some entirely different way, using just the fitted model?
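For concreteness, this is roughly how I checked that (a sketch, not run exactly like this; ov1_2015 to ov5_2015 are the 2015 indicators of the first LV, as in the example code further down, and I'm assuming the rows of the score matrix line up with the rows of the dataset):

```r
# Cases with no observed data on any 2015 indicator of the first LV
ind.2015 <- paste0("ov", 1:5, "_2015")
no.data <- rowSums(!is.na(dataset[, ind.2015])) == 0
sum(no.data)

# EBM nevertheless returns factor scores for these cases
fs <- lavPredict(fit.model.1, type = "lv", method = "EBM")
head(fs[no.data, "LV2015"])
```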
Thanks in advance for any answers to my questions, they have been bugging me for a bit now. Again, apologies for the lengthy text.
Possibly superfluous example code (sorry if I made any typos; I did not run this exact one, of course):
model.1 <- '
LV2015 =~ 1*ov1_2015 + lmbd1*ov2_2015 + lmbd2*ov3_2015 + lmbd3*ov4_2015 + lmbd4*ov5_2015
LV2016 =~ 1*ov1_2016 + lmbd1*ov2_2016 + lmbd2*ov3_2016 + lmbd3*ov4_2016 + lmbd4*ov5_2016
LV2017 =~ 1*ov1_2017 + lmbd1*ov2_2017 + lmbd2*ov3_2017 + lmbd3*ov4_2017 + lmbd4*ov5_2017
LV2018 =~ 1*ov1_2018 + lmbd1*ov2_2018 + lmbd2*ov3_2018 + lmbd3*ov4_2018 + lmbd4*ov5_2018
LV2019 =~ 1*ov1_2019 + lmbd1*ov2_2019 + lmbd2*ov3_2019 + lmbd3*ov4_2019 + lmbd4*ov5_2019
#####################################
#
# Parts omitted from this example code:
# [Insert
# Second latent variable;
# Covariances between same indicators across time;
# Equality constraints on LV intercepts and thresholds;
# Structural model
# about here]
#
#####################################
'
fit.model.1 <- sem(model.1, data=dataset, estimator="WLSMV", missing="pairwise", parameterization="delta",
ordered=c("ov1_2015", "ov1_2016", "ov1_2017", "ov1_2018", "ov1_2019",
"ov2_2015", "ov2_2016", "ov2_2017", "ov2_2018", "ov2_2019",
"ov3_2015", "ov3_2016", "ov3_2017", "ov3_2018", "ov3_2019",
"ov4_2015", "ov4_2016", "ov4_2017", "ov4_2018", "ov4_2019",
"ov5_2015", "ov5_2016", "ov5_2017", "ov5_2018", "ov5_2019"))
factor.scores.1 <- lavPredict(fit.model.1, type="lv", method="EBM")
factor.scores.1
summary(factor.scores.1)
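And for completeness, a sketch of how I compared the two factor-score methods (again not run exactly like this; it assumes the fitted object fit.model.1 from above):

```r
# Factor scores under both methods available for ordinal indicators
fs.ebm <- lavPredict(fit.model.1, type = "lv", method = "EBM")
fs.ml  <- lavPredict(fit.model.1, type = "lv", method = "ML")

# In my data, the EBM ranges look plausible (roughly -3 to 3.5),
# while the ML ranges fluctuate wildly across years
summary(fs.ebm)
summary(fs.ml)
```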