Dear Lavaan users,
I’m evaluating longitudinal measurement invariance in a
Depression item bank with a factor structure consisting of two time points and
ordered-categorical indicators. The results showed that equality constraints on
the thresholds were not tenable. As a next step, I want to assess the practical
significance of this invariance violation for clinical practice, but I am unsure
whether my method makes sense. Could you please provide some
feedback and/or suggestions?
Liu et al. (2017; see reference below) proposed evaluating the
practical significance of an invariance violation by comparing the model-predicted
probabilities of endorsing specific response categories for each item at
specific measurement occasions, based on two models with different levels of
invariance constraints. With this item-based method, however, it may
be unclear how changes in model-predicted item responses affect the scores that
are used in clinical practice: the latent trait scores (i.e., the factor
scores). Therefore, I want to estimate the effect of an invariance violation
on clinical practice directly, using the model-predicted factor scores. Specifically,
I want to compare each person's estimated factor scores between the factor loading
invariance model and the threshold invariance model. These comparisons are performed
(1) separately for the pre- and post-measurement, and (2) at the group level as
well as the individual level. Does this make sense in the case of longitudinal CFA
with ordered-categorical indicators?
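To make this concrete, below is a minimal sketch in R of what I have in mind,
using semTools::measEq.syntax() to generate the two longitudinal invariance
models and lavaan::lavPredict() to obtain the factor scores. The item names
(i1t1 ... i5t2), factor names (dep1, dep2), and the data frame dat are
hypothetical placeholders for my actual item bank:

library(lavaan)
library(semTools)

items_t1 <- paste0("i", 1:5, "t1")   # hypothetical time-1 item names
items_t2 <- paste0("i", 1:5, "t2")   # hypothetical time-2 item names

## Configural model: one depression factor per measurement occasion
config <- paste0(
  "dep1 =~ ", paste(items_t1, collapse = " + "), "\n",
  "dep2 =~ ", paste(items_t2, collapse = " + ")
)
longFacNames <- list(DEP = c("dep1", "dep2"))

## Model A: factor loadings constrained equal across occasions
fit_load <- measEq.syntax(configural.model = config, data = dat,
                          ordered = c(items_t1, items_t2),
                          parameterization = "delta",
                          ID.fac = "std.lv",
                          longFacNames = longFacNames,
                          long.equal = "loadings",
                          return.fit = TRUE)

## Model B: loadings AND thresholds constrained equal across occasions
fit_thresh <- measEq.syntax(configural.model = config, data = dat,
                            ordered = c(items_t1, items_t2),
                            parameterization = "delta",
                            ID.fac = "std.lv",
                            longFacNames = longFacNames,
                            long.equal = c("loadings", "thresholds"),
                            return.fit = TRUE)

## Factor scores per person (columns dep1 = pre, dep2 = post)
fs_load   <- lavPredict(fit_load)
fs_thresh <- lavPredict(fit_thresh)

If I understand correctly, lavPredict() returns EBM (empirical Bayes modal)
factor scores by default, which is what I would use here.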
To compare the factor scores between different
invariance models, I assume that the scores need to be estimated on the same
standardized common metric for the pre- and post-measurement and for both models.
Can I achieve this by fixing the latent factor of the pre-measurement in each
model to a mean of 0 and an SD of 1? If not, what other method can I use to
interpret the factor scores of both the pre- and post-measurement on the same
standardized common metric for both models?
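In lavaan syntax, the identification I have in mind would look something like
the fragment below (again with the hypothetical factor names dep1/dep2; if I
use measEq.syntax(), I believe ID.fac = "std.lv" accomplishes the same thing):

## Standardize the latent metric at the pre-measurement and estimate the
## post-measurement relative to it (identified only once sufficient
## invariance constraints on loadings/thresholds are in place)
metric_id <- '
  dep1 ~~ 1*dep1    # pre: factor variance fixed to 1
  dep1 ~  0*1       # pre: factor mean fixed to 0
  dep2 ~~ NA*dep2   # post: factor variance freely estimated
  dep2 ~  NA*1      # post: factor mean freely estimated
  dep1 ~~ dep2      # factors covary across occasions
'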
As input for the invariance models, I want to use
duplicated response patterns for the pre- and post-measurement, because then both
measurements are evaluated under the same circumstances. Using these response
patterns to estimate the factor scores, can I simply interpret group differences
between models based on the usual effect size conventions: a difference in factor
scores of 0.2 = small effect, 0.5 = medium effect, 0.8 = large effect? Or do
my raw pre- and post-measurement data need to be normally distributed (which
they aren't) for this to be valid? Alternatively, should I compute Cohen's d?
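For the group- and individual-level comparisons, I currently compute something
like the following (continuing from the first sketch; dividing by the pooled SD
of the factor scores is just one possible standardizer):

## Per-person differences between the two invariance models
d_pre  <- fs_load[, "dep1"] - fs_thresh[, "dep1"]   # pre-measurement
d_post <- fs_load[, "dep2"] - fs_thresh[, "dep2"]   # post-measurement

## Group-level difference on the (standardized) latent metric
mean(d_pre)
mean(d_post)

## Cohen's d with a pooled-SD standardizer (one possible choice)
cohens_d <- function(x, y) {
  (mean(x) - mean(y)) / sqrt((var(x) + var(y)) / 2)
}
cohens_d(fs_load[, "dep1"], fs_thresh[, "dep1"])

## Individual level: e.g., the proportion of persons whose score shifts
## by more than 0.2 SD between models
mean(abs(d_pre) > 0.2)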
I hope this makes sense. All feedback is welcome!
Kind regards,
Gerard Flens
Liu, Y., Millsap, R. E., West, S. G., Tein, J.-Y., Tanaka, R., & Grimm, K. J. (2017). Testing measurement invariance in longitudinal data with ordered-categorical measures. Psychological Methods, 22, 486.