Practical Significance of Longitudinal Invariance Violation

Gerard Flens

Jul 27, 2018, 10:53:46 AM

to lavaan

Dear lavaan users,

I’m evaluating longitudinal measurement invariance in a Depression item bank with a factor structure consisting of 2 time points and ordered-categorical indicators. The results showed that equality constraints on the thresholds were not tenable. As a next step, I want to assess the practical significance of this invariance violation for clinical practice, but I can’t decide whether my method makes sense. Could you please provide some feedback and/or suggestions?

Liu et al. (2017; see below) proposed to evaluate the practical significance of an invariance violation by comparing the model-predicted probabilities of choosing specific response categories for each item at specific measurement occasions, based on two models with distinct levels of invariance constraints. With this item-response-based method, however, it may be unclear how changes in model-predicted item responses affect the scores that are used in clinical practice: the latent trait scores (i.e., the factor scores). Therefore, I want to estimate the effect of an invariance violation for clinical practice directly, using the model-predicted factor scores. Specifically, I want to compare each person's estimated factor scores between the factor loading invariance model and the threshold invariance model. These comparisons are performed (1) separately for the pre- and post-measurement, and (2) at the group level as well as the individual level. Does this make sense in the case of longitudinal CFA with ordered-categorical indicators?
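To make the item-level comparison concrete, here is a minimal Python sketch of how model-implied category probabilities could be computed for one item under a probit (normal-ogive) model. It assumes a latent response y* = intercept + loading * factor + residual, with a normally distributed factor and residual. The function name and all parameter values are hypothetical illustrations, not estimates from the item bank and not the code used by Liu et al.

```python
from statistics import NormalDist


def category_probabilities(thresholds, loading, factor_mean, factor_var,
                           intercept=0.0, residual_var=1.0):
    """Marginal probability of each ordered response category for one item.

    Assumes y* = intercept + loading * eta + eps, with eta ~ N(factor_mean,
    factor_var) and eps ~ N(0, residual_var). Thresholds must be increasing.
    """
    mu = intercept + loading * factor_mean
    sd = (loading ** 2 * factor_var + residual_var) ** 0.5
    std_normal = NormalDist()
    # Cumulative probabilities at each threshold, padded with 0 and 1.
    cumulative = [0.0] + [std_normal.cdf((t - mu) / sd) for t in thresholds] + [1.0]
    return [upper - lower for lower, upper in zip(cumulative, cumulative[1:])]


# Hypothetical post-measurement parameters under two invariance models.
# Model A: thresholds constrained equal over time; Model B: thresholds freed.
probs_constrained = category_probabilities([-0.5, 0.6, 1.4], loading=0.8,
                                           factor_mean=-0.4, factor_var=1.1)
probs_freed = category_probabilities([-0.3, 0.7, 1.2], loading=0.8,
                                     factor_mean=-0.5, factor_var=1.1)

# Per-category difference in predicted probability between the two models.
differences = [b - a for a, b in zip(probs_constrained, probs_freed)]
print(differences)
```

The per-category differences are the quantity inspected at the item level; they do not by themselves say how the factor scores change.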
To compare the factor scores between different invariance models, I assume that the scores need to be estimated on the same standardized common metric for pre- and post-measurement and for both models. Can I achieve this by fixing the latent factor of the pre-measurement in each model to a mean of 0 and an SD of 1? If not, what other method can I use to interpret the factor scores of both pre- and post-measurement on the same standardized common metric for both models?
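As a point of comparison, one post hoc way to express estimated scores in pre-measurement units is to rescale both occasions by the pre-measurement sample mean and SD. The Python sketch below shows only that arithmetic. It is an assumption for illustration and not equivalent to identifying the model by fixing the pre-measurement latent mean and variance, because estimated factor scores do not generally reproduce the latent mean and variance exactly. The function name and the scores are made up.

```python
from statistics import mean, stdev


def to_pre_metric(pre_scores, post_scores):
    """Rescale both occasions so the pre-measurement has mean 0 and SD 1."""
    pre_mean = mean(pre_scores)
    pre_sd = stdev(pre_scores)  # sample SD (n - 1 denominator)
    pre_z = [(x - pre_mean) / pre_sd for x in pre_scores]
    post_z = [(x - pre_mean) / pre_sd for x in post_scores]
    return pre_z, post_z


# Made-up factor scores for five persons at two occasions.
pre = [0.9, 0.2, -0.4, 1.3, 0.0]
post = [0.3, -0.1, -0.9, 0.8, -0.6]
pre_z, post_z = to_pre_metric(pre, post)
print(mean(post_z))  # mean change expressed in pre-measurement SD units
```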
As input for the invariance models, I want to use duplicated response patterns for pre- and post-measurement, because then both measurements are evaluated under the same circumstances. Using these response patterns to estimate the factor scores, can I simply interpret group differences between models based on regular effect size interpretations: a difference in factor scores of 0.2 = small effect, 0.5 = medium effect, 0.8 = large effect? Or would my raw pre- and post-measurement data need to be normally distributed (which they aren’t) to do this? Alternatively, should I use Cohen’s d?
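For reference, the group-level and individual-level comparisons could be computed along the lines of the Python sketch below. It uses the textbook Cohen's d with a pooled SD for two equally sized sets of scores. The function name and scores are made up, and the sketch does not settle whether the conventional 0.2 / 0.5 / 0.8 benchmarks are appropriate here.

```python
from statistics import mean, stdev


def cohens_d(scores_a, scores_b):
    """Standardized mean difference (b minus a) for equally sized samples."""
    if len(scores_a) != len(scores_b):
        raise ValueError("expected the same persons scored under both models")
    # With equal n, the pooled variance is the average of the two variances.
    pooled_sd = ((stdev(scores_a) ** 2 + stdev(scores_b) ** 2) / 2) ** 0.5
    return (mean(scores_b) - mean(scores_a)) / pooled_sd


# Made-up post-measurement factor scores for the same five persons
# under the loading invariance model and the threshold invariance model.
loading_model = [0.30, -0.10, -0.90, 0.80, -0.60]
threshold_model = [0.42, -0.02, -0.75, 0.95, -0.51]

group_effect = cohens_d(loading_model, threshold_model)
# Individual-level differences between the two models, person by person.
person_differences = [b - a for a, b in zip(loading_model, threshold_model)]
print(group_effect, max(abs(diff) for diff in person_differences))
```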

I hope this makes sense. All feedback is welcome!

Kind regards,

Gerard Flens

Liu, Y., Millsap, R. E., West, S. G., Tein, J. Y., Tanaka, R., & Grimm, K. J. (2017). Testing measurement invariance in longitudinal data with ordered-categorical measures. Psychological Methods, 22, 486.

Terrence Jorgensen

Jul 29, 2018, 11:02:51 PM

to lavaan

Hi Gerard,

It doesn't sound like these questions are about lavaan, so I would recommend posting this question on SEMNET, where there are many more experts (also IRT folks) who could have valuable input.

http://www2.gsu.edu/~mkteer/semnet.html

I would, however, recommend reading Wu & Estabrook's alternative advice for testing invariance with categorical indicators, which is based on a more thorough consideration of the issues associated with identification constraints (i.e., how mean and covariance structure parameters are both intimately tied up with threshold specification).

https://doi.org/10.1007/s11336-016-9506-0

Terrence D. Jorgensen

Postdoctoral Researcher, Methods and Statistics

Research Institute for Child Development and Education, the University of Amsterdam