Longitudinal GRM: Is three-dimensional scalar invariance model a viable alternative when bifactorial modelling fails?

18 views

Skip to first unread message

Martin Lacher

unread,

Mar 15, 2026, 11:59:30 AMMar 15

to mirt-package

Hi there,

I am working on a longitudinal IRT model for a quasi-experimental pre-post-followup design with polytomous items (Graded Response Model) and would appreciate some methodological feedback from the community. Please note that my expertise in this area is fairly limited, and both the analyses and the code below were developed with the assistance of Claude Sonnet 4.6 / Opus 4.6 - so I would particularly welcome any corrections or improvements from more experienced statician.

Study design

N = 247 students, 3 groups (2 intervention groups + control), 17 classes
3 measurement occasions (Pre, Post, Followup) with identical items across all timepoints
25 polytomous items, modeled with GRM

TL;DR

The bfactor() approach recommended for longitudinal models produced unstable

covariance matrices (negative Cov(WPR,WPO) = -0.33) despite acceptable fit indices.
As a fallback, I used mirt() with three timepoint dimensions and scalar invariance
constraints, which converged stably (RMSEA = 0.046, CFI = 0.945, plausible covariance
structure). The main difference: mirt() does not model item-specific residual factors
across timepoints. Is this a methodologically acceptable substitute? Details and
full code below.

MY QUESTIONS

Is this mirt() approach methodologically acceptable as a substitute for the bfactor() approach when the bifactor model fails to converge stably?
The key difference I see is that bfactor() explicitly models item-specific residual factors capturing the dependency of the same item across timepoints, whereas the mirt() approach relies solely on the covariance structure between timepoint dimensions to capture longitudinal dependencies. How severe is this omission in practice, particularly with N approx. 250?
Does anyone see fundamental flaws in the mirt() scalar invariance approach for this design that would invalidate the growth parameter estimates (mu_WPO, mu_WFU)?
Are there other approaches within mirt that might handle this situation better - for example, using multipleGroup() with timepoint as the grouping variable, or any other specification?

Details and full code for both approaches below for those interested.

ORIGINAL APPROACH: bfactor() WITH SCALAR INVARIANCE

Following the recommendation in the mirt documentation for longitudinal models, I started with a bifactor approach using bfactor(), where specific factors S1 ... S25 capture the item-level dependencies across timepoints, and three general factors (WPR, WPO, WFU) represent the latent ability at each timepoint:

Items 1-25 load on WPR (general factor, Pre)
Items 26-50 load on WPO (general factor, Post)
Items 51-75 load on WFU (general factor, Followup)
Each item triplet (e.g., items 1, 26, 51) loads on a common specific factor Si
Scalar invariance: slopes (a1) and all thresholds (d1, d2, ...) constrained to be equal across the three item instances of each specific factor
WPR variance fixed to 1, WPO and WFU variances and all covariances freely estimated
WPO and WFU latent means freely estimated (WPR mean fixed to 0)

# CODE #

itemloadings <- rep(1:25, times = 3)

n_items <- 25
n_cats <- sapply(dat_items[, 1:n_items], function(x) length(unique(na.omit(x))))

constrain_scalar <- paste0(
c(
sapply(1:n_items, function(i)
sprintf("(%d,%d,%d, a1)", i, i+n_items, i+2*n_items)),
unlist(sapply(1:n_items, function(i) {
n_d <- n_cats[i] - 1
sapply(1:n_d, function(d)
sprintf("(%d,%d,%d, d%d)", i, i+n_items, i+2*n_items, d))
}))
),
collapse = ", "
)

# The above generates the following constraint string
# (slopes a1 for all 25 items, then thresholds d1/d2/... per item
# depending on number of response categories):
#
# CONSTRAIN = (1,26,51, a1), (2,27,52, a1), (3,28,53, a1), ..., (25,50,75, a1),
# (1,26,51, d1), (2,27,52, d1), (3,28,53, d1), (4,29,54, d1),
# (5,30,55, d1), (5,30,55, d2),
# (6,31,56, d1), (6,31,56, d2),
# ...
# (23,48,73, d1), (23,48,73, d2), (23,48,73, d3), (23,48,73, d4),
# (24,49,74, d1), (24,49,74, d2),
# (25,50,75, d1), (25,50,75, d2), (25,50,75, d3)
#
# Reading example: (1,26,51, a1) means that the slope (a1) of item 1 (Pre),
# item 26 (Post) and item 51 (Followup) - which all represent the same
# original item - are constrained to be equal.
# (5,30,55, d2) means the second threshold of that item triplet is constrained.

model_bifactor <- paste0(
'WPR = 1-25
WPO = 26-50
WFU = 51-75
COV = WPR*WPO*WFU, WPO*WPO, WFU*WFU
MEAN = WPO, WFU
CONSTRAIN = ', constrain_scalar)

mod_bifactor <- bfactor(dat_items, itemloadings, model_bifactor,
itemtype = "graded",
QMC = TRUE,
TOL = 1e-3)

# END CODE #

This approach consistently produced non-positive definite latent trait covariance matrices and implausible negative covariances between timepoints (e.g., Cov(WPR, WPO) = -0.33), even with all 25 items. The model fit indices were acceptable (RMSEA = 0.049, CFI = 0.941), but the covariance structure was clearly unstable.

ALTERNATIVE APPROACH: mirt() WITH THREE DIMENSIONS AND SCALAR INVARIANCE

As an alternative, I specified a three-dimensional MIRT model where each timepoint is a separate dimension, without item-specific residual factors:

Dimension WPR: items 1-25 (Pre)
Dimension WPO: items 26-50 (Post)
Dimension WFU: items 51-75 (Followup)
Scalar invariance: for each item i, the slope a1 and all thresholds d1, d2, ... are constrained to be equal across items i, i+25, and i+50
All three pairwise covariances freely estimated: Cov(WPR,WPO), Cov(WPR,WFU), Cov(WPO,WFU)
WPR variance fixed to 1, WPO and WFU variances also fixed to 1 (required for identification)
WPO and WFU latent means freely estimated (WPR mean fixed to 0)
Estimated using MHRM due to stability issues with the EM algorithm

# CODE #

# Same constraint generation as above (constrain_scalar), then:

model_scalar <- paste0(
'WPR = 1-25\n',
'WPO = 26-50\n',
'WFU = 51-75\n',
'COV = WPR*WPO, WPR*WFU, WPO*WFU\n',
'MEAN = WPO, WFU\n',
'CONSTRAIN = ', constrain_scalar)

mod_longitudinal <- mirt(dat_items,
model = mirt.model(model_scalar),
itemtype = "graded",
method = "MHRM",
TOL = 1e-3)

# END CODE #

This model converged stably with a plausible covariance structure (all positive,
Cov(WPO,WFU) = 0.868) and acceptable fit (RMSEA = 0.046, CFI = 0.945).
Growth parameters were plausible (mu_WPO = 0.951, mu_WFU = 1.039).

The WPR covariances with later timepoints were slightly negative (Cov(WPR,WPO) = -0.057), which I attribute to floor effects at Pre (many items had solution rates below 10% at WPR) rather than model misspecification.

Any feedback, criticism or alternative suggestions would be greatly appreciated!

Thanks a lot and best regards,
Martin

Reply all

Reply to author

Forward

0 new messages