Computing fit statistics for bifactor model


Sabeeh Baig

Mar 22, 2017, 11:43:41 PM
to mirt-p...@googlegroups.com
Hi y'all!

I'm currently analyzing data on a short scale (3 items). It was administered 3 times in a repeated-measures design, so I've set up an item bifactor model to analyze the data, following advice from Li Cai's recent publications on two-tier models. Here's the code that sets up my model:

## Two-tier structure: three correlated time factors with freed means,
## plus item-specific factors supplied through the bfactor() specific argument
s1.rm.fact <- 'Time1 = 1-3
               Time2 = 4-6
               Time3 = 7-9
               COV = Time1*Time2, Time1*Time3, Time2*Time3
               MEAN = Time2, Time3'

## Specific factor for each item, repeated across the three administrations
itemloadings <- rep(1:3, times = 3)

## Construct constraints
### Obtain the starting-values table (no estimation)
start.vals <- bfactor(s1, itemloadings, s1.rm.fact, pars = "values")

### Set up within-repeated-measures constraints on the time-factor slopes
wrmconstr <- start.vals$parnum[start.vals$name %in% c("a1", "a2", "a3") & start.vals$est]

### Create constraint list
constraints <- list()
itemnames <- colnames(s1)
pick <- c(0, 3, 6)
for(i in 1:3){
    ### Impose across-time equality constraints on the specific-factor slopes
    constraints[[paste0('slope.', i)]] <-
        start.vals$parnum[start.vals$name == paste0('a', 3 + i) & start.vals$est]
    ### ... and on the intercepts for each of the four categories
    for(j in 1:4){
        constraints[[paste0('intercept.', i, '_', j)]] <-
            start.vals$parnum[start.vals$name == paste0('d', j) &
                              (start.vals$item %in% itemnames[pick + i]) & start.vals$est]
    }
    constraints[[paste0('Time.', i)]] <- wrmconstr[pick + i]
}

### Fit the constrained bifactor (two-tier) model
s1.rm.fit <- bfactor(s1, itemloadings, s1.rm.fact, constrain = constraints, optimizer = "nlminb")
summary(s1.rm.fit)

When I estimate the model, I get this warning, which I know can be problematic:

Iteration: 6, Log-Lik: -7805.354, Max-Change: 0.54475
Warning message:
Latent trait variance-covariance matrix became non-positive definite. 

The model looks fine and corresponds to older results from an SEM framework. When I try to compute M2 statistics, I get this:

Error in qr.default(delta) : NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning messages:
1: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) :
  NaNs produced
2: In cov2cor(E2 - outer(E1, E1)) :
  diag(.) had 0 or NA entries; non-finite result is doubtful

And when I attempt to compute the S-X2 statistics, I get this:

Error in if (nc == 0) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) :
  NaNs produced

And I've also gotten this as well:

Error in On[j, ] : subscript out of bounds
In addition: Warning message:
In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) :
  NaNs produced

I've never run into these kinds of issues. As far as I understand, these statistics are currently suitable for item bifactor/high-dimensional models. They could be improved, but these errors seem to suggest that something else is going on. I've also dropped cases with missing data, and the sample size is above 1,000.

I'd appreciate any help!

Cheers!!!

Sabeeh

Phil Chalmers

Mar 23, 2017, 4:24:05 AM
to Sabeeh Baig, mirt-package
This is bad news. The model was unstable and the latent covariance matrix became non-positive definite (NPD), which caused the estimation to terminate fatally. I could try to further robustify the estimation, but it looks like this model is only barely identified anyway (three indicators per factor is the absolute minimum for identification). If you send along some data, I could take a look to see whether the problem can at least be salvaged to return something more reasonable.
 

The model looks fine and corresponds to older results from an SEM framework. When I try to compute M2 statistics, I get this:

Error in qr.default(delta) : NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning messages:
1: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) :
  NaNs produced
2: In cov2cor(E2 - outer(E1, E1)) :
  diag(.) had 0 or NA entries; non-finite result is doubtful

These are all just a consequence of the above problem. 
 


Sabeeh Baig

Mar 23, 2017, 11:11:46 AM
to mirt-package, sabeeh...@gmail.com
I'm not at liberty to share any data, unfortunately. What would you generally suggest for "robustifying"? I can estimate the model within an SEM framework using lavaan, but retrieving IRT parameters from that model is cumbersome. There are also some other issues.

Phil Chalmers

Mar 23, 2017, 11:14:03 AM
to Sabeeh Baig, mirt-package
If you can generate data with the simdata() function using the same model specification, then you can send that along instead. All that really matters is that the issue is reproducible on my end so I can track it down. HTH.

Phil


Sabeeh Baig

Mar 23, 2017, 11:31:42 AM
to mirt-package, sabeeh...@gmail.com
Ahh, got it! Is there a way I can simulate with simdata() without having to manually specify the vectors/matrices of slopes and intercepts? Could I simulate directly from the mirt object even if it hasn't converged?

Sabeeh

Phil Chalmers

Mar 23, 2017, 11:40:52 AM
to Sabeeh Baig, mirt-package
You should be able to simulate from your model, unless it's horribly broken (which it sounds like it could be, given the NPD solution). Maybe terminate the estimation on the first iteration with TOL = NA and supply the result to simdata(). See the documentation for the inputs.
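
Roughly like this (a minimal sketch reusing the objects from your earlier post; adjust the names to match your script):

```
library(mirt)

# stop after the first EM iteration so the (possibly non-converged)
# parameter values are still attached to the returned model object
mod <- bfactor(s1, itemloadings, s1.rm.fact, TOL = NA)

# simulate a fresh dataset directly from that model object
simdat <- simdata(model = mod, N = 1000)
```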

Phil


Sabeeh Baig

Mar 29, 2017, 7:17:35 PM
to mirt-package, sabeeh...@gmail.com
Hi Phil,

Here's some simulated data. I fit a graded response item bifactor model to the data, drew factor scores from the model, and then simulated data from the model at those factor scores (sketched below). The model has no constraints, and all factors are orthogonal. There are 3 repeated measures, as reflected in the column names. That's the first dataset. The second is from another dataset with 7 repeated measures.
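
The procedure, roughly (a minimal sketch assuming a fitted graded-response bifactor model named `mod`; object names are placeholders):

```
library(mirt)

fs <- fscores(mod)                          # EAP factor scores, persons x factors
simdat <- simdata(model = mod, Theta = fs)  # simulate responses at those scores
write.table(simdat, 'sim_3rm.txt')
```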

Maybe you can recreate the problem on your end with these. I'm still having issues with this.

Sabeeh
sim_3rm.txt
sim_7rm.txt

Phil Chalmers

Mar 30, 2017, 6:38:38 AM
to Sabeeh Baig, mirt-package
Please provide an R script so that I can reproduce your analysis problem.

Phil


Sabeeh Baig

Mar 31, 2017, 7:14:10 PM
to mirt-package, sabeeh...@gmail.com
Here's a script that I've attached. You'll need to add a line for reading in the data. It only has the models for the data with 3 repeated measures. I think you can focus on that, because I've only done limited analyses on the larger dataset anyway.
syntax_irt copy.Rnw

Phil Chalmers

Apr 2, 2017, 2:23:40 PM
to Sabeeh Baig, mirt-package
This portion: 

```
s2.rm.fact <- 'Time1 = 1-3
                    Time2 = 4-6
                    Time3 = 7-9
                    Time4 = 10-12
                    Time5 = 13-15
                    Time6 = 16-18
                    Time7 = 19-21
'
itemloadings <- rep(1:3, times = 7)

test2 <- bfactor(s2, itemloadings, s2.rm.fact, iterations = 100)
```

is a very bad idea. Each of these factors is only just identified, and the number of dimensions is 8 --- well beyond what any standard numerical integration method can manage. In fact, even your first model doesn't converge within 500 EM cycles, which is a good sign that it isn't identified. Regarding the fit statistics, the dev version has an improved QMC approach that will ship with the next major version.
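
For reference, the intended usage is roughly the following (a sketch assuming a fitted model `fit` and a dev/recent mirt build in which M2() accepts a QMC argument):

```
library(mirt)

# quasi-Monte Carlo integration for the limited-information statistics,
# avoiding fixed quadrature grids in high-dimensional latent spaces
M2(fit, QMC = TRUE)

# QMC can also be used at the estimation stage for high-dimensional models
# mod <- mirt(dat, model, method = 'QMCEM')
```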

Phil


Sabeeh Baig

Apr 2, 2017, 2:42:11 PM
to mirt-p...@googlegroups.com, sabeeh...@gmail.com
Thank you! If that's the case, then what do you suggest for analyzing these data within an IRT framework? This is a repeated-measures design, so there's local dependence between the multiple assessments of the scale. Is this a limitation of the IRT framework or something else? I've done this in CCFA. The reason I want to switch to IRT is that retrieving item parameters and IRT curves from the CCFA models is a nightmare, as it has to be done by hand.

Also, that portion and the relevant portion for the first model should include a line specifying that the factors for the repeated measures are correlated. I probably omitted it by mistake, but I doubt it makes a difference.

Phil Chalmers

Apr 2, 2017, 2:45:19 PM
to Sabeeh Baig, mirt-package
On Sun, Apr 2, 2017 at 2:42 PM, Sabeeh Baig <sabeeh...@gmail.com> wrote:
Thank you! If that's the case, then what do you suggest for analyzing these data within an IRT framework? This is a repeated-measures design, so there's local dependence between the multiple assessments of the scale. Is this a limitation of the IRT framework or something else?

Likely a limitation, and I'm not overly familiar with IRT-based longitudinal research. 
 
I've done this in CCFA. The reason I want to switch to IRT is that retrieving item parameters and IRT curves from the CCFA models is a nightmare, as it has to be done by hand.

Not necessarily; you could estimate the model with something like lavaan and then transform the parameters back into mirt. Though tracelines won't work for multidimensional models anyway, so it's likely not that useful.
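
If you do go that route, the mechanics look roughly like this (a sketch with hypothetical data `dat` and hypothetical already-converted values; the CCFA-to-IRT metric conversion itself is not shown):

```
library(mirt)

# request the parameter table instead of estimating anything
sv <- mirt(dat, 1, itemtype = 'graded', pars = 'values')

# overwrite the starting values with the converted lavaan estimates
# (hypothetical numbers below), then fix every parameter
sv$value[sv$name == 'a1'] <- c(1.2, 0.9, 1.5)  # slopes, one per item
sv$est <- FALSE

# build the model at those fixed values; TOL = NA halts estimation immediately
mod <- mirt(dat, 1, itemtype = 'graded', pars = sv, TOL = NA)
plot(mod, type = 'trace')  # tracelines from the imported parameters
```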
 

Sabeeh Baig

Apr 28, 2017, 11:05:49 PM
to mirt-package, sabeeh...@gmail.com
I made quite a bit of progress, but have run into some issues with fit statistics. I'm working with 3 datasets with 3, 7, and 7 repeated measures and 1000, 1700, and 900 participants. There are no missing data after cleaning. Following one of the papers on using the item bifactor model for vertical scaling, I estimated similar models to account for the repeated measures and then arrived at the most constrained model that fit "well" after invariance testing. The models produced results similar to fully repeated-measures models estimated in a CCFA framework.

As you'll see, the fit statistics for the model for the first dataset are reasonable, but the fit statistics for the second dataset seem poor to me. I have no reference point for that, though, because the corresponding model with "configural invariance" only fits better in some respects (see below). Within the CCFA framework, similar models have better fit, but I know that there isn't a direct correspondence between IRT and CCFA fit statistics. So, my question is: what set of statistics should I check to ensure "good" model fit? My sense is that the M2 statistic is not reasonable here because it's a test of exact fit. The CFI seems to be inconsistent, but the RMSEA seems to function well. The AIC, BIC, SABIC, etc. seem to function as in other models and have been useful. Any suggestions would be appreciated. This is a substantive paper, so I need to be able to defend my analytic choices.
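
For reference, I computed the comparisons along these lines (a sketch assuming hypothetical object names fit.constrained and fit.configural):

```
library(mirt)

M2(fit.constrained)   # M2, RMSEA, SRMSR, TLI, CFI (as in the tables below)
M2(fit.configural)

# nested invariance models can also be compared on AIC/BIC/SABIC and a
# likelihood-ratio test
anova(fit.constrained, fit.configural)
```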

Regarding item fit statistics, I've noticed some peculiar behavior. The fit statistics are much larger for the NRM, smaller for the GPCM, and smallest for the GRM in datasets 1 and 2. In dataset 3, they are much larger for the GPCM, smaller for the NRM, and smallest for the GRM. However, all of the fit statistics are much larger in magnitude in datasets 2 and 3 than in dataset 1. Is it enough to select the GRM simply based on this ordering, since it indicates better fit? And I wonder whether the magnitude of these fit statistics grows as the model becomes more complex, and whether that complexity should be taken into account when interpreting IRT model fit statistics.
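
The item fit comparison was along these lines (a sketch assuming hypothetical fitted objects fit.grm, fit.gpcm, and fit.nrm for the same data):

```
library(mirt)

itemfit(fit.grm)    # item-level fit statistics (S-X2 by default)
itemfit(fit.gpcm)
itemfit(fit.nrm)

# the three itemtypes can also be compared on information criteria
anova(fit.gpcm, fit.grm)
```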

Dataset 1 with a general factor for the construct that all items are measuring and 3 specific factors for each administration of the 3 item scale (constraints are from invariance testing):

upes.s1_part4 <- ' G = 1-9
                   CONSTRAIN = (1, 4, 7, a1), (2, 5, 8, a1), (3, 6, 9, a1),
                               (4, 7, a3, a4), (5, 8, a3, a4), (6, 9, a3, a4),
                               (1, 4, 7, d1), (4, 7, d2), (4, 7, d3), (4, 7, d4),
                               (2, 5, 8, d1), (5, 8, d2), (5, 8, d3), (5, 8, d4),
                               (3, 6, 9, d1), (6, 9, d2), (6, 9, d3), (6, 9, d4)          '
upes.s1_spec <- c(rep(x = 1, times = 3), rep(x = 2, times = 3), rep(x = 3, times = 3))

        M2 df                 p  RMSEA RMSEA_5 RMSEA_95 SRMSR   TLI   CFI
stats 94.5 15 0.000000000000141 0.0728  0.0591   0.0872 0.043 0.878 0.796

Dataset 2 with a general factor for the construct that all items are measuring and 7 specific factors for each administration of the 3 item scale (constraints are from invariance testing):

upes.s2_part4 <- ' G = 1-21
                   CONSTRAIN = (1, 4, 7, 10, 13, 16, 19, a1), (2, 5, 8, 11, 14, 17, 20,  a1), (3, 6, 9, 12, 15, 18, 21,  a1),
                               (1, 4, 7, 13, 16, 19, a2, a3, a4, a6, a7, a8),
                               (2, 5, 8, 14, 17, 20, a2, a3, a4, a6, a7, a8),
                               (3, 6, 9, 15, 18, 21, a2, a3, a4, a6, a7, a8),
                               (1, 4, 7, 10, 13, 16, 19, d1),
                               (1, 4, 7, 13, 16, 19, d2),
                               (1, 4, 7, 13, 16, 19, d3),
                               (1, 4, 7, 13, 16, 19, d4),
                               (2, 5, 8, 11, 14, 17, 20, d1),
                               (2, 5, 8, 14, 17, 20, d2),
                               (2, 5, 8, 14, 17, 20, d3),
                               (2, 5, 8, 14, 17, 20, d4),
                               (3, 6, 9, 12, 15, 18, 21, d1),
                               (3, 6, 9, 15, 18, 21, d2),
                               (3, 6, 9, 15, 18, 21, d3),
                               (3, 6, 9, 15, 18, 21, d4)'
upes.s2_spec <- c(rep(x = 1, times = 3), rep(x = 2, times = 3), rep(x = 3, times = 3), rep(x = 4, times = 3), rep(x = 5, times = 3),
                  rep(x = 6, times = 3), rep(x = 7, times = 3))

        M2  df p RMSEA RMSEA_5 RMSEA_95 SRMSR   TLI   CFI
stats 6989 201 0  0.14   0.137    0.143 0.134 0.532 0.361

Dataset 2, configural invariance:

        M2  df p RMSEA RMSEA_5 RMSEA_95 SRMSR   TLI   CFI
stats 4410 105 0 0.155   0.151    0.158 0.132 0.432 0.594