Help with interpretation of gpcm fit


Dallie Sandilands

May 15, 2015, 1:52:30 PM
to mirt-p...@googlegroups.com

Hi Phil,

Thank you for the mirt package, and for the valuable information you share in this google group.

I have 4 data sets representing polytomous scores from 4 different rubrics applied to writing samples. There are no missing data in any of the data sets. The sample sizes range from about 6,000 to 22,000. The rubrics have different numbers of items (between 4 and 7) and different rating scales (either 0-3 or 0-5). I am trying to understand whether any one of the rubrics provides better measurement than the others.

I fit 1-, 2-, and 3-factor graded, grsm, and gpcm models to each data set, using, for example, mirt(data = data, model = 2, itemtype = "gpcm"), and compared the models with anova(). Based on the AIC and BIC from the anova() output, a 2-factor gpcm model gives the best fit of all the models tried, for all 4 data sets.
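For reference, the calls looked roughly like this (a minimal sketch; resp stands in for one rubric's response matrix):

library(mirt)

# Fit 1- and 2-factor gpcm models; the same pattern applies to the
# 'graded' and 'grsm' itemtypes and to the 3-factor models
mod1 <- mirt(resp, model = 1, itemtype = "gpcm")
mod2 <- mirt(resp, model = 2, itemtype = "gpcm")

# Nested-model comparison: likelihood-ratio test plus AIC and BIC
anova(mod1, mod2)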

Even though the 2-factor gpcm model appears better relative to the others, I am having trouble deciding whether it is a good fit in absolute terms. The output from the mirt calls is not producing RMSEA, TLI, or CFI. The S_X2 statistics for most items are significant, which I understand means they don't fit. For 3 rubrics, the standardized residuals between items are in the range of plus or minus .05 to .08, but for one rubric they are very high (-3, -4???).
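For what it's worth, these are the diagnostics I have been looking at (again a sketch, with mod2 the fitted 2-factor model from above):

# S_X2 item-fit statistics (the default in mirt's itemfit)
itemfit(mod2)

# Local-dependence residuals between item pairs
residuals(mod2)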

To explore whether the large sample sizes may be impacting the S_X2 significance, I took random samples of 500, 1000, and 2000 from each data set and reanalyzed them. The S_X2 values are not significant in the samples of 500 and 1000, but are significant at 2000.
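The subsampling step was along these lines (n = 500 shown; repeated for 1000 and 2000):

set.seed(1)  # so the random subsample is reproducible
idx <- sample(nrow(resp), 500)
mod_sub <- mirt(resp[idx, ], model = 2, itemtype = "gpcm")
itemfit(mod_sub)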

My questions are:

I assume RMSEA etc. aren't being provided in the mirt output because there are areas of sparseness in the data, so calcNull is being ignored despite the large sample sizes. Is there some other way I can assess the overall fit of each model without RMSEA, TLI, and CFI?

At the item level, do you think the large sample sizes are impacting the S_X2 significance levels? Would it be reasonable to assume the items fit because they fit in the smaller random samples?

Many thanks for your advice,

Dallie

Phil Chalmers

May 16, 2015, 5:33:31 PM
to Dallie Sandilands, mirt-package
Hi Dallie,

Check the M2() function, which returns goodness-of-fit statistics similar to those used in SEM by working with limited-information versions of the complete data tables. It should help you determine whether your model fits 'well enough'. Cheers.
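Something like this, assuming mod is one of your fitted models:

# Limited-information fit statistics; the output also includes
# descriptive indices such as RMSEA, TLI, and CFI
M2(mod)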

Phil


Dallie Sandilands

May 16, 2015, 7:30:01 PM
to mirt-p...@googlegroups.com
Hi again Phil,
Thanks for your response. I should have mentioned in my last post that for each of the data sets, M2() returns an error saying it can't be calculated because the degrees of freedom are too low.
Parallel analyses using random.polychor.pa indicate one or two factors.
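The parallel analysis call was along these lines (the argument names are from my reading of the package documentation, so please check ?random.polychor.pa):

library(random.polychor.pa)

# Parallel analysis on polychoric correlations: 5 replications,
# comparing observed eigenvalues to the 99th percentile of the
# simulated eigenvalues
random.polychor.pa(nrep = 5, data.matrix = resp, q.eigen = 0.99)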
Any other thoughts or guidance will be appreciated!
Best wishes,
Dallie

Phil Chalmers

May 18, 2015, 12:57:51 PM
to Dallie Sandilands, mirt-package
These are two different things: one is about model fit, the other about dimensionality. Comparing nested models (1 versus 2 factors) will give results comparable to random.polychor.pa. Depending on the IRT models you are using, you need enough items to be able to run the current M2* statistic with polytomous items. A different version that isn't as strict for polytomous items will be implemented at some point, but it is not currently available. Cheers.

Phil
