On Person Fit in mirt

Amin

Jul 13, 2013, 1:07:37 PM
to mirt-p...@googlegroups.com
Hi Phil,

Would you please explain how mirt calculates person fit values, like Lz, when there are mixed item formats (i.e. dichotomous and polytomous)?

As far as I know, there are separate formulas for dichotomous and polytomous data, but I'm not aware of an index like Lz for mixed item formats. Suppose that we have a data set comprising 20 3PL items and 10 GRM items.

Thanks,
Amin.

Phil Chalmers

Jul 13, 2013, 1:17:53 PM
to Amin, mirt-p...@googlegroups.com
Hi Amin,

I'm not sure what the Lz statistic you are referring to is (could you
perhaps provide a reference?), and in any case it isn't implemented.
For person fit with mixed item formats currently I only provide the Zh
stat, which is adopted from Drasgow et al., 1985. If the items all
happen to be Rasch models though (all slopes equal to exactly 1) then
the infit and outfit stats are computed as well.

Phil
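
A minimal sketch of what this looks like in practice, assuming a reasonably recent mirt installation. The LSAT7 data ship with the package; the object names (mod_2pl, mod_rasch, and so on) are arbitrary, and the exact columns returned by personfit() may differ between versions, so treat the comments as expectations rather than guarantees.

```r
# Sketch only: needs the mirt package and is skipped if it is not installed.
if (requireNamespace("mirt", quietly = TRUE)) {
  library(mirt)
  dat <- expand.table(LSAT7)      # five dichotomous items

  # Free slopes: Zh is the statistic described in the reply above
  mod_2pl <- mirt(dat, 1, itemtype = "2PL", verbose = FALSE)
  pf_2pl  <- personfit(mod_2pl)

  # Slopes fixed at 1: infit/outfit are expected alongside Zh
  mod_rasch <- mirt(dat, 1, itemtype = "Rasch", verbose = FALSE)
  pf_rasch  <- personfit(mod_rasch)

  print(head(pf_2pl))
  print(head(pf_rasch))
}
```

Comparing the two printed headers shows which statistics a given version reports for each model.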
> --
> You received this message because you are subscribed to the Google Groups "mirt-package" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to mirt-package...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

stte...@gmail.com

Jul 13, 2013, 1:36:23 PM
to mirt-p...@googlegroups.com, Amin
Indeed, Lz is what you referred to as Zh from Drasgow et al. (1985), and it's a standardized likelihood-based index. In that paper Z3 is defined for the 3PL model and Zh for polytomous data.

So the personfit() function computes Zh according to the number of response options, and when there are two options (i.e., 0 and 1) it effectively computes Z3 for those items. Right?

Amin.

Phil Chalmers

Jul 13, 2013, 1:43:07 PM
to mirt-p...@googlegroups.com, Amin
Yes, that sounds correct. I called it Zh because Zh and Z3 are interchangeable when using the general formula, so it seemed kind of redundant to try and differentiate them in the package (plus, Zh naturally makes more sense if the item formats are mixed). Sorry if that was initially confusing. 

Phil
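
For reference, the general formula mentioned here can be sketched as follows. The notation is chosen for this illustration and is not copied from the package source: $u_{ik}$ is 1 if the person responded in category $k$ of item $i$ and 0 otherwise, $K_i$ is the number of categories of item $i$, and $P_{ik}$ is the model-implied probability of that category evaluated at the person's ability estimate $\hat\theta$.

```latex
\begin{aligned}
l_0 &= \sum_{i=1}^{I}\sum_{k=1}^{K_i} u_{ik}\,\log P_{ik}(\hat\theta),\\
E(l_0) &= \sum_{i=1}^{I}\sum_{k=1}^{K_i} P_{ik}\,\log P_{ik},\\
\operatorname{Var}(l_0) &= \sum_{i=1}^{I}\left[\sum_{k=1}^{K_i} P_{ik}\,(\log P_{ik})^2
  - \Big(\sum_{k=1}^{K_i} P_{ik}\,\log P_{ik}\Big)^{2}\right],\\
Z_h &= \frac{l_0 - E(l_0)}{\sqrt{\operatorname{Var}(l_0)}}.
\end{aligned}
```

With $K_i = 2$ the bracketed variance term reduces to $P_i(1-P_i)\,[\log(P_i/(1-P_i))]^2$, which is the familiar dichotomous form, so the same expression covers both item types.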

stte...@gmail.com

Jul 13, 2013, 1:53:12 PM
to mirt-p...@googlegroups.com
That's fine. So, based on my understanding of your package, it uses the response probabilities implied by the model defined through 'itemtype' in mirt() to compute the likelihood function and its expected value and variance, and from these it calculates the person fit index, right?
In other words, it computes an N by I matrix of response probabilities, where N is the sample size and I is the number of items, and then personfit() uses this information together with the ability estimates to compute Zh. Am I right?

Amin.

Phil Chalmers

Jul 13, 2013, 2:03:55 PM
to Amin Mousavi, mirt-p...@googlegroups.com
That's correct, so changing the estimation method (EAP, MAP, etc) for
the person parameters will also affect this statistic, which is why I
allow a method = '...' argument.

stte...@gmail.com

Jul 18, 2013, 6:51:17 PM
to mirt-p...@googlegroups.com, Amin Mousavi
Phil,

As Zh is based on unidimensionality, how does mirt compute it for models that have more than one factor?

I ran an analysis with two dimensions and it computed one person fit value for each examinee, and I was wondering how it did that.

Amin.

Phil Chalmers

Jul 18, 2013, 7:11:24 PM
to Amin Mousavi, mirt-p...@googlegroups.com

Zh doesn't require unidimensionality, that's why. It's based on likelihood differences, so it readily generalizes to multiple dimensions and multiple item types.

Phil
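
A small base-R sketch of why this generalizes: with two factors, only the way the probabilities are formed changes, and the likelihood-based standardization is untouched. The slopes, intercepts, factor scores, and responses below are made up, and the compensatory two-dimensional 2PL form is an assumption for the illustration.

```r
# Two-dimensional 2PL: P(u = 1) = plogis(a1 * theta1 + a2 * theta2 + d)
a <- rbind(c(1.2, 0.0),    # item 1 loads on factor 1 only
           c(0.9, 0.6),    # item 2 loads on both factors
           c(0.0, 1.4),    # item 3 loads on factor 2 only
           c(0.5, 1.0))    # item 4 loads on both factors
d     <- c(0.3, -0.2, 0.8, -0.5)
theta <- c(0.4, -1.0)      # one person's two factor scores
u     <- c(1, 0, 0, 1)     # that person's responses

p  <- as.vector(plogis(a %*% theta + d))        # model-implied P(u = 1) per item
l0 <- sum(u * log(p) + (1 - u) * log(1 - p))    # observed log-likelihood
mu <- sum(p * log(p) + (1 - p) * log(1 - p))    # expected log-likelihood
s2 <- sum(p * (1 - p) * log(p / (1 - p))^2)     # variance of the log-likelihood
zh <- (l0 - mu) / sqrt(s2)                      # still one number per person
print(zh)
```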

stte...@gmail.com

Jul 18, 2013, 7:16:51 PM
to mirt-p...@googlegroups.com, Amin Mousavi
So Zh is computed based on observed probabilities which come from the given model (which could be unidimensional or multidimensional), right?

Also, my data contain missing values, and SEs were not computed for the factor scores. Is that because of the missing values?

Amin.

Phil Chalmers

Jul 18, 2013, 7:26:26 PM
to Amin Mousavi, mirt-p...@googlegroups.com

On Jul 18, 2013 7:16 PM, <stte...@gmail.com> wrote:
>
> So Zh is computed based on observed probabilities which come from the given model (which could be unidimensional or multidimensional), right?

Correct.

>
> Also, my data contain missing values, and SEs were not computed for the factor scores. Is that because of the missing values?

No, you can still get them; just make sure that full.scores = FALSE. SEs will always exist for fscores.

Phil
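
A hedged sketch of the two calls being discussed, assuming a recent mirt version. The missing values are injected artificially into the LSAT7 example data, the object names are arbitrary, and the full.scores.SE argument belongs to newer releases, so it may not match the version in use when this thread was written.

```r
# Sketch only: needs the mirt package and is skipped if it is not installed.
if (requireNamespace("mirt", quietly = TRUE)) {
  library(mirt)
  dat <- expand.table(LSAT7)
  dat[1:20, 1] <- NA                    # inject some missing responses
  mod <- mirt(dat, 1, verbose = FALSE)

  # One row per unique response pattern, with scores and their SEs
  tab <- fscores(mod, full.scores = FALSE)

  # One row per respondent, with SEs requested explicitly
  full <- fscores(mod, full.scores = TRUE, full.scores.SE = TRUE)

  print(head(tab))
  print(head(full))
}
```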

stte...@gmail.com

Jul 18, 2013, 7:28:45 PM
to mirt-p...@googlegroups.com, Amin Mousavi
I asked for full.scores indeed. So it will compute SEs just for observed patterns, correct?

Amin.

Phil Chalmers

Jul 18, 2013, 7:30:53 PM
to Amin Mousavi, mirt-p...@googlegroups.com

Correct.


Aiden Loe

May 13, 2014, 6:16:52 AM
to mirt-p...@googlegroups.com
Hi Phil, 

From my understanding of the messages exchanged above, Z3 and Zh are standardised likelihood-based indices, right? So Zh is calculated from the response probabilities, the sample size, and the ability estimates, and it should range from, say, -4 to 4. I think I may have some very bad data, because some of my Zh scores seem to fall outside the standardised scale. How should I interpret results like these?

> person.mod6 <- personfit(mod6)
> item.mod6
         item         Zh
1  RB1bScored 10.8730920
2  RB1cScored -1.6810622
3  RB1dScored  9.9999813
4  RB2aScored  3.4320479
5  RB2cScored -3.2032104
6  RB2dScored  1.5675828
7  RB3aScored -0.9804712
8  RB3bScored  6.0381185
9  RB4dScored 14.1276472
10 AR1aScored -3.3738569
11 AR2bScored  8.7397001
12 AR2cScored -0.6957545
13 AR2dScored  5.4177734
14 AR3bScored  3.5591780
15 AR3cScored  0.5228221
16 AR4aScored  0.9424283
17 AR4bScored -0.6326235
18 AR4cScored -1.9859050
19 AT2aScored  1.7368824
20 AT2cScored  0.2603020
21 AT3aScored  1.5431050
22 AT4aScored  4.1992525
23 AT4bScored  3.1105507
24 AT4cScored -1.9673609
25 PO1cScored  1.9622907
26 PO2bScored -0.6684275
27 PO2cScored  1.6891004
28 PO2dScored 10.2308970
29 PO3aScored -0.9632781
30 PO4aScored  1.2909298

Kind regards,
Aiden

Phil Chalmers

May 13, 2014, 10:26:42 AM
to Aiden Loe, mirt-package
Hi Aiden,

I haven't found the Zh to behave all that well except under good conditions, so it could be off for a number of reasons (small sample size creating parameter uncertainty, poor model fit, etc). Perhaps a simulation study is in order to determine the optimal parameters for it to work since relatively little work has used it.

Phil 



Aiden loe

Jan 19, 2015, 2:18:43 PM
to mirt-p...@googlegroups.com, loeba...@gmail.com
Hi Phil, 

I hope you are well! 

I was wondering if any simulation studies regarding Zh to your knowledge have been conducted?

Kind regards,
Aiden

Phil Chalmers

Jan 19, 2015, 4:33:37 PM
to Aiden loe, mirt-package, Aiden Loe
Off the top of my head I can't think of any, but I'm sure there have been a few, probably in the early 80's. Might want to reverse search who's cited the original authors for a good guess at articles that have used the statistic in their work. Cheers.

Phil

Loe Bao Sheng

Jan 19, 2015, 5:00:16 PM
to Phil Chalmers, mirt-package
Hi Phil, 

Thanks for the quick response. Actually, I meant optimisation of Zh, which you suggested in our previous conversation. But I can look around. =) 

Also, I have been reading a couple of articles on person-fit indices. There is a personfit() function, but it produces Zh and not a p-value to suggest whether a person's responses could be aberrant. Based on the Zh values it produces, how should we interpret them?


Thanks,
Aiden

Phil Chalmers

Jan 19, 2015, 11:00:33 PM
to Loe Bao Sheng, mirt-package
There's an associated p-value for the Zh? I don't recall seeing that in the original work. If you can find the formula then please send it my way and I'll happily add it into the package. Otherwise, Zh seems to have a similar relationship to the infit/outfit statistics, only instead of deviation from 1.0 it deviates from 0 to indicate better or worse than expected fit. Cheers.

Phil

Phil Chalmers

Apr 24, 2015, 10:30:06 AM
to ANTONIO MARTINEZ PINEDA, mirt-package
The statistics are asymptotically normal, Zh ~ N(0,1), so they are really supposed to be equivalent to the usual z statistics. However, it turns out that they really aren't good z statistics unless the 'true' ability values are used, which is one of their main limitations (other fit statistics seem to be less affected by this property). Cheers.

Phil  

On Thu, Apr 23, 2015 at 6:04 PM, ANTONIO MARTINEZ PINEDA <antonio.mar...@gmail.com> wrote:
Dear Professor Chalmers:

I was searching for the correct interpretation of Zh and found this blog. I'd like to check my understanding: if an item has a value greater than 1, do we interpret it as too noisy, and an item with a value less than -1 as too deterministic?

Thanks in advance
Antonio
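
If one is willing to take the asymptotic N(0, 1) reference distribution at face value, a tail probability follows directly from the normal CDF. The sketch below uses made-up Zh values and is subject to the limitation noted in the reply: with estimated rather than true abilities these probabilities can be poorly calibrated, so they are a rough screen at best.

```r
zh <- c(-3.1, -1.2, 0.4, 2.5)       # made-up Zh values for four respondents

p_lower <- pnorm(zh)                # P(Z <= zh): small when fit is worse than expected
p_two   <- 2 * pnorm(-abs(zh))      # two-sided version, if large positive values also matter

print(data.frame(zh, p_lower, p_two))
```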

ANTONIO MARTINEZ PINEDA

Dec 3, 2015, 2:09:29 PM
to mirt-package
Ok, thank you very much.

I'll consider it and continue testing.

Regards

KwonHyun Kim

Dec 14, 2015, 11:22:44 AM
to mirt-package

As far as I know, Lz is not actually distributed as N(0, 1), so people came up with a corrected version, Lz* (I'm not sure what they call it).

You can easily find references for it, and there are also some tutorial papers, as far as I recall.

One thing that bothers me is how to interpret a high Lz. Some people think it's fine, and others seem to treat it as a sign of aberrant responding.

Conal Monaghan

May 11, 2016, 7:05:55 AM
to mirt-package
Hey, this is taken from Embretson 2000

"Large negative Z_L values (e.g., those two standard errors below 0.0) indicate misfit. Large positive Z_L values indicate response patterns that are higher in likelihood than the model predicts... In a real-world situation, researchers may consider, somewhat arbitrarily, only examinees with Z_L values above some threshold value, say Z_L > −2.0, as being scalable and as having interpretable trait level estimates. Alternatively, researchers may forgo the referencing of a person-fit statistic against a null distribution, and simply set aside a fixed proportion of bad person-fit scores for further scrutiny."

I hope this helps,
      Conal
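
The two screening rules in the quoted passage can be sketched in a few lines of base R. The Zh values here are simulated stand-ins (in practice they would come from the Zh column of a personfit() result), and the 5% proportion is an arbitrary choice for the illustration.

```r
set.seed(1)
zh <- rnorm(200)                     # stand-in for a vector of Zh values

# Rule 1: fixed threshold, flag respondents with Zh below -2
flag_cut <- zh < -2

# Rule 2: fixed proportion, flag the lowest 5% regardless of scale
cutoff    <- quantile(zh, 0.05)
flag_prop <- zh <= cutoff

print(c(by_threshold = sum(flag_cut), by_proportion = sum(flag_prop)))
```

The second rule does not rely on the statistic following a particular null distribution, which may be an advantage when the normal approximation is in doubt.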

Phil Chalmers

May 12, 2016, 10:06:56 PM
to Conal Monaghan, mirt-package
This statement is true, but only to the degree that the ability estimates are accurate (i.e., have little measurement error). If a larger amount of error exists (e.g., in dichotomous tests with fewer than 40 items, or in polytomous models with, say, fewer than 15 items) then the behaviour generally isn't as clear. Some corrections have been proposed and are available in the PerFit package, but I can't attest to how well they actually work in practice.

Phil
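
One way to see the effect of measurement error is a small simulation in base R. This is a sketch under simplifying assumptions, not a validated study design: item parameters are treated as known, the items follow a 2PL model with made-up parameter ranges, abilities are estimated by EAP on a fixed grid, and the function names (zh_dich, sim_zh) are hypothetical. It compares the spread of Zh when true abilities are plugged in with the spread when estimates are used, for a 10-item and a 40-item test.

```r
set.seed(123)

# Zh for a persons x items matrix U of 0/1 responses, given the matrix P of
# model-implied P(u = 1) evaluated at each person's ability value.
zh_dich <- function(U, P) {
  l0 <- rowSums(U * log(P) + (1 - U) * log(1 - P))
  mu <- rowSums(P * log(P) + (1 - P) * log(1 - P))
  s2 <- rowSums(P * (1 - P) * log(P / (1 - P))^2)
  (l0 - mu) / sqrt(s2)
}

sim_zh <- function(n_items, n_persons = 2000) {
  a <- runif(n_items, 0.8, 2)          # slopes, treated as known
  d <- rnorm(n_items)                  # intercepts, treated as known
  theta <- rnorm(n_persons)            # true abilities
  prob <- function(th) {
    plogis(outer(th, a) + matrix(d, length(th), n_items, byrow = TRUE))
  }
  U <- matrix(rbinom(n_persons * n_items, 1, prob(theta)), n_persons, n_items)

  # EAP estimates on a grid with a standard normal prior
  grid <- seq(-4, 4, length.out = 81)
  Pg   <- prob(grid)                                         # grid x items
  logL <- U %*% t(log(Pg)) + (1 - U) %*% t(log(1 - Pg))       # persons x grid
  post <- exp(logL) * matrix(dnorm(grid), n_persons, length(grid), byrow = TRUE)
  eap  <- as.vector(post %*% grid) / rowSums(post)

  c(sd_true = sd(zh_dich(U, prob(theta))),   # Zh at the true abilities
    sd_eap  = sd(zh_dich(U, prob(eap))))     # Zh at the estimated abilities
}

res <- rbind(items_10 = sim_zh(10), items_40 = sim_zh(40))
print(round(res, 2))
```

At the true abilities the standard deviation should sit near 1 by construction; how far the sd_eap column departs from 1, and whether the gap narrows with more items, is the quantity to inspect.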


Conal Monaghan

May 27, 2016, 9:17:24 PM
to mirt-package
To that end, what adjustments/considerations might we have to make for, say, a grm with say 6 items and 9 response categories? Or 10 items and 7 response categories?

Phil Chalmers

May 28, 2016, 10:54:58 AM
to Conal Monaghan, mirt-package
Depends on the measurement precision. Polytomous items are generally less affected by measurement imprecision than dichotomous items, so if your test is reasonably sized (say, 15 items with 5 response categories) it's likely a non-issue. However, the adjustments in PerFit likely apply to polytomous items as well (they just may not be supported yet).

Phil
