DTF effect size

Conal Monaghan

Sep 8, 2016, 11:04:11 PM

to mirt-package

Hi All,

The DTF() function is a fantastic way to quantify the size of differential test functioning (DTF) between subgroups (Chalmers, Counsell, & Flora, 2015). Although sDTF is easy to evaluate in terms of significance, how do we evaluate the size of the effect, especially for uDTF? As the method is new, I imagine the technique needs to be in use for a while before appropriate qualitative labels can be determined. At present, do we know what sDTF and uDTF values would be considered small, medium, large, etc.? To that end, how do we interpret the output and combine it into conclusions about the DTF present? I have included some examples below from DTF():

> DTF.Gender <- multipleGroup(data[,c(3:8)], model = 1, group = gender, verbose = FALSE)

$observed
sDTF.score sDTF(%).score uDTF.score uDTF(%).score
  6.205398     11.491478   6.205398     11.491478

$CIs
        sDTF.score sDTF(%).score uDTF.score uDTF(%).score
CI_97.5   6.759205      12.51705   6.759205      12.51705
CI_2.5    5.672016      10.50373   5.672016      10.50373

$tests
P(sDTF.score = 0)
     1.544532e-89

> DTF.Politics <- multipleGroup(data[,c(3:8)], model = 1, group = politics)

$observed
sDTF.score sDTF(%).score uDTF.score uDTF(%).score
  2.537339      4.698776   2.537339      4.698776

$CIs
        sDTF.score sDTF(%).score uDTF.score uDTF(%).score
CI_97.5   2.934905      5.435010   2.934906      5.435012
CI_2.5    2.153802      3.988522   2.153802      3.988522

$tests
P(sDTF.score = 0)
     2.050715e-34
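As a reading aid for the two outputs above: the percent columns appear to be the raw-score columns rescaled by the maximum possible total score. The sketch below assumes that relationship (percent = score / maximum score * 100) and back-calculates the implied maximum from the posted numbers; `implied_max` is a hypothetical helper, not part of mirt.

```r
# Values copied from the DTF() output posted above
gender   <- c(score = 6.205398, percent = 11.491478)
politics <- c(score = 2.537339, percent = 4.698776)

# Assumption: percent = score / max_score * 100, so
# max_score = score / (percent / 100)
implied_max <- function(x) unname(x["score"] / (x["percent"] / 100))

implied_max(gender)
implied_max(politics)
```

Both comparisons imply a maximum total score of about 54, so under this assumption an sDTF of 6.2 means the groups' expected total scores differ by roughly 11.5% of the score range, and 2.5 by roughly 4.7%.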

Kind Regards,

Conal Monaghan

Phil Chalmers

Sep 11, 2016, 1:53:01 PM

to Conal Monaghan, mirt-package

On Thu, Sep 8, 2016 at 11:04 PM, Conal Monaghan <conal.m...@gmail.com> wrote:

Hi All,
The DTF function is fantastic to quantify the effect size of the DTF between subgroups (Chalmers, Counsell, & Flora, 2015). Although the sDTF is easy to evaluate in terms of significance, how do we evaluate the size of the effect, especially for uDTF? As the method is new, I imagine the technique needs to be implemented for a while to determine appropriate qualitative labels. At the present point in time, do we know what sDTF and uDTF values would be considered small, medium, large, etc.?

I don't want to go into too much detail, as a chunk of this is actually part of my dissertation, but the concept of effect sizes is very much lacking in the IRT literature (and I argue, even for DIF). Currently my recommendation would be to plot the curves and their associated variability to know whether the differences are large at given \theta values (conditional tests) to help understand the marginal measures reported by sDTF and uDTF. There is a close connection to everything, and the DTF measures open a different avenue for this type of thinking, but again at the present I'd rather not go into it on a public forum.
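To make the conditional-versus-marginal distinction concrete, here is a minimal base-R sketch. It is not taken from mirt or from the dissertation: the two expected-score curves are invented, the maximum score of 54 is hypothetical, and the standard-normal weighting over theta is an assumption made only for illustration (DTF() may weight its integration grid differently).

```r
# Hypothetical expected total-score curves for two groups on a test with
# maximum score 54. These are invented, NOT estimated from any data.
expected_score <- function(theta, a, b, max_score = 54) {
  max_score * plogis(a * (theta - b))
}

theta <- seq(-6, 6, length.out = 1000)
T_ref <- expected_score(theta, a = 1.0, b = 0)
T_foc <- expected_score(theta, a = 1.5, b = 0)  # crosses T_ref at theta = 0

# Conditional view: the difference at each theta value
d_theta <- T_ref - T_foc

# Marginal view: average over theta (assumed standard-normal weights)
w <- dnorm(theta)
w <- w / sum(w)
sDTF <- sum(d_theta * w)       # signed: opposite-sign gaps cancel
uDTF <- sum(abs(d_theta) * w)  # unsigned: gaps accumulate

round(c(sDTF = sDTF, uDTF = uDTF), 3)

if (interactive()) {
  plot(theta, d_theta, type = "l",
       xlab = "theta", ylab = "Expected score difference")
  abline(h = 0, lty = 2)
}
```

Because these curves cross, the signed measure is essentially zero while the unsigned one is not, which is why looking at the curves helps. When the curves never cross, the two measures coincide, as in the output posted above.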

To that end, how do we interpret the output and combine it into conclusions about the DTF present? I have included some examples below from DTF():

Just a note here: be sure to include anchor items and freely estimate the hyperparameters of the focal groups. Otherwise, comparing expected response curves will give differences that do not account for overall group differences (see how the Wald and LR tests are set up, because the setup is identical). Cheers.
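A minimal sketch of that setup on simulated data, for readers who want something to adapt. The data, sample sizes, group labels, and the choice of items 1-5 as anchors are all hypothetical; the calls follow the pattern in the DTF() help page, so check them against your installed mirt version.

```r
library(mirt)

set.seed(1234)
n_items <- 10  # hypothetical test length
N <- 500       # hypothetical sample size per group

# Simulated dichotomous items; items 6-10 get lower intercepts in group G2,
# and G2 also has a higher latent mean
a <- matrix(1, n_items)
d <- matrix(rnorm(n_items), n_items)
d_g2 <- d
d_g2[6:10] <- d_g2[6:10] - 0.5

dat <- rbind(simdata(a, d, N, itemtype = "dich"),
             simdata(a, d_g2, N, itemtype = "dich", mu = 0.5))
group <- rep(c("G1", "G2"), each = N)

# Items 1-5 are anchors: slopes and intercepts constrained equal across groups
model <- "F = 1-10
          CONSTRAINB = (1-5, a1), (1-5, d)"

# Free the latent mean and variance so overall group differences are
# absorbed by the hyperparameters instead of showing up as DTF
mod <- multipleGroup(dat, model, group = group, SE = TRUE,
                     invariance = c("free_means", "free_var"),
                     verbose = FALSE)

dtf_point <- DTF(mod)             # point estimates only
dtf_ci <- DTF(mod, draws = 500)   # imputation-based CIs (needs SE = TRUE)
dtf_point
dtf_ci

# Signed differences across theta with their variability
DTF(mod, draws = 500, plot = "sDTF")
```

The anchors put both groups on a common metric, and the freed mean and variance let the latent distributions differ; what remains in sDTF and uDTF is then attributable to the non-anchor items.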

Phil


Brenton Wiernik

Sep 26, 2016, 7:40:11 PM

to mirt-package

Also look into Chris Nye's dissertation. He did a series of simulations to set preliminary thresholds for DIF/DTF effect sizes.

Phil Chalmers

Sep 26, 2016, 7:43:55 PM

to Brenton Wiernik, mirt-package

Thanks. Could you post a link to it? Might be interesting to check out.

Phil

Brenton Wiernik

Sep 26, 2016, 7:46:18 PM

to mirt-package

https://www.ideals.illinois.edu/bitstream/handle/2142/26174/Nye_Christopher.pdf?sequence=1

Garett Howardson

Sep 27, 2016, 6:38:22 AM

to mirt-p...@googlegroups.com, quill...@gmail.com

Here's a published example of the Nye effect size if that helps:

https://www.researchgate.net/profile/Christopher_Nye/publication/50998374_Effect_Size_Indices_for_Analyses_of_Measurement_Equivalence_Understanding_the_Practical_Importance_of_Differences_Between_Groups/links/550859a20cf26ff55f816638.pdf

Conal Monaghan

Jul 26, 2017, 5:34:49 AM

to mirt-package, quill...@gmail.com

Would this mean that at least one of each parameter (e.g., a1 and d1-6, for polytomous items) must be constrained for DIF/DTF, and therefore for the respective effect sizes? When, for example, all parameters for one item are constrained, the parameters will theoretically be on the same scale across groups, and the hyper-parameters can then be freed in multipleGroup().

This might create issues when there is systematic DIF across groups, such as when all items are more difficult for one group than another.

Brenton Wiernik

Jul 26, 2017, 7:43:35 AM

to Conal Monaghan, mirt-package

One of the basic assumptions of any DIF analysis is that you have at least one anchor item that is non-DIF. Kopf recently wrote a review of different approaches to identifying a usable anchor item: http://journals.sagepub.com/doi/abs/10.1177/0013164414529792

Conal Monaghan

Jan 22, 2018, 7:09:34 PM

to mirt-package

Thanks for the paper and the info!

Would this mean that even standard DIF analyses require one anchor item? For example, I am currently investigating DTF between males and females, and all items show DIF. I have therefore not specified any anchors, as doing so seems inappropriate. Requiring an anchor also means that you cannot assess DIF for one item in each analysis (annoying if you want to see how each item performs). Forcing an anchor introduces problems when all items have DIF in the same direction (the scale is internally consistent), so that the effect is the same across items. In that case there are no good anchor candidates, and constraining one item to be equal potentially yields incorrect parameter estimates.

However, I can see the importance of constraining items to provide metric / scale equivalence. Is it possible to run DIF/DTF without an anchor?

Cheers,

Conal

Brenton Wiernik

Jan 22, 2018, 7:19:11 PM

to Conal Monaghan, mirt-package

No, the item parameters cannot be compared (i.e., you can’t determine if they have DIF or not) if you don’t have an anchor item because there is nothing to constrain the two samples to have the same metric for the theta scales.

I’m not sure what you mean by “standard DIF analyses” or how you were determining that all items show DIF without an anchor item. Even non-IRT DIF methods (e.g., the Mantel-Haenszel test) that use sum-score based comparisons rely on the assumption that something in the test is comparable across groups. If that is not the case, you can’t interpret parameter differences. I’m also not sure what you mean by “internally consistent” here.
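The metric problem described above can be illustrated numerically. The base-R sketch below assumes a Rasch-type model with invented item difficulties; `marginal_p` is a hypothetical helper. It compares a focal group in which every item is 0.5 harder against a focal group with no DIF whose latent mean is 0.5 lower.

```r
# Marginal probability of a correct response to a Rasch-type item,
# integrating over a normal latent distribution with mean mu
marginal_p <- function(b, mu, sigma = 1) {
  integrate(function(theta) plogis(theta - b) * dnorm(theta, mu, sigma),
            lower = -Inf, upper = Inf)$value
}

b <- c(-1, 0, 0.5, 1.2)  # hypothetical item difficulties
shift <- 0.5

# Scenario A: every item is 0.5 harder for the focal group, equal latent means
p_all_dif <- sapply(b + shift, marginal_p, mu = 0)

# Scenario B: no DIF at all, but the focal group's latent mean is 0.5 lower
p_mean_shift <- sapply(b, marginal_p, mu = -shift)

rbind(p_all_dif, p_mean_shift)
all.equal(p_all_dif, p_mean_shift, tolerance = 1e-4)
```

The two scenarios should agree up to numerical integration error, so the data alone cannot distinguish "all items show the same DIF" from "the groups differ on the trait". An anchor item (or some equivalent assumption) is what picks one of these descriptions.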
