Modelling the effect of individual covariates in an IRT model

Michele

Nov 13, 2018, 7:37:16 AM
to lavaan

Hi all,

I have been reading about the relation between factor analysis and IRT and, in particular, about ways to fit IRT models in lavaan. I guess this is more a general modelling question than a question about how to code this in lavaan.

I'm interested in estimating a two-parameter logistic IRT model. In addition to the item parameters (difficulty and discrimination), I would also like to control for individual characteristics in the logit link function, to capture the fact that the probability of answering each item correctly might depend on some characteristic of the respondent (say, age). The items of my test are all assumed to measure the same underlying latent variable (let's say this is language ability and call it Lang).

In lavaan, I am using the following code (which I believe uses a probit link, so I convert the parameters to the logit case in my code). In the code below, I am not controlling for age in any way:

# Model definition

twoP.model<-'

Theta =~ l1*item_1 + l2*item_2 + l3*item_3+ l4*item_4  + l5*item_5

# thresholds

item_1 | th1*t1

item_2 | th2*t1

item_3 | th3*t1

item_4 | th4*t1

item_5 | th5*t1

# convert loadings to discrimination parameters (logistic, scaling constant 1.7)

discr1.L := (l1)/sqrt(1-l1^2)*1.7

discr2.L := (l2)/sqrt(1-l2^2)*1.7

discr3.L := (l3)/sqrt(1-l3^2)*1.7

discr4.L := (l4)/sqrt(1-l4^2)*1.7

discr5.L := (l5)/sqrt(1-l5^2)*1.7

# convert thresholds to difficulty parameter (logistic)

diff1.L := th1/l1

diff2.L := th2/l2

diff3.L := th3/l3

diff4.L := th4/l4

diff5.L := th5/l5

'

# Model estimation:

twoP.fit <- cfa(twoP.model, data=data.frame(data_factor_model_age),  std.lv=TRUE, ordered=c("item_1","item_2", "item_3","item_4", "item_5"))

summary(twoP.fit, standardized=TRUE)
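The `:=` lines in the model above apply the usual mapping from standardized probit factor-analysis estimates to 2PL parameters. As a rough sketch, the same arithmetic can be written as a plain R function. The name `probit_to_2pl` is hypothetical (it is not a lavaan function), the input values are made up for illustration, and the formula assumes a unit-variance factor and unit-variance latent responses, so that each residual variance is 1 - lambda^2.

```r
# Hypothetical helper, not part of lavaan: converts standardized probit
# factor-analysis estimates (loading, threshold) to 2PL parameters.
# Assumption: unit-variance factor and unit-variance latent responses,
# so the residual variance of each item is 1 - lambda^2.
probit_to_2pl <- function(lambda, tau, D = 1.7) {
  stopifnot(all(abs(lambda) < 1), all(lambda != 0))
  data.frame(
    # D = 1.7 rescales the probit slope to the logistic metric
    discrimination = D * lambda / sqrt(1 - lambda^2),
    # the threshold divided by the loading gives the difficulty
    difficulty = tau / lambda
  )
}

# Illustrative values only, not estimates from this thread
probit_to_2pl(lambda = c(0.6, 0.8), tau = c(0.3, -0.4))
```

Only the discrimination is multiplied by 1.7. The difficulty is a location on the latent scale, so it is the same under the probit and logistic metrics.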

My question is: how should I think of the effect of age in this type of analysis? In particular:

a. should I think of my latent variable as being a function of age and write something like Cog ~ age,

b. or instead that each individual item in the test is a function of age and write:

item_1 ~ age

item_2 ~ age

item_3 ~ age etc. ?

And once I include age, how can I convert the parameter estimates into discrimination and difficulty parameters?

More generally, if anyone could point me to some references that describe how to think about the effect of background variables in the SEM literature (in particular for the measurement model), that would be great.

Thanks!
Michele

Michele

Nov 13, 2018, 8:54:04 AM
to lavaan
Sorry, "Cog" should read "Lang" or, to be consistent with the code, "Theta".

Thanks,
Michele

Mauricio Garnier-Villarreal

Nov 13, 2018, 1:15:36 PM
to lavaan
Michele

a. should I think of my latent variable as being a function of age and write something like Cog ~ age,

b. or instead that each individual item in the test is a function of age and write:

You could do either. Each of these asks a different research question.

- The first one asks about the regression of the latent factor on age.

- The second asks about the regression of each item on age, above and beyond what the items share with each other through the latent factor.

- The second one could also be specified to test for Differential Item Functioning (DIF), in what would be a MIMIC model.
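The two options, and a MIMIC-style DIF check, could be written in lavaan syntax roughly as follows. This is a sketch under assumptions: the item names follow the original post, `age` is assumed to be a numeric column in the data, and `fit_binary` is a hypothetical wrapper that is defined here but not called. The DIF variant tests one studied item at a time, since adding direct effects on all items together with `Theta ~ age` would not be identified.

```r
# Item names follow the original post; `age` is an assumed column name.
items <- paste0("item_", 1:5)
measurement <- paste("Theta =~", paste(items, collapse = " + "))

# (a) age predicts the latent variable
model_a <- paste(measurement, "Theta ~ age", sep = "\n")

# (b) age predicts every item directly, with no path from age to Theta
model_b <- paste(c(measurement, paste(items, "~ age")), collapse = "\n")

# MIMIC-style DIF check: age affects Theta, plus a direct effect
# on a single studied item (here item_1)
model_dif <- paste(model_a, "item_1 ~ age", sep = "\n")

# Hypothetical wrapper: needs the lavaan package only when it is called.
fit_binary <- function(model, data) {
  lavaan::sem(model, data = data, ordered = items, std.lv = TRUE)
}

# Example call, assuming the data frame from the original post:
# fit_a <- fit_binary(model_a, data_factor_model_age)
cat(model_dif)
```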

You can look at Rex Kline, Principles and Practice of Structural Equation Modeling; or Rick Hoyle, Handbook of Structural Equation Modeling.

Michele

Nov 14, 2018, 11:34:38 AM
to lavaan
Hi Mauricio,

Thank you very much for your reply, and thanks for the references, they are very helpful!

One additional question (now more related to how to do things in lavaan):

1. Once I control for age (say that I posit that the factor is a function of age), how would I recover the discrimination and difficulty parameters from the estimates? In particular, would the formulae below still be valid?

discr1.L := (l1)/sqrt(1-l1^2)*1.7

and

diff1.L := th1/l1

And how would this change if I included age in the link function instead, and not in the regression for the factor? Should I adjust these formulae in that case?

Thank you!

Michele

Mauricio Garnier-Villarreal

Nov 14, 2018, 3:17:28 PM
to lavaan
Michele

You wouldn't need to change those formulas, because you are not changing the estimates or estimating any new parameters. The := operator defines new quantities that are functions of other parameters, so the := lines add no free parameters or constraints to the model. They are just the probit-link parameters transformed to the logit link.

If you specify Cog ~ age, this is a linear regression, because the latent factor is assumed to be continuous. If you specify item_1 ~ age, these are probit regressions, because the items are binary and lavaan only uses the probit link (so far).
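Written out, the two kinds of regression discussed here look roughly like this. The symbols γ, β_j, ζ, ε and σ_j are notation introduced for this sketch and do not appear in the thread. Option (a) corresponds to β_j = 0 for all items, and option (b) to γ = 0.

```latex
\begin{aligned}
\theta_i &= \gamma\,\mathrm{age}_i + \zeta_i, \\
y^*_{ij} &= \lambda_j \theta_i + \beta_j\,\mathrm{age}_i + \varepsilon_{ij},
  \qquad y_{ij} = 1 \iff y^*_{ij} > \tau_j, \\
P(y_{ij} = 1 \mid \theta_i, \mathrm{age}_i)
  &= \Phi\!\left(\frac{\lambda_j \theta_i + \beta_j\,\mathrm{age}_i - \tau_j}{\sigma_j}\right),
  \qquad \sigma_j^2 = \operatorname{Var}(\varepsilon_{ij}).
\end{aligned}
```

The conversion formulas in the thread take σ_j = sqrt(1 - λ_j^2), which presumes a unit-variance factor and unit-variance latent responses. Whether that scaling still holds once age enters the model depends on how the latent responses are standardized, so it is probably worth checking before reusing the formulas unchanged.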

Michele

Nov 15, 2018, 8:08:26 AM
to lavaan
Hi Mauricio,

Thanks, I see what you are saying.

However, if I include age in the probit regressions, the constants (thresholds) would be interpreted differently from the case where I do not include age. So interpreting the constant as (approximately) a difficulty is no longer straightforward.
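One way to see this point: if an item has a direct probit effect of age, the value of the latent variable at which a correct answer has probability 0.5 depends on age, so the "difficulty" becomes age-specific. The sketch below assumes the latent-response form item* = lambda * Theta + beta * age + error, with a correct answer when item* exceeds the threshold tau. The helper `difficulty_given_age` is hypothetical and the numbers are invented for illustration.

```r
# Hypothetical helper: age-specific difficulty for an item with a direct
# probit effect of age.
# Assumed model: item* = lambda * Theta + beta * age + error, item = 1 if item* > tau.
# P(item = 1) = 0.5 where lambda * Theta + beta * age = tau, which gives:
difficulty_given_age <- function(tau, lambda, beta, age) {
  (tau - beta * age) / lambda
}

# Illustrative values only, not estimates from this thread
difficulty_given_age(tau = 0.3, lambda = 0.6, beta = 0.02, age = c(0, 10))
```

Under these assumptions tau / lambda is the difficulty for a respondent with age = 0, so centering age at its mean would make tau / lambda the difficulty at the average age.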

Thank you.
Michele