Hi all,
I have been reading about the relation between factor analysis and IRT and, in particular, about ways to fit IRT models in lavaan. I guess this is more a general modelling question than a question about how to code this in lavaan.
I’m interested in estimating a two-parameter logistic IRT model. In addition to the item parameters (difficulty and discrimination), I would also like to control for individual characteristics in the logit link function, to capture the fact that the probability of answering each item correctly might depend on some characteristic of the respondent (say, age). The items of my test are all assumed to measure the same underlying latent variable (let’s say this is language ability and call it “Lang”; it is named Theta in the code below).
In lavaan, I am using the following code (which I believe uses a probit link, so I convert the parameters to the logistic metric in my code). The code below does not control for age in any way:
# Model definition
twoP.model<-'
# loadings
Theta =~ l1*item_1 + l2*item_2 + l3*item_3+ l4*item_4 + l5*item_5
# thresholds
item_1 | th1*t1
item_2 | th2*t1
item_3 | th3*t1
item_4 | th4*t1
item_5 | th5*t1
# convert loadings to discrimination parameter (logistic)
discr1.L := (l1)/sqrt(1-l1^2)*1.7
discr2.L := (l2)/sqrt(1-l2^2)*1.7
discr3.L := (l3)/sqrt(1-l3^2)*1.7
discr4.L := (l4)/sqrt(1-l4^2)*1.7
discr5.L := (l5)/sqrt(1-l5^2)*1.7
# convert thresholds to difficulty parameter (logistic)
diff1.L := th1/l1
diff2.L := th2/l2
diff3.L := th3/l3
diff4.L := th4/l4
diff5.L := th5/l5
'
# Model estimation:
twoP.fit <- cfa(twoP.model, data=data.frame(data_factor_model_age), std.lv=TRUE, ordered=c("item_1","item_2", "item_3","item_4", "item_5"))
summary(twoP.fit, standardized=TRUE)
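For reference, a sketch of where the conversion lines in the model above come from. It assumes lavaan's default delta parameterization (each latent response has total variance 1) together with std.lv=TRUE (Theta has variance 1). The symbols \lambda_j, \tau_j, a_j and b_j are notation introduced here for l_j, th_j, discr_j.L and diff_j.L:

```latex
\begin{align*}
y_j^* &= \lambda_j \theta + \varepsilon_j, \qquad
  \operatorname{Var}(\varepsilon_j) = 1 - \lambda_j^2, \qquad
  y_j = 1 \iff y_j^* > \tau_j \\
P(y_j = 1 \mid \theta)
  &= \Phi\!\left(\frac{\lambda_j \theta - \tau_j}{\sqrt{1 - \lambda_j^2}}\right)
   = \Phi\!\big(a_j^{*}(\theta - b_j)\big), \qquad
  a_j^{*} = \frac{\lambda_j}{\sqrt{1 - \lambda_j^2}}, \quad
  b_j = \frac{\tau_j}{\lambda_j} \\
P(y_j = 1 \mid \theta)
  &\approx \frac{1}{1 + \exp\{-a_j(\theta - b_j)\}}, \qquad
  a_j = 1.7\, a_j^{*}
\end{align*}
```

The factor 1.7 only rescales the discrimination to the logistic metric; the difficulty b_j is the same in the probit and logistic metrics.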
My question is: how should I think of the effect of age in this type of analysis? In particular:
a. should I think of my latent variable as being a function of age and write something like Theta ~ age,
b. or should I instead think of each individual item in the test as being a function of age and write:
item_1 ~ age
item_2 ~ age
item_3 ~ age, etc.?
And once I include age, how can I convert the parameter estimates into discrimination and difficulty parameters?
More generally, if anyone could point me to some references that describe how to think about the effect of “background” variables in the SEM literature (in particular for the measurement model), that would be great.
Thanks!
Michele
a. should I think of my latent variable as being a function of age and write something like Theta ~ age,
b. or instead that each individual item in the test is a function of age and write:
You could do either. Each of these asks a different research question.
- The first one asks about the regression of the latent factor on age, that is, whether age predicts the latent variable.
- The second asks about the regression of each item on age, over and above what the items share with each other through the latent factor.
- The second one can also be specified to test for Differential Item Functioning (DIF), in what is known as a MIMIC (multiple indicators, multiple causes) model.
For references, you can look at Rex Kline, Principles and Practice of Structural Equation Modeling, or Rick Hoyle, Handbook of Structural Equation Modeling.
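As a rough illustration of the options above, here is a minimal sketch in lavaan syntax. It assumes that age is a numeric column in the same data frame as the items; the labels g and b1 to b5 are hypothetical names chosen here for the age coefficients, and the fitting calls only run if lavaan and the data frame from the original post are available.

```r
# Sketch only: `age` is assumed to be a numeric column in the same data frame;
# the labels g and b1..b5 are hypothetical names for the age coefficients.
items <- paste0("item_", 1:5)

# Measurement part shared by all three variants
measurement <- '
  Theta =~ l1*item_1 + l2*item_2 + l3*item_3 + l4*item_4 + l5*item_5
  item_1 | th1*t1
  item_2 | th2*t1
  item_3 | th3*t1
  item_4 | th4*t1
  item_5 | th5*t1
'

# (a) age predicts the latent variable
model_a <- paste(measurement, "Theta ~ g*age", sep = "\n")

# (b) age has a direct effect on every item; Theta itself is not regressed on age
direct <- paste0(items, " ~ b", 1:5, "*age", collapse = "\n")
model_b <- paste(measurement, direct, sep = "\n")

# MIMIC check for DIF on one studied item (item_1 here):
# age affects Theta, plus one direct effect on the studied item
model_dif <- paste(measurement, "Theta ~ g*age", "item_1 ~ b1*age", sep = "\n")

# Fit only when lavaan and the poster's data frame are available
if (requireNamespace("lavaan", quietly = TRUE) && exists("data_factor_model_age")) {
  dat <- data.frame(data_factor_model_age)
  fit_a   <- lavaan::sem(model_a,   data = dat, std.lv = TRUE, ordered = items)
  fit_b   <- lavaan::sem(model_b,   data = dat, std.lv = TRUE, ordered = items)
  fit_dif <- lavaan::sem(model_dif, data = dat, std.lv = TRUE, ordered = items)
  print(lavaan::parameterEstimates(fit_dif))
}
```

In the DIF variant, the direct effects cannot all be freed at the same time as Theta ~ age, because that model is not identified; the usual approach is to test one item (or a small set of items) at a time, treating the remaining items as anchors.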
discr1.L := (l1)/sqrt(1-l1^2)*1.7
and
diff1.L := th1/l1
And how would this change if I included age in the link function rather than in the regression for the factor? Should I adjust these formulae in that case?
Thank you!
Michele
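On this last question, a sketch under stated assumptions: suppose age enters the item equations directly (item_j ~ age) with a coefficient written here as \beta_j, and let \sigma_j be the residual standard deviation of the latent response given Theta and age. Both symbols are notation introduced for this sketch, and the value of \sigma_j depends on the parameterization used, so the result is worth checking against the fitted model.

```latex
\begin{align*}
y_j^* &= \lambda_j \theta + \beta_j\,\mathrm{age} + \varepsilon_j, \qquad
  \operatorname{Var}(\varepsilon_j) = \sigma_j^2 \\
P(y_j = 1 \mid \theta, \mathrm{age})
  &= \Phi\!\left(\frac{\lambda_j \theta + \beta_j\,\mathrm{age} - \tau_j}{\sigma_j}\right)
   = \Phi\!\big(a_j^{*}\,(\theta - b_j(\mathrm{age}))\big) \\
a_j^{*} &= \frac{\lambda_j}{\sigma_j}, \qquad
  b_j(\mathrm{age}) = \frac{\tau_j - \beta_j\,\mathrm{age}}{\lambda_j}
\end{align*}
```

Under these assumptions the discrimination keeps its form (multiplied by 1.7 for the logistic metric, as before), while the difficulty becomes a function of age: th1/l1 is then the difficulty at age = 0, so centering age makes it the difficulty at the mean age. With parameterization = "theta" the residual variances are fixed to 1, so \sigma_j = 1 and a_j^{*} = \lambda_j. If age instead enters only through the regression for the factor, the item response function given Theta has no age term, and age shifts the distribution of Theta rather than the item difficulties.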