Estimating an Item Response Model (2PL)


koenig...@gmail.com

Jan 14, 2019, 10:00:21 AM
to lavaan
Dear All,

Currently I am working on a simulation about item parameter recovery in small-sample IRT modeling. The model under investigation is a 2PL model, and a reviewer suggested including limited-information methods based on tetrachoric correlations, as implemented in lavaan and Mplus.

I specified the model as follows (a unidimensional example with 10 items):

mod <-'
# loadings
Theta =~ l1*Q1 + l2*Q2 + l3*Q3 + l4*Q4 + l5*Q5 +
         l6*Q6 + l7*Q7 + l8*Q8 + l9*Q9 + l10*Q10

# thresholds (equivalent to tau of the polychoric/tetrachoric-fct.)
Q1 | th1*t1
Q2 | th2*t1
Q3 | th3*t1
Q4 | th4*t1
Q5 | th5*t1
Q6 | th6*t1
Q7 | th7*t1
Q8 | th8*t1
Q9 | th9*t1
Q10 | th10*t1

#discriminations (2PL)
alpha1 := l1/sqrt(1-l1^2) * 1.702
alpha2 := l2/sqrt(1-l2^2) * 1.702
alpha3 := l3/sqrt(1-l3^2) * 1.702
alpha4 := l4/sqrt(1-l4^2) * 1.702
alpha5 := l5/sqrt(1-l5^2) * 1.702
alpha6 := l6/sqrt(1-l6^2) * 1.702
alpha7 := l7/sqrt(1-l7^2) * 1.702
alpha8 := l8/sqrt(1-l8^2) * 1.702
alpha9 := l9/sqrt(1-l9^2) * 1.702
alpha10 := l10/sqrt(1-l10^2) * 1.702

#difficulties (2PL)
beta1 := th1/l1
beta2 := th2/l2
beta3 := th3/l3
beta4 := th4/l4
beta5 := th5/l5
beta6 := th6/l6
beta7 := th7/l7
beta8 := th8/l8
beta9 := th9/l9
beta10 := th10/l10
'

I am running the replications with the following specifications:

fit <- cfa(mod, data=dat, std.lv = TRUE, ordered=c(paste0("Q",1:10)), estimator="WLSMV", parameterization = "delta")
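
For reference, one way to pull the converted IRT parameters out of the fitted object in each replication is via the := rows of parameterEstimates(); a minimal sketch, assuming the fit object above:

# extract the user-defined (:=) parameters, i.e. the discriminations and difficulties
PE <- parameterEstimates(fit)
irt.pars <- PE[PE$op == ":=", c("label", "est", "se")]
irt.pars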

Is this currently the correct specification to run the model? Are there any tweaks?

I browsed this group a bit and found some entries where this model is estimated with the 'theta' parameterization. Does the conversion of the factor loadings to discrimination and difficulties change?

Thank you very much!

Best wishes,
Christoph

Terrence Jorgensen

Jan 15, 2019, 7:53:08 AM
to lavaan
Is this currently the correct specification to run the model? Are there any tweaks?

The specification is correct, but the user-defined IRT parameters might not be correct because lavaan uses a probit (normal ogive) rather than logistic link.  This might help:


with the 'theta' parameterization. Does the conversion of the factor loadings to discrimination and difficulties change?

I'm sure it would.  Under the delta parameterization, the total variance of the latent normal item-responses is fixed to 1 (so residual variances < 1).  Under the theta parameterization, the residual variance of the latent normal item-responses is fixed to 1 (so total variances > 1).  
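
For item 1, a sketch of how the user-defined parameters could look under parameterization = "theta" (with std.lv = TRUE), following the Kamata & Bauer conversion formulas; treat this as an assumption to verify against that source. Because the residual variance of each latent response is fixed to 1, the loading is already the probit slope, so the 1/sqrt(1 - l^2) rescaling drops out:

# sketch: 2PL conversions under the theta parameterization (item 1 only;
# the remaining items are analogous)
mod.theta <- '
Theta =~ l1*Q1 + l2*Q2 + l3*Q3 + l4*Q4 + l5*Q5 +
         l6*Q6 + l7*Q7 + l8*Q8 + l9*Q9 + l10*Q10

Q1 | th1*t1   # thresholds for Q2-Q10 as in the original model

alpha1 := l1 * 1.702   # logistic discrimination: the loading is the probit slope
beta1  := th1 / l1     # difficulty: same form as under delta
'
fit.theta <- cfa(mod.theta, data = dat, std.lv = TRUE,
                 ordered = paste0("Q", 1:10), estimator = "WLSMV",
                 parameterization = "theta")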


Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam



koenig...@gmail.com

Jan 16, 2019, 2:54:43 AM
to lavaan

Dear Terrence,

thank you for your quick response!

On Tuesday, January 15, 2019 at 1:53:08 PM UTC+1, Terrence Jorgensen wrote:

The specification is correct, but the user-defined IRT parameters might not be correct because lavaan uses a probit (normal ogive) rather than logistic link.  This might help:



Yeah, it took me a while to wrap my head around all those different parameterizations and conversion formulas. When setting up the conversions in my model, I followed the code snippet from Beaujean.
For example, for one discrimination and difficulty parameter:

loading to slope (normal):         alpha1.N := l1 / sqrt(1 - l1^2)
loading to slope (logistic):       alpha1.L := l1 / sqrt(1 - l1^2) * 1.7

threshold to intercept (normal):   beta1.N := -th1 / sqrt(1 - l1^2)
threshold to intercept (logistic): beta1.L := -alpha1.L * loc1.L

where loc1.L is the threshold-to-location conversion (logistic): loc1.L := th1 / l1

I compared the estimates of my user-defined parameters with ltm, mirt and Stan, and they check out (although I don't find any reference for the last formula).

Another question: the reviewer mentioned that I should also look at Mplus. If I specify mimic = "Mplus", to what degree are the results obtained in lavaan similar to those I would obtain if I estimated the model in Mplus? I'd like to avoid setting up the simulation in Mplus.
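
For reference, a sketch of the same call with that option added (mimic = "Mplus" is an existing lavaan option; how closely it reproduces actual Mplus output is exactly the open question here):

# sketch: same model, with lavaan asked to mimic Mplus defaults
fit.mplus <- cfa(mod, data = dat, std.lv = TRUE,
                 ordered = paste0("Q", 1:10), estimator = "WLSMV",
                 parameterization = "delta", mimic = "Mplus")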

Thanks!

Best wishes
Christoph

koenig...@gmail.com

Jan 16, 2019, 8:02:28 AM
to lavaan

Okay, wait a second.

Starting with Table 2 from Kamata and Bauer (Conversion Formulas for Four Factor Analysis Parameterizations) and the documentation of the irt.fa function (psych-package):

1. with parameterization = "delta", the results are similar to the results obtained with the 'standardized and conditional' parameterization

  # Normal results  ("Standardized and conditional") (from Akihito Kamata )  
#item          discrim   tau
#  1           0.3848    -1.4325  
#  2           0.3976    -0.5505
#  3           0.4733    -0.1332
#  4           0.3749    -0.7159
#  5           0.3377    -1.1264

 
# Normal results (lavaan, parameterization = "delta")                                      
#  1           0.389     -1.433    
#  2           0.397     -0.550    
#  3           0.471     -0.133    
#  4           0.377     -0.716    
#  5           0.342     -1.126

2. with parameterization = "theta", the results are similar to the results obtained with the 'standardized and marginal' parameterization

  # Normal results  ("Standardized and Marginal")    (from Akihito Kamata )      
#Item       discrim             tau
#  1       0.4169             -1.5520  
#  2       0.4333             -0.5999
#  3       0.5373             -0.1512
#  4       0.4044             -0.7723  
#  5       0.3587             -1.1966

 
# Normal results (lavaan, parameterization = "theta")
#  1       0.423              -1.555      
#  2       0.433              -0.600
#  3       0.534              -0.151    
#  4       0.407              -0.773    
#  5       0.364              -1.199  

Hence, the default estimation in lavaan is 'standardized and marginal'. Is this correct?

Thanks!

Best wishes
Christoph

Terrence Jorgensen

Jan 16, 2019, 6:17:56 PM
to lavaan
The default parameterization is delta

?lavOptions

koenig...@gmail.com

Jan 17, 2019, 5:35:32 AM
to lavaan

Dear Terrence,

thanks again!

One last question: in my simulation I deal with very small sample sizes (N < 200) and two test lengths (k = 25, 50). When using WLSMV or ULSMV estimation, many replications at the smallest sample sizes yield negative factor loadings for either some or all items. Converting these factor loadings consequently yields negative item discriminations, which does not make sense. The solutions with negative factor loadings are not related to solutions with either negative residual variances or NPD covariance matrices. The number of replications with negative factor loadings slowly decreases with increasing sample size.

Is this a symptom of the performance of WLSMV and ULSMV estimation in small samples?
My first reaction would be to eliminate solutions with negative factor loadings as inadmissible solutions. Would this be correct?

Thank you (and sorry for using the forum as my personal dashboard)!

Best wishes
Christoph

Mauricio Garnier-Villarreal

Jan 17, 2019, 1:10:25 PM
to lavaan

Christoph

This is one of the main differences between WLS and IRT. In WLS a negative factor loading is perfectly fine, there is no constraint against it, while in IRT all the discriminations have to be positive. In IRT this is handled by reverse-coding the items that need it before the analysis. In my experience I have found more Heywood cases (negative residual variances) in WLS with smaller sample sizes.

This may be related to how you are simulating the data. Can you share your data simulation code?

koenig...@gmail.com

Jan 18, 2019, 3:04:31 AM
to lavaan

Dear Mauricio,

thanks for your response! I think I know what you mean:
I should not simulate the responses under a 2PL IRT model via the item parameters, but rather under a CFA model via the factor loadings, correct? Only in the latter case do I have control over the sign of the factor loadings. Hence, I simulate response data following a CFA model and convert the factor loadings to item parameters directly in the simulation code.

See my current data simulation below (it's a bit different, but the outcome is the same):

twopl.sim <- function(nitem = 20, npers = 100) {
  # item locations (difficulties), person locations, item slopes (discriminations)
  i.loc <- rnorm(nitem)
  p.loc <- rnorm(npers)
  i.slp <- rlnorm(nitem, sdlog = .4)

  # persons in rows, items in columns
  temp   <- matrix(rep(p.loc, length(i.loc)), ncol = length(i.loc))
  logits <- t(apply(temp,   1, '-', i.loc))   # theta - b
  logits <- t(apply(logits, 1, '*', i.slp))   # a * (theta - b)

  probabilities <- 1 / (1 + exp(-logits))

  resp.prob <- matrix(probabilities, ncol = nitem)
  obs.resp  <- matrix(sapply(c(resp.prob), rbinom, n = 1, size = 1),
                      ncol = length(i.loc))

  # return the generating parameters together with the binary responses
  list(i.loc = i.loc, i.slp = i.slp, p.loc = p.loc, resp = obs.resp)
}

Do you happen to know an R package which allows me to simulate binary CFA data (or do you have some code snippets available)?

Thank you very much!

Best wishes
Christoph

Mauricio Garnier-Villarreal

Jan 18, 2019, 2:01:23 PM
to lavaan
Christoph

Not at all; when I work on this I actually simulate the data in the IRT metric instead of the CFA metric. For CFA, you can simulate binary data with lavaan.
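
A minimal sketch (the population loadings and thresholds below are arbitrary; simulateData() should cut the latent responses at the specified thresholds to produce binary items, but verify against ?simulateData):

library(lavaan)

# sketch: population model with fixed loadings and thresholds for 5 binary items
pop.mod <- '
Theta =~ 0.7*Q1 + 0.6*Q2 + 0.5*Q3 + 0.8*Q4 + 0.6*Q5
Q1 | -0.5*t1
Q2 |  0.0*t1
Q3 |  0.3*t1
Q4 | -0.2*t1
Q5 |  0.5*t1
'
dat.sim <- simulateData(pop.mod, sample.nobs = 200)
head(dat.sim)  # binary indicators Q1-Q5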

I do see an issue with your IRT simulation code. When you are calculating the probabilities, you are not including a latent variable (theta); because of this, the probabilities of the items being 1 are independent, which is probably causing the issue of some items having negative factor loadings.

Here is the code I use to simulate IRT data; it includes factor scores (mean = 0, sd = 1), which should fix the issue.

Note this code is set up for a 3PL; by setting the guessing parameter (c) to 0, it simulates a 2PL.


n <- 20 # number of items
N <- 100 # number of subjects
theta <- rnorm(N, mean=0, sd=1) ## factor scores
b <- rnorm(n,0,1) # thresholds/location
a <- rlnorm(n, meanlog = 0, sdlog = .4) # discrimination
c <- rep(0, n)   # guessing: 0 gives a 2PL; use runif(n, 0, 0.3) for a 3PL
item.par <- cbind(a,b,c)

###Create arrays for prob (p) and simulated data (x)
p <- matrix(0,N,n)
x <- matrix(0,N,n)

###Use a stochastic process to generate model-fitting data
for (i in 1:N) {
  for (j in 1:n){
    p[i,j] <- c[j] + (1-c[j])/(1+exp(-a[j]*(theta[i]-b[j])))
    r <- runif(1,0,1)
    if (r <= p[i,j]) {
      x[i,j] <- 1
    }
  }
}

x ## binary data set
item.par ## IRT simulation parameters
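
To connect this back to lavaan, the simulated matrix can be handed to the WLSMV model; a sketch (only the loadings are specified here; the thresholds and the := conversions from the first post can be appended in the same way):

library(lavaan)

# sketch: fit a one-factor model to the simulated binary responses
dat <- as.data.frame(x)
names(dat) <- paste0("Q", 1:n)

mod.sim <- paste0("Theta =~ ", paste0("l", 1:n, "*Q", 1:n, collapse = " + "))
fit.sim <- cfa(mod.sim, data = dat, std.lv = TRUE,
               ordered = paste0("Q", 1:n),
               estimator = "WLSMV", parameterization = "delta")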

Mauricio Garnier-Villarreal

Jan 18, 2019, 2:46:03 PM
to lavaan
Also, you might want to consider other options for rlnorm(n, meanlog = 0, sdlog = .4) # discrimination. This might be giving values too close to 0, which, combined with sampling variability, could lead to problems.

I usually do something like runif(n, 0.4, 3).

Terrence Jorgensen

Jan 25, 2019, 9:39:21 AM
to lavaan
Is this a symptom of the performance of WLSMV and ULSMV estimation in small samples?

It is a symptom of any estimator: larger SEs, which implies (and simulations demonstrate empirically) that estimates exhibit greater sampling variability in smaller samples. This is the same phenomenon that leads to negative variances and NPD matrices being more frequent in smaller samples.

My first reaction would be to eliminate solutions with negative factor loadings as inadmissible solutions. Would this be correct?

No, there is nothing inadmissible about negative factor loadings, except that they conceptually would not make sense if the observed variable is supposed to be positively correlated with the factor it indicates.  But on average, the estimates should still be unbiased (mean across replications == population parameter).  Mauricio's solution about simulating a common cause is important, though, and if the discriminations/loadings are sufficiently large, their sampling distributions might not capture negative values even in small samples.

koenig...@gmail.com

Jan 28, 2019, 10:23:58 AM
to lavaan

Dear Mauricio, Dear Terrence

thank you both for your valuable input!

I did some extensive testing over the last week, and adjusting the data generation indeed alleviates the problem of negative factor loadings. Nevertheless, it does not disappear completely.

I can definitely live with that. My remaining issue is how to present the results. There is quite a difference in average bias between (a) keeping the solutions with negative factor loadings (and no other apparent convergence issues) and (b) removing them. What I can think of is to add a note to the respective figures explaining how the bias was computed and, perhaps, reporting the resulting bias when the solutions with negative factor loadings are removed.
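
One way to make this transparent is to compute the average bias both ways from the same replication results; a sketch with hypothetical objects (est is a replications-by-items matrix of estimated discriminations, true.a holds the generating values, and neg.flag marks replications containing any negative loading):

# sketch: average bias with and without the flagged replications
# (est, true.a, neg.flag are hypothetical objects from the simulation)
bias.all  <- colMeans(est) - true.a
bias.kept <- colMeans(est[!neg.flag, , drop = FALSE]) - true.a
round(cbind(bias.all, bias.kept), 3)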

Best wishes
Christoph

Mauricio Garnier-Villarreal

Jan 29, 2019, 2:18:59 PM
to lavaan
Christoph

Could you share your data simulation, and data analysis code?

I believe reporting both the results including and excluding the negative factor loadings is an honest way to present them. It is also relevant to describe in which conditions they tend to happen more often; for example, in my experience WLS gives negative residual variances more commonly with smaller samples.

koenig...@gmail.com

Jan 30, 2019, 10:00:38 AM
to lavaan

Sure, please see attached file.

Note: to maintain comparability with my other conditions, I cannot change the basic specification of the data generation.
190124_DataGeneration_Analysis.R

Mauricio Garnier-Villarreal

Jan 30, 2019, 11:51:11 AM
to lavaan
Christoph

The code looks good. The only part that I could think is related to the negative factor loadings is the correlation between alpha and beta; I usually see these simulated as independent. Have you checked whether the negative factor loadings still occur when the parameters are not correlated? I noticed that in some cases it gives a negative correlation between the parameters; maybe this negative correlation between alpha and beta is causing the negative factor loadings. If it is not related to that, I don't see anything else that I would consider problematic.

koenig...@gmail.com

Feb 1, 2019, 3:30:01 AM
to lavaan

Dear Mauricio,

thank you for checking!

FYI, I ran some additional tests: the correlation has no impact on the number of negative factor loadings. When I manually specify starting values for the factor loadings, however, the number of negative factor loadings is reduced considerably. With automatic starting values approximately 21% of the factor loadings are negative, whereas with user-defined starting values I obtain < 0.1% negative factor loadings.

That's quite interesting.

Best wishes
Christoph

Mauricio Garnier-Villarreal

Feb 2, 2019, 11:30:36 AM
to lavaan
Christoph

That is for sure a steep difference. What values were you giving for the fixed a? I saw from your simulation values that the random population a would not be smaller than .35.

I have seen some cases where these are more sensitive with more latent factors (I think you are only using 1 factor) and smaller sample sizes.

koenig...@gmail.com

Feb 4, 2019, 2:40:48 PM
to lavaan

Dear Mauricio,

yes, I have a unidimensional model.

I manually specified 1 as the starting value for every factor loading, although that is also the automatic starting value. Nevertheless, the number of negative factor loadings is reduced considerably (I also tested some other starting values, with the same effect). I guess what happens is that when I specify the starting values manually, I impose some kind of range restriction on the loadings (I saw another post on this forum saying that, to obtain negative factor loadings, one should specify -1 as the starting value).

Best wishes
Christoph

Mauricio Garnier-Villarreal

Feb 10, 2019, 10:34:39 AM
to lavaan
Christoph

I did some testing on this. I noticed that in most cases when I was getting negative factor loadings, all the items had negative factor loadings. What was happening is that the factor as a whole was oriented in the negative direction.

This can be helped by constraining one factor loading to be positive, so at the end of the lavaan model add " l1 > 0 ". This will fix the overall direction of the factor. In my tests, this reduced the number of negative factor loadings from 10% to 3%. Also, in my examples this was more common with smaller sample sizes.
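
In model syntax, that would look roughly like this (a sketch reusing the model string and call from the first post):

# sketch: append the inequality constraint to the original model string
mod.con <- paste0(mod, "\nl1 > 0\n")
fit.con <- cfa(mod.con, data = dat, std.lv = TRUE,
               ordered = paste0("Q", 1:10),
               estimator = "WLSMV", parameterization = "delta")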

good luck