Confusing fscores() results


Chris

Oct 9, 2015, 6:11:20 AM
to mirt-package
Dear Phil,

Having been helped a great deal the last time I posted a question, I'm giving it another try with yet another point of confusion. This time it concerns fscores(). There are two problems:
  • I have run a three-dimensional 2PL model. Everything seems fine with the model and the fit looks OK. fscores() delivers ability estimates for the EAP method and for plausible values. However, when using the WLE method, I get an error message and don't really know what to make of it:
> fit.mirt.ME.2PL <- mirt(dat,model.mirt.ME,quadpts=27,itemtype='2PL', survey.weights=weight, GenRandomPars=TRUE) 
# cut #
> scores.mirt.ME.2PL.wle <- fscores(fit.mirt.ME.2PL,method="WLE", full.scores=TRUE, full.scores.SE=TRUE, verbose=TRUE, returnER=TRUE)

Error in solve.default(estimate$hessian) : 
  Lapack routine dgesv: system is exactly singular: U[1,1] = 0
In addition: There were 13 warnings (use warnings() to see them)            # note by Chris: I can't access the warnings as they are replaced
Error in diag(vcov) : invalid 'nrow' value (too large or NA)                # with the last warning ("NAs introduced by coercion")
In addition: Warning message:
In diag(vcov) : NAs introduced by coercion
  • My second confusion concerns the correlations of the ability estimates. While my model shows factor correlations of .71 to .90, the ability estimates behave quite differently from one another. Correlations among the EAP ability estimates are between .94 and .99, among the WLE estimates (with full, imputed data, WLE estimation works just fine) between .30 and .40, and among the plausible values between .74 and .92. The plausible values seem to be closest to the factor correlations. However, I don't understand why EAP and WLE can be off by that much.
> fulldata <- imputeMissing(fit.mirt.ME.2PL,scores.mirt.ME.2PL.eap[,1:3])
> fit.mirt.ME.2PL.full <- mirt(fulldata,model.mirt.ME,quadpts=27,itemtype='2PL', survey.weights=weights, GenRandomPars=TRUE)
> summary(fit.mirt.ME.2PL.full)
# cut #
Factor correlations: 

      fpr   qnl   qll
fpr 1.000 0.906 0.805
qnl 0.906 1.000 0.761
qll 0.805 0.761 1.000

> scores.mirt.ME.2PL.eap.full <- fscores(fit.mirt.ME.2PL.full,method="EAP", full.scores=TRUE, full.scores.SE=TRUE, verbose=TRUE)
> scores.mirt.ME.2PL.wle.full <- fscores(fit.mirt.ME.2PL.full,method="WLE", full.scores=TRUE, full.scores.SE=TRUE, verbose=TRUE)
> scores.mirt.ME.2PL.pv.full <- fscores(fit.mirt.ME.2PL.full,method="plausible", full.scores=TRUE, full.scores.SE=TRUE, verbose=TRUE)
> scores.full <- cbind(scores.mirt.ME.2PL.eap.full[,1:3],scores.mirt.ME.2PL.wle.full[,1:3],scores.mirt.ME.2PL.pv.full[,1:3])
> round(cor(scores.full),2)
        fpr_EAP qnl_EAP qll_EAP fpr_WLE qnl_WLE qll_WLE fpr_PV qnl_PV qll_PV        # note by Chris: the within-method factor
fpr_EAP    1.00                                                                     # correlations are the 3x3 diagonal blocks.
qnl_EAP    0.99    1.00    
qll_EAP    0.95    0.94    1.00    
fpr_WLE    0.82    0.76    0.71    1.00    
qnl_WLE    0.76    0.84    0.68    0.40    1.00    
qll_WLE    0.58    0.55    0.77    0.31    0.30    1.00   
fpr_PV     0.81    0.81    0.76    0.66    0.63    0.45   1.00   
qnl_PV     0.81    0.81    0.75    0.63    0.68    0.43   0.92   1.00   
qll_PV     0.70    0.69    0.74    0.50    0.50    0.59   0.78   0.74   1.00

Can you help me understand the reasons for these surprising and confusing findings? I want to use the ability estimates in SEM for validation. According to these findings, would you recommend using plausible values? Surprisingly (again!), the validation results seem to be a lot better with the WLE and EAP estimates than with the PVs. Validation also shows pretty similar findings for EAP and WLE, although the structure of the two sets of estimates seems quite different, as indicated by the correlations above.

Sorry for these two long questions, I hope you can shed some light on these confusing results. Happy to show you more output in case it's needed.

Thanks so much and best wishes 
Chris


Phil Chalmers

Oct 9, 2015, 9:55:28 AM
to Chris, mirt-package
Hi Chris,


This is either a bug on my part, or the item response patterns just didn't converge to a local minimum estimate (information on this can now be returned in the dev version). Either way, the error message looks like it could be cleaned up, so sending me a way to reproduce the issue would help me track it down faster and clear it up.
As for your second question, this isn't overly surprising, and it can happen when the reliability of each composite is fairly low due to having too few items per composite. EAP and PVs use prior information from the group parameters, so in a sense they get pulled towards the original structure (in fact, PVs are designed to do exactly this, as you can see in the lower-right triangle of your correlation matrix). WLE (as well as ML) estimates don't, and are therefore free to vary on an individual basis, so if the composite reliabilities are too low there won't be much correspondence between the estimates.

I don't have a great answer for what to recommend here, because it's ultimately the old question 'to Bayes, or not to Bayes'. It really depends on the purpose of the scores and what form of bias/inaccuracy you are willing to tolerate (if they are just for secondary analyses, then PVs are the correct approach without question). Cheers.

Phil
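The shrinkage Phil describes can be illustrated with a small simulation (a minimal sketch, not part of the original thread; the item counts, slopes, and latent correlation of .8 below are made up for illustration):

```r
## Two correlated factors with only 4 dichotomous items each, so composite
## reliability is low. EAP scores borrow the estimated latent correlation
## as a prior; WLE scores do not.
library(mirt)
library(MASS)
set.seed(42)
a <- matrix(0, 8, 2)
a[1:4, 1] <- 1.2                      # slopes on the first factor
a[5:8, 2] <- 1.2                      # slopes on the second factor
d <- rnorm(8)                         # intercepts
Theta <- mvrnorm(2000, mu = c(0, 0),
                 Sigma = matrix(c(1, .8, .8, 1), 2))   # true cor = .8
dat <- simdata(a, d, 2000, itemtype = 'dich', Theta = Theta)
spec <- mirt.model('F1 = 1-4
                    F2 = 5-8
                    COV = F1*F2')
mod <- mirt(dat, spec, itemtype = '2PL')
round(cor(fscores(mod, method = "EAP")), 2)   # typically pulled above .8
round(cor(fscores(mod, method = "WLE")), 2)   # typically well below .8
```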
 




Chris

Oct 21, 2015, 9:31:30 AM
to mirt-package
Sorry for this late update. The issue was solved by Phil in a private conversation. The error message in fscores() was due to missing values: in some cases, none of the items on one factor were answered, and thus no ability estimation was possible. Phil fixed the algorithm so that fscores() no longer crashes in these cases but returns NA for the ability estimates.

Phil, thanks a lot for your help!
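For anyone hitting the same NA behaviour, a quick way to count the affected respondents is to check for all-missing item blocks per factor (a minimal sketch; `dat` and the item indices are hypothetical placeholders for your own data):

```r
## Respondents with no observed responses on a factor's items will now
## receive NA from fscores() rather than crashing it.
items_F1 <- 1:10                                  # items loading on factor 1
all_missing <- rowSums(is.na(dat[, items_F1])) == length(items_F1)
sum(all_missing)                                  # respondents with NA scores
```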