M2 statistics do not work well with big data

Seongho Bae

Jul 13, 2014, 2:44:09 PM
to mirt-p...@googlegroups.com
Dear all.

I ran into an error today when estimating the M2 statistic, so I am reporting it. I did not see this error with version 1.3.x when running the same analysis.


※ Conditions

Sample size: 2330
Items: 186

OS: Ubuntu Linux 12.04.1 64bit
Version of R: R 3.1.1 (64bit)
Version of mirt: 1.4 and 1.4.1 
BLAS: OpenBLAS (Nehalem)

Processor: Intel(R) Xeon(R) CPU E5-2690 @ 2.90GHz (16 cores)
RAM: 124GB

File: Vocational Interest Items (vocational interest.csv)
Measurement: Likert 5 point scale


Best wishes,
Seongho Bae

Phil Chalmers

Jul 13, 2014, 5:09:54 PM
to Seongho Bae, mirt-package
Sorry Seongho, I couldn't reproduce this issue with the dev version. Could you provide a script reproducing it?
As well, there are probably too many graded items for the M2 statistic to work (I seem to recall about 100 items is around the max before my computer choked). 

Here's my output from what I did get:

> library(mirt)
Loading required package: stats4
Loading required package: lattice
> mirtCluster()
Loading required package: parallel
> dat = read.csv('~/Downloads/vocational interest.csv')
> mod <- mirt(dat, 1)
Iteration: 393, Log-Lik: -417881.163, Max-Change: 0.00001
> M2(mod)
Error: cannot allocate vector of size 2.3 Gb
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8       
 [4] LC_COLLATE=en_CA.UTF-8     LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mirt_1.4.1      lattice_0.20-29

loaded via a namespace (and not attached):
[1] GPArotation_2012.3-1 grid_3.1.0           Rcpp_0.11.2          RCurl_1.95-4.1      
[5] tools_3.1.0



Seongho Bae

Jul 14, 2014, 2:46:57 AM
to mirt-p...@googlegroups.com, seongh...@gmail.com
From what I see in the process manager, the job needs over 20 GB of data+stack memory and over 20 GB of virtual image size (VIRT). So working around this issue may require a large-memory machine with more than 50 GB of RAM. With mirt 1.3.x, my server did not choke, and neither did the Chundoong cluster (http://top500.org/system/177987) operated by Seoul National University, using just one node.

Here are the code and the results.

> require('mirt')
Loading required package: mirt
Loading required package: lattice
> mirtCluster()
>
> # load data [variables are 4th ~ 190th column in data.frame]
> vocational_interest <- read.csv('rawdata.csv') # vocational interest.csv is a part of rawdata.csv
> 
> # try two methods: EM and MHRM -- I prefer MHRM over EM
> # geominQ rotation has no effect when extracting one factor, but my code sets it because I extract EFA models automatically via for().
> mod1 <- mirt(vocational_interest[4:190], model = 1, method = 'EM', rotate = 'geominQ')
Iteration: 333, Log-Lik: -399533.976, Max-Change: 0.00010
> mod2 <- mirt(vocational_interest[4:190], model = 1, method = 'MHRM', rotate = 'geominQ')
Stage 3 = 83, LL = -397065.3, AR(0.90) = [0.13], gam = 0.0166, Max-Change = 0.0008

Calculating log-likelihood...
> 
> M2(mod1, calcNull = TRUE)
Error in if (tmp < 0) Mrate <- exp(tmp) : 
  missing value where TRUE/FALSE needed
Error in M2(null.mod, calcNull = FALSE, quadpts = quadpts) : 
  trying to get slot "Data" from an object (class "try-error") that is not an S4 object 
> M2(mod2, calcNull = TRUE)
Error in if (tmp < 0) Mrate <- exp(tmp) : 
  missing value where TRUE/FALSE needed
Error in M2(null.mod, calcNull = FALSE, quadpts = quadpts) : 
  trying to get slot "Data" from an object (class "try-error") that is not an S4 object 
> 
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C         LC_TIME=C            LC_COLLATE=C        
 [5] LC_MONETARY=C        LC_MESSAGES=C        LC_PAPER=C           LC_NAME=C           
 [9] LC_ADDRESS=C         LC_TELEPHONE=C       LC_MEASUREMENT=C     LC_IDENTIFICATION=C 

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mirt_1.4.1      lattice_0.20-29

loaded via a namespace (and not attached):
[1] GPArotation_2012.3-1 Rcpp_0.11.2          grid_3.1.1           tools_3.1.1         
>

On Monday, July 14, 2014 at 6:09:54 AM UTC+9, Phil Chalmers wrote:

Phil Chalmers

Jul 14, 2014, 10:02:22 AM
to mirt-package
Okay, I see the issue now. The problem doesn't actually have to do with M2() itself. It happens when the statistical null model (the model in which all slopes are constrained to be 0) is computed: a type of 'divide-by-zero' issue in the E-step causes NaNs to pop up where they shouldn't. I've made a patch for this on the dev version.

The following code will crash with versions earlier than 1.4.2 for numerical reasons, but it now runs on the dev version, which should therefore fix your problem.

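# Calc null model manually
sv <- mirt(dat, 1, pars = 'values')   # table of starting values, one row per parameter
sv$value[sv$name == 'a1'] <- 0        # fix every slope at 0
sv$est[sv$name == 'a1'] <- FALSE      # and do not estimate the slopes
null.model <- mirt(dat, 1, pars = sv) # fit the model with the modified table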

Cheers.

Seongho Bae

Jul 14, 2014, 12:00:17 PM
to mirt-p...@googlegroups.com
Yes, it works! Thank you so much!

However, I now get 33 warnings like this one when running exactly the same job:

1: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.

I think the number of iterations may need to be increased.

On Monday, July 14, 2014 at 11:02:22 PM UTC+9, Phil Chalmers wrote:

Phil Chalmers

Jul 14, 2014, 12:04:35 PM
to Seongho Bae, mirt-package
33 warnings? That's very peculiar, especially since pchisq() is only called a maximum of 4 times when calling M2(), though it's called a lot more when finding the 90% CI for RMSEA.... Does the output of the M2() function still look fine? If so, this might just be an ignorable warning message somewhere during execution.

Phil

Seongho Bae

Jul 14, 2014, 12:13:47 PM
to mirt-p...@googlegroups.com, seongh...@gmail.com
Here you are:

M2(mod2)
            M2    df p     RMSEA  RMSEA_5 RMSEA_95       TLI       CFI
stats 479324.5 16643 0 0.1092548 0.108965 0.109498 0.9148474 0.9157935
In addition: There were 33 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
2: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
3: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
4: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
5: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
6: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
7: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
8: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
9: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
10: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
11: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
12: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
13: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
14: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
15: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
16: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
17: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
18: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
19: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
20: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
21: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
22: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
23: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
24: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
25: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
26: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
27: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
28: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
29: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
30: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
31: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
32: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.
33: In pchisq(X2, df = df, ncp = lambda) :
  pnchisq(x=5.51144e+06, ..): not converged in 1000000 iter.

On Tuesday, July 15, 2014 at 1:04:35 AM UTC+9, Phil Chalmers wrote:

Phil Chalmers

Jul 14, 2014, 12:17:23 PM
to Seongho Bae, mirt-package
Looks okay to me. I'll look more into why these warnings are coming up though (and why they don't converge). Thanks for bringing this to my attention.

Phil
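
As a quick sanity check on the M2(mod2) table above, the reported df and RMSEA can be reproduced by hand. This is only a sketch under assumptions that are not stated in the thread: that M2() here uses a collapsed form of the statistic with one first-order moment per item and one second-order moment per item pair, and that RMSEA is computed as sqrt((M2 - df) / (df * (N - 1))). The item count of 187 comes from the column range 4:190 in the script above, and five parameters per item (one slope, four intercepts) assumes a unidimensional graded model for 5-point items. The function names are made up for illustration.

```r
# Hypothetical helpers; the formulas are assumptions, not taken from mirt's source.

# Degrees of freedom: number of moments minus number of estimated parameters.
m2_df <- function(n_items, n_pars) {
  n_moments <- n_items * (n_items + 1) / 2  # n first-order + n(n-1)/2 second-order
  n_moments - n_pars
}

# RMSEA from the statistic, its df, and the sample size.
m2_rmsea <- function(m2, df, n_obs) {
  sqrt(max(m2 - df, 0) / (df * (n_obs - 1)))
}

df <- m2_df(n_items = 187, n_pars = 187 * 5)  # 1 slope + 4 intercepts per item
rmsea <- m2_rmsea(m2 = 479324.5, df = df, n_obs = 2330)
print(c(df = df, RMSEA = round(rmsea, 4)))
```

Under those assumptions this gives df = 16643 and an RMSEA of about 0.1093, which agrees with the output above.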

Martin Segado

Jun 29, 2017, 10:21:19 AM
to mirt-p...@googlegroups.com, seongh...@gmail.com
Came across this post after getting the following message... my dataset has ~600 dichotomous items:

Error: cannot allocate vector of size 247.1 Gb

Any suggestions, other than renting a VM with a few hundred gigabytes of RAM? =)
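
A rough size estimate may help explain the number in that message. The sketch below assumes, without confirmation from this thread or from mirt's source, that the failing allocation is a dense square matrix of doubles with one row per first- and second-order moment, that is n(n+1)/2 rows for n items. The function name is hypothetical, and 603 is just a guess at what "~600" items might mean.

```r
# Hypothetical estimate; the dense-square-matrix layout is an assumption.
m2_matrix_gib <- function(n_items) {
  n_moments <- n_items * (n_items + 1) / 2  # n first-order + n(n-1)/2 second-order
  n_moments^2 * 8 / 2^30                    # 8 bytes per double, reported in GiB
}

n_items <- c(187, 300, 603)
sizes <- sapply(n_items, m2_matrix_gib)
names(sizes) <- paste(n_items, "items")
print(round(sizes, 1))
```

Under this assumption, 187 items gives about 2.3 GiB and 603 items about 247.1 GiB, which is in line with the two allocation errors reported in this thread. Because the estimate grows with roughly the fourth power of the number of items, halving the item count would cut it by about a factor of 16.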