Likelihood functions


Federico Andreis

Jul 21, 2014, 11:05:11 AM
to mirt-p...@googlegroups.com
Hello there!

I was wondering whether there is an explicit way to call the likelihood functions of the implemented models.

In particular, I would like to compute the value of the (log-)likelihood function for a given model point by point, i.e. for each individual/response combination, treating the ML estimates as fixed parameters and supplying my own matrix of latent trait values.

The only thing I have come up with so far is to apply probtrace() to extract.item() for each item/individual, but that seems a little too messy... any advice?

Thank you,

/federico

Phil Chalmers

Jul 21, 2014, 5:51:41 PM
to Federico Andreis, mirt-package

Hi Federico,

There are better ways, of course, but I wouldn't recommend playing around with the package internals right away. Here's a quick and fairly flexible approach that should work for your purpose, using only functions exported from the package. Cheers.

Phil

> library(mirt)
> 
> #trace function for all items
> Probtrace <- function(items, Theta){
+     traces <- lapply(items, probtrace, Theta=Theta)
+     ret <- do.call(cbind, traces)
+     ret
+ }
> 
> dat <- simdata(a=matrix(1,10), d=matrix(rnorm(10)), 1000, itemtype = 'dich')
> mod <- mirt(dat, 1, TOL = NaN)
> 
> #structured data
> fulldata <- mod@Data$fulldata[[1L]]
> head(fulldata) 
     Item.1_1 Item.1_2 Item.2_1 Item.2_2 Item.3_1 Item.3_2 Item.4_1 Item.4_2 Item.5_1 Item.5_2
[1,]        1        0        1        0        1        0        1        0        1        0
[2,]        1        0        1        0        1        0        1        0        1        0
[3,]        1        0        1        0        0        1        1        0        1        0
[4,]        0        1        1        0        1        0        0        1        0        1
[5,]        0        1        0        1        0        1        0        1        0        1
[6,]        0        1        0        1        0        1        1        0        1        0
     Item.6_1 Item.6_2 Item.7_1 Item.7_2 Item.8_1 Item.8_2 Item.9_1 Item.9_2 Item.10_1 Item.10_2
[1,]        1        0        1        0        0        1        0        1         1         0
[2,]        1        0        0        1        1        0        1        0         1         0
[3,]        1        0        1        0        1        0        1        0         1         0
[4,]        1        0        1        0        0        1        0        1         1         0
[5,]        1        0        1        0        0        1        0        1         0         1
[6,]        1        0        1        0        1        0        1        0         1         0
> head(dat) #compare
     Item_1 Item_2 Item_3 Item_4 Item_5 Item_6 Item_7 Item_8 Item_9 Item_10
[1,]      0      0      0      0      0      0      0      1      1       0
[2,]      0      0      0      0      0      0      1      0      0       0
[3,]      0      0      1      0      0      0      0      0      0       0
[4,]      1      0      0      1      1      0      0      1      1       0
[5,]      1      1      1      1      1      0      0      1      1       1
[6,]      1      1      1      0      0      0      0      0      0       0
> 
> #latent scores
> Theta <- matrix(rnorm(nrow(fulldata)))
> 
> #load items
> items <- vector('list', 10)
> for(i in 1:10) items[[i]] <- extract.item(mod, i)
> 
> #compute log-like for each Theta
> traces <- Probtrace(items, Theta)
> LL <- rowSums(fulldata * log(traces))
> head(LL)
[1]  -4.583083 -10.374255  -4.151897  -7.632804  -7.052407  -6.571110
> 
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8       
 [4] LC_COLLATE=en_CA.UTF-8     LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mirt_1.4.2      lattice_0.20-29

loaded via a namespace (and not attached):
[1] GPArotation_2012.3-1 grid_3.1.0           Rcpp_0.11.2          RCurl_1.95-4.1      
[5] tools_3.1.0   



Federico Andreis

Jul 21, 2014, 6:05:54 PM
to mirt-p...@googlegroups.com, federico...@gmail.com
Well Phil, thanks a lot.

As you immediately guessed, I had already started playing with the package internals... and I was going down exactly the same road with probtrace and extract.item.

I need this to be usable in a simulation study, so I was looking for a quick-and-dirty way to implement it.
I'll try it and let you know!

Thanks for your reply!

Cheers,

/f

Russ Poldrack

Sep 1, 2016, 10:13:30 AM
to mirt-package, federico...@gmail.com
hi - I am trying to implement this suggestion to compute the log-likelihood of a dataset given the estimated model (for use in cross-validation). However, it's not clear to me what to use for Theta. I tried the following:

m=mirt(d,5)

fulldata <- m@Data$fulldata[[1L]]

Theta=fscores(m)

#load items
nitems=dim(d)[2]
items <- vector('list', nitems)
for (i in 1:nitems) items[[i]] <- extract.item(m, i)
#compute log-like for each Theta
traces <- Probtrace(items, Theta)
LL <- rowSums(fulldata * log(traces))

but sum(LL) != logLik(m) (-30497.29 vs. -25661.14) - am I missing something here?
thanks!
russ

Phil Chalmers

Sep 1, 2016, 3:06:53 PM
to Russ Poldrack, mirt-package, Federico Andreis
On Thu, Sep 1, 2016 at 10:13 AM, Russ Poldrack <pold...@gmail.com> wrote:
hi - I am trying to implement this suggestion to compute the log likelihood of a dataset given the estimated model (for use in crossvalidation).  however, it's not clear to me what to use for Theta.  

You don't actually use any Theta values to compute the observed log-likelihood, because they are integrated out of the individual response functions. What you have computed is more like the 'complete-data log-likelihood', where Theta is treated as known when in fact it isn't.
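
To make the distinction concrete, here's a rough sketch (untested) of how the observed log-likelihood could be approximated with the objects from my earlier post (Probtrace, items, fulldata), assuming a unidimensional model with a standard normal latent density and an arbitrary grid of quadrature nodes:

# approximate L_i = integral of prod_j P(x_ij | theta) * phi(theta) d theta
# by summing over quadrature nodes instead of plugging in Theta estimates
nodes <- matrix(seq(-6, 6, length.out = 61))      # quadrature nodes
w <- dnorm(nodes); w <- w / sum(w)                # normalized N(0,1) weights
traces <- Probtrace(items, nodes)                 # nodes x (total categories)
logLik_at_nodes <- fulldata %*% t(log(traces))    # persons x nodes
marginalLL <- sum(log(exp(logLik_at_nodes) %*% w))
marginalLL    # comparable to what logLik() reports for a fully estimated model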
 

Russ Poldrack

Sep 1, 2016, 3:11:14 PM
to Phil Chalmers, mirt-package, Federico Andreis
thanks - so how would you go about computing the likelihood of a set of observations under a fitted model, using data that were not included in the fitting of the model?
cheers
russ

Russell A. Poldrack
Albert Ray Lang Professor of Psychology
Bldg. 420, Jordan Hall
Stanford University
Stanford, CA 94305

pold...@stanford.edu
http://www.poldracklab.org/



Phil Chalmers

Sep 1, 2016, 3:16:06 PM
to Russ Poldrack, mirt-package, Federico Andreis
library(mirt)
mod <- mirt(Science, 1) #original
logLik(mod)

Science2 <- simdata(model = mod, N = 1000) # new data generated from the estimated parameters

sv <- mod2values(mod)
sv$est <- FALSE # fix all parameters at the estimated values (no re-estimation)
mod2 <- mirt(Science2, 1, pars = sv)
logLik(mod2)


Phil

Russ Poldrack

Sep 1, 2016, 3:17:54 PM
to Phil Chalmers, mirt-package, Federico Andreis
awesome, thanks!
russ

Russ Poldrack

Sep 1, 2016, 4:25:09 PM
to Phil Chalmers, mirt-package, Federico Andreis
hi - I’m getting an error when I try this, presumably because the data that I am testing on has a different number of observations from those that the model was fitted to:

for (cvfold in 1:4){
  train_data=d[cvidx!=cvfold,]
  test_data=d[cvidx==cvfold,]
  m=mirt(train_data,ncomponents)
  sv <- mod2values(m)
  sv$est <- FALSE #fix starting values to last model
  mod2 <- mirt(test_data, ncomponents, pars = sv)
  ll[cvfold]=logLik(mod2)
}

the initial model fits, but the second call to mirt fails with "Error: Rows in supplied and starting value data.frame objects do not match. Were the data or itemtype input arguments modified?"

the train and test data sets are different sizes:

> dim(train_data)
[1] 940 115
> dim(test_data)
[1] 314 115
>

is there any way to trick it into taking that shorter dataset to compute the LL?
thanks!
russ

Phil Chalmers

Sep 1, 2016, 6:46:24 PM
to Russ Poldrack, mirt-package, Federico Andreis
On Thu, Sep 1, 2016 at 4:25 PM, Russ Poldrack <pold...@gmail.com> wrote:
hi - I’m getting an error when I try this, presumably because the data that I am testing on has a different number of observations from those that the model was fitted to:

That's unlikely; the example I previously posted had a different number of rows as well. It seems the data structures are different though, hence the error (one of the datasets has more intercept terms than the other, because some item has a different number of observed categories).

If you are going to make these kinds of splits, make sure that apply(dat, 2, function(x) length(unique(x))) is the same for all datasets; otherwise this error will occur (as it should).
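
For example, with the train_data/test_data objects from your loop, a quick check along these lines (just a sketch) should flag the offending items before the second fit:

# number of observed response categories per item in each split
ncat_train <- apply(train_data, 2, function(x) length(unique(x)))
ncat_test  <- apply(test_data,  2, function(x) length(unique(x)))
all(ncat_train == ncat_test)           # must be TRUE for the pars = sv refit to work
names(which(ncat_train != ncat_test))  # items missing a category in one of the splits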

Russ Poldrack

Sep 1, 2016, 6:52:06 PM
to Phil Chalmers, mirt-package, Federico Andreis
ah, I see, the issue is that a few of the measures in one of the test folds don't have all possible response values:

> unique(train_data['hopkins12'])
    hopkins12
2           0
39          1
59          2
145         3
> unique(test_data['hopkins12'])
    hopkins12
1           0
29          1
136         2

I will implement a balanced cross-validation strategy that ensures that all of the folds are equated for response values.
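
maybe something as simple as re-drawing the fold assignments until every fold contains the full set of response values for each item - a rough sketch, assuming 4 folds and the full data in d (it would loop indefinitely if some response value occurs in fewer rows than there are folds):

ncat <- function(x) length(unique(x))
full_ncat <- apply(d, 2, ncat)                       # categories per item in the full data
repeat {
  cvidx <- sample(rep(1:4, length.out = nrow(d)))    # roughly equal-sized folds
  ok <- all(sapply(1:4, function(f)
    all(apply(d[cvidx == f, ], 2, ncat) == full_ncat)))
  if (ok) break                                      # every fold has every category
}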
thanks for your help!
cheers
russ