Re: [mnemosyne-proj-users] Statistics: is time of day of reviews stored?


Gwern Branwen, Sep 4, 2013, 16:12
Sent to group: mnemosyne-proj-users
On Fri, Aug 30, 2013 at 2:00 AM, Peter Bienstman
<Peter.B...@ugent.be> wrote:
> Would be interesting to see how this generalises to other people.

(I killed my log import at 26% and discovered that the importing
process is resumable, so I'd been suffering for no reason - I could've
just kept running it overnight until it finished, and not driven
myself nuts with a degraded system! Oh well. It's not trivial to run a
linear model on a 1.1G CSV with 48m rows, but it's not as hard as I
expected, and I've finished a simple analysis.)

Summary: Noon good, 6 AM bad. There's a circadian-looking change in
average grade over the day: http://i.imgur.com/sum3toZ.png There seems
to be no real day-of-week effect.

I wonder if anyone has previously published on how spaced repetition
performance varies over the day? Probably not; you'd need a pretty big
dataset like this to find it.

I extract the data as before:

    $ sqlite3 -batch ./mnemosyne-stats/logs.db \
        "SELECT timestamp,object_id,grade FROM log WHERE event==9;" \
        | tr '|' ',' > ~/mnemosyne-all.csv
    $ wc mnemosyne-all.csv
      47794669   47794669 1114023396 mnemosyne-all.csv
    $ R
    R> install.packages("biglm")

The `biglm` package offers an *incremental* linear model function: you
can read in a million rows, 'add' them to a `biglm` object, read in
another million rows and so on. Since I can fit 1 million rows in RAM
but not 48 million rows, this works great:

    library(biglm)

    m <- file("mnemosyne-all.csv", open="rt")

    # Read the next block of rows from an open connection and derive
    # the weekday and hour-of-day factors from the Unix timestamp.
    get <- function(filepath) {
        chunk <- read.csv(filepath, nrows=2000000, header=FALSE,
                          col.names=c("Date", "ID", "Grade"),
                          colClasses=c("integer", "factor", "numeric"))
        chunk$Date <- as.POSIXct(chunk$Date, origin = "1970-01-01", tz = "UTC")
        chunk$WeekDay <- as.factor(weekdays(chunk$Date))
        chunk$Hour <- as.factor(as.numeric(format(chunk$Date, "%H")))
        return(chunk)
    }

    frml <- Grade ~ Hour + WeekDay

    # create a seed to update with fresh data in the loop
    chunk1 <- get(m)
    bl <- biglm(frml, chunk1)

    while (isOpen(m)) {
        chunk <- get(m)
        bl <- update(bl, chunk)
    }
    summary(bl)
    closeAllConnections()

    ...
    Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
      contrasts can be applied only to factors with 2 or more levels
    Calls: update ... model.matrix -> model.matrix.default -> contrasts<-
    R> # It errors out because I didn't figure out how to handle reading to the end of the file
    R> summary(bl)
    Large data regression model: biglm(frml, chunk1)
    Sample size =  47794669
                     Coef (95%  CI) SE p
    (Intercept)       2.8  2.8  2.8  0 0
    Hour1             0.0  0.0  0.0  0 0
    Hour2             0.0  0.0  0.0  0 0
    Hour3            -0.1 -0.1  0.0  0 0
    Hour4            -0.1 -0.1 -0.1  0 0
    Hour5            -0.2 -0.2 -0.2  0 0
    Hour6            -0.3 -0.3 -0.3  0 0
    Hour7            -0.1 -0.1 -0.1  0 0
    Hour8             0.1  0.1  0.1  0 0
    Hour9             0.2  0.2  0.2  0 0
    Hour10            0.3  0.3  0.3  0 0
    Hour11            0.3  0.3  0.3  0 0
    Hour12            0.3  0.3  0.3  0 0
    Hour13            0.2  0.2  0.2  0 0
    Hour14            0.2  0.2  0.2  0 0
    Hour15            0.2  0.2  0.2  0 0
    Hour16            0.1  0.1  0.1  0 0
    Hour17            0.1  0.1  0.1  0 0
    Hour18            0.1  0.1  0.1  0 0
    Hour19            0.0  0.0  0.1  0 0
    Hour20            0.0  0.0  0.0  0 0
    Hour21            0.0  0.0  0.0  0 0
    Hour22            0.0  0.0  0.0  0 0
    Hour23            0.0  0.0  0.0  0 0
    WeekDayMonday     0.0  0.0  0.0  0 0
    WeekDaySaturday   0.0  0.0  0.0  0 0
    WeekDaySunday     0.0  0.0  0.0  0 0
    WeekDayThursday   0.0  0.0  0.0  0 0
    WeekDayTuesday    0.0  0.0  0.0  0 0
    WeekDayWednesday  0.0  0.0  0.0  0 0
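
On the end-of-file error: a minimal sketch of one way to stop cleanly, assuming it is acceptable to pull lines with `readLines` and parse them with `read.csv(text=)` (which may be slower than reading the connection directly). The helper names `read_chunk` and `fit_chunks` are made up for illustration, and this is not the code that produced the results in this post. It also pins the factor levels up front, since `biglm` expects the same levels in every chunk, and takes weekday and hour from `POSIXlt` fields so they do not depend on the locale.

```r
day_names  <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")   # indexed by POSIXlt wday + 1
day_levels <- c("Friday", "Monday", "Saturday", "Sunday",
                "Thursday", "Tuesday", "Wednesday") # Friday first = reference level

# Read up to n lines from an open connection; return NULL at end of file.
read_chunk <- function(con, n = 2000000) {
    lines <- readLines(con, n = n)
    if (length(lines) == 0) return(NULL)
    chunk <- read.csv(text = lines, header = FALSE,
                      col.names = c("Date", "ID", "Grade"),
                      colClasses = c("integer", "character", "numeric"))
    ct <- as.POSIXct(chunk$Date, origin = "1970-01-01", tz = "UTC")
    lt <- as.POSIXlt(ct, tz = "UTC")
    # Identical levels in every chunk, even if a chunk lacks some of them.
    chunk$WeekDay <- factor(day_names[lt$wday + 1], levels = day_levels)
    chunk$Hour    <- factor(lt$hour, levels = 0:23)
    chunk
}

# Fit on the first chunk, then update with each later chunk until EOF.
fit_chunks <- function(path, formula, n = 2000000) {
    con <- file(path, open = "rt")
    on.exit(close(con))
    fit <- NULL
    repeat {
        chunk <- read_chunk(con, n)
        if (is.null(chunk)) break
        fit <- if (is.null(fit)) biglm::biglm(formula, chunk) else update(fit, chunk)
    }
    fit
}
```

Usage would be something like `bl <- fit_chunks("mnemosyne-all.csv", Grade ~ Hour + WeekDay)`.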

Despite the error, 47,794,669 (48m) flashcard reviews is hopefully enough. So,
pulling out the interesting coefficients - all the weekdays drop out
as irrelevant - we get:

    06            -0.3
    05            -0.2
    03            -0.1
    04            -0.1
    07            -0.1
    08             0.1
    16             0.1
    17             0.1
    18             0.1
    09             0.2
    13             0.2
    14             0.2
    15             0.2
    10             0.3
    11             0.3
    12             0.3

(We can ignore the p-values & confidence intervals, since at this
sample scale, they're all going to be zero & point-values.) No
apparent pattern when sorted by size, but the pattern jumps out when
we graph by hour:

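    # Hour1..Hour23 coefficients from the summary above (hour 0 is the reference level)
    plot(1:23, c(0,0,-0.1,-0.1,-0.2,-0.3,-0.1,0.1,0.2,0.3,0.3,0.3,0.2,0.2,0.2,0.1,0.1,0.1,0,0,0,0,0), xlab="Hour", ylab="Coefficient")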

http://i.imgur.com/sum3toZ.png

We get a beautiful-looking circadian rhythm: peak performance at noon,
crappy performance in early morning, declining performance over the
day into evening. I'm actually impressed at the effect sizes here: if
you compare reviewing at noon vs reviewing at 6 AM, that's a
difference of 0.6 - on a 1-5 scale where most grades are a 3 or 4! That
assumes this is reflecting actual memory performance and not some sort
of response bias varying by time (like being too pessimistic when
you're up too early/late).

That's only about retrieval, though. I wonder if late at night (==near
bedtime) would be best for subsequent recalls, but I'm not sure how to
analyze that.
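
One possible starting point, as a rough sketch only: pair each review with the hour of the same card's previous review, then model the later grade on that earlier hour. This assumes a data frame small enough to sort in memory with the columns built by `get` above, and that `ID` identifies a single card for a single user; `add_previous_hour` and `PrevHour` are names made up here. It ignores confounds such as the interval between reviews.

```r
# reviews: data frame with Date (POSIXct), ID, Grade, and Hour (factor of 0-23).
add_previous_hour <- function(reviews) {
    # Put each card's reviews in time order.
    reviews <- reviews[order(reviews$ID, reviews$Date), ]
    hour <- as.integer(as.character(reviews$Hour))
    # Shift hours down by one review within each card; a card's first
    # review has no predecessor, so it gets NA.
    reviews$PrevHour <- ave(hour, reviews$ID,
                            FUN = function(h) c(NA, head(h, -1)))
    # Keep only reviews that have a previous review to compare against.
    reviews[!is.na(reviews$PrevHour), ]
}
```

The result could then go into something like `lm(Grade ~ factor(PrevHour), data = add_previous_hour(reviews))`.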

This doesn't tell us how hour & day may combine like in my previous
multilevel model. So let's rerun with:

    frml <- Grade ~ Hour * WeekDay

This will use up ~4x more RAM while running, BTW, so you may need to
reduce how many rows you tell `read.csv` to read in. The results:

    R> summary(bl)
    Large data regression model: biglm(frml, chunk1)
    Sample size =  47794669
                            Coef (95%  CI) SE   p
    (Intercept)              2.8  2.8  2.8  0 0.0
    Hour1                    0.0  0.0  0.0  0 0.0
    Hour2                    0.0  0.0  0.0  0 0.0
    Hour3                    0.0  0.0  0.0  0 0.0
    Hour4                   -0.1 -0.1 -0.1  0 0.0
    Hour5                   -0.2 -0.2 -0.2  0 0.0
    Hour6                   -0.3 -0.3 -0.3  0 0.0
    Hour7                    0.0  0.0  0.0  0 0.0
    Hour8                    0.1  0.1  0.1  0 0.0
    Hour9                    0.1  0.1  0.1  0 0.0
    Hour10                   0.3  0.3  0.3  0 0.0
    Hour11                   0.3  0.3  0.3  0 0.0
    Hour12                   0.3  0.3  0.3  0 0.0
    Hour13                   0.2  0.2  0.2  0 0.0
    Hour14                   0.2  0.2  0.2  0 0.0
    Hour15                   0.2  0.2  0.2  0 0.0
    Hour16                   0.1  0.1  0.1  0 0.0
    Hour17                   0.1  0.1  0.1  0 0.0
    Hour18                   0.1  0.1  0.1  0 0.0
    Hour19                   0.0  0.0  0.0  0 0.0
    Hour20                   0.1  0.1  0.1  0 0.0
    Hour21                   0.1  0.0  0.1  0 0.0
    Hour22                   0.1  0.1  0.1  0 0.0
    Hour23                   0.0  0.0  0.0  0 0.0
    WeekDayMonday            0.0  0.0  0.0  0 0.6
    WeekDaySaturday          0.1  0.0  0.1  0 0.0
    WeekDaySunday            0.0  0.0  0.0  0 0.0
    WeekDayThursday          0.1  0.0  0.1  0 0.0
    WeekDayTuesday           0.0  0.0  0.0  0 0.9
    WeekDayWednesday         0.1  0.1  0.1  0 0.0
    Hour1:WeekDayMonday      0.0  0.0  0.0  0 0.0
    Hour2:WeekDayMonday      0.0 -0.1  0.0  0 0.0
    Hour3:WeekDayMonday      0.0  0.0  0.0  0 0.0
    Hour4:WeekDayMonday      0.0  0.0  0.0  0 0.4
    Hour5:WeekDayMonday      0.0  0.0  0.0  0 0.0
    Hour6:WeekDayMonday      0.1  0.1  0.1  0 0.0
    Hour7:WeekDayMonday      0.0 -0.1  0.0  0 0.0
    Hour8:WeekDayMonday      0.1  0.1  0.2  0 0.0
    Hour9:WeekDayMonday      0.3  0.2  0.3  0 0.0
    Hour10:WeekDayMonday     0.1  0.1  0.1  0 0.0
    Hour11:WeekDayMonday     0.0  0.0  0.0  0 0.0
    Hour12:WeekDayMonday     0.0  0.0  0.0  0 0.0
    Hour13:WeekDayMonday     0.1  0.1  0.1  0 0.0
    Hour14:WeekDayMonday     0.0  0.0  0.1  0 0.0
    Hour15:WeekDayMonday     0.0  0.0  0.0  0 0.0
    Hour16:WeekDayMonday     0.1  0.1  0.1  0 0.0
    Hour17:WeekDayMonday     0.0  0.0  0.0  0 0.1
    Hour18:WeekDayMonday     0.1  0.0  0.1  0 0.0
    Hour19:WeekDayMonday     0.1  0.1  0.1  0 0.0
    Hour20:WeekDayMonday     0.1  0.0  0.1  0 0.0
    Hour21:WeekDayMonday     0.1  0.1  0.1  0 0.0
    Hour22:WeekDayMonday     0.0  0.0  0.0  0 0.0
    Hour23:WeekDayMonday     0.0  0.0  0.0  0 0.0
    Hour1:WeekDaySaturday   -0.1 -0.1 -0.1  0 0.0
    Hour2:WeekDaySaturday   -0.1 -0.1 -0.1  0 0.0
    Hour3:WeekDaySaturday   -0.1 -0.1 -0.1  0 0.0
    Hour4:WeekDaySaturday    0.0  0.0  0.0  0 0.3
    Hour5:WeekDaySaturday    0.0  0.0  0.0  0 0.0
    Hour6:WeekDaySaturday    0.1  0.1  0.1  0 0.0
    Hour7:WeekDaySaturday   -0.2 -0.2 -0.2  0 0.0
    Hour8:WeekDaySaturday    0.0  0.0  0.0  0 0.3
    Hour9:WeekDaySaturday   -0.1 -0.1 -0.1  0 0.0
    Hour10:WeekDaySaturday  -0.1 -0.1 -0.1  0 0.0
    Hour11:WeekDaySaturday  -0.1 -0.1 -0.1  0 0.0
    Hour12:WeekDaySaturday  -0.1 -0.1 -0.1  0 0.0
    Hour13:WeekDaySaturday   0.0  0.0  0.0  0 0.0
    Hour14:WeekDaySaturday   0.0  0.0  0.1  0 0.0
    Hour15:WeekDaySaturday   0.0  0.0  0.0  0 0.1
    Hour16:WeekDaySaturday   0.0  0.0  0.0  0 0.0
    Hour17:WeekDaySaturday   0.0  0.0  0.0  0 0.9
    Hour18:WeekDaySaturday   0.0  0.0  0.0  0 0.4
    Hour19:WeekDaySaturday   0.0  0.0  0.0  0 0.0
    Hour20:WeekDaySaturday  -0.1 -0.1 -0.1  0 0.0
    Hour21:WeekDaySaturday  -0.1 -0.1 -0.1  0 0.0
    Hour22:WeekDaySaturday  -0.1 -0.1 -0.1  0 0.0
    Hour23:WeekDaySaturday  -0.1 -0.1 -0.1  0 0.0
    Hour1:WeekDaySunday     -0.1 -0.1  0.0  0 0.0
    Hour2:WeekDaySunday      0.0  0.0  0.0  0 0.0
    Hour3:WeekDaySunday      0.0  0.0  0.0  0 0.1
    Hour4:WeekDaySunday      0.0  0.0  0.0  0 0.0
    Hour5:WeekDaySunday      0.0  0.0  0.0  0 0.0
    Hour6:WeekDaySunday      0.1  0.1  0.1  0 0.0
    Hour7:WeekDaySunday     -0.1 -0.1 -0.1  0 0.0
    Hour8:WeekDaySunday      0.0  0.0  0.0  0 0.1
    Hour9:WeekDaySunday      0.1  0.0  0.1  0 0.0
    Hour10:WeekDaySunday     0.0  0.0  0.1  0 0.0
    Hour11:WeekDaySunday    -0.1 -0.1 -0.1  0 0.0
    Hour12:WeekDaySunday     0.0 -0.1  0.0  0 0.0
    Hour13:WeekDaySunday     0.0  0.0  0.0  0 0.0
    Hour14:WeekDaySunday     0.1  0.1  0.1  0 0.0
    Hour15:WeekDaySunday     0.1  0.1  0.1  0 0.0
    Hour16:WeekDaySunday     0.1  0.1  0.1  0 0.0
    Hour17:WeekDaySunday     0.0  0.0  0.0  0 0.0
    Hour18:WeekDaySunday     0.0  0.0  0.0  0 0.1
    Hour19:WeekDaySunday     0.1  0.0  0.1  0 0.0
    Hour20:WeekDaySunday     0.0  0.0  0.0  0 0.0
    Hour21:WeekDaySunday    -0.1 -0.1  0.0  0 0.0
    Hour22:WeekDaySunday    -0.1 -0.1 -0.1  0 0.0
    Hour23:WeekDaySunday     0.0  0.0  0.0  0 0.0
    Hour1:WeekDayThursday   -0.1 -0.1  0.0  0 0.0
    Hour2:WeekDayThursday   -0.1 -0.1 -0.1  0 0.0
    Hour3:WeekDayThursday   -0.1 -0.1  0.0  0 0.0
    Hour4:WeekDayThursday    0.0  0.0  0.0  0 0.0
    Hour5:WeekDayThursday   -0.1 -0.1  0.0  0 0.0
    Hour6:WeekDayThursday    0.0 -0.1  0.0  0 0.0
    Hour7:WeekDayThursday   -0.1 -0.2 -0.1  0 0.0
    Hour8:WeekDayThursday    0.1  0.0  0.1  0 0.0
    Hour9:WeekDayThursday    0.0  0.0  0.0  0 0.0
    Hour10:WeekDayThursday   0.0  0.0  0.0  0 0.1
    Hour11:WeekDayThursday   0.0 -0.1  0.0  0 0.0
    Hour12:WeekDayThursday   0.0 -0.1  0.0  0 0.0
    Hour13:WeekDayThursday   0.0  0.0  0.0  0 0.8
    Hour14:WeekDayThursday   0.0  0.0  0.0  0 0.0
    Hour15:WeekDayThursday   0.0  0.0  0.0  0 0.0
    Hour16:WeekDayThursday   0.0  0.0  0.0  0 0.0
    Hour17:WeekDayThursday   0.0  0.0  0.0  0 0.4
    Hour18:WeekDayThursday   0.0 -0.1  0.0  0 0.0
    Hour19:WeekDayThursday   0.0  0.0  0.0  0 0.0
    Hour20:WeekDayThursday  -0.1 -0.1  0.0  0 0.0
    Hour21:WeekDayThursday   0.0 -0.1  0.0  0 0.0
    Hour22:WeekDayThursday  -0.1 -0.1 -0.1  0 0.0
    Hour23:WeekDayThursday  -0.1 -0.1 -0.1  0 0.0
    Hour1:WeekDayTuesday     0.0  0.0  0.0  0 0.0
    Hour2:WeekDayTuesday     0.0  0.0  0.0  0 0.2
    Hour3:WeekDayTuesday     0.0 -0.1  0.0  0 0.0
    Hour4:WeekDayTuesday     0.0  0.0  0.0  0 0.0
    Hour5:WeekDayTuesday     0.0  0.0  0.0  0 0.1
    Hour6:WeekDayTuesday     0.1  0.1  0.1  0 0.0
    Hour7:WeekDayTuesday     0.1  0.1  0.1  0 0.0
    Hour8:WeekDayTuesday     0.1  0.1  0.2  0 0.0
    Hour9:WeekDayTuesday     0.1  0.1  0.1  0 0.0
    Hour10:WeekDayTuesday    0.0  0.0  0.0  0 0.6
    Hour11:WeekDayTuesday    0.0  0.0  0.0  0 0.0
    Hour12:WeekDayTuesday    0.0  0.0  0.0  0 0.0
    Hour13:WeekDayTuesday    0.1  0.1  0.1  0 0.0
    Hour14:WeekDayTuesday    0.0  0.0  0.0  0 0.0
    Hour15:WeekDayTuesday    0.0  0.0  0.0  0 0.3
    Hour16:WeekDayTuesday    0.0  0.0  0.0  0 0.0
    Hour17:WeekDayTuesday    0.0  0.0  0.0  0 0.5
    Hour18:WeekDayTuesday    0.0  0.0  0.0  0 0.6
    Hour19:WeekDayTuesday    0.1  0.0  0.1  0 0.0
    Hour20:WeekDayTuesday    0.0  0.0  0.0  0 0.5
    Hour21:WeekDayTuesday    0.0  0.0  0.0  0 0.8
    Hour22:WeekDayTuesday    0.0  0.0  0.0  0 0.0
    Hour23:WeekDayTuesday    0.0  0.0  0.1  0 0.0
    Hour1:WeekDayWednesday  -0.1 -0.1 -0.1  0 0.0
    Hour2:WeekDayWednesday  -0.1 -0.1 -0.1  0 0.0
    Hour3:WeekDayWednesday  -0.1 -0.1 -0.1  0 0.0
    Hour4:WeekDayWednesday  -0.1 -0.1 -0.1  0 0.0
    Hour5:WeekDayWednesday  -0.1 -0.1  0.0  0 0.0
    Hour6:WeekDayWednesday  -0.1 -0.2 -0.1  0 0.0
    Hour7:WeekDayWednesday  -0.2 -0.2 -0.2  0 0.0
    Hour8:WeekDayWednesday   0.0 -0.1  0.0  0 0.0
    Hour9:WeekDayWednesday   0.0  0.0  0.0  0 0.2
    Hour10:WeekDayWednesday  0.0 -0.1  0.0  0 0.0
    Hour11:WeekDayWednesday  0.0 -0.1  0.0  0 0.0
    Hour12:WeekDayWednesday -0.1 -0.1 -0.1  0 0.0
    Hour13:WeekDayWednesday  0.0 -0.1  0.0  0 0.0
    Hour14:WeekDayWednesday -0.1 -0.1  0.0  0 0.0
    Hour15:WeekDayWednesday -0.1 -0.1  0.0  0 0.0
    Hour16:WeekDayWednesday  0.0  0.0  0.0  0 0.0
    Hour17:WeekDayWednesday -0.1 -0.1 -0.1  0 0.0
    Hour18:WeekDayWednesday -0.1 -0.1 -0.1  0 0.0
    Hour19:WeekDayWednesday -0.1 -0.1 -0.1  0 0.0
    Hour20:WeekDayWednesday -0.1 -0.1 -0.1  0 0.0
    Hour21:WeekDayWednesday -0.1 -0.1 -0.1  0 0.0
    Hour22:WeekDayWednesday -0.1 -0.1 -0.1  0 0.0
    Hour23:WeekDayWednesday -0.1 -0.1  0.0  0 0.0
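
The extra memory use is roughly what the size of the design matrix would suggest, since the interaction model needs many more columns per row. A quick check on a toy grid (the grid is made up purely to count columns, and real RAM growth need not be exactly proportional to the column count):

```r
# One row per hour/weekday combination, only to count model-matrix columns.
grid <- expand.grid(
    Hour    = factor(0:23),
    WeekDay = factor(c("Friday", "Monday", "Saturday", "Sunday",
                       "Thursday", "Tuesday", "Wednesday"))
)

n_additive    <- ncol(model.matrix(~ Hour + WeekDay, grid))  # 1 + 23 + 6
n_interaction <- ncol(model.matrix(~ Hour * WeekDay, grid))  # adds 23 * 6 more

c(additive = n_additive, interaction = n_interaction)
```

The two counts, 30 and 168, match the number of coefficient rows in the two summaries.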

So, we get very similar values as before for the hours of the day on
their own. This time, we actually do get day of week effects for
Wednesday, Thursday, and Saturday:

                            Coef
    WeekDaySaturday          0.1
    WeekDayThursday          0.1
    WeekDayWednesday         0.1

And we get a pile of interactions:

    Hour6:WeekDayMonday      0.1
    Hour8:WeekDayMonday      0.1
    Hour9:WeekDayMonday      0.3
    Hour10:WeekDayMonday     0.1
    Hour13:WeekDayMonday     0.1
    Hour16:WeekDayMonday     0.1
    Hour18:WeekDayMonday     0.1
    Hour19:WeekDayMonday     0.1
    Hour20:WeekDayMonday     0.1
    Hour21:WeekDayMonday     0.1

    Hour6:WeekDayTuesday     0.1
    Hour7:WeekDayTuesday     0.1
    Hour8:WeekDayTuesday     0.1
    Hour9:WeekDayTuesday     0.1
    Hour13:WeekDayTuesday    0.1
    Hour19:WeekDayTuesday    0.1

    Hour1:WeekDayWednesday  -0.1
    Hour2:WeekDayWednesday  -0.1
    Hour3:WeekDayWednesday  -0.1
    Hour4:WeekDayWednesday  -0.1
    Hour5:WeekDayWednesday  -0.1
    Hour6:WeekDayWednesday  -0.1
    Hour7:WeekDayWednesday  -0.2
    Hour12:WeekDayWednesday -0.1
    Hour14:WeekDayWednesday -0.1
    Hour15:WeekDayWednesday -0.1
    Hour17:WeekDayWednesday -0.1
    Hour18:WeekDayWednesday -0.1
    Hour19:WeekDayWednesday -0.1
    Hour20:WeekDayWednesday -0.1
    Hour21:WeekDayWednesday -0.1
    Hour22:WeekDayWednesday -0.1
    Hour23:WeekDayWednesday -0.1

    Hour1:WeekDayThursday   -0.1
    Hour2:WeekDayThursday   -0.1
    Hour3:WeekDayThursday   -0.1
    Hour5:WeekDayThursday   -0.1
    Hour7:WeekDayThursday   -0.1
    Hour8:WeekDayThursday    0.1
    Hour20:WeekDayThursday  -0.1
    Hour22:WeekDayThursday  -0.1
    Hour23:WeekDayThursday  -0.1

    Hour1:WeekDaySaturday   -0.1
    Hour2:WeekDaySaturday   -0.1
    Hour3:WeekDaySaturday   -0.1
    Hour6:WeekDaySaturday    0.1
    Hour7:WeekDaySaturday   -0.2
    Hour9:WeekDaySaturday   -0.1
    Hour10:WeekDaySaturday  -0.1
    Hour11:WeekDaySaturday  -0.1
    Hour12:WeekDaySaturday  -0.1
    Hour20:WeekDaySaturday  -0.1
    Hour21:WeekDaySaturday  -0.1
    Hour22:WeekDaySaturday  -0.1
    Hour23:WeekDaySaturday  -0.1

    Hour1:WeekDaySunday     -0.1
    Hour6:WeekDaySunday      0.1
    Hour7:WeekDaySunday     -0.1
    Hour9:WeekDaySunday      0.1
    Hour11:WeekDaySunday    -0.1
    Hour14:WeekDaySunday     0.1
    Hour15:WeekDaySunday     0.1
    Hour16:WeekDaySunday     0.1
    Hour19:WeekDaySunday     0.1
    Hour21:WeekDaySunday    -0.1
    Hour22:WeekDaySunday    -0.1

I don't really understand these estimates. For example, why would 6 AM
be terrible in general, but be helpful on Sundays?
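
One possible reading, assuming R's default treatment contrasts (with hour 0 and Friday as the reference levels, being the two levels missing from the table): an interaction coefficient is an adjustment added on top of the two main effects, not a standalone effect. A minimal sketch using the rounded coefficients from the summary above, so the results are only approximate:

```r
# Rounded coefficients copied from the interaction-model summary above.
intercept <- 2.8   # hour 0 on Friday (the reference cell)
hour6     <- -0.3  # Hour6
sunday    <- 0.0   # WeekDaySunday
hour6_sun <- 0.1   # Hour6:WeekDaySunday

# Predicted mean grade for a cell = intercept + main effects + interaction.
friday_6am <- intercept + hour6
sunday_6am <- intercept + hour6 + sunday + hour6_sun

round(c(friday_6am = friday_6am, sunday_6am = sunday_6am), 1)
```

On this reading, 6 AM on Sunday comes out around 2.6, still below the Sunday hour-0 value of about 2.8; it is just slightly less bad than 6 AM on Friday (about 2.5).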

--
gwern
http://www.gwern.net/Spaced%20repetition