On Fri, Aug 30, 2013 at 2:00 AM, Peter Bienstman <Peter.B...@ugent.be> wrote:
> Would be interesting to see how this generalises to other people.
(I killed my log import at 26% and discovered that the importing
process is resumable, so I'd been suffering for no reason - I could've
just kept running it overnight until it finished, and not driven
myself nuts with a degraded system! Oh well. It's not trivial to run a
linear model on a 1.1G CSV with 48m rows, but it's not as hard as I
expected, and I've finished a simple analysis.)
Summary: Noon good, 6 AM bad. There's a circadian-looking change in
average grade over the day:
http://i.imgur.com/sum3toZ.png There seems
to be no real day-of-week effect.
I wonder if anyone has previously published on how spaced repetition
performance varies over the day? Probably not, you'd need a pretty big
dataset like this to find it.
I extract the data as before:
$ sqlite3 -batch ./mnemosyne-stats/logs.db \
    "SELECT timestamp,object_id,grade FROM log WHERE event==9;" \
    | tr '|' ',' > ~/mnemosyne-all.csv
$ wc mnemosyne-all.csv
47794669 47794669 1114023396 mnemosyne-all.csv
$ R
R> install.packages("biglm")
The `biglm` package offers an *incremental* linear model function: you
can read in a million rows, 'add' them to a `biglm` object, read in
another million rows and so on. Since I can fit 1 million rows in RAM
but not 48 million rows, this works great:
library(biglm)
# open the CSV once; each read.csv call on the connection continues
# where the previous one stopped
m <- file("mnemosyne-all.csv", open="rt")
# read the next 2 million rows & derive weekday/hour factors
readChunk <- function(con) {
    chunk <- read.csv(con, nrows=2000000, header=FALSE,
                      col.names=c("Date", "ID", "Grade"),
                      colClasses=c("integer", "factor", "numeric"))
    chunk$Date <- as.POSIXct(chunk$Date, origin = "1970-01-01", tz = "UTC")
    chunk$WeekDay <- as.factor(weekdays(chunk$Date))
    chunk$Hour <- as.factor(as.numeric(format(chunk$Date, "%H")))
    return(chunk)
}
frml <- Grade ~ Hour + WeekDay
# create a seed to update with fresh data in the loop
chunk1 <- readChunk(m)
bl <- biglm(frml, chunk1)
# NB: isOpen(m) stays TRUE at end of file, so this loop has no clean exit
while (isOpen(m)) {
    chunk <- readChunk(m)
    bl <- update(bl, chunk)
}
summary(bl)
closeAllConnections()
...
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
Calls: update ... model.matrix -> model.matrix.default -> contrasts<-
R> # It errors out because I didn't figure out how to handle
R> # reading to the end of the file
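For what it's worth, here is one way the end-of-file case could be handled. This is a sketch under assumptions, not what was run above: `read_chunk`, `for_each_chunk`, and `day_levels` are made-up names, the demo file is 25 synthetic rows rather than the Mnemosyne data, and the weekday levels assume an English locale. The idea is to treat either a read error or a zero-row chunk as "done", and to fix the factor levels so every chunk produces the same model columns.

```r
# Weekday levels in alphabetical order (assumes an English locale)
day_levels <- c("Friday", "Monday", "Saturday", "Sunday",
                "Thursday", "Tuesday", "Wednesday")

# Read up to n rows from an already-open connection; NULL once exhausted.
read_chunk <- function(con, n) {
    chunk <- tryCatch(
        read.csv(con, nrows = n, header = FALSE,
                 col.names = c("Date", "ID", "Grade"),
                 colClasses = c("integer", "character", "numeric")),
        error = function(e) NULL)
    if (is.null(chunk) || nrow(chunk) == 0) return(NULL)
    chunk$Date <- as.POSIXct(chunk$Date, origin = "1970-01-01", tz = "UTC")
    # fixed level sets, so a short chunk cannot drop a level
    chunk$Hour <- factor(as.integer(format(chunk$Date, "%H")), levels = 0:23)
    chunk$WeekDay <- factor(weekdays(chunk$Date), levels = day_levels)
    chunk
}

# Call f on each chunk in turn, stopping cleanly at end of file.
for_each_chunk <- function(path, n, f) {
    con <- file(path, open = "rt")
    on.exit(close(con))
    repeat {
        chunk <- read_chunk(con, n)
        if (is.null(chunk)) break
        f(chunk)
    }
    invisible(NULL)
}

# Demo on a small synthetic file (made-up timestamps and grades)
tmp <- tempfile(fileext = ".csv")
demo <- data.frame(Date = 1377820800L + 3600L * (0:24),
                   ID = paste0("card", 0:24),
                   Grade = rep(c(2, 3, 4, 3, 5), 5))
write.table(demo, tmp, sep = ",", row.names = FALSE,
            col.names = FALSE, quote = FALSE)

total_rows <- 0
n_chunks <- 0
for_each_chunk(tmp, 10, function(chunk) {
    total_rows <<- total_rows + nrow(chunk)
    n_chunks <<- n_chunks + 1
})
```

With `biglm`, the callback would presumably seed the model with `biglm(frml, chunk)` on the first chunk and call `update(bl, chunk)` on every later one.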
R> summary(bl)
Large data regression model: biglm(frml, chunk1)
Sample size = 47794669
                 Coef (95%  CI) SE p
(Intercept)       2.8  2.8  2.8  0 0
Hour1             0.0  0.0  0.0  0 0
Hour2             0.0  0.0  0.0  0 0
Hour3            -0.1 -0.1  0.0  0 0
Hour4            -0.1 -0.1 -0.1  0 0
Hour5            -0.2 -0.2 -0.2  0 0
Hour6            -0.3 -0.3 -0.3  0 0
Hour7            -0.1 -0.1 -0.1  0 0
Hour8             0.1  0.1  0.1  0 0
Hour9             0.2  0.2  0.2  0 0
Hour10            0.3  0.3  0.3  0 0
Hour11            0.3  0.3  0.3  0 0
Hour12            0.3  0.3  0.3  0 0
Hour13            0.2  0.2  0.2  0 0
Hour14            0.2  0.2  0.2  0 0
Hour15            0.2  0.2  0.2  0 0
Hour16            0.1  0.1  0.1  0 0
Hour17            0.1  0.1  0.1  0 0
Hour18            0.1  0.1  0.1  0 0
Hour19            0.0  0.0  0.1  0 0
Hour20            0.0  0.0  0.0  0 0
Hour21            0.0  0.0  0.0  0 0
Hour22            0.0  0.0  0.0  0 0
Hour23            0.0  0.0  0.0  0 0
WeekDayMonday     0.0  0.0  0.0  0 0
WeekDaySaturday   0.0  0.0  0.0  0 0
WeekDaySunday     0.0  0.0  0.0  0 0
WeekDayThursday   0.0  0.0  0.0  0 0
WeekDayTuesday    0.0  0.0  0.0  0 0
WeekDayWednesday  0.0  0.0  0.0  0 0
But hopefully 47,794,669 (48m) flashcard reviews is enough. So,
pulling out the interesting coefficients - all the weekdays drop out
as irrelevant - we get:
06 -0.3
05 -0.2
03 -0.1
04 -0.1
07 -0.1
08 0.1
16 0.1
17 0.1
18 0.1
09 0.2
13 0.2
14 0.2
15 0.2
10 0.3
11 0.3
12 0.3
(We can ignore the p-values & confidence intervals, since at this
sample scale, they're all going to be zero & point-values.) No
apparent pattern when sorted by size, but the pattern jumps out when
we graph by hour:
plot(c(0,0,-0.1,-0.1,-0.2,-0.3,-0.1,0.1,0.2,0.3,0.3,0.3,0.2,0.2,0.2,0.1,0.1,0.1,0,0,0,0,0))
http://i.imgur.com/sum3toZ.png
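Rather than retyping the coefficients by hand, the hour terms could probably be pulled out of the model by name. A minimal sketch, assuming `coef(bl)` returns a named vector shaped like the stand-in `cf` below; the values are the rounded ones from the table above, and `cf`, `hour_cf`, and `by_hour` are illustrative names. Hour 0 is the baseline level, so it is added back as 0.

```r
# Stand-in for coef(bl), built from the rounded table values
cf <- c("(Intercept)" = 2.8,
        setNames(c(0, 0, -0.1, -0.1, -0.2, -0.3, -0.1, 0.1, 0.2, 0.3, 0.3,
                   0.3, 0.2, 0.2, 0.2, 0.1, 0.1, 0.1, 0, 0, 0, 0, 0),
                 paste0("Hour", 1:23)),
        setNames(rep(0, 6),
                 paste0("WeekDay", c("Monday", "Saturday", "Sunday",
                                     "Thursday", "Tuesday", "Wednesday"))))

# Keep only the plain Hour terms and sort them by hour number
hour_cf <- cf[grepl("^Hour[0-9]+$", names(cf))]
hours <- as.integer(sub("^Hour", "", names(hour_cf)))
by_hour <- c(0, hour_cf[order(hours)])   # prepend baseline hour 0
names(by_hour) <- 0:23

worst <- names(which.min(by_hour))
best <- names(by_hour)[by_hour == max(by_hour)]

# Only draw when run interactively
if (interactive()) {
    plot(0:23, by_hour, type = "b",
         xlab = "Hour (tz = UTC, as extracted above)",
         ylab = "Grade coefficient relative to hour 0")
}
```

On these rounded values, `worst` is hour 6 and `best` is hours 10 through 12.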
We get a beautiful-looking circadian rhythm: peak performance at noon,
crappy performance in early morning, declining performance over the
day into evening. I'm actually impressed at the effect sizes here: if
you compare reviewing at noon (0.3) vs reviewing at 6 AM (-0.3), that's
a difference of 0.6 - on a 0-5 scale where most grades are a 3 or 4!
That is, if this is reflecting actual memory performance and not some
sort of response bias varying by time (like being too pessimistic when
you're up too early/late).
That's only about retrieval, though. I wonder if late at night (==near
bedtime) would be best for subsequent recalls, but I'm not sure how to
analyze that.
This doesn't tell us how hour & day may combine like in my previous
multilevel model. So let's rerun with:
frml <- Grade ~ Hour * WeekDay
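To get a feel for how much larger the interaction model is, here is a sketch that counts design-matrix columns on a toy grid (one row per hour/weekday combination, not the real data; `days` and `grid` are illustrative names). Column count is only a rough proxy for memory use.

```r
# Toy data: every hour/weekday combination exactly once
days <- c("Friday", "Monday", "Saturday", "Sunday",
          "Thursday", "Tuesday", "Wednesday")
grid <- expand.grid(Hour = factor(0:23),
                    WeekDay = factor(days, levels = days))

# Number of coefficients each formula asks the model to estimate
n_additive    <- ncol(model.matrix(~ Hour + WeekDay, grid))
n_interaction <- ncol(model.matrix(~ Hour * WeekDay, grid))
c(additive = n_additive, interaction = n_interaction)
```

The additive formula gives 30 columns and the interaction formula 168 (24 hours x 7 days), which matches the number of coefficients each model reports.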
This will use up ~4x more RAM while running, BTW, so you may need to
reduce how many rows you tell `read.csv` to read in. The results:
R> summary(bl)
Large data regression model: biglm(frml, chunk1)
Sample size = 47794669
                        Coef (95%  CI) SE   p
(Intercept)              2.8  2.8  2.8  0 0.0
Hour1                    0.0  0.0  0.0  0 0.0
Hour2                    0.0  0.0  0.0  0 0.0
Hour3                    0.0  0.0  0.0  0 0.0
Hour4                   -0.1 -0.1 -0.1  0 0.0
Hour5                   -0.2 -0.2 -0.2  0 0.0
Hour6                   -0.3 -0.3 -0.3  0 0.0
Hour7                    0.0  0.0  0.0  0 0.0
Hour8                    0.1  0.1  0.1  0 0.0
Hour9                    0.1  0.1  0.1  0 0.0
Hour10                   0.3  0.3  0.3  0 0.0
Hour11                   0.3  0.3  0.3  0 0.0
Hour12                   0.3  0.3  0.3  0 0.0
Hour13                   0.2  0.2  0.2  0 0.0
Hour14                   0.2  0.2  0.2  0 0.0
Hour15                   0.2  0.2  0.2  0 0.0
Hour16                   0.1  0.1  0.1  0 0.0
Hour17                   0.1  0.1  0.1  0 0.0
Hour18                   0.1  0.1  0.1  0 0.0
Hour19                   0.0  0.0  0.0  0 0.0
Hour20                   0.1  0.1  0.1  0 0.0
Hour21                   0.1  0.0  0.1  0 0.0
Hour22                   0.1  0.1  0.1  0 0.0
Hour23                   0.0  0.0  0.0  0 0.0
WeekDayMonday            0.0  0.0  0.0  0 0.6
WeekDaySaturday          0.1  0.0  0.1  0 0.0
WeekDaySunday            0.0  0.0  0.0  0 0.0
WeekDayThursday          0.1  0.0  0.1  0 0.0
WeekDayTuesday           0.0  0.0  0.0  0 0.9
WeekDayWednesday         0.1  0.1  0.1  0 0.0
Hour1:WeekDayMonday      0.0  0.0  0.0  0 0.0
Hour2:WeekDayMonday      0.0 -0.1  0.0  0 0.0
Hour3:WeekDayMonday      0.0  0.0  0.0  0 0.0
Hour4:WeekDayMonday      0.0  0.0  0.0  0 0.4
Hour5:WeekDayMonday      0.0  0.0  0.0  0 0.0
Hour6:WeekDayMonday      0.1  0.1  0.1  0 0.0
Hour7:WeekDayMonday      0.0 -0.1  0.0  0 0.0
Hour8:WeekDayMonday      0.1  0.1  0.2  0 0.0
Hour9:WeekDayMonday      0.3  0.2  0.3  0 0.0
Hour10:WeekDayMonday     0.1  0.1  0.1  0 0.0
Hour11:WeekDayMonday     0.0  0.0  0.0  0 0.0
Hour12:WeekDayMonday     0.0  0.0  0.0  0 0.0
Hour13:WeekDayMonday     0.1  0.1  0.1  0 0.0
Hour14:WeekDayMonday     0.0  0.0  0.1  0 0.0
Hour15:WeekDayMonday     0.0  0.0  0.0  0 0.0
Hour16:WeekDayMonday     0.1  0.1  0.1  0 0.0
Hour17:WeekDayMonday     0.0  0.0  0.0  0 0.1
Hour18:WeekDayMonday     0.1  0.0  0.1  0 0.0
Hour19:WeekDayMonday     0.1  0.1  0.1  0 0.0
Hour20:WeekDayMonday     0.1  0.0  0.1  0 0.0
Hour21:WeekDayMonday     0.1  0.1  0.1  0 0.0
Hour22:WeekDayMonday     0.0  0.0  0.0  0 0.0
Hour23:WeekDayMonday     0.0  0.0  0.0  0 0.0
Hour1:WeekDaySaturday   -0.1 -0.1 -0.1  0 0.0
Hour2:WeekDaySaturday   -0.1 -0.1 -0.1  0 0.0
Hour3:WeekDaySaturday   -0.1 -0.1 -0.1  0 0.0
Hour4:WeekDaySaturday    0.0  0.0  0.0  0 0.3
Hour5:WeekDaySaturday    0.0  0.0  0.0  0 0.0
Hour6:WeekDaySaturday    0.1  0.1  0.1  0 0.0
Hour7:WeekDaySaturday   -0.2 -0.2 -0.2  0 0.0
Hour8:WeekDaySaturday    0.0  0.0  0.0  0 0.3
Hour9:WeekDaySaturday   -0.1 -0.1 -0.1  0 0.0
Hour10:WeekDaySaturday  -0.1 -0.1 -0.1  0 0.0
Hour11:WeekDaySaturday  -0.1 -0.1 -0.1  0 0.0
Hour12:WeekDaySaturday  -0.1 -0.1 -0.1  0 0.0
Hour13:WeekDaySaturday   0.0  0.0  0.0  0 0.0
Hour14:WeekDaySaturday   0.0  0.0  0.1  0 0.0
Hour15:WeekDaySaturday   0.0  0.0  0.0  0 0.1
Hour16:WeekDaySaturday   0.0  0.0  0.0  0 0.0
Hour17:WeekDaySaturday   0.0  0.0  0.0  0 0.9
Hour18:WeekDaySaturday   0.0  0.0  0.0  0 0.4
Hour19:WeekDaySaturday   0.0  0.0  0.0  0 0.0
Hour20:WeekDaySaturday  -0.1 -0.1 -0.1  0 0.0
Hour21:WeekDaySaturday  -0.1 -0.1 -0.1  0 0.0
Hour22:WeekDaySaturday  -0.1 -0.1 -0.1  0 0.0
Hour23:WeekDaySaturday  -0.1 -0.1 -0.1  0 0.0
Hour1:WeekDaySunday     -0.1 -0.1  0.0  0 0.0
Hour2:WeekDaySunday      0.0  0.0  0.0  0 0.0
Hour3:WeekDaySunday      0.0  0.0  0.0  0 0.1
Hour4:WeekDaySunday      0.0  0.0  0.0  0 0.0
Hour5:WeekDaySunday      0.0  0.0  0.0  0 0.0
Hour6:WeekDaySunday      0.1  0.1  0.1  0 0.0
Hour7:WeekDaySunday     -0.1 -0.1 -0.1  0 0.0
Hour8:WeekDaySunday      0.0  0.0  0.0  0 0.1
Hour9:WeekDaySunday      0.1  0.0  0.1  0 0.0
Hour10:WeekDaySunday     0.0  0.0  0.1  0 0.0
Hour11:WeekDaySunday    -0.1 -0.1 -0.1  0 0.0
Hour12:WeekDaySunday     0.0 -0.1  0.0  0 0.0
Hour13:WeekDaySunday     0.0  0.0  0.0  0 0.0
Hour14:WeekDaySunday     0.1  0.1  0.1  0 0.0
Hour15:WeekDaySunday     0.1  0.1  0.1  0 0.0
Hour16:WeekDaySunday     0.1  0.1  0.1  0 0.0
Hour17:WeekDaySunday     0.0  0.0  0.0  0 0.0
Hour18:WeekDaySunday     0.0  0.0  0.0  0 0.1
Hour19:WeekDaySunday     0.1  0.0  0.1  0 0.0
Hour20:WeekDaySunday     0.0  0.0  0.0  0 0.0
Hour21:WeekDaySunday    -0.1 -0.1  0.0  0 0.0
Hour22:WeekDaySunday    -0.1 -0.1 -0.1  0 0.0
Hour23:WeekDaySunday     0.0  0.0  0.0  0 0.0
Hour1:WeekDayThursday   -0.1 -0.1  0.0  0 0.0
Hour2:WeekDayThursday   -0.1 -0.1 -0.1  0 0.0
Hour3:WeekDayThursday   -0.1 -0.1  0.0  0 0.0
Hour4:WeekDayThursday    0.0  0.0  0.0  0 0.0
Hour5:WeekDayThursday   -0.1 -0.1  0.0  0 0.0
Hour6:WeekDayThursday    0.0 -0.1  0.0  0 0.0
Hour7:WeekDayThursday   -0.1 -0.2 -0.1  0 0.0
Hour8:WeekDayThursday    0.1  0.0  0.1  0 0.0
Hour9:WeekDayThursday    0.0  0.0  0.0  0 0.0
Hour10:WeekDayThursday   0.0  0.0  0.0  0 0.1
Hour11:WeekDayThursday   0.0 -0.1  0.0  0 0.0
Hour12:WeekDayThursday   0.0 -0.1  0.0  0 0.0
Hour13:WeekDayThursday   0.0  0.0  0.0  0 0.8
Hour14:WeekDayThursday   0.0  0.0  0.0  0 0.0
Hour15:WeekDayThursday   0.0  0.0  0.0  0 0.0
Hour16:WeekDayThursday   0.0  0.0  0.0  0 0.0
Hour17:WeekDayThursday   0.0  0.0  0.0  0 0.4
Hour18:WeekDayThursday   0.0 -0.1  0.0  0 0.0
Hour19:WeekDayThursday   0.0  0.0  0.0  0 0.0
Hour20:WeekDayThursday  -0.1 -0.1  0.0  0 0.0
Hour21:WeekDayThursday   0.0 -0.1  0.0  0 0.0
Hour22:WeekDayThursday  -0.1 -0.1 -0.1  0 0.0
Hour23:WeekDayThursday  -0.1 -0.1 -0.1  0 0.0
Hour1:WeekDayTuesday     0.0  0.0  0.0  0 0.0
Hour2:WeekDayTuesday     0.0  0.0  0.0  0 0.2
Hour3:WeekDayTuesday     0.0 -0.1  0.0  0 0.0
Hour4:WeekDayTuesday     0.0  0.0  0.0  0 0.0
Hour5:WeekDayTuesday     0.0  0.0  0.0  0 0.1
Hour6:WeekDayTuesday     0.1  0.1  0.1  0 0.0
Hour7:WeekDayTuesday     0.1  0.1  0.1  0 0.0
Hour8:WeekDayTuesday     0.1  0.1  0.2  0 0.0
Hour9:WeekDayTuesday     0.1  0.1  0.1  0 0.0
Hour10:WeekDayTuesday    0.0  0.0  0.0  0 0.6
Hour11:WeekDayTuesday    0.0  0.0  0.0  0 0.0
Hour12:WeekDayTuesday    0.0  0.0  0.0  0 0.0
Hour13:WeekDayTuesday    0.1  0.1  0.1  0 0.0
Hour14:WeekDayTuesday    0.0  0.0  0.0  0 0.0
Hour15:WeekDayTuesday    0.0  0.0  0.0  0 0.3
Hour16:WeekDayTuesday    0.0  0.0  0.0  0 0.0
Hour17:WeekDayTuesday    0.0  0.0  0.0  0 0.5
Hour18:WeekDayTuesday    0.0  0.0  0.0  0 0.6
Hour19:WeekDayTuesday    0.1  0.0  0.1  0 0.0
Hour20:WeekDayTuesday    0.0  0.0  0.0  0 0.5
Hour21:WeekDayTuesday    0.0  0.0  0.0  0 0.8
Hour22:WeekDayTuesday    0.0  0.0  0.0  0 0.0
Hour23:WeekDayTuesday    0.0  0.0  0.1  0 0.0
Hour1:WeekDayWednesday  -0.1 -0.1 -0.1  0 0.0
Hour2:WeekDayWednesday  -0.1 -0.1 -0.1  0 0.0
Hour3:WeekDayWednesday  -0.1 -0.1 -0.1  0 0.0
Hour4:WeekDayWednesday  -0.1 -0.1 -0.1  0 0.0
Hour5:WeekDayWednesday  -0.1 -0.1  0.0  0 0.0
Hour6:WeekDayWednesday  -0.1 -0.2 -0.1  0 0.0
Hour7:WeekDayWednesday  -0.2 -0.2 -0.2  0 0.0
Hour8:WeekDayWednesday   0.0 -0.1  0.0  0 0.0
Hour9:WeekDayWednesday   0.0  0.0  0.0  0 0.2
Hour10:WeekDayWednesday  0.0 -0.1  0.0  0 0.0
Hour11:WeekDayWednesday  0.0 -0.1  0.0  0 0.0
Hour12:WeekDayWednesday -0.1 -0.1 -0.1  0 0.0
Hour13:WeekDayWednesday  0.0 -0.1  0.0  0 0.0
Hour14:WeekDayWednesday -0.1 -0.1  0.0  0 0.0
Hour15:WeekDayWednesday -0.1 -0.1  0.0  0 0.0
Hour16:WeekDayWednesday  0.0  0.0  0.0  0 0.0
Hour17:WeekDayWednesday -0.1 -0.1 -0.1  0 0.0
Hour18:WeekDayWednesday -0.1 -0.1 -0.1  0 0.0
Hour19:WeekDayWednesday -0.1 -0.1 -0.1  0 0.0
Hour20:WeekDayWednesday -0.1 -0.1 -0.1  0 0.0
Hour21:WeekDayWednesday -0.1 -0.1 -0.1  0 0.0
Hour22:WeekDayWednesday -0.1 -0.1 -0.1  0 0.0
Hour23:WeekDayWednesday -0.1 -0.1  0.0  0 0.0
So, we get values very similar to before for the hours of the day on
their own. This time, we actually do get day-of-week effects for
Wednesday, Thursday, and Saturday:
Coef
WeekDaySaturday 0.1
WeekDayThursday 0.1
WeekDayWednesday 0.1
And we get a pile of interactions:
Hour6:WeekDayMonday 0.1
Hour8:WeekDayMonday 0.1
Hour9:WeekDayMonday 0.3
Hour10:WeekDayMonday 0.1
Hour13:WeekDayMonday 0.1
Hour16:WeekDayMonday 0.1
Hour18:WeekDayMonday 0.1
Hour19:WeekDayMonday 0.1
Hour20:WeekDayMonday 0.1
Hour21:WeekDayMonday 0.1
Hour6:WeekDayTuesday 0.1
Hour7:WeekDayTuesday 0.1
Hour8:WeekDayTuesday 0.1
Hour9:WeekDayTuesday 0.1
Hour13:WeekDayTuesday 0.1
Hour19:WeekDayTuesday 0.1
Hour1:WeekDayWednesday -0.1
Hour2:WeekDayWednesday -0.1
Hour3:WeekDayWednesday -0.1
Hour4:WeekDayWednesday -0.1
Hour5:WeekDayWednesday -0.1
Hour6:WeekDayWednesday -0.1
Hour7:WeekDayWednesday -0.2
Hour12:WeekDayWednesday -0.1
Hour14:WeekDayWednesday -0.1
Hour15:WeekDayWednesday -0.1
Hour17:WeekDayWednesday -0.1
Hour18:WeekDayWednesday -0.1
Hour19:WeekDayWednesday -0.1
Hour20:WeekDayWednesday -0.1
Hour21:WeekDayWednesday -0.1
Hour22:WeekDayWednesday -0.1
Hour23:WeekDayWednesday -0.1
Hour1:WeekDayThursday -0.1
Hour2:WeekDayThursday -0.1
Hour3:WeekDayThursday -0.1
Hour5:WeekDayThursday -0.1
Hour7:WeekDayThursday -0.1
Hour8:WeekDayThursday 0.1
Hour20:WeekDayThursday -0.1
Hour22:WeekDayThursday -0.1
Hour23:WeekDayThursday -0.1
Hour1:WeekDaySaturday -0.1
Hour2:WeekDaySaturday -0.1
Hour3:WeekDaySaturday -0.1
Hour6:WeekDaySaturday 0.1
Hour7:WeekDaySaturday -0.2
Hour9:WeekDaySaturday -0.1
Hour10:WeekDaySaturday -0.1
Hour11:WeekDaySaturday -0.1
Hour12:WeekDaySaturday -0.1
Hour20:WeekDaySaturday -0.1
Hour21:WeekDaySaturday -0.1
Hour22:WeekDaySaturday -0.1
Hour23:WeekDaySaturday -0.1
Hour1:WeekDaySunday -0.1
Hour6:WeekDaySunday 0.1
Hour7:WeekDaySunday -0.1
Hour9:WeekDaySunday 0.1
Hour11:WeekDaySunday -0.1
Hour14:WeekDaySunday 0.1
Hour15:WeekDaySunday 0.1
Hour16:WeekDaySunday 0.1
Hour19:WeekDaySunday 0.1
Hour21:WeekDaySunday -0.1
Hour22:WeekDaySunday -0.1
I don't really understand these estimates. For example, why would 6 AM
be terrible in general, but be helpful on Sundays?
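One possible reading, offered as a sketch rather than a settled interpretation: with R's default treatment contrasts, each plain `Hour` coefficient describes the baseline weekday (Friday, the level missing from the table), and each interaction term is an adjustment on top of that for another day. Summing the rounded coefficients from the table above (so the totals are only approximate, and the variable names are illustrative) gives predicted grades for 6 AM and noon:

```r
# Rounded coefficients copied from the Hour * WeekDay table above
intercept <- 2.8
hour   <- c(h6 = -0.3, h12 = 0.3)   # Hour6, Hour12: effects on Friday (baseline)
sunday <- 0.0                        # WeekDaySunday
inter  <- c(h6 = 0.1, h12 = 0.0)    # Hour6:WeekDaySunday, Hour12:WeekDaySunday

# Predicted mean grade = intercept + hour + day + hour:day
friday_pred <- intercept + hour
sunday_pred <- intercept + hour + sunday + inter

round(rbind(Friday = friday_pred, Sunday = sunday_pred), 1)
```

On these numbers 6 AM comes out around 2.5 on Friday and 2.6 on Sunday, against roughly 3.1 at noon on both days. So 6 AM on Sunday would still be bad, just slightly less bad than 6 AM on the baseline day.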
--
gwern
http://www.gwern.net/Spaced%20repetition