Mnemosyne data set available

Mnemosyne data set available Gwern Branwen 07/02/14 09:46
Hello everyone. As you know, by default Mnemosyne collects logs of all
flashcard reviews, and has done so for years.

This seems like it could be useful data for some projects (see for
example my earlier emails about time-of-day and -week on performance),
but it's difficult to get access to the corpus because it has become
far too large to email or casually host; Peter provided a torrent ~4
years ago but it died long ago. One can email him for a copy but that
takes up his time and in any case, the logs still have to be processed
into a SQL database (I tried recently but after 3 months of
processing, it still hadn't finished because of the IO bottleneck of
my hard drive). I've taken it upon myself to take a recent dump of
logs from Peter, process them into a SQL database (~1 day with my new
SSD), and upload them to my Amazon S3 account where anyone can
download them.

The link is https://s3.amazonaws.com/gwern-mnemosyne/2014-01-27-mnemosynelogs-all.db.xz
(due to the size, I suggest a download manager like `wget
--continue`).

This is a 2.8GB file compressed with xz
(https://en.wikipedia.org/wiki/Xz) which `unxz`/unpacks to an 18GB
SQLite 3.x database with the MD5 hash 03569c5416dd6923613389be6d0cc9e1.
It can be queried with commands like `$ sqlite3 -batch ./logs.db
"SELECT timestamp,object_id,grade FROM log WHERE event==9;"` or via
SQL interfaces like 'sqldf' for R.
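
For anyone who prefers to script the check and the query, here is a minimal Python sketch using only the standard library. It assumes the quoted MD5 refers to the unpacked database rather than the .xz archive, and that the `log` table has the columns used in the query above; the helper names `md5_of_unpacked` and `reviews` are hypothetical, not part of Mnemosyne.

```python
import hashlib
import lzma
import sqlite3

# Hash quoted in the message above (assumed to be of the unpacked database).
EXPECTED_MD5 = "03569c5416dd6923613389be6d0cc9e1"


def md5_of_unpacked(xz_path, chunk_size=1 << 20):
    """Stream-decompress an .xz file and return the MD5 hex digest of its contents."""
    digest = hashlib.md5()
    with lzma.open(xz_path, "rb") as f:
        # Read in chunks so the 18GB database never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reviews(db_path, event=9):
    """Yield (timestamp, object_id, grade) rows from the log table for one event type."""
    conn = sqlite3.connect(db_path)
    try:
        yield from conn.execute(
            "SELECT timestamp, object_id, grade FROM log WHERE event = ?",
            (event,),
        )
    finally:
        conn.close()
```

Typical use would be to compare `md5_of_unpacked("2014-01-27-mnemosynelogs-all.db.xz")` against `EXPECTED_MD5`, then iterate over `reviews("./logs.db")` once the file has been unpacked. Iterating lazily avoids loading every matching row at once.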

I commit to keeping the file up for 3 months before removing it, since
S3 bandwidth is not free; if you'd like to see it stay longer, I
accept Bitcoin donations at 1HbHpdhazqzfPtbcw9NA2H9R1GWNekm1L

--
gwern
http://www.gwern.net/Spaced%20repetition
RE: [mnemosyne-proj-users] Mnemosyne data set available Peter Bienstman 08/02/14 00:29
Great, thanks for making this available!

Peter

Re: Mnemosyne data set available Gnome 08/02/14 14:53
If someone is interested in a new torrent, I have a server at home that is almost always on. However, I may cap the upload speed from time to time...
Re: [mnemosyne-proj-users] Re: Mnemosyne data set available Gwern Branwen 08/02/14 15:31
On Sat, Feb 8, 2014 at 5:53 PM, Gnome <jippi...@hotmail.com> wrote:
> If someone is interested in a new torrent, I have a server at home that is
> almost always on. However I may cap the upload speed from time to time...

Supposedly an S3 file is already a torrent:
http://docs.aws.amazon.com/AmazonS3/latest/dev/S3Torrent.html

If I'm understanding it right, the torrent link is
https://s3.amazonaws.com/gwern-mnemosyne/2014-01-27-mnemosynelogs-all.db.xz?torrent

I didn't bother with this because I have hardly any upload capacity at
all and would contribute minimally to anyone's download, but if you
want to do it, you can. (Personally, I doubt there's enough interest
in a torrent to repay the download.)

--
gwern
http://www.gwern.net
Re: [mnemosyne-proj-users] Re: Mnemosyne data set available la...@checkd.in 22/06/14 23:07
On Sunday, February 9, 2014 12:31:36 AM UTC+1, Gwern Branwen wrote:

To keep this collection available, I've taken the liberty of uploading the file to archive.org:
https://archive.org/details/20140127MnemosynelogsAll.db

From there it is available as both HTTPS and torrent downloads.

Hope this is found useful.

Regards
Laust

Re: Mnemosyne data set available Nawaf AlSabhan 03/09/15 12:44
Hi

Thanks for making the data available.

I'm an academic researcher and I have downloaded the dataset; however, I am struggling to understand the meaning/values of some of the columns.

Is there any description or a readme file for the dataset?

The table parsed_logs contains file names such as c136315a_00001.bz2; are these files available?

Best regards,
Nawaf
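
No readme is mentioned anywhere in this thread, but the table and column names can at least be read out of the database itself. A minimal Python sketch, assuming a local unpacked copy of the file; `describe_schema` is a hypothetical helper, and it reports only names and declared types, not what the values mean.

```python
import sqlite3


def describe_schema(db_path):
    """Return {table_name: [(column_name, declared_type), ...]} for every table."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for table in tables:
            # Quote the identifier so unusual table names cannot break the statement.
            quoted = '"' + table.replace('"', '""') + '"'
            # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
            columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
            schema[table] = [(col[1], col[2]) for col in columns]
        return schema
    finally:
        conn.close()
```

Printing the result of `describe_schema("./logs.db")` should list tables such as `log` and `parsed_logs` along with their columns, which is a starting point for matching them against the Mnemosyne source code.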