| Mnemosyne data set available | Gwern Branwen | 07/02/14 09:46 | Hello everyone. As you know, by default Mnemosyne collects logs of all
flashcard reviews, and has done so for years. This seems like it could be useful data for some projects (see, for example, my earlier emails about the effect of time-of-day and -week on performance), but it's difficult to get access to the corpus because it has become far too large to email or casually host; Peter provided a torrent ~4 years ago but it died long ago. One can email him for a copy, but that takes up his time, and in any case the logs still have to be processed into a SQL database (I tried recently but after 3 months of processing, it still hadn't finished because of the IO bottleneck of my hard drive). I've taken it upon myself to take a recent dump of logs from Peter, process them into a SQL database (~1 day with my new SSD), and upload them to my Amazon S3 account where anyone can download them. The link is https://s3.amazonaws.com/gwern-mnemosyne/2014-01-27-mnemosynelogs-all.db.xz (due to the size, I suggest a download manager like `wget --continue`). This is a 2.8GB file compressed with xz (https://en.wikipedia.org/wiki/Xz) which `unxz`/unpacks to an 18GB SQLite 3.x database with the MD5 hash 03569c5416dd6923613389be6d0cc9e1. It can be queried with commands like `$ sqlite3 -batch ./logs.db "SELECT timestamp,object_id,grade FROM log WHERE event==9;"` or via SQL interfaces like 'sqldf' for R. I commit to keeping the file up for 3 months before removing it, since S3 bandwidth is not free; if you'd like to see it stay longer, I accept Bitcoin donations at 1HbHpdhazqzfPtbcw9NA2H9R1GWNekm1L -- gwern http://www.gwern.net/Spaced%20repetition |
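For anyone who prefers scripting the check and the query rather than using the shell, the steps in the post above can be sketched in Python roughly as follows. This is only an illustration under assumptions: it assumes the quoted MD5 hash refers to the unpacked database (as the wording suggests), that the file was saved as `./logs.db` as in the example command, and that the `log` table has the columns named in that command. The function names `md5_of_file` and `fetch_reviews` are made up for this sketch.

```python
import hashlib
import sqlite3

# MD5 hash quoted in the post; assumed here to describe the unpacked database.
EXPECTED_MD5 = "03569c5416dd6923613389be6d0cc9e1"


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so an 18GB database never sits in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_reviews(db_path, limit=10):
    """Return up to `limit` (timestamp, object_id, grade) rows where event = 9.

    Mirrors the sqlite3 command in the post, with a LIMIT added so a first
    look at the data does not scan the whole table.
    """
    connection = sqlite3.connect(db_path)
    try:
        cursor = connection.execute(
            "SELECT timestamp, object_id, grade FROM log WHERE event = 9 LIMIT ?",
            (limit,),
        )
        return cursor.fetchall()
    finally:
        connection.close()
```

A typical session would compare `md5_of_file("./logs.db")` against `EXPECTED_MD5` once after unpacking, then call `fetch_reviews("./logs.db")` to peek at a few rows before running anything heavier.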
| RE: [mnemosyne-proj-users] Mnemosyne data set available | Peter Bienstman | 08/02/14 00:29 | Great, thanks for making this available!
Peter |
| Re: Mnemosyne data set available | Gnome | 08/02/14 14:53 | If someone is interested in a new torrent, I have a server at home that is almost always on. However, I may cap the upload speed from time to time... |
| Re: [mnemosyne-proj-users] Re: Mnemosyne data set available | Gwern Branwen | 08/02/14 15:31 | On Sat, Feb 8, 2014 at 5:53 PM, Gnome <jippi...@hotmail.com> wrote: Supposedly an S3 file is already a torrent: http://docs.aws.amazon.com/AmazonS3/latest/dev/S3Torrent.html If I'm understanding it right, the torrent link is https://s3.amazonaws.com/gwern-mnemosyne/2014-01-27-mnemosynelogs-all.db.xz?torrent I didn't bother with this because I have hardly any upload capacity at all and would contribute minimally to anyone's download, but if you want to do it, you can. (Personally, I doubt there's enough interest in a torrent to repay the download.) -- gwern http://www.gwern.net |
| Re: [mnemosyne-proj-users] Re: Mnemosyne data set available | la...@checkd.in | 22/06/14 23:07 | On Sunday, February 9, 2014 12:31:36 AM UTC+1, Gwern Branwen wrote: To keep this collection available, I've taken the liberty of uploading the file to archive.org: From there it is available as both HTTPS and torrent downloads. I hope this is useful. Regards |
| Re: Mnemosyne data set available | Nawaf AlSabhan | 03/09/15 12:44 | Hi
Thanks for making the data available. I'm an academic researcher and I have downloaded the dataset; however, I am struggling to understand the meaning/values of some of the columns. Is there any description or a readme file for the dataset? The table parsed_logs contains file names such as c136315a_00001.bz2; are these files available? Best regards, Nawaf |