fyi, enron has no attachments

Mark Hammond

Aug 5, 2010, 8:18:41 PM
to raindr...@googlegroups.com
I noticed gozer talking about using the Enron dataset for some
benchmarking, so this is just an FYI...

While benchmark-raindrop.py does load the Enron corpus fine, a big
limitation is that all attachments have been removed. There is a copy
of the corpus available which has attachments, but that version uses
.pst files, which benchmark-raindrop.py can't currently read.

So depending on what you want to benchmark, enron might, or might not,
be the best option.

An option we can consider is to grab an mbox file of Jean Reilly's
account - this should just be a matter of creating an account in
Thunderbird, then copying the 'INBOX' file from the profile. We could
then put the file somewhere semi-public and all use the same data, so
that the results we share are comparable.
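
Since the whole point is attachments, it might be worth checking how many
messages in a candidate mbox actually carry one before we standardise on
it. A minimal sketch using the stdlib `mailbox` module; `count_attachments`
is just a name I made up for illustration (it is not part of
benchmark-raindrop.py), and "has an attachment" is assumed to mean "some
MIME part has a filename":

```python
import mailbox


def count_attachments(mbox_path):
    """Return (total_messages, messages_with_attachments) for an mbox file.

    Hypothetical helper. Assumption: a message "has an attachment" if any
    of its MIME parts declares a filename.
    """
    total = 0
    with_attachments = 0
    # create=False so a mistyped path raises instead of creating an empty file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            total += 1
            if any(part.get_filename() for part in msg.walk()):
                with_attachments += 1
    finally:
        box.close()
    return total, with_attachments
```

Something like `count_attachments("INBOX")` on the copied file would give a
rough idea of whether it is a better fit than enron for attachment-heavy
benchmarks.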

FYI, I can import my Thunderbird account with a command-line like:

% benchmark-raindrop.py \
--my-address=mham...@skippinet.com.au \
--my-address=skippy....@gmail.com \
--mailbox=c:\Users\skip\AppData\Roaming\Thunderbird\Profiles\{salt_dir_name}\ImapMail\{imap_acct_name}\INBOX

If this proves useful, it would be fairly easy to have it walk a dir
structure looking for all mbox files (i.e., to import all folders from
all accounts).
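
A rough sketch of what that walk could look like, not how
benchmark-raindrop.py does (or will do) it. `find_mbox_files` and
`looks_like_mbox` are hypothetical names, and the detection is only a
heuristic: a file is assumed to be an mbox if it starts with a "From "
separator line, and Thunderbird's `.msf` index files are skipped:

```python
import os


def looks_like_mbox(path):
    """Heuristic: a non-empty mbox file starts with a 'From ' separator line."""
    try:
        with open(path, "rb") as f:
            return f.read(5) == b"From "
    except OSError:
        return False


def find_mbox_files(root):
    """Yield paths under root that look like mbox files (hypothetical helper).

    Note: an empty folder's mbox has no 'From ' line, so it is not reported.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Thunderbird keeps a '.msf' index next to each mbox; skip those.
            if name.endswith(".msf"):
                continue
            path = os.path.join(dirpath, name)
            if looks_like_mbox(path):
                yield path
```

Each path it yields could then be passed as a separate --mailbox value,
pointing the walk at the ImapMail directory of the profile.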

Cheers,

Mark
