I don't have any WSD-specific comments to make, but I have done a reasonable amount of annotation of other phenomena in the Enron corpus. As you suggest, I have found one of the database versions of the corpus very convenient for this purpose. Specifically, I use the database dump prepared by Andrew Fiore and Jeff Heer at UC Berkeley and add my own tables (with references to IDs in the existing tables) to store the annotations I make on the data. This also makes it easy to add multi-layer annotations over time (i.e., annotations that refer to other annotations, such as classification annotations that refer to text unit annotations).
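As a rough illustration of the layout described above, here is a minimal sketch in Python. It uses an in-memory SQLite database so that it is self-contained. All table and column names (`messages`, `text_units`, `unit_labels`, and so on) are hypothetical stand-ins and are not the actual schema of the Berkeley dump. The first annotation table refers to IDs in the existing table, and the second refers to the first, which gives the multi-layer arrangement.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a stand-in corpus table and two annotation layers."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    # Stand-in for an existing corpus table (hypothetical name and columns).
    conn.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
    )
    # Layer 1: text unit annotations, each pointing at a character span of one message.
    conn.execute(
        """
        CREATE TABLE text_units (
            id INTEGER PRIMARY KEY,
            message_id INTEGER NOT NULL REFERENCES messages(id),
            start_char INTEGER NOT NULL,  -- 0-based, inclusive
            end_char INTEGER NOT NULL     -- 0-based, exclusive
        )
        """
    )
    # Layer 2: classification annotations, each pointing at a text unit.
    conn.execute(
        """
        CREATE TABLE unit_labels (
            id INTEGER PRIMARY KEY,
            text_unit_id INTEGER NOT NULL REFERENCES text_units(id),
            label TEXT NOT NULL,
            annotator TEXT NOT NULL
        )
        """
    )
    return conn


def annotate_span(conn, message_id, start, end, label, annotator):
    """Store a span as a text unit, attach one label to it, and return the text unit id."""
    cur = conn.execute(
        "INSERT INTO text_units (message_id, start_char, end_char) VALUES (?, ?, ?)",
        (message_id, start, end),
    )
    unit_id = cur.lastrowid
    conn.execute(
        "INSERT INTO unit_labels (text_unit_id, label, annotator) VALUES (?, ?, ?)",
        (unit_id, label, annotator),
    )
    return unit_id


def labelled_spans(conn):
    """Return (span text, label) pairs by joining both annotation layers to the corpus."""
    rows = conn.execute(
        """
        SELECT substr(m.body, t.start_char + 1, t.end_char - t.start_char), l.label
        FROM unit_labels AS l
        JOIN text_units AS t ON t.id = l.text_unit_id
        JOIN messages AS m ON m.id = t.message_id
        ORDER BY l.id
        """
    )
    return rows.fetchall()
```

Because the annotations live in their own tables, the original corpus tables stay untouched, and a further layer (for example, adjudication of conflicting labels) could be added later as another table that refers to `unit_labels`.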
I've cross-posted your query to the Email Research mailing list too, in case anyone has additional advice to offer, particularly in reference to word-sense disambiguation.
Cheers,
Andrew
________________________________________
From: enron-corp...@sgi.nu [enron-corp...@sgi.nu] On Behalf Of Stuart Moore [stuart...@cl.cam.ac.uk]
Sent: Wednesday, 20 May 2009 11:08 PM
To: enron-...@sgi.nu
Subject: [Enron-corpus] Using Enron Corpus as a text corpus for Word Sense Disambiguation
I'm looking to use the Enron corpus as a flat text corpus for Word
Sense Disambiguation research - and hopefully to make my annotations
publicly available. I currently plan just to use the body text of each
email, rather than any of the email headers. Personal emails, list
emails, etc. are all useful to me (but spam isn't).
I'm currently trying to work out the best way to organise my data -
has anyone done anything similar? Does anyone have any suggestions?
My current plan is to use one of the database versions mentioned on
http://sgi.nu/enron/corpora.php and add extra tables for my
annotations.
Many thanks
Stuart Moore
PhD Student, University of Cambridge
_______________________________________________
Enron-corpus mailing list
Enron-...@sgi.nu
http://lists.sgi.nu/mailman/listinfo/enron-corpus