I've recently found out about an alleged corpus of 8000 email messages from the Venezuelan government, which I've blogged about at: <http://www.sgi.nu/diary/2008/09/02/venezuelan-government-email-corpus/>
It's currently on offer to the highest bidder, but WikiLeaks (who have obtained the messages) claim they will publicly release it after a period of exclusive access for the winning bidder.
Does anyone know more about this?
Thanks,
Andrew
--------------
Andrew Lampert
Research Engineer
Information Engineering Laboratory
CSIRO ICT Centre
<http://www.ict.csiro.au/staff/Andrew.Lampert/>
Post: Locked Bag 17, North Ryde, NSW 1670, Australia
Office: Building E6B, Macquarie University, North Ryde, 2113
Tel: +61 2 9325 3129, Fax: +61 2 9325 3200
Does anyone have thoughts about using this type of data?
Mark
Michael, you say that "using this would set a bad precedent and undermine trust we might hope for from legit providers". That might be true - it's hard to know, I think. Is the potential to undermine trust still there once emails are in the public domain? What has to happen to lend legitimacy to a real-world email corpus (other than where an organisation volunteers their data)? Enron is one example, though even there some people raise ethical questions about using it. I guess we can just keep hoping that the Clinton.gov corpus you mentioned comes to fruition, Mark!
That said, there are some tricky cases. Consider:
1) The government subpoenas a large amount of email and releases it,
as with Enron. I think this is ok to use. What if one of the people in
the corpus asks for all of their mail to be removed?
2) I decide to release all my email data. Can my contacts who sent me
the email object and ask that their messages be removed?
I think both of these cases would be ok. The Venezuela case is not,
because the person who released it broke the law by releasing
government information. Even if the people have a right to see it,
that doesn't mean we can legally use it.
The AOL data is unclear to me. AOL released the data under a license,
but then removed the data and asked people not to use it. Can you
withdraw the license that was originally provided (depends on the
license)?
I certainly do not want to use any data in my research that people
would view as suspect. However, I suspect that other researchers
(journalists and political scientists) would use the released
Venezuelan email. If they would, do we have a different standard?
Mark