Ethics of opendata.

Kevin McArthur

Aug 18, 2011, 2:01:41 PM

to opend...@googlegroups.com

Last night there was a pretty good discussion in Victoria on the ethics of opendata and whether releasing currently FOI-able data as entire datasets transforms the ethical concerns, privacy law, etc.

The discussion stems from the launch of my proactivedisclosure.ca site, which hosts a search engine on some recently released financial datasets, and it's /slightly/ controversial. However, it's the potential for future expansion that has ethics questions being raised.

So I'll start from the beginning here and see where we go from there.

I was recently asked in an interview

"What would be your dream data sets?"

To which I responded:

"I'd like a copy of the email headers (to/from/date/messageid/subject) for every email sent to and from the public service. I'm pretty sure I'll never get that one, but I'd use it to do social network analysis and shine the light on how government communicates. It's the kind of dataset that you could spend years dissecting and analyzing." [...]

This data has tons of awesome use cases, but a few others raise ethical questions.

The one that stood out in my mind was brought up by Christopher Parsons (a researcher from UVIC), which was [paraphrasing] "What if someone uses their government email for whistleblowing to a journalist?" With this dataset, searching out a journalist's email could expose who is talking to them. But, on the other hand, it could also be highly useful for searching out cases of improper influence, data leaks, corruption, and even more innocuous things like whether [ADMs, DMs], ministers, etc. actually get an accurate picture of what is being discussed by the worker bees in the civil service...

I'd be interested to hear what the opendatabc group thinks of the ethics of radically open data and full transparency in government -- and more specifically the email headers question.

--

Kevin McArthur

James McKinney

Aug 18, 2011, 2:35:26 PM

to opend...@googlegroups.com

I don't think this data will deliver in terms of reducing corruption/waste. Public servants will just carry on their fishy communications through other channels, no?

In terms of answering the question "how is government communicating?" and specifically about the interactions between ministers and civil servants, email is just one form of communication. I'm sure much is done in private meetings. In particular, one would have to be careful about what conclusions can be drawn about specific individuals.

My guess is the publication of this data would have a chilling effect on free communication within government. We already have a problem of there not being free communication between government employees and the public. Keeping this data private I think insulates government in a good way. There's a lot of negotiating/dealmaking in government which requires a certain level of secrecy for it to work, and this data would either jeopardize these activities or push those conversations into other channels which may be less convenient.

Bruce Atherton

Aug 18, 2011, 2:40:02 PM

to opend...@googlegroups.com

I agree that the email dataset would be incredibly interesting, but I think that asking to have the original subject line is going a bit far. You sensibly do not ask for the contents of the emails, yet the subject line is a part of the contents. Sometimes the subject line is all the contents.

Other than the content, the benefit to having the subject is to correlate messages so you can have some indication of how messages relate to one another. For that, an anonymized subject line would probably be sufficient. So instead of "Subject: Re: BC Railgate Coverup" you would have "Subject: Re: 176200138".
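A minimal sketch of what that kind of anonymization might look like, assuming a keyed hash (HMAC-SHA256) over a normalized subject. The key, the function name `anonymize_subject`, and the nine-digit token format are illustrative assumptions only; nothing here describes an actual government process.

```python
import hashlib
import hmac
import re

# Hypothetical secret held by the data publisher (an assumption for this sketch).
SECRET_KEY = b"replace-with-a-real-secret"

# Matches any leading chain of reply/forward markers such as "Re: " or "Fwd: ".
PREFIX_RE = re.compile(r"^\s*((?:re|fwd?)\s*:\s*)+", re.IGNORECASE)

def anonymize_subject(subject: str, key: bytes = SECRET_KEY) -> str:
    """Replace a subject with a stable numeric token, keeping a reply marker."""
    match = PREFIX_RE.match(subject)
    # Any reply/forward chain is collapsed to a single "Re: " marker.
    prefix = "Re: " if match else ""
    core = subject[match.end():] if match else subject
    # Normalize case and whitespace so the same thread maps to the same token.
    normalized = " ".join(core.split()).lower()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    # Reduce the digest to nine digits purely for readability.
    token = int(digest[:12], 16) % 10**9
    return f"{prefix}{token:09d}"

original = anonymize_subject("BC Railgate Coverup")
reply = anonymize_subject("Re: BC Railgate Coverup")
print(original, "|", reply)
```

The original message and its reply share a token, so threads can still be correlated, but the published column can no longer be searched by keyword. A keyed hash is used rather than a plain one because an unkeyed hash of a short subject could be reversed by hashing guesses.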

I'd also suggest that having all the message headers would be useful, not just the ones you've listed. Often there are relationships between messages that are revealed in SMTP headers that aren't visible in a client program. That may have been your intention anyway.

Good luck getting the data. It makes for a nice dream, anyway.

Bruce Atherton

Aug 18, 2011, 2:58:50 PM

to opend...@googlegroups.com

Thinking through this example, I believe that it is a non-issue. The government, who are the ones with the most incentive to find such a whistleblower, already has all of this data (as well as the email contents) which is why any sensible person would not use their government account to send or receive incriminating emails. Revealing the header data to the public does not pose any additional risk save that there would be added resources devoted to analysis which could bring up an association that the government might otherwise have missed. To me, the benefits far outweigh this slight risk.

Kevin McArthur

Aug 18, 2011, 3:01:58 PM

to opend...@googlegroups.com

Bruce,

You're quite right that a hashed/anonymized subject line could address some of the concerns and still allow for some really useful analysis. However, it would also significantly reduce the value of the dataset for oversight. For example, being able to search for BC Rail would be hugely helpful, but so would be the discovery potential of the subject line -- as the network analysis could break it down to ministry, unit, etc. and see what specific offices are communicating about. The biggest problem with the FOI-oversight system today is that people don't know what questions to ask -- the subject lines would completely change that.

The other headers would be interesting too; however, they may invoke other privacy arguments (like IP addresses, etc.).

--

Kevin McArthur

James McKinney

Aug 18, 2011, 3:37:41 PM

to opend...@googlegroups.com

On 2011-08-18, at 2:58 PM, Bruce Atherton wrote:

> Thinking through this example, I believe that it is a non-issue. The government, who are the ones with the most incentive to find such a whistleblower, already has all of this data (as well as the email contents) which is why any sensible person would not use their government account to send or receive incriminating emails. Revealing the header data to the public does not pose any additional risk save that there would be added resources devoted to analysis which could bring up an association that the government might otherwise have missed. To me, the benefits far outweigh this slight risk.

I wouldn't overstate the likelihood of government spying on the communications of users of its mail servers. Privacy laws protect civil servants from such spying. It doesn't make a difference that the government owns the servers on which the email is stored.

In Montreal, such a case of espionage has had significant political ramifications and has brought Quebec's anti-corruption unit down on the city. In this case, the Auditor General's email was read without authorization. The Auditor General, in performing his function, is in regular contact with whistleblowers within the city and until recently also received tips through Montreal’s whistleblower hotline. Disclosing details of his emails would have a chilling effect on whistleblowing activity. Like Clay Johnson, I could stand for more whistleblowing and less transparency: http://infovegan.com/2010/07/01/how-transparency-fails-and-works-too

Of course, such civil servants as the Auditor General could carry on their activities with private phones etc., but when you force this activity out of the domain of the normal work environment you send a strong social signal that this activity is not supported by the institution, and you encourage fear and uncertainty that really get in the way of actually doing your job. I want my civil servants to be able to follow up on tips from whistleblowers without fear of retribution or disclosing sources. It is far from clear to me that the benefits outweigh the risks.

Bruce Atherton

Aug 18, 2011, 4:24:08 PM

to opend...@googlegroups.com

Thanks for the story about Montreal's Auditor General. It is an interesting one, for sure.

Not to be too paranoid, but anyone who really wants to protect whistleblowers should probably insist that the communication infrastructure that supports them is completely isolated from the infrastructure of those they are blowing the whistle on. It is far too trivial to access email with no record of having done so for a large number of email systems. Resorting to the telephone is not a solution, either, since monitoring and storing conversations is easily done with many systems if you control the phone network. Relying on all the people within an organization to do the right thing by averting their eyes seems pretty risky to me.

Lisa Tansey

Aug 19, 2011, 12:59:43 AM

to opend...@googlegroups.com

I think it would be hard to disentangle civil servants who were trying to do something useful from those trying to do something sleazy. They could both be communicating with the same entity but for different reasons. Also I see a lot of business cards from representatives of companies where the email address is gmail or yahoo etc. rather than an identifiable company.

But I do understand the beauty of the dream of following lines of influence and communication. Even imperfect, it would be quite an interesting dataset to mine.

Kevin McArthur

Aug 19, 2011, 12:08:20 PM

to opend...@googlegroups.com

This is the beauty of data, it's not biased.

As I understand it, this data is almost all available under a costly FOI process if it is formally requested. The question then isn't one of 'should the public have this information' because they already have the info under FOI... but rather, does having more efficient access to it (e.g. as a dataset under OGL) change the argument for FOI? Is this the big litmus test for FOI in an open-data world, and will it underscore the concept of proactive disclosure (that is, government releasing data before it is requested under FOI)?

There's certainly a debate around FOI, and there was when it was enacted -- however, are those debates not long settled? The only change here would be in creating a proactively released dataset of the information rather than requiring FOI requests -- right?

--

Kevin

Herb Lainchbury

Aug 19, 2011, 12:36:58 PM

to opend...@googlegroups.com

Citizens are concerned about corruption, and rightly so. There is a lot at stake.

I personally am not interested in looking for corruption explicitly but I also think it's naive to think that there isn't any. My reasoning is that even if you find corruption, it requires the coordinated effort and courage of lots of folks to do anything about it.

My preference is to look for systemic solutions such as introducing friction to the systems that corruption thrives on, such as secrecy. That's one of the reasons I support transparency in government: it affects the economics of corruption, making the cost of corrupt transactions much higher and making it much quicker to identify potential trouble areas. Corrupt transactions then have to find alternative, more expensive forms of communication like telephone or in-person meetings, which slows things down and adds costs, making it less worthwhile to engage in corruption in the first place.

I am interested in the efficiency of government, though, and I think a dataset like this could be a gold mine for process improvement and insights into how governments work. For example, government is organized in silos. Sometimes those silos communicate with each other - sometimes they don't. I think it would be really interesting and valuable to be able to visualize which parts of government are well connected to other parts of government and which ones aren't. Then you could test hypotheses of how improved communication affects things like the workplace engagement scores by applying training or alternative forms of communication (social media? instant messaging?) to different groups.
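As a rough sketch of the silo idea, assuming the sender and recipient addresses have already been mapped to organizational units: the unit names and sample records below are invented for illustration and do not come from any real dataset.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample records: (sender unit, recipient unit).
messages = [
    ("finance", "health"),
    ("health", "finance"),
    ("finance", "transport"),
    ("education", "education"),
]

def silo_links(records):
    """Count messages between each unordered pair of distinct units."""
    links = Counter()
    for sender, recipient in records:
        if sender != recipient:  # ignore mail that stays inside one unit
            links[frozenset((sender, recipient))] += 1
    return links

def unconnected_pairs(records):
    """List pairs of units that never exchanged a message in the sample."""
    units = sorted({unit for pair in records for unit in pair})
    links = silo_links(records)
    return [
        (a, b)
        for a, b in combinations(units, 2)
        if frozenset((a, b)) not in links
    ]

for pair, count in silo_links(messages).items():
    print(sorted(pair), count)
print("no direct contact:", unconnected_pairs(messages))
```

The counts could feed a graph visualization, and the list of unconnected pairs is one simple way to flag silos that never talk to each other. It only reflects email, so it says nothing about contact through other channels.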

Anyway, I don't actually expect this to happen but if it were to happen I would ask for the data moving forward and not even request past data. And I would probably leave out the subjects or obfuscate them in some way, as was suggested, as I would want to minimize the "chilling effect" mentioned.

I want my public servants to be free to concentrate on their jobs and not be worried about being mistakenly accused of wrongdoing. I want them to use the tools however makes them most effective and stoked about their work.

There is also a ton of other interesting data we should work on getting access to as well, so if this one is problematic, maybe we should figure out which ones are easier to do, like data that's already released on the internet but just needs the license applied to it (like the BC Government Directory).

Herb

--
Herb Lainchbury
Dynamic Solutions Inc.
www.dynamic-solutions.com
http://twitter.com/herblainchbury

James McKinney

Aug 19, 2011, 12:50:29 PM

to opend...@googlegroups.com

Agreed. If we want to analyze and optimize communication within government, though, we first need to know how they are communicating. Email is one way. But perhaps (among electronic alternatives) ticketing systems, instant messaging, wikis, and other enterprise systems are being used. In my personal communications with government, I've found the phone to be way, way faster in cases where it is available. We wouldn't want to draw conclusions on govt communication based on what may be an unrepresentative sample. So, I would first find out how government is communicating.

Bruce Atherton

Aug 19, 2011, 12:56:10 PM

to opend...@googlegroups.com

On Fri, Aug 19, 2011 at 9:08 AM, Kevin McArthur <ke...@stormtide.ca> wrote:

> This is the beauty of data, it's not biased.

I'm afraid I hold more to Mark Twain's view: there are lies, damned lies, and statistics.

You can use data for good or evil. Consider my example of emails with the subject "BC Railgate Coverup". You can imagine what a great story it would make to announce that there were X many emails with that subject line. You can imagine the impression it would leave in the public's mind. Even if the actual contents of the emails were about addressing false accusations of a coverup that were flying around.

> As I understand it, this data is almost all available under a costly FOI process if it is formally requested. The question then isn't one of 'should the public have this information' because they already have the info under FOI... but rather, does having more efficient access to it (e.g. as a dataset under OGL) change the argument for FOI? Is this the big litmus test for FOI in an open-data world, and will it underscore the concept of proactive disclosure (that is, government releasing data before it is requested under FOI)?

All FOI requests have to be passed through a process to eliminate privacy issues, don't they? An email with the subject line, "Welcome back from Rehab" has information that is none of our business. To vet every single email generated by government would seem like a monumental task.

James McKinney

Aug 19, 2011, 1:08:05 PM

to opend...@googlegroups.com

On 2011-08-19, at 12:56 PM, Bruce Atherton wrote:
> All FOI requests have to be passed through a process to eliminate privacy issues, don't they? An email with the subject line, "Welcome back from Rehab" has information that is none of our business. To vet every single email generated by government would seem like a monumental task.

One of the advantages of the FOI system is indeed that the work of prepping a dataset is only done when it's requested. If the work is monumental, the FOI agents will often work with the requester to narrow down the data requested. In the case of sensitive datasets, proactive disclosure means that you have work to do all the time, whether or not anyone is interested in it. I think this is why proactive disclosure is limited to datasets that are either not sensitive or easy to vet for privacy concerns, e.g. contracts, travel and hospitality expenses, grants and contributions, etc.

Kevin McArthur

Aug 19, 2011, 1:08:52 PM

to opend...@googlegroups.com

>
> I'm afraid I hold more to Mark Twain's view: there are lies, damned
> lies, and statistics.
>
> You can use data for good or evil. Consider my example of emails with
> the subject "BC Railgate Coverup". You can imagine what a great story
> it would make to announce that there were X many emails with that
> subject line. You can imagine the impression it would leave in the
> public's mind. Even if the actual contents of the emails were about
> addressing false accusations of a coverup that were flying around.
>

This is where opendata is so powerful. So one reporter puts out a
sensationalist story, and the data, being open and available to all,
allows the other N reporters to shred that person's
credibility. It could actually bring peer review to journalism in that
case, as more than one reporter would have access to the information.

>
> All FOI requests have to be passed through a process to eliminate
> privacy issues, don't they? An email with the subject line, "Welcome
> back from Rehab" has information that is none of our business. To vet
> every single email generated by government would seem like a
> monumental task.

I would hope that government officials are not communicating third-party
personal information via email (given the obvious security risks), and
as I understand it, part of the standard government process informs
civil servants their email will be monitored and used for FOI, etc. As
such, I think civil servants have largely already given up their rights
to privacy in official work email.

That said, I like Herb's comment about getting the information going
forward... which would eliminate the 'but I didn't know' scenario, and
could reiterate the public nature of government emails. Still, I
think essentially all civil servants realize that their email is
publicly accessible -- and if not, then that's a pretty major internal
communications issue.

Right now we have a system where journalists can get the information,
but where citizens, lacking the resources to file FOI requests broadly,
have a much harder time accessing this information. Further, the research
angle Herb alludes to is essentially impossible in the FOI sphere.

Essentially, the data's already published, the people involved are
already aware of the public nature of the email and we're not asking for
contents, simply subjects, which at least in my mind, seems a reasonable
intrusion given the obvious benefits of such information.