Hi Devdatta and Avinash,
yes, I, too, am frankly surprised at the ease with which one can access
sensitive data in bulk. Not only PDF rolls and voter details, but also
things such as land records, BPL lists, and much more - I think we are
in an exciting as well as dangerous phase of fairly uncontrolled,
nascent e-Governance practices. But I think the ethical issues here are
a little more complex than mere privacy concerns.
Upfront, I must admit that I use all the above sources for academic
research (in UP and across India). What Avinash described in principle
and for the example of Delhi can indeed be done on an all-India scale,
and I am sure there are more people than just me who do it.
But then the social sciences have long dealt with sensitive data and
developed protocols to protect it. Even though the data is publicly
available, I, for instance, keep my own copy on a secure workstation with
full disk encryption and two-factor authentication. Whenever possible, I
also work on anonymized subsets of data. Yet there are other potential
uses - some of the more worrisome you pointed out - which are not bound
by such data protection standards.
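
As an illustration only: one common way to build such a reduced subset is
to drop direct identifiers and replace the voter ID with a keyed hash, so
that records can still be linked within a project but not traced back
without the key. The sketch below is an assumption-laden example, not a
description of any actual research setup; the field names (voter_id, name,
address) and the function name pseudonymize are hypothetical. Strictly
speaking this is pseudonymization rather than full anonymization, since
the remaining attributes may still allow re-identification.

```python
import hashlib
import hmac

# Hypothetical field names; real roll extracts will be structured differently.
DIRECT_IDENTIFIERS = ("voter_id", "name", "address")


def pseudonymize(record: dict, key: bytes) -> dict:
    """Return a copy of `record` without direct identifiers.

    The voter ID is replaced by an HMAC-SHA256 token, which stays stable
    for a given key but cannot be recomputed without that key.
    """
    token = hmac.new(
        key, record["voter_id"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    reduced = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    reduced["pseudonym"] = token
    return reduced


if __name__ == "__main__":
    # Made-up example record, not real data.
    sample = {"voter_id": "ABC1234567", "name": "X", "address": "Y", "age": 40}
    print(pseudonymize(sample, key=b"project-secret"))
```

The key would have to be stored separately from the reduced dataset;
anyone holding both can recompute the tokens from the public rolls.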
To me, this once more highlights the nascent stage of ethical standards
around Big Data and e-Governance. On the plus side, I am happy to have
that kind of access to conduct research which will ultimately be
ethically beneficial, leading to better understanding of social issues
and potentially to better policy advice. Also, there is a point to be
made that transparency is an important asset in elections in particular,
not only in terms of individual electoral search functions, but also in
terms of publicly accessible (and cross-checkable, publicly verifiable)
PDF rolls. Finally, a lot of this data had been available in the past as
well, only in distributed and/or commercial form, which means there had
been a hierarchy of access: small-time crooks could not use it, but
big-time crooks were always able to use it. Likewise, scholars at
large (often foreign) universities were able to use it, but not smaller
ones (this is still true for some data, geodata in particular, which I
can only access because of Ivy-League contacts and only process because
of an association with Oxford University).
The ethical challenge as I see it thus comes not from data availability
per se, but from the bulk accessibility and processability of data, as
well as the potential to link otherwise disconnected datasets with each
other (for instance a voter ID from the rolls to the online electoral
search mechanism to that voter's polling booth locality to the ration
card of a person with the same name registered at a ration shop in close
spatial proximity to the amount of rice that person obtained last week,
all coupled - in case of my own research - to that person's religious
identity through a name-matching algorithm). And this IS an ethical
challenge indeed, particularly if one leaves the ivory tower of
academia, where ethical standards for such data are more ingrained, and
more adhered to. One need not go all the way to the various criminal
uses of such data - are we all happy with commercial use, to start with?
I have no easy answers here, because I think the ethical issue is fairly
complex, balancing privacy and personal security against transparency in
the political process and legitimate academic use of data (also because
I think the answer must be found in India through political
deliberation, and not in German academia). Still, in the end, I have to
admit that I often leave my desk in the evening with quite some unease
over the sheer wealth of private data that I work with...
What do others think?
Raphael
On 11.04.2014 06:57, Avinash Celestine wrote:
> Hi Devdatta
>
> Yes, though (and in the current context, I suppose that's a good thing),
> it's not so easy for some other states such as UP, due to certain
> problems with the way the pdfs are encoded. Raphael, who is on this
> group, will testify to that...
>
> I had alluded to this sometime back...
>
>
> https://storify.com/ac_soc/voter-profiling
>
> Avinash
>
>
>
>
> On Fri, Apr 11, 2014 at 9:55 AM, Devdatta Tengshe
> <devd...@tengshe.in> wrote:
>
> Hi,
> I found this interesting article by a guy who downloaded and
> processed the Voter list of Delhi:
> https://medium.com/p/1aff55526881
>
> I found this via a discussion on Reddit:
>
> http://www.reddit.com/r/programming/comments/22pn8u/i_wrote_a_few_simple_python_scripts_to_retrieve/
>
> I'd like to quote his findings here:
>
> 1. It is possible to automate the retrieval of every single PDF
> roll all across India
> 2. These PDFs can then be processed in a matter of minutes to
> produce details like Addresses, names, father’s name, gender,
> age and voters ID number for every single registered voter of India
> 3. Nearly 25% of the Voter IDs assigned within only Delhi fail to
> conform to the government format, and fail the Luhn Checksum
> test used to validate them. It is likely that other states are
> in a similar, if not worse condition
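
For readers unfamiliar with the Luhn test mentioned in point 3, here is a
minimal sketch of the standard decimal Luhn check. How the article applies
it to the voter ID format is not stated in this thread, so the function
below (luhn_valid is a name chosen for illustration) only handles plain
digit strings and should not be read as the article's code.

```python
def luhn_valid(digits: str) -> bool:
    """Standard Luhn check on a string of ASCII decimal digits.

    Starting from the rightmost digit, every second digit is doubled;
    if doubling gives more than 9, 9 is subtracted. The string passes
    when the sum of all resulting digits is divisible by 10.
    """
    if not (digits.isascii() and digits.isdigit()):
        return False
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:  # every second digit, counted from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


if __name__ == "__main__":
    # 79927398713 is a commonly used textbook test number, not a voter ID.
    print(luhn_valid("79927398713"))
    print(luhn_valid("79927398710"))
```

An ID format that mixes letters and digits would need an additional,
format-specific rule for the letters, which is an open assumption here.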
>
>
> Regards,
>
> Devdatta Tengshe
>
>
--
Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
Papers & Blog | http://www.raphael-susewind.de
Please do consider http://www.gnupg.org for encryption (key id A5ED49AE)