Security Issues with the Voter List

Devdatta Tengshe

Apr 11, 2014, 12:25:03 AM
to data...@googlegroups.com
Hi,
I found this interesting article by a guy who downloaded and processed the Voter list of Delhi: https://medium.com/p/1aff55526881

I found this via a discussion on Reddit: http://www.reddit.com/r/programming/comments/22pn8u/i_wrote_a_few_simple_python_scripts_to_retrieve/

I'd like to quote his findings here:

  1. It is possible to automate the retrieval of every single PDF roll all across India
  2. These PDFs can then be processed in a matter of minutes to produce details like Addresses, names, father’s name, gender, age and voters ID number for every single registered voter of India
  3. Nearly 25% of the Voter IDs assigned within only Delhi fail to conform to the government format, and fail the Luhn Checksum test used to validate them. It is likely that other states are in a similar, if not worse condition
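
For readers unfamiliar with the Luhn test mentioned in point 3, here is a minimal sketch of the standard digits-only algorithm. It is not the article author's script, and how the letters of a voter ID are mapped onto digits is not stated in this thread, so the function below (the name `luhn_valid` is my own) only accepts a purely numeric string.

```python
def luhn_valid(number: str) -> bool:
    """Return True if a digits-only string passes the Luhn checksum."""
    # Assumption: input is numeric only; anything else is rejected outright.
    if not number.isdigit():
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

A single mistyped digit changes the sum modulo 10, which is why a list with many failing IDs suggests data-entry or issuance problems rather than random noise.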


Regards,

Devdatta Tengshe


Avinash Celestine

Apr 11, 2014, 12:57:33 AM
to data...@googlegroups.com
Hi Devdatta

Yes, though (and in the current context, I suppose that's a good thing), it's not so easy for some other states such as UP, due to certain problems with the way the PDFs are encoded. Raphael, who is on this group, will testify to that...

I had alluded to this some time back...


Avinash




--
For more details about this list
http://datameet.org/discussions/
---
You received this message because you are subscribed to the Google Groups "datameet" group.
To unsubscribe from this group and stop receiving emails from it, send an email to datameet+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Gautam John

Apr 11, 2014, 1:20:10 AM
to datameet
Not sure this is a flaw. Maybe it's a feature? :D

Raphael Susewind

Apr 11, 2014, 2:14:36 AM
to data...@googlegroups.com
Hi Devdatta and Avinash,

yes, I, too, am frankly surprised at the ease with which one can access
sensitive data in bulk. Not only PDF rolls and voter details, but also
things such as land records, BPL lists, and much more - I think we are
in an exciting as well as dangerous phase of fairly uncontrolled,
nascent e-Governance practices. But I think the ethical issues here are
a little more complex than mere privacy concerns.

Upfront, I must admit that I use all the above sources for academic
research (in UP and across India). What Avinash described in principle
and with the example of Delhi can indeed be done on an all-India scale,
and I am sure there are more people than just me who do it.

But then the social sciences have long dealt with sensitive data and
developed protocols to protect it. Even though the data is publicly
available, I for instance have my own copy on a secure workstation with
full disk encryption and two factor authentication. Whenever possible, I
also work on anonymized subsets of data. Yet there are other potential
uses - some of the more worrisome you pointed out - which are not bound
by such data protection standards.

To me, this once more highlights the nascent stage of ethical standards
around Big Data and eGovernance. On the plus side, I am happy to have
that kind of access to conduct research which will ultimately be
ethically beneficial, leading to better understanding of social issues
and potentially to better policy advice. Also, there is a point to be
made that transparency is an important asset in elections in particular,
not only in terms of individual electoral search functions, but also in
terms of publicly accessible (and cross-checkable, publicly verifiable)
PDF rolls. Finally, a lot of this data had been available in the past as
well, only in distributed and/or commercial form, which means there had
been a hierarchy of access: small-time crooks could not use it, but
large-time crooks were always able to use it. Likewise, scholars at
large (often foreign) universities were able to use it, but not smaller
ones (this is still true for some data, geodata in particular, which I
can only access because of Ivy-League contacts and only process because
of an association with Oxford University).

The ethical challenge as I see it thus comes not from data availability
per se, but from the bulk accessibility and processability of data, as
well as the potential to link otherwise disconnected datasets with each
other (for instance a voter ID from the rolls to the online electoral
search mechanism to that voter's polling booth locality to the ration
card of a person with the same name registered at a ration shop in close
spatial proximity to the amount of rice that person obtained last week,
all coupled - in the case of my own research - to that person's religious
identity through a name-matching algorithm). And this IS an ethical
challenge indeed, particularly if one leaves the ivory tower of
academia, where ethical standards for such data are more ingrained, and
more adhered to. One need not go all the way to the various criminal
uses of such data - are we all happy with commercial use, to start with?

I have no easy answers here, because I think the ethical issue is fairly
complex, balancing privacy and personal security against transparency in
the political process and legitimate academic use of data (also because
I think the answer must be found in India through political
deliberation, and not in German academia). Still, in the end, I have to
admit that I often leave my desk in the evening with quite some unease
over the sheer wealth of private data that I work with...

What do others think?
Raphael

On 11.04.2014 06:57, Avinash Celestine wrote:
> I had alluded to this sometime back...
>
> https://storify.com/ac_soc/voter-profiling

--
Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
Papers & Blog | http://www.raphael-susewind.de

Please do consider http://www.gnupg.org for encryption (key id A5ED49AE)

Gautam John

Apr 11, 2014, 2:27:32 AM
to datameet
Leaving aside my earlier comment as perhaps tongue in cheek, the
electoral rolls are *meant* to be public. The Registration of Electors
Rules, 1960 makes that clear. However, your larger point is well made.
Maybe what needs to be done is to *de-centralise* the storage? That
fulfils the requirements of the Registration of Electors Rules, 1960
while making it harder to do something like this.

It says: "As soon as the roll for a constituency is ready, the
registration officer shall publish it in draft by making a copy
thereof available for inspection and displaying a notice in Form 5--
(a) at his office, if it is within the constituency, and (b) at such
place in the constituency as may be specified by him for the purpose,
if his office is outside the constituency ; [or in the official
website of the Chief Electoral Officer of the concerned State:]
[Provided that where such draft contains names of overseas electors,
the copies of such rolls shall also be published in the Electronic
Gazette [or in the official website of the Chief Electoral Officer
of the concerned State].]"

The Representation of the People Act, 1951 contains this: "The
Government shall, at any election to be held for the purposes of
constituting the House of the People or the Legislative Assembly of a
State, supply, free of cost, to the candidates of recognised political
parties such number of copies of the electoral roll, as finally
published ..."

Worth asking if we want political parties to have free access to it
but not citizens.
People Act, 1950 (43 of 1950)

Chandrashekhar Raman

Apr 11, 2014, 2:49:02 AM
to data...@googlegroups.com
Raphael, you raise very pertinent issues.

We as a community love open data, and in this country there is a lot that can be done to free all kinds of data so that it can be put to good use (election data in aggregated form is one example). But at the same time there are certain kinds of data which are not open (I mean not open in a machine-readable format) for a good reason. I believe voter roll data is one such type. In the past, voter lists have been used to pinpoint members of specific communities, who were then targeted with gruesome effect. I shudder to think what happens if this is automated: a 'riot app'?

As Raphael points out this is not just about privacy, but could be much worse.

This group is a fantastic initiative and as it evolves, it would be great for us to involve more social scientists and policy experts - so as we advocate vociferously to free more data and make it open - we can also bring in the technical expertise here to recommend where data needs to be better protected and how.

cs



Avinash Celestine

Apr 11, 2014, 2:53:48 AM
to data...@googlegroups.com
Hi Gautam 

I don't think the issue is with having the electoral roll available publicly per se. Personally, I think it's better that the rolls are available in the open, as compared with the alternative, where they are confidential and thus open to other types of abuse.

But I do think that certain minimum safeguards should be in place - even something as simple as a CAPTCHA (as mentioned in the link which started off this thread) to deter heavy bulk downloading. It seems to me the bare minimum.

Now, will this stop me from searching the voter list for someone specific that I want to target, given that I have a rough idea of where they live? Certainly not.

Coupled with this is the irony that other datasets, for which there is absolutely no reason for secrecy (at least I can't conceive of a reason for it - maybe it's pure bureaucracy), are extremely difficult to get. A case in point is any official version of the PC and AC shapefiles, which Raphael and others on this group have been trying so hard to create.

Raphael is right - these are complex issues. And we have barely begun to scratch the surface of what should be done. Interestingly, in the Reddit thread linked above, there are references to the fact that New York and Sweden, too, provide vast amounts of personal information for little or no fee...

Avinash 





Raphael Susewind

Apr 11, 2014, 3:07:06 AM
to data...@googlegroups.com
Chandrashekhar,

just on the specific issues of targeting communities, which I have
thought about a great deal (my first book was on post-2002 Gujarat), my
tentative conclusion is this:

The fact that electoral rolls had been used in the past in riots before
they were available online shows that rioters, if they want to, can
access this data already. As Gautam pointed out, it IS public by law.
What changes is merely the scale of data availability. Large-scale data
would only be 'more useful' for large-scale targeting, however
(small-scale targeting is possible already), which I don't see happening
at this time (with the troublesome exception of Gujarat, particularly
troublesome now that Mr Modi runs for PM - but here, too, the targeting
happened in small units on the ground, even though coordination took
place higher up). On the other hand, fine-grained large-scale data is
absolutely necessary to understand a range of issues about (religious,
caste) economic position. So that in this specific case, we have
additional benefits but no additional risk (beyond the worrisome risk
already out there)...

More detailed arguments about this in a forthcoming paper of mine at
http://pub.uni-bielefeld.de/publication/2631138

Best,
Raphael

Chandrashekhar Raman

Apr 11, 2014, 4:56:42 AM
to data...@googlegroups.com
Raphael, to clarify, I am not trying to make a case against the availability of fine-grained data. Far from it: I'm with you on this argument, among others that are made spuriously to restrict access. I might have stretched the point, but then again, killing is just one extreme form of discrimination; there are others that are less visible.

You summed it up very well: it's good to have a healthy caution and unease when dealing with some of this data, and there are probably no simple answers here.

Will read the paper at leisure.

cs.



Raphael Susewind

Apr 14, 2014, 3:15:30 AM
to data...@googlegroups.com
As a follow-up to this discussion:

electoralsearch.in began to implement rate limiting and selective IP
blocking yesterday. Sad as this is for my own research purposes, I
welcome the step from a privacy point of view...
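
How electoralsearch.in actually implements this is not known to anyone on this thread, so the following is only an illustrative sketch of one common technique, a per-client token bucket. All names (`TokenBucket`, `PerClientLimiter`, the `rate` and `capacity` parameters) are hypothetical and not taken from any ECI system.

```python
import time


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._now = now              # injectable clock, handy for testing
        self._tokens = float(capacity)
        self._last = now()

    def allow(self):
        current = self._now()
        elapsed = current - self._last
        self._last = current
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


class PerClientLimiter:
    """Keeps one bucket per client identifier (for example, an IP address)."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self._rate = rate
        self._capacity = capacity
        self._now = now
        self._buckets = {}

    def allow(self, client_id):
        bucket = self._buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._capacity, self._now)
            self._buckets[client_id] = bucket
        return bucket.allow()
```

A scheme like this leaves individual lookups untouched while making a bulk scrape from a single address very slow, which matches the trade-off described above.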

Raphael

Gautam John

May 19, 2014, 1:06:11 AM
to datameet

Snehashish Ghosh

May 19, 2014, 1:28:39 AM
to data...@googlegroups.com
Dear Gautam,

Thank you. This is very interesting. I wrote a piece on this issue right after the failed Google-ECI deal in February: <http://goo.gl/e9Xea0>
The UK approach seems to be a good one. In the UK there are two voter lists: the "full list" and the "edited list". You can choose to be removed from the edited list at the time of registration or at any time thereafter. The edited list is available in the public domain, and the full list is safeguarded by purpose limitation and UK data protection law.

~Snehashish



--
Datameet is a community of Data Science enthusiasts in India. Know more about us by visiting http://datameet.org
---
You received this message because you are subscribed to the Google Groups "datameet" group.
To unsubscribe from this group and stop receiving emails from it, send an email to datameet+u...@googlegroups.com.

Raphael Susewind

May 19, 2014, 1:41:29 AM
to data...@googlegroups.com
Dear Gautam,

thanks for the link - a discussion overdue.

After some discussion a few weeks back on this list, the ECI at least
introduced rate limiting to electoralsearch.in (though probably for QoS
reasons rather than privacy). Chhattisgarh is the only state with a
CAPTCHA to prevent mass downloading, while Uttarakhand does not have the
rolls online at all. Rolls for all other states are freely available,
though there are some technical challenges in terms of extracting data
from corrupted PDFs (but this CAN be done).

While I am happy to be able to use electoral roll data for academic
research, this commodification was exactly what I worried about from the
beginning. Let's hope ECI changes its access policies soon - though
arguably the damage is done, with an "almost population register" online
for long enough for all to scrape and use. But then privacy laws that
prohibit what is technically possible could at least limit damage.

My five cents,
Raphael

--
Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
Web & Twitter | http://www.raphael-susewind.de | @RaphaelSusewind

Dilip Damle

May 19, 2014, 2:39:45 AM
to data...@googlegroups.com
Hi,

Yes,

I think the way the Data access is provided gives transparency but it can be misused.
I had downloaded Goa and Delhi pdfs several years back.

I then explained to someone on a social network how he/she can be tracked and stalked.
PIPL.com can help you get complete name even if your name is hidden on some networks.
MTNL/BSNL Phone directory can get your number
Voter pdfs can give your address

and this can be done on a mass scale.

My opinion is that they should generate the PDFs after rasterising the pages in a kind of odd, jagged font,
so that they are readable by humans but not easily by machines.

Rgds
Dilip Damle

Avinash Celestine

May 21, 2014, 10:59:39 AM
to data...@googlegroups.com
One more data point for our discussion on data privacy in Indian elections, though from a slightly different perspective.

The EC has told the Supreme Court that it is against making polling-station-level voting data public.


Rgds

Avinash

Naraina Damle

Oct 30, 2018, 1:48:11 AM
to datameet
Hi, 

Just updating with a recent find. 



>>>>My opinion is they should make pdfs after Rasterising the pages in a kind of Odd and jagged font
>>>>So that they are readable by Humans but not easily by Machines


Now they have a CAPTCHA before you can download a PDF, and the PDF itself is rasterised.

srinivas kodali

Oct 31, 2018, 12:56:23 AM
to data...@googlegroups.com
The Election Commission gives all these PDFs on CDs to political parties under law. The lack of availability of voter lists makes it hard for people to know if they are on the list or not. That is the main issue for most of them.

Regards,
Srinivas Kodali


Naraina Damle

Oct 31, 2018, 12:43:20 PM
to datameet
True, 

But now I think it is more difficult to convert them back into processable data, IMO.

It is basically for human (visual) consumption. 

That is good 