Any privacy issue in publishing names of voters?

Anand Chitipothu

Mar 1, 2015, 11:05:10 PM
to data...@googlegroups.com
Hi,

I have voter data for a couple of states. I'm thinking of publishing the gender, age and name of every voter in them. Do you see any privacy issue in this? Is there any other issue that I should be careful about?

I'm planning to sort the names before publishing so that the original order is lost.

I think it'll be very interesting to study the patterns of how names are changing over time.

Anand

Venkata Pingali

Mar 1, 2015, 11:24:46 PM
to data...@googlegroups.com
A couple of thoughts based on my experience with voter lists:

1. The value of the names (for profiling population clusters) increases
with granularity. I would recommend sharing at the state level. You could
potentially annotate with variables (e.g., abstract region - north/south) to
make it a little more interesting.

2. There are cases involving people getting married, and names
not being entered properly. You can use the relationship information
to clean the data a bit before you generate the summaries to increase
accuracy. 

-Venkata 

--
Datameet is a community of Data Science enthusiasts in India. Know more about us by visiting http://datameet.org
---
You received this message because you are subscribed to the Google Groups "datameet" group.
To unsubscribe from this group and stop receiving emails from it, send an email to datameet+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Raphael Susewind

Mar 2, 2015, 2:05:07 AM
to data...@googlegroups.com
Hi Anand,

As someone who has worked with the voter lists, including an analysis of
trends (http://www.raphael-susewind.de/blog/2012/noor-mohd-ali), I would
personally NOT put them online in disaggregated form. I would only share
aggregate data (e.g. the 50 most frequent names in state X and their
prominence over time, or some such). If you do put them online, I would
do so at state level only, not further disaggregated. But I DO think
there are big privacy issues here. There was a discussion on this on the
list a few months back as well - spurred by this post by Snehashish
Ghosh:
http://cis-india.org/internet-governance/blog/electoral-databases-2013-privacy-and-security-concerns

My 5 cents,
Raphael

--
Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
Web & Twitter | http://www.raphael-susewind.de | @RaphaelSusewind

Please do consider http://www.gnupg.org for encryption (key id 10AEE42F)

srinivas kodali

Mar 2, 2015, 8:28:05 AM
to data...@googlegroups.com

I don't think it's ideal to publish raw data. But we could raise wider awareness of the issue and work with NIC and the Election Commission.

Regards,
Srinivas Kodali

Johnson Chetty

Mar 2, 2015, 9:56:04 PM
to data...@googlegroups.com
Hello,

Just my thoughts:
Publishing name, age and gender together is definitely not a good idea. Apart from name analysis, it does not convey anything of particular demographic importance.

And you did not mention whether you would publish only first names or full names. Publishing full names means all the women will have their age disclosed! And for those with uncommon names it would definitely be an issue.

You can either
a) Remove the names (or randomize them) and keep only gender and age,
  if you want to convey the gender and age distribution of the population.
 
b) If you are doing name analysis, some options:
    - If you want to convey information like the age-wise distribution of a name (a little esoteric and still not as demographically relevant, but this dataset allows for it), then aggregation would work.
    - Use first names only.
    - Add only the surname initials.
    - Generate unique hashes for surnames so that name analysis by surname is still possible but identities are not revealed. Alternatively, you can have two datasets, one with unique hashes for the name and the other with the surname. This way, community-name-based analysis is also feasible.
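
The surname-hashing option could look roughly like the sketch below. It is only an illustration, not a vetted anonymization scheme. One assumption goes beyond the suggestion above: a plain hash of a surname can be reversed by hashing a list of common surnames, so the sketch uses a keyed hash (HMAC-SHA256) with a secret that is never published. The key, the `pseudonymize` helper and the sample records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret: generate it randomly and never publish it.
SECRET_KEY = b"replace-with-a-random-secret"


def pseudonymize(surname, key=SECRET_KEY):
    """Return a stable keyed token for a surname (same surname, same token)."""
    normalized = surname.strip().lower()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability


# Made-up sample records: (first name, surname, birth year).
records = [
    ("Lakshmi", "Reddy", 1978),
    ("Ravi", "Reddy", 1985),
    ("Sita", "Rao", 1990),
]

# Published rows carry the token in place of the surname.
published = [(first, pseudonymize(last), year) for first, last, year in records]

for row in published:
    print(row)
```

Rows that share a surname share a token, so grouping by surname still works. Very rare surnames may remain identifiable from their frequency alone, so aggregation may still be needed on top of this.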

--
Regards,
Johnson Chetty

sandeep

Mar 2, 2015, 11:08:59 PM
to data...@googlegroups.com
Voter name, age and gender data at booth level is already available on the respective official EC sites. I do not see any additional risk in aggregating the data from booth to constituency/district to state level. It will help the data community draw meaningful inferences from aggregated, cleaned data.

Anand Chitipothu

Mar 2, 2015, 11:19:07 PM
to data...@googlegroups.com
Thanks for the suggestions everyone.

I think the best way forward is to publish the frequency of individual words in the names, per birth year.

For example, raw data like:

1978 Vamsi Krishna 
1978 Gopala Krishna

would become:

1978 Vamsi 1
1978 Gopala 1
1978 Krishna 2

I think that'll make sure individual names are not retained in the published dataset. Let me know if you still see any privacy issues with this approach.
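
The transformation described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the raw data is available as (birth year, full name) pairs and that names split cleanly on whitespace; the `word_frequencies` helper is a hypothetical name.

```python
from collections import Counter


def word_frequencies(records):
    """Count how often each word of a name occurs, per birth year."""
    counts = Counter()
    for birth_year, name in records:
        for word in name.split():
            counts[(birth_year, word)] += 1
    return counts


# The two raw records from the example above.
raw = [
    (1978, "Vamsi Krishna"),
    (1978, "Gopala Krishna"),
]

for (year, word), n in sorted(word_frequencies(raw).items()):
    print(year, word, n)
```

Only the (year, word, count) triples are kept, so the pairing of words within a full name is not part of the output.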

Anand


Anand Chitipothu

Mar 3, 2015, 3:23:40 PM
to data...@googlegroups.com
I've published the names of people in Andhra Pradesh (including Telangana as the data was collected before the split).


I plan to add more states soon.

I found some very interesting facts by looking at the data.

The age of the oldest person is 891 years, and there are 2000+ people who are more than 150 years old.
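
Entries like these can be separated out with a simple range check before any analysis. The sketch below is only an illustration: the cutoff of 120 years and the sample ages (apart from 891) are assumptions, and `split_by_plausibility` is a hypothetical helper.

```python
MIN_VOTING_AGE = 18      # voting age in India
MAX_PLAUSIBLE_AGE = 120  # assumed cutoff; adjust as needed


def split_by_plausibility(ages):
    """Split ages into (plausible, suspect) lists using a simple range check."""
    plausible, suspect = [], []
    for age in ages:
        if MIN_VOTING_AGE <= age <= MAX_PLAUSIBLE_AGE:
            plausible.append(age)
        else:
            suspect.append(age)
    return plausible, suspect


# Made-up sample, including the 891-year-old entry mentioned above.
sample_ages = [23, 45, 891, 67, 152, 19]
plausible, suspect = split_by_plausibility(sample_ages)
print("plausible:", plausible)
print("suspect:", suspect)
```

Keeping the suspect records in a separate list, rather than dropping them, makes it possible to report how many entries look like data-entry errors.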

Anand

Venkata Pingali

Mar 3, 2015, 11:08:55 PM
to data...@googlegroups.com
Just one thought: 

My election work gave me an appreciation for the immense work done
by the EC. They have very limited resources, are always under time
and political pressure, and have almost no tech talent. Most of the
execution is beyond their control.

I have quantitatively measured the change in voter list quality
over time (from 2009 onwards, for Bangalore South). It has
significantly improved.

I think the data community should see beyond corner cases, and
creatively use the resource to understand/address problems. The voter list
is the only public dataset, even with all its limitations, that has serious
scale in terms of demographics.

Snehashish Ghosh

Mar 4, 2015, 12:09:16 AM
to data...@googlegroups.com
Replug:  My take on privacy related issues with respect to publication of electoral databases.  http://cis-india.org/internet-governance/blog/electoral-databases-2013-privacy-and-security-concerns

I would also like to highlight that the main issue which needs to be discussed is the requirement of "consent" for the publication of voters' details. This is currently absent in India. In the UK, there is an opt-out provision which allows voters to remove their personal information from the publicly available electoral database.

~Snehashish

Sachin Kalsi

Oct 31, 2018, 1:42:17 PM
to datameet
Where can I get voter list details (name, ID, etc.) in English?

Karnataka's voter list is in Kannada, Bihar's is in Hindi, and so on.

Is there any way to get the voter list in English?