Brazil Elections Data

Julio Trecenti

Oct 30, 2014, 7:53:49 AM
to ropensci...@googlegroups.com
Hi!

I'm a statistician from Brazil. rOpenSci is really a great initiative!

Here in Brazil, elections are conducted electronically. That means the votes are counted very efficiently (the final results come in less than three hours after the polls close). It also means that we have a great repository of election data.

The Superior Electoral Court (TSE) makes this data publicly available here: http://www.tse.jus.br/hotSites/pesquisas-eleitorais/resultados.html. The data is available by state (we have 27), in CSV format. I would like to build an R interface to query this data so that it can be easily analyzed and visualized. The two steps I can imagine to accomplish this task are:

1) Transfer the data to a public, "queryable" repository (by queryable I mean that one could download either all of the data or just parts of it. A kind of API?).
2) Create an R package with some functions to gather data and make it tidy.
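
The two steps above can be sketched roughly as follows. This is only a minimal illustration, written in Python for brevity (an R package would follow the same shape). The base URL, the one-file-per-state layout, the comma delimiter, and the column names are all assumptions made for the example, not anything TSE actually provides.

```python
import csv
import io

# Hypothetical base URL: where the per-state files would be hosted is not
# decided anywhere in this thread.
BASE_URL = "https://example.org/tse-elections"


def state_file_urls(states, year):
    """Map each requested state code to the URL of its CSV file.

    Assumes one CSV per state per year, so "part of the data" simply means
    "a subset of the state files".
    """
    return {uf.upper(): f"{BASE_URL}/{year}/{uf.upper()}.csv" for uf in states}


def tidy_rows(csv_text, state):
    """Parse one state's CSV text into a list of dicts, one per row.

    Column names are lower-cased, values are stripped of whitespace, and
    each row is tagged with the state it came from. Assumes a well-formed,
    comma-delimited file with a header line.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        cleaned = {key.strip().lower(): (value or "").strip()
                   for key, value in row.items()}
        cleaned["state"] = state
        rows.append(cleaned)
    return rows
```

A caller would pick the states of interest with `state_file_urls(["sp", "rj"], 2014)`, download only those files, and pass each file's text to `tidy_rows`.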

I think the biggest problem is the data size. At about 10GB it is not huge, but I think it is far too large to simply bundle into an R package.

My questions are 
a) Am I posting these questions in the right place? 
b) Is there any free solution to step 1? I considered Google BigQuery, to use with Hadley's bigrquery package, but I can't create a public and free data repo there...
c) Is rOpenSci a good place to share my package?

Thanks in advance,

Julio Trecenti

Note: Sorry for my bad English ;)

Matt Jones

Oct 30, 2014, 2:34:28 PM
to ropensci...@googlegroups.com
Sounds like a great initiative.

Regarding (c), have you also seen rOpenGov (http://ropengov.github.io/), which is like rOpenSci but focused on government and civics rather than science? It seems they don't have the same level of participation yet, and there's some overlap with rOpenSci (see the 'fmi' package, which strikes me as appropriate for rOpenSci). But maybe it's of interest to you.

Matt

--
You received this message because you are subscribed to the Google Groups "ropensci-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ropensci-discu...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Scott Chamberlain

Oct 30, 2014, 3:28:31 PM
to ropensci...@googlegroups.com
Regarding c) -> Indeed, rOpenGov sounds like a good place to me for data related to government operations (e.g. elections, lobbying operations, politicians' activities, etc.). That is, in terms of an R package, you should ask those folks if they'd be interested in a package in their ecosystem: https://groups.google.com/forum/#!forum/ropengov-forum

a) Yes, this is a great place for this question
b) I would make sure to consider the license for the data. Does the license allow you to repost the data elsewhere and provide it to others? After that, yes, 10GB is too large for an R package. You could do it, but installation would take a very long time, and CRAN only allows up to 5 MB of data in a package. In an ideal world there would be a REST-based API for the data, but that can take a lot of work, so a simple FTP server for the CSV files could work, with an R package acting as an interface to those files. The ots package (https://github.com/ropensci/ots) I've been working on recently is like this, in that it fetches flat files and presents them as data.frames with metadata.
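
To make the flat-file idea in (b) concrete, here is a minimal sketch of such an interface, written in Python for brevity. It is not how the ots package works internally. The function name `fetch_csv`, the `opener` parameter, the returned metadata fields, and the UTF-8 encoding are all assumptions chosen for the example.

```python
import csv
import os
import urllib.request


def fetch_csv(url, cache_dir, opener=None):
    """Download a CSV file once, cache it on disk, and return rows plus metadata.

    `opener` is a hypothetical hook: a callable taking a URL and returning
    bytes. It defaults to urllib, and lets callers or tests swap in their
    own transport.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, os.path.basename(url))
    from_cache = os.path.exists(path)

    if not from_cache:
        if opener is None:
            # urllib.request.urlopen handles http(s):// and ftp:// URLs.
            with urllib.request.urlopen(url) as response:
                data = response.read()
        else:
            data = opener(url)
        with open(path, "wb") as handle:
            handle.write(data)

    # Assumes UTF-8 text with a header line; real files may need another encoding.
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))

    return {"rows": rows, "source": url, "path": path, "from_cache": from_cache}
```

Because the file is cached under its own name, a second call with the same URL reads from disk instead of downloading again, which matters when the full collection is around 10GB.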

Hope that helps. 

Cheers, Scott

Leo Lahti

Oct 31, 2014, 7:17:33 AM
to ropensci...@googlegroups.com
Excellent.

You are very welcome to host the package at rOpenGov if you like. We are already working on election data in Finland and Sweden, and there are plans for other countries. The packages are in development, but in the longer term we could ideally try to harmonize the interfaces as much as possible, so that exactly the same analytical tools can be applied to election data sets from different countries. Linking with our community would be a step in this direction.

I agree with Scott that it would be important to check the data license first to see if it allows modification and redistribution.

In Finland we are building a public API for election and other parliamentary data (http://www.datavaalit.fi/resources/api/) and an associated R package. All code is open source and intended for wider distribution, so in principle you could take advantage of the API tools that already exist for such data. We can discuss this in more detail at https://groups.google.com/forum/#!forum/ropengov-forum if you are interested, but building an API is certainly more work than polishing and distributing the CSVs. Hence a collection of tidy CSV files might be a faster way to at least get started and get the R package running.

I think we can provide you with free server space with good bandwidth for this purpose if the data license allows redistribution. We can also provide support while you are preparing the R package for the election data queries. We could also collaborate on R packages for election data analytics that work with election data from different countries.

P.S. Regarding overlap between rOpenSci & rOpenGov: these projects are complementary, and we may revise the package collections in the future to reduce the overlap. Our specific focus and community is on open government data analytics; the community is smaller but active in this area, and the infrastructure is building up steadily.