from import to analysis

Aaron Swartz

Sep 5, 2008, 9:16:35 PM

to watchdog-...@googlegroups.com

When we spoke with our wise and benevolent funders over at the Sunlight
Network the other day, they made a good point: we've spent a lot of
time and effort acquiring data, but we haven't paused to do much
analysis. We came up with a couple of ideas (and we should all try to
come up with more), and I've added a few to the volunteer page. The
usual rules apply, except that if you want to seriously work on
something, let me know and you'll get a bug tracker account so you can
keep us all up to date on your progress:

http://watchdog.jottit.com/volunteer

## why'd they vote that way?

[Ebonya Washington found][Washington 2007] [PDF] that politicians with
daughters tend to be better on women's issues. [David Wheeler
argues][Wheeler 2008] that state income, state dependence on fossil
fuels, and political ideology explained the voting on the
Warner-Lieberman global warming bill.

[Washington 2007]: http://www.econ.yale.edu/faculty1/washington/genderpap10.pdf
[Wheeler 2008]: http://www.cgdev.org/content/publications/detail/16387

Your task is to write some automated code, ideally using the amazing
[TETRAD][], that analyzes votes on bills alongside the other data in
watchdog and tries to explain why politicians voted the way they did.

[TETRAD]: http://www.phil.cmu.edu/projects/tetrad
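TETRAD searches for causal structure, and its interface isn't sketched here. As a much simpler stand-in, here is a minimal Python sketch of one place to start: a logistic regression that estimates how much each factor shifts the odds of a yes vote. The two features (whether the member has a daughter, and a state fossil-fuel share) echo the studies above, but the rows are made up for illustration, and the function and field names are hypothetical, not part of watchdog.

```python
import math

# Made-up rows for illustration: (has_daughter, fossil_fuel_share, voted_yes).
# Real features would come from watchdog's tables; these names are hypothetical.
ROWS = [
    (1, 0.1, 1), (1, 0.3, 1), (1, 0.2, 1), (1, 0.7, 1), (1, 0.7, 0),
    (0, 0.2, 1), (0, 0.2, 0), (0, 0.6, 0), (0, 0.8, 0), (0, 0.9, 0),
]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(rows, lr=0.5, epochs=5000):
    """Fit P(yes) = sigmoid(b0 + b1*daughter + b2*fossil) by batch gradient descent."""
    w = [0.0, 0.0, 0.0]  # intercept, has_daughter, fossil_fuel_share
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for daughter, fossil, voted_yes in rows:
            # Prediction error for this member's vote.
            err = sigmoid(w[0] + w[1] * daughter + w[2] * fossil) - voted_yes
            grad[0] += err
            grad[1] += err * daughter
            grad[2] += err * fossil
        # Step each weight against the average gradient.
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


weights = fit_logistic(ROWS)
print("intercept, daughter, fossil:", [round(v, 2) for v in weights])
```

A positive coefficient means the factor goes along with yes votes; on these toy rows the daughter coefficient comes out positive and the fossil-fuel one negative. A regression like this only measures association, which is exactly why a causal-search tool like TETRAD is the more interesting target.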

## contribution clustering

Most campaign contribution data, like that on Open Secrets, is
[grouped into a bunch of basic industry categories][os]: Investment,
Real Estate, Entertainment, Lobbyists. These categories are
preselected, and donations are placed into them by researching each
donor's employer, a time-consuming manual process.

We could perhaps automate a lot of it, but what if we tried something
different? What if we ran [a clustering algorithm][wk] (see [this
poorly-named book][pci] for an explanation and Python examples) on the
data and let the data determine which clusters are most relevant?
Obviously, we'll still need humans to interpret the clusters at the
end, but that's a lot less work and a lot more interesting.

[os]: http://www.opensecrets.org/politicians/industries.php?cycle=2008&cid=N00001821
[wk]: http://en.wikipedia.org/wiki/Data_clustering
[pci]: http://books.theinfo.org/go/0596529325
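A minimal sketch of the idea, assuming each employer is represented by how much its donors gave to each of a handful of recipients (the employer names, recipients, and dollar figures below are invented). It runs plain k-means, one of the algorithms that book covers, on unit-length vectors so that employers with similar giving *patterns* land together regardless of size. Nobody tells it which industries exist.

```python
import math

# Hypothetical data: dollars each employer's donors gave to four recipients.
# The names and numbers are invented for illustration only.
DONORS = {
    "Acme Oil":       [9000, 500, 0, 200],
    "Big Studio":     [0, 7000, 6000, 100],
    "Gulf Drilling":  [8000, 0, 300, 0],
    "Indie Films":    [200, 5000, 5500, 0],
    "Shale Partners": [7500, 300, 0, 400],
    "Cable Network":  [100, 6500, 7000, 300],
}


def normalize(v):
    """Scale to unit length so clusters reflect who gets money, not how much."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from every seed chosen so far.
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignment = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        # Move each centroid to the mean of its members (keep it if empty).
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


names = list(DONORS)
labels = kmeans([normalize(DONORS[n]) for n in names], k=2)
clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(label, []).append(name)
for members in clusters.values():
    print(sorted(members))
```

On this toy data the two clusters come out as the three oil companies and the three studios; the human step is looking at each group and deciding what to call it. Farthest-first seeding is used instead of random seeding only to keep the sketch deterministic.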

## voter/contributor heatmaps

Soon we'll have [[vrdb|voter registration data]] and individual
contribution data for much of the country. But this is a lot of data
-- we'll want nicer ways of visualizing it.

One obvious way is through maps: show where the most registered voters
are clustered; show where political contributions come from. Fundrace
does [a version of this][fm] that I find faintly hideous. _The New
York Times_, as you might imagine, [was a bit more tasteful][nyt].
(More from those guys: [1][], [2][].) Surely we can do better. Or at
least something.

[fm]: http://fundrace.huffingtonpost.com/neighbors.php?type=city&city=manhattan
[nyt]: http://www.nytimes.com/imagepages/2004/06/23/politics/20040623_BLOCK_GRAPHIC.html
[1]: http://www.nytimes.com/interactive/2007/10/23/business/20071104_MEGACHURCH_GRAPHIC.html
[2]: http://www.nytimes.com/interactive/2007/10/23/business/20071104_MEGACHURCH_GRAPHIC.html
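Under the hood, a heatmap like these mostly comes down to binning: snap each record to a grid cell, sum per cell, and scale the sums onto a color ramp. A minimal sketch, assuming each contribution already has a latitude and longitude (geocoding addresses is its own project); the records and the one-degree cell size are made up. For voter registration, the same code works with an amount of 1 per voter.

```python
import math
from collections import defaultdict

# Made-up contribution records: (latitude, longitude, amount in dollars).
CONTRIBUTIONS = [
    (40.71, -73.90, 250), (40.72, -73.95, 1000), (40.75, -73.80, 500),
    (41.88, -87.63, 300), (41.89, -87.62, 200),
    (34.05, -118.24, 2300),
]


def bin_contributions(records, cell_deg=1.0):
    """Sum amounts into a square lat/lon grid keyed by (row, col) cell index."""
    totals = defaultdict(float)
    for lat, lon, amount in records:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        totals[cell] += amount
    return dict(totals)


def intensities(totals):
    """Scale each cell's total to 0..1 so it can be mapped onto a color ramp."""
    peak = max(totals.values())
    return {cell: total / peak for cell, total in totals.items()}


grid = bin_contributions(CONTRIBUTIONS)
for cell, shade in sorted(intensities(grid).items()):
    print(cell, round(grid[cell]), round(shade, 2))
```

Degree cells aren't equal-area, so a real map would want a projection-aware grid, but the aggregation step is the same.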

Alex Gourley

Sep 6, 2008, 4:06:35 PM

to watchdog-...@googlegroups.com

I wanted to point out this article:
http://www.americanprogressaction.org/issues/2008/palin_earmarks.html
which is currently gaining traction (on reddit, even :)

This graph should be easily created from the data we have, and a small
article using the data would be a good way to spread awareness of the
project.

It would be even cooler if we could provide tools that let users play
with these numbers in real time. Then someone could share a deep link
to the arranged graph and let people explore out from there. I know
there are tools and sites that already do something like this for
various data sets, but I can't remember enough about them to even
google them.
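One simple way such a deep link could work: store the graph's view settings in the URL's query string, so the link itself recreates the arranged view. A minimal sketch using Python's standard urllib.parse; the base URL and the parameter names (dataset, group_by, highlight) are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def make_deep_link(base_url, view):
    """Encode a graph's view settings into a shareable URL."""
    # Sorting keeps the same view from producing differently ordered links.
    return base_url + "?" + urlencode(sorted(view.items()))


def read_deep_link(url):
    """Recover the view settings from a deep link (values come back as strings)."""
    query = parse_qs(urlparse(url).query)
    return {key: values[0] for key, values in query.items()}


# Hypothetical view settings for an arranged graph.
view = {"dataset": "earmarks", "group_by": "state", "highlight": "AK"}
link = make_deep_link("http://example.org/graph", view)
print(link)
print(read_deep_link(link) == view)
```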

-Alex

Aaron Swartz

Sep 15, 2008, 3:51:27 PM

to watchdog-...@googlegroups.com

> I wanted to point out this article:
> http://www.americanprogressaction.org/issues/2008/palin_earmarks.html
> which is currently gaining traction (on reddit, even :)
>
> This graph should be easily created from the data we have, and a small
> article using the data would be a good way to spread awareness of the
> project.

This would be a fun project for a volunteer.

> It would be even cooler if we could provide tools which let users play
> with these numbers in real time, then someone could spread a deep link
> into the arranged graph, and let people explore out from there. I know
> there are tools and sites which currently do something like this for
> various data sets, but I can't remember enough about them to even
> google them.

You're probably thinking of swivel.com and many-eyes.com.
