ECHELON: Wrangling messy political data


Zack Maril

Sep 16, 2014, 1:48:46 PM
to clo...@googlegroups.com
This might be of interest to the Clojure/Datomic community: 

http://sunlightfoundation.com/blog/2014/09/16/wrangling-messy-political-data-into-usable-information/

I'm part of the Influence Explorer team at the Sunlight Foundation. We're building a system with Datomic and Instaparse to disentangle and analyze the web of relationships between lobbyists, special interest groups, and legislators. It's been surprisingly successful so far, in that it works as well as we hoped it would when we first started building it. Datomic has been a pleasure to use and Instaparse has been excellent. The above link is a results-oriented blog post about what we did and why we did it. We haven't written up the technical details yet, as they are still evolving (we're attempting to move from the current static batch processing system to, hopefully, a streaming approach in the near future). But if you have any questions about the project, I'd be happy to answer them.
-Zack

kovas boguta

Sep 16, 2014, 2:08:18 PM
to clo...@googlegroups.com
That's very cool!!
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your
> first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+u...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Ashton Kemerling

Sep 16, 2014, 2:28:10 PM
to clo...@googlegroups.com
That's really neat. Planning on giving a talk?

Zack Maril

Sep 16, 2014, 3:15:19 PM
to clo...@googlegroups.com
I submitted a talk to Clojure/conj but it wasn't accepted. Stiff competition this year. This project is ongoing, though, so I imagine that sometime in the next year I'll try again and give a talk at a conference about all the weird stuff ECHELON can do. For example, it only took the better part of an afternoon to make a prototype for an intelligent search engine for corporations within the database. We just turned the output of the parser into an index on corporations and firms. So a search for "Coca Cola" should turn up records like "Coca Cola Inc. formerly known as COCA-COLA COMPANY", "The Washington Group on behalf of Coca Cola Inc.", and "COCA COLA INCORPORATED". We're still exploring what's possible and what would actually be useful for folks.
-Zack
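
A rough sketch of the kind of name index described above, for readers who want something concrete. This is not ECHELON's code: ECHELON builds its index from Instaparse parser output, whereas this sketch swaps in plain tokenization (lowercase, split on punctuation). The function names and the extra "Pepsi-Cola Company" record are made up for illustration.

```python
import re
from collections import defaultdict

# Sample records: the first three come from the post above; the last is a
# hypothetical non-matching record added for contrast.
RECORDS = [
    "Coca Cola Inc. formerly known as COCA-COLA COMPANY",
    "The Washington Group on behalf of Coca Cola Inc.",
    "COCA COLA INCORPORATED",
    "Pepsi-Cola Company",
]


def tokenize(text):
    """Lowercase the text and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map each token to the set of record positions that contain it."""
    index = defaultdict(set)
    for position, record in enumerate(records):
        for token in tokenize(record):
            index[token].add(position)
    return index


def search(index, records, query):
    """Return the records that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # Intersect the posting sets so that all query tokens must be present.
    matches = set.intersection(*(index.get(token, set()) for token in tokens))
    return [records[i] for i in sorted(matches)]


INDEX = build_index(RECORDS)
for hit in search(INDEX, RECORDS, "Coca Cola"):
    print(hit)
```

Because "COCA-COLA" and "Coca Cola" tokenize to the same two tokens, the query matches all three Coca Cola records regardless of case or hyphenation, and skips the Pepsi one. A parser-driven index could go further, for instance by separating a lobbying firm from the client it acts on behalf of, which simple tokenization cannot do.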


Colin Fleming

Sep 16, 2014, 4:52:19 PM
to clo...@googlegroups.com
It's a shame your talk wasn't accepted - that looks fascinating and I would have loved to see it. Please let us know when you do your technical write-up!

Joshua Ballanco

Sep 17, 2014, 3:47:10 AM
to clo...@googlegroups.com
Very interesting project! (and congrats on joining SunlightLabs)

As someone who has, on occasion, contributed to various SunlightLabs foundation projects, my only question would be: what do you need help with?

I understand completely if this project is not at a point where you’re looking for outside contribution yet, but if there’s any area you’d like help with, I’d be interested to throw at least a few weekend hours in your general direction.

Cheers,

Josh

Zack Maril

Sep 17, 2014, 10:47:11 AM
to clo...@googlegroups.com
One issue we've been having is that we are using Datomic in a nonstandard way (I think). We're touching all the data at once for some resolution steps, and Datomic wasn't necessarily made for that (at least that has been my impression). I've been trying to understand the Datomic query system better so I know the limitations of the system and how to work with them. ECHELON takes a few hours to run right now, and we're looking to increase the amount of data we collect and link by an order of magnitude or two in the next year. It's fine to let ECHELON run for a few days (the batch resolution step only needs to be run once; the streaming approach should always be fast), but it would be nice to know for sure that we had found the upper limit for how fast ECHELON can resolve things. If anybody has resources about how to abuse Datomic in this manner correctly, I'd be really happy if you sent them my way. I'll see if I can push for a technical write-up soon so I can provide more information about how ECHELON actually works.
-Zack

xavriley

Oct 3, 2014, 1:31:37 PM
to clo...@googlegroups.com
Hi Zack,

First off, really great work on ECHELON! I work on similar data problems at opencorporates.com (who are partly funded by Sunlight) and it's great to see Instaparse and Clojure being used in this way. I'm looking forward to trying it out myself.

Although you didn't make it onto the Conj this year, have you considered the CFP for Clojure eXchange in London? http://blog.skillsmatter.com/2014/08/29/clojure-exchange-call-for-papers/

I've attended the previous two years and it's a great event. It's also run by Bruce Durling of Mastodon C who are a "big data" startup in London that specialise in clojure - I'm sure they'd have some useful advice to offer.

Xavier
