Scraping the global civic tech community on GitHub

Stefan Baack

Nov 19, 2015, 3:19:11 PM
to pop...@googlegroups.com

Hi everyone,

I’m a PhD student from the Netherlands and I’m currently working on a research project about civic hacking at mySociety. In connection with this research, I started a little experiment: scraping information about civic tech organizations on GitHub to get a grasp of the global community. After I posted a first draft on mySociety's community mailing list I got a lot of feedback and help to compile a bigger, more complete dataset. As a result, here is an updated version of the article:

http://sbaack.com/2015/11/19/scraping-the-global-civic-tech-community-on-github-part-2.html

I hope it's interesting for you and I would love to hear your thoughts about it :-)

Best,
Stefan

James McKinney

Nov 19, 2015, 5:18:13 PM
to Stefan Baack, Poplus - Collaborative Civic Coding
Nice! I have been curious to know who contributes the most *outside* their home organizations. rgrp is indeed prolific, but 115 of his 133 repositories are within the okfn organization. If we remove people’s home organizations, we get this list:

47 jpmckinney
22 dracos
21 mhl
19 zarino
18 konklone
17 duncanparkes
16 invalid-email-address
16 rgrp
16 andylolz
16 evdb

I see I’m in good company! :)

The code for this is at https://gist.github.com/jpmckinney/56632f96808ef1ee326d
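A minimal sketch of this kind of count, not the gist's actual code: it assumes the scraped data is available as (user, organization, repository) records and that each user's home organizations are already known. The names `contributions`, `home_orgs` and `outside_repo_counts`, and all sample values, are hypothetical placeholders.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: (user, organization, repository) records.
contributions = [
    ("alice", "org-a", "repo-1"),
    ("alice", "org-b", "repo-2"),
    ("alice", "org-b", "repo-3"),
    ("alice", "org-c", "repo-4"),
    ("bob", "org-b", "repo-2"),
    ("bob", "org-a", "repo-1"),
]
# Assumption: the home organization(s) of each user are given up front.
home_orgs = {"alice": {"org-a"}, "bob": {"org-b"}}


def outside_repo_counts(contributions, home_orgs):
    """Count distinct repositories per user, skipping home organizations."""
    seen = set()
    totals = Counter()             # user -> repositories outside home orgs
    per_org = defaultdict(Counter)  # user -> organization -> repositories
    for user, org, repo in contributions:
        if org in home_orgs.get(user, set()):
            continue  # contribution inside a home organization
        if (user, org, repo) in seen:
            continue  # do not count the same repository twice
        seen.add((user, org, repo))
        totals[user] += 1
        per_org[user][org] += 1
    return totals, per_org


totals, per_org = outside_repo_counts(contributions, home_orgs)
for user, count in totals.most_common(10):
    print(count, user)
    # Top three outside organizations for this user.
    for org, n in per_org[user].most_common(3):
        print("   ", n, org)
```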

The script can also print out the top organizations to which those users contributed, as follows:

47 jpmckinney
    10 mysociety
    8 datamade
    6 sunlightlabs
22 dracos
    3 datauy
    3 ciudadanointeligente
    3 openpolis
21 mhl
    3 ciudadanointeligente
    3 datauy
    3 Sobanukirwa
19 zarino
    2 openpolis
    2 openaustralia
    2 openstate
18 konklone
    4 codeforamerica
    3 datamade
    2 civio
17 duncanparkes
    2 Code4SA
    2 TEDICpy
    2 Sobanukirwa
16 invalid-email-address
    3 ushahidi
    3 appsembler
    3 hasadna
16 rgrp
    2 codeforamerica
    2 opengovfoundation
    2 g0v
16 andylolz
    3 everypolitician
    2 g0v
    2 datauy
16 evdb
    3 ciudadanointeligente
    2 Sobanukirwa
    2 openstate

If you instead look for the number of organizations to which users contributed outside their home organization(s) (regardless of the number of repositories contributed to within those organizations), the list is a little different:

17 jpmckinney
13 rgrp
13 pudo
13 zarino
12 duncanparkes
12 nickstenning
12 dracos
12 mhl
11 andylolz
10 henare
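
The organization-level count can be sketched the same way, by collecting a set of organizations per user instead of counting repositories. As before, this is an illustration under assumptions rather than the gist's code: the record layout, the `home_orgs` mapping and the function name `outside_org_counts` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: (user, organization, repository) records.
contributions = [
    ("alice", "org-a", "repo-1"),
    ("alice", "org-b", "repo-2"),
    ("alice", "org-b", "repo-3"),
    ("alice", "org-c", "repo-4"),
    ("bob", "org-a", "repo-1"),
]
# Assumption: the home organization(s) of each user are given up front.
home_orgs = {"alice": {"org-a"}, "bob": {"org-b"}}


def outside_org_counts(contributions, home_orgs):
    """Count distinct non-home organizations each user contributed to."""
    orgs = defaultdict(set)
    for user, org, _repo in contributions:
        if org not in home_orgs.get(user, set()):
            # A set ignores how many repositories were touched per org.
            orgs[user].add(org)
    return {user: len(found) for user, found in orgs.items()}


org_counts = outside_org_counts(contributions, home_orgs)
for user, count in sorted(org_counts.items(), key=lambda item: -item[1]):
    print(count, user)
```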

Thanks for getting this data together!

James

James McKinney

Nov 19, 2015, 5:26:51 PM
to Poplus - Collaborative Civic Coding
Note that there is some overcounting when counting the distinct organizations to which a user contributed, because the organization of some repositories is incorrect (e.g. alaveteli is mysociety, not Sobanukirwa). However, the graph has only one alaveteli, so there is no overcounting of repositories - as far as I can tell.

Stefan Baack

Nov 19, 2015, 5:58:16 PM
to Poplus - Collaborative Civic Coding
Thank you James, this is great! Never thought about filtering out home organizations :-)

You're right, the graph is a bit off. I think the problem is that my scraper does not check whether a repo is forked. I had a similar problem with users, because users can of course be part of more than one organization, but I wasn't sure how to reflect this in the graph. I decided that this is tolerable because the colors should only give a rough sense of where an organization is located in the graph. I think a CSV file would be better for the kind of analysis you did. Maybe I'll try scraping a bit more tomorrow, GitHub doesn't like me right now :-)
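
One possible way to add the missing fork check, sketched under assumptions and not taken from the actual scraper: the GitHub REST API's "get a repository" response has a boolean `fork` field and, for forks, `parent` and `source` objects describing the repository it was forked from (these two objects are not part of the repository list responses). The helper name `canonical_owner` and the hand-written sample dict are hypothetical; the dict contains only the fields used here.

```python
def canonical_owner(repo):
    """Return the owner login a repository should be attributed to.

    `repo` is a dict shaped like the JSON from GET /repos/{owner}/{repo}.
    For forks, prefer `source` (the root of the fork network) and fall
    back to `parent` (the repository it was directly forked from).
    """
    if repo.get("fork"):
        origin = repo.get("source") or repo.get("parent")
        if origin:
            return origin["owner"]["login"]
    return repo["owner"]["login"]


# Hand-written stand-in for an API response, based on the alaveteli
# example from this thread; not real scraped output.
forked = {
    "name": "alaveteli",
    "fork": True,
    "owner": {"login": "Sobanukirwa"},
    "source": {"owner": {"login": "mysociety"}},
}
original = {
    "name": "alaveteli",
    "fork": False,
    "owner": {"login": "mysociety"},
}

print(canonical_owner(forked))
print(canonical_owner(original))
```

Attributing forks to their source organization this way would also avoid overcounting distinct organizations per user.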

Steven Clift

Dec 13, 2015, 11:47:44 AM
to Stefan Baack, Poplus - Collaborative Civic Coding
If you haven't checked out the image of OUR org connections on GitHub
among those who joined the Poplus.org Google Group, see it here:

http://sbaack.com/2015/11/19/scraping-the-global-civic-tech-community-on-github-part-2.html

The Follower network:
http://sbaack.com/downloads/follower-network_2015-11-19.png

The Contributor network:
http://sbaack.com/downloads/contributor-network_2015-11-19.png

So what do you "see" looking at these images?

What opportunities for greater connections and collaboration among
orgs/code might exist?

Who is still MISSING from your GitHub experience with civic code from
the Poplus network?

Steve
Steven Clift - Executive Director, E-Democracy.org
cl...@e-democracy.org - +1 612 234 7072
@democracy - http://linkedin.com/in/netclift
http://1radionews.com - My radio app

Ryan Wold

Dec 14, 2015, 7:13:39 PM
to Poplus - Collaborative Civic Coding
Steven,

What I see in the diagram is that the largest contributor groups are infused with resources (money) to support the network of contributors.

Innovation happens in bursts, and often around the edges of the diagram, but weekend projects often fail to gain sustaining momentum. How can we better fund and grow experiments and concepts that work?

One opportunity is for more collaboration between funding and project sustainability; "scaling what works." Another opportunity is defining "what works" through encouraging common vocabulary across jurisdictions - maybe, a civic patterns library.

- Ryan

steven...@gmail.com

Nov 18, 2017, 9:48:04 AM
to Poplus - Collaborative Civic Coding
Stefan,

OK, I am ready for part 3!

I am really curious how we ID new characters on the map, noting for example the spread of Democracy OS, Decidim, Consul, etc., and how we capture new people and projects.

Steve
