Hi everyone,
Various people have suggested it would be a good idea to post
something here about a project we've been working on in the UK
to help out an organization called Democracy Club; the site is
called YourNextMP, for crowd-sourcing election candidates:
http://yournextmp.com
https://democracyclub.org.uk/
(Edmund von der Burg kindly let us take over the domain and name
of the site from his project that did this for the last general
election.)
The problem it's trying to solve is that the official lists of
election candidates for UK general elections aren't published
until about 10 days before the election, and even then there's
no guarantee that they will be in machine-readable form. This
means that anyone who wants to build a site based on candidate
data (e.g. for doing surveys of candidates, collecting their
public statements, campaign materials etc.) can't do it in time
to produce an effective site.
The idea of YourNextMP is to build an open (CC-BY-SA) database
of election candidates, by combining crowd-sourced data with
candidate data scraped from some (as yet incomplete) official
party lists. When people make changes, we ask them to include a
source (ideally a URL for a newspaper story or an official party
page), so you should be able to check the attribution for any
information on the site, much as you can with Wikipedia.
We thought about doing this initially just by giving each user a
login to the web-based editing interface for a PopIt instance,
but it became apparent quickly that this would have a number of
problems, in particular:
* The web interface to PopIt is very generic, so that you can
use it for multiple use cases, but for a crowd-sourced site
it's important to make it as clear as possible to a naive
user how they can contribute to the effort.
* You can't easily add constraints to the web interface to make
people stick to a particular data model within the very
general Popolo schema, e.g. one might want to ensure that the
user has added the role "Candidate" to membership of a post.
* We needed to use some custom data fields on person records;
perhaps the most significant one of these is a "versions"
attribute which records all previous versions of the
person's data, along with details about who made the change,
and what their source was. (Note that this is really in lieu
of first class support for versioning in PopIt, which is in
our development plan.) We also used a custom attribute to
record that someone is known *not* to be standing in a
particular constituency. (Arguably that could be done with
memberships with a "Not A Candidate" role, but that doesn't
seem very natural.)
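To make the "versions" idea in the last point a bit more
concrete, here's a rough Python sketch of how such an attribute
could be maintained. This is not the actual YourNextMP code: the
function name record_version and the keys "username",
"information_source", "timestamp" and "data" are all assumptions
made up for illustration.

```python
import copy
from datetime import datetime, timezone


def record_version(person, new_data, username, source):
    """Return an updated copy of `person` with a new entry in 'versions'.

    All key names used here are hypothetical, not PopIt's or YourNextMP's.
    """
    updated = copy.deepcopy(person)
    updated.update(new_data)
    # Snapshot everything except the history itself, so that versions
    # don't nest inside one another.
    snapshot = {k: v for k, v in updated.items() if k != "versions"}
    version = {
        "username": username,            # who made the change
        "information_source": source,    # e.g. a URL for a news story
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "data": copy.deepcopy(snapshot),
    }
    # Newest version first.
    updated.setdefault("versions", []).insert(0, version)
    return updated
```

Each edit then carries its own attribution, and an earlier state
of the person can be recovered from the "data" of any entry.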
So, rather than use PopIt's web-based admin interface, we
thought that a better plan would be to write a custom front-end
which uses the PopIt API to store and retrieve data. The
YourNextMP code does this: it's a Django-based site that does
use PostgreSQL, but only for things like user authentication
data, logs of user actions and cached aggregated values - all
the data about candidates is stored in PopIt. Anyone who wants
to use the site's data programmatically can use the PopIt and
MapIt APIs to find candidates for anywhere in the country:
http://yournextmp.com/help/api
So far, this seems to be working well - we've got details for
over 1000 candidates that are now known to be standing at the
next election, and we've had updates from candidates themselves,
party insiders and enthusiastic members of the public. This is
probably only a third of the way there, but we think this is
currently roughly as good as the commercial candidate databases
that campaigning organizations would otherwise have to
purchase. In addition, we hope we can get a good boost by
emailing all the candidates from the last election whose status
we're not sure about.
I hope this is an interesting example of how PopIt can be used
for this kind of candidate data. If you're thinking of doing
something similar, we'd suggest using Popolo's concepts of a
"post" for each elected position, and then a membership with
role "Candidate" to represent a candidacy:
http://www.popoloproject.com/appendices/examples.html#electoral-candidate
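As a rough illustration of that structure, here's a minimal
Python sketch using plain dictionaries. The field names
(person_id, post_id, on_behalf_of_id, role and so on) follow
Popolo's membership and post classes as I understand them, but
the helper functions and the example IDs are hypothetical, and
this isn't how the YourNextMP code itself is organised.

```python
def make_post(post_id, label, organization_id):
    """Build a Popolo-style post for one elected position."""
    return {
        "id": post_id,
        "label": label,
        "role": "Member of Parliament",
        "organization_id": organization_id,
    }


def make_candidacy(person_id, post_id, party_id=None):
    """Build a membership with role "Candidate" linking a person to a post."""
    membership = {
        "person_id": person_id,
        "post_id": post_id,
        "role": "Candidate",
    }
    if party_id is not None:
        # The party the person is standing for, if known.
        membership["on_behalf_of_id"] = party_id
    return membership


def candidates_for_post(memberships, post_id):
    """Return the person IDs of everyone who is a candidate for a post."""
    return [
        m["person_id"]
        for m in memberships
        if m.get("post_id") == post_id and m.get("role") == "Candidate"
    ]
```

With that shape, finding the candidates for a constituency is
just a filter over memberships on post_id and role.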
There have been some interesting lessons from using PopIt as a
back-end for a site like this:
* The relatively recent support in the PopIt API for embedding
memberships [1] was crucial for keeping the number of
requests required per page down to a reasonable number.
[1]
http://popit.poplus.org/docs/api/reference/#embedding-memberships
* I missed the expressiveness of SQL or SPARQL queries in many
respects: a simple example of this is that there's no way to
do the equivalent of COUNT with a WHERE condition, so to
produce the statistics on pages like [2] we're using cached
data rather than getting it from PopIt on-demand.
[2]
http://yournextmp.com/numbers/constituencies
* This has been a great way of finding sometimes subtle bugs in
PopIt, and has been helpful in informing how we prioritize
its development.
* You have to be careful in developing a site like this not to
lose some of the benefits that you'd otherwise get from the
carefully thought-out Popolo data model. One mistake of this
kind that we made is that we use a simple form for editing
the details of a person, which doesn't allow you to specify
alternative names; there are just a couple of administrators
who can do that using the PopIt web interface. This is a
small part of a larger issue, really, which is that
developing a UI that allows you to intuitively use the full
expressiveness of the Popolo data model is a big project, as
we've discovered from working on the PopIt web interface.
* We developed the site for this particular use case, and under
time pressure, so it would take quite a lot of work to adapt
it for another electoral system. That's not to say that
couldn't be done; at the moment it uses the strings '2010'
and '2015' to distinguish the two elections that the site has
data for, and they could perfectly well be identifiers for
arbitrary elections instead. There are other assumptions that
are made which may not hold in other situations, though - for
example, we were able to simplify use of the PopIt API by using the
MapIt area ID as the ID for posts. (For example, one post is
"Member of Parliament for Cambridge", and the ID of that post
would be 65927, corresponding to
http://mapit.mysociety.org/area/65927.html ) I'm afraid there
are bound to be lots of other assumptions that we've made
that tie it to the first-past-the-post system with small
geographical constituencies as well. However, they could
certainly be overcome if there was enough interest in that.
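To illustrate the point above about missing COUNT ... WHERE
queries, here's a hedged Python sketch of the general approach
of counting client-side and caching the result. It is not the
site's real code: the class CachedCounts and the callable
fetch_memberships (standing in for however you retrieve
memberships from the API) are hypothetical names.

```python
from collections import Counter


def count_candidates_by_post(memberships, role="Candidate"):
    """Roughly: SELECT post_id, COUNT(*) ... WHERE role = ... GROUP BY post_id."""
    return Counter(
        m["post_id"]
        for m in memberships
        if m.get("role") == role and m.get("post_id") is not None
    )


class CachedCounts:
    """Compute the per-post counts once and reuse them until invalidated."""

    def __init__(self, fetch_memberships):
        # fetch_memberships: hypothetical callable returning a list of
        # membership dictionaries.
        self._fetch = fetch_memberships
        self._counts = None

    def get(self, post_id):
        if self._counts is None:
            self._counts = count_candidates_by_post(self._fetch())
        # A Counter returns 0 for posts with no candidates.
        return self._counts[post_id]

    def invalidate(self):
        """Call this after an edit so the next get() recounts."""
        self._counts = None
```

The trade-off is the usual one with caches: pages like the
statistics one stay cheap to render, but you have to remember to
invalidate the cached values whenever a candidacy changes.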
Anyway, I hope that's of some interest. The source code and
issue tracker are on GitHub:
https://github.com/mysociety/yournextmp-popit/
... although please bear in mind the "under time pressure" thing
I mentioned ;)
Best regards,
Mark