Dear all,
We wanted to share some of our technical work with other
Alaveteli site owners, as we rarely discuss that side of
things within the network.
Over the past year or so, we have built some additions into our
Alaveteli setup, and thought some of them might be
interesting to some of you. (And hopefully, some of you might
feel like sharing the tweaks you have made to your own sites as
well!)
I'll try to keep this semi-technical so that you can make sense
of it even if you're not a software developer.
If any of this feels like it could be useful on your site,
please do let us know. It's unlikely these can move into
Alaveteli as they are, but it would be very interesting to see if
there are ways to reuse some of it on other sites.
## CADA appeals right inside madada.fr
Following a rejection, a requester can appeal to the CADA (the
appeal body/ombudsman in France) to request an expert
opinion on whether the requested documents can be released.
As about 80% of requests are denied or ignored, this process
should be widely used, but it can be a bit tedious to
follow.
We added a simple form which replicates the official one on the
CADA's website, but we automatically bundle the required
documents:
- a printout of all emails exchanged with various details the
CADA requires (email addresses used, dates and times...)
- a summary table of batch requests, a new requirement meant to
simplify processing batches for the CADA, listing all
the requests that are part of a batch, and their
status/rejection date.
- we also notify all public bodies by email that an appeal was
made against them (one more requirement that could be
seen as yet another hurdle in the process to deter
requesters).
With this, we're hoping that a majority of users will not
hesitate to appeal rejections. It's too early to say what
impact this will have, but our first users seem happy with the
simplified process.
## CADA opinions to support writing requests
In France, the CADA provides non-binding advice ("Avis CADA")
about FOI requests. While these
positions are not enforceable, they tend to be followed by
administrative courts, so they can be seen as some sort of
jurisprudence.
We added a simple search system which lists "possibly relevant"
CADA advice while users type their request on
madada.fr. The system is keyword-based, with no fancy
"artificial intelligence" involved, but the results are still
helpful. If anything, it encourages requesters to check past
decisions to improve their requests.
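
To give an idea of how little is needed for this kind of keyword matching, here is a minimal Python sketch. It is not the code running on madada.fr: the function names, the tiny stop-word list and the scoring (number of shared keywords) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be longer and in French.
STOP_WORDS = {"the", "of", "and", "a", "to", "for", "in", "on"}


def keywords(text: str) -> Counter:
    """Lower-case the text, keep runs of letters, drop stop words and short tokens."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(t for t in tokens if len(t) > 2 and t not in STOP_WORDS)


def suggest_opinions(draft: str, opinions: dict[str, str], limit: int = 3) -> list[str]:
    """Return the ids of the opinions sharing the most keywords with the draft."""
    draft_kw = set(keywords(draft))
    scores = {}
    for opinion_id, summary in opinions.items():
        shared = draft_kw & set(keywords(summary))
        if shared:
            scores[opinion_id] = len(shared)
    # Highest score first; ties are broken by id to keep the output stable.
    return sorted(scores, key=lambda k: (-scores[k], k))[:limit]
```

Calling `suggest_opinions` on every pause in typing, with the draft text and a dictionary of opinion summaries, is enough to display a "possibly relevant" list next to the request form.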
## Email filters on incoming messages
We added a basic system to handle some automatic emails the
server receives:
- obvious delivery receipts do not trigger a request for users
to update the status of their request. This is still very
limited as we don't want requesters to miss an important
message.
- many public bodies in France use a system called
"mailinblack", which forces senders to solve a captcha before
their email can be delivered. We don't autosolve these yet, but
we tag them and list them in the admin dashboard, so that site
admins can process them promptly.
- We tag and list WeTransfer links in the same way, which reduces
the risk of missing an expiry date and losing a precious
document, as those links tend to expire after a week.
- The annoying Microsoft "email delivered... oh wait, no, it
didn't work!" message triggers a delivery error just as if
Microsoft's software behaved according to standards, so that
site admins can update contact addresses without waiting
for a sad user to find out a month later that their request
was never received.
There is nothing mind-blowing in all this, but it does save us a
few hours of boring work each month.
These filters could possibly be generalised so that site admins
could configure them the same way you do it in an email
client.
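
As an illustration of what such configurable rules could look like, here is a small Python sketch built on the standard `email` module. It is not our actual code (which patches the Alaveteli models in Ruby): the tag names and the substrings used to spot mailinblack and WeTransfer messages are assumptions. The only standard-based check is the first one, which recognises delivery status notifications by their `multipart/report` content type.

```python
import email
from email.message import Message
from typing import Callable

# A rule is a tag name plus a predicate on (parsed message, lower-cased body).
Rule = tuple[str, Callable[[Message, str], bool]]


def body_text(msg: Message) -> str:
    """Concatenate all text/plain parts of a message."""
    parts = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            parts.append(payload.decode(charset, errors="replace"))
    return "\n".join(parts)


def is_delivery_status(msg: Message, _body: str) -> bool:
    """Standard bounce: multipart/report with report-type=delivery-status."""
    report_type = str(msg.get_param("report-type", "")).lower()
    return msg.get_content_type() == "multipart/report" and report_type == "delivery-status"


# Hypothetical rule set; the substrings are guesses, not tested patterns.
RULES: list[Rule] = [
    ("delivery_error", is_delivery_status),
    ("captcha_challenge", lambda _msg, body: "mailinblack" in body),
    ("expiring_link", lambda _msg, body: "wetransfer" in body),
]


def tags_for(msg: Message) -> list[str]:
    """Return the tags of every rule that matches the message."""
    body = body_text(msg).lower()
    return [tag for tag, matches in RULES if matches(msg, body)]
```

A cron job could then run `tags_for(email.message_from_string(raw))` on recent incoming messages and store the resulting tags, leaving untagged messages to follow the normal workflow.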
## Reminders to public bodies
The official deadline to reply to
requests in France is one month, but seeing that so many
requests never receive so much as an
acknowledgment, we set up automatic reminders, sent 3 weeks after
the request if nothing has come back. While it hasn't fundamentally
changed our stats, we have seen a number of responses that
appeared to be prompted by such reminders. Maybe something
worth exploring further?
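
The rule itself is simple enough to fit in a few lines. The following Python sketch shows the check a daily job could run for each request; the function and parameter names are hypothetical, and only the 3-week delay comes from our setup.

```python
from datetime import date, timedelta
from typing import Optional

REMINDER_DELAY = timedelta(weeks=3)


def reminder_due(sent_on: date, last_response_on: Optional[date],
                 already_reminded: bool, today: date) -> bool:
    """True when a request has had no response for 3 weeks and no reminder yet."""
    if last_response_on is not None or already_reminded:
        return False
    return today - sent_on >= REMINDER_DELAY
```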
## Link with WikiData
We have created a "Ma Dada Identifier" in Wikidata, as
WhatDoTheyKnow did. This allows us to link our public bodies
database with Wikidata, and through it with lots of other
datasets. With this, we are now able to assign a geographic
location to public bodies, requests and documents received, sort
cities by population, link replies to a political party, etc.
It is A LOT of work, especially with the number of public bodies
that France has (we have over 50,000 in our database),
but we are looking forward to all the possibilities this opens.
## Improved search results
We tweaked our search settings in response to lots of users
commenting they couldn't find the public bodies they
were looking for. We changed this so that national or regional
bodies turn up before local ones (based on the categories that
they're assigned) and soon we will take population into account
as well, to rank cities for instance. We also changed the
weight of various fields, so that when users search for "Paris",
they get the city with the Eiffel Tower, and not the
long list of bodies that have an address on "rue de Paris".
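
The idea behind both changes can be sketched in plain Python. This is not how it is wired into Xapian in our theme; the weights, the category names and the `PublicBody` fields below are made-up values that only illustrate how a field weight and a category boost combine into a ranking.

```python
from dataclasses import dataclass

# Hypothetical weights: a match in the name counts far more than one in the address.
FIELD_WEIGHTS = {"name": 10.0, "address": 1.0}
# Hypothetical boosts: national and regional bodies come before local ones.
CATEGORY_BOOST = {"national": 3.0, "regional": 2.0, "local": 1.0}


@dataclass
class PublicBody:
    name: str
    address: str
    category: str


def score(body: PublicBody, query: str) -> float:
    """Sum the weights of the fields containing the query, then apply the category boost."""
    q = query.lower()
    field_score = sum(
        weight for field, weight in FIELD_WEIGHTS.items()
        if q in getattr(body, field).lower()
    )
    return field_score * CATEGORY_BOOST.get(body.category, 1.0)


def rank(bodies: list[PublicBody], query: str) -> list[PublicBody]:
    """Return matching bodies, best score first (the sort is stable for ties)."""
    scored = [(score(b, query), b) for b in bodies]
    return [b for s, b in sorted(scored, key=lambda pair: -pair[0]) if s > 0]
```

With these numbers, a body named "Ville de Paris" outranks a town hall that merely sits on a "rue de Paris", and a ministry outranks a local office with a similar name.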
## Document portal
We are in the process of adding a document portal to madada.fr.
The aim is to bring the outcome of FOI requests to the
front, instead of the requests themselves that can be seen as
just a means to an end, allowing us to showcase the sort
of documents that can be obtained. It should also help users
find interesting documents more easily.
It looks like what your government probably uses for their open
data. We are working on adding RDF metadata, something
that tells search engines and other bots what each part of the
page is ("this is a public body that produced the
document", "this is the date it was published"...). Use cases
for this are endless, but we are
hoping it can make reusing the documents we have easier for
third parties, especially when combined with information
from WikiData.
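
To make the metadata idea more concrete, here is a hedged Python sketch that produces a JSON-LD description of a document. JSON-LD and the schema.org vocabulary are one possible way to publish RDF metadata, chosen here for illustration only; it is not necessarily the serialisation or vocabulary we will end up with, and the function and parameter names are hypothetical.

```python
import json
from datetime import date
from typing import Optional


def document_jsonld(title: str, url: str, published: date,
                    body_name: str, body_wikidata_id: Optional[str] = None) -> str:
    """Describe a released document and the public body that produced it."""
    author = {"@type": "GovernmentOrganization", "name": body_name}
    if body_wikidata_id:
        # Linking to the Wikidata entity lets third parties join other datasets.
        author["sameAs"] = f"http://www.wikidata.org/entity/{body_wikidata_id}"
    data = {
        "@context": "https://schema.org",
        "@type": "DigitalDocument",
        "name": title,
        "url": url,
        "datePublished": published.isoformat(),
        "author": author,
    }
    return json.dumps(data, indent=2, ensure_ascii=False)
```

The resulting string would typically be embedded in the page inside a `<script type="application/ld+json">` element.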
I hope some of this is useful! If you are running an Alaveteli
site as well, we would love to hear about your own
customisations!
Laurent and the team at Ma Dada
Thanks Laurent, this sounds like a huge batch of improvements and I can appreciate the amount of work that’s gone into making them! I’d love to see how all of these improvements work in practice – but in particular the ones I comment on below – perhaps via a little screencast or demo on one of the monthly catchups?
Hi Gareth,
Yes, I will prepare a short demo
for the call on Friday. And hopefully we can find a moment to
chat about things during TICTeC.
> Link with WikiData
The geographic location part sounds really interesting. We wanted to explore something similar for WhatDoTheyKnow but looks like we might run out of time within the current project where we’d pencilled this in. If we could pull in some existing work that would be a huge win.
For now, we use a SPARQL query to fetch all the data we want from Wikidata in one go, and store it in an in-memory hash in Alaveteli. The geo coordinates are not actually pulled into Alaveteli at this point: we only use them on a separate site that I won't share publicly yet.
This commit https://gitlab.com/madada-team/dada-core/-/commit/b969565f2a03 shows how we fetch the data from Wikidata and make it available inside the PublicBody model. And this other bit of code https://gitlab.com/madada-team/dada-core/-/blob/d9a29311d6b5/dada_recherche/dada_recherche/main.py#L70 shows a more extensive SPARQL query that pulls the geo data for public bodies. Note that this works for French cities only at this point, as each entity type is likely to need a specific graph traversal on Wikidata to find useful coordinates, but as cities make up more than half our database, it's a good start.
Unfortunately, due to how the data is modelled on Wikidata, I
suspect there is no universal SPARQL query that will magically
find all relevant data. It will have to be ad hoc, per country and
per type of authority. But the structure should work.
I've used very basic caching logic, but it seems to function
well. For now, we only use Wikidata data to add RDF metadata to
our public body pages, but it would be trivial at this point to
show a small map or similar on the body's page.
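
For readers who don't want to dig through the commits, the overall shape (one query, parsed into a hash, refreshed now and then) can be sketched in Python as below. This is a simplified illustration and not a copy of the linked code. `P0000` is a placeholder for the Ma Dada identifier property, whose real number is not given here; `P625` is Wikidata's "coordinate location" property, and the query service returns coordinates as `Point(longitude latitude)` strings. The `fetch` callable, which would send the query to the Wikidata query service and return the parsed JSON, is left to the caller.

```python
import re
import time
from typing import Callable, Optional

MADADA_ID_PROPERTY = "P0000"  # placeholder, NOT the real property number


def build_query(id_property: str = MADADA_ID_PROPERTY) -> str:
    """One query for every item carrying the site identifier, with optional coordinates."""
    return (
        "SELECT ?item ?madadaId ?coord WHERE {\n"
        f"  ?item wdt:{id_property} ?madadaId .\n"
        "  OPTIONAL { ?item wdt:P625 ?coord . }\n"
        "}"
    )


_POINT = re.compile(r"Point\((-?[\d.]+) (-?[\d.]+)\)")


def parse_bindings(results: dict) -> dict[str, dict]:
    """Turn SPARQL JSON results into {site id: {wikidata_uri, lat, lon}}."""
    bodies = {}
    for row in results["results"]["bindings"]:
        entry = {"wikidata_uri": row["item"]["value"], "lat": None, "lon": None}
        match = _POINT.fullmatch(row.get("coord", {}).get("value", ""))
        if match:
            # WKT puts longitude first, then latitude.
            entry["lon"], entry["lat"] = float(match[1]), float(match[2])
        bodies[row["madadaId"]["value"]] = entry
    return bodies


class WikidataCache:
    """Very basic in-memory cache: reload everything once the TTL has expired."""

    def __init__(self, fetch: Callable[[], dict], ttl_seconds: float = 86400.0):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._data: Optional[dict[str, dict]] = None
        self._loaded_at = 0.0

    def get(self, site_id: str) -> Optional[dict]:
        if self._data is None or time.monotonic() - self._loaded_at > self._ttl:
            self._data = parse_bindings(self._fetch())
            self._loaded_at = time.monotonic()
        return self._data.get(site_id)
```

Injecting `fetch` keeps the cache testable without network access, and makes it easy to swap in a per-country or per-authority-type query later.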
> CADA opinions to support writing requests
We’re hoping to explore some similar ideas in this area – though more aimed at preventing misuse. I’ve got a code spike where I had a go at this using simple regex to match terms in the message body, so might be something we can pull out of your implementation when we get stuck in (https://github.com/mysociety/alaveteli/pull/7963).
> CADA appeals right inside madada.fr…
Nice! I like the simpler approach of just having some nudges and a nice downloadable package for the citizens to send off to the regulator. How are you initiating this workflow?
I’m intrigued at the summary for batch requests and what that looks like compared to the already available CSV export available from the pro dashboard for batch requests (https://www.mysociety.org/2020/05/04/exporting-data-from-your-batch-foi-requests/). This feels like something that might be quite generically useful.
We followed the requirements in French law, specifically:
- list each public body against which the appeal is made (which
might exclude bodies that replied to the request). We show the
full list of bodies from the batch and let the user pick and
choose what to include. The list defaults to what should make
sense (successful requests excluded, etc.)
- for each of them, include the public body name, contact email used, date of the request, date of rejection if any.
So it is a bit different from the CSV export you mention. I am
not sure how useful this would be generically, as the law is
likely to differ elsewhere, but happy to hear otherwise!
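
To show what the summary boils down to, here is a minimal Python sketch. The four columns are the ones listed above; everything else (the `BatchRequest` class, the status strings, and CSV as the output format) is an assumption for illustration and does not describe our implementation.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class BatchRequest:
    body_name: str
    contact_email: str
    sent_on: date
    status: str  # hypothetical values: "successful", "rejected", "waiting_response"
    rejected_on: Optional[date] = None


def default_selection(requests: list[BatchRequest]) -> list[BatchRequest]:
    """Pre-select the requests that make sense to appeal: everything not successful."""
    return [r for r in requests if r.status != "successful"]


def summary_table(requests: list[BatchRequest]) -> str:
    """One row per public body included in the appeal."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["public_body", "contact_email", "request_date", "rejection_date"])
    for r in requests:
        rejected = r.rejected_on.isoformat() if r.rejected_on else ""
        writer.writerow([r.body_name, r.contact_email, r.sent_on.isoformat(), rejected])
    return out.getvalue()
```

The user would start from `default_selection(...)`, tick or untick bodies in the form, and the final list would go through `summary_table(...)` to be attached to the appeal.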
> Email filters on incoming messages
Auto-classifying auto-acknowledgements sounds like it could be pulled into Alaveteli core! We do have a bit of a vision for this (https://github.com/mysociety/alaveteli/issues/2045#issuecomment-1057844092), but I think the implementation you’ve described would be a good first step to pull in as we don’t currently have a plan to return to this soon.
Our code is pretty basic. You can see the patches on the IncomingMessage model here: https://gitlab.com/madada-team/dada-core/-/blob/d9a29311d6b5/dada-france-theme/lib/model_patches.rb#L197
Most of it relies on a cron job that looks at recent incoming
messages and tags them based on headers and content. The result is
very similar to what you describe in that ticket. For instance, we
fold delivery receipts from ministries, as they appear to use a
standard wording, and mark them with a "Ma Dada identified a
delivery receipt in this message" notice, like here:
https://madada.fr/demande/listes_des_rapports_dinspection#incoming-7874
My guess is that making this generic would require a bunch of
knobs to adjust for each context. (I just noticed that the same
body seems to have modified its template, so our code might
already be broken :/ ) I think making this available to site
admins with a set of configurable rules would be easiest to
manage.
> Improved search results
We’re really struggling with the search results too. Feels like there might be some generic improvements possible here based on what you’ve done?
Most probably yes. Here's what we did for Public Bodies for instance: https://gitlab.com/madada-team/dada-core/-/blob/d9a29311d6b5/dada-france-theme/lib/model_patches.rb#L729 I think the weight adjustment can be generalised a bit and moved into core Alaveteli quite easily.
For something a bit more ambitious, I pondered replacing Xapian
with Meilisearch. It was hard to find good docs about Xapian, and
the code seems to be pretty much abandoned, but this is definitely
a much bigger endeavour.
I'd be happy to discuss any of the above and maybe help with moving some of the code into core if it makes sense.
Laurent