A quick (tech) update from Ma Dada in France

Laurent Savaëte

Apr 18, 2024, 10:38:26 AM
to alavete...@googlegroups.com

Dear all,

We wanted to share some of our technical work with other Alaveteli site owners, as we don't necessarily share much in
that area in the network.

Over the past year or so, we have built some additions into our alaveteli setup, and thought some of this might be
interesting to some of you. (And hopefully, some of you might feel like sharing what tweaks you made to your sites as
well!).

I'll try to keep this semi-technical so that you can make sense of it even if you're not a software developer.

If any of this feels like it could be useful on your site, please do let us know. It's unlikely these can move into
alaveteli as such, but it would be very interesting to see if there are ways to reuse some of it on other sites.

## CADA appeals right inside madada.fr

Following a rejection, a requester can appeal to the CADA (the appeal body/ombudsman in France) to request an expert
opinion on the communicability of the requested documents.

As about 80% of requests are denied or ignored, this process should be widely used, but can be a bit tedious to
follow.
We added a simple form which replicates the official one on the CADA's website, but we automatically bundle the required
documents:

- a printout of all emails exchanged with various details the CADA requires (email addresses used, dates and times...)
- a summary table of batch requests, a new requirement meant to simplify processing batches for the CADA, listing all
  the requests that are part of a batch, and their status/rejection date.
- we also notify all public bodies by email that an appeal was made against them (one more requirement that could be
  seen as yet another hurdle in the process to deter requesters).

With this, we're hoping that a majority of users will not hesitate to appeal rejections. It's too early to say what
impact this will have, but our first users seem happy with the simplified process.

## CADA opinions to support writing requests

In France, the CADA provides non-binding advice ("Avis CADA") about FOI requests. While these
positions are not enforceable, they tend to be followed by administrative courts, so they can be seen as some sort of jurisprudence.

We added a simple search system which lists "possibly relevant" CADA advice while users type their request on
madada.fr. The system is keyword based, there is no fancy "artificial intelligence" involved, but results are still
helpful. If anything it encourages requesters to check past decisions to improve their requests.
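
As a rough illustration of what a keyword-based lookup like this can look like (not the actual Ma Dada code), here is a minimal Python sketch. The sample opinions, their references, the function names and the scoring rule (number of shared keywords) are all assumptions made up for the example.

```python
import re

# Hypothetical sample data: (reference, keywords describing the opinion).
OPINIONS = [
    ("avis-001", {"marché", "public", "contrat"}),
    ("avis-002", {"budget", "commune", "subvention"}),
    ("avis-003", {"contrat", "subvention", "association"}),
]


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))


def suggest_opinions(draft: str, limit: int = 2) -> list[str]:
    """Return references of the opinions sharing the most keywords with the draft."""
    words = tokenize(draft)
    scored = [(len(words & keywords), ref) for ref, keywords in OPINIONS]
    # Keep only opinions with at least one shared keyword, best score first
    # (ties are broken by reference so the order is stable).
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [ref for _, ref in scored[:limit]]
```

Calling `suggest_opinions` on each keystroke (or after a short pause) would give the "possibly relevant" list; a real search engine would add typo tolerance and better ranking.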

## Email filters on incoming messages

We added a basic system to handle some automatic emails the server receives:

- obvious delivery receipts do not trigger a request for users to update the status of their request. This is still very
  limited as we don't want requesters to miss an important message.
- many public bodies in France use a system called "mailinblack", which forces senders to solve a captcha before their email can
  be delivered. We don't autosolve them yet, but tag them and list them in the admin dashboard, so that site admins can
  process them promptly.
- The same happens with wetransfer links, which reduces the risk of missing an expiry date and losing a precious
  document, as those tend to expire after a week.
- The annoying Microsoft "email delivered... oh wait, no, it didn't work!" message triggers a delivery error just as if
  Microsoft's software behaved according to standards, so that site admins can update contact addresses without waiting
  for a sad user to find out a month later that their request was never received.

There is nothing mind blowing in all this, but it does save us a few hours of boring work each month.

These filters could possibly be generalised so that site admins could configure them the same way you do it in an email
client.
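
To make the "configurable filters" idea more concrete, here is a minimal Python sketch of rules that tag an incoming message, much like rules in an email client. It is an assumption of how this could be modelled, not the code running on madada.fr: the `Rule` structure, the tag names and the patterns are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    tag: str      # tag applied to the incoming message when the rule matches
    part: str     # which part of the message to inspect: "subject" or "body"
    pattern: str  # regular expression, matched case-insensitively


# Hypothetical rules; the patterns are illustrative guesses only.
RULES = [
    Rule("captcha_required", "body", r"mailinblack"),
    Rule("expiring_link", "body", r"https?://(www\.)?wetransfer\.com/|https?://we\.tl/"),
    Rule("delivery_receipt", "subject", r"^(accusé de réception|read receipt)"),
]


def tags_for(message: dict[str, str], rules: list[Rule] = RULES) -> list[str]:
    """Return the tags of every rule whose pattern matches the message."""
    return [
        rule.tag
        for rule in rules
        if re.search(rule.pattern, message.get(rule.part, ""), re.IGNORECASE)
    ]
```

Keeping the rules as plain data means site admins could, in principle, edit them without touching the matching code.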

## Reminders to public bodies

The official delay to reply to requests in France is one month, but seeing that so many requests never see so much as an
acknowledgment, we set up automatic reminders 3 weeks after the request if nothing came back. While it hasn't fundamentally
changed our stats, we have seen a number of responses that appeared to be prompted by such reminders. Maybe something
worth exploring further?
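
The rule itself is simple enough to sketch in a few lines of Python. This is only an illustration under assumptions: the function name and its parameters are hypothetical, and the only fact taken from the setup above is the 3-week delay.

```python
from datetime import date, timedelta

REMINDER_DELAY = timedelta(weeks=3)  # from the text: 3 weeks after the request


def needs_reminder(sent_on: date, has_response: bool,
                   reminder_sent: bool, today: date) -> bool:
    """True when a request has had no reply for 3 weeks and was not reminded yet."""
    if has_response or reminder_sent:
        return False
    return today >= sent_on + REMINDER_DELAY
```

A daily job could run this check over open requests and send one reminder per request.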

## Link with WikiData

We have created a "Ma Dada Identifier" in WikiData, like what WhatDoTheyKnow did. This allows us to link our public bodies
database with wikidata, but also lots of other datasets beyond it. With this, we are now able to assign a geographic
location to public bodies, requests and documents received, sort cities by population, link replies to a political party, etc...
It is A LOT of work, especially with the number of public bodies that France has (we have over 50,000 in our database),
but we are looking forward to all the possibilities this opens.

## Improved search results

We tweaked our search settings in response to lots of users commenting they couldn't find the public bodies they
were looking for. We changed this so that national or regional bodies turn up before local ones (based on the categories that
they're assigned) and soon we will take population into account as well to rank cities for instance. We also changed the
weight of various fields, so that when users search for "Paris", they get the city with the Eiffel tower, and not the
long list of bodies that have an address on "rue de Paris".
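
The two ideas (heavier weight on the name than on the address, and a boost for national or regional bodies) can be sketched as follows. This is a toy Python illustration and not how Xapian or the Ma Dada theme computes scores; the weights, boosts and names are assumptions.

```python
from dataclasses import dataclass

# Hypothetical weights and boosts, for illustration only.
FIELD_WEIGHTS = {"name": 10, "address": 1}
CATEGORY_BOOST = {"national": 3.0, "regional": 2.0, "local": 1.0}


@dataclass
class Body:
    name: str
    address: str
    category: str


def score(body: Body, term: str) -> float:
    """Weight a match by the field it occurs in, then boost by category."""
    term = term.lower()
    base = 0
    if term in body.name.lower():
        base += FIELD_WEIGHTS["name"]
    if term in body.address.lower():
        base += FIELD_WEIGHTS["address"]
    return base * CATEGORY_BOOST.get(body.category, 1.0)


def search(bodies: list[Body], term: str) -> list[Body]:
    """Return matching bodies, highest score first."""
    hits = [body for body in bodies if score(body, term) > 0]
    return sorted(hits, key=lambda body: score(body, term), reverse=True)
```

With these numbers, a body named "Ville de Paris" outranks a national body that merely has a Paris address, which in turn outranks a local body on "rue de Paris".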

## Document portal

We are in the process of adding a document portal to madada.fr. The aim is to bring the outcome of FOI requests to the
front, instead of the requests themselves that can be seen as just a means to an end, allowing us to showcase the sort
of documents that can be obtained. It should also help users find interesting documents more easily.

It looks like what your government probably uses for their open data. We are working on adding RDF metadata, something
that tells search engines and other bots what each part of the page is ("this is a public body that produced the
document", "this is the date it was published"...). Use cases for this are endless, but we are
hoping it can make reusing the documents we have easier for third parties, especially when combined with information
from WikiData.
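
For readers who have not seen this kind of metadata, here is a minimal Python sketch that produces it as JSON-LD with the schema.org vocabulary. JSON-LD and the chosen types (`DigitalDocument`, `GovernmentOrganization`) are assumptions made for the example; the portal may well use another RDF serialisation or vocabulary.

```python
import json
from typing import Optional


def document_jsonld(title: str, body_name: str, published: str,
                    wikidata_id: Optional[str] = None) -> dict:
    """Describe a released document and the public body that produced it."""
    publisher = {"@type": "GovernmentOrganization", "name": body_name}
    if wikidata_id:
        # Link the body to its Wikidata item so third parties can join datasets.
        publisher["sameAs"] = f"http://www.wikidata.org/entity/{wikidata_id}"
    return {
        "@context": "https://schema.org",
        "@type": "DigitalDocument",
        "name": title,
        "datePublished": published,  # ISO 8601 date, e.g. "2024-04-18"
        "publisher": publisher,
    }


def as_script_tag(data: dict) -> str:
    """Wrap the metadata in the script tag that crawlers look for."""
    payload = json.dumps(data, ensure_ascii=False)
    return f'<script type="application/ld+json">{payload}</script>'
```

The resulting tag is embedded in the document's page; it is invisible to visitors but readable by search engines and other bots.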


I hope this is somehow useful! If you are running an alaveteli site as well, we would love to hear about your own customisations!

Laurent and the team at Ma Dada


Oliver Lineham

Apr 18, 2024, 12:55:41 PM
to alavete...@googlegroups.com
Nice work! Is this all done through the theme (dada-core repo on Gitlab)?

> - The same happens with wetransfer links, which reduces the risk of missing an expiry date and losing a precious
>  document, as those tend to expire after a week.

I assume this is a file sharing thing. I haven't seen WeTransfer but have a growing problem of agencies using M365 sharing links which only last a week or so, and can't be accessed without a 15-minute one-time code emailed to the requester. 

> - The annoying Microsoft "email delivered... oh wait, no, it didn't work!" message triggers a delivery error just as if
>  Microsoft's software behaved according to standards, so that site admin can update contact addresses without waiting
>  for a sad user to find out a month later that their request was never received.

I haven't seen this, and wonder what it looks like. We use Sendgrid so perhaps they're already interpreting these bounces for us. 

Is it a bounce to the envelope-from (return-path) rather than header-from (request address)? Isn't that a standard email "DSN" (Delivery Status Notification)?

(Sendgrid reports bounces to us via webhook. Although I haven't got around to automating anything after that.)

Oliver

Laurent Savaëte

Apr 18, 2024, 3:07:10 PM
to alavete...@googlegroups.com

> Nice work! Is this all done through the theme (dada-core repo on Gitlab)?
Correct, it's all in there. Some stuff happened manually (for one-offs, which I tried to document in READMEs and such) but it should be marginal. Some code is Python and runs outside the actual Rails app, mostly for data sync and such, and is probably not very portable.
>
> > - The same happens with wetransfer links, which reduces the risk of missing an expiry date and losing a precious
> >  document, as those tend to expire after a week.
>
> I assume this is a file sharing thing. I haven't seen WeTransfer but have a growing problem of agencies using M365 sharing links which only last a week or so, and can't be accessed without a
> 15-minute one-time code emailed to the requester.
Yes, WeTransfer is one of many services used here to send big files. Luckily we haven't seen the M365 links you're talking about; they sound pretty horrible to deal with. I tried automatic retrieval of files, but because WeTransfer makes money showing ads during link retrieval, they seem to actively prevent this, so for now we handle them manually. Files are uploaded onto a Nextcloud instance, shared publicly, and a comment is posted on the request to make the files available. Clunky, but better than losing documents.
>
> > - The annoying Microsoft "email delivered... oh wait, no, it didn't work!" message triggers a delivery error just as if
> >  Microsoft's software behaved according to standards, so that site admin can update contact addresses without waiting
> >  for a sad user to find out a month later that their request was never received.
>
> I haven't seen this, and wonder what it looks like. We use Sendgrid so perhaps they're already interpreting these bounces for us.
>
> Is it a bounce to the envelope-from (return-path) rather than header-from (request address)? Isn't that a standard email "DSN" (Delivery Status Notification)
>
> (Sendgrid reports bounces to us via webhook. Although I haven't got around to automating anything after that.)

What we have seen are SMTP servers acknowledging receipt of emails via an SMTP code 250, so alaveteli thinks the email was properly delivered by analysing the logs. But a few seconds later the
recipient's email server sends an email with an attached delivery-status message detailing the failure. Here's an example: https://madada.fr/demande/demande_adresse_mail_valide#incoming-8445 When the
email comes in, we just check its content and update the request state to "delivery problem" without waiting for someone to do it by hand.

The outcome is that alaveteli shows a green checkmark, but the email never actually went anywhere. I am not sure whether it's standard or not, I always assumed not given that it was mostly coming from
MS servers, but I will dig further to understand better and see if there's a real fix we could implement.
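
For anyone wanting to try the same check, here is a minimal sketch using Python's standard `email` package. It assumes the late bounce carries a `message/delivery-status` part in the format described by RFC 3464; whether the Microsoft messages always do is exactly the open question above. The function name is hypothetical and this is not the code used on madada.fr.

```python
import email


def failed_recipients(raw_message: str) -> list[str]:
    """Return the addresses a delivery-status report marks as failed."""
    message = email.message_from_string(raw_message)
    failed = []
    for part in message.walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        if not part.is_multipart():
            continue  # malformed report: no header blocks to inspect
        # The standard parser exposes each header block of the report
        # (per-message fields, then one block per recipient) as a sub-message.
        for block in part.get_payload():
            if (block.get("Action") or "").strip().lower() == "failed":
                recipient = block.get("Final-Recipient", "")
                # The field looks like "rfc822; someone@example.org".
                failed.append(recipient.split(";", 1)[-1].strip())
    return failed
```

A non-empty result would be the signal to move the request to "delivery problem" even though the SMTP conversation ended with a 250.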

Thank you for the feedback!

Laurent

claude archer

Apr 19, 2024, 6:29:03 PM
to alavete...@googlegroups.com
Congratulations Laurent! We would be delighted to help implement these features on Transparencia.be, like you did for the appeal-email-data last year.
Best regards

Gareth Rees

Apr 29, 2024, 6:50:02 AM
to Alaveteli Community
Thanks Laurent, this sounds like a huge batch of improvements and I can appreciate the amount of work that’s gone into making them!  I’d love to see how all of these improvements work in practice – but in particular the ones I comment on below – perhaps via a little screencast or demo on one of the monthly catchups?

> Link with WikiData

The geographic location part sounds really interesting. We wanted to explore something similar for WhatDoTheyKnow but looks like we might run out of time within the current project where we’d pencilled this in. If we could pull in some existing work that would be a huge win.


> CADA opinions to support writing requests

We’re hoping to explore some similar ideas in this area – though more aimed at preventing misuse. I’ve got a code spike where I had a go at this using simple regex to match terms in the message body, so might be something we can pull out of your implementation when we get stuck in (https://github.com/mysociety/alaveteli/pull/7963).

> CADA appeals right inside madada.fr

Nice! I like the simpler approach of just having some nudges and a nice downloadable package for the citizens to send off to the regulator. How are you initiating this workflow?

I’m intrigued at the summary for batch requests and what that looks like compared to the already available CSV export available from the pro dashboard for batch requests (https://www.mysociety.org/2020/05/04/exporting-data-from-your-batch-foi-requests/). This feels like something that might be quite generically useful.


> Email filters on incoming messages

Auto-classifying auto-acknowledgements sounds like it could be pulled into Alaveteli core! We do have a bit of a vision for this (https://github.com/mysociety/alaveteli/issues/2045#issuecomment-1057844092), but I think the implementation you’ve described would be a good first step to pull in as we don’t currently have a plan to return to this soon.

> Improved search results

We’re really struggling with the search results too. Feels like there might be some generic improvements possible here based on what you’ve done?

Best,

Gareth

Laurent Savaëte

Apr 29, 2024, 1:58:46 PM
to alavete...@googlegroups.com

On 29/04/2024 12:50, Gareth Rees wrote:
> Thanks Laurent, this sounds like a huge batch of improvements and I can appreciate the amount of work that’s gone into making them!  I’d love to see how all of these improvements work in practice – but in particular the ones I comment on below – perhaps via a little screencast or demo on one of the monthly catchups?

Hi Gareth,

Yes, I will prepare a short demo for the call on Friday. And hopefully we can find a moment to chat about things during TicTec.


> Link with WikiData

> The geographic location part sounds really interesting. We wanted to explore something similar for WhatDoTheyKnow but looks like we might run out of time within the current project where we’d pencilled this in. If we could pull in some existing work that would be a huge win.

For now, we use a sparql query to fetch all the data we want in one go from wikidata, and store that in a hash in memory in alaveteli (the geo coords are not actually pulled in alaveteli at this point, we just use coords on a separate site that I won't yet share publicly).

This commit https://gitlab.com/madada-team/dada-core/-/commit/b969565f2a03 shows how we fetch the data from wikidata, and make it available inside the PublicBody model. And this other bit of code https://gitlab.com/madada-team/dada-core/-/blob/d9a29311d6b5/dada_recherche/dada_recherche/main.py#L70 shows a more extensive sparql query that pulls the geo data for public bodies (note that this works for French cities only at this point, as each entity type is likely to need a specific graph traversal on wikidata to find useful coordinates, but as this is more than half our database, it's a good start).

Unfortunately, due to how the data is modelled on wikidata, I suspect there is no universal sparql query that will magically find all relevant data, it will have to be ad hoc, per country and per type of authority, I suspect. But the structure should work. I've used a very basic caching logic, but it seems to function well. For now, we only use wikidata data to add RDF metadata to our public body pages, but it would be trivial at this point to show a small map or similar on the body's page.
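
To give an idea of the shape of this (a sketch under assumptions, not the code in the commit linked above): the Python below builds a small SPARQL query and turns the JSON that the Wikidata query service returns into an in-memory lookup keyed by our own identifier. `P625` is Wikidata's "coordinate location" property; the identifier property is passed in as a parameter, and the function names are hypothetical.

```python
import re

# Wikidata returns coordinates as WKT literals: "Point(longitude latitude)".
POINT_RE = re.compile(r"Point\((-?\d+(?:\.\d+)?) (-?\d+(?:\.\d+)?)\)")


def build_query(identifier_property: str) -> str:
    """SPARQL for all items holding the given external identifier."""
    return (
        "SELECT ?item ?localId ?coords WHERE {\n"
        f"  ?item wdt:{identifier_property} ?localId .\n"
        "  OPTIONAL { ?item wdt:P625 ?coords . }\n"
        "}"
    )


def index_by_local_id(response: dict) -> dict[str, dict]:
    """Turn a SPARQL JSON response into {local id: item URI and coordinates}."""
    index = {}
    for row in response["results"]["bindings"]:
        entry = {"item": row["item"]["value"], "lat": None, "lon": None}
        match = POINT_RE.fullmatch(row.get("coords", {}).get("value", ""))
        if match:
            entry["lon"] = float(match.group(1))
            entry["lat"] = float(match.group(2))
        index[row["localId"]["value"]] = entry
    return index
```

The query here only reads coordinates stored directly on the item; as noted above, real entities often need a type-specific traversal to reach useful coordinates.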


> CADA opinions to support writing requests

> We’re hoping to explore some similar ideas in this area – though more aimed at preventing misuse. I’ve got a code spike where I had a go at this using simple regex to match terms in the message body, so might be something we can pull out of your implementation when we get stuck in (https://github.com/mysociety/alaveteli/pull/7963).
I've used meilisearch for a few things, including this. It's a typo-tolerant search engine which you could possibly use to return json docs including the tags you try to assign in your PR. I'm not entirely sure how well it would work for your use case, but maybe worth exploring (assuming you're ok with adding an external tool). It's an extremely efficient tool, so we actually call it from javascript as the user types. I take the query text, trim off "useless" parts (like the template we provide the user with, and dead words, like "the", "this"...) and see what comes back. I suspect you could load a mapping of "query text -> tag" in meilisearch to get what you're doing, with typo tolerance built-in.
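
The trimming step can be sketched like this in Python (our real version runs in JavaScript in the browser, and this is not that code). The template phrases and the stop-word list are made-up examples, and the call to the search engine itself is left out.

```python
import re

# Hypothetical template text and stop-word list, for illustration only.
TEMPLATE_PHRASES = [
    "madame, monsieur,",
    "je vous prie de bien vouloir me communiquer",
]
STOP_WORDS = {"le", "la", "les", "de", "des", "du", "un", "une", "et", "the", "this"}


def clean_query(draft: str) -> str:
    """Strip template text and stop words, keeping the words worth searching."""
    text = draft.lower()
    for phrase in TEMPLATE_PHRASES:
        text = text.replace(phrase, " ")
    words = re.findall(r"\w+", text)
    return " ".join(word for word in words if word not in STOP_WORDS)
```

The cleaned string is what would be sent to the search engine as the user types.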


> CADA appeals right inside madada.fr

> Nice! I like the simpler approach of just having some nudges and a nice downloadable package for the citizens to send off to the regulator. How are you initiating this workflow?
For now the workflow is under a url that we only share with some advanced users whom we trust not to spam, but we plan on having a big "Appeal this decision" button on the request page when it's over the 1 month delay.


> I’m intrigued at the summary for batch requests and what that looks like compared to the already available CSV export available from the pro dashboard for batch requests (https://www.mysociety.org/2020/05/04/exporting-data-from-your-batch-foi-requests/). This feels like something that might be quite generically useful.

We followed the requirements in French law, specifically:

- list each public body against which the appeal is made (which might exclude bodies that replied to the request). We show the full list of bodies from the batch and let the user pick and choose what to include. The list defaults to what should make sense (successful requests excluded, etc...)

- for each of them, include the public body name, contact email used, date of the request, date of rejection if any.

So it is a bit different from the CSV export you mention. I am not sure how useful this would be generically, as the law is likely to differ elsewhere, but happy to hear otherwise!
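
As a rough picture of the summary table (a Python sketch under assumptions, not our Rails code): the four columns follow the list above, while the `BatchRequest` structure, the status values and the CSV output format are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class BatchRequest:
    body_name: str
    contact_email: str
    requested_on: date
    status: str  # hypothetical values: "rejected", "waiting", "successful"
    rejected_on: Optional[date] = None


def default_selection(requests: list[BatchRequest]) -> list[BatchRequest]:
    """Pre-select every request that was not successful; the user can adjust."""
    return [request for request in requests if request.status != "successful"]


def summary_csv(requests: list[BatchRequest]) -> str:
    """One row per public body included in the appeal."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Public body", "Contact email", "Request date", "Rejection date"])
    for request in requests:
        writer.writerow([
            request.body_name,
            request.contact_email,
            request.requested_on.isoformat(),
            request.rejected_on.isoformat() if request.rejected_on else "",
        ])
    return out.getvalue()
```

Requests that were simply ignored keep an empty rejection date, since there is no rejection to report.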


> Email filters on incoming messages

> Auto-classifying auto-acknowledgements sounds like it could be pulled into Alaveteli core! We do have a bit of a vision for this (https://github.com/mysociety/alaveteli/issues/2045#issuecomment-1057844092), but I think the implementation you’ve described would be a good first step to pull in as we don’t currently have a plan to return to this soon.

Our code is pretty basic, you can see patches on the IncomingMessage model here: https://gitlab.com/madada-team/dada-core/-/blob/d9a29311d6b5/dada-france-theme/lib/model_patches.rb#L197

Most of it relies on a cron job that looks at recent incoming messages and tags them based on headers and content. The result is very similar to what you describe in that ticket. For instance, we fold delivery receipts from ministries as they appear to use a standard wording, and mark them with a "Ma Dada identified a delivery receipt in this message" message like here: https://madada.fr/demande/listes_des_rapports_dinspection#incoming-7874. My guess is that making this generic would require a bunch of knobs to adjust for each context (I just noticed that the same body seems to have modified its template, so our code might already be broken :/ ). I think making this available to site admins with a set of configurable rules would be easiest to manage.


> Improved search results

> We’re really struggling with the search results too. Feels like there might be some generic improvements possible here based on what you’ve done?

Most probably yes. Here's what we did for Public Bodies for instance: https://gitlab.com/madada-team/dada-core/-/blob/d9a29311d6b5/dada-france-theme/lib/model_patches.rb#L729 I think the weight adjustment can be generalised a bit and moved into core alaveteli quite easily.

For something a bit more ambitious, I pondered replacing xapian with meilisearch. It was hard to find good docs about xapian, and the code seems to be pretty much abandoned, but this is definitely a much bigger endeavour.

I'd be happy to discuss any of the above and maybe help with moving some of the code into core if it makes sense.

Laurent
