We've just found out about a weird problem with our site: email replies sent to *batch* requests (and only batch ones) were not processed correctly for almost two months. Some of our users suddenly received a notification with a backlog of dozens of updates after I restarted the server following an OS and Alaveteli upgrade. Since the restart seems to have cleared the backlog, it's hard to figure out what went wrong.
I've gathered some notes about the incident in this ticket, if you're curious: https://gitlab.com/madada-team/dada-core/-/issues/106
I have two questions arising out of this:
- How and why are replies to batch requests processed differently from non-batch ones? (Gareth/Graeme, this one is probably for you)
- Do you have any tips for monitoring this part of the process, so that you get notified when something isn't running as it should? We have an external service (freshping.io, mostly because they have a free tier) which monitors the website and emails me if it stops working. I'd like to find something similar for the email server and, ideally, for the various daemons that conspire to make Alaveteli function properly.
Thanks for any suggestions!
Laurent for team Madada.fr
> how/why are replies to batch requests processed differently from non-batch ones? (Gareth/Graeme, this one is probably for you)
The general problem with batch requests has been that once one is sent, we usually see a huge influx of auto-responses, which can create enough load to cause availability issues. This probably isn't strictly true for WDTK these days, as we've made significant hardware upgrades, but it's probably still an issue if you're hosting on a more modest VPS.
So, instead of ingesting replies to batch requests immediately, we added the POP polling setup to try to distribute the load a little more evenly. This is automatically enabled for pro users: even single requests made by them go through the poller system. At some point we really want to clean this up and have just one way of handling incoming mail. Perhaps that means moving all users over to the poller, or an alternative approach might be to use background jobs to process incoming mail. Given our capacity constraints, there's no particular timeline for either.
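To make the load-spreading idea concrete, here is a generic sketch of throttled polling in shell. It is not Alaveteli's poller (which fetches over POP); it assumes a hypothetical spool directory holding one message per file, a hypothetical `HANDLER` command that ingests one message from stdin, and a `BATCH_SIZE` cap per run. Run from cron every few minutes, a burst of auto-responses would then be worked through in bounded chunks rather than all at once.

```sh
#!/bin/sh
# Generic illustration of throttled polling, NOT Alaveteli's actual poller.
# ASSUMPTIONS (all hypothetical): SPOOL holds one message per file,
# HANDLER reads a single message on stdin, BATCH_SIZE caps work per run.
SPOOL="${SPOOL:-/var/spool/foi-incoming}"
HANDLER="${HANDLER:-}"
BATCH_SIZE="${BATCH_SIZE:-20}"

process_batch() {
  if [ -z "$HANDLER" ]; then
    echo "HANDLER is not set" >&2
    return 1
  fi
  count=0
  for msg in "$SPOOL"/*; do
    [ -f "$msg" ] || continue               # skip the literal glob when SPOOL is empty
    [ "$count" -ge "$BATCH_SIZE" ] && break # stop once this run's quota is used up
    # HANDLER is left unquoted so it may contain arguments, e.g. "mycmd --flag".
    # A message is deleted only after the handler reports success.
    if $HANDLER < "$msg" > /dev/null; then
      rm -f "$msg"
      count=$((count + 1))
    fi
  done
  echo "$count"                             # number of messages processed this run
}

# Only run automatically when executed as the (hypothetical) process_batch.sh.
if [ "${0##*/}" = "process_batch.sh" ]; then
  process_batch
fi
```

Because a message is removed only when the handler succeeds, a failing handler leaves mail in the spool for the next run instead of losing it.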
> do you have any tips about monitoring this part of the process
We use Icinga (née Nagios) internally for most of our monitoring.
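As an illustration of what such a check could look like, here is a sketch of an Icinga/Nagios-style plugin that alerts when mail has been sitting unprocessed for too long, which might have flagged an incident like this one early. It assumes the POP mailbox the poller reads from is stored as a Maildir on a host you can inspect, and that the poller removes messages once it has processed them; the `MAILDIR` path, the 30-minute threshold and the script name are all hypothetical. Exit codes follow the plugin convention Icinga shares with Nagios (0 OK, 2 CRITICAL, 3 UNKNOWN).

```sh
#!/bin/sh
# check_stale_mail: sketch of an Icinga/Nagios-style check.
# ASSUMPTION: MAILDIR is a hypothetical path to the Maildir behind the POP
# mailbox the poller reads, and the poller deletes messages it has processed.
MAILDIR="${MAILDIR:-/var/mail/foi-poller/Maildir}"
MAX_AGE_MIN="${MAX_AGE_MIN:-30}"   # minutes a message may wait before we alert

check_stale_mail() {
  if [ ! -d "$MAILDIR/new" ]; then
    echo "UNKNOWN - $MAILDIR/new not found"
    return 3
  fi
  # Count files in new/ and cur/ last modified more than MAX_AGE_MIN minutes ago.
  stale=$(find "$MAILDIR/new" "$MAILDIR/cur" -type f -mmin +"$MAX_AGE_MIN" 2>/dev/null \
    | wc -l | tr -d ' ')
  if [ "$stale" -gt 0 ]; then
    echo "CRITICAL - $stale message(s) older than ${MAX_AGE_MIN}m in $MAILDIR"
    return 2
  fi
  echo "OK - no messages older than ${MAX_AGE_MIN}m"
  return 0
}

# Run as a plugin only when executed as the (hypothetical) check_stale_mail.sh.
if [ "${0##*/}" = "check_stale_mail.sh" ]; then
  check_stale_mail
  exit $?
fi
```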
We also appear to have the following lines in our WDTK crontab that aren't provided in core:
```
# Every 10 minutes
5,15,25,35,45,55 * * * * foi /etc/init.d/foi-alert-tracks check
5,15,25,35,45,55 * * * * foi /etc/init.d/foi-poll-for-incoming check
```
It looks like these "check" lines were removed back in 2017, when we improved how we generate the daemons.
Those lines should be added to the crontab when you generate the daemons, and we do mention this in the install documentation, but perhaps it's not that clear.
I imagine you could use a monitoring service (Icinga, Monit, Sensu, etc.) to run the "check" command rather than relying on cron.
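For example, a monitoring service could run a small plugin like the following sketch, which reports whether a daemon's process is alive. It assumes the generated init script writes a pidfile; the `PIDFILE` path and the script name are hypothetical, so point them at your own setup. It uses the exit-code convention (0 OK, 2 CRITICAL) that Icinga, and Nagios before it, expect from plugins.

```sh
#!/bin/sh
# check_foi_daemon: sketch of an Icinga/Nagios-style liveness check.
# ASSUMPTION: PIDFILE is a hypothetical location; use wherever your
# generated init script actually records the daemon's pid.
PIDFILE="${PIDFILE:-/var/run/foi-poll-for-incoming.pid}"

check_foi_daemon() {
  if [ ! -r "$PIDFILE" ]; then
    echo "CRITICAL - pidfile $PIDFILE missing or unreadable"
    return 2
  fi
  pid=$(cat "$PIDFILE")
  # kill -0 sends no signal; it only tests that the process exists and that
  # we may signal it, so run this as the daemon's user or as root.
  if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
    echo "OK - pid $pid is running"
    return 0
  fi
  echo "CRITICAL - no running process for pid '$pid' from $PIDFILE"
  return 2
}

# Run as a plugin only when executed as the (hypothetical) check_foi_daemon.sh.
if [ "${0##*/}" = "check_foi_daemon.sh" ]; then
  check_foi_daemon
  exit $?
fi
```

This sketch only reports status and never restarts anything, so it can sit alongside the cron "check" lines rather than replace them.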