I say this in a personal capacity:
> I think this raises very serious questions about whether RDM can be relied upon for production use if it was down for nearly 24 hours.
Unfortunately, I think it now looks abundantly clear that it's entirely unsuitable for any sort of production use-case.
The response to incidents shouldn't depend on RDM tickets being raised (though I'm glad to hear folks are taking time out of their day to raise tickets that flag up problems, effectively doing free QA for Rail Delivery Group): someone should be paged as soon as it stops publishing messages. As David has already pointed out, that is a trivial healthcheck to put in place from a technical perspective. These are basic measures that a two-person tech start-up can get right.
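To illustrate how little is involved, here is a minimal sketch of a "page someone when the feed goes quiet" check. It assumes the feed normally publishes messages continuously, so a couple of minutes of silence is abnormal; the class name, the 120-second threshold and the `page` callback are all hypothetical stand-ins (in practice `page` would call whatever alerting or on-call system is in use).

```python
import time
from typing import Callable


class FeedWatchdog:
    """Hypothetical staleness check: pages once when no message has arrived
    within `max_silence_s` seconds, and re-arms when messages resume."""

    def __init__(self, max_silence_s: float, page: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_silence_s = max_silence_s
        self.page = page            # stand-in for a real paging integration
        self.clock = clock          # injectable so the logic can be tested
        self.last_message_at = clock()
        self.alerting = False

    def record_message(self) -> None:
        # Call from the message consumer for every message received.
        self.last_message_at = self.clock()
        self.alerting = False

    def check(self) -> bool:
        # Call periodically (e.g. every few seconds); returns True if stale.
        silence = self.clock() - self.last_message_at
        stale = silence > self.max_silence_s
        if stale and not self.alerting:
            # Page once per outage rather than on every check.
            self.alerting = True
            self.page(f"No messages for {silence:.0f}s (limit {self.max_silence_s:.0f}s)")
        return stale


if __name__ == "__main__":
    now = [0.0]
    pages = []
    dog = FeedWatchdog(max_silence_s=120, page=pages.append, clock=lambda: now[0])
    now[0] = 60
    dog.record_message()
    now[0] = 150
    print(dog.check())  # last message 90s ago, so not stale yet
    now[0] = 200
    print(dog.check())  # 140s of silence, so this pages
    print(pages)
```

The fake clock is only there to make the behaviour easy to demonstrate; a real deployment would use the default monotonic clock and run `check()` from a scheduler independent of the consumer itself, so that a hung consumer still triggers the alert.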
This particular feed should probably be considered one of the flagship data products on RDM, setting the standard for other products. Things need to change, in terms of both technical architecture and the approach to service management, if the marketplace is to be considered even remotely fit-for-purpose at its job of delivering data, rather than being an increasingly flaky single point of failure that sits in front of other systems that (for the most part) do work.
I was critical when it came to the GEMINI issues, but it has since become clearer that the technical architecture there was yet another symptom of a lot of investment being directed to some places, and not enough to the places that actually matter when it comes to delivering a reliable, fit-for-purpose technical service.
> Just to put some context on it, a direct Push Port setup is just a case of consuming messages from an ActiveMQ server. Very little stateful data there and something like a 5-10 minute expiry on messages. There are lots of strategies for high availability in this kind of scenario.
Yes; I hope RDG will now look to engage with people who can advise them on how to approach this properly (and who could probably have advised them from the start on problems such as the broken partitioning approach, and on how to engage with the community).
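As one illustration of the high-availability strategies mentioned in the quote above (a sketch, not a claim about how RDM or Push Port actually work): run two independent consumers against the ActiveMQ feed and merge their output through a deduplicator keyed on message ID. This assumes each message carries an identifier that is the same whichever consumer receives it; the class and method names are hypothetical, and the 600-second window is an assumption sized to the upper end of the quoted 5-10 minute expiry, on the reasoning that a duplicate copy should not turn up later than the message's expiry.

```python
import time
from collections import OrderedDict
from typing import Callable, Hashable


class WindowedDeduplicator:
    """Hypothetical helper: drops messages whose ID was already seen within
    `window_s` seconds, so two redundant consumers can feed one pipeline."""

    def __init__(self, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window_s = window_s
        self.clock = clock
        # message ID -> first-seen time; insertion order is oldest first
        self._seen: "OrderedDict[Hashable, float]" = OrderedDict()

    def _evict_expired(self, now: float) -> None:
        # Entries were inserted in time order, so only the front can be expired.
        while self._seen:
            seen_at = next(iter(self._seen.values()))
            if now - seen_at <= self.window_s:
                break
            self._seen.popitem(last=False)

    def accept(self, message_id: Hashable) -> bool:
        """Return True the first time an ID is seen within the window, else False."""
        now = self.clock()
        self._evict_expired(now)
        if message_id in self._seen:
            return False
        self._seen[message_id] = now
        return True


if __name__ == "__main__":
    dedup = WindowedDeduplicator(window_s=600)
    # Interleaved deliveries from two redundant consumers (hypothetical IDs).
    for source, msg_id in [("A", "id-1"), ("B", "id-1"), ("B", "id-2"), ("A", "id-2")]:
        if dedup.accept(msg_id):
            print(f"processing {msg_id} from consumer {source}")
```

Because there is very little stateful data involved, the only state here is a bounded window of recent IDs, and losing either consumer simply means the other one's copies are processed instead.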