Dear Users,
As you are probably aware, we had a number of issues on the NRDP and NROD feeds over the 2017 Christmas and New Year period.
These issues started on 21/12/2017 when we carried out our planned restarts of the servers.
The restarts were performed to pre-empt a hardware switchover planned by our data centre provider, Amazon AWS. By doing this early and in a controlled manner, we were trying to minimise the possible downtime window and ensure we had resources on hand to resolve any issues that might occur when the hardware switchover took place. We now know that the switchover planned by Amazon was to move the virtualised servers onto hardware that had been patched to protect against the Meltdown and Spectre bugs, although we were not informed of this at the time.
Soon after the restart, we detected that the AWS patching had caused the time, which is normally synchronised automatically, to drift on all of our servers. This started to affect any services that relied on time-limited tokens, for example the CIF downloads on NROD. Because of the AWS patches, it was also not possible for us to update the time manually on each server. The only way to bring the time back in sync was to reboot the virtual servers themselves. The drift then started again, but each reboot allowed a few days of normal usage. The amount of drift was inconsistent, so it was not possible for us to predict when a reboot would next be needed.
We also noticed that the same applications were consuming up to 30% more system resources than previously. This was the cause of the latency and extended catch-up times on the NRDP feeds.
Our out-of-hours on-call support teams rebuilt a number of servers over the Christmas and New Year break, deploying the software applications onto more efficient and reliable AWS instance types. We needed a period of testing and observation before we could switch the production systems to these servers.
The main NROD servers were switched during the downtime on 03/01/2018 and the NRDP servers were switched on 04/01/2018.
Since the switchovers, we have had no latency on NRDP, and there has been no time drift on the NRDP or NROD servers.
We still have some work to do on some back-end servers that may result in short interruptions as we run restarts. We’ll notify you of these through the @open_rail_feeds Twitter account. The disruption will be minimal, and there should only be a couple of occurrences as we bring the updated instances online.
I apologise for any inconvenience caused to you during this period, and thank you for your patience while we worked to resolve these complex issues.
Kind regards
David Higginbottom
Head of Support
Digital Systems Group
CACI Limited
5th Floor
8 St Paul’s Street
Leeds LS1 2LE
Follow @open_rail_feeds on Twitter for service announcements and outage information
CACI Limited. Registered in England & Wales. Registration No. 1649776. CACI House, Avonmore Road, London, W14 8TS