
[Status] [PEW] LNS: Work to help resolve recent LNS problems (Open)


Andrews & Arnold

Nov 14, 2023, 11:56:06 AM
Posted at 2023-11-14 16:50:33 GMT by Andrews & Arnold

_Started: 2023-11-14 17:00:00_

Over the past few weeks we have been carrying out planned work to
'shuffle' customers between some of the 'Witless' LNS routers at our
side. This work was carried out to apply both a software update and
configuration change. From a customer's point of view, this work causes
a line drop overnight, which is usually a very short outage in the early
hours of the morning and not much of an inconvenience. However, in
addition to this planned work we have had a few crashes over the past
week (10th at 8:30AM, 13th at 1:40PM and 14th at 2:50AM). These have
caused a few minutes' interruption to some customers during the
daytime, which is very inconvenient, and we do apologise for this.

The cause of these restarts is known to us. It relates to a low-level
processor issue that requires a complicated workaround. We are
working on a longer term software fix for this. However, one of the
configuration changes we made has exacerbated this bug and made it
more prevalent. We therefore plan to revert the configuration change
ASAP. The longer term software fix will then be applied at a later date
once it is ready.

The configuration change does need us to reboot our routers, and so
this will involve us carrying out another round of moving customers
between LNSs. This work will happen in the early hours of the
morning starting on Wednesday 15th November.


URL: https://aastatus.net/apost.cgi?incident=42577

--
AAISP Status Feed
URL: https://aastatus.net/

Andrews & Arnold

Nov 15, 2023, 3:48:05 AM
Update #1: 2023-11-15 08:43:24 GMT
> _Update 14 Nov 2023 17:01:12_ Customers on "Y.Witless" will be moved
> during the early hours of Wednesday 15th November. Y.Witless will then
> have its config change applied and be rebooted the following day.
>
> _Update 14 Nov 2023 21:20:41_ Sadly, X.Witless crashed on Tuesday
> evening (21:03), causing disruption for those customers connected
> to it. X.Witless was scheduled to have the mentioned config
> change and reboot tomorrow night. That won't be needed now, as the
> restart means it has applied the config change.
>
> _Update 15 Nov 2023 08:43:24_ All our 'Witless' LNS routers have now
> had the configuration change applied, which should hopefully reduce the
> chance of further crashes. We'll keep this post open and will post
> further updates regarding the software upgrade which we hope will
> happen early next week. _Update expected: 2023-11-17 14:00:00_

Andrews & Arnold

Nov 15, 2023, 12:16:06 PM
Update #2: 2023-11-15 17:14:02 GMT

Andrews & Arnold

Nov 20, 2023, 1:32:06 PM
Update #3: 2023-11-20 18:29:01 GMT
> _Update 20 Nov 2023 18:29:01_ Some more changes and testing of the
> software are still required before we update our routers. _Update
> expected: 2023-11-21 15:00:00_

Andrews & Arnold

Nov 27, 2023, 10:24:07 AM
Update #4: 2023-11-27 15:21:18 GMT
> _Update 27 Nov 2023 15:21:18_ A software upgrade is being applied this
> week: https://aastatus.net/42582 _Update expected: 2023-11-30 15:00:00_

Andrews & Arnold

Nov 30, 2023, 7:56:06 AM
Update #5: 2023-11-30 12:51:16 GMT
> _Update 30 Nov 2023 12:51:16_ Newer software was applied to Z.Witless
> and some lines were moved across to it. However, there was still a
> crash. Work is ongoing to resolve this, and newer software is being
> tested at the moment. _Update expected: 2023-12-01 16:00:00_

Andrews & Arnold

Dec 1, 2023, 5:48:07 AM
Update #6: 2023-12-01 10:46:25 GMT
> _Update 1 Dec 2023 10:42:51_ Z.Witless crashed again at around 10:30,
> causing an outage for about 200 customers.
>
> _Update 1 Dec 2023 10:46:25_ We do apologise for these recent
> disconnects and we are fully aware how frustrating and disruptive
> these problems have been for those customers affected. Our developers
> have been constantly working on the problem these past few weeks, and
> we are discussing what next steps to take. _Update expected: 2023-12-01

Andrews & Arnold

Dec 1, 2023, 5:56:06 AM
Update #7: 2023-12-01 10:53:23 GMT
> _Update 1 Dec 2023 10:53:23_ Z.Witless has crashed again, dropping
> connections for around 160 customers. We're taking Z.Witless out of
> service, and customers will see their connections routed by X.Witless
> or Y.Witless. _Update expected: 2023-12-01 16:00:00_

Andrews & Arnold

Dec 6, 2023, 7:31:40 AM
Update #8: 2023-12-06 12:23:32 GMT
> _Update 6 Dec 2023 12:23:32_ We've been testing newer software this
> week in the test lab. _Update expected: 2023-12-11 16:00:00_

Andrews & Arnold

Dec 13, 2023, 7:08:08 AM
Update #9: 2023-12-13 12:06:56 GMT
> _Update 13 Dec 2023 12:06:56_ It's been 12 days since the last
> incident, and we are running slightly older software which is more stable.
> Unless the situation changes we will not make any changes to our live
> LNSs until the new year. Meanwhile, our efforts are focused on the root
> cause investigation and we are performing continuous tests in the lab. _Update
> expected: 2024-01-03 11:00:00_

Andrews & Arnold

Dec 14, 2023, 10:40:06 AM
Update #10: 2023-12-14 15:33:19 GMT
| _Update 13 Dec 2023 12:06:56_ [As of 13th December] It's been 12 days

Andrews & Arnold

Jan 2, 2024, 6:56:06 AM
Update #11: 2024-01-02 11:52:46 GMT
> _Update 2 Jan 2024 11:52:46_ Good progress has been made in the
> investigation and fix for this problem, including weeks of testing in
> the FireBrick test lab. We will start to load new software on to our
> routers. A separate Planned Work post will be created for this work. _Update
> expected: 2024-01-08 12:00:00_

Andrews & Arnold

Jan 2, 2024, 8:08:06 AM
Update #12: 2024-01-02 13:03:26 GMT
| routers. A separate Planned Work post has been created for this work:
| https://aastatus.net/42593 _Update expected: 2024-01-08 12:00:00_

Andrews & Arnold

Jan 11, 2024, 4:48:05 AM
Update #13: 2024-01-11 09:44:22 GMT
> _Update 11 Jan 2024 09:44:22_ Update as of January 11th. The good news
> is that the fix for the original hardware lockup has been applied to
> two of our three 'Witless' LNSs and we've not seen the same lockup in
> either our test lab or the upgraded Witless LNSs. However, Z.Witless
> has had a couple of additional crashes which have not been seen on our
> other units. As a result of this we will be replacing Z.Witless with
> new hardware as a matter of urgency. _Update expected: 2024-01-12
> 15:00:00_

Andrews & Arnold

Jan 12, 2024, 12:16:05 PM
Update #14: 2024-01-12 17:12:26 GMT
> _Update 12 Jan 2024 17:09:37_
>
> An update of where we are (Friday 12th January).
>
> Some customers have had interruption to their service this week as we
> have seen a number of crashes on both Z.Witless and Y.Witless.
>
> Today we replaced the hardware of Z.Witless.
>
> Our developers have been working on investigating each crash we have.
> We have been saying in recent updates that progress had been made on
> the crashes we have seen, and this week we applied the software update
> to two of our three 'Witless' LNSs. In our test lab we have never seen
> this updated software crash during 3 weeks of testing. However, we have
> had crashes this week since applying the updated software.
>
> Usually with a crash, our developers are sent a crashlog with details
> specifying exactly where in the code the crash happened. However, the
> crashes that have been affecting us are different in that the hardware
> locks up and restarts. With this type of crash we have less forensic
> information to work with, which makes getting to the bottom of the
> problem that much harder.
>
> We are still working hard to resolve this. We have various avenues of
> investigation to take, and during the next week we will be planning
> more overnight work as well as datacentre trips.
>
> We know how disruptive this has been for those customers affected, and
> we are doing all we can to work towards a stable service for everyone.
>
> _Update expected: 2024-01-15 13:00:00_

Andrews & Arnold

Jan 12, 2024, 10:00:18 PM
Update #15: 2024-01-13 02:58:17 GMT
> _Update 13 Jan 2024 02:58:17_ Y.Witless has very few customers
> connected to it, and will be rebooted at 3AM on Sat 13 Jan. _Update

Andrews & Arnold

Jan 19, 2024, 10:56:05 AM
Update #16: 2024-01-19 15:55:30 GMT
> _Resolution_ This incident will be closed as we have posted a new
> update/summary regarding this problem: https://aastatus.net/42608