Instability of N4L provided Switches this morning (23/05/2024)


d.keen...@gc.ac.nz

May 22, 2024, 7:30:18 PM
to Techies for schools
Hello All,

It would appear there is instability with the Ruckus SmartZone controller for our site, which we noticed when one of the core network switches decided to reboot itself during our Numeracy & Literacy exams.

Is anyone else having issues this morning, including a large number of APs/switches being reported as offline?

Regards,

David Keenleyside, BSc CS & IS, CTech

ITP Associate

EFF Member

ICT Technician

Glenfield College

PO Box 40176 (Kaipatiki Rd)

Glenfield, Auckland City 0629


Ph:       +64 9 444 9066 ext 677

DDI: +64 9 441 9779

Email:    d.keen...@gc.ac.nz

https://itp.nz/CTech/NZ160799

https://www.linkedin.com/in/david-keenleyside-626871/

The Three O’s of Backup: Online, Offline, Off-site.

The Three RA’s of Cloud: Run Anywhere, Run Anytime, Run Agnostic.


Sue Way

May 23, 2024, 4:28:55 PM
to Techies for schools
Oh I feel for you.

This is one of my huge concerns about going N4L for Networking..

Good luck. And at such a critical time.

Sue Way
Wellington Girls' College



Mark Edwards

May 23, 2024, 4:44:29 PM
to Techies for schools
Hi David,

I had a few sites yesterday displaying weird behaviour, but it seems to be resolved now; hopefully N4L found the issue for you?

Simon Wright

May 23, 2024, 9:10:17 PM
to techies-f...@googlegroups.com
Not today...

We had an issue a couple of weeks back where an unscheduled power outage took out our core switches (even though they are on a UPS). When they were powered back up we had all sorts of connectivity issues, namely a lot of the APs got the wrong IP (they got a staff VLAN IP, as the port on the switch saw the auth as coming from a client and not the AP).
A lot of them also ended up on the backup controller. Needless to say, the N4L helpdesk worked through it and got them all back to normal.


Regards,

Simon Wright


--
You received this message because you are subscribed to the Google Groups "Techies for schools" group.
To unsubscribe from this group and stop receiving emails from it, send an email to techies-for-sch...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/techies-for-schools/5e856d91-be43-48bc-b215-2a8894b28071n%40googlegroups.com.


DISCLAIMER
This e-mail is intended for the addressee only and may contain information which is subject to legal privilege. This e-mail message and accompanying data may contain information that is confidential and subject to privilege. Its contents are not necessarily the official view Otago Boys’ High School or communication of the Otago Boys’ High School. If you are not the intended recipient you must not use, disclose, copy or distribute this e-mail or any information in, or attached to it. If you have received this e-mail in error, please contact the sender immediately or return the original message to Otago Boys’ High School by e-mail, and destroy any copies. Otago Boys’ High School does not accept any liability for changes made to this e-mail or attachments after sending.

Ben Green

May 23, 2024, 9:10:24 PM
to techies-f...@googlegroups.com
All is perfectly normal for us this morning (on a Chch vSZ-HS controller).

- Ben.

Christchurch Boys' High School
phone: +64 3 348 5003
address: 71 Straven Road, Riccarton, Christchurch 8014
postal: PO Box 8157, Riccarton, Christchurch 8440
web: www.cbhs.school.nz

Marlon Yu

May 23, 2024, 9:10:32 PM
to techies-f...@googlegroups.com

Hi David,

 

We had the same problem of APs appearing offline before. Apparently, according to N4L, that happens when the AP is on a backup/secondary controller.

 

Have not (*knock on wood*) experienced a switch rebooting yet.

 

Marlon


*** RANGITOTO COLLEGE EMAIL DISCLAIMER ***
The contents of this email and any attachments are confidential and may be legally privileged. If you are not the intended recipient please advise the sender immediately and delete the email and attachments. Any use, dissemination, reproduction or distribution of this email and any attachments by anyone other than the intended recipient is prohibited.
*** RANGITOTO COLLEGE EMAIL DISCLAIMER ***

Leigh Cranefield

May 23, 2024, 9:10:32 PM
to techies-f...@googlegroups.com
I've had N4L looking into this for us since last term.  We have 29 APs and at one stage only 6 were showing as online in Zone Director, despite being physically online.  N4L got them showing but then they disappeared again.  They're all showing this morning but I've not been told why it's happening.  We haven't had our switches replaced yet......

Thankfully being primary, we don't have the pressure of exams but very frustrating all the same.

Leigh Cranefield

IT Administrator
Waterloo School
Hardy St, Lower Hutt    5011
04 939 2055   |   027 240 2006


d.keen...@gc.ac.nz

May 23, 2024, 9:34:45 PM
to Techies for schools
We are currently looking at an RMA for a power unit for one of the Core switches, as it is giving a fault report; however, it really shouldn't be the cause of the reset, as these switches have redundancy to avoid this issue (supposedly.  Perhaps it's half-baked.)
So, with that unit physically removed, hopefully, there won't be any major surprises. When it arrives, refitting the new one should be a relatively fast process, with all the hot-swap capabilities of the Ruckus gear.

Many units still haven't reported back to base, so that's being investigated as a separate issue; given this hit after the unscheduled reboot, it seems likely that they are using a backup controller, as indicated earlier in this group.

The real pity is that the logs on these switches are not very useful when diagnosing these faults. The last log essentially indicates that the unit reset for no reason; a proper fault sub-code would be much better.

Note:  We have all this equipment on UPS units, for protection and to get through power outages.  There's a surprising amount of them in Glenfield, especially surges; it's crazy, I used to live where there was an acknowledged poor power situation, but it wasn't as bad as this.

Regards,

David Keenleyside, BSc CS & IS, CTech



adeel....@n4l.co.nz

May 23, 2024, 11:25:37 PM
to Techies for schools
Hi David,

Thanks for raising this. Our initial investigation indicates there were two unrelated issues: a switch issue at Glenfield, which is an isolated incident, and an issue with the SmartZone cluster where SmartZone was showing switches offline.


As David mentioned in his last email, we are working with the school, with the support of Ruckus, on the root cause of this core switch outage.

We are also investigating the root cause of the switches showing offline on the Ruckus controller when they are in fact functioning.

Communication between the switches and SmartZone is supplemental, so SmartZone can be disconnected without affecting the operation of the switches at the school. (For this reason, switches also do not fail over to any backup clusters in our SmartZone solution.)
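
Since the controller's view and the switch's actual state can disagree, a site can cross-check the two independently. Below is a minimal sketch, not an N4L or Ruckus tool: the status dictionary, the device names, and the use of a TCP connection to port 22 as the "is it alive" test are all assumptions made for illustration.

```python
import socket


def tcp_reachable(host, port=22, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    Assumption: the switch accepts TCP connections on this port (SSH here).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def misreported_offline(controller_status, probe=tcp_reachable):
    """Names the controller lists as offline but which answer a direct probe."""
    return sorted(
        name
        for name, (host, status) in controller_status.items()
        if status == "offline" and probe(host)
    )


# Hypothetical controller export: name -> (management IP, reported status).
status = {
    "core-1": ("172.16.0.2", "offline"),
    "edge-7": ("172.16.0.9", "online"),
    "edge-8": ("172.16.0.10", "offline"),
}

# A stub probe stands in for the network so the example runs anywhere.
alive = {"172.16.0.2", "172.16.0.9"}
print(misreported_offline(status, probe=lambda host: host in alive))
```

In real use, drop the `probe=` argument so the default TCP check runs against the actual management addresses.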


For any connectivity issues during assessment or exam times please give us a call straight away on 0800 532 764 so we can investigate and resolve promptly.

Regards,
Adeel Soomro

d.keen...@gc.ac.nz

May 24, 2024, 3:48:53 AM
to Techies for schools
The plot has thickened. The core switch [naturally the second in a stack, for maximum excitement] decided to reboot at 16:21 or so; fortunately, not too many people were around then, and the outage timeframe was not "too" long.

So, this reveals an interesting thing about this fault.  Because one of the PSUs was already faulty, there was nothing to fail over to when this originally happened, so our "reporting green and good" PSU could be at fault; it might be something else, but Occam's razor points here.
I'll know soon enough, as the replacement module will arrive next week. After it is added, we can see if this stops.

Now, should that core go down, things will be most interesting ;)  (Just imagine the excitement felt by the staff if they suddenly have to do everything using paper, without access to cloud services; expensive cellphone tethers aside.)

For mitigation purposes, I think it might be wise if N4L looked closely at decoupling that stack so we lose only half the college.

Note:  Prior to this, I ran the network using mostly Ubiquiti Equipment, so it's quite educational to see how a professional crew with high-cost sophisticated systems does it.

David Keenleyside, BSc CS & IS, CTech



d.keen...@gc.ac.nz

May 27, 2024, 11:11:32 PM
to Techies for schools
This has now escalated to the stage where both Core switches will be replaced. Unit 1 has interesting issues, and Unit 2 certainly has a dead power bay [it probably died to save the PSU]; we checked this by swapping the power units around after the RMA unit reported the same fault.

So far, given this is knife-edge stuff for network functionality, there's certainly one thing we can say, by invoking SpaceX: "Ruckus: Excitement guaranteed."

Note: the RMA process is very fast; the equipment arrives the next morning.

Regards,

David Keenleyside, BSc CS & IS, CTech





SteveC

May 29, 2024, 1:49:01 AM
to Techies for schools
Is the world ready for lower-cost software-managed switches, or should we return to ridiculously expensive legacy switches from Cisco?  They were (can they still be purchased?) very expensive to buy and support, but in my experience, and recollections from what I've read, they kept working ...
More seriously, the question is how our industry should adopt this particular Brave New World.  (The same question that we should be asking about Generative AI.)

Steve Cosgrove
22-year veteran Cisco Network Academy instructor
Currently doing Master of Engineering (Network Engineering) at Te Herenga Waka, partly because most young graduates don't know or understand networking!

Andrew Hood

May 29, 2024, 6:52:10 PM
to Techies for schools
I could argue that the pinnacle of networking was Cisco 2501 routers (7206 where you needed more kick) and 3560G switches, but the demands of performance and features will always push things forwards.

Modern networking kit is now a mashup of ASICs and common CPUs to allow all of those Software Defined Networking features that we dreamed of when I first designed networks 28 years ago. I wish I could say that price is a proxy for reliability, but in reality, some product lines, even from the same vendor, just end up being better than others, and there is no easy way to predict that.

We should look to modern switching and routing technologies to build performance, resilience and flexibility in networks. The challenge is not so much the vendor of choice, but the knowledge and design experience to make them work. Making a network work is easy; making it work well is hard. Having a fancy GUI for the SDN gets you so far, but it really helps if you know how to SYN-ACK your SYN when it starts to play up.

Thanks,

Andrew

Pete Mundy

May 29, 2024, 8:15:55 PM
to techies-f...@googlegroups.com
> "it really helps if you know how to SYN-ACK your SYN when it starts to play up"

It does! But QUIC may put a stop to that level of visibility soon enough...

10/10 recommended viewing: Geoff Huston's (from APNIC) talk at NZNOG in Nelson a month ago:




Jeffrey B

May 31, 2024, 4:42:54 AM
to techies-f...@googlegroups.com
We don't have visibility for that but have had two schools now that have lost ports or catchment areas to the dreaded backup controller killing internet access.  Still waiting on a fix.

Jeffrey.

From: 'Marlon Yu' via Techies for schools <techies-f...@googlegroups.com>
Sent: Thursday, May 23, 2024 1:12:27 PM
To: techies-f...@googlegroups.com <techies-f...@googlegroups.com>
Subject: RE: [techies-for-schools] Instability of N4L provided Switches this morning (23/05/2024)
 

David Keenleyside

May 31, 2024, 5:37:05 AM
to techies-f...@googlegroups.com
So, now that I'm back from InterfacEXPO, I can cover what happened with the equipment replacement, which ran until quite late yesterday.

Working with the N4L engineer, the new core switches + modules from the old ones were put in position, and the stack was rebuilt.

We confirmed the RMA PSU was indeed faulty [reconfirmed today].  We also confirmed that the old switch, Unit 2, would give a green light for any PSU in bay 2, regardless of its fault condition [it'd probably be green even if it were on fire, unlike a printer on BSD].  Bay 1 is certainly faulty.

After the stack was rebuilt, we were able to continue testing.  We confirmed two of the wireless points had indeed failed over to the backup controller and were, therefore, outside of our SmartZone controller's configuration; this was fixed after confirmation, but they were basically dropping anything connecting into a black hole [APs on 172.27 vs our 172.16].
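
The "wrong range" symptom above lends itself to a quick automated check. This is a minimal sketch using Python's standard `ipaddress` module; the AP names and addresses are hypothetical, and the /16 prefix length is an assumption, since only the leading octets (172.27 vs 172.16) are given above.

```python
import ipaddress

# Assumption: the site's AP range is 172.16.0.0/16. Only "172.16" is stated
# in the thread; the /16 prefix length is a guess for illustration.
EXPECTED_NET = ipaddress.ip_network("172.16.0.0/16")


def find_stray_aps(ap_addresses, expected_net=EXPECTED_NET):
    """Return the names of APs whose address is outside the expected network."""
    stray = []
    for name, addr in ap_addresses.items():
        if ipaddress.ip_address(addr) not in expected_net:
            stray.append(name)
    return sorted(stray)


# Hypothetical inventory, e.g. exported from a DHCP lease table.
aps = {
    "ap-library": "172.16.4.21",
    "ap-gym": "172.27.0.57",
    "ap-hall": "172.16.4.30",
    "ap-block-c": "172.27.0.58",
}
print(find_stray_aps(aps))
```

Any AP listed by this check is a candidate for having landed on the backup controller's addressing and is worth confirming by hand.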

Now, as I had an opportunity to talk with one of the Ruckus team at InterfacEXPO, I could detail what had happened.  He was very interested and pointed out that, in the last few days, they had received reports concerning this particular model: the PSUs can run into problems with some UPS configurations because the UPS power profile does not match mains power, with a potential reset on failover as a result.
This is a matter of sine wave vs. square wave, and we have pure sine wave units, which are supposed to be gentler.  However, it looks like we've hit on the magic smoke version ;)  Given what was indicated, more robust PSU units should be available soon.
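
To put a rough number on "sine wave vs. square wave": an ideal square wave contains only odd harmonics, each with amplitude proportional to 1/n, so its total harmonic distortion (THD) can be computed straight from the Fourier series. The sketch below covers the idealised textbook waveform only; real UPS outputs (including "modified sine" designs) will differ, and nothing here is taken from Ruckus or UPS vendor data.

```python
import math


def square_wave_thd(max_harmonic=9999):
    """THD of an ideal square wave relative to its fundamental.

    Odd harmonic n has amplitude 1/n once the fundamental is normalised to 1.
    """
    harmonic_power = sum((1.0 / n) ** 2 for n in range(3, max_harmonic + 1, 2))
    return math.sqrt(harmonic_power)


# An ideal sine wave has no harmonics, so its THD is zero by definition.
print(f"ideal sine wave THD:   {0.0:.1%}")
print(f"ideal square wave THD: {square_wave_thd():.1%}")
# Closed form for comparison: sqrt(pi^2 / 8 - 1)
print(f"closed form:           {math.sqrt(math.pi ** 2 / 8 - 1):.1%}")
```

The gap between zero and roughly 48% distortion is the sort of mismatch with mains power that a PSU input stage may not tolerate well.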

With switching hardware, I'd tend towards Hybrid, where you have very high-spec modules that can plug into a normal chassis and are made available to Virtual machines [any vendor] running an SDN; incredible upgrade and software flexibility there.  Still potentially expensive, but far more options.

Note:  If I made a total hash of the communications from the Ruckus team, please correct it.  Switch unit 2 has what I would term a "Voodoo fault."

Regards,

David Keenleyside, BSc CS & IS, CTech


