Switch Stack issues, your thoughts?


Matt Strickland

Sep 1, 2016, 8:04:46 PM
to Techies for schools
Hi all, (long technical post sorry)

I have had ongoing switch stacking issues between two 8000GS/24 switches that are stacked, and also have an LACP trunk (2 ports) back to the core.

I had this issue about a year ago and contacted Allied Telesis, who kindly replaced the switches. I installed the new switches along with new HDMI stacking cables, SFP modules, etc.
The weird thing, though, is that a week or two later the exact same thing happened. I gave up on it last year, unstacked the switches and used separate uplinks to the core, which ran perfectly fine.

A week ago I decided to give this another shot. The rest of the school uses a near-identical setup: where we have stacked switches, the last port on the top and bottom switches is LACP trunked back to the core.
In this case, ports 1/g24 and 2/g24 form an LACP active trunk. The switches are stacked in the usual way: using the console cable, the top switch is set as stack unit 1 and the second as stack unit 2 (LEDs, terminal and GUI confirm this).

The errors I get are below, with the config further down. If anyone here has an idea, let me know; otherwise I may go back to Allied Telesis. I may try without LACP and keep the stack, or a few other things. Could end devices do this? There are no loops, storm control is active, and there is nothing unusual on individual ports.

The errors I get are:
  •  UNIT ID 2,Msg:%2SWDMAIN-F-MSTRTMREXPR: SW2P_dist_tp_change_timer_expiry : TIMER expired on MASTER ENABLE unit    ***** FATAL ERROR *****   Reporting Task: BOXS.  Software Version: 2.0.0.27 (date  21-Oct-2012 time  11:17:50)  0x14e11c  0x14b704  0x5798b8  0x3a45e8  0x3a7b14  0x6e2050  0x7acef8  0x7ad0ac    ***** END OF FATAL ERROR *****     
  •  UNIT ID 2,Msg:%Box-F-DISPATCHER-SEND-FAILED: Function BOXP_send_dispatcher_event: failed sending dispatcher event 31526156, status =     ***** FATAL ERROR *****   Reporting Task: EVGN.  Software Version: 2.0.0.27 (date  21-Oct-2012 time  11:17:50)  0x14e11c  0x14b704  0x5798b8  0x3a45e8  0x3a7b14  0x758b64  0x763d18  0x76ffa0  0x6ec558    ***** END OF FATAL ERROR *****     
  •  UNIT ID 1,Msg:%SYSLOG-A-NONPRINTABLE: Formatted message for ComponentID 1 application 0 (Box Application), message 10 (DISPATCHER-SEND-FAILED)  contains non-printable characters   message string is : $$$%Box-F-DISPATCHER-SEND-FAILED: Function BOXP_send_dispatcher_event: failed sending dispatcher event 31526156, status = ?!??  $$$    
Also in syslog, repeating endlessly:
  • <190>1 2016-09-02T11:21:29.178292+12:00 %STCK SYSL-I-UNITMSG - - - SYSL-I-UNITMSG: UNIT ID 1,Msg:%Stack-I-LINK UP: link 0 on unit-1    
  • <188>1 2016-09-02T11:21:29.180125+12:00 %Stack-W-LINK DOWN - - - DOWN: link 0 on unit-1  
  • <188>1 2016-09-02T11:21:29.181675+12:00 %STCK SYSL-W-UNITMSG - - - SYSL-W-UNITMSG: UNIT ID 1,Msg:%Stack-W-LINK DOWN: link 0 on unit-1    
  • <190>1 2016-09-02T11:21:29.183504+12:00 %Stack-I-LINK UP - - - UP: link 0 on unit-1  
  • <190>1 2016-09-02T11:21:29.192054+12:00 %STCK SYSL-I-UNITMSG - - - SYSL-I-UNITMSG: UNIT ID 1,Msg:%Stack-I-LINK UP: link 0 on unit-1    
  • <188>1 2016-09-02T11:21:29.360938+12:00 %Stack-W-LINK DOWN - - - DOWN: link 0 on unit-1  
  • <188>1 2016-09-02T11:21:29.372465+12:00 %STCK SYSL-W-UNITMSG - - - SYSL-W-UNITMSG: UNIT ID 1,Msg:%Stack-W-LINK DOWN: link 0 on unit-1    
  • <190>1 2016-09-02T11:21:29.377452+12:00 %STCK SYSL-I-UNITMSG - - - SYSL-I-UNITMSG: UNIT ID 1,Msg:%Stack-I-LINK UP: link 0 on unit-1    
  • <188>1 2016-09-02T11:21:29.379837+12:00 %STCK SYSL-W-UNITMSG - - - SYSL-W-UNITMSG: UNIT ID 1,Msg:%Stack-W-LINK DOWN: link 0 on unit-1    
  • <190>1 2016-09-02T11:21:29.382084+12:00 %STCK SYSL-I-UNITMSG - - - SYSL-I-UNITMSG: UNIT ID 1,Msg:%Stack-I-LINK UP: link 0 on unit-1  
After some time, or a reboot, everything is fine for a day, or a week, or 2 weeks.
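
To put numbers on the flapping, the syslog lines can be tallied per stack link. This is only a minimal sketch, assuming the messages keep the format shown above; the names `EVENT`, `parse_stack_events` and `count_transitions` are made up for illustration. Note that each event seems to show up in both a native and a unit-relayed form, and this counts both copies.

```python
import re
from collections import Counter
from datetime import datetime

# Matches both the native form ("... DOWN: link 0 on unit-1") and the
# unit-relayed form ("... Msg:%Stack-W-LINK DOWN: link 0 on unit-1").
EVENT = re.compile(
    r"<\d+>1 (?P<ts>\S+) .*?(?P<state>UP|DOWN): "
    r"link (?P<link>\d+) on unit-(?P<unit>\d+)"
)


def parse_stack_events(lines):
    """Return (timestamp, unit, link, state) tuples for stack link messages."""
    events = []
    for line in lines:
        match = EVENT.search(line)
        if match is None:
            continue  # not a stack link message
        events.append((
            datetime.fromisoformat(match["ts"]),
            int(match["unit"]),
            int(match["link"]),
            match["state"],
        ))
    return events


def count_transitions(events):
    """Count UP/DOWN messages per (unit, link, state)."""
    return Counter((unit, link, state) for _, unit, link, state in events)


# Four lines copied from the syslog excerpt above.
SAMPLE = [
    "<190>1 2016-09-02T11:21:29.178292+12:00 %STCK SYSL-I-UNITMSG - - - SYSL-I-UNITMSG: UNIT ID 1,Msg:%Stack-I-LINK UP: link 0 on unit-1",
    "<188>1 2016-09-02T11:21:29.180125+12:00 %Stack-W-LINK DOWN - - - DOWN: link 0 on unit-1",
    "<188>1 2016-09-02T11:21:29.181675+12:00 %STCK SYSL-W-UNITMSG - - - SYSL-W-UNITMSG: UNIT ID 1,Msg:%Stack-W-LINK DOWN: link 0 on unit-1",
    "<190>1 2016-09-02T11:21:29.183504+12:00 %Stack-I-LINK UP - - - UP: link 0 on unit-1",
]

for key, count in sorted(count_transitions(parse_stack_events(SAMPLE)).items()):
    print(key, count)
```

Run against a full syslog export, this would show whether the flaps stay on one link or move between links and units.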

General config of switches:

show stack
Unit  MAC Address        Software    Master    Uplink     Downlink   Status
----  -----------------  ----------  --------  ---------  ---------  -------
1     ec:cd:6d:d1:5a:5b  2.0.0.27    Enabled   2          2          backup
2     ec:cd:6d:d1:50:7e  2.0.0.27    Enabled   1          1          master
Topology is Ring

show stack 1
Unit:                   1
MAC address:            ec:cd:6d:d1:5a:5b
Master:                 Enabled.
Product:                AT-8000GS/24. Software: 2.0.0.27
Uplink unit:            2 Downlink unit: 2.
Status:                 backup
Active image:           image2.
Selected for next boot: image2.
Topology is Ring
Unit Num After Reset:   1

show stack 2
Unit:                   2
MAC address:            ec:cd:6d:d1:50:7e
Master:                 Enabled.
Product:                AT-8000GS/24. Software: 2.0.0.27
Uplink unit:            1 Downlink unit: 1.
Status:                 master
Active image:           image2.
Selected for next boot: image2.
Topology is Ring
Unit Num After Reset:   2

show interfaces port-channel
Gathering information...
Channel  Ports
-------  -----
ch1      Active: 1/g24,2/g24

show lacp ethernet 1/g24
1/g24 LACP parameters:
      Actor
              system priority:       1
              system mac addr:       ec:cd:6d:d1:50:7e
              port Admin key:        1000
              port Oper key:         1000
              port Oper number:      24
              port Admin priority:   1
              port Oper priority:    1
              port Admin timeout:    LONG
              port Oper timeout:     LONG
              LACP Activity:         ACTIVE
              Aggregation:           AGGREGATABLE
              synchronization:       TRUE
              collecting:            TRUE
              distributing:          TRUE
              expired:               FALSE
      Partner
              system priority:       32768
              system mac addr:       ec:cd:6d:84:95:cb
              port Admin key:        0
              port Oper key:         2
              port Oper number:      5003
              port Admin priority:   0
              port Oper priority:    32768
              port Oper timeout:     LONG
              LACP Activity:         ACTIVE
              Aggregation:           AGGREGATABLE
              synchronization:       TRUE
              collecting:            TRUE
              distributing:          TRUE
              expired:               FALSE
1/g24 LACP statistics:
      LACP Pdus sent:                100
      LACP Pdus received:            100
1/g24 LACP Protocol State:
      LACP State Machines:
              Receive FSM:           Current State
              Mux FSM:               Collecting Distributing State
              Periodic Tx FSM:       Slow Periodic State
      Control Variables:
              BEGIN:                 FALSE
              LACP_Enabled:          TRUE
              Ready_N:               FALSE
              Selected:              SELECTED
              Port_moved:            FALSE
              NNT:                   FALSE
              Port_enabled:          TRUE
      Timer counters:
              periodic tx timer:     22
              current while timer:   88
              wait while timer:      0

show lacp ethernet 2/g24
2/g24 LACP parameters:
      Actor
              system priority:       1
              system mac addr:       ec:cd:6d:d1:50:7e
              port Admin key:        1000
              port Oper key:         1000
              port Oper number:      74
              port Admin priority:   1
              port Oper priority:    1
              port Admin timeout:    LONG
              port Oper timeout:     LONG
              LACP Activity:         ACTIVE
              Aggregation:           AGGREGATABLE
              synchronization:       TRUE
              collecting:            TRUE
              distributing:          TRUE
              expired:               FALSE
      Partner
              system priority:       32768
              system mac addr:       ec:cd:6d:84:95:cb
              port Admin key:        0
              port Oper key:         2
              port Oper number:      5004
              port Admin priority:   0
              port Oper priority:    32768
              port Oper timeout:     LONG
              LACP Activity:         ACTIVE
              Aggregation:           AGGREGATABLE
              synchronization:       TRUE
              collecting:            TRUE
              distributing:          TRUE
              expired:               FALSE
2/g24 LACP statistics:
      LACP Pdus sent:                104
      LACP Pdus received:            105
2/g24 LACP Protocol State:
      LACP State Machines:
              Receive FSM:           Current State
              Mux FSM:               Collecting Distributing State
              Periodic Tx FSM:       Slow Periodic State
      Control Variables:
              BEGIN:                 FALSE
              LACP_Enabled:          TRUE
              Ready_N:               FALSE
              Selected:              SELECTED
              Port_moved:            FALSE
              NNT:                   FALSE
              Port_enabled:          TRUE
      Timer counters:
              periodic tx timer:     26
              current while timer:   63
              wait while timer:      0

On the CORE switch for this trunk:
show etherchannel detail
% Aggregator po2 (4602)
%  Mac address: ec:cd:6d:84:95:cb
%  Admin Key: 0002 - Oper Key 0002
%  Receive link count: 2 - Transmit link count: 2
%  Individual: 0 - Ready: 1
%  Partner LAG: 0x0001,ec-cd-6d-d1-50-7e
%   Link: port1.0.3 (5003) sync: 1
%   Link: port1.0.4 (5004) sync: 1
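
As a sanity check that both ends agree on the LACP pairing, the values from the outputs above can be compared side by side. This is a rough sketch only: the dictionary layout and the names `normalize_mac` and `check_lacp_pairing` are assumptions for illustration, and the values are typed in by hand from the `show lacp ethernet` and `show etherchannel detail` output rather than read from the switches.

```python
def normalize_mac(mac):
    """Lower-case a MAC and use ':' separators so both CLI styles compare equal."""
    return mac.lower().replace("-", ":")


def check_lacp_pairing(edge_ports, core):
    """Return a list of mismatches between the edge stack and the core aggregator."""
    problems = []
    core_mac = normalize_mac(core["mac"])
    core_partner_mac = normalize_mac(core["partner_mac"])
    for name, port in edge_ports.items():
        if normalize_mac(port["partner_mac"]) != core_mac:
            problems.append(f"{name}: partner MAC is not the core aggregator")
        if port["partner_oper_key"] != core["oper_key"]:
            problems.append(f"{name}: partner oper key differs from the core oper key")
        if port["partner_port"] not in core["links"]:
            problems.append(f"{name}: partner port is not a member of the core aggregator")
        if normalize_mac(port["actor_mac"]) != core_partner_mac:
            problems.append(f"{name}: actor MAC is not the partner seen by the core")
    return problems


# Values copied by hand from the CLI output above.
EDGE_PORTS = {
    "1/g24": {
        "actor_mac": "ec:cd:6d:d1:50:7e",
        "partner_mac": "ec:cd:6d:84:95:cb",
        "partner_oper_key": 2,
        "partner_port": 5003,
    },
    "2/g24": {
        "actor_mac": "ec:cd:6d:d1:50:7e",
        "partner_mac": "ec:cd:6d:84:95:cb",
        "partner_oper_key": 2,
        "partner_port": 5004,
    },
}
CORE = {
    "mac": "ec:cd:6d:84:95:cb",
    "oper_key": 2,
    "partner_mac": "ec-cd-6d-d1-50-7e",
    "links": {5003, 5004},
}

print(check_lacp_pairing(EDGE_PORTS, CORE) or "no mismatches found")
```

With the values as posted, both ports pair up with the same core aggregator, which fits with the trunk itself not being the obvious culprit.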






Andrew Godfrey

Sep 1, 2016, 8:22:36 PM
to techies-f...@googlegroups.com
I take it you have also looked at the config at the core switch end and compared the config lines with other LACP trunks?

_______________________________________
 
Andrew Godfrey  |  Network Manager




Paul Batchelor

Sep 1, 2016, 8:25:25 PM
to techies-f...@googlegroups.com
Matthew,

We are looking at this... Please call 0800 114 141 or contact sup...@alliedtelesis.net.nz.

Sounds a tad strange.

Paul Batchelor

Country Manager

Allied Telesis NZ Ltd

Mob + 64 21 660 347

tel + 64 4 566 4438 extn 7002.

Freephone 0800 114 141







Matt Strickland

Sep 1, 2016, 9:27:26 PM
to Techies for schools
Hi Andrew,

Yes, the config is identical to all the other switches, both on core and edge. It's only one block that does this, almost as if a weird earth loop issue or something outside of the switches.

I may even remove them and set them up somewhere else to see what happens.

Matt

Matt Strickland

Sep 1, 2016, 9:33:47 PM
to Techies for schools
Thanks Paul,

Yes, I plan to follow up with AT support; I'm just seeing if someone has a simple solution in case it's a silly setting.
This is a repeat of the same issue from 31/7 last year (emails with Fay).

Cheers,

Matt

Andrew Godfrey

Sep 1, 2016, 9:37:20 PM
to techies-f...@googlegroups.com

On 2 September 2016 at 13:27, Matt Strickland <ma...@zebis.co.nz> wrote:
almost as if a weird earth loop issue or something

Isn't it a fiber link !?!?

Kevin Whelan

Sep 4, 2016, 4:26:22 PM
to Techies for schools
You could try your trunk across 2 adjacent ports in the same switch; that would take a few things out of the equation.

Robert Hurley

Sep 4, 2016, 6:17:46 PM
to Techies for schools
Hi Matt,

From my previous encounters with the logs containing:

  • <190>1 2016-09-02T11:21:29.178292+12:00 %STCK SYSL-I-UNITMSG - - - SYSL-I-UNITMSG: UNIT ID 1,Msg:%Stack-I-LINK UP: link 0 on unit-1    
  • <188>1 2016-09-02T11:21:29.180125+12:00 %Stack-W-LINK DOWN - - - DOWN: link 0 on unit-1  
  • <188>1 2016-09-02T11:21:29.181675+12:00 %STCK SYSL-W-UNITMSG - - - SYSL-W-UNITMSG: UNIT ID 1,Msg:%Stack-W-LINK DOWN: link 0 on unit-1    
  • <190>1 2016-09-02T11:21:29.183504+12:00 %Stack-I-LINK UP - - - UP: link 0 on unit-1  
  • <190>1 2016-09-02T11:21:29.192054+12:00 %STCK SYSL-I-UNITMSG - - - SYSL-I-UNITMSG: UNIT ID 1,Msg:%Stack-I-LINK UP: link 0 on unit-1    
  • <188>1 2016-09-02T11:21:29.360938+12:00 %Stack-W-LINK DOWN - - - DOWN: link 0 on unit-1  
  • <188>1 2016-09-02T11:21:29.372465+12:00 %STCK SYSL-W-UNITMSG - - - SYSL-W-UNITMSG: UNIT ID 1,Msg:%Stack-W-LINK DOWN: link 0 on unit-1    
  • <190>1 2016-09-02T11:21:29.377452+12:00 %STCK SYSL-I-UNITMSG - - - SYSL-I-UNITMSG: UNIT ID 1,Msg:%Stack-I-LINK UP: link 0 on unit-1    
  • <188>1 2016-09-02T11:21:29.379837+12:00 %STCK SYSL-W-UNITMSG - - - SYSL-W-UNITMSG: UNIT ID 1,Msg:%Stack-W-LINK DOWN: link 0 on unit-1    
  • <190>1 2016-09-02T11:21:29.382084+12:00 %STCK SYSL-I-UNITMSG - - - SYSL-I-UNITMSG: UNIT ID 1,Msg:%Stack-I-LINK UP: link 0 on unit-1  

This would point towards a stack/HDMI cable issue. If you unplug the cable from the left stacking port on unit 1, do the log messages stop?
Since this is the same issue you had with the previous switches, it may be worth investigating whether your stacking cables are being impacted when the cabinet is closed. That was the main cause I found for this.

Rob Hurley | ICT Consultant
0800 SNUP HELP | M +64 27 203 9893 | E rob.h...@torqueip.co.nz

Pete Mundy

Sep 4, 2016, 8:18:35 PM
to techies-f...@googlegroups.com

+1 to Rob's comments below. I've seen that more than once myself too.

Hope you get it sorted Matt :)

Pete

Matt Strickland

Sep 4, 2016, 9:01:52 PM
to Techies for schools
Hi Robert,

For the cabinet, I always open it using the side panels (I don't swing the cabinet out), and there is nothing behind either switch, so there is plenty of room for the stack cables.
I have changed the stack cables to brand new ones out of the box, but no change :(

The message above also varies: sometimes it's link 0 on unit 1, sometimes it's link 1 on unit 2.

If I ran in chain mode (not ring) it would happen with either cable removed, again after a period of time.

Unfortunately it's the time issue: you think it's fixed, then 3 days later it's the same issue :(
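
One way to pin down the timing would be to group the flap timestamps into episodes and look at the gaps between them, to see whether they line up with anything happening outside the switches. A minimal sketch, assuming timestamps have already been pulled from syslog; the function names and the 5-minute quiet gap are arbitrary choices, and the third timestamp below is made up purely to illustrate a later episode.

```python
from datetime import datetime, timedelta


def group_into_episodes(timestamps, quiet_gap=timedelta(minutes=5)):
    """Group timestamps into episodes separated by more than quiet_gap."""
    episodes = []
    for ts in sorted(timestamps):
        if episodes and ts - episodes[-1][-1] <= quiet_gap:
            episodes[-1].append(ts)  # still part of the current burst
        else:
            episodes.append([ts])  # quiet period ended, start a new episode
    return episodes


def summarize(episodes):
    """Return (start, end, message_count) for each episode."""
    return [(ep[0], ep[-1], len(ep)) for ep in episodes]


TIMESTAMPS = [
    datetime(2016, 9, 2, 11, 21, 29, 178292),  # from the syslog excerpt
    datetime(2016, 9, 2, 11, 21, 29, 382084),  # from the syslog excerpt
    datetime(2016, 9, 5, 14, 0, 0),            # hypothetical later episode
]

for start, end, count in summarize(group_into_episodes(TIMESTAMPS)):
    print(start, end, count)
```

If the episode start times cluster around particular times of day, that would point more towards something environmental than towards the cables or config.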

Matt