Hi all, (long technical post sorry)
I have ongoing stacking issues with two stacked 8000GS/24 switches, which also have a two-port LACP trunk back to the core.
I had this issue about a year ago and contacted Allied Telesis, who kindly replaced the switches. I installed the new switches with new HDMI stacking cables, SFP modules, etc.
The weird thing, though: a week or two later, the exact same thing happened. I gave up on it last year, unstacked the switches, and used separate uplinks to the core, which ran perfectly fine.
A week ago I decided to give this another shot. The rest of the school uses a near-identical setup: where we have stacked switches, the last port on the top switch and the last port on the bottom switch are LACP-trunked back to the core.
In this case, ports 1/g24 and 2/g24 form an active LACP trunk. The switches are stacked in the usual way: using the console cable, the top switch is set as stack unit 1 and the second as stack unit 2 (the LEDs, terminal, and GUI all confirm this).
The errors I get are below, with the config further down. If anyone here has an idea, let me know; otherwise I may go back to Allied Telesis. I may try keeping the stack without LACP, or a few other things. Could end devices cause this? There are no loops, storm control is active, and nothing looks unusual on individual ports.
The errors I get are:
- UNIT ID 2,Msg:%2SWDMAIN-F-MSTRTMREXPR: SW2P_dist_tp_change_timer_expiry : TIMER expired on MASTER ENABLE unit ***** FATAL ERROR ***** Reporting Task: BOXS. Software Version: 2.0.0.27 (date 21-Oct-2012 time 11:17:50) 0x14e11c 0x14b704 0x5798b8 0x3a45e8 0x3a7b14 0x6e2050 0x7acef8 0x7ad0ac ***** END OF FATAL ERROR *****
- UNIT ID 2,Msg:%Box-F-DISPATCHER-SEND-FAILED: Function BOXP_send_dispatcher_event: failed sending dispatcher event 31526156, status = ***** FATAL ERROR ***** Reporting Task: EVGN. Software Version: 2.0.0.27 (date 21-Oct-2012 time 11:17:50) 0x14e11c 0x14b704 0x5798b8 0x3a45e8 0x3a7b14 0x758b64 0x763d18 0x76ffa0 0x6ec558 ***** END OF FATAL ERROR *****
- UNIT ID 1,Msg:%SYSLOG-A-NONPRINTABLE: Formatted message for ComponentID 1 application 0 (Box Application), message 10 (DISPATCHER-SEND-FAILED) contains non-printable characters message string is : $$$%Box-F-DISPATCHER-SEND-FAILED: Function BOXP_send_dispatcher_event: failed sending dispatcher event 31526156, status = ?!?? $$$
The syslog also fills endlessly with:
- <190>1 2016-09-02T11:21:29.178292+12:00 %STCK SYSL-I-UNITMSG - - - SYSL-I-UNITMSG: UNIT ID 1,Msg:%Stack-I-LINK UP: link 0 on unit-1
- <188>1 2016-09-02T11:21:29.180125+12:00 %Stack-W-LINK DOWN - - - DOWN: link 0 on unit-1
- <188>1 2016-09-02T11:21:29.181675+12:00 %STCK SYSL-W-UNITMSG - - - SYSL-W-UNITMSG: UNIT ID 1,Msg:%Stack-W-LINK DOWN: link 0 on unit-1
- <190>1 2016-09-02T11:21:29.183504+12:00 %Stack-I-LINK UP - - - UP: link 0 on unit-1
- <190>1 2016-09-02T11:21:29.192054+12:00 %STCK SYSL-I-UNITMSG - - - SYSL-I-UNITMSG: UNIT ID 1,Msg:%Stack-I-LINK UP: link 0 on unit-1
- <188>1 2016-09-02T11:21:29.360938+12:00 %Stack-W-LINK DOWN - - - DOWN: link 0 on unit-1
- <188>1 2016-09-02T11:21:29.372465+12:00 %STCK SYSL-W-UNITMSG - - - SYSL-W-UNITMSG: UNIT ID 1,Msg:%Stack-W-LINK DOWN: link 0 on unit-1
- <190>1 2016-09-02T11:21:29.377452+12:00 %STCK SYSL-I-UNITMSG - - - SYSL-I-UNITMSG: UNIT ID 1,Msg:%Stack-I-LINK UP: link 0 on unit-1
- <188>1 2016-09-02T11:21:29.379837+12:00 %STCK SYSL-W-UNITMSG - - - SYSL-W-UNITMSG: UNIT ID 1,Msg:%Stack-W-LINK DOWN: link 0 on unit-1
- <190>1 2016-09-02T11:21:29.382084+12:00 %STCK SYSL-I-UNITMSG - - - SYSL-I-UNITMSG: UNIT ID 1,Msg:%Stack-I-LINK UP: link 0 on unit-1
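For anyone who wants to put numbers on the flapping, here is a rough Python sketch that tallies the stack link events in an exported syslog. It assumes the lines look exactly like the excerpt above, and that each event shows up both as a direct message and as a forwarded UNITMSG copy, so it counts only the forwarded copies; that duplication is my assumption, not something I have confirmed. The function names are made up for illustration. The PRI field decodes the standard way: <190> is facility 23 (local7), severity 6 (informational), and <188> is the same facility at severity 4 (warning).

```python
import re
from collections import Counter

# Forwarded stack event as it appears in the UNITMSG lines, e.g.
# "UNIT ID 1,Msg:%Stack-W-LINK DOWN: link 0 on unit-1"
EVENT_RE = re.compile(r"Msg:%Stack-[IW]-LINK (UP|DOWN): link (\d+) on unit-(\d+)")


def decode_pri(pri):
    """Split a syslog PRI value into (facility, severity); PRI = facility * 8 + severity."""
    return divmod(pri, 8)


def count_link_events(lines):
    """Count events per (unit, link, state), using only the forwarded UNITMSG lines."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            state, link, unit = match.groups()
            counts[(int(unit), int(link), state)] += 1
    return counts


if __name__ == "__main__":
    # Three lines copied from the excerpt above; the middle one is the direct
    # (non-forwarded) form and is deliberately not counted.
    sample = [
        "<190>1 2016-09-02T11:21:29.178292+12:00 %STCK SYSL-I-UNITMSG - - - SYSL-I-UNITMSG: UNIT ID 1,Msg:%Stack-I-LINK UP: link 0 on unit-1",
        "<188>1 2016-09-02T11:21:29.180125+12:00 %Stack-W-LINK DOWN - - - DOWN: link 0 on unit-1",
        "<188>1 2016-09-02T11:21:29.181675+12:00 %STCK SYSL-W-UNITMSG - - - SYSL-W-UNITMSG: UNIT ID 1,Msg:%Stack-W-LINK DOWN: link 0 on unit-1",
    ]
    print(decode_pri(190), decode_pri(188))
    print(count_link_events(sample))
```

Running it over a full day's log and comparing the counts per unit and link should show whether the flapping is confined to one stacking link or affects both.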
After some time, or after a reboot, everything is fine again for a day, a week, or two weeks.
General config of switches:
show stack
Unit MAC Address       Software   Master   Uplink    Downlink  Status
---- ----------------- ---------- -------- --------- --------- -------
1    ec:cd:6d:d1:5a:5b 2.0.0.27   Enabled  2         2         backup
2    ec:cd:6d:d1:50:7e 2.0.0.27   Enabled  1         1         master
Topology is Ring
show stack 1
Unit: 1
MAC address: ec:cd:6d:d1:5a:5b
Master: Enabled.
Product: AT-8000GS/24. Software: 2.0.0.27
Uplink unit: 2 Downlink unit: 2.
Status: backup
Active image: image2.
Selected for next boot: image2.
Topology is Ring
Unit Num After Reset: 1
show stack 2
Unit: 2
MAC address: ec:cd:6d:d1:50:7e
Master: Enabled.
Product: AT-8000GS/24. Software: 2.0.0.27
Uplink unit: 1 Downlink unit: 1.
Status: master
Active image: image2.
Selected for next boot: image2.
Topology is Ring
Unit Num After Reset: 2
show interfaces port-channel
Gathering information...
Channel Ports
------- -----
ch1     Active: 1/g24,2/g24
show lacp ethernet 1/g24
1/g24 LACP parameters:
Actor
system priority: 1
system mac addr: ec:cd:6d:d1:50:7e
port Admin key: 1000
port Oper key: 1000
port Oper number: 24
port Admin priority: 1
port Oper priority: 1
port Admin timeout: LONG
port Oper timeout: LONG
LACP Activity: ACTIVE
Aggregation: AGGREGATABLE
synchronization: TRUE
collecting: TRUE
distributing: TRUE
expired: FALSE
Partner
system priority: 32768
system mac addr: ec:cd:6d:84:95:cb
port Admin key: 0
port Oper key: 2
port Oper number: 5003
port Admin priority: 0
port Oper priority: 32768
port Oper timeout: LONG
LACP Activity: ACTIVE
Aggregation: AGGREGATABLE
synchronization: TRUE
collecting: TRUE
distributing: TRUE
expired: FALSE
1/g24 LACP statistics:
LACP Pdus sent: 100
LACP Pdus received: 100
1/g24 LACP Protocol State:
LACP State Machines:
Receive FSM: Current State
Mux FSM: Collecting Distributing State
Periodic Tx FSM: Slow Periodic State
Control Variables:
BEGIN: FALSE
LACP_Enabled: TRUE
Ready_N: FALSE
Selected: SELECTED
Port_moved: FALSE
NNT: FALSE
Port_enabled: TRUE
Timer counters:
periodic tx timer: 22
current while timer: 88
wait while timer: 0
show lacp ethernet 2/g24
2/g24 LACP parameters:
Actor
system priority: 1
system mac addr: ec:cd:6d:d1:50:7e
port Admin key: 1000
port Oper key: 1000
port Oper number: 74
port Admin priority: 1
port Oper priority: 1
port Admin timeout: LONG
port Oper timeout: LONG
LACP Activity: ACTIVE
Aggregation: AGGREGATABLE
synchronization: TRUE
collecting: TRUE
distributing: TRUE
expired: FALSE
Partner
system priority: 32768
system mac addr: ec:cd:6d:84:95:cb
port Admin key: 0
port Oper key: 2
port Oper number: 5004
port Admin priority: 0
port Oper priority: 32768
port Oper timeout: LONG
LACP Activity: ACTIVE
Aggregation: AGGREGATABLE
synchronization: TRUE
collecting: TRUE
distributing: TRUE
expired: FALSE
2/g24 LACP statistics:
LACP Pdus sent: 104
LACP Pdus received: 105
2/g24 LACP Protocol State:
LACP State Machines:
Receive FSM: Current State
Mux FSM: Collecting Distributing State
Periodic Tx FSM: Slow Periodic State
Control Variables:
BEGIN: FALSE
LACP_Enabled: TRUE
Ready_N: FALSE
Selected: SELECTED
Port_moved: FALSE
NNT: FALSE
Port_enabled: TRUE
Timer counters:
periodic tx timer: 26
current while timer: 63
wait while timer: 0
On the CORE switch for this trunk:
show etherchannel detail
% Aggregator po2 (4602)
% Mac address: ec:cd:6d:84:95:cb
% Admin Key: 0002 - Oper Key 0002
% Receive link count: 2 - Transmit link count: 2
% Individual: 0 - Ready: 1
% Partner LAG: 0x0001,ec-cd-6d-d1-50-7e
% Link: port1.0.3 (5003) sync: 1
% Link: port1.0.4 (5004) sync: 1
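To check that both ends of the trunk agree without eyeballing the dumps each time, something like the following Python sketch could compare a `show lacp ethernet` capture against the core's aggregator details. It assumes the "key: value" layout shown above, and the function names `parse_lacp` and `check_member` are invented for this example. The core MAC and link IDs are passed in by hand (here ec:cd:6d:84:95:cb and 5003/5004, taken from the core output above).

```python
def parse_lacp(text):
    """Parse 'show lacp ethernet' output into {'Actor': {...}, 'Partner': {...}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line in ("Actor", "Partner"):
            current = sections.setdefault(line, {})
            continue
        key, sep, value = line.partition(":")
        if not sep or not value.strip():
            # A blank line or a heading such as "1/g24 LACP statistics:" ends the section.
            current = None
            continue
        if current is not None:
            current[key.strip()] = value.strip()
    return sections


def check_member(lacp, core_mac, core_port_ids):
    """Return a list of mismatches between one member port and the core aggregator."""
    problems = []
    partner = lacp.get("Partner", {})
    if partner.get("system mac addr", "").lower() != core_mac.lower():
        problems.append("partner MAC does not match the core aggregator")
    if int(partner.get("port Oper number", -1)) not in core_port_ids:
        problems.append("partner port number is not one of the core link IDs")
    for side in ("Actor", "Partner"):
        fields = lacp.get(side, {})
        for flag in ("synchronization", "collecting", "distributing"):
            if fields.get(flag) != "TRUE":
                problems.append(f"{side} {flag} is not TRUE")
        if fields.get("expired") != "FALSE":
            problems.append(f"{side} expired is not FALSE")
    return problems


if __name__ == "__main__":
    # "lacp_1g24.txt" is a hypothetical file holding the pasted CLI output for 1/g24.
    with open("lacp_1g24.txt") as handle:
        report = parse_lacp(handle.read())
    print(check_member(report, "ec:cd:6d:84:95:cb", {5003, 5004}))
```

An empty list means the member port looks consistent with the core. Capturing the output during a fault and running the same check would show whether LACP itself ever drops out of sync while the stack link is flapping.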