A200 issue with node firmware 11.5.1

Paul Carrington

Dec 8, 2022, 12:16:50 PM
to isilon-u...@googlegroups.com
We're currently engaging with Dell technical support, but given how far downhill it has gone recently, I'm not hoping for a speedy resolution.

We have a 30-node cluster (8 H500 nodes and 22 A200 nodes).
Prior to deploying 9.4.0.7 we are upgrading node firmware to 11.5.1 and have hit a snag - it appears the upgrade on some of our nodes is either corrupting the network card firmware or putting an incompatible version on the card.
The behaviour is that after a node is upgraded, the 2.83 NIC firmware goes to 2.84 and the NIC shows no carrier.

I've seen this once before with a 10.x.x node firmware that broke a couple of A200 network cards (front end).
Just wondering if anyone on the group here is on as recent a firmware with similar hardware?

For information, it's not all A200 nodes, only about 50%, and the H500s are unaffected because they have different NICs.

Paul.

Alistair Stewart

Dec 8, 2022, 1:04:30 PM
to isilon-u...@googlegroups.com
Considering how underfunded the A200 nodes are in the CPU department, I suggest you stick to using only the H500 nodes for client connections.


Paul Carrington

Dec 8, 2022, 1:24:01 PM
to isilon-u...@googlegroups.com
Our A200s actually perform quite well, as they see very few connections, and Dell did validate the configuration.
Absolutely, if this were home filestore I wouldn't even have A200s in the cluster. Once A200s are upgraded to 64GB RAM they do perform considerably better.
Typically our A200 nodes see about 10 concurrent user connections, if that, and can flood multiple 1Gig edge network ports where client machines upload data concurrently.

All that aside, it doesn't address the NICs failing to detect a carrier after the latest node firmware.

Ebert, Michael

Dec 8, 2022, 1:27:44 PM
to isilon-u...@googlegroups.com
Was a redeployment of the FW attempted?

Michael Ebert

Storage Team Supervisor

State of West Virginia

Department of Administration

Office of Technology

 

1900 Kanawha Blvd East

Building 7, Room 101

Charleston, WV  25305

 

304.352.5283 Voice

 

No trees were killed in the sending of this message, but a large number of electrons were terribly inconvenienced.

 

"Notice of Confidentiality" The information contained in this e-mail message is intended for the use of the individual or entity named above.  If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution, or copy of the communication is strictly prohibited.



Paul Carrington

Dec 8, 2022, 1:31:48 PM
to isilon-u...@googlegroups.com
The upgrade is still running. Ideally we wanted to pause and re-run the node firmware on one of the affected nodes, but Dell have said there is no way we can pause or abort a node firmware upgrade.
So I'll wait for this to finish and, initially at least, do a node reboot of one of the affected nodes. About a year ago we had 3 nodes completely lose their minds with network cards, which required 3 replacements.
Just wondering if others have had issues with the X552 NICs in their A200s.
Interestingly, the two newest A200s don't have Intel NICs but Mellanox instead, much like the H500 nodes.

This was more a heads-up / enquiry as to whether anyone else had suffered this.

Ebert, Michael

Dec 8, 2022, 1:33:07 PM
to isilon-u...@googlegroups.com
How about changing the link negotiation to/from auto?  Check with the network vendor to see if there are incompatibilities with certain NIC FW versions.



Ebert, Michael

Dec 8, 2022, 1:35:37 PM
to isilon-u...@googlegroups.com
Sorry, we have nothing but H500s and have only been through 2 upgrades since initial deployment, from 8 to 9.3.0.3 then to 9.3.0.6.  Planning one for January.





Monty Bolinger

Dec 8, 2022, 2:21:31 PM
to isilon-u...@googlegroups.com
What are you doing to check the FW on the cards?
Also, what version of OneFS were you at before the upgrade?

We have run into some network weirdness on 9.2.1, though nothing like what you are seeing, but I think the firmware version is 2.52, not a 2.8 version.

MB


Paul Carrington

Dec 8, 2022, 2:43:05 PM
to isilon-u...@googlegroups.com
Yeah, my mistake, it's 2.52 - been a long day lol.
I'm using isi_upgrade_logs --get-fw_report

The update is one of these (none are the NIC itself, so presumably the BMC or some other management interface is going bonkers):

DEeth_infinity            ePOST   02.52                   02.52                   False    10-31

DEpost_banshee            ePOST   28.15                   28.16                   True     10-31
DEbios_banshee            ePOST   37.41                   37.42                   True     10-31
DEsas_infinity            ePOST   0001.0004.0000.11192    0001.0004.0000.11193    True     10-21
DEspime_banshee           ePOST   02.02                   03.00                   True     10-31
DEssp_infinity            ePOST   02.83                   02.84                   True     10-31

These are the only updates occurring on the nodes which are affected.
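
For anyone wanting to compare reports across nodes, here is a minimal Python sketch that picks out the rows where the two version columns differ. It assumes the whitespace-separated layout pasted above and treats the third and fourth fields as the before and after versions; that reading is inferred from the rows, not from Dell documentation, and `changed_components` is a made-up helper name.

```python
# Rows copied from the report above; the column meanings are assumed.
SAMPLE_REPORT = """\
DEeth_infinity            ePOST   02.52                   02.52                   False    10-31
DEpost_banshee            ePOST   28.15                   28.16                   True     10-31
DEbios_banshee            ePOST   37.41                   37.42                   True     10-31
DEsas_infinity            ePOST   0001.0004.0000.11192    0001.0004.0000.11193    True     10-21
DEspime_banshee           ePOST   02.02                   03.00                   True     10-31
DEssp_infinity            ePOST   02.83                   02.84                   True     10-31
"""


def changed_components(report_text):
    """Return (name, old, new) for every row whose two version fields differ."""
    changed = []
    for line in report_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, _kind, old, new = fields[:4]
        if old != new:
            changed.append((name, old, new))
    return changed


if __name__ == "__main__":
    for name, old, new in changed_components(SAMPLE_REPORT):
        print(f"{name}: {old} -> {new}")
```

Run against the rows above, this lists every component except DEeth_infinity, which matches the observation that the NIC firmware itself is not being changed.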

I'm hopeful a reboot may help, as last year on a firmware package we had to reboot a second time and the nodes sorted themselves out.
I've tried restarting the flexnet daemon as that looked a likely candidate. Anyway, it's with Dell technical and I've escalated via our technical contact at Dell.
It helps that we're in the middle of procuring some replacements, which should get us accelerated tech support. 1st line support in India is massively worse than when we dealt with EMC direct back in our IQ node days.

Only another 3 node reboots and I can test my theory. We've already swapped SFPs and connections and the fault stays with the node, not the port, so it's Isilon/NIC centric.
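
To check after each reboot whether the link has come back, the interface status can be read out of `ifconfig` output. This is only a rough Python sketch, assuming FreeBSD-style output (OneFS is FreeBSD-based) where each interface block carries a `status:` line. The sample text, the interface names `ext-1` and `ext-2`, and the `link_status` helper are all made up for illustration.

```python
# Hypothetical sample in FreeBSD ifconfig style; not captured from a real node.
SAMPLE_IFCONFIG = (
    "ext-1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500\n"
    "\tether 00:00:00:00:00:01\n"
    "\tmedia: Ethernet autoselect\n"
    "\tstatus: no carrier\n"
    "ext-2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500\n"
    "\tether 00:00:00:00:00:02\n"
    "\tmedia: Ethernet autoselect\n"
    "\tstatus: active\n"
)


def link_status(ifconfig_text):
    """Map each interface name to the text of its 'status:' line."""
    status = {}
    current = None
    for line in ifconfig_text.splitlines():
        if line and not line[0].isspace():
            # An unindented line starts a new interface block, e.g. "ext-1: flags=..."
            current = line.split(":", 1)[0]
            status[current] = "unknown"
        elif current and line.strip().startswith("status:"):
            status[current] = line.split(":", 1)[1].strip()
    return status


if __name__ == "__main__":
    for name, state in link_status(SAMPLE_IFCONFIG).items():
        flag = "  <-- check this one" if state == "no carrier" else ""
        print(f"{name}: {state}{flag}")
```

On a real node the text would come from running `ifconfig` rather than from the sample string.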
