Isilon X400 | wear_life threshold exceeded


Sayeed Syed

Apr 29, 2023, 5:01:49 PM
to isilon-u...@googlegroups.com
Dear experts,

We have nine Isilon X400 nodes and no support contract.
We received the notification below and are not sure how to handle it.

Kindly advise.


ID: 9.58726
Event: 100010023
Time: 2023-04-29 12:22:17
Node: 9
Severity: Warning
Message: Drive at Internal J4/ada1 wear_life threshold exceeded: 90 (Threshold: 90). Please schedule drive replacement.

--

Thanks & Regards,

Syed Sayeed

Paul Carrington

Apr 29, 2023, 5:09:43 PM
to isilon-u...@googlegroups.com
Sounds like one of the sdcards needs replacing. Google the isi_radish command to determine the actual wear.


Paul Carrington

Apr 29, 2023, 5:13:35 PM
to isilon-u...@googlegroups.com
This should show the remaining wear life on all sdcards:

isi_for_array -s 'isi_radish -a /dev/ad[2,3,4,7] | grep -E "^Internal.*|Total Wear|Lifetime Left|Life Remain|^Carrier board.*"' | awk -F '[(]' '{ if(match($0,"Wear")) { printf "%s%d%s\n"," Life remaining: ",100 - ("0x" substr($0,match($0,"/")-2,2)),"% (SanDisk - Firmware issue causes inaccurate SMART wear data)" } else if(match($0,"Life")) { printf "%s%d%s\n"," Life remaining: ","0x" substr($2,15,2),"%" } else { printf ("%s%s%s",substr($0,0,match($0,":")-1)," ",substr($0,match($0,"/")-2,6)) } }'

Sayeed Syed

Apr 29, 2023, 5:39:49 PM
to isilon-u...@googlegroups.com
Hi Paul,

Thank you for the command. Here is the output after running it.
Does that mean nodes 1, 2, and 3 have the issue and all of their boot disks need to be replaced?


Isilon -1 J3/ada Life remaining: 96% (SanDisk - Firmware issue causes inaccurate SMART wear data)
Isilon -1 J4/ada Life remaining: 96% (SanDisk - Firmware issue causes inaccurate SMART wear data)
Isilon -2 J3/ada Life remaining: 99% (SanDisk - Firmware issue causes inaccurate SMART wear data)
Isilon -2 J4/ada Life remaining: 99% (SanDisk - Firmware issue causes inaccurate SMART wear data)
Isilon -3 J3/ada Life remaining: 94% (SanDisk - Firmware issue causes inaccurate SMART wear data)
Isilon -3 J4/ada Life remaining: 94% (SanDisk - Firmware issue causes inaccurate SMART wear data)
Isilon -4 J3/ada Life remaining: 19%
Isilon -4 J4/ada Life remaining: 16%
Isilon -5 J3/ada Life remaining: 26%
Isilon -5 J4/ada Life remaining: 24%
Isilon -6 J3/ada Life remaining: 16%
Isilon -6 J4/ada Life remaining: 13%
Isilon -7 J3/ada Life remaining: 35%
Isilon -7 J4/ada Life remaining: 32%
Isilon -8 J3/ada Life remaining: 21%
Isilon -8 J4/ada Life remaining: 19%
Isilon -9 J3/ada Life remaining: 12%
Isilon -9 J4/ada Life remaining: 10%

Ebert, Michael

Apr 29, 2023, 6:02:30 PM
to isilon-u...@googlegroups.com
I would assume the note in the output means that outdated or problematic firmware on those drives keeps SMART from reporting reliable wear data. If the nodes were bought at the same time, chances are the wear is about the same across the board.

Ebert, Michael

Apr 29, 2023, 6:04:14 PM
to isilon-u...@googlegroups.com
And the immediate issue is node 9, J4 which hit the 90% wear threshold.
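
To pick drives like that out of the "Life remaining" output, a small awk filter along these lines might help. This is only a sketch: the function name flag_low_life is made up for illustration, and it assumes the lines look exactly like the output posted earlier in the thread. A 90% wear threshold corresponds to 10% life remaining, so 10 is used as the limit here.

```shell
# flag_low_life LIMIT
# Reads "Life remaining" report lines on stdin and prints the drives whose
# remaining life is at or below LIMIT percent. (Hypothetical helper name.)
flag_low_life() {
  awk -v limit="$1" '
    /Life remaining:/ {
      pct = $0
      sub(/.*Life remaining: */, "", pct)   # drop everything before the number
      sub(/%.*/, "", pct)                   # drop the percent sign and any trailing note
      if (pct + 0 <= limit + 0) {
        drive = $0
        sub(/ *Life remaining:.*/, "", drive)  # keep only the node/drive label
        print drive ": " pct "%"
      }
    }'
}

printf '%s\n' \
  'Isilon -9 J3/ada Life remaining: 12%' \
  'Isilon -9 J4/ada Life remaining: 10%' | flag_low_life 10
# prints: Isilon -9 J4/ada: 10%
```

Lines carrying the SanDisk firmware note are parsed the same way, but their percentages should be treated with caution, as the note itself says.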


Sayeed Syed

Apr 29, 2023, 6:17:53 PM
to isilon-u...@googlegroups.com
Hi Eberts,

Thank you for the reply.
How much time do I have to replace the failed disk?
May I know the procedure for replacing this failed boot disk?

Isilon-9   J4/ada   Life remaining: 10%



Ebert, Michael

Apr 29, 2023, 6:25:21 PM
to isilon-u...@googlegroups.com
It does not appear to have failed; it has just hit 90% of its predicted life. Assuming linear wear, you would have roughly 10% of the time it has been in service left. If it has been in use for 5 years, that would be about 6 months remaining. But that doesn't mean it won't last for years, or that it can't fail tomorrow.
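
As a rough illustration of that linear estimate, the arithmetic is time in service multiplied by (100 - wear) / wear. The helper name remaining_months below is hypothetical, and the 5-year service time is only the example figure from above, not a known value for this cluster.

```shell
# remaining_months MONTHS_IN_SERVICE PERCENT_WORN
# Linear extrapolation: if PERCENT_WORN was consumed in MONTHS_IN_SERVICE,
# estimate how many months the remaining (100 - PERCENT_WORN) would last.
remaining_months() {
  awk -v months="$1" -v worn="$2" 'BEGIN {
    if (worn <= 0 || worn > 100) { print "n/a"; exit }  # no meaningful estimate
    printf "%.1f\n", months * (100 - worn) / worn
  }'
}

remaining_months 60 90   # 5 years in service, 90% worn
# prints: 6.7
```

That lands close to the rough 6-month figure. Real wear is rarely linear, so treat the result as an order-of-magnitude hint, not a deadline.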

Erik Weiman

Apr 29, 2023, 6:34:54 PM
to isilon-u...@googlegroups.com
Also, depending on their age, those are probably still the Netlist-brand 8 GB SATA SSDs that have no enclosure; each one is basically just a raw circuit board. It might be a 32 GB drive, which was used later as a replacement, but that's unlikely. For the record, there are a ton of the 8 GB ones on eBay; no idea about their wear life though.

If you do decide to get a used drive, please be sure to format it in a computer outside the cluster first, so that you don't risk the node booting from that drive instead and mirroring over your node's good boot disk. (Pretty sure that wouldn't happen, but better to play it safe.)

--
Erik Weiman 



Sayeed Syed

Apr 29, 2023, 6:43:07 PM
to isilon-u...@googlegroups.com
Hi Eberts,

Thank you for the clarification. So it's a heads-up notification for me to prepare for a future boot disk replacement.

Sayeed Syed

Apr 29, 2023, 6:47:28 PM
to isilon-u...@googlegroups.com
Hi Erik,

Thank you for the information. I will follow your advice before replacing the disk.
