IQ1920x Node Offline - Reporting Journal Problems


Jeff Y.

Jun 29, 2016, 1:47:40 PM
to Isilon Technical User Group
I have an IQ1920 node that is offline. It is a member of a three node cluster and had previously been reporting NVRAM battery charging issues. I am not sure if the NVRAM battery is failing or has already failed, but what's more concerning is what the front LED panel is reporting: "Problems detected with Journal" + "Reset".

I am not sure what the consequences of hitting "Reset" are, other than deleting the journal. Assuming that there hadn't been any writes to the node, I am guessing that action would be of little consequence. The cluster is reporting 99% full, and with one node offline it's one small leap to disaster. Since the node is offline I can't get any realtime reporting information from the system to better understand what might be happening at the hardware level. OneFS 4.5.5 has limited reporting features, so I can't get a full picture of what might actually be ailing this system.

Just thinking out loud, does anyone know if hitting "Reset" will actually bring the node back online, and if so, what are the consequences other than wiping the journal?

The system is no longer in production, but the owner would like to keep it online and maintain access to the data for now.

Thoughts?

Thanks in advance.

Jeff

erik.j...@gmail.com

Jun 29, 2016, 5:30:16 PM
to isilon-u...@googlegroups.com
If you reset the node it will format the node and lose all data left on it. 

That's a really old version of OneFS. 

If there was space I'd suggest starting a smartfail of this node and hoping it can complete. If / when it does, you can reset that node and move along with life. However, since you say it is at 99%, I'd say you are already at disaster point. You'd need to delete enough data from the 2 online nodes to have room for the 3rd node's data.
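To put a rough number on "enough data", here is a minimal back-of-the-envelope sketch. It assumes three equal-sized nodes that are evenly filled, reads the 99% figure as applying to the whole cluster, and ignores protection overhead and any free-space reserve OneFS needs. The function name and the normalised capacity of 1.0 per node are hypothetical, chosen only for illustration.

```python
def data_to_free(node_capacity, node_count, used_fraction, nodes_leaving=1):
    """Amount of data that must be deleted so the remaining nodes can hold everything.

    Simplified: assumes equal-sized, evenly filled nodes and ignores
    protection overhead and any free-space reserve the filesystem needs.
    """
    total_used = node_capacity * node_count * used_fraction
    remaining_capacity = node_capacity * (node_count - nodes_leaving)
    return max(0.0, total_used - remaining_capacity)


# Hypothetical numbers: capacity normalised to 1.0 per node, 3 nodes, 99% used.
to_free = data_to_free(node_capacity=1.0, node_count=3, used_fraction=0.99)
share_of_data = to_free / (1.0 * 3 * 0.99)
print(f"must free about {to_free:.2f} node-capacities "
      f"({share_of_data:.0%} of the stored data)")
```

Under those assumptions, roughly a third of everything stored would have to go before a smartfail could even fit, and the real figure would be higher once protection overhead is counted.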

If you are lucky you might be able to copy most / all of the data off of the cluster using SyncIQ.

There should be 2 NVRAM batteries involved here. Likely they are both dead / hold no charge so if the node went down ungracefully the journal contents would have been lost. It sounds to me like the batteries weren't able to charge and the node rebooted for some reason and now you're in a tight spot. 

--
Erik Weiman
Sent from my iPhone 6s

Jeff Y.

Jun 30, 2016, 12:50:04 PM
to Isilon Technical User Group
Thanks Erik. I figured the news would not be good. Do you have any idea why the node is offline and not pingable even though it's powered on? I can't perform any admin tasks on the node unless it's visible and responsive to the cluster.

erik.j...@gmail.com

Jun 30, 2016, 12:55:10 PM
to isilon-u...@googlegroups.com
If the front panel is asking about resetting the node, this means the boot is halted at a point early enough that there is no networking configured yet. Effectively it's in single user mode, or what a Windows user would call safe mode without networking. The only way to access this node is going to be via a serial console cable. However, getting on the node still isn't going to allow you to access the data on it or have it mount ifs, due to the journal / NVRAM error.


--
Erik Weiman
Sent from my iPhone 6s

Saker Klippsten

Jun 30, 2016, 1:06:37 PM
to isilon-u...@googlegroups.com
The node will not boot up fully into the OS without a good NVRAM card. I still have a few lying around if you are interested. Basically there was a design defect with how the battery was soldered to the board. When the battery expanded (most likely due to heat), the solder points broke free of the board. In newer versions they moved to a battery that is Velcro strapped to the board with a wire connection. In their next gen, NVRAM cards will no longer be needed.

Assuming you have hooked up a console cable to watch it boot up, you should see the error.

Once you put in the new card and the battery is charged enough, you will go through the node reset process and re-join the node to the cluster, but it will be like a newly formatted node.


-S

 

erik.j...@gmail.com

Jun 30, 2016, 1:11:50 PM
to isilon-u...@googlegroups.com
The 1920x has the newer type of NVRAM card with the 2 battery trays. The 1920i is the one you are thinking of that had the batteries soldered to the NVRAM card. 
All of the "X" series had the new card and all of the "i" series had the older style card. 


--
Erik Weiman
Sent from my iPhone 6s

Saker Klippsten

Jun 30, 2016, 2:29:01 PM
to isilon-u...@googlegroups.com
I did not see where he said 1920x in his initial post... He is also running 4.5.5. The 1920x was not supported till OneFS 5.0 (Q1 2008), if I recall. So if he has a 1920x it should have shipped with 5.x.

Jeff, if you need a card, I checked and I have 5+ lying around. Do not bother resetting that node without a new NVRAM card.

-S

Erik Weiman

Jun 30, 2016, 4:04:47 PM
to isilon-u...@googlegroups.com
True ... looking at the version he is running you are correct. 
It says a 1920x in the subject line.
I've seen some people replace the batteries on the card used in the 'i' series with some basic soldering skills assuming that the swelled battery didn't destroy the circuit board when the failure happened. 

Jeff Y.

Jun 30, 2016, 9:06:44 PM
to Isilon Technical User Group
Thanks Saker & Erik. I came onsite to view the hardware and confirmed that it's actually a 1920i, not a 1920x. I don't see any FRU NVRAM card/s, so I am assuming that it's internal.

To get a better idea of what might be happening I attached a display to the node and confirmed that it is hanging at the BSD bootloader stage, "Loading /boot/defaults/loader.conf", so it's looking more and more like an OS issue. How and why it got to that stage I can't be sure.

I am hoping that this is not going to require a re-imaging of the OS.

Jeff Y.

Jun 30, 2016, 9:37:03 PM
to Isilon Technical User Group

Here is a screenshot. Looks rather grim.

Andrew van Slageren

Jun 30, 2016, 9:41:28 PM
to isilon-u...@googlegroups.com

You'll need to plug in a serial console, as once the kernel is started (i.e. post bootloader) nothing further is displayed on the VGA output.

Saker Klippsten

Jun 30, 2016, 9:56:47 PM
to isilon-u...@googlegroups.com
Correct. Need console. I'm sure it's NVRAM. If you need some, swing by tomorrow and I'll give 'em to you.

Sent from my iPhone

Jeff Y.

Jun 30, 2016, 10:17:38 PM
to Isilon Technical User Group
Thank you Gents.

Unfortunately, I don't have a cross-over cable with me today, only a straight-through, so the console connection will have to wait. 

Saker, I'll defer to you on this one. I am tied up tomorrow but how does Monday or Tuesday of next week look for you?

Thank you!

Jeff

Jeff Y.

Jun 30, 2016, 10:19:57 PM
to Isilon Technical User Group
Lastly, any special tools required to replace the card? I am guessing the card attaches to the internal RAID card?



erik.j...@gmail.com

Jun 30, 2016, 10:53:06 PM
to isilon-u...@googlegroups.com
The card is plugged into the PCI bus and has no extra integration outside of this, if I remember correctly.
The RAID / SAS card is just a JBOD controller, as OneFS doesn't use RAID.


--
Erik Weiman
Sent from my iPhone 6s

Saker Klippsten

Jun 30, 2016, 10:53:59 PM
to isilon-u...@googlegroups.com
I might* be flying to NY, but I'll have them for you at our front desk. Just give me an hour's heads up.

-s

Saker Klippsten

Jun 30, 2016, 10:57:11 PM
to isilon-u...@googlegroups.com
No special tools. Just a Phillips head; if you have a multi-head one, that's good. You will have to take the RAID card out first, as the NVRAM card sits under it. Pretty straightforward.
Also take the time to ensure the fans are working and cleaned of any dust balls... So maybe a little vacuum.

Jeff Y.

Jul 1, 2016, 12:45:52 AM
to Isilon Technical User Group
Except that these systems appear to have 3Ware (RAID) cards installed, if I am not mistaken (my assumption is that the card is in pass-through mode), but I blitzed through the setup menu and didn't take the time to look.

Thanks

Jeff Y.

Jul 1, 2016, 12:47:46 AM
to Isilon Technical User Group
Thanks Saker. I'll let you know.

Jeff

Saker Klippsten

Jul 1, 2016, 12:51:10 AM
to isilon-u...@googlegroups.com
They are just used in HBA mode..

Sent from my iPhone

Jeff Y.

Jul 5, 2016, 7:07:07 PM
to Isilon Technical User Group
Hi Saker-

I was thinking about coming by tomorrow. Will that work for you?

Thank you.

Jeff