A200 cluster rebooting


John Murray

Oct 13, 2022, 11:07:54 AM
to isilon-u...@googlegroups.com
Hey All,
We have an A200 cluster that we moved to another location a while back. We are trying to get it up and running now, but it looks like it is constantly doing a rolling reboot.
Our CLI knowledge isn't huge, so we are looking for some assistance in that regard. We are getting blinking amber lights at the back of all four nodes, and from what we have read that may be power-supply related.
Any help would be appreciated.
Thanks

Maciej Fabisiak

Oct 13, 2022, 11:17:04 AM
to isilon-u...@googlegroups.com
Hi,
You need to connect a serial (RS-232) cable to the node and watch the boot sequence.
Then we can talk about it.
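If you are connecting from a Linux or Mac laptop, something like the following usually does the job; this assumes a USB-to-serial adapter that shows up as /dev/ttyUSB0 and the usual 115200 8N1 console settings, so adjust the device name and speed if your setup is different:

screen /dev/ttyUSB0 115200

From Windows, PuTTY with connection type "Serial", the right COM port and speed 115200 gives you the same console.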

Maciek


John Murray

Oct 13, 2022, 11:20:27 AM
to isilon-u...@googlegroups.com
Hi Maciej, 
Thanks for reaching out. I have that and can get to a # prompt, but I'm not sure where to go from there.
John 

Maciej Fabisiak

Oct 13, 2022, 11:23:48 AM
to isilon-u...@googlegroups.com
I need the entire boot sequence, from the beginning up to the # prompt.

So connect the serial cable to the node and power it on.
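To make sure nothing gets missed, start logging before you power the node on. In PuTTY that is Session > Logging with "All session output" selected (that is where a putty.log file comes from). With a recent version of screen you can do roughly the same, assuming the same /dev/ttyUSB0 device as before:

screen -L -Logfile boot.log /dev/ttyUSB0 115200

Then power the node on, let it run until it drops to the # prompt, and post the whole log.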



Anurag Chandra

Oct 13, 2022, 10:52:10 PM
to isilon-u...@googlegroups.com
The power-on sequence will be helpful.

What you see as # is the node in single-user mode.

/var/log/messages will give you some info as well.
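In single-user mode the filesystems may still be read-only or not fully mounted, so roughly (commands from memory, adjust to whatever is actually mounted on your node):

mount -a                                  # only if /var is not mounted yet
tail -n 200 /var/log/messages
grep -iE 'journal|panic|drive' /var/log/messages | tail -n 50

Anything in there about the journal or the boot drives would be useful to see.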

Thanks
Anurag 

John Murray

Oct 14, 2022, 6:50:24 AM
to isilon-u...@googlegroups.com
putty.log attached.

John Beranek - PA

Oct 14, 2022, 9:57:40 AM
to Isilon Technical User Group
That does not look like a very happy node. The system disk is coming up with file system errors, and then the node finally fails with:

mount_efs: Reporting missing logical drive 15 with guid 5bd99dc000018e2d 19749e144a0cccd5 as DOWN
mount_efs: driveConfIdentifySSD: Error mapping lnum 15 to bay
mount_efs: driveConfBayDetermineWCE: Error mapping lnum 15 to bay
The journal for this node is not valid. Please contact Isilon support for resolution of this problem.
panic @ time 1665580614.610, thread 0xfffff8017ef4f780: Assertion Failure
cpuid = 0
Panic occurred in module kernel loaded at 0xffffffff80200000:

John

John Murray

Oct 14, 2022, 10:11:30 AM
to isilon-u...@googlegroups.com
Thanks John, 
Do I have any options at this stage? There is no data on it, so an isi_format_nodes would be fine if it brings us back to a day-1 situation.
John 

Saker Klippsten

Oct 14, 2022, 10:17:39 AM
to isilon-u...@googlegroups.com
What steps were taken to move this node?
Was it properly shut down?
Did you remove the drives from the chassis?
If yes, are you sure you put the drives back into the chassis in the proper order?
Are you sure you put the nodes back into the chassis in the correct order / ordered pairs?


Checking Battery Health... Batteries good
Checking Isilon Journal integrity... DRAM and disk backup are invalid for local journal. Zero out journal superblock.
The Peer journal state is internally consistent, but the Peer journal state does not match the node and cluster state.
Journal: cluster guid f8f21e2f483c4689d95b1723b32a77a14e89, device id 2. Disk: cluster guid


-S


Paul Carrington

Oct 14, 2022, 1:20:13 PM
to isilon-u...@googlegroups.com
Also, how was the node shut down? I've seen something similar when a colleague simply used shutdown -h now instead of the correct way of going into isi config and shutting down from within it. I know you can use shutdown if you force a flush-cache command beforehand. The only other time I've seen this was on nodes where there had been a double battery failure.
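For reference, the clean sequence I have used is roughly the following (command names from memory, so double-check them against your OneFS version before relying on this):

isi_flush        # flush the write cache / coalescer first
isi config       # then, inside the configuration console:
>>> shutdown all

Going through isi config gives the journal a chance to be saved out cleanly, which looks like exactly the part that went wrong here.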

John Murray

Oct 17, 2022, 5:29:53 AM
to isilon-u...@googlegroups.com
The shutdown was done from the web console. All drives and nodes were pulled, but they were marked (numbered) and they all look to be back in the correct locations.
