Isilon hardware refresh procedure


John Beranek - PA

Mar 1, 2017, 7:49:53 AM
to Isilon Technical User Group
Hi all,

We're currently in the process of performing an Isilon hardware refresh, swapping out NL400 nodes for NL410 nodes.

On advice from our Isilon TAM at the time, we started the migration based on having an NL400 node pool and another NL410 node pool, and using SmartPools file policies to move data between the node pools.
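For anyone curious, the policy-based move looks roughly like this on the CLI. The policy and pool names here are made up, and the filter syntax is worth checking against your OneFS version:

```shell
# Hypothetical file pool policy sending all data to the NL410 pool
# (move-to-nl410 and nl410_pool are placeholder names)
isi filepool policies create move-to-nl410 \
    --begin-filter --path=/ifs --end-filter \
    --data-storage-target=nl410_pool

# Apply file pool policies by running the SmartPools job
isi job jobs start SmartPools
```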

We've now run the policy on our DR cluster, which has left the NL400 node pool "all but" empty. Here "all but" equates to 495GiB of HDD and 65GiB of SSD (metadata acceleration).

What I'm not clear about is where the procedure goes from here...

Do we just start SmartFailing the NL400 nodes, one by one? Given we only have 3 NL400 nodes, once NL400 has been failed out, I assume there won't be a workable NL400 node pool any more.

Slightly concerned that the NL400s still show as using so much data, is this just "fluff", or is there real user data on there? Will SmartFailing the NL400 nodes end up migrating any more data?
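The per-pool figures above come from the storage pool listing, along the lines of the following (exact output columns vary between OneFS versions):

```shell
# Per-node-pool capacity and usage (verbose output)
isi storagepool nodepools list --verbose

# Cluster-wide status, including per-node capacity
isi status
```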

Cheers,

John

P.S. Awaiting a response from our Isilon TAM on this question too...

Ozen

Mar 1, 2017, 1:08:52 PM
to Isilon Technical User Group
Hi John,

You shouldn't worry about the "remaining data" currently shown on those nodes - it isn't actual user data. At this point you can safely remove the older nodes (NL400s) one by one. Since the nodes are already almost empty, the removal jobs (FlexProtect) will complete quickly.

Since the newer NL410s have higher LNNs (logical node numbers), you may also want to renumber them to start from 1, which can be done via the "isi config" menu.
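For reference, the whole sequence looks roughly like this (OneFS 8.x syntax; the LNNs are examples, and it's worth confirming the exact commands against your version before running them):

```shell
# SmartFail one NL400 at a time; wait for the resulting FlexProtect
# job to finish before starting the next node
isi devices node smartfail --node-lnn 1
isi job jobs list                 # watch the FlexProtect job complete

# Once all NL400s are removed, renumber the NL410s from the config shell
isi config
>>> lnnset 4 1                    # move the node currently at LNN 4 to LNN 1
>>> commit
```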

Josh Hampton

Mar 1, 2017, 1:45:09 PM
to Isilon Technical User Group
Yup, literally your only step now is to SmartFail the nodes and then get rid of them.  Happily, hardware refreshes are pretty straightforward.  The fun part comes down the road, when you start finding interesting OneFS bugs that can't be explained other than by the cluster having started on an old code level and been upgraded over time...

John Beranek - PA

Mar 2, 2017, 8:31:18 AM
to Isilon Technical User Group
Already discovered that with:

"isi auth ads --store-sfu-mappings"

after the upgrade to 8.0 - peculiar auth mapping issues with "--sfu-support rfc2307" and "--store-sfu-mappings no".
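For anyone hitting the same thing, those settings live on the AD provider and can be checked and changed along these lines (EXAMPLE.COM is a placeholder for your domain, and the flag spellings are worth verifying against your OneFS version):

```shell
# Inspect the current SFU/RFC 2307 settings on the AD provider
isi auth ads view EXAMPLE.COM

# Enable RFC 2307 lookups and persist the discovered mappings
isi auth ads modify EXAMPLE.COM --sfu-support=rfc2307 --store-sfu-mappings=yes
```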

Thanks for the confirmation on the next steps.

John

John Beranek - PA

Mar 2, 2017, 8:32:23 AM
to Isilon Technical User Group
Thanks, we should be good to go then! I'd already considered renumbering the NL410 nodes after removing the NL400s. No gotchas with that then?

Cheers,

John

Özen Zorba

Mar 2, 2017, 9:25:41 AM
to isilon-u...@googlegroups.com
No, you can go ahead. This short article may also help you with it.

--
You received this message because you are subscribed to a topic in the Google Groups "Isilon Technical User Group" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/isilon-user-group/yDU-hKdsOGI/unsubscribe.
To unsubscribe from this group and all its topics, send an email to isilon-user-group+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Steve Bogdanski

Mar 3, 2017, 10:40:48 AM
to Isilon Technical User Group
What issues did you encounter?  I am getting ready to upgrade our production cluster (5 X410 & 4 NL410) from 7.2.1.4 to 8.0.0.4 next weekend, and would be interested in knowing about any "gotchas" that popped up.

Thanks
Steve

Özen Zorba

Mar 3, 2017, 10:49:59 AM
to isilon-u...@googlegroups.com
Hi Steve,

In my experience it was very smooth. Our situation was very similar to yours: a tech refresh of NL400s with NL410s, and an upgrade from 7.2.1.2 to 8.0.0.4. We did not face any issues at all.

We're using SMB shares with Active Directory auth.





John Beranek - PA

Mar 4, 2017, 4:17:14 AM
to Isilon Technical User Group
We were upgrading from 7.1.1.x code level, and have more complex use cases including mixed-protocol access. The latter is what tripped us up a bit.

Just after the upgrade we also had a pretty serious incident down to AD authentication services not working quite right, which required serious escalation with EMC Support.

John

Dan Pritts

Mar 5, 2017, 3:39:52 PM
to isilon-u...@googlegroups.com
John Beranek - PA wrote:
> We were upgrading from 7.1.1.x code level, and have more complex use
> cases including mixed-protocol access. The latter is what tripped us
> up a bit.
>
> Just after the upgrade we also had a pretty serious incident down to
> AD authentication services not working quite right, which required
> serious escalation with EMC Support.
What sort of mixed-protocol issues did the upgrade entail?

Did it have to do with the move from kernel to user-space NFS that
occurred in 7.2? Or something else going on?

inquiring minds need to know :)

thanks
danno
--
Dan Pritts
ICPSR Computing & Network Services
University of Michigan

John Beranek - PA

Mar 6, 2017, 10:41:01 AM
to Isilon Technical User Group

On Sunday, 5 March 2017 20:39:52 UTC, Daniel Pritts wrote:
John Beranek - PA wrote:
> We were upgrading from 7.1.1.x code level, and have more complex use
> cases including mixed-protocol access. The latter is what tripped us
> up a bit.
>
> Just after the upgrade we also had a pretty serious incident down to
> AD authentication services not working quite right, which required
> serious escalation with EMC Support.
What sort of mixed-protocol issues did the upgrade entail?

Did it have to do with the move from kernel to user-space NFS that
occurred in 7.2?  Or something else going on?

inquiring minds need to know :)

Well, it wasn't to do with the NFS changes, as our NFS clients were completely happy with the upgrade.

SMB clients were the ones impacted - they were still able to see the Isilon storage, but user/group mapping and/or ACLs weren't honoured correctly, so services/users couldn't see files they should have been able to see.

It ended up being sorted by clearing all the ID mapping databases and restarting the lsass processes. We also had to restart a number of Windows servers before they could see those files again...
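Roughly, the fix looked like this - treat it as an approximation of what EMC Support walked us through, not a recipe, and check with Support before running anything similar on your own cluster:

```shell
# Flush the cached identity mappings cluster-wide
isi auth mapping flush --all

# Restart the lsass authentication daemon on every node
# (isi_for_array runs a command on all nodes; lsass respawns automatically)
isi_for_array 'killall lsass'
```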

Cheers,

John

John Beranek - PA

Mar 10, 2017, 4:09:11 AM
to Isilon Technical User Group

On Wednesday, 1 March 2017 18:45:09 UTC, Josh Hampton wrote:
Yup, literally your only step now is to SmartFail the nodes and then get rid of them.  Happily, hardware refreshes are pretty straightforward.  The fun part comes down the road, when you start finding interesting OneFS bugs that can't be explained other than by the cluster having started on an old code level and been upgraded over time...

An update: the SmartFails are hardly instantaneous (4.5 hours per node), but they are progressing.
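Progress is visible through the job engine, along these lines (8.x syntax; the job ID is a placeholder taken from the listing):

```shell
# The SmartFail shows up as a FlexProtect job in the job list
isi job jobs list

# Drill into a specific job using its ID from the list
isi job jobs view <job-id>

# Cluster-wide view, including nodes in a smartfail state
isi status
```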

John
