Is there an isi command to move the SmartConnect Service IP to another cluster node?


SCR512

Apr 11, 2013, 10:30:34 PM
to isilon-u...@googlegroups.com
Some back story: during a very eventful rolling upgrade from OneFS 7.0.1.2 to 7.0.1.4, I realized that the SmartConnect Service IP (SCSIP) is resident on node 1 almost always. The cluster has three IQ108NL nodes; the rolling upgrade was required for us to add three additional X400 192GB/144TB nodes.

In any event, we had a strange hot VM (lots of small, random IO) causing node 1's NFSD PID to spin and spin, putting node 1 into a quasi-down state. Eventually, left unchecked, it was crashing nodes left and right. Unfortunately, while the CPU was pegged, DNS queries against the SCSIP would time out, effectively causing an outage.

So, is there an isi command available that would allow me to manually move the SCSIP to another healthy node? Also, how do people deal with this potential single point of failure? I was unaware of this and it was somewhat unsettling. Sure, during a controlled rolling reboot or node SmartFail the SCSIP will get moved over, but in these weird, not-quite-down scenarios it becomes a big point of failure.

Thanks!
Jason

Erik Weiman

Apr 11, 2013, 10:40:45 PM
to isilon-u...@googlegroups.com
The service IP always lives on the lowest logical node number that cannot be reused. If you were to SmartFail out node 1 and re-add it as node 1 again, the SCSIP would then live on what would typically be the next LNN, usually node 2.

--
Erik Weiman 

Jason Davis

Apr 11, 2013, 10:50:09 PM
to isilon-u...@googlegroups.com
Ok, that makes sense. 

Unfortunately, a SmartFail is a controlled scenario... so if I can't just manually move this IP between nodes through something sane like isi networks modify pool <blah> then is there something else I can do to mitigate this as a SPoF? 

Looking at the other SmartConnect Zones in place right now they are all hanging out on node 1... Hurray all in one basket!

Locking nodes for sure can cause a bad day :/

Peter Serocka

Apr 11, 2013, 11:16:58 PM
to Jason Davis, isilon-u...@googlegroups.com
On Apr 12, 2013, at 10:50, Jason Davis wrote:

> Ok, that makes sense.
>
> Unfortunately, a SmartFail is a controlled scenario... so if I can't just manually move this IP between nodes through something sane like isi networks modify pool <blah> then is there something else I can do to mitigate this as a SPoF?

Just make sure that "suspicious" clients (VMs...) 
will never connect to node 1...? 

Remove node 1's interfaces from the list of interfaces for the address pool.
Make a second address pool for 'safe' clients,
in which node 1's interfaces do appear.
(Hm, node 1 happens to be the backup accelerator in our cluster,
and never has any NFS/CIFS client contact... :)

Peter
Peter Serocka
CAS-MPG Partner Institute for Computational Biology (PICB)
Shanghai Institutes for Biological Sciences (SIBS)
Chinese Academy of Sciences (CAS)
320 Yue Yang Rd, Shanghai 200031, China





Jason Davis

Apr 11, 2013, 11:23:56 PM
to isilon-u...@googlegroups.com

Ah, very true... Not the most elegant solution, but it's clean :)

Peter Serocka

Apr 11, 2013, 11:47:22 PM
to Jason Davis, isilon-u...@googlegroups.com
Want more elegant? Get a used performance accelerator from eBay
for under $1000 and make it the SCSIP. Or get two for failover...

Peter

Jerry Uanino

Apr 12, 2013, 8:11:20 AM
to isilon-u...@googlegroups.com
Maybe this nugget will help?


# isi networks support sc_put_ip
isi networks support sc_put_ip: Required argument SC_IP not specified

'isi networks support sc_put_ip' options are:
  SC_IP(--sc-ip)        IP address to move/put (must be in a dynamic IP Pool)
  SC_IFACE(--sc-iface)  Destination node:interface for IP (must be a member of
                        the same dynamic IP Pool as the IP being moved)
  --help, -h            Print usage help and exit




Jason Davis

Apr 12, 2013, 8:13:50 AM
to isilon-u...@googlegroups.com

That there nugget looks like gold :)

I'll give this a shot on our DR cluster.

Thanks for the tip!

Peter Serocka

Apr 12, 2013, 9:19:13 AM
to isilon-u...@googlegroups.com
On Apr 12, 2013, at 20:13, Jason Davis wrote:

> That there nugget looks like gold :)


Same for me... but for a different reason,
as the function is to move single dynamic IPs
around to other physical nodes/interfaces supporting
the same pool, so you can implement your own re-balancing ;-)
Just tried it out, nice.

I don't dare to apply it to the SCSIP address 
(I wouldn't do THAT on our production cluster...)

Does it work for you?

Peter
 

Jason Davis

Apr 12, 2013, 10:11:14 AM
to isilon-u...@googlegroups.com
About to try ;)

Jason Davis

Apr 12, 2013, 10:23:04 AM
to isilon-u...@googlegroups.com
Hmm, maybe I am doing it wrong.

Looking at the command and its usage, this does what you mentioned before, Peter: it allows you to move IPs around in a dynamic pool.

This _doesn't_ appear to actually move the SmartConnect Service IP around however.

# isi_for_array ifconfig | grep "192.168.0.12"
isilon-1:    inet 192.168.0.12 netmask 0xffffff00 broadcast 192.168.0.255 zone 1

# isi networks support sc_put_ip --sc-ip 192.168.0.12 --sc-iface 2:10gige-agg-1
Putting 192.168.0.12 on 2,10gige-agg-1                                           FAIL 

!! sc_put_ip command failed: IP address '192.168.0.12' not found in
!! any Pool: No such file or directory
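
A side note on the check above: grep matches substrings, so a pattern of 192.168.0.12 would also match 192.168.0.120. A minimal sketch of a stricter filter, assuming `isi_for_array ifconfig` prints lines in the form shown above (node name, colon, then the ifconfig line). `ip_holder` is a hypothetical helper name, not an isi command:

```shell
# ip_holder: read `isi_for_array ifconfig` output on stdin and print the
# name of every node whose interfaces carry exactly the given IPv4 address.
# Assumed input line format (taken from the transcript above):
#   isilon-1:    inet 192.168.0.12 netmask 0xffffff00 broadcast ...
ip_holder() {
    awk -v ip="$1" '
        # field 1 = "node:", field 2 = "inet", field 3 = the address
        $2 == "inet" && $3 == ip {
            sub(/:$/, "", $1)   # strip the trailing colon from the node name
            print $1
        }'
}
```

On a cluster this would presumably be run as `isi_for_array ifconfig | ip_holder 192.168.0.12`, once before and once after an attempted move, to see whether the address actually changed nodes.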

Matt Dey

Apr 12, 2013, 11:01:44 AM
to isilon-u...@googlegroups.com
I am fairly certain that you can't move the Service IP. It wants to live on the node with the lowest device ID.

We found the cluster too slow for our needs in deciding when to move the IP off during a node reboot, and as a result some clients would have issues. I opened a case with support a while back because I wanted to move the IP off proactively before a node reboot. Originally support had me try the sc_put_ip command, but I hit the same issue: the IP wasn't part of a pool. They may have since fixed the slow IP move, but I have found the best way to move the IP is to down the interface of the node just before issuing shutdown. That forces all the IPs to move much faster.

That is the only way I know of to move the service IP but it will move back after the node comes back up.
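
The placement rule described in this thread (the service IP sits on the available node with the lowest device ID, and returns there once that node is back) can be modeled in a few lines. This is only a sketch of the behavior as reported here, not OneFS's actual implementation. The `DEVID LNN STATE` input format and the name `ssip_host` are assumptions made for illustration:

```shell
# ssip_host: read one "DEVID LNN STATE" line per node on stdin
# (hypothetical format, STATE is "up" or "down") and print the LNN of the
# "up" node with the lowest device ID, i.e. where the thread says the
# SmartConnect service IP would be expected to live.
ssip_host() {
    awk '$3 == "up"' |          # keep only nodes that are reachable
        sort -n -k 1,1 |        # order by device ID, numerically
        awk 'NR == 1 { print $2 }'   # print the LNN of the first one
}
```

Fed three healthy nodes with device IDs 1 to 3, it prints LNN 1. With node 1 marked down it prints 2, and once node 1 is up again it prints 1, which matches the move-back behavior noted above. A node 1 that was SmartFailed and re-added under a new, higher device ID would no longer be chosen, in line with the earlier description in this thread.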

Peter Serocka

Apr 12, 2013, 12:33:25 PM
to isilon-u...@googlegroups.com
Never mind, good luck next time

Peter

Peter Serocka

Apr 12, 2013, 12:43:36 PM
to isilon-u...@googlegroups.com
Yes it's a shame OneFS relies on IP fail-over
for planned reboot rather than moving IPs in advance.

But static addresses (bound to one node
on purpose!) should not be moved around anyway.

Just use dynamic pools wherever possible.

Peter