Some back story: during a very eventful rolling upgrade from OneFS 7.0.1.2 to 7.0.1.4, I realized that the SmartConnect Service IP (SCSIP) almost always resides on node 1. We have three IQ108NL nodes in the cluster, and the rolling upgrade was required before we could add three additional X400 192GB/144TB nodes.
In any event, a strange hot VM (lots of small, random IO) caused node 1's NFSD process to spin and spin, which put node 1 into a quasi-down state. Left unchecked, it eventually started crashing nodes left and right. Unfortunately, while the CPU was pegged, DNS queries against the SCSIP would time out, effectively causing a full outage.
So, is there an isi command that would let me manually move the SCSIP to another healthy node? Also, how do people deal with this potential single point of failure? I was unaware of it, and it was somewhat unsettling. Sure, during a controlled rolling reboot or a node SmartFail the SCSIP gets moved over, but in these weird, not-quite-down scenarios it becomes a big point of failure.
Thanks!
Jason