I agree that a long runtime for AutoBalance is normal; it depends on
cluster size, IO activity, and how far out of balance the cluster is.
Contrary to Peter's post, I have never seen *AutoBalance* cancelled by
disk stalls; I *have* seen MultiScan, which is simultaneous Collect and
AutoBalance, system-cancelled by disk stalls (fairly frequently,
unfortunately). In my experience AutoBalance runs steadily toward
completion and steps around disk stalls and node reboots just fine.
Now, what constitutes "completion" to AutoBalance is not necessarily
processing all LINs. AutoBalance aims to get each node's usage within 1%
of the total usage on the cluster. AutoBalance does its heavy lifting in
phase 2; very little IO occurs in phase 1. I don't know for sure, but I
infer that phase 1 lays out the plan for which files to restripe and
phase 2 actually does the restriping.
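As I read that 1% target, it amounts to a simple check: each node's utilization should land within one percentage point of the cluster-wide utilization. Here's a sketch of that reading; the node names, capacities, and the exact form of the criterion are my assumptions, not anything pulled from the job engine:

```python
def balanced(nodes, threshold_pct=1.0):
    """True if every node's utilization is within threshold_pct
    (percentage points) of the cluster-wide utilization.
    My interpretation of the AutoBalance target, not an official formula."""
    cluster_pct = (100.0 * sum(n["used_tb"] for n in nodes)
                   / sum(n["capacity_tb"] for n in nodes))
    return all(
        abs(100.0 * n["used_tb"] / n["capacity_tb"] - cluster_pct) <= threshold_pct
        for n in nodes
    )

# Invented example: cluster-wide utilization is ~60.2%, but node2 sits
# ~1.7 points below that, so this cluster would still need balancing.
nodes = [
    {"name": "node1", "capacity_tb": 100.0, "used_tb": 62.0},
    {"name": "node2", "capacity_tb": 100.0, "used_tb": 58.5},
    {"name": "node3", "capacity_tb": 100.0, "used_tb": 60.2},
]
print(balanced(nodes))  # False
```

Under this reading, AutoBalance can declare victory without ever touching most LINs, which squares with "completion" not meaning "processed everything."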
As for monitoring what's going on, the IO meters in the GUI ignore IO
from background jobs, so they won't tell you anything. When I want to
see what is actually going on, I use
isi statistics system --nodes --top
You may find that useful. Subtract the Net In (write) and Net Out (read)
from the total Disk In and Disk Out respectively, and you'll have the
internal IO generated by maintenance jobs like AutoBalance. I expect in
your case you will see primarily read activity on the X and primarily
write activity on the NL...if you're in phase 2. In phase 1, it's just a
trickle of read activity cluster-wide.
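The arithmetic is simple enough to jot down in a few lines. The readings below are invented sample numbers, not real `isi statistics` output, and the field names are just my labels for the columns:

```python
# Per-node figures as you'd note them from `isi statistics system --nodes --top`
# (MB/s; values invented). Disk In/Out is total disk write/read traffic;
# Net In/Out is client write/read traffic over the network.
readings = {
    "node1": {"disk_in": 120.0, "disk_out": 45.0, "net_in": 30.0, "net_out": 40.0},
    "node2": {"disk_in": 115.0, "disk_out": 50.0, "net_in": 25.0, "net_out": 42.0},
}

def internal_io(r):
    """Job-generated IO = total disk IO minus client-driven IO."""
    return {
        "internal_write": r["disk_in"] - r["net_in"],   # Disk In - Net In
        "internal_read": r["disk_out"] - r["net_out"],  # Disk Out - Net Out
    }

for node, r in readings.items():
    print(node, internal_io(r))
# node1 -> 90 MB/s internal writes, 5 MB/s internal reads
```

In the phase-2 scenario I described, you'd expect the internal_read number to dominate on the X nodes and internal_write to dominate on the NL nodes.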
Lastly, I'd be careful about assuming the placement of snaps is
unimportant. Depending on your workflow, you may well want your snaps on
your X pool; for our most IO-intensive application here, we most
definitely rely on fast snaps.