AutoBalance running over 1 week


scott

Apr 25, 2013, 8:02:25 PM
to isilon-u...@googlegroups.com
I noticed a problem with my '6.5.5.11' cluster last week.  Some antivirus scans were not running and FSAnalyze data was not making it to InsightIQ.  Closer examination showed AutoBalance had been running for 7+ days while other jobs (SmartPools, FSAnalyze, QuotaScan, MediaScan) were stacking up in waiting status.
 
isi jobs status -v provides some detail on the progress:
 
Progress: Processed 112719975 lins; 0 zombies and 0 errors
 
I've noticed the '112719975 lins' number incrementing through the day.  Is there a way to see how far along this job is? 
 
Background:  I have a 20TB X-Series pool at 99+% utilization set to spill to a 292TB NL-Series pool at 91% utilization.  I believe this problem (slow AutoBalance) is related to my cluster getting full.  I have a new node which should be delivered tomorrow and installed next week.  I've been working with EMC support but would appreciate any additional advice.
 
Thanks
 
-Scott

Erik Weiman

Apr 25, 2013, 9:30:46 PM
to isilon-u...@googlegroups.com
Full pools will really have a negative impact on performance of the cluster in general. 
Is the new node you are adding for the >99% pool or the NL pool?

The LIN count corresponds directly to the number of files and folders in your filesystem.
There is a hidden 'lincount' job you could run to get the total number of LINs in the system, but even comparing that total against the processed count won't really give you a 'percentage of completion'.
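
If you want a rough ballpark anyway, the arithmetic is trivial; just treat it as a loose indicator rather than a real progress bar (the total below is a made-up number, and the job may not need to touch every LIN):

# Loose indicator only, not a true "percent complete".
processed_lins = 112719975   # from the "isi jobs status -v" Progress line
total_lins = 250000000       # hypothetical total from the hidden 'lincount' job
print("roughly %.1f%% of LINs processed" % (100.0 * processed_lins / total_lins))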

Sounds to me like you should be running SmartPools instead of AutoBalance: AB is great for balancing data inside each pool but doesn't move data between pools as efficiently as SmartPools does.
On a multipool cluster I would personally run SmartPools and Collect rather than AutoBalance (unless for some reason you have nodes with uneven usage).
You might consider tuning your FilePool policies to reduce the amount of data on the X-Series pool, then run a SmartPools job to rebalance between pools and hopefully get a better spread of data.

Are you using Snapshots?
If you are, consider removing some and/or changing where they are stored. Snaps aren't usually accessed very often, so you may find it makes sense to keep them all on the NL pool.


--
All of my comments are my personal opinion and not the official word of EMC / Isilon Storage Division.


 


Peter Serocka

Apr 26, 2013, 5:10:13 AM
to isilon-u...@googlegroups.com
Scott,

AutoBalance running for more than a week on high-density NL nodes
seems to be normal, even if they're only 60% full. Is there really any
AutoBalance disk(!) activity still on the **X** pool?

With the new node arriving soon, just cancel AutoBalance now and
let the other jobs run.
I agree with Erik: let SmartPools do the job afterwards (it will also
balance disk usage onto the new node).

You seem to be lucky anyway, in the sense that AutoBalance has not yet
been silently "System Cancelled" after sporadic disk stalls (not so
sporadic on our NLs).

Peter

Cory Snavely

Apr 26, 2013, 9:57:03 AM
to isilon-u...@googlegroups.com
I agree that a long runtime for AutoBalance is normal depending on
cluster size, IO activity, and how out of balance it is.

Contrary to Peter's post, I have never seen *AutoBalance* cancelled by
disk stalls; I *have* seen MultiScan, which is simultaneous Collect and
AutoBalance, system cancelled by disk stalls (fairly frequently,
unfortunately). In my experience AutoBalance runs steadily toward
completion and steps around disk stalls and node reboots just fine.

Now, what constitutes "completion" to AutoBalance is not necessarily
processing all LINs. AutoBalance aims to get each node's usage within 1%
of the total usage on the cluster. AutoBalance does its heavy lifting in
phase 2; very little IO occurs in phase 1. I don't know for sure, but I
infer phase 1 lays out the plan for what files to restripe and phase 2
actually does it.

As for monitoring what's going on, the IO meters in the GUI ignore IO
from background jobs, so they won't tell you anything. When I want to
see what is actually going on, I use

isi statistics system --nodes --top

You may find that useful. Subtract the Net In (write) and Net Out (read)
from the total Disk In and Disk Out respectively, and you'll have the
internal IO generated by maintenance jobs like AutoBalance. I expect in
your case you will see primarily read activity on the X and primarily
write activity on the NL...if you're in phase 2. In phase 1, it's just a
trickle of read activity cluster-wide.
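
If you want to script that subtraction, here's a minimal sketch; the numbers are made up and the field names are just placeholders (not actual isi statistics column headers), so adapt it to however you pull the values out of your output:

# Sketch: per-node internal (job-generated) IO from disk vs. network rates.
# All inputs in the same unit, e.g. MB/s. Example numbers are invented.
def internal_io(disk_in, net_in, disk_out, net_out):
    internal_writes = max(disk_in - net_in, 0.0)   # disk writes not explained by client writes
    internal_reads = max(disk_out - net_out, 0.0)  # disk reads not explained by client reads
    return internal_writes, internal_reads

w, r = internal_io(disk_in=180.0, net_in=20.0, disk_out=95.0, net_out=60.0)
print("internal writes ~%.1f MB/s, internal reads ~%.1f MB/s" % (w, r))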

Lastly, I'd be careful about assuming the placement of snaps is
unimportant. Depending on your workflow, you could want your snaps on
your X pool; for our most IO-intensive application here, we most
definitely rely on fast snaps.

Chris Pepper

Apr 26, 2013, 10:27:19 AM
to isilon-u...@googlegroups.com
Peter & Cory,

Have you tuned your disk thresholds ("sysctl hw.disk_event.thresh")? Support gave us overrides for several of these to reduce the frequency of disk stall events and let vulnerable jobs complete more frequently.

Chris

Cory Snavely

Apr 26, 2013, 10:38:21 AM
to isilon-u...@googlegroups.com
Thanks for mentioning that. No, the recommendation we got was to
decrease the number of workers per node, which actually made sense
because at the time, the cluster was more dominated by IQ/EX pairs (and
so twice the number of disks per node). However those nodes are getting
retired off gradually and some other approach may be required...

So...what does that setting effectively do?


Peter Serocka

Apr 26, 2013, 11:36:50 AM
to isilon-u...@googlegroups.com
In fact I have to correct myself: it is MultiScan (= Collect +
AutoBalance) that gets system-cancelled. Just recently I wondered
whether the Collect or the AutoBalance part is so sensitive to disk
stalls, so I started a single Collect. It got system-cancelled
twice and is now still running (at 30% of LINs). So the
experiment with a pure AutoBalance has yet to be done. Good to hear
that one can be more optimistic about AutoBalance, thanks a lot!

Tracking the internal job I/O is useful indeed, and for Scott's
AutoBalance there should be no intra-pool I/O on the finished pool
and noticeable intra-pool I/O on the pool still being balanced. The
interesting question is whether the small, fast but full X pool would
lag behind the NL pool or not.

Peter

Peter Serocka

Apr 26, 2013, 11:41:16 AM
to isilon-u...@googlegroups.com

On Fri 26 Apr '13, at 16:38, Cory Snavely wrote:

> Thanks for mentioning that. No, the recommendation we got was to
> decrease the number of workers per node, which actually made sense
> because at the time, the cluster was more dominated by IQ/EX pairs
> (and so twice the number of disks per node). However those nodes are
> getting retired off gradually and some other approach may be
> required...
>
> So...what does that setting effectively do?

We had already been advised to set

hw.disk_event.thresh.slowacc_usec = 3500000

to make the system more tolerant as to what is considered a disk stall.
It is basically a timeout threshold for disk transactions,
in microseconds. 3500000 usec = 3500 ms = 3.5 s !
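
Just to show the shape of it (a sketch only; double-check the value with Support for your own cluster before changing anything):

# set at runtime -- OneFS is FreeBSD-based, so it's standard sysctl syntax:
sysctl hw.disk_event.thresh.slowacc_usec=3500000

# and the same assignment goes into /etc/mcp/override/sysctl.conf
# if it should survive reboots:
hw.disk_event.thresh.slowacc_usec=3500000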

Peter

Cory Snavely

Apr 26, 2013, 11:53:38 AM
to isilon-u...@googlegroups.com, Peter Serocka
Right, yeah - we don't run pools here, so I'm not sure how to explain
how those pools are so imbalanced, and I can't really say much about
whether running AutoBalance or a pools-related job is the best and
fastest way to clear space on the X pool...

However, I will say that for us it has been the case that, because the
stall-sensitive Collect component of MultiScan tends to get the whole
MultiScan job cancelled by the system, our clusters (which consist of
different node densities) tend to drift out of balance: the
AutoBalance component of MultiScan never gets to finish its job!

So all these things are definitely related. It's complex, but I'll take
it over "LUN carving" and manually migrating data any day...!

Cory Snavely

Apr 26, 2013, 11:54:19 AM
to isilon-u...@googlegroups.com, Peter Serocka
Oh, very interesting; I'll keep this in mind. Thanks!

Chris Pepper

Apr 26, 2013, 4:24:16 PM
to isilon-u...@googlegroups.com
Cory,

There are several, but the most important one changed OneFS's threshold for a disk stall. Apparently some newer drives *normally* take longer to respond than older drives, putting them too close to the threshold even in normal operation. Loosening the tolerance means OneFS reports fewer disk stalls and fewer operations are disrupted.

I don't know what all the knobs do, though, so you should definitely talk it through with Support before fiddling with /etc/mcp/override/sysctl.conf.

Chris

Cory Snavely

Apr 26, 2013, 4:32:47 PM
to isilon-u...@googlegroups.com, Chris Pepper
Yep, understood...five minutes is probably too long. :)

Cool. Thanks for the tip!