Aliases

Chris Pepper

unread,
Feb 11, 2015, 5:00:28 PM2/11/15
to isilon-u...@googlegroups.com
We have a few aliases and scripts I use frequently. Are there other commands you use often enough to be worth aliasing?

Chris

> solisi-1# tail .zshrc
>
> #alias glog="for LOG in ~pepper/log/hash-*.log ~pepper/log/archive-*.log; do echo $LOG; grep -v Starting $LOG|grep -v Completed| egrep -v '^$'; done"
> alias c=check
> alias iaca="isi alert cancel all && isi alert quiet all"
> alias ifa="isi_for_array -s"
> alias igi="time isi_gather_info --incremental"
> alias ihs="isi_hw_status -i"
> alias quotat="isi quota list | egrep 'T +$' | sort -n"
> alias iqover="isi quota list --exceeded"
> alias gid="isi quota quotas list | grep -i ID:"

I type 'c' several times a day on several clusters; check executes a few commands to review cluster status (sketched below).
When "isi alerts" shows (only) resolved events I use 'iaca' to clear them. Despite what Isilon Support tells you, never quiet events before canceling them.
We run commands across the cluster often but isi_for_array is awkward. Note that if you are tailing logs on all nodes you cannot use '-s'.
Gathers take a long time so I typically run incremental gathers under 'time'.
'quotat' shows users/directories with (approximately) 1 TB or more in use.
'gid' identifies UIDs/GIDs present on the filesystem but not known to the cluster (LDAP in our case). These are old users or users from other clusters, and we add/reconcile them so our quota reports are readable.
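
A 'check' script along these lines would do the job (a rough sketch, not our actual script; the commands are borrowed from the aliases above and elsewhere in this thread):

#!/bin/sh
# Rough sketch of a cluster-status check; adjust for your OneFS release.
isi status                    # overall cluster, node, and drive health
isi alerts                    # outstanding or resolved events
isi quota list --exceeded     # quotas over their limits (same as the iqover alias)
isi_for_array -s 'uptime'     # quick load check on every node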

Neproshennie

unread,
Feb 20, 2015, 12:01:21 PM2/20/15
to isilon-u...@googlegroups.com
Chris,

A few of my favorites are:

If you need to give auth services a swift kick in the shorts after making changes:
isi auth mapping flush --all; isi auth users flush; isi auth groups flush; isi auth refresh

Finding those pesky busy vnodes on /ifs (not a cluster-wide command, but it can easily be made so, and I have; one way is shown below):
sysctl efs.bam.busy_vnodes | egrep -v "find any" | egrep -o "[0-9a-f]:[0-9a-f]{4}:[0-9a-f]{4}" | sort | uniq
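
One way to make it cluster-wide is simply to wrap the per-node part in isi_for_array, as used elsewhere in this thread:

isi_for_array -s 'sysctl efs.bam.busy_vnodes | egrep -v "find any" | egrep -o "[0-9a-f]:[0-9a-f]{4}:[0-9a-f]{4}"' | sort | uniq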

Top 5 processes by CPU usage:
isi_for_array -s 'ps auxwww | egrep -vi "user|idle|av-" | sort -k3 -n -r | head -n 5'

Top 5 processes by resident memory usage:
isi_for_array -s 'ps auxwww | egrep -vi "user|idle|av-" | sort -k5 -n -r | head -n 5'

Checking to make sure that both IB ports are running at the same speed:
isi_for_array -s 'sysctl net.ib.devices.0.ports.1.active_speed net.ib.devices.0.ports.2.active_speed | paste - -'

I've got a bazillion little gems that really aren't "little".

Peter Serocka

unread,
Feb 20, 2015, 12:55:49 PM2/20/15
to isilon-u...@googlegroups.com

On 2015 Feb 21 Sat, at 01:01, Neproshennie <jamie....@gmail.com> wrote:

>
> Finding those pesky busy vnodes on /ifs (not a cluster-wide command but can be easily made so (and I have)):
> sysctl efs.bam.busy_vnodes | egrep -v "find any" | egrep -o "[0-9a-f]:[0-9a-f]{4}:[0-9a-f]{4}" | sort | uniq
>

shouldn’t that end in: uniq -c | sort -n ? ;-)

isi statistics heat …

has always been one of my favorites, and not much
additional scripting is required here.
Just discovered that

isi statistics heat -nlocal … (verbatim local)

does the obvious thing: restricts stats to the local node,
without the need to get the node number first.

Do you happen to deal with hangdumps sometimes?
I have always wondered whether a hangdump indicates that
some clients experienced noticeable delays or something similar.

Thanks for sharing the cool stuff!

— Peter




Neproshennie

unread,
Feb 20, 2015, 1:37:59 PM2/20/15
to isilon-u...@googlegroups.com
The particular vnodes command there is just looking for unique LINs, which I can then pipe to "isi get" for path and protection information, such as:

sysctl efs.bam.busy_vnodes | egrep -v "find any" | egrep -o "[0-9a-f]:[0-9a-f]{4}:[0-9a-f]{4}" | sort | uniq | while read lin; do isi get -L "${lin}"; done

With the heatmap, you can have some fun if you want to get to the nitty-gritty of certain types of files or paths. A fine example is showing those fun "UNKNOWN" LINs, which aren't really unknown but are used for special purposes such as filesystem events, snapshot tracking files, quotas, etc.

isi statistics heat --nodes=all --orderby=ops --long | egrep "[a-f0-9]{1,2}:[a-f0-9]{4}:[a-f0-9]{4}.*?UNKNOWN"

Of course you can adjust the expression as you need, grab the data you want, pipe the LINs into another command, or just redirect the output somewhere fun. The same applies to a specific file extension (e.g. 'egrep -i "^.*?\.vmdk"' to look at all I/O going to vmdk files).

You can also restrict the heatmap to specific events, like:
isi statistics heat --events=getattr
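
The heat output and the LIN lookup combine nicely too, e.g. to resolve the busiest LINs to paths (a sketch stitched together from the commands above; adjust the regex to match your output):

isi statistics heat --nodes=all --orderby=ops --long | egrep -o "[0-9a-f]{1,2}:[0-9a-f]{4}:[0-9a-f]{4}" | sort | uniq | while read lin; do isi get -L "${lin}"; done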

Oh, I've got a number of isi statistics commands that I love to run, and a number of them are in my custom-written performance-gathering script called "SkiesOfBlue". I collect days, weeks, or months worth of data with it, have "Parsely" read the data and store it into sqlite databases, and then "ParselyGraph" or "ParselyWeb" generates graphs to visualize trends over time. It collects data cluster-wide, and you can generate reports/trends down to individual drives on individual nodes, all drives in a node, all nodes, etc. It gets memory data, processes, various caches, mbuf usage, network stats, IB stats, RBM stats, protocol stats, and a lot more that is FreeBSD-specific, with the addition of the Isilon-related goodness. It covers thousands of datapoints and stores them in plain-text files that are human readable.

I'm familiar with hangdumps and I rather love them. I have a dozen, or two, one-liners that I wrote to analyze them, as they contain a wealth of information about what's happening at a given moment. Hangdumps won't give you a heatmap or drive statistics per se, but you get a lot of information on the state of processes, file locks (initiators, coordinators, etc.), RBM messages, IB stats, the dexitcode ring, and much more that is helpful in identifying lock contention or delays in file locking. Basically, a hangdump will be triggered automatically if an I/O request exceeds a timeout, or one can be triggered manually by sending the hangup (HUP) signal to isi_hangdump_d across the cluster. It's a bit of an eyesore to go through manually if you're not familiar with what's in them, but awk and regular expressions are absolutely gold in this arena.

--Jamie Ivanov

Jason Davis

unread,
Feb 21, 2015, 10:14:27 AM2/21/15
to isilon-u...@googlegroups.com

Interesting...

On the performance side we do quite a bit with output from the various isi statistics calls and other things to gain further insight into what "evil" our workloads do to the underlying storage clusters. We then pipe that into Graphite (and other things that bolt on top of it). I'd be curious to see what you are doing.

In regards to hangdumps, we see these semi-often, as in our environment (HPC) we really push the distributed file system HARD. In my conversations with support and with engineering, it was stated that without access to a lot of the "crown jewels" of OneFS, self-analysis is a difficult proposition, as you need these to debug properly. If you are also a customer I'd be curious to see what you are doing :)

-Jason

Neproshennie

unread,
Feb 23, 2015, 1:06:32 PM2/23/15
to isilon-u...@googlegroups.com
The "self analysis" project is going to behave kind-of like an antivirus scanner -- it will have predefined templates of what to look for and will monitor logs for whatever criteria is setup and will eventually have the ability to automatically take action but should initially offer suggestions. I'm not sure if that would also monitor "live" data from each node but logs will be a for-sure thing when it finally becomes available.

I'm not sure what you mean by "crown jewels", but self analysis is not referring to application debugging; for that, core files (or a hangdump) will never be a substitute for proper analysis. I'm speaking from the perspective of the position I once held supporting Isilon products and from a background of data mining and reverse engineering. Based simply on one of the log analysis tools I wrote for analyzing cluster logs at Isilon, I can identify almost 85% of a cluster's configuration and performance problems with a nearly perfect hit rate. Based on my own work, I can only assume that the functionality that would be used is quite similar, which is one thing I'm excited about (even as a former employee).

I'm not a proper customer, but the data I've collected has come from dozens of customers' clusters and was then used to generate the respective graphs and data points to identify anomalies in cluster activity or suspicious/abusive/naughty client activity.

On a side note, if automatic hangdump generation is problematic then it can be disabled. Hangdumps can be easily triggered manually.

--Jamie Ivanov

Peter Serocka

unread,
Feb 25, 2015, 1:44:24 AM2/25/15
to isilon-u...@googlegroups.com

On 2015 Feb 21, at 23:14, Jason Davis wrote:


In regards to hangdumps, we see these semi-often as in our environment (HPC) we really push the distributed file system HARD. In my conversations with support and with engineering It was stated that without access to a lot of the "crown jewels" of OneFS then self analysis is a difficult proposition as you need these to debug proper. If you are also a customer I'd be curious to see what you are doing :)

-Jason



Thanks Jason

The point is: is the occurrence of a hangdump supposed to tell
me that some actual client has experienced a hiccup or outage
of some kind? In that case I would like to take some
action (with support) to prevent this.

Otherwise, if hangdumps indicate some internal, but uncritical,
glitch, I'd be more relaxed about not taking specific measures.

Up to now I have been seeing hangdumps every
few days to weeks, so I do feel uncomfortable with this.

-- Peter


Peter Serocka
CAS-MPG Partner Institute for Computational Biology (PICB)
Shanghai Institutes for Biological Sciences (SIBS)
Chinese Academy of Sciences (CAS)
320 Yue Yang Rd, Shanghai 200031, China





Neproshennie

unread,
Feb 25, 2015, 9:02:17 AM2/25/15
to isilon-u...@googlegroups.com
Hangdumps by themselves are not indicative of a problem. If they are being generated, that means a particular threshold has been exceeded while a process (whether pertaining to a client or a core cluster process) was trying to acquire a lock on a LIN. If they occur with high frequency, that could indicate something is not performing properly and causing I/O delays, or that the cluster (or an individual node) is being overrun. If they are few and far between, there may have been a momentary increase in duress that was not long-lived; it really depends on what was happening at the time of the hangdump generation, and on trending backwards in time to see whether similar traits were developing. Hangdumps are just a diagnostic measure :)

--Jamie Ivanov



Peter Serocka

unread,
Feb 26, 2015, 5:01:01 AM2/26/15
to isilon-u...@googlegroups.com
Thanks, what is that threshold? A second, a minute?

And if it takes more than one day to finish a dump (apparently):

2015-01-16T21:45:12+08:00 <1.5> isilon-7(id14) isi_hangdump: Triggering clusterwide hangdump.
2015-01-18T15:02:27+08:00 <1.5> isilon-7(id14) isi_hangdump: LOCK TIMEOUT AT 1421564547 UTC
2015-01-18T15:02:27+08:00 <1.5> isilon-7(id14) isi_hangdump: Hangdump timeout after 0 seconds: Received HUP
2015-01-18T15:02:51+08:00 <1.5> isilon-7(id14) isi_hangdump: END OF DUMP AT 1421564547 UTC 

then shouldn't I wonder whether some client
has been waiting all that time
for some lock, i.e. for certain file access to happen?

-- Peter




Jamie Ivanov

unread,
Feb 26, 2015, 10:06:16 AM2/26/15
to isilon-u...@googlegroups.com
Looking at isi_hangdump (specifically OneFS 7.1.x):

# The thread that dumps debug state on several conditions:
#   1) the ping thread stops responding indicating a vnode deadlock
#   2) a signal is received indicating the user wants the state dumped
#   3) a signal is received indicating that a dump thread on another node
#      saw event 1 or 2 and sent a signal over isi_for_array.

One of the threads watches /ifs to detect whether there may be a vnode deadlock, but it also watches a few sysctl OIDs that list pending locks and how long they've been waiting. The default timeout is configured as 5 minutes, but isi_hangdump will start to consider it a problem if a lock has been waiting for approximately 30% of the timeout. Once support has hangdump data, they can graph it, and some engineers can manually investigate the hangdumps to see what has been going on and why.

A lock timeout does not necessarily mean a client is waiting for a lock. When a client issues a flock(), that request is sent over the appropriate service to the cluster, where the corresponding daemon then requests a lock. Those aren't the only processes requesting locks; the job engine, the kernel, avscan, SyncIQ, etc. can all hold LIN locks. Most of the time when I've seen frequent locking issues, it is because either clients are misbehaving and assaulting the nodes in the cluster, or there is heavy metadata activity (such as SnapshotDelete or SyncIQ jobs). There are other reasons it could happen as well, but those are two of the common ones I've run into. Without knowing what OneFS release you are running, knowing the cluster configuration, analyzing the hangdumps, or doing an investigation, it's impossible for me to say exactly why they are happening. Hangdumps should not be problematic enough to disable unless they are filling up /var/crash (on individual nodes) faster than they can be cleared.

They can vary in cause, and their frequency is the biggest determining factor in whether there is cause for alarm. You can use this handy one-liner to see how many occur per day:

egrep -i "lock.*?timeout|triggering.*?hangdump" /var/log/messages | egrep -o "^[0-9]{4}-[0-9]{2}-[0-9]{2}" | sort | uniq -c

or

isi_for_array -s 'egrep -i "lock.*?timeout|triggering.*?hangdump" /var/log/messages | egrep -o "^[0-9]{4}-[0-9]{2}-[0-9]{2}"' | sort | uniq -c


A few tips to preventing lock contention would be:
* Schedule jobs appropriately and try to reduce overlapping as much as possible.
* Job impact policies can be set on a schedule to raise/lower impact during set times.
* Don't have a ridiculous number of snapshots (SyncIQ creates snapshots of the path you are syncing).
* Don't have a ridiculous number of nested snapshots and do *NOT* snapshot /ifs -- that will kill performance on /ifs/.ifsvar which is the nerve center of the cluster.
* Be mindful of the data protection levels and lower the protection on temporary/lower priority files, if possible, to reduce the overhead on the cluster (which can be set on a per-file or per-directory basis).
* Be mindful of your data and how clients access it so you can have the appropriate concurrent, random, or streaming I/O pattern for your files (which can also be set on a per-file or per-directory basis; see the sketch after this list). Also notice how much metadata activity you have and whether you are equipped with metadata acceleration.
* With the above, also keep in mind that more data on each spindle will increase seek time and thus decrease I/O performance for locking mechanisms. Make sure drives are not being over-saturated (based on spindle type).
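
For example, something along these lines for a scratch/temp path (a sketch from memory of the OneFS 7.x 'isi set' flags, with an example path; verify against 'isi set --help' on your release before running it):

isi set -R -p +2 -a streaming /ifs/data/scratch   # recursive: requested protection +2, streaming access pattern
isi get /ifs/data/scratch                         # confirm the resulting protection policy and access pattern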


Jamie Ivanov
Mobile: 608.399.4252
http://www.linkedin.com/in/jamieivanov

Jason Davis

unread,
Feb 26, 2015, 8:01:23 PM2/26/15
to isilon-u...@googlegroups.com

If I'm correct, I want to say it's indicative of deadlocks?

At least that's my experience.

Our "unoptimized" HPC workloads tend to do terrible filesystem IOs and we've had to engage with support and our DSE to have these post identified to determine where the bad is occurring using hangdumps and occasionally coredumps.

"Terrible" - Being heavy write metadata with tens or hundreds of millions of files and directories created during the life of a large project. 90+% of files being under 8k in size, being hit by tens of thousands of CPU cores.

We throw a LOT of S200s and SSDs at the problem but it doesn't fix all :)

Jamie Ivanov

unread,
Feb 26, 2015, 9:59:08 PM2/26/15
to isilon-u...@googlegroups.com
Delays in locking, not necessarily deadlocks. I’ve seen a few deadlocks in my day and they’re certainly not pretty but most hangdumps are triggered by locking latency.

I've worked on a large number of HPC clusters, but I can't say I've ever had the pleasure of working with yours. The biggest recommendation I can make was one of the mitigation steps in a previous message: be mindful of the protection level (and I/O pattern) of temporary and short-lived files. If they're small files then you could benefit more from a random I/O pattern for that specific path (which can be set manually or with SmartPools); medium and large short-lived/temp files may benefit from a streaming I/O pattern. The protection level may be better at +2 or 2x instead of the default +2:1; this will save you CPU cycles and interrupts that can instead be used for the storage controller, NIC, etc.

I also hope that the network stack was tuned (on the cluster) and, better yet, the disk prefetching. I've seen countless clusters prefetching too much data, which wastes IOPS that could serve legitimate I/O requests. By default OneFS prefetches 9 blocks of data, and to see how effective that is, you can run this on each node in the cluster:

isi_cache_stats -v

Look at the "l2_data" section and the "prefetch.hit" percentage. You can think of that percentage as roughly how much of those 9 blocks is actually used. That said, I would not want to get so precise as to go lower than the hit rate (in prefetched blocks), which could impede performance. If the hit rate is at 50 to 70% then I would reduce the prefetched blocks to 6, but I wouldn't feel comfortable going lower than 4.

The sysctl OID for viewing and setting would be “efs.bam.l2_prefetch_clusters”.
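
So checking it cluster-wide, and lowering it if the hit rate justifies it, looks roughly like this (isi_sysctl_cluster is how I recall persisting a sysctl change across the cluster; verify the command on your release first):

isi_for_array -s 'sysctl efs.bam.l2_prefetch_clusters'   # current prefetch depth on every node
isi_for_array -s 'isi_cache_stats -v'                    # look at l2_data prefetch.hit on every node
isi_sysctl_cluster efs.bam.l2_prefetch_clusters=6        # lower the prefetch depth cluster-wide (command name assumed)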

COMMON SENSE DISCLAIMER: This information should be considered informational only. If you use it, understand that I am not liable if you break something, start something on fire, make people very angry, or lose your job for breaking something really expensive. This setting won't bring a cluster to a grinding halt and can be reverted if you notice problems, but I am no longer an EMC employee, therefore I cannot be counted as an official support representative and this cannot be a substitute for contacting EMC's support.

Jamie Ivanov
Mobile: 608.399.4252
http://www.linkedin.com/in/jamieivanov

Peter Serocka

unread,
Feb 28, 2015, 3:51:19 AM2/28/15
to isilon-u...@googlegroups.com

On 2015 Feb 27 Fri, at 10:59, Jamie Ivanov <jamie....@gmail.com> wrote:
>
> isi_cache_stats -v
>
> Look at the “l2_data” section and look at the “prefetch.hit” percentage. You can imagine that the percentage you see there is how much of those 9 blocks are used.


Jamie, you are again giving me a big surprise ;-)

My understanding is that the hit percentage
is the fraction of file system read blocks that
could be satisfied by prefetch disk reads.
That's a different thing!

In fact I also wondered how the “wasted”
prefetches can be measured…

— Peter

Peter Serocka

unread,
Feb 28, 2015, 3:58:04 AM2/28/15
to isilon-u...@googlegroups.com
Thanks Jamie

An important thing is not to mix up file locks
(flock etc., initiated from client apps)
and OneFS LIN locks, which are an internal
mechanism to enable cluster-wide data integrity.

— Peter

Jamie Ivanov

unread,
Feb 28, 2015, 10:12:24 AM2/28/15
to isilon-u...@googlegroups.com
LIN locks are not an internal mechanism for data integrity per se... LIN locks are file locks on the logical inodes across the distributed filesystem, which is pretty much the same as a flock call to an inode. The file lock request comes in from the client (which can be flock()) and the corresponding LIN gets locked per the request of the lock initiator. Each process, whether for NFS, SMB, the job engine, etc., can obtain a LIN lock, whether advisory or exclusive. NFS locks are protected at a default level of +3, so you have the initiator (the node that initially served the lock request) and two failovers; in the event the initiator isn't available, a new coordinator is elected from the remaining nodes and the lock protection is maintained. After the initiator comes back online, the coordinator will pass the lock request off to that node instead of handling it itself -- a little wonky and convoluted sometimes, which results in stale locks that need to be pulled out manually (NFSv3 locks do not time out, per the RFC).

I just want to make sure we’re on the same page when it comes to locking and that LIN locks are not something more special than a flock() request from a client.

Jamie Ivanov
Mobile: 608.399.4252
http://www.linkedin.com/in/jamieivanov

Jamie Ivanov

unread,
Feb 28, 2015, 10:15:41 AM2/28/15
to isilon-u...@googlegroups.com
I was highly sought out for analyzing and resolving cluster-related performance concerns; this is just one of the many tools in my doctor's kit. :)

The really nice thing is that you can still access data from the FreeBSD kernel on top of what can be provided by Isilon commands (one of the many reasons I’m an avid FreeBSD fan). In fact, a good portion of that data is where the Isilon commands pull their data from.

Jamie Ivanov
Mobile: 608.399.4252
http://www.linkedin.com/in/jamieivanov
