High Inode counts on disk causing extreme performance problems


Jim Long

Nov 22, 2013, 10:09:19 AM
to isilon-u...@googlegroups.com
We have had a case open with Isilon since August 6th and they have been unable to resolve the problem.   I'm hoping someone has seen something similar.   A little about our environment:

11 X200 nodes 
 3  S200 nodes
130 TB content
240 million files
30 million directories

In September we swapped 2 SSD disks into each X200 and enabled global namespace acceleration.  We were assured that this would improve our cluster performance; it has not.  Our performance is actually worse.  We have changed numerous business rules to try to alleviate stress on the system.  These changes have had no effect either:
    • Migrated InsightIQ database off Isilon
    • Decreased SyncIQ jobs from every ten minutes to once an hour
    • Stopped snapshotting every hour

The attached image shows a 3-hour window of the InsightIQ Pending Disk Operations graph.  Over this window the average latency of 6 disks is over 1 second.  These 6 disks have an inode count that is three times higher than that of their peers.  We have run many jobs to try to rectify this, to no avail: autobalance, autobalancelin, collect, and flexprotect (the last after failing these 6 drives and replacing them).
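
For anyone wanting to check for the same pattern, here is a minimal Python sketch of how such an imbalance could be flagged. It assumes you already have per-disk inode counts collected into a mapping; the disk labels and numbers below are made up for illustration, and `flag_inode_outliers` is a hypothetical helper, not an Isilon tool. Each disk is compared against the median, and any disk that is off by a chosen factor in either direction is reported:

```python
from statistics import median


def flag_inode_outliers(inode_counts, ratio=2.0):
    """Return {disk: count / median} for disks whose inode count is at
    least `ratio` times above or below the median of all disks."""
    mid = median(inode_counts.values())
    flagged = {}
    if mid <= 0:
        return flagged
    for disk, count in inode_counts.items():
        too_high = count >= mid * ratio
        too_low = count * ratio <= mid
        if too_high or too_low:
            flagged[disk] = count / mid
    return flagged


# Hypothetical sample: illustrative numbers only, not taken from the cluster.
sample = {
    "node1:bay1": 1_000_000,
    "node1:bay2": 1_050_000,
    "node1:bay3": 3_100_000,
    "node2:bay1": 980_000,
    "node2:bay2": 1_020_000,
}

for disk, factor in flag_inode_outliers(sample).items():
    print(f"{disk}: {factor:.2f}x the median inode count")
```

The median is used rather than the mean so that a handful of overloaded disks do not drag the baseline up and hide themselves.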

We are really floundering here.  Isilon is unable to provide relief and the frequency of these performance problems is increasing.

Thanks in advance
  Jim




inisght_iq.JPG

Peter Serocka

Nov 22, 2013, 10:36:35 AM
to isilon-u...@googlegroups.com
Hello Jim,

another Jim (from Isilon) recently explained:
"As for your second concern (odd distribution of inodes per drive post rebalance), that does sound like an issue with inode balancing that was already re-factored in OneFS 7.0 and later releases. Have you only seen it on OneFS 6.5.x?"
See https://community.emc.com/message/765660#765660

Cheers

— Peter
-- 
You received this message because you are subscribed to the Google Groups "Isilon Technical User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isilon-user-group+un...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Luc Simard

Nov 22, 2013, 1:48:43 PM
to isilon-u...@googlegroups.com
Which version of OneFS are you running, specifically? Can you afford to upgrade to 7.0.2.4? Do you use snapshots? Quotas?

What is your average file size? What are the smallest and largest? How many files per directory on average? How deep are the directory structures?

Does your cluster meet the SSD minimum of 1.5% (2% recommended) of cluster capacity, with 20% of the nodes in the cluster supporting SSD drives?
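
A rough Python sketch of that two-part check, for illustration only. The thresholds are the ones quoted above; whether "cluster capacity" means raw or usable capacity is not stated, so the function just takes whatever figure you give it. `check_ssd_guideline` is a hypothetical helper, and the capacities in the example are placeholders rather than real figures for any cluster:

```python
def check_ssd_guideline(total_nodes, nodes_with_ssd, cluster_tb, ssd_tb,
                        min_capacity_share=0.015, min_node_share=0.20):
    """Return (capacity_share, node_share, ok) for the two thresholds."""
    capacity_share = ssd_tb / cluster_tb
    node_share = nodes_with_ssd / total_nodes
    ok = (capacity_share >= min_capacity_share
          and node_share >= min_node_share)
    return capacity_share, node_share, ok


# Node counts follow the first post (14 nodes, SSDs counted only in the
# 11 X200s); both capacity figures are hypothetical placeholders.
capacity_share, node_share, ok = check_ssd_guideline(
    total_nodes=14, nodes_with_ssd=11, cluster_tb=200.0, ssd_tb=4.4)

print(f"SSD share of capacity: {capacity_share:.1%}")
print(f"Nodes with SSD: {node_share:.1%}")
print("Meets minimum:", ok)
```

Passing `min_capacity_share=0.02` checks against the recommended figure instead of the minimum.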



Jim Long

Nov 25, 2013, 4:18:23 PM
to isilon-u...@googlegroups.com
Peter / Isimard,  here are the answers to your questions.  

We are running 6.5.5.24.   Isilon is not recommending upgrading to 7.x yet because there is concern that this would exacerbate the performance problems.  
We use snapshots (every 6 hours and with SyncIQ hourly)
We do not use quotas
Average file size is < 128 KB (largest files would be around 1 GB).
Average files per directory = 60
Directories are 5 levels deep 
Cluster does meet SSD recommendations.
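
As a cross-check, the totals from the first post (130 TB, 240 million files, 30 million directories) can be turned into cluster-wide averages. This is only a back-of-the-envelope Python sketch: it assumes decimal terabytes and that the 130 TB figure is logical file data, neither of which the thread states.

```python
TB = 10**12  # decimal terabyte; an assumption, the post does not say TB vs TiB

content_bytes = 130 * TB
files = 240_000_000
directories = 30_000_000

# Simple means over the whole cluster.
mean_file_bytes = content_bytes / files
files_per_directory = files / directories

print(f"mean file size: {mean_file_bytes / 1000:.0f} KB")
print(f"files per directory (overall): {files_per_directory:.0f}")
```

Under those assumptions the mean comes out at roughly 542 KB per file and 8 files per directory. These differ from the per-file and per-directory figures given above, which may simply reflect a different way of averaging (for example, a typical file versus a capacity-weighted mean, or counting only directories that hold files); the thread does not say.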

Thanks for taking the time to reply
   Jim

Peter Serocka

Nov 26, 2013, 1:47:01 AM
to isilon-u...@googlegroups.com
Jim,

why not (Smart)fail those 6 drives again
and do the 7.x upgrade at that stage?

Unless Isilon comes up with a better path to 7.x.
I suppose they will do something before conceding that 
your cluster is DOOMED.

For example, they could temporarily add some nodes for more capacity
and performance, which would also give the balancing
a fresh chance.

We have started to see a strange inode imbalance here on 
X200 (one SSD per node), but with the 25% of affected disks 
having a LOWER inode count than normal (500K : 2.5M). 
This does not cause an immediate performance bottleneck,
so we just observe it and will probably do a 7.x upgrade
next spring.


Cheers

Peter

Peter Serocka
CAS-MPG Partner Institute for Computational Biology (PICB)
Shanghai Institutes for Biological Sciences (SIBS)
Chinese Academy of Sciences (CAS)
320 Yue Yang Rd, Shanghai 200031, China




