GFS2 or NFS


Vince Forgetta

Jun 10, 2015, 3:36:15 PM
to ware...@lbl.gov
Hi all,

I am in the midst of drafting a deployment plan for a cluster of ~20 nodes (each with 24 cores) that connect to a SAN (4 LUNs with a total capacity of 160TB in RAID 10). My initial plan was to have a file server connect to the LUNs via iSCSI, set up the filesystem layout with LVM, and serve the filesystem to the compute nodes via NFS. This is similar to our existing setup, which is not running Warewulf. Performance on the current setup is quite bad, but that may well be due to the RAID 60 configuration rather than to poor NFS performance alone.
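
To be concrete, the plan on the file server would look roughly like the following (device paths, names, and the export network below are placeholders, not our actual config):

    # discover and log in to the SAN targets over iSCSI
    iscsiadm -m discovery -t sendtargets -p 10.0.0.10
    iscsiadm -m node -l

    # pool the LUNs into one LVM volume group and carve out a volume
    pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde
    vgcreate vg_san /dev/sdb /dev/sdc /dev/sdd /dev/sde
    lvcreate -n lv_shared -l 100%FREE vg_san
    mkfs.xfs /dev/vg_san/lv_shared
    mount /dev/vg_san/lv_shared /export/shared

    # export the filesystem to the compute-node network
    echo '/export/shared 10.0.0.0/24(rw,async,no_root_squash)' >> /etc/exports
    exportfs -ra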

Now, while researching the implementation of the file server for the new cluster, I came across GFS, which I knew about but had not considered for lack of familiarity. However, it appears that, given the compute resources of the new cluster, NFS may have performance issues (https://www.redhat.com/magazine/008jun05/features/gfs_nfs/). Given this, I am considering testing both NFS and GFS, or just giving GFS a shot.

So, does anyone here have experience using GFS with Warewulf? I assume it will work, since the config can be provisioned to the nodes, but I wanted to get feedback from those who may have actually implemented it.

Related, though clearly not a Warewulf question: any thoughts on a general strategy for choosing between GFS and NFS?

Some specs on the proposed NFS file server: connected to the SAN via dual 10Gb connections (round-robin). The file server is connected to the compute nodes via a 10Gb network switch.

Thanks in advance for your help,

Vince


Tom Houweling

Jun 10, 2015, 3:59:39 PM
to ware...@lbl.gov
Vince,

NFS does not perform well once you go beyond 20 or 30 nodes and have applications that are heavy on disk I/O. I used to run NFS over RDMA over InfiniBand and saw very high loads on the file server. Running FhGFS on the same infrastructure, performance is dramatically better; I now rarely see high loads on the file servers.
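
For what it's worth, the NFS/RDMA mounts were nothing exotic - just the standard Linux client with the rdma transport, something along these lines (server name and paths are placeholders):

    mount -t nfs -o rdma,port=20049,vers=3 nfs-server:/export /mnt/shared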



—Tom



Ian Kaufman

Jun 10, 2015, 4:34:08 PM
to ware...@lbl.gov
I have a cluster with ~90 nodes, running over IB. NFS performance has
been pretty solid. Just make sure you tweak the NFS server config; the
defaults have not changed much in the past decade or two ...

Ian



--
Ian Kaufman
Research Systems Administrator
UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu

Vince Forgetta

Jun 10, 2015, 4:46:52 PM
to ware...@lbl.gov

Thanks, Ian. Any quick pointers or links on NFS tuning?

Thanks.

Vince

Vince Forgetta

Jun 10, 2015, 4:59:39 PM
to ware...@lbl.gov
Thanks, Tom. I will look into this. I assume it is compatible with the SAN scenario I mentioned? For instance, if we consider the BeeGFS architecture:

[BeeGFS architecture diagram]

I assume that the storage servers can be LUNs, and that the metadata server can take the place of the NFS server I proposed?

Clearly, I may have this all wrong, so I will read up more on BeeGFS...

Vince



Vince Forgetta

Jun 10, 2015, 5:14:33 PM
to ware...@lbl.gov

Got it now. I can put the storage and metadata servers on the same machine. It is basically a replacement for NFS, at least in the simplest scenario.
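
From what I can tell from the BeeGFS docs, the single-server setup would look roughly like this (hostnames, paths, and service IDs are placeholders, and the exact flags are worth checking against the documentation):

    # on the file server: management, metadata and storage services
    /opt/beegfs/sbin/beegfs-setup-mgmtd -p /data/beegfs/mgmtd
    /opt/beegfs/sbin/beegfs-setup-meta -p /data/beegfs/meta -s 1 -m fileserver
    /opt/beegfs/sbin/beegfs-setup-storage -p /data/beegfs/storage -s 1 -i 101 -m fileserver

    # on each compute node: just the client, pointed at the same host
    /opt/beegfs/sbin/beegfs-setup-client -m fileserver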

Pape, Brian

Jun 10, 2015, 5:21:39 PM
to Ian Kaufman, ware...@lbl.gov
We have no problems running NFS for 1,500 nodes. You just need to scale your NFS infrastructure commensurately with your node count and expected throughput. We have set a floor (a minimum aggregate sustained throughput requirement for random access patterns) of 200,000 IOPS and 30 GB/s.
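
One way to check whether a given setup meets a floor like that is a synthetic random-I/O benchmark, e.g. an fio job along these lines (directory, sizes, and job counts below are purely illustrative, not our actual test plan):

    fio --name=randread --directory=/export/scratch --ioengine=libaio \
        --direct=1 --rw=randread --bs=4k --iodepth=32 --numjobs=16 \
        --size=8G --runtime=60 --time_based --group_reporting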


Ian Kaufman

Jun 10, 2015, 5:55:19 PM
to ware...@lbl.gov
They are all over.

Here are a few I have used in the past:

http://www.billharlan.com/papers/NFS_for_clusters.html
http://nfs.sourceforge.net/nfs-howto/ar01s05.html
http://unix.stackexchange.com/questions/29196/automount-nfs-autofs-timeout-settings-for-unreliable-servers-how-to-avoid-han
https://wiki.archlinux.org/index.php/NFS/Troubleshooting

Check the nfsd thread count - usually the default is 16, which is far too
low. Modern systems can handle 256, 512, etc. I have mine set at 1024.
Also play with lockd_listen_backlog, lockd_servers, listen_backlog,
mountd_listen_backlog, and mountd_max_threads.

Use jumbo frames if possible. Play with your TCP window size. Use noatime.
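
Roughly what that looks like on a RHEL/CentOS-style setup (the values, interface name, and hostname below are just starting points, not a recommendation for your hardware):

    # server: bump the nfsd thread count (/etc/sysconfig/nfs)
    RPCNFSDCOUNT=256

    # server and clients: larger TCP buffers (e.g. in /etc/sysctl.conf)
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
    net.ipv4.tcp_rmem = 4096 87380 16777216
    net.ipv4.tcp_wmem = 4096 65536 16777216

    # jumbo frames on the storage-facing interface
    ip link set dev eth2 mtu 9000

    # client mounts: noatime plus large rsize/wsize
    mount -t nfs -o rw,noatime,rsize=1048576,wsize=1048576 fileserver:/export/home /home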

Ian

Allen, Benjamin S.

Jun 10, 2015, 6:05:30 PM
to ware...@lbl.gov
Be aware that NFS is not a distributed filesystem, so locking and consistent views of the filesystem across nodes are not guaranteed. RHEL7 accentuates this with some of its default settings. If you need such functionality, start looking at the various HPC filesystems (GPFS, Lustre, etc.); GPFS fits this hardware design quite well.
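
If you do stay on NFS and cross-node consistency matters, the client-side attribute and lookup caches are the main knobs; something like the following (hostname and paths are placeholders) trades some performance for fresher metadata:

    mount -t nfs -o rw,actimeo=3,lookupcache=positive fileserver:/export/home /home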

A note on GFS2: it requires some level of fencing, doesn't it? In other words, being able to STONITH a node that fails to communicate, in order to avoid split-brain.
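
With pacemaker/pcs that usually means a STONITH device per node - a hypothetical IPMI-based example, with all names, addresses, and credentials made up:

    pcs stonith create fence-node01 fence_ipmilan \
        pcmk_host_list=node01 ipaddr=node01-ipmi \
        login=admin passwd=changeme lanplus=1 \
        op monitor interval=60s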

What are the goals for this storage? OS installs, scratch space, home directories? Do you have parallel workloads for this storage, or is it all batch work?

Ben

Vince Forgetta

Jun 10, 2015, 7:25:43 PM
to ware...@lbl.gov

Thanks Ben.

The storage will be used for scratch and home directories. I will use it as shared storage for a Torque cluster, as well as for a few RStudio nodes.

Vince Forgetta

Jun 11, 2015, 10:43:14 AM
to ware...@lbl.gov
Thanks, everyone, for the informative replies. Based on your feedback, I will attempt to optimize NFS first, as the other options either require additional hardware and setup (GFS2) or differ enough from NFS that deployment could be delayed.

Also, given that the file server has 24 cores and is connected to the SAN and to the compute nodes via 10 Gb networking, I assume a properly configured NFS server will likely perform well in this batch-oriented setting.

Vince

