POSIX file locking on Fhgfs

Harry Mangalam

Oct 7, 2013, 12:15:03 AM
to fhgfs...@googlegroups.com
We have been strongly encouraging our users to move high-I/O processing to our Fhgfs system, and for the most part it has gone very well.  We have just now run into a problem that seems to involve the client fhgfs setting 'tuneUseGlobalFileLocks'.

In the client '/etc/fhgfs/fhgfs-client.conf' file, 'tuneUseGlobalFileLocks' is set to false (the default).  The information section on it says:

# [tuneUseGlobalFileLocks]
# Controls whether application file locks should be checked for conflicts on
# the local machine only (=false) or globally on the servers (=true).
# Default: false

And this seems to reflect what we're seeing.  The user sent this report after running large array jobs that cause multiple processes to hit the same file (/ffs is the fhgfs filesystem; '/bio' is on a gluster filesystem):

====
Just a heads-up that there is an issue on /ffs.  The short of it is that advisory file locking is either not configured on the existing installation or that such locking simply does not work on that file system.  The behavior of fcntl-mediated file locking is very different on /bio vs. /ffs.

I have an array job that I am currently running on both file systems.  The output files from a job are 3 binary files + 1 ASCII index file that gives the file offsets into the 3 binary files for each record.  On /bio, the following is true:

1.  Every file offset is unique.
2.  As you read down the index file, the file offsets are always larger than the preceding offset.
3.  Downstream code is able to read the data from the binary files using the offsets.
4. All records are present in the index file.

The above are all what you would predict if locking is working.  The program works by first locking the index file, then locking the 3 output files.  Once output is written, the files are unlocked in the reverse order from which they were locked.  This means that the index file is unlocked last, and is the signal that another task can begin output.
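For concreteness, that protocol can be sketched in Python roughly as follows (the function name, file layout, and plain-text index format are hypothetical; only the lock ordering matters here):

```python
import fcntl, os, tempfile

def append_record(index_file, data_files, blobs):
    """Sketch of the protocol described above: take the index lock first,
    then the data-file locks; write; unlock in reverse order so the index
    lock is released last (the 'go' signal for the next task)."""
    fcntl.lockf(index_file, fcntl.LOCK_EX)        # blocking exclusive lock
    for f in data_files:
        fcntl.lockf(f, fcntl.LOCK_EX)
    offsets = []
    for f, blob in zip(data_files, blobs):
        f.seek(0, os.SEEK_END)                    # record where this write lands
        offsets.append(f.tell())
        f.write(blob)
        f.flush()
    index_file.write(" ".join(str(o) for o in offsets) + "\n")
    index_file.flush()
    for f in reversed(data_files):                # unlock data files first...
        fcntl.lockf(f, fcntl.LOCK_UN)
    fcntl.lockf(index_file, fcntl.LOCK_UN)        # ...index file last

# Demo against throwaway files: offsets must be unique and increasing.
tmp = tempfile.mkdtemp()
index_f = open(os.path.join(tmp, "index.txt"), "a+")
data_fs = [open(os.path.join(tmp, "data%d.bin" % i), "ab") for i in range(3)]
append_record(index_f, data_fs, [b"aa", b"bbb", b"c"])
append_record(index_f, data_fs, [b"dd", b"e", b"ffff"])
index_f.seek(0)
lines = index_f.read().splitlines()
print(lines)   # → ['0 0 0', '2 3 1']
```

With working advisory locks, no two tasks can hold the index lock at once, which is what makes the recorded offsets unique and monotonically increasing.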

I've done a lot of experiments and run 1000s of jobs involving file locking on /bio in the last week or two, and all has been good.

But when running the same jobs on /ffs:

1.  Offsets get repeated in the index file.  This is a big hint that locking is NOT working on /ffs.  This can only occur when two different tasks attempt to open the same file in append mode, which can only happen if the lock is ignored.
2.  As you read down the index file, the offsets are not monotonically increasing as they should be.
3.  Downstream programs are unable to read data using the offsets.
4.  Records are missing from the index file.
====

This seems to indicate that 'tuneUseGlobalFileLocks' should be set to 'true'.  Would that make sense in light of the complaint?

Also, I assume that the client would have to be restarted with this new config, but would the server also have to be restarted?

And what is the performance hit for turning this on?  Trivial?  Or a lot?

Thanks for any advice!

Harry

Harry Mangalam

Oct 7, 2013, 9:43:17 PM
to fhgfs...@googlegroups.com
To partly answer my own question: yes, that variable [tuneUseGlobalFileLocks] does enable global, cluster-wide locking.  At least, the problem that initiated this question was resolved once the affected clients were set to
tuneUseGlobalFileLocks = true
We set this on ~8 clients that were involved in the distributed SGE array job, all hitting the fhgfs simultaneously, and the job previously described as failing went through without a hitch.
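For reference, the change amounts to one line on each client node involved, in the config file mentioned earlier:

```
# /etc/fhgfs/fhgfs-client.conf
tuneUseGlobalFileLocks = true
```

followed by restarting the fhgfs-client service on those nodes so the new setting takes effect.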

The user did not notice any performance drop from doing this, but it would be useful to have a fhgfs developer comment on the implications of setting this cluster-wide.  This was 8 out of ~75 compute nodes, almost all of which are 64-core and QDR-IB-connected.

hjm

Frank Kautz

Oct 8, 2013, 12:53:15 PM
to fhgfs...@googlegroups.com
Hi Harry,

when your application uses flock (man 2 flock) or fcntl (man 2 fcntl with parameter F_SETLK), you are right.  The configuration option tuneUseGlobalFileLocks only applies to flock() and fcntl().  When an application doesn't use these calls, there is no performance decrease.  With tuneUseGlobalFileLocks=true, file locks work across multiple nodes, but additional messages to the metadata server are required, because the metadata server manages the locks.  If tuneUseGlobalFileLocks is set to false, the locks are managed by the client and work only for multiple threads and processes on that node.  We set this parameter to false by default because most cluster applications (e.g. MPI-based) typically have other mechanisms for synchronizing critical sections directly among the nodes instead of going through file system calls.
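To make that distinction concrete, here is a minimal sketch (nothing fhgfs-specific; paths are throwaway) that checks whether a second process sees a conflict on a held fcntl lock.  Run on a single node this should always report a conflict; with tuneUseGlobalFileLocks=false, the same check between processes on two *different* nodes would report none, which is exactly the failure mode described above.

```python
import fcntl, os, tempfile

path = tempfile.NamedTemporaryFile(delete=False).name

holder = open(path, "r+")
fcntl.lockf(holder, fcntl.LOCK_EX)        # fcntl(F_SETLKW) under the hood

pid = os.fork()
if pid == 0:                              # child: a separate process
    contender = open(path, "r+")
    try:
        # non-blocking attempt, i.e. fcntl(F_SETLK)
        fcntl.lockf(contender, fcntl.LOCK_EX | fcntl.LOCK_NB)
        os._exit(1)                       # lock granted -> not enforced
    except OSError:
        os._exit(0)                       # conflict seen -> enforced
_, status = os.waitpid(pid, 0)
enforced = (os.WEXITSTATUS(status) == 0)
print("lock conflict detected:", enforced)   # → lock conflict detected: True
os.unlink(path)
```

Pointing `path` at a file on the fhgfs mount and running holder and contender on different client nodes would show the effect of the tuneUseGlobalFileLocks setting directly.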

kind regards,
Frank

----------------original message-----------------
From: "Harry Mangalam" hjman...@gmail.com
To: fhgfs...@googlegroups.com
Date: Mon, 7 Oct 2013 18:43:17 -0700 (PDT)
-------------------------------------------------

Harry Mangalam

Oct 8, 2013, 2:11:29 PM
to fhgfs...@googlegroups.com, frank...@itwm.fraunhofer.de
Thanks, Frank.
So it sounds like the small performance hit will only be seen when an application actually uses the locks you mentioned, and otherwise not at all.  Do I have that correct?
Best
harry

Sven Breuner

Oct 8, 2013, 5:43:24 PM
to fhgfs...@googlegroups.com, Harry Mangalam
Harry Mangalam wrote on 08.10.2013 20:11:
> So it sounds like the small performance hit will only be seen when the
> application tries to use the locks you mentioned and otherwise not at all. Do I
> have that correct?

yes, that's right.

best regards,
sven