parallel computing - to use NFS, or not

Joshua Maurice

Jun 8, 2012, 11:39:08 PM

So, I put my money where my mouth was, and I'm looking into the best
way to write a distributed build system. Let's suppose that I'm trying
to distribute an existing Maven build (because I am). The build is
broken up into separate pieces in a DAG, like any other build. Any
build step can use any of its dependencies (or transitive
dependencies).
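
As a rough illustration of the DAG described above, here is a minimal Python sketch of how a coordinator could work out which jobs are runnable at any moment. The module names are made up for the example, and graphlib.TopologicalSorter (standard library, Python 3.9+) merely stands in for whatever scheduling a real coordinator would do; nothing here is Maven-specific.

```python
from graphlib import TopologicalSorter

# Hypothetical module graph: each key depends on the modules in its set.
deps = {
    "core": set(),
    "util": {"core"},
    "docs": {"core"},
    "service": {"core", "util"},
    "webapp": {"service"},
}


def build_waves(graph):
    """Group jobs into waves; every job in a wave has all its dependencies built."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs that could run in parallel now
        waves.append(ready)
        sorter.done(*ready)  # mark them finished so their dependents become ready
    return waves


waves = build_waves(deps)
print(waves)
```

With the graph above this prints [['core'], ['docs', 'util'], ['service'], ['webapp']]. A real coordinator would call done() as each agent reports back rather than wave by wave, so that a slow job does not hold up unrelated ones.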

So, naively, I want to have a bunch of agents running on a bunch of
computers on a local network with access to some shared NFS where the
build takes place, with a single one coordinating it all by assigning
jobs to the agents. (I don't care at the moment about redundancy, fail-
over, etc. Let's just get it working. Doesn't need to have high
availability.) So, naively, when an agent is told to run a job, it
needs to guarantee that the results of all dependency jobs (the files
on NFS) are visible, then do that job (which is basically writing
files to NFS), then do whatever is needed to guarantee that the files
will be visible to dependent jobs on different computers.

Am I barking up the wrong tree? Would I be better off copying the
files myself to each agent computer? That seems like a lot of work,
especially to get it right. This seems like a job for NFS - I think.

So, let's go with the NFS approach. Is there a way to implement the
NFS equivalent of a mutex lock and unlock? I understand "close-to-open
consistency", but that doesn't help me here I think. Each job
writes lots of files.

Would something like the following work?

An agent runs a job by:
1- Gets the job details over some TCP socket from the coordinator.
2- Busy loops or sleep loops, opening and closing a sentry file for
each dependency job, waiting until the sentry file contents for each
dependency job become "done".
3- Do the job. Write out files to NFS. This will probably be done from
other processes, ex: gcc, javac, etc.
4- Wait for the other processes to finish and die, specifically wait
for all of the files for the job to be closed.
5- Call sync().
6- Open the sentry file for this job, write "done", and close it.
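
The six steps above can be sketched roughly as follows in Python. This is only a transcription of the steps under stated assumptions, not an answer to whether they are sufficient over NFS: the directory layout (root/<job>/SENTRY) and the function names are invented for the example, the job details are passed as arguments instead of arriving over a TCP socket (step 1 is left out), and whether os.sync() provides the needed ordering is exactly the open question.

```python
import os
import subprocess
import time


def sentry_path(root, job):
    # Hypothetical layout: one directory per job, with a SENTRY file inside it.
    return os.path.join(root, job, "SENTRY")


def wait_for_dependencies(root, deps, poll_seconds=0.5, timeout=600.0):
    """Step 2: re-open each dependency's sentry file until it reads 'done'."""
    deadline = time.monotonic() + timeout
    pending = set(deps)
    while pending:
        for job in sorted(pending):
            try:
                # A fresh open() on every poll, since close-to-open
                # consistency is defined in terms of open and close.
                with open(sentry_path(root, job)) as f:
                    if f.read().strip() == "done":
                        pending.discard(job)
            except FileNotFoundError:
                pass  # dependency has not published its sentry file yet
        if pending:
            if time.monotonic() >= deadline:
                raise TimeoutError("still waiting for: %s" % sorted(pending))
            time.sleep(poll_seconds)


def run_job(root, job, deps, command):
    wait_for_dependencies(root, deps)                  # step 2
    out_dir = os.path.join(root, job)
    os.makedirs(out_dir, exist_ok=True)
    # Steps 3-4: run the tool and wait for it to exit. This waits only for
    # the direct child; detached grandchildren would need separate handling.
    subprocess.run(command, cwd=out_dir, check=True)
    os.sync()                                          # step 5 (Unix only)
    with open(sentry_path(root, job), "w") as f:       # step 6
        f.write("done")
        f.flush()
        os.fsync(f.fileno())
```

A call would look like run_job("/mnt/build", "service", ["core", "util"], ["mvn", "-o", "install"]), where the mount point and the command line are again placeholders.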

I don't know if this would work. I half-suspect not. The idea of what
I'm trying to do is guarantee visibility ordering à la read and write
memory barriers, aka acquire and release memory semantics, aka C++11
memory_order_acquire and memory_order_release. The above scheme will
work if I can get the guarantee that the write of the sentry file will
definitely hit the server only after all of the writes of the job hit
the server. I think "close-to-open consistency" gets me the rest.

Would sync() work, even if called from a different process than the
process doing the file writes? Would this give me the ordering
guarantee I need over NFS? What gives me this guarantee - which NFS
version, which NFS options, which OS, etc.? Will this work in
practice? Or again would I be better off re-implementing NFS in some
small part by manually copying the files of a job to each computer
node where they're needed?
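
One possible variation on step 5, sketched in Python: instead of a global sync(), the agent re-opens every file the job produced and calls fsync() on it. fsync() acts on the file behind the descriptor, not on the process that wrote the data, so it can be issued from a different process than the writer. Whether that yields the ordering guarantee over a given NFS client, server, and set of mount options is an assumption to verify against their documentation, not something this sketch establishes. The function names are invented for the example.

```python
import os


def _fsync_path(path):
    # Opening read-only and calling fsync works on Linux for both files and
    # directories; other platforms may require a writable descriptor.
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def fsync_tree(top):
    """fsync every regular file and directory under top; return what was synced."""
    synced = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):  # skip FIFOs, sockets, dangling symlinks
                _fsync_path(path)
                synced.append(path)
        _fsync_path(dirpath)  # also flush the directory entry updates
        synced.append(dirpath)
    return synced
```

Compared with sync(), this touches only the job's own output tree, at the cost of one open/fsync/close round per file.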

Thank you for your time.

Ian Collins

Jun 9, 2012, 1:41:42 AM

On 06/ 9/12 03:39 PM, Joshua Maurice wrote:
> So, I put my money where my mouth was, and I'm looking into the best
> way to write a distributed build system. Let's suppose that I'm trying
> to distribute an existing Maven build (because I am). The build is
> broken up into separate pieces in a DAG, like any other build. Any
> build step can use any of its dependencies (or transitive
> dependencies).
>
> So, naively, I want to have a bunch of agents running on a bunch of
> computers on a local network with access to some shared NFS where the
> build takes place, with a single one coordinating it all by assigning
> jobs to the agents. (I don't care at the moment about redundancy, fail-
> over, etc. Let's just get it working. Doesn't need to have high
> availability.) So, naively, when an agent is told to run a job, it
> needs to guarantee that the results of all dependency jobs (the files
> on NFS) are visible, then do that job (which is basically writing
> files to NFS), then do whatever is needed to guarantee that the files
> will be visible to dependent jobs on different computers.
>
> Am I barking up the wrong tree? Would I be better off copying the
> files myself to each agent computer? That seems like a lot of work,
> especially to get it right. This seems like a job for NFS - I think.

The data still has to go over the wire, so NFS is as good a way to
manage the transfer as any.

> So, let's go with the NFS approach. Is there a way to implement the
> NFS equivalent of a mutex lock and unlock? I understand "close-to-open
> consistency", but that doesn't help me here I think. Each job
> writes lots of files.

You shouldn't have to. Assuming simple compiles, why would more than one
job want to write to the same file?

> Would something like the following work?
>
> An agent runs a job by:
> 1- Gets the job details over some TCP socket from the coordinator.
> 2- Busy loops or sleep loops, opening and closing a sentry file for
> each dependency job, waiting until the sentry file contents for each
> dependency job become "done".
> 3- Do the job. Write out files to NFS. This will probably be done from
> other processes, ex: gcc, javac, etc.
> 4- Wait for the other processes to finish and die, specifically wait
> for all of the files for the job to be closed.
> 5- Call sync().
> 6- Open the sentry file for this job, write "done", and close it.

Why make things so complicated? Just use the machine you kick off the
build from as the master and send jobs out to the other machines.

I used to use distributed building most of the time (Sun's dmake) and
still do on occasion, but with modern systems you really do need better
than gig-E network performance to see much gain for all but very large
projects.

--
Ian Collins

Paul

Jun 9, 2012, 1:57:41 AM

You mean like the "distcc" I use in Gentoo, to speed up building?

http://en.wikipedia.org/wiki/Distcc

That doesn't work as well as you'd think, in that not all aspects
of the build process are accelerated. There is still a bottleneck.
Still, it does result in a reduction in clock time for builds. And
the main advantage: it's ready to use.

It might be better to study an existing system, see what mistakes
were made, where the scheme could be improved, before re-inventing
the wheel. That particular one works that way, for a reason.

Paul

Joshua Maurice

Jun 9, 2012, 2:17:28 AM

Yeah, like distcc. distcc doesn't help me much when most of my
build is Java and related stuff, but thanks for mentioning it. It's a
neat little tool.

> It might be better to study an existing system, see what mistakes
> were made, where the scheme could be improved, before re-inventing
> the wheel. That particular one works that way, for a reason.

Such as? Do you know of any relevant software that can distribute a
maven build? Because some commercial companies talking to my company
about selling build products don't know, and it's their business. I
also have a very low estimation of the modern state of the art of
build systems. I think they're all crap, to varying degrees. For
example, no one cares much about incremental correctness.

Joshua Maurice

Jun 9, 2012, 2:30:27 AM

On Jun 8, 10:41 pm, Ian Collins <ian-n...@hotmail.com> wrote:
> On 06/ 9/12 03:39 PM, Joshua Maurice wrote:
>
> > So, let's go with the NFS approach. Is there a way to implement the
> > NFS equivalent of a mutex lock and unlock? I understand "close-to-open
> > consistency", but that doesn't help me here I think. Each job
> > writes lots of files.
>
> You shouldn't have to. Assuming simple compiles, why would more than one
> job want to write to the same file?

As I said earlier, I'm trying to distribute a Maven build. This
includes a lot of Java. I'm not distributing single file compiles
because 1- you can't do single file compiles with Java, and 2- Maven's
indivisible unit is the "pom" which usually includes lots of files. To
break it down to the single file level, even for C++ built with Maven,
would require effectively a complete rewrite of the Maven build system
into something else - which, while I'd like to do for a lot of
reasons, is too much work for the time allotted for this particular
task.

> > An agent runs a job by:
> > 1- Gets the job details over some TCP socket from the coordinator.
> > 2- Busy loops or sleep loops, opening and closing a sentry file for
> > each dependency job, waiting until the sentry file contents for each
> > dependency job become "done".
> > 3- Do the job. Write out files to NFS. This will probably be done from
> > other processes, ex: gcc, javac, etc.
> > 4- Wait for the other processes to finish and die, specifically wait
> > for all of the files for the job to be closed.
> > 5- Call sync().
> > 6- Open the sentry file for this job, write "done", and close it.
>
> Why make things so complicated? Just use the machine you kick off the
> build from as the master and send jobs out to the other machines.
>
> I used to use distributed building most of the time (Sun's dmake) and
> still do on occasion, but with modern systems you really do need better
> than gig-E network performance to see much gain for all but very large
> projects.

Maybe I am making it complicated. Let's make it simpler. Can the
following situation happen?

Comp 1, process 1, spawns process 2,
Comp 1, process 2, opens a file X on NFS, writes some contents, closes
the file, dies,
Comp 1, process 1, waits on process 2, opens a file Y on NFS, writes
some contents, closes the file, sends a message to comp 2 process 3
over TCP,
Comp 2, process 3, receives the message over TCP, opens file Y on NFS,
it sees the full contents from the "earlier" write, spawns process 4,
Comp 2, process 4, opens file X on NFS, it sees only half of the
contents of the "earlier" write or maybe none at all.

In other words, when working with NFS, I assume there exists the
potential for writes to be reordered. There's write and read caching
going on on both client machines. Until I'm told explicitly otherwise,
and preferably with documentation to support it, I'm going to operate
as though the above scenario is possible. In which case, this breaks
even the correctness and reliability of a full clean (re)build. I
can't start the second job until I know that it will see all of the
writes of its dependency job. If it sees only half, then "very bad
things" (tm) will happen.
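
A defensive variation, which is not something proposed in this thread, is to have the sentry carry a manifest of what the job wrote, so that a dependent job can at least detect the "sees only half" case before starting. The Python sketch below records the size and SHA-256 of every output file; the file name MANIFEST.json and the function names are invented for the example. It detects a partial view, it does not prevent one, and hashing re-reads every file, which costs time over NFS.

```python
import hashlib
import json
import os

MANIFEST = "MANIFEST.json"  # hypothetical name for the sentry/manifest file


def file_digest(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(out_dir):
    """Producer side: record size and hash of every output file, then publish."""
    entries = {}
    for dirpath, _dirnames, filenames in os.walk(out_dir):
        for name in filenames:
            if name.startswith(MANIFEST):
                continue  # skip the manifest itself and its temp file
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, out_dir)
            entries[rel] = {"size": os.path.getsize(path),
                            "sha256": file_digest(path)}
    tmp = os.path.join(out_dir, MANIFEST + ".tmp")
    with open(tmp, "w") as f:
        json.dump(entries, f, sort_keys=True)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, os.path.join(out_dir, MANIFEST))  # publish in one rename
    return entries


def verify_manifest(out_dir):
    """Consumer side: return the relative paths that are missing or differ."""
    with open(os.path.join(out_dir, MANIFEST)) as f:
        entries = json.load(f)
    bad = []
    for rel, meta in entries.items():
        path = os.path.join(out_dir, rel)
        try:
            ok = (os.path.getsize(path) == meta["size"]
                  and file_digest(path) == meta["sha256"])
        except OSError:
            ok = False  # file not visible (yet) on this client
        if not ok:
            bad.append(rel)
    return sorted(bad)
```

A dependent job would call verify_manifest() on each dependency's output directory and retry or fail loudly if the returned list is non-empty.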

I think the block you have is that you're assuming the builds are of
single files, whereas in my particular case they are not. Each build
action will work on groups of files.

Paul

Jun 9, 2012, 2:35:50 AM

I'm a user of distcc, rather than a designer. All I'm suggesting
to you is not to underestimate the size of the project you're
attempting. I wonder if the distcc people thought it would be
that much work, when they started.

If the build process was completely self contained, was
platform independent... then maybe it's all a matter of
just selecting the right communications scheme. It's possible
though, the further you get into the process, the more
complex it'll be.

I like to think of the "weirdness" of the distcc design,
as an indicator of underlying issues. It doesn't work
anything like I was expecting, when I set it up. And the
authors of that package, probably didn't start out trying
to "make it weird".

Perhaps the distcc idea of establishing their own "plumbing"
between machines, was to avoid file system issues (locks or
whatever). The distcc machines have their own transfer protocol
between machines. That may have been an attempt to make
installation easier (fewer dependencies).

Like, say I want to use your package, and my machines (or my
organization) don't use NFS. I would need extra work and an extra
skill set to satisfy the dependency on NFS.

Paul

Rainer Weikusat

Jun 9, 2012, 2:13:13 PM

Joshua Maurice <joshua...@gmail.com> writes:

[...]

>> It might be better to study an existing system, see what mistakes
>> were made, where the scheme could be improved, before re-inventing
>> the wheel. That particular one works that way, for a reason.
>
> Such as? Do you know of any relevant software that can distribute a
> maven build? Because some commercial companies talking to my company
> about selling build products don't know, and it's their business. I
> also have a very low estimation of the modern state of the art of
> build systems. I think they're all crap, to varying degrees. For
> example, no one cares much about incremental correctness.

Have you ever wondered why nobody bothers selling glass windows
which can't be broken with a hammer? And have you voluntarily knocked
all the windows in the place you live in out just to prove that you
can do this? If not, what's your problem with understanding that
damaging tools and installations for the sake of doing so is a stupid
thing to do?