On Jun 8, 10:41 pm, Ian Collins <ian-n...@hotmail.com> wrote:
> On 06/ 9/12 03:39 PM, Joshua Maurice wrote:
>
> > So, let's go with the NFS approach. Is there a way to implement the
> > NFS equivalence of a mutex lock and unlock? I understand "close-to-
> > open consistency", but that doesn't help me here I think. Each job
> > writes lots of files.
>
> You shouldn't have to. Assuming simple compiles, why would more than one
> job want to write to the same file?
As I said earlier, I'm trying to distribute a Maven build. This
includes a lot of Java. I'm not distributing single-file compiles
because (1) you can't do single-file compiles with Java, and (2)
Maven's indivisible unit is the "pom", which usually covers lots of
files. Breaking it down to the single-file level, even for C++ built
with Maven, would require effectively a complete rewrite of the Maven
build system into something else. I'd like to do that for a lot of
reasons, but it is too much work for the time allotted for this
particular task.
> > An agent runs a job by:
> > 1- Gets the job details over some TCP socket from the coordinator.
> > 2- Busy loops or sleep loops, opening and closing a sentry file for
> > each dependency job, waiting until the sentry file contents for each
> > dependency job become "done".
> > 3- Do the job. Write out files to NFS. This will probably be done from
> > other processes, ex: gcc, javac, etc.
> > 4- Wait for the other processes to finish and die, specifically wait
> > for all of the files for the job to be closed.
> > 5- Call sync().
> > 6- Open the sentry file for this job, write "done", and close it.
>
> Why make things so complicated? Just use the machine you kick off the
> build from as the master and send jobs out to the other machines.
>
> I used to use distributed building most of the time (Sun's dmake) and
> still do on occasion, but with modern systems you really do need better
> than gig-E network performance to see much gain for all but very large
> projects.
Maybe I am making it complicated. Let's make it simpler. Can the
following situation happen?
Comp 1, process 1, spawns process 2.
Comp 1, process 2, opens a file X on NFS, writes some contents, closes
the file, and dies.
Comp 1, process 1, waits on process 2, opens a file Y on NFS, writes
some contents, closes the file, and sends a message to comp 2 process 3
over TCP.
Comp 2, process 3, receives the message over TCP, opens file Y on NFS,
sees the full contents from the "earlier" write, and spawns process 4.
Comp 2, process 4, opens file X on NFS and sees only half of the
contents of the "earlier" write, or maybe none at all.
In other words, when working with NFS, I assume that writes can
potentially be reordered, because there is write and read caching
going on in both client machines. Until I'm told explicitly otherwise,
preferably with documentation to support it, I'm going to operate as
though the above scenario is possible. If it is possible, it breaks
the correctness and reliability of even a full clean (re)build. I
can't start the second job until I know that it will see all of the
writes of its dependency job. If it sees only half, then "very bad
things" (tm) will happen.
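
To make the ordering I'm after concrete, here is a minimal Python
sketch of the sentry-file idea from my earlier list: flush every output
file of the job, only then write "done" to the sentry file, and have
the dependent job re-open the sentry file on every poll. The names
publish_done, wait_for_done and fsync_path are made up for
illustration. It assumes a POSIX system where fsync() on a read-only
descriptor is accepted (true on Linux, as far as I know). Whether this
is actually sufficient on a given NFS client and server is exactly the
question I'm asking; the sketch only spells out the intended order.

```python
import os
import time


def fsync_path(path):
    """Ask the OS to flush cached data for `path` (hypothetical helper).

    Assumption: fsync() on a descriptor opened read-only is accepted,
    as it is on Linux.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def publish_done(output_paths, sentry_path):
    """Writer side: flush every output file, then mark the job done."""
    for path in output_paths:
        fsync_path(path)
    # The sentry file is written only after all outputs were flushed.
    with open(sentry_path, "w") as f:
        f.write("done")
        f.flush()
        os.fsync(f.fileno())


def wait_for_done(sentry_path, poll_seconds=1.0, timeout=None):
    """Reader side: re-open the sentry file on each poll until it says "done".

    Returns True once "done" is seen, False if `timeout` seconds pass first.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            # Open and close on every iteration, never keep it open.
            with open(sentry_path) as f:
                if f.read().strip() == "done":
                    return True
        except FileNotFoundError:
            pass  # the dependency job has not published anything yet
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

An agent would call publish_done(list_of_files_the_job_wrote,
sentry_path) after all child processes have exited, and the agent for
a dependent job would call wait_for_done(sentry_path) once per
dependency before starting its own work.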
I think the sticking point is that you're assuming the builds are of
single files, whereas in my particular case they are not. Each build
action works on a group of files.