Controlling ninja concurrency from the outside

Marc Espie

Jan 23, 2025, 4:40:15 AM

to ninja-build

Hey, I've tried building a different approach to handling concurrency from the outside,
see https://github.com/marcespie/build-control

Basically, the protocol allows an outside server to adjust the level of concurrency of a build program, assuming the program was connected using two env variables.

My POC was made on OpenBSD make, but writing the same code in ninja should be trivial.
(And it's next on my list)

This should be a sturdier approach than a jobserver, because there are no jobs that can be lost. The protocol also allows the server to reside on another machine in the local cluster (the cryptographic cookie limits the possibility of tampering, but of course firewall rules should be considered).

This has two applications:
- pervasive computing, where you run a long build on your workstation using a few CPUs, and want to give it full power while you leave for lunch or go to sleep
- distributed builders over a small cluster, where connecting through inet is more or less necessary.

I was wondering if there were any specific caveats before I start making an acceptable patch (I've read the blurb about contributing guidelines).

Also, if someone wants to poke holes in my protocol they're welcome.

(The server code itself may need a bit of adaptation outside of OpenBSD, that is, if you do not have arc4random_buf or recallocarray.)

David Turner

Jan 24, 2025, 11:52:37 AM

to Marc Espie, ninja-build

Thanks, Marc,

I have spent an hour trying to understand your protocol and your code. I am not sure I understand it all at this point, so correct me if the following doesn't match your vision.

First, let's be clear that the chances of patches for an unspecified and untested new protocol being accepted in upstream Make or Ninja are pretty much nil.
You should probably create your own forks of these projects, maintain your own patches there, and convince other people to try it first to see if this solves real problems.
Having a real specification and a test suite would also improve your project's chances of adoption.

Second, it looks like the main feature of this protocol is that build tools like Make or Ninja will connect to the server to periodically query the current job capacity (a simple integer value) at runtime.
In Ninja, this value is computed during the build by the RealCommandRunner::CanRunMore() function, which can already adjust the result dynamically based on the load factor.
Have you considered using a named semaphore (whose name is passed as an environment variable) instead to let the build tool retrieve this information? I believe this would drastically simplify your patches.
(Note: while sem_getvalue() is fast and available on POSIX, on Windows you cannot get the value of a shared semaphore, and one would need to use a shared memory segment, which is still considerably easier than dealing with Winsock.)
In the local case, the server could create and update the named semaphore directly.
In the distributed case, you could provide a proxy program to run on the builder machine, and receive updates from the remote server with a real secure connection.
You could then modify your remoting protocol if necessary without changing the build tools.
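
As a rough illustration of the shared-memory variant mentioned in the note above, here is a minimal Python sketch. It swaps in a file-backed shared mapping (mmap) as a stand-in for a real named semaphore or shared memory segment. The environment variable name BUILD_CAPACITY_FILE, the 4-byte little-endian layout and all function names are assumptions made up for this example; nothing here is part of Ninja, Make or the build-control protocol.

```python
import mmap
import os
import struct

# Hypothetical variable name: it would carry the path of the shared file.
ENV_VAR = "BUILD_CAPACITY_FILE"


def open_capacity(path):
    """Map the first 4 bytes of an existing file as shared, writable memory."""
    with open(path, "r+b") as f:
        # Python's mmap duplicates the descriptor, so the mapping stays
        # valid after the file object is closed.
        return mmap.mmap(f.fileno(), 4)


def publish(mapping, jobs):
    """Server (or local proxy) side: store the current job capacity."""
    struct.pack_into("<i", mapping, 0, jobs)


def current_capacity(mapping):
    """Build-tool side: read the capacity without any system call."""
    return struct.unpack_from("<i", mapping, 0)[0]
```

The build tool would look up os.environ[ENV_VAR] once at startup, call open_capacity() on it, and then call current_capacity() wherever it decides whether to start another job.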

Marc Espie

Jan 24, 2025, 1:04:59 PM

to ninja-build

I'll admit I haven't considered Windows at all, it's not my main target.

On Unix, the protocol works like a charm. The sockets buffer the messages nicely, and make calls poll when needed, which is very cheap and atrociously standard.
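
To make the "poll when needed" idea concrete, here is a small Python sketch of a client that drains whatever the server has buffered on the socket, without ever blocking the build. The wire format (one decimal job count per line) and the class name are assumptions for illustration only; the actual build-control protocol may look quite different.

```python
import select
import socket


class JobLimitClient:
    """Tracks the latest job limit sent by a control server.

    Assumed wire format (hypothetical): one decimal integer per line.
    """

    def __init__(self, sock, initial_limit):
        self.sock = sock
        self.limit = initial_limit
        self._pending = b""  # bytes of a not-yet-complete line
        self._poller = select.poll()
        self._poller.register(sock, select.POLLIN)

    def refresh(self):
        """Consume buffered messages and return the current limit."""
        # A timeout of 0 makes poll() return immediately when nothing
        # is waiting, so this is cheap to call between jobs.
        while self._poller.poll(0):
            chunk = self.sock.recv(4096)
            if not chunk:  # server closed the connection
                break
            self._pending += chunk
        # Everything before the last newline is complete; keep the rest.
        *complete, self._pending = self._pending.split(b"\n")
        for line in complete:
            if line.strip():
                self.limit = int(line)  # the most recent value wins
        return self.limit
```

Because the kernel buffers the messages, the build tool only needs to call refresh() at points where it would start a new job anyway.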

As for security, the cookie was easy to create, so why not? Real security is trivial with Unix domain sockets: the socket is chmod'd 0700 before even listening.
For TCP/IP the idea is to use it on a local build cluster, separated from the outside world.

If security from the outside is needed, there are usually other transport considerations, so some modicum of ipsec or other vpn technology is expected.
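
The Unix domain setup described above (restrict the socket to its owner before listening, plus a cheap random cookie) could be sketched as follows. This is Python rather than the C of the POC server, and the function name and cookie length are assumptions for illustration, not taken from the build-control repository.

```python
import os
import secrets
import socket


def open_control_socket(path):
    """Create an owner-only Unix domain listening socket and a cookie."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    # Tighten permissions before listen(): until listen() is called,
    # nobody can connect, so there is no window with looser access.
    os.chmod(path, 0o700)
    sock.listen()
    # Random cookie a client would have to present (assumed: 16 bytes,
    # hex encoded). On OpenBSD the C server would use arc4random_buf.
    cookie = secrets.token_hex(16)
    return sock, cookie
```

How the cookie reaches the build tool and how it is checked on connect is left out here, since that depends on the actual protocol.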

Ease of programming is something of a compromise: maybe on Windows machines sockets will be hell to use, but the socket API is trivial and well supported everywhere else.
It will actually work in lots of places, including locations which do not have fully functional semaphore support!

I'll have a patch for ninja soon anyway. I could definitely fork the github repo.

As for the make patch, I'm OpenBSD's make maintainer. That patch contains two things:
- unoptimize some decisions that get in the way of dynamic job control (namely, remove an optimization for -j1 and make sure each Job structure is allocated separately)
- do the dynamic job control thingy.
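
For readers unfamiliar with the idea, dynamic job control boils down to a scheduler whose job limit can change while the build is running. The Python sketch below is a toy model under that assumption; the class and method names are hypothetical and it is not the code from the OpenBSD make patch or from Ninja.

```python
from collections import deque


class DynamicScheduler:
    """Toy job scheduler whose concurrency limit may change mid-build."""

    def __init__(self, jobs, limit):
        self.waiting = deque(jobs)
        self.running = set()
        self.limit = limit

    def set_limit(self, new_limit):
        # Lowering the limit never kills running jobs; it only stops
        # new ones from starting until enough of them have finished.
        self.limit = new_limit

    def can_run_more(self):
        return len(self.running) < self.limit

    def start_ready_jobs(self):
        """Start as many waiting jobs as the current limit allows."""
        started = []
        while self.waiting and self.can_run_more():
            job = self.waiting.popleft()
            self.running.add(job)
            started.append(job)
        return started

    def finish(self, job):
        self.running.remove(job)
```

Each job gets its own entry in the running set, which mirrors the point above about allocating each Job structure separately: with a limit that can grow at any time, a fixed-size preallocated job table no longer fits.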

The code design tries really hard not to depend on complicated external components. I just haven't had time to test it with ninja.

As for the POC server, I haven't tried building it elsewhere yet. The only parts that will probably break (a bit) are the
use of recallocarray and arc4random_buf. Compiling it in C++ mode will require a few casts (but then the growable arrays would
probably be better off as vectors, obviously).

Marc Espie

Jan 25, 2025, 2:44:50 PM

to ninja-build

Replying to myself: I don't think every protocol wants security all around, especially when it makes the programs more complicated.
If it's needed, it's fairly trivial to add in various ways (using ipsec, proxying over TLS... just for instance).

Again, the "random cookie" is only in the POC server because it was cheap as dirt, so why not?

I would rather know what pitfalls a Windows socket version would have than move to semaphores right now.

(and talk is cheap, I'll have more code ready in a few days)
