subprocess-posix: redundant close on exec

85 views
Skip to first unread message

Elad Lahav

unread,
Aug 8, 2020, 6:50:04 AM8/8/20
to ninja-build
Hello,

I'm trying to build ninja for QNX and ran into a small problem. The code in Subprocess::Start() calls SetCloseOnExec() on output_pipe[0], but then adds a posix_spawn() close action for the same fd. The call to posix_spawn() then fails with EBADF.
I need to figure out whether this is a bug in the QNX implementation of posix_spawn() (unlikely, I wrote it!) but at the very least it seems redundant to do both.
If I just #ifdef out the call to SetCloseOnExec() then the build succeeds.

Any thoughts?

Thanks,
--Elad

Evan Martin

unread,
Oct 4, 2020, 1:48:26 PM10/4/20
to Elad Lahav, ninja-build
[disclaimer, haven't touched ninja code in many years]

I think the history here is the SetCloseOnExec() code predated the addition of the posix_spawn code, so it's plausible that the interaction between the two hasn't been thought through.  I think, rereading this code, the SetCloseOnExec isn't needed.

Note that just commenting out SetCloseOnExec isn't sufficient to verify removing it is correct.  The way this would go wrong if it did go wrong is that spawned processes would start inheriting open file descriptors, which is usually harmless but can cause strange behaviors when it goes wrong.  I vaguely recall verifying the subprocesses here by creating a test build.ninja file where the subcommands dumped the list of open file descriptors (`ls -l /proc/self/fd`), perhaps mixed in with some sleeps of varying lengths (to ensure some processes are running in parallel).

--
You received this message because you are subscribed to the Google Groups "ninja-build" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ninja-build...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ninja-build/39a293ad-1c30-4da5-9fb4-e3da4b3f0d42n%40googlegroups.com.

Nico Weber

unread,
Oct 5, 2020, 9:37:52 AM10/5/20
to Evan Martin, Elad Lahav, ninja-build
When I moved the code to posix_spawn, I tried to preserve the pre-posix_spawn behavior. It looks like the close() call is present on the lhs too: https://github.com/ninja-build/ninja/commit/89587196705f54afb904c8f4572e65de7274dd81

I wasn't fully successful in that preservation attempt, https://github.com/ninja-build/ninja/commit/dd2b587e fixed a mistake -- but at least that was the intent.

I agree with what Evan says: SetCloseOnExec() does look unnecessary, but carefully testing such a change is a good idea.

As far as I can tell, POSIX isn't super clear on what posix_spawn() should do in the presence of already-closed file descriptors that are CLOEXEC. https://pubs.opengroup.org/onlinepubs/007904975/functions/posix_spawn.html mentions CLOEXEC as a fourth phase after process creation, attrp and file action processing. The "Errors" section says that the function may fail if process creation ("fork/exec"), attrp or file action processing fails. It doesn't mention cloexec processing, which happens after these 3 things. This weakly suggests that CLOEXEC on closed fds shouldn't make posix_spawn fail.

https://pubs.opengroup.org/onlinepubs/9699919799//functions/exec.html says "File descriptors open in the calling process image shall remain open in the new process image, except for those whose close-on- exec flag FD_CLOEXEC is set." So exec() is fairly explicit about cloexec on already-closed fds just being a no-op. As far as I known, the intent of the posix_spawn spec is that posix_spawn should be implementable by calling fork(), and then in the child processing attrp and file actions, followed by exec. Since exec() doesn't make this an error, that also suggests that this shouldn't make posix_spawn fail.

So while the SetCloseOnExec() calls is (probably) not needed, a pedantic reading of posix suggests it's not strictly wrong either, as far as I understand. But relying on this was certainly not intentional :)

(I admire your confidence that you having written something means that it's unlikely to have bugs! For me, it's usually the opposite.)

Elad Lahav

unread,
Oct 5, 2020, 10:17:43 AM10/5/20
to ninja-build
I agree that posix_spawn() failing in this case is suspicious, and will take another look at why it actually happens (if it wasn't obviously clear, my tongue was very firmly stuck in my cheek when I claimed the code could not possibly have any bugs...). But even if I find a problem and fix it the fix will have to wait for the next release, and I was just wondering if making this change on the ninja side was deemed safe.

By the way, the QNX implementation of posix_spawn() does not use fork()/exec() under the hood, and is instead its own system call that is much cheaper than the alternative (no need to set up a clone of the parent process just to discard most of it). POSIX doesn't mandate that it is implemented as a fork()/exec() pair, only that it behaves as such.

Thanks for the responses.

--Elad

Nico Weber

unread,
Oct 5, 2020, 10:23:37 AM10/5/20
to Elad Lahav, ninja-build
On Mon, Oct 5, 2020 at 10:17 AM Elad Lahav <e2l...@gmail.com> wrote:
I agree that posix_spawn() failing in this case is suspicious, and will take another look at why it actually happens (if it wasn't obviously clear, my tongue was very firmly stuck in my cheek when I claimed the code could not possibly have any bugs...). But even if I find a problem and fix it the fix will have to wait for the next release, and I was just wondering if making this change on the ninja side was deemed safe.

It looks safe. If you do the test Evan recommended and find that it's safe in practice too, I think we can merge a change that removes the SetCloseOnExec() call.

By the way, the QNX implementation of posix_spawn() does not use fork()/exec() under the hood, and is instead its own system call that is much cheaper than the alternative (no need to set up a clone of the parent process just to discard most of it). POSIX doesn't mandate that it is implemented as a fork()/exec() pair, only that it behaves as such.

Right, understood. I think it's a dedicated system call on macOS too, and ninja does work there. I'm not 100% sure if macOS isi too lax or qnx too strict, but my current understanding at least is that qnx is too strict.
 

Konstantin Tokarev

unread,
Oct 5, 2020, 4:24:58 PM10/5/20
to Elad Lahav, ninja-build


05.10.2020, 17:17, "Elad Lahav" <e2l...@gmail.com>:
> I agree that posix_spawn() failing in this case is suspicious, and will take another look at why it actually happens (if it wasn't obviously clear, my tongue was very firmly stuck in my cheek when I claimed the code could not possibly have any bugs...). But even if I find a problem and fix it the fix will have to wait for the next release, and I was just wondering if making this change on the ninja side was deemed safe.
>
> By the way, the QNX implementation of posix_spawn() does not use fork()/exec() under the hood, and is instead its own system call that is much cheaper than the alternative (no need to set up a clone of the parent process just to discard most of it).

AFAIK posix_spawn() is usually implemented via vfork()/exec() on systems which have vfork(), e.g. glibc and FreeBSD implementations work this way. vfork() avoids most of the fork()'s overhead, but it's very easy to misuse, that's why posix_spawn() API was created.
> To view this discussion on the web visit https://groups.google.com/d/msgid/ninja-build/4fdbe232-e589-44de-a590-d4bd0e3b4ce7n%40googlegroups.com.


-- 
Regards,
Konstantin
Reply all
Reply to author
Forward
0 new messages