
posix_spawn: What's going on with it?


Kenny McCormack

Oct 25, 2014, 1:24:35 PM
Basic thesis: It would be nice if Unix/Linux had a "spawn"
function/functionality - that would start up a program without the
overhead/bother/security-risks of system(). I.e., something similar to the
current functionality of fork-followed-by-exec, but, again, without the
complexity/risk of those system calls.

Now, there is something called "posix_spawn", which seems to be designed to
fill this niche, but this function seems to be shrouded in mystery. It
exists on my system, but there is no man page for it. I found out about
its existence by Googling (on the theory that it should exist, even though
I had no actual knowledge that it did) and found a man page (on the net,
not on my system).

So, I wrote a program to test this function, and it seems to work - sort of.
A couple of things I've noticed about it:
1) It seems to be implemented as vfork() then exec().
2) It doesn't return an error code (i.e., non-zero return value) if you
try to run a non-existent program (e.g., I was testing with
"echo", but when I passed "echx", the return value was still zero).
This is understandable if one assumes that the fork() succeeded (as
it always will), but then the exec() failed, but there's no way for
the exec() failure to be transmitted back to the parent process.
Anyway, to me, this problem is a deal-breaker, and makes the
function essentially useless.

Any further information or discussion that anyone can provide would be
appreciated. Thanks.

--
This is the GOP's problem. When you're at the beginning of the year
and you've got nine Democrats running for the nomination, maybe one or
two of them are Dennis Kucinich. When you have nine Republicans, seven
or eight of them are Michelle Bachmann.

Richard Kettlewell

Oct 25, 2014, 1:56:02 PM
gaz...@shell.xmission.com (Kenny McCormack) writes:
> Now, there is something called "posix_spawn", which seems to be designed to
> fill this niche, but this function seems to be shrouded in mystery. It
> exists on my system, but there is no man page for it. I found out about
> its existence by Googling (on the theory that it should exist, even though
> I had no actual knowledge that it did) and found a man page (on the net,
> not on my system).

A specification can be found in SUS:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_spawn.html

> So, I wrote a program to test this function, and it seems to work - sort of.
> A couple of things I've noticed about it:
> 1) It seems to be implemented as vfork() then exec().
> 2) It doesn't return an error code (i.e., non-zero return value) if you
> try to run a non-existent program (e.g., I was testing with
> "echo", but when I passed "echx", the return value was still zero).
> This is understandable if one assumes that the fork() succeeded (as
> it always will), but then the exec() failed, but there's no way for
> the exec() failure to be transmitted back to the parent process.
> Anyway, to me, this problem is a deal-breaker, and makes the
> function essentially useless.

That seems to be within the specification - “if the error occurs after
the calling process successfully returns, the child process shall exit
with exit status 127”.

--
http://www.greenend.org.uk/rjk/

Melzzzzz

Oct 25, 2014, 2:06:45 PM
The man page says you should waitpid() on the pid returned through the
pid_t* arg and check the status with WIFEXITED/WEXITSTATUS.
If the status is 127 there was some error. I checked and it actually
returns 127.

--
Manjaro all the way!
http://manjaro.org/

Philip Guenther

Oct 26, 2014, 3:45:36 AM
On Saturday, October 25, 2014 10:24:35 AM UTC-7, Kenny McCormack wrote:
> Basic thesis: It would be nice if Unix/Linux had a "spawn"
> function/functionality - that would start up a program without the
> overhead/bother/security-risks of system(). I.e., something similar to the
> current functionality of fork-followed-by-exec, but, again, without the
> complexity/risk of those system calls.

Having seen enough code screw up error handling after fork(), I'll grant you the complexity point, but can you clarify your concern about the *risks* of fork/exec?


> Now, there is something called "posix_spawn", which seems to be designed to
> fill this niche, but this function seems to be shrouded in mystery. It
> exists on my system, but there is no man page for it.

Some projects think that a feature isn't complete without documentation. You appear to be using one that doesn't feel that way, but for some reason you aren't naming names so that others can know to avoid it. The project I prefer to associate with provides this manpage for posix_spawn(3):

http://www.openbsd.org/cgi-bin/man.cgi?query=posix_spawn&apropos=0&sec=0&arch=default&manpath=OpenBSD-current



> I found out about
> its existence by Googling (on the theory that it should exist, even though
> I had no actual knowledge that it did) and found a man page (on the net,
> not on my system).
>
> So, I wrote a program to test this function, and it seems to work - sort of.
> A couple of things I've noticed about it:
> 1) It seems to be implemented as vfork() then exec().

and so...? You mention this like it's a concern, but don't say why, with the result that no one can respond to whatever your concerns are.


> 2) It doesn't return an error code (i.e., non-zero return value) if you
> try to run a non-existent program (e.g., I was testing with
> "echo", but when I passed "echx", the return value was still zero).
> This is understandable if one assumes that the fork() succeeded (as
> it always will), but then the exec() failed, but there's no way for
> the exec() failure to be transmitted back to the parent process.
> Anyway, to me, this problem is a deal-breaker, and makes the
> function essentially useless.

What is the essence of the problem you're trying to solve?


Philip Guenther

Jorgen Grahn

Oct 26, 2014, 5:34:10 AM
On Sun, 2014-10-26, Philip Guenther wrote:
> On Saturday, October 25, 2014 10:24:35 AM UTC-7, Kenny McCormack wrote:
...
>> Now, there is something called "posix_spawn", which seems to be designed to
>> fill this niche, but this function seems to be shrouded in mystery. It
>> exists on my system, but there is no man page for it.
>

> Some projects think that a feature isn't complete without
> documentation. You appear to be using one that doesn't feel that way,
> but for some reason you aren't naming names so that others can know to
> avoid it.

It might be a matter of what he's installed, too. On Debian that
manpage is in 'manpages-posix-dev', and people tend to forget to
install that package. (They are also written in IEEE legalese, and
partly overlap with the Linux man pages, but that's another problem.)

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Kenny McCormack

Oct 26, 2014, 6:32:27 AM
In article <slrnm4pg0c.1...@frailea.sa.invalid>,
Jorgen Grahn <grahn...@snipabacken.se> wrote:
...
>On Debian that
>manpage is in 'manpages-posix-dev', and people tend to forget to
>install that package.

That's probably it. I shall speak with the admins about this.

>(They are also written in IEEE legalese, and
>partly overlap with the Linux man pages, but that's another problem.)

--
To most Christians, the Bible is like a software license. Nobody
actually reads it. They just scroll to the bottom and click "I agree."

- author unknown -

Rainer Weikusat

Oct 26, 2014, 12:14:22 PM
gaz...@shell.xmission.com (Kenny McCormack) writes:
> Basic thesis: It would be nice if Unix/Linux had a "spawn"
> function/functionality - that would start up a program without the
> overhead/bother/security-risks of system(). I.e., something similar to the
> current functionality of fork-followed-by-exec, but, again, without the
> complexity/risk of those system calls.

Why?

> Now, there is something called "posix_spawn", which seems to be designed to
> fill this niche,

posix_spawn was (see the SUS rationale) specifically created to enable
execution of programs on systems without an MMU.

[...]

> So, I wrote a program to test this function, and it seems to work - sort of.
> A couple of things I've noticed about it:
> 1) It seems to be implemented as vfork() then exec().

This is usually going to be a library wrapper around 'some system calls'
which means that - except that you can count on it to work 'reasonably
well' on a system w/o an MMU - no assumptions about the properties of the
call can be made. In particular, it might or might not cause the invoking
process to be suspended for a relatively long time (because a vforked
child process has to wait for I/O before it can execute another program
and the calling process has to wait for the child to release its
borrowed address space).

Kaz Kylheku

Oct 26, 2014, 12:48:29 PM
On 2014-10-26, Rainer Weikusat <rwei...@mobileactivedefense.com> wrote:
> gaz...@shell.xmission.com (Kenny McCormack) writes:
>> Basic thesis: It would be nice if Unix/Linux had a "spawn"
>> function/functionality - that would start up a program without the
>> overhead/bother/security-risks of system(). I.e., something similar to the
>> current functionality of fork-followed-by-exec, but, again, without the
>> complexity/risk of those system calls.
>
> Why?

It's useful to have functions which, like system and popen, hide the details of
forking, execing, waiting, setting up pipes between parent and child (taking
care of the error handling in all these) but which take a precise argument list
in a manner similar to the exec functions, rather than a hacky shell command.

Sometimes you know the exact program you want to run, and the exact arguments,
and don't need a clumsy interpreter in between to suck up CPU time and
introduce security risks that arise from reacting to environment variables, or
interpreting syntax in an unexpected way.

Yet, in those times, you also want a ready-made solution: you don't want to
mess around with fork, exec*, waitpid, dup, fdopen, close, _exit, kill, ...

In the TXR language, I provided a variety of functions for exactly this
reason. If you want to run a shell command, you can use (sh "command").
If you want to spawn a command with precise arguments, you use the
run function, which takes a command name and a list of arguments:
(run "command" '("arg1" "arg2" ...)).

There is a similar relationship between the functions open-command
and open-process, which provide an input or output pipe similar to popen:

;; shell command
(open-command "whatever --option=splat foo" "r")

;; precise arguments
(open-process "whatever" "r" '("--option=splat" "foo"))

"whatever" is passed to execvp at some point; so it is subject to PATH
searching, and the second list-of-strings argument becomes the argv[]
argument of execvp.

Implementation:

http://www.kylheku.com/cgit/txr/stream.c

Noob

Oct 26, 2014, 2:55:49 PM
On 25/10/2014 19:24, gaz...@shell.xmission.com (Kenny McCormack) wrote:

> Now, there is something called "posix_spawn", which seems to be designed to
> fill this niche, but this function seems to be shrouded in mystery. It
> exists on my system, but there is no man page for it.

The NPTL (Native POSIX Thread Library) does not seem to be well-documented.
https://www.kernel.org/doc/man-pages/missing_pages.html

Regards.

Barry Margolin

Oct 26, 2014, 4:47:55 PM
In article <927a22e6-8047-48f7...@googlegroups.com>,
Philip Guenther <guen...@gmail.com> wrote:

> On Saturday, October 25, 2014 10:24:35 AM UTC-7, Kenny McCormack wrote:
> > Basic thesis: It would be nice if Unix/Linux had a "spawn"
> > function/functionality - that would start up a program without the
> > overhead/bother/security-risks of system(). I.e., something similar to the
> > current functionality of fork-followed-by-exec, but, again, without the
> > complexity/risk of those system calls.
>
> Having seen enough code screw up error handling after fork(), I'll grant you
> the complexity point, but can you clarify your concern about the *risks* of
> fork/exec?

I guess he means the risk that programmers who can't handle the
complexity will use it, and thus screw it up as you say you've seen.

--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***

Rainer Weikusat

Oct 26, 2014, 4:56:22 PM
Kaz Kylheku <k...@kylheku.com> writes:
> On 2014-10-26, Rainer Weikusat <rwei...@mobileactivedefense.com> wrote:
>> gaz...@shell.xmission.com (Kenny McCormack) writes:
>>> Basic thesis: It would be nice if Unix/Linux had a "spawn"
>>> function/functionality - that would start up a program without the
>>> overhead/bother/security-risks of system(). I.e., something similar to the
>>> current functionality of fork-followed-by-exec, but, again, without the
>>> complexity/risk of those system calls.
>>
>> Why?
>
> It's useful to have functions which, like system and popen, hide the details of
> forking, execing, waiting, setting up pipes between parent and child (taking
> care of the error handling in all these) but which take a precise argument list
> in a manner similar to the exec functions, rather than a hacky shell
> command.

Even when omitting the 'just-in-case' generalization that using a
shell command as the argument to such a function will make it a bad way
to accomplish the largest possible number of tasks, such a function still
needs to make a lot of assumptions about its environment. Just to name a
few off the top of my head:

- how are currently open file descriptors supposed to be
handled?

- the signal mask is inherited. Some signals may need to be
unblocked. Which are they?

- are there possibly other children?

- how does the calling process prefer to do I/O? "I'm a sucker
and so are you" ie, "force use of stdio" doesn't quite cut it
in practice.

- is the calling process possibly handling SIGCHLD? If so, it
could eat a termination event the library routine wants to
wait for.

- how does error notification work?

> Sometimes you know the exact program you want to run, and the exact arguments,
> and don't need a clumsy interpreter in between to suck up CPU time and
> introduce security risks that arise from reacting to environment variables, or
> interpreting syntax in an unexpected way.

In rare cases, I have actually used sh -c to execute some command but
usually, I don't, because having an intermediate shell accomplishes nothing
beyond 'having an intermediate shell'.

> Yet, in those times, you also want a ready-made solution: you don't want to
> mess around with fork, exec*, waitpid, dup, fdopen, close, _exit,
> kill, ...

In a sufficiently complicated program, implementing higher-level
'process creation primitives' might make sense. In one I'm presently
looking at, there are

- start a 'sink' command (to sink data into it)

- start a 'source' command (producing data)

- start a 'source' command and capture fd #2

These three are implemented on top of a general 'start command on pipe'
subroutine. Then, there's a 'start command' subroutine which just runs a
command. On top of this exist a 'run command' subroutine which waits for
the command, prints some diagnostics in case of a non-zero exit code and
returns true or false as appropriate, and a 'run command
asynchronously' subroutine which doesn't wait for the command but
optionally, registers a closure with 'the coprocess subsystem' which
will be invoked after it [the command] terminated. Despite all of these,
there are nevertheless a few cases where the required functionality is
'special' enough that fork and exec are used in a way not already
covered.

In a simple program, using a library subroutine whose source is twice
the size of the program itself (hyperbole) because it's supposed to be
useful as an autonomous vacuum cleaner, heavy bomber, train engine, easy
chair, pillow and firewall is just bizarre nonsense.

Some problems are too ill-defined to enable sensible, generally useful
solutions and the solution to that is not to invent some insanely
complex behemoth capable of doing everything (the author happened to
think of) somehow but to accept that.

Rainer Weikusat

Oct 26, 2014, 5:09:14 PM
Barry Margolin <bar...@alum.mit.edu> writes:
> Philip Guenther <guen...@gmail.com> wrote:
>> On Saturday, October 25, 2014 10:24:35 AM UTC-7, Kenny McCormack wrote:
>> > Basic thesis: It would be nice if Unix/Linux had a "spawn"
>> > function/functionality - that would start up a program without the
>> > overhead/bother/security-risks of system(). I.e., something similar to the
>> > current functionality of fork-followed-by-exec, but, again, without the
>> > complexity/risk of those system calls.
>>
>> Having seen enough code screw up error handling after fork(), I'll grant you
>> the complexity point, but can you clarify your concern about the *risks* of
>> fork/exec?
>
> I guess he means the risk that programmers who can't handle the
> complexity will use it, and thus screw it up as you say you've seen.

That's sort-of a red herring: Humans are really good at getting
everything wrong which can be gotten wrong and will screw up something
else instead.

Barry Margolin

Oct 26, 2014, 6:16:22 PM
In article <877fzmc...@doppelsaurus.mobileactivedefense.com>,
So we might as well go back to assembly (or machine code), if it's just
as easy to get C or PHP or Lisp wrong as it is to screw up assembly.

Rainer Weikusat

Oct 27, 2014, 10:58:57 AM
Barry Margolin <bar...@alum.mit.edu> writes:
> Rainer Weikusat <rwei...@mobileactivedefense.com> wrote:
>> Barry Margolin <bar...@alum.mit.edu> writes:
>> > Philip Guenther <guen...@gmail.com> wrote:
>> >> On Saturday, October 25, 2014 10:24:35 AM UTC-7, Kenny McCormack wrote:
>> >> > Basic thesis: It would be nice if Unix/Linux had a "spawn"
>> >> > function/functionality - that would start up a program without the
>> >> > overhead/bother/security-risks of system(). I.e., something similar to
>> >> > the
>> >> > current functionality of fork-followed-by-exec, but, again, without the
>> >> > complexity/risk of those system calls.
>> >>
>> >> Having seen enough code screw up error handling after fork(), I'll grant
>> >> you
>> >> the complexity point, but can you clarify your concern about the *risks*
>> >> of
>> >> fork/exec?
>> >
>> > I guess he means the risk that programmers who can't handle the
>> > complexity will use it, and thus screw it up as you say you've seen.
>>
>> That's sort-of a red herring: Humans are really good at getting
>> everything wrong which can be gotten wrong and will screw up something
>> else instead.
>
> So we might as well go back to assembly (or machine code), if it's just
> as easy to get C or PHP or Lisp wrong as it is to screw up assembly.

If this was the only conceived benefit, yes: From a technical
standpoint, the language used to write broken code matters preciously
little and from an empirical standpoint, the code will be broken
regardless of the language it was written in.

More to the actual topic: posix_spawn takes six different arguments,
among them being two pointers to opaque attribute objects which have to
be initialized and destroyed dynamically (as everybody knows, humans
never 'screw up' explicit resource allocation/ deallocation). One of
them is a 'spawn attribute object' supporting at least 6 different
flags for changing the behaviour of the call and there are 14 functions
(whose possible errors need to be handled correctly) for managing
it. The other is a file action argument with 5 different functions
(whose errors have to be handled correctly), yet it is only expected to
'replace at least 50% of typical executions of fork().' (the glibc
posix_spawn implementation consists of more code than all the 'start a
command' routines I described in another posting combined).

Someone who seriously believes that 'fork and exec are too complex' and
that this must be the solution ought to have the wiring in his head
checked, as some signals have obviously been inverted there: it should
have been 'fork and exec are too simple and too easy to use BUT
posix_spawn fixes this'.


Rainer Weikusat

Oct 27, 2014, 11:04:54 AM
Rainer Weikusat <rwei...@mobileactivedefense.com> writes:

[...]

> More to the actual topic: posix_spawn takes six different arguments,
> among them being two pointers to opaque attribute


[...]

Silly joke I can resist here: Is it called posix_spawn because it spawns
posixes or because posixes spawned it?

Kaz Kylheku

Oct 27, 2014, 11:44:08 AM
Ironically, when I read your reply to me elsewhere in this thread
(see below) I thought about the reply, "why, we could address those concerns
with some clunky, 'Microsoftian' API function that takes a dozen arguments
where you specify numerous attributes and preferences."

Evidently, you don't like that either: there should be no simple wrapper that
leaves various cases unhandled, and there must be no complex wrapper that has
arguments for various things. We must open-code process spawning using numerous
lower level pieces, for ever and ever, amen.

(Myself, I object to the idiotic prefix "posix_". I mean, of course everything
in The Open Group's document is fucking POSIX something or other! The open function
is POSIX open, the isatty function is POSIX isatty, and so on. There has to be
a shorter way to avoid clashing with the "spawn" identifier than sticking
"posix_" on it. Heck, call it "process_spawn" or "job_spawn" or something
meaningful.)

KK>> It's useful to have functions which, like system and popen, hide the details of
KK>> forking, execing, waiting, setting up pipes between parent and child (taking
KK>> care of the error handling in all these) but which take a precise argument list
KK>> in a manner similar to the exec functions, rather than a hacky shell
KK>> command.
RW>
RW> Even when omitting the 'just-in-case' generalization that using a
RW> shell-command as argument so such a function will make it a bad way to
RW> accomplish the largest-possible number of tasks, such a function still
RW> needs to make a lot of assumptions about its environment. Just to name a
RW> few out of my head:
RW>
RW> - how are currently open file descriptors supposed to be
RW> handled?
RW>
RW> - the signal mask is inherited. Some signals may need to be
RW> unblocked. Which are they?
RW>
RW> - are there possibly other children?
RW>
RW> - how does the calling process prefer to do I/O? "I'm a sucker
RW> and so are you" ie, "force use of stdio" doesn't quite cut it
RW> in practice.
RW>
RW> - is the calling process possibly handling SIGCHLD? If so, it
RW> could eat a termination event the library routine wants to
RW> wait for.
RW>
RW> - how does error notification work?

wil...@wilbur.25thandclement.com

Oct 27, 2014, 3:30:05 PM
Even if there were a man page it would be suspect. Many manual pages on
Linux are not very well maintained, and increasingly they simply include the
POSIX specification verbatim.

http://linux-man-pages.blogspot.com/2014/01/announcing-posix1-2013-man-pages-for.html

The latter is often the worst case because it omits important implementation
behavior, or in rare cases the implementation doesn't actually behave
precisely as described by POSIX. Arguably it's better to simply have no man
page at all than a misleading one.

Linux isn't the only guilty implementation, either. OS X has this problem,
too: verbatim POSIX language not kept in-sync with the actual
implementation. Sometimes you get a blurb in the BUGS section about
implementation deviations; sometimes not.

OpenBSD, OTOH, rigorously maintains their manual pages. In fact, no changes
are accepted into the tree without accompanying patches to the relevant
manual pages. One could do much worse than explore POSIX and Unix
programming by using OpenBSD.

Ultimately there's no excuse for not using the Open Group publication
directly. It's freely downloadable in HTML form. The HTML Frames version is
easy to navigate, with multiple different indices--syscall, header, utility,
etc. And you can use grep(1) when hunting down something obscure.

Jorgen Grahn

Oct 27, 2014, 5:21:55 PM
On Mon, 2014-10-27, <wil...@wilbur.25thandClement.com> wrote:
> Noob <ro...@127.0.0.1> wrote:
>> On 25/10/2014 19:24, gaz...@shell.xmission.com (Kenny McCormack) wrote:
>>
>>> Now, there is something called "posix_spawn", which seems to be designed
>>> to fill this niche, but this function seems to be shrouded in mystery. It
>>> exists on my system, but there is no man page for it.
>>
>> The NPTL (Native POSIX Thread Library) does not seem to be
>> well-documented. https://www.kernel.org/doc/man-pages/missing_pages.html
>
> Even if there were a man page

And there is. Like I mentioned upthread, there are the POSIX man
pages for pthreads and stuff like that.

> it would be suspect.

Why?

> Many manual pages on
> Linux are not very well maintained, and increasingly they simply include the
> POSIX specification verbatim.

Citation needed. And not this one

> http://linux-man-pages.blogspot.com/2014/01/announcing-posix1-2013-man-pages-for.html

which seems to merely announce that the man-pages people got
permission to "distribute extracts from the latest version of the
POSIX.1 standard". Doesn't mean they'll throw out well-written
existing man pages.

> The latter is often the worst case because it omits important implementation
> behavior, or in rare cases the implementation doesn't actually behave
> precisely as described by POSIX. Arguable it's better to simply have no man
> page at all than a misleading one.

That's an overstatement IMO. But I agree to this extent: it would suck
to maintain an API, and not be able to usefully and safely extend it
because the documentation is carved in stone!

But perhaps the worst thing about those POSIX man pages is that they
don't read like man pages at all. None of that friendly Unix
informality.

Rainer Weikusat

Oct 27, 2014, 6:36:40 PM
Kaz Kylheku <k...@kylheku.com> writes:
> On 2014-10-27, Rainer Weikusat <rwei...@mobileactivedefense.com> wrote:
>> Rainer Weikusat <rwei...@mobileactivedefense.com> writes:
>>
>> [...]
>>
>>> More to the actual topic: posix_spawn takes six different arguments,
>>> among them being two pointers to opaque attribute
>>
>>
>> [...]
>>
>> Silly joke I can resist here: Is it called posix_spawn because it spawns
>> posixes or because posixes spawned it?
>
> Ironically, when I read your reply to me elsewhere in this thread
> (see below) I thought about the reply, "why we could address those concerns
> with some clunky, 'Microsoftian' API function that takes a dozen arguments
> where you specify numerous attributes and preferences."
>
> Evidently, you don't like that either: there should be no simple wrapper that
> leaves various cases unhandled, and there must be no complex wrapper that has
> arguments for various things. We must open-code process spawning using numerous
> lower level pieces, for ever and ever, amen.

Having a simple wrapper which handles a small subset of the possible
cases is perfectly fine, however, such a simple wrapper can't also be a
generic wrapper unless it's considered ok that it forces applications
into some arbitrarily restricted straitjacket. Eg, this is 'a simple
wrapper' (written in Perl) which integrates properly into the
environment it is used in:

sub start_cmd(@)
{
    my $pid;

    p_info('%s: \'%s\'', __func__, join(' ', @_));

    $pid = fork();
    given ($pid) {
        when (undef) {
            sys_warn('fork');
            return;
        }

        when (0) {
            no warnings;

            unblock_handled_signals();
            $SIG{PIPE} = 'DEFAULT';

            exec(@_);

            sys_warn('exec', $_[0]);
            _exit(1);
        }
    }

    return $pid;
}

[In case someone's not that familiar with Perl, a plain return returns
an undefined value, hence, the caller is informed about the error. Since
this is not a Perl newsgroup here, I've omitted hiding the Disgusting
Obscenity[tm] of using Perl prototypes for argument checking instead of
maximizing syntactic entropy].

There is, however, something very wrong with 'complex wrappers'
emulating the kind of 'process creation facilities' one can only use
after acquiring an academic degree in doing so and undergoing three
weeks of preparatory fasting and meditation which are the norm outside
of UNIX(*): 'Spawning a program' is not really a primitive operation but
one composed of three different tasks:

1. Create a new process.

2. Configure it suitably.

3. Run the program.

and the devil is in step 2): This requires running code performing
whatever operations need to be performed depending on both the intended
purpose of the new process and the present situation of the process
which created it. This is simple in UNIX(*) because each application can
just supply this code. It's hideously complicated otherwise because some
kind of programmable facility is needed here and this means some kind of
(however primitive) 'process configuration programming language' is
called for and - since this is usually also supposed to be fast - it
will bend humans to the needs of the machine so hard that it will make
assemblers look welcoming.

This is, of course, something the people who came up with fork and exec
hit on more or less by accident, because these were likely just
motivated to implement something simple which was nevertheless flexible
enough to get the job done.

wil...@wilbur.25thandclement.com

Oct 27, 2014, 6:45:06 PM
Jorgen Grahn <grahn...@snipabacken.se> wrote:
<snip>
> But perhaps the worst thing about those POSIX man pages is that they
> don't read like man pages at all. None of that friendly Unix
> informality.

The POSIX descriptions are literally structured just like manual pages. The
specification of each system call starts with a NAME section, then SYNOPSIS,
DESCRIPTION, RETURN VALUE, and ERRORS. Then there's an "informative" part,
with sections for EXAMPLES, APPLICATION USAGE, RATIONALE, FUTURE DIRECTIONS,
SEE ALSO, and CHANGE HISTORY.

The language uses phrases like "shall be", but that hardly detracts from the
clarity and ease of reading. I find POSIX descriptions to be just as
friendly (or not friendly) as the vendors' manual pages. They certainly tend
to be more consistent.

wil...@wilbur.25thandclement.com

Oct 27, 2014, 7:15:05 PM
Jorgen Grahn <grahn...@snipabacken.se> wrote:
> On Mon, 2014-10-27, <wil...@wilbur.25thandClement.com> wrote:
>> Noob <ro...@127.0.0.1> wrote:
>>> On 25/10/2014 19:24, gaz...@shell.xmission.com (Kenny McCormack) wrote:
>>>
>>>> Now, there is something called "posix_spawn", which seems to be
>>>> designed to fill this niche, but this function seems to be shrouded in
>>>> mystery. It exists on my system, but there is no man page for it.
>>>
>>> The NPTL (Native POSIX Thread Library) does not seem to be
>>> well-documented. https://www.kernel.org/doc/man-pages/missing_pages.html
>>
>> Even if there were a man page
>
> And there is. Like I mentioned upthread, there are the POSIX man pages
> for pthreads and stuff like that.
>
>> it would be suspect.
>
> Why?
>
>> Many manual pages on Linux are not very well maintained, and increasingly
>> they simply include the POSIX specification verbatim.
>
> Citation needed. And not this one
>
>> http://linux-man-pages.blogspot.com/2014/01/announcing-posix1-2013-man-pages-for.html
>
> which seems to merely announce that the man-pages people got permission to
> "distribute extracts from the latest version of the POSIX.1 standard".
> Doesn't mean they'll throw out well-written existing man pages.

I never said there existed no well written manual pages on Linux, or for the
kernel or glibc specifically.

And I'm not going to bother hunting down any references. Phrases like "not
very well maintained" and "increasingly" are based on my personal
experience, and should be taken as opinion. If I wanted to attempt to claim
an unequivocal fact, I'd have explicated an argument with suitable
references.

But if we agree that one dimension of well maintained is comprehensiveness,
I would note that the BSDs, OS X, Solaris, and AIX all document posix_spawn,
and have for quite some time. So there's a data point right there.

The link at https://www.kernel.org/doc/man-pages/missing_pages.html also
admits that "there is an existing, outdated set of pages supplied with glibc
that document the old LinuxThreads implementation".

I've also pointed out on this forum before that the glibc documentation for
connect(2) conflicts with the actual kernel behavior. Specifically,
"connectionless protocol sockets may use connect() multiple times to change
their association" is wrong because of the way Linux binds sockets to
devices. For example, connecting a UDP socket to 8.8.8.8 will fail if you
first connected it to 127.0.0.1.

Then you have dangerous recommendations like this (repeated by both OS X and
Linux):

For a portable version of timegm(), set the TZ environment variable
to UTC, call mktime(3) and restore the value of TZ.

which is not thread safe. In fact, timegm is useful precisely because of
this problem.

The Linux manual pages are an _independent_ project. Given how tedious the
job is, how many interfaces there are (both to document and to _track_),
and how small the team is (it appears only two people are doing the
work these days), I'm not surprised that the manual pages are in
comparatively poor shape.

But my benchmark is OpenBSD, where the manual pages are often considered
[citation needed] to be of exceptional quality because of 1) their code
commit rules; 2) their preference to document features through the manual
page system, rather than, e.g., info pages, text files (Linux
documentation/), or ad-hoc web pages; and 3) their active evolution of the
manual page ecosystem (http://mdocml.bsd.lv/).

James K. Lowden

unread,
Oct 27, 2014, 8:06:11 PM10/27/14
to
On Mon, 27 Oct 2014 22:36:36 +0000
Rainer Weikusat <rwei...@mobileactivedefense.com> wrote:

> This is simple in UNIX(*)

I was looking forward to the footnote.

> 'Spawning a program' is not really a primtive operation but
> one composed of three different tasks

Trenchant insight. I hadn't seen it expressed that way before.

> This is, of course, something the people who came up with fork and
> exec hit on more or less by accident, because these were likely just
> motivated to implement something simple which was nevertheless
> flexible enough to get the job done.

I don't know who you mean by "the people", but that separation didn't
originate in Murray Hill.

"A good example is the separation of the fork and exec
functions. The most common model for the creation of new processes
involves specifying a program for the process to execute; in Unix, a
forked process continues to run the same program as its parent until it
performs an explicit exec. The separation of the functions is certainly
not unique to Unix, and in fact it was present in the Berkeley
time-sharing system [2], which was well-known to Thompson."

--http://cm.bell-labs.com/who/dmr/hist.html

--jkl

[2] L. P. Deutsch and B. W. Lampson, `SDS 930 Time-sharing System
Preliminary Reference Manual,' Doc. 30.10.10, Project Genie, Univ. Cal.
at Berkeley (April 1965).

Rainer Weikusat

unread,
Oct 28, 2014, 11:08:57 AM10/28/14
to
"James K. Lowden" <jklo...@speakeasy.net> writes:
> Rainer Weikusat <rwei...@mobileactivedefense.com> wrote:

[...]

>> This is, of course, something the people who came up with fork and
>> exec hit on more or less by accident, because these were likely just
>> motivated to implement something simple which was nevertheless
> >> flexible enough to get the job done.
>
> I don't know who you mean by "the people", but that separation didn't
> originate in Murray Hill.
>
> "A good example is the separation of the fork and exec
> functions. The most common model for the creation of new processes
> involves specifying a program for the process to execute; in Unix, a
> forked process continues to run the same program as its parent until it
> performs an explicit exec. The separation of the functions is certainly
> not unique to Unix, and in fact it was present in the Berkeley
> time-sharing system [2], which was well-known to Thompson."
>
> --http://cm.bell-labs.com/who/dmr/hist.html
>
> --jkl
>
> [2] L. P. Deutsch and B. W. Lampson, `SDS 930 Time-sharing System
> Preliminary Reference Manual,' Doc. 30.10.10, Project Genie, Univ. Cal.
> at Berkeley (April 1965).

http://bitsavers.informatik.uni-stuttgart.de/pdf/sds/9xx/940/

Comparing the 'Berkeley Time-sharing System' with UNIX(*) is somewhat
difficult[*] because its underlying concepts are too alien. In this system,
a fork is essentially a coroutine capable of executing concurrently with
other forks which may or may not share memory with 'its controlling
program'. Independently of this, a user can dump/save areas of memory to
'files' which can later be reloaded into memory, and control can be
transferred to them, either to a 'starting location' determined when the
memory dump was saved or to an explicitly provided one.

Conceptually, fork/exec can be regarded as providing a subset of the
functionality available in the other system, but that's IMHO stretching the
similarity quite a bit: A UNIX(*) process is ultimately associated
with 'a program file' whose code it executes (at some point in time),
while a 'BTS program' could be composed of multiple forks sharing all
of their memory but each executing code coming from different files. In
reality, the distinction is not that sharp anymore because of 'dynamic
linking', but users are still neither supposed to nor usually capable of
assembling executable 'core images' by manually loading files into
memory at addresses they have to manage themselves, with multiple
active forks executing parts of this code concoction at their
discretion.

At the heart of the UNIX(*) model sits the idea that programs are stored
in files. Among other things, files can be 'executed', but how exactly this
happens is part of the system's implementation which users usually
don't have to deal with (when using something other than machine code
for writing programs, that is). Originally, 'a process' was associated
with a terminal and thus, a user using this terminal. Executing a
program meant the shell executed a bootstrap routine which replaced
itself with the code of the new program in 'the process'. It (the shell)
would then later be reloaded. With the addition of 'background
processes' this changed such that a new process also inhabited by the
shell was created and anything else continued to work as it did before.

IMHO, a good metaphor for 'fork' is 'a fork in a river': Water from the
same source now flows in two different directions.

[*] Somewhat interesting aside: BTS reserved 'file numbers 0 and 1' for
'controlling teletype input and output' and 'file number 2' for a
'discard everything' output sink (aka /dev/null). Naturally, this
became the standard error output :->. Something I also especially liked was
the statement that 'a user has to have peripheral status in order to be
allowed to use the printer'.

Xavier Roche

unread,
Oct 28, 2014, 2:26:54 PM10/28/14
to
On 27/10/2014 15:58, Rainer Weikusat wrote:
> Someone who seriously believes that 'fork and exec are too complex' and
> that this must be the solution ought to have the wiring in his head
> checked as some signals have obviously been inverted there: It should
> have been fork and exec are too simple and too easy to use BUT
> posix_spawn fixes this.

First of all, I'm not convinced the pthread_atfork() code would be
bigger than the fork/exec part (including kernel code) if it was
correctly written (ie. not emulated with a clone call)

With fork, things can go horribly wrong. pthread_atfork() lets you
define arbitrary callbacks before/after fork, and any random library can
add its own handlers that do fancy things. The problem is that the child
callback must be carefully written, because fork() can typically be
called within a signal handler (say, to call an external debugger in
case of a crash).

Besides, between the child's birth and the exec, the whole process
memory is switched to copy-on-write (COW), impacting performance in
intensive multithreaded applications. Yes, yes, you are supposed to exec
"as soon as possible".

This is really stupid: we are cloning the universe (memory regions,
including mutexes, file descriptors, etc.), running a thread at the same
position (but not creating the other ones), so that we can immediately
drop everything in the trash (memory regions, and generally also file
descriptors) to run another process through exec().

Do people really think this is a sane solution?

Now there's another issue: how do you handle open file descriptors?
This has been a recurring rant in this group, but unfortunately there is
no final answer: fopen() only recently introduced the "e" flag
(set O_CLOEXEC), and NOBODY is using it in the wild. Yes, I know, too
bad for bad programmers. But try to look at real code, and come back and
tell me how many fopen(.., "rb") and how many fopen(.., "rbe") you have
seen.

The consequence is that you can either say "these are bad programmers,
that should set O_CLOEXEC" and NOT solve the issue, or do what everybody
does: (random pick)
https://www.redhat.com/archives/libguestfs/2014-July/msg00074.html

That is, fetch sysconf(_SC_OPEN_MAX) and, in the child process,
close() everything from fd=3 to this upper limit.

Try a Google search: this code template is everywhere, and of course it
is not clean, and it amplifies the COW performance impact described before.

In many situations, we only want to create a new process whose parent
is us and whose inherited file descriptors can be chosen (generally just
the stdio ones).

posix_spawn() was a good idea, but the implementation is generally just
another fork/exec underneath, and it has the same file descriptor issue
(except that here you cannot solve it, because the idiotic API does not
let you close everything behind the curtains).

/Rant.

Rainer Weikusat

unread,
Oct 28, 2014, 3:25:11 PM10/28/14
to
Xavier Roche <xro...@free.fr.NOSPAM.invalid> writes:
> On 27/10/2014 15:58, Rainer Weikusat wrote:

[some details about the posix_spawn interface]

>> Someone who seriously believes that 'fork and exec are too complex' and
>> that this must be the solution ought to have the wiring in his head
>> checked as some signals have obviously been inverted there: It should
>> have been fork and exec are too simple and too easy to use BUT
>> posix_spawn fixes this.
>
> First of all, I'm not convinced the pthread_atfork() code would be
> bigger than the fork/exec part (including kernel code) if it was
> correctly written (ie. not emulated with a clone call)

I have no idea what this is supposed to mean here. pthread_atfork is an
abandoned part of the pthreads-specification and generally, 'things tend
to go horribly wrong' once that comes close to fork, as discussed here
quite a few times already.

> With fork, things can go horribly wrong. pthread_atfork() lets you
> define arbitrary callbacks before/after fork, and any random library can
> add its own handler that do fancy things. The problem is that the child
> callback must be carefully written, because fork() can be typically
> called within a signal handler (say, to call an external debugger in
> case of crash)

As far as I can parse that, it seems to mean "using pthread_atfork
handlers correctly in libraries is very difficult because the author of
the library code doesn't know anything about the context his
pthread_atfork handlers will run in in a real
application". "Relation to fork is there none".

> Besides, between the child's birth and the exec, the whole memory
> process is switched to copy-on-write (COW), impacting performances in
> intensive multithreaded applications. Yes, yes, you are supposed to exec
> "as soon as possible".

It's also impacting the performance of single-threaded applications,
even completely unrelated ones: Any system resources used for A aren't
available for B at the same time.

I think you intended to write something a la: 'COW is supposed to
eliminate physical copying of pages provided the child executes
something soon' (that is, if the child runs before the parent, doesn't
block on anything, isn't suddenly interrupted by a higher priority
process, doesn't have to wait for I/O, ...) because it is conjectured
that 'the child' won't touch much of the inherited address space and
conjectured that the parent won't either. But there's no basis for these
conjectures, and multi-threading doesn't make the matter any worse: a
single time slice is sufficient to make quite a lot of memory accesses.

But this 'battle of the conjectures' is really just hot air:
Pathological cases will require extraordinary means for dealing with
them. But they shouldn't be used as yardstick for the means for dealing
with non-pathological cases as this makes insane user interfaces. Eg,
breathing underwater requires quite a bit of fairly heavy gear but no
one would seriously consider carrying that around all the time just
because he could theoretically fall from a cliff.

> This is really stupid: we are cloning the universe (memory regions,
> including mutexes, file descriptors, etc.), run a thread at the same
> position (but not creating the other ones), so that we can immediately
> drop everything in the trash (memory regions, and generally also file
> descriptors) to run another process through exec().
>
> People do really think this is the sane solution, really ?

Yes. Ignoring your apparent gripes with the pthread-specification:
Usually, some code will need to run to ready a newly created process
before another program is executed and this code can be part of the
application which forked: Who else can conceivably know what to do in
this particular situation than said application?

This may create 'a performance problem' (as may anything else done by a
computer) but it demonstrably worked well enough on 16-bit systems
running at clock speeds below (and well below) 20 MHz with so little main
memory that a forked process couldn't even be copied in memory but had to
be written to 'a disk'. Considering technical developments since then, I
don't expect to encounter a 'fork performance problem' in the real world
anytime soon (and I'm including 'fork happy' applications with a RSS of
a few G).

Rainer Weikusat

unread,
Oct 28, 2014, 6:51:17 PM10/28/14
to
Rainer Weikusat <rwei...@mobileactivedefense.com> writes:

[...]

> Usually, some code will need to run to ready a newly created process
> before another program is executed and this code can be part of the
> application which forked: Who else can conceivably know what to do in
> this particular situation than said application?

As an afterthought: Logically, the process configuration code really
belongs to the application which created the process and wants to use it
to run a different program. A 'high-performance' interface for that
could look like this:

- provide a system call which creates a new process which is
initially not runnable because there's no code to run

- provide APIs for each and every manipulation an application
could possibly want to perform on this process prior to
'letting it run'

- provide a system call which instructs the kernel to run
program ... in process ...

OTOH, that's going to be a lot more work than the usual 'crude subset
available via crude configuration language', much more difficult to use
than the fork-based approach because there'll essentially be two
interfaces for everything, one for 'do it to the current process' and
one for 'do it to process ...'. The former could obviously be a
special-case implemented on top of the latter but then, all system calls
available in this way would be affected as they'd all need to take an
additional argument. It's also a bit silly because all of the
configuration code will only affect the new process so why not use that
to run it while the old one can continue with whatever else it was doing
before deciding to start a program.

James K. Lowden

unread,
Oct 29, 2014, 2:46:31 PM10/29/14
to
On Tue, 28 Oct 2014 19:26:53 +0100
Xavier Roche <xro...@free.fr.NOSPAM.invalid> wrote:

> This is really stupid: we are cloning the universe (memory regions,
> including mutexes, file descriptors, etc.), run a thread at the same
> position (but not creating the other ones), so that we can immediately
> drop everything in the trash (memory regions, and generally also file
> descriptors) to run another process through exec().

Plan 9 supplies rfork. It takes an argument that controls which
resources are acquired by the child. I think that goes a long way to
answer your concern about unnecessary duplication when fork is
immediately followed by exec.

I suppose the Linux equivalent, in some sense, is clone(2). clone is
certainly more complicated. It's not clear that it need be.

--jkl

Rainer Weikusat

unread,
Oct 29, 2014, 5:08:18 PM10/29/14
to
"James K. Lowden" <jklo...@speakeasy.net> writes:
> On Tue, 28 Oct 2014 19:26:53 +0100
> Xavier Roche <xro...@free.fr.NOSPAM.invalid> wrote:
>
>> This is really stupid: we are cloning the universe (memory regions,
>> including mutexes, file descriptors, etc.), run a thread at the same
>> position (but not creating the other ones), so that we can immediately
>> drop everything in the trash (memory regions, and generally also file
>> descriptors) to run another process through exec().
>
> Plan 9 supplies rfork. It takes an argument that controls which
> resources are acquired by the child. I think that goes a long way to
> answer your concern about unnecessary duplication when fork is
> immediately followed by exec.

The 'plan 9' rfork is fairly puny in this regard, cf

http://plan9.bell-labs.com/magic/man2html/2/fork

The only flag relevant for something mentioned so far is

RFCFDG If set, the new process starts with a clean file descriptor table.

but this just adds "don't inherit any file descriptors" as an alternate
option to "inherit all file descriptors", which is not what was demanded
("inherit just the RIGHT file descriptors by default" (surely, that's
just 0, 1, 2)).

This seems to be more about enabling support for multi-threading and/or
'namespace-based process isolation'.

James K. Lowden

unread,
Oct 29, 2014, 10:43:34 PM10/29/14
to
On Wed, 29 Oct 2014 21:08:14 +0000
Rainer Weikusat <rwei...@mobileactivedefense.com> wrote:

> "James K. Lowden" <jklo...@speakeasy.net> writes:
> > On Tue, 28 Oct 2014 19:26:53 +0100
> > Xavier Roche <xro...@free.fr.NOSPAM.invalid> wrote:
> >
> >> This is really stupid: we are cloning the universe (memory regions,
> >> including mutexes, file descriptors, etc.), run a thread at the
> >> same position (but not creating the other ones), so that we can
> >> immediately drop everything in the trash (memory regions, and
> >> generally also file descriptors) to run another process through
> >> exec().
> >
> > Plan 9 supplies rfork. It takes an argument that controls which
> > resources are acquired by the child. I think that goes a long way
> > to answer your concern about unnecessary duplication when fork is
> > immediately followed by exec.
>
> The 'plan 9' rfork is fairly puny in this regard, cf
>
> http://plan9.bell-labs.com/magic/man2html/2/fork
>
> The only flag relevant for something mentioned so far is
>
> RFCFDG If set, the new process starts with a clean file
> descriptor table.

OK, if you say so. I would say RFCENVG counts, too. It seemed to me
that, of the above complaints, duplicating the file descriptors was the
biggest, although I'm not aware of any evidence to support it as a
practical matter. With COW an immediate exec is cheap wrt memory
segments; there must be *some* context from which to launch the new
process. We're a long way from the day when fork meant allocating from
swap.

Plan 9 doesn't inherit the misbegotten process/thread dichotomy, and
rfork is the only way to create "threads of control" of any kind. One of
the papers mentions that apart from "canonical fork", it's hard to find
two uses of rfork with exactly the same flags, which they take to be an
indication that the option set is both useful and nonredundant.

I agree with you: fork & exec has the nice property of exposing the
entire syscall interface to the post-fork, pre-exec environment in a
way that is absolutely consistent and unspecial. Everything else is
*needlessly* complicated, even if it seems to be more efficient or better
somehow. To the extent that fork per se does needless work, that work
can be parameterized, which is just what rfork does. I imagine
that's what the OpenBSD folks thought, too.
(http://nixdoc.net/man-pages/openbsd/man2/rfork.2.html)

I am surprised Plan 9 stayed with the fork-returns-pid model, instead
of fork-returns-descriptor. Or, for that matter, why not make a pid a
special kind of descriptor, one that identifies a process (for wait,
kill), and answers to read/write/select/etc? To write to a child,
write to its pid. To write to a parent, write to your own pid.
waitpid becomes read(2).

--jkl

Philip Guenther

unread,
Oct 30, 2014, 1:24:31 AM10/30/14
to
On Wednesday, October 29, 2014 7:43:34 PM UTC-7, James K. Lowden wrote:
...
> I agree with you: fork & exec has the nice property of exposing the
> entire syscall interface to the post-fork, pre-exec environment in a
> way that is absolutely consistent and unspecial. Everything else is
> *needlessly* complicated, even if it seems to more efficient or better
> somehow. To the extent that fork per se does needless work, that work
> can be parameterized, which is just what rfork does. I imagine
> that's what the OpenBSD folks thought, too.
> (http://nixdoc.net/man-pages/openbsd/man2/rfork.2.html)

That site is badly out of date, with manpages from 11 years ago. rfork(2) was removed from OpenBSD a bit over 2 years ago after its only use was replaced with a new syscall. For current manpages, use
http://www.openbsd.org/cgi-bin/man.cgi

Note that plan9's rfork(RFPROC|RFMEM) creates "threads" that don't share stacks, so you can't, for example, usefully link a structure on the stack into a shared data structure. OpenBSD's rfork() stopped doing that many years ago in order to support POSIX thread semantics. But at that point, if the new thread can take a signal as it returns from the kernel for the first time (which we did see happen reproducibly in a real program) then you need to set at least the stack pointer and TCB pointer before returning, so I added a new syscall __tfork which passes those in. An alternative would have been to have rfork() block signals in the new threads, but I didn't find that as clean and efficient a design.

I bring up that long explanation to try to illustrate that rfork()'s simplicity is based on some basic design choices that unfortunately are incompatible with the direction that POSIX went.


> I am surprised Plan 9 stayed with the fork-returns-pid model, instead
> of fork-returns-descriptor. Or, for that matter, why not make a pid a
> special kind of descriptor, one that identifies a process (for wait,
> kill), and answers to read/write/select/etc? To write to a child,
> write to its pid. To write to a parent, write to your own pid.
> waitpid becomes read(2).

Newer versions of FreeBSD have added a family of syscalls around that idiom, I believe.


Philip Guenther

Rainer Weikusat

unread,
Oct 31, 2014, 12:54:17 PM10/31/14
to
I think the biggest complaint was that the new process is set up to run
the same application as the other and starting from the same memory
location, namely, the one fork returned to. Since there's no way to
predict what this application will do after the fork, its complete
environment has to be available to the new process, too. Assuming it
will actually execute another program 'soon', it will have inherited a
lot of baggage of no use for the new program which might even need to be
cleaned up prior to executing it.

This can be regarded as a red herring because most of any complicated
program is useless baggage when restricting one's view to any
identifiable subsystem of it: It's needed for something else. There's
also an issue of 'user' (programmer) convenience here: The system cannot
possibly know which parts of the present application will and won't be
useful in the new process, the only options are "provide none of it" and
"provide everything". I prefer the more convenient, 'Ok, I have the new
process now, let's move on to getting it ready for running ...' over
having to micromanage the details of its creation using some primitive
language designed just for that such that it 'starts' (in the sense that
control returns to my code) 'in a perfect shape'.

There's also a clash of the concepts here: Fork isn't half of
CreateProcess, it's a system call which creates a new process running
the same application as the one which forked and using the new process
to run a different program is just one of the things one can do with it
(also, exec isn't restricted to being used in a newly forked process and
this comes with its own uses, eg, nohup, as a well-known though somewhat
dated example). Part of the 'problem' with fork is presumably something
like "OMG! All I want is CreateProcess which does exactly what I need
because I've grown accustomed to wanting exactly what it does and now I
have to use this outlandish shit" --- programmers are people and they're
not less prone to xenophobia just because it doesn't make sense.

In addition to this, the problem of preventing unwanted inheritance of
file descriptors was mentioned, based on the usual worst case assumption
of an unknown number of threads running unknown code and all happily
opening, closing and forking away with no rhyme or reason: This doesn't
work and the only solution is "Don't do that". There are non-portable
ways to close all open file descriptors, eg, by reading /proc/self/fd on
Linux. Again, this is also a question of convenience: Inheriting some
file descriptors is usually intended, the kernel cannot possibly know
which (the '0, 1 & 2' convention exists only in userspace) and inheriting
more file descriptors than necessary is usually harmless ('usually'
supposed to refer to a short running process working on some
well-defined task).

[...]

> I am surprised Plan 9 stayed with the fork-returns-pid model, instead
> of fork-returns-descriptor. Or, for that matter, why not make a pid a
> special kind of descriptor, one that identifies a process (for wait,
> kill), and answers to read/write/select/etc?

This cannot possibly work: File descriptors are only meaningful in
relation to some specific 'open files table' but processes may need to
control completely unrelated other processes. It's possible to provide
some special file which can be opened (eg, /proc/<pid>/ctl) in order to
obtain a process-control file descriptor but that's prone to the same
kind of race conditions as using 'process IDs' obtained in some other
way.

James K. Lowden

unread,
Oct 31, 2014, 3:44:23 PM10/31/14
to
On Fri, 31 Oct 2014 16:54:12 +0000
[...]
> This can be regarded as a red herring because most of any complicated
> program is useless bagagge when restricting one's view to any
> identifiable subsystem of it

[rest of fine explanation omitted]

Thanks. I hadn't thought much about the impetus for posix_foo as a
function of "do in Linux as ye did in Windows". Not to resume our
positions on the ramparts, but I think there's a case to be made along
those lines for threads generally: if "processes" are too "heavy" for
some kind of work, why not lighten them? Why invent (or ape) an
entirely different paradigm when shared memory already exists?

> > I am surprised Plan 9 stayed with the fork-returns-pid model,
> > instead of fork-returns-descriptor. Or, for that matter, why not
> > make a pid a special kind of descriptor, one that identifies a
> > process (for wait, kill), and answers to read/write/select/etc?
>
> This cannot possibly work: File descriptors are only meaningful in
> relation to some specific 'open files table'

That's not quite true, or at least oversimplified, because sockets.

You will be glad to know you are not discussing the topic with someone
who would suggest solutions that cannot possibly work. ;-)

Let us imagine a new call -- let us call it "pfork", as in "pipe" --
that would automatically set up a bi-directional pipe between parent and
child. The pid serves as the descriptor for each end: child pid
is the parent's descriptor; parent's pid is the child's descriptor.

I realize this violates both expectation and implementation. File
descriptors are small integers that are indexes into the process's
descriptor table, and programmers expect them to be assigned
sequentially. I don't see anything sacrosanct or even particularly
important about either.

I think it is a truth generally acknowledged that the fork-pid-wait
triad was a botch, that life would be much better if fork had returned
a descriptor, and open/read/write/close could be used for process
control as well as IPC. This is one way there. I'm only surprised
Plan 9 didn't seize on the opportunity to make it so.

--jkl

wil...@wilbur.25thandclement.com

unread,
Oct 31, 2014, 4:45:06 PM10/31/14
to
James K. Lowden <jklo...@speakeasy.net> wrote:
<snip>
> I think it is a truth generally acknowledged that the fork-pid-wait
> triad was a botch, that life would be much better if fork had returned
> a descriptor, and open/read/write/close could be used for process
> control as well as IPC. This is one way there. I'm only surprised
> Plan 9 didn't seize on the opportunity to make it so.
>

Well, it's still a possibility:

https://www.cl.cam.ac.uk/research/security/capsicum/

Capsicum was added to FreeBSD 9.0. So FreeBSD now provides process
descriptors and fork-wait-kill variants:

https://www.freebsd.org/cgi/man.cgi?query=procdesc&sektion=4

A Google employee has also ported it to Linux:

http://www.cl.cam.ac.uk/research/security/capsicum/linux.html

With some luck it will be mainlined. Then hopefully it will spread more
widely.

Philip Guenther

unread,
Oct 31, 2014, 4:57:35 PM10/31/14
to
On Friday, October 31, 2014 12:44:23 PM UTC-7, James K. Lowden wrote:
> On Fri, 31 Oct 2014 16:54:12 +0000 Rainer Weikusat wrote:
>> James K. Lowden writes:
...
> > > I am surprised Plan 9 stayed with the fork-returns-pid model,
> > > instead of fork-returns-descriptor. Or, for that matter, why not
> > > make a pid a special kind of descriptor, one that identifies a
> > > process (for wait, kill), and answers to read/write/select/etc?
> >
> > This cannot possibly work: File descriptors are only meaningful in
> > relation to some specific 'open files table'
>
> That not quite true, or at least oversimplified, because sockets.

Saying "a PID is a valid file-descriptor number in the parent" confuses the namespaces. Every process has its own fd 'namespace' (the process's open files table), but PIDs are from a namespace that covers many (or all) processes; are you suggesting that fork() will fail if all the free PIDs correspond to fds already in use in this process?


> Let us imagine a new call -- let us call it "pfork", as in "pipe" --
> that would automatically set up a bi-directional pipe between parent and
> child. The pid serves as the descriptor for each end: child pid
> is the parent's descriptor; parent's pid is the child's descriptor.

The one UNIX OS I know of which has a "create a process and return an fd for operating on it", FreeBSD, doesn't do that: its pdfork() allocates an fd number just like open() or socket() would, and a separate call (pdgetpid()) is provided to get the PID of the process associated with that process descriptor.


> I realize this violates both expectation and implementation. File
> descriptors are small integers that are indexes into the process's
> descriptor table, and programmers expect them to be assigned
> sequentially. I don't see anything sacrosanct or even particularly
> important about either.

And the benefit of confusing PIDs and fds is what? Note that the mapping is broken as soon as a process calls dup() on one of these, so you can't depend on it anyway.


> I think it is a truth generally acknowledged that the fork-pid-wait
> triad was a botch, that life would be much better if fork had returned
> a descriptor, and open/read/write/close could be used for process
> control as well as IPC.

[citation needed]

(I.e., "no, I don't think that's generally acknowledged. Please back up that assertion.")


Philip Guenther

James K. Lowden

unread,
Oct 31, 2014, 8:11:26 PM10/31/14
to
On Fri, 31 Oct 2014 13:40:06 -0700
<wil...@wilbur.25thandClement.com> wrote:

> Capsicum was added to FreeBSD 9.0. So FreeBSD now provides process
> descriptors and fork-wait-kill variants:
>
> https://www.freebsd.org/cgi/man.cgi?query=procdesc&sektion=4

Thanks! Didn't know about that.

If I were going to do it that way, though, I'd think I'd prefer

pid_t pdfork(int* pfd);

where pfd receives a file handle if it is not NULL. Then fork becomes

#define fork() pdfork(NULL)

and pdgetpid() is unnecessary.

It's interesting that the descriptor returned by pdfork doesn't connect
to, well, anything afaict. You can close it to kill (sort of), and
poll it to be alerted to POLLHUP. Not exactly what I had in mind.

--jkl

James K. Lowden

unread,
Oct 31, 2014, 8:11:30 PM10/31/14
to
On Fri, 31 Oct 2014 13:57:31 -0700 (PDT)
Philip Guenther <guen...@gmail.com> wrote:

> On Friday, October 31, 2014 12:44:23 PM UTC-7, James K. Lowden wrote:
> > Let us imagine a new call -- let us call it "pfork", as in "pipe" --
> > that would automatically set up a bi-directional pipe between
> > parent and child. The pid serves as the descriptor for each end:
> > child pid is the parent's descriptor; parent's pid is the child's
> > descriptor.
>
> are you suggesting that fork() will fail if all the free PIDs
> correspond to fds already in use in this process?

That's not really a problem in practice, right? I've seen processes
exhaust their file handles or hit their process ulimit. I've never
heard of a system running out of pids. It's hard to imagine 2^32
processes, much less 2^64.

> And the benefit of confusing PIDs and fds is what? Note that the
> mapping is broken as soon as a process calls dup() on one of these,
> so you can't depend on it anyway.

The benefit is that one number does double-duty with nothing lost. You
can wait for a pid, write to it, read from it, or signal it. I don't
see a good reason to use two variables to track one thing, tradition
notwithstanding. Eventually wait & friends go the way of creat.

I'm not sure I follow you wrt dup. You're right (I guess, if that's
what you mean) that you couldn't treat any arbitrary fd as a pid.
You'd have to know what you're doing and what you're dealing with,
always recommended.

The mapping isn't broken; it's 1:1 for those fds that are pids. Or we
choose different semantics: dup a pid and you get an fd that also
refers to the process, just like the original. A process then becomes
something more like the mysterious "file entry" object to which file
descriptors refer but are not themselves accessible.

> > I think it is a truth generally acknowledged that the fork-pid-wait
> > triad was a botch, that life would be much better if fork had
> > returned a descriptor, and open/read/write/close could be used for
> > process control as well as IPC.
>
> [citation needed]

Oh, fair enough. I'm not sure it's an assertion if it's qualified with
"I think". I'll dial it down by any factor you please. I was being
dramatic for your entertainment, and to draw contradiction if there's a
good argument against (because I don't know of one).

I think the argument for pids is that you escape the cost of an IPC
pipe for parent/child relationships that don't require it. The
argument against is that you buy complexity in the form of a few
special-purpose functions whose only job is to manage processes and
whose functionality could be subsumed into read/write/select.

A perfect example is wait4. Were pids to disappear tomorrow and be
replaced with fds, wait4 has no I/O counterpart. But really that
exposes a more basic fault, that the Byzantine process group is not
fully accessible. (You can see what group you're in. You cannot get a
list of processes in your group. Make that information available,
and select does the work of wait4 without the magic.)

I doubt the efficiency argument, and I favor simplicity whenever the
opportunity presents itself. I doubt setting up a pipe in a
specialized kernel function costs 1 ms these days, or requires much more
than 100 bytes of bookkeeping. Probably the overhead mattered on a
PDP-11. It just doesn't anymore.

--jkl

Philip Guenther

unread,
Nov 1, 2014, 1:56:25 AM11/1/14
to
On Friday, October 31, 2014 5:11:30 PM UTC-7, James K. Lowden wrote:
> On Fri, 31 Oct 2014 13:57:31 -0700 (PDT)
> Philip Guenther wrote:
>
> > On Friday, October 31, 2014 12:44:23 PM UTC-7, James K. Lowden wrote:
> > > Let us imagine a new call -- let us call it "pfork", as in "pipe" --
> > > that would automatically set up a bi-directional pipe between
> > > parent and child. The pid serves as the descriptor for each end:
> > > child pid is the parent's descriptor; parent's pid is the child's
> > > descriptor.
> >
> > are you suggesting that fork() will fail if all the free PIDs
> > correspond to fds already in use in this process?
>
> That's not really a problem in practice, right? I've seen processes
> exhaust their file handles or hit their process ulimit. I've never
> heard of a system running out of pids. It's hard to imagine 2^32
> processes, much less 2^64.

PID_MAX ain't 2^32 on any system I've seen. (I don't know why you mention 2^64 unless you are currently using a system where 'int' is 64 bits, as 'int' is the relevant type for fds.)


> > And the benefit of confusing PIDs and fds is what? Note that the
> > mapping is broken as soon as a process calls dup() on one of these,
> > so you can't depend on it anyway.
>
> The benefit is that one number does double-duty with nothing lost. You
> can wait for a pid, write to it, read from it, or signal it. I don't
> see a good reason to use two variables to track one thing, tradition
> notwithstanding. Eventually wait & friends go the way of creat.

Okay, so you're passing this pid-but-fd value to read/write/close, why not dup() it to another fd? (Why do that? Why do you dup *any* fd? At least one of the reasons applies to *all* possible fd types! Think about it)


> I'm not sure I follow you wrt dup. You're right (I guess, if that's
> what you mean) that you couldn't treat any arbitrary fd as a pid.
> You'd have to know what you're doing and what you're dealing with,
> always recommended.
> The mapping isn't broken; it's 1:1 for those fds that are pids.

"You use it like an fd, except for these 5 places where you can't, and if you're unlucky and you get a child with pid 2, then you don't get to have stderr any more, so sorry!"


> Or we choose different semantics: dup a pid and you get an fd that also
> refers to the process, just like the original. A process then becomes
> something more like the mysterious "file entry" object to which file
> descriptors refer but are not themselves accessible.

You _do_ know that's how both BSD and Linux systems have handled all the not-really-a-file things that are handled via fds? kqueues, epoll instances, systrace handles, FreeBSD's process descriptors...


Philip Guenther

Nicolas George

unread,
Nov 1, 2014, 6:15:32 AM11/1/14
to
Philip Guenther , dans le message
<ddc979cb-f8f2-4b46...@googlegroups.com>, a écrit :
> Okay, so you're passing this pid-but-fd value to read/write/close, why not
> dup() it to another fd?

There is nothing to prevent you from dup()ing it.

> "You use it like an fd, except for these 5 places where you can't, and if

Have you ever heard of what happens when you call setsockopt() on a device?
Or ioctl() on a plain file?

> you're unlucky and you get a child with pid 2, then you don't get to have
> stderr any more, so sorry!"

FUD

Casper H.S. Dik

unread,
Nov 1, 2014, 6:46:18 AM11/1/14
to
Rainer Weikusat <rwei...@mobileactivedefense.com> writes:

>In addition to this, the problem of preventing unwanted inheritance of
>file descriptors was mentioned, based on the usual worst case assumption
>of an unknown number of threads running unknown code and all happily
>opening, closing and forking away with no rhyme or reason: This doesn't
>work and the only solution is "Don't do that". There are non-portable
>ways to close all open file descriptors, eg, by reading /proc/self/fd on
>Linux. Again, this is also a question of convenience: Inheriting some
>file descriptors is usually intended, the kernel cannot possibly know
>which (the '0, 1 & 2' convention exists only in userspace) and inheriting
>more file descriptors than necessary is usually harmless ('usually'
>supposed to refer to a short running process working on some
>well-defined task).

That's why Solaris defined "closefrom(int lowfd)" (close all fds >= lowfd)
and fdwalk() (calls a function for all open file descriptors)

It is, of course, a lot more efficient than
    for (fd = getdtablesize() - 1; fd >= lowfd; fd--)
            (void) close(fd);

and also avoids the problem that getdtablesize() may return a value
less than the largest open file descriptor, or may be a huge number
(there is no fundamental limit on the number of file descriptors for
a single process; millions just work)

Casper
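The closefrom() Casper describes has no portable equivalent, but on Linux the same effect is commonly approximated by walking /proc/self/fd, as mentioned earlier in the thread. A hedged sketch (my_closefrom is a made-up name, and since glibc 2.34 a real closefrom() wrapper exists):

```c
/* A sketch of Solaris-style closefrom() for Linux, walking
 * /proc/self/fd instead of looping up to getdtablesize(). */
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

static void my_closefrom(int lowfd)
{
    DIR *d = opendir("/proc/self/fd");
    if (d == NULL)
        return;                      /* no procfs: caller needs a fallback */

    int dfd = dirfd(d);              /* don't close the DIR's own fd */
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        char *end;
        long fd = strtol(e->d_name, &end, 10);
        if (*end == '\0' && fd >= lowfd && fd != dfd)
            close((int)fd);          /* "." and ".." fail the digit test */
    }
    closedir(d);
}
```

Only descriptors that are actually open get a close() call, which is the whole point of the interface.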

Xavier Roche

unread,
Nov 1, 2014, 11:08:38 AM11/1/14
to
Le 01/11/2014 11:45, Casper H.S. Dik a écrit :
> That's why Solaris defined "closefrom(int lowfd)"

Unfortunately, we had some ... cough cough cough cough cough cough
Ulrich Drepper cough cough cough cough cough cough .. issues adding it
to the GLIBC:

https://sourceware.org/bugzilla/show_bug.cgi?id=10353

James K. Lowden

unread,
Nov 1, 2014, 2:09:25 PM11/1/14
to
On Fri, 31 Oct 2014 22:56:21 -0700 (PDT)
Philip Guenther <guen...@gmail.com> wrote:

> > > are you suggesting that fork() will fail if all the free PIDs
> > > correspond to fds already in use in this process?
> >
> > That's not really a problem in practice, right? I've seen processes
> > exhaust their file handles or hit their process ulimit. I've never
> > heard of a system running out of pids. It's hard to imagine 2^32
> > processes, much less 2^64.
>
> PID_MAX ain't 2^32 on any system I've seen. (I don't know why you
> mention 2^64 unless you are currently using a system where 'int' is
> 64bits, as 'int' is the relevant type for fds.)

pid_t is an opaque type. If you're concerned about running out of file
descriptor space, there are bits enough to make that space bigger than
could be used by any hardware I can imagine.

PID_MAX on my ordinary machine is 30,000. I'm pretty sure it would be
outrun by a water-powered wheat mill before the sum of fds and pids
approached that figure, and I have 33 bits to spare.

I'm sure there are arguments against the noodle I'm suggesting. I
don't think *this* is one though. Am I missing something obvious?

> > The mapping isn't broken; it's 1:1 for those fds that are pids.
>
> "You use it like an fd, except for these 5 places where you can't,
> and if you're unlucky and you get a child with pid 2, then you don't
> get to have stderr any more, so sorry!"

You're too smart not to know the answer to that; I can't tell if you're
being intentionally obtuse or you're baiting me, neither of which I'm
expecting. So I'm not quite sure how to respond.

To state the obvious, the kernel assigning pids could surely start from
3. I guess it would be OK to keep init as 1, since it doesn't need
standard output.

I don't understand why you object to an fd that can do some things and
not others. You can't seek on a socket. You can't pass an fd over a
pipe (feh!). You can't select on a file. I don't see this as a
problem, neither a source of confusion nor a form of added complexity.

Let me put it to you this way: if I could demonstrate that my fd-as-pid
scheme could handle all the same functionality we presently have, but
with fewer function calls and a smaller sum of parameters passed over
the syscall interface, would you consider that beneficial? Because if
you wouldn't, we have different ideas of "good" and there's no
resolution to this discussion. (But it might be interesting to discuss
syscall goodness.)

> You _do_ know that's how both BSD and Linux systems have handled all
> the not-really-a-file things that are handled via fds? kqueues,
> epoll instances, systrace handles, FreeBSD's process descriptors...

Ayup. And I would say that all of them make the syscall interface
more complex. I don't see a need for another paradigm or different
semantics.

--jkl

Nicolas George

unread,
Nov 1, 2014, 2:51:57 PM11/1/14
to
"James K. Lowden" , dans le message
<20141101140911.b...@speakeasy.net>, a écrit :
> Let me put it to you this way: if I could demonstrate that my fd-as-pid
> scheme could handle all the same functionality we presently have, but

Does anyone actually doubt that?

> with fewer function calls and a smaller sum of parameters passed over
> the syscall interface, would you consider that beneficial?

I do not think the number of function calls and parameters is really
relevant.

(And actually, I believe that dedicated functions are better than abusing
generic functions like write().)

What really matters is that file descriptors BELONG to the process that is
using it. If you have "int fd = 42" somewhere, then fd 42 is really there,
it can not disappear unless your program actually closes it.

PIDs, on the other hand, belong to the corresponding process and its parent;
for any other process, this is just an integer with no special property.
This is really important because there will ALWAYS be a lapse between the
time a process can obtain a PID and the time it does something with it. And
in that lapse a race condition can happen: the target process can die on its
own and the number be reused.

The second reason process fds would be a good thing is the integration with
other events. If you have a dozen sockets, two ttys, three devices and two
dozens pipes, you can watch all that in a single poll() call (or any similar
call with more efficient but less portable APIs). But you can not integrate
a simple waitpid() into it. Even waiting for "any of these two processes" to
terminate is not easily possible, let alone mixing that with other kinds of
IPC.

Of course, you already knew it, but I did not see it stated clearly enough
elsewhere in the thread.

Rainer Weikusat

unread,
Nov 1, 2014, 3:49:08 PM11/1/14
to
Not really. 'You' had some (cough cough) trouble with the fact that

The assumption that a program knows all the open file
descriptors is simply invalid.

This is especially true for

"large multithreaded app[lications]" where there's
"no way in practice to control all the code running in the same
address space"

*If* there's actually a reason to close all open file descriptors and
it's not just

people (like myself) are maintaining library code
that starts new subprocesses, and they will continue to
indiscriminately close unknown file descriptors,

ie, "But I want to do that !!1 !!2", the first thing the process has to
do is to fork so that it can use exec without killing the current
application, then, start a trusted subprogram which can then, after all
the 'unknown code where god only knows what it does' is out of the way,
go on to determine which file descriptors are open and close
them. Adding a library function for the sole benefit of a single program
seems a bit wasteful. OTOH, since creating 'hidden children' may disrupt
the main logic of an application, a library shouldn't fork at all but
rather communicate with a dedicated server process in case some task has
to be completed outside of the environment of the current application.

BTW, invoking an (unknown) 'large, multi-threaded application running an
unknown number of threads created by inaccessible code with unknown
behaviour' is just a complicated/obfuscated way to state that "This is
necessary because it is unknown whether or not we can do without",

http://en.wikipedia.org/wiki/Argument_from_ignorance





Casper H.S. Dik

unread,
Nov 1, 2014, 7:05:10 PM11/1/14
to
He did not understand what the interface was doing.

Here's saying:

"No, it's a horrible idea. The assumption that a program knows all the
open file descriptors is simply invalid".

That's not the point of the interface; the point is to close all
file descriptors, efficiently.

Casper

Xavier Roche

unread,
Nov 2, 2014, 1:58:33 AM11/2/14
to
Le 02/11/2014 00:05, Casper H.S. Dik a écrit :
> He did not understand what the interface was doing.

Yes. And closed the bug as WONTFIX. But this is a common pattern.
(strlcpy et al.)

Xavier Roche

unread,
Nov 2, 2014, 2:04:10 AM11/2/14
to
Le 01/11/2014 20:49, Rainer Weikusat a écrit :
> Not really. 'You' had some (cough cough) trouble with the fact that
>
> The assumption that a program knows all the open file
> descriptors is simply invalid.

I don't care knowing them, I just want to close their copies.

> This is especially true for
>
> "large multithreaded app[lications]" where there's

You fail to see the point: this function is specifically intended for
a single-threaded process, before executing another program.

(Solaris) man closefrom:

USAGE
The act of closing all open file descriptors should be
performed only as the first action of a daemon process.
Closing file descriptors that are in use elsewhere in the
current process normally leads to disastrous results.

Full manual page:

Standard C Library Functions closefrom(3C)

NAME
closefrom, fdwalk - close or iterate over open file descriptors

SYNOPSIS
#include <stdlib.h>

void closefrom(int lowfd);

int fdwalk(int (*func)(void *, int), void *cd);

DESCRIPTION
The closefrom() function calls close(2) on all open file
descriptors greater than or equal to lowfd.

The effect of closefrom(lowfd) is the same as the code

#include <sys/resource.h>
struct rlimit rl;
int i;

getrlimit(RLIMIT_NOFILE, &rl);
for (i = lowfd; i < rl.rlim_max; i++)
        (void) close(i);

except that close() is called only on file descriptors that
are actually open, not on every possible file descriptor
greater than or equal to lowfd, and close() is also called
on any open file descriptors greater than or equal to
rl.rlim_max (and lowfd), should any exist.

The fdwalk() function first makes a list of all currently
open file descriptors. Then for each file descriptor in the
list, it calls the user-defined function, func(cd, fd),
passing it the pointer to the callback data, cd, and the
value of the file descriptor from the list, fd. The list is
processed in file descriptor value order, lowest numeric
value first.

If func() returns a non-zero value, the iteration over the
list is terminated and fdwalk() returns the non-zero value
returned by func(). Otherwise, fdwalk() returns 0 after
having called func() for every file descriptor in the list.

The fdwalk() function can be used for fine-grained control
over the closing of file descriptors. For example, the
closefrom() function can be implemented as:

static int
close_func(void *lowfdp, int fd)
{
        if (fd >= *(int *)lowfdp)
                (void) close(fd);
        return (0);
}

void
closefrom(int lowfd)
{
        (void) fdwalk(close_func, &lowfd);
}

The fdwalk() function can then be used to count the number
of open files in the process.

RETURN VALUES
No return value is defined for closefrom(). If close() fails
for any of the open file descriptors, the error is ignored
and the file descriptors whose close() operation failed
might remain open on return from closefrom().

The fdwalk() function returns the return value of the last
call to the callback function func(), or 0 if func() is
never called (no open files).

ERRORS
No errors are defined. The closefrom() and fdwalk()
functions do not set errno but errno can be set by close() or
by another function called by the callback function, func().

FILES
/proc/self/fd directory (list of open files)

USAGE
The act of closing all open file descriptors should be
performed only as the first action of a daemon process.
Closing file descriptors that are in use elsewhere in the
current process normally leads to disastrous results.

ATTRIBUTES
See attributes(5) for descriptions of the following attributes:

____________________________________________________________
| ATTRIBUTE TYPE | ATTRIBUTE VALUE |
|_____________________________|_____________________________|
| MT-Level | Unsafe |
|_____________________________|_____________________________|

SunOS 5.10 Last change: 27 Apr 2000 2
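The fdwalk() half of the page translates to Linux the same way, again via /proc/self/fd. A hedged sketch (my_fdwalk is a made-up name, and unlike the Solaris manual's promise, the enumeration order of /proc/self/fd is not formally guaranteed):

```c
/* A rough Linux analogue of the fdwalk() described above, invoking a
 * callback for each open descriptor found in /proc/self/fd. */
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>

static int my_fdwalk(int (*func)(void *, int), void *cd)
{
    DIR *d = opendir("/proc/self/fd");
    if (d == NULL)
        return -1;

    int dfd = dirfd(d), ret = 0;
    struct dirent *e;
    while (ret == 0 && (e = readdir(d)) != NULL) {
        char *end;
        long fd = strtol(e->d_name, &end, 10);
        if (*end == '\0' && fd != dfd)    /* skip ".", ".." and the DIR */
            ret = func(cd, (int)fd);
    }
    closedir(d);
    return ret;                 /* first non-zero value from func(), or 0 */
}

/* Example callback: count open descriptors, as the manual suggests. */
static int count_fd(void *cd, int fd)
{
    (void)fd;
    ++*(int *)cd;
    return 0;                   /* 0 = continue iterating */
}
```

As in the Solaris version, closefrom() falls out as a trivial callback over this walker.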


Philip Guenther

unread,
Nov 2, 2014, 3:51:02 AM11/2/14
to
On Saturday, November 1, 2014 11:09:25 AM UTC-7, James K. Lowden wrote:
> On Fri, 31 Oct 2014 22:56:21 -0700 (PDT)
> Philip Guenther wrote:
>
> > > > are you suggesting that fork() will fail if all the free PIDs
> > > > correspond to fds already in use in this process?
> > >
> > > That's not really a problem in practice, right? I've seen processes
> > > exhaust their file handles or hit their process ulimit. I've never
> > > heard of a system running out of pids. It's hard to imagine 2^32
> > > processes, much less 2^64.
> >
> > PID_MAX ain't 2^32 on any system I've seen. (I don't know why you
> > mention 2^64 unless you are currently using a system where 'int' is
> > 64bits, as 'int' is the relevant type for fds.)
>
> pid_t is an opaque type. If you're concerned about running out of file
> descriptor space, there are bits enough to make that space bigger than
> could be used by any hardware I can image.

Your declared intent is to pass these pid_t values as fds to read/write/close, which take those as 'int's, right?


> PID_MAX on my ordinary machine is 30,000. I'm pretty sure it would be
> outrun by a water-powered wheat mill before the sum of fds and pids
> approached that figure, and I have 33 bits to spare.

To quote out of date numbers from http://www.kegel.com/c10k.html#limits.filehandles

>>> I verified that a process on Red Hat 6.0 (2.2.5 or so plus patches)
>>> can open at least 31000 file descriptors this way. Another fellow has
>>> verified that a process on 2.2.12 can open at least 90000 file
>>> descriptors this way (with appropriate limits). The upper bound seems
>>> to be available memory.

(I suspect current systems limit PIDs to a much smaller range than 2^32 for user display and entry reasons: if all PIDs are <= 99999, for example, then ps, top, etc can use a relatively narrow column for the display, and PIDs will actually be short enough for people to remember well enough to type. Large numbers are not good identifiers for human use.)


> I'm sure there are arguments against the noodle I'm suggesting. I
> don't think *this* is one though. Am I missing something obvious?

I'm apparently unable to explain clearly why having overlapping/shared namespaces with substantially different allocation rules is a Bad Idea.


> > > The mapping isn't broken; it's 1:1 for those fds that are pids.
> >
> > "You use it like an fd, except for these 5 places where you can't,
> > and if you're unlucky and you get a child with pid 2, then you don't
> > get to have stderr any more, so sorry!"
>
> You're too smart not to know the answer to that; I can't tell if you're
> being intentionally obtuse or you're baiting me, neither of which I'm
> expecting. So I'm not quite sure how to respond.
>
> To state the obvious, the kernel assigning pids could surely start from
> 3. I guess it would be OK to keep init as 1, since it doesn't need
> standard output.

Better make that "pids start at 10" because complicated shell scripts regularly use at least single-digit fds by number.

Hmm, bash at least supports arbitrary fds. If a script runs a child in the background which gets assigned PID 10 (system was up long enough and the PIDs wrapped around) and then does
exec 10</some/file
that means bash will dup2() the new file to fd 10, which will implicitly close the pid-as-fd that it had from the fork. I guess that means it can't wait for that child any longer?


> I don't understand why you object to an fd that can do some things and
> not others. You can't seek on a socket. You can't pass an fd over a
> pipe (feh!). You can't select on a file. I don't see this as a
> problem, neither a source of confusion nor a form of added complexity.

For all of the existing things that are represented as fds, the exact number isn't important to the item itself, and they can be dup'ed and closed as necessary. fds are just indirect references to the true underlying item and not the item itself. Changing that, IMO, is a huge change to the semantics of the UNIX API. Like that bash example above: in the FreeBSD process-descriptor model, bash could just dup() the fd of the child to a higher value as needed, eliminating the conflict.


> Let me put it to you this way: if I could demonstrate that my fd-as-pid
> scheme could handle all the same functionality we presently have, but
> with fewer function calls and a smaller sum of parameters passed over
> the syscall interface, would you consider that beneficial?

That, by itself? No. I was only interested in this conversation because of the additional functionality that process-descriptors can provide: closing the PID race by putting the lifetime of the parent's ID for the child under the control of the parent, cleaner methods of waiting for a set of events that include process state changes, etc.


> Because if you wouldn't, we have different ideas of "good" and there's no
> resolution to this discussion. (But it might be interesting to discuss
> syscall goodness.)

Your goal is just to find a smaller set of calls to express the UNIX API? That's an interesting philosophical topic, but in practice it's *way* down the list of considerations when deciding how the kernel should provide functionality. To avoid this discussion being unending, I will henceforth not reply to this subthread regarding pids-as-fds, etc; you may have the last word.


> > You _do_ know that's how both BSD and Linux systems have handled all
> > the not-really-a-file things that are handled via fds? kqueues,
> > epoll instances, systrace handles, FreeBSD's process descriptors...
>
> Ayup. And I would say that all of them make the syscall interface
> more complex. I don't see a need for another paradigm or different
> semantics.

You don't see the need for another paradigm or different semantics, so...you're suggesting a completely new paradigm for pids-as-fds that's distinct from the common something-via-fd paradigm that's used by files, devices, pipes, sockets, kqueues, systrace handles, and process-descriptors? You're right, we do have different ideas of "good".


Philip Guenther

Richard Kettlewell

unread,
Nov 2, 2014, 4:56:15 AM11/2/14
to
Philip Guenther <guen...@gmail.com> writes:
> James K. Lowden wrote:

>> I'm sure there are arguments against the noodle I'm suggesting. I
>> don't think *this* is one though. Am I missing something obvious?
>
> I'm apparently unable to explain clearly why having overlapping/shared
> namespaces with substantially different allocation rules is a Bad
> Idea.

The weird thing about the number-space part of the suggestion is that’s
a completely unnecessary piece of complexity. If you want a thing like
fork() that returns a file descriptor, it can be like open(), pipe() etc
and just allocate the next free slot. There’s no advantage to making
the fd the same as the PID; the disadvantage is the disruption to the
existing PID number space, which now has to be bounded below by the
largest non-PID FD that any process will use, and you have no idea what
that bound is.

Also the larger your PIDs are the bigger your fd_set has to be if you
use select().

--
http://www.greenend.org.uk/rjk/
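The fd_set point is easy to demonstrate concretely: select()'s bitmap is compiled to a fixed size, so descriptors numbered like typical PIDs could never be watched at all. A sketch (the 1024 figure is glibc's default; fits_in_fd_set is an illustrative helper):

```c
/* fd_set is a fixed-size bitmap of FD_SETSIZE bits (1024 with glibc),
 * so a descriptor numbered like a typical PID (e.g. 30000) could not
 * be handed to select() at all. */
#include <assert.h>
#include <sys/select.h>

static int fits_in_fd_set(int fd)
{
    /* Calling FD_SET() with fd >= FD_SETSIZE is undefined behaviour. */
    return fd >= 0 && fd < FD_SETSIZE;
}
```

poll() and the epoll/kqueue family have no such ceiling, which is part of why they displaced select() for large descriptor counts.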

Xavier Roche

unread,
Nov 2, 2014, 7:21:03 AM11/2/14
to
Le 02/11/2014 08:04, Xavier Roche a écrit :
> Le 01/11/2014 20:49, Rainer Weikusat a écrit :
>> Not really. 'You' had some (cough cough) trouble with the fact that
>>
>> The assumption that a program knows all the open file
>> descriptors is simply invalid.
>
> I don't care knowing them, I just want to close their copies.

Specifically, only the C library (or, for the Linux case, the kernel)
needs to know what needs to be done.

(Yes, this is the kernel's job on the Linux side, and by the way a
closefrom syscall has been submitted recently on the kernel mailing-list
- not sure of its status though)

Rainer Weikusat

unread,
Nov 2, 2014, 1:28:11 PM11/2/14
to
Xavier Roche <xro...@free.fr.NOSPAM.invalid> writes:

NB: I really shouldn't reply to this as you weren't replying to my
posting but to two excepts from the glibc 'closefrom' thread I
integrated into a text you've chosen to strip away.

> Le 01/11/2014 20:49, Rainer Weikusat a écrit :
>> Not really. 'You' had some (cough cough) trouble with the fact that
>>
>> The assumption that a program knows all the open file
>> descriptors is simply invalid.
>
> I don't care knowing them, I just want to close their copies.

IIRC, I already included the conjecture that the motivation for this was
possibly just "But I WANT (!!) do to that", with no more reason for
wanting to do that than wanting to do that. The point of the (quoted)
statement above was supposed to be that you shouldn't be closing file
descriptors just based on not knowing why they're open: The code which
created these file descriptors should manage them appropriately.

NB: I'm using some 'close all open file descriptors except 0, 1 and 2
and connect these to /dev/null' code myself; that's part of a special
purpose program supposed to execute some other program (with
arguments) as an independent/background task. That's pragmatically useful
when starting a possibly long-running process from a shell script executed
by other code which was in turn .... But (for Linux) this is easily
implemented with 'special system support' by reading /proc/self/fd, it's
really only needed in this particular program and the statement above
nevertheless makes a valid point.

>> This is especially true for
>>
>> "large multithreaded app[lications]" where there's
>
> You fail to see the point: this function is specifically intended for
> single-threaded process, before executing another process.

Hadn't you deleted all of my text, you might have noticed that I wrote
about the need to execute a 'trusted subprogram' in order to get rid of
all the 'unknown code doing unknown things': There are at least two ways
how this unknown code can 'poison' the newly forked process: It could
have registered pthread_atfork handlers 'with unknown behaviour' or
might have installed signal handlers.

Apart from that, I still maintain that

- libraries shouldn't do things which change the 'global
environment' of an application

- "we need this because we don't know what we're dealing with"
is an appeal to ignorance

Rainer Weikusat

unread,
Nov 2, 2014, 1:29:29 PM11/2/14
to
Xavier Roche <xro...@free.fr.NOSPAM.invalid> writes:

NB: I really shouldn't reply to this as you weren't replying to my
posting but to two excepts from the glibc 'closefrom' thread I
integrated into a text you've chosen to strip away.

> Le 01/11/2014 20:49, Rainer Weikusat a écrit :
>> Not really. 'You' had some (cough cough) trouble with the fact that
>>
>> The assumption that a program knows all the open file
>> descriptors is simply invalid.
>
> I don't care knowing them, I just want to close their copies.

IIRC, I already included the conjecture that the motivation for this was
possibly just "But I WANT (!!) do to that", with no more reason for
wanting to do that than wanting to do that. The point of the (quoted)
statement above was supposed to be that you shouldn't be closing file
descriptors just based on not knowing why they're open: The code which
created these file descriptors should manage them appropriately.

NB: I'm using some 'close all open file descriptors except 0, 1 and 2
and connect these to /dev/null' code myself; that's part of a special
purpose program supposed to execute some other program (with
arguments) as an independent background task. That's pragmatically useful
when starting a possibly long-running process from a shell script executed
by other code which was in turn .... But (for Linux) this is easily
implemented with 'special system support' by reading /proc/self/fd; it's
really only needed in this particular program and the statement above
nevertheless makes a valid point.
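
Under that Linux-specific approach, the code amounts to something like
the following sketch (the function name is mine; it assumes /proc is
mounted):

```c
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

/* Close every descriptor above 2 by listing this process's own open
 * descriptors in /proc/self/fd instead of looping up to some guessed
 * maximum.  Linux-specific sketch; the function name is made up. */
static void close_from_3(void)
{
    DIR *dir = opendir("/proc/self/fd");
    struct dirent *ent;

    if (!dir)
        return; /* no /proc here: caller would need a plain close() loop */

    while ((ent = readdir(dir))) {
        int fd = atoi(ent->d_name); /* "." and ".." parse as 0, skipped below */

        if (fd > 2 && fd != dirfd(dir))
            close(fd);
    }

    closedir(dir);
}
```

A more careful version would collect the descriptor numbers first and
close them afterwards, since mutating the descriptor table while reading
the directory is not guaranteed to enumerate every entry.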

>> This is especially true for
>>
>> "large multithreaded app[lications]" where there's
>
> You fail to see the point: this function is specifically intended for
> single-threaded process, before executing another process.

Had you not deleted all of my text, you might have noticed that I wrote
about the need to execute a 'trusted subprogram' in order to get rid of
all the 'unknown code doing unknown things': There are at least two ways
in which this unknown code can 'poison' the newly forked process: It could
have registered pthread_atfork handlers 'with unknown behaviour' or
might have installed signal handlers.

Apart from that, I still maintain that

- libraries shouldn't do things which change the 'global
environment' of an application

- "we need this because we don't know what we're dealing with"
is an appeal to ignorance

James K. Lowden

unread,
Nov 2, 2014, 2:08:39 PM11/2/14
to
On Sun, 2 Nov 2014 01:50:58 -0700 (PDT)
Philip Guenther <guen...@gmail.com> wrote:

> > I'm sure there are arguments against the noodle I'm suggesting. I
> > don't think *this* is one though. Am I missing something obvious?
>
> I'm apparently unable to explain clearly why having
> overlapping/shared namespaces with substantially different allocation
> rules is a Bad Idea.

Actually, Phillip, your objections convinced me that there are
technical problems with the idea beyond just human confusion. Thanks
for batting it around with me.

From where I sit, the most important flaw is that the global
namespace of pids constitutes something very different from
descriptors. In particular, if you have a descriptor it's yours; you
can act on it, and it won't change its stripes until you close it.

> Your goal is just to find a smaller set of calls to express the UNIX
> API? That's an interesting philosophical topic, but in practice it's
> *way* down the list of consideration when deciding how the kernel
> should provide functionality.

Right on both counts. Simplicity is a good, and the smaller
number of functions *and* total parameters that can be used to
implement an API is surely a measure of simplicity. Likewise, the fewer
concepts the better. And from my occasional peeks at lkml I know that
perspective is not uniformly shared.

The Unix syscall interface has on the order of 250 functions. (I don't
have a convenient way to count parameters.) If that could be reduced
by half I would consider that a good thing, even if it meant doing things
differently. Whether or not the result would still be "Unix" is a
matter of definition. :-)

--jkl

Rainer Weikusat

unread,
Nov 2, 2014, 2:23:40 PM11/2/14
to
"James K. Lowden" <jklo...@speakeasy.net> writes:
> Rainer Weikusat <rwei...@mobileactivedefense.com> wrote:
>> "James K. Lowden" <jklo...@speakeasy.net> writes:

[...]

>> > I am surprised Plan 9 stayed with the fork-returns-pid model,
>> > instead of fork-returns-descriptor. Or, for that matter, why not
>> > make a pid a special kind of descriptor, one that identifies a
>> > process (for wait, kill), and answers to read/write/select/etc?
>>
>> This cannot possibly work: File descriptors are only meaningful in
>> relation to some specific 'open files table'
>
That's not quite true, or at least oversimplified, because sockets.

This is totally true: A file descriptor is meaningful within the process
which obtained it and, possibly, its descendants, and it refers to 'some
kernel object accessible via file descriptors'. Somewhat imprecisely,
the set of objects a process can access in this way can be referred to
as 'open files table'.

> You will be glad to know you are not discussing the topic with someone
> who would suggest solutions that cannot possibly work. ;-)
>
> Let us imagine a new call -- let us call it "pfork", as in "pipe" --
> that would automatically set up a bi-directional pipe between parent and
> child. The pid serves as the descriptor for each end: child pid
> is the parent's descriptor; parent's pid is the child's descriptor.

Unlike a file descriptor, a pid uniquely identifies a process in the
system during its complete lifetime: It's a different 'kind' of number,
meaningful in a different (and larger) context. That's why the FreeBSD
'pdfork' call returns a file descriptor (sort of) which can also be used
to query the pid of the corresponding process.

> I realize this violates both expectation and implementation. File
> descriptors are small integers that are indexes into the process's
> descriptor table, and programmers expect them to be assigned
> sequentially. I don't see anything sacrosanct or even particularly
> important about either.

I remember that I have written code like

close(2);
close(1);
close(0);
open("/dev/null", O_RDWR, 0); /* lowest free descriptor: becomes fd 0 */
dup(0);                       /* becomes fd 1 */
dup(0);                       /* becomes fd 2 */

on more than one occasion, and I'm certainly not the only one: While I
agree that relying on a certain allocation policy for file descriptors
violates (the encapsulation of) the abstraction referred to as 'file
descriptor', these properties have been around for almost 40 years and
changing them now would break a lot of existing code.

> I think it is a truth generally acknowledged that the fork-pid-wait
> triad was a botch, that life would be much better if fork had returned
> a descriptor, and open/read/write/close could be used for process
> control as well as IPC.

I've read about similar sentiments wrt signals, usually coming from
people who use 'encapsulated event loops' dealing with file descriptors
and who would like to avoid having to think about 'different kinds of
numbers' (like signals or process IDs). Based on code I'm using, I could
make a different kind of argument: Life would be much easier if only
signals were used for 'event notification': This works nicely for
'signals' and well enough for 'processes', but there's no portable way to
do signal-driven I/O (and the traditional BSD O_ASYNC facility, with a
single signal and no meta-information available, is a bad
joke).

IMHO, this statement is as wrongheaded as the other and 'life is
complex --- deal with it and stop whining' a sensible piece of advice in
this context. There's also a loss of information in the source code
here. A call like

waitpid(pid, NULL, 0)

is clearly identifiable as one waiting for the termination status of a
subprocess. OTOH, in order to interpret a

read(fd, &variable, sizeof(variable))

correctly, one needs to know where the fd came from: Does it refer to a
timer, a signal, an event notification mechanism, a ...? IMHO, that's a
case for 'Make things as simple as possible, but not simpler':
Everything can be shoe-horned into 'a magic file' and be performed by
doing I/O on 'magic files' but I'm not convinced that this is such a
terribly good idea: The original idea (which possibly came from the
'Berkeley Time-sharing System') was to make devices available in the
file system namespace and use the regular 'file I/O calls' to
communicate with them.

Nicolas George

unread,
Nov 2, 2014, 3:04:30 PM11/2/14
to
"James K. Lowden" , dans le message
<20141102140835.b...@speakeasy.net>, a écrit :
> In particular, if you have a descriptor it's yours; you
> can act on it, and it won't change its stripes until you close it.

That is not true. You have a file descriptor, one second you can write on
it, the next second you get a SIGPIPE. Even a plain file can go from working
to EIO. Sockets are even worse, with shutdown().

Of course, there is a core of truth in your statement: a file descriptor to
a pipe will not become a file descriptor to /etc/shadow without your program
doing something.

But guess what: we want exactly that for PIDs too: you have an FD to a
process that is causing a DoS and that you want to kill; you do not want it
to become an FD to sshd during the time you need to type "kill".

Nicolas George

unread,
Nov 2, 2014, 3:08:24 PM11/2/14
to
(Sorry for the two replies, forgot this on the first one; but this is
separate anyway.)

"James K. Lowden" , dans le message
<20141102140835.b...@speakeasy.net>, a écrit :
> The Unix syscall interface has on the order of 250 functions. (I don't
> have a convenient way to count parameters.) If that could be reduced
> by half I would consider that a good thing, even if it meant doing things
> differently.

I can suggest how to make it just one:

int dwim(enum DWIM opcode, ...);

enum DWIM {
DWIM_OPEN,
DWIM_CLOSE,
...
};

I am sure everyone will agree this is an awful API.

(Of course, that is actually how it works internally on most architectures,
but that is not how it is exported.)

The kernel needs exactly as many system calls as the number of different
tasks it can perform for userspace. No more, no less.

James K. Lowden

unread,
Nov 2, 2014, 4:09:55 PM11/2/14
to
On 02 Nov 2014 20:08:21 GMT
Nicolas George <nicolas$geo...@salle-s.org> wrote:

> "James K. Lowden" , dans le message
> <20141102140835.b...@speakeasy.net>, a écrit :
> > The Unix syscall interface has on the order of 250 functions. (I
> > don't have a convenient way to count parameters.) If that could be
> > reduced by half I would consider that a good thing, even if it meant
> > doing things differently.
>
> I can suggest how to make it just one:
>
> int dwim(enum DWIM opcode, ...);
>
> enum DWIM {
> DWIM_OPEN,
> DWIM_CLOSE,
> ...
> };
>
> I am sure everyone will agree this is an awful API.

Sure. That's why my qualification "and parameters". If you take

A(void);
B(void);

and replace them with

C( enum A_B );

there's no reduction in the sum of parameters + functions.

> The kernel needs exactly as many system calls as the number of
> different tasks it can perform for userspace. No more, no less.

I don't know if you meant to allude to Humpty Dumpty, but it's apt: the
"number of tasks" has no objective count. As you showed above, it is as
many as we choose to name, which of course makes the argument
circular.

It's not as though the syscall interface we have is perfection
itself. Devices are in /dev, except network devices. We have 4 kinds
of wait and 3 kinds of send, but only 1 kill that ranges from "indicate
an event" to "terminate the process without notifying it". We open a
file, "socket" a socket, and "pipe" a pipe, but close all three. I'm
sure you have your favorite example.

I understand that there are historical and sometimes anachronistic
reasons for such things. I'm inclined to agree with those who say Unix
is a very good first draft. In a sense, too good, because we have
set in stone some things that are better dynamited.

I think there is an irreducible set of concepts and functions, beyond
which point removal of a function forces the addition of a parameter
(to support a given functionality). I don't know what it is. I think
it's interesting to consider from time to time.

--jkl

Scott Lurndal

unread,
Nov 3, 2014, 11:17:47 AM11/3/14
to
<wil...@wilbur.25thandClement.com> writes:
>James K. Lowden <jklo...@speakeasy.net> wrote:
><snip>
>> I think it is a truth generally acknowledged that the fork-pid-wait
>> triad was a botch, that life would be much better if fork had returned
>> a descriptor, and open/read/write/close could be used for process
>> control as well as IPC. This is one way there. I'm only surprised
>> Plan 9 didn't seize on the opportunity to make it so.
>>
>
>Well, it's still a possibility:
>
> https://www.cl.cam.ac.uk/research/security/capsicum/

SVR4/Unixware had /proc (not the same /proc as linux, but a real process-control
mechanism built on read/write/open/close). Used primarily in place
of the ptrace(2) system call.

wil...@wilbur.25thandclement.com

unread,
Nov 3, 2014, 3:45:06 PM11/3/14
to
But can that solve the PID race problem? Fixing the PID race is a
side-effect of the Capsicum feature, and I don't think it was unwitting.

It would be nice if pdfork(2) became widely available, regardless of the
adoption of Capsicum proper. Of course, it would be even nicer if the
process descriptor could be used as an arbitrary communication channel per
Lowden's suggestion.[1] But adoption of pdfork, pdwait, and pdkill would be
an unmitigated win.

[1] Then it could also be used to begin to address init(1)
fragmentation--SMF, launchd, systemd, etc. A channel for signals is a step
up, but not enough to challenge the varying directions of the new service
managers.

Scott Lurndal

unread,
Nov 3, 2014, 5:04:23 PM11/3/14
to
<wil...@wilbur.25thandClement.com> writes:
>Scott Lurndal <sc...@slp53.sl.home> wrote:
>> <wil...@wilbur.25thandClement.com> writes:
>>>James K. Lowden <jklo...@speakeasy.net> wrote:
>>><snip>
>>>> I think it is a truth generally acknowledged that the fork-pid-wait
>>>> triad was a botch, that life would be much better if fork had returned a
>>>> descriptor, and open/read/write/close could be used for process control
>>>> as well as IPC. This is one way there. I'm only surprised Plan 9 didn't
>>>> seize on the opportunity to make it so.
>>>>
>>>
>>>Well, it's still a possibility:
>>>
>>> https://www.cl.cam.ac.uk/research/security/capsicum/
>>
>> SVR4/Unixware had /proc (not the same /proc as linux, but a real
>> process-control mechanism built on read/write/open/close). Used primarily
>> in place of the ptrace(2) system call.
>
>But can that solve the PID race problem? Fixing the PID race is a
>side-effect of the Capsicum feature, and I don't think it was unwitting.
>
>It would be nice if pdfork(2) became widely available, regardless of the
>adoption of Capsicum proper. Of course, it would be even nicer if the
>process descriptor could be used as an arbitrary communication channel per
>Lowden's suggestion.[1] But adoption of pdfork, pdwait, and pdkill would be
>an unmitigated win.

Unixware[*] supported both MAC and fine-grained DAC (via ACL's).


[*] As inherited from SVR4/ESMP.

Rainer Weikusat

unread,
Nov 3, 2014, 5:42:17 PM11/3/14
to
<wil...@wilbur.25thandClement.com> writes:
> Scott Lurndal <sc...@slp53.sl.home> wrote:
>> <wil...@wilbur.25thandClement.com> writes:
>>>James K. Lowden <jklo...@speakeasy.net> wrote:
>>><snip>
>>>> I think it is a truth generally acknowledged that the fork-pid-wait
>>>> triad was a botch, that life would be much better if fork had returned a
>>>> descriptor, and open/read/write/close could be used for process control
>>>> as well as IPC. This is one way there. I'm only surprised Plan 9 didn't
>>>> seize on the opportunity to make it so.
>>>>
>>>
>>>Well, it's still a possibility:
>>>
>>> https://www.cl.cam.ac.uk/research/security/capsicum/
>>
>> SVR4/Unixware had /proc (not the same /proc as linux, but a real
>> process-control mechanism built on read/write/open/close). Used primarily
>> in place of the ptrace(2) system call.
>
> But can that solve the PID race problem? Fixing the PID race is a
> side-effect of the Capsicum feature, and I don't think it was
> unwitting.

This problem can't really be fixed (without some pretty fundamental
changes) because a process may need to control another, unrelated
process, and there is no way to obtain a 'handle' for doing so which
doesn't suffer from this race.

> It would be nice if pdfork(2) became widely available, regardless of the
> adoption of Capsicum proper. Of course, it would be even nicer if the
> process descriptor could be used as an arbitrary communication channel per
> Lowden's suggestion.[1]

The problem with this is "With what do you plan to communicate in this
way?". Right now, a message 'composed' of a single integer (a signal
number) can be sent to an arbitrary process. By using sigqueue, an
additional integer can be attached to the signal, but it is completely up
to each individual process how - if at all - it reacts to such a
message. This wouldn't magically change just because messages could be
sent in some other way or could be arbitrarily complicated. This could,
in fact, be achieved within the limits of the existing system: All
processes would just need to listen for messages using some well-known
FIFO or (IMHO a better idea) an AF_UNIX datagram socket, eg, bound to

/process/<pid>

but this is still a convention which would need to be honoured by each
application and given that "two professors might agree on anything
except the proper rules for using quotations", this is never going to
happen in an open environment.

wil...@wilbur.25thandclement.com

unread,
Nov 3, 2014, 7:15:06 PM11/3/14
to
Rainer Weikusat <rwei...@mobileactivedefense.com> wrote:
> <wil...@wilbur.25thandClement.com> writes:
>> Scott Lurndal <sc...@slp53.sl.home> wrote:
>>> <wil...@wilbur.25thandClement.com> writes:
>>>>James K. Lowden <jklo...@speakeasy.net> wrote:
>>>><snip>
>>>>> I think it is a truth generally acknowledged that the fork-pid-wait
>>>>> triad was a botch, that life would be much better if fork had returned a
>>>>> descriptor, and open/read/write/close could be used for process control
>>>>> as well as IPC. This is one way there. I'm only surprised Plan 9 didn't
>>>>> seize on the opportunity to make it so.
>>>>>
>>>>
>>>>Well, it's still a possibility:
>>>>
>>>> https://www.cl.cam.ac.uk/research/security/capsicum/
>>>
>>> SVR4/Unixware had /proc (not the same /proc as linux, but a real
>>> process-control mechanism built on read/write/open/close). Used primarily
>>> in place of the ptrace(2) system call.
>>
>> But can that solve the PID race problem? Fixing the PID race is a
>> side-effect of the Capsicum feature, and I don't think it was
>> unwitting.
>
> This problem can't really be fixed (without some pretty fundamental
> changes) because a process may need to control an unrelated, other
> process. And there is no way to obtain a 'handle' to do so which doesn't
> suffer from this race.
>

1) Process descriptors can be passed to other processes. (I presume. I
haven't tested it on FreeBSD.)

2) The default behavior is to SIGKILL the process when the last process
descriptor is closed.

AFAICT, that means there exists no race condition as long as you only broker
the descriptor, not the PID. As long as the handle exists, the process (and
PID) is valid. When the handle is no longer valid, the process no longer exists.

Granted, you can't rely on PID _files_ directly. You still need a process
broker. But the broker doesn't need to be a parent, nor does it need to be
PID 1, nor does it even need to be the only such broker on the system.

Also note that what's described here is distinct from the earlier suggestion
about mapping PIDs to descriptor numbers.

Rainer Weikusat

unread,
Nov 4, 2014, 8:46:52 AM11/4/14
to
So, assume I want to attach a debugger to a running process with a
temper tantrum (or I want to terminate it forcibly), how am I going to
accomplish that? Quickly write a program which talks to the embedded
file descriptor server of the process which started the process I'm
interested in, possibly after reverse-engineering the protocol for doing
so, provided that such an embedded file descriptor server actually
exists? And by the time I'm done with that and receive the descriptor,
how can I tell if it still refers to the process I was interested in and
how could I communicate which process I was interested in to the server
I'd like to talk to get to know the descriptor I need to refer to the
process?

> Granted, you can't rely on PID _files_ directly. You still need a process
> broker. But the broker doesn't need to be a parent, nor does it need to be
> PID 1, nor does it even need to be the only such broker on the system.

Just because Snow White and all seven dwarves individually wrote their
own init replacement which includes process management doesn't mean it
has to be done in this way, that was just a sugarplum to trick you into
running their code in a process which cannot be terminated (it will
certainly integrate IRC at some point in time :->). I'm doing 'process
management' with a program which manages processes (and provides an
interface for requesting that actions are performed on them).

James K. Lowden

unread,
Nov 4, 2014, 10:52:17 AM11/4/14
to
On Tue, 04 Nov 2014 13:46:47 +0000
Rainer Weikusat <rwei...@mobileactivedefense.com> wrote:

> > AFAICT, that means there exists no race condition as long as you
> > only broker the descriptor, not the PID. As long as the handle
> > exists, the process (and PID) is valid. When the handle is no longer
> > valid, the process no longer exists.
>
> So, assume I want to attach a debugger to a running process with a
> temper tantrum (or I want to terminate it forcibly), how am I going
> to accomplish that? Quickly write a program which talks to the
> embedded file descriptor server of the process which started the
> process I'm interested in, possibly after reverse-engineering the
> protocol for doing so, provided that such an embedded file descriptor
> server actually exists?

You don't mean that question seriously, do you? I am quite sure Rainer
Weikusat would write that program in advance.

> And by the time I'm done with that and receive the descriptor, how
> can I tell if it still refers to the process I was interested in and
> how could I communicate which process I was interested in to the
> server I'd like to talk to get to know the descriptor I need to refer
> to the process?

If the descriptor refers to a pid, then surely stat(2) or something
like it would reveal the pid and other process data.

The difference, if I understand aright, is that every interaction with
a pid is a race; you never know if the process you signal is the one
you intended or another that's since taken its place. (That is the
allegation. I've never heard of it being a problem in practice because
afaik kernels don't reuse pids quickly.) Interaction through a
descriptor OTOH is safe because the process:descriptor map is
guaranteed.

--jkl

Casper H.S. Dik

unread,
Nov 4, 2014, 11:47:03 AM11/4/14
to
<wil...@wilbur.25thandClement.com> writes:

>But can that solve the PID race problem? Fixing the PID race is a
>side-effect of the Capsicum feature, and I don't think it was unwitting.

If the process is a child, there is never a PID race problem.

Only once a process has exited *and* has been reaped can its
process ID be reused.

So with child/parent processes there is no race condition; for other processes,
it is possible to open the /proc/<pid> file. I think this both prevents reuse
of the PID and would signal that the process has exited.

Casper

Rainer Weikusat

unread,
Nov 4, 2014, 11:55:44 AM11/4/14
to
Except that there's no way to assure that /proc/<pid> still refers to
the process it was referring to when the value of <pid> was determined.

Rainer Weikusat

unread,
Nov 4, 2014, 12:20:56 PM11/4/14
to
"James K. Lowden" <jklo...@speakeasy.net> writes:
> Rainer Weikusat <rwei...@mobileactivedefense.com> wrote:
>> > AFAICT, that means there exists no race condition as long as you
>> > only broker the descriptor, not the PID. As long as the handle
>> > exists, the process (and PID) is valid. When the handle is no longer
>> > valid, the process no longer exists.
>>
>> So, assume I want to attach a debugger to a running process with a
>> temper tantrum (or I want to terminate it forcibly), how am I going
>> to accomplish that? Quickly write a program which talks to the
>> embedded file descriptor server of the process which started the
>> process I'm interested in, possibly after reverse-engineering the
>> protocol for doing so, provided that such an embedded file descriptor
>> server actually exists?
>
> You don't mean that question seriously, do you? I am quite sure Rainer
> Weikusat would write that program in advance.

I do mean that seriously: The 'process control file descriptor' is only
'naturally' available to the parent of the process and this can be
anything, with 'anything' to be determined at the moment the information
is needed: In practice, some kind of convention could be defined for this
(OTOH, right now, there is none) and could provide a solution to the
communication problem most of the time but in theory, there is no
solution.

>> And by the time I'm done with that and receive the descriptor, how
>> can I tell if it still refers to the process I was interested in and
>> how could I communicate which process I was interested in to the
>> server I'd like to talk to get to know the descriptor I need to refer
>> to the process?
>
> If the descriptor refers to a pid, then surely stat(2) or something
> like it would reveal the pid and other process data.
>
> The difference, if I understand aright, is that every interaction with
> a pid is a race; you never know if the process you signal is the one
> you intended or another that's since taken its place. (That is the
> allegation. I've never heard of it being a problem in practice because
> afaik kernels don't reuse pids quickly.) Interaction through a
> descriptor OTOH is safe because the process:descriptor map is
> guaranteed.

Well, but an unrelated process which wants to gain access to another
unrelated process needs some way to refer to such a descriptor prior to
having one itself. Which just reintroduces the problem.

James K. Lowden

unread,
Nov 4, 2014, 5:58:55 PM11/4/14
to
On Tue, 04 Nov 2014 16:55:40 +0000
Rainer Weikusat <rwei...@mobileactivedefense.com> wrote:

> > So with child/parent processes there is no race condition; for
> > other processes, it is possible to open the /proc/<pid> file. I
> > think it both prevents reusing of the PID as well as it would
> > signal that the process has exited.
>
> Except that there's no way to assure that /proc/<pid> still refers to
> the process it was referring to when the value of <pid> was
> determined.

Can anyone point to references of this race condition being a problem
in practice? I would have thought the ratio of PID_MAX to feasible
processes is large, and the likelihood of pid reuse consequently
small. Do kernels not implement a bereavement period for expired
pids?

This is an ex-pid. It's pushing up daisies.
No it's not. It's just resting.

I can see how using /var/run/httpd.pid might lead to trouble. But
that's hardly an instantaneous issue with /proc.

--jkl

wil...@wilbur.25thandclement.com

unread,
Nov 5, 2014, 8:30:07 PM11/5/14
to
James K. Lowden <jklo...@speakeasy.net> wrote:
> On Tue, 04 Nov 2014 16:55:40 +0000
> Rainer Weikusat <rwei...@mobileactivedefense.com> wrote:
>
>> > So with child/parent processes there is no race condition; for
>> > other processes, it is possible to open the /proc/<pid> file. I
>> > think it both prevents reusing of the PID as well as it would
>> > signal that the process has exited.
>>
>> Except that there's no way to assure that /proc/<pid> still refers to
>> the process it was referring to when the value of <pid> was
>> determined.
>
> Can anyone point to references of this race condition being a problem
> in practice? I would have thought the ratio of PID_MAX to feasible
> processes is large, and the likelihood of pid reuse consequently
> small.

The default maximum is usually a 15-bit value.

On Linux the default is 32768:

$ uname -sr
Linux 3.13.0-24-generic
$ cat /proc/sys/kernel/pid_max
32768

On Solaris the default is 30000:

$ uname -sr
SunOS 5.11
$ perl -MPOSIX -e 'print sysconf(514), "\n"' # _SC_MAXPID
30000

On OpenBSD 5.5 PID_MAX is 32766. Unlike most other systems,
OpenBSD randomizes PIDs.

On NetBSD 6.1 PID_MAX is 30000.

On FreeBSD 9.0 PID_MAX is 99999. FreeBSD also has a kern.randompid sysctl.


> Do kernels not implement a bereavement period for expired pids?
>
> This is an ex-pid. It's pushing up daisies.
> No it's not. it's just resting.

OpenBSD remembers the last 100 PIDs. I'm unsure about the others, which
allocate PIDs sequentially, anyhow.

Off the top of my head I can't see how you could implement a PID hold-off
based on wall time without providing an easy way for people to DoS systems,
locally or remotely.

> I can see how using /var/run/httpd.pid might lead to trouble. But
> that's hardly an instantaneous issue with /proc.

I agree the race is largely an academic concern. You can fix the 'loaded
gun' problem by checking whether the lock is still held, which narrows the
race considerably as long as nothing nefarious is happening. I usually use
BSD flock because it allows me to attempt to take the lock before I
daemonize, exiting with a failure code if I can't take the lock. POSIX file
locks provide a way to query the PID of a process holding the lock, in which
case /var/run/httpd.pid can be an empty file, but it's more difficult to
use.

Still, the signal race highlights the headaches you have to go through
dealing with service management. The PID file locking dance is needless work
that doesn't even satisfactorily solve the problem.

Rainer Weikusat

unread,
Nov 6, 2014, 9:46:50 AM11/6/14
to
<wil...@wilbur.25thandClement.com> writes:
> James K. Lowden <jklo...@speakeasy.net> wrote:

[...]

>> Can anyone point to references of this race condition being a problem
>> in practice? I would have thought the ratio of PID_MAX to feasible
>> processes is large, and the likelihood of pid reuse consequently
>> small.

[...]

>> I can see how using /var/run/httpd.pid might lead to trouble. But
>> that's hardly an instantaneous issue with /proc.
>
> I agree the race is largely an academic concern. You can fix the 'loaded
> gun' problem by checking whether the lock is still held, which narrows the
> race considerably as long as nothing nefarious is happening.
> I usually use BSD flock because it allows me to attempt to take the lock before I
> daemonize,

[...]

> Still, the signal race highlights the headaches you have to go through
> dealing with service management. The PID file locking dance is
> needless work that doesn't even satisfactorily solve the problem.

Considering this, what about the 'brilliant' idea of simply not using them,
instead of trying as hard as possible to rescue the broken paradigm?
'Some process has locked this file' doesn't really communicate anything
about the nature of the process. Also, the entity which knows how a
particular program should operate wrt other processes is the program/process
starting it, IOW, just that your program may be a server doesn't
mean it should necessarily try to disassociate itself from its
'natural, controlling entity'. 'Daemonization code' is entirely generic
and can thus be put into a program of its own instead of adding a C
library routine to make the 'bad software engineering' (DJB paraphrase)
more tolerable to the people who cling to it because that's what they've
been doing since 4.2BSD (someone should have christened autonomously
running processes 'creepy crawlies' instead of 'daemons').

If server management is giving you headaches, offload it to someone
else (even if that 'someone' ends up being a program/ system which is
just as bad because it overshoots into the opposite direction: Instead
of having a myriad of independent processes all doing similar things
in slightly different ways, let's write one program dealing with a
myriad of loosely related problems).

Andrew Gabriel

unread,
Nov 6, 2014, 10:48:50 AM11/6/14
to
In article <ndvqib-...@wilbur.25thandclement.com>,
<wil...@wilbur.25thandClement.com> writes:
> On Solaris the default is 30000:
>
> $ uname -sr
> SunOS 5.11
> $ perl -MPOSIX -e 'print sysconf(514), "\n"' # _SC_MAXPID
> 30000

To make sure bigger numbers work, in Solaris debug kernel builds,
it's 999999

--
Andrew Gabriel
[email address is not usable -- followup in the newsgroup]

Rainer Weikusat

Nov 6, 2014, 2:22:12 PM
"James K. Lowden" <jklo...@speakeasy.net> writes:
I don't think this is a real problem for using a process ID obtained in
some way (eg, via ps) interactively but this is a problem when programs
are supposed to control other programs, both via interactive commands
and happening as part of some other process, eg, an automated software
update: Such a program-controlling program would need to learn the pid
of the process it seeks to control somehow but there's no uniform
convention for recording pids, it's not possible to establish if a
recorded pid actually corresponds to the desired process at any given
time and 'the pid' actually needs to be recorded by something and that
something cannot be the process itself, at least not straight-forwardly,
because many different instances of it may be running: What is really
needed here is a name which was uniquely assigned to a 'task' (some
instance of a program performing in a specific role) by a system
administrator instead of some 'faceless' ID number managed by the kernel
as it sees fit.

There are some more problems regarding 'control requests': A program
cannot reliably be terminated by asking it to terminate itself and it
usually can't restart itself at all as 'other stuff' might need to be
done every time the program is started but before it is started. Further,
a program which terminates unexpectedly will often need to be restarted
automatically in order to minimize unavailability of the service. In
case starting a program failed because of some 'transient' configuration
problem, it would be convenient if the system kept trying to start it
until success so that someone fixing the problem doesn't need to worry
about hunting for every affected 'task' and do whatever is necessary to
get it back into a working state (and won't accidentally forget
any). Lastly, it should be possible to use the facility for starting
one-off tasks manually from the command-line.
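The weakness of a recorded pid can be sketched in a few lines of C. The helper name `pid_exists` is invented for illustration; note that even a positive answer only proves that *some* process currently has that number:

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if some process with this PID currently exists, 0 if not,
 * -1 if one exists but we may not signal it (EPERM). Even a 1 only
 * means *some* process has the number -- it may be an unrelated
 * process that reused it after the one whose pid we recorded died. */
int pid_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;
    return errno == ESRCH ? 0 : -1;
}
```

This is exactly why a pid file alone cannot establish that the recorded pid "actually corresponds to the desired process at any given time".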

I'm presently using a 'monitor' program for this which will fork and
exec another command in the new process, restarting it automatically if
that terminates, with code for detecting 'restart loops' aka 'respawning
too fast' and handling that. Additionally, it can restart the subordinate
command in certain intervals, configure core limit and niceness and
start an additional logger process to capture stdout and stderr of the
subordinate command. Some of these features should be moved into other
programs (and some of them were already moved). The monitor will also
create an AF_UNIX stream socket in a well-known directory using a name
passed on the command-line, defaulting to the first 'word' of the
subordinate command. This socket can be used to request that the
monitored process be stopped or restarted and each request can be
processed synchronously if so desired. A configurable signal is used to
terminate the monitored program 'politely' and if this doesn't work
within 'some time limit', it is killed instead. It is also possible to
query if the monitored process is presently running (meaning, a monitor
is listening on the socket and not in the process of shutting down) and
to request that an arbitrary signal is sent to it. A final request can
be used to instruct the monitor to re-execute itself, passing
information about its present state to the new instance via the
environment, thereby enabling monitor updates without disruption (exec
doesn't affect parent-child relations among processes).

NB: This program has been written with the express purpose of providing
'server and background task management' in exactly the way needed for
the product I'm presently working on (and with a minimal feature set, as
anything which doesn't have to be part of this program should become its
own program, although that's more a long-term design goal).
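The fork/exec/waitpid core of such a monitor can be sketched as below. The names (`run_once`, `supervise`, `MIN_UPTIME`) and the crude sleep-based damping are invented for illustration; the actual program described above obviously does far more (control socket, logger process, re-exec):

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MIN_UPTIME 5   /* exits faster than this count as a restart loop */

/* Run one instance of the command and wait for it. Returns the raw
 * wait status, or -1 if fork failed. Exit code 127 stands in for
 * "exec failed", there being no other channel back to the parent. */
static int run_once(char **argv)
{
    int status;
    pid_t pid = fork();

    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);
    }
    if (pid < 0)
        return -1;
    while (waitpid(pid, &status, 0) < 0)
        ;                       /* retry on EINTR */
    return status;
}

/* Restart forever, with crude damping instead of real loop detection. */
void supervise(char **argv)
{
    for (;;) {
        time_t started = time(NULL);

        if (run_once(argv) < 0)
            break;              /* fork failed; give up */
        if (time(NULL) - started < MIN_UPTIME)
            sleep(MIN_UPTIME);
    }
}
```

The 127 convention is the same dodge shells use, and it is also the only observable difference between "command crashed" and "command didn't exist" here.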

wil...@wilbur.25thandclement.com

Nov 6, 2014, 3:45:05 PM
<snip>
>> Still, the signal race highlights the headaches you have to go through
>> dealing with service management. The PID file locking dance is
>> needless work that doesn't even satisfactorily solve the problem.
>
> Considering this, what about the 'brilliant' idea of not using them
> instead of trying to rescue the broken paradigm as hard as possible?
> 'Some process has locked this file' doesn't really communicate anything
> about the nature of the process. Also, the entity which knows how a
> particular program should operate wrt other processes is the program/
> process starting it, IOW, just that your program may be a server doesn't
> mean it should necessarily try to disassociate itself from its
> 'natural, controlling entity'.

It doesn't necessarily. In all my projects explicit daemonization is only an
option. By default they never disassociate.

I find this most useful for development. It allows me to start the service
on a unique, non-privileged port, either manually or from a script running
regression tests.

But I find that including daemonization code is immensely convenient, for me
and for others. And I tend to eschew configuration files in favor of
comprehensive command-line options. I like to keep everything as
self-contained as possible. I'm very consistent that way.

> 'Daemonization code' is entirely generic and can thus be put into a
> program of its own instead of adding a C library routine to make the 'bad
> software engineering' (DJB paraphrase) more tolerable to the people who
> cling to it because that's what they've been doing since 4.2BSD (someone
> should have christened autonomously running processes 'creepy crawlies'
> instead of 'daemons').

1) A separate daemon service is a dependency. Dependencies are annoying and
a time sink, especially for portable projects, and even more especially
during development. There's nothing like spending half a day bootstrapping a
new environment just to fix a bug or add a small feature.

2) Separate daemon managers only work for the simple cases. Most recently I
implemented a service that can gracefully restart itself while never closing
the listening port, and never disconnecting existing clients. It first
pauses acceptance of incoming connections--allowing them to queue up in the
kernel--while still continuing to service existing clients. It execs a new
instance of itself, passing the listening socket. If the new instance
successfully enters a steady state where it can service new connections, it
signals the parent to transition to shutdown mode (close listening socket,
finish servicing existing connections, and exit), otherwise[1] the parent
resumes accepting new clients, terminating the errant child if necessary.

This can't be done with inetd, daemontools, launchd, SMF, or systemd. For
sophisticated services, there's no substitute for dealing with process
management yourself.

[1] The failure case is not uncommon, especially during development, because
all the business logic is implemented in a dynamically and weakly typed
scripting language. Oftentimes you won't know there's a problem until you
execute the code in the new environment. And the distinction between
"production" and "development" is blurry when you eat your own dog food--we
run development and release candidates for our own usage.
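The socket-inheritance half of such a restart can be sketched as follows; this is a minimal sketch, and `LISTEN_FD` is an invented environment variable name, not a standard convention of any kind:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Old instance: make sure the listening socket survives exec by
 * clearing FD_CLOEXEC on it, then advertise its number to the new
 * instance via the environment. Returns 0 on success. */
int export_listen_fd(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0 || fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0)
        return -1;

    char buf[16];
    snprintf(buf, sizeof buf, "%d", fd);
    return setenv("LISTEN_FD", buf, 1);
}

/* New instance: recover the inherited descriptor, -1 if absent. */
int import_listen_fd(void)
{
    const char *s = getenv("LISTEN_FD");
    return s ? atoi(s) : -1;
}
```

The new instance would call `import_listen_fd()` before entering its accept loop; the old one keeps servicing established connections until it gets the go-ahead signal.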

> If server management is giving you headaches, offload it to someone
> else (even if that 'someone' ends up being a program/ system which is
> just as bad because it overshoots into the opposite direction: Instead
> of having a myriad of independent processes all doing similar things
> in slightly different ways, let's write one program dealing with a
> myriad of loosely related problems).

I never said server management is giving me headaches. I said the PID file
locking dance is a headache, from a programming standpoint. And no matter
how carefully you implement it, it's uniquely unsatisfying because of the
[theoretical] race condition, which is distasteful to me as a developer.
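For reference, one version of the dance, sketched with BSD flock(2) as mentioned upthread (`acquire_pidfile` is a made-up name). It is deliberately minimal, and the stale-file and unlink/reopen races that make the whole exercise unsatisfying are left unhandled:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/* Open-or-create the PID file, take a BSD flock(2) on it (the lock,
 * unlike the file, dies with the process), and only then write the
 * PID. Returns the fd, which must stay open for the lifetime of the
 * process, or -1 if another instance already holds the lock. */
int acquire_pidfile(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);

    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
        close(fd);
        return -1;
    }

    char buf[32];
    int n = snprintf(buf, sizeof buf, "%ld\n", (long)getpid());
    if (ftruncate(fd, 0) < 0 || write(fd, buf, n) != n) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Because the lock lives on the open file description, this also catches the can-take-the-lock-before-daemonizing case described above.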

I personally don't find system administration to be bothersome. I learned
long ago to keep things simple, such as by reducing dependencies and
sticking to default configurations and services as much as possible. IMO,
that means avoiding things like daemontools. I've kept an OpenBSD server
colocated for nearly 15 years, and _remotely_ upgraded it every 6 months
(excepting relocations or hardware swaps), sometimes from a continent away,
and so far without incident. Do you know how methodical you have to be to do
that consistently? Or the minimalism you need to keep to mitigate the
tedium? I have a finely honed (if admittedly peculiar) sense of the
cost+benefit of these things.

Rainer Weikusat

Nov 6, 2014, 4:01:29 PM
[...]

> 2) Separate daemon managers only work for the simple cases. Most recently I
> implemented a service that can gracefully restart itself while never closing
> the listening port, and never disconnecting existing clients. It first
> pauses acceptance of incoming connections--allowing them to queue up in the
> kernel--while still continuing to service existing clients. It execs a new
> instance of itself, passing the listening socket. If the new instance
> successfully enters a steady state where it can service new connections, it
> signals the parent to transition to shutdown mode (close listening socket,
> finish servicing existing connections, and exit), otherwise[1] the parent
> resumes accepting new clients, terminating the errant child if
> necessary.

Thank you for the nice demonstration why people think they have to
resort to 'desperate means' (like cgroups) in order to stop unruly
third-party code from knocking walls down in order to do what its author
prefers, as opposed to what the person owning the computer would like it
to do (always using a Very Sensible Pretext[tm]). This could be
implemented in a less non-cooperative way by forking a new process to
handle old connections, trying the in-place update (or whatever that was
supposed to become) in the old process and relying on a process
management facility to deal with failures.

OTOH, I somewhat routinely remove 'pid files' and other such nonsense
from open source code supposed to perform as part of the concert, not as
an awe-inspiring soloist...

Rainer Weikusat

Nov 7, 2014, 12:33:01 PM
<wil...@wilbur.25thandClement.com> writes:

- more detailed reply -

> Rainer Weikusat <rwei...@mobileactivedefense.com> wrote:

[...]

>> the entity which knows how a particular program should operate wrt
>> other processes is the program/ process starting it, IOW, just that
>> your program may be a server doesn't mean it should necessarily try
>> to disassociate itself from its 'natural, controlling entity'.
>
> It doesn't necessarily. In all my projects explicit daemonization is only an
> option. By default they never disassociate.
>
> I find this most useful for development. It allows me to start the service
> on a unique, non-privileged port, either manually or from a script running
> regression tests.
>
> But I find that including daemonization code is immensely convenient, for me
> and for others.

It may be 'immensely convenient' to include it in The Program[tm] itself
as that's sort-of the natural place to put code into and The Program[tm]
also comes with a build system &c and is presumably already a
version-controlled project using some kind of 'modern VCS' which becomes
immensely inconvenient once one has to deal with more than one program
because that's all the people who designed and implemented The Modern
VCS[tm] ever do (I'm still using CVS for the sole reason that it's the
only SCM known to me which supports 'handle more than one independent
codebase' in a decent way). But 'least inconvenient for the
developer(s) of The Program' is not a good design guideline when viewing
things from a user (or system integrator) perspective.

Assuming some system ends up running more than 50 different servers and
some of them in more than one configuration, this ends up being immensely
inconvenient because this means 50 different ways to tell a server how
to run, fifty different logging schemes one needs to deal with,
possibly, half a dozen logging super schemes with a logging hyper scheme
enabling use of any of the others transparently (this is not a joke but a
Java reality) and so on.

[...]

>> 'Daemonization code' is entirely generic and can thus be put into a
>> program of its own instead of adding a C library routine to make the 'bad
>> software engineering' (DJB paraphrase) more tolerable to the people who
>> cling to it because that's what they've been doing since 4.2BSD (someone
>> should have christened autonomously running processes 'creepy crawlies'
>> instead of 'daemons').
>
> 1) A separate daemon service is a dependency. Dependencies are annoying and
> a time sink, especially for portable projects, and even more especially
> during development. There's nothing like spending half a day bootstrapping a
> new environment just to fix a bug or add a small feature.

Can we please omit the bizarre Windows-oriented terminology where 'the
system' provides 'services' (as an abstract concept) in some unknown and
surely very magical way? This would be 'a process management server'
providing 'process management' as a service.

But I didn't refer to one and wrote 'program' for a reason. An extremely
simple-minded implementation could look like this:

----------
#!/bin/sh
(
exec 0</dev/null
exec 1>/dev/null
exec 2>/dev/null

exec "$@" ) &
----------

and this can then be used to run anything as an autonomous process, eg,
assuming this is called 'daemon', invoking it as

./daemon xterm -e top

would start 'a top daemon' (the tool I'm actually using is written in C
and slightly more featureful, and I also use it to start 'top-level GUI
programs' as I grew tired of maintaining and using menu structures years
ago).
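The corresponding 'entirely generic daemonization code' could equally well live in such a standalone C program. A sketch of the classic sequence (not anyone's actual tool; `null_fd` and `daemonize` are invented names):

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Point a single descriptor at /dev/null; returns 0 on success. */
int null_fd(int target)
{
    int fd = open("/dev/null", target == 0 ? O_RDONLY : O_WRONLY);

    if (fd < 0)
        return -1;
    if (dup2(fd, target) < 0) {
        close(fd);
        return -1;
    }
    if (fd != target)
        close(fd);
    return 0;
}

/* The classic sequence: fork so the parent can return to its caller,
 * become a session leader (losing the controlling terminal), move off
 * any unmountable filesystem, and cut stdio loose. */
void daemonize(void)
{
    switch (fork()) {
    case -1: exit(1);
    case  0: break;        /* child carries on */
    default: exit(0);      /* parent returns to the shell */
    }
    if (setsid() < 0)
        exit(1);
    if (chdir("/") < 0)
        exit(1);
    null_fd(0);
    null_fd(1);
    null_fd(2);
}
```

A wrapper main() calling `daemonize()` followed by execvp() of its arguments gives the same effect as the shell script above, plus the setsid/chdir steps the script skips.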

> 2) Separate daemon managers only work for the simple cases. Most recently I
> implemented a service that can gracefully restart itself while never closing
> the listening port, and never disconnecting existing clients.

[...]

> This can't be done with inetd, daemontools, launchd, SMF, or systemd. For
> sophisticated services, there's no substitute for dealing with process
> management yourself.

Most cases are simple and a process manager which is itself just a
program, ie, not integrated into init, can be used whenever this makes
sense but it doesn't have to be used. It also doesn't need complex
arrangements of configuration files but can provide a comprehensive
command-line interface instead (which is easily extended to support
configuration files where convenient by putting the 'server start
command' into a script and using the shell to read parameters ending up
on the command line from a file).

[...]

>> If server management is giving you headaches, offload it to someone
>> else (even if that 'someone' ends up being a program/ system which is
>> just as bad because it overshoots into the opposite direction: Instead
>> of having a myriad of independent processes all doing similar things
>> in slightly different ways, let's write one program dealing with a
>> myriad of loosely related problems).
>
> I never said server management is giving me headaches. I said the PID file
> locking dance is a headache, from a programming standpoint. And no matter
> how carefully you implement it, it's uniquely unsatisfying because of the
> [theoretical] race condition, which is distasteful to me as a
> developer.

I usually use 'server' to refer to 'process providing services to other
programs' and not to 'big computer sitting in a 19" rack'. Considering
this, 'PID file life support measures' are part of 'server
management'. But (as I already wrote in another posting) 'the PID' is
really useful for (and managed by) the kernel and also useful for
casual, interactive tasks. Managing a platoon of servers is better done
based on names.

> I personally don't find system administration to be bothersome. I learned
> long ago to keep things simple, such as by reducing dependencies and
> sticking to default configurations and services as much as possible. IMO,
> that means avoiding things like daemontools. I've kept an OpenBSD server
> colocated for nearly 15 years, and _remotely_ upgraded it every 6 months
> (excepting relocations or hardware swaps), sometimes from a continent away,
> and so far without incident. Do you know how methodical you have to be to do
> that consistently?

I do. I'm going to leave it at that because I'm not so much interested
in writing about me, except where I can serve as an example for
something else.

Ian Collins

Nov 9, 2014, 8:33:34 PM
wil...@wilbur.25thandClement.com wrote:
> Rainer Weikusat <rwei...@mobileactivedefense.com> wrote:
>> <wil...@wilbur.25thandClement.com> writes:
> <snip>
>>> Still, the signal race highlights the headaches you have to go through
>>> dealing with service management. The PID file locking dance is
>>> needless work that doesn't even satisfactorily solve the problem.
>>
>> Considering this, what about the 'brilliant' idea of not using them
>> instead of trying to rescue the broken paradigm as hard as possible?
>> 'Some process has locked this file' doesn't really communicate anything
>> about the nature of the process. Also, the entity which knows how a
>> particular program should operate wrt other processes is the program/
>> process starting it, IOW, just that your program may be a server doesn't
>> mean it should necessarily try to disassociate itself from its
>> 'natural, controlling entity'.
>
> It doesn't necessarily. In all my projects explicit daemonization is only an
> option. By default they never disassociate.
>
> I find this most useful for development. It allows me to start the service
> on a unique, non-privileged port, either manually or from a script running
> regression tests.
>
> But I find that including daemonization code is immensely convenient, for me
> and for others. And I tend to eschew configuration files in favor of
> comprehensive command-line options. I like to keep everything as
> self-contained as possible. I'm very consistent that way.

Why exclude configuration files in favour of command-line options rather
than use both? All of my applications have a command line option "--json
<config file>" which reads options from a file (and a matching debug
option to output the current options as JSON). It's much easier to
write a configuration file once than it is to keep entering a large set
of options. It's also easier to check what options a process is using
from "ps".

>> 'Daemonization code' is entirely generic and can thus be put into a
>> program of its own instead of adding a C library routine to make the 'bad
>> software engineering' (DJB paraphrase) more tolerable to the people who
>> cling to it because that's what they've been doing since 4.2BSD (someone
>> should have christened autonomously running processes 'creepy crawlies'
>> instead of 'daemons').
>
> 1) A separate daemon service is a dependency. Dependencies are annoying and
> a time sink, especially for portable projects, and even more especially
> during development. There's nothing like spending half a day bootstrapping a
> new environment just to fix a bug or add a small feature.

Eh? If you are writing for a platform, the service management facility
will always be there.

> 2) Separate daemon managers only work for the simple cases. Most recently I
> implemented a service that can gracefully restart itself while never closing
> the listening port, and never disconnecting existing clients. It first
> pauses acceptance of incoming connections--allowing them to queue up in the
> kernel--while still continuing to service existing clients. It execs a new
> instance of itself, passing the listening socket. If the new instance
> successfully enters a steady state where it can service new connections, it
> signals the parent to transition to shutdown mode (close listening socket,
> finish servicing existing connections, and exit), otherwise[1] the parent
> resumes accepting new clients, terminating the errant child if necessary.
>
> This can't be done with inetd, daemontools, launchd, SMF, or systemd. For
> sophisticated services, there's no substitute for dealing with process
> management yourself.

Well it could be done with SMF.

> I personally don't find system administration to be bothersome. I learned
> long ago to keep things simple, such as by reducing dependencies and
> sticking to default configurations and services as much as possible.

Doesn't that include using the platform's service management facility?
I doubt anyone working on Solaris or Illumos would use anything other
than SMF to manage a new service, or a port of an existing one.

--
Ian Collins

Rainer Weikusat

Nov 10, 2014, 6:53:15 AM
Ian Collins <ian-...@hotmail.com> writes:
> wil...@wilbur.25thandClement.com wrote:
>> Rainer Weikusat <rwei...@mobileactivedefense.com> wrote:
>>> <wil...@wilbur.25thandClement.com> writes:

[...]

>> And I tend to eschew configuration files in favor of
>> comprehensive command-line options. I like to keep everything as
>> self-contained as possible. I'm very consistent that way.
>
> Why exclude configuration files in favour of command-line options
> rather than use both?

That's sort-of a false dichotomy here as it was referring to
more-or-less generic 'operational parameters' such as 'should the server
disassociate from its parent process' in a general context of 'process
management' and there is no set of command-line options which would
enable controlling the parameters of how individual servers are running
with any kind of 'system-wide server management facility': Parameters
are recorded in files which need to be 'activated' in some way. It's
actually even slightly (or not so slightly) worse because what is
actually done wrt 'some server' depends on an in principle unlimited
and a priori unknown number of other files with more or less sensible
contents, eg, according to some page on the web, a Solaris system can't
deal with local mail unless 'networking' is up and running, including
seriously unrelated bits like LDAP.

[...]

>>> 'Daemonization code' is entirely generic and can thus be put into a
>>> program of its own instead of adding a C library routine to make the 'bad
>>> software engineering' (DJB paraphrase) more tolerable to the people who
>>> cling to it because that's what they've been doing since 4.2BSD (someone
>>> should have christened autonomously running processes 'creepy crawlies'
>>> instead of 'daemons').
>>
>> 1) A separate daemon service is a dependency. Dependencies are annoying and
>> a time sink, especially for portable projects, and even more especially
>> during development. There's nothing like spending half a day bootstrapping a
>> new environment just to fix a bug or add a small feature.
>
> Eh? If you are writing for a platform, the service management
> facility will always be there.

Any kind of unportable assumption made in code will be trouble-free for
code not supposed to be portable. That's not very
surprising. But this should really be the other way around: It should be
easy to integrate some program into some platform infrastructure, that
is, without removing the code which causes it to refuse to start if some
arbitrarily named file exists (combined with some arbitrary set of band
aids), without removing (or conditionally disabling) the code which causes
it to run off in arbitrary directions, hiding its tracks in a
myriad of different, rapidly created forks etc., but also without
deinstalling every useless "daemon" a certain Mr Poettering ever wrote
five times in a row; that is, it also shouldn't be gratuitously tied
to whatever the catwalk^Wplatform du jour on system XYZ happens to be
and the rattrap^Wplatform shouldn't end up being the whole system as
this means it absorbed a lot of code which could (and should) be more
generally useful.

>> 2) Separate daemon managers only work for the simple cases. Most recently I
>> implemented a service that can gracefully restart itself while never closing
>> the listening port, and never disconnecting existing clients. It first
>> pauses acceptance of incoming connections--allowing them to queue up in the
>> kernel--while still continuing to service existing clients. It execs a new
>> instance of itself, passing the listening socket. If the new instance
>> successfully enters a steady state where it can service new connections, it
>> signals the parent to transition to shutdown mode (close listening socket,
>> finish servicing existing connections, and exit), otherwise[1] the parent
>> resumes accepting new clients, terminating the errant child if necessary.
>>
>> This can't be done with inetd, daemontools, launchd, SMF, or systemd. For
>> sophisticated services, there's no substitute for dealing with process
>> management yourself.
>
> Well it could be done with SMF.

It can be done with any of the above in the sense that none will stop
the process from doing this.

>> I personally don't find system administration to be bothersome. I learned
>> long ago to keep things simple, such as by reducing dependencies and
>> sticking to default configurations and services as much as possible.
>
> Doesn't that include using the platform's service management facility?
> I doubt anyone working on Solaris or Illumos would use anything other
> than SMF to manage a new service, or a port of an existing one.

This seems a bit like 'sour grapes inverted': They must be sweet. I'll
have to eat them, anyway.