fork + exec; what are the possible resource leaks?


Joshua Maurice
Apr 16, 2010, 5:56:12 PM
I'm somewhat new to POSIX. It seems that the only way to create a new
process is fork. However, fork inherits all file descriptors. exec
closes only the file descriptors marked as "close on exec". I
generally spawn a separate process because of the isolation this
affords. If a process misbehaves, like if it has a resource leak, I
know that when that process dies the resource leak will generally go
away. However, if a process misbehaves, like not setting "close on
exec" when opening the file descriptor (an option only available in
recent Linux kernels), it's possible that I will leak a file
descriptor to that child and all direct and indirect grandchildren.

So, how does one generally deal with this? Close all file descriptors
from 3 to the max possible file descriptor? "/proc/self/fd" is a good
alternative, but it is not portable in practice, and not being POSIX,
not portable in theory either. What do other people do?

Also, what other resources should I be concerned about when doing a
fork + exec? What other possible resources can "leak" into the child
and all grandchildren?

PS: There really should be a spawn process ala win32. This should not
replace fork, but there should be an alternative to fork to bring up a
clean process. That or there should be sane interfaces to accomplish
the same: to guarantee that I don't have any random open resources
which I will continue to leak and leak into my direct and indirect
grandchildren. And no, posix_spawn is not that. It is defined to have
the same semantics as fork + exec and all of the baggage which comes
along with it. I'm just trying to program defensively, and POSIX is
making it hard for me to do that.

Chris Friesen
Apr 16, 2010, 7:09:50 PM
On 04/16/2010 03:56 PM, Joshua Maurice wrote:

> So, how does one generally deal with this? Close all file descriptors
> from 3 to the max possible file descriptor?

Yep.

If you want to be really anal, close all possible file descriptors and
then reopen 0/1/2 as desired.

There's good information on this at:

http://stackoverflow.com/questions/899038/getting-the-highest-allocated-file-descriptor
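Chris's approach can be sketched in C. This is a minimal illustration under my own helper names (close_from, scrub_stdio are not standard functions), using sysconf(_SC_OPEN_MAX) for the upper bound:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Close every descriptor from lowfd up to the _SC_OPEN_MAX limit.
   close() on an unused slot just fails with EBADF, which is harmless. */
static void close_from(int lowfd)
{
    long maxfd = sysconf(_SC_OPEN_MAX);
    if (maxfd < 0)
        maxfd = 1024;  /* historical fallback when the limit is unknown */
    for (long fd = lowfd; fd < maxfd; fd++)
        close((int)fd);
}

/* The "really anal" variant: close everything, then point 0/1/2 at
   /dev/null so later opens cannot accidentally become stdio. */
static void scrub_stdio(void)
{
    close_from(0);
    int fd = open("/dev/null", O_RDWR);  /* lowest free slot: fd 0 */
    if (fd == 0) {
        dup2(fd, 1);
        dup2(fd, 2);
    }
}
```

A caller would typically invoke close_from(3) (or scrub_stdio() and then redirect) in the child, between fork() and exec.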


> Also, what other resources should I be concerned about when doing a
> fork + exec? What other possible resources can "leak" into the child
> and all grandchildren?

This is all covered in the man pages for fork() and exec(). Generally
open files of various kinds are what you need to worry about. File
locks are not preserved over fork() but are over exec().

> PS: There really should be a spawn process ala win32. This should not
> replace fork, but there should be an alternative to fork to bring up a
> clean process.

Arguably, yes.

Chris

Scott Lurndal
Apr 16, 2010, 7:19:30 PM
Joshua Maurice <joshua...@gmail.com> writes:

> However, if a process misbehaves, like not settings "close on
>exec" when opening the file descriptor (an option only available in
>recent Linux kernels)

The "Close on Exec" option has been part of _every_ unix and linux kernel
since basically forever. In Unix v7 it was an ioctl (FIOCLEX/FIONCLEX),
in System V it was made an fcntl(2) flag.

>, it's possible that I will leak a file
>descriptor to that child and all direct and indirect grandchildren.

Most applications that use fork/exec to spawn processes will stick
a loop between the fork and exec to close all file descriptors
except 0, 1 and 2 (and will often redirect those, perhaps to pipes,
as well)

Given that a process opened by the shell will typically (but not always)
have file descriptors 0, 1 and 2 in use and all others closed, the only
file descriptors you don't have control over are those used by
libraries. The above loop will accommodate applications which use libraries
that open files and forget to set CLOEXEC.
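The loop Scott describes, placed between fork() and exec, might look like the following sketch. Error handling is minimal, the helper name spawn_clean is mine, and the test usage assumes a standard /bin/true exists:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* fork(), close everything above stderr in the child, then exec.
   A library that forgot FD_CLOEXEC can no longer leak a descriptor
   into the new program image. */
static pid_t spawn_clean(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {                      /* child */
        long maxfd = sysconf(_SC_OPEN_MAX);
        if (maxfd < 0)
            maxfd = 1024;
        for (long fd = 3; fd < maxfd; fd++)
            close((int)fd);              /* EBADF on unused slots is fine */
        execv(path, argv);
        _exit(127);                      /* exec failed */
    }
    return pid;                          /* parent: child pid, or -1 */
}
```

Redirecting 0/1/2 to pipes, as Scott mentions, would be done with dup2() in the child just before the close loop.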

>
>So, how does one generally deal with this? Close all file descriptors
>from 3 to the max possible file descriptor?

Yes, this is the typical solution for applications that don't
control all the files that may be opened.

When I was on the X/Open base working group in the 90's, I lobbied
for a 'closeall' function that would close all file descriptors
above the provided fd, but it was never accepted (primarily since at
the time, X/Open didn't invent, but rather attempted to standardize
existing practice).


>Also, what other resources should I be concerned about when doing a
>fork + exec? What other possible resources can "leak" into the child
>and all grandchildren?

man exec.

>
>PS: There really should be a spawn process ala win32. This should not

man posix_spawn

scott

Joshua Maurice
Apr 16, 2010, 8:21:05 PM
On Apr 16, 4:19 pm, sc...@slp53.sl.home (Scott Lurndal) wrote:

> Joshua Maurice <joshuamaur...@gmail.com> writes:
> > However, if a process misbehaves, like not settings "close on
> >exec" when opening the file descriptor (an option only available in
> >recent Linux kernels)
>
>   The "Close on Exec" option has been part of _every_ unix and linux kernel
> since basically forever.   In Unix v7 it was an ioctl (FIOCLEX/FIONCLEX),
> in System V it was made an fcntl(2) flag.

Race condition. Until a recent Linux kernel version, you could not
set close-on-exec in open(); you could only set it afterwards with
fcntl(). In a multithreaded program, there is a small window between
open() and fcntl() in which another thread could call fork(),
leaking that file descriptor. This race was closed when open()
gained the O_CLOEXEC flag. See
http://udrepper.livejournal.com/20407.html
for full details.
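The difference is easiest to see side by side. A sketch, assuming a libc that exposes O_CLOEXEC (the function names are mine):

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* The old two-step: another thread may fork() in the window between
   open() and fcntl(), and the descriptor escapes into that child. */
static int open_cloexec_racy(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd >= 0)
        fcntl(fd, F_SETFD, FD_CLOEXEC);   /* <-- the race window is here */
    return fd;
}

/* The atomic variant: the kernel sets the flag in the same call
   (Linux 2.6.23+, later adopted by POSIX.1-2008). */
static int open_cloexec(const char *path)
{
#ifdef O_CLOEXEC
    return open(path, O_RDONLY | O_CLOEXEC);
#else
    return open_cloexec_racy(path);       /* best effort on old systems */
#endif
}
```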


> >, it's possible that I will leak a file
> >descriptor to that child and all direct and indirect grandchildren.
>
> Most applications that use fork/exec to spawn processes will stick
> a loop between the fork and exec to close all file descriptors
> except 0, 1 and 2 (and will often redirect those, perhaps to pipes,
> as well)
>
> Given that a process opened by the shell will typically (but not always)
> have file descriptors 0, 1 and 2 in use and all others closed, the only
> file descriptors you don't have control over are those used by
> libraries.   The above loop will accomodate applications which use libraries
> that open files and forget to set CLOEXEC.

> >So, how does one generally deal with this? Close all file descriptors
> >from 3 to the max possible file descriptor?
>
>   Yes, this is the typical solution for applications that don't
> control all the files that may be opened.
>
>   When I was on the X/Open base working group in the 90's, I lobbied
> for a 'closeall' function that would close all file descriptors
> above the provided fd, but it was never accepted (primarily since at
> the time, X/Open didn't invent, but rather attempted to standardize
> existing practice).

Yes, but the potential max can be quite large, and closing descriptors
that were never open is just wasted time. I suppose it's not that bad
if you're not spawning many processes and the max is suitably small. I
just hope I never hit a system where the max file descriptor is a
64-bit int max.

There's still the problem that I want to program defensively and not
have to rely on a library's guarantee that it creates all of its file
handles "close on exec". And when I'm doing automated testing of my
product, I'd prefer to be able to isolate these leaks in software that
is still under development.


> >Also, what other resources should I be concerned about when doing a
> >fork + exec? What other possible resources can "leak" into the child
> >and all grandchildren?
>
> man exec.

Thanks for the terseness. [Sarcasm]. I was looking for more pearls of
wisdom from those more experienced, like common gotchas.


> >PS: There really should be a spawn process ala win32. This should not
>
> man posix_spawn

Did you even read my full post? I specifically mentioned that
posix_spawn is not that in the next sentence of my previous post, the
one to which you're replying. It carries all of the same semantics as
fork + exec, which includes possibly leaking across process boundaries.
That extra baggage is exactly what I don't want to deal with most of
the time. Most of the time, I just want to be able to create a new
process without worrying about leaked file handles, which signal masks
get inherited, etc.

William Ahern
Apr 16, 2010, 8:38:06 PM
Chris Friesen <cbf...@mail.usask.ca> wrote:
<snip>

> This is all covered in the man pages for fork() and exec(). Generally
> open files of various kinds are what you need to worry about. File
> locks are not preserved over fork() but are over exec().

BSD locks a la flock() are preserved across a fork(), which makes it
eminently more useful than POSIX locks, IMO.

William Ahern
Apr 16, 2010, 8:30:49 PM
Chris Friesen <cbf...@mail.usask.ca> wrote:
> On 04/16/2010 03:56 PM, Joshua Maurice wrote:

> > So, how does one generally deal with this? Close all file descriptors
> > from 3 to the max possible file descriptor?

> Yep.

> If you want to be really anal, close all possible file descriptors and
> then reopen 0/1/2 as desired.

> There's good information on this at:

> http://stackoverflow.com/questions/899038/getting-the-highest-allocated-file-descriptor

A good post, but it's missing the most portable option, getdtablesize(2).

It's often considered "non-portable", and yet it's available in Linux, *BSD,
AIX, Solaris, and HP/UX (at least according to their online documentation).

Some of the man pages say that it is equivalent to both the RLIMIT_NOFILE
soft-limit, and the descriptor table size. I'm unsure whether setrlimit will
successfully lower the soft-limit below the highest numbered descriptor
already allocated. In any event, getdtablesize() is the best fall-back for
when a local API (such as those mentioned in the URI above) isn't available.

David Schwartz
Apr 16, 2010, 9:44:09 PM
On Apr 16, 5:30 pm, William Ahern <will...@wilbur.25thandClement.com>
wrote:

> A good post, but it's missing the most portable option, getdtablesize(2).
>
> It's often considered "non-portable", and yet it's available in Linux, *BSD,
> AIX, Solaris, and HP/UX (at least according to their online documentation).

Some older versions of Linux incorrectly return a compile-time
constant for 'getdtablesize', usually 256 or 1,024, even though larger
numbers of file descriptors are 100% supported on those platforms. If
you want to be completely safe on every platform I know of, you can
use the highest of getdtablesize, getconf(_POSIX_OPEN_MAX), and
getrlimit(RLIMIT_NOFILE).
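In C that combination might look like the following sketch; sysconf(_SC_OPEN_MAX) stands in for the getconf query, and the helper name max_fd_guess is mine:

```c
#include <assert.h>
#include <sys/resource.h>
#include <unistd.h>

/* Take the largest of the three answers the system gives, so an
   under-reporting getdtablesize() cannot make us miss descriptors. */
static long max_fd_guess(void)
{
    long n = getdtablesize();

    long sc = sysconf(_SC_OPEN_MAX);
    if (sc > n)
        n = sc;

    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0 &&
        rl.rlim_cur != RLIM_INFINITY &&
        (long)rl.rlim_cur > n)
        n = (long)rl.rlim_cur;

    return n;
}
```

The result is an upper bound for the brute-force close loop discussed above.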

DS

Casper H.S. Dik
Apr 17, 2010, 10:23:49 AM
David Schwartz <dav...@webmaster.com> writes:

>On Apr 16, 5:30 pm, William Ahern <will...@wilbur.25thandClement.com>
>wrote:

>> A good post, but it's missing the most portable option, getdtablesize(2).
>>
>> It's often considered "non-portable", and yet it's available in Linux, *BSD,
>> AIX, Solaris, and HP/UX (at least according to their online documentation).

>Some older versions of Linux incorrectly return a compile-time
>constant for 'getdtablesize', usually 256 or 1,024, even though larger
>numbers of file descriptors are 100% supported on those platforms. If
>you want to be completely safe on every platform I know of, you can
>use the highest of getdtablesize, getconf(_POSIX_OPEN_MAX), and
>getrlimit(RLIMIT_NOFILE).

In Solaris it is possible to open a file descriptor, dup it to the
highest available file descriptor and then lower the limit; it has
closefrom() which is what Solaris applications use.

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

William Ahern
Apr 17, 2010, 1:35:40 PM
Casper H.S. Dik <Caspe...@sun.com> wrote:
<snip>

> In Solaris it is possible to open a file descriptor, dup it to the
> highest available file descriptor and then lower the limit;

That's good to know. I was always curious, but downloading and installing
OpenSolaris was too much work. I've tried twice and gave up. Maybe it's
easier now.

> it has closefrom() which is what Solaris applications use.

The Solaris closefrom() man page suggests it may be using /proc, which would
not be cool for chroot'd applications. Is this the case? OpenBSD's
closefrom() is a system call, which seems more reasonable given the most
common use for this is as a step immediately before or upon a fork or exec.
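For reference, a /proc-based closefrom() lookalike might look like this sketch (the name closefrom_proc is mine). It is Linux-specific, and exactly the kind of thing that breaks in a chroot without /proc mounted:

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Close every descriptor >= lowfd by listing /proc/self/fd, touching
   only descriptors that actually exist.  Returns -1 if /proc is
   unavailable; callers should then fall back to the brute-force loop. */
static int closefrom_proc(int lowfd)
{
    DIR *d = opendir("/proc/self/fd");
    if (d == NULL)
        return -1;

    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        int fd = atoi(e->d_name);         /* "." and ".." parse as 0 */
        if (fd >= lowfd && fd != dirfd(d))
            close(fd);                    /* never close the DIR's own fd */
    }
    closedir(d);
    return 0;
}
```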

Ersek, Laszlo
Apr 17, 2010, 2:00:07 PM
On Fri, 16 Apr 2010, Joshua Maurice wrote:

> On Apr 16, 4:19 pm, sc...@slp53.sl.home (Scott Lurndal) wrote:
>> Joshua Maurice <joshuamaur...@gmail.com> writes:
>>> However, if a process misbehaves, like not settings "close on
>>> exec" when opening the file descriptor (an option only available in
>>> recent Linux kernels)
>>
>>   The "Close on Exec" option has been part of _every_ unix and linux kernel
>> since basically forever.   In Unix v7 it was an ioctl (FIOCLEX/FIONCLEX),
>> in System V it was made an fcntl(2) flag.
>
> Race condition. Up until a recent Linux kernel version, you could not
> set close on exec in open; you could only set it with fcntl. In a
> multithreaded program, there is a small window between open and fcntl
> in which fork could be called, resulting in that file descriptor being
> leaked. This lack of possible correctness was fixed when you could
> specify O_CLOEXEC to open. See
> http://udrepper.livejournal.com/20407.html
> for full details.

Okay, now I see; I've read your other posting earlier. I would not have
suggested the redirection of fd 77 from a temporary regular file under
these circumstances.

However, I can't help but note the following (perhaps I'm conflating two
different objectives of yours):

- first you wish to get rid of the complete inherited process environment,

- then you complain you can't name a single property that would permeate a
whole process tree, connected by nothing else than fork() lineage.

I kind of see a contradiction between these points.


What's worse, the wish to close file descriptors for security reasons
offers a false sense of security. As long as a process can ptrace()
another one, it is all snake oil. I can attach a gdb instance to any
process, call fstat() on fd's 0 to 1024, call lseek(fd, 0, SEEK_CUR) to
find out file offsets, call getpeername() to find out about internet
peers, call pipe() and dup2() and fork() to embed a sniffer child between
the original program code and the socket it writes to. I could read byte
arrays before encryption, all on the process' behalf.

Unless ptrace() is disabled on a system by default, or it is guaranteed
that all subprocesses that should not have address-space level access to
the parent and/or each other, are exec()'d from setuid images with
pairwise different non-privileged uid's, I think this debate about setting
FD_CLOEXEC atomically with open() is pointless. (Or, at least,
insufficient in itself.)

In the stackoverflow.com example, the parent itself is privileged enough
to set different uid's for its children between the fork() and exec()
calls. Unfortunately, if an external library calls fork() + exec()
anywhere (even in a synchronously called subroutine), it can (and most
probably will) omit this crucial step, and then again we'll have a child
process that can ptrace() the parent or do whatever else it wants. The end
result is that one can't link a library into a binary designed to be run
as root without knowing that library inside out. But in that case, all
fork() points are known, and the programmer might as well use manual
close() instead of FD_CLOEXEC.

Any rebuttal is highly appreciated. Thanks.

Cheers,
lacos

Joshua Maurice
Apr 17, 2010, 5:02:50 PM

Only if you take it at face value and twist the intent. My goal for
using processes is to guarantee a degree of fault isolation. If one
process behaves badly, preferably it should not affect other
processes.

Fault isolation is generally required for long lasting systems as
well.

As a subtly different point, I also need this guarantee to write a
long lasting system. If rarely a process will leak file descriptors or
other resources to their children, and children keep spawning children
(in a controlled way), then eventually I'll run out of resources. It
seems to me to be a simple request to have a simple way to prevent
resource leaks.

The other post I made about being able to kill a process tree is again
about fault isolation and preventing resource leaks. If I have a job,
and I know that job may be faulty (such as tests on code under
development, or more generally any piece of code), I would like a way
to kill that piece of code and reclaim all of its acquired resources.

> What's worse, the wish to close file descriptors for security reasons
> offers a false sense of security. As long as a process can ptrace()
> another one, it is all snake oil. I can attach a gdb instance to any
> process, call fstat() on fd's 0 to 1024, call lseek(fd, 0, SEEK_CUR) to
> find out file offsets, call getpeername() to find out about internet
> peers, call pipe() and dup2() and fork() to embed a sniffer child between
> the original program code and the socket it writes to. I could read byte
> arrays before encryption, all on the process' behalf.
>
> Unless ptrace() is disabled on a system by default, or it is guaranteed
> that all subprocesses that should not have address-space level access to
> the parent and/or each other, are exec()'d from setuid images with
> pairwise different non-privileged uid's, I think this debate about setting
> FD_CLOEXEC atomically with open() is pointless. (Or, at least,
> insufficient in itself.)
>
> In the stackoverflow.com example, the parent itself is privileged enough
> to set different uid's for its children between the fork() and exec()
> calls. Unfortunately, if an external library calls fork() + exec()
> anywhere (even in a synchronously called subroutine), it can (and most
> probably will) omit this crucial step, and then again we'll have a child
> process that can ptrace() the parent or do whatever else it wants. The end
> result is that one can't link a library into a binary designed to be run
> as root without knowing that library inside out. But in that case, all
> fork() points are known, and the programmer might as well use manual
> close() instead of FD_CLOEXEC.

Security is another issue. That requires more than I'm asking. I'm not
asking to prevent malicious code from messing with the system (though
something like what I want would probably be required). Instead, I'm
merely trying to create a stable system, one without resource leaks.

Kenny McCormack
Apr 17, 2010, 6:21:05 PM
In article <aec191c5-abb5-4179...@e7g2000yqf.googlegroups.com>,
Joshua Maurice <joshua...@gmail.com> wrote:
...

>Security is another issue. That requires more than I'm asking. I'm not
>asking to prevent malicious code from messing with the system (though
>something like what I want would probably be required). Instead, I'm
>merely trying to create a stable system, one without resource leaks.

I understand where you're coming from (and that security from malicious
code is a side issue). But I think in the real world, most people deal
with this problem by the simple expedient of rebooting frequently.

Given that in the real world, most systems are running (cough, cough)
operating systems made in Redmond, this seems a safe bet.

In line with this, I think the suggestion to run your builds in a VM
that boots, runs your build, and shuts down, is the best idea going.

--
(This discussion group is about C, ...)

Wrong. It is only OCCASIONALLY a discussion group
about C; mostly, like most "discussion" groups, it is
off-topic Rorschach revelations of the childhood
traumas of the participants...

William Ahern
Apr 17, 2010, 7:38:03 PM
Kenny McCormack <gaz...@shell.xmission.com> wrote:
> In article <aec191c5-abb5-4179...@e7g2000yqf.googlegroups.com>,
> Joshua Maurice <joshua...@gmail.com> wrote:
> ...
> >Security is another issue. That requires more than I'm asking. I'm not
> >asking to prevent malicious code from messing with the system (though
> >something like what I want would probably be required). Instead, I'm
> >merely trying to create a stable system, one without resource leaks.

> I understand where you're coming from (and that security from malicious
> code is a side issue). But I think in the real world, most people deal
> with this problem by the simple expedient of rebooting frequently.
>
> Given that in the real world, most systems are running (cough, cough)
> operating systems made in Redmond, this seems a safe bet.

I'm not sure what that has to do w/ the price of tea in China. The type of
products that other people use in the privacy and sanctity of their own
server rooms is their own business. Certainly I'm not going to base my own
expectations around their decisions. (Maybe they derive a certain
satisfaction from rebooting the same way I do compulsively calling sync from
the command-line--a habit I picked up in the early days of Linux when the
kernel was significantly less reliable.)

> In line with this, I think the suggestion to run your builds in a VM
> that boots, runs your build, and shuts down, is the best idea going.

Duplicated descriptors aren't really a resource leak in the common sense of
the term (cf. leak in a security context). The resources usually aren't
lost, so to speak; their lifetimes are just prolonged. (This is comparable
to garbage collected languages where the developer fails to explicitly close
a descriptor before losing the object reference.) They could be a resource
leak, but only in the most contrived scenarios, such as unbounded recursive
forking. (Who does that? A dyed-in-the-wool Lisp programmer hacking shell
scripts?) Disregarding issues with dismounting filesystems, the issue is
usually benign.

Stay away from libraries that keep static or global state, and the problem
vanishes (assuming your own code is smart enough to cleanup after itself).
In my experience, blindly closing all descriptors is usually a step in a
belt+suspenders utilitarian approach to some task.

Descriptors "leaking" into other processes is, of course, entirely intended.
It's fundamental to the Unix process model--and sensibilities--and it's no
surprise that this is the default behavior. Consider how `/bin/sh -c "cat
<&4"' works. The common case is made simple, and the uncommon case--spawning
long-lived (i.e. indefinite) processes--is burdened w/ the extra work. If
this actually caused issues in reality, then Unix would be the platform
everybody rebooted the most. (And there's no reason to presume that Unix
developers possess more expertise than Windows developers.)

Ersek, Laszlo
Apr 17, 2010, 8:00:24 PM
(I think the practical utility of my post will be nil, so read on with
that in mind.)

On Sat, 17 Apr 2010, Joshua Maurice wrote:

> [...] My goal for using processes is to guarantee a degree of fault

> isolation. If one process behaves badly, preferably it should not affect
> other processes.

The kernel does support such a separation between processes. The problem
is, as I see it, that when one process *leaks* file descriptors, in your
terminology, the kernel actually sees a *request* to bequeath file
descriptors to a child process. fork() was *designed* to do that, among
other things.

Thus the root of the issue seems to be library code forking without your
knowledge or permission. Unfortunately, within a single process, the
kernel seems not to provide preemptive protection; all parts must
co-operate. If you think about it, the library code can do much worse things
to your state than asking the kernel (on your behalf) to pass on file
descriptors: it can trample all over your data.

In short (and this may be as misguided as it is of little consolation), within
the process, you're exposed to much greater dangers, and between
processes, the kernel only does what your process explicitly asks it to
do. Your idea of where the enemy territory begins differs from the
kernel's one.

Let's replace for a second the usual library concept with a separate
process that is co-operating via RPC, via AF_UNIX sockets. Performance
would go down the drain, but the co-operating process could not leak
*your* resources inadvertently, eg. log files soon to be rotated.

I googled around previously when reading your posts. I found this:

Re: Providing an ELF flag for disabling LD_PRELOAD/ptrace()
http://lkml.indiana.edu/hypermail/linux/kernel/0712.1/2040.html

If Alan Cox calls "the complete lack of a security boundary between
processes of the same user" "the normal Unix model", then we might be
allowed to call the non-separation between different functions in the same
process the normal Unix model too.

Whether this suits modern heterogeneous software development, that's a
different question. I feel your pain.

Looking back at your original post:

1) resources inherited through fork() and exec():
http://www.opengroup.org/onlinepubs/9699919799/functions/fork.html
http://www.opengroup.org/onlinepubs/9699919799/functions/exec.html

2) "spawn process ala win32": you could write a server program that takes
command lines over some kind of IPC mechanism and starts an according
process from a pristine environment. Now that we're talking about it, I
seem to remember one such server program; it's usually called "sshd".

3) If you want to trap fork() calls in library code, you could write your
own fork() wrapper. You could identify your own calls by relying on a
static variable in the wrapper, or in a multi-threaded process, by saving
and comparing thread identifiers, or by checking thread-specific data.
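The wrapper idea in point 3 could be sketched like this. All names are mine, and a real interposer would override the fork symbol itself (e.g. via LD_PRELOAD) and inspect the flag from there or from a pthread_atfork handler:

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Route all of *our* fork() calls through a wrapper that raises a
   flag; a fork observed while the flag is clear must have come from
   library code and can be logged or rejected by the interposer. */
static volatile sig_atomic_t fork_is_ours;

static pid_t my_fork(void)
{
    fork_is_ours = 1;
    pid_t pid = fork();
    fork_is_ours = 0;        /* this line runs in both parent and child */
    return pid;
}
```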

Cheers,
lacos

David Schwartz
Apr 17, 2010, 8:17:39 PM
On Apr 17, 7:23 am, Casper H.S. Dik <Casper....@Sun.COM> wrote:

> In Solaris it is possible to open a file descriptor, dup it to the
> highest available file descriptor and then lower the limit; it has
> closefrom() which is what Solaris applications use.

Wow, I never thought of that. In fact, that's a fairly realistic fear
in this case. An application with a high limit might well drop the
limit on open file descriptors before exec'ing another process.

DS

Rainer Weikusat
Apr 18, 2010, 8:03:01 AM
Joshua Maurice <joshua...@gmail.com> writes:
> I'm somewhat new to POSIX. It seems that the only way to create a new
> process is fork.

No. The "but Windows does it differently!"-people have meanwhile
managed to reinvent CreateProcess (or whatever the function is
actually called) and are in the process of 'undefining' fork in order
to prevent its future use.

> However, fork inherits all file descriptors. exec
> closes only the file descriptors marked as "close on exec". I
> generally spawn a separate process because of the isolation this
> affords. If a process misbehaves, like if it has a resource leak, I
> know that when that process dies the resource leak will generally go
> away. However, if a process misbehaves, like not settings "close on
> exec" when opening the file descriptor (an option only available in
> recent Linux kernels), it's possible that I will leak a file
> descriptor to that child and all direct and indirect grandchildren.

Yes, and if you hit yourself on the head with a frying pan, it is very
probable that this will hurt badly.

> So, how does one generally deal with this?

By not doing it. That's generally a sensible course of action whenever
one senses a potential problem as result of a particular action.

Xavier Roche
Apr 18, 2010, 8:42:15 AM
Joshua Maurice wrote:

> So, how does one generally deal with this? Close all file descriptors
> from 3 to the max possible file descriptor? "proc/self/fd" is a good
> alternative, but not portable in fact and not POSIX aka not portable
> in theory. What do other people do?

Playing with fork/exec, and closing all fd's before exec to ensure that
they are properly closed, starting from 3 to sysconf(_SC_OPEN_MAX).

I never found any cleaner way.

As you may have noticed, posix_spawn has design "choices" which prevent
from using it in a multithreaded environment if you want to get all fds
being closed on child.

See my previous "Handling the posix_spawn() file descriptor hell" rant:
<http://groups.google.com/group/comp.unix.programmer/browse_thread/thread/122a9b89a866c492/b18f45015951aaa9?pli=1>

To summarize the issues:
- third-party libraries may open files in the parent without FD_CLOEXEC,
causing leaks into children
- there is no way to set FD_CLOEXEC as the default behaviour for fopen()
as far as I know, hence you are doomed anyway
- setting FD_CLOEXEC atomically is impossible (at least on non-recent
kernels)
- posix_spawn does not solve the problem because it suffers from the
same race conditions

The fork/exec model is not a perfect solution (the amount of code
involved in a fork operation is really huge, and I occasionally ended up
deadlocked with a corrupted parent process because of post-fork
handlers -- the goal was to spawn an external debugger, which is a very
specific case, I must admit), but I never found a better one.

Rainer Weikusat
Apr 18, 2010, 10:35:26 AM
Joshua Maurice <joshua...@gmail.com> writes:
> On Apr 16, 4:19 pm, sc...@slp53.sl.home (Scott Lurndal) wrote:
>> Joshua Maurice <joshuamaur...@gmail.com> writes:
>> > However, if a process misbehaves, like not settings "close on
>> >exec" when opening the file descriptor (an option only available in
>> >recent Linux kernels)
>>
>>   The "Close on Exec" option has been part of _every_ unix and linux kernel
>> since basically forever.   In Unix v7 it was an ioctl (FIOCLEX/FIONCLEX),
>> in System V it was made an fcntl(2) flag.
>
> Race condition. Up until a recent Linux kernel version, you could not
> set close on exec in open; you could only set it with fcntl. In a
> multithreaded program, there is a small window between open and fcntl
> in which fork could be called, resulting in that file descriptor being
> leaked.

And if you hit yourself onto the head with a frying pan, chances are
still that this will hurt badly. Coming to think of it, you could also
drop a hot pressing iron onto your feet and - again - you will manage
to hurt yourself by doing so. The solution to all problems mentioned
in this text so far is still: Don't do it.

Joshua Maurice
Apr 18, 2010, 5:52:22 PM

You could speak plainly and perhaps help the conversation
instead of sounding like an ass. How do you propose to write an
application which
application which
1- uses third party libraries which may not be correctly written, aka
not use O_CLOEXEC?
2- is multi-threaded, and uses a non-recent kernel which lacks
O_CLOEXEC, or uses a badly written library which does not create all
file handles with O_CLOEXEC or equivalents (aka the new options, not
fcntl)?
3- or any other combination where you want to program defensively,
where you want to guarantee a degree of fault isolation between
processes, aka one of the major point of processes, and have a stable
system, aka one which does not leak resources?

Possible solutions I see are:
1- Control all of the code and don't use third party libraries. Write
perfect code. Don't use threads or use newer kernel versions with
O_CLOEXEC.
2- Close all file descriptors and other resources between all fork and
execs which you don't specifically want to be inherited. There are
various ways to find all such resources, some of which are explained
else-thread.

Ersek, Laszlo
Apr 18, 2010, 6:40:56 PM
On Sun, 18 Apr 2010, Joshua Maurice wrote:

> On Apr 18, 7:35 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:

>> [snip]


>
> How do you propose to write an application which

> 1- uses third party libraries which may not be correctly written, aka
> not use O_CLOEXEC?

> 2- is multi-threaded, and uses a non-recent kernel which lacks
> O_CLOEXEC, or uses a badly written library which does not create all
> file handles with O_CLOEXEC or equivalents (aka the new options, not
> fcntl)?

> 3- or any other combination where you want to program defensively, where
> you want to guarantee a degree of fault isolation between processes, aka
> one of the major point of processes, and have a stable system, aka one
> which does not leak resources?

Put stuff you don't trust in a separate process. This won't protect you
from malice, but it probably will from honest mistakes.

We did this with two closed source proprietary middleware client libs that
used to start threads on their own (one even forked in addition). We
wrapped them with separate daemon processes and accessed those over simple
RPC.

(This paid off immensely, because one of the client libs had a threading
bug (from our usage pattern and the external symptoms we concluded it
freed some resource and then accessed it some *indeterminate* time later),
and that bug reliably crashed the daemon until we developed a workaround.
The main program worked on many things simultaneously, and it was
important that such a crash didn't take down all those things, just one or
two of them, and even those in a way that could be handled gracefully in
the main program.)

I have the impression that the SUS carefully documents if an interface
might call fork() or interfere with the signal environment of the process
(system() and popen() come to mind). No scarcer documentation is
acceptable from a library you wish to link against.

Make the kernel your ally. Common address space is for friends you know
and trust. Look at qmail: it doesn't even trust itself.

Cheers,
lacos

Joshua Maurice

unread,
Apr 18, 2010, 8:10:39 PM4/18/10
to

Indeed and agreed. This is exactly what I expect from processes.
However, if resources can be easily leaked across process boundaries,
or rather if it's quite hard not to leak resources across process boundaries,
then we lose some degree of process isolation. I was specifically
asking how to make sure I don't get resources leaking across fork +
exec calls. You don't need to sell me on it. I've been vocal on this
point for the entire thread. I don't know how I can be more clear on
this.

Ersek, Laszlo

unread,
Apr 18, 2010, 10:25:26 PM4/18/10
to
On Sun, 18 Apr 2010, Joshua Maurice wrote:

> how to make sure I don't get resources leaking across fork + exec calls

No resource can "leak" through fork()/exec() calls that don't exist.

Main program connects to inetd. Inetd forks and executes single-shot
wrapper daemon. Daemon has (at least) two phases, init phase and servicing
phase. In the init phase, before initializing the wrapped library, it sets
FD_CLOEXEC on fd's 0 and 1, redirects fd 2 to a log file and sets
FD_CLOEXEC on it. As the final step, it initializes the library. Then it
starts taking requests on stdin and writing answers to stdout. There is no
ancestry between your main process and the wrapper process.

If the library has constructor functions, then don't link it into the
wrapper daemon at build time; open it with dlopen() after setting
FD_CLOEXEC on [012].

lacos

Joshua Maurice

unread,
Apr 18, 2010, 10:31:13 PM4/18/10
to

And again, for the rest of the world which may not have complete
control over the process, what do those people do? My team in my
company writes a library which is used by other teams, and we also use
libraries written by other companies. It's kind of difficult to do
what you suggest in the real world for general purpose libraries, both
users and writers. Then some of us use C++ and not C, so dlopen
becomes a bit less practical as well when using or writing a C++
library.

Many developers don't actually control the entire code base, nor all
of the entry and exit points, and instead have to work with other
pieces of code, some quite old, and possibly not the most correct.
There should be simple code to achieve the desired semantics: to
create a new process from an executable image and without leaking any
resources into that new process.

I suppose that as a practical matter, it will only come up if you do
unbounded forking, but it still rubs me the wrong way.

Casper H.S. Dik

unread,
Apr 19, 2010, 4:30:43 AM4/19/10
to
William Ahern <wil...@wilbur.25thandClement.com> writes:

>The Solaris closefrom() man page suggests it may be using /proc, which would
>not be cool for chroot'd applications. Is this the case? OpenBSD's
>closefrom() is a system call, which seems more reasonable given the most
>common use for this is as a step immediately before or upon a fork or exec.

Correct. It doesn't use a specific system call.

Rainer Weikusat

unread,
Apr 19, 2010, 7:54:16 AM4/19/10
to
Joshua Maurice <joshua...@gmail.com> writes:
> On Apr 18, 7:35 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
>> Joshua Maurice <joshuamaur...@gmail.com> writes:
>> > On Apr 16, 4:19 pm, sc...@slp53.sl.home (Scott Lurndal) wrote:
>> >> Joshua Maurice <joshuamaur...@gmail.com> writes:
>> >> > However, if a process misbehaves, like not settings "close on
>> >> >exec" when opening the file descriptor (an option only available in
>> >> >recent Linux kernels)
>>
>> >>   The "Close on Exec" option has been part of _every_ unix and linux kernel
>> >> since basically forever.   In Unix v7 it was an ioctl (FIOCLEX/FIONCLEX),
>> >> in System V it was made an fcntl(2) flag.
>>
>> > Race condition. Up until a recent Linux kernel version, you could not
>> > set close on exec in open; you could only set it with fcntl. In a
>> > multithreaded program, there is a small window between open and fcntl
>> > in which fork could be called, resulting in that file descriptor being
>> > leaked.
>>
>> And if you hit yourself onto the head with a frying pan, chances are
>> still that this will hurt badly. Coming to think of it, you could also
>> drop a hot pressing iron onto your feet and - again - you will manage
>> to hurt yourself by doing so. The solution to all problems mentioned
>> in this text so far is still: Don't do it.
>
> You could speak not in riddles and perhaps help the conversation
> instead of sounding like an ass.

Ok. This is a regularly discussed 'topic' here and my personal
assessment is that you are trolling. I don't go to Windows-oriented
newsgroups and start lengthy threads about theoretical problems which
could follow from properties of the API. If you are serious about
anything except stirring up trouble, you shouldn't either. Stay in the
universe you believe to be perfect and everyone is going to be much
happier.

> How do you propose to write an application which
> 1- uses third party libraries which may not be correctly written, aka
> not use O_CLOEXEC?
> 2- is multi-threaded, and uses a non-recent kernel which lacks
> O_CLOEXEC, or uses a badly written library which does not create all
> file handles with O_CLOEXEC or equivalents (aka the new options, not
> fcntl)?
> 3- or any other combination where you want to program defensively,
> where you want to guarantee a degree of fault isolation between
> processes, aka one of the major point of processes, and have a stable
> system, aka one which does not leak resources?

As I wrote already two times: The way to avoid problems with buggy
code is to not write buggy code.

David Given

unread,
Apr 19, 2010, 3:08:01 PM4/19/10
to
On 19/04/10 03:31, Joshua Maurice wrote:
[...]

> It's kind of difficult to do
> what you suggest in the real world for general purpose libraries, both
> users and writers.

My understanding of the problem is that *any* call to fopen(), in a
multithreaded application, may result in a leaked file descriptor
(because even if the caller to fopen() remembers to call
fcntl(fileno(fp), F_SETFD, FD_CLOEXEC) afterwards, there's still a
window whereby another thread calling fork() might propagate the file
descriptor).

Is this correct?

If so, is there any actual solution? Because fopen() is not going to go
away.

--
┌─── dg@cowlark.com ───── http://www.cowlark.com ─────

│ "In the beginning was the word.
│ And the word was: Content-type: text/plain" --- Unknown sage

Rainer Weikusat

unread,
Apr 19, 2010, 3:39:16 PM4/19/10
to
David Given <d...@cowlark.com> writes:
> On 19/04/10 03:31, Joshua Maurice wrote:
> [...]
>> It's kind of difficult to do
>> what you suggest in the real world for general purpose libraries, both
>> users and writers.
>
> My understanding of the problem is that *any* call to fopen(), in a
> multithreaded application, may result in a leaked file descriptor
> (because even if the caller to fopen() remembers to call
> fcntl(fileno(fp), F_SETFD, FD_CLOEXEC) afterwards, there's still a
> window whereby another thread calling fork() might propagate the file
> descriptor).
>
> Is this correct?
>
> If so, is there any actual solution?

Yes. The actual solution is still (and will remain forever): DO NOT DO
THIS. This cannot be that complicated, can it? Since executing fork
will create a new process running the same program with much of the
environment of the old process inherited, insofar parts of the
environment which would otherwise be inherited must not be transferred
across a fork (possibly followd by an exec), the only way to do so is
to either (temporarily) modify the original environment before
fork or adjust the inherited environment after fork. There is no other
way.

The much more interesting question (as compared to 'is there any
actual solution') would be 'Is there any actual problem', that is, a
real-world situation which does not include the generous assumption
that 'unknown and buggy code which exists only in binary form is going
to be executed as part of the forking process'. Because, as soon as
you assume that "the software doesn't work", you'll obviously get just
that (software doesn't work) by definition (and discussing this is as
obviously useless).

BTW, given that the DRAMs usually used in PCs are known to be
inherently unreliable, is there any way to program a computer at all?

Joshua Maurice

unread,
Apr 19, 2010, 4:58:05 PM4/19/10
to

I'm not trolling. I'm still waiting for your solution to my problem.
Your answer is "don't do it." My question is "don't do what?"

I am in a large company working on a piece of software with over
25,000 source files, written and maintained by people in the US,
Ireland, Israel, India, and more. We use many third party libraries
not written in house. This all goes to run a massively complex engine
which works with all possible data connectors known to man, databases,
flatfiles, proprietary data formats, XML, etc.

Let's try to get to a common understanding. Please try to correct me
where I'm wrong. My company's product is made up of all of these
general purposes libraries, some written in house, some by other
companies, some open source, and some are plugged in by our customers
as user extensions. Now, for performance reasons, scalability and
speed, our engine is multi threaded. Any piece of this engine may call
fork + exec. Fork + exec is basic functionality, and it is required
for sufficiently complex activities such as stuff in our engine.
Finally, I think it's good standard practice to not let resources hang
out without closing them, aka leaking them indefinitely.

So, what am I to do? In our multi-threaded engine, am I to:

1- Disallow calling fork + exec by our code, not use any libraries
which call fork + exec (good luck finding documentation on that), and
document that user extensions are not allowed to call fork + exec?
This is entirely impractical.

2- Not have a multi-threaded engine? Again entirely impractical.

3- Do the best I can for my own fork + exec calls and close all
unknown resources between fork and exec. For that, I need a way to
enumerate over all open resources. For the other fork + exec calls
beyond my direct control, see 4.

4- Ignore potential file descriptor leaks and other resource leaks
across fork + exec as irrelevant. In practice, ignoring security
concerns (which would require more than what we've been discussing),
this might be practical. Without unbounded forking and with sufficient
system resources, leaking the occasional file descriptor or whatever
may not be a problem. This still seems like a horrible standing
operating procedure.

I'm not trolling. I'm looking for an honest answer. All you've done
thus far is say "Don't do it" and "It's been discussed before" without
actually talking about any actual applicable facts, nor pointing me
towards these previous discussions, nor giving me a useful summary of
said discussions. You are the troll when you answer the question with
"You're doing it wrong." Thus far, all I've found via google is the
same rehashed discussions without any reasonable conclusions, and you
are not helping the situation here. Ad hominem attacks are unbecoming.
(As I use my own by calling you a troll. Hypocritical yes, but
hopefully you can see the attempt at irony as a counter argument and
not instead as trying to support a vicious circle of name calling.)

David Given

unread,
Apr 19, 2010, 5:26:19 PM4/19/10
to
On 19/04/10 20:39, Rainer Weikusat wrote:
[...]

> Yes. The actual solution is still (and will remain forever): DO NOT DO
> THIS. This cannot be that complicated, can it?

So, basically, you're saying, don't use fopen()? In any multithreaded
apps? And in any *library* which might be used by a multithreaded app?
Which these days, given how popular multithreaded environments like
GNOME, web servers, and in fact any non-trivial application are, is
pretty much all code?

Well, it ain't going to happen. fopen() is part of the flipping
*language* spec.

So, while not using fopen() is a perfectly valid solution to the
problem, it's not actually a useful one. It's like telling someone to
avoid drowning at the bottom of the sea by not inhaling water --- you
can hold your breath for as long as you like, but you know that
*eventually* you're going to have to take a lungful.

We all know that the default for close-on-exec is wrong; are there any
*useful* strategies for dealing with it in a real-world environment?

Chris Friesen

unread,
Apr 19, 2010, 5:26:34 PM4/19/10
to
On 04/19/2010 02:58 PM, Joshua Maurice wrote:

> 1- Disallow calling fork + exec by our code, not use any libraries
> which call fork + exec (good luck finding documentation on that), and
> document that user extensions are not allowed to call fork + exec?
> This is entirely impractical.

> 2- Not have a multi-threaded engine? Again entirely impractical.

Arguably a multi-process engine is safer.

In any case, there are complexities when calling fork() from a threaded
process. Given that libraries cannot in general know the state of the
app, the app should also set up pthread_atfork() handlers to cover
everything that needs to be cleaned up.

Arguably it's a bad idea for libraries to call fork/exec or to create
new threads.

> 3- Do the best I can for my own fork + exec calls and close all
> unknown resources between fork and exec. For that, I need a way to
> enumerate over all open resources. For the other fork + exec calls
> beyond my direct control, see 4.

Certainly this is something that you can do, and probably should.

> 4- Ignore potential file descriptor leaks and other resource leaks
> across fork + exec as irrelevant. In practice, ignoring security
> concerns (which would require more than what we've been discussing),
> this might be practical. Without unbounded forking and with sufficient
> system resources, leaking the occasional file descriptor or whatever
> may not be a problem. This still seems like a horrible standing
> operating procedure.

It's not a new problem. The app is apparently working now, so anything
you do will help.


> I'm not trolling. I'm looking for an honest answer. All you've done
> thus far is say "Don't do it" and "It's been discussed before" without
> actually talking about any actual applicable facts, nor pointing me
> towards these previous discussions, nor giving me a useful summary of
> said discussions. You are the troll when you answer the question with
> "You're doing it wrong."

The simple fact is that you're stuck with a poorly-designed system and
you're trying to improve it.

Realistically, libraries don't have enough knowledge of the process
architecture to be able to fork/exec/pthread_create safely. Because of
that, it's almost never a good design to have libraries doing that sort
of thing.

If you need to have multiple different third-party products interwork,
it would probably be safer to run as multiple processes rather than
multiple threads. It's a bit more coding work, but on any recent unix
you can share memory between the processes, pass around file descriptors
via unix sockets, use process-shared mutexes/semaphores, etc. There is
a minimal overhead from the fact that you're not sharing memory maps
between threads and thus you need to flush the TLB on a context switch.
On the flip side you have much more isolation and you can choose
whether or not each process should be threaded. Using multiple
processes also tends to lead to better-designed interfaces since it
generally gets planned more carefully rather than just using data from
other threads in an ad-hoc manner. Lastly, depending on the
inter-process communication mechanisms that you use, it may be possible
to eavesdrop on the communication--this can be extremely useful in
debugging the system.

Chris

David Given

unread,
Apr 19, 2010, 5:50:45 PM4/19/10
to
On 19/04/10 22:26, Chris Friesen wrote:
[...]

> Realistically, libraries don't have enough knowledge of the process
> architecture to be able to fork/exec/pthread_create safely. Because of
> that, it's almost never a good design to have libraries doing that sort
> of thing.

It's not necessarily the library that's doing it. Consider a
hypothetical terminal emulator running under a multithreaded UI library
such as GTK.

When the UI library starts up, it's going to create background threads
to do work --- it says clearly in the docs that it's going to do this,
so this isn't a problem. But once the UI library has started, the main
program now cannot safely call forkpty() and exec() to start the child
process, because one of those background threads might open a file
descriptor at the wrong time and get it propagated to the child.

The least bad way I know of dealing with this is to use one of the
aforesaid foul hacks to close unwanted file descriptors in the child
after it's forked, before exec() is called. But these are non-portable
and not necessarily very reliable, as we've already seen...

Is posix_spawn() the current favoured solution? Is it gaining much
traction? (I note that my Ubuntu Koala system doesn't have a man page
for it, for example.)

(I've actually run into this problem with LBW: I accidentally left file
descriptor 4 open when spawning the Linux process. Most programs don't
care, because they don't mind what number gets assigned to what file
descriptor... until I tried running dpkg, which passes data to its
children using streams on specifically numbered file descriptors,
causing horrible failures in *really* weird ways.)

Joshua Maurice

unread,
Apr 19, 2010, 5:55:44 PM4/19/10
to
On Apr 19, 2:50 pm, David Given <d...@cowlark.com> wrote:
> Is posix_spawn() the current favoured solution? Is it gaining much
> traction? (I note that my Ubuntu Koala system doesn't have a man page
> for it, for example.)

As far as I can tell, the semantics of posix_spawn are defined to be
equivalent to that of a user-written fork + exec, aka it carries the
same semantics and baggage, so it solves nothing for this problem. It
was intended to be a portable process-creation primitive for hardware
without an MMU. It was not intended to solve this problem of resource
leaks across fork + exec.

http://www.opengroup.org/onlinepubs/009695399/functions/posix_spawn.html

David Given

unread,
Apr 19, 2010, 6:31:59 PM4/19/10
to
On 19/04/10 22:55, Joshua Maurice wrote:
[...]

> As far as I can tell, the semantics of posix_spawn are defined to be
> equivalent to that of a user-written fork + exec, aka it carries the
> same semantics and baggage, so it solves nothing for this problem.

Ah --- I'd assumed that specifying a non-NULL fileactions pointer
started with a blank slate, not with the existing set of file
descriptors. Fair enough.

Chris Friesen

unread,
Apr 19, 2010, 6:36:40 PM4/19/10
to
On 04/19/2010 03:50 PM, David Given wrote:

> Consider a
> hypothetical terminal emulator running under a multithreaded UI library
> such as GTK.
>
> When the UI library starts up, it's going to create background threads
> to do work --- it says clearly in the docs that it's going to do this,
> so this isn't a problem. But once the UI library has started, the main
> program now cannot safely call forkpty() and exec() to start the child
> process, because one of those background threads might open a file
> descriptor at the wrong time and get it propagated to the child.
>
> The least bad way I know of dealing with this is to use one of the
> aforesaid foul hacks to close unwanted file descriptors in the child
> after it's forked, before exec() is called. But these are non-portable
> and not necessarily very reliable, as we've already seen...

In a situation where the library is the framework for the app and is
itself multithreaded, the cleanest solution would probably be for the
library to provide a function for the app to call that would fork off a
clean process with no leakage.

Realistically, only the library has the knowledge of what is required to
do this in a completely safe way.

I'm no GTK programmer, but a couple minutes of poking around found the
g_spawn_* interfaces which provide a supported way to create a new
process. The default behaviour is to close all descriptors except
stdin/stdout/stderr before calling exec(), but this can be overridden if
you want them left open.

Chris

David Schwartz

unread,
Apr 19, 2010, 7:03:05 PM4/19/10
to
On Apr 19, 12:08 pm, David Given <d...@cowlark.com> wrote:

> My understanding of the problem is that *any* call to fopen(), in a
> multithreaded application, may result in a leaked file descriptor
> (because even if the caller to fopen() remembers to call
> fcntl(fileno(fp), F_SETFD, FD_CLOEXEC) afterwards, there's still a
> window whereby another thread calling fork() might propagate the file
> descriptor).
>
> Is this correct?
>
> If so, is there any actual solution? Because fopen() is not going to go
> away.

Install appropriate pthread_atfork() handlers. Before the fork, acquire a lock.
After the fork, release the lock. Call 'fopen' while holding the lock.

For me, the practice is simply not to use 'fopen'. There have
historically been too many problems with it, and even though most of
those problems are solved now, for portability it's (IMO) best just
not to use it. Bluntly, it's an ugly wrapper around 'open'.

DS

Joshua Maurice

unread,
Apr 19, 2010, 7:46:58 PM4/19/10
to

Hmm. I reviewed the win32 documentation. It appears that I was
mistaken, and that it has the exact same problem. You inherit all
(file) handles or no (file) handles, nothing in between, as any sane use
of fork + exec or CreateProcess would want.

Ersek, Laszlo

unread,
Apr 19, 2010, 8:15:34 PM4/19/10
to
On Mon, 19 Apr 2010, David Given wrote:

> On 19/04/10 20:39, Rainer Weikusat wrote:
> [...]
>> Yes. The actual solution is still (and will remain forever): DO NOT DO
>> THIS. This cannot be that complicated, can it?
>
> So, basically, you're saying, don't use fopen()? In any multithreaded
> apps? And in any *library* which might be used by a multithreaded app?

s/fopen/fork, I guess.


> We all know that the default for close-on-exec is wrong; are there any
> *useful* strategies for dealing with it in a real-world environment?

Reimplement open(), forward request to actual (or rather, next, as in
RTLD_NEXT) open(), but add O_CLOEXEC. If you're lucky, fopen() / freopen()
/ whatever will go through open().

$ cat cloexec.c

#define _GNU_SOURCE

#include <stdarg.h>
#include <fcntl.h>
#include <dlfcn.h>

static int (*real_open)(const char *, int, ...);

int
open(const char *path, int oflag, ...)
{
    int ret;

    if (oflag & O_CREAT) {
        va_list ap;

        va_start(ap, oflag);
        ret = (*real_open)(path, oflag | O_CLOEXEC, va_arg(ap, mode_t));
        va_end(ap);
    }
    else {
        ret = (*real_open)(path, oflag | O_CLOEXEC);
    }

    return ret;
}

static void init(void) __attribute__((constructor));

static void
init(void)
{
    *(void **)&real_open = dlsym(RTLD_NEXT, "open");
}


$ gcc -fPIC -shared -o cloexec.so -Wall -Wextra cloexec.c -ldl

$ strace touch testfile 2>&1 | grep O_WRONLY
open("testfile", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3

$ LD_PRELOAD=./cloexec.so strace touch testfile 2>&1 | grep O_WRONLY
open("testfile", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK|O_CLOEXEC, 0666) = 3


Now do something similar with socket():

https://bugzilla.redhat.com/show_bug.cgi?id=443321#c10
http://www.kernel.org/doc/man-pages/online/pages/man2/socket.2.html


Call pipe2() instead of pipe():

http://www.kernel.org/doc/man-pages/online/pages/man2/pipe2.2.html


etc etc

Cheers,
lacos

William Ahern

unread,
Apr 19, 2010, 8:19:16 PM4/19/10
to
David Given <d...@cowlark.com> wrote:
> On 19/04/10 20:39, Rainer Weikusat wrote:
> [...]
> > Yes. The actual solution is still (and will remain forever): DO NOT DO
> > THIS. This cannot be that complicated, can it?

> So, basically, you're saying, don't use fopen()? In any multithreaded
> apps? And in any *library* which might be used by a multithreaded app?
> Which these days, given how popular multithreaded environments like
> GNOME, web servers, and in fact any non-trivial application are, is
> pretty much all code?

The usual rule of thumb is not to mix threading with fork/exec. In fact, I
can't think of any scenario where I would want to use fork or exec after
starting some threads.

Yes, it can be done, and implementions make it work, but obviously there are
unresolved--perhaps unresolvable--caveats. This is an artifact of the fact
that threads came along late in the history of Unix. Until very recently
processes have always been the dominant parallelism paradigm, with simple
concurrency handled by poll() and non-blocking I/O. This is in contrast to
Windows, which is heavily multithreaded for both parallelism and
concurrency; in fact, Windows had threads before it even had robust
protected memory processes, IIRC.

As regards any library that open descriptors and/or starts threads when it
loads--that's a patently broken library in my opinion, especially if it
doesn't provide any way to release those resources explicitly.

Jonathan de Boyne Pollard

unread,
Apr 20, 2010, 5:58:15 AM4/20/10
to

As regards any library that open descriptors and/or starts threads when it loads—that's a patently broken library in my opinion, especially if it doesn't provide any way to release those resources explicitly.

A lot of people think that.  Raymond Chen wrote two interesting articles a few years ago, the first on how this is the theory espoused by many, and the second on how ExitProcess() in Win32 has had to be designed in practice, given that there are a lot of such broken libraries in the world, and that Win32 allows processes to remotely inject threads into other processes.  A couple of years later, xe published a puzzle, showing how hard it is for library writers to provide ways to release resources explicitly, that still work if the library user doesn't call the resource releasing function before exiting the process.

As you say, threads came late to Unix as compared to Microsoft operating systems. (Microsoft's first multithreaded operating system was released in 1987.)  The Microsoft world has had a lot longer to encounter these problems.  The Win32 equivalent to the problem at hand would be the window between fopen() and SetHandleInformation(…,HANDLE_FLAG_INHERIT,-1) where another thread can call CreateProcess() and thus cause the handle to be inherited. 

And yes, the answers are the same, being either of:

  1. Don't call fopen() but instead call CreateFile() with the SECURITY_ATTRIBUTES structure's bInheritHandle member set to false, and then call _open_osfhandle() and fdopen() to get a C stream mapped to the Win32 file handle.

  2. Modify one's C library fopen() call to be capable of setting the "don't inherit" flag itself.  As mentioned, the Microsoft world has already trodden this ground, and in Microsoft Visual C/C++ such a mechanism is already in place, that can be turned on by applications programmers via a flag.  Handle inheritance is disabled atomically if one includes 'N' in the mode string passed to fopen().

And, like Microsoft Visual C/C++, GNU libc also has additional extension mode flags for fopen(), including one that sets O_CLOEXEC.  It is 'e'.  It has been there since glibc 2.7.

Jonathan de Boyne Pollard

unread,
Apr 20, 2010, 6:43:32 AM4/20/10
to

- there is no way to set FD_CLOEXEC as default behaviour for fopen() as far as I know, hence you are doomed anyway

For what it's worth, this problem dates back to the early 1980s, and was, to my recollection, first solved in Microsoft C version 5.  The problem is, essentially, how to deal with code that calls open() or fopen() that doesn't explicitly set the extension flags that one supplies for implementation-specific or platform-specific behaviour.  In other words: how to deal with modifying third-party code under the covers to incorporate alternative behaviour.

Today, the flags that third-party and library code aren't using are the O_CLOEXEC flag to open() and the 'e' flag to fopen().

In the 1980s, the flags that third-party and library code weren't using were the O_TEXT and O_BINARY flags to open() and the 't' and 'b' flags to fopen().  And the solution was a fairly simple one: If the application code didn't set any of the flags explicitly, the C library would bitwise-or in the value of a global integer named _fmode.  The behaviour of third-party and library code could be globally changed simply by setting the _fmode flag (in main() or some such place), or linking in a special BINMODE.OBJ object file that replaced the library's default flag with one that had O_BINARY set.

This mechanism is still present today, and here is the MSDN documentation for _fmode.

This wheel has already been invented.  So all that one needs to do with your application is copy it into GNU libc.  Implement an _fmode global flag and bitwise-or its value into the mode flag in open(), allowing O_CLOEXEC as one of the flags to be or-ed in.  Indeed, most of this work has been done for you, long since.  The DJGPP libc already has _fmode.  So, too, does Cygwin's C library (although, as noted by Corinna Vinschen in March 2009, it's not documented in the User Guide).  All that you probably need to do, after copying the mechanism across (or otherwise enabling it), is ensure that O_CLOEXEC isn't masked out from _fmode.

Jonathan de Boyne Pollard

unread,
Apr 20, 2010, 6:04:56 AM4/20/10
to
>
>>
>> [... usual trolling by Rainer Weikusat ...]

>>
> Hmm. I reviewed the win32 documentation. It appears that I was
> mistaken, and that it has the exact same problem. You inherit all
> (file) handles or no (file) handles, nothing inbetween as any sane use
> of fork + exec or CreateProcess would want.
>
Go and read the message that I just posted. It mentions the relevant
Win32 API and C library functions, and even hyperlinks to the MSDN
documentation for one of them.

Rainer Weikusat

unread,
Apr 20, 2010, 7:59:03 AM4/20/10
to
David Given <d...@cowlark.com> writes:
> On 19/04/10 20:39, Rainer Weikusat wrote:
> [...]
>> Yes. The actual solution is still (and will remain forever): DO NOT DO
>> THIS. This cannot be that complicated, can it?
>
> So, basically, you're saying, don't use fopen()?

I am hard-pressed to try a reply in simplified kindergarten language
here, but I am not entirely certain if you didn't perhaps honestly
misinterpret my text completely. See, if you desire to change the
default policy, it is necessary that you have sufficient control of
your environment to actually change the policy without 'something' you
don't control causing the yet unchanged default policy to be
applied. The environment we are "discussing" here has been carefully
defined as to offer no opportunity for changing the default policy,
consequently, the default policy cannot be changed.

How do you propose to deal with a multithreaded library which remaps
the text segment of your application with r/w permission and
overwrites your code with one million copies of "Oft glaubt der
Mensch, wenn er nur Worte hoert, es muesse sich dabei doch auch was
denken lassen" (Goethe, Faust I)?

[...]

> We all know that the default for close-on-exec is wrong;

That's your opinion on this topic. Apparently, the opinion of the
people who designed the interface was different. My opinion on this
topic, being a completely pragmatic person with no desire to
evangelize in favor of the one true someone's way is that "whatever the
default happens to be, it will be inconvenient in some situations".

Geoff Clare

unread,
Apr 20, 2010, 8:47:50 AM4/20/10
to
David Given wrote:

> My understanding of the problem is that *any* call to fopen(), in a
> multithreaded application, may result in a leaked file descriptor
> (because even if the caller to fopen() remembers to call
> fcntl(fileno(fp), F_SETFD, FD_CLOEXEC) afterwards, there's still a
> window whereby another thread calling fork() might propagate the file
> descriptor).
>
> Is this correct?
>
> If so, is there any actual solution? Because fopen() is not going to go
> away.

Here are three possible solutions:

1. Instead of using fopen(), use open() with O_CLOEXEC and then
fdopen() to get a stream from the file descriptor. Obviously
this is only portable to systems that provide O_CLOEXEC, but
O_CLOEXEC has been mandated by POSIX since 2008.

2. Fork before creating any threads. Then later on tell the
child what to do (via IPC) instead of forking at that point.
(The child can safely fork another child to do the work if
necessary.) Obviously this requires cooperation from the other
threads that would have called fork(), so it may not be possible
when using 3rd party libraries.

3. Use a mutex to ensure fork() cannot be called concurrently
with fopen(). Again this requires cooperation.

--
Geoff Clare <net...@gclare.org.uk>


Rainer Weikusat

unread,
Apr 20, 2010, 3:43:17 PM4/20/10
to
Rainer Weikusat <rwei...@mssgmbh.com> writes:

[...]

> Ok. Fine. What do you propose as solution to the problem that the
> third-party library could just overwrite all of your code with trash
> data? Maybe even because of a programming error in said library or a
> few bits in some DIMM which have chosen to change their values at an
> inconvenient time?

As an addition to that: One of the nastiest problems I had to deal
with so far was "a third-party library I couldn't control" (Opsec SDK)
which apparently corrupted the malloc-heap every once in a while,
leading (besides causing random crashes) to the phenomenon that every
free-call in my code (or in library code) could possibly end in an
endless loop because of trying to find the end of a linked list of
structures with the next-pointer of one structure pointing to the
structure itself. To add insult to injury, this was for a 24x7
firewall monitoring application which was supposed to 'just work'
without any human intervention and whose malfunctioning would
immediately cause phone calls from angry customers. It was possible to
deal with this because it was an actual problem with specific
properties. It is not possible to deal with hypothetical problems with
the nice property of being immediately redefined in a suitable way
once someone suggests a possible solution for the vaguely defined
issue (which is vaguely defined exactly in order to enable 'problem
redefinition' whenever the threat of a solution arises).

Ersek, Laszlo

unread,
Apr 20, 2010, 3:27:04 PM4/20/10
to
On Tue, 20 Apr 2010, David Given wrote:

> On 20/04/10 01:15, Ersek, Laszlo wrote:

>> Now do something similar with socket():

>> Call pipe2() instead of pipe():

> Ah, but it's not me that's calling socket() or pipe() --- it's a
> third-party library that's not under my control. And setting the CLOEXEC
> flag for socket() as described still has a (rare, but potential) race
> condition between socket() returning and the call to fcntl() immediately
> afterward.

No, I meant, "write a similar wrapper for pipe(), which diverts all calls
to pipe2(), and write a similar wrapper for socket(), which delegates all
calls to the 'next' socket() with SOCK_CLOEXEC or'd in". Basically, you'd
have to "override" all functions that return new file descriptors. Some
would be implemented by or-ing in a flag and delegating to the overridden
function, others would be implemented by calling differently named
functions. For the latter ones, you wouldn't need to call dlsym(RTLD_NEXT,
"...") in the constructor function.

Of course if the lib used pipe2() itself, then you'd have to override even
that.

Cheers,
lacos

Rainer Weikusat

unread,
Apr 20, 2010, 3:29:55 PM4/20/10
to
David Given <d...@cowlark.com> writes:
> On 20/04/10 01:15, Ersek, Laszlo wrote:
>> On Mon, 19 Apr 2010, David Given wrote:
> [...]

>>> So, basically, you're saying, don't use fopen()? In any multithreaded
>>> apps? And in any *library* which might be used by a multithreaded app?
>>
>> s/fopen/fork, I guess.
>
> I did mean fopen(): if I have a multithreaded application where one
> thread, *any* thread, creates a file descriptor, and another thread,
> *any* thread, calls exec(), then there's a potential problem.

If you write a single line of code, there is a potential problem
because it could be the wrong line of code. But since this is
completely hypothetical, discussing the possibility is useless. One
should rather check the line instead, insofar as there is any reason to
assume that a problem actually exists. And although you are probably
not willing to believe this, there is no difference between my
contrived example and your contrived example: As soon as something is
done at all, it could have been done wrongly according to some
definition of 'wrong' and hence, every action is 'a potential problem'
(and discussing 'potential problems' at this level of detail is an
exercise in generating unproductive noise).

I asked you for a real problem two postings ago. Where is it?

[...]

> Ah, but it's not me that's calling socket() or pipe() --- it's a
> third-party library that's not under my control.

Ok. Fine. What do you propose as solution to the problem that the
third-party library could just overwrite all of your code with trash
data? Maybe even because of a programming error in said library or a
few bits in some DIMM which have chosen to change their values at an
inconvenient time?

Answer: You ignore the possibility, at least until 'problems' actually
manifest themselves.

Regarding your problems wrt 'it is impossible to program for
UNIX(*)', hire a programmer and stay away from this stuff. It is
beyond you. *plonk*

David Given

unread,
Apr 20, 2010, 2:07:06 PM4/20/10
to
On 20/04/10 01:15, Ersek, Laszlo wrote:
> On Mon, 19 Apr 2010, David Given wrote:
[...]

>> So, basically, you're saying, don't use fopen()? In any multithreaded
>> apps? And in any *library* which might be used by a multithreaded app?
>
> s/fopen/fork, I guess.

I did mean fopen(): if I have a multithreaded application where one
thread, *any* thread, creates a file descriptor, and another thread,
*any* thread, calls exec(), then there's a potential problem.

[...]


> Reimplement open(), forward request to actual (or rather, next, as in
> RTLD_NEXT) open(), but add O_CLOEXEC. If you're lucky, fopen() /
> freopen() / whatever will go through open().

[...]


> $ LD_PRELOAD=./cloexec.so strace touch testfile 2>&1 | grep O_WRONLY

> open("testfile", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK|O_CLOEXEC, 0666) = 3

Hmm. I'd forgotten you can do that. It's not entirely reliable --- if
the program calls the syscall directly rather than going through libc,
then the hack won't work. It's a promising approach, though, and is
likely to work in most cases.

[...]


> Now do something similar with socket():

[...]


> Call pipe2() instead of pipe():

Ah, but it's not me that's calling socket() or pipe() --- it's a
third-party library that's not under my control. And setting the CLOEXEC
flag for socket() as described still has a (rare, but potential) race
condition between socket() returning and the call to fcntl() immediately
afterward.

Again, this will probably work in most cases, but there's still that
rare situation where something horrible will happen that makes me
uneasy. If only there were a nice portable way of telling the kernel
that all file descriptors for my process should be CLOEXEC until further
notice...

William Ahern

unread,
Apr 20, 2010, 2:48:58 PM4/20/10
to
Jonathan de Boyne Pollard <J.deBoynePoll...@ntlworld.com> wrote:
> [-- text/html, encoding 7bit, charset: ISO-8859-1, 67 lines --]

No doubt Windows has myriad techniques and devices for handling thread
issues. In fact, some of them clearly aren't resolvable. Windows Vista seems
to terminate a process which has a thread which tries to access a critical
section left inconsistent by ExitProcess terminating another thread!

If these are arguments for how things should be done in Unix, they strongly
suggest circumscribing the unstructured use of threads in Unix, and heaping
scorn on libraries and their developers that do stupid things.

Unix has a long tradition of implementing solutions through wetware, rather
than software or hardware. This is why there is so much opprobrium on this
and similar Unix groups, which Windows developers chafe at. This thread is a
case in point.

It's also not unrelated to the fact that access to source code, even for
proprietary platforms, has always been much easier than on Windows. It's a
reasonable proposition to fix bugs rather than work around them. This tack
isn't acceptable in the "corporate" world. But consider that Unix libraries
aren't as often commercial products--libc a notable exception--unlike in
Windows, and we're of course speaking of libraries. Windows provides the
canonical XML parser, but in Unix you use libxml2 or something similar.

Similarly, system-level "solutions" are more difficult in Unix, because
portability across implementations is more highly valued, and unless a
solution is clearly a winner it will never be widely adopted by vendors or
users. This arguably results in evolutionarily stronger solutions, but in
the short-term leaves more gaps to be handled ad hoc--burdensome but IMO
acceptable as long as nothing else precludes you from solving the problem.

This is also related to theories of software composition. Windows developers
often balk at the inefficiency of poll()+read() versus Windows' completion
ports. But poll()+read() are far more composable than completion ports.
They're moderately less efficient and yet almost infinitely more
composable--relative to being forced into a threaded, reentrant model. Unix
culture has always been more conservative and protective of unrestricted
composition of primitives than Windows culture. (The uglier Unix interfaces
are ugly by this measure in particular--and it's why most people in this
thread aren't much bothered by the descriptor "leaking" issue; it's not ugly
by this measure). That's not necessarily a judgment one way or another, just
the way it is.

Ersek, Laszlo

unread,
Apr 20, 2010, 6:01:15 PM4/20/10
to
On Tue, 20 Apr 2010, Rainer Weikusat wrote:

> One of the nastiest problems I had to deal with so far was "a
> third-party libray I couldn't control" (Opsec SDK) which apparently
> corrupted the malloc-heap every once in a while, leading (except causing
> random crashes) to the phenomenon that every free-call in my code (or in
> library code) could possibly end in an endless loop beause of trying to
> find the end of a linked list of structures with the next-pointer of one
> structure pointing to the structure itself. To add insult to injury,
> this was for a 24x7 firewall monitoring application which was supposed
> to 'just work' without any human intervention and whose malfunctioning
> would immediately cause phone calls from angry customers. It was
> possible to deal with this because it was an actual problem with
> specific properties.

Can you / would you / are you permitted to tell us: how?

Thanks,
lacos

Joshua Maurice

unread,
Apr 20, 2010, 6:57:43 PM4/20/10
to
On Apr 20, 11:48 am, William Ahern <will...@wilbur.25thandClement.com>
wrote:

I don't know if I have a windows-centric or whatever software value
system.

My own view of processes and threads is the standard one: threads of a
single process share a common memory space, file handles etc., and
different processes have different memory spaces, file handles, etc.
One uses threads when one needs this common memory space, and one uses
processes when one wants some degree of fault tolerance and
decoupling.

I see no a priori reason that creating a new process and opening a
file should require cooperation between threads to avoid resource
leaks. If these threads were instead processes, it would be simple;
the operations are independent, so they can be done concurrently.
However, because they are threads, and because fork + exec is defined
the way it is, these operations are in fact not independent. I think
this is silly. Nearly all cases of fork + exec do not want to inherit
all file handles. They want to inherit 3 handles and none of the rest.
The point of separate processes is to have separate resource pools,
not to share resource pools. There is no POSIX portable fast way to do
this, and thus I claim POSIX is broken in this regard if it cannot
handle the common case.

(It seems to be a recurring problem with POSIX, that to write correct
applications, all pieces of code must cooperate when it shouldn't be
required. POSIX should make independent those tasks which logically
are independent and can easily be made independent. This coupling of
code is to no one's benefit.)

I am not here to troll. I made it clear from the beginning that I know
this is how POSIX is defined, and I merely asked if there was a way to
wrap fork + exec calls to give a sane interface to callers, a way to
close all unrelated resources between fork and exec. I was asking for
a practical solution to this reasonable use. Instead, I had trolls
come in and give unhelpful replies like "I shouldn't hit myself with
a hammer". Luckily, not everyone is so indoctrinated and asinine, and
I was able to get some useful answers and replies. I thank those
people.

Joshua Maurice

unread,
Apr 20, 2010, 6:57:58 PM4/20/10
to
On Apr 20, 11:48 am, William Ahern <will...@wilbur.25thandClement.com>
wrote:

I don't know if I have a windows-centric or whatever software value

Joshua Maurice

unread,
Apr 20, 2010, 7:05:36 PM4/20/10
to
On Apr 20, 3:57 pm, Joshua Maurice <joshuamaur...@gmail.com> wrote:
[...]

Ack. Sorry for the double post.
