Found it! close-on-exec was broken (386BSD)

Mark W. Eichin

Mar 28, 1992, 12:31:02 AM

Here's the code that actually performs the work of close-on-exec when
exec runs. (/usr/src/sys.386bsd/kern/kern_descrip.c.)

/*
 * Close any files on exec?
 */
void
fdcloseexec(p)
	struct proc *p;
{
	register struct filedesc *fdp = p->p_fd;
	struct file **fpp;
	register int i;

	fpp = fdp->fd_ofiles;
	for (i = fdp->fd_lastfile; i-- >= 0; fpp++)
		if (*fpp && (fdp->fd_ofileflags[i] & UF_EXCLOSE)) {
			(void) closef(*fpp, p);
			*fpp = 0;
		}
}

Now, look at the for loop. (Several of us have looked at it before.)
Note that i is counting down from fd_lastfile, presumably so
as to use a decrement-and-compare-with-zero, which probably maps to
a single instruction on some architectures.

Note also that i is used as an index into fd_ofileflags. OK,
so we're scanning the flags down from the top.

Now, note that fpp is scanning the fd_ofiles list *UPWARDS*
from the 0th element. Oops. The flag tested on each pass belongs to a
different descriptor than the file being closed.

"Oh. Gee. Is it really doing that?" I hear you cry. Well, yes.
Lots of kernel printfs later, a working tcsh 6 (yay! :-) has
made it rather clear that this is an example of why this type of
optimization is really better left to the compiler :-)

The obvious fix:
! for (i = fdp->fd_lastfile; i-- >= 0; fpp++)
----
! for (i = 0; i <= fdp->fd_lastfile; i++, fpp++)
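
To see concretely what the mismatched directions do, here is a user-space
sketch of both loops. It is not kernel code: NSLOTS, the UF_EXCLOSE value,
the bitmask return value, and all function names are made up for
illustration. Because the loop test `i-- >= 0` decrements before the body
runs, the body sees i from fd_lastfile-1 down to -1, so the sketch keeps a
guard byte in front of the flags array to make flags[-1] readable.

```c
#include <assert.h>

#define NSLOTS     8      /* hypothetical table size */
#define UF_EXCLOSE 0x01   /* stand-in value, not taken from the kernel headers */

/* Original loop shape: k plays the role of fpp and walks upwards while
 * i walks downwards. Returns a bitmask of the slots that would be closed. */
unsigned buggy_scan(const int *open, const char *flags, int lastfile)
{
    unsigned closed = 0;
    int i, k = 0;

    assert(lastfile >= 0 && lastfile < NSLOTS);
    for (i = lastfile; i-- >= 0; k++)       /* body sees i = lastfile-1 .. -1 */
        if (open[k] && (flags[i] & UF_EXCLOSE))
            closed |= 1u << k;
    return closed;
}

/* Fixed loop shape: one index for both arrays. */
unsigned fixed_scan(const int *open, const char *flags, int lastfile)
{
    unsigned closed = 0;
    int i;

    assert(lastfile >= 0 && lastfile < NSLOTS);
    for (i = 0; i <= lastfile; i++)
        if (open[i] && (flags[i] & UF_EXCLOSE))
            closed |= 1u << i;
    return closed;
}

/* Descriptors 0..4 open, only descriptor 3 marked close-on-exec. */
unsigned demo(int use_fixed)
{
    int  open[NSLOTS] = {1, 1, 1, 1, 1, 0, 0, 0};
    char store[NSLOTS + 1] = {0};   /* store[0] is the guard for flags[-1] */
    char *flags = store + 1;
    int  lastfile = 4;

    flags[3] = UF_EXCLOSE;
    return use_fixed ? fixed_scan(open, flags, lastfile)
                     : buggy_scan(open, flags, lastfile);
}
```

With this setup demo(0) reports slot 0 (stdin) as the one closed, while
demo(1) reports slot 3, the descriptor that was actually marked.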

Have fun...
_Mark_ <eic...@athena.mit.edu>
MIT Student Information Processing Board
Cygnus Support <eic...@cygnus.com>
P.S. Does Bill Jolitz actually read this? I suppose I should be sending
some of these directly to him, though it would be good for someone to
set up an explicit "bug-386bsd" mailing list [if they could guarantee
that the Jolitzes are on it.]

David Dawes

Mar 29, 1992, 4:57:59 AM

In article <1992Mar28....@athena.mit.edu> eic...@athena.mit.edu (Mark W. Eichin) writes:

[ description of close-on-exec bug deleted ]

>Lots of kernel printf's later, and a working tcsh 6 (yay! :-)

There are still a few other problems with tcsh 6. One is that the nl/cr
translation in output sometimes gets stuffed. I've tracked this down
to TIOCDRAIN not working. tcsh changes the tty settings before and
after a command is run, and the new settings are taking effect before all
the output has drained. I tried doing an explicit TIOCDRAIN ioctl in case
tcsetattr() wasn't doing it, but that didn't make any difference (the
ioctl returns 0). I guess the problem is with ttywait() in
kern/tty.c.
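
For reference, the drain-then-change sequence described above looks roughly
like this when written with the POSIX termios calls. This is only a sketch:
drain_then_set is a hypothetical helper name, and tcsh's own code is not
shown here. tcdrain() is the POSIX call for the same operation as a
TIOCDRAIN ioctl.

```c
#include <assert.h>
#include <stddef.h>
#include <termios.h>

/* Hypothetical helper: wait until queued output on fd has been transmitted,
 * then install the new settings. Returns 0 on success, -1 with errno set. */
int drain_then_set(int fd, const struct termios *t)
{
    assert(t != NULL);
    if (tcdrain(fd) == -1)               /* explicit drain */
        return -1;
    return tcsetattr(fd, TCSADRAIN, t);  /* TCSADRAIN also requests a drain first */
}
```

On a working kernel either step alone should be enough to keep old output
from being affected by the new settings; doing both mirrors the experiment
described above.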

Another problem is that every time I start tcsh, I get the following
message:

free(4d808) bad block.memtop = 4e000 membot = 4a498.
free(4b1e8) bad block.memtop = 4e000 membot = 4a498.

(the numbers change). I haven't noticed this causing any problems.

Yet another problem: if I run 'ls | more', the shell hangs when
the command terminates.

Someone else mentioned that bash required a ^C the first time it is
started on a tty. Same goes for tcsh when using the 0.0 kernel. I'd
say this is the same as the problem where more(1) hangs the first time
it's run on a tty. Both bash and tcsh put the tty into raw mode.
tcsh and bash both start up fine with the 0.1 kernel, which clearly has this
bug fixed. (It is possible to work around the close-on-exec bug if you
want to use the 0.1 kernel with tcsh by explicitly closing fds 3-19 before
the execv() calls in sh.exec.c.)
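
A minimal sketch of that workaround, assuming nothing about tcsh's actual
sh.exec.c: close_fd_range is a hypothetical helper, and the 3..19 range is
simply the one mentioned above.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: close every descriptor in [lo, hi], ignoring the
 * errors from slots that are not open. Returns how many were closed. */
int close_fd_range(int lo, int hi)
{
    int fd, n = 0;

    assert(lo >= 0 && lo <= hi);
    for (fd = lo; fd <= hi; fd++)
        if (close(fd) == 0)
            n++;
    return n;
}
```

Calling close_fd_range(3, 19) immediately before execv() leaves only
descriptors 0-2 in that range open, so the new program inherits nothing
the broken kernel loop failed to close.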

David

PS. When swap is full, does it scribble on the next partition? I got
some messages (not a panic) about not being able to clear a page,
and after I rebooted, /usr was in a bad state.
--
------------------------------------------------------------------------------
David Dawes (da...@physics.su.oz.au) DoD#210 | Phone: +61 2 692 2639
School of Physics, University of Sydney, Australia | Fax: +61 2 660 2903
------------------------------------------------------------------------------

Dave Stanhope

Mar 29, 1992, 3:53:33 PM

I'm not sure if this is related, but the parameters of 'nextc' in
'tty_ring.c' seem to be named in the reverse of the order its callers
elsewhere use. The declarations that follow the parameter list are already
in the correct order. The simple fix seems to be to change line 83 in
'tty_ring.c' as follows:

WAS:
nextc(cpp, rbp) struct ringb *rbp; char **cpp; {
SHOULD BE:
nextc(rbp, cpp) struct ringb *rbp; char **cpp; {

Hope this helps,
David M. Stanhope (Dave's Not Here)

Mark W. Eichin

Mar 29, 1992, 7:36:15 PM

>> From: d...@celtech.COM (Dave Stanhope)

>> I'm not sure if this is related but the arguments for 'nextc' in
>> 'tty_ring.c' seem to be reversed from this function's usage in

Not only that, but the second argument is wrong about half of
the time. Starting from Dave's pointer, I checked the usage of nextc,
and found that tty.c lines 1571, 1634, and 1637 all pass "cp" as the
second argument instead of &cp. These are in ttyretype and ttyrub; the
former would certainly explain why ^R tends to spew garbage, and since
I fixed it the problem has not returned.
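
To illustrate why the callee needs &cp rather than cp, here is a simplified
stand-in for a ring-buffer cursor. It is a sketch only: the struct layout,
RBSZ, ring_fill, ring_copy, and ring_nextc are hypothetical and are not the
386BSD definitions. The point is that the function advances the caller's
cursor through the char ** it is given, which cannot work if it is handed
the cursor's value instead of its address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RBSZ 8                  /* hypothetical ring size */

struct ringb {                  /* simplified stand-in, not the kernel layout */
    char *rb_hd;                /* next character to read */
    char *rb_tl;                /* one past the last character written */
    char  rb_buf[RBSZ];
};

/* Load a short string into an empty ring. */
void ring_fill(struct ringb *rbp, const char *s)
{
    size_t len = strlen(s);

    assert(len < RBSZ);
    memcpy(rbp->rb_buf, s, len);
    rbp->rb_hd = rbp->rb_buf;
    rbp->rb_tl = rbp->rb_buf + len;
}

/* Return the character at *cpp and advance *cpp, wrapping at the end of
 * the buffer; return 0 once the cursor reaches the tail. */
int ring_nextc(struct ringb *rbp, char **cpp)
{
    char *cp = *cpp;

    if (cp == rbp->rb_tl)
        return 0;
    *cpp = (cp + 1 == rbp->rb_buf + RBSZ) ? rbp->rb_buf : cp + 1;
    return (unsigned char)*cp;
}

/* Walk the ring from head to tail, copying into out; returns the count. */
size_t ring_copy(struct ringb *rbp, char *out, size_t outsz)
{
    char *cp = rbp->rb_hd;
    size_t n = 0;
    int c;

    while (n < outsz && (c = ring_nextc(rbp, &cp)) != 0)   /* &cp, not cp */
        out[n++] = (char)c;
    return n;
}
```

With a prototype in scope a modern compiler rejects ring_nextc(rbp, cp)
outright; K&R-style definitions like the one quoted above get no such
check, which is how a mismatch of this kind can go unnoticed.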

Christos Zoulas

Mar 31, 1992, 5:07:00 PM

>Another problem is that every time I start tcsh, I get the following
>message:
>
>free(4d808) bad block.memtop = 4e000 membot = 4a498.
>free(4b1e8) bad block.memtop = 4e000 membot = 4a498.
>
>(the numbers change). I haven't noticed this causing any problems.

I think that was from ttyname() calling closedir() twice and thus
freeing the same memory twice in libc...
[That is fixed in the current source at Berkeley, but I am not sure about
the Net2 source; read the README file for a patch]

>Yet another problem is if I do 'ls | more', the shell will hang when
>the command terminates.

Could be the notorious ANSI C versus K&R C incompatibility that affects
prototyped procedures whose arguments are narrower than int.
pid_t on 386BSD is a short, so the library calls that take pid_t as
an argument will not work if the library is compiled with an ANSI
compiler and the application with a K&R compiler, or vice versa.

>Someone else mentioned that bash required a ^C the first time it is
>started on a tty. Same goes for tcsh when using the 0.0 kernel. I'd
>say this is the same as the problem where more(1) hangs the first time
>its run on a tty. Both bash and tcsh put the tty into raw mode.
>tcsh and bash both start up fine with the 0.1 kernel which clearly has this
>bug fixed. (It is possible to work around the close-on-exec bug if you
>want to use the 0.1 kernel with tcsh by explicitly closing fd 3-19 before
>the execv() calls in sh.exec.c.)

Or undef FIOCLEX and FIONCLEX. Then tcsh will use closem().
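
For context, FIOCLEX and FIONCLEX are the ioctl requests that set and clear
the per-descriptor close-on-exec flag, the same flag that fcntl() exposes
as FD_CLOEXEC. The sketch below shows the two interfaces side by side; the
helper names are hypothetical, and nothing here is taken from tcsh.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>

/* Set (on != 0) or clear (on == 0) close-on-exec via ioctl.
 * Returns 0 on success, -1 with errno set. */
int set_cloexec_ioctl(int fd, int on)
{
    assert(fd >= 0);
    return ioctl(fd, on ? FIOCLEX : FIONCLEX);
}

/* Read the flag back through fcntl(): 1 if set, 0 if clear, -1 on error. */
int get_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

A program that cannot rely on the kernel honouring the flag at exec time
has to fall back on closing descriptors itself, which is the effect of the
suggestion above.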

christos

Karl Lehenbauer

Apr 3, 1992, 4:30:44 AM

In article <1992Mar30.0...@athena.mit.edu> eic...@athena.mit.edu (Mark W. Eichin) writes:
>>> From: d...@celtech.COM (Dave Stanhope)
>>> ...arguments for 'nextc' in
>>> 'tty_ring.c' seem to be reversed from this function's usage ...

> Not only that, but the second argument is wrong about half of
>the time. Starting from Dave's pointer, I checked the usage of nextc,
>and found that tty.c: line 1571, 1634, and 1637 all pass "cp" as the
>second argument instead of &cp.

Excellent! I have applied these fixes and the tty delay bug is gone!

Meritorious Achievement Awards for Kernel Hacking Beyond The Call Of
Duty to Dave Stanhope and Mark Eichin!
--
-- Email in...@NeoSoft.com for info on getting Internet access.
"Trust me. I know what I'm doing." -- Sledge Hammer