
repeatable lockup (pipe related?)

Ben Smithurst

Mar 23, 2000, 3:00:00 AM
I've noticed a problem which seems to cause a repeatable lockup in both
RELENG_3 and RELENG_4 (I don't have any -current machines to test on).

Basically, I've been able to repeat it by creating a file containing

.SH foo, bar, baz

and then a lot of junk text (I appended /etc/rc and /etc/rc.network). Then,
when I do

nroff -ms foo.ms 2>&1 | less

and quit 'less' straight away, the whole system seems to lock up. ^T
worked (sometimes), and showed troff using lots of system time (no user
time). ping from another host worked, ctrl-alt-esc dropped into DDB ok,
but that was about all. (ctrl-alt-del didn't work.)

Of course, when it first happened I wasn't trying to process /etc/rc*
files with nroff. :-) That's just a case which works (or not), and which
you guys should be able to use to reproduce it. There's probably an
easier way, too, but what the hell. (It doesn't always happen, but
normally it does within a few tries.)
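
For comparison, a writer whose reader has gone away is normally expected to get SIGPIPE, or EPIPE from write() if the signal is ignored, rather than spin in the kernel. A minimal user-space sketch of that expected behaviour, assuming a POSIX system (the function name write_after_reader_quit is made up for illustration and is not part of the report above):

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical helper: write to a pipe whose read end has already been
 * closed, roughly what troff faces once 'less' has quit.
 * Returns the errno from write(), 0 if write() unexpectedly succeeded,
 * or -1 if the pipe could not be created.
 */
static int write_after_reader_quit(void)
{
    int fds[2];
    const char msg[] = "junk text\n";
    void (*old)(int);
    int err = 0;

    if (pipe(fds) == -1)
        return -1;

    /* Ignore SIGPIPE so the failure is reported through errno instead. */
    old = signal(SIGPIPE, SIG_IGN);
    assert(old != SIG_ERR);

    close(fds[0]);                      /* the "reader" goes away */
    if (write(fds[1], msg, sizeof msg - 1) == -1)
        err = errno;                    /* expected: EPIPE */

    close(fds[1]);
    signal(SIGPIPE, old);               /* restore previous disposition */
    return err;
}
```

Calling write_after_reader_quit() from a small main() and comparing the result with EPIPE is enough to check a given system. This only covers the simple case where the reader is already gone before the write starts, not the timing-dependent case in the report.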

Can anyone else repeat this? If not, I'll be happy to try to supply any
other details you want. (Either that or assume my system is seriously
screwed in some way, and you can assume I'm an idiot.)

Part of the backtrace is:

#12 0xc01ecd30 in atkbd_isa_intr (arg=0xc023c5e0) at ../../isa/atkbd_isa.c:120
#13 0xc0131f1c in tsleep (ident=0xc5fdfca0, priority=272, wmesg=0xc01f557d "pipbww", timo=0)
    at ../../kern/kern_synch.c:480
#14 0xc013df68 in pipe_write (fp=0xc0893a40, uio=0xc601ceec, cred=0xc07bf880, flags=0, p=0xc5afdbe0)
    at ../../kern/sys_pipe.c:794
#15 0xc013c363 in dofilewrite (p=0xc5afdbe0, fp=0xc0893a40, fd=2, buf=0x807dc60, nbyte=32, offset=-1, flags=0)
    at ../../sys/file.h:156
#16 0xc013c267 in write (p=0xc5afdbe0, uap=0xc601cf80) at ../../kern/sys_generic.c:298
#17 0xc01e7336 in syscall (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 134732896, tf_esi = 134732696,
      tf_ebp = -1077938268, tf_isp = -972959788, tf_ebx = 672301476, tf_edx = 134732696, tf_ecx = 134732696,
      tf_eax = 4, tf_trapno = 22, tf_err = 2, tf_eip = 672262004, tf_cs = 31, tf_eflags = 663, tf_esp = -1077938312,
      tf_ss = 47}) at ../../i386/i386/trap.c:1073

Everything above frame 13 looks DDB-related to me. ps shows:

root@strontium:/var/crash# ps -N debug.14 -M vmcore.14 -axl
  UID   PID  PPID CPU PRI NI  VSZ RSS WCHAN  STAT TT   TIME    COMMAND
 1000 55894     1   0  18  0 1320   0 opause Is   #C2  0:00.13 (zsh)
 1000 60555 55894   0  28  0 1004   0 -      R+   #C2  0:00.01 (groff)
 1000 60703 55894   0  -6  0  892   0 pipecl DE+  #C2  0:00.01 (less)
 1000 61398 60555 295  -6  0 1692   0 -      R+   #C2  0:00.16 (troff)
... [unrelated things, I think]

If anyone can shed any light on this, that'd be great.

Thanks...

--
Ben Smithurst / b...@scientia.demon.co.uk / PGP: 0x99392F7D


To Unsubscribe: send mail to majo...@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message

Matthew Dillon

Mar 23, 2000, 3:00:00 AM
:: .SH foo, bar, baz
::
::and then a lot of junk text (I appended /etc/rc and /etc/rc.network). Then,
::when I do
::
:: nroff -ms foo.ms 2>&1 | less
::
::and quit 'less' straight away, the whole system seems to lockup. ^T
::worked (sometimes), and showed troff using lots of system time (no user
::time). ping from another host worked, ctrl-alt-esc dropped into DDB ok,

I've committed a fix to this in -current, 4.x, and 3.x. Rev 1.61
kern/sys_pipe.c (current), 1.60.2.1 in RELENG_4, something else in
RELENG_3. Sorry 2.2.x'rs, three is my limit :-)

What happens is that the pipe writer checks for the reader going away
before entering the while() loop on the write, but only checks
sporadically inside that loop.

There are situations where the reader may go away while the writer is
blocked and cause the writer to enter an infinite loop because
the writer believes there is a reader 'reading' when, in fact, the reader
side is stuck in pipeclose(). The two sides then play ping-pong
tsleep/wakeup with each other forever.
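
As a rough illustration of the loop described above, here is a toy user-space model. It is not the sys_pipe.c code and not the committed fix. Every name in it (toy_pipe, toy_sleep, toy_write, the TOY_* codes) is made up, and the sleep stand-in simply assumes the reader closes while the writer is blocked and never frees any buffer space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a pipe; NOT the kernel's struct pipe. Names are hypothetical. */
struct toy_pipe {
    size_t space;        /* free bytes left in the buffer */
    bool   reader_gone;  /* set once the read side has closed */
};

enum { TOY_OK = 0, TOY_EPIPE = -1, TOY_STUCK = -2 };

/*
 * Stand-in for blocking in tsleep(): assume the reader closes while the
 * writer sleeps, so the writer is woken but no space is ever freed.
 */
static void toy_sleep(struct toy_pipe *p)
{
    p->reader_gone = true;
}

/*
 * Model of the write loop. With check_in_loop == false the reader is only
 * checked before the loop, so a reader that vanishes mid-write leaves the
 * writer sleeping and waking forever (cut off here after max_wakeups).
 */
static int toy_write(struct toy_pipe *p, size_t nbytes,
                     bool check_in_loop, int max_wakeups)
{
    int wakeups = 0;

    assert(p != NULL);
    if (p->reader_gone)                 /* the check before the loop */
        return TOY_EPIPE;

    while (nbytes > 0) {
        if (check_in_loop && p->reader_gone)
            return TOY_EPIPE;           /* re-check on every pass */

        if (p->space == 0) {
            if (wakeups++ >= max_wakeups)
                return TOY_STUCK;       /* would spin forever in reality */
            toy_sleep(p);
            continue;
        }

        size_t n = nbytes < p->space ? nbytes : p->space;
        p->space -= n;                  /* "copy" n bytes into the buffer */
        nbytes -= n;
    }
    return TOY_OK;
}
```

With a 16-byte buffer and a 32-byte write, toy_write(&p, 32, false, 1000) ends in TOY_STUCK, while the same call with check_in_loop set to true returns TOY_EPIPE on the first pass after the wakeup.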

-Matt
Matthew Dillon
<dil...@backplane.com>

Ben Smithurst

Mar 24, 2000, 3:00:00 AM
Matthew Dillon wrote:

> I've committed a fix to this in -current, 4.x, and 3.x. Rev 1.61
> kern/sys_pipe.c (current), 1.60.2.1 in RELENG_4, something else in
> RELENG_3. Sorry 2.2.x'rs, three is my limit :-)

Thanks! I'll try it tonight.

--
Ben Smithurst / b...@scientia.demon.co.uk / PGP: 0x99392F7D
