I'm getting lockups on a recent -current box. This box is my main workstation, and its kernel is usually up to date.
In recent days it has locked up often. This is not a panic; the box just locks up. My situation is:
o KDE's clock is working.
o KDE's virtual screen switching is working.
o Apache does not reply on 80/tcp from remote.
o ssh from this box is still working.
o zsh on this box does not show the next prompt when I press the Enter key at the prompt.
It looks like the kernel is working, but fork/exec is not working once I get into this situation.
So, what can I do to diagnose this situation? Of course, this box has a serial console.
--
Jun Kuriyama <kuriy...@imgsrc.co.jp> // IMG SRC, Inc.
             <kuriy...@FreeBSD.org> // FreeBSD Project
_______________________________________________
freebsd-curr...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
It seems Jun Kuriyama wrote:
> I'm getting lockups on a recent -current box. This box is my main
> workstation, and its kernel is usually up to date.
>
> In recent days it has locked up often. This is not a panic; the box
> just locks up. My situation is:
>
> o KDE's clock is working.
> o KDE's virtual screen switching is working.
> o Apache does not reply on 80/tcp from remote.
> o ssh from this box is still working.
> o zsh on this box does not show the next prompt when I press the
>   Enter key at the prompt.
>
> It looks like the kernel is working, but fork/exec is not working
> once I get into this situation.
Hmm, I've been seeing something similar a couple of times. Can you do a ps -axl and see if those processes hang around in vm?
On Monday 01 December 2003 09:02, Soren Schmidt wrote:
> It seems Jun Kuriyama wrote:
> > I'm getting lockups on a recent -current box. This box is my main
> > workstation, and its kernel is usually up to date.
> >
> > In recent days it has locked up often. This is not a panic; the box
> > just locks up. My situation is:
> >
> > o KDE's clock is working.
> > o KDE's virtual screen switching is working.
> > o Apache does not reply on 80/tcp from remote.
> > o ssh from this box is still working.
But ssh *to* the box doesn't, right?
> > o zsh on this box does not show the next prompt when I press the
> >   Enter key at the prompt.
> >
> > It looks like the kernel is working, but fork/exec is not working
> > once I get into this situation.
>
> Hmm, I've been seeing something similar a couple of times. Can you do a
> ps -axl and see if those processes hang around in vm?
I've had this on -STABLE as well, and indeed KDE seems to be related. Also, I recall this happening during the security check or while rebuilding the locate database. I haven't seen it since I installed more memory.
A good tell-tale is:
* Switch to the console.
* Select another VT, via ALT-F2
* Type in the login name and press enter
You won't get a passwd prompt.
Most of the time I got out of it with CTRL-ALT-F1 and CTRL-C.
I think I have a weekly run report somewhere, reporting 'vm exhaustion' errors. I can dig it up if it's helpful. At the time, the box had 128Megs of RAM and 256MB swap, P-III 450.
--
Melvyn
=======================================================
FreeBSD sarevok.webteckies.org 5.2-BETA FreeBSD 5.2-BETA #1: Sat Nov 29 00:15:33 CET 2003 r...@sarevok.webteckies.org:/usr/obj/usr/src/sys/SAREVOK_NOFW_DBG i386
=======================================================
Melvyn Sopacua wrote:
> > > o KDE's clock is working.
> > > o KDE's virtual screen switching is working.
> > > o Apache does not reply on 80/tcp from remote.
> > > o ssh from this box is still working.
>
> But ssh *to* the box doesn't, right?
Yes.
> I've had this on -STABLE as well, and indeed KDE seems to be related. Also, I
> recall this happening during the security check or while rebuilding the
> locate database. I haven't seen it since I installed more memory.
>
> A good tell-tale is:
> * Switch to the console.
> * Select another VT, via ALT-F2
> * Type in the login name and press enter
>
> You won't get a passwd prompt.
Yes, I see the same situation.
> Most of the time I got out of it with CTRL-ALT-F1 and CTRL-C.
Hmm, I'll try next time.
> I think I have a weekly run report somewhere, reporting 'vm exhaustion'
> errors. I can dig it up if it's helpful.
> At the time, the box had 128Megs of RAM and 256MB swap, P-III 450.
My box has 2GB of memory, so that should be enough. It builds world and release nightly, and it usually locks up at that time.
Anyway, I'm waiting for the next lockup. :-)
--
Jun Kuriyama <kuriy...@imgsrc.co.jp> // IMG SRC, Inc.
             <kuriy...@FreeBSD.org> // FreeBSD Project
On Mon, 1 Dec 2003, Jun Kuriyama wrote:
> I'm getting lockups on a recent -current box. This box is my main
> workstation, and its kernel is usually up to date.
>
> In recent days it has locked up often. This is not a panic; the box
> just locks up. My situation is:
>
> o KDE's clock is working.
> o KDE's virtual screen switching is working.
> o Apache does not reply on 80/tcp from remote.
> o ssh from this box is still working.
> o zsh on this box does not show the next prompt when I press the
>   Enter key at the prompt.
>
> It looks like the kernel is working, but fork/exec is not working
> once I get into this situation.
>
> So, what can I do to diagnose this situation? Of course, this box has
> a serial console.
This could be a sign of a VM or VFS lock leak or deadlock. I'd advise hooking up a serial console, dropping to DDB over serial line, and posting the results of "ps" and "show lockedvnods". We might then ask you to use the "show locks" command on various processes. You'll need to have DDB and WITNESS compiled in.
Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
rob...@fledge.watson.org      Senior Research Scientist, McAfee Research
Robert Watson wrote:
> This could be a sign of a VM or VFS lock leak or deadlock. I'd advise
> hooking up a serial console, dropping to DDB over serial line, and posting
> the results of "ps" and "show lockedvnods". We might then ask you to use
> the "show locks" command on various processes. You'll need to have DDB
> and WITNESS compiled in.
He he, of course I have a serial console, DDB and WITNESS. They are a good safety belt for -current users, aren't they? :-)
I'll post the information above next time. Thanks!
--
Jun Kuriyama <kuriy...@imgsrc.co.jp> // IMG SRC, Inc.
             <kuriy...@FreeBSD.org> // FreeBSD Project
Robert Watson wrote:
> This could be a sign of a VM or VFS lock leak or deadlock. I'd advise
> hooking up a serial console, dropping to DDB over serial line, and posting
> the results of "ps" and "show lockedvnods". We might then ask you to use
> the "show locks" command on various processes. You'll need to have DDB
> and WITNESS compiled in.
--
Jun Kuriyama <kuriy...@imgsrc.co.jp> // IMG SRC, Inc.
             <kuriy...@FreeBSD.org> // FreeBSD Project
On Tue, 2 Dec 2003, Jun Kuriyama wrote:
> At Mon, 1 Dec 2003 09:23:21 -0500 (EST),
> Robert Watson wrote:
> > This could be a sign of a VM or VFS lock leak or deadlock. I'd advise
> > hooking up a serial console, dropping to DDB over serial line, and posting
> > the results of "ps" and "show lockedvnods". We might then ask you to use
> > the "show locks" command on various processes. You'll need to have DDB
> > and WITNESS compiled in.
Could you try compiling in DEBUG_LOCKS into your kernel and doing "show lockedvnods" with that? Unfortunately, someone removed the pid from the output of that command, but didn't add the thread pointer to the DDB ps output, so you'll probably need to modify the lockmgr_printinfo() function in vfs_subr.c to print out lkp->lk_lockholder->td_proc->p_pid as well for exclusive locks. It looks like maybe something isn't releasing a vnode lock before returning to userspace. I have some patches to assert that no lockmgr locks are held on the return to userspace, but I'll have to dig them up tomorrow and send them to you. Basically, it adds a per-thread lockmgr lock count in a thread-local variable, incrementing for each lock, and decrementing for each release, and then KASSERT()'s in userret that the variable is 0.
Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
rob...@fledge.watson.org      Senior Research Scientist, McAfee Research
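The counting scheme described above can be sketched in userspace C. This is only an illustration of the idea, not the actual patch: sketch_lock, sketch_lock_acquire(), sketch_lock_release(), sketch_leaked_locks() and sketch_userret() are hypothetical names, a C11 _Thread_local variable stands in for the per-thread field, and assert() stands in for KASSERT().

```c
#include <assert.h>

/* Hypothetical lock type; it only tracks whether it is held. */
struct sketch_lock {
	int held;
};

/* Stand-in for a per-thread count of lockmgr locks currently held. */
static _Thread_local int sketch_td_locks;

static void
sketch_lock_acquire(struct sketch_lock *lk)
{
	lk->held = 1;
	sketch_td_locks++;	/* one more lock held by this thread */
}

static void
sketch_lock_release(struct sketch_lock *lk)
{
	lk->held = 0;
	sketch_td_locks--;	/* one fewer lock held by this thread */
}

/* Number of locks this thread still holds; 0 means nothing leaked. */
static int
sketch_leaked_locks(void)
{
	return (sketch_td_locks);
}

/* Stand-in for the check made on the way back to userspace. */
static void
sketch_userret(void)
{
	assert(sketch_leaked_locks() == 0);
}
```

A code path that takes a lock and returns without releasing it leaves the counter non-zero, so the check in sketch_userret() fires at the first return to userspace after the leak rather than at the later deadlock.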
Robert Watson wrote:
> Could you try compiling in DEBUG_LOCKS into your kernel and doing "show
> lockedvnods" with that?
Okay. I'll use a new kernel with DEBUG_LOCKS.
> Unfortunately, someone removed the pid from the
> output of that command, but didn't add the thread pointer to the DDB ps
> output, so you'll probably need to modify the lockmgr_printinfo() function
> in vfs_subr.c to print out lkp->lk_lockholder->td_proc->p_pid as well for
> exclusive locks.
I don't understand what it means, but I'll try with this modification.
Index: kern_lock.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_lock.c,v
retrieving revision 1.70
diff -u -r1.70 kern_lock.c
--- kern_lock.c	16 Jul 2003 01:00:38 -0000	1.70
+++ kern_lock.c	2 Dec 2003 07:04:49 -0000
@@ -611,8 +611,8 @@
 		printf(" lock type %s: SHARED (count %d)", lkp->lk_wmesg,
 		    lkp->lk_sharecount);
 	else if (lkp->lk_flags & LK_HAVE_EXCL)
-		printf(" lock type %s: EXCL (count %d) by thread %p",
-		    lkp->lk_wmesg, lkp->lk_exclusivecount, lkp->lk_lockholder);
+		printf(" lock type %s: EXCL (count %d) by thread %p (pid:%d)",
+		    lkp->lk_wmesg, lkp->lk_exclusivecount, lkp->lk_lockholder, lkp->lk_lockholder->td_proc->p_pid);
 	if (lkp->lk_waitcount > 0)
 		printf(" with %d pending", lkp->lk_waitcount);
 }
> It looks like maybe something isn't releasing a vnode
> lock before returning to userspace. I have some patches to assert that no
> lockmgr locks are held on the return to userspace, but I'll have to dig
> them up tomorrow and send them to you. Basically, it adds a per-thread
> lockmgr lock count in a thread-local variable, incrementing for each lock,
> and decrementing for each release, and then KASSERT()'s in userret that
> the variable is 0.
Thanks! I'm waiting for your patch.
--
Jun Kuriyama <kuriy...@imgsrc.co.jp> // IMG SRC, Inc.
             <kuriy...@FreeBSD.org> // FreeBSD Project
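One thing to watch with the lockmgr_printinfo() change in this message: it dereferences lk_lockholder unconditionally. If the holder field can contain a sentinel rather than a real thread pointer (my recollection is that lockmgr uses values such as LK_KERNPROC and LK_NOPROC for this, but please check sys/lockmgr.h in your tree), the debug print itself could fault. A guarded version might look like the following userspace mock-up. The struct layouts, the SK_* sentinel values and sketch_format_excl() are assumptions made for illustration; only the field names come from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mock types: field names follow the patch, layouts are hypothetical. */
struct proc {
	int p_pid;
};

struct thread {
	struct proc *td_proc;
};

struct lock {
	const char *lk_wmesg;
	int lk_exclusivecount;
	struct thread *lk_lockholder;
};

/* Assumed sentinel values meaning "no real thread owns this lock". */
#define SK_NOPROC	((struct thread *)-1)
#define SK_KERNPROC	((struct thread *)-2)

/* Format the EXCL line, printing a pid only when the holder is a real thread. */
static int
sketch_format_excl(const struct lock *lkp, char *buf, size_t len)
{
	struct thread *td = lkp->lk_lockholder;

	if (td == NULL || td == SK_NOPROC || td == SK_KERNPROC ||
	    td->td_proc == NULL)
		return (snprintf(buf, len,
		    " lock type %s: EXCL (count %d) by thread %p (pid:n/a)",
		    lkp->lk_wmesg, lkp->lk_exclusivecount, (void *)td));

	return (snprintf(buf, len,
	    " lock type %s: EXCL (count %d) by thread %p (pid:%d)",
	    lkp->lk_wmesg, lkp->lk_exclusivecount, (void *)td,
	    td->td_proc->p_pid));
}
```

The sentinel comparisons come before any dereference, so a lock handed off to the kernel prints "pid:n/a" instead of following a bogus pointer.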
Robert Watson <rwat...@FreeBSD.org> writes:
> On Tue, 2 Dec 2003, Jun Kuriyama wrote:
>
>> At Mon, 1 Dec 2003 09:23:21 -0500 (EST),
>> Robert Watson wrote:
>> > This could be a sign of a VM or VFS lock leak or deadlock. I'd advise
>> > hooking up a serial console, dropping to DDB over serial line, and posting
>> > the results of "ps" and "show lockedvnods". We might then ask you to use
>> > the "show locks" command on various processes. You'll need to have DDB
>> > and WITNESS compiled in.
>
> Could you try compiling in DEBUG_LOCKS into your kernel and doing "show
> lockedvnods" with that? Unfortunately, someone removed the pid from the
> output of that command, but didn't add the thread pointer to the DDB ps
> output, so you'll probably need to modify the lockmgr_printinfo() function
> in vfs_subr.c to print out lkp->lk_lockholder->td_proc->p_pid as well for
> exclusive locks. It looks like maybe something isn't releasing a vnode
> lock before returning to userspace. I have some patches to assert that no
> lockmgr locks are held on the return to userspace, but I'll have to dig
> them up tomorrow and send them to you. Basically, it adds a per-thread
> lockmgr lock count in a thread-local variable, incrementing for each lock,
> and decrementing for each release, and then KASSERT()'s in userret that
> the variable is 0.
I have the same problem. find is the offender in my case:
I am guessing that some of the recent locking changes are causing the problem. Unfortunately I am on the road now through Jan 4th, so I will not be in a position to look at it. Hopefully one of the folks working on getting SMP pushed down through the filesystem (Jeff Roberson, John Baldwin, or Alan Cox) will have some idea what broke recently. I would try looking at which process holds the buffer lock that the find is trying to get. You can usually unravel the chain of locks to eventually find what pair of events led to the deadlock. It definitely helps to have DEBUG_LOCKS compiled into your kernel.
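Unravelling the chain of locks as described above amounts to walking a wait-for chain: from the blocked process to the lock it wants, to that lock's holder, and so on, until you reach a process that is not blocked or you come back around. Here is a minimal sketch in C, assuming each thread waits on at most one lock and each lock has a single exclusive holder. The sketch_waitgraph structure and sketch_unravel() are hypothetical; in practice the two arrays would be filled in by hand from the DDB "ps" and "show lockedvnods" output.

```c
#include <assert.h>

#define SK_NONE	(-1)

/*
 * waits_for[i]: index of the lock thread i is blocked on, or SK_NONE.
 * holder[j]:    index of the thread holding lock j, or SK_NONE.
 */
struct sketch_waitgraph {
	int nthreads;
	const int *waits_for;
	const int *holder;
};

/*
 * Follow the chain starting at thread 'start'.
 * Returns 1 if the chain loops (deadlock), 0 if it ends.
 * *last is set to the last thread visited: the thread at the head of the
 * chain when it ends, or the thread that closes the loop.
 */
static int
sketch_unravel(const struct sketch_waitgraph *g, int start, int *last)
{
	int cur = start;
	int steps;

	for (steps = 0; steps <= g->nthreads; steps++) {
		int lk = g->waits_for[cur];
		int next;

		if (lk == SK_NONE) {		/* cur is not blocked */
			*last = cur;
			return (0);
		}
		next = g->holder[lk];
		if (next == SK_NONE) {		/* lock has no known holder */
			*last = cur;
			return (0);
		}
		if (next == start) {		/* back where we began */
			*last = cur;
			return (1);
		}
		cur = next;
	}
	/* More hops than threads: there is a loop that excludes 'start'. */
	*last = cur;
	return (1);
}
```

When the function returns 1 with *last set, that thread and the threads on the path from start are the ones whose lock acquisition order is worth comparing.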