Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

-current lockup (how to diagnose?)

29 views
Skip to first unread message

Jun Kuriyama

unread,
Dec 1, 2003, 2:57:28 AM12/1/03
to

I got lockups in recent -current box. This box is my main workstation
and usually be up to date kernel.

In recent days, I usually locked up. This is not a panic, only locks
up. My situation is:

o KDE's clock is working.
o KDE's virtual screen switching is working.
o Apache does not reply on 80/tcp from remote.
o ssh from this box is still working.
o zsh on this box does not go next prompt when I press enter key at
promprt.

It looks kernel is working, but fork/exec is not working when I enter
to this situation.

So, what can I diagnose with this situation? Of course this box has
serial console.


--
Jun Kuriyama <kuri...@imgsrc.co.jp> // IMG SRC, Inc.
<kuri...@FreeBSD.org> // FreeBSD Project
_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-curre...@freebsd.org"

Soren Schmidt

unread,
Dec 1, 2003, 3:03:38 AM12/1/03
to
It seems Jun Kuriyama wrote:
>
> I got lockups in recent -current box. This box is my main workstation
> and usually be up to date kernel.
>
> In recent days, I usually locked up. This is not a panic, only locks
> up. My situation is:
>
> o KDE's clock is working.
> o KDE's virtual screen switching is working.
> o Apache does not reply on 80/tcp from remote.
> o ssh from this box is still working.
> o zsh on this box does not go next prompt when I press enter key at
> promprt.
>
> It looks kernel is working, but fork/exec is not working when I enter
> to this situation.

Hmm, I've been seeing something semilar a couble of times, can you do a
ps -axl and see if those processes hang around in vm ?

-Søren

Melvyn Sopacua

unread,
Dec 1, 2003, 8:42:27 AM12/1/03
to

--Boundary-02=_wS0y/3lEeRq10Hw
Content-Type: text/plain;
charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

On Monday 01 December 2003 09:02, Soren Schmidt wrote:

> It seems Jun Kuriyama wrote:
> > I got lockups in recent -current box. This box is my main workstation
> > and usually be up to date kernel.
> >
> > In recent days, I usually locked up. This is not a panic, only locks
> > up. My situation is:
> >
> > o KDE's clock is working.
> > o KDE's virtual screen switching is working.
> > o Apache does not reply on 80/tcp from remote.
> > o ssh from this box is still working.

But ssh *to* the box doesn't right?

> > o zsh on this box does not go next prompt when I press enter key at
> > promprt.
> >
> > It looks kernel is working, but fork/exec is not working when I enter
> > to this situation.
>
> Hmm, I've been seeing something semilar a couble of times, can you do a
> ps -axl and see if those processes hang around in vm ?

I've had this on -STABLE as well and indeed KDE seems to be related. Also, =
I=20
recall this to happen during the security check or rebuilding the locate=20
database. I haven't seen this once I installed more memory.

A good tell-tale is:
* Switch to the console.
* Select another VT, via ALT-F2
* Type in the login name and press enter

You won't get a passwd prompt.

Most of the time I got outof it, by CTRL-ALT-F1 and CTRL-C.

I think I have a weekly run report somewhere, reporting 'vm exhaustion'=20
errors. I can dig it up if it's helpful.
At the time, the box had 128Megs of RAM and 256MB swap, P-III 450.
=2D-=20
Melvyn

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D
=46reeBSD sarevok.webteckies.org 5.2-BETA FreeBSD 5.2-BETA #1: Sat Nov 29=20
00:15:33 CET 2003 ro...@sarevok.webteckies.org:/usr/obj/usr/src/sys/
SAREVOK_NOFW_DBG i386
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D

--Boundary-02=_wS0y/3lEeRq10Hw
Content-Type: application/pgp-signature
Content-Description: signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (FreeBSD)

iD8DBQA/y0SwOv9JNmfFN5URAhWwAJ0f2nCV8dTfYz7nLnRrhDBmFXRDnACgqGnR
pssP7sri3T2EuxgdB8aOJr8=
=/ZFj
-----END PGP SIGNATURE-----

--Boundary-02=_wS0y/3lEeRq10Hw--

Jun Kuriyama

unread,
Dec 1, 2003, 9:09:14 AM12/1/03
to
At Mon, 1 Dec 2003 14:39:56 +0100,

Melvyn Sopacua wrote:
> > > o KDE's clock is working.
> > > o KDE's virtual screen switching is working.
> > > o Apache does not reply on 80/tcp from remote.
> > > o ssh from this box is still working.
>
> But ssh *to* the box doesn't right?

Yes.

> I've had this on -STABLE as well and indeed KDE seems to be related. Also, I

> recall this to happen during the security check or rebuilding the locate

> database. I haven't seen this once I installed more memory.
>
> A good tell-tale is:
> * Switch to the console.
> * Select another VT, via ALT-F2
> * Type in the login name and press enter
>
> You won't get a passwd prompt.

Yes, I can see the same situation.

> Most of the time I got outof it, by CTRL-ALT-F1 and CTRL-C.

Hmm, I'll try next time.

> I think I have a weekly run report somewhere, reporting 'vm exhaustion'

> errors. I can dig it up if it's helpful.
> At the time, the box had 128Megs of RAM and 256MB swap, P-III 450.

My box has 2GB memory, so it would be enough. My box makes world and
release nightly and it locks usually at that time.

Anyway, I'm waiting for next lock ups. :-)


--
Jun Kuriyama <kuri...@imgsrc.co.jp> // IMG SRC, Inc.
<kuri...@FreeBSD.org> // FreeBSD Project

Robert Watson

unread,
Dec 1, 2003, 9:30:20 AM12/1/03
to

On Mon, 1 Dec 2003, Jun Kuriyama wrote:

> I got lockups in recent -current box. This box is my main workstation
> and usually be up to date kernel.
>
> In recent days, I usually locked up. This is not a panic, only locks
> up. My situation is:
>

> o KDE's clock is working.
> o KDE's virtual screen switching is working.
> o Apache does not reply on 80/tcp from remote.
> o ssh from this box is still working.

> o zsh on this box does not go next prompt when I press enter key at
> promprt.
>
> It looks kernel is working, but fork/exec is not working when I enter to
> this situation.
>

> So, what can I diagnose with this situation? Of course this box has
> serial console.

This could be a sign of a VM or VFS lock leak or deadlock. I'd advise
hooking up a serial console, dropping to DDB over serial line, and posting
the results of "ps" and "show lockedvnods". We might then ask you to use
the "show locks" command on various processes. You'll need to have DDB
and WITNESS compiled in.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
rob...@fledge.watson.org Senior Research Scientist, McAfee Research

Jun Kuriyama

unread,
Dec 1, 2003, 9:33:59 AM12/1/03
to
At Mon, 1 Dec 2003 09:23:21 -0500 (EST),

Robert Watson wrote:
> This could be a sign of a VM or VFS lock leak or deadlock. I'd advise
> hooking up a serial console, dropping to DDB over serial line, and posting
> the results of "ps" and "show lockedvnods". We might then ask you to use
> the "show locks" command on various processes. You'll need to have DDB
> and WITNESS compiled in.

He he, I of course have serial console, DDB and WITNESS. They are
good safety belf fot -current users, isn't it? :-)

I'll post information above next time. Thanks!


--
Jun Kuriyama <kuri...@imgsrc.co.jp> // IMG SRC, Inc.
<kuri...@FreeBSD.org> // FreeBSD Project

Jun Kuriyama

unread,
Dec 2, 2003, 1:31:18 AM12/2/03
to
At Mon, 1 Dec 2003 09:23:21 -0500 (EST),
Robert Watson wrote:
> This could be a sign of a VM or VFS lock leak or deadlock. I'd advise
> hooking up a serial console, dropping to DDB over serial line, and posting
> the results of "ps" and "show lockedvnods". We might then ask you to use
> the "show locks" command on various processes. You'll need to have DDB
> and WITNESS compiled in.

I got it.

http://www.imgsrc.co.jp/~kuriyama/BSD/lock-20031202.log

Robert Watson

unread,
Dec 2, 2003, 1:49:47 AM12/2/03
to

On Tue, 2 Dec 2003, Jun Kuriyama wrote:

> At Mon, 1 Dec 2003 09:23:21 -0500 (EST),
> Robert Watson wrote:
> > This could be a sign of a VM or VFS lock leak or deadlock. I'd advise
> > hooking up a serial console, dropping to DDB over serial line, and posting
> > the results of "ps" and "show lockedvnods". We might then ask you to use
> > the "show locks" command on various processes. You'll need to have DDB
> > and WITNESS compiled in.
>
> I got it.
>
> http://www.imgsrc.co.jp/~kuriyama/BSD/lock-20031202.log

"ouch"

Could you try compiling in DEBUG_LOCKS into your kernel and doing "show
lockedvnods" with that? Unfortunately, someone removed the pid from the
output of that command, but didn't add the thread pointer to the DDB ps
output, so you'll probably need to modify the lockmgr_printinfo() function
in vfs_subr.c to print out lkp->lk_lockholder->td_proc->p_pid as well for
exclusive locks. It looks like maybe something isn't releasing a vnode
lock before returning to userspace. I have some patches to assert that no
lockmgr locks are held on the return to userspace, but I'll have to dig
them up tomorrow and send them to you. Basically, it adds a per-thread
lockmgr lock count in a thread-local variable, incrementing for each lock,
and decrementing for each release, and then KASSERT()'s in userret that
the variable is 0.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
rob...@fledge.watson.org Senior Research Scientist, McAfee Research

_______________________________________________

Jun Kuriyama

unread,
Dec 2, 2003, 2:10:08 AM12/2/03
to
At Tue, 2 Dec 2003 01:45:43 -0500 (EST),

Robert Watson wrote:
> Could you try compiling in DEBUG_LOCKS into your kernel and doing "show
> lockedvnods" with that?

Okay. I'll use new kernel with DEBUG_LOCKS.

> Unfortunately, someone removed the pid from the
> output of that command, but didn't add the thread pointer to the DDB ps
> output, so you'll probably need to modify the lockmgr_printinfo() function
> in vfs_subr.c to print out lkp->lk_lockholder->td_proc->p_pid as well for
> exclusive locks.

I don't understand what it means, but I'll try with this modification.

Index: kern_lock.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_lock.c,v
retrieving revision 1.70
diff -u -r1.70 kern_lock.c
--- kern_lock.c 16 Jul 2003 01:00:38 -0000 1.70
+++ kern_lock.c 2 Dec 2003 07:04:49 -0000
@@ -611,8 +611,8 @@
printf(" lock type %s: SHARED (count %d)", lkp->lk_wmesg,
lkp->lk_sharecount);
else if (lkp->lk_flags & LK_HAVE_EXCL)
- printf(" lock type %s: EXCL (count %d) by thread %p",
- lkp->lk_wmesg, lkp->lk_exclusivecount, lkp->lk_lockholder);
+ printf(" lock type %s: EXCL (count %d) by thread %p (pid:%d)",
+ lkp->lk_wmesg, lkp->lk_exclusivecount, lkp->lk_lockholder, lkp->lk_lockholder->td_proc->p_pid);
if (lkp->lk_waitcount > 0)
printf(" with %d pending", lkp->lk_waitcount);
}


> It looks like maybe something isn't releasing a vnode
> lock before returning to userspace. I have some patches to assert that no
> lockmgr locks are held on the return to userspace, but I'll have to dig
> them up tomorrow and send them to you. Basically, it adds a per-thread
> lockmgr lock count in a thread-local variable, incrementing for each lock,
> and decrementing for each release, and then KASSERT()'s in userret that
> the variable is 0.

Thanks! I'm waiting for your patch.


--
Jun Kuriyama <kuri...@imgsrc.co.jp> // IMG SRC, Inc.
<kuri...@FreeBSD.org> // FreeBSD Project

Sławek Żak

unread,
Dec 3, 2003, 9:41:23 AM12/3/03
to
Robert Watson <rwa...@FreeBSD.org> writes:

> On Tue, 2 Dec 2003, Jun Kuriyama wrote:
>

>> At Mon, 1 Dec 2003 09:23:21 -0500 (EST),


>> Robert Watson wrote:
>> > This could be a sign of a VM or VFS lock leak or deadlock. I'd advise
>> > hooking up a serial console, dropping to DDB over serial line, and posting
>> > the results of "ps" and "show lockedvnods". We might then ask you to use
>> > the "show locks" command on various processes. You'll need to have DDB
>> > and WITNESS compiled in.
>>
>> I got it.
>>
>> http://www.imgsrc.co.jp/~kuriyama/BSD/lock-20031202.log
>
> "ouch"
>

> Could you try compiling in DEBUG_LOCKS into your kernel and doing "show

> lockedvnods" with that? Unfortunately, someone removed the pid from the


> output of that command, but didn't add the thread pointer to the DDB ps
> output, so you'll probably need to modify the lockmgr_printinfo() function
> in vfs_subr.c to print out lkp->lk_lockholder->td_proc->p_pid as well for

> exclusive locks. It looks like maybe something isn't releasing a vnode


> lock before returning to userspace. I have some patches to assert that no
> lockmgr locks are held on the return to userspace, but I'll have to dig
> them up tomorrow and send them to you. Basically, it adds a per-thread
> lockmgr lock count in a thread-local variable, incrementing for each lock,
> and decrementing for each release, and then KASSERT()'s in userret that
> the variable is 0.

I have the same problem. find is offending in my case:

db> ps
pid proc uarea uid ppid pgrp flag stat wmesg wchan cmd
35331 c955654c f0bba000 0 91902 35331 0004002 [SLP]getblk 0xd5782e3c] find
85666 c9556388 f0bb9000 0 86354 85666 0004002 [CV]select 0xc06d0b04] top
86354 c91a4710 ee7f5000 0 29327 86354 0004002 [SLP]pause 0xc9544000] csh
29327 c91a4e20 ee7f9000 0 40805 29327 0000100 [CV]select 0xc06d0b04] sshd
91902 c9385710 f0b54000 0 44297 91902 0004002 [SLP]pause 0xc9566000] csh
44297 c9273000 f0a46000 0 40805 44297 0000100 [CV]select 0xc06d0b04] sshd
89288 c92731c4 f0a8f000 0 1 89288 0004002 [SLP]ttyin 0xc91ea010] getty
30656 c9557388 f0bc2000 0 1 30656 0004002 [SLP]ttyin 0xc93c4010] getty
23066 c8f1e710 e60d6000 0 1 23066 0004002 [SLP]ttyin 0xc92bfe10] getty
3582 c9557000 f0bc0000 0 1 3582 0004002 [SLP]ttyin 0xc93c3210] getty
38198 c9273c5c f0a95000 0 1 38198 0004002 [SLP]ttyin 0xc93c3810] getty
32813 c95568d4 f0bbc000 0 1 32813 0004002 [SLP]ttyin 0xc93c3e10] getty
25214 c9556a98 f0bbd000 0 1 25214 0004002 [SLP]ttyin 0xc93c4410] getty
85142 c9231000 f09ea000 0 1 85142 0004002 [SLP]ttyin 0xc91ea210] getty
27067 c9385e20 f0ba0000 0 1 27067 0004002 [SLP]ttyin 0xc8fc1410] getty
89169 c955754c f0bc3000 0 1 89169 0000000 [SLP]nanslp 0xc06a971c] cron
40805 c9231710 f0a36000 0 1 40805 0000100 [CV]select 0xc06d0b04] sshd
13151 c92a1710 f0aff000 0 1 13151 0000000 [CV]select 0xc06d0b04] syslogd
83 c91a3388 ee7c6000 0 0 0 0000204 [SLP]- 0xc06d312c] nfsiod 3
82 c91a354c ee7c7000 0 0 0 0000204 [SLP]- 0xc06d3128] nfsiod 2
81 c91a3710 ee7c8000 0 0 0 0000204 [SLP]- 0xc06d3124] nfsiod 1
80 c91a38d4 ee7c9000 0 0 0 0000204 [SLP]- 0xc06d3120] nfsiod 0
79 c91a3a98 ee7ca000 0 0 0 0000204 [SLP]vlruwt 0xc91a3a98] vnlru
78 c91a3c5c ee7cb000 0 0 0 0000204 [SLP]syncer 0xc06a90e0] syncer
77 c91a3e20 ee7cc000 0 0 0 0000204 [SLP]psleep 0xc06d0f6c] bufdaen
76 c91a4000 ee7cd000 0 0 0 000020c [SLP]pgzero 0xc06d98e8] pagezeo
75 c8ee7a98 e60a2000 0 0 0 0000204 [SLP]psleep 0xc06d9940] vmdaemn
74 c8ee7c5c e60a3000 0 0 0 0000204 [SLP]psleep 0xc06d992c] pagedan
73 c8ee7e20 e60a4000 0 0 0 0000204 [IWAIT] swi0: tty:sio
9 c8f1b000 e60a5000 0 0 0 0000204 [SLP]aifthd 0xc8f1b000] aac0aif
8 c8f1b1c4 e60a6000 0 0 0 0000204 [SLP]actask 0xc06a0dcc] acpi_t2
7 c8f1b388 e60a7000 0 0 0 0000204 [SLP]actask 0xc06a0dcc] acpi_t1
6 c8f1b54c e60a8000 0 0 0 0000204 [SLP]actask 0xc06a0dcc] acpi_t0
72 c8f1b710 e60a9000 0 0 0 0000204 new [IWAIT] swi5:+
5 c8f1b8d4 e60aa000 0 0 0 0000204 [SLP]tqthr 0xc06aab68] taskquee
71 c8f1ba98 e60ab000 0 0 0 0000204 new [IWAIT] swi6:+
70 c8f1bc5c e60ac000 0 0 0 0000204 [IWAIT] swi7: task queue
69 c8f1be20 e60ad000 0 0 0 0000204 [IWAIT] swi7: acpitaskq
68 c8f1e000 e60ae000 0 0 0 0000204 new [IWAIT] swi3: cambio
67 c8f1e1c4 e60af000 0 0 0 0000204 new [IWAIT] swi2: camnet
66 c8f1e388 e60b0000 0 0 0 0000204 [SLP]- 0xc06a1040] random
4 c8ed21c4 e604c000 0 0 0 0000204 [SLP]- 0xc06a3d60] g_down
3 c8ed2388 e604d000 0 0 0 0000204 [SLP]- 0xc06a3d5c] g_up
2 c8ed254c e604e000 0 0 0 0000204 [SLP]- 0xc06a3d54] g_event
65 c8ed2710 e604f000 0 0 0 0000204 [IWAIT] swi1: net
64 c8ed28d4 e6050000 0 0 0 0000204 new [IWAIT] swi4: vm
63 c8ed2a98 e6051000 0 0 0 000020c [LOCK Giant c9299200] swi8: tty:sio clock
62 c8ed2c5c e6052000 0 0 0 0000204 new [IWAIT] irq0: clk
61 c8ed2e20 e6053000 0 0 0 0000204 new [IWAIT] irq47:
60 c8ee7000 e6054000 0 0 0 0000204 new [IWAIT] irq46:
59 c8ee71c4 e6055000 0 0 0 0000204 new [IWAIT] irq45:
58 c8ee7388 e6056000 0 0 0 0000204 new [IWAIT] irq44:
57 c8ee754c e6057000 0 0 0 0000204 new [IWAIT] irq43:
56 c8ee7710 e6058000 0 0 0 0000204 new [IWAIT] irq42:
55 c8ee78d4 e60a1000 0 0 0 0000204 new [IWAIT] irq41:
54 c8ec3a98 e601b000 0 0 0 0000204 new [IWAIT] irq40:
53 c8ec3c5c e601c000 0 0 0 0000204 new [IWAIT] irq39:
52 c8ec3e20 e601d000 0 0 0 0000204 new [IWAIT] irq38:
51 c8ed0000 e601e000 0 0 0 0000204 new [IWAIT] irq37:
50 c8ed01c4 e601f000 0 0 0 0000204 new [IWAIT] irq36:
49 c8ed0388 e6020000 0 0 0 0000204 new [IWAIT] irq35:
48 c8ed054c e6021000 0 0 0 0000204 new [IWAIT] irq34:
47 c8ed0710 e6022000 0 0 0 0000204 new [IWAIT] irq33:
46 c8ed08d4 e6023000 0 0 0 0000204 new [IWAIT] irq32:
45 c8ed0a98 e6024000 0 0 0 0000204 new [IWAIT] irq31:
44 c8ed0c5c e6049000 0 0 0 0000204 [IWAIT] irq30: aac0
43 c8ed0e20 e604a000 0 0 0 0000204 new [IWAIT] irq29: bge1
42 c8ed2000 e604b000 0 0 0 0000204 [IWAIT] irq28: bge0
41 c8eb554c e5feb000 0 0 0 0000204 new [IWAIT] irq27:
40 c8eb5710 e5fec000 0 0 0 0000204 new [IWAIT] irq26:
39 c8eb58d4 e5fed000 0 0 0 0000204 new [IWAIT] irq25:
38 c8eb5a98 e5fee000 0 0 0 0000204 new [IWAIT] irq24:
37 c8eb5c5c e5fef000 0 0 0 0000204 new [IWAIT] irq23:
36 c8eb5e20 e5ff0000 0 0 0 0000204 new [IWAIT] irq22:
35 c8ec3000 e5ff1000 0 0 0 0000204 new [IWAIT] irq21:
34 c8ec31c4 e6016000 0 0 0 0000204 new [IWAIT] irq20:
33 c8ec3388 e6017000 0 0 0 0000204 new [IWAIT] irq19:
32 c8ec354c e6018000 0 0 0 0000204 new [IWAIT] irq18:
31 c8ec3710 e6019000 0 0 0 0000204 new [IWAIT] irq17:
30 c8ec38d4 e601a000 0 0 0 0000204 new [IWAIT] irq16:
29 c3bc51c4 e3db8000 0 0 0 0000204 new [IWAIT] irq15: ata1
28 c3bc5388 e3db9000 0 0 0 0000204 [IWAIT] irq14: ata0
27 c3bc554c e3dba000 0 0 0 0000204 new [IWAIT] irq13:
26 c3bc5710 e3dbb000 0 0 0 0000204 new [IWAIT] irq12: psm0
25 c3bc58d4 e3de0000 0 0 0 0000204 new [IWAIT] irq11:
24 c3bc5a98 e3de1000 0 0 0 0000204 new [IWAIT] irq10:
23 c3bc5c5c e3de2000 0 0 0 0000204 new [IWAIT] irq9: acpi0
22 c3bc5e20 e3de3000 0 0 0 0000204 new [IWAIT] irq8: rtc
21 c8eb5000 e5fe8000 0 0 0 0000204 new [IWAIT] irq7:
20 c8eb51c4 e5fe9000 0 0 0 0000204 [IWAIT] irq6: fdc0
19 c8eb5388 e5fea000 0 0 0 0000204 new [IWAIT] irq5:
18 c3bbe000 e3d66000 0 0 0 0000204 new [IWAIT] irq4: sio0
17 c3bbe1c4 e3daf000 0 0 0 0000204 new [IWAIT] irq3: sio1
16 c3bbe388 e3db0000 0 0 0 0000204 new [IWAIT] irq2:
15 c3bbe54c e3db1000 0 0 0 0000204 [CPU 0] irq1: atkbd0
14 c3bbe710 e3db2000 0 0 0 000020c [Can run] idle: cpu0
13 c3bbe8d4 e3db3000 0 0 0 000020c [CPU 1] idle: cpu1
12 c3bbea98 e3db4000 0 0 0 000020c [CPU 2] idle: cpu2
11 c3bbec5c e3db5000 0 0 0 000020c [CPU 3] idle: cpu3
1 c3bbee20 e3db6000 0 0 1 0004200 [SLP]wait 0xc3bbee20] init
10 c3bc5000 e3db7000 0 0 0 0000204 [CV]ktrace 0xc06a7304] ktrace
0 c06a3e60 c081f000 0 0 0 0000200 [SLP]sched 0xc06a3e60] swapper
db> show lockedvnods
Locked vnodes
0xc96734a0: tag ufs, type VDIR, usecount 0, writecount 0, refcount 1, flags (VV_OBJBUF), l0
ino 4764471, on dev aacd0s2d (4, 21)
db> trace 35331
sched_switch(c9386c80,2,c0657786,1d4,5a0daf6f) at sched_switch+0x9c
mi_switch(c9386c80,50,c0657786,ca,1) at mi_switch+0x20a
msleep(d5782e3c,c06a8868,50,c065d71e,0) at msleep+0x4b5
acquire(f0b78abc,2000020,600,e7,c9386c80) at acquire+0xa0
debuglockmgr(d5782e3c,2090022,c9259128,c9386c80,c0650ee8) at debuglockmgr+0x3f7
getblk(c9259128,48813a0,0,4000,0) at getblk+0x184
breadn(c9259128,48813a0,0,4000,0) at breadn+0x52
bread(c9259128,48813a0,0,4000,0) at bread+0x4c
ffs_update(c96734a0,0,1,54,246) at ffs_update+0x1ff
ufs_inactive(f0b78c78,f0b78c90,c0548934,f0b78c78,0) at ufs_inactive+0x1f0
ufs_vnoperate(f0b78c78,0,c065e9c5,89a,897) at ufs_vnoperate+0x18
vrele(c96734a0,0,c065f03f,2cb,2b0) at vrele+0x147
fchdir(c9386c80,f0b78d14,c066d42a,3ee,1) at fchdir+0x1ca
syscall(2f,2f,2f,5,8055080) at syscall+0x292
Xint0x80_syscall() at Xint0x80_syscall+0x1d
--- syscall (13, FreeBSD ELF32, fchdir), eip = 0x280c4fcf, esp = 0xbfbfeb0c, ebp = 0xbfbfe-
db>

/S

Kirk McKusick

unread,
Dec 3, 2003, 2:44:26 PM12/3/03
to
I am guessing that some of the recent locking changes are causing
the problem. Unfortunately I am on the road now through Jan 4th,
so will not be in a position to look at it. Hopefully one of the
folks working on getting the SMP pushed down through the filesystem
(Jeff Roberson, John Baldwin, or Alan Cox) will have some idea what
broke recently. I would try looking at which process holds the
buffer lock that the find is trying to get. You can usually unravel
the chain of locks to eventually find what pair of events lead to
the deadlock. It definitely helps to have DEBUG_LOCKS compiled into
your kernel.

Kirk McKusick

0 new messages