mutex lock panic on halt/shutdown unmounting GPT/RAID filesystem

John D. Baker

Jul 5, 2011, 11:57:48 AM

I'm building (built, rather, now testing) a machine to be my new file
server. I'm using a NetBSD-5.1_STABLE/i386 kernel made from GENERIC
plus "options RF_RAID5_RS=1" to get RAID-5 w/rotated sparing. The RAID
is built across eight 1TB Hitachi SATA drives using a 4-port siisata
(sii3124-based) PCIe card and the onboard ahcisata ports of an Intel
D945GCL board. The system disk is on the parallel ATA bus.

The machine has 4GB of RAM, although only a little over 3.5GB is
actually visible, naturally. As such, I only defined a token 2GB
of swap space.

I have the RAID in a single filesystem defined using GPT and wedges.
All file systems are mounted with the "log" option.

Prior to having the RAID initialized, newfs'ed and mounted, nothing
appeared amiss.

Once the big filesystem was online, the system began to panic when shutting
down (e.g.: shutdown -r now) with "mutex lock error: locking against
myself". Unfortunately it all goes by too fast to read. It saves the
crashdump, but I think the subsequent savecore has problems--the memory
image is there, but the kernel image is only 10 bytes in size. It
displays "(null) bad address".

This appears to happen AFTER the big filesystem is successfully unmounted
but BEFORE the system disk's filesystems have been unmounted. Console output
and dmesg indicate that the OS partitions get the log-replay treatment
when starting up again. The RAID and its filesystem are intact.

I've determined that if I manually unmount the GPT/RAID filesystem
before shutting down it doesn't panic.
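
The workaround described above (unmount the GPT/RAID filesystem by hand, then shut down) could be scripted roughly as follows. This is only a sketch: the mount point `/export/raid` and the helper names `shutdown_plan` and `run_plan` are hypothetical and do not come from the report. It defaults to a dry run that prints the commands instead of executing them.

```python
import subprocess


def shutdown_plan(mount_point):
    """Return the commands to run, in order, as argv lists."""
    return [
        ["umount", mount_point],    # unmount the GPT/RAID filesystem first
        ["shutdown", "-r", "now"],  # then reboot, as in the original report
    ]


def run_plan(plan, dry_run=True):
    """Run each command in order; stop at the first one that fails."""
    for argv in plan:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True raises CalledProcessError on a non-zero exit,
            # so shutdown is never reached if umount fails.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # "/export/raid" is a hypothetical mount point, used for illustration.
    run_plan(shutdown_plan("/export/raid"))
```

Stopping on a failed umount is deliberate: if the filesystem is still busy, rebooting anyway would take the same path that panics.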

Any clues?

Need more data?

--
|/"\ John D. Baker, KN5UKS NetBSD Darwin/MacOS X
|\ / jdbaker[snail]mylinuxisp[flyspeck]com OpenBSD FreeBSD
| X No HTML/proprietary data in email. BSD just sits there and works!
|/ \ GPGkeyID: D703 4A7E 479F 63F8 D3F4 BD99 9572 8F23 E4AD 1645

David Young

Jul 5, 2011, 1:23:37 PM

On Tue, Jul 05, 2011 at 10:57:48AM -0500, John D. Baker wrote:
> I'm building (built, rather, now testing) a machine to be my new file
> server. I'm using a NetBSD-5.1_STABLE/i386 kernel made from GENERIC
> plus "options RF_RAID5_RS=1" to get RAID-5 w/rotated sparing. The raid
> is built across 8 1TB Hitachi SATA drives using a 4-port siisata
> (sii3124-based) PCIe card and the onboard ahcisata ports of an intel
> D945GCL board. The system disk is on the parallel ATA bus.
>
> The machine has 4GB of RAM, although only a little over 3.5GB is
> actually visible, naturally. As such, I only defined a token 2GB
> of swap space.
>
> I have the RAID in a single filesystem defined using GPT and wedges.
> All file systems are mounted with the "log" option.
>
> Prior to having the RAID initialized, newfs'ed and mounted, nothing
> appeared amiss.
>
> Once the big filesystem was online, the system will panic when shutting
> down (e.g.: shutdown -r now) with "mutex lock error: locking against
> myself". Unfortunately it all goes by too fast to read. It saves the
> crashdump, but I think the subsequent savecore has problems--the memory
> image is there, but the kernel image is only 10 bytes in size. It
> displays "(null) bad address".

Can you 'sysctl -w ddb.onpanic=1; shutdown -r now' and type 'bt' for a
backtrace?

Dave

--
David Young OJC Technologies
dyo...@ojctech.com Urbana, IL * (217) 344-0444 x24

Greg Oster

Jul 5, 2011, 1:25:59 PM

It may not be related, but I suspect you may be the first person to be
testing the RF_RAID5_RS bits :) It wouldn't surprise me if there were a
few wrinkles with that code and NetBSD.

> Need more data?

Probably... (But I won't have a chance to look at this for at least 3
weeks... )

Later...

Greg Oster

David Laight

Jul 6, 2011, 3:26:15 AM

On Tue, Jul 05, 2011 at 10:26:03PM -0500, John D. Baker wrote:
>
> Since it's in the test phase anyway, I can give -current a run on the
> target system, but it'll be a couple of days before I can.

You probably only need to boot a current kernel, the installed
userspace should be good enough.

David

--
David Laight: da...@l8s.co.uk

John D. Baker

Jul 5, 2011, 2:27:49 PM

On Tue, 5 Jul 2011, Greg Oster wrote:

> It may not be related, but I suspect you may be the first person to be
> testing the RF_RAID5_RS bits :) It wouldn't surprise me if there were a
> few wrinkles with that code and NetBSD.

I've used it for years. On my previous file server, I actually got it to
work once (as in failed component replaced and reconstructed in place).
The other times, a second unit failed during reconstruction but the
raid continued to run OK in degraded mode.

The current file server is built with it, but hasn't been put to the
test (no failed components since being placed into service in 2009).
It is running NetBSD-4.0_STABLE. It has never
lost the RAID, but its system disk keeps eating itself and causing
panics (mostly "freeing free block" and "mangled directory entry").
Also, it takes 17+ hours to check/reinitialize parity, so that's why I'm
building another one on hardware that likes the netbsd-5 branch.

In response to David Young's posting:

Jul 5 13:21:25 yggdrasil shutdown: reboot by sysop:
Jul 5 13:21:33 yggdrasil syslogd: Exiting on signal 15
syncing disks... 7 done
unmounting file systems...Mutex error: mutex_vector_enter: locking against myself

lock address : 0x00000000c4d5a71c
current cpu : 0
current lwp : 0x00000000cf9557c0
owner field : 0x00000000cf9557c0 wait/spin: 0/0

panic: lock error
fatal breakpoint trap in supervisor mode
trap type 1 code 0 eip c057dc0c cs 8 eflags 246 cr2 bbb9890c ilevel 0
Stopped in pid 485.1 (reboot) at netbsd:breakpoint+0x4: popl %ebp
db{0}> bt
breakpoint(c0a89ee6,cfa097a8,c0ab9440,c04c7cbf,0,1,0,0,cfa097a8,cf9557c0) at netbsd:breakpoint+0x4
panic(c0a4c51d,c0a4a16f,c0852252,c0a4a13e,c4d5a71c,0,cf9557c0,cf418730,c4d5a71c,cf9557c0) at netbsd:panic+0x1b0
lockdebug_abort(c4d5a71c,c0ab62f0,c0852252,c0a4a13e,cf418730,1,cfa0987c,c049ce72,c4d5a71c,c0852252) at netbsd:lockdebug_abort+0x2d
mutex_abort(c4d5a71c,c0852252,c0a4a13e,cf418730,a800,0,cfa0981c,c0514314,c0b7ac20,cdeb59c0) at netbsd:mutex_abort+0x2e
mutex_vector_enter(c4d5a71c,0,0,4,7,c4bfe000,cfa098cc,cdeb5740,a8,a8) at netbsd:mutex_vector_enter+0x262
dkwedge_del(cfa098d0,cf35de48,10,306b64,c0b15b8c,cdeb5740,1,c081019d,cdeb5748,c4d6c488) at netbsd:dkwedge_del+0x188
dkwedge_delall(c4cb2828,c0aa3220,0,c0826120,1203,cf9557c0,cfa099fc,c04bdbc4,1203,3) at netbsd:dkwedge_delall+0x61
raidclose(1203,3,6000,cf9557c0,6000,3,6,3,cf6cc5c0,0) at netbsd:raidclose+0x12f
bdev_close(1203,3,6000,cf9557c0,0,0,cfa09a4c,1203,6000,0) at netbsd:bdev_close+0x84
spec_close(cfa09a58,20002,cfa09a6c,c0509038,cf6cc5c0,c08537a0,cf6cc5c0,3,ffffffff,3) at netbsd:spec_close+0x237
VOP_CLOSE(cf6cc5c0,3,ffffffff,c04f7335,c4d5a600,0,cfa09aac,c046a9e6,cf6cc5c0,3) at netbsd:VOP_CLOSE+0x6c
vn_close(cf6cc5c0,3,ffffffff,c0850ac0,a800,cf9557c0,cfa09adc,c04bdbc4,a800,3) at netbsd:vn_close+0x4e
dkclose(a800,3,6000,cf9557c0,6000,3,6,3,cf4187e8,0) at netbsd:dkclose+0xc6
bdev_close(a800,3,6000,cf9557c0,0,0,cfa09b2c,a800,6000,0) at netbsd:bdev_close+0x84
spec_close(cfa09b38,20002,cfa09b4c,c0509038,cf4187e8,c08537a0,cf4187e8,3,ffffffff,c57ed000) at netbsd:spec_close+0x237
VOP_CLOSE(cf4187e8,3,ffffffff,c03b9066,0,c4d6c8cc,c4d6c880,cf701000,cf701000,cf701024) at netbsd:VOP_CLOSE+0x6c
ffs_unmount(cf701000,80000,cf8d63c0,0,cf701000,0,cfa09bcc,c050732f,cf701000,80000) at netbsd:ffs_unmount+0x1c9
VFS_UNMOUNT(cf701000,80000,cf8d63c0,0,2001018,cf6cc398,1,cf701000,cf700000,cf9557c0) at netbsd:VFS_UNMOUNT+0x26
dounmount(cf701000,80000,cf9557c0,0,cfa09c08,7,0,cf9557c0,cfa09d00,0) at netbsd:dounmount+0x13f
vfs_unmountall(cf9557c0,0,0,c048976d,cde8b630,0,cfa09c3c,c058456b,0,cf9557c0) at netbsd:vfs_unmountall+0x63
vfs_shutdown(0,cf9557c0,0,0,cfa09d00,0,cfa09cec,c04b83c4,0,0) at netbsd:vfs_shutdown+0x8d
cpu_reboot(0,0,0,0,0,0,c0a4b571,0,23,fffffffe) at netbsd:cpu_reboot+0x13b
sys_reboot(cf9557c0,cfa09d00,cfa09d28,0,0,bfbfeed8,8049144,2,1,1) at netbsd:sys_reboot+0x74
syscall(cfa09d48,b3,ab,1f,1f,1,d,bfbfeed8,0,256) at netbsd:syscall+0xbd
db{0}>

Hope this helps. It'll be a couple of days before I can poke at the
machine again.
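
In the lock dump above, the "owner field" and "current lwp" are the same value (0xcf9557c0), which is what "locking against myself" reports: the LWP asks for a non-recursive mutex it already holds. The trace also shows dkclose() reaching raidclose() and then dkwedge_del() within one call chain. The toy model below illustrates that pattern only. Which kernel lock 0xc4d5a71c is, and where it was first taken, is not stated in the trace; the sketch simply assumes it was acquired further up the same chain. All class and function names are hypothetical, not NetBSD code.

```python
class LockError(RuntimeError):
    """Stands in for the kernel's 'panic: lock error'."""


class OwnedMutex:
    """Toy non-recursive mutex that remembers which LWP holds it."""

    def __init__(self, name):
        self.name = name
        self.owner = None

    def enter(self, lwp):
        if self.owner == lwp:
            # Same owner asking again: the case reported in the dump.
            raise LockError("%s: locking against myself" % self.name)
        if self.owner is not None:
            # A real mutex would wait here; the toy model just refuses.
            raise LockError("%s: held by another LWP" % self.name)
        self.owner = lwp

    def exit(self, lwp):
        if self.owner != lwp:
            raise LockError("%s: exit by non-owner" % self.name)
        self.owner = None


def delete_wedges(lock, lwp):
    """Innermost step: needs the lock for itself."""
    lock.enter(lwp)
    try:
        return "wedges deleted"
    finally:
        lock.exit(lwp)


def close_parent(lock, lwp):
    """Last close of the parent device triggers wedge deletion."""
    return delete_wedges(lock, lwp)


def close_wedge(lock, lwp):
    """Outer step: holds the lock while closing the parent (assumed)."""
    lock.enter(lwp)
    try:
        return close_parent(lock, lwp)
    finally:
        lock.exit(lwp)
```

Calling `close_wedge(OwnedMutex("demo"), 0xcf9557c0)` raises `LockError`, while calling `delete_wedges` on its own with an unheld lock succeeds. The point is only that the same step is harmless or fatal depending on whether the caller already holds the lock.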

John D. Baker

Jul 10, 2011, 9:40:34 PM

I put -current on the machine in question and it shuts down and reboots
without problems.