
Panic when removing a SCSI device entry


Joerg Wunsch

May 8, 2011, 4:53:14 AM
I've got a setup where a tape library is attached to a
computer-controllable power switch, so it is only turned on while
backups (or restores) are being done. This is mainly to reduce the
noise level, but also to reduce overall energy consumption while the
library is not needed.

Every now and then, the kernel panics with a page fault during the
(unattended; it happens at night) power cycling and the surrounding
actions. The current process when the page fault happens is always
mt(1), which is used inside the powerup/down script to ensure that the
drive is properly rewound. The page fault happens in
destroy_devl(), at this location:

/* If we are a child, remove us from the parents list */
if (dev->si_flags & SI_CHILD) {
here --->>> LIST_REMOVE(dev, si_siblings);
dev->si_flags &= ~SI_CHILD;
}

The preprocessed code of that looks like:

if (dev->si_flags & 0x0010) {
if ((((dev))->si_siblings.le_next) != ((void *)0))
(((dev))->si_siblings.le_next)->si_siblings.le_prev =
(dev)->si_siblings.le_prev;
*(dev)->si_siblings.le_prev = (((dev))->si_siblings.le_next);
dev->si_flags &= ~0x0010;
}

and it's the dereference of *(dev)->si_siblings.le_prev that hits a
NULL pointer. Obviously, LIST_REMOVE doesn't anticipate that
dev->si_siblings.le_prev might be a NULL pointer, so this is a usage
error of some kind. Could it be that destroy_devl() is called twice for
the same device?
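To make the failure mode concrete, here is a minimal user-space sketch that mirrors the preprocessed LIST_REMOVE() expansion above, statement by statement. The types `struct node`, `struct node_head` and the helpers `list_insert_head()` / `list_remove_expanded()` are hypothetical stand-ins, not the kernel's <sys/queue.h>. The final store goes through `le_prev` unconditionally, so an entry whose `le_prev` is NULL turns into a write to address 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for LIST_HEAD()/LIST_ENTRY() from <sys/queue.h>. */
struct node;
struct node_entry {
	struct node  *le_next;	/* next element, or NULL at the tail */
	struct node **le_prev;	/* address of the pointer that points to us */
};
struct node {
	int               value;
	struct node_entry siblings;
};
struct node_head {
	struct node *lh_first;
};

/* Same steps as LIST_INSERT_HEAD(). */
static void list_insert_head(struct node_head *head, struct node *elm)
{
	if ((elm->siblings.le_next = head->lh_first) != NULL)
		head->lh_first->siblings.le_prev = &elm->siblings.le_next;
	head->lh_first = elm;
	elm->siblings.le_prev = &head->lh_first;
}

/* Mirrors the preprocessed LIST_REMOVE() above, statement by statement. */
static void list_remove_expanded(struct node *elm)
{
	if (elm->siblings.le_next != NULL)
		elm->siblings.le_next->siblings.le_prev = elm->siblings.le_prev;
	/* Unconditional store through le_prev: a NULL le_prev writes to address 0. */
	*elm->siblings.le_prev = elm->siblings.le_next;
}
```

As in the expansion quoted above, removal leaves the removed element's own `le_next`/`le_prev` untouched, so a plain second call on the same element would follow stale pointers rather than hit NULL.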

This used to happen on an earlier system (some version of 7.x-stable),
and I eventually managed to tweak the powerup/down scripts of the
library so as to avoid the critical sequence of actions that triggers
this situation. Now that I have finally upgraded the machine to
8.2-STABLE, it is triggered very frequently again, though.

Any ideas how to fix it, or at least apply a workaround, other than
turning

*(elm)->field.le_prev = LIST_NEXT((elm), field); \

in the LIST_REMOVE macro into

if ((elm)->field.le_prev != NULL) \
*(elm)->field.le_prev = LIST_NEXT((elm), field); \

which affects the entire system, not just the SCSI subsystem part?
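If a stopgap is wanted without editing <sys/queue.h> for the whole system, one option is a guarded removal used only at the one call site. The sketch below is hypothetical (`struct child` and `sibling_remove_guarded()` are illustrative names, and it uses its own minimal list link rather than the kernel macros). It skips the unlink when `le_prev` is NULL and clears both pointers afterwards, so a repeated call is a no-op. Like the proposed macro change, it only masks the symptom; it does not explain how `le_prev` became NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element with a LIST_ENTRY()-style link. */
struct child;
struct sib_entry {
	struct child  *le_next;
	struct child **le_prev;
};
struct child {
	const char      *name;
	struct sib_entry siblings;
};

/*
 * File-local variant of LIST_REMOVE(): only touches the list when le_prev
 * is set, then clears both pointers. Returns 1 if the element was
 * unlinked, 0 if it was not on a list (or had already been removed).
 */
static int sibling_remove_guarded(struct child *elm)
{
	if (elm->siblings.le_prev == NULL)
		return 0;
	if (elm->siblings.le_next != NULL)
		elm->siblings.le_next->siblings.le_prev = elm->siblings.le_prev;
	*elm->siblings.le_prev = elm->siblings.le_next;
	elm->siblings.le_next = NULL;
	elm->siblings.le_prev = NULL;
	return 1;
}
```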

--
cheers, J"org .-.-. --... ...-- -.. . DL8DTL

http://www.sax.de/~joerg/ NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)
_______________________________________________
freebs...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
To unsubscribe, send any mail to "freebsd-scsi...@freebsd.org"


Kostik Belousov

May 8, 2011, 5:45:09 AM
Is it a NULL pointer dereference? See below.


Please provide the full printout from the panic. Also, it would
be useful to get the dump and do "p *dev" from the frame of
destroy_devl(). I might need further information after the requested
data is provided.

One thing you may try in the meantime is the following patch.

diff --git a/sys/kern/kern_conf.c b/sys/kern/kern_conf.c
index b2be5cc..59b876c 100644
--- a/sys/kern/kern_conf.c
+++ b/sys/kern/kern_conf.c
@@ -981,6 +981,8 @@ destroy_devl(struct cdev *dev)
 	/* Remove name marking */
 	dev->si_flags &= ~SI_NAMED;
 
+	dev->si_refcount++;	/* Avoid race with dev_rel() */
+
 	/* If we are a child, remove us from the parents list */
 	if (dev->si_flags & SI_CHILD) {
 		LIST_REMOVE(dev, si_siblings);
@@ -997,7 +999,6 @@ destroy_devl(struct cdev *dev)
 		dev->si_flags &= ~SI_CLONELIST;
 	}
 
-	dev->si_refcount++;	/* Avoid race with dev_rel() */
 	csw = dev->si_devsw;
 	dev->si_devsw = NULL;	/* already NULL for SI_ALIAS */
 	while (csw != NULL && csw->d_purge != NULL && dev->si_threadcount) {
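The patch's comment says the early `si_refcount++` avoids a race with dev_rel(). The general pattern (take a reference before tearing down links, drop it afterwards) can be modelled in a simplified, single-threaded sketch. Everything here is hypothetical: `struct obj`, `obj_rel()` and `obj_destroy()` are not kernel interfaces, the `freed` flag stands in for actually freeing memory, and the "concurrent" release is just a callback invoked at the point where another thread could run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, single-threaded model of a reference-counted device object. */
struct obj {
	int  refcount;
	bool linked;	/* still on some list */
	bool freed;	/* set instead of calling free(), so it can be inspected */
};

/* Drop one reference; the last one "frees" the object. */
static void obj_rel(struct obj *o)
{
	if (--o->refcount == 0)
		o->freed = true;
}

/*
 * Destroy path: take a reference before unlinking, so a release that
 * happens in between (modelled by the callback) cannot drop the count
 * to zero while the object is still being torn down.
 */
static void obj_destroy(struct obj *o, void (*concurrent)(struct obj *))
{
	o->refcount++;			/* early hold, as in the patch */
	if (concurrent != NULL)
		concurrent(o);		/* e.g. another thread releasing its reference */
	assert(!o->freed);		/* object is still safe to touch */
	o->linked = false;		/* stands in for the LIST_REMOVE() calls */
	obj_rel(o);			/* drop the hold; may free the object now */
}
```

Without the early `refcount++`, the simulated release would drop the count from 1 to 0 and mark the object freed before it is unlinked, which is presumably the kind of use-after-free the reordering is meant to rule out.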

Joerg Wunsch

May 8, 2011, 6:45:43 AM
As Kostik Belousov wrote:


> > and it's the indirection of *(dev)->si_siblings.le_prev that hits a
> > NULL pointer. Obviously, LIST_REMOVE doesn't anticipate that

> Is it NULL pointer dereference ? See below.

Yes, the fault address in the page fault is 0.

> Please provide the full printout from the panic. Also, it would
> be useful to get the dump and do "p *dev" from the frame of
> destroy_devl(). I might need further information after the requested
> data is provided.

Unfortunately, I somehow cannot get the system to provide a coredump.

The dmesg printout from the panic is:

sa0 at sym0 bus 0 scbus1 target 0 lun 0
sa0: <QUANTUM DLT7000 2560> Removable Sequential Access SCSI-2 device
sa0: 20.000MB/s transfers (10.000MHz, offset 15, 16bit)
(sa0:sym0:0:0:0): removing device entry


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x0
fault code = supervisor write, page not present
instruction pointer = 0x20:0xc052f346
stack pointer = 0x28:0xe98504a0
frame pointer = 0x28:0xe98504c4
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 52518 (mt)
trap number = 12
panic: page fault
cpuid = 0
Uptime: 1d4h55m31s

(This includes the sa0 device arrival/removal messages.)

The disassembly of the respective part of destroy_devl() is:

0xc052f32e <destroy_devl+30>: test $0x10,%dl
0xc052f331 <destroy_devl+33>: je 0xc052f34c <destroy_devl+60>
0xc052f333 <destroy_devl+35>: mov 0x4c(%esi),%edx
0xc052f336 <destroy_devl+38>: test %edx,%edx
0xc052f338 <destroy_devl+40>: je 0xc052f340 <destroy_devl+48>
0xc052f33a <destroy_devl+42>: mov 0x50(%esi),%eax
0xc052f33d <destroy_devl+45>: mov %eax,0x50(%edx)
0xc052f340 <destroy_devl+48>: mov 0x50(%esi),%edx
0xc052f343 <destroy_devl+51>: mov 0x4c(%esi),%eax
0xc052f346 <destroy_devl+54>: mov %eax,(%edx)
0xc052f348 <destroy_devl+56>: andl $0xffffffef,0x4(%esi)

I could perhaps set up a serial console, so as to get at least DDB
working, if you'd like to see more details. A remote GDB session might
also be possible, but would require more work (setting up the
respective environment on a second machine).

> Thing you may try meantime is the following patch.

OK, I'll do that tonight, so let's see how the subsequent nightly
backups proceed.

Kostik Belousov

May 8, 2011, 7:33:32 AM
On Sun, May 08, 2011 at 12:45:43PM +0200, Joerg Wunsch wrote:
> As Kostik Belousov wrote:
>
>
> > > and it's the indirection of *(dev)->si_siblings.le_prev that hits a
> > > NULL pointer. Obviously, LIST_REMOVE doesn't anticipate that
>
Serial console is fine; I do want to see a backtrace.
There is also a "show cdev" command in ddb that might provide some
useful information.

INVARIANTS may also be useful, since the kernel might catch the corruption
earlier.


>
> > Thing you may try meantime is the following patch.
>
> OK, I'll do that tonight, so let's see how the subsequent nightly
> backups proceed.
>

Joerg Wunsch

May 8, 2011, 4:36:34 PM
As Kostik Belousov wrote:

> > I could perhaps setup a serial console, so to get at least DDB
> > functioning if you'd like to see more details. ...

> Serial console is fine, I do want to see a backtrace.
> There is also "show cdev" command in ddb, that might provide some
> useful information.

OK, I'm setting up a DDB kernel right now, and attached an old laptop
as the console terminal. I also applied your suggested patch.

> INVARIANTS may be also useful, since the kernel might catch the
> corruption earlier.

As INVARIANTS has a performance impact, I'd like to avoid that for now.
Let's see first whether an analysis is possible without that. If not,
would it suffice to just compile kern_conf.c with INVARIANTS?

Kostik Belousov

May 8, 2011, 4:45:13 PM
On Sun, May 08, 2011 at 10:36:34PM +0200, Joerg Wunsch wrote:
> As Kostik Belousov wrote:
>
> > > I could perhaps setup a serial console, so to get at least DDB
> > > functioning if you'd like to see more details. ...
>
> > Serial console is fine, I do want to see a backtrace.
> > There is also "show cdev" command in ddb, that might provide some
> > useful information.
>
> OK, I'm setting up a DDB kernel right now, and attached an old laptop
> as the console terminal. I also applied your suggested patch.
Great.

>
> > INVARIANTS may be also useful, since the kernel might catch the
> > corruption earlier.
>
> As INVARIANTS has a performance impact, I'd like to avoid that for now.
> Let's see first whether an analysis is possible without that. If not,
> would it suffice to just compile kern_conf.c with INVARIANTS?

No, it is not enough to compile only kern_conf.c with INVARIANTS.
The performance impact is not that huge, and it definitely pays for
itself with problems like this one.
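Part of why INVARIANTS has to cover the whole kernel is presumably that the extra checks live in macros: assuming the QMD_LIST_CHECK_NEXT/QMD_LIST_CHECK_PREV checks of FreeBSD's <sys/queue.h>, LIST_REMOVE() (and, similarly, TAILQ_REMOVE()) verifies both neighbours' back links before unlinking, in whichever file performs the removal. The later INVARIANTS panic in this thread, "Bad link elm ... prev->next != elm", has exactly that form and was raised from cam_periph.c, not kern_conf.c. A user-space analogue of those two checks, with hypothetical names (`struct elem`, `check_links()`), might look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list element using the same link layout as LIST_ENTRY(). */
struct elem;
struct link {
	struct elem  *le_next;
	struct elem **le_prev;
};
struct elem {
	int         id;
	struct link l;
};

enum link_check { LINK_OK, BAD_NEXT, BAD_PREV };

/*
 * Before unlinking elm, both neighbours must still point back at it.
 * The kernel panics on failure; this sketch returns a code instead.
 * elm->l.le_prev must not be NULL (the real check would fault too).
 */
static enum link_check check_links(struct elem *elm)
{
	if (elm->l.le_next != NULL &&
	    elm->l.le_next->l.le_prev != &elm->l.le_next)
		return BAD_NEXT;	/* cf. "Bad link elm %p next->prev != elm" */
	if (*elm->l.le_prev != elm)
		return BAD_PREV;	/* cf. "Bad link elm %p prev->next != elm" */
	return LINK_OK;
}
```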

Joerg Wunsch

May 18, 2011, 2:04:29 AM
As Joerg Wunsch wrote:

> > Please provide the full printout from the panic. Also, it would
> > be useful to get the dump and do "p *dev" from the frame of
> > destroy_devl(). I might need further information after the requested
> > data is provided.
>
> Unfortunately, I somehow cannot get the system to provide a coredump.

OK, it happened again last night, and I've got a DDB trace now. The
panic is at a slightly different location (in notify_destroy()), but
it is still a NULL pointer dereference (apparently, dev->si_name is
NULL now).

[thread pid 33502 tid 100246 ]
Stopped at strlen+0x8: cmpb $0,0(%edx)
db> bt
Tracing pid 33502 tid 100246 td 0xc8be92e0
strlen(0,c6dfc804,cc0b0e80,cc6e6800,e98804b8,...) at strlen+0x8
notify(ce0dc900,0,0,cc6e6800,c05ac3fb,...) at notify+0x3f
destroy_devl(e98804f4,c0470a2b,ce0dc900,c07e9284,1,...) at destroy_devl+0x17b
destroy_dev(ce0dc900,c07e9284,1,0,e988051c,...) at destroy_dev+0x10
sacleanup(cc0b0e80,c07f161b,12,0,e9880570,...) at sacleanup+0x8b
camperiphfree(50,e9880994,c044b4de,e98809ac,c6e83c80,...) at camperiphfree+0x8f
cam_periph_release_locked(cc0b0e80,0,cc0b0e80,e98809bc,c044b762,...) at cam_periph_release_locked+0x55
cam_periph_release(cc0b0e80,14c,cc814200,e98809fc,e98809e8,...) at cam_periph_release+0x60
saopen(cc814200,1,2000,c8be92e0,c07cc465,...) at saopen+0x263
giant_open(cc814200,1,2000,c8be92e0,e9880b08,...) at giant_open+0x93
devfs_open(e9880b08,e9880b30,c061c4fa,c0840e60,e9880b08,...) at devfs_open+0x102
VOP_OPEN_APV(c0840e60,e9880b08,c075ad1a,cacbe788,0,...) at VOP_OPEN_APV+0x42
vn_open_cred(e9880b78,e9880c2c,0,0,c7fba280,...) at vn_open_cred+0x4ba
vn_open(e9880b78,e9880c2c,0,c7f49150,3,...) at vn_open+0x3b
kern_openat(c8be92e0,ffffff9c,804a0bb,0,1,...) at kern_openat+0x12c
kern_open(c8be92e0,804a0bb,0,0,6,...) at kern_open+0x35
open(c8be92e0,e9880cec,0,c,28176088,...) at open+0x30
syscallenter(c8be92e0,e9880ce4,e9880d1c,c07ad276,c8be92e0,...) at syscallenter+0x329
syscall(e9880d28) at syscall+0x34
Xint0x80_syscall() at Xint0x80_syscall+0x21
syscall (5, FreeBSD ELF32, open), eip = 0x2817608f, esp = 0xbfbfec7c, ebp = 0xbfbfee18 ---
db> show reg
cs 0x20
ds 0x28
es 0x28
fs 0x8
ss 0x28
eax 0
ecx 0x8
edx 0
ebx 0x2
esp 0xe9880468
ebp 0xe9880468
esi 0xce0dc900
edi 0xcc6e6800
eip 0xc0620568 strlen+0x8
efl 0x10202
strlen+0x8: cmpb $0,0(%edx)
db> show cdev
geom.ctl 0xc6d1a100
devctl 0xc6ccc700
console 0xc6ccc600
sndstat 0xc6ccc500
ptmx 0xc6ccc400
ctty 0xc6ccc300
mem 0xc6ccc200
kmem 0xc6db3800
audit 0xc6db3700
bpf 0xc6db3600
bpf0 0xc6db3500
null 0xc6db3400
zero 0xc6db3300
fd/0 0xc6db3200
stdin 0xc6db3100
fd/1 0xc6db3000
stdout 0xc6db2e00
fd/2 0xc6db2d00
stderr 0xc6db2c00
klog 0xc6db2b00
pci 0xc6db2a00
midistat 0xc6db2900
kbdmux0 0xc6db2700
kbd0 0xc6db2600
random 0xc6db2400
urandom 0xc6db2300
sysmouse 0xc6db2200
io 0xc6db2100
speaker 0xc6db2000
fido 0xc6d1be00
ata 0xc6d1bd00
acpi 0xc6d1b800
ttyu2 0xc6e7dd00
ttyu2.init 0xc6e7d800
ttyu2.lock 0xc6e7d700
cuau2 0xc6e7d600
cuau2.init 0xc6e7d500
cuau2.lock 0xc6e7d400
ttyu3 0xc6e7d000
ttyu3.init 0xc6e7ce00
ttyu3.lock 0xc6e7cd00
cuau3 0xc6e7cc00
cuau3.init 0xc6e7cb00
cuau3.lock 0xc6e7ca00
ttyu4 0xc6e7c600
ttyu4.init 0xc6e7c500
ttyu4.lock 0xc6e7c800
cuau4 0xc6e7c900
cuau4.init 0xc6e7d200
cuau4.lock 0xc6e7d300
ttyu5 0xc6e7e400
ttyu5.init 0xc6e7e500
ttyu5.lock 0xc6e7e600
cuau5 0xc6e7e700
cuau5.init 0xc6e7e800
cuau5.lock 0xc6e7e900
ttyu6 0xc6e7e000
ttyu6.init 0xc6f01e00
ttyu6.lock 0xc6f01d00
cuau6 0xc6f01c00
cuau6.init 0xc6f01b00
cuau6.lock 0xc6f01a00
ttyu7 0xc6f01600
ttyu7.init 0xc6f01500
ttyu7.lock 0xc6f01400
cuau7 0xc6f01300
cuau7.init 0xc6f01200
cuau7.lock 0xc6f01100
ttyu8 0xc6f00c00
ttyu8.init 0xc6f00b00
ttyu8.lock 0xc6f00a00
cuau8 0xc6f00900
cuau8.init 0xc6f00800
cuau8.lock 0xc6f00700
ttyu9 0xc6f00300
ttyu9.init 0xc6f00200
ttyu9.lock 0xc6f00100
cuau9 0xc6f00000
cuau9.init 0xc6e7fe00
cuau9.lock 0xc6e7fd00
ttyv0 0xc6f01000
ttyv1 0xc6f01800
ttyv2 0xc6fa5d00
ttyv3 0xc6fa5c00
ttyv4 0xc6fa5b00
ttyv5 0xc6fa5a00
ttyv6 0xc6fa5900
ttyv7 0xc6fa5800
ttyv8 0xc6fa5700
ttyv9 0xc6fa5600
ttyva 0xc6fa5500
ttyvb 0xc6fa5400
ttyvc 0xc6fa5300
ttyvd 0xc6fa5200
ttyve 0xc6fa5100
ttyvf 0xc6fa5000
consolectl 0xc6fa4e00
lpt0 0xc6fa4b00
lpt0.ctl 0xc6fa4a00
ppi0 0xc6fa4900
ttyu0 0xc6fa4600
ttyu0.init 0xc6fa4500
ttyu0.lock 0xc6fa4400
cuau0 0xc6fa4300
cuau0.init 0xc6fa4200
cuau0.lock 0xc6fa4100
usbctl 0xc71d6d00
mdctl 0xc71d6b00
devstat 0xc71d6a00
fd0 0xc71d6900
usb/0.1.0 0xc71d6700
ugen0.1 0xc71d6600
usb/1.1.0 0xc71d6500
ugen1.1 0xc71d6400
usb/0.1.1 0xc71d6300
usb/1.1.1 0xc71d5d00
xpt0 0xc71d5800
mixer0 0xc71d4a00
mixer1 0xc71d4000
mixer2 0xc7216a00
acd0 0xc7216100
ad4 0xc7216000
ad4s1 0xc7215e00
ad4s1b 0xc7215d00
ad4s1h 0xc7215c00
gvinum/sound 0xc728de00
gvinum/squid 0xc728dd00
gvinum/camel 0xc728dc00
gvinum/tmp 0xc728db00
gvinum/dump 0xc728da00
gvinum/bacula_db 0xc728d900
gvinum/junk 0xc728d800
gvinum/home 0xc728d700
gvinum/home_cvs 0xc728d600
gvinum/var 0xc728d500
gvinum/usr 0xc728d400
gvinum/local 0xc728d300
gvinum/root 0xc728d200
gvinum/obj 0xc728d100
gvinum/upload 0xc728d000
gvinum/mysql 0xc72a4400
gvinum/pdf 0xc72a4300
gvinum/distfiles 0xc72a4200
gvinum/news 0xc72a4100
gvinum/src 0xc72a4000
gvinum/ports 0xc72a3e00
gvinum/temp 0xc72a3d00
ufsid/4dd10a3a6f636a7d 0xc72a3100
usb/1.2.0 0xc72a2700
ugen1.2 0xc72a2600
usb/1.2.1 0xc7290800
cd0 0xc7290500
pass0 0xc7290700
pass1 0xc7290d00
pass2 0xc7290e00
da0 0xc72e8800
da0a 0xc72a2900
da0h 0xc72a2a00
da1 0xc72a2b00
ufsid/4856d98a00081994 0xc72a2c00
da1a 0xc72a2d00
da1h 0xc72e6700
usb/0.2.0 0xc72e7100
ugen0.2 0xc72e6e00
usb/0.2.1 0xc72e6c00
usb/0.3.0 0xc7375300
ugen0.3 0xc72a3400
usb/0.3.1 0xc7376d00
ukbd0 0xc7377200
kbd1 0xc72a3500
usb/0.4.0 0xc743a400
ugen0.4 0xc743a300
usb/0.4.1 0xc743a000
ums0 0xc7439500
usb/0.5.0 0xc7438c00
ugen0.5 0xc7438b00
usb/0.5.1 0xc7438a00
usb/0.6.0 0xc7501800
ugen0.6 0xc7501700
usb/0.6.2 0xc7501400
pf 0xc75d7500
nfslock 0xc7501a00
tap0 0xc7501100
apm0 0xc75d9600
dsp2.0 0xc82a3500
dsp1.0 0xc7f9dc00
dsp0.0 0xc7f9db00
pts/0 0xc8902d00
pts/1 0xc82a1600
pts/2 0xc89e0a00
pts/3 0xc8901a00
ptyp0 0xc8902c00
ttyp0 0xc82a3a00
pts/4 0xc819f100
pts/5 0xc89de200
pts/6 0xc8902600
tun0 0xc9f99300
pts/7 0xcc35a700
ptyp1 0xcc210a00
ttyp1 0xcc1b9600
pass3 0xce096c00
ch0 0xce113400
nsa0.0 0xcc814200
esa0.0 0xcdc50600
nsa0 0xcc81c800
esa0 0xcc871a00
sa0.1 0xce083c00
nsa0.1 0xccec1d00
esa0.1 0xccd63400
sa0.2 0xcc840e00
nsa0.2 0xcc7dc800
esa0.2 0xcc841500
sa0.3 0xce083400
nsa0.3 0xcdc50400
esa0.3 0xce084600
ptyp2 0xcf8b1000
ttyp2 0xcf929100
pass4 0xce989900
sa0.ctl 0xced5f400
sa0.0 0xce991100
nsa0.0 0xcea91c00
esa0.0 0xced17900
sa0 0xce71d500
nsa0 0xced60400
esa0 0xce956800
sa0.1 0xce68ab00
nsa0.1 0xceb10a00
esa0.1 0xced1f300
sa0.2 0xce6dd400
nsa0.2 0xcec9c800
esa0.2 0xce960100
sa0.3 0xcea91d00
nsa0.3 0xce9bb700
esa0.3 0xceb99d00
db> panic
panic: from debugger
cpuid = 0
Uptime: 1d5h4m38s
Physical memory: 3575 MB
Dumping 365 MB: 350 334 318 302 286 270 254 238 222 206 190 174 158 142 126 110 94 78 62 46 30 14
Dump complete
Automatic reboot in 15 seconds - press a key on the console to abort
--> Press a key on the console to reboot,
--> or switch off the system now.
Rebooting...

As you can see, I've got a coredump this time, so I can run kgdb
on that.

Currently, I'm compiling an INVARIANTS kernel, and will boot that one
soon - though I wonder whether it really makes sense here, as the
picture is different from last time (due to Kostik's suggested
patch?).

One observation that comes to mind: with devices appearing and
disappearing, the CAM subsystem sometimes suffers from some confusion
if a device is still held open by the time it disappears on the bus.
The device then appears in "camcontrol devlist" as just "sa0", without
a pass device associated. When powering it on again, and reprobing
it, it becomes "sa0, pass4, sa0" or such.

Kostik Belousov

May 18, 2011, 12:51:20 PM
On Wed, May 18, 2011 at 08:04:29AM +0200, Joerg Wunsch wrote:
> As Joerg Wunsch wrote:
>
> > > Please provide the full printout from the panic. Also, it would
> > > be useful to get the dump and do "p *dev" from the frame of
> > > destroy_devl(). I might need further information after the requested
> > > data is provided.
> >
> > Unfortunately, I somehow cannot get the system to provide a coredump.
>
> OK, it happened again last night, and I've got a DDB trace now. The
> panic is at a slightly different location (in notify_destroy()), but
> still a null pointer (apparently, dev->si_name is NULL now).
>
> [thread pid 33502 tid 100246 ]
> Stopped at strlen+0x8: cmpb $0,0(%edx)
> db> bt
> Tracing pid 33502 tid 100246 td 0xc8be92e0
> strlen(0,c6dfc804,cc0b0e80,cc6e6800,e98804b8,...) at strlen+0x8
> notify(ce0dc900,0,0,cc6e6800,c05ac3fb,...) at notify+0x3f
> destroy_devl(e98804f4,c0470a2b,ce0dc900,c07e9284,1,...) at destroy_devl+0x17b
> destroy_dev(ce0dc900,c07e9284,1,0,e988051c,...) at destroy_dev+0x10
The ddb arguments printed might be wrong, or might point to the issue
causing the panic.

Please do "p *(struct cdev_priv *)0xe98804f4" and
"p *(struct cdev_priv *)0xce0dc900" from kgdb.

...

> As you can see, I've got a coredump this time, so I can run kgdb
> on that.
>
> Currently, I'm compiling an INVARIANTS kernel, and will boot that one
> soon - though I wonder whether it really makes sense here, as the
> picture is different from last time (due to Kostik's suggested
> patch?).

I am pretty sure that an INVARIANTS kernel would hit the assertion
about the SI_NAMED flag being clear when destroy_devl() is invoked.
We would have caught the issue earlier, with less interesting data
destroyed.
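The kind of assertion meant here can be sketched as a check at the top of the destroy path: if the "named" flag is already clear, the device has apparently been destroyed before (or never created), and an INVARIANTS kernel would stop right there instead of corrupting lists later. In this hypothetical model, `NAMED`, `struct dev_model`, `dev_make()` and `dev_destroy()` are illustrative only, the flag value is made up, and the sketch returns -1 where the kernel would panic.

```c
#include <assert.h>

/* Illustrative flag value for this sketch; not taken from FreeBSD headers. */
#define NAMED 0x1u

/* Hypothetical stand-in for a cdev: just flags and a destroy counter. */
struct dev_model {
	unsigned flags;
	int      destroyed;
};

static void dev_make(struct dev_model *d)
{
	d->flags |= NAMED;
}

/*
 * Destroying a device whose NAMED flag is already clear means it was
 * destroyed before (or never created). An INVARIANTS kernel would panic
 * at this point; the sketch reports -1 and leaves the object alone.
 */
static int dev_destroy(struct dev_model *d)
{
	if ((d->flags & NAMED) == 0)
		return -1;
	d->flags &= ~NAMED;
	d->destroyed++;
	return 0;
}
```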

Anyway, please show the data I requested from the dump, and do
install INVARIANTS kernel.

Joerg Wunsch

May 20, 2011, 4:39:48 AM
As Kostik Belousov wrote:

> Please do "p *(struct cdev_priv *)0xe98804f4" and
> "p *(struct cdev_priv *)0xce0dc900" from kgdb.

Well, that kernel unfortunately lacked debugging symbols, and while
I was still thinking about the best way to recompile exactly the
same kernel with them ...

> I am pretty sure that an INVARIANTS kernel would hit the assertion
> about the SI_NAMED flag being clear when destroy_devl() is invoked.
> We would have caught the issue earlier, with less interesting data
> destroyed.

... the INVARIANTS kernel I'm now running panicked again last night.
This time, I've got debugging symbols. So, as you expected, the
corruption has now been caught earlier. The panic message is:

(kgdb) p panicstr
$1 = 0xc088dca0 "Bad link elm 0xc81cc200 prev->next != elm"

Here is the stack trace:

(kgdb) bt
#0 doadump () at pcpu.h:231
#1 0xc057943e in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:419
#2 0xc0579710 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:592
#3 0xc044b707 in camperiphfree (periph=0xc81cc200) at /usr/src/sys/cam/cam_periph.c:550
#4 0xc044b8e5 in cam_periph_release_locked (periph=0xc81cc200) at /usr/src/sys/cam/cam_periph.c:336
#5 0xc044ba74 in cam_periph_release (periph=0xc81cc200) at /usr/src/sys/cam/cam_periph.c:352
#6 0xc046eea2 in saopen (dev=0xc8d9ba00, flags=1, fmt=8192, td=0xc93d15c0)
at /usr/src/sys/cam/scsi/scsi_sa.c:499
#7 0xc053833e in giant_open (dev=0xc8d9ba00, oflags=1, devtype=8192, td=0xc93d15c0)
at /usr/src/sys/kern/kern_conf.c:361
#8 0xc05177b2 in devfs_open (ap=0xe98b3b00) at /usr/src/sys/fs/devfs/devfs_vnops.c:992
#9 0xc07b9a95 in VOP_OPEN_APV (vop=0xc08403c0, a=0xe98b3b00) at vnode_if.c:445
#10 0xc06143d6 in vn_open_cred (ndp=0xe98b3b78, flagp=0xe98b3c2c, cmode=0, vn_open_flags=0,
cred=0xc807a780, fp=0xc90f6d58) at vnode_if.h:196
#11 0xc06144db in vn_open (ndp=0xe98b3b78, flagp=0xe98b3c2c, cmode=0, fp=0xc90f6d58)
at /usr/src/sys/kern/vfs_vnops.c:94
#12 0xc06133fc in kern_openat (td=0xc93d15c0, fd=-100, path=0x804a0bb <Address 0x804a0bb out of bounds>,
pathseg=UIO_USERSPACE, flags=1, mode=6) at /usr/src/sys/kern/vfs_syscalls.c:1083
#13 0xc0613845 in kern_open (td=0xc93d15c0, path=0x804a0bb <Address 0x804a0bb out of bounds>,
pathseg=UIO_USERSPACE, flags=0, mode=6) at /usr/src/sys/kern/vfs_syscalls.c:1039
#14 0xc06138c0 in open (td=0xc93d15c0, uap=0xe98b3cec) at /usr/src/sys/kern/vfs_syscalls.c:1015
#15 0xc05b6276 in syscallenter (td=0xc93d15c0, sa=0xe98b3ce4) at /usr/src/sys/kern/subr_trap.c:326
#16 0xc0799b54 in syscall (frame=0xe98b3d28) at /usr/src/sys/i386/i386/trap.c:1086
#17 0xc077fd21 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:266
#18 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)

The dev node in question (I think it's the "dev" argument in stack
frame #6) is:
(kgdb) p dev
$5 = (struct cdev *) 0xc8d9ba00
(kgdb) p *dev
$6 = {__si_reserved = 0x0, si_flags = 4, si_atime = {tv_sec = 1305772805, tv_nsec = 0}, si_ctime = {
tv_sec = 1305773153, tv_nsec = 0}, si_mtime = {tv_sec = 1305773153, tv_nsec = 0}, si_uid = 0,
si_gid = 5, si_mode = 432, si_cred = 0x0, si_drv0 = 1, si_refcount = 2, si_list = {
le_next = 0xc7f9ed00, le_prev = 0xc8d43538}, si_clone = {le_next = 0x0, le_prev = 0x0},
si_children = {lh_first = 0xc8d9f100}, si_siblings = {le_next = 0x0, le_prev = 0x0}, si_parent = 0x0,
si_name = 0xc8d9ba78 "nsa0.0", si_drv1 = 0xc81cc200, si_drv2 = 0x0, si_devsw = 0xc0833a40,
si_iosize_max = 65536, si_usecount = 1, si_threadcount = 2, __si_u = {__sid_snapdata = 0x0},
__si_namebuf = "nsa0.0", '\0' <repeats 57 times>}

The contents of the "periph" object, as seen in the various CAM layer
functions, are:

(kgdb) p *periph
$1 = {pinfo = {priority = 1, generation = 87063, index = -1}, periph_start = 0xc046ba80 <sastart>,
periph_oninval = 0xc046bcf0 <saoninvalidate>, periph_dtor = 0xc046f2d0 <sacleanup>,
periph_name = 0xc07e07bc "sa", path = 0xc8dd31b0, softc = 0xc8387800, sim = 0xc6e83c80,
unit_number = 0, type = CAM_PERIPH_BIO, flags = 8, immediate_priority = 4294967295, refcount = 0,
ccb_list = {slh_first = 0x0}, periph_links = {sle_next = 0x0}, unit_links = {tqe_next = 0x0,
tqe_prev = 0xc0833b30}, deferred_callback = 0, deferred_ac = 0}

Finally, here's the output of the "show cdev" command from DDB:

gvinum/ports 0xc728e100
gvinum/src 0xc728e000
gvinum/news 0xc728de00
gvinum/distfiles 0xc728dd00
gvinum/pdf 0xc728dc00
gvinum/mysql 0xc728db00
gvinum/upload 0xc728da00
gvinum/obj 0xc728d900
gvinum/root 0xc728d800
gvinum/local 0xc728d700
gvinum/usr 0xc728d600
gvinum/var 0xc728d500
gvinum/home_cvs 0xc728d400
gvinum/home 0xc728d300
gvinum/junk 0xc728d200
gvinum/bacula_db 0xc728d100
gvinum/dump 0xc728d000
gvinum/tmp 0xc72a4400
gvinum/camel 0xc72a4300
gvinum/squid 0xc72a4200
gvinum/sound 0xc72a4100
usb/1.2.0 0xc72a2b00
ugen1.2 0xc72a2a00
cd0 0xc7290c00
usb/1.2.1 0xc7290b00
pass0 0xc7290400
pass1 0xc7290300
pass2 0xc7290200
da0 0xc72e8800
da0a 0xc72a2700
da0h 0xc72a2800
da1 0xc72a2e00
ufsid/4856d98a00081994 0xc72a3000
da1a 0xc72a3100
da1h 0xc72a3200
usb/0.2.0 0xc72e6900
ugen0.2 0xc72e6a00
usb/0.2.1 0xc72e6b00
usb/0.3.0 0xc72e7400
ugen0.3 0xc72e7500
usb/0.3.1 0xc72e7600
ukbd0 0xc72e7e00
kbd1 0xc72e8000
usb/0.4.0 0xc72e8100
ugen0.4 0xc72e8200
usb/0.4.1 0xc72e8300
ums0 0xc72e7300
usb/0.5.0 0xc72e6e00
ugen0.5 0xc72e6c00
usb/0.5.1 0xc72e6100
usb/0.6.0 0xc72a4900
ugen0.6 0xc72a4800
usb/0.6.2 0xc72a4700
pf 0xc7449700
nfslock 0xc7e39100
tap0 0xc7e95b00
apm0 0xc7fa0100
dsp2.0 0xc7446100
dsp1.0 0xc7e37300
dsp0.0 0xc7e37100
pts/1 0xc8230600
pts/2 0xc8230900
ptyp0 0xc7e36d00
ttyp0 0xc8230700
ptyp1 0xc8230b00
ttyp1 0xc82eb800
pts/3 0xc8d9ae00
pts/0 0xc8047800
pts/4 0xc82c9400
pts/5 0xc8da1400
pts/6 0xc8e02700
usb/0.6.0 0xc8497a00
ugen0.6 0xc7e97500
usb/0.6.2 0xc82ed400
pass3 0xc8d9d700
ch0 0xc8046200
nsa0.0 0xc8d9ba00
esa0.0 0xc8d43500
nsa0 0xc8d9f100
esa0 0xc8d43c00
sa0.1 0xc8497e00
nsa0.1 0xc8048800
esa0.1 0xc8e00100
sa0.2 0xc8e01b00
nsa0.2 0xc8da0c00
esa0.2 0xc8d9d900
sa0.3 0xc8d9de00
nsa0.3 0xc8d9f900
esa0.3 0xc8d42600
usb/0.7.0 0xcecdad00
ugen0.7 0xcecd4a00
usb/0.7.2 0xcecd7800
pts/7 0xc8046100
ptyp2 0xc822da00
ttyp2 0xc8230300
pass4 0xce291100
sa0.ctl 0xcecfca00
sa0.0 0xcecfcd00
nsa0.0 0xc7f9e900
esa0.0 0xcecf3000
sa0 0xc8d42b00
nsa0 0xcecd7b00
esa0 0xc90a7000
sa0.1 0xcbda4c00
nsa0.1 0xcecfcc00
esa0.1 0xce21a400
sa0.2 0xceccb600
nsa0.2 0xcece3b00
esa0.2 0xc8e01d00
sa0.3 0xcecd4600
nsa0.3 0xceccb300
esa0.3 0xcecfc500

I think that's all I can tell for now ...

Kostik Belousov

May 20, 2011, 4:17:57 PM
On Fri, May 20, 2011 at 10:39:48AM +0200, Joerg Wunsch wrote:
> As Kostik Belousov wrote:
>
> > Please do "p *(struct cdev_priv *)0xe98804f4" and
> > "p *(struct cdev_priv *)0xce0dc900" from kgdb.
>
> Well, that kernel unfortunately lacked debugging symbols, and while
> I've still been thinking about the best way to recompile an exact
> same kernel with them ...
Yes, it would be quite interesting to see the data I asked for.
I spent significant time trying to imagine a scenario in which the
reported panic could happen, and did not come up with anything.

>
> > I am pretty sure that an INVARIANTS kernel would hit the assertion
> > about the SI_NAMED flag being clear when destroy_devl() is invoked.
> > We would have caught the issue earlier, with less interesting data
> > destroyed.
>

> ... the INVARIANTS kernel I'm now running panicked again last night.
> This time, I've got debugging symbols. So, as you expected, the
> corruption has now been caught earlier. The panic message is:
>
> (kgdb) p panicstr
> $1 = 0xc088dca0 "Bad link elm 0xc81cc200 prev->next != elm"
>
> Here is the stack trace:
>
> (kgdb) bt
> #0 doadump () at pcpu.h:231
> #1 0xc057943e in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:419
> #2 0xc0579710 in panic (fmt=Variable "fmt" is not available.
> ) at /usr/src/sys/kern/kern_shutdown.c:592
> #3 0xc044b707 in camperiphfree (periph=0xc81cc200) at /usr/src/sys/cam/cam_periph.c:550

What is the exact revision of your sources?

I do not see any list manipulation macros at line 550 of cam_periph.c,
in either HEAD or stable/8. There is one three lines earlier, and it
could cause the panic shown.

This looks like a CAM issue, which is out of my scope.
I hope other subscribers will offer help.

I committed the devfs fix you tested; it should be merged to stable/8
within a week.

> #4 0xc044b8e5 in cam_periph_release_locked (periph=0xc81cc200) at /usr/src/sys/cam/cam_periph.c:336
> #5 0xc044ba74 in cam_periph_release (periph=0xc81cc200) at /usr/src/sys/cam/cam_periph.c:352
> #6 0xc046eea2 in saopen (dev=0xc8d9ba00, flags=1, fmt=8192, td=0xc93d15c0)
> at /usr/src/sys/cam/scsi/scsi_sa.c:499
> #7 0xc053833e in giant_open (dev=0xc8d9ba00, oflags=1, devtype=8192, td=0xc93d15c0)
> at /usr/src/sys/kern/kern_conf.c:361
> #8 0xc05177b2 in devfs_open (ap=0xe98b3b00) at /usr/src/sys/fs/devfs/devfs_vnops.c:992
> #9 0xc07b9a95 in VOP_OPEN_APV (vop=0xc08403c0, a=0xe98b3b00) at vnode_if.c:445
> #10 0xc06143d6 in vn_open_cred (ndp=0xe98b3b78, flagp=0xe98b3c2c, cmode=0, vn_open_flags=0,
> cred=0xc807a780, fp=0xc90f6d58) at vnode_if.h:196
> #11 0xc06144db in vn_open (ndp=0xe98b3b78, flagp=0xe98b3c2c, cmode=0, fp=0xc90f6d58)
> at /usr/src/sys/kern/vfs_vnops.c:94
> #12 0xc06133fc in kern_openat (td=0xc93d15c0, fd=-100, path=0x804a0bb <Address 0x804a0bb out of bounds>,
> pathseg=UIO_USERSPACE, flags=1, mode=6) at /usr/src/sys/kern/vfs_syscalls.c:1083
> #13 0xc0613845 in kern_open (td=0xc93d15c0, path=0x804a0bb <Address 0x804a0bb out of bounds>,
> pathseg=UIO_USERSPACE, flags=0, mode=6) at /usr/src/sys/kern/vfs_syscalls.c:1039
> #14 0xc06138c0 in open (td=0xc93d15c0, uap=0xe98b3cec) at /usr/src/sys/kern/vfs_syscalls.c:1015
> #15 0xc05b6276 in syscallenter (td=0xc93d15c0, sa=0xe98b3ce4) at /usr/src/sys/kern/subr_trap.c:326
> #16 0xc0799b54 in syscall (frame=0xe98b3d28) at /usr/src/sys/i386/i386/trap.c:1086
> #17 0xc077fd21 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:266
> #18 0x00000033 in ?? ()
> Previous frame inner to this frame (corrupt stack?)
>
> The dev node in question (I think, it's the "dev" argument in stack

As I said, devfs looks innocent in this backtrace.

> frame #6) is:
> (kgdb) p dev
> $5 = (struct cdev *) 0xc8d9ba00
> (kgdb) p *dev
> $6 = {__si_reserved = 0x0, si_flags = 4, si_atime = {tv_sec = 1305772805, tv_nsec = 0}, si_ctime = {
> tv_sec = 1305773153, tv_nsec = 0}, si_mtime = {tv_sec = 1305773153, tv_nsec = 0}, si_uid = 0,
> si_gid = 5, si_mode = 432, si_cred = 0x0, si_drv0 = 1, si_refcount = 2, si_list = {
> le_next = 0xc7f9ed00, le_prev = 0xc8d43538}, si_clone = {le_next = 0x0, le_prev = 0x0},
> si_children = {lh_first = 0xc8d9f100}, si_siblings = {le_next = 0x0, le_prev = 0x0}, si_parent = 0x0,
> si_name = 0xc8d9ba78 "nsa0.0", si_drv1 = 0xc81cc200, si_drv2 = 0x0, si_devsw = 0xc0833a40,
> si_iosize_max = 65536, si_usecount = 1, si_threadcount = 2, __si_u = {__sid_snapdata = 0x0},
> __si_namebuf = "nsa0.0", '\0' <repeats 57 times>}
>
> The contents of the "periph" object as seen in the various CAM layer
> functions is:
>
> (kgdb) p *periph
> $1 = {pinfo = {priority = 1, generation = 87063, index = -1}, periph_start = 0xc046ba80 <sastart>,
> periph_oninval = 0xc046bcf0 <saoninvalidate>, periph_dtor = 0xc046f2d0 <sacleanup>,
> periph_name = 0xc07e07bc "sa", path = 0xc8dd31b0, softc = 0xc8387800, sim = 0xc6e83c80,
> unit_number = 0, type = CAM_PERIPH_BIO, flags = 8, immediate_priority = 4294967295, refcount = 0,
> ccb_list = {slh_first = 0x0}, periph_links = {sle_next = 0x0}, unit_links = {tqe_next = 0x0,
> tqe_prev = 0xc0833b30}, deferred_callback = 0, deferred_ac = 0}
>
> Finally, here's the "show cdev" command from DDB:
>

Joerg Wunsch

May 20, 2011, 4:37:31 PM
As Kostik Belousov wrote:

> > > Please do "p *(struct cdev_priv *)0xe98804f4" and
> > > "p *(struct cdev_priv *)0xce0dc900" from kgdb.
> >

> > Well, that kernel unfortunately lacked debugging symbols, and while
> > I've still been thinking about the best way to recompile an exact
> > same kernel with them ...

> Yes, it would be quite interesting to see the data I asked for.

OK, I found a way to cheat around the missing -g symbols ... and all
the data at 0xce0dc900 are zeroed out. The other address does not
make any sense at all:

(kgdb) p *(struct cdev_priv *)0xe98804f4
$1 = {cdp_c = {__si_reserved = 0xe988097c, si_flags = 3225728383, si_atime = {tv_sec = -871690624,
tv_nsec = -1065413093}, si_ctime = {tv_sec = 18, tv_nsec = 0}, si_mtime = {tv_sec = -376961680,
tv_nsec = -927034656}, si_uid = 3239626616, si_gid = 0, si_mode = 1316, si_cred = 0xdad13340,
si_drv0 = -376961728, si_refcount = -1068057405, si_list = {le_next = 0x0, le_prev = 0xe9880538},
si_clone = {le_next = 0xe9880538, le_prev = 0x202}, si_children = {lh_first = 0x2}, si_siblings = {
le_next = 0xdad13340, le_prev = 0x0}, si_parent = 0xe9880564,
si_name = 0xc056bcc3 "\213]�213u�213}�211��\211�213U\f\205�\r�����\213E\b����]�215�&",
si_drv1 = 0x0, si_drv2 = 0x1, si_devsw = 0x0, si_iosize_max = -1055340680,
si_usecount = 3918005652, si_threadcount = 3671143232, __si_u = {__sid_snapdata = 0x0},
__si_namebuf = "\000�030�000\000\000\000@3�|\005\210�000\000\000\000\000@�\224\005\210��V�001\000\000\000\020\000\000\000�\005\210��V�\234\204�020\000\000\000\001\000\000\000\006\000\000"},
cdp_list = {tqe_next = 0xe988061a, tqe_prev = 0x4}, cdp_inode = 2147289763, cdp_flags = 3918005812,
cdp_inuse = 3671349200, cdp_maxdirent = 3671143232, cdp_dirents = 0x6400, cdp_dirent0 = 0xe0badfa7,
cdp_dtr_list = {tqe_next = 0x257, tqe_prev = 0xc056bc4a}, cdp_dtr_cb = 0x404a9c20,
cdp_dtr_cb_arg = 0xc084b894, cdp_fdpriv = {lh_first = 0xe9880674}}

> What is the exact revision of your sources ?

It's a checkout from a CVS tree, so I cannot give you an exact SVN
revision number. The checkout was done on April 13.

> This looks like a CAM issue, which is out of my scope.

This was my fear, and that's why I wrote to the freebsd-scsi list.

> > si_name = 0xc8d9ba78 "nsa0.0",

Could that be an issue with the multiple SCSI tape drive device nodes?
I see that /dev/nsa0.0 is somehow involved in the panic, yet other
processes might access just /dev/nsa0 (which is a different cdev).

Kostik Belousov

May 20, 2011, 4:45:10 PM
On Fri, May 20, 2011 at 10:37:31PM +0200, Joerg Wunsch wrote:
> As Kostik Belousov wrote:
>
> > > > Please do "p *(struct cdev_priv *)0xe98804f4" and
> > > > "p *(struct cdev_priv *)0xce0dc900" from kgdb.
> > >
> > > Well, that kernel unfortunately lacked debugging symbols, and while
> > > I've still been thinking about the best way to recompile an exact
> > > same kernel with them ...
>
> > Yes, it would be quite interesting to see the data I asked for.
>
> OK, I found a way to cheat around the missing -g symbols ... and: all
> the data at 0xce0dc900 are zeroed out. The other address does not
> make any sense at all:
Yes, this is garbage, and it is consistent with the panic you reported
with INVARIANTS turned on. It seems quite possible that CAM called
destroy_dev() on memory that had already been freed and reused.

>
> (kgdb) p *(struct cdev_priv *)0xe98804f4
> $1 = {cdp_c = {__si_reserved = 0xe988097c, si_flags = 3225728383, si_atime = {tv_sec = -871690624,
> tv_nsec = -1065413093}, si_ctime = {tv_sec = 18, tv_nsec = 0}, si_mtime = {tv_sec = -376961680,
> tv_nsec = -927034656}, si_uid = 3239626616, si_gid = 0, si_mode = 1316, si_cred = 0xdad13340,
> si_drv0 = -376961728, si_refcount = -1068057405, si_list = {le_next = 0x0, le_prev = 0xe9880538},
> si_clone = {le_next = 0xe9880538, le_prev = 0x202}, si_children = {lh_first = 0x2}, si_siblings = {
> le_next = 0xdad13340, le_prev = 0x0}, si_parent = 0xe9880564,

> si_name = 0xc056bcc3 "\213]???213u???213}???211??????\211???213U\f\205???\r???????????????\213E\b????????????]???215???&",

> si_drv1 = 0x0, si_drv2 = 0x1, si_devsw = 0x0, si_iosize_max = -1055340680,
> si_usecount = 3918005652, si_threadcount = 3671143232, __si_u = {__sid_snapdata = 0x0},

> __si_namebuf = "\000???030???000\000\000\000@3???|\005\210???000\000\000\000\000@???\224\005\210??????V???001\000\000\000\020\000\000\000???\005\210??????V???\234\204???020\000\000\000\001\000\000\000\006\000\000"},

> cdp_list = {tqe_next = 0xe988061a, tqe_prev = 0x4}, cdp_inode = 2147289763, cdp_flags = 3918005812,
> cdp_inuse = 3671349200, cdp_maxdirent = 3671143232, cdp_dirents = 0x6400, cdp_dirent0 = 0xe0badfa7,
> cdp_dtr_list = {tqe_next = 0x257, tqe_prev = 0xc056bc4a}, cdp_dtr_cb = 0x404a9c20,
> cdp_dtr_cb_arg = 0xc084b894, cdp_fdpriv = {lh_first = 0xe9880674}}
>
> > What is the exact revision of your sources ?
>
> It's a checkout from a CVS tree, so I cannot give you an exact SVN
> revision number. The checkout has been done on April 13.
>
> > This looks like a CAM issue, which is out of my scope.
>
> This was my fear, and that's why I wrote to the freebsd-scsi list.

Well, it helped to identify and correct a devfs bug anyway, thank you.

>
> > > si_name = 0xc8d9ba78 "nsa0.0",
>
> Could that be an issue with the multiple SCSI tape drive device nodes?
> I see, /dev/nsa0.0 is somehow involved into the panic, yet other
> processes might access just /dev/nsa0 (which is a different cdev).
>

Joerg Wunsch

Jun 8, 2011, 3:31:38 PM
As Kostik Belousov wrote:

> This looks like a CAM issue, which is out of my scope.
> Hope other subscribers will offer the help.

I now frequently see console messages like this:

xpt_release_devq(0): requested 1 > present 0

Could that be related? Any ideas?

So far there has been no panic again, but probably not because the bug
itself has been fixed; more likely because I slightly changed the scripts
that power the tape library up and down.

Alexander Motin

Jun 11, 2011, 3:29:31 PM
Joerg Wunsch wrote:
> As Kostik Belousov wrote:
>
>> This looks like a CAM issue, which is out of my scope.
>> Hope other subscribers will offer the help.
>
> I see frequently console messages like this now:
>
> xpt_release_devq(0): requested 1 > present 0
>
> Could that be related? Any ideas?

This message indicates a non-fatal error somewhere in CAM. Previously
such conditions were silently ignored. I see at least one suspicious
point in the sa driver, but I am not very familiar with its logic. Could you
investigate what kind of activity triggers those messages, so that I could
try to reproduce it?

I am thinking about something like this:

--- scsi_sa.c.orig 2011-04-15 00:25:33.000000000 +0300
+++ scsi_sa.c 2011-06-11 22:18:20.000000000 +0300
@@ -1783,12 +1783,7 @@ sadone(struct cam_periph *periph, union
}
}
}
- /*
- * If we had an error (immediate or pending),
- * release the device queue now.
- */
- if (error || (softc->flags & SA_FLAG_ERR_PENDING))
- cam_release_devq(done_ccb->ccb_h.path, 0, 0, 0, 0);
+ QFRLS(done_ccb);
#ifdef CAMDEBUG
if (error || bp->bio_resid) {
CAM_DEBUG(periph->path, CAM_DEBUG_INFO,


> So far, no panic again, but probably not since the bug itself
> has been fixed but rather since I slightly changed the scripts
> that powerup/-down the tape library.

--
Alexander Motin
