
Oops in 2.2.19 SMP causing dac960 array to stop writing.


Michael J Schout

Aug 27, 2001, 2:19:02 PM
[1.] One line summary of the problem:

Oops in 2.2.19 SMP causing dac960 array to stop writing.

[2.] Full description of the problem/report:

After 41 days of uptime, I received this oops. After it happened, I was
unable to write to the dac960 RAID-5 array, but reading from the disk
continued to work as expected. This has happened several times before, and
it was suggested that we upgrade the firmware on our dac960 controller. We did
that (see the dac960 info below) before the last reboot, so it appears that the
firmware was not the problem. I don't know what else to try to eliminate this
problem. Is this a bug in the kernel or the driver, or perhaps a controller
that is going bad? Any ideas or suggestions would be greatly appreciated.

[3.] Keywords (i.e., modules, networking, kernel):

dac960, smp, raid, kernel

[4.] Kernel version (from /proc/version):

Linux version 2.2.19 (ro...@deathstar.gkg-com.com) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #1 SMP Fri Apr 27 10:49:53 CDT 2001

[5.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/oops-tracing.txt)

Options used: -V (default)
-o /lib/modules/2.2.19/ (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-m /boot/System.map (specified)
-c 1 (default)

Unable to handle kernel NULL pointer dereference at virtual address 00000100
current->tss.cr3 = 264a7000, %%cr3 = 264a7000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[find_buffer+104/144]
EFLAGS: 00010206
eax: 00000100 ebx: 00000007 ecx: 0000f534 edx: 00000100
esi: 0000000d edi: 00003006 ebp: 000696da esp: db2efde0
ds: 0018 es: 0018 ss: 0018
Process postmaster (pid: 16914, process nr: 61, stackpage=db2ef000)
Stack: 000696da 00003006 0000f534 c012bd04 00003006 000696da 00001000 000696da
c012c0a6 00003006 000696da 00001000 000696da 000696da cf004188 cf004188
c012c0a6 c0143ecd 00003006 000696da 00001000 00000000 000696da cf004188
Call Trace: [get_hash_table+24/76] [getblk+30/324] [getblk+30/324] [ext2_alloc_block+109/344] [block_getblk+305/616] [ext2_getblk+139/164] [ext2_file_write+1296/1572] [do_generic_file_read+1119/1536] [do_generic_file_read+1524/1536] [smp_local_timer_interrupt+196/304] [sys_write+254/320] [ext2_file_write+0/1572] [system_call+52/56] [_stext+43/164]
Code: 8b 00 39 6a 04 75 15 8b 4c 24 20 39 4a 08 75 0c 66 39 7a 0c

Code: 00000000 Before first symbol 00000000 <_IP>: <===
Code: 00000000 Before first symbol 0: 8b 00 mov (%eax),%eax <===
Code: 00000002 Before first symbol 2: 39 6a 04 cmp %ebp,0x4(%edx)
Code: 00000005 Before first symbol 5: 75 15 jne 0000001c Before first symbol
Code: 00000007 Before first symbol 7: 8b 4c 24 20 mov 0x20(%esp,1),%ecx
Code: 0000000b Before first symbol b: 39 4a 08 cmp %ecx,0x8(%edx)
Code: 0000000e Before first symbol e: 75 0c jne 0000001c Before first symbol
Code: 00000010 Before first symbol 10: 66 39 7a 0c cmp %di,0xc(%edx)
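In the decoded code above, the faulting instruction is `mov (%eax),%eax`, and both eax and edx hold 00000100, the same value as the faulting virtual address. The comparisons that follow read offsets 4, 8 and 0xc from %edx against %ebp, a stack slot and %di. That pattern fits a hash-chain walk in which the current node pointer itself is garbage, so the first load through it faults. The C sketch below is a hypothetical, simplified model of such a walk, not the 2.2.19 `find_buffer` source: `bh_model` and `find_model` are illustrative names, and the field order (next, block, size, dev) is an assumption chosen to match the offsets in the decoded instructions on 32-bit x86.

```c
#include <stddef.h>

/* Hypothetical model of a buffer-cache hash-chain entry (not kernel code).
 * On 32-bit x86 these fields sit at offsets 0, 4, 8 and 0xc, matching the
 * offsets used by the decoded instructions (assumed layout). */
struct bh_model {
    struct bh_model *next;   /* offset 0:   next entry on the hash chain */
    unsigned long blocknr;   /* offset 4:   compared against %ebp        */
    unsigned long size;      /* offset 8:   compared against 0x20(%esp)  */
    unsigned short dev;      /* offset 0xc: compared against %di         */
};

/* Walk one hash chain looking for a (dev, block, size) match.
 * Loading tmp->next is the step that would fault if tmp held a
 * corrupted value such as 0x100 instead of a valid pointer. */
static struct bh_model *find_model(struct bh_model *head, unsigned short dev,
                                   unsigned long block, unsigned long size)
{
    struct bh_model *next = head;

    while (next != NULL) {
        struct bh_model *tmp = next;
        next = tmp->next;            /* cf. mov (%eax),%eax     */
        if (tmp->blocknr != block)   /* cf. cmp %ebp,0x4(%edx)  */
            continue;
        if (tmp->size != size)       /* cf. cmp %ecx,0x8(%edx)  */
            continue;
        if (tmp->dev != dev)         /* cf. cmp %di,0xc(%edx)   */
            continue;
        return tmp;                  /* found a matching entry  */
    }
    return NULL;                     /* end of chain, no match  */
}
```

Under this reading, 0x100 is the chain pointer itself rather than an offset from NULL. The i386 fault handler reports any fault in the lowest page as a "NULL pointer dereference", which is why the message says that even though the address is 00000100.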

[6.] A small shell script or example program which triggers the
problem (if possible)

N/A.

The machine runs fine for several days, then oopses.

[7.] Environment

[7.1.] Software (add the output of the ver_linux script here)

Linux deathstar.gkg-com.com 2.2.19 #1 SMP Fri Apr 27 10:49:53 CDT 2001 i686 unknown

Gnu C egcs-2.91.66
Gnu make 3.78.1
binutils 2.9.5.0.22
util-linux 2.10r
modutils 2.3.21
e2fsprogs 1.18
pcmcia-cs 3.1.8
Linux C Library 2.1.3
Dynamic linker (ldd) 2.1.3
Procps 2.0.6
Net-tools 1.54
Console-tools 0.3.3
Sh-utils 2.0
Modules Loaded 3c59x DAC960

[7.2.] Processor information (from /proc/cpuinfo):

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 7
model name : Pentium III (Katmai)
stepping : 3
cpu MHz : 501.145
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr xmm
bogomips : 999.42

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 7
model name : Pentium III (Katmai)
stepping : 3
cpu MHz : 501.145
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr xmm
bogomips : 999.42

[7.3.] Module information (from /proc/modules):

3c59x 22480 2 (autoclean)
DAC960 60848 3

[7.4.] SCSI information (from /proc/scsi/scsi)

***** DAC960 RAID Driver Version 2.2.10 of 1 February 2001 *****
Copyright 1998-2001 by Leonard N. Zubkoff <l...@dandelion.com>
Configuring Mylex DAC960PTL1 PCI RAID Controller
Firmware Version: 4.08-0-37, Channels: 1, Memory Size: 8MB
PCI Bus: 0, Device: 18, Function: 1, I/O Address: Unassigned
PCI Address: 0xFC8FE000 mapped at 0xF0810000, IRQ Channel: 18
Controller Queue Depth: 124, Maximum Blocks per Command: 128
Driver Queue Depth: 123, Scatter/Gather Limit: 33 of 33 Segments
Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
Physical Devices:
0:0 Vendor: SEAGATE Model: ST39175LW Revision: 0001
Serial Number: 3AL0NXK100007008KQ52
Disk Status: Online, 17782784 blocks
0:1 Vendor: SEAGATE Model: ST39175LW Revision: 0001
Serial Number: 3AL0P60T00007012R69K
Disk Status: Online, 17782784 blocks
0:2 Vendor: SEAGATE Model: ST39175LW Revision: 0001
Serial Number: 3AL0P4VE00007012R7QV
Disk Status: Online, 17782784 blocks
0:3 Vendor: SEAGATE Model: ST39175LW Revision: 0001
Serial Number: 3AL0P3V700001005HKUC
Disk Status: Standby, 17782784 blocks
0:4 Vendor: SEAGATE Model: ST39175LW Revision: 0001
Serial Number: 3AL0P3SW000070113RRF
Disk Status: Online, 17782784 blocks
0:5 Vendor: SEAGATE Model: ST39175LW Revision: 0001
Serial Number: 3AL0P1MV00007012RDGX
Disk Status: Online, 17782784 blocks
Logical Drives:
/dev/rd/c0d0: RAID-5, Online, 71131136 blocks, Write Thru
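The numbers above are consistent with a five-disk RAID-5 set plus one hot spare: five drives are Online, 0:3 is Standby, and one drive's worth of capacity goes to parity, so the logical drive should be four times 17782784, or 71131136 blocks, which is exactly what the controller reports. The C sketch below checks that arithmetic and also decodes the device number 00003006 that appears on the oops stack just after the return address c012bd04. Two assumptions are made here: that 00003006 is the device argument passed down from get_hash_table (2.2 kernels put the 8-bit major in the high byte and the 8-bit minor in the low byte), and that the DAC960 driver maps majors 48 to 55 to controllers 0 to 7 with eight minors per logical drive (the whole drive plus up to seven partitions). Under those assumptions the failing lookup was on /dev/rd/c0d0p6. The function names `raid5_blocks` and `dac960_name` are hypothetical.

```c
#include <stdio.h>

/* Simplest-case RAID-5 capacity: one member's worth of space holds parity,
 * so usable blocks = (members - 1) * blocks per disk. Standby (hot spare)
 * drives are not members. Returns 0 for fewer than three members. */
static unsigned long raid5_blocks(unsigned int online_members,
                                  unsigned long blocks_per_disk)
{
    if (online_members < 3)
        return 0;
    return (unsigned long)(online_members - 1) * blocks_per_disk;
}

/* Decode an assumed 2.2-style 16-bit device number into a DAC960 name.
 * Assumptions: major 48 + n is controller n, and each logical drive owns
 * 8 minors (0 = whole drive, 1..7 = partitions).
 * Returns snprintf's result, or -1 if the major is not a DAC960 major. */
static int dac960_name(unsigned int dev, char *buf, size_t len)
{
    unsigned int major = (dev >> 8) & 0xffu;
    unsigned int minor = dev & 0xffu;
    unsigned int ctlr, drive, part;

    if (major < 48 || major > 55)
        return -1;
    ctlr = major - 48;
    drive = minor >> 3;   /* 8 minors per logical drive */
    part = minor & 7u;    /* 0 means the whole drive    */
    if (part == 0)
        return snprintf(buf, len, "/dev/rd/c%ud%u", ctlr, drive);
    return snprintf(buf, len, "/dev/rd/c%ud%up%u", ctlr, drive, part);
}
```

For example, `raid5_blocks(5, 17782784UL)` gives 71131136, and `dac960_name(0x3006, buf, sizeof buf)` writes "/dev/rd/c0d0p6" into `buf`. Knowing the partition may help narrow down which filesystem the failing ext2 write was targeting.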

[7.5.] Other information that might be relevant to the problem
(please look in /proc and include all information that you
think to be relevant):

Other incidents where I reported an oops of this nature:

http://groups.google.com/groups?selm=linux.raid.Pine.LNX.4.10.10107161322570.12406-100000%40galaxy.gkg-com.com
http://groups.google.com/groups?selm=Pine.LNX.4.10.10106271124500.17066-100000%40galaxy.gkg-com.com


-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
