
Oops in 2.2.19 SMP while writing to dac960 raid5


Michael J Schout

Jul 16, 2001, 2:25:43 PM
I apologize if I have not sent this to the right place. If this belongs
elsewhere, please let me know.

-----------

[1.] One line summary of the problem:

Oops in 2.2.19 SMP while writing to disk using DAC960 raid5.

[2.] Full description of the problem/report:

The machine sometimes oopses while writing to the DAC960 RAID-5 drive.
After an oops, the disk cannot be written to until the machine is rebooted.
Perhaps the DAC960 driver is not SMP-safe?

[3.] Keywords (i.e., modules, networking, kernel):

dac960 SMP

[4.] Kernel version (from /proc/version):

Linux version 2.2.19 (ro...@deathstar.gkg-com.com) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #1 SMP Fri Apr 27 10:49:53 CDT 2001

[5.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/oops-tracing.txt)

WARNING: This version of ksymoops is obsolete.
WARNING: The current version can be obtained from ftp://ftp.<country>.kernel.org/pub/linux/utils/kernel/ksymoops
Options used: -V (specified)
-O (specified)
-k /proc/ksyms (specified)
-l /proc/modules (default)
-M (specified)
-c 1 (default)

Unable to handle kernel NULL pointer dereference at virtual address 00000134
current->tss.cr3 = 282be000, %%cr3 = 282be000
*pde = 00000000
Oops: 0002
CPU: 1
EIP: 0010:[remove_from_queues+188/344]
EFLAGS: 00010206
eax: 00000100 ebx: 00000002 ecx: df422e40 edx: efdb8a88
esi: df422e40 edi: 00000000 ebp: 00000000 esp: da411ea4
ds: 0018 es: 0018 ss: 0018
Process postmaster (pid: 30145, process nr: 34, stackpage=da411000)
Stack: df422e40 c012c2d5 df422e40 df422e40 00001000 c0142a4f df422e40 d273d124
cf13e1e0 d273d0d8 ffffffea df422e40 df422e40 df422e40 da411f04 02054000
00000000 00000000 00000000 effd6200 00000000 00002054 02054000 d273d0d8
Call Trace: [refile_buffer+89/160] [ext2_file_write+1011/1572] [do_generic_file_read+1119/1536] [do_generic_file_read+1524/1536] [generic_file_read+99/124] [sys_recv+30/36] [sys_write+254/320]
Code: 89 50 34 c7 01 00 00 00 00 89 02 c7 41 34 00 00 00 00 ff 0d
Warning: trailing garbage ignored on Code: line
Text: 'Code: 89 50 34 c7 01 00 00 00 00 89 02 c7 41 34 00 00 00 00 ff 0d '
Garbage: ' '

Code: 00000000 Before first symbol 00000000 <_IP>: <===
Code: 00000000 Before first symbol 0: 89 50 34 mov %edx,0x34(%eax) <===
Code: 00000003 Before first symbol 3: c7 01 00 00 00 00 movl $0x0,(%ecx)
Code: 00000009 Before first symbol 9: 89 02 mov %eax,(%edx)
Code: 0000000b Before first symbol b: c7 41 34 00 00 00 00 movl $0x0,0x34(%ecx)
Code: 00000012 Before first symbol 12: ff 0d 00 00 00 00 decl 0x0
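As a sanity check on the oops above, the faulting address can be recomputed from the register dump. The marked instruction `mov %edx,0x34(%eax)` stores to eax + 0x34, and with eax = 0x00000100 that is 0x134, exactly the reported virtual address. Note that eax is not itself NULL but a small bogus pointer; the "NULL pointer dereference" wording appears to come from the address lying in the first page (assumption: i386 Linux prints that message for fault addresses below PAGE_SIZE). The error code 0002 is decoded here with the standard x86 page-fault error-code bits, giving a write to a non-present page from kernel mode, which is consistent with the faulting instruction being a store. The helper names are illustrative only.

```python
PAGE_SIZE = 4096  # i386 page size


def decode_pf_error(code: int) -> dict:
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "protection_violation": bool(code & 0x1),  # 0 = page not present
        "write": bool(code & 0x2),                 # 0 = read access
        "user_mode": bool(code & 0x4),             # 0 = fault taken in kernel mode
    }


def effective_address(base: int, disp: int) -> int:
    """Address touched by an instruction like `mov %edx,disp(%base)` (32-bit wrap)."""
    return (base + disp) & 0xFFFFFFFF


# Values copied from the oops above.
eax = 0x00000100
fault_addr = effective_address(eax, 0x34)
err = decode_pf_error(0x0002)

print(hex(fault_addr))         # 0x134
print(fault_addr < PAGE_SIZE)  # True: falls in the first page
print(err)                     # {'protection_violation': False, 'write': True, 'user_mode': False}
```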


[6.] A small shell script or example program which triggers the
problem (if possible)

Not possible. The problem happens randomly after a few weeks of uptime.

[7.] Environment
[7.1.] Software (add the output of the ver_linux script here)

Linux deathstar.gkg-com.com 2.2.19 #1 SMP Fri Apr 27 10:49:53 CDT 2001 i686 unknown

Gnu C egcs-2.91.66
Gnu make 3.78.1
binutils 2.9.5.0.22
util-linux 2.10r
modutils 2.3.21
e2fsprogs 1.18
pcmcia-cs 3.1.8
Linux C Library 2.1.3
Dynamic linker (ldd) 2.1.3
Procps 2.0.6
Net-tools 1.54
Console-tools 0.3.3
Sh-utils 2.0
Modules Loaded 3c59x DAC960

[7.2.] Processor information (from /proc/cpuinfo):

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 7
model name : Pentium III (Katmai)
stepping : 3
cpu MHz : 501.144
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr xmm
bogomips : 999.42

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 7
model name : Pentium III (Katmai)
stepping : 3
cpu MHz : 501.144
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr xmm
bogomips : 999.42

[7.3.] Module information (from /proc/modules):

3c59x 22480 2 (autoclean)
DAC960 60848 3

[7.4.] SCSI information (from /proc/scsi/scsi)

none.

/proc/rd/c0/initial_status:

***** DAC960 RAID Driver Version 2.2.10 of 1 February 2001 *****
Copyright 1998-2001 by Leonard N. Zubkoff <l...@dandelion.com>
Configuring Mylex DAC960PTL1 PCI RAID Controller
  Firmware Version: 4.07-0-29, Channels: 1, Memory Size: 8MB
  PCI Bus: 0, Device: 18, Function: 1, I/O Address: Unassigned
  PCI Address: 0xFC8FE000 mapped at 0xF0810000, IRQ Channel: 18
  Controller Queue Depth: 124, Maximum Blocks per Command: 128
  Driver Queue Depth: 123, Scatter/Gather Limit: 33 of 33 Segments
  Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
  Physical Devices:
    0:0  Vendor: SEAGATE  Model: ST39175LW  Revision: 0001
         Serial Number: 3AL0NXK100007008KQ52
         Disk Status: Online, 17782784 blocks
    0:1  Vendor: SEAGATE  Model: ST39175LW  Revision: 0001
         Serial Number: 3AL0P60T00007012R69K
         Disk Status: Online, 17782784 blocks
    0:2  Vendor: SEAGATE  Model: ST39175LW  Revision: 0001
         Serial Number: 3AL0P4VE00007012R7QV
         Disk Status: Online, 17782784 blocks
    0:3  Vendor: SEAGATE  Model: ST39175LW  Revision: 0001
         Serial Number: 3AL0P3V700001005HKUC
         Disk Status: Standby, 17782784 blocks
    0:4  Vendor: SEAGATE  Model: ST39175LW  Revision: 0001
         Serial Number: 3AL0P3SW000070113RRF
         Disk Status: Online, 17782784 blocks
    0:5  Vendor: SEAGATE  Model: ST39175LW  Revision: 0001
         Serial Number: 3AL0P1MV00007012RDGX
         Disk Status: Online, 17782784 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Online, 71131136 blocks, Write Thru
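The logical drive size is consistent with the physical layout: five drives are Online and one (0:3) is in Standby, which presumably makes it a hot spare rather than an array member. RAID-5 spends one disk's worth of capacity on parity, so the usable size should be (5 - 1) x 17782784 = 71131136 blocks, matching /dev/rd/c0d0. The hypothetical helper below checks this from the status text; the 512-byte block size used for the byte figure is an assumption, not something the driver output states.

```python
import re

# Excerpt of the "Disk Status" lines from /proc/rd/c0/initial_status above.
STATUS = """\
Disk Status: Online, 17782784 blocks
Disk Status: Online, 17782784 blocks
Disk Status: Online, 17782784 blocks
Disk Status: Standby, 17782784 blocks
Disk Status: Online, 17782784 blocks
Disk Status: Online, 17782784 blocks
"""

SECTOR_BYTES = 512  # assumption: DAC960 "blocks" are 512-byte sectors


def raid5_usable_blocks(status_text: str) -> int:
    """Usable RAID-5 capacity: (members - 1) * smallest member size.

    Only Online drives are counted as members; Standby drives are treated
    as hot spares (an assumption about this particular configuration).
    """
    sizes = [int(m.group(1))
             for m in re.finditer(r"Disk Status: Online, (\d+) blocks", status_text)]
    if len(sizes) < 3:
        raise ValueError("RAID-5 needs at least three member disks")
    return (len(sizes) - 1) * min(sizes)


usable = raid5_usable_blocks(STATUS)
print(usable)                                  # 71131136
print(round(usable * SECTOR_BYTES / 1e9, 1))   # 36.4 (GB, under the 512-byte assumption)
```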

[7.5.] Other information that might be relevant to the problem
(please look in /proc and include all information that you
think to be relevant):

/proc/ioports:
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0070-007f : rtc
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
02f8-02ff : serial(auto)
03c0-03df : vga+
03f8-03ff : serial(auto)
e880-e8ff : eth1
ec00-ec7f : eth0
ffa0-ffa7 : ide0
ffa8-ffaf : ide1

/proc/interrupts:
           CPU0       CPU1
  0:     299547     315761    IO-APIC-edge  timer
  1:          0          2    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 13:          1          0          XT-PIC  fpu
 16:          0          0   IO-APIC-level  eth1
 18:      42749      41506   IO-APIC-level  Mylex DAC960PTL1
 19:      36073      37190   IO-APIC-level  eth0
NMI:          0
ERR:          0
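To put numbers on the SMP question, a small sketch that parses the interrupt table (a hypothetical helper, assuming the two-CPU column layout shown) confirms that IRQ 18, the DAC960 controller, is serviced on both CPUs in roughly equal shares via the IO-APIC. This only shows that the driver's interrupt path runs on both processors; it says nothing about whether the driver is at fault.

```python
# Excerpt of the /proc/interrupts output above (columns: IRQ, CPU0, CPU1, type, device).
INTERRUPTS = """\
           CPU0       CPU1
  0:     299547     315761    IO-APIC-edge  timer
  1:          0          2    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 13:          1          0          XT-PIC  fpu
 16:          0          0   IO-APIC-level  eth1
 18:      42749      41506   IO-APIC-level  Mylex DAC960PTL1
 19:      36073      37190   IO-APIC-level  eth0
NMI:          0
ERR:          0
"""


def irq_counts(text: str, ncpus: int = 2) -> dict:
    """Map each device name to its list of per-CPU interrupt counts."""
    result = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # skip the CPU header row and the NMI/ERR summary lines
        fields = rest.split()
        counts = [int(x) for x in fields[:ncpus]]
        # fields[ncpus] is the interrupt type (e.g. IO-APIC-level); the rest is the device name.
        name = " ".join(fields[ncpus + 1:])
        result[name] = counts
    return result


counts = irq_counts(INTERRUPTS)
dac = counts["Mylex DAC960PTL1"]
print(dac, sum(dac))  # [42749, 41506] 84255
```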

[X.] Other notes, patches, fixes, workarounds:

Not much I can say. This has happened several times to me now (this is the 4th
time in the past 2 months). Every time the result is the same: the DAC960 disk
becomes unwritable. If I had to guess, I would say this is an SMP issue, but
I don't know how to even begin to track it down or fix it. If any additional
info is needed, let me know.

Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majo...@vger.kernel.org

Leonard N. Zubkoff

Jul 17, 2001, 12:49:22 PM
Date: Mon, 16 Jul 2001 13:25:43 -0500 (CDT)
From: Michael J Schout <msc...@gkg.net>

> Machine will oops sometimes while writing to dac960 raid5 drive.
> After oops, the disk cannot be written to until the machine is rebooted.
> Perhaps the dac960 driver is not SMP safe?

Well, if the DAC960 driver is not SMP-safe, there must be a *very* subtle bug,
as it is running in SMP configurations on quite a large number of systems,
including 4-processor systems.

Once the system stops being willing to write to the disk, will it still read?

Nothing in the OOPS backtrace indicates a fault directly within the driver.

> ***** DAC960 RAID Driver Version 2.2.10 of 1 February 2001 *****
> Copyright 1998-2001 by Leonard N. Zubkoff <l...@dandelion.com>
> Configuring Mylex DAC960PTL1 PCI RAID Controller
>   Firmware Version: 4.07-0-29, Channels: 1, Memory Size: 8MB

Please update your firmware to Mylex's latest version. The above version is
known to have problems, including the controller's firmware hanging.

Leonard
