Newsgroups: mailing.freebsd.bugs
From: r...@viaweb.com
Date: 1998/04/20
Subject: kern/6351: DPT RAID controller stops working under heavy load.

>Number:         6351
>Category:       kern
>Synopsis:       DPT RAID controller stops working under heavy load.
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Apr 19 12:10:01 PDT 1998
>Last-Modified:
>Originator:     Robert Morris
>Organization:
Viaweb
>Release:        2.2.6
>Environment:

FreeBSD 2.2.6-RELEASE #8: Sat Apr 18 12:08:03 EDT 1998
    r...@bab-el-ehr.viaweb.com:/c2/rtm/sys-2.2.6/compile/DPT
CPU: Pentium Pro (331.92-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x650  Stepping=0
  Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,<b16>,<b17>,MMX,<b24>>
real memory  = 134217728 (131072K bytes)
avail memory = 119980032 (117168K bytes)
DPT:  RAID Manager driver, Version 1.0.1
Probing for devices on PCI bus 0:
chip0 <generic PCI bridge (vendor=8086 device=7180 subclass=0)> rev 3 on pci0:0:0
chip1 <generic PCI bridge (vendor=8086 device=7181 subclass=4)> rev 3 on pci0:1:0
chip2 <Intel 82371AB PCI-ISA bridge> rev 1 on pci0:7:0
chip3 <Intel 82371AB IDE interface> rev 1 on pci0:7:1
chip4 <Intel 82371AB USB interface> rev 1 int d irq ?? on pci0:7:2
chip5 <Intel 82371AB Power management controller> rev 1 on pci0:7:3
vga0 <VGA-compatible display device> rev 0 int a irq 11 on pci0:15:0
fxp0 <Intel EtherExpress Pro 10/100B Ethernet> rev 4 int a irq 9 on pci0:16:0
fxp0: Ethernet address 00:a0:c9:b0:14:5c
DPT:  PCI SCSI HBA Driver, version 1.2.4
dpt0 <DPT Caching SCSI RAID Controller> rev 2 int a irq 10 on pci0:18:0
dpt0: DPT type 3, model PM3334UW firmware 07L0, Protocol 0
      on port ef90 with 458753MB Write-Back cache.  LED = 0000 0000
dpt0: Enabled Options:
      Verify Lost Transactions
      Precisely Track State Transitions
      Collect Metrics
      Handle Timeouts
(dpt0:0:0): "DPT RAID-5 07L0" type 0 fixed SCSI 2
sd0(dpt0:0:0): Direct-Access 34731MB (71130368 512 byte sectors)
(dpt0:6:0): "HP C1537A L708" type 1 removable SCSI 2
st0(dpt0:6:0): Sequential-Access density code 0x25, variable blocks, write-enabled
Probing for devices on PCI bus 1:
Probing for devices on the ISA bus:
sc0 at 0x60-0x6f irq 1 on motherboard
sc0: VGA color <16 virtual consoles, flags=0x0>
ed0 not found at 0x300
sio0 at 0x3f8-0x3ff irq 4 on isa
sio0: type 16550A
sio1 at 0x2f8-0x2ff irq 3 on isa
sio1: type 16550A
sio4 not found at 0x2f0
sio5 not found at 0x3e0
lpt0 at 0x378-0x37f irq 7 on isa
lpt0: Interrupt-driven port
lp0: TCP/IP capable interface
psm0 not found at 0x60
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
wdc0 at 0x1f0-0x1f7 irq 14 on isa
wdc0: unit 0 (wd0): <WDC AC22100H>
wd0: 2014MB (4124736 sectors), 4092 cyls, 16 heads, 63 S/T, 512 B/S
wdc1 not found at 0x170
bt0 not found at 0x330
ep0 not found at 0x300
npx0 flags 0x1 on motherboard
npx0: INT 16 interface
changing root device to wd0s1a

>Description:

I have a DPT PM3334UW with two buses and five drives (three
Seagate ST39173W and two Seagate ST19171W), all in a RAID-5 array.
Under heavy load, the DPT driver or the board stops completing
requests. The DPT Busy LED stays on permanently, and the Write LED
blinks once per second. Here is the dmesg output:

dpt0 ERROR: Marking 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0
            as late after 13159566usec
dpt0 ERROR: Marking 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0
            as late after 13158190usec
dpt0 ERROR: Marking 91997 (Test Unit Ready [7.24]) on c0b0t0u0
            as late after 13157239usec
dpt0 ERROR: Marking 92003 (Test Unit Ready [7.24]) on c0b0t0u0
            as late after 13109499usec
dpt0 ERROR: Stale 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (23159566)
                gets another chance(1/5)
dpt0 ERROR: Stale 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (33157546)
                gets another chance(1/5)
dpt0 ERROR: Stale 91997 (Test Unit Ready [7.24]) on c0b0t0u0 (43156142)
                gets another chance(1/5)
dpt0 ERROR: Stale 92003 (Test Unit Ready [7.24]) on c0b0t0u0 (53107731)
                gets another chance(1/5)
dpt0 ERROR: Marking 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0
            as late after 63159566usec
dpt0 ERROR: Marking 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0
            as late after 63158292usec
dpt0 ERROR: Marking 91997 (Test Unit Ready [7.24]) on c0b0t0u0
            as late after 63157343usec
dpt0 ERROR: Marking 92003 (Test Unit Ready [7.24]) on c0b0t0u0
            as late after 63109602usec
dpt0 ERROR: Stale 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (73159696)
                gets another chance(2/5)
dpt0 ERROR: Stale 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (83157548)
                gets another chance(2/5)
dpt0 ERROR: Stale 91997 (Test Unit Ready [7.24]) on c0b0t0u0 (93156014)
                gets another chance(2/5)
dpt0 ERROR: Stale 92003 (Test Unit Ready [7.24]) on c0b0t0u0 (103107728)
                gets another chance(2/5)
dpt0 ERROR: Marking 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0
            as late after 113159565usec
dpt0 ERROR: Marking 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0
            as late after 113158162usec
dpt0 ERROR: Marking 91997 (Test Unit Ready [7.24]) on c0b0t0u0
            as late after 113157214usec
dpt0 ERROR: Marking 92003 (Test Unit Ready [7.24]) on c0b0t0u0
            as late after 113109475usec
dpt0 ERROR: Stale 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (123159566)
                gets another chance(3/5)
dpt0 ERROR: Stale 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (133157554)
                gets another chance(3/5)
dpt0 ERROR: Stale 91997 (Test Unit Ready [7.24]) on c0b0t0u0 (143156017)
                gets another chance(3/5)
dpt0 ERROR: Stale 92003 (Test Unit Ready [7.24]) on c0b0t0u0 (153108177)
                gets another chance(3/5)
dpt0 ERROR: Marking 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0
            as late after 163159569usec
dpt0 ERROR: Marking 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0
            as late after 163158330usec
dpt0 ERROR: Marking 91997 (Test Unit Ready [7.24]) on c0b0t0u0
            as late after 163157380usec
dpt0 ERROR: Marking 92003 (Test Unit Ready [7.24]) on c0b0t0u0
            as late after 163109644usec
dpt0 ERROR: Stale 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (173159565)
                gets another chance(4/5)
dpt0 ERROR: Stale 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (183157548)
                gets another chance(4/5)
dpt0 ERROR: Stale 91997 (Test Unit Ready [7.24]) on c0b0t0u0 (193156018)
                gets another chance(4/5)
dpt0 ERROR: Stale 92003 (Test Unit Ready [7.24]) on c0b0t0u0 (203107728)
                gets another chance(4/5)
dpt0 ERROR: Marking 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0
            as late after 213159566usec
dpt0 ERROR: Marking 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0
            as late after 213158186usec
dpt0 ERROR: Marking 91997 (Test Unit Ready [7.24]) on c0b0t0u0
            as late after 213157231usec
dpt0 ERROR: Marking 92003 (Test Unit Ready [7.24]) on c0b0t0u0
            as late after 213109492usec
dpt0 ERROR: Stale 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (223159567)
                gets another chance(5/5)
dpt0 ERROR: Stale 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (233157547)
                gets another chance(5/5)
dpt0 ERROR: Stale 91997 (Test Unit Ready [7.24]) on c0b0t0u0 (243156016)
                gets another chance(5/5)
dpt0 ERROR: Stale 92003 (Test Unit Ready [7.24]) on c0b0t0u0 (253107729)
                gets another chance(5/5)
dpt0 ERROR: Marking 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0
            as late after 263159570usec
dpt0 ERROR: Marking 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0
            as late after 263158162usec
dpt0 ERROR: Marking 91997 (Test Unit Ready [7.24]) on c0b0t0u0
            as late after 263157216usec
dpt0 ERROR: Marking 92003 (Test Unit Ready [7.24]) on c0b0t0u0
            as late after 263109478usec
dpt0 ERROR: Stale 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (273159565)
                gets another chance(6/5)
dpt0 ERROR: Stale 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (283157548)
                gets another chance(6/5)
dpt0 ERROR: Stale 91997 (Test Unit Ready [7.24]) on c0b0t0u0 (293156014)
                gets another chance(6/5)
dpt0 ERROR: Stale 92003 (Test Unit Ready [7.24]) on c0b0t0u0 (303107729)
                gets another chance(6/5)
dpt0 ERROR: Marking 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0
            as late after 313159568usec
dpt0 ERROR: Marking 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0
            as late after 313158207usec
dpt0 ERROR: Marking 91997 (Test Unit Ready [7.24]) on c0b0t0u0
            as late after 313157257usec
dpt0 ERROR: Marking 92003 (Test Unit Ready [7.24]) on c0b0t0u0
            as late after 313109518usec
dpt0 ERROR: Destroying stale 91995 (Prevent/Allow Medium Removal [7.14])
                on c0b0t0u0 (323159567/7)
dpt0 ERROR: Destroying stale 91996 (Prevent/Allow Medium Removal [7.14])
                on c0b0t0u0 (333157547/7)
dpt0 ERROR: Destroying stale 91997 (Test Unit Ready [7.24])
                on c0b0t0u0 (343156017/7)
dpt0 ERROR: Destroying stale 92003 (Test Unit Ready [7.24])
                on c0b0t0u0 (353107733/7)

>How-To-Repeat:

The problem shows up within a few minutes if I run 39 processes
that read random blocks from the raw disk while, at the same time,
one process repeatedly truncates a file and writes 200 MB
to it.
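
For anyone trying to reproduce this, the workload above can be sketched roughly as follows. This is a minimal sketch, not the program used for this report: it is written in Python for brevity, and the script name dpt_stress.py, the 64 KB write size, and the reading of "200 MB" as 200 MiB are assumptions. The raw device and the target file are passed on the command line rather than guessed.

```python
import multiprocessing
import os
import random
import sys

BLOCK_SIZE = 512                 # sector size reported for sd0 in the boot log
NUM_READERS = 39                 # reader count from the report
WRITE_BYTES = 200 * 1024 * 1024  # "200 MB", assumed here to mean 200 MiB
CHUNK = 64 * 1024                # bytes per write() call: an arbitrary choice


def read_random_blocks(device, nblocks, count, block_size=BLOCK_SIZE):
    """Read `count` single blocks at random block-aligned offsets.

    Returns how many reads came back with a full block.
    """
    fd = os.open(device, os.O_RDONLY)
    try:
        full = 0
        for _ in range(count):
            offset = random.randrange(nblocks) * block_size
            if len(os.pread(fd, block_size, offset)) == block_size:
                full += 1
        return full
    finally:
        os.close(fd)


def rewrite_file(path, total_bytes=WRITE_BYTES, chunk=CHUNK, rounds=1):
    """Truncate `path` and write `total_bytes` into it, `rounds` times over."""
    buf = b"\xa5" * chunk
    for _ in range(rounds):
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            remaining = total_bytes
            while remaining > 0:
                remaining -= os.write(fd, buf[:min(chunk, remaining)])
        finally:
            os.close(fd)


def _reader_forever(device, nblocks):
    while True:
        read_random_blocks(device, nblocks, 1000)


def _writer_forever(path):
    while True:
        rewrite_file(path)


def run_workload(device, nblocks, target_file, readers=NUM_READERS):
    """Start `readers` random readers plus one rewriter; run until interrupted."""
    procs = [multiprocessing.Process(target=_reader_forever, args=(device, nblocks))
             for _ in range(readers)]
    procs.append(multiprocessing.Process(target=_writer_forever, args=(target_file,)))
    for p in procs:
        p.start()
    for p in procs:
        p.join()


if __name__ == "__main__":
    if len(sys.argv) != 4:
        sys.stderr.write("usage: dpt_stress.py RAW_DEVICE NUM_BLOCKS TARGET_FILE\n")
    else:
        run_workload(sys.argv[1], int(sys.argv[2]), sys.argv[3])
```

For the array in this report, NUM_BLOCKS would be 71130368, the sector count the boot log shows for sd0; RAW_DEVICE is the raw device node for sd0 on the test machine, and TARGET_FILE is the file being rewritten (the report does not say which filesystem it lived on). Each reader runs in its own process and issues synchronous reads, so up to 39 reads can be outstanding at the controller at once, alongside the writer's traffic.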
>Fix:

I don't know how to fix it. If I turn off Tagged Command Queuing
using the DPT's ^D boot ROM software, the problem takes longer to
show up (i.e., after 10 minutes of heavy load rather than just 1).

>Audit-Trail:
>Unformatted:


 