DriveReady SeekComplete Error

Rick Jansen

Jun 4, 2004, 3:56:10 AM

to linux-...@vger.kernel.org

Hi,

Is this drive fubar? It's installed brand-new, and after only three days
of operating it's giving me these errors from time to time.

May 30 07:08:18 web3 kernel: hda: dma_intr: status=0x51 { DriveReady
SeekComplete Error }
May 30 07:08:18 web3 kernel: hda: dma_intr: error=0x40 {
UncorrectableError }, LBAsect=227270012, sector=227270007
May 30 07:08:18 web3 kernel: end_request: I/O error, dev hda, sector
227270007

I could find some other people on the net with these problems, but none
of them happened with brand-new drives.

What can I do?

Rick Jansen

--
Looking for books? Try http://www.megabooksearch.com
The Linux on 64-Bit platforms Wiki: http://www.linux64.net
PGP Public Key: http://www.rockingstone.nl/rick/pubkey.asc

Daniel Egger

Jun 4, 2004, 5:05:16 AM

to Rick Jansen, linux-...@vger.kernel.org

On 04.06.2004, at 09:54, Rick Jansen wrote:

> Is this drive fubar? It's installed brand-new, and after only three
> days
> of operating it's giving me these errors from time to time.

Seems so. Try getting the SMART utils and have a look at the
output of smartctl -a <device>.

However, I fail to see why this is l-k related, so it seems off-topic
here.

Servus,
Daniel


John Bradford

Jun 4, 2004, 5:32:50 AM

to Daniel Egger, Rick Jansen, linux-...@vger.kernel.org

Quote from Daniel Egger <d...@axiros.com>:

> Seems so. Try getting the SMART utils and have a look at the
> output of smartctl -a <device>.
>
> However, I fail to see why this is l-k related, so it seems off-topic
> here.

1. I see no particular evidence to suggest that the drive is necessarily faulty.
2. It is on topic because it may well be a coding error in the kernel.

John.

John Bradford

Jun 4, 2004, 5:38:30 AM

to Rick Jansen, linux-...@vger.kernel.org

Quote from Rick Jansen <ri...@rockingstone.nl>:

> May 30 07:08:18 web3 kernel: hda: dma_intr: status=0x51 { DriveReady
> SeekComplete Error }
> May 30 07:08:18 web3 kernel: hda: dma_intr: error=0x40 {
> UncorrectableError }, LBAsect=227270012, sector=227270007
> May 30 07:08:18 web3 kernel: end_request: I/O error, dev hda, sector
> 227270007
>
> I could find some other people on the net with these problems, but none
> of them happened with brand-new drives.
>
> What can I do?

Please post more information. First, what size is the disk?

The LBAsect number suggests an access at around 108 GiB. If the disk is smaller
than this, then it would appear that a request was made for a nonexistent
sector.
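
The estimate above follows from multiplying the LBA by the sector size. A minimal sketch, assuming 512-byte sectors (the thread does not state the sector size) and using a hypothetical helper name:

```python
SECTOR_SIZE = 512  # assumption: bytes per sector, not stated in the thread


def lba_to_bytes(lba, sector_size=SECTOR_SIZE):
    """Return the byte offset at which sector `lba` starts."""
    return lba * sector_size


offset = lba_to_bytes(227270012)  # LBAsect value from the kernel log
print(f"{offset} bytes")
print(f"{offset / 2**30:.1f} GiB, {offset / 10**9:.1f} GB")
```

The two printed figures differ only in the unit: binary gibibytes versus decimal gigabytes.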

Is the LBAsect number the same in each error? What is the machine doing
when the errors occur?

John.

Rick Jansen

Jun 4, 2004, 5:57:59 AM

to linux-...@vger.kernel.org

On Fri, Jun 04, 2004 at 10:43:02AM +0100, John Bradford wrote:
> Please post more information. First, what size is the disk?
>
> The LBAsect number suggests an access at around 108 GiB. If the disk is smaller
> than this, then it would appear that a request was made for a nonexistent
> sector.
>
> Is the LBAsect number the same in each error? What is the machine doing
> when the errors occur?
>
> John.

Here's some more information about the disk from the boot log.
I also found some StatusErrors in there.

May 10 11:14:07 web3 kernel: hda: Maxtor 6Y120P0, ATA DISK drive
May 10 11:14:07 web3 kernel: hda: max request size: 128KiB
May 10 11:14:07 web3 kernel: hda: 240121728 sectors (122942 MB)
w/7936KiB Cache, CHS=65535/16/63, UDMA(133)
May 10 11:14:07 web3 kernel: hda: hda1 hda2 hda3 hda4 < hda5 hda6 >
May 10 11:14:07 web3 kernel: hda: task_no_data_intr: status=0x51 {
DriveReady SeekComplete Error }
May 10 11:14:07 web3 kernel: hda: task_no_data_intr: error=0x04 {
DriveStatusError }
May 10 11:14:07 web3 kernel: hda: Write Cache FAILED Flushing!

That's a different error than the one it gives me occasionally. Googling this
error led me to believe this is a bug in the IDE driver: my disk
doesn't support some flush command.

After parsing the log with a simple script, these sectors seem to give the
errors:

227270012
227270483
236708724
237757036
237757472
238018530
238020393
238279554
238804853
239066426
239328347
239590823
240116567
240121662
58619113
58619120
58619127
58619447
58619448
58619519
58620045
58620048
58620331

I've tried to reproduce these errors by rsyncing the whole filesystem,
executing various find / commands, but nothing triggered them. The
machine is primarily a web server, so my guess is they happen when
apache tries to do something to the disk.

The output from smartctl -a seems a bit large to include in this email.

Rick Jansen


Jens Axboe

Jun 4, 2004, 6:03:23 AM

to Rick Jansen, linux-...@vger.kernel.org

On Fri, Jun 04 2004, Rick Jansen wrote:
> On Fri, Jun 04, 2004 at 10:43:02AM +0100, John Bradford wrote:
> > Please post more information. First, what size is the disk?
> >
> > The LBAsect number suggests an access at around 108 GiB. If the disk is smaller
> > than this, then it would appear that a request was made for a nonexistent
> > sector.
> >
> > Is the LBAsect number the same in each error? What is the machine doing
> > when the errors occur?
> >
> > John.
>
> Here's some more information about the disk from the boot log.
> I also found some StatusErrors in there.
>
> May 10 11:14:07 web3 kernel: hda: Maxtor 6Y120P0, ATA DISK drive
> May 10 11:14:07 web3 kernel: hda: max request size: 128KiB
> May 10 11:14:07 web3 kernel: hda: 240121728 sectors (122942 MB)
> w/7936KiB Cache, CHS=65535/16/63, UDMA(133)
> May 10 11:14:07 web3 kernel: hda: hda1 hda2 hda3 hda4 < hda5 hda6 >
> May 10 11:14:07 web3 kernel: hda: task_no_data_intr: status=0x51 {
> DriveReady SeekComplete Error }
> May 10 11:14:07 web3 kernel: hda: task_no_data_intr: error=0x04 {
> DriveStatusError }
> May 10 11:14:07 web3 kernel: hda: Write Cache FAILED Flushing!
>
> That's a different error than the one it gives me occasionally. Googling this
> error led me to believe this is a bug in the IDE driver: my disk
> doesn't support some flush command.

It is, what kernel are you using?

--
Jens Axboe

Rick Jansen

Jun 4, 2004, 6:06:20 AM

to linux-...@vger.kernel.org

On Fri, Jun 04, 2004 at 11:59:00AM +0200, Jens Axboe wrote:
>
> It is, what kernel are you using?
>
> --
> Jens Axboe

This is 2.6.6.

Rick Jansen


Jens Axboe

Jun 4, 2004, 6:09:16 AM

to Rick Jansen, linux-...@vger.kernel.org

(don't trim people from the cc list, thanks)

On Fri, Jun 04 2004, Rick Jansen wrote:

> On Fri, Jun 04, 2004 at 11:59:00AM +0200, Jens Axboe wrote:
> >
> > It is, what kernel are you using?
> >
> > --
> > Jens Axboe
>
> This is 2.6.6.

Then that's a known error; you should not worry about it. It's fixed in
later kernels.

--
Jens Axboe

mattia

Jun 4, 2004, 6:17:10 AM

to linux-...@vger.kernel.org

I have the following error (kernel 2.6.6):

Jun 4 08:05:43 blink kernel: ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Jun 4 08:05:43 blink kernel: hdc: Maxtor 6Y160P0, ATA DISK drive
Jun 4 08:05:43 blink kernel: hdd: Maxtor 6Y120L0, ATA DISK drive
Jun 4 08:05:43 blink kernel: ide1 at 0x170-0x177,0x376 on irq 15
Jun 4 08:05:43 blink kernel: hda: max request size: 128KiB
Jun 4 08:05:43 blink kernel: hda: 78177792 sectors (40027 MB) w/1819KiB
Cache, CHS=65535/16/63, UDMA(100)
Jun 4 08:05:43 blink kernel: hda: hda1 hda2 hda3
Jun 4 08:05:43 blink kernel: hdc: max request size: 1024KiB
Jun 4 08:05:43 blink kernel: hdc: 320173056 sectors (163928 MB)
w/7936KiB Cache, CHS=19929/255/63, UDMA(100)
Jun 4 08:05:43 blink kernel: hdc: hdc1
Jun 4 08:05:43 blink kernel: hdd: max request size: 128KiB
Jun 4 08:05:43 blink kernel: hdd: 240121728 sectors (122942 MB)
w/2048KiB Cache, CHS=65535/16/63, UDMA(100)
Jun 4 08:05:43 blink kernel: hdd: hdd1 hdd2 hdd3
Jun 4 08:05:43 blink kernel: hdd: task_no_data_intr: status=0x51 {
DriveReady SeekComplete Error }
Jun 4 08:05:43 blink kernel: hdd: task_no_data_intr: error=0x04 {
DriveStatusError }
Jun 4 08:05:43 blink kernel: hdd: Write Cache FAILED Flushing!

I read somewhere that something is wrong with that Maxtor drive.
However, everything works fine.
Bye


Jens Axboe

Jun 4, 2004, 6:24:26 AM

to mattia, linux-...@vger.kernel.org

damnit, don't trim the cc list!

There's nothing wrong with the drive technically, it's just odd (lba48
without FLUSH_CACHE_EXT). It's really a linux ide bug that's fixed in
newer kernels. 2.6.7 will fix your problem.

Bartlomiej Zolnierkiewicz

Jun 4, 2004, 7:43:05 AM

to Jens Axboe, mattia, linux-...@vger.kernel.org, Andrew Morton

Wrong.

The bug is a combination of a very minor firmware quirk
and a lack of strict checking in the Linux IDE driver.

The FLUSH_CACHE_EXT bit is set but the command is not supported
(which is not a problem in itself, because LBA48 is not supported either).

It is fixed in 2.6.7-rc1 but your IDE barrier patch has this problem
(just reminding you that it is still not fixed in 2.6.7-rc2-mm2).

Cheers,
Bartlomiej

Jens Axboe

Jun 4, 2004, 7:59:20 AM

to Bartlomiej Zolnierkiewicz, mattia, linux-...@vger.kernel.org, Andrew Morton

Ah my bad, I didn't realize this bit was actually set correctly (you
mean (cfs_enable_2 & 0x2400) == 0x2400 is actually true?).

> It is fixed in 2.6.7-rc1 but your IDE barrier patch has this problem
> (just reminding you that it is still not fixed in 2.6.7-rc2-mm2).

So where's the bug? I don't see it...

--
Jens Axboe

Bartlomiej Zolnierkiewicz

Jun 4, 2004, 8:14:28 AM

to Jens Axboe, mattia, linux-...@vger.kernel.org, Andrew Morton

Yes but only on unaffected drives. ;-)

0x2000 is FLUSH_CACHE_EXT support
0x0400 is LBA48 support

Affected drives have 0x2000 set incorrectly, so we also have to check 0x0400.
(The comparison == 0x2400 is needed because & 0x2400 alone is true if either
0x2000 or 0x0400 is set.)

Everything is explained in the patch changelog:

| - many Maxtor disks incorrectly claim CACHE FLUSH EXT command support,
| fix it by checking both CACHE FLUSH EXT command and LBA48 support
| (thanks to Eric D. Mudama for help in fixing this)

and in the patch itself:

+/* some Maxtor disks have bit 13 defined incorrectly so check bit 10 too */
+#define ide_id_has_flush_cache_ext(id) \
+ (((id)->cfs_enable_2 & 0x2400) == 0x2400)
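
The difference between the two forms of the check can be seen with plain integers. A small sketch, written in Python rather than kernel C and using hypothetical function names; only the masks 0x2000 and 0x0400 and the two expressions come from the thread:

```python
FLUSH_CACHE_EXT = 0x2000  # bit 13 of cfs_enable_2
LBA48 = 0x0400            # bit 10 of cfs_enable_2


def loose_check(cfs_enable_2):
    """True if either bit is set: the plain `& 0x2400` test."""
    return (cfs_enable_2 & 0x2400) != 0


def strict_check(cfs_enable_2):
    """True only if both bits are set, like ide_id_has_flush_cache_ext()."""
    return (cfs_enable_2 & 0x2400) == 0x2400


affected = FLUSH_CACHE_EXT            # bit 13 set, LBA48 bit clear
unaffected = FLUSH_CACHE_EXT | LBA48  # both bits set

for name, word in (("affected", affected), ("unaffected", unaffected)):
    print(name, hex(word), loose_check(word), strict_check(word))
```

For the affected pattern the loose test passes while the strict one does not, which is the case the macro is meant to catch.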

> > It is fixed in 2.6.7-rc1 but your IDE barrier patch has this problem
> > (just reminding you that it is still not fixed in 2.6.7-rc2-mm2).
>
> So where's the bug? I don't see it...

ide_fill_flush_cmd()

+       if (drive->id->cfs_enable_2 & 0x2400)
+               rq->buffer[0] = WIN_FLUSH_CACHE_EXT;

Cheers,
Bartlomiej

Jens Axboe

Jun 4, 2004, 8:18:41 AM

to Bartlomiej Zolnierkiewicz, mattia, linux-...@vger.kernel.org, Andrew Morton

Ok, so I didn't miss anything. Was just wondering because of your
comment on -mm2.

> > > It is fixed in 2.6.7-rc1 but your IDE barrier patch has this problem
> > > (just reminding you that it is still not fixed in 2.6.7-rc2-mm2).
> >
> > So where's the bug? I don't see it...
>
> ide_fill_flush_cmd()
>
> + if (drive->id->cfs_enable_2 & 0x2400)
> + rq->buffer[0] = WIN_FLUSH_CACHE_EXT;

Just checked a fresh copy of 2.6.7-rc2-mm2, and it has it correctly:

static void ide_fill_flush_cmd(ide_drive_t *drive, struct request *rq)
{
        char *buf = rq->cmd;

        /*
         * reuse cdb space for ata command
         */
        memset(buf, 0, sizeof(rq->cmd));

        rq->flags |= REQ_DRIVE_TASK | REQ_STARTED;
        rq->buffer = buf;
        rq->buffer[0] = WIN_FLUSH_CACHE;

        if (ide_id_has_flush_cache_ext(drive->id))
                rq->buffer[0] = WIN_FLUSH_CACHE_EXT;
}

So that's why I didn't follow what you meant, there should be no problem
here. You are reading disk-barrier-ide.patch, barrier-update.patch is
applied on top of that.

So we are back to step 1: why is his drive complaining? I'm guessing it
doesn't have write-back caching enabled and aborts the flush on those
grounds. Ed, what is the output of hdparm -i on your booted system?

--
Jens Axboe

Bartlomiej Zolnierkiewicz

Jun 4, 2004, 8:40:16 AM

to Jens Axboe, mattia, linux-...@vger.kernel.org, Andrew Morton

Hehe, indeed, my bad.

> So we are back to step 1, why is his drive complaining. I'm guessing it

Yep.

> doesn't have write back caching enabled and aborts the flush on those
> grounds - Ed, what is the output of hdparm -i on your booted system?


Andy Hawkins

Jun 4, 2004, 9:12:08 AM

to linux-...@vger.kernel.org

Hi,

In article <2004060410...@suse.de>,

Jens Axboe<ax...@suse.de> wrote:
> Then that's a known error; you should not worry about it. It's fixed in
> later kernels.

I'm seeing this error too, and also frequent crashes (total lock ups) of the
machine (every day or two at the moment). Could the two be related?

I haven't got round to doing any diagnostics yet, but there's nothing
obvious in the logs. I was going to dismantle the machine and check
connections etc. first.

Andy

Daniel Egger

Jun 4, 2004, 9:42:46 AM

to Rick Jansen, linux-...@vger.kernel.org

On 04.06.2004, at 11:54, Rick Jansen wrote:

> The output from smartctl -a seems a bit large to include in this email.

This is usually also a bad sign, especially if the size is caused
by the Error Log Structure areas.

Please send the info of smartctl -v <device>. This should give a
good indication of whether this is a kernel or a drive problem
because it will show some of the internal status of the drive.

Servus,
Daniel


Rick Jansen

Jun 4, 2004, 9:54:53 AM

to Daniel Egger, linux-...@vger.kernel.org

On Fri, Jun 04, 2004 at 03:41:27PM +0200, Daniel Egger wrote:
> This is usually also a bad sign, especially if the size is caused
> by the Error Log Structure areas.
>
> Please send the info of smartctl -v <device>. This should give a
> good indication of whether this is a kernel or a drive problem
> because it will show some of the internal status of the drive.
>
> Servus,
> Daniel

I don't think -v is the option you mean:

smartctl version 5.30 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=======> INVALID ARGUMENT TO -v: /dev/hda <=======
=======> VALID ARGUMENTS ARE:
help
9,halfminutes
9,minutes
9,seconds
9,temp
192,emergencyretractcyclect
193,loadunload
194,10xCelsius
194,unknown
198,offlinescanuncsectorct
200,writeerrorcount
201,detectedtacount
220,temp
N,raw8
N,raw16
N,raw48
<=======

Use smartctl -h to get a usage summary

John Bradford

Jun 4, 2004, 9:58:47 AM

to Rick Jansen, Daniel Egger, linux-...@vger.kernel.org

> I don't think -v is the option you mean:

Send all the data.

John.

John Bradford

Jun 4, 2004, 10:03:07 AM

to Rick Jansen, linux-...@vger.kernel.org

Please don't trim the CC list.

Quote from Rick Jansen <ri...@rockingstone.nl>:

The lower ones are definitely within the capacity of the drive. I suspect
it _may_ genuinely be faulty after all.

Back up your data.

If practical, try overwriting the whole disk, with something like:

dd if=/dev/zero of=/dev/hda

and see if it makes the errors go away.

John.

Daniel Egger

Jun 4, 2004, 10:16:49 AM

to Rick Jansen, linux-...@vger.kernel.org

On 04.06.2004, at 15:53, Rick Jansen wrote:

> I don't think -v is the option you mean:

> smartctl version 5.30 Copyright (C) 2002-4 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/

DOH! Works for version 2.1 and will list all attributes
as well as their values.

Servus,
Daniel


Rick Jansen

Jun 4, 2004, 10:19:33 AM

to John Bradford, Daniel Egger, linux-...@vger.kernel.org

On Fri, Jun 04, 2004 at 03:05:20PM +0100, John Bradford wrote:
> Send all the data.
>
> John.

Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model: Maxtor 6Y120P0
Serial Number: Y43XXY5E
Firmware Version: YAR41BW0
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 0
Local Time is: Fri Jun 4 16:16:57 2004 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x80) Offline data collection activity was
never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 118) The previous self-test completed having
the read element of the test failed.
Total time to complete Offline
data collection: ( 182) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
No General Purpose Logging support.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 60) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
3 Spin_Up_Time 0x0027 252 252 063 Pre-fail Always - 1249
4 Start_Stop_Count 0x0032 253 253 000 Old_age Always - 6
5 Reallocated_Sector_Ct 0x0033 252 252 063 Pre-fail Always - 15
6 Read_Channel_Margin 0x0001 253 253 100 Pre-fail Offline - 0
7 Seek_Error_Rate 0x000a 253 252 000 Old_age Always - 0
8 Seek_Time_Performance 0x0027 249 244 187 Pre-fail Always - 41816
9 Power_On_Minutes 0x0032 251 251 000 Old_age Always - 908h+44m
10 Spin_Retry_Count 0x002b 252 252 157 Pre-fail Always - 0
11 Calibration_Retry_Count 0x002b 252 252 223 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 253 253 000 Old_age Always - 8
192 Power-Off_Retract_Count 0x0032 253 253 000 Old_age Always - 0
193 Load_Cycle_Count 0x0032 253 253 000 Old_age Always - 0
194 Temperature_Celsius 0x0032 253 253 000 Old_age Always - 38
195 Hardware_ECC_Recovered 0x000a 253 252 000 Old_age Always - 1632
196 Reallocated_Event_Count 0x0008 253 253 000 Old_age Offline - 0
197 Current_Pending_Sector 0x0008 252 252 000 Old_age Offline - 13
198 Offline_Uncorrectable 0x0008 252 252 000 Old_age Offline - 1
199 UDMA_CRC_Error_Count 0x0008 199 199 000 Old_age Offline - 0
200 Multi_Zone_Error_Rate 0x000a 253 252 000 Old_age Always - 0
201 Soft_Read_Error_Rate 0x000a 253 252 000 Old_age Always - 5
202 TA_Increase_Count 0x000a 253 252 000 Old_age Always - 0
203 Run_Out_Cancel 0x000b 253 252 180 Pre-fail Always - 0
204 Shock_Count_Write_Opern 0x000a 253 252 000 Old_age Always - 0
205 Shock_Rate_Write_Opern 0x000a 253 252 000 Old_age Always - 0
207 Spin_High_Current 0x002a 252 252 000 Old_age Always - 0
208 Spin_Buzz 0x002a 252 252 000 Old_age Always - 0
209 Offline_Seek_Performnce 0x0024 195 195 000 Old_age Offline - 0
99 Unknown_Attribute 0x0004 253 253 000 Old_age Offline - 0
100 Unknown_Attribute 0x0004 253 253 000 Old_age Offline - 0
101 Unknown_Attribute 0x0004 253 253 000 Old_age Offline - 0

SMART Error Log Version: 1
ATA Error Count: 440 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Timestamp = decimal seconds since the previous disk power-on.
Note: timestamp "wraps" after 2^32 msec = 49.710 days.

Error 440 occurred at disk power-on lifetime: 843 hours
When the command that caused the error occurred, the device was in an unknown state.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 03 77 dd 8b ed Error: UNC 3 sectors at LBA = 0x0d8bdd77 = 227270007

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
-- -- -- -- -- -- -- -- --------- --------------------
c8 00 08 77 dd 8b ed 08 2045545.712 READ DMA
ca 00 08 e8 a1 3c e0 08 2045545.680 WRITE DMA
ca 00 08 37 41 54 e2 08 2045545.680 WRITE DMA
ca 00 08 d0 ca 95 e0 08 2045545.680 WRITE DMA
ca 00 70 78 a1 3c e0 08 2045545.680 WRITE DMA

Error 439 occurred at disk power-on lifetime: 843 hours
When the command that caused the error occurred, the device was in an unknown state.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 03 77 dd 8b ed Error: UNC 3 sectors at LBA = 0x0d8bdd77 = 227270007

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
-- -- -- -- -- -- -- -- --------- --------------------
c8 00 08 77 dd 8b ed 08 2045544.688 READ DMA
c8 00 40 ff 0b c7 e2 08 2045544.656 READ DMA
c8 00 08 4f df 8b ed 08 2045543.632 READ DMA
ca 00 08 17 0f a8 e2 08 2045543.584 WRITE DMA
ca 00 08 ef ba 97 e2 08 2045543.584 WRITE DMA

Error 438 occurred at disk power-on lifetime: 843 hours
When the command that caused the error occurred, the device was in an unknown state.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 04 4f df 8b ed Error: UNC 4 sectors at LBA = 0x0d8bdf4f = 227270479

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
-- -- -- -- -- -- -- -- --------- --------------------
c8 00 08 4f df 8b ed 08 2045543.632 READ DMA
ca 00 08 17 0f a8 e2 08 2045543.584 WRITE DMA
ca 00 08 ef ba 97 e2 08 2045543.584 WRITE DMA
ca 00 10 d7 ba 97 e2 08 2045543.584 WRITE DMA
ca 00 10 8f ba 97 e2 08 2045543.584 WRITE DMA

Error 437 occurred at disk power-on lifetime: 843 hours
When the command that caused the error occurred, the device was in an unknown state.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 04 4f df 8b ed Error: UNC 4 sectors at LBA = 0x0d8bdf4f = 227270479

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
-- -- -- -- -- -- -- -- --------- --------------------
c8 00 08 4f df 8b ed 08 2045542.560 READ DMA
c8 00 08 8f 0b c7 e2 08 2045542.560 READ DMA
c8 00 50 af 0b c7 e2 08 2045542.560 READ DMA
c8 00 10 9f 0b c7 e2 08 2045542.560 READ DMA
c8 00 08 97 0b c7 e2 08 2045542.560 READ DMA

Error 436 occurred at disk power-on lifetime: 843 hours
When the command that caused the error occurred, the device was in an unknown state.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 03 77 dd 8b ed Error: UNC 3 sectors at LBA = 0x0d8bdd77 = 227270007

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
-- -- -- -- -- -- -- -- --------- --------------------
c8 00 08 77 dd 8b ed 08 2043137.264 READ DMA
c8 00 18 f7 36 7c e3 08 2043137.248 READ DMA
c8 00 10 3f 0f 7c e3 08 2043137.248 READ DMA
c8 00 10 57 d5 7b e3 08 2043137.248 READ DMA
c8 00 08 37 c7 7b e3 08 2043137.232 READ DMA

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 60% 839 0x0d8bdd7c
# 2 Short offline Completed: read failure 60% 816 0x0d8bdd7c
# 3 Short offline Completed: read failure 60% 805 0x0d8bdd7c

Bruce Allen

Jun 4, 2004, 3:34:48 PM

to Rick Jansen, John Bradford, Daniel Egger, linux-...@vger.kernel.org

Hi Rick,

> smartctl version 5.30 Copyright (C) 2002-4 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/

Some comments below.

> Device Model: Maxtor 6Y120P0
> Serial Number: Y43XXY5E
> Firmware Version: YAR41BW0
> Device is: In smartctl database [for details use: -P show]
> ATA Version is: 7

You can use selective self-tests on this drive. You'll need smartmontools
version 5.31 or greater. This will help you pin down the bad LBAs
quickly.

> Self-test execution status: ( 118) The previous self-test completed having
> the read element of the test failed.

You have some unreadable disk sectors.

> Selective Self-test supported.

This is a very useful disk feature!

> 5 Reallocated_Sector_Ct 0x0033 252 252 063 Pre-fail Always - 15

Your disk has already reallocated 15 bad sectors.

> 197 Current_Pending_Sector 0x0008 252 252 000 Old_age Offline - 13
> 198 Offline_Uncorrectable 0x0008 252 252 000 Old_age Offline - 1

There are 13 unreadable disk sectors that the OS has tried to access, and
one additional unreadable disk sector found in an off-line scan.
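
The counts read off above are the last column (RAW_VALUE) of each attribute row. A minimal sketch of pulling them out, assuming whitespace-separated columns as in the quoted smartctl output; the function name is hypothetical, and raw values that are not plain integers (such as 908h+44m for attribute 9) would need separate handling:

```python
def parse_attribute(line):
    """Split one attribute row into (id, name, raw_value as text)."""
    fields = line.split()
    return int(fields[0]), fields[1], fields[-1]


rows = [
    "5 Reallocated_Sector_Ct 0x0033 252 252 063 Pre-fail Always - 15",
    "197 Current_Pending_Sector 0x0008 252 252 000 Old_age Offline - 13",
    "198 Offline_Uncorrectable 0x0008 252 252 000 Old_age Offline - 1",
]

raw = {}
for row in rows:
    attr_id, name, raw_value = parse_attribute(row)
    raw[attr_id] = int(raw_value)  # these three rows carry integer raw values
    print(f"{attr_id:>3} {name:<24} raw={raw_value}")
```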

> Error 440 occurred at disk power-on lifetime: 843 hours
> When the command that caused the error occurred, the device was in an unknown state.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 03 77 dd 8b ed Error: UNC 3 sectors at LBA = 0x0d8bdd77 = 227270007

This is a typical read that failed at LBA 227270007.
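
The LBA in that line can be rebuilt from the register dump next to it. A short sketch, assuming 28-bit LBA addressing (the low four bits of DH hold the top of the address, then CH, CL and SN follow); the function name is hypothetical:

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine the task-file registers into a 28-bit LBA."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register values from error 440: SN=77 CL=dd CH=8b DH=ed
lba = lba28_from_registers(sn=0x77, cl=0xDD, ch=0x8B, dh=0xED)
print(hex(lba), lba)
```

The result matches the value smartctl prints (0x0d8bdd77 = 227270007), which is also the sector number in the kernel's end_request message.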

> SMART Self-test log structure revision number 1
> Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
> # 1 Short offline Completed: read failure 60% 839 0x0d8bdd7c
> # 2 Short offline Completed: read failure 60% 816 0x0d8bdd7c
> # 3 Short offline Completed: read failure 60% 805 0x0d8bdd7c

See http://smartmontools.sourceforge.net/BadBlockHowTo.txt for info on how
to locate and force reallocation of the bad sectors. When you are done
you should be able to run a long self-test (-t long) with no errors found.
You can use selective self-tests (-t select,M-N) to help locate the bad
sectors.
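
As an illustration of the selective self-test option mentioned above, the sketch below builds a command line for a span around the first failing LBA from the self-test log. The helper name and the window of 1000 sectors on each side are arbitrary assumptions; only the `-t select,M-N` form comes from this message, and the command is printed rather than executed.

```python
def selective_test_command(device, first_lba, last_lba):
    """Return the smartctl argument list for a selective self-test span."""
    if first_lba > last_lba:
        raise ValueError("first_lba must not exceed last_lba")
    return ["smartctl", "-t", f"select,{first_lba}-{last_lba}", device]


first_error = int("0x0d8bdd7c", 16)  # LBA_of_first_error from the self-test log
window = 1000                        # assumption: sectors to test on each side

cmd = selective_test_command("/dev/hda", first_error - window, first_error + window)
print(first_error)
print(" ".join(cmd))
```

Note that 0x0d8bdd7c is 227270012 in decimal, the same number the kernel reported as LBAsect.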

Bruce

Eric D. Mudama

Jun 4, 2004, 3:58:55 PM

to mattia, linux-...@vger.kernel.org

>Jun 4 08:05:43 blink kernel: hdd: task_no_data_intr: status=0x51 {
>DriveReady SeekComplete Error }
>Jun 4 08:05:43 blink kernel: hdd: task_no_data_intr: error=0x04 {
>DriveStatusError }
>Jun 4 08:05:43 blink kernel: hdd: Write Cache FAILED Flushing!

That is a known issue in older driver versions that should be resolved
now. It only affects our latest generation of drives that are <=
120GB, it will not affect the larger drives (>= 160GB), and it won't
affect any drives of the next product generation because I fixed the
root cause in the drive as well as helping identify a driver
workaround/fix.

error=0x04 is an "abort" and not a critical error.

The original note had error=0x40, which is an Uncorrectable ECC
error... that is bad, and you should probably RMA the drive.
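
The register values in the log lines are bit masks, so the names in braces can be reproduced by testing each bit. A minimal sketch that only names the bits whose meaning appears in this thread; any other set bit is printed as a bare hex value rather than guessed at, and the helper name is hypothetical:

```python
# Only bits whose names appear in the quoted kernel messages are listed.
STATUS_NAMES = {0x40: "DriveReady", 0x10: "SeekComplete", 0x01: "Error"}
ERROR_NAMES = {0x40: "UncorrectableError", 0x04: "DriveStatusError"}


def decode(value, names):
    """List the names of the set bits in `value`, highest bit first."""
    parts = []
    for bit in (0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01):
        if value & bit:
            parts.append(names.get(bit, f"bit 0x{bit:02x}"))
    return parts


print("status=0x51", decode(0x51, STATUS_NAMES))
print("error=0x40 ", decode(0x40, ERROR_NAMES))
print("error=0x04 ", decode(0x04, ERROR_NAMES))
```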

You can also try to see if you can "fix" it by writing to that LBA
(obviously backup your data first) and see if the error goes
away... if that is the case, the ECC could have been due to a write
splice at power failure or some other transient event (extreme shock
or heat or something)
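
Writing to a single LBA, as suggested above, amounts to overwriting one sector at the matching byte offset. The sketch below shows the idea on a scratch file standing in for the disk; the function name and the 512-byte sector size are assumptions. Pointing it at a real device such as /dev/hda would need root and would destroy that sector's contents, and whether the drive then remaps the sector is up to its firmware.

```python
import os
import tempfile

SECTOR_SIZE = 512  # assumption: bytes per sector


def zero_sector(path, lba, sector_size=SECTOR_SIZE):
    """Overwrite one sector of `path` with zeros (destroys that sector's data)."""
    with open(path, "r+b") as dev:
        dev.seek(lba * sector_size)
        dev.write(b"\x00" * sector_size)
        dev.flush()
        os.fsync(dev.fileno())


# Demonstration on a four-sector scratch file, never on a real disk.
with tempfile.NamedTemporaryFile(delete=False) as scratch:
    scratch.write(b"\xff" * SECTOR_SIZE * 4)
    scratch_path = scratch.name

zero_sector(scratch_path, lba=2)
with open(scratch_path, "rb") as check:
    image = check.read()
os.remove(scratch_path)

print(image[2 * SECTOR_SIZE:3 * SECTOR_SIZE] == b"\x00" * SECTOR_SIZE)
```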

If there are a lot of ECC errors (you have them in about 2 places),
that could be a sign of bad things in progress; just RMA the drive.

--eric

--
Eric D. Mudama
edmu...@mail.bounceswoosh.org

Ed Tomlinson

Jun 4, 2004, 5:57:16 PM

to linux-...@vger.kernel.org, Jens Axboe, Bartlomiej Zolnierkiewicz, mattia, Andrew Morton

On June 4, 2004 08:17 am, Jens Axboe wrote:
> So we are back to step 1, why is his drive complaining. I'm guessing it
> doesn't have write back caching enabled and aborts the flush on those
> grounds - Ed, what is the output of hdparm -i on your booted system?

On 2.6.7-rc2 it's:

Ed

root@bert:/home/knoppix# hdparm -iI /dev/hda

/dev/hda:

Model=WDC AC26400R, FwRev=15.01J15, SerialNo=WD-WM6271600165
Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
RawCHS=13328/15/63, TrkSize=57600, SectSize=600, ECCbytes=40
BuffType=DualPortCache, BuffSize=512kB, MaxMultSect=16, MultSect=16
CurCHS=13328/15/63, CurSects=12594960, LBA=yes, LBAsects=12594960
IORDY=on/off, tPIO={min:160,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 *udma2 udma3 udma4
AdvancedPM=no WriteCache=enabled
Drive conforms to: device does not report version: 1 2 3 4

* signifies the current active mode

ATA device, with non-removable media
Model Number: WDC AC26400R
Serial Number: WD-WM6271600165
Firmware Revision: 15.01J15
Standards:
Supported: 4 3 2 1
Likely used: 4
Configuration:
Logical max current
cylinders 13328 13328
heads 15 15
sectors/track 63 63
--
bytes/track: 57600 bytes/sector: 600
CHS current addressable sectors: 12594960
LBA user addressable sectors: 12594960
device size with M = 1024*1024: 6149 MBytes
device size with M = 1000*1000: 6448 MBytes (6 GB)
Capabilities:
LBA, IORDY(can be disabled)
Buffer size: 512.0kB bytes avail on r/w long: 40 Queue depth: 1
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=160ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* READ BUFFER cmd
* WRITE BUFFER cmd
* Look-ahead
* Write cache
* Power Management feature set
* SMART feature set

mattia

Jun 5, 2004, 9:46:58 AM

to Eric D. Mudama, linux-...@vger.kernel.org

Eric D. Mudama wrote:

>> Jun 4 08:05:43 blink kernel: hdd: task_no_data_intr: status=0x51 {
>> DriveReady SeekComplete Error }
>> Jun 4 08:05:43 blink kernel: hdd: task_no_data_intr: error=0x04 {
>> DriveStatusError }
>> Jun 4 08:05:43 blink kernel: hdd: Write Cache FAILED Flushing!
>
>
> That is a known issue in older driver versions that should be resolved
> now. It only affects our latest generation of drives that are <=
> 120GB, it will not affect the larger drives (>= 160GB), and it won't
> affect any drives of the next product generation because I fixed the
> root cause in the drive as well as helping identify a driver
> workaround/fix.
>
> error=0x04 is an "abort" and not a critical error
>
> The original note had error=0x40, which is an Uncorrectable ECC
> error... that is bad, and you should probably RMA the drive.
>
> You can also try to see if you can "fix" it by writing to that LBA
> (obviously backup your data first) and see if the error goes
> away... if that is the case, the ECC could have been due to a write
> splice at power failure or some other transient event (extreme shock
> or heat or something)
>
> If there are a lot of ECC errors (you have them in about 2 places)
> which could be a sign of bad things in progress, just RMA the drive.
>
> --eric
>
>

I use that drive normally, and smartctl does not report errors.
That error was not displayed before. I don't remember exactly, but maybe that
was with the 2.6.5 kernel (is that possible?). I now run 2.6.6.

Bye