LibPATA code issues / 2.6.16 (previously, 2.6.15.x)

Justin Piszcz

Apr 21, 2006, 3:15:37 PM

to Jeff Garzik, Mark Lord, David Greaves, Tejun Heo, linux-...@vger.kernel.org, IDE/ATA development list, albe...@tw.ibm.com, ax...@suse.de, Linus Torvalds, smartmonto...@lists.sourceforge.net

Yet a new problem, under 2.6.16, when I fill up the disk, smartmontools
reports this:

Apr 21 14:24:20 p34 smartd[1443]: Device: /dev/sdc, 1 Currently unreadable (pending) sectors
Apr 21 14:54:20 p34 smartd[1443]: Device: /dev/sdc, 1 Currently unreadable (pending) sectors
Apr 21 14:54:20 p34 smartd[1443]: Device: /dev/sdc, 1 Offline uncorrectable sectors

What made it error under 2.6.16?

$ time dd if=/dev/zero of=file.out
dd: writing to `file.out': No space left on device
781118873+0 records in
781118872+0 records out
399932862464 bytes (400 GB) copied, 8873.06 seconds, 45.1 MB/s

real 147m53.092s
user 8m1.395s
sys 42m4.500s

$

Under 2.6.15.x I did not see this behavior. Is the disk going bad, or is it something else?

Thanks,

Justin.


Jeff Garzik

Apr 21, 2006, 3:19:33 PM

to Justin Piszcz, Mark Lord, David Greaves, Tejun Heo, linux-...@vger.kernel.org, IDE/ATA development list, albe...@tw.ibm.com, ax...@suse.de, Linus Torvalds, smartmonto...@lists.sourceforge.net

Justin Piszcz wrote:
> Yet a new problem, under 2.6.16, when I fill up the disk, smartmontools
> reports this:
>
> Apr 21 14:24:20 p34 smartd[1443]: Device: /dev/sdc, 1 Currently unreadable (pending) sectors
> Apr 21 14:54:20 p34 smartd[1443]: Device: /dev/sdc, 1 Currently unreadable (pending) sectors
> Apr 21 14:54:20 p34 smartd[1443]: Device: /dev/sdc, 1 Offline uncorrectable sectors
>
> What made it error under 2.6.16?
>
> $ time dd if=/dev/zero of=file.out
> dd: writing to `file.out': No space left on device
> 781118873+0 records in
> 781118872+0 records out
> 399932862464 bytes (400 GB) copied, 8873.06 seconds, 45.1 MB/s
>
> real 147m53.092s
> user 8m1.395s
> sys 42m4.500s
>
> $
>
> Under 2.6.15.x I did not see this behavior. Is the disk going bad, or is it something else?

That's a disk-level problem. You've got bad sectors.

You can force the disk to replace the bad sectors by doing a disk-level
write:

dd if=/dev/zero of=/dev/sda1 bs=4k

and then test the disk with

smartctl -d ata -t long /dev/sda

If sectors continue to die, the disk is toast.

Jeff

Linus Torvalds

Apr 21, 2006, 3:30:56 PM

to Jeff Garzik, Justin Piszcz, Mark Lord, David Greaves, Tejun Heo, linux-...@vger.kernel.org, IDE/ATA development list, albe...@tw.ibm.com, ax...@suse.de, smartmonto...@lists.sourceforge.net

On Fri, 21 Apr 2006, Jeff Garzik wrote:
>
> You can force the disk to replace the bad sectors by doing a disk-level write:
>
> dd if=/dev/zero of=/dev/sda1 bs=4k

NOTE! Obviously don't do this before you've backed up the disk. Depending
on the filesystem, you might just have overwritten something important, or
just your pr0n collection ;)

Jeff, please be a little more careful about telling people commands like
that. Some people might cut-and-paste the command without realizing what
it's doing as a way to "fix" their problem.

Linus

Jeff Garzik

Apr 21, 2006, 6:47:32 PM

to Linus Torvalds, Justin Piszcz, Mark Lord, David Greaves, Tejun Heo, linux-...@vger.kernel.org, IDE/ATA development list, albe...@tw.ibm.com, ax...@suse.de, smartmonto...@lists.sourceforge.net

Linus Torvalds wrote:
>
> On Fri, 21 Apr 2006, Jeff Garzik wrote:
>> You can force the disk to replace the bad sectors by doing a disk-level write:
>>
>> dd if=/dev/zero of=/dev/sda1 bs=4k
>
> NOTE! Obviously don't do this before you've backed up the disk. Depending
> on the filesystem, you might just have overwritten something important, or
> just your pr0n collection ;)
>
> Jeff, please be a little more careful about telling people commands like
> that. Some people might cut-and-paste the command without realizing what
> it's doing as a way to "fix" their problem.

Agreed, though the original poster had already done a 400GB dd from
/dev/zero...

Jeff

Linus Torvalds

Apr 21, 2006, 8:07:03 PM

to Jeff Garzik, Justin Piszcz, Mark Lord, David Greaves, Tejun Heo, linux-...@vger.kernel.org, IDE/ATA development list, albe...@tw.ibm.com, ax...@suse.de, smartmonto...@lists.sourceforge.net

On Fri, 21 Apr 2006, Jeff Garzik wrote:

>
> Agreed, though the original poster had already done a 400GB dd from
> /dev/zero...

Yes, but to a _file_ on the partition (i.e. he didn't overwrite any existing
data, just the empty parts of a filesystem).

I realize that it's not enough for the "re-allocate on write" behaviour,
and for that you really _do_ need to re-write the whole disk to get all
the broken blocks reallocated, but my argument was just that we should
make sure to _tell_ people when they are overwriting all their old data ;)

Linus

Leon Woestenberg

May 6, 2006, 11:09:44 AM

to Linus Torvalds, smartmonto...@lists.sourceforge.net, Jeff Garzik, Justin Piszcz, Mark Lord, David Greaves, Tejun Heo, linux-...@vger.kernel.org, IDE/ATA development list, albe...@tw.ibm.com, ax...@suse.de

Hi all,

On Fri, 2006-04-21 at 17:05 -0700, Linus Torvalds wrote:
>
> On Fri, 21 Apr 2006, Jeff Garzik wrote:
>
> >
> > Agreed, though the original poster had already done a 400GB dd from
> > /dev/zero...
>
> Yes, but to a _file_ on the partition (i.e. he didn't overwrite any existing
> data, just the empty parts of a filesystem).
>
> I realize that it's not enough for the "re-allocate on write" behaviour,
> and for that you really _do_ need to re-write the whole disk to get all
> the broken blocks reallocated, but my argument was just that we should
> make sure to _tell_ people when they are overwriting all their old data ;)
>

I did not realize this before, and asked badblocks maintainer Theodore
if badblocks /some/file was supported (the man page says no); but of
course any filesystem can decide to re-allocate blocks for a file.

However, for large files where parts may be bad sectors, I am still
searching for a way to read, then re-write every physical sector
occupied by the file.

The purpose is to remap the bad sectors inside large MPEG files (where
I would rather have a few zeroed holes than a read error in them).

Does anyone know whether such tooling exists? I suspect it has to use
filesystem-specific ioctls to query for the blocks involved.

Regards,

Leon

Ingo Oeser

May 7, 2006, 8:48:34 AM

to Leon Woestenberg, Linus Torvalds, smartmonto...@lists.sourceforge.net, Jeff Garzik, Justin Piszcz, Mark Lord, David Greaves, Tejun Heo, linux-...@vger.kernel.org, IDE/ATA development list, albe...@tw.ibm.com, ax...@suse.de

On Saturday, 6. May 2006 17:09, Leon Woestenberg wrote:
> However, for large files where parts may be bad sectors, I am still
> searching for a way to read, then re-write every physical sector
> occupied by the file.
>
> The purpose is to remap the bad sectors inside large MPEG files (where
> I would rather have a few zeroed holes than a read error in them).

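This is much easier to solve in the player software:

    do {
        ret = read(fd, buffer, size);
        if (ret > 0) {
            playbuffer(buffer, ret);
        } else if (ret < 0) {
            switch (errno) {
            case EIO:
                /* unreadable chunk: play a zeroed buffer instead */
                playbuffer(allzeroesbuffer, size);
                /* skip over this frame because of disk problems */
                lseek(fd, size, SEEK_CUR);
                /* TODO: handle the return value of lseek() here */
                break;
            case EINTR:
                break;      /* interrupted by a signal: just retry */
            default:
                ret = 0;    /* any other error: stop instead of looping forever */
                break;
            }
        }
    } while (ret != 0);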
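
For the "zeroed holes instead of read errors" idea outside a player, the same approach can be turned into a small copy routine. The following is only a sketch under some assumptions: `read_or_zero()` and `copy_with_holes()` are hypothetical names that do not come from this thread, and `pread()` with explicit offsets is used here so that nothing depends on where the file position sits after a failed `read()`. Whether a given drive and kernel actually report `EIO` for a bad sector, rather than stalling or retrying, is not something this code can guarantee.

```c
#define _XOPEN_SOURCE 700   /* for pread() under strict -std= modes */
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read `size` bytes at `offset` into `buf`.
 * If the kernel reports EIO for this chunk, hand back zeroes instead so
 * the caller can keep going. Returns the number of bytes delivered,
 * 0 at end of file, or -1 on any other error.
 */
static ssize_t read_or_zero(int fd, void *buf, size_t size, off_t offset)
{
    ssize_t ret;

    do {
        ret = pread(fd, buf, size, offset);
    } while (ret < 0 && errno == EINTR);    /* retry if interrupted */

    if (ret < 0 && errno == EIO) {
        memset(buf, 0, size);               /* substitute a zeroed chunk */
        return (ssize_t)size;
    }
    return ret;
}

/*
 * Hypothetical helper: copy the regular file `in` to `out` in chunks of
 * at most `chunk` bytes, replacing unreadable chunks with zeroes.
 * The size from fstat() bounds each request so a zero-filled chunk never
 * extends past the end of the input. Returns 0 on success, -1 on failure.
 */
static int copy_with_holes(int in, int out, size_t chunk)
{
    char buf[65536];
    struct stat st;
    off_t offset = 0;

    if (fstat(in, &st) < 0)
        return -1;
    if (chunk == 0 || chunk > sizeof(buf))
        chunk = sizeof(buf);

    while (offset < st.st_size) {
        off_t left = st.st_size - offset;
        size_t want = left < (off_t)chunk ? (size_t)left : chunk;
        ssize_t n = read_or_zero(in, buf, want, offset);
        ssize_t done = 0;

        if (n < 0)
            return -1;
        if (n == 0)
            break;                          /* file shrank underneath us */

        while (done < n) {                  /* handle short writes */
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += w;
        }
        offset += n;
    }
    return 0;
}
```

A smaller `chunk` (for example one filesystem block) loses less data around each bad spot, at the cost of more system calls. Note that this only produces a clean copy elsewhere; it does not by itself make the drive remap the bad sectors under the original file.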

> Does anyone know whether such tooling exists? I suspect it has to use
> filesystem-specific ioctls to query for the blocks involved.

The (somewhat) portable ioctl() FIBMAP would suffice.
That way you find out which blocks this file is mapped to,
and could add some of these blocks to the badblock list of e2fsck.
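
A rough sketch of what such a FIBMAP query could look like on Linux is below. The helper names (`blocks_for_size()`, `map_block()`, `print_block_map()`) are made up for illustration. FIBMAP and FIGETBSZ come from `<linux/fs.h>`; FIBMAP takes a logical block index in an `int` and overwrites it with the block number on the underlying device, in units of the filesystem block size. It normally needs root (CAP_SYS_RAWIO), and not every filesystem supports it, so the calls can fail.

```c
#include <fcntl.h>
#include <linux/fs.h>       /* FIBMAP, FIGETBSZ */
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Number of filesystem blocks needed to hold `size` bytes. */
static long blocks_for_size(long long size, int blksz)
{
    if (size <= 0 || blksz <= 0)
        return 0;
    return (long)((size + blksz - 1) / blksz);
}

/*
 * Map logical block `logical` of the open file `fd` to its on-device
 * block number. Returns 0 and stores the result in *physical, or -1 on
 * failure. A stored value of 0 means the block is a hole (unmapped).
 */
static int map_block(int fd, int logical, int *physical)
{
    int blk = logical;      /* FIBMAP reads and writes this int in place */

    if (ioctl(fd, FIBMAP, &blk) < 0)
        return -1;
    *physical = blk;
    return 0;
}

/*
 * Print "logical -> physical" for every block of `path`.
 * Returns 0 on success, -1 if the file cannot be opened or queried.
 */
static int print_block_map(const char *path)
{
    struct stat st;
    int blksz = 0;
    long n, i;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || ioctl(fd, FIGETBSZ, &blksz) < 0) {
        close(fd);
        return -1;
    }

    n = blocks_for_size((long long)st.st_size, blksz);
    for (i = 0; i < n; i++) {
        int phys;

        if (map_block(fd, (int)i, &phys) < 0) {
            close(fd);
            return -1;
        }
        printf("%ld -> %d\n", i, phys);
    }
    close(fd);
    return 0;
}
```

The printed block numbers are relative to the start of the block device holding the filesystem and are counted in filesystem blocks, so they would still need converting before being compared with sector numbers reported by the drive.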

Regards

Ingo Oeser

Justin Piszcz

Jun 11, 2006, 7:14:08 AM

to Linus Torvalds, Jeff Garzik, Mark Lord, David Greaves, Tejun Heo, linux-...@vger.kernel.org, IDE/ATA development list, albe...@tw.ibm.com, ax...@suse.de, smartmonto...@lists.sourceforge.net

[4597362.011000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4597362.011000] ata3: status=0x51 { DriveReady SeekComplete Error }
[4597362.011000] ata3: error=0x04 { DriveStatusError }
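
For readers decoding these lines by hand, here is a small sketch of how the status byte breaks down into the names shown in the log. The bit values are the standard ATA status register bits; the label strings are chosen to mirror the kernel output above, and `decode_status()` and `error_is_abort()` are hypothetical helper names, not kernel functions. In the error register, bit 0x04 is ABRT (command aborted), which is consistent with the SCSI sense key 0xb (ABORTED COMMAND) in the translated line. None of this says why the command was aborted.

```c
#include <stddef.h>
#include <stdio.h>

struct bit_name {
    unsigned char mask;
    const char *name;
};

/* ATA status register bits, highest bit first; labels mirror the log. */
static const struct bit_name status_bits[] = {
    { 0x80, "Busy" },
    { 0x40, "DriveReady" },
    { 0x20, "DeviceFault" },
    { 0x10, "SeekComplete" },
    { 0x08, "DataRequest" },
    { 0x04, "CorrectedError" },
    { 0x02, "Index" },
    { 0x01, "Error" },
};

/*
 * Write a space-separated list of the bits set in `status` into `out`
 * (at most `outlen` bytes including the terminating NUL) and return `out`.
 */
static char *decode_status(unsigned char status, char *out, size_t outlen)
{
    size_t used = 0;
    size_t i;

    if (outlen == 0)
        return out;
    out[0] = '\0';

    for (i = 0; i < sizeof(status_bits) / sizeof(status_bits[0]); i++) {
        int n;

        if (!(status & status_bits[i].mask))
            continue;
        n = snprintf(out + used, outlen - used, "%s%s",
                     used ? " " : "", status_bits[i].name);
        if (n < 0 || (size_t)n >= outlen - used)
            break;          /* output truncated: stop here */
        used += (size_t)n;
    }
    return out;
}

/* Nonzero if the ABRT (command aborted) bit is set in the error register. */
static int error_is_abort(unsigned char error)
{
    return (error & 0x04) != 0;
}
```

Calling `decode_status(0x51, buf, sizeof(buf))` yields the same three names the kernel printed for this status byte.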

Now under 2.6.16.20 (I was doing an rsync from one IDE drive to this SATA
drive).

As far as I know, the SATA drive does not have any issues (no bad sectors,
etc.). It is still the same drive as before, but this is the new one from
the previous RMA.

Just FYI.