repeating medium errors and error from "camcontrol defects"

Alexander Leidinger

Oct 26, 2001, 6:20:24 AM

Hi,

RELENG_4_4:

I get some errors:
(da0:ahc0:0:0:0): READ(10). CDB: 28 0 0 f1 2b 50 0 0 c 0
(da0:ahc0:0:0:0): MEDIUM ERROR info:f12b59 asc:11,0
(da0:ahc0:0:0:0): Unrecovered read error sks:80,29
(da0:ahc0:0:0:0): READ(10). CDB: 28 0 0 f1 2b 50 0 0 c 0
(da0:ahc0:0:0:0): MEDIUM ERROR info:f12b59 asc:11,0
(da0:ahc0:0:0:0): Unrecovered read error sks:80,29
(da0:ahc0:0:0:0): READ(10). CDB: 28 0 0 f1 2b 50 0 0 c 0
(da0:ahc0:0:0:0): MEDIUM ERROR info:f12b59 asc:11,0
(da0:ahc0:0:0:0): Unrecovered read error sks:80,29
(da0:ahc0:0:0:0): READ(10). CDB: 28 0 0 f1 2b 50 0 0 c 0
(da0:ahc0:0:0:0): MEDIUM ERROR info:f12b59 asc:11,0
(da0:ahc0:0:0:0): Unrecovered read error sks:80,29
(da0:ahc0:0:0:0): READ(10). CDB: 28 0 0 f1 2b 50 0 0 c 0
(da0:ahc0:0:0:0): MEDIUM ERROR info:f12b59 asc:11,0
(da0:ahc0:0:0:0): Unrecovered read error sks:80,29
(da0:ahc0:0:0:0): READ(10). CDB: 28 0 0 f1 2b 50 0 0 c 0
(da0:ahc0:0:0:0): MEDIUM ERROR info:f12b59 asc:11,0
(da0:ahc0:0:0:0): Unrecovered read error sks:80,29

They are from 9:44 to 11:20 in the morning.

I tried to issue "camcontrol defects da0 -f block -G", but I get
"camcontrol: Error returned from read defect data command".

da0 at ahc0 bus 0 target 0 lun 0
da0: <IBM DDRS-39130D DC1B> Fixed Direct Access SCSI-2 device
da0: 80.000MB/s transfers (40.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da0: 8715MB (17850000 512 byte sectors: 255H 63S/T 1111C)

What did I do wrong with camcontrol, and what can I do to get rid of those
errors? Do I have to replace the disk or the cable?

Bye,
Alexander.

--
...and that is how we know the Earth to be banana-shaped.

http://www.Leidinger.net Alexander @ Leidinger.net
GPG fingerprint = C518 BC70 E67F 143F BE91 3365 79E2 9C60 B006 3FE7

To Unsubscribe: send mail to majo...@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message

Gregor Bittel

Oct 26, 2001, 9:24:50 AM

Hi,

>I tried to issue "camcontrol defects da0 -f block -G", but I get

I get the same error, (4.1-Release), but you should try this
one:
camcontrol defects da0 -f phys -G
Note that I use "phys" instead of "block", hope it helps.

>What did I do wrong with camcontrol, and what can I do to get
>rid of those errors? Do I have to replace the disk or the cable?

First you should check the cable, then the termination, and
finally try another hard disk.

-Gregor.

Kenneth D. Merry

Oct 26, 2001, 1:11:37 PM

As someone else already suggested, you should try using the 'phys' defect
format, as many drives don't support the block format.

As for the errors, they mean you have a bad block. Replacing the cable
won't really affect the problem. You may want to look into getting a new
drive, though, since medium errors typically indicate the drive is on its
way out.
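
To see where the block number comes from, the logged CDB can be decoded by
hand. The sketch below assumes the standard 10-byte READ(10) layout (opcode
in byte 0, big-endian LBA in bytes 2-5, transfer length in bytes 7-8) and
treats the info: value from the log as the failing LBA. decode_cdb10 is a
made-up helper name, not part of camcontrol.

```shell
#!/bin/sh
# Decode a 10-byte READ(10)/WRITE(10) CDB given as space-separated hex bytes.
# Assumed layout: byte 0 = opcode, bytes 2-5 = LBA, bytes 7-8 = block count.
decode_cdb10() {
    set -- $1    # split the string so that $1..$10 are CDB bytes 0..9
    lba=$(( (0x$3 << 24) | (0x$4 << 16) | (0x$5 << 8) | 0x$6 ))
    len=$(( (0x$8 << 8) | 0x$9 ))
    printf 'opcode=0x%02x lba=%d len=%d\n' "0x$1" "$lba" "$len"
}

# The CDB from the kernel messages:
decode_cdb10 "28 0 0 f1 2b 50 0 0 c 0"

# Position of the failing block (info:f12b59) within that transfer:
echo $(( 0xf12b59 - 0xf12b50 ))
```

For the logged command this describes a 12-block read starting at block
15805264, with the failing block 0xf12b59 (15805273) nine blocks into the
transfer.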

To fix this particular bad block, what you should do is make sure read
and write reallocation are turned on in mode page 1:

camcontrol modepage da0 -m 1 -e -P 3

will allow you to edit the saved parameters and enable read and write
reallocation.

Then, write zeroes to the bad block to force the drive to remap it.

Joerg Wunsch posted this basic command in response to a similar question
last week:

dd if=/dev/zero of=/dev/da0 skip=0xf12b59 count=1

The caveat is that according to the man page, skip causes dd to skip on the
input, not the output. I would suggest looking at the dd(1) man page and
possibly using seek= instead.
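
A rehearsal of the seek= variant on a scratch file, as a sketch rather than
a tested recipe for da0. zero_block is a hypothetical helper. It converts
the block number with shell arithmetic, so a 0x-prefixed value works whether
or not dd itself accepts hex, and it passes bs=512 explicitly to match the
drive's 512-byte sectors.

```shell
#!/bin/sh
# Overwrite exactly one 512-byte block of TARGET with zeros.
# seek= positions the OUTPUT; skip= would only skip within /dev/zero.
# conv=notrunc keeps a regular file from being truncated after the block.
zero_block() {
    target=$1
    lba=$(( $2 ))    # accepts decimal or 0x-prefixed hex
    dd if=/dev/zero of="$target" bs=512 seek="$lba" count=1 conv=notrunc 2>/dev/null
}

# Rehearsal on a 16-block scratch file, not on a real disk:
scratch=$(mktemp)
dd if=/dev/urandom of="$scratch" bs=512 count=16 2>/dev/null
zero_block "$scratch" 0x9
```

On the real drive the call would be zero_block /dev/da0 0xf12b59, which
irrevocably overwrites that block.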

Alternatively, you can turn the read command into a write command and issue
it via camcontrol:

camcontrol cmd da0 -v -c "2a 0 0 f1 2b 59 0 v:i2 0" 1 -o 512 - < /dev/zero

That should write 512 bytes (1 block) of zeros to block 0xf12b59.
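
The CDB bytes in that command are the block number split into four
big-endian bytes after the WRITE(10) opcode 0x2a. A small sketch for
deriving them for a different block follows. write10_cdb is a hypothetical
helper, and it spells out the one-block transfer length as "0 1", where the
command above appears to let camcontrol fill that field in from v:i2 and the
trailing 1 (that reading of the syntax is an assumption).

```shell
#!/bin/sh
# Print the 10 CDB bytes of a one-block WRITE(10) for the given LBA.
write10_cdb() {
    lba=$(( $1 ))    # accepts decimal or 0x-prefixed hex
    printf '2a 0 %x %x %x %x 0 0 1 0\n' \
        $(( (lba >> 24) & 0xff )) $(( (lba >> 16) & 0xff )) \
        $(( (lba >> 8) & 0xff ))  $(( lba & 0xff ))
}

write10_cdb 0xf12b59
```

For 0xf12b59 the LBA bytes come out as 0 f1 2b 59, matching the command
above.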

Obviously all of this is somewhat "dangerous" if you make a mistake, and in
any case you're going to end up losing whatever data was in that 512 byte
block.

I normally use some variation of the camcontrol command above to force
block remapping. I've thought about implementing a more automatic way of
bad block remapping in camcontrol, but haven't really gotten around to it.

Ken
--
Kenneth Merry
k...@kdm.org

Alexander Leidinger

Oct 27, 2001, 12:57:26 PM

On 26 Oct, Kenneth D. Merry wrote:

(Please keep me in the CC.)

[...]

>> What did I do wrong with camcontrol, and what can I do to get rid of those
>> errors? Do I have to replace the disk or the cable?
>
> As someone else already suggested, you should try using the 'phys' defect
> format, as many drives don't support the block format.

Yes, this works.

> As for the errors, they mean you have a bad block. Replacing the cable
> won't really affect the problem. You may want to look into getting a new
> drive, though, since medium errors typically indicate the drive is on its
> way out.

It shows 285 defects with -P and 0 with -G. I thought I would have to
replace it only when the -G list grows too fast, not as soon as it shows
a single error.

> To fix this particular bad block, what you should do is make sure read
> and write reallocation are turned on in mode page 1:
>
> camcontrol modepage da0 -m 1 -e -P 3
>
> will allow you to edit the saved parameters and enable read and write
> reallocation.

Already turned on.

> Then, write zeroes to the bad block to force the drive to remap it.

Is there a way to determine which file is affected? I don't want to
just write zeros to the block without knowing which file this affects.
Maybe I can restore the file from a backup before or after overwriting
the block.

Bye,
Alexander.

--
The best things in life are free, but the
expensive ones are still worth a look.

http://www.Leidinger.net Alexander @ Leidinger.net
GPG fingerprint = C518 BC70 E67F 143F BE91 3365 79E2 9C60 B006 3FE7

Kenneth D. Merry

Oct 27, 2001, 6:01:18 PM

On Sat, Oct 27, 2001 at 18:59:14 +0200, Alexander Leidinger wrote:
> On 26 Oct, Kenneth D. Merry wrote:
>
> (Please keep me in the CC.)
>
> [...]

> >> What did I do wrong with camcontrol, and what can I do to get rid of those
> >> errors? Do I have to replace the disk or the cable?
> >
> > As someone else already suggested, you should try using the 'phys' defect
> > format, as many drives don't support the block format.
>

> Yes, this works.

>
> > As for the errors, they mean you have a bad block. Replacing the cable
> > won't really affect the problem. You may want to look into getting a new
> > drive, though, since medium errors typically indicate the drive is on its
> > way out.
>

> It shows 285 defects with -P and 0 with -G. I thought I would have to
> replace it only when the -G list grows too fast, not as soon as it shows
> a single error.

True enough. What probably happened in this case is that it wasn't able to
recover the error on read, and you haven't tried to write the file, so it
can't really do anything but report an error.

> > To fix this particular bad block, what you should do is make sure read
> > and write reallocation are turned on in mode page 1:
> >
> > camcontrol modepage da0 -m 1 -e -P 3
> >
> > will allow you to edit the saved parameters and enable read and write
> > reallocation.
>

> Already turned on.

Cool.

> > Then, write zeroes to the bad block to force the drive to remap it.
>

> Is there a way to determine which file is affected? I don't want to
> just write zeros to the block without knowing which file this affects.
> Maybe I can restore the file from a backup before or after overwriting
> the block.

I don't know how you'd do that. It's probably possible to dissect the
filesystem information to figure it out, but I don't know how to do that.

Cyrille Lefevre

Oct 27, 2001, 6:35:40 PM

Kenneth D. Merry wrote:
> On Fri, Oct 26, 2001 at 12:21:56 +0200, Alexander Leidinger wrote:

[snip]

> Then, write zeroes to the bad block to force the drive to remap it.

I'm not really sure this always works. The way I remap bad blocks is
to run a read check from the SCSI controller BIOS (TEKRAM and probably
ADAPTEC allow this at boot time). When errors are encountered, the BIOS
asks whether or not you want to remap the defective block. So I'm sure
that bad blocks are remapped, since it asks me to do something.

Cyrille.
--
Cyrille Lefevre mailto:clef...@citeweb.net

Kenneth D. Merry

Oct 28, 2001, 12:35:36 AM

On Sun, Oct 28, 2001 at 00:35:27 +0200, Cyrille Lefevre wrote:
> Kenneth D. Merry wrote:
> > On Fri, Oct 26, 2001 at 12:21:56 +0200, Alexander Leidinger wrote:

> [snip]

> > Then, write zeroes to the bad block to force the drive to remap it.
>

> I'm not really sure this always works. The way I remap bad blocks is
> to run a read check from the SCSI controller BIOS (TEKRAM and probably
> ADAPTEC allow this at boot time). When errors are encountered, the BIOS
> asks whether or not you want to remap the defective block. So I'm sure
> that bad blocks are remapped, since it asks me to do something.

That'll work as well, but it takes a while to do a verify on an entire
disk.

I think that in most cases, writing to the bad block, if you have AWRE
turned on, will cause the block to get remapped. That is, unless the
drive is somehow able to write to the block but can't read it.

You can detect that case by just trying to read the block once you've
written it. If you can read it, it has likely been remapped. (Unless the
drive can just "magically" read that block again.)
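
Reading the block back could look like the following sketch. read_block_ok
is a hypothetical helper that relies only on dd's exit status. Here skip= is
the right option, because it is the input (the disk) that has to be
positioned.

```shell
#!/bin/sh
# Try to read one 512-byte block from SOURCE; succeeds if dd reports no error.
read_block_ok() {
    dd if="$1" of=/dev/null bs=512 skip=$(( $2 )) count=1 2>/dev/null
}

# Intended use on the drive from this thread (left as a comment on purpose):
#   read_block_ok /dev/da0 0xf12b59 && echo "block is readable again"
```

A successful read only shows that the block is readable now. As noted, it
does not prove the drive actually remapped it.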

The "right" solution to this would probably be to implement something in
camcontrol similar to the BIOS verify routines that would then give the
user the option of remapping a bad block, writing zeroes to it, etc.

An adjunct to this command would be a separate camcontrol command to allow
the user to remap a specific block.

Alexander Leidinger

Oct 30, 2001, 8:43:16 AM

On 27 Oct, Kenneth D. Merry wrote:

>> > Then, write zeroes to the bad block to force the drive to remap it.
>>
>> Is there a way to determine which file is affected? I don't want to
>> just write zeros to the block without knowing which file this affects.
>> Maybe I can restore the file from a backup before or after overwriting
>> the block.
>
> I don't know how you'd do that. It's probably possible to dissect the
> filesystem information to figure it out, but I don't know how to do that.

I'm at least able to determine the partition which contains the bad
block: http://www.leidinger.net/FreeBSD/b2i.c

"b2i" means "block to inode", but this is misleading at the moment;
"b2p" (block to partition) or "bip" (block in partition) would describe
what b2i.c actually does.
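
The block-to-partition step amounts to a range check against the partition
table. The sketch below is not the code from b2i.c, only a minimal
illustration of the idea. The partition names, start sectors, and sizes are
invented, chosen only so that they add up to the 17850000 sectors da0
reports, and find_partition is a hypothetical helper.

```shell
#!/bin/sh
# Read "name start size" lines (sectors, absolute start) from stdin and print
# the partition containing the given block plus the block's offset inside it.
find_partition() {
    lba=$(( $1 ))    # accepts decimal or 0x-prefixed hex
    while read -r name start size; do
        if [ "$lba" -ge "$start" ] && [ "$lba" -lt $(( start + size )) ]; then
            echo "$name $(( lba - start ))"
            return 0
        fi
    done
    return 1         # block is outside every listed partition
}

# Invented layout for illustration only:
find_partition 0xf12b59 <<'EOF'
da0s1a 63 524288
da0s1e 524351 8388608
da0s1f 8912959 8937041
EOF
```

With this made-up table the block falls into the third partition, 6892314
sectors from its start. With the real disklabel values the same check
applies.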

At the moment I'm trying to squeeze some information (or pointers to docs)
about the on-disk layout of FFS out of f...@freebsd.org. bde suggested
just comparing the files on the partition with the backup, or just reading
every file on it to see which one produces errors (I'm going to do this
now), but I think a dedicated program which does the phys block ->
filename mapping needs fewer resources on a possibly heavily loaded system.
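
The read-every-file approach can be sketched as below. find_unreadable is a
hypothetical helper: it walks one filesystem (-xdev), reads each regular
file to the end with cat, and prints the names of files whose read fails. On
a disk with a bad block that list should include the affected file, but it
will also contain files that are unreadable for other reasons, such as
permissions.

```shell
#!/bin/sh
# Print every regular file under DIR that cannot be read completely.
find_unreadable() {
    find "$1" -xdev -type f -exec sh -c '
        for f do
            # cat fails with an I/O error if any block of the file is unreadable
            cat -- "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
        done' sh {} +
}

# Intended use (the mount point is an assumption):
#   find_unreadable /usr
```

This reads the whole partition once, so it is slow on a loaded system, and
it cannot see a bad block that lies in free space or in filesystem metadata.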

Bye,
Alexander.

--
It's not a bug, it's tradition!

http://www.Leidinger.net Alexander @ Leidinger.net
GPG fingerprint = C518 BC70 E67F 143F BE91 3365 79E2 9C60 B006 3FE7

Bernd Walter

Oct 30, 2001, 11:10:32 AM

On Tue, Oct 30, 2001 at 02:09:36PM +0100, Alexander Leidinger wrote:
> On 27 Oct, Kenneth D. Merry wrote:
> At the moment I'm trying to squeeze some information (or pointers to docs)
> about the on-disk layout of FFS out of f...@freebsd.org. bde suggested
> just comparing the files on the partition with the backup, or just reading
> every file on it to see which one produces errors (I'm going to do this
> now), but I think a dedicated program which does the phys block ->
> filename mapping needs fewer resources on a possibly heavily loaded system.

A directory entry (filename) points to an inode, which directly and
indirectly points to the logical blocks inside the partition.
There are no references for the way back.
So the only way to find out is to traverse the directories and inodes
until you have found the inode, and then to continue with the directories
until you have found all remaining directory entries.

--
B.Walter COSMO-Project http://www.cosmo-project.de
ti...@cicely.de Usergroup in...@cosmo-project.de

Alexander Leidinger

Oct 30, 2001, 11:29:31 AM

On 30 Oct, Bernd Walter wrote:

>> At the moment I'm trying to squeeze some information (or pointers to docs)
>> about the on-disk layout of FFS out of f...@freebsd.org. bde suggested
>> just comparing the files on the partition with the backup, or just reading
>> every file on it to see which one produces errors (I'm going to do this
>> now), but I think a dedicated program which does the phys block ->
>> filename mapping needs fewer resources on a possibly heavily loaded system.
>
> A directory entry (filename) points to an inode, which directly and
> indirectly points to the logical blocks inside the partition.
> There are no references for the way back.
> So the only way to find out is to traverse the directories and inodes
> until you have found the inode, and then to continue with the directories
> until you have found all remaining directory entries.

That's what I want to do.

Bye,
Alexander.

--
Reboot America.

http://www.Leidinger.net Alexander @ Leidinger.net
GPG fingerprint = C518 BC70 E67F 143F BE91 3365 79E2 9C60 B006 3FE7