Mysterious behavior of "reassign" for ST173404FC drive. Advice sought.

Andrei A. Dergatchev

Mar 12, 2001, 9:51:44 AM

Hello,

I'm struggling with remapping a few bad blocks on a drive
which was a bit damaged during transportation.

I'm using the "scu" utility from
http://www.bit-net.com/~rmiller/scu.html
- the only such utility for AlphaLinux AFAIK.

With debug mode on (so that I can see which commands
are being used) I see that during "reassign" a read is issued
first:

"28 00 06 88 67 fa 00 00 01 00"

and there is an error (MEDIUM ERROR, error code 5 -
error during I/O)

Afterwards, the reassign is issued:

"7 0 0 0 0 0" (not sure at the moment how many zeroes
exactly).
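
For reference, in SCSI-2 the REASSIGN BLOCKS CDB is the opcode 07 followed by five zero bytes, and the LBA to be reassigned travels in a separate data-out defect list, which is why it never shows up in the CDB itself. A minimal sketch, assuming the short (4-byte LBA) defect list format; `build_reassign_blocks` is a name made up for illustration and is not part of scu:

```python
import struct

REASSIGN_BLOCKS = 0x07  # SCSI-2 opcode for REASSIGN BLOCKS


def build_reassign_blocks(lbas):
    """Return (cdb, parameter_list) for a REASSIGN BLOCKS request."""
    # 6-byte CDB: the opcode followed by five zero bytes (reserved/control).
    cdb = bytes([REASSIGN_BLOCKS, 0, 0, 0, 0, 0])
    # Defect list header: 2 reserved bytes, then the list length in bytes.
    header = struct.pack(">HH", 0, 4 * len(lbas))
    # Each defect descriptor is one 4-byte big-endian LBA.
    descriptors = b"".join(struct.pack(">I", lba) for lba in lbas)
    return cdb, header + descriptors


cdb, params = build_reassign_blocks([0x068867FA])
print(cdb.hex(" "))     # 07 00 00 00 00 00
print(params.hex(" "))  # 00 00 00 04 06 88 67 fa
```

This only builds the bytes; actually sending them needs a pass-through interface such as the sg driver, which is left out here.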

I see that the reported result is ok - success.
However, the defect list is not changed at all!
At that point I can issue as many reads
and writes of this block as I wish
"28 00 06 88 67 fa 00 00 01 00"
"2a 00 06 88 67 fa 00 00 01 00"
however any verify command fails immediately with
the same error message as before, and everything
starts again.
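
The 10-byte CDBs quoted above can be decoded by hand: byte 0 is the opcode, bytes 2-5 are the big-endian LBA and bytes 7-8 the transfer length in blocks. A minimal sketch, assuming the standard SCSI-2 layout for READ(10), WRITE(10) and VERIFY(10); the function name is hypothetical:

```python
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)", 0x2F: "VERIFY(10)"}


def decode_cdb10(text):
    """Decode a 10-byte CDB given as space-separated hex bytes."""
    cdb = bytes(int(tok, 16) for tok in text.split())
    if len(cdb) != 10:
        raise ValueError("expected 10 bytes")
    return {
        "opcode": OPCODES.get(cdb[0], hex(cdb[0])),
        "lba": int.from_bytes(cdb[2:6], "big"),     # bytes 2..5
        "blocks": int.from_bytes(cdb[7:9], "big"),  # bytes 7..8
    }


print(decode_cdb10("28 00 06 88 67 fa 00 00 01 00"))
# {'opcode': 'READ(10)', 'lba': 109602810, 'blocks': 1}
```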

I had a few other bad blocks, and what I noticed is
that if the error message is "verify error" then
the defect list is updated successfully. However, if
a bad block causes a "direct read error" then
the drive firmware somehow does not perform
the subsequent reassign, or so it seems.
Changing the AWRE/ARRE bits does not affect
anything.

I wrote to Dr. Lagerweij asking if he could get other
firmware for such a drive, as he stated on his web page,
however I got
"Delivery failed 500 attempts: ba...@cts-bv.nl"

This is an OEM drive, so Seagate support declined to help me.

I'm a bit stuck, having 4 such special bad blocks
on a 73 Gig drive - I can use it, as all of them are beyond
55 Gig, however I don't feel very good about the drive's
mysterious inability to deal with them.

At the moment I believe that it is the drive's firmware
which is at fault - I have a second, similar drive sitting
in the same enclosure (with 2 power supplies) which
is ok, so I believe that the power is ok.

I shall be very grateful for any hints or help,
Sincerely yours,

Andrei

Bart Lagerweij

Mar 12, 2001, 10:56:37 AM

A.Derg...@tn.utwente.nl (Andrei A. Dergatchev) wrote in
<3aacdde6...@news.nic.utwente.nl>:

>I'm struggling with remapping of a few bad block with the drive
>which was a bit damaged during transportation.

[snip]

Is your drive out of warranty?

--
Bart Lagerweij - http://www.nu2.nu

Andrei A. Dergatchev

Mar 12, 2001, 11:59:00 AM

On 12 Mar 2001 15:56:37 GMT, bart@[NoSpam]cts-bv.nl (Bart Lagerweij)
wrote:

>A.Derg...@tn.utwente.nl (Andrei A. Dergatchev) wrote in
><3aacdde6...@news.nic.utwente.nl>:
>
>>I'm struggling with remapping of a few bad block with the drive
>>which was a bit damaged during transportation.
>
>[snip]
>
>Is your drive out of warranty?

I don't know at the moment - Seagate says it's an OEM drive.
Who that OEM is and how to contact them I have yet to find out.
As the drive reports itself as "ST173404 CLAR72" I would
guess it is Clariion (or, as they write it themselves, CLARiiON ?).

Andrei A. Dergatchev

Mar 12, 2001, 3:39:50 PM

>
>Now that you've changed the block size to 512, have you tried the host
>adapter's low-level format utility?

Yes :-)
This is working fine now: I did change the block size
to 512 bytes and have reformatted a few times already
("06 00 00 00 00 00", or however many zeroes there are,
was clearly visible, and it completed after ~2 hours).
>--
>«««««««««««««««««««««««»»»»»»»»»»»»»»»»»»»»»»»»»»»»
> Milton B. Hewitt
> CAUCE Member - http://www.cauce.org
> Proud supporter of the Microsoft Boycott Campaign
> http://www.vcnet.com/bms/
>«««««««««««««««««««««««»»»»»»»»»»»»»»»»»»»»»»»»»»»»

Regards,

Andrei

Andrei A. Dergatchev

Mar 12, 2001, 3:48:55 PM

[...]

In addition:

I'm searching deja/now_google and to my surprise I found that
bad block reassignment seems to be a much more involved
process than I anticipated. I'll be looking for more information.
I now wonder whether it is possible at all to tell the drive that
a particular lba is bad, or have drives become so advanced
now that they insist on doing only what they find suitable
themselves ??
It doesn't look attractive to me to have to jump
to the warranty papers any time a 73 Gig drive encounters
a bad block :-(

http://groups.google.com/groups?q=reassign+scsi&num=100&hl=en&lr=&safe=off&rnum=4&seld=932277752&ic=1
[by Folkert Rienstra]
"Nope. SCSI nor EIDE can/will reassign blocks with ECC errors on reads.
Both can/will reassign bad blocks on writes. Both will have finite numbers
of replacement sectors. SCSI and IDE drives match each other in these
departments."

http://groups.google.com/groups?q=reassign+scsi&num=100&hl=en&lr=&safe=off&rnum=8&seld=916403417&ic=1
[by Gerard Roudier]
"I suggest you to check how the 'Read-write recovery page' is set up on
your disk. You must ensure that the ARRE bit, at least, is set. But this
will not magically make your disk reassign the faulty block. In fact, the
disk will reassign the block only if it has been able to recover from the
error and the behaviour is governed by the other infos in the page. In no
case, the drive will decide by itself to reassign the block and to copy
corrupted data to the new block."

Folkert Rienstra

Mar 12, 2001, 6:42:20 PM

"Andrei A. Dergatchev" <A.Derg...@tn.utwente.nl> wrote in message news:3aad345b...@news.nic.utwente.nl...
: [...]

:
: In addition:
:
: I'm searching deja/now_google and to my surprise I found that
: bad block reassignment seems to be a lot more involved
: process that I anticipated. I'll be looking for more information.

: I now wonder is it possible at all to tell the drive that a particular lba
: is bad,

Yes, it is. The SCSI command is 'Reassign Block'.
SCSI Verify utilities use it.

: or are drives have become so advanced now that they insist on doing only

: what they find suitable themselves ??

You decide, by setting some parameters in the Error Recovery Modepage(s),
how the drive shall react.

: It doesn't look attractive to me to have to jump to warranty papers any

Andrei A. Dergatchev

Mar 13, 2001, 10:15:46 AM

Thanks for your response.

If I understood you correctly, what you're saying is
1st) I can change settings in the "error recovery" mode page
2nd) I need to issue "reassign block"

Now, sorry for being unclear, what I wrote in my first
letter is:
1) "reassign block" does not work,
because after issuing "reassign block" and seeing
success returned by the drive: a) the grown defects list
remains unchanged and b) a subsequent verify of the lba causes
the same error again at this very block.
2) this behavior does not depend on whether I switch
AWRE/ARRE, DCR, DTE, PER and EER on or off, in both
pages 0x1 and 0x7.
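
The bits named above all sit in byte 2 of mode page 0x01 (Read-Write Error Recovery). A minimal sketch of decoding and toggling them, assuming the SCSI-2 bit layout; the helper names are made up for illustration, and the page still has to be fetched and written back with MODE SENSE / MODE SELECT, which is not shown:

```python
# Byte 2 of mode page 0x01, bit 0 first (layout as given in SCSI-2).
FLAG_NAMES = ["DCR", "DTE", "PER", "EER", "RC", "TB", "ARRE", "AWRE"]


def decode_recovery_flags(byte2):
    """Map each flag name to True/False for one byte-2 value."""
    return {name: bool((byte2 >> bit) & 1) for bit, name in enumerate(FLAG_NAMES)}


def set_flag(byte2, name, on):
    """Return byte2 with the named flag switched on or off."""
    mask = 1 << FLAG_NAMES.index(name)
    return (byte2 | mask) if on else (byte2 & ~mask & 0xFF)


flags = decode_recovery_flags(0xC0)         # example value, not read from the drive
print([n for n, v in flags.items() if v])   # ['ARRE', 'AWRE']
print(hex(set_flag(0xC0, "ARRE", False)))   # 0x80
```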

When I wrote "Mysterious" I was referring to the fact
that after issuing reassign block I can issue a read
and a write to this particular lba and the drive will
report success.
So, to me it seems that "reassign lba" does not
work with this firmware if a direct read error
occurred right before it. Why that would be
the case I don't know.
I say this because I did see myself that reassign lba worked
successfully and updated the grown defects list - but only
if a verify error occurred, and not a direct read error.

I got all the SCSI books I could find in the University
library and I plan to write a short program following
their examples to see if sense codes can help me.
Unlikely, of course.

Regards,

Andrei

On Tue, 13 Mar 2001 00:42:20 +0100, "Folkert Rienstra"
<see....@freeler.nl> wrote:

>"Andrei A. Dergatchev" <A.Derg...@tn.utwente.nl> wrote in message
>news:3aad345b...@news.nic.utwente.nl...
>: [...]
>:
>: In addition:
>:
>: I'm searching deja/now_google and to my surprise I found that
>: bad block reassignment seems to be a lot more involved
>: process that I anticipated. I'll be looking for more information.
>
>: I now wonder is it possible at all to tell the drive that a particular lba
>: is bad,
>
>Yes, it is. The SCSI command is 'Reassign Block'.
>SCSI Verify utilities use it.
>
>: or are drives have become so advanced now that they insist on doing only
>: what they find suitable themselves ??
>
>You decide, by setting some parameters in the Error Recovery Modepage(s),
>how the drive shall react.
>
>: It doesn't look attractive to me to have to jump to warranty papers any
>: time a 73 Gig drive encountered a bad block :-(
>:
>: http://groups.google.com/groups?q=reassign+scsi&num=100&hl=en&lr=&safe=off&rnum=4&seld=932277752&ic=1
>: [by Folkert Rienstra]
>: "Nope. SCSI nor EIDE can/will reassign blocks with ECC errors on reads.
>: Both can/will reassign bad blocks on writes. Both will have finite numbers
>: of replacement sectors. SCSI and IDE drives match each other in these
>: departments."
>:
>: http://groups.google.com/groups?q=reassign+scsi&num=100&hl=en&lr=&safe=off&rnum=8&seld=916403417&ic=1
>: [by Gerard Roudier]
>: "I suggest you to check how the 'Read-write recovery page' is set up on
>: your disk. You must ensure that the ARRE bit, at least, is set. But this
>: will not magically make your disk reassign the faulty block. In fact, the
>: disk will reassign the block only if it has been able to recover from the
>: error and the behaviour is governed by the other infos in the page. In no
>: case, the drive will decide by itself to reassign the block and to
Folkert Rienstra

Mar 13, 2001, 5:16:47 PM

"Andrei A. Dergatchev" <A.Derg...@tn.utwente.nl> wrote in message news:3aae3679...@news.nic.utwente.nl...
: Thanks for your response.

:
: If I understood you correctly, what you're saying is
: 1st) I can change settings in mode page "error recovery"
: 2nd) I need to issue "reassign block"
:
: Now, sorry for being unclear, what I wrote in my first
: letter is:
: 1) "reassign block" does not work,
: because after issuing "reassign block" and seeing
: success received from the drive: a) grown defects list
: remain unchanged and b) subsequent verify lba causes
: the same error again at this very block.

Hmm, that is strange. What 'verify'?

: 2) this behavior does not depend if I switch on or off

: AWRE/ARRE, DCR, DTE, PER and EER both
: in pages 0x1 and 0x7.

Well, I wouldn't expect it to.

:
: When I wrote "Mysterious" it was related to the fact

: that after issuing reassign block I can issue read
: and write to this particular lba and the drive will
: report success.

Uhmm, is that 'read and write' or 'write and read'? Perhaps the reassign block marked it as a candidate instead of directly reassigning it. Then again it may have tried to recover the data, and on success decided not to reassign at all, leaving it alone.

I suggest that you consult the drive manual and check the descriptions of all the settings that have anything to do with bad block error processing.
Also check the Vendor Unique Parameters page for them. I noticed on an IBM drive that there are several settings there that affect drive behaviour. Also check the 'Reassign Block' description for that drive.
http://www.seagate.com/support/disc/manuals/fc/29482c.pdf

: So, for me it seems like "reassign lba" does not

: work with this firmware if a direct read error
: occurred right before that. Why it would be
: the case I don't know.

: Because, I did see myself that reassign lba worked
: successfully and updated grown defects list - but, only
: if verify error occurred, and not direct read error.

I'm sorry, you lost me here, not enough detail. What 'verify' exactly? What 'direct read error'? There is SCSI 'Verify' (2F) and there is SCSI 'Write and Verify' (2E).
ERP will be different between the 2.

:
: I got all SCSI books I could find in the University

: library and I plan to write a short program following
: examples to see if sense codes can help me.

IBM's codeupdt?

: Unlikely of course.

:
: Regards,
:
: Andrei
:
: On Tue, 13 Mar 2001 00:42:20 +0100, "Folkert Rienstra"
: <see....@freeler.nl> wrote:
:
: >
: >"Andrei A. Dergatchev" <A.Derg...@tn.utwente.nl> wrote in message =
: >news:3aad345b...@news.nic.utwente.nl...
: >: [...]

: >:
: >: In addition:
: >:
: >: I'm searching deja/now_google and to my surprise I found that

: >: bad block reassignment seems to be a lot more involved
: >: process that I anticipated. I'll be looking for more information.
: >

: >: I now wonder is it possible at all to tell the drive that a particular lba
: >: is bad,
: >
: >Yes, it is. The SCSI command is 'Reassign Block'.

: >SCSI Verify utilities use it.
: >

: >: or are drives have become so advanced now that they insist on doing only
: >: what they find suitable themselves ??
: >
: >You decide, by setting some parameters in the Error Recovery Modepage(s),

: >how the drive shall react.
: >

: >: It doesn't look attractive to me to have to jump to warranty papers any
: >: time a 73 Gig drive encountered a bad block :-(
: >:
: >:
: >: http://groups.google.com/groups?q=reassign+scsi&num=100&hl=en&lr=&safe=off&rnum=4&seld=932277752&ic=1
: >: [by Folkert Rienstra]
: >: "Nope. SCSI nor EIDE can/will reassign blocks with ECC errors on reads.
: >: Both can/will reassign bad blocks on writes. Both will have finite numbers

: >: of replacement sectors. SCSI and IDE drives match each other in these

: >: departments."
: >:
: >:
: >: http://groups.google.com/groups?q=reassign+scsi&num=100&hl=en&lr=&safe=off&rnum=8&seld=916403417&ic=1
: >: [by Gerard Roudier]
: >: "I suggest you to check how the 'Read-write recovery page' is set up on
: >: your disk. You must ensure that the ARRE bit, at least, is set. But this
: >: will not magically make your disk reassign the faulty block. In fact, the
: >: disk will reassign the block only if it has been able to recover from the
: >: error and the behaviour is governed by the other infos in the page. In no
: >: case, the drive will decide by itself to reassign the block and to copy

: >
: >
:

Folkert Rienstra

Mar 13, 2001, 5:18:26 PM

"Andrei A. Dergatchev" <A.Derg...@tn.utwente.nl> wrote in message news:3aacdde6...@news.nic.utwente.nl...
: Hello,

:
: I'm struggling with remapping of a few bad block with the drive
: which was a bit damaged during transportation.
:
: I'm using "scu" utility from
: http://www.bit-net.com/~rmiller/scu.html
: - the only utility for AlphaLinux AFAIK.
:
: With debug mode on (so that I can see which commands

: are being used) I see that during "reassign" a read is issued

: first:
:
: "28 00 06 88 67 fa 00 00 01 00"

Oops, that is not good.
If that finds a recoverable error the drive may already do a reassign on its own.

:
: and there is an error (MEDIUM ERROR, error code 5 -

: error during I/O)
:
: Afterwards, reassign is being issued.
:
: "7 0 0 0 0 0" (not sure at the moment how many zeroes
: exactly).

If the block was already reassigned on the first read its replacement now
gets replaced again.

:
: I see that reported result is ok - success.

: However, defect list is not changed at all !
: At that moment I can issue and check as many reads
: and writes of this block as I wish
: "28 00 06 88 67 fa 00 00 01 00"
: "2a 00 06 88 67 fa 00 00 01 00"

: however any verify command

What verify command?

:fails immediately with the same error message as before, and everything

Andrei A. Dergatchev

Mar 14, 2001, 4:56:35 PM

Hello,

Thanks a lot for your help and sorry for being unclear -
I didn't know there were 2 verify commands till today :-(

I'm including all the output I could get for you, hoping
that perhaps it will be a better illustration than my words.

I was thinking about it a bit more - what happens if this
particular small region contains thousands of bad blocks -
will their replacements be distributed across the whole drive, or is it
possible that some local pool of spare good blocks gets totally
used up and the drive can not reassign more bad blocks locally ?
Maybe I need to reformat the drive leaving more space
for bad blocks ? Is there such an option ?

And what is CAM - is it inside the hard drive or
somewhere in the driver ? I mean, am I perhaps seeing
a software glitch ?

>: Now, sorry for being unclear, what I wrote in my first
>: letter is:
>: 1) "reassign block" does not work,
>: because after issuing "reassign block" and seeing
>: success received from the drive: a) grown defects list
>: remain unchanged and b) subsequent verify lba causes
>: the same error again at this very block.
>
>Hmm, that is strange. What 'verify'?

2F

Verifying 1 block (109602809 - 109602809) on /dev/sg2 (ST173404 CLAR72), please
SCSI Cmd = 'verify data', CDB length = 10, CDB bytes: 2f 0 6 88 67 f9 0 0 1 0
Dumping SCSI Pass Through at 0x12021c680:

File Descriptor: 3
pack_length: 0
reply_len: 36
pack_id: 1271
result: 0
timeout: 921600 (900 seconds)
CDB Length: 10
CDB Bytes: 2f 0 6 88 67 f9 0 0 1 0

Dumping SCSI Pass Through at 0x12021c6c0:

File Descriptor: 3
pack_length: 36
reply_len: 36
pack_id: 1271
result: 0
sense_buffer: f0 0 3 6 88 67 f9 a 0 0 0 0 11 0 e4 0
SCSI Status: 2 (SCSI_STAT_CHECK_CONDITION)

'verify data' failed, CAM status = 0x84 (CCB request completed with an error)

Device/Command Information:

Device Name/Type: ST173404 CLAR72 (Direct Access)
Nexus Information: Bus 1, Target 0, Lun 0
Erroring Command: verify data
Command Descriptor Block: 2f 00 06 88 67 f9 00 00 01 00
Command Timeout: 900 seconds
CAM Status: 0x4 (CAM_REQ_CMP_ERR - CCB request completed with an error)
SCSI Status: 0x2 (SCSI_STAT_CHECK_CONDITION - Error, exception, or abnormal condition)

Request Sense Information:

Error Code: 0x70 (Current Error)
Valid Bit: 0x1 (Information field is valid)
Segment Number: 0
Sense Key: 0x3 (MEDIUM ERROR - Nonrecoverable med
Illegal Length: 0
End Of Media: 0
File Mark: 0
Information Field: 0x68867f9 (109602809)
Additional Sense Length: 10
Command Specific Information: 0
Additional Sense Code/Qualifier: (0x11, 0) = Unrecovered read error
Field Replaceable Unit Code: 0xe4
Sense Specific Bytes: 00 00 00

Completing 'verify data' command with status 5 (Input/output error).
scu: Verify error at logical block number 109602809 (0x68867f9).

Device/Command Information:

Device Name/Type: ST173404 CLAR72 (Direct Access)
Nexus Information: Bus 1, Target 0, Lun 0
Erroring Command: verify data
Command Descriptor Block: 2f 00 06 88 67 f9 00 00 01 00
Command Timeout: 900 seconds
CAM Status: 0x4 (CAM_REQ_CMP_ERR - CCB request completed with an error)

Request Sense Information:

Error Code: 0x70 (Current Error)
Valid Bit: 0x1 (Information field is valid)
Segment Number: 0
Sense Key: 0x3 (MEDIUM ERROR - Nonrecoverable medium error)
Illegal Length: 0
End Of Media: 0
File Mark: 0
Information Field: 0x68867f9 (109602809)
Additional Sense Length: 10
Command Specific Information: 0
Additional Sense Code/Qualifier: (0x11, 0) = Unrecovered read error
Field Replaceable Unit Code: 0xe4
Sense Specific Bytes: 00 00 00
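
The raw sense_buffer line in the dump carries the same information as the formatted report and can be decoded directly. A minimal sketch, assuming fixed-format sense data as laid out in SCSI-2 (response code 0x70); the function and key names are hypothetical:

```python
def decode_fixed_sense(text):
    """Decode fixed-format SCSI sense data given as space-separated hex bytes."""
    s = bytes(int(tok, 16) for tok in text.split())
    return {
        "valid": bool(s[0] & 0x80),        # information field is valid
        "response_code": s[0] & 0x7F,      # 0x70 = current error
        "sense_key": s[2] & 0x0F,          # 0x3 = MEDIUM ERROR
        "information": int.from_bytes(s[3:7], "big"),  # failing LBA here
        "additional_length": s[7],
        "asc": s[12],                      # additional sense code
        "ascq": s[13],                     # additional sense code qualifier
        "fru": s[14],                      # field replaceable unit code
    }


sense = decode_fixed_sense("f0 0 3 6 88 67 f9 a 0 0 0 0 11 0 e4 0")
print(sense["sense_key"], hex(sense["asc"]), sense["ascq"], sense["information"])
# 3 0x11 0 109602809
```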

---
Afterwards - I can *write* to this very block and I'm
getting success report:
---

Writing 1 block (109602809 - 109602809) on /dev/sg2 (ST173404 CLAR72) with patte
SCSI Cmd = 'direct write', CDB length = 10, CDB bytes: 2a 0 6 88 67 f9 0 0 1 0
Data sent for 'direct write' command:

0x12020c000 39 9c c3 39 39 9c c3 39 39 9c c3 39 39 9c c3 39 39 9c c3 39
[more addresses here with the same writing pattern]...

Dumping SCSI Pass Through at 0x12021de60:

File Descriptor: 3
pack_length: 0
reply_len: 36
pack_id: 1271
result: 0
timeout: 184320 (180 seconds)
CDB Length: 10
CDB Bytes: 2a 0 6 88 67 f9 0 0 1 0

Dumping SCSI Pass Through at 0x12021d600:
File Descriptor: 3
pack_length: 0
reply_len: 36
pack_id: 1271
result: 0
timeout: 184320 (180 seconds)
CDB Length: 10
CDB Bytes: 2a 0 6 88 67 f9 0 0 1 0

Dumping SCSI Pass Through at 0x12021d600:

File Descriptor: 3
reply_len: 36
pack_id: 1271
result: 0
sense_buffer: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
SCSI Status: 0 (SCSI_STAT_GOOD)
Host Status: 0 (DID_OK)
Driver Status: 0

'direct write' completed, CAM status = 0x1 (CCB request completed w/out error)

---
Now, I can even read it - again this very "bad" block:
---

Reading 1 block (109602809 - 109602809) on /dev/sg2 (ST173404 CLAR72) using patt
SCSI Cmd = 'direct read', CDB length = 10, CDB bytes: 28 0 6 88 67 f9 0 0 1 0

Dumping SCSI Pass Through at 0x12021d600:

File Descriptor: 3
pack_length: 0
reply_len: 548
pack_id: 1271
result: 0
timeout: 184320 (180 seconds)
CDB Length: 10
CDB Bytes: 28 0 6 88 67 f9 0 0 1 0

Dumping SCSI Pass Through at 0x12021de60:

File Descriptor: 3
pack_length: 548
reply_len: 548
pack_id: 1271
result: 0
sense_buffer: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
SCSI Status: 0 (SCSI_STAT_GOOD)
Driver Status: 0

'direct read' completed, CAM status = 0x1 (CCB request completed w/out error)
Data received for 'direct read' command:

0x12020c000 39 9c c3 39 39 9c c3 39 39 9c c3 39 39 9c c3 39 39 9c c3 39
[again more data here, read with the same pattern as just written]...

>: 2) this behavior does not depend if I switch on or off
>: AWRE/ARRE, DCR, DTE, PER and EER both
>: in pages 0x1 and 0x7.
>
>Well, I wouldn't expect it to.

Well, I hoped AWRE/ARRE would take care of that -
why wouldn't they ?
>
>:
>: When I wrote "Mysterious" it was related to the fact
>: that after issuing reassign block I can issue read
>: and write to this particular lba and the drive will
>: report success.
>
>Uhmm, is that 'read and write' or 'write and read'?

Doesn't matter - after failed verify 2F, I can write 2A
and read 28 as many times
as I want (see above). I just can't use 2F verify.

> Perhaps the reassign block marked it a candidate instead of directly
>reassigning it. Then again it may have tried to recover the data, and on
>success decided to not reassign at all, leaving it alone.

Is there any way for me to look up these details -
will the drive report it as a sense code or something ?

It looks pretty nice to me - the drive is writing
and reading what it just wrote nicely. Why verify is
failing I have no idea.
>
>I suggest that you consult the drive manual and check the descriptions
>of all the settings that have anything to do with bad block error
>processing.

:-( Already done. As I said, I tried quite a few different
configurations of error settings already :-(

>Also check the Vendor Unique Parameters page for them.

Is there a place to find out correct Vendor specific parameters,
except Seagate ? I don't remember seeing them in the manual :-(

> I noticed on an IBM drive that there are several settings there that
>affect drive behaviour. Also check the 'Reassign Block' description for
>that drive.
>http://www.seagate.com/support/disc/manuals/fc/29482c.pdf

Yes I got this manual already and read it. Unfortunately,
the description couldn't be any shorter :-(

>
>: Because, I did see myself that reassign lba worked
>: successfully and updated grown defects list - but, only
>: if verify error occurred, and not direct read error.
>

>I'm sorry, you lost me here, not enough detail. What 'verify' exactly?
>What 'direct read error'? There is SCSI 'Verify' (2F) and there is SCSI
>'Write and Verify' (2E).

I'm again sorry for the confusion, 2F Verify, as I wrote.

>ERP will be different between the 2.

Well, I tried with both settings of ERP and the results
are the same.
Grown defects list never gets updated :-((
>
>:
>: I got all SCSI books I could find in the University
>: library and I plan to write a short program following
>: examples to see if sense codes can help me.
>
>IBM's codeupdt?
>

Unfortunately, I'm running an Alpha, so x86 binaries
are not really an option :-(

Thank you very much for your help again,
Regards,

Andrei

Folkert Rienstra

Mar 14, 2001, 6:51:28 PM

My impression that this has something to do with the drive being in AV mode
is growing stronger and stronger.
This could mean that all error reporting for reads and/or writes has been disabled
and that any ERP that would delay reading and writing is not executed at all.
Perhaps that is why a Reassign Block is not executed either.

On the other hand I may be misreading you completely, in which case you can ignore this post.
I'll offer it anyway, as I invested too much time in it to throw it away.

"Andrei A. Dergatchev" <A.Derg...@tn.utwente.nl> wrote in message news:3aafe18f....@news.nic.utwente.nl...
: Hello,

:
: Thanks a lot for your help and sorry for being unclear -
: I didn't know there are 2 verify commands till today :-(

Well, there is only one. The other one is actually a write with
subsequent verify.
In 'verify utility' terms that is a destructive verify.

:
: I'm including all the output I could get for you, hoping

: that perhaps it will be a better illustration than my words.

Not really, too much ballast.

:
: I was thinking about it a bit more - what happens if this

: particular small region contains thousands of bad blocks -
: will they be distributed across the whole drive, or is it possible
: that some local place for spare good blocks gets totally
: used and the drive can not reassign more bad blocks locally ?

A fast drive will have them interspersed all over the drive.

: Maybe I need to reformat the drive leaving more space

: for bad blocks ? Is there such an option ?

You'll know when you get the specific command descriptions for
the drive.

:
: And what is CAM - is it inside the hard drive or

: in the driver somewhere -

Driver.

: probably I'm seeing a software glitch I mean ?

:
: >: Now, sorry for being unclear, what I wrote in my first
: >: letter is:
: >: 1) "reassign block" does not work,
: >: because after issuing "reassign block" and seeing
: >: success received from the drive: a) grown defects list
: >: remain unchanged and b) subsequent verify lba causes
: >: the same error again at this very block.
: >
: >Hmm, that is strange. What 'verify'?
:
: 2F
:
: Verifying 1 block (109602809 - 109602809) on /dev/sg2 (ST173404
: CLAR72), please

1)

: SCSI Cmd = 'verify data', CDB length = 10, CDB bytes: 2f 0 6 88 67 f9 0 0 1 0

: Dumping SCSI Pass Through at 0x12021c680:
:
: File Descriptor: 3
: pack_length: 0
: reply_len: 36
: pack_id: 1271
: result: 0
: timeout: 921600 (900 seconds)
: CDB Length: 10
: CDB Bytes: 2f 0 6 88 67 f9 0 0 1 0
:
: Dumping SCSI Pass Through at 0x12021c6c0:
:
: File Descriptor: 3
: pack_length: 36
: reply_len: 36
: pack_id: 1271
: result: 0
: sense_buffer: f0 0 3 6 88 67 f9 a 0 0 0 0 11 0 e4 0

: SCSI Status: 2 (SCSI_STAT_CHECK_CONDITION)
:
: 'verify data' failed, CAM status = 0x84 (CCB request completed with an error)

:

2)

: Device/Command Information:

:
: Device Name/Type: ST173404 CLAR72 (Direct Access)
: Nexus Information: Bus 1, Target 0, Lun 0
: Erroring Command: verify data
: Command Descriptor Block: 2f 00 06 88 67 f9 00 00 01 00
: Command Timeout: 900 seconds
: CAM Status: 0x4 (CAM_REQ_CMP_ERR - CCB request
: completed with an error)

: SCSI Status: 0x2 (SCSI_STAT_CHECK_CONDITION -

: Error, exception, or abnormal condition)
:
: Request Sense Information:
:
: Error Code: 0x70 (Current Error)
: Valid Bit: 0x1 (Information field is valid)
: Segment Number: 0
: Sense Key: 0x3 (MEDIUM ERROR - Nonrecoverable med
: Illegal Length: 0
: End Of Media: 0
: File Mark: 0
: Information Field: 0x68867f9 (109602809)
: Additional Sense Length: 10
: Command Specific Information: 0
: Additional Sense Code/Qualifier: (0x11, 0) = Unrecovered read error
: Field Replaceable Unit Code: 0xe4
: Sense Specific Bytes: 00 00 00
:
: Completing 'verify data' command with status 5 (Input/output error).
: scu: Verify error at logical block number 109602809 (0x68867f9).

:

3)

: Device/Command Information:

:
: Device Name/Type: ST173404 CLAR72 (Direct Access)
: Nexus Information: Bus 1, Target 0, Lun 0
: Erroring Command: verify data
: Command Descriptor Block: 2f 00 06 88 67 f9 00 00 01 00
: Command Timeout: 900 seconds
: CAM Status: 0x4 (CAM_REQ_CMP_ERR - CCB request
: completed with an error)
:
: Request Sense Information:
:
: Error Code: 0x70 (Current Error)
: Valid Bit: 0x1 (Information field is valid)
: Segment Number: 0

: Sense Key: 0x3 (MEDIUM ERROR - Nonrecoverable medium error)

: Illegal Length: 0
: End Of Media: 0
: File Mark: 0
: Information Field: 0x68867f9 (109602809)
: Additional Sense Length: 10
: Command Specific Information: 0
: Additional Sense Code/Qualifier: (0x11, 0) = Unrecovered read error
: Field Replaceable Unit Code: 0xe4
: Sense Specific Bytes: 00 00 00

:

I see the same info 3 times in slightly different format, don't know what to think of that.

It is 3 11 0 or Unrecovered Read Error, sub description: Data ECC Check

Another sub description is:
Unrecovered Verify Error with BytChk Option before ECC check.

That could have been the reason of what you are seeing if the BytChk option
had been on in the verify command. Alas, it was not.
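
Whether BytChk was set can be read straight from the CDB: in the SCSI-2 VERIFY(10) layout it is bit 1 of byte 1. A minimal sketch under that assumption; the function name is made up for illustration:

```python
def verify10_bytchk(text):
    """Return True if the BytChk bit is set in a VERIFY(10) CDB (hex string)."""
    cdb = bytes(int(tok, 16) for tok in text.split())
    if len(cdb) != 10 or cdb[0] != 0x2F:
        raise ValueError("not a VERIFY(10) CDB")
    return bool(cdb[1] & 0x02)  # BytChk is bit 1 of byte 1


print(verify10_bytchk("2f 0 6 88 67 f9 0 0 1 0"))  # False
```

With BytChk clear, the drive does a medium verification only and no data is compared against a host buffer.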

: Afterwards - I can *write* to this very block and I'm

Assuming you could not before? That is not the impression I got from you.

:
: Reading 1 block (109602809 - 109602809) on /dev/sg2 (ST173404 CLAR72)

AWRE/ARRE does nothing for the Reassign Block command.
They do something for Read and Write commands when the error is
recoverable (for reads). Then a "bad" block is replaced automatically.

: >
: >:
: >: When I wrote "Mysterious" it was related to the fact

: >: that after issuing reassign block I can issue read
: >: and write to this particular lba and the drive will
: >: report success.
: >
: >Uhmm, is that 'read and write' or 'write and read'?
:
: Doesn't matter - after failed verify 2F, I can write 2A
: and read 28 as many times

Well, it would have if a reassignment had taken place on the write.

: as I want (see above). I just can't use 2F verify.

Ok. This leads me to conclude that error reporting may be off for
reads and writes but not verifies.

:
: > Perhaps the reassign
: >block marked it a candidate instead of directly reassigning it. Then
: >again it may have tried to recover the data, and on succes decided to
: >not reassign at all, leaving it alone.

:
: Is there any way for me to look up these details -
: will the drive report it as a sense code or something ?

I don't think so.

:
: It looks pretty nice to me - the drive is writing

: and reading what it just wrote nicely. Why verify is
: failing I have no idea.

The sense error should supply that information.

: >
: >I suggest that you consult the drive manual and check the descriptions
: >of all the settings that have anything to do with bad block error
: >processing.

:
: :-( Already done. As I said, I tried quite a few different
: configurations of error settings already :-(
:
: >Also check the Vendor Unique Parameters page for them.
:
: Is there a place to find out correct Vendor specific parameters,
: except Seagate ? I don't remember seeing them in the manual :-(
:
: > I noticed on an

: >IBM drive that there are several settings there that affect drive
: >behaviour. Also check the 'Reassign Block' description for that drive.

: >http://www.seagate.com/support/disc/manuals/fc/29482c.pdf
:
: Yes I got this manual already and read it. Unfortunately,
: the description can't be a shorter one :-(
: >
: >: Because, I did see myself that reassign lba worked
: >: successfully and updated grown defects list - but, only
: >: if verify error occurred, and not direct read error.
: >
: >I'm sorry, you lost me here, not enough detail. What 'verify' exactly?

: >What 'direct read error'? There is SCSI 'Verify' (2F) and there is SCSI
: >'Write and Verify' (2E).

:
: I'm again sorry for the confusion, 2F Verify, as I wrote.
:
: >ERP will be different between the 2.
:
: Well, I tried with both settings of ERP and the results are the same.
: Grown defects list never gets updated :-((
: >

: >:
: >: I got all SCSI books I could find in the University

Folkert Rienstra

Mar 14, 2001, 6:50:49 PM

"Andrei A. Dergatchev" <A.Derg...@tn.utwente.nl> wrote in message news:3aacdde6...@news.nic.utwente.nl...
: Hello,

:
: I'm struggling with remapping of a few bad block with the drive
: which was a bit damaged during transportation.
:
: I'm using "scu" utility from
: http://www.bit-net.com/~rmiller/scu.html
: - the only utility for AlphaLinux AFAIK.
:
: With debug mode on (so that I can see which commands
: are being used) I see that during "reassign" read is issued
: first:
:
: "28 00 06 88 67 fa 00 00 01 00"
:
: and there is an error (MEDIUM ERROR, error code 5 -
: error during I/O)
:
: Afterwards, reassign is being issued.
:
: "7 0 0 0 0 0" (not sure at the moment how many zeroes
: exactly).
:
: I see that reported result is ok - success.
: However, defect list is not changed at all !
: At that moment I can issue and check as many reads
: and writes of this block as I wish
: "28 00 06 88 67 fa 00 00 01 00"
: "2a 00 06 88 67 fa 00 00 01 00"
: however any verify command fails immediately with
: the same error message as before,
: and everything starts again.

This I don't get, WHAT starts again?

:
: I had a few other bad blocks, and what I noticed
: that if the error message is "verify error" then
: defect list is updated successfully. However, if
: a bad block causes "direct read error" then
: somehow the drive firmware does not perform
: subsequent reassign, as it seems.

Ditto. To recap (hope I get this right):

1)
You have an Unrecoverable Read Error.
You decide to Reassign the block by hand (utility)
It seems to do so but you find no evidence
The block appears readable/writable.
So far so good.

2)
Now you decide to do a Verify on that same block and it fails.
A Read on that block now also fails
You are back at square one, the block is bad again.

3)
You also noticed that a Verify that fails will reassign the block immediately

That does not make sense because
at 1) you say a Reassign is good and
at 2) you say a Verify after that is bad and
at 3) that what is bad at 2) is actually good at 3).

That can't be both right.
There is a flipflop here, if you do 2) the block goes bad but if you repeat it 3)
it would be replaced (with evidence of that).

There is something missing or misrepresented here.

: AWRE/ARRE bits does not affect anything

Andrei A. Dergatchev

Mar 15, 2001, 6:23:09 PM

Hello,

I'm very grateful for your help !!

I'm very sorry for possibly being unclear !

>
>This I don't get, WHAT starts again?

Error messages are reported again for the same block that
I had just read from and written to successfully.

>
>:
>: I had a few other bad blocks, and what I noticed
>: that if the error message is "verify error" then
>: defect list is updated successfully. However, if
>: a bad block causes "direct read error" then
>: somehow the drive firmware does not perform
>: subsequent reassign, as it seems.
>
>Ditto. To recap (hope I get this right):
>
>1)
>You have an Unrecoverable Read Error.
>You decide to Reassign the block by hand (utility)
>It seems to do so but you find no evidence
>The block appears readable/writable.
>So far so good.
>
>2)
>Now you decide to do a Verify on that same block and it fails.

Exactly

>A Read on that block now also fails

Yes.

When I
1) turn on the computer
2) try to read this block - failed (MEDIUM ERROR)
3) reassign - "scu" reads the block first and fails, proceed
with reassign - success
4) read the block - success
5) verify the block - failed
6) read the block - failed
7) reassign the block - success
8) read the block - success - the value read is the same as in 4)
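
The commands in the sequence above can be sketched as raw CDBs. This is only an illustration of the byte layout, not what "scu" does internally: the helper names are hypothetical, and the Reassign Blocks parameter list assumes the short form with 4-byte LBAs.

```python
# Sketch: build the CDBs used in the sequence above.
READ_10, WRITE_10, VERIFY_10 = 0x28, 0x2A, 0x2F
REASSIGN_BLOCKS = 0x07


def rw10_cdb(opcode: int, lba: int, blocks: int = 1) -> bytes:
    """10-byte CDB shared by Read(10), Write(10) and Verify(10)."""
    return (
        bytes([opcode, 0x00])          # opcode, flags
        + lba.to_bytes(4, "big")       # logical block address
        + bytes([0x00])                # reserved
        + blocks.to_bytes(2, "big")    # transfer / verification length
        + bytes([0x00])                # control
    )


def reassign_blocks(lbas):
    """Return (cdb, data_out) for Reassign Blocks.

    The LBAs travel in the data-out parameter list, not in the CDB:
    a 4-byte header (2 reserved bytes, 2-byte list length) followed
    by one 4-byte LBA per block to reassign.
    """
    cdb = bytes([REASSIGN_BLOCKS, 0, 0, 0, 0, 0])
    body = b"".join(lba.to_bytes(4, "big") for lba in lbas)
    header = bytes(2) + len(body).to_bytes(2, "big")
    return cdb, header + body


bad_lba = 0x068867FA
read_cdb = rw10_cdb(READ_10, bad_lba)      # same bytes as the debug output
verify_cdb = rw10_cdb(VERIFY_10, bad_lba)
reassign_cdb, reassign_data = reassign_blocks([bad_lba])
```

Note that the 6-byte Reassign CDB itself is all zeroes after the opcode, which matches the "7 0 0 0 0 0" seen in the debug output; whether the data-out list really names this LBA is worth checking.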

>You are back at square one, the block is bad again.

An attempt to read the block causes MEDIUM ERROR again.
Or at least it is reported by CAM/driver.

>
>3)
>You also noticed that a Verify that fails will reassign the block immediately

I noticed that a few bad blocks were reassigned when
verify failed - yes. But those were other bad blocks,
never this one. Unfortunately, at that moment I had
not checked the exact SCSI commands and CAM error
messages with the debug option. The output of the "scu"
utility stated "verify failed at logical block". For this
bad block, however, the output is "direct read failed".
For those other blocks the grown defects list was updated too.
So, I guess that reassign worked - but apparently under
slightly different conditions.
I can dig into this issue more, if you wish - try to clear
the grown defects list and see the conditions under which the
SCSI reassign command actually works with this drive.
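
A minimal sketch of how "was the grown list updated?" could be checked mechanically: parse two Read Defect Data(10) replies, taken before and after the reassign, and diff them. It assumes the drive returns the list in physical sector format (101b); the function names are hypothetical, and other formats are simply rejected.

```python
# Sketch: diff two Read Defect Data(10) replies (physical sector format).
PHYSICAL_SECTOR_FORMAT = 0b101


def parse_defect_list(data: bytes) -> set:
    """Return the defects as a set of (cylinder, head, sector) tuples."""
    if len(data) < 4:
        raise ValueError("reply shorter than the 4-byte header")
    fmt = data[1] & 0x07                       # defect list format
    if fmt != PHYSICAL_SECTOR_FORMAT:
        raise ValueError("only physical sector format is handled here")
    length = int.from_bytes(data[2:4], "big")  # bytes of descriptors
    body = data[4:4 + length]
    entries = set()
    for off in range(0, len(body) - 7, 8):     # 8 bytes per descriptor
        d = body[off:off + 8]
        cylinder = int.from_bytes(d[0:3], "big")
        head = d[3]
        sector = int.from_bytes(d[4:8], "big")
        entries.add((cylinder, head, sector))
    return entries


def new_defects(before: bytes, after: bytes) -> set:
    """Descriptors present after the reassign that were not there before."""
    return parse_defect_list(after) - parse_defect_list(before)
```

An empty result from `new_defects` after a Reassign that reported success would confirm that the grown list really did not change.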

>
>That does not make sense because
>at 1) you say a Reassign is good and
>at 2) you say a Verify after that is bad and
>at 3) that wat is bad at 2) is actually good at 3).

What I said is that:
A) For this (#) bad block, the result of "SCSI reassign"
is that subsequent reads and writes are reported to WORK.
B) For this (#) bad block, a "2F verify" after A)
causes reads (28) and writes (2A) to fail.
C) (This one was probably confusing, I'm sorry about that)
I saw that for other bad blocks (##) reassign worked
(so something does work; that's all I wanted to say
when I added this).

>
>That can't be both right.
>There is a flipflop here, if you do 2) the block goes bad but if you repeat it 3)
>it would be replaced (with evidence of that).

It seems this block is either never replaced (or marked as bad),
or a subsequent verify somehow nullifies the effect of the reassign
(which indeed NEVER shows up in the grown list).

At the moment I see 3 possible explanations:
1) firmware is buggy - under certain conditions (if a direct read
error occurred previously) the result of reassign does
not affect the grown defects list. Where the data that I see
on a subsequent read come from, I have no idea.
I saw however that if I write first after reassign, it will
succeed and subsequent.
I'll need to try it another way, not the way "scu" does it -
I need to issue reassign without first issuing a read or write
to this particular address.
2) Even though I played with setting different options
ON and OFF, it is possible that I still missed the needed
magic combo.
3) Hardware fault. For some magic reason, this block or
this address should never be touched; I'll have to dig
into the fs driver to make sure it will never want to access
this particular block and a few others (because the rest
of those 73 Gigs are doing fine).

>
>There is something missing or misrepresented here.

I'd be extremely happy if you could spend a few
minutes looking into this - I can give you root access to
this computer, or I can visit you with this drive.

Thank you very much for your help again,
Regards,

Andrei

Folkert Rienstra

Mar 16, 2001, 9:27:33 AM

"Andrei A. Dergatchev" <A.Derg...@tn.utwente.nl> wrote in message news:3ab13eec....@news.nic.utwente.nl...
: Hello,

:
: I'm very grateful for your help !!
:
: I'm very sorry for possibly being unclear !

Sometimes concise but to the point is just better.

: >
: >This I don't get, WHAT starts again?

:
: Error messages are reported for the block from which
: I just read something to which I just wrote something.
: >
: >:
: >: I had a few other bad blocks, and what I noticed
: >: that if the error message is "verify error" then
: >: defect list is updated successfully. However, if
: >: a bad block causes "direct read error" then
: >: somehow the drive firmware does not perform
: >: subsequent reassign, as it seems.
: >
: >Ditto. To recap (hope I get this right):
: >
: >1)
: >You have an Unrecoverable Read Error.
: >You decide to Reassign the block by hand (utility)
: >It seems to do so but you find no evidence of it
: >The block now appears readable/writable.
: >So far so good.
: >
: >2)
: >Now you decide to do a Verify on that same block and it fails.
:
: Exactly
:
: >A Read on that block now also fails
:
: Yes.
:
: When I
: 1) turn on the computer
: 2) try to read this block - failed (MEDIUM ERROR)
: 3) reassign - "scu" reads the block first and fails, proceed
: with reassign - success
: 4) read the block - success
: 5) verify the block - failed
: 6) read the block - failed
: 7) reassign the block - success
: 8) read the block - success - the value read is the same as in 4)
:
: >You are back at square one, the block is bad again.
:
: An attempt to read the block causes MEDIUM ERROR again.
: Or at least it is reported by CAM/driver.
: >
: >3)
: >You also noticed that a Verify that fails will reassign the block immediately
:
: I noticed a few bad blocks were reassigned when
: verify failed - yes. But, those were another bad blocks,
: never this one.

Perhaps they were recoverable errors?

: Unfortunately, at that moment I have
: not checked with debug option exact SCSI commands
: and CAM error messages. The output of "scu" utility
: stated that "verify failed at logical block". For this
: bad block however, the output is that "direct read failed".

Instead of "verify failed" although this was a verify. Right, that
is confusing. You need to find out what the meaning of that is.

: Grown defects list was updated too.
: So, I guess that reassign worked - but apparently in
: slightly different conditions.
: I can dig in this issue more, if you wish - try to cancel
: grown defects list and see the conditions when SCSI
: reassign command actually works with this drive.

You could. I still think you need to find out what all the configurable
options are for this drive that affect how bad blocks are handled.
Also check out the Reassign Block command. Although the manual says
it is supported, make sure that it is not a NO-OP.

: >
: >That does not make sense because
: >at 1) you say a Reassign is good and
: >at 2) you say a Verify after that is bad and
: >at 3) that wat is bad at 2) is actually good at 3).
:
: What I said is that:
: A) For this (#) bad block the result of "SCSI reassign"
: is that subsequent reads and writes are reported to WORK.
: B) For this (#) bad block the result of "2F verify" after A)
: causes reads 28 and writes 2A to fail.
: C) (This one was probably confusing, I'm sorry about that)
: I saw that for another bad blocks (##) reassign worked
: (so something does work, that's all what I wanted to say
: when I added this).
: >
: >That can't be both right.
: >There is a flipflop here, if you do 2) the block goes bad but if you
: >repeat it 3) it would be replaced (with evidence of that).
:
: It seems, this block is whether never replaced (or marked as bad),
: or a subsequent verify somehow nullifies the effect of reassign
: (which is indeed NEVER shows up in grown list).

Well, >something< happens to it.

:
: At the moment I see 3 possible explanations:
: 1) firmware is buggy - under certain conditions (if direct read
: error occurred at previous time) the result of reassign does
: not affect grown defects list. Where the data are coming from
: which I see with subsequent read I have no idea.

Cache?

Send and Receive Diagnostic can do a LBA -> CHS translation
for you. codeupdt has that as does WDC's SCSI Workbench.
When the block is reassigned it will have a new CHS address.
So how about going x86?

Btw, just discovered the SCU command is 'translate'.
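
As a sketch of what such a translation involves: the Translate Address diagnostic page (page code 40h), as I recall its layout from the SCSI-2 direct-access command set, takes an address in one format and returns it in another, with flag bits saying whether the result lies in an alternate sector or track. The function names and the idea of comparing replies are my own illustration; check the byte layout against the drive manual before relying on it.

```python
# Sketch: Translate Address page (40h) for Send / Receive Diagnostic.
# Byte layout is an assumption taken from the SCSI-2 description.
PAGE_TRANSLATE_ADDRESS = 0x40
FMT_BLOCK = 0b000              # supplied: logical block address
FMT_PHYSICAL_SECTOR = 0b101    # wanted: cylinder / head / sector


def build_translate_page(lba: int) -> bytes:
    """Data-out page for Send Diagnostic asking for LBA -> CHS."""
    address = lba.to_bytes(4, "big") + bytes(4)   # LBA padded to 8 bytes
    return (
        bytes([PAGE_TRANSLATE_ADDRESS, 0x00, 0x00, 0x0A,
               FMT_BLOCK, FMT_PHYSICAL_SECTOR])
        + address
    )


def parse_translate_result(page: bytes) -> dict:
    """Decode the first translated address from Receive Diagnostic Results."""
    if len(page) < 14 or page[0] != PAGE_TRANSLATE_ADDRESS:
        raise ValueError("not a translate address page")
    if page[5] & 0x07 != FMT_PHYSICAL_SECTOR:
        raise ValueError("reply is not in physical sector format")
    addr = page[6:14]
    return {
        "alt_sector": bool(page[5] & 0x40),   # ALTSEC: in an alternate sector
        "alt_track": bool(page[5] & 0x20),    # ALTTRK: on an alternate track
        "cylinder": int.from_bytes(addr[0:3], "big"),
        "head": addr[3],
        "sector": int.from_bytes(addr[4:8], "big"),
    }
```

Translating the bad LBA before and after the Reassign and comparing the two results (or looking at the alternate-sector flag) would show whether the block was really moved.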

: I saw however that if I write first after reassign, it will
: succeed and subsequent.
: I'll need to try in another way, not like "scu" is doing -
: I need to try to issue reassign without issuing read or write
: to this particular address.
: 2) Even though I played with setting different options
: ON and OFF, it is possible that I still missed the needed
: magic combo.

You need a good manual.
Apparently that is the 'Fibre Channel Interface Manual'.

: 3) Hardware fault. For some magic reason, this block or