Can I use this 250Gb hard disk?

Jonathan Eales

Dec 9, 2006, 3:23:44 PM

I correctly diagnosed a customer's crashing system as a faulty Maxtor 250GB
SATA hard disk drive. I replaced it and the customer is happy.

So I put it in my server, ran a low level format on it (Maxtor PowerMax),
and then started using it as a 'just in case' backup. You know: DVD images
of Vista beta that I'd already copied to DVD; demos of games I'd already
bought or didn't want.

I've turned on SMART monitoring and been doing regular checks on it for the
last month and it seems OK.

Can I relegate it up a division to the more important backup league, or
should I just trash it?

Basically, do low level formats really work?

Thanks,
Jonathan

johannes

Dec 9, 2006, 3:38:46 PM

I have a laptop drive; it jammed while I was saving a large spreadsheet,
the disk showed lots of errors and eventually didn't boot. I let it cool
down for 24 hours and it appeared to come back to life again. But...
Do you really want to trust your data to such a drive once it has failed?
I still keep it in the drawer, but not in the laptop.

Jim Howes

Dec 9, 2006, 6:39:42 PM

Jonathan Eales wrote:
> I've turned on SMART monitoring and been doing regular checks on it for the
> last month and it seems OK.

Care to post the full SMART diagnostics? If you're not using
smartmontools (from smartmontools.sourceforge.net), get it, then run
smartctl -a /dev/hdX >smart.txt and post the resulting smart.txt,
replacing X in /dev/hdX with the appropriate device name for your drive
(read the docs). And yes, /dev/hdX is the notation used by the Windows
version of smartctl.
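
For anyone who would rather script that step, here is a minimal Python sketch of the same thing. The function names and the injectable `run` parameter are my own, purely for illustration; only `smartctl -a` itself comes from the post above.

```python
import subprocess
from pathlib import Path


def smartctl_command(device):
    """Build the argument list for a full SMART report (smartctl -a)."""
    return ["smartctl", "-a", device]


def save_smart_report(device, out_path="smart.txt", run=subprocess.run):
    """Run smartctl -a on `device` and write its stdout to `out_path`.

    `run` is a hypothetical hook so the function can be tried without a
    real drive; by default it is subprocess.run.
    """
    result = run(smartctl_command(device), capture_output=True, text=True)
    # smartctl reports its exit status as a bitmask, so a nonzero code
    # does not by itself mean the report is empty; keep the text anyway.
    Path(out_path).write_text(result.stdout)
    return result.returncode
```

Calling `save_smart_report("/dev/hdX")` with the right device name would leave the report in smart.txt, ready to post.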

> Basically, do low level formats really work?

The question that needs to be answered is 'Why was the previous format
unsatisfactory?'. Factory formatting (which for some drives is the ONLY
way that they can be properly low-level formatted; some drives ignore
or just pay lip service to the concept once in end-user-land) should be
sufficient for the design life of the drive.

If defects have arisen since the factory formatting, then defects will
probably continue to arise. The onboard firmware, and surface testing
that has gone on during formatting will have mapped out current bad
areas, but if it is actual surface damage, such things have a habit of
spreading.

When the disc detects a problem with a particular sector, it adds that
sector to a list of 'pending' sectors. If that sector is ever
successfully read, it copies the contents to a new spare sector and maps
the defective sector out of use. If the sector is written to, it does
the remapping straight away. Writing to every sector on the disk
(which may be all that you have in fact done with your 'low-level
format') will have caused the remapping to come into effect. This
does not mean that the underlying decay of the disc has gone away.
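
The pending/remap behaviour described above can be sketched as a toy model. This is only an illustration of the description in this post, not how any particular firmware works; the class and method names are invented for the example.

```python
class ToyDrive:
    """Toy model of the pending/remap behaviour described above."""

    def __init__(self, spare_sectors):
        self.spare_sectors = spare_sectors
        self.pending = set()    # sectors the drive suspects are bad
        self.remapped = set()   # sectors already swapped for spares

    def mark_suspect(self, lba):
        """A problem was detected: queue the sector as 'pending'."""
        if lba not in self.remapped:
            self.pending.add(lba)

    def _remap(self, lba):
        if self.spare_sectors == 0:
            raise RuntimeError("no spare sectors left")
        self.spare_sectors -= 1
        self.pending.discard(lba)
        self.remapped.add(lba)

    def read(self, lba, succeeded):
        """A successful read of a pending sector lets the drive move it."""
        if lba in self.pending and succeeded:
            self._remap(lba)

    def write(self, lba):
        """A write to a pending sector triggers the remap straight away."""
        if lba in self.pending:
            self._remap(lba)


drive = ToyDrive(spare_sectors=4)
for lba in (100, 200, 300):
    drive.mark_suspect(lba)
drive.read(100, succeeded=False)  # still unreadable: stays pending
drive.read(200, succeeded=True)   # readable once: remapped
drive.write(300)                  # overwritten: remapped
print(sorted(drive.pending), sorted(drive.remapped), drive.spare_sectors)
# prints: [100] [200, 300] 2
```

Note that the spare pool only ever shrinks in this model, which is the point being made: remapping hides the bad sectors, it does not repair the surface.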

Without examining the full SMART stats, it is impossible to say whether
the drive is healthy or not. Even then, for some drives (IBM devices in
particular) SMART can report that the drive is perfectly healthy even
when it is quite obviously entirely broken.

Johnny B Good

Dec 9, 2006, 7:36:56 PM

The message <457b493e$0$632$5a6a...@news.aaisp.net.uk>
from Jim Howes <jimh...@this.address.is.wrong> contains these words:

====snip====

> Without examining the full SMART stats, it is impossible to say whether
> the drive is healthy or not. Even then, for some drives (IBM devices in
> particular) SMART can report that the drive is perfectly healthy even
> when it is quite obviously entirely broken.

And, vice versa for Seagates (allegedly :-).

Anyhoo... I've come across this phenomenon, whereby the maker's
diagnostic utility has granted an error code to validate an application for
an RMA return and then (almost in the same breath, as it were) the "Low
Level Format" has given the drive a clean bill of health (after a seven
hour runtime) which subsequent re-testing fails to refute.

In this case, the trouble seems to be errors induced by OS and/or MoBo
interface problems (bad connections?) rather than in the low level
format itself. Still, regardless of data errors outside of the drive
itself, you'd think the diagnostics would be able to distinguish between
this type of error and CRC errors generated within the drive itself.

Possibly the real culprit might be the hdd controller bios responding
to 'illegal' commands generated by data corruption in the interface
(only applies to PATA, SATA applies CRC error checking to _everything_
over the SATA interface) which may induce errors _not related_ to
surface defects but which upset the diagnostics and/or normal operation.
The LLF may be able to do enough to clear such problems up.

The "Low Level Format" usually isn't really a completely low level
format, merely a case of test data being written to each and every
sector and test reads with an eye on any CRC errors that might be
generated. It passes this test if no CRC errors get generated.
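
A minimal Python sketch of that kind of write-then-verify pass, for illustration only. It runs against an in-memory buffer rather than a real disk, and the host-side CRC comparison is a stand-in for the error checking the drive does internally; the function name and the test pattern are assumptions of mine.

```python
import io
import zlib

SECTOR_SIZE = 512  # bytes in the user data field of one sector


def write_verify_pass(dev, sector_count, pattern=b"\xa5"):
    """Write a test pattern to every sector, read it back, and
    return the numbers of the sectors that did not verify."""
    block = pattern * SECTOR_SIZE
    expected_crc = zlib.crc32(block)
    # Pass 1: fill every sector with the test pattern.
    for n in range(sector_count):
        dev.seek(n * SECTOR_SIZE)
        dev.write(block)
    # Pass 2: read every sector back and compare checksums.
    bad = []
    for n in range(sector_count):
        dev.seek(n * SECTOR_SIZE)
        if zlib.crc32(dev.read(SECTOR_SIZE)) != expected_crc:
            bad.append(n)
    return bad


fake_disk = io.BytesIO(bytes(8 * SECTOR_SIZE))
print(write_verify_pass(fake_disk, 8))  # prints: []
```

An empty list is the "clean bill of health" case: every sector accepted the data and gave it back intact at the time of the test, which says nothing about how it will behave a month later.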

Another possibility is that high temperature operation may have caused
errors within the 512 byte user accessible data field of the sector
which a subsequent "LLF" at normal temperature has been able to rectify.
In this case the permanent LLF has remained intact since it is only ever
read and never written to during normal operations.

Although you'd be prudent not to trust the drive for 'priceless' data
storage, it's quite possible that a LLF may have effected a permanent
fix. You'd have to thoroughly exercise it over a month or two to give it
a chance to show its true colours before committing it to any mission
critical storage.

However, even a brand new drive can let you down. Modern drives seem
more prone to sudden failure than the older models of just a few years
back. The much larger capacities aggravate the consequences of such
failures since we're no longer talking of mere tens of gigabytes but
hundreds of gigabytes of lost data :-(

--
Regards, John.

Please remove the "ohggcyht" before replying.
The address has been munged to reject Spam-bots.

Jonathan Eales

Dec 10, 2006, 5:11:08 AM

Thanks everyone for replying.

Jim as requested the SMART diagnostics:

smartctl version 5.36 [i686-mingw32-xp-sp2] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family: Maxtor DiamondMax Plus 9 family
Device Model: Maxtor 6Y250M0
Serial Number: Y63443AE
Firmware Version: YAR51EW0
User Capacity: 251,000,193,024 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 0
Local Time is: Sun Dec 10 10:02:08 2006 GMTST
SMART support is: Available - device has SMART capability.
Enabled status cached by OS, trying SMART RETURN STATUS
cmd.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine
completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 363) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
No General Purpose Logging support.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 106) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
WHEN_FAILED RAW_VALUE
3 Spin_Up_Time 0x0027 172 171 063 Pre-fail
s - 26542
4 Start_Stop_Count 0x0032 253 253 000 Old_age
ys - 280
5 Reallocated_Sector_Ct 0x0033 253 001 063 Pre-fail Always
In_the_past 3
6 Read_Channel_Margin 0x0001 253 253 100 Pre-fail
ine - 0
7 Seek_Error_Rate 0x000a 253 252 000 Old_age
ys - 0
8 Seek_Time_Performance 0x0027 253 238 187 Pre-fail
s - 60600
9 Power_On_Minutes 0x0032 242 242 000 Old_age
ys - 678h+48m
10 Spin_Retry_Count 0x002b 253 252 157 Pre-fail
s - 0
11 Calibration_Retry_Count 0x002b 253 252 223 Pre-fail
s - 0
12 Power_Cycle_Count 0x0032 252 252 000 Old_age
ys - 561
192 Power-Off_Retract_Count 0x0032 253 253 000 Old_age
ys - 0
193 Load_Cycle_Count 0x0032 253 253 000 Old_age
ys - 0
194 Temperature_Celsius 0x0032 253 253 000 Old_age
ys - 36
195 Hardware_ECC_Recovered 0x000a 253 252 000 Old_age
ys - 4137
196 Reallocated_Event_Count 0x0008 219 219 000 Old_age
line - 34
197 Current_Pending_Sector 0x0008 251 001 000 Old_age
line - 25
198 Offline_Uncorrectable 0x0008 253 109 000 Old_age
line - 0
199 UDMA_CRC_Error_Count 0x0008 199 198 000 Old_age
line - 1
200 Multi_Zone_Error_Rate 0x000a 253 252 000 Old_age
ys - 0
201 Soft_Read_Error_Rate 0x000a 252 250 000 Old_age
ys - 1582
202 TA_Increase_Count 0x000a 253 248 000 Old_age
ys - 0
203 Run_Out_Cancel 0x000b 253 252 180 Pre-fail
s - 18
204 Shock_Count_Write_Opern 0x000a 253 245 000 Old_age
ys - 0
205 Shock_Rate_Write_Opern 0x000a 253 252 000 Old_age
ys - 0
207 Spin_High_Current 0x002a 253 252 000 Old_age
ys - 0
208 Spin_Buzz 0x002a 253 252 000 Old_age
ys - 0
209 Offline_Seek_Performnce 0x0024 196 193 000 Old_age
line - 0
99 Unknown_Attribute 0x0004 253 253 000 Old_age
line - 0
100 Unknown_Attribute 0x0004 253 253 000 Old_age
line - 0
101 Unknown_Attribute 0x0004 253 253 000 Old_age
line - 0

SMART Error Log Version: 1
ATA Error Count: 9950 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 9950 occurred at disk power-on lifetime: 3699 hours (154 days + 3
hours)
When the command that caused the error occurred, the device was in an
unknown state.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 59 40 b4 86 38 e0 Error: UNC at LBA = 0x003886b4 = 3704500

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
24 00 40 b4 86 38 e0 00 00:22:27.680 READ SECTOR(S) EXT
24 00 40 b6 07 37 e0 00 00:22:27.680 READ SECTOR(S) EXT
24 00 40 b7 88 35 e0 00 00:22:27.664 READ SECTOR(S) EXT
24 00 40 b9 09 34 e0 00 00:22:27.648 READ SECTOR(S) EXT
24 00 40 ba 8a 32 e0 00 00:22:27.632 READ SECTOR(S) EXT

Error 9949 occurred at disk power-on lifetime: 3699 hours (154 days + 3
hours)
When the command that caused the error occurred, the device was in an
unknown state.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 59 40 cb 15 22 e0 Error: UNC at LBA = 0x002215cb = 2233803

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
24 00 40 cb 15 22 e0 00 00:22:19.520 READ SECTOR(S) EXT
24 00 01 00 00 00 e0 00 00:22:19.504 READ SECTOR(S) EXT
24 00 40 cc 96 20 e0 00 00:22:19.488 READ SECTOR(S) EXT
24 00 40 ce 17 1f e0 00 00:22:19.472 READ SECTOR(S) EXT
24 00 40 cf 98 1d e0 00 00:22:19.456 READ SECTOR(S) EXT

Error 9948 occurred at disk power-on lifetime: 3699 hours (154 days + 3
hours)
When the command that caused the error occurred, the device was in an
unknown state.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 59 40 f8 33 f5 e0 Error: UNC at LBA = 0x00f533f8 = 16069624

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
24 00 40 f8 33 f5 e0 00 00:22:10.672 READ SECTOR(S) EXT
24 00 40 f9 b4 f3 e0 00 00:22:10.656 READ SECTOR(S) EXT
24 00 40 fb 35 f2 e0 00 00:22:10.656 READ SECTOR(S) EXT
24 00 40 fc b6 f0 e0 00 00:22:10.640 READ SECTOR(S) EXT
24 00 40 fe 37 ef e0 00 00:22:10.624 READ SECTOR(S) EXT

Error 9947 occurred at disk power-on lifetime: 3699 hours (154 days + 3
hours)
When the command that caused the error occurred, the device was in an
unknown state.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 59 40 44 e7 a8 e0 Error: UNC at LBA = 0x00a8e744 = 11069252

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
24 00 40 44 e7 a8 e0 00 00:20:56.000 READ SECTOR(S) EXT
24 00 40 46 68 a7 e0 00 00:20:56.000 READ SECTOR(S) EXT
24 00 40 47 e9 a5 e0 00 00:20:55.984 READ SECTOR(S) EXT
24 00 40 49 6a a4 e0 00 00:20:55.968 READ SECTOR(S) EXT
24 00 40 4a eb a2 e0 00 00:20:55.952 READ SECTOR(S) EXT

Error 9946 occurred at disk power-on lifetime: 3699 hours (154 days + 3
hours)
When the command that caused the error occurred, the device was in an
unknown state.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 59 40 3d 8e b0 e0 Error: UNC at LBA = 0x00b08e3d = 11570749

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
24 00 40 3d 8e b0 e0 00 00:20:45.392 READ SECTOR(S) EXT
24 00 40 3f 0f af e0 00 00:20:45.376 READ SECTOR(S) EXT
24 00 40 40 90 ad e0 00 00:20:45.360 READ SECTOR(S) EXT
24 00 40 42 11 ac e0 00 00:20:45.344 READ SECTOR(S) EXT
24 00 40 43 92 aa e0 00 00:20:45.328 READ SECTOR(S) EXT

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours)
LBA_of_first_error
# 1 Short offline Completed without error 00%
1 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
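
The "LBA =" figures in the error log above can be reproduced from the register columns. A minimal Python sketch, assuming the usual 28-bit ATA layout (SN holds bits 0-7, CL bits 8-15, CH bits 16-23, and the low nibble of DH bits 24-27); the function name is just for illustration.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine the ATA task-file registers into a 28-bit LBA."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# (SN, CL, CH, DH) copied from the five logged errors, newest first.
logged = [
    (0xB4, 0x86, 0x38, 0xE0),  # error 9950
    (0xCB, 0x15, 0x22, 0xE0),  # error 9949
    (0xF8, 0x33, 0xF5, 0xE0),  # error 9948
    (0x44, 0xE7, 0xA8, 0xE0),  # error 9947
    (0x3D, 0x8E, 0xB0, 0xE0),  # error 9946
]

for regs in logged:
    lba = lba28_from_registers(*regs)
    print(f"0x{lba:08x} = {lba}")
```

The first line printed is 0x003886b4 = 3704500, matching error 9950. The five failing addresses are scattered across the first few million sectors rather than clustered in one spot.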

"Jim Howes" <jimh...@this.address.is.wrong> wrote in message
news:457b493e$0$632$5a6a...@news.aaisp.net.uk...

Jim Howes

Dec 10, 2006, 5:06:39 PM

Jonathan Eales wrote:
> Thanks everyone for replying.
>
> Jim as requested the SMART diagnostics:

Ok, let's see what we can read into this...

> Extended self-test routine
> recommended polling time: ( 106) minutes.

Got a couple of hours? Run 'smartctl -t long /dev/hdX' and leave the
machine alone for a while. Do not reboot it during the testing time.
The test runs in the background on the drive itself, so you can use the
machine if you must, but drive accesses will slow the test down, and it
will take longer. The test will update all of the counters marked
'Offline' in the '-a' output...

By the way, your output seems to be wrapping and losing columns, at least
it looks like that.

> 196 Reallocated_Event_Count 0x0008 219 219 000 Old_age
> line - 34
> 197 Current_Pending_Sector 0x0008 251 001 000 Old_age
> line - 25

ERROR ERROR ERROR - Core dumped.
At this point, if it were around here, the drive would be either being
RMA'd, or headed to landfill[1]. On the servers that I herd, a single
'Current_Pending_Sector' is enough for the drive to be scheduled for swapout.

25 pending sectors? That means that the drive knows about a further 25
marginal sectors, and is waiting for a chance to remap them. If the LLF
did not do that, I'm not sure what will.
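
As a rough illustration of that kind of check, here is a minimal Python sketch that scans `smartctl -a` text for non-zero raw values in a few sector-health attributes. The watch list and function name are my own assumptions, and the sample rows are the posted values written out as unwrapped ten-column lines (the UPDATED column for 197 and 198 is taken to be 'Offline', going by the truncated 'line' in the paste).

```python
# Attributes treated as grounds for concern in this sketch (an assumption).
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def worrying_attributes(report, watched=WATCHED):
    """Return {attribute_name: raw_value} for watched attributes whose
    raw value is greater than zero."""
    found = {}
    for line in report.splitlines():
        fields = line.split()
        # A well-formed row has ten columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw_value.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in watched:
            raw = fields[9]
            if raw.isdigit() and int(raw) > 0:
                found[fields[1]] = int(raw)
    return found


SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   253   001   063    Pre-fail  Always   In_the_past 3
197 Current_Pending_Sector  0x0008   251   001   000    Old_age   Offline      -       25
198 Offline_Uncorrectable   0x0008   253   109   000    Old_age   Offline      -       0
"""

print(worrying_attributes(SAMPLE))
# prints: {'Reallocated_Sector_Ct': 3, 'Current_Pending_Sector': 25}
```

Under the swap-out rule described above, anything in that result for Current_Pending_Sector would be enough to retire the drive.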

> 198 Offline_Uncorrectable 0x0008 253 109 000 Old_age
> line - 0

This is zero because you have not run a full test yet.

I suspect if you do run '-t long', you'll get a 'read element of test
failed' where you currently have 'self-test routine completed without
error or no self-test has ever been run.'

Jim

[1] After being quite thoroughly mangled beyond hope of data recovery by
the sort of methods that definitely count as treatment that voids the
warranty ;-)

Mike Tomlinson

Dec 10, 2006, 10:45:28 PM

In article <kXEeh.14717$HV6....@newsfe1-gui.ntli.net>, Jonathan Eales
<Jon....@Virgin.net> writes

>Can I relegate up a division to the more important backup league or just
>trash it.

1) It's already let its owner down once.
2) It's a Maxtor.

Do you really need to ask?

--
(\__/)
(='.'=) This is Bunny. Copy and paste Bunny into your
(")_(") signature to help him gain world domination.

Jonathan Eales

Dec 11, 2006, 3:09:07 PM

Jim,

Here's that data again following a long test:

smartctl version 5.36 [i686-mingw32-xp-sp2] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family: Maxtor DiamondMax Plus 9 family
Device Model: Maxtor 6Y250M0
Serial Number: Y63443AE
Firmware Version: YAR51EW0
User Capacity: 251,000,193,024 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 0
Local Time is: Mon Dec 11 20:02:43 2006 GMTST
SMART support is: Available - device has SMART capability.
Enabled status cached by OS, trying SMART RETURN STATUS
cmd.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine
completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 363) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
No General Purpose Logging support.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 106) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   172   171   063    Pre-fail  Always       -       26878
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       281
  5 Reallocated_Sector_Ct   0x0033   253   001   063    Pre-fail  Always   In_the_past 3
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   251   238   187    Pre-fail  Always       -       44970
  9 Power_On_Minutes        0x0032   242   242   000    Old_age   Always       -       703h+39m
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   252   252   000    Old_age   Always       -       562
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       42
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       2939
196 Reallocated_Event_Count 0x0008   219   219   000    Old_age   Offline      -       34
197 Current_Pending_Sector  0x0008   251   001   000    Old_age   Offline      -       25
198 Offline_Uncorrectable   0x0008   253   109   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   198   000    Old_age   Offline      -       1
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   250   000    Old_age   Always       -       3
202 TA_Increase_Count       0x000a   253   248   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       2
204 Shock_Count_Write_Opern 0x000a   253   245   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   253   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   192   192   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1

# 1  Extended offline    Completed without error       00%         0         -
# 2  Extended offline    Completed without error       00%         2         -
# 3  Short offline       Completed without error       00%         1         -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

"Jim Howes" <jimh...@this.address.is.wrong> wrote in message

news:457c84ef$0$626$5a6a...@news.aaisp.net.uk...