SMART

Ivan Voras

Nov 12, 2009, 7:25:12 AM

Jeremy Chadwick wrote:

> I can teach you how to decode/read SMART statistics correctly.
>

Actually, it would be good if you taught more than him :)

I've always wondered how important each of the dozen or so
statistics is and what indicates what...

Here is for example my desktop drive:

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f 087   083   006    Pre-fail Always  -           45398197
  3 Spin_Up_Time            0x0003 096   093   000    Pre-fail Always  -           0
  4 Start_Stop_Count        0x0032 100   100   020    Old_age  Always  -           64
  5 Reallocated_Sector_Ct   0x0033 100   100   036    Pre-fail Always  -           0
  7 Seek_Error_Rate         0x000f 084   060   030    Pre-fail Always  -           247407473
  9 Power_On_Hours          0x0032 089   089   000    Old_age  Always  -           10155
 10 Spin_Retry_Count        0x0013 100   100   097    Pre-fail Always  -           0
 12 Power_Cycle_Count       0x0032 100   100   020    Old_age  Always  -           64
187 Reported_Uncorrect      0x0032 100   100   000    Old_age  Always  -           0
189 High_Fly_Writes         0x003a 100   100   000    Old_age  Always  -           0
190 Airflow_Temperature_Cel 0x0022 058   055   045    Old_age  Always  -           42 (Lifetime Min/Max 37/44)
194 Temperature_Celsius     0x0022 042   045   000    Old_age  Always  -           42 (0 20 0 0)
195 Hardware_ECC_Recovered  0x001a 062   059   000    Old_age  Always  -           45398197
197 Current_Pending_Sector  0x0012 100   100   000    Old_age  Always  -           0
198 Offline_Uncorrectable   0x0010 100   100   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x003e 200   200   000    Old_age  Always  -           0
200 Multi_Zone_Error_Rate   0x0000 100   253   000    Old_age  Offline -           0
202 TA_Increase_Count       0x0032 100   253   000    Old_age  Always  -           0

I see many values exceeding their thresholds, but since I see that so
often on other drives, I don't know what the threshold is for.

_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stabl...@freebsd.org"

Thomas Backman

Nov 12, 2009, 7:29:32 AM

None of your values are exceeding the threshold - it works backwards. If the value is LOWER than the threshold, you might be in trouble.
Also, judging by the raw read error rate, seek error rate and hardware ECC recovered, allow me to guess that this is a Seagate drive. :-)
(Seagate drives, perhaps among others, use these raw values way differently than others. My Hitachi 7K1000.B has 0 on those.)

Regards,
Thomas
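
The "backwards" comparison described above can be sketched in a few lines of Python. This is only an illustration, not how any particular tool implements it: the function name `attribute_state` is hypothetical, and it assumes that a VALUE at or below THRESH counts as failing and that a THRESH of 000 means the vendor set no failure limit for that attribute.

```python
def attribute_state(value: int, worst: int, thresh: int) -> str:
    """Classify one attribute from its normalized VALUE, WORST and THRESH."""
    if thresh == 0:
        # Assumption: THRESH 000 means no failure limit is defined.
        return "no threshold"
    if value <= thresh:
        # The normalized value counts down; at or below THRESH is bad.
        return "failing now"
    if worst <= thresh:
        # The value recovered, but it was at or below THRESH at some point.
        return "failed in the past"
    return "ok"


# A few rows from the listing earlier in the thread: (VALUE, WORST, THRESH).
rows = {
    "Raw_Read_Error_Rate": (87, 83, 6),
    "Seek_Error_Rate": (84, 60, 30),
    "Spin_Retry_Count": (100, 100, 97),
    "Power_On_Hours": (89, 89, 0),
}

for name, (value, worst, thresh) in rows.items():
    print(f"{name:24s} {attribute_state(value, worst, thresh)}")
```

The large RAW_VALUE numbers play no part in this check; only the three normalized columns are compared.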

Ivan Voras

Nov 12, 2009, 7:56:16 AM

Good to know.

> Also, judging by the raw read error rate, seek error rate and hardware ECC recovered, allow me to guess that this is a Seagate drive. :-)
> (Seagate drives, perhaps among others, use these raw values way differently than others. My Hitachi 7K1000.B has 0 on those.)

Yes, it's Seagate. Statistically I have the least problems with their
drives. But I imagine that lack of standardization about these
statistics very much limits the usability of SMART, right?

Bruce Cran

Nov 12, 2009, 8:32:21 AM

On Thu, 12 Nov 2009 13:56:16 +0100
Ivan Voras <ivo...@freebsd.org> wrote:

> Yes, it's Seagate. Statistically I have the least problems with their
> drives. But I imagine that lack of standardization about these
> statistics very much limits the usability of SMART, right?
>

The main problem with SMART appears to be that it's not an accurate
predictor of drive failure, according to a study done at Google - see
http://labs.google.com/papers/disk_failures.pdf

--
Bruce Cran

Ivan Voras

Nov 12, 2009, 8:35:16 AM

Bruce Cran wrote:
> On Thu, 12 Nov 2009 13:56:16 +0100
> Ivan Voras <ivo...@freebsd.org> wrote:
>
>> Yes, it's Seagate. Statistically I have the least problems with their
>> drives. But I imagine that lack of standardization about these
>> statistics very much limits the usability of SMART, right?
>
> The main problem with SMART appears to be that it's not an accurate
> predictor of drive failure, according to a study done at Google - see
> http://labs.google.com/papers/disk_failures.pdf

I've seen it. But I don't remember if they addressed the problem of
nonstandard interpretations of statistics? I do remember they said they
buy from multiple drive vendors.

Dimitry Andric

Nov 12, 2009, 9:06:35 AM

On 2009-11-12 14:35, Ivan Voras wrote:
> I've seen it. But I don't remember if they addressed the problem of
> nonstandard interpretations of statistics?

Note the statistics you quoted are "Vendor Specific SMART Attributes",
so it is quite logical for different vendors to have different
statistics. :)

Ivan Voras

Nov 12, 2009, 9:11:18 AM

Dimitry Andric wrote:
> On 2009-11-12 14:35, Ivan Voras wrote:
>> I've seen it. But I don't remember if they addressed the problem of
>> nonstandard interpretations of statistics?
>
> Note the statistics you quoted are "Vendor Specific SMART Attributes",
> so it is quite logical for different vendors to have different
> statistics. :)

I see your point :)

Though I would hope that a statistic like:

1 Raw_Read_Error_Rate 0x000f 087 083 006 Pre-fail Always - 45398197

would have an equivalent across vendors :) I know, it's too much to ask :)

Ian Smith

Nov 12, 2009, 9:29:17 AM

On Thu, 12 Nov 2009, Ivan Voras wrote:
> Dimitry Andric wrote:
> > On 2009-11-12 14:35, Ivan Voras wrote:
> > > I've seen it. But I don't remember if they addressed the problem of
> > > nonstandard interpretations of statistics?
> >
> > Note the statistics you quoted are "Vendor Specific SMART Attributes",
> > so it is quite logical for different vendors to have different
> > statistics. :)
>
> I see your point :)
>
> Though I would hope that a statistic like:
>
> 1 Raw_Read_Error_Rate 0x000f 087 083 006 Pre-fail Always - 45398197
>
> would have an equivalent across vendors :) I know, it's too much to ask :)

True .. but all you really need to know is that as far as your disk
vendor is concerned, your error rate is 87 (somethings), the worst it's
ever been is 83 and if it were nearer 6 somethings, you should worry :)

9 Power_On_Hours 0x0032 089 089 000 Old_age Always - 10155

Seagate says you're only 11% on the way to (mean) oblivion .. if you
believe it should run 11.4 years. We had one 4GB IBM drive that did!

cheers, Ian
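
The arithmetic behind the 11% and 11.4-year figures can be reproduced with a short sketch. It rests on assumptions that are not confirmed anywhere in the thread: that the normalized Power_On_Hours value falls linearly from 100 to 0 over a rated life of 100,000 hours (one point per 1,000 hours), and that the drive truncates rather than rounds. `RATED_HOURS` and both function names are hypothetical.

```python
RATED_HOURS = 100_000        # assumption: rated life implied by the 11.4-year figure
HOURS_PER_YEAR = 24 * 365    # 8,760 hours, ignoring leap years


def expected_value(raw_hours: int) -> int:
    """Normalized value a drive would report under the linear-decay assumption."""
    return int(100 - 100 * raw_hours / RATED_HOURS)


def rated_life_years() -> float:
    """Rated life expressed in years of continuous operation."""
    return RATED_HOURS / HOURS_PER_YEAR


raw_hours = 10155  # RAW_VALUE of Power_On_Hours in the listing
value = expected_value(raw_hours)
print(f"expected normalized value: {value}")
print(f"share of rated life used:  {100 - value}%")
print(f"rated life:                {rated_life_years():.1f} years")
```

Under these assumptions the model gives the same VALUE of 089 that the drive reports for 10155 hours, which is where "11% on the way" comes from.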

Jeremy Chadwick

Nov 12, 2009, 12:44:28 PM

On Thu, Nov 12, 2009 at 01:25:12PM +0100, Ivan Voras wrote:
> Jeremy Chadwick wrote:
>
> >I can teach you how to decode/read SMART statistics correctly.
> >
>

> Actually, it would be good if you taught more than him :)
>
> I've always wondered how important each of the dozen or so
> statistics is and what indicates what...

I started a write-up but after writing about 300 lines realised that if
I continued the details would eventually be lost in the Sea of
Information Chaos that is a mailing list. :-) I've gone over how to
read SMART data 3 separate times in the past 2 months (at work, on a
public forum, and in private mail), so this would be the 4th...

I'll work on writing an actual HTML document to put up on my web site
and will respond with the URL once I finish it.

Sorry for the "yeah sure I can help you read this data" response
followed by what will probably be labelled as an excuse by some.
Admittedly reading the output is pretty simple, but "getting familiar"
with what the output looks like (on a per-vendor basis) takes exposure
to all sorts of drives, ditto with F/W bugs and so on.

In general though, don't let anyone tell you SMART is worthless. The
"overall health assessment" status is generally worthless, but the
per-attribute data is of great use. Don't let anyone tell you the
weighted/adjusted values (VALUE/WORST/THRESH) are useless either; in
some cases they're all you can safely rely on. Don't damn SMART when
it's actually the manufacturers which need to be spanked for setting
such unreasonable health failure thresholds.

Rick C. Petty

Nov 12, 2009, 4:33:20 PM

On Thu, Nov 12, 2009 at 09:44:28AM -0800, Jeremy Chadwick wrote:
> On Thu, Nov 12, 2009 at 01:25:12PM +0100, Ivan Voras wrote:
> > Jeremy Chadwick wrote:
> >
> > >I can teach you how to decode/read SMART statistics correctly.
> >
> > Actually, it would be good if you taught more than him :)
> >
> > I've always wondered how important each of the dozen or so
> > statistics is and what indicates what...
>

> I'll work on writing an actual HTML document to put up on my web site
> and will respond with the URL once I finish it.

Isn't this sufficient?
http://en.wikipedia.org/wiki/S.M.A.R.T.#Known_ATA_S.M.A.R.T._attributes

If not, could you make the changes on wikipedia? This isn't a
FreeBSD-specific topic, and the larger community would benefit from such
documentation.

-- Rick C. Petty

Hans F. Nordhaug

Nov 12, 2009, 5:50:17 PM

* Jeremy Chadwick <fre...@jdc.parodius.com> [2009-11-12]:
> On Thu, Nov 12, 2009 at 11:33:08AM +0100, Hans F. Nordhaug wrote:
> > Suddenly /bin/sh started to crash all the time with core dumps. I'm
> > running FreeBSD 7.2-RELEASE-p4 (i386) and I have not updated anything
> > lately. The /bin/sh binary seems to be untouched. It might be some
> > hardware trouble, but the machine seems to run OK now. (I had to
> > replace /bin/sh with a symlink to /rescue/sh.)
> >
> > I would like to track down the problem, but running sh I only get
> > "Segmentation fault: 11 (core dumped)". I would be happy to run
> > gdb and give you a backtrace. Any clues?
> >
> > PS! I tried to run "freebsd-update IDS" to see if any files are
> > broken, but it stops at
> > Inspecting system... sha256: ///boot/kernel/utopia.ko.symbols: Input/output error
>
> Hardware problem. Take your pick: bad RAM, bad hard disk, bad
> motherboard, bad PSU, bad cabling.
>
> You can rule out hard disk problems by installing smartmontools from
> ports (sysutils/smartmontools). Please provide output from the
> following command:
>
> smartctl -a /dev/{disk}

Thx for the info about smartmontools. The only problem I found was:

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

190 Airflow_Temperature_Cel 0x0022 001 001 045 Old_age Always FAILING_NOW 253

Don't know if this is a serious problem.

Hans

PS! The disk is of type
Model Family: Western Digital Caviar Second Generation Serial ATA family
Device Model: WDC WD2500JS-55NCB1
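
For anyone who wants to scan such output with a script instead of by eye, here is a minimal sketch that splits one attribute row into named fields. It assumes the ten-column layout shown in the header line above, in which only RAW_VALUE may contain spaces; `SmartAttribute` and `parse_attribute_row` are hypothetical names, not part of smartmontools.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    id: int
    name: str
    flag: int
    value: int
    worst: int
    thresh: int
    type: str
    updated: str
    when_failed: str
    raw_value: str


def parse_attribute_row(line: str) -> SmartAttribute:
    """Parse one row of the attribute table into a SmartAttribute."""
    # Split on whitespace at most nine times so that a RAW_VALUE such as
    # "42 (Lifetime Min/Max 37/44)" stays together in the last field.
    parts = line.split(None, 9)
    if len(parts) != 10:
        raise ValueError(f"not an attribute row: {line!r}")
    ident, name, flag, value, worst, thresh, type_, updated, failed, raw = parts
    return SmartAttribute(
        int(ident), name, int(flag, 16), int(value), int(worst),
        int(thresh), type_, updated, failed, raw.strip(),
    )


row = "190 Airflow_Temperature_Cel 0x0022 001 001 045 Old_age Always FAILING_NOW 253"
attr = parse_attribute_row(row)
if attr.when_failed != "-":
    print(f"{attr.name}: {attr.when_failed} "
          f"(value {attr.value}, threshold {attr.thresh}, raw {attr.raw_value})")
```

A WHEN_FAILED column of "-" means nothing has been flagged for that attribute; anything else, as in the row above, is worth a closer look.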