Recommended hard drive temperature

Franc Zabkar

Apr 16, 2008, 3:42:36 AM

I've been reading this document which is an analysis of Google's hard
disc failure rates:

Failure Trends in a Large Disk Drive Population:
http://research.google.com/archive/disk_failures.pdf

It states that "contrary to previously reported results, we found very
little correlation between failure rates and either elevated
temperature or activity levels."

Figure 4 "shows that failures do not increase when the average
temperature increases. In fact, there is a clear trend showing that
lower temperatures are associated with higher failure rates. Only at
very high temperatures is there a slight reversal of this trend."

"Figure 5 looks at the average temperatures for different age groups.
The distributions are in sync with Figure 4 showing a mostly flat
failure rate at mid-range temperatures and a modest increase at the
low end of the temperature distribution. What stands out are the 3 and
4-year old drives, where the trend for higher failures with higher
temperature is much more constant and also more pronounced."

"Overall our experiments can confirm previously reported temperature
effects only for the high end of our temperature range and especially
for older drives. In the lower and middle temperature ranges, higher
temperatures are not associated with higher failure rates."

Figure 5 suggests that Google's optimum temperature for hard drives is
between 35C and 40C.

Elsewhere I found this old IBM article:
http://web.archive.org/web/20000519230551/http://www.storage.ibm.com/hardsoft/diskdrdl/technolo/drivetemp/drivetemp.htm

It states that "figure 2 shows the dramatic effect that temperature
has on the overall reliability of a hard disk drive. Derivations [sic]
from a nominal operating temperature (assumed to be maintained over
the life of a drive) can result in a derivation [sic] from the nominal
failure rate. As the temperature exceeds the recommended level, the
failure rate increases two to three percent for every one degree rise
above it. For example, a hard disk drive running for an extended
period of time at five degrees above the recommended temperature can
experience an increase in failure rate of 10 to 15 percent. Likewise,
operating a drive below the recommended temperature can extend drive
life."

This last statement is a bit ambiguous. If a hard drive is more
reliable at a temperature below that which is recommended, then why
not recommend a lower temperature in the first place? Then again,
maybe the author's intended meaning was "recommended maximum
temperature".
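
As a quick check on the IBM figures, here is a minimal Python sketch of the rule as quoted. It assumes a simple linear reading (each degree above the recommended level adds a fixed percentage of the nominal failure rate), which is the reading that reproduces the article's own "10 to 15 percent at five degrees" example. The function name is made up for illustration, and the article gives no numbers for operation below the recommended temperature, so the sketch does not model that case.

```python
def relative_failure_rate(degrees_above, pct_per_degree):
    """Failure rate relative to nominal (1.0 = nominal).

    Linear reading of the quoted IBM rule: every degree above the
    recommended temperature adds pct_per_degree percent of the
    nominal failure rate. This is an assumption, not IBM's model.
    """
    if degrees_above <= 0:
        # The article quantifies nothing below the recommended level.
        return 1.0
    return 1.0 + degrees_above * pct_per_degree / 100.0


low = relative_failure_rate(5, 2)   # 2 percent per degree
high = relative_failure_rate(5, 3)  # 3 percent per degree
print(f"5 degrees above: {low:.2f}x to {high:.2f}x nominal")
```

A compounding reading (1.02 to 1.03 raised to the fifth power) would give slightly more than 10 to 15 percent, so the linear form appears to be what the article intends.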

- Franc Zabkar
--
Please remove one 'i' from my address when replying by email.

Arno Wagner

Apr 16, 2008, 8:20:06 AM

Previously Franc Zabkar <fza...@iinternode.on.net> wrote:
> I've been reading this document which is an analysis of Google's hard
> disc failure rates:

[...]

If you can keep your HDDs below around 40C or so, then you will
run them under data-center conditions. These conditions are what
the Google study is about. An example from my personal experience
is with Maxtor disks. They had direct outside airflow and stayed
<30C under load and at 22C when idle. No failures in 3 years for
about 50 disks. These were the same Maxtors known to die fast when
run hot (e.g. at 50-60C).

Conditions in a typical PC are different. The HDDs are often
not directly cooled with outside air and can get hot under load.
If you have temperature spikes in the 50C range or higher,
temperature is a major factor in HDD death. How major exactly is
currently unknown or only known to the manufacturers. Most drives
have a 55C stated maximum temperature. The Maxtors I mention above
had a statement in their product manual that up to 60C the drive
failure rate would not increase, despite a 55C maximum temperature.
There is reason to believe that statement was over-optimistic or
a plain lie. So don't expect the HDD manufacturers to tell you
about high-temperature life expectancy.

Bottom line, the Google study shows that if you can get the drives
consistently down to below 40C, temperature does not matter a lot.
So the recommendation would be to have your drives (under load,
on a hot day) below 40C at all times. Note that this also applies
to external enclosures.
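
One way to check the 40C rule of thumb is to read the drive temperature from the SMART attribute table. The sketch below assumes the column layout that smartmontools' `smartctl -A` usually prints, with the temperature in attribute 194 and the raw value in the tenth column. Not every drive reports attribute 194, the sample text is made up, and the function names are only illustrative.

```python
# Made-up sample in the layout usually printed by `smartctl -A`.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   037   045   000    Old_age   Always       -       37 (Min/Max 20/45)
"""


def drive_temperature(attribute_table):
    """Return the raw value of attribute 194 in degrees C, or None if absent."""
    for line in attribute_table.splitlines():
        fields = line.split()
        # Assumed layout: ID in column 1, raw value starting in column 10.
        if len(fields) >= 10 and fields[0] == "194":
            return int(fields[9])
    return None


def too_hot(attribute_table, limit_c=40):
    """True if the drive reports a temperature at or above limit_c."""
    temp = drive_temperature(attribute_table)
    return temp is not None and temp >= limit_c


print(drive_temperature(SAMPLE), too_hot(SAMPLE))
```

To follow the advice above, the reading would have to be taken under load on a hot day, not at idle.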

Arno

Franc Zabkar

Apr 16, 2008, 4:11:44 PM

On 16 Apr 2008 12:20:06 GMT, Arno Wagner <m...@privacy.net> put finger
to keyboard and composed:

>Bottom line, the Google study shows that if you can get the drives
>consistently down to below 40C, temperature does not matter a lot.
>So the recommendation would be to have your drives (under load,
>on a hot day) below 40C at all times. Note that this also applies
>to external enclosures.
>
>Arno

AFAICS, the Google study conclusively shows that failure rates also
increase when temperatures drop below 35C. In fact lower temps appear
to be more dangerous than slightly higher temps, except when the drive
is getting old, in which case higher temps start to become
significant.

Arno Wagner

Apr 16, 2008, 6:10:18 PM

Previously Franc Zabkar <fza...@iinternode.on.net> wrote:

> On 16 Apr 2008 12:20:06 GMT, Arno Wagner <m...@privacy.net> put finger
> to keyboard and composed:

>>Bottom line, the Google study shows that if you can get the drives
>>consistently down to below 40C, temperature does not matter a lot.
>>So the recommendation would be to have your drives (under load,
>>on a hot day) below 40C at all times. Note that this also applies
>>to external enclosures.
>>
>>Arno
>
> AFAICS, the Google study conclusively shows that failure rates also
> increase when temperatures drop below 35C. In fact lower temps appear
> to be more dangerous than slightly higher temps, except when the drive
> is getting old, in which case higher temps start to become
> significant.

Don't read too much into it. AFAIR they did not separate by
manufacturer, model and manufacturing date. It is quite possible that
the drives running at lower temperatures were actually from a batch
that had a lower life expectancy from the start and stayed at lower
temperatures because of different cooling characteristics, i.e. there
may well be a systematic error in the measurements.
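
To illustrate the kind of systematic error meant here, the following Python sketch uses entirely invented numbers (they are not taken from the Google paper). Model "A" is assumed to fail at 8% per year and to run mostly cool. Model "B" is assumed to fail at 2% per year and to run mostly warm. Within each model, temperature makes no difference at all, yet the pooled figures make cool drives look far worse.

```python
# (model, temperature bin) -> (drives, failures per year).
# All numbers are hypothetical and chosen only to show the effect.
population = {
    ("A", "cool"): (900, 72),
    ("A", "warm"): (100, 8),
    ("B", "cool"): (100, 2),
    ("B", "warm"): (900, 18),
}


def afr_by_bin(data):
    """Annualized failure rate per temperature bin, ignoring the model."""
    totals = {}
    for (_, temp_bin), (drives, failures) in data.items():
        d, f = totals.get(temp_bin, (0, 0))
        totals[temp_bin] = (d + drives, f + failures)
    return {temp_bin: f / d for temp_bin, (d, f) in totals.items()}


def afr_by_model_and_bin(data):
    """Annualized failure rate for each (model, temperature bin) cell."""
    return {key: failures / drives for key, (drives, failures) in data.items()}


pooled = afr_by_bin(population)
per_model = afr_by_model_and_bin(population)
print(pooled)      # cool looks much worse than warm
print(per_model)   # but each model has the same rate in both bins
```

Whether anything like this happened in the study cannot be decided from the published figures, since the per-model breakdown was withheld.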

Arno

Franc Zabkar

Apr 17, 2008, 2:09:30 AM

On 16 Apr 2008 22:10:18 GMT, Arno Wagner <m...@privacy.net> put finger
to keyboard and composed:

The way I read it, the reliability-versus-temperature result was found
to be consistent across all models and manufacturers.

==================================================================
Failure rates are known to be highly correlated with drive models,
manufacturers and vintages. Our results do not contradict this fact.
For example, Figure 2 [Annualized failure rates broken down by age
groups] changes significantly when we normalize failure rates per each
drive model. Most age-related results are impacted by drive vintages.
However, in this paper, we do not show a breakdown of drives per
manufacturer, model, or vintage due to the proprietary nature of these
data.

Interestingly, this does not change our conclusions. In contrast to
age-related results, we note that all results shown in the rest of the
paper are not affected significantly by the population mix.

==================================================================
The data in this study are collected from a large number of disk
drives, deployed in several types of systems across all of Google’s
services. More than one hundred thousand disk drives were used for all
the results presented here. The disks are a combination of serial and
parallel ATA consumer-grade hard disk drives, ranging in speed from
5400 to 7200 rpm, and in size from 80 to 400 GB. All units in this
study were put into production in or after 2001. The population
contains several models from many of the largest disk drive
manufacturers and from at least nine different models.

==================================================================

Arno Wagner

Apr 17, 2008, 9:22:52 AM

Indeed. But did they have all models and all manufacturers
at all temperatures?

> ==================================================================
> Failure rates are known to be highly correlated with drive models,
> manufacturers and vintages. Our results do not contradict this fact.
> For example, Figure 2 [Annualized failure rates broken down by age
> groups] changes significantly when we normalize failure rates per each
> drive model. Most age-related results are impacted by drive vintages.
> However, in this paper, we do not show a breakdown of drives per
> manufacturer, model, or vintage due to the proprietary nature of these
> data.

> Interestingly, this does not change our conclusions. In contrast to
> age-related results, we note that all results shown in the rest of the
> paper are not affected significantly by the population mix.

> ==================================================================
> The data in this study are collected from a large number of disk
> drives, deployed in several types of systems across all of Google’s
> services. More than one hundred thousand disk drives were used for all
> the results presented here. The disks are a combination of serial and
> parallel ATA consumer-grade hard disk drives, ranging in speed from
> 5400 to 7200 rpm, and in size from 80 to 400 GB. All units in this
> study were put into production in or after 2001. The population
> contains several models from many of the largest disk drive
> manufacturers and from at least nine different models.

> ==================================================================

Hmm, I have to look at the paper again. This smells rather
strongly of a methodological error.

Ok, I have it now. I think you refer to figure 5: "AFR for average
drive temperature". This one seems to indicate slightly higher failure
rates for the 15...30C window than for the others in drives younger
than 3 years. If you consult figure 4, you see that temperature
extremes are rare. Then there is one thing: Partially defective drives
work slower or not at all. This may result in lower drive temperatures
(spin down, refusal to execute access) and higher drive temperatures
(lots and lots of retries, heat from bearings). This can
significantly skew the results. The basic result could be that
failing drives run hotter or colder than others. I am also missing
more break-downs into different temperature profiles (e.g. mainly
constant, strong variation, etc.), as it is possible, for example, that the
problem in the low temp section is due to cycling temperatures.

I am not saying the results are wrong, but they are suspicious and
with the data given are _very_ difficult to even understand
properly. It does not seem any statistics expert was consulted by the
writers and the temperature results are by far the weakest in the
paper. I also miss a proof or at least conclusive argument that the
remaining observations are temperature independent, both for absolute
value and different change profiles.

The paper is still very valuable. Figures 7-10 give solid results, and
need no further details. Scanning your disks every 2 weeks or so and
monitoring reallocation counts is a very good idea (and something I
have been doing for several years now). The folks at Google likely
also found that the SMART status alone is typically over-optimistic.
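
The periodic check described above can be sketched in Python as a comparison of two snapshots of the reallocated-sector count (SMART attribute 5 on most drives). How the counts are collected is left out. The drive names and numbers below are made up, and the function name is only illustrative.

```python
def reallocation_growth(previous, current):
    """Return {drive: increase} for drives whose reallocated-sector
    count rose between two scans. Drives missing from the previous
    scan are skipped because there is nothing to compare against."""
    growth = {}
    for drive, count in current.items():
        before = previous.get(drive)
        if before is not None and count > before:
            growth[drive] = count - before
    return growth


# Hypothetical raw counts from two scans taken two weeks apart.
last_scan = {"sda": 0, "sdb": 12, "sdc": 3}
this_scan = {"sda": 0, "sdb": 19, "sdc": 3, "sdd": 0}
print(reallocation_growth(last_scan, this_scan))  # {'sdb': 7}
```

Watching the trend instead of the overall SMART status is the point: a count that keeps climbing is a warning even while the drive still reports itself as healthy.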

As to many failures not being predicted by SMART data, my results
are different. It is possible that the drive selection here again
skewed the picture compared to modern drives. Personally I have had
100% prediction by SMART attributes (not SMART status though) in
an admittedly small population of about 50 drives over three
years and with mostly Maxtors that are known to fail gradually.

Arno

Franc Zabkar

Apr 17, 2008, 4:16:35 PM

On 17 Apr 2008 13:22:52 GMT, Arno Wagner <m...@privacy.net> put finger
to keyboard and composed:

I would expect that Google would identify a partially defective drive
(assuming it was detected by SMART) and eventually take it out of
service. Certainly, if the drive does not work at all, then by
definition it must be totally, not partially, defective. Having said
that, the article doesn't really give a satisfactory definition of
failure other than to say that it is the reason that a drive is
replaced. <shrug>

As for spin problems, the article states ...

"Spin Retries. Counts the number of retries when the drive is
attempting to spin up. We did not register a single count within our
entire population."

>The basic result could be that
>failing drives run hotter or colder than others. I am also missing
>more break-downs into different temperature profiles (e.g. mainly
>constant, strong variation, etc.), as it is possible, for example, that the
>problem in the low temp section is due to cycling temperatures.

The article states ...

"As is common in server-class deployments, the disks were powered on,
spinning, and generally in service for essentially all of their
recorded life. They were deployed in rack-mounted servers and housed
in professionally managed datacenter facilities."

I think that would discount your temperature cycling hypothesis.

>I am not saying the results are wrong, but they are suspicious and
>with the data given are _very_ difficult to even understand
>properly. It does not seem any statistics expert was consulted by the
>writers and the temperature results are by far the weakest in the
>paper. I also miss a proof or at least conclusive argument that the
>remaining observations are temperature independent, both for absolute
>value and different change profiles.
>
>The paper is still very valuable. Figures 7-10 give solid results, and
>need no further details. Scanning your disks every 2 weeks or so and
>monitoring reallocation counts is a very good idea (and something I
>have been doing for several years now). The folks at Google likely
>also found that the SMART status alone is typically over-optimistic.

>As to many failures not being predicted by SMART data, my results
>are different. It is possible that the drive selection here again
>skewed the picture compared to modern drives. Personally I have had
>100% prediction by SMART attributes (not SMART status though) in
>an admittedly small population of about 50 drives over three
>years and with mostly Maxtors that are known to fail gradually.
>
>Arno

With respect, I prefer to accept Google's experience.

"It is difficult to add temperature to this analysis since despite it
being reported as part of SMART there are no crisp thresholds that
directly indicate errors. However, if we arbitrarily assume that
spending more than 50% of the observed time above 40C is an indication
of possible problem, and add those drives to the set of predictable
failures, we still are left with about 36% of all drives with no
failure signals at all."
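
The arbitrary criterion in that quote (more than 50% of the observed time above 40C) is easy to state in code. This Python sketch assumes evenly spaced temperature samples, so that the share of samples stands in for the share of time. The readings are made up and the function names are only illustrative.

```python
def fraction_above(samples_c, threshold_c=40):
    """Share of temperature samples strictly above threshold_c."""
    if not samples_c:
        raise ValueError("no temperature samples")
    hot = sum(1 for t in samples_c if t > threshold_c)
    return hot / len(samples_c)


def flagged_as_hot(samples_c, threshold_c=40, min_fraction=0.5):
    """True if the drive spent more than min_fraction of its samples
    above threshold_c, as in the criterion quoted from the paper."""
    return fraction_above(samples_c, threshold_c) > min_fraction


readings = [36, 38, 41, 42, 43, 39, 44, 45]  # hypothetical samples, degrees C
print(fraction_above(readings), flagged_as_hot(readings))
```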

I notice also that Google have an interesting observation regarding
seek errors.

"When examining our population, we find that seek errors are
widespread within drives of one manufacturer only, while others are
more conservative in showing this kind of errors. For this one
manufacturer, the trend in seek errors is not clear, changing from one
vintage to another. For other manufacturers, there is no correlation
between failure rates and seek errors."

I wonder if the abovementioned manufacturer is Seagate. IME, when
Seagate drives report a "seek error rate", they are actually reporting
a seek count.

Arno Wagner

Apr 17, 2008, 4:56:34 PM

Previously Franc Zabkar <fza...@iinternode.on.net> wrote:
> On 17 Apr 2008 13:22:52 GMT, Arno Wagner <m...@privacy.net> put finger
> to keyboard and composed:

[...]

>>Ok, I have it now. I think you refer to figure 5: "AFR for average
>>drive temperature". This one seems to indicate slightly higher failure
>>rates for the 15...30C window than for the others in drives younger
>>than 3 years. If you consult figure 4, you see that temperature
>>extremes are rare. Then there is one thing: Partially defective drives
>>work slower or not at all. This may result in lower drive temperatures
>>(spin down, refusal to execute access) and higher drive temperatures
>>(lots and lots of retries, heat from bearings). This can
>>significantly skew the results.

> I would expect that Google would identify a partially defective drive
> (assuming it was detected by SMART) and eventually take it out of
> service. Certainly, if the drive does not work at all, then by
> definition it must be totally, not partially, defective. Having said
> that, the article doesn't really give a satisfactory definition of
> failure other than to say that it is the reason that a drive is
> replaced. <shrug>

Problem is also that the failure time (according to the article)
was the replacement time. I have heard the chief Google technology
guy speak about this and he stated something like "every few months
defectives are repaired". There can be a long time between
failure and replacement.

> As for spin problems, the article states ...

> "Spin Retries. Counts the number of retries when the drive is
> attempting to spin up. We did not register a single count within our
> entire population."

That may just mean that no drive managed to get spun-up
at all after the first try failed. Or the attribute is unused.

>>The basic result could be that
>>failing drives run hotter or colder than others. I am also missing
>>more break-downs into different temperature profiles (e.g. mainly
>>constant, strong variation, etc.), as it is possible, for example, that the
>>problem in the low temp section is due to cycling temperatures.

> The article states ...

> "As is common in server-class deployments, the disks were powered on,
> spinning, and generally in service for essentially all of their
> recorded life. They were deployed in rack-mounted servers and housed
> in professionally managed datacenter facilities."

> I think that would discount your temperature cycling hypothesis.

Not at all. The very fact that disks managed to get to high
temperatures means that temperature cycles are possible.

This does not counter my argument. It just states that there are
at least 36% failures that are not temperature related. And the
criterion is, as noted, quite arbitrary. The authors are speculating here
about whether temperature above 40C is the killer when observed more
than 50% of the time. It is not in their environment. This does not
surprise me at all.

Also note that there is no "Google's experience" in the paper.
This is "observations in a specific environment by three people
with Google" and certainly the observations are not well
documented with regard to temperature. On the other hand, an air
conditioned data center and only two years of observation is not
enough to answer that question conclusively.

> I notice also that Google have an interesting observation regarding
> seek errors.

> "When examining our population, we find that seek errors are
> widespread within drives of one manufacturer only, while others are
> more conservative in showing this kind of errors. For this one
> manufacturer, the trend in seek errors is not clear, changing from one
> vintage to another. For other manufacturers, there is no correlation
> between failure rates and seek errors."

> I wonder if the abovementioned manufacturer is Seagate. IME, when
> Seagate drives report a "seek error rate", they are actually reporting
> a seek count.

Quite frankly this shows that the authors have not a lot of
experience with SMART data. Seek errors are due to modern drives
starting reading before the heads have settled. This usually works,
but when it does not work it becomes a seek error. Some
manufacturers list these in the SMART data, others do not. The
number seen does not mean much, which is well known to people
that work a lot with SMART data.

Arno

Folkert Rienstra

Apr 17, 2008, 6:51:53 PM

Franc Zabkar wrote in news:267f045ortncn8j0d...@4ax.com

> On 17 Apr 2008 13:22:52 GMT, Arno Wagner <m...@privacy.net> put finger to keyboard and composed:
> > Previously Franc Zabkar <fza...@iinternode.on.net> wrote:
> > > On 16 Apr 2008 22:10:18 GMT, Arno Wagner <m...@privacy.net> put finger to keyboard and composed:
> > > > Previously Franc Zabkar <fza...@iinternode.on.net> wrote:
> > > > > On 16 Apr 2008 12:20:06 GMT, Arno Wagner <m...@privacy.net> put finger to keyboard and composed:
> > > >

[awful big snip]

>
> I notice also that Google have an interesting observation regarding
> seek errors.
>
> "When examining our population, we find that seek errors are
> widespread within drives of one manufacturer only, while others are
> more conservative in showing this kind of errors. For this one
> manufacturer, the trend in seek errors is not clear, changing from one
> vintage to another. For other manufacturers, there is no correlation
> between failure rates and seek errors."

> I wonder if the abovementioned manufacturer is Seagate. IME, when
> Seagate drives report a "seek error rate", they are actually reporting
> a seek count.

What else did you think 'rate' meant.

>
> - Franc Zabkar

Folkert Rienstra

Apr 17, 2008, 6:53:02 PM

Arno Wagner wrote in news:66prs2F...@mid.individual.net

> Previously Franc Zabkar <fza...@iinternode.on.net> wrote:
> > On 17 Apr 2008 13:22:52 GMT, Arno Wagner <m...@privacy.net> put finger
> > to keyboard and composed:
> [...]

[awful big snip]

>
> Quite frankly this shows that the authors have not a lot of
> experience with SMART data.

> Seek errors are due to modern drives starting reading before the
> heads have settled.

Babblebot, clueless as always.

A seek error is a failure to find the addressed track.
The drive has a full rev. to determine that it is on the correct track.
It won't start to read user data until it has determined that it is
on the right track and in the right rotational position.
Also, there is no such time that the drive is *not* reading as it is
reading the servo data all the time. If the drive determines that
it is on the correct track then obviously the heads have settled.

> This usually works, but when it does not work it becomes a seek error.

Nope, it becomes a read error.

> Some manufacturers list these in the SMART data, others do not.

A seek error is a seek error, and that's that.

> The number seen does not mean much, which is well known to people
> that work a lot with SMART data.

Right, so obviously this should not be mentioned as an observation.
Babblebot, S.M.A.R.T. as ever.

>
> Arno

Folkert Rienstra

Apr 17, 2008, 6:54:34 PM

Franc Zabkar wrote in news:gt9b045tqpk3gbj6i...@4ax.com

> I've been reading this document which is an analysis of Google's hard
> disc failure rates:
>
> Failure Trends in a Large Disk Drive Population:
> http://research.google.com/archive/disk_failures.pdf
>
> It states that "contrary to previously reported results, we found very
> little correlation between failure rates and either elevated
> temperature or activity levels."
>
> Figure 4 "shows that failures do not increase when the average
> temperature increases. In fact, there is a clear trend showing that
> lower temperatures are associated with higher failure rates. Only at
> very high temperatures is there a slight reversal of this trend."
>
> "Figure 5 looks at the average temperatures for different age groups.
> The distributions are in sync with Figure 4 showing a mostly flat
> failure rate at mid-range temperatures and a modest increase at the
> low end of the temperature distribution.

> What stands out are the 3 and 4-year old drives, where the trend for
> higher failures with higher temperature is

> much more constant

Presumably they mean the bathtub figures look like copies of each other.

> and also more pronounced."

What I find much more interesting is the trend reversal from 3rd to 4th
year, while maintaining equal relation between AFR and temperature ranges.
Presumably the weaker brothers fall out of the mix and the rest just lives on
happily.

>
> "Overall our experiments can confirm previously reported temperature
> effects only for the high end of our temperature range and especially
> for older drives. In the lower and middle temperature ranges, higher
> temperatures are not associated with higher failure rates."
>
> Figure 5 suggests that Google's optimum temperature for hard drives is
> between 35C and 40C.
>
> Elsewhere I found this old IBM article:
> http://web.archive.org/web/20000519230551/http://www.storage.ibm.com/hardsoft/diskdrdl/technolo/drivetemp/drivetemp.htm
>
> It states that "figure 2 shows the dramatic effect that temperature
> has on the overall

> *reliability*

> of a hard disk drive. Derivations [sic] from a nominal operating
> temperature (assumed to be maintained over the life of a drive)
> can result in a derivation [sic] from the nominal

> failure *rate*.

Hey, there is that favourite word of yours again.

> As the temperature exceeds the recommended level, the
> failure rate increases two to three percent for every one degree rise
> above it. For example, a hard disk drive running for an extended
> period of time at five degrees above the recommended temperature can
> experience an increase in

> failure *rate*

And again.

lars

Apr 20, 2008, 8:16:28 AM

In short, time well spent reading.
http://www.pdl.cmu.edu/PDL-FTP/Failure/failure-fast07_abs.html

Franc Zabkar

Apr 20, 2008, 5:50:24 PM

On Sun, 20 Apr 2008 14:16:28 +0200, lars <la...@hesdorf.dk> put finger
to keyboard and composed:

>In short, time well spent reading.
>http://www.pdl.cmu.edu/PDL-FTP/Failure/failure-fast07_abs.html

This document appears to be a statistical analysis of HD failures. It
doesn't attempt to delve into the technical reasons for failure. The
only time it discusses temperature, or SMART, is in reference to the
Google article in my OP.

Google's experience suggests to me that temperatures below about 35C
result in greater failure rates, which is contrary to normal
expectations. However, Arno appears to be saying that the lower temps
may be a consequence of failure rather than a cause.

Arno Wagner

Apr 20, 2008, 6:03:24 PM

Previously Franc Zabkar <fza...@iinternode.on.net> wrote:

> On Sun, 20 Apr 2008 14:16:28 +0200, lars <la...@hesdorf.dk> put finger
> to keyboard and composed:

>>In short, time well spent reading.
>>http://www.pdl.cmu.edu/PDL-FTP/Failure/failure-fast07_abs.html

> This document appears to be a statistical analysis of HD failures. It
> doesn't attempt to delve into the technical reasons for failure. The
> only time it discusses temperature, or SMART, is in reference to the
> Google article in my OP.

> Google's experience suggests to me that temperatures below about 35C
> result in greater failure rates, which is contrary to normal
> expectations. However, Arno appears to be saying that the lower temps
> may be a consequence of failure rather than a cause.

Exactly. It is possible, but the paper does not give us enough
data to determine whether it is the case. Also it runs contrary
to all known reliability characteristics of semiconductors,
other electronic components and mechanics.

Arno

Franc Zabkar

Apr 21, 2008, 1:09:22 AM

On 20 Apr 2008 22:03:24 GMT, Arno Wagner <m...@privacy.net> put finger
to keyboard and composed:

>Previously Franc Zabkar <fza...@iinternode.on.net> wrote:
>> On Sun, 20 Apr 2008 14:16:28 +0200, lars <la...@hesdorf.dk> put finger
>> to keyboard and composed:
>
>>>In short, time well spent reading.
>>>http://www.pdl.cmu.edu/PDL-FTP/Failure/failure-fast07_abs.html
>
>> This document appears to be a statistical analysis of HD failures. It
>> doesn't attempt to delve into the technical reasons for failure. The
>> only time it discusses temperature, or SMART, is in reference to the
>> Google article in my OP.
>
>> Google's experience suggests to me that temperatures below about 35C
>> result in greater failure rates, which is contrary to normal
>> expectations. However, Arno appears to be saying that the lower temps
>> may be a consequence of failure rather than a cause.
>
>Exactly. It is possible, but the paper does not give us enough
>data to determine whether it is the case. Also it runs contrary
>to all known reliability characteristics of semiconductors,
>other electronic components and mechanics.
>
>Arno

What about fluid dynamics? Maybe there is an optimal temperature for
the platter lubricant and/or air bearing.

I found this interesting Samsung patent whose inventors claim that
"flying height drops significantly in humid conditions" and that this
can be remedied "by increasing the temperature of the air flowing
between a slider's air bearing surface and the rotating disk surface
it accesses".

Method and Apparatus Reducing Flying Height Drop in a Hard Disk Drive
Under Humid Conditions:
http://tinyurl.com/4s5brl
http://www.freshpatents.com/Method-and-apparatus-reducing-flying-height-drop-in-a-hard-disk-drive-under-humid-conditions-dt20071227ptan20070297085.php

Franc Zabkar

Apr 21, 2008, 1:40:02 AM

On 17 Apr 2008 20:56:34 GMT, Arno Wagner <m...@privacy.net> put finger
to keyboard and composed:

>Seek errors are due to modern drives

>starting reading before the heads have settled. This usually works,
>but when it does not work it becomes a seek error. Some
>manufacturers list these in the SMART data, others do not. The
>number seen does not mean much, which is well known to people
>that work a lot with SMART data.
>
>Arno

I demonstrated elsewhere in another NG that in Seagate's case the
"seek error rate" figure is actually a count, not a rate, and it is a
count of the total number of seeks, not seek errors. I did this by
performing a zero fill operation on a 13GB drive and recording the
SMART "seek error rate" parameter before and after.

See ...

http://groups.google.com/group/microsoft.public.windowsxp.hardware/msg/2ac63d875bfaf0d4

... for my results.
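
The before-and-after comparison can be sketched in Python as follows. The bit layout is an assumption: community descriptions of Seagate raw values commonly say that the 48-bit raw "seek error rate" holds the seek error count in the upper 16 bits and the total seek count in the lower 32 bits. That layout is not taken from the post linked above or from any Seagate documentation, and the readings below are made up.

```python
def split_seagate_raw(raw):
    """Split a 48-bit raw value into (seek_errors, total_seeks).

    ASSUMED layout (not confirmed by Seagate): upper 16 bits hold the
    error count, lower 32 bits hold the total number of seeks.
    """
    return (raw >> 32) & 0xFFFF, raw & 0xFFFFFFFF


def seeks_during(raw_before, raw_after):
    """Seeks performed between two readings, tolerating 32-bit wrap."""
    _, seeks_before = split_seagate_raw(raw_before)
    _, seeks_after = split_seagate_raw(raw_after)
    return (seeks_after - seeks_before) % (1 << 32)


# Hypothetical raw readings taken before and after a zero fill.
before = 1_000_000
after = 1_250_000
print(split_seagate_raw(after), seeks_during(before, after))
```

If the assumption holds, a raw value that grows steadily during normal use only shows that the drive is seeking, not that it is failing.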

Arno Wagner

Apr 21, 2008, 5:39:54 AM

Previously Franc Zabkar <fza...@iinternode.on.net> wrote:

> See ...

> http://groups.google.com/group/microsoft.public.windowsxp.hardware/msg/2ac63d875bfaf0d4

> ... for my results.

Ah, yes. Bottom line, the "Seek Error" Attribute is pretty
meaningless, if you do not know the specific drive.

Arno

Arno Wagner

Apr 21, 2008, 5:48:45 AM

Previously Franc Zabkar <fza...@iinternode.on.net> wrote:
> On 20 Apr 2008 22:03:24 GMT, Arno Wagner <m...@privacy.net> put finger
> to keyboard and composed:

>>Previously Franc Zabkar <fza...@iinternode.on.net> wrote:
>>> On Sun, 20 Apr 2008 14:16:28 +0200, lars <la...@hesdorf.dk> put finger
>>> to keyboard and composed:
>>
>>>>In short, time well spent reading.
>>>>http://www.pdl.cmu.edu/PDL-FTP/Failure/failure-fast07_abs.html
>>
>>> This document appears to be a statistical analysis of HD failures. It
>>> doesn't attempt to delve into the technical reasons for failure. The
>>> only time it discusses temperature, or SMART, is in reference to the
>>> Google article in my OP.
>>
>>> Google's experience suggests to me that temperatures below about 35C
>>> result in greater failure rates, which is contrary to normal
>>> expectations. However, Arno appears to be saying that the lower temps
>>> may be a consequence of failure rather than a cause.
>>
>>Exactly. It is possible, but the paper does not give us enough
>>data to determine whether it is the case. Also it runs contrary
>>to all known reliability characteristics of semiconductors,
>>other electronic components and mechanics.
>>
>>Arno

> What about fluid dynamics? Maybe there is an optimal temperature for
> the platter lubricant and/or air bearing.

Possibly. Many drives in the Google study should actually
be pre-fluid bearing, if I remember correctly when they became
mainstream. Some would be FDBs, though, and maybe there is some
increased vibration effect or the like at lower temperatures.

Now what would be interesting is SMART status changes for the
drives that died at lower temperatures, compared to those that
died at other temperatures. Also temperature vs. FDB percentage
would be of interest and temperature vs. disk age would be too.
Also disk performance in the week before death vs. temperature
would be nice.
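
A rough idea of how the temperature vs. age breakdown could be computed, if per-drive records were available. This is only a sketch: the record layout (average temperature, age in years, failed flag) and the function name are made up for illustration, and none of it comes from the Google paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (avg_temp_c, age_years, failed)
Record = Tuple[float, int, bool]


def failure_rate_by_temp(records: Iterable[Record],
                         bin_width: float = 5.0) -> Dict[Tuple[int, float], float]:
    """Return the failure fraction per (age, start of temperature bin)."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for temp, age, failed in records:
        # Round the temperature down to the start of its bin.
        bin_start = (temp // bin_width) * bin_width
        key = (age, bin_start)
        totals[key] += 1
        if failed:
            failures[key] += 1
    return {key: failures[key] / totals[key] for key in totals}
```

Keeping age as part of the key matters, since a temperature effect that only shows up in older drives would be hidden by a breakdown on temperature alone.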

> I found this interesting Samsung patent whose inventors claim that
> "flying height drops significantly in humid conditions" and that this
> can be remedied "by increasing the temperature of the air flowing
> between a slider's air bearing surface and the rotating disk surface
> it accesses".

> Method and Apparatus Reducing Flying Height Drop in a Hard Disk Drive
> Under Humid Conditions:
> http://tinyurl.com/4s5brl
> http://www.freshpatents.com/Method-and-apparatus-reducing-flying-height-drop-in-a-hard-disk-drive-under-humid-conditions-dt20071227ptan20070297085.php

Interesting, I will have a look at their references! Not relevant
for data-center operation, however, since humidity is also strictly
regulated there.

Arno

Franc Zabkar

Apr 21, 2008, 4:51:07 PM

On 21 Apr 2008 09:48:45 GMT, Arno Wagner <m...@privacy.net> put finger
to keyboard and composed:

>Previously Franc Zabkar <fza...@iinternode.on.net> wrote:

>> What about fluid dynamics? Maybe there is an optimal temperature for
>> the platter lubricant and/or air bearing.
>
>Possibly. Many drives in the Google study should actually
>be pre-fluid bearing, if I remember correctly when they became
>mainstream. Some would be FDBs though, and maybe there is some
>increased vibration effect or the like at lower temperatures.

When I wrote "fluid dynamics", I was referring to the air flow under
the R/W head, ie the air bearing, not the motor bearing.

>Now what would be interesting is SMART status changes for the
>drives that died at lower temperatures, compared to those that
>died at other temperatures. Also temperature vs. FDB percentage
>would be of interest and temperature vs. disk age would be too.
>Also disk performance in the week before death vs. temperature
>would be nice.
>
>> I found this interesting Samsung patent whose inventors claim that
>> "flying height drops significantly in humid conditions" and that this
>> can be remedied "by increasing the temperature of the air flowing
>> between a slider's air bearing surface and the rotating disk surface
>> it accesses".
>
>> Method and Apparatus Reducing Flying Height Drop in a Hard Disk Drive
>> Under Humid Conditions:
>> http://tinyurl.com/4s5brl
>> http://www.freshpatents.com/Method-and-apparatus-reducing-flying-height-drop-in-a-hard-disk-drive-under-humid-conditions-dt20071227ptan20070297085.php
>
>Interesting, I will have a look at their references! Not relevant
>for data-center operation, however, since humidity is also strictly
>regulated there.
>
>Arno

Static electricity becomes an issue in low humidity environments. I
recall one site where the Control Data hard drive would log a "status
error" whenever the operator touched it. The solution was an
antistatic mat. At other sites I've seen humidifiers used to solve
this kind of problem. I would think that any datacenter with a
humidifier would encounter the issues addressed in the Samsung patent.

Arno Wagner

Apr 21, 2008, 7:04:10 PM

Previously Franc Zabkar <fza...@iinternode.on.net> wrote:
> On 21 Apr 2008 09:48:45 GMT, Arno Wagner <m...@privacy.net> put finger
> to keyboard and composed:

>>Previously Franc Zabkar <fza...@iinternode.on.net> wrote:

>>> What about fluid dynamics? Maybe there is an optimal temperature for
>>> the platter lubricant and/or air bearing.
>>
>>Possibly. Many drives in the Google study should actually
>>be pre-fluid bearing, if I remember correctly when they became
>>mainstream. Some would be FDBs though, and maybe there is some
>>increased vibration effect or the like at lower temperatures.

> When I wrote "fluid dynamics", I was referring to the air flow under
> the R/W head, ie the air bearing, not the motor bearing.

Ah. That would be a different type of dynamics, that, while having
some fluid properties, is not fluid dynamics.

I am just saying that very likely all HDDs in the study were
running with similar humidity, and therefore humidity will
not be a factor examined.

Arno

Folkert Rienstra

Apr 21, 2008, 6:25:29 PM

Arno Wagner wrote in news:6735naF...@mid.individual.net

> Previously Franc Zabkar <fza...@iinternode.on.net> wrote:
> > On 17 Apr 2008 20:56:34 GMT, Arno Wagner <m...@privacy.net> put finger to keyboard and composed:
>

[snip babbleshit]

> > >
> > > Arno

> > I demonstrated elsewhere in another NG that in Seagate's case the

Good for you, babblebot-2. The world's a whole different place now.

> > "seek error rate" figure is actually a count, not a rate,

Aah, but on other makes it is a rate then. How interesting.
What, in your expert opinion, would the purpose of such a 'rate'
(not a count, mind you) be?

> > and it is a count of the total number of seeks, not seek errors.

Bummer. Have you informed Seagate of their 'error'?

> > I did this by performing a zero fill operation on a 13GB drive and
> > recording the SMART "seek error rate" parameter before and after.
>
> > See ...
>
> > http://groups.google.com/group/microsoft.public.windowsxp.hardware/msg/2ac63d875bfaf0d4
>
> > ... for my results.
>
> Ah, yes. Bottom line, the "Seek Error" Attribute is pretty meaningless,

> if you do not know the specific drive.

Gee Babblebot, maybe there is something to the term "vendor specific", after all.

>
> Arno

Franc Zabkar

Apr 22, 2008, 2:12:42 AM

On 21 Apr 2008 23:04:10 GMT, Arno Wagner <m...@privacy.net> put finger
to keyboard and composed:

>Previously Franc Zabkar <fza...@iinternode.on.net> wrote:
>> On 21 Apr 2008 09:48:45 GMT, Arno Wagner <m...@privacy.net> put finger
>> to keyboard and composed:
>
>>>Previously Franc Zabkar <fza...@iinternode.on.net> wrote:
>
>>>> What about fluid dynamics? Maybe there is an optimal temperature for
>>>> the platter lubricant and/or air bearing.
>>>
>>>Possibly. Many drives in the Google study should actually
>>>be pre-fluid bearing, if I remember correctly when they became
>>>mainstream. Some would be FDBs though, and maybe there is some
>>>increased vibration effect or the like at lower temperatures.
>
>> When I wrote "fluid dynamics", I was referring to the air flow under
>> the R/W head, ie the air bearing, not the motor bearing.
>
>Ah. That would be a different type of dynamics, that, while having
>some fluid properties, is not fluid dynamics.

http://en.wikipedia.org/wiki/Fluid_dynamics

"Fluid dynamics is the sub-discipline of fluid mechanics dealing with
fluid flow: fluids (liquids and gases) in motion. It has several
subdisciplines itself, including aerodynamics (the study of gases in
motion) ... Fluid dynamics has a wide range of applications, including
calculating forces and moments on aircraft ..."

... and on flying disc heads?

Arno Wagner

Apr 22, 2008, 8:46:32 PM

Previously Franc Zabkar <fza...@iinternode.on.net> wrote:
> On 21 Apr 2008 23:04:10 GMT, Arno Wagner <m...@privacy.net> put finger
> to keyboard and composed:

>>Previously Franc Zabkar <fza...@iinternode.on.net> wrote:
>>> On 21 Apr 2008 09:48:45 GMT, Arno Wagner <m...@privacy.net> put finger
>>> to keyboard and composed:
>>
>>>>Previously Franc Zabkar <fza...@iinternode.on.net> wrote:
>>
>>>>> What about fluid dynamics? Maybe there is an optimal temperature for
>>>>> the platter lubricant and/or air bearing.
>>>>
>>>>Possibly. Many drives in the Google study should actually
>>>>be pre-fluid bearing, if I remember correctly when they became
>>>>mainstream. Some would be FDBs though, and maybe there is some
>>>>increased vibration effect or the like at lower temperatures.
>>
>>> When I wrote "fluid dynamics", I was referring to the air flow under
>>> the R/W head, ie the air bearing, not the motor bearing.
>>
>>Ah. That would be a different type of dynamics, that, while having
>>some fluid properties, is not fluid dynamics.

> http://en.wikipedia.org/wiki/Fluid_dynamics

> "Fluid dynamics is the sub-discipline of fluid mechanics dealing with
> fluid flow: fluids (liquids and gases) in motion. It has several
> subdisciplines itself, including aerodynamics (the study of gases in
> motion) ... Fluid dynamics has a wide range of applications, including
> calculating forces and moments on aircraft ..."

> ... and on flying disc heads?

Oh, ok. Different usage in my physics course (which was in
German), it seems.

Arno

Folkert Rienstra

Apr 23, 2008, 2:31:01 PM

Arno Wagner wrote in news:677f78F...@mid.individual.net

And, as we all know, German physics are different from
the rest of the world's physics. That explains it all.
Thanks babblebot, you nailed it, once again.
What would we do without you, eh? Imagine that.

>
> Arno

Franc Zabkar

Apr 23, 2008, 4:28:13 PM

On 17 Apr 2008 20:56:34 GMT, Arno Wagner <m...@privacy.net> put finger
to keyboard and composed:

>Seek errors are due to modern drives starting reading before the
>heads have settled. This usually works, but when it does not work
>it becomes a seek error.

I've been searching for references to support your statement but I
haven't had any luck. I know from personal experience that older
drives could be commanded to seek with a positive or negative track
offset. Positioning the head slightly off-track and attempting a read
was commonly done during low level formatting to test the integrity of
the data surface. Any track that failed would be taken out of service
and replaced with a spare. Servo offsets could also be used when
recovering data from marginal sectors.

I notice that some Seagate product manuals specify a lower number for
average seek time during reads as opposed to writes (8.5ms versus
10ms). Track-to-track seeks are also lower (0.8ms versus 1.0ms). I
don't know whether this reflects the operation of the drive's read
ahead cache or whether it supports your claim regarding a "preemptive"
read strategy.

Arno Wagner

Apr 24, 2008, 2:42:53 AM

Previously Franc Zabkar <fza...@iinternode.on.net> wrote:

> On 17 Apr 2008 20:56:34 GMT, Arno Wagner <m...@privacy.net> put finger
> to keyboard and composed:

>>Seek errors are due to modern drives
>>starting reading before the heads have settled. This usually works,
>>but when it does not work it becomes a seek error.

> I've been searching for references to support your statement but I
> haven't had any luck. I know from personal experience that older
> drives could be commanded to seek with a positive or negative track
> offset. Positioning the head slightly off-track and attempting a read
> was commonly done during low level formatting to test the integrity of
> the data surface. Any track that failed would be taken out of service
> and replaced with a spare. Servo offsets could also be used when
> recovering data from marginal sectors.

Actually I do not have a good reference myself. It is something
that accumulated when looking into modern HDD technology and
the possibilities of recovering overwritten data. Supporting
data is that for some intermediate HDD generations reads
were faster than writes by a notable margin. I do not remember
exactly whether I found an explicit statement about this behaviour,
or whether it is my conclusion from accumulated facts. So, strictly
speaking, this may only be a hypothesis, but it is a good one.

> I notice that some Seagate product manuals specify a lower number for
> average seek time during reads as opposed to writes (8.5ms versus
> 10ms). Track-to-track seeks are also lower (0.8ms versus 1.0ms).

Ah, so this is still going on.

> I don't know whether this reflects the operation of the drive's read
> ahead cache or whether it supports your claim regarding a "preemptive"
> read strategy.

Caches/buffers are not involved here. Otherwise they would
certainly also include them for writing, and write-buffers
can have a lot more impact than read-ahead.

Arno

Franc Zabkar

Apr 24, 2008, 5:46:12 AM

On 24 Apr 2008 06:42:53 GMT, Arno Wagner <m...@privacy.net> put finger
to keyboard and composed:

Yes, that makes sense. It seems, then, that your preemptive read
hypothesis is plausible.

Folkert Rienstra

Apr 24, 2008, 7:42:42 AM

Franc Zabkar wrote in news:0il0141ha4hojccg9...@4ax.com

Yes it is, just not what he thinks it is. And it's your naming, not his.

>
> - Franc Zabkar

Folkert Rienstra

Apr 24, 2008, 7:43:19 AM

Arno Wagner wrote in news:67aofdF...@mid.individual.net

> Previously Franc Zabkar <fza...@iinternode.on.net> wrote:
> > On 17 Apr 2008 20:56:34 GMT, Arno Wagner <m...@privacy.net> put finger
> > to keyboard and composed:
>

[Babbleshit removed]

>
> > I've been searching for references to support your statement but I
> > haven't had any luck. I know from personal experience that older
> > drives could be commanded to seek with a positive or negative track
> > offset. Positioning the head slightly off-track and attempting a read
> > was commonly done during low level formatting to test the integrity of
> > the data surface. Any track that failed would be taken out of service
> > and replaced with a spare. Servo offsets could also be used when
> > recovering data from marginal sectors.

> Actually I do not have a good reference myself.

Of course not. You are Babblebot, you don't need one.

> It is something that accumulated when looking into modern HDD
> technology and the possibilities of recovering overwritten data.
> Supporting data is that for some intermediate HDD generations
> reads were faster than writes by a notable margin. I do not remember
> exactly whether I found an explicit statement about this behaviour,
> or whether it is my conclusion from accumulated facts. So, strictly
> speaking, this may only be a hypothesis,

> but it is a good one.

No, it's nonsense.

>
> > I notice that some Seagate product manuals specify a lower number for
> > average seek time during reads as opposed to writes (8.5ms versus
> > 10ms). Track-to-track seeks are also lower (0.8ms versus 1.0ms).

> Ah, so this is still going on.

Like the physics changed over time, you babblebot moron.

>
> > I don't know whether this reflects the operation of the drive's read
> > ahead cache or whether it supports your claim regarding a "preemptive"
> > read strategy.
>
> Caches/buffers are not involved here. Otherwise they would
> certainly also include them for writing,

> and write-buffers can have a lot more impact than read-ahead.

Actually, it would reduce the seektime to zero, zip, nada.

>
> Arno

Folkert Rienstra

Apr 22, 2008, 6:43:54 PM

Franc Zabkar wrote in news:jq6v04pghj6ilcu18...@4ax.com

> On 17 Apr 2008 20:56:34 GMT, Arno Wagner <m...@privacy.net> put finger
> to keyboard and composed:
>
> > Seek errors are due to modern drives starting reading before the
> > heads have settled. This usually works, but when it does not work it
> > becomes a seek error.

> I've been searching for references to support your statement but I
> haven't had any luck.

Gee, there's a big surprise.

> I know from personal experience

Personal experience, no less.

> that older drives could be commanded to seek with a positive or negative
> track offset.
> Positioning the head slightly off-track and attempting a read was commonly
> done during low level formatting to test the integrity of the data surface.

Right, and it was you 'personally' that held the heads off track.

> Any track that failed would be taken out of service
> and replaced with a spare. Servo offsets could also be used when
> recovering data from marginal sectors.
>
> I notice that some Seagate product manuals specify a lower number

> for average seek time during reads as opposed to writes (8.5ms versus
> 10ms). Track-to-track seeks are also lower (0.8ms versus 1.0ms).

One reason could be that this is an average, and the average is raised
by just some writes having to wait another full rev. That happens when
the sector is near but there are not enough track marks on the track for
the servo system to have read (just) one before the sector arrives, so it
can't be certain that it is on the correct track.
On a read the drive can read the sector, confirm the track number
afterwards, and let it go through (or not) depending on the outcome. On
a write that's not possible, as a write is destructive and the drive doesn't
want the wrong sector overwritten, so it lets the opportunity pass and
does another rev.

Nothing to do with the heads "having settled" and poor
reads that make or don't make it through error correction.
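
As a back-of-the-envelope check on the extra-revolution explanation, the sketch below works out what fraction of writes would have to wait one more revolution to account for the read/write gap in the quoted seek times. The 7200 rpm spindle speed is an assumption (the drive's actual speed is not given in this thread), and the function name is made up.

```python
def extra_rev_fraction(read_ms: float, write_ms: float, rpm: float) -> float:
    """Fraction of writes that would need one full extra revolution
    for the average write seek to exceed the read seek by the given gap."""
    rev_ms = 60_000.0 / rpm  # time for one revolution in milliseconds
    return (write_ms - read_ms) / rev_ms


RPM = 7200.0  # ASSUMPTION: spindle speed is not stated in the thread

print(extra_rev_fraction(8.5, 10.0, RPM))  # average seek
print(extra_rev_fraction(0.8, 1.0, RPM))   # track-to-track seek
```

At 7200 rpm a revolution takes about 8.3 ms, so the 1.5 ms gap in average seek time would correspond to roughly 18% of writes missing a revolution, and the 0.2 ms track-to-track gap to roughly 2.4%.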

> I don't know whether this reflects the operation of the drive's read
> ahead cache or whether it supports your claim regarding a

> "preemptive" read strategy.

Close, but that's not what he said.

>
> - Franc Zabkar