error message for sda

Bob Crochelt

Jun 6, 2022, 8:10:06 PM

Hi:

I'm running an up-to-date Debian on a pretty old iMac and seeing messages that complain about sda, the only drive in the system:

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000e   100   253   006    Old_age   Always       -       0
  3 Spin_Up_Time            0x0003   095   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   093   093   020    Old_age   Always       -       7818
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       521708297
  9 Power_On_Hours          0x0032   045   045   000    Old_age   Always       -       48690
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   098   098   020    Old_age   Always       -       2325
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   045   037   045    Old_age   Always   FAILING_NOW 55 (255 255 60 26 0)
194 Temperature_Celsius     0x0022   055   063   000    Old_age   Always       -       55 (0 19 0 0 0)
195 Hardware_ECC_Recovered  0x001a   056   047   000    Old_age   Always       -       44745763
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0

I think this means it's time to replace the hard drive. Any other thoughts?

Thanks in advance,

If this is more appropriate for another list, please advise.

Bob Crochelt

David Christensen

Jun 6, 2022, 10:10:05 PM

On 6/6/22 17:02, Bob Crochelt wrote:
> Hi:
> Running updated Debian on a pretty old iMac. Seeing messages that complain about sda, only drive in the system:

> I think this means its time to replace the hard drive, any other thoughts?

Interpreting smartctl output is tough. Try to find manufacturer
documentation that describes the SMART attributes, etc., for that
specific make and model.

The most important SMART report line to look at is:

SMART overall-health self-assessment test result: PASSED

Please run the following command:

# smartctl -t long <dev>

Then run the following command periodically until the test is done:

# smartctl -x <dev>

Please post the complete, final SMART report. There is much more
information than you have posted below.
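
The start-then-poll procedure above could be scripted roughly as follows. This is only a sketch: the function names and the 10-minute interval are made up for illustration, and it assumes smartctl prints "Self-test routine in progress" in its capabilities section (-c) while a test is running, which is what ATA drives typically show but may differ on yours.

```shell
# Sketch only: helper names and the polling interval are illustrative assumptions.

# Succeeds if the smartctl report on stdin says a self-test is still running.
# Assumes the "Self-test routine in progress" wording seen on ATA drives.
selftest_in_progress() {
    grep -q 'Self-test routine in progress'
}

# Start a long self-test on device $1, wait for it to finish,
# then print the complete report.
run_long_test() {
    dev=$1
    smartctl -t long "$dev"
    while smartctl -c "$dev" | selftest_in_progress; do
        sleep 600   # check every 10 minutes; a long test can take hours
    done
    smartctl -x "$dev"
}
```

Usage would be something like `run_long_test /dev/sda`, run as root.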

> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
> 1 Raw_Read_Error_Rate 0x000e 100 253 006 Old_age Always - 0
> 3 Spin_Up_Time 0x0003 095 093 000 Pre-fail Always - 0
> 4 Start_Stop_Count 0x0032 093 093 020 Old_age Always - 7818
> 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
> 7 Seek_Error_Rate 0x000f 087 060 030 Pre-fail Always - 521708297
> 9 Power_On_Hours 0x0032 045 045 000 Old_age Always - 48690

I have an HDD with Power_On_Hours of 28k+. It works.

> 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
> 12 Power_Cycle_Count 0x0032 098 098 020 Old_age Always - 2325
> 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
> 189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
> 190 Airflow_Temperature_Cel 0x0022 045 037 045 Old_age Always FAILING_NOW 55 (255 255 60 26 0)

That looks like your drive is overheating. Check cleanliness (e.g.
dust) and cooling (fans, airflow restriction/blockage). If those are
good, swap the drive with a known good drive, run the above commands,
and post the results.
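
For reference, smartctl reports FAILING_NOW when an attribute's normalized VALUE is at or below its THRESH, which is the case for attribute 190 above (045 vs. 045). A rough sketch that lists such rows from `smartctl -A` output follows; the function name is made up, and the column positions assume the table layout posted in this thread.

```shell
# Print ID, name, VALUE and THRESH for every attribute whose normalized VALUE
# is at or below a non-zero THRESH. Reads `smartctl -A` style output on stdin.
# Column positions ($4 = VALUE, $6 = THRESH) assume the layout quoted above.
at_or_below_thresh() {
    awk '$1 ~ /^[0-9]+$/ && $6 + 0 > 0 && $4 + 0 <= $6 + 0 { print $1, $2, $4, $6 }'
}
```

For example, `smartctl -A /dev/sda | at_or_below_thresh`. On the table posted in this thread it should print only the attribute 190 row.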

> 194 Temperature_Celsius 0x0022 055 063 000 Old_age Always - 55 (0 19 0 0 0)
> 195 Hardware_ECC_Recovered 0x001a 056 047 000 Old_age Always - 44745763

I have drives with worse numbers for Hardware_ECC_Recovered. They work.

> 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
> 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
> 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
> 200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
> 202 Data_Address_Mark_Errs 0x0032 100 253 000 Old_age Always - 0

David

Jun 6, 2022, 11:30:05 PM

On Tue, 7 Jun 2022 at 10:03, Bob Crochelt <rf...@fastmail.com> wrote:

> 9 Power_On_Hours 0x0032 045 045 000 Old_age Always - 48690

Assuming the raw value is hours, that's about 5.5 years of power on time.
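
The arithmetic behind that figure, as a quick sketch (the function name is made up, and it assumes 24 hours a day, 365 days a year):

```shell
# Convert a power-on hour count to years, assuming 24 h/day and 365 days/year.
hours_to_years() {
    awk -v h="$1" 'BEGIN { printf "%.2f\n", h / (24 * 365) }'
}

hours_to_years 48690   # prints 5.56
```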

> 190 Airflow_Temperature_Cel 0x0022 045 037 045 Old_age Always FAILING_NOW 55 (255 255 60 26 0)
> 194 Temperature_Celsius 0x0022 055 063 000 Old_age Always - 55 (0 19 0 0 0)

Assuming the raw value is Celsius, that's somewhat warmer than ideal.
I would be looking at the airflow inside the machine and removing any dust.

> I think this means its time to replace the hard drive, any other thoughts?

That depends on your appetite for risk. I don't see anything else alarming
in the data you provided. But, that's because I am accustomed to having
several layers of backup available, so I can tolerate a high level of risk.

Even when a hard drive is brand new, I make sure to have sufficient backup
copies of important data that I can recover whatever is important to me
if the drive fails.

I have several personal machines and I run my hard drives to failure.
For example, the primary hard drive in this machine started failing self
test, due to read errors, at around 17000 hours. I did some recovery procedure
to remap the problem sectors, and the drive is now at 28000 hours
with no further problems.

David Christensen

Jun 7, 2022, 12:30:05 AM

On 6/6/22 20:27, David wrote:
> On Tue, 7 Jun 2022 at 10:03, Bob Crochelt <rf...@fastmail.com> wrote:
>
>> 9 Power_On_Hours 0x0032 045 045 000 Old_age Always - 48690
>
> Assuming the raw value is hours, that's about 5.5 years of power on time.
>
>> 190 Airflow_Temperature_Cel 0x0022 045 037 045 Old_age Always FAILING_NOW 55 (255 255 60 26 0)
>> 194 Temperature_Celsius 0x0022 055 063 000 Old_age Always - 55 (0 19 0 0 0)
>
> Assuming the raw value is Celsius, that's somewhat warmer than ideal.
> I would be looking at the airflow inside the machine and removing any dust.
>
>> I think this means its time to replace the hard drive, any other thoughts?
>
> That depends on your appetite for risk. I don't see anything else alarming
> in the data you provided. But, that's because I am accustomed to having
> several layers of backup available, so I can tolerate a high level of risk.

+1

> Even when a hard drive is brand new, I make sure to have sufficient backup
> copies of important data that I can recover whatever is important to me
> if the drive fails.

+1

I had an enterprise HDD fail in under a month. But it was a
factory-sealed, new, previous-generation model that likely sat on the
shelf for a few years.

I have read, and have been told by knowledgeable disk drive manufacturer
engineers, that an HDD sitting on a shelf for long periods is more likely
to fail than an HDD idling for the same period. To keep a drive working,
keep it powered and spinning.

I am starting to think that some combination of manufacturing date,
Power_On_Hours, Total_LBAs_Written, and Total_LBAs_Read might predict
imminent failure.

> I have several personal machines and I run my hard drives to failure.
> For example, the primary hard drive in this machine started failing self
> test, due to read errors, at around 17000 hours. I did some recovery procedure
> to remap the problem sectors, and the drive is now at 28000 hours
> with no further problems.

+1

David

Felix Miata

Jun 7, 2022, 1:20:05 AM

Bob Crochelt composed on 2022-06-06 17:02 (UTC-0700):

> Running updated Debian on a pretty old iMac. Seeing messages that complain about sda, only drive in the system:

> 190 Airflow_Temperature_Cel 0x0022 045 037 045 Old_age Always FAILING_NOW 55 (255 255 60 26 0)
> 194 Temperature_Celsius 0x0022 055 063 000 Old_age Always - 55 (0 19 0 0 0)

Old iMacs are hot running. Whether 55 is actually too hot for the drive
really IMO should be checked with its manufacturer.

# pinxi -Ma
Machine:
Type: Desktop System: Apple product: iMac7,1 v: 1.0 serial: QP7440XDX89
Chassis: type: 13 v: Mac-F42386C8 serial: QP7440XDX89
Mobo: Apple model: Mac-F42386C8 v: PVT serial: 1 UEFI: Apple
v: IM71.88Z.007A.B03.0803051705 date: 03/05/08
# inxi -D
Drives:
Local Storage: total: 931.51 GiB used: 69.24 GiB (7.4%)
ID-1: /dev/sda vendor: Seagate model: ST1000DM003-1SB10C size: 931.51 GiB
# smartctl -x /dev/sda | grep flow
190 Airflow_Temperature_Cel -O---K 069 043 040 - 31 (Min/Max 25/31)
...
# smartctl -x /dev/sda | grep e_Cel
190 Airflow_Temperature_Cel -O---K 046 043 040 - 54 (Min/Max 25/54)
194 Temperature_Celsius -O---K 054 021 000 - 54 (0 21 0 0 0)
# uptime
00:51:35 up 0:50, 2 users, load average: 0.07, 0.02, 0.00
# smartctl -x /dev/sda | grep emp
190 Airflow_Temperature_Cel -O---K 045 043 040 - 55 (Min/Max 25/55)
194 Temperature_Celsius -O---K 055 021 000 - 55 (0 21 0 0 0)
Current Temperature: 55 Celsius
Power Cycle Min/Max Temperature: 26/55 Celsius
Lifetime Min/Max Temperature: 22/57 Celsius
Under/Over Temperature Limit Count: 0/0
0x03 0x028 4 0 --- Read Recovery Attempts
0x05 ===== = = === == Temperature Statistics (rev 1) ==
0x05 0x008 1 55 --- Current Temperature
0x05 0x010 1 46 --- Average Short Term Temperature
0x05 0x018 1 48 --- Average Long Term Temperature
0x05 0x020 1 57 --- Highest Temperature
0x05 0x028 1 26 --- Lowest Temperature
0x05 0x030 1 55 --- Highest Average Short Term Temperature
0x05 0x038 1 43 --- Lowest Average Short Term Temperature
0x05 0x040 1 50 --- Highest Average Long Term Temperature
0x05 0x048 1 47 --- Lowest Average Long Term Temperature
0x05 0x050 4 10240 --- Time in Over-Temperature
0x05 0x058 1 55 --- Specified Maximum Operating Temperature
0x05 0x060 4 0 --- Time in Under-Temperature
0x05 0x068 1 13 --- Specified Minimum Operating Temperature

# smartctl -x /dev/sda | grep e_Cel
190 Airflow_Temperature_Cel -O---K 044 043 040 - 56 (Min/Max 25/56)
194 Temperature_Celsius -O---K 056 021 000 - 56 (0 21 0 0 0)
# uptime
01:01:58 up 1:00, 2 users, load average: 0.02, 0.04, 0.00
# uptime
01:15:10 up 1:13, 2 users, load average: 0.00, 0.01, 0.00
# smartctl -x /dev/sda | grep e_Cel
190 Airflow_Temperature_Cel -O---K 043 043 040 - 57 (Min/Max 25/57)
194 Temperature_Celsius -O---K 057 021 000 - 57 (0 21 0 0 0)

10240 "Time in Over-Temperature" on this iMac is probably from its condition
when acquired. Its OEM HD was dead, and its cooling system was clogged.
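
The repeated manual checks above could be automated along these lines. This is a sketch only: the function names and the 5-minute interval are invented for illustration, and the field position assumes the `smartctl -x` attribute layout shown above, where the raw value is the eighth column.

```shell
# Print the raw value of attribute 194 (Temperature_Celsius) from
# `smartctl -x` output on stdin. Assumes the raw value is column 8.
drive_temp() {
    awk '$1 == "194" && $2 == "Temperature_Celsius" { print $8; exit }'
}

# Log a timestamped temperature reading for device $1 every 5 minutes.
log_drive_temp() {
    while :; do
        printf '%s %s\n' "$(date '+%H:%M:%S')" "$(smartctl -x "$1" | drive_temp)"
        sleep 300
    done
}
```

Something like `log_drive_temp /dev/sda`, run as root, would then show how quickly the drive warms up after boot.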
--
Evolution as taught in public schools is, like religion,
based on faith, not based on science.

Team OS/2 ** Reg. Linux User #211409 ** a11y rocks!

Felix Miata

David Christensen

Jun 7, 2022, 2:00:05 AM

On 6/6/22 22:17, Felix Miata wrote:
> Bob Crochelt composed on 2022-06-06 17:02 (UTC-0700):
>
>> Running updated Debian on a pretty old iMac. Seeing messages that complain about sda, only drive in the system:
>
>> 190 Airflow_Temperature_Cel 0x0022 045 037 045 Old_age Always FAILING_NOW 55 (255 255 60 26 0)
>> 194 Temperature_Celsius 0x0022 055 063 000 Old_age Always - 55 (0 19 0 0 0)
>
> Old iMacs are hot running.

Yikes!

> Whether 55 is actually too hot for the drive really IMO should be checked with its manufacturer.

+1

Some of my drives report "Min/Max Temperature Limit":

HGST             -40/70 Celsius
Maxtor             0/71 Celsius
Samsung            0/70 Celsius
Seagate           10/60 Celsius
Toshiba           10/60 Celsius
Western Digital  -40/70 Celsius

Does anyone know the SMART definition for "Min/Max Temperature Limit"?
Specifically, are these operational temperature limits or non-operational limits?

David