Interesting. I try to run smartd anywhere it's supported and have
appended the last few entries before things went sideways; 60° C/140° F
is a bit toasty!
This system is a couple of years old, so it might be time to blow the dust
out with compressed air and see if the BIOS has more aggressive fan settings.
Is the Raw_Read_Error_Rate change a problem?
(Thanks!)
Craig
May 3 13:59:22 tiny smartd[770]: Device: /dev/ada0, SMART Usage
Attribute: 190 Airflow_Temperature_Cel changed from 59 to 60
May 3 13:59:22 tiny smartd[770]: Device: /dev/ada0, SMART Usage
Attribute: 194 Temperature_Celsius changed from 41 to 40
May 3 14:59:23 tiny smartd[770]: Device: /dev/ada0, SMART Usage
Attribute: 190 Airflow_Temperature_Cel changed from 60 to 58
May 3 14:59:23 tiny smartd[770]: Device: /dev/ada0, SMART Usage
Attribute: 194 Temperature_Celsius changed from 40 to 42
May 3 17:29:23 tiny smartd[770]: Device: /dev/ada0, SMART Prefailure
Attribute: 1 Raw_Read_Error_Rate changed from 75 to 76
> On 5/3/2018 9:07 PM, Warner Losh wrote:
> > Async events are 'something went wrong' messages. Log page 2 is the
> > smart log page.
> >
> > What does 'nvmecontrol logpage -p 2 nvme0' tell you right after this
> > happens? My guess is that it's overheating.
>
> Interesting. I try to run smartd anywhere it's supported and have
> appended the last few entries before things went sideways; 60° C/140° F
> is a bit toasty!
>
> This system is a couple of years old, so it might be time to blow the dust
> out with compressed air and see if the BIOS has more aggressive fan settings.
>
> Is the Raw_Read_Error_Rate change a problem?
>
> (Thanks!)
>
> Craig
>
> May 3 13:59:22 tiny smartd[770]: Device: /dev/ada0, SMART Usage
> Attribute: 190 Airflow_Temperature_Cel changed from 59 to 60
> May 3 13:59:22 tiny smartd[770]: Device: /dev/ada0, SMART Usage
> Attribute: 194 Temperature_Celsius changed from 41 to 40
> May 3 14:59:23 tiny smartd[770]: Device: /dev/ada0, SMART Usage
> Attribute: 190 Airflow_Temperature_Cel changed from 60 to 58
> May 3 14:59:23 tiny smartd[770]: Device: /dev/ada0, SMART Usage
> Attribute: 194 Temperature_Celsius changed from 40 to 42
> May 3 17:29:23 tiny smartd[770]: Device: /dev/ada0, SMART Prefailure
> Attribute: 1 Raw_Read_Error_Rate changed from 75 to 76
>
Things are getting hot, and there was a recoverable error (recoverable since
you didn't report a read error, though you could also check log page 1 for
any errors). Chances are the controller shut down completely, though the few
data points you've given aren't enough for me to be sure.
Warner
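
For anyone wanting to pull the attribute changes out of a log like the one quoted above, here is a minimal Python sketch. It assumes the wrapped lines have been rejoined so that each smartd entry sits on one line, and that entries follow the "changed from X to Y" wording shown in the thread. The names LOG, PATTERN, parse_changes and c_to_f are made up for illustration and are not part of smartd or smartmontools. It only reports the logged values and does not try to interpret them.

```python
import re

# Sample entries copied from the thread, with the mail-wrapped lines rejoined.
LOG = """\
May 3 13:59:22 tiny smartd[770]: Device: /dev/ada0, SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 59 to 60
May 3 13:59:22 tiny smartd[770]: Device: /dev/ada0, SMART Usage Attribute: 194 Temperature_Celsius changed from 41 to 40
May 3 14:59:23 tiny smartd[770]: Device: /dev/ada0, SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 60 to 58
May 3 14:59:23 tiny smartd[770]: Device: /dev/ada0, SMART Usage Attribute: 194 Temperature_Celsius changed from 40 to 42
May 3 17:29:23 tiny smartd[770]: Device: /dev/ada0, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 75 to 76
"""

# Assumed line shape, based only on the entries quoted in this thread.
PATTERN = re.compile(
    r"smartd\[\d+\]: Device: (?P<device>\S+), "
    r"SMART (?P<kind>Usage|Prefailure) Attribute: "
    r"(?P<id>\d+) (?P<name>\S+) changed from (?P<old>\d+) to (?P<new>\d+)"
)


def parse_changes(text):
    """Return one dict per smartd attribute-change line found in text."""
    changes = []
    for line in text.splitlines():
        match = PATTERN.search(line)
        if match is None:
            continue  # skip anything that is not an attribute-change entry
        changes.append({
            "device": match["device"],
            "kind": match["kind"],
            "id": int(match["id"]),
            "name": match["name"],
            "old": int(match["old"]),
            "new": int(match["new"]),
        })
    return changes


def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


if __name__ == "__main__":
    for change in parse_changes(LOG):
        print(f"{change['device']} {change['kind']:<10} "
              f"{change['id']:>3} {change['name']}: "
              f"{change['old']} -> {change['new']}")
    # Sanity check of the 60 C / 140 F figure mentioned in the thread.
    print(c_to_f(60))
```

Filtering the result on kind == "Prefailure" picks out entries such as the Raw_Read_Error_Rate one asked about above.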