nvme0: async event occurred (log page id=0x2)

Craig Leres

May 4, 2018, 12:01:07 AM

I have an Intel NUC (NUC6i3SYH) that ran 10.3-RELEASE until a few weeks
ago and now runs 11.1-RELEASE. The system disk is an Intel 600p M.2 SSD,
and there is also a 2TB Seagate laptop drive (ST2000LM007).

Occasionally the system SSD will go to sleep. It happened today with
this on the console:

nvme0: async event occurred (log page id=0x2)
nvme0: resetting controller
nvme0: nvme_ctrlr_wait_for_ready called with desired_val = 0 but cc.en = 1

Later it would occasionally print out:

swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1509, size: 12200

There was an app playing music from the 2TB drive that was still working
when I reset the box, but no I/O was occurring on the M.2 SSD.

I see that PR 209571 might be related (it at least shows the same async event message).

Does anyone have suggestions for me?

Craig
_______________________________________________
freebsd...@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hacke...@freebsd.org"
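For context on the nvme_ctrlr_wait_for_ready message in the first post: in the NVMe specification, a host resets a controller by clearing CC.EN (bit 0 of the Controller Configuration register) and then polling CSTS.RDY (bit 0 of the Controller Status register) until it reads 0, waiting at most CAP.TO units of 500 ms. The wording of the console message suggests that the FreeBSD driver also checks that CC.EN already holds the value it is about to wait for, and gives up if it does not. The Python sketch below simulates that handshake against a hypothetical FakeController class (not a real driver API). Its stuck flag, which makes writes to CC have no effect, is only one illustrative way to reproduce the same mismatch, not a diagnosis of this machine.

```python
import time

CC_EN = 1 << 0     # CC.EN: enable bit of the Controller Configuration register
CSTS_RDY = 1 << 0  # CSTS.RDY: ready bit of the Controller Status register


class FakeController:
    """Hypothetical stand-in for a controller's CC and CSTS registers."""

    def __init__(self, cc=0, csts=0, stuck=False):
        self.cc = cc
        self.csts = csts
        self.stuck = stuck  # when True, writes to CC are silently ignored

    def write_cc(self, value):
        if self.stuck:
            return
        self.cc = value
        # A real controller updates CSTS.RDY some time after CC.EN changes;
        # this simulation does it immediately.
        self.csts = CSTS_RDY if value & CC_EN else 0


def wait_for_ready(ctrlr, desired, cap_to, poll_s=0.001):
    """Poll CSTS.RDY until it equals desired (0 or 1).

    cap_to is CAP.TO, the worst-case transition time in 500 ms units.
    """
    en = 1 if ctrlr.cc & CC_EN else 0
    if en != desired:
        # Sanity check implied by the console message: CC.EN must already
        # hold the value we are about to wait for.
        return f"called with desired_val = {desired} but cc.en = {en}"
    deadline = time.monotonic() + cap_to * 0.5
    while (1 if ctrlr.csts & CSTS_RDY else 0) != desired:
        if time.monotonic() > deadline:
            return "timed out waiting for CSTS.RDY"
        time.sleep(poll_s)
    return "ok"


def disable_controller(ctrlr, cap_to):
    """Clear CC.EN, then wait for CSTS.RDY to drop to 0."""
    ctrlr.write_cc(ctrlr.cc & ~CC_EN)
    return wait_for_ready(ctrlr, 0, cap_to)


healthy = FakeController(cc=CC_EN, csts=CSTS_RDY)
print(disable_controller(healthy, cap_to=1))
stuck = FakeController(cc=CC_EN, csts=CSTS_RDY, stuck=True)
print(disable_controller(stuck, cap_to=1))
```

In the simulation, the healthy controller reports "ok", while the stuck one produces the same "desired_val = 0 but cc.en = 1" text seen on the console.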

Warner Losh

May 4, 2018, 12:11:14 AM

On Thu, May 3, 2018 at 9:56 PM, Craig Leres <le...@freebsd.org> wrote:

> Does anyone have suggestions for me?
>

Async events are 'something went wrong' messages. Log page 2 is the SMART/health
log page.
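When an asynchronous event request completes, the NVMe specification packs three fields into dword 0 of the completion: the event type in bits 2:0, event-specific information in bits 15:8, and the associated log page identifier in bits 23:16. That last field is what the driver prints as "log page id=0x2". The sketch below decodes such a dword. The value 0x00020101 is a made-up example (a SMART/health event for a temperature threshold), not the value from this report, since the console message does not show the full dword.

```python
# Event types from bits 2:0 of completion dword 0 (other values are reserved).
AER_TYPES = {
    0: "error status",
    1: "SMART/health status",
    2: "notice",
    6: "I/O command specific status",
    7: "vendor specific",
}

# Event information values defined for the SMART/health status type.
SMART_HEALTH_INFO = {
    0: "NVM subsystem reliability",
    1: "temperature threshold",
    2: "spare below threshold",
}


def decode_aer(cdw0: int) -> dict:
    """Split an Asynchronous Event Request completion dword 0 into its fields."""
    ev_type = cdw0 & 0x7            # bits 2:0
    ev_info = (cdw0 >> 8) & 0xFF    # bits 15:8
    log_page = (cdw0 >> 16) & 0xFF  # bits 23:16
    if ev_type == 1:
        info = SMART_HEALTH_INFO.get(ev_info, f"0x{ev_info:02x}")
    else:
        info = f"0x{ev_info:02x}"  # not decoded for other types in this sketch
    return {"type": AER_TYPES.get(ev_type, "reserved"), "info": info, "log_page": log_page}


print(decode_aer(0x00020101))
# {'type': 'SMART/health status', 'info': 'temperature threshold', 'log_page': 2}
```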

What does 'nvmecontrol logpage -p 2 nvme0' tell you right after this
happens? My guess is that it's overheating.
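The SMART/health log page is a fixed 512-byte structure, and nvmecontrol already prints a decoded view of it. For readers who want to see where the temperature-related fields live, the sketch below decodes a few of them using the byte offsets from the NVMe specification: the critical warning bitmap (byte 0, where bit 1 flags a temperature threshold), the composite temperature in kelvin (bytes 1-2), spare and wear indicators (bytes 3-5), media error and error-log-entry counters (bytes 160-191), and the warning and critical composite temperature time counters in minutes (bytes 192-199, defined since NVMe 1.2, so some firmware may leave them at zero). It assumes the raw page bytes were captured elsewhere; the test page is synthetic.

```python
import struct

# Critical warning bits (byte 0), lowest bit first.
CRIT_WARN_BITS = [
    "available spare below threshold",       # bit 0
    "temperature over/under threshold",      # bit 1
    "NVM subsystem reliability degraded",    # bit 2
    "media placed in read-only mode",        # bit 3
    "volatile memory backup device failed",  # bit 4
]


def parse_health_page(page: bytes) -> dict:
    """Decode selected fields of a raw 512-byte SMART/health log page."""
    if len(page) < 512:
        raise ValueError("SMART/health log page must be 512 bytes")
    crit = page[0]
    temp_k = struct.unpack_from("<H", page, 1)[0]  # little-endian, in kelvin
    warn_min, crit_min = struct.unpack_from("<II", page, 192)
    return {
        "critical_warnings": [n for bit, n in enumerate(CRIT_WARN_BITS) if crit & (1 << bit)],
        "temperature_c": temp_k - 273,  # integer approximation of Celsius
        "available_spare_pct": page[3],
        "spare_threshold_pct": page[4],
        "percent_used": page[5],
        "media_errors": int.from_bytes(page[160:176], "little"),       # 128-bit counter
        "error_log_entries": int.from_bytes(page[176:192], "little"),  # 128-bit counter
        "warning_temp_minutes": warn_min,
        "critical_temp_minutes": crit_min,
    }
```

A nonzero warning_temp_minutes or critical_temp_minutes, or bit 1 of the critical warning byte, would be the kind of evidence that supports the overheating guess.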

Warner

Craig Leres

May 4, 2018, 12:32:24 AM

On 5/3/2018 9:07 PM, Warner Losh wrote:
> Async events are 'something went wrong' messages. Log page 2 is the
> smart log page.
>
> what does 'nvmecontrol logpage -p 2 nvme0' tell you right after this
> happens. My guess is that it's overheating.

Interesting. I try to run smartd anywhere it's supported and have
appended the last few entries before things went sideways; 60° C/140° F
is a bit toasty!

This system is a couple of years old; it might be time to blow the dust out
with compressed air and see if the BIOS has more aggressive fan settings.

Is the Raw_Read_Error_Rate change a problem?

(Thanks!)

Craig

May 3 13:59:22 tiny smartd[770]: Device: /dev/ada0, SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 59 to 60
May 3 13:59:22 tiny smartd[770]: Device: /dev/ada0, SMART Usage Attribute: 194 Temperature_Celsius changed from 41 to 40
May 3 14:59:23 tiny smartd[770]: Device: /dev/ada0, SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 60 to 58
May 3 14:59:23 tiny smartd[770]: Device: /dev/ada0, SMART Usage Attribute: 194 Temperature_Celsius changed from 40 to 42
May 3 17:29:23 tiny smartd[770]: Device: /dev/ada0, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 75 to 76
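Two details in these entries are worth checking. Under FreeBSD's ada(4) naming, /dev/ada0 is an ATA/SATA disk, so these readings presumably come from the Seagate drive rather than the NVMe SSD, whose temperature is reported in NVMe log page 2. Also, assuming smartd is reporting normalized (not raw) attribute values here, which is its usual behavior for these change messages, attributes 190 and 194 sum to exactly 100 at each timestamp. That would fit attribute 190 being encoded as 100 minus the temperature, in which case the disk was near 40° C rather than 60° C; this encoding is an assumption, not something the log states. The sketch below parses the five entries and checks that sum. Its regular expression is tailored to these lines and is not a general smartd log parser.

```python
import re
from collections import defaultdict

LOG = """\
May 3 13:59:22 tiny smartd[770]: Device: /dev/ada0, SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 59 to 60
May 3 13:59:22 tiny smartd[770]: Device: /dev/ada0, SMART Usage Attribute: 194 Temperature_Celsius changed from 41 to 40
May 3 14:59:23 tiny smartd[770]: Device: /dev/ada0, SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 60 to 58
May 3 14:59:23 tiny smartd[770]: Device: /dev/ada0, SMART Usage Attribute: 194 Temperature_Celsius changed from 40 to 42
May 3 17:29:23 tiny smartd[770]: Device: /dev/ada0, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 75 to 76
"""

LINE_RE = re.compile(
    r"^(?P<ts>\w+ +\d+ [\d:]+) \S+ smartd\[\d+\]: Device: (?P<dev>[^,]+), "
    r"SMART (?P<kind>Usage|Prefailure) Attribute: (?P<id>\d+) (?P<name>\S+) "
    r"changed from (?P<old>\d+) to (?P<new>\d+)$"
)


def parse(log):
    """Return one dict per attribute-change line; other lines are skipped."""
    entries = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            entries.append({
                "ts": m["ts"], "dev": m["dev"], "id": int(m["id"]),
                "name": m["name"], "old": int(m["old"]), "new": int(m["new"]),
            })
    return entries


def temperature_pairs(entries):
    """Group attributes 190 and 194 by timestamp; keep only complete pairs."""
    by_ts = defaultdict(dict)
    for e in entries:
        if e["id"] in (190, 194):
            by_ts[e["ts"]][e["id"]] = e
    return [(ts, d[190], d[194]) for ts, d in by_ts.items() if 190 in d and 194 in d]


for ts, a190, a194 in temperature_pairs(parse(LOG)):
    # Sum of the two normalized values before and after each change.
    print(ts, a190["old"] + a194["old"], a190["new"] + a194["new"])
```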

Warner Losh

May 4, 2018, 12:37:43 AM

On Thu, May 3, 2018 at 10:28 PM, Craig Leres <le...@freebsd.org> wrote:

> Interesting. I try to run smartd anywhere it's supported and have
> appended the last few entries before things went sideways; 60° C/140° F
> is a bit toasty!
>
> Is the Raw_Read_Error_Rate changed a problem?

Things are getting hot, and there was a recoverable error (recoverable since
you didn't report a read error, though you could also check log page 1 for
any errors). Chances are the controller shut down completely, though the few
data points you've given aren't enough for me to be sure.
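The suggestion to check page 1 refers to the NVMe error information log, which is a sequence of 64-byte entries. The sketch below decodes the leading fields of each entry using the NVMe specification's layout: a 64-bit error count (0 marks an unused entry), submission queue ID, command ID, the status field (with the phase tag in bit 0, so the status code is bits 8:1 and the status code type bits 11:9), parameter error location, LBA, and namespace ID. Capturing the raw bytes is assumed to happen elsewhere, and the sample entry in the test (status code type 2h, status code 81h, which the specification defines as an unrecovered read error) is synthetic.

```python
import struct

ENTRY_SIZE = 64  # each error information log entry is 64 bytes
# error count, SQID, CID, status field, parameter error location, LBA, NSID
ENTRY_FMT = "<QHHHHQI"


def parse_error_log(buf: bytes):
    """Decode the leading fields of each valid entry in a raw error log page."""
    entries = []
    for off in range(0, len(buf) - ENTRY_SIZE + 1, ENTRY_SIZE):
        error_count, sqid, cid, status, loc, lba, nsid = struct.unpack_from(ENTRY_FMT, buf, off)
        if error_count == 0:
            continue  # an error count of 0 marks an unused/invalid entry
        entries.append({
            "error_count": error_count,
            "sqid": sqid,
            "cid": cid,
            "sct": (status >> 9) & 0x7,    # status code type, bits 11:9
            "sc": (status >> 1) & 0xFF,    # status code, bits 8:1
            "dnr": bool(status & 0x8000),  # do-not-retry, bit 15
            "param_loc": loc,
            "lba": lba,
            "nsid": nsid,
        })
    return entries
```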

Warner