
10.1 NVMe kernel panic


Sean Kelly

May 21, 2015, 11:34:26 AM
to freebsd...@freebsd.org
Greetings.

I have a Dell R630 server with four of Dell’s 800GB NVMe SSDs running FreeBSD 10.1-p10. According to the PCI vendor, they are some sort of rebranded Samsung drive. If I boot the system and then load nvme.ko and nvd.ko from a command line, the drives show up okay. If I put
nvme_load="YES"
nvd_load="YES"
in /boot/loader.conf, the box panics on boot:
panic: nexus_setup_intr: NULL irq resource!

If I boot the system with “Safe Mode: ON” from the loader menu, it also boots successfully and the drives show up.

You can see a full ‘boot -v’ here:
http://smkelly.org/stuff/nvme-panic.txt

Anyone have any insight into what the issue may be here? Ideally I need to get this working in the next few days or return this thing to Dell.

Thanks!

--
Sean Kelly
smk...@smkelly.org
http://smkelly.org

_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stabl...@freebsd.org"

Jim Harris

Jun 2, 2015, 7:10:36 PM
to Sean Kelly, FreeBSD-STABLE Mailing List
On Thu, May 21, 2015 at 8:33 AM, Sean Kelly <smk...@smkelly.org> wrote:

> Greetings.
>
> I have a Dell R630 server with four of Dell’s 800GB NVMe SSDs running
> FreeBSD 10.1-p10. According to the PCI vendor, they are some sort of
> rebranded Samsung drive. If I boot the system and then load nvme.ko and
> nvd.ko from a command line, the drives show up okay. If I put
> nvme_load="YES"
> nvd_load="YES"
> in /boot/loader.conf, the box panics on boot:
> panic: nexus_setup_intr: NULL irq resource!
>
> If I boot the system with “Safe Mode: ON” from the loader menu, it also
> boots successfully and the drives show up.
>
> You can see a full ‘boot -v’ here:
> http://smkelly.org/stuff/nvme-panic.txt
>
> Anyone have any insight into what the issue may be here? Ideally I need to
> get this working in the next few days or return this thing to Dell.
>

Hi Sean,

Can you try adding hw.nvme.force_intx=1 to /boot/loader.conf?
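
For reference, a minimal sketch of one way to add that tunable without duplicating an existing entry. The helper name set_loader_tunable is hypothetical and not part of FreeBSD; the tunable name is the one given above, and the quoted variable="value" form is assumed from the usual loader.conf convention.

```shell
#!/bin/sh
# Hypothetical helper (not a FreeBSD tool): append name="value" to a
# loader.conf-style file unless a line for that name is already present.
set_loader_tunable() {
    conf="$1"; name="$2"; value="$3"
    # The name is used as a basic regex, so "." matches any character;
    # that is loose but good enough for a sketch.
    if grep -q "^${name}=" "$conf" 2>/dev/null; then
        echo "${name} already present in ${conf}; leaving it unchanged" >&2
        return 0
    fi
    printf '%s="%s"\n' "$name" "$value" >> "$conf"
}

# Example call, to be run as root on the affected host:
#   set_loader_tunable /boot/loader.conf hw.nvme.force_intx 1
```

The helper only ever appends, so removing the tunable again after the test is a manual edit of /boot/loader.conf.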

I suspect you are able to load the drivers successfully after boot because
interrupt assignments are not restricted to CPU0 at that point - see
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321 for a related
issue. Your logs clearly show that vectors were allocated for the first 2
NVMe SSDs, but the third could not get its full allocation. There is a bug
in the INTx fallback code that needs to be fixed - you do not hit this bug
when loading after boot because bug #199321 only affects interrupt
allocation during boot.

If the force_intx test works, would you be able to upgrade your nvme drivers
to the latest on stable/10? There are several patches (one related to
interrupt vector allocation) that have been pushed to stable/10 since 10.1
was released, and I will be pushing another patch for the issue you have
reported shortly.

Thanks,

-Jim

Sean Kelly

Jun 2, 2015, 11:07:26 PM
to Jim Harris, FreeBSD-STABLE Mailing List
Jim,

Thanks for the reply. I set hw.nvme.force_intx=1 and get a new form of kernel panic:
http://smkelly.org/stuff/nvme_crash_force_intx.txt

It looks like the NVMes are just failing to initialize at all now. As long as that tunable is in the kenv, I get this behavior. If I kldload them after boot, the init fails as well. But if I kldunload, kenv -u, kldload, it then works again. The only difference is kldload doesn’t result in a panic, just timeouts initializing them all.
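
Spelled out, the unload, unset, reload sequence looks roughly like the sketch below. The function name and the RUN switch are hypothetical additions for illustration; the module and tunable names are the ones used in this thread, and the unload order assumes nvd depends on nvme.

```shell
#!/bin/sh
# RUN is a hypothetical dry-run switch: by default the commands are only
# printed. Set RUN="" and run as root on the FreeBSD host to execute them.
RUN="${RUN-echo}"

reload_nvme_without_intx() {
    $RUN kldunload nvd                 # assumed to depend on nvme, so it goes first
    $RUN kldunload nvme
    $RUN kenv -u hw.nvme.force_intx    # remove the tunable from the kernel environment
    $RUN kldload nvme
    $RUN kldload nvd
}

reload_nvme_without_intx
```

The tunable has to be gone from the kenv before the drivers are loaded again, which is why the kenv -u step sits between the unload and the reload.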

I also compiled and tried stable/10 and it crashed in a similar way, but I’ve not captured the panic yet. It crashes even without the tunable in place. I’ll see if I can capture it.