Kernel module

Jonathan

Dec 17, 2020, 1:11:59 PM
to Warewulf

I have 24 12-core nodes with NVIDIA chipsets that require kmod-forcedeth to get their embedded network interface working. I ran yum --installroot=$CHROOT to install this kernel module into the VNFS, then ran wwvnfs, wwbootstrap, and wwsh provision. And yet, when I boot a 12-core node, it still halts with a "network hardware not recognized" error.

Any ideas on what I did wrong?

Ryan Novosielski

Dec 17, 2020, 2:23:21 PM
to Warewulf
Kernel modules come, in most cases, from the bootstrap. Make sure that the appropriate kernel module is correctly referred to in /etc/warewulf/bootstrap.conf. If the module isn’t included there, regenerating the bootstrap will not help.
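One way to act on this before rebuilding is to check whether the module is actually listed in bootstrap.conf. The sketch below is a hypothetical shell helper (bootstrap_conf_has_module is not part of Warewulf); it assumes the "key += a, b, c" syntax of the stock file and only detects a module listed by name, not one pulled in through a directory entry on a drivers line.

```shell
# Hypothetical helper (not part of Warewulf): succeed if MODULE is listed by
# name on a "drivers +=" or "modprobe +=" line of a bootstrap.conf-style file.
bootstrap_conf_has_module() {
    local conf=$1 module=$2
    # Drop comments, keep drivers/modprobe lines that use "+=", take the text
    # after the "=", split the comma-separated list one entry per line, trim
    # blanks, and require an exact match on the module name.
    sed -e 's/#.*//' "$conf" \
        | grep -E '^[[:space:]]*(drivers|modprobe)[[:space:]]*\+=' \
        | sed -e 's/^[^=]*=//' \
        | tr ',' '\n' \
        | tr -d ' \t' \
        | grep -qx -- "$module"
}

# Example:
#   bootstrap_conf_has_module /etc/warewulf/bootstrap.conf forcedeth && echo listed
```

A line with a malformed operator (for example "+-" instead of "+=") is deliberately not counted, since it would not be read as a drivers or modprobe entry either.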

--
#BlackLivesMatter
____
|| \\UTGERS, |---------------------------*O*---------------------------
||_// the State | Ryan Novosielski - novo...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
|| \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark
`'

Jonathan

Dec 17, 2020, 4:03:33 PM
to Warewulf, Ryan Novosielski
Bingo. Thank you, Ryan. I added a line to /etc/warewulf/bootstrap.conf like this:

modprobe +- kmod-forcedeth

and re-ran the wwbootstrap command (without errors).  I'm at home and my cluster is at work, so I haven't tried it out yet.  Did I add the correct line?

Ryan Novosielski

Dec 17, 2020, 6:38:06 PM
to ware...@lbl.gov
Nope; the module is actually named “forcedeth”. I had a look, and you can see it if you check the package contents: what modprobe uses is the name of the actual .ko file (forcedeth.ko), not the package name.
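To see which name modprobe expects, you can list the package's files and strip each .ko path down to its basename. A minimal sketch, assuming the package is installed in the chroot and that rpm can query it there through its --root option; ko_names is a hypothetical helper, not an rpm feature.

```shell
# Hypothetical helper: read a file list on stdin (for example from
# "rpm -ql") and print the module names modprobe would use, i.e. the
# basename of each .ko file (also .ko.xz/.ko.gz) without the extension.
ko_names() {
    grep -E '\.ko(\.xz|\.gz)?$' \
        | sed -E -e 's|.*/||' -e 's/\.ko(\.xz|\.gz)?$//' \
        | sort -u
}

# Example (assumes the package was installed into $CHROOT):
#   rpm --root="$CHROOT" -ql kmod-forcedeth | ko_names
```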

Whether you need “modprobe” or just “drivers” (you can see both sections are present in bootstrap.conf) depends on how early the driver needs to be loaded, whether the boot process autodetects it, etc. I generally try “drivers” unless it doesn’t work. By the way, you don’t always need to do this: GPFS and the NVIDIA modules are a couple of examples that are content to load later and don’t need to be in the bootstrap. But if you’re booting via this network card, this one probably does need to be there.
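Put together, the two bootstrap.conf entries discussed here might look like the fragment below. This is a sketch, not a verified configuration: it assumes the drivers key accepts a bare module name, as Jonathan uses it later in this thread; if the drivers lines in your copy of the file list directories relative to /lib/modules/<kernel>/, the module's path may be needed instead. The stock file's entries use the "+=" operator. The bootstrap has to be rebuilt (wwbootstrap) after editing the file for the change to take effect.

```text
# /etc/warewulf/bootstrap.conf (excerpt, sketch)
# Use the module's own name (forcedeth.ko -> forcedeth), not the RPM name.
drivers += forcedeth
# Only if "drivers" alone is not enough, also load it explicitly in stage 1:
modprobe += forcedeth
```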

Jonathan

Dec 18, 2020, 8:32:41 AM
to Warewulf, Ryan Novosielski
Thanks again, Ryan. OK. Since these nodes are indeed booting off that interface, I added a drivers += forcedeth line and deleted my modprobe line. I expect this to work!

Jonathan

Dec 18, 2020, 12:44:08 PM
to Warewulf, Jonathan, Ryan Novosielski
Damnation! My 12-core nodes still won't boot. Do you recommend adding both a drivers line and a modprobe line?

Jonathan

Feb 24, 2021, 10:15:47 AM
to Warewulf
I finally went into work to scope out this inability to boot and... it's getting farther than it did before! Ryan, your recommendation to edit the /etc/warewulf/bootstrap.conf file fixed the stage 1 boot. It does boot up the VNFS, but then fails with the same error (unknown network interface). I verified with a yum list command that I had indeed installed the forcedeth module into the VNFS.

Does anybody see what I'm doing wrong?

Jason Stover

Feb 24, 2021, 5:03:16 PM
to ware...@lbl.gov
First check: is the module actually included? On the node, run something like:

find /lib/modules -name "*forcedeth*"

And just to confirm: you rebuilt the bootstrap after modifying bootstrap.conf? I *think* the command is something like:

wwsh bootstrap rebuild (or resync)

which will rebuild all bootstraps. Don't have my provisioner available
to me to verify. :/

-J
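The find check above can be narrowed to the kernel that will actually boot: modprobe looks under /lib/modules/<running kernel version>, so a module that exists only for some other kernel version shows up in a broad find but is not found by modprobe. A minimal sketch, with find_module_for_kernel as a hypothetical helper; ROOT is / on a booted node or $CHROOT on the provisioner.

```shell
# Hypothetical helper: print MODULE's .ko file(s) under ROOT/lib/modules/KVER.
# ROOT is "/" on a booted node or "$CHROOT" on the provisioner. Returns
# non-zero if that kernel has no module tree or no matching module.
find_module_for_kernel() {
    local root=$1 kver=$2 module=$3
    local dir="$root/lib/modules/$kver"
    if [ ! -d "$dir" ]; then
        echo "no module tree for kernel $kver under $root" >&2
        return 1
    fi
    local hits
    # Matches forcedeth.ko as well as compressed forms such as forcedeth.ko.xz.
    hits=$(find "$dir" -name "$module.ko*")
    [ -n "$hits" ] || return 1
    printf '%s\n' "$hits"
}

# Examples:
#   find_module_for_kernel / "$(uname -r)" forcedeth                # on the node
#   find_module_for_kernel "$CHROOT" <kernel version> forcedeth     # on the provisioner
```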

Jonathan

Mar 6, 2021, 12:56:57 PM
to Warewulf, jason....@gmail.com
At Jason's suggestion, I tried a manual install of the kernel module. Jason, my email to you on Feb 25th apparently went into the bit bucket.

I logged onto a 48-core node (one that boots fine) and discovered, through lsmod, that no forcedeth module was loaded. I was able to use insmod to load the latest/greatest module that the find command found earlier, and that command executed without error. lsmod now shows the module loaded on that node, where it isn't needed.

How do I install this module early in the boot sequence on a Scientific Linux 7.something provisioning node?
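Note that insmod only loads the module into the running kernel; nothing about it persists across a reboot. To confirm on any node whether forcedeth is currently loaded, a check against /proc/modules (the table lsmod formats) is enough. A minimal sketch; module_loaded is a hypothetical helper, and its optional second argument exists only so it can be tried against a sample file.

```shell
# Hypothetical helper: succeed if MODULE appears in the kernel's module
# table. Reads /proc/modules (the data lsmod prints) unless another file
# is given as the second argument, which is only useful for testing.
module_loaded() {
    local module=$1 table=${2:-/proc/modules}
    # The first field of each /proc/modules line is the module name.
    awk -v m="$module" '$1 == m { found = 1 } END { exit !found }' "$table"
}

# Example:
#   module_loaded forcedeth && echo "forcedeth is loaded" || echo "forcedeth is not loaded"
```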