[Rocks-Discuss] several nodes fail to kickstart after power failure - stuck in GRUB

Craig Plaisance

Jan 20, 2010, 6:28:12 PM
to npaci-rocks...@sdsc.edu
About 25% of our nodes are failing to kickstart after suffering a power
failure. Each one gets through the DHCP ..... part of the bootup, then
prints several more messages before saying:

TFTP prefix:
Trying to load: pxelinux.cfg/01-00-24-e8-2f-fa-11
Trying to load: pxelinux.cfg/0A01FFF4
Booting from local disk...
PXE-M0F: Exiting Broadcom PXE ROM.
GRUB Loading Stage 2

and then hangs. It seems to be trying to boot from the local disk
instead of kickstarting. I tried deleting all partitions on the hard
drive and creating a new one, but that didn't change anything. When
typing "rocks list host", all of the nodes say something like:

HOST          MEMBERSHIP CPUS RACK RANK RUNACTION INSTALLACTION
compute-1-29: Compute    8    1    29   os        install

even for the nodes that need to kickstart. I tried to change the
RUNACTION to install using "rocks set host boot compute-1-29
action=install", but that didn't change the runaction.

On some of the other down nodes, the bootup goes as before but then
continues past "GRUB Loading Stage 2" and on to a grub prompt.

Any ideas? Thanks

Stoner, Richard

Jan 20, 2010, 6:39:29 PM
to Discussion of Rocks Clusters
I ran into the same problem on one of our nodes (Rocks 5.3 cluster): stuck at "GRUB Loading Stage 2".

I believe our issue was caused by accidentally running '/boot/kickstart/cluster-kickstart' rather than '/boot/kickstart/cluster-kickstart-pxe'.

I am unsure what the root cause actually is, but I was able to recover the node by:

Removing the compute node from the frontend (rocks remove host compute-0-2)

Syncing the config

Starting insert-ethers on the frontend and manually PXE booting/kickstarting the bad node.


Good luck.

Rich Stoner

Anoop Rajendra

Jan 20, 2010, 6:53:29 PM
to Discussion of Rocks Clusters
For all the nodes that are failing to boot, run the command:

# rocks set host boot <hostname> action=install

Then reboot the nodes.

-a
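
With many nodes down, the per-host command above can be generated in a loop rather than typed by hand. What follows is a minimal sketch, assuming Python is available on the frontend. The helper names `install_commands` and `set_install` and the host list are hypothetical; only the `rocks set host boot <hostname> action=install` form comes from this thread.

```python
import subprocess

def install_commands(hosts):
    """Build one 'rocks set host boot <host> action=install' argv per host."""
    return [["rocks", "set", "host", "boot", h, "action=install"] for h in hosts]

def set_install(hosts, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in install_commands(hosts):
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)

# Hypothetical list of failed nodes; replace with your own.
failed = ["compute-1-29", "compute-1-30"]
set_install(failed)  # dry run: only prints the commands
```

The dry run is the default so the commands can be reviewed first; calling `set_install(failed, dry_run=False)` on the frontend would actually run them.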

Ross Ishida

Jan 20, 2010, 6:56:14 PM
to Discussion of Rocks Clusters
Hi Craig,

I've run into this issue before, and my solution was to force those nodes to
boot from the network. I'm not sure that is the problem with your nodes. To
force them to boot from the network, you have to edit the pxelinux.cfg file
corresponding to the node you are rebuilding. Those files are located in
/tftpboot/pxelinux/pxelinux.cfg
The corresponding filename will be the hex equivalent of the IP address of
the node.
You can either delete the file or you can copy in the default from there.

Hope this helps.

Ross

Ross Ishida
ish...@soest.hawaii.edu
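
To find the right file, it can help to compute the names PXELINUX looks for. The boot log at the top of the thread shows two: `01-` followed by the MAC address with dashes, and the IP address as eight uppercase hex digits. Below is a minimal sketch; the function name `pxelinux_names` is hypothetical, and the IP address 10.1.255.244 is just `0A01FFF4` from that log decoded back to dotted form.

```python
import ipaddress

def pxelinux_names(mac, ip):
    """Return the MAC-based and hex-IP config filenames for one node."""
    # "01" is the ARP hardware type for Ethernet; the MAC is lowercased
    # and separated with dashes.
    mac_name = "01-" + mac.lower().replace(":", "-")
    # The IPv4 address as eight uppercase hexadecimal digits.
    ip_name = "%08X" % int(ipaddress.IPv4Address(ip))
    return mac_name, ip_name

print(pxelinux_names("00:24:E8:2F:FA:11", "10.1.255.244"))
# prints ('01-00-24-e8-2f-fa-11', '0A01FFF4')
```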

Philip Papadopoulos

Jan 20, 2010, 7:08:52 PM
to Discussion of Rocks Clusters
# rocks set host boot action=install <hostname>

Then reboot your node. Right now your node is set to boot from the local
OS. E.g.,
# rocks list host boot <hostname>
almost certainly says os.

-P


--
Philip Papadopoulos, PhD
University of California, San Diego
858-822-3628 (Ofc)
619-331-2990 (Fax)