[Rocks-Discuss] Compute node stuck at GRUB>

Mario Chow

Jan 28, 2009, 6:11:10 PM
to Discussion of Rocks Clusters
Hello there,
I have a couple of compute nodes stuck at GRUB> prompt. I've looked
around and found a page that explains how to set the path to the right
os version to boot from. Unfortunately I don't know the path. Has
anyone run into this before, if yes, can you please tell me how you
solved this issue.

I also tried to boot using the DVD ISO but the node goes back to the
GRUB> prompt. Thanks
-M

Greg Bruno

Jan 28, 2009, 6:18:11 PM
to Discussion of Rocks Clusters
On Wed, Jan 28, 2009 at 3:11 PM, Mario Chow <mc...@wovensystems.com> wrote:
> Hello there,
> I have a couple of compute nodes stuck at GRUB> prompt. I've looked
> around and found a page that explains how to set the path to the right
> os version to boot from. Unfortunately I don't know the path. Has
> anyone run into this before, if yes, can you please tell me how you
> solved this issue.
>
> I also tried to boot using the DVD ISO but the node goes back to the
> GRUB> prompt. Thanks

in the past, we've seen this behavior on supermicro motherboards. if
you have one, then boot the node into BIOS and disable the floppy.

then, reinstall the node.

- gb

Mario Chow

Jan 28, 2009, 6:28:15 PM
to Discussion of Rocks Clusters
By re-install the node, you mean use pxeboot (insert-ethers) or use the
dvd? Thanks
-M

Greg Bruno

Jan 28, 2009, 6:49:15 PM
to Discussion of Rocks Clusters
On Wed, Jan 28, 2009 at 3:28 PM, Mario Chow <mc...@wovensystems.com> wrote:
> By re-install the node, you mean use pxeboot (insert-ethers) or use the
> dvd? Thanks

PXE.

- gb

Mario Chow

Jan 28, 2009, 6:59:33 PM
to Discussion of Rocks Clusters
I just tried it a couple of times with the floppy drive and controller
disabled, to no avail. The nodes keep going to the grub> prompt after
stating "loading stage 2"

I tried pxe and using the dvd iso.

-M

Greg Bruno

Jan 28, 2009, 7:06:42 PM
to Discussion of Rocks Clusters
On Wed, Jan 28, 2009 at 3:59 PM, Mario Chow <mc...@wovensystems.com> wrote:
> I just tried it a couple of times with the floppy drive and controller
> disabled, to no avail. The nodes keep going to the grub> prompt after
> stating "loading stage 2"
>
> I tried pxe and using the dvd iso.

hard to say what is going on.

it only happens on a couple nodes, right?

if so, then you may want to try updating the BIOS on those nodes.

- gb

Mario Chow

Jan 28, 2009, 7:32:05 PM
to Discussion of Rocks Clusters
Yes. So far I have 4 nodes showing this behavior. I have a total of 30
nodes to install and I'm afraid that I will run into this issue as I go
along. Thanks for the help.
-M

BTW,
I have an old frontend running 4.1. Could I use this old frontend and
just update it, rather than installing a new frontend and running into
these headaches?

Malcolm Cowe

Jan 29, 2009, 5:21:49 AM
to Discussion of Rocks Clusters
It looks like the boot order is set to look at the HDD before anything
else, so the system never attempts to read the DVD drive for media.
However, the real problem is that GRUB is not working as expected. I'm
not an expert with GRUB diagnostics but since the software is unable to
locate the files it needs, try typing the following from the "grub>" prompt:

find /boot/grub/stage1
find /boot/grub/stage2

For example:
grub> find /boot/grub/stage1
(hd0,0)
(hd1,0)

If both of the find commands succeed, then it looks like grub was not
installed correctly. You can try the following in an attempt to
re-install grub on the MBR (make sure that the disk device matches your
setup -- the following example assumes a SATA drive on /dev/sda):

grub> device (hd0) /dev/sda
grub> root (hd0,0)
grub> setup (hd0)

Example output:

grub> device (hd0) /dev/sda
device (hd0) /dev/sda
grub> root (hd0,0)
root (hd0,0)
Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd0)
setup (hd0)
Checking if "/boot/grub/stage1" exists... yes
Checking if "/boot/grub/stage2" exists... yes
Checking if "/boot/grub/e2fs_stage1_5" exists... yes
Running "embed /boot/grub/e2fs_stage1_5 (hd0)"... 16 sectors are embedded.
succeeded
Running "install /boot/grub/stage1 (hd0) (hd0)1+16 p (hd0,0)/boot/grub/stage2 /boot/grub/grub.conf"... succeeded
Done.

If, however, the find commands fail, then either the grub software has
not been installed onto the HDD at all (unlikely), or grub is unable to
locate the /boot partition on the host. This might be because the BIOS
is not "presenting" the HDDs to grub or because the OS has been
installed on a drive that is not bootable -- that is, not visible to the
BIOS during POST. I've seen issues like this on systems with many HDDs
installed.

It's not much to go on, but it may point you in the right direction. You
definitely need to find a way to boot from a CD/DVD if you can, though.
That way you can boot a rescue environment and start poking around.
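Once a rescue environment is up, one way to start poking around is to look for the GRUB files from Linux rather than from the grub> prompt. The following is only a sketch under assumptions: the function names and the /mnt/probe mount point are made up for illustration, and it presumes the rescue shell provides awk and mount and that you are root.

```shell
#!/bin/sh
# List partition device nodes from a /proc/partitions-style file.
# $1 defaults to /proc/partitions; another file can be passed for testing.
list_partitions() {
    # Skip the header and the blank line, keep names that end in a digit.
    awk 'NR > 2 && $4 ~ /[0-9]$/ { print "/dev/" $4 }' "${1:-/proc/partitions}"
}

# Mount each partition read-only and report the ones that hold GRUB's stage1.
# /mnt/probe is a hypothetical scratch mount point.
find_stage1() {
    mkdir -p /mnt/probe
    for part in $(list_partitions); do
        if mount -o ro "$part" /mnt/probe 2>/dev/null; then
            # /boot may be its own partition (grub/stage1) or live under / (boot/grub/stage1).
            for f in /mnt/probe/grub/stage1 /mnt/probe/boot/grub/stage1; do
                if [ -f "$f" ]; then
                    echo "$part: ${f#/mnt/probe}"
                fi
            done
            umount /mnt/probe
        fi
    done
}
```

Running find_stage1 as root prints each partition that contains stage1. If Linux finds the files but GRUB's own "find" does not, that points toward the BIOS not presenting that drive to GRUB.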

Regards,

Malcolm.

Hamilton, Scott L.

Jan 29, 2009, 9:43:15 AM
to Discussion of Rocks Clusters
I have had this happen to me on three of my 200 nodes. All I had to do
was reinstall them via PXE boot. However, I first had to set their PXE
action to install, as they were somehow set to OS even though the OS
install did not complete successfully.

Either do
$ rocks set host pxeboot pxeboot="install" hostname

Or remove the host and reinsert it using
$ rocks remove host hostname
$ insert-ethers --update
$ insert-ethers --rack=racknum --rank=ranknum

Daniel De Marco

Jan 29, 2009, 10:51:07 AM
to Discussion of Rocks Clusters
This is happening sometimes in my cluster as well. The nodes are Dell
PowerEdge 1950. Rebooting the stuck node after setting its pxeboot to
install is enough, but it is still annoying nonetheless :-)

Daniel.

Mario Chow

Feb 5, 2009, 7:27:33 PM
to Discussion of Rocks Clusters
Scott,
Unfortunately, the frontend node has no knowledge of these nodes stuck in "grub>" prompt, so I don't think I can issue the commands you mention in your email.

Malcolm,
I typed "find /boot/grub/stage1" and I got a "file not found". I tried to install grub by following your recommendation:

device (hd0) /dev/sda
root (hd0,0)
setup (hd0)

and the command "device" does not exist at the grub prompt.

any other ideas?

thanks
-M

Malcolm Cowe

Feb 6, 2009, 5:19:54 AM
to Discussion of Rocks Clusters
No ideas, only the suggestion that you boot the affected systems from a
recovery disk and carry on the investigation from there. If grub cannot
find the files that it needs, then those files have been installed on a
drive that is not "visible" from the BIOS. In other words, the files
required by GRUB are not installed on a "BIOS drive". This is often a
result of a buggy BIOS or because the drive is on an add-in controller
that does not have a BIOS... The GRUB manual may be of help here (look
for information related to BIOS):

http://www.gnu.org/software/grub/manual/
http://www.gnu.org/software/grub/manual/html_node/Installing-GRUB-natively.html#Installing-GRUB-natively

You may have to create a custom device map or there may be another
problem. For example, the section about the "install" command highlights
some known BIOS problems:

http://www.gnu.org/software/grub/manual/html_node/install.html#install

The manual is full of references to BIOS setup, the ones I mention are
simply examples of issues that the GRUB developers know about. It may or
may not be related to the problems you are experiencing. Without booting
the affected system into a recovery environment and having a look around
(e.g. using fdisk, /proc file system, etc.) it is hard to know for sure
what's gone wrong. One might also simply try to install the OS from a
DVD and see what happens.
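On the failed "device" command: according to the GRUB Legacy manual, "device" is available only in the grub shell, that is, the "grub" program run from a booted Linux system. It is not available at the native grub> prompt a node drops into at boot, which would explain the error reported above. A minimal sketch of repeating the reinstall from a rescue environment follows. It assumes GRUB Legacy is installed there, that the disk is /dev/sda, and that /boot/grub lives on (hd0,0); the helper name grub_reinstall_cmds is hypothetical.

```shell
#!/bin/sh
# Print the GRUB Legacy commands that reinstall stage1 into the MBR.
#   $1: Linux disk device, e.g. /dev/sda        (assumption, adjust to your setup)
#   $2: GRUB partition holding /boot/grub, e.g. "(hd0,0)"  (assumption)
grub_reinstall_cmds() {
    printf 'device (hd0) %s\nroot %s\nsetup (hd0)\nquit\n' "$1" "$2"
}

# From the rescue environment, as root, feed the commands to the grub shell:
#   grub_reinstall_cmds /dev/sda '(hd0,0)' | grub --batch --no-floppy
```

Printing the commands first makes it easy to check the device and partition before anything is written to the disk.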

Regards,

Malcolm.

Hamilton, Scott L.

Feb 6, 2009, 9:49:35 AM
to Discussion of Rocks Clusters
Mario,

You need to make sure that these nodes are in fact PXE booting. My guess
is that they are trying to boot off the hard drive and there is a
corrupt installation on them. If your head node has no knowledge of the
nodes, then you need to rerun insert-ethers and PXE boot the nodes.

Scott

Hamilton, Scott L.

Feb 6, 2009, 9:53:22 AM
to Discussion of Rocks Clusters
I just tested this: you will get the grub> prompt if you PXE boot a node that is not in the Rocks database while insert-ethers is not running. Try running insert-ethers, then PXE boot the node.
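
A quick way to tell which case applies is to check on the frontend whether the node is already in the database before rebooting it. This is only a sketch: the function name node_known and the host name compute-0-0 are hypothetical, and it assumes "rocks list host" prints each host at the start of a line as "hostname:".

```shell
#!/bin/sh
# Succeed if the named host appears in the frontend's host list.
# Assumes `rocks list host` prints each entry as "hostname: ..." (assumption).
node_known() {
    rocks list host | grep -q "^$1:"
}

# Usage on the frontend (compute-0-0 is a hypothetical node name):
#   if node_known compute-0-0; then
#       echo "in database: set its PXE action to install, then reboot it"
#   else
#       echo "not in database: start insert-ethers, then PXE boot the node"
#   fi
```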