[Rocks-Discuss] Compute node not working


Funlola Ogunleye

Mar 28, 2012, 12:47:20 PM
to npaci-rocks...@sdsc.edu
I am running ROCKS 5.4.3 on a frontend and two compute nodes, compute-0-0
and compute-0-1. When I powered on both compute nodes, this error
message appeared:
GRUB Loading stage1.5.

GRUB loading, please wait...
Error 15

What can I do to fix this issue?

Philip Papadopoulos

Mar 28, 2012, 1:41:09 PM
to Discussion of Rocks Clusters
Assuming your nodes are set to boot from the network first (BIOS boot order):

# rocks set host boot action=install compute-0-0 compute-0-1

Then power cycle your nodes and let them reinstall.


The GRUB error is usually caused by nodes not being shut down properly,
which corrupts the filesystem on disk. At that point the boot loader cannot
find the next stage it needs to read from disk and fails.

Reinstallation should fix whatever the underlying problem is (and is
generally faster than trying to work out a "hand fix").

-P


--
Philip Papadopoulos, PhD
University of California, San Diego
858-822-3628 (Ofc)
619-331-2990 (Fax)



Funlola Ogunleye

Mar 28, 2012, 2:31:06 PM
to npaci-rocks...@sdsc.edu
I tried "rocks set host boot action=install compute-0-0 compute -0-1" but
the nodes are still stuck. I think it might have something to do with my
attempt to add packages to the nodes using XML. I looked through the forum
and found that I can run the xmllint program. When I ran

xmllint --valid --noout cd
/export/rocks/install/site-profiles/5.4.3/nodes/extend-compute.xml
warning: failed to load external entity "cd"
/export/rocks/install/site-profiles/5.4.3/nodes/extend-compute.xml:3:
validity error : Validation failed: no DTD found !

Does this have something to do with it? If so, here are the contents of the
extend-compute.xml file, in case you spot anything. Thank you.

<?xml version="1.0" standalone="no"?>

<kickstart>

<description>

A skeleton XML node file. This file is a template and is intended
as an example of how to customize your Rocks cluster. Kickstart XML
nodes such as this describe packages and "post installation" shell
scripts for your cluster.

XML files in the site-nodes/ directory should be named either
"extend-[name].xml" or "replace-[name].xml", where [name] is
the name of an existing xml node.

If your node is prefixed with replace, its instructions will be used
instead of the official node's. If it is named extend, its directives
will be concatenated to the end of the official node.

</description>


<changelog>
</changelog>

<main>
<!-- kickstart 'main' commands go here -->
</main>

<pre>
<!-- partitioning commands go here -->
</pre>


<!-- There may be as many packages as needed here. Just make sure you only
uncomment as many package lines as you need. Any empty
<package></package>
tags are going to confuse rocks and kill the installation procedure
-->
<package>aoetools</package>
<package>euca-axis2c</package>
<package>euca-rampartc</package>
<package>perl-Crypt-OpenSSL-Random</package>
<package>perl-Crypt-OpenSSL-RSA</package>
<package>perl-Crypt-X509</package>
<package>eucalyptus</package>
<package>eucalyptus-gl</package>
<package>eucalyptus-nc</package>

<post>
<!-- Insert your post installation script here. This
code will be executed on the destination node after the
packages have been installed. Typically configuration files
are built and services setup in this section. -->

<!-- WARNING: Watch out for special XML chars like ampersand,
greater/less than and quotes. A stray ampersand will cause the
kickstart file building process to fail, thus, you won't be able
to reinstall any nodes. It is recommended that after you create an
XML node file, that you run:

xmllint -noout file.xml
-->

<eval shell="python">

<!-- This is python code that will be executed on the
frontend node during kickstart file generation. You may contact
the database, make network queries, etc. These sections are
generally used to help build more complex configuration
files. The 'shell' attribute is optional and may point to any
language interpreter such as "bash", "perl", "ruby", etc.
By default shell="bash". -->

</eval>

</post>

</kickstart>
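
The comment in the file above warns that an empty <package></package> tag can kill the installation. A minimal Python sketch of that check follows. It uses only the standard library; the function name `package_report` and the trimmed sample string are illustrative assumptions and are not part of Rocks.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for extend-compute.xml; the real file
# has more sections, but only the <package> elements matter here.
SAMPLE = """<kickstart>
<package>aoetools</package>
<package>eucalyptus</package>
<package></package>
</kickstart>"""


def package_report(xml_text):
    """Return (list of package names, count of empty <package> tags)."""
    root = ET.fromstring(xml_text)
    names, empty = [], 0
    for pkg in root.iter("package"):
        text = (pkg.text or "").strip()
        if text:
            names.append(text)
        else:
            empty += 1
    return names, empty


if __name__ == "__main__":
    names, empty = package_report(SAMPLE)
    print("packages:", ", ".join(names))
    print("empty <package> tags:", empty)
```

Package tags that sit inside an XML comment, like the one in the template's warning, are skipped by the parser and so are not counted.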


Luca Clementi

Mar 28, 2012, 8:05:15 PM
to Discussion of Rocks Clusters
On Wed, Mar 28, 2012 at 11:31 AM, Funlola Ogunleye
<funlola...@gmail.com> wrote:
> I tried "rocks set host boot action=install compute-0-0 compute -0-1" but
> the nodes are still stuck. I think it might have something to do with my
> attempt to add packages to the nodes using XML. I looked through the forum
> and found that I can run the xmllint program. When I ran
>
> xmllint --valid --noout cd
> /export/rocks/install/site-profiles/5.4.3/nodes/extend-compute.xml
> warning: failed to load external entity "cd"
> /export/rocks/install/site-profiles/5.4.3/nodes/extend-compute.xml:3:
> validity error : Validation failed: no DTD found !
>
> Does this have something to do with it? If so, here are the contents of the
> extend-compute.xml file, in case you spot anything. Thank you.

Hey Funlola,
Your XML seems fine.
What happens on the compute nodes when you reboot them?
Do they start reinstalling?
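
Both xmllint messages quoted above come from how the command was invoked rather than from the file: the stray "cd" is treated as a file name to load, and --valid requests DTD validation, which cannot succeed for a file that declares no DTD. The template's own comment suggests a plain well-formedness check (xmllint -noout file.xml). A rough Python equivalent is sketched below, assuming only the standard library; `well_formed_error` and the two sample strings are hypothetical names used for illustration.

```python
import xml.etree.ElementTree as ET

GOOD = "<kickstart><post>echo done</post></kickstart>"
# A stray ampersand, the pitfall the template comment warns about.
BAD = "<kickstart><post>make && make install</post></kickstart>"


def well_formed_error(xml_text):
    """Return None if xml_text is well-formed XML, else the parser message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


if __name__ == "__main__":
    for label, text in (("good", GOOD), ("bad", BAD)):
        err = well_formed_error(text)
        print(label, "->", "well-formed" if err is None else err)
```

To check a real node file, read its contents and pass the text to `well_formed_error`; a result of None means the file at least parses, which is the property the kickstart build depends on.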


Luca

Philip Papadopoulos

Mar 29, 2012, 12:23:13 AM
to Discussion of Rocks Clusters
Before rebooting, try the following:

# rocks list host profile compute-0-0 2>&1 | less

Are there ANY errors at the beginning?

-P



Funlola Ogunleye

Mar 29, 2012, 9:58:43 AM
to npaci-rocks...@sdsc.edu
I decided to just reinstall the nodes. Thank you for your support.

