[Rocks-Discuss] Reinstalling nodes after installing new torque roll on frontend


Cláudio Forain

Sep 13, 2010, 9:42:17 PM
to Discussion of Rocks Clusters
So, I installed the frontend and then kickstarted the nodes. But
when I finished all of them, I realized that I hadn't installed the SGE
or Torque rolls. Following the documentation, it seemed I had to
reinstall the frontend, adding the Torque roll, and then reinstall all
the nodes. Now, when I run insert-ethers to add a compute node and
PXE-boot any of the nodes, it does boot the kernel across the network,
but instead of just installing the node, the installer prompts me for
the language and image location, then fails because it can't find the
image. I thought it might have something to do with the previous
installation on the nodes, so I looked at the documentation
(http://www.rocksclusters.org/roll-documentation/base/5.3/x1354.html)
and ran:


# rocks list host boot
HOST ACTION

compute-0-0: install
compute-0-1: install
compute-0-2: install
compute-0-3: install
compute-0-4: install
compute-0-5: install
compute-0-6: install
compute-0-7: install


I figure this means that each node will reinstall on its next PXE boot.
Am I missing something here? Thanks in advance.

Greg Bruno

Sep 14, 2010, 12:41:03 AM
to Discussion of Rocks Clusters

what is the output of:

# rocks list host profile compute-0-0 > /tmp/ks.cfg

- gb

Cláudio Forain

Sep 14, 2010, 9:44:56 AM
to Discussion of Rocks Clusters
Doing this (# rocks list host profile compute-0-0 > /tmp/ks.cfg),
won't I overwrite the file?

Greg Bruno

Sep 14, 2010, 9:55:26 AM
to Discussion of Rocks Clusters
2010/9/14 Cláudio Forain <claudi...@gmail.com>:

> Doing this (# rocks list host profile compute-0-0 > /tmp/ks.cfg),
> won't I overwrite the file?

yes, it will overwrite /tmp/ks.cfg, but it is just a temporary file
that is not used for node installation. the above command will not
harm the system.

- gb

Cláudio Forain

Sep 14, 2010, 10:17:02 AM
to Discussion of Rocks Clusters
OK. I won't have access to the machine again until tomorrow; I will post
the results then.

Bart Brashers

Sep 14, 2010, 2:33:03 PM
to Discussion of Rocks Clusters
> So, I installed the frontend and then kickstarted the nodes. But
> when I finished all of them, I realized that I hadn't installed the SGE
> or Torque rolls. Following the documentation, it seemed I had to
> reinstall the frontend, adding the Torque roll, and then reinstall all
> the nodes. Now, when I run insert-ethers to add a compute
> node,

This was your problem, see below...

> when I PXE-boot any of the nodes, it does boot the kernel
> across the network, but instead of just installing the node, the
> installer prompts me for the language and image location, then fails
> because it can't find the image.

Did you run "cd /home/export/rocks ; rocks create distro" before you PXE-booted your nodes? That creates the "image".

> I thought that it
> would have something to do with the previous installation on the
> nodes, so I looked at the documentation
> (http://www.rocksclusters.org/roll-documentation/base/5.3/x1354.html)
> and ran:
>
>
> # rocks list host boot
> HOST ACTION
>
> compute-0-0: install
> compute-0-1: install
> compute-0-2: install
> compute-0-3: install
> compute-0-4: install
> compute-0-5: install
> compute-0-6: install
> compute-0-7: install
>
>
> Which I figure means that the next pxeboot, the node will reinstall.
> Am I missing something here? Thanks in advance.

Because the nodes are already in the database, you don't need to use "insert-ethers" again (on these nodes). Just boot them. Assuming they are set to PXE boot, they will install.

If you still have problems, it might be because the compute nodes' disks already have partitions on them that are marked with files named ".rocks-release". This makes the installer refuse to delete/reformat them, in an attempt to save your data.

When the install stops with a problem, you can press Ctrl-Alt-F1, -F2, -F3, -F4 etc. to see some useful info. One of those will contain a command line, from which you can run "fdisk" and wipe out any non-user-data partitions.

Bart


Cláudio Forain

Sep 14, 2010, 4:14:21 PM
to Discussion of Rocks Clusters
Sorry, I forgot to mention that I did rebuild the distro. I installed
the R packages using this tutorial
(http://technical.bestgrid.org/index.php/Installing_R_on_a_Rocks_Cluster#Installing_R).
Checked against the Rocks documentation, it seems right. Anyway, it
doesn't seem related, because I think I had this problem before
installing R.

Bart Brashers

Sep 14, 2010, 4:21:33 PM
to Discussion of Rocks Clusters
So can you, or can't you, (re-) install your nodes?

Bart

Cláudio Forain

Sep 14, 2010, 4:31:08 PM
to Discussion of Rocks Clusters
Still can't. I just forgot to mention that fact. I don't have access
to the cluster until tomorrow, so I will try your suggestions then.
I only mentioned it to see if there's something obvious I'm missing.
Sorry for not making that clear, and thanks for your attention. I will
keep you guys updated.

Cláudio Forain

Sep 15, 2010, 8:26:45 AM
to Discussion of Rocks Clusters
I'm using Rocks 5.3 (Rolled Tacos), by the way.
So, let's see.
Answering Greg:

[root@lpge-cluster install]# rocks list host profile compute-0-0
Traceback (most recent call last):
  File "/opt/rocks/bin/rocks", line 264, in ?
    command.runWrapper(name, args[i:])
  File "/opt/rocks/lib/python2.4/site-packages/rocks/commands/__init__.py", line 1774, in runWrapper
    self.run(self._params, self._args)
  File "/opt/rocks/lib/python2.4/site-packages/rocks/commands/list/host/profile/__init__.py", line 273, in run
    [
  File "/opt/rocks/lib/python2.4/site-packages/rocks/commands/__init__.py", line 1467, in command
    o.runWrapper(name, args)
  File "/opt/rocks/lib/python2.4/site-packages/rocks/commands/__init__.py", line 1774, in runWrapper
    self.run(self._params, self._args)
  File "/opt/rocks/lib/python2.4/site-packages/rocks/commands/list/host/xml/__init__.py", line 189, in run
    xml = self.command('list.node.xml', args)
  File "/opt/rocks/lib/python2.4/site-packages/rocks/commands/__init__.py", line 1467, in command
    o.runWrapper(name, args)
  File "/opt/rocks/lib/python2.4/site-packages/rocks/commands/__init__.py", line 1774, in runWrapper
    self.run(self._params, self._args)
  File "/opt/rocks/lib/python2.4/site-packages/rocks/commands/list/node/xml/__init__.py", line 511, in run
    handler.parseNode(node, doEval)
  File "/opt/rocks/lib/python2.4/site-packages/rocks/profile.py", line 374, in parseNode
    parser.feed(line)
  File "/opt/rocks/lib/python2.4/site-packages/_xmlplus/sax/expatreader.py", line 220, in feed
    self._err_handler.fatalError(exc)
  File "/opt/rocks/lib/python2.4/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: <unknown>:163:2: mismatched tag

Answering Bart:

/home didn't have an export folder, but I ran the command in
/export/rocks/install/ (according to this tutorial:
http://technical.bestgrid.org/index.php/Installing_R_on_a_Rocks_Cluster#Installing_R)

[root@lpge-cluster install]# rocks create distro
Cleaning distribution
Resolving versions (base files)
including "kernel" (5.3,x86_64) roll...
including "torque" (5.3.0,x86_64) roll...
including "hpc" (5.3,x86_64) roll...
including "base" (5.3,x86_64) roll...
including "web-server" (5.3,x86_64) roll...
including "ganglia" (5.3,x86_64) roll...
including "os" (5.3,x86_64) roll...
Including critical RPMS
Resolving versions (RPMs)
including "kernel" (5.3,x86_64) roll...
including "torque" (5.3.0,x86_64) roll...
including "hpc" (5.3,x86_64) roll...
including "base" (5.3,x86_64) roll...
including "web-server" (5.3,x86_64) roll...
including "ganglia" (5.3,x86_64) roll...
including "os" (5.3,x86_64) roll...
Resolving versions (SRPMs)
including "kernel" (5.3,x86_64) roll...
including "torque" (5.3.0,x86_64) roll...
including "hpc" (5.3,x86_64) roll...
including "base" (5.3,x86_64) roll...
including "web-server" (5.3,x86_64) roll...
including "ganglia" (5.3,x86_64) roll...
including "os" (5.3,x86_64) roll...
Creating files (symbolic links - fast)
Applying stage2.img
Applying updates.img
Installing XML Kickstart profiles
installing "hpc" profiles...
installing "ganglia" profiles...
installing "web-server" profiles...
installing "base" profiles...
installing "kernel" profiles...
installing "os" profiles...
installing "torque" profiles...
installing "site" profiles...
Creating repository
making "torrent" files for RPMS


As for insert-ethers, I have the same problem even without running it.
I had thought about the partitions too, so I booted a Debian
install CD and destroyed all the partitions on the hard disk, but had
the same problem.

Using Ctrl+F3 I got the following relevant output:

ROCKS: Found disk device sda
ROCKS:getCert: No Rocks disks found
ks location: https://172.16.0.1/install/sbin/kickstart.cgi?arch=x86_64&np=8
ROCKS:transfering https://172.16.0.1//install/sbin/kickstart.cgi?arch=x86_64&np=8
.
.
.
ROCKS:httpsGetFileDesc:status 200 OK
ROCKS:urlinstStartSSLTransfer:attempt (1)
ROCKS:writeInterfacesFile
ROCKS:setting up kickstart


And then it asks for the language and image file.

I'm kind of worried about that error. I guess it has something to do with
the XML I edited to install the packages, although I don't believe it's
related to the problem.

Anyway, here is the only XML I have ever edited:

[root@lpge-cluster install]# cat /export/rocks/install/site-profiles/5.3/nodes/extend-compute.xml
<?xml version="1.0" standalone="no"?>

<kickstart>

<description>

A skeleton XML node file. This file is a template and is intended
as an example of how to customize your Rocks cluster. Kickstart XML
nodes such as this describe packages and "post installation" shell
scripts for your cluster.

XML files in the site-nodes/ directory should be named either
"extend-[name].xml" or "replace-[name].xml", where [name] is
the name of an existing xml node.

If your node is prefixed with replace, its instructions will be used
instead of the official node's. If it is named extend, its directives
will be concatenated to the end of the official node.

</description>


<changelog>
</changelog>

<main>
<!-- kickstart 'main' commands go here -->
</main>

<pre>
<!-- partitioning commands go here -->
</pre>


<!-- There may be as many packages as needed here. Just make sure you only
uncomment as many package lines as you need. Any empty <package></package>
tags are going to confuse rocks and kill the installation procedure
-->
<!-- <package> insert 1st package name here and uncomment the line</package> -->
<!-- <package> insert 2nd package name here and uncomment the line</package> -->
<!-- <package> insert 3rd package name here and uncomment the line</package> -->

<package>R</package>
<package>R-devel</package>
<package>libRmath</package>
<package>libRmath-devel</package>

<post>
<!-- Insert your post installation script here. This
code will be executed on the destination node after the
packages have been installed. Typically configuration files
are built and services setup in this section. -->

<!-- WARNING: Watch out for special XML chars like ampersand,
greater/less than and quotes. A stray ampersand will cause the
kickstart file building process to fail, thus, you won't be able
to reinstall any nodes. It is recommended that after you create an
XML node file, that you run:

xmllint -noout file.xml
-->

mkdir /install/rocks-dist/scripts
<file name="/install/rocks-dist/scripts/rconfig.r">
Sys.getenv("http_proxy")

options(repos="http://cran.stat.auckland.ac.nz")

#Install Rmpi separately due to configure.args requirement
install.package("Rmpi",configure.args='--with-mpi=/opt/openmpi')

# Create a list of standard packages
packagelist &lt;-
c("sp","maptools","lattice","spproj","spgpc","spgrass6","spgdal","gstat","splancs","DCluster","spdep","spPBS","spmaps","spspatstat","spgeoR","spRandomFields","spatstat","geoR","geoRglm","odesolve","snow","coda","akima")
for (pkg in packagelist)
{
if (!require(pkg))
{
print(paste("Attempting to install ",pkg))
install.packages(pkg)
}
}
</file>

ls -l /install/rocks-dist/scripts

http_proxy=http://<address>:<port> /usr/bin/R CMD BATCH --vanilla
/install/rocks-dist/scripts/rconfig.r /var/log/rconfig.log

<eval shell="python">

<!-- This is python code that will be executed on the
frontend node during kickstart file generation. You may contact
the database, make network queries, etc. These sections are
generally used to help build more complex configuration
files. The 'shell' attribute is optional and may point to any
language interpreter such as "bash", "perl", "ruby", etc.
By default shell="bash". -->

</eval>

</post>

</kickstart>

Cláudio Forain

Sep 15, 2010, 9:09:25 AM
to Discussion of Rocks Clusters
Update. Well, halfway through the installation, it complains about not
being able to read package metadata and crashes... So, still messed up.

Cláudio Forain

Sep 15, 2010, 9:04:17 AM
to Discussion of Rocks Clusters
Update:

I tried to install via HTTP, as the node asked. It asks for the
updates.img file. Looking around in the node, I found it in a directory
with the following contents
(https://10.0.0.74/install/rolls/kernel/5.3/x86_64/images/):

product.img
index.html
updates.img
TRANS.TBL
stage2.img

So I pointed it to that path and it seems to be installing properly. I
will keep you guys updated. Anyway, I believe this is not the right
behavior. So, what's wrong?

Greg Bruno

Sep 15, 2010, 10:54:28 AM
to Discussion of Rocks Clusters
the problem is that '<port>' is in the line in the above section:

http_proxy=http://<address>:<port> /usr/bin/R CMD BATCH --vanilla

the XML parser is trying to parse '<port>' and you want the XML parser
to treat it as a literal. to accomplish that, start your <post>
section with:

<post>
<![CDATA[

and end your </post> section with:

]]>
</post>

- gb
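
The failure described above can be reproduced outside Rocks. The sketch below is only an illustration in Python 3 (the cluster itself runs Python 2.4); the function name parse_error and the two cut-down sample strings are made up for this example. It feeds a shortened <post> section to the standard xml.sax parser, once as posted and once with the CDATA wrapper.

```python
import xml.sax

# Hypothetical, cut-down <post> sections; only the placeholder line is taken
# from the thread.
UNWRAPPED = b"""<post>
http_proxy=http://<address>:<port> /usr/bin/R CMD BATCH --vanilla
</post>"""

WRAPPED = b"""<post>
<![CDATA[
http_proxy=http://<address>:<port> /usr/bin/R CMD BATCH --vanilla
]]>
</post>"""


def parse_error(fragment):
    """Return (line, message) if the fragment is not well-formed, else None."""
    try:
        xml.sax.parseString(fragment, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc.getLineNumber(), exc.getMessage()
    return None


print(parse_error(UNWRAPPED))
print(parse_error(WRAPPED))
```

The first call should report a mismatched tag on the line holding </post>, because the parser has opened <address> and <port> and is still waiting for </port>. The second call should return None, since everything inside the CDATA block is treated as plain text.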

Cláudio Forain

Sep 15, 2010, 11:16:35 AM
to Discussion of Rocks Clusters
So then I recreate my distro? Anyway, the XML ended up like this:


<?xml version="1.0" standalone="no"?>

<kickstart>

<description>

A skeleton XML node file. This file is a template and is intended
as an example of how to customize your Rocks cluster. Kickstart XML
nodes such as this describe packages and "post installation" shell
scripts for your cluster.

XML files in the site-nodes/ directory should be named either
"extend-[name].xml" or "replace-[name].xml", where [name] is
the name of an existing xml node.

If your node is prefixed with replace, its instructions will be used
instead of the official node's. If it is named extend, its directives
will be concatenated to the end of the official node.

</description>


<changelog>
</changelog>

<main>
</main>

<pre>
</pre>

<package>R</package>
<package>R-devel</package>
<package>libRmath</package>
<package>libRmath-devel</package>

<post>
<![CDATA[


mkdir /install/rocks-dist/scripts
<file name="/install/rocks-dist/scripts/rconfig.r">
Sys.getenv("http_proxy")

options(repos="http://cran.stat.auckland.ac.nz")

#Install Rmpi separately due to configure.args requirement
install.package("Rmpi",configure.args='--with-mpi=/opt/openmpi')

# Create a list of standard packages
packagelist &lt;-
c("sp","maptools","lattice","spproj","spgpc","spgrass6","spgdal","gstat","splancs","DCluster","spdep","spPBS","spmaps","spspatstat","spgeoR","spRandomFields","spatstat","geoR","geoRglm","odesolve","snow","coda","akima")
for (pkg in packagelist)
{
if (!require(pkg))
{
print(paste("Attempting to install ",pkg))
install.packages(pkg)
}
}
</file>

ls -l /install/rocks-dist/scripts

http_proxy=http://<address>:<port> /usr/bin/R CMD BATCH --vanilla
/install/rocks-dist/scripts/rconfig.r /var/log/rconfig.log
<eval shell="python">


</eval>
]]>
</post>

</kickstart>


If I run #rocks list host profile, it gives me:

[root@lpge-cluster nodes]# rocks list host profile

xml.sax._exceptions.SAXParseException: <unknown>:133:2: mismatched tag

Do you see anything wrong?
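
One thing that may be worth checking in a file wrapped this way: inside a CDATA block the XML parser passes everything through as plain text, so nested tags such as <file> and <eval> are no longer seen as tags, and &lt; is not turned back into <. Whether that matters depends on how Rocks processes the section, which this thread does not settle. The Python 3 sketch below only shows the parser-level difference, using shortened, made-up fragments (the /tmp path and the function name summarize are hypothetical) and, as an alternative, escaping just the placeholder.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical versions of the <post> section from this thread.
CDATA_POST = """<post>
<![CDATA[
<file name="/tmp/rconfig.r">
packagelist &lt;- c("sp")
</file>
http_proxy=http://<address>:<port> /usr/bin/R CMD BATCH --vanilla
]]>
</post>"""

ESCAPED_POST = """<post>
<file name="/tmp/rconfig.r">
packagelist &lt;- c("sp")
</file>
http_proxy=http://&lt;address&gt;:&lt;port&gt; /usr/bin/R CMD BATCH --vanilla
</post>"""


def summarize(xml_text):
    """Return (child tag names, all text content) of the root element."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root], "".join(root.itertext())


for label, doc in (("CDATA", CDATA_POST), ("escaped", ESCAPED_POST)):
    tags, text = summarize(doc)
    print(label, "child tags:", tags)
    print(text)
```

With the CDATA version the parser sees no child elements and the text still contains the literal characters &lt;-. With the escaped version the <file> element is parsed as an element and the text reads packagelist <- c("sp"), while the placeholder still comes out as http://<address>:<port>.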

Cláudio Forain

Sep 15, 2010, 11:24:46 AM
to Discussion of Rocks Clusters
UPDATE

Apparently I was finally able to kickstart a node. But those XML errors
still worry me. I'm afraid that it won't run the scripts or install the
RPMs properly. Thanks for now, but give me a heads-up if you see
anything wrong in the XML.

Greg Bruno

Sep 15, 2010, 3:07:36 PM
to Discussion of Rocks Clusters
2010/9/15 Cláudio Forain <claudi...@gmail.com>:

> So then I recreate my distro? Anyway, the XML ended up like this:

yes, after you make any modification to a node XML file, you need to
rebuild the distro:

# cd /export/rocks/install
# rocks create distro

- gb
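
Since a malformed node file only shows up later as a traceback from rocks list host profile, it may help to check the node files for well-formedness before rebuilding the distro. The XML template posted in this thread suggests xmllint for this; the sketch below does the same kind of check in Python 3 over a whole directory. The function name find_malformed is hypothetical, and the directory you pass in (for example the site-profiles nodes directory shown earlier in the thread) is an assumption about your layout.

```python
import xml.sax
from pathlib import Path


def find_malformed(directory):
    """Return (file name, line, column, message) for each *.xml file in
    the directory that the XML parser rejects as not well-formed."""
    problems = []
    for path in sorted(Path(directory).glob("*.xml")):
        try:
            # A bare ContentHandler ignores the content; we only care
            # whether parsing raises an error.
            xml.sax.parse(str(path), xml.sax.ContentHandler())
        except xml.sax.SAXParseException as exc:
            problems.append(
                (path.name, exc.getLineNumber(), exc.getColumnNumber(), exc.getMessage())
            )
    return problems
```

Calling find_malformed on the nodes directory and printing each tuple gives a file name plus a line number to look at. An empty list only means the files are well-formed XML, not that the resulting kickstart file does what you intend.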
