We had a general power failure. The Rocks cluster is now functioning
quite well, except for the PXE rebuild of the compute nodes.
When I watch the console output of a reinstalling node, I see a correct
PXE start, DHCP, etc., up to the reinstall procedure, and then:
no kickstart found for this node,
so the installer falls back to the usual interactive prompts (install
language, keyboard, ...).
I checked the health of the frontend:
httpd : OK
mysqld : OK
rocks list host interface : OK
rocks list host profile compute-0-0 : FAILED !
--------------------
Traceback (most recent call last):
  File "/opt/rocks/bin/rocks", line 264, in ?
    command.runWrapper(name, args[i:])
  File "/opt/rocks/lib/python2.4/site-packages/rocks/commands/__init__.py", line 1774, in runWrapper
    self.run(self._params, self._args)
  (...)
  File "/opt/rocks/lib/python2.4/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: <unknown>:155:2: junk after document element
-----------------------------------
Is it worth investigating further?
Or would it be better to remove all the compute nodes and do a fresh PXE
install of each of them?
I like to dive into the Rocks mechanisms, but time is precious, and some
general advice to point me in the right direction would be welcome.
Thanks,
best regards,
Alain
--
Dr Alain EMPAIN, Bioinformatics, Bryology
National Botanic Garden of Belgium alain....@br.fgov.be
University of Liège, GIGA +1, Alma-in-silico alain....@ulg.ac.be
Rue des Martyrs, 11 B-4550 Nandrin
Mobile: +32 497 701764 HOME:+32 85 512341 ULG: +32 4 3664157
Rebuild the distro and then do the rocks list host profile command again.
cd /export/rocks/install
rocks create distro
rocks list host profile compute-0-0
Assuming you are running Rocks 5.3....
Sounds like a bug in your extend-compute.xml file. Go to
/export/rocks/install/site-profiles/5.3/nodes
and run the xmllint command
xmllint -noout extend-compute.xml
If you run 5.2 or 5.1, replace 5.3 above accordingly. I think in 5.0 your file lives in
/home/install/site-profiles/5.0/nodes
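If xmllint is not at hand, the same well-formedness check can be sketched in Python with the standard xml.sax module, which is the parser family that appears in the traceback. This is only an illustration: the function name check_well_formed and the tiny sample documents are hypothetical, and the sample is not a real extend-compute.xml. With a stray element after the closing root tag, the parser should report an error of the same "junk after document element" kind.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def check_well_formed(xml_text):
    """Return None if xml_text is well-formed, else 'line:column: message'."""
    try:
        # A bare ContentHandler ignores all content; we only care about errors.
        xml.sax.parseString(xml_text.encode("utf-8"), ContentHandler())
    except xml.sax.SAXParseException as err:
        return "%d:%d: %s" % (
            err.getLineNumber(),
            err.getColumnNumber(),
            err.getMessage(),
        )
    return None


# Hypothetical miniature of a node file (not the real extend-compute.xml).
good = "<kickstart>\n<package>R-core</package>\n</kickstart>\n"
# Same file with an element appended after the closing root tag.
bad = good + "<package>xdg-utils</package>\n"

print(check_well_formed(good))
print(check_well_formed(bad))
```

To check the real file, read its contents and pass them to check_well_formed; the reported line and column play the same role as the 155:2 in the traceback.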
Tim
--
-------------------------------------------
Tim Carlson, PhD
Senior Research Scientist
Environmental Molecular Sciences Laboratory
Sorry for the delay; in Belgium it was high time to sleep ;-)
I am using Rocks 5.3.
I will test your suggestion as soon as I am at work.
Thanks,
Alain
> On Mon, 12 Apr 2010, Empain Alain wrote:
Indeed, the problem was in extend-compute.xml.
Thanks for the advice.
In fact, when the power outage interrupted me, I was developing a tool to
centralize my growing knowledge of Rocks behavior, and
/export/rocks/install/site-profiles/5.3/nodes/extend-compute.xml was left
with a malformed line.
Here is the draft help of my 'rocks-rpm' tool: when a user asks me to install
an application, I ask her/him to provide an 'rpm.def' file containing the
mirror URL and a list of RPMs to be downloaded (see below).
The tool helped me build a better understanding of Rocks and gather
the related install actions in one place.
Note: /home/safe is an automounted NAS (/home/volatile is a second
one), so the def files and RPMs are kept in a permanent place, protected from
any frontend reinstallation.
One caveat:
up to now, the tool appends the new '<package>xxx</package>' line to the
end of the current extend-compute.xml, and I still have to edit the file by
hand to inspect the added line and move it to its correct place (it is
important to check the file).
I was bitten by this shortcut :-{
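One way to avoid appending after the closing root tag is to parse the file and add the element inside the root. The following is a minimal sketch in Python using the standard xml.etree.ElementTree module, not the way rocks-rpm actually works; the function name add_package and the sample input are hypothetical. It places the new <package> right after the last existing one, or first in the root if there is none.

```python
import xml.etree.ElementTree as ET


def add_package(xml_text, package_name):
    """Return xml_text with <package>package_name</package> inside the root."""
    # fromstring raises ParseError if the input is already malformed,
    # so a broken file is detected before anything is written.
    root = ET.fromstring(xml_text)
    packages = root.findall("package")  # direct children only

    # Leave the document untouched if the package is already listed.
    if any(p.text and p.text.strip() == package_name for p in packages):
        return xml_text

    new = ET.Element("package")
    new.text = package_name
    new.tail = "\n"

    # Insert after the last existing <package>, else as the first child.
    if packages:
        index = list(root).index(packages[-1]) + 1
    else:
        index = 0
    root.insert(index, new)
    return ET.tostring(root, encoding="unicode")


# Hypothetical miniature of a node file, for illustration only.
sample = (
    "<kickstart>\n"
    "<package>R-core</package>\n"
    "<post>\necho done\n</post>\n"
    "</kickstart>"
)
print(add_package(sample, "xdg-utils"))
```

A limitation of this sketch: the default ElementTree parser drops comments and the XML declaration when the tree is written back, so a real tool would need to preserve them some other way, and a manual review of the result is still a good idea.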
rocks-rpm -h
======================================================================
/usr/local/sbin/rocks-rpm Version 2010-03-31 A.Empain 2010
======================================================================
Script to download RPMs, prepare a new distro and reinstall the nodes
Usage:
-s : setup of /usr/local/sbin/rocks-rpm (install it)
-g package.def : get the RPMs and prepare the kickstart env.
-r : rebuild the distro
-I def : IMMEDIATE RPM installation (no reboot, 'rocks run ...')
-R def : REMOVE (immediate) RPM's (no reboot, 'rocks run ...')
-F : FORCE the node reinstallation with the new distro
-Q : QUEUE the node reinstallation (SGE) with the new distro
The package.def (perl syntax) must provide the variable and the array :
-------------------------------
$MIRROR="http://ftp.uni-koeln.de/mirrors/fedora/epel/5/x86_64";
@RPM=(
"perl-File-Copy-Recursive-0.35-1.el5.noarch.rpm",
"xdg-utils-1.0.2-4.el5.noarch.rpm",
"R-core-2.10.1-1.el5.x86_64.rpm"
);
Edit : /export/rocks/install/site-profiles/5.3/nodes/extend-compute.xml
rpms : /home/safe/ROCKS/rpms (RPM safe storage)
rpm.d : /home/safe/ROCKS/rpm.d (RPM definitions)
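To illustrate how a def file of the kind shown in the help text could be turned into a download list, here is a small Python sketch. It is not the rocks-rpm implementation, which presumably evaluates the file as Perl. The names parse_def and rpm_urls are hypothetical, and the sketch assumes that each RPM sits directly under the mirror URL.

```python
import re

# Sample def file, copied from the help text.
SAMPLE_DEF = '''
$MIRROR="http://ftp.uni-koeln.de/mirrors/fedora/epel/5/x86_64";
@RPM=(
"perl-File-Copy-Recursive-0.35-1.el5.noarch.rpm",
"xdg-utils-1.0.2-4.el5.noarch.rpm",
"R-core-2.10.1-1.el5.x86_64.rpm"
);
'''


def parse_def(text):
    """Extract the $MIRROR string and the @RPM list from a def file."""
    mirror = re.search(r'\$MIRROR\s*=\s*"([^"]+)"', text)
    block = re.search(r'@RPM\s*=\s*\((.*?)\)', text, re.S)
    if mirror is None or block is None:
        raise ValueError("def file must define $MIRROR and @RPM")
    # Every double-quoted string inside the parentheses is one RPM file name.
    rpms = re.findall(r'"([^"]+)"', block.group(1))
    return mirror.group(1), rpms


def rpm_urls(mirror, rpms):
    """Build one URL per RPM, assuming the files sit directly under the mirror."""
    return [mirror.rstrip("/") + "/" + name for name in rpms]


mirror, rpms = parse_def(SAMPLE_DEF)
for url in rpm_urls(mirror, rpms):
    print(url)
```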
Best regards,
Alain