I'm using Rocks 5.1. Everything was fine until I restarted one of the
compute nodes by pushing the power button the other day. Now I can't ssh to
this particular node. Here is the error message.
ssh -v compute-0-1
OpenSSH_4.3p2, OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: Connecting to compute-0-1 [192.168.0.253] port 22.
debug1: connect to address 192.168.0.253 port 22: Connection refused
ssh: connect to host compute-0-1 port 22: Connection refused
I can still ping the node. Does anyone know how to fix the problem?
Thanks,
Luming
insert-ethers --remove="compute-X-Y"
rocks sync config
insert-ethers --update
insert-ethers --cabinet=X --rank=Y
reboot compute-X-Y
V.
--
*Dr. Vlad Constantin Manea*
*Professor of Geophysics*
Computational Geodynamics Lab. <http://www.geociencias.unam.mx/geodinamica>
Centro de Geociencias,
Campus UNAM, Juriquilla,
Blvd Juriquilla 3001,
Juriquilla, Querétaro, 76230,
México.
phone: +52 55 5623 4104/ext.133
fax: (55) 5623-4129
I am going to recreate the distribution. Hopefully it will work this time.
Regards,
Luming
For example, I don't think you can use < or > (redirects) or & in the file; instead you have to use the XML-safe entities, something like &gt; &lt; &amp; (those may not be exactly right, but it's something like that).
I've had installs hang at the language selection screen as a result of those characters.
Also, if you are doing custom partitioning, double-check the partitioning XML as well.
Hope that helps,
Mike
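To illustrate the escaping described above, here is a minimal sketch of a shell helper that rewrites a command line into an XML-safe form before it is pasted into a node file such as extend-compute.xml. The name xml_escape is made up for this example; it is not a Rocks command.

```shell
# xml_escape: hypothetical helper, not part of Rocks.
# Reads text on stdin and writes it back with &, < and > replaced by
# XML entities. '&' is handled first so that the '&' introduced by
# '&lt;' and '&gt;' is not escaped a second time.
xml_escape() {
    sed -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g'
}

# Example: escape a command that uses redirects.
printf '%s\n' 'echo hello >> /tmp/out.log 2>&1' | xml_escape
```

The escaped line is what goes inside the XML file; the installer sees the original characters again once the file has been parsed.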
Always run your custom XML files through the xmllint program to find
errors :)
This might be something nice to incorporate into "rocks create distro".
Tim
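Until something like that exists, a small wrapper can approximate it. The sketch below assumes xmllint is on the PATH and that your custom node files live in a single directory; the function name lint_node_xml is hypothetical and not part of Rocks.

```shell
# lint_node_xml: hypothetical helper, not part of Rocks.
# Runs "xmllint --noout" on every *.xml file in the given directory and
# returns non-zero if any file fails the check.
lint_node_xml() {
    dir=$1
    status=0
    for f in "$dir"/*.xml; do
        [ -e "$f" ] || continue      # the directory holds no XML files
        if ! xmllint --noout "$f"; then
            echo "XML check failed: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Possible use (paths are examples, adjust the version directory):
#   lint_node_xml /export/rocks/install/site-profiles/5.3/nodes &&
#       (cd /export/rocks/install && rocks create distro)
```

Chaining with && means the distribution is only rebuilt when every file parses cleanly.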
It might also be handy to have this tool in the Rocks manual where it talks about extend-compute.xml or any other customization files. For example in chapter 4, section 4.1, something like:
==============================================
Now build a new Rocks distribution, first running a syntax check on the extend-compute.xml file. This will bind the new package into a RedHat compatible distribution in the
directory /export/rocks/install/rocks-dist/....
# xmllint --valid --noout /export/rocks/install/site-profiles/5.3/nodes/extend-compute.xml
# cd /export/rocks/install
# rocks create distro
Now, reinstall your compute nodes.
==============================================
That way new Rocks users will get in the habit of running the check. I suppose the "no DTD found" is a false failure for these files?
drop the flags to xmllint, that is, just run:
# xmllint /export/rocks/install/site-profiles/5.3/nodes/extend-compute.xml
- gb
Does your extend-compute.xml file reference a DTD? If so, remove it.
mason katz
+1.240.724.6825
On Fri, Jan 22, 2010 at 1:11 PM, Dave Felt <fe...@caltech.edu> wrote:
Using "--noout" prevents the full XML from dumping to the screen and should only dump errors, if any are present.
For example, as a test I added an error to a temporary copy of extend-compute.xml:
$ xmllint --noout /tmp/extend-compute1.xml
/tmp/extend-compute1.xml:109: parser error : xmlParseEntityRef: no name
/sbin/service sgeexecd.cluster1 stop >> /tmp/blah.log 2>&1
^
In this case, the & in 2>&1 is the error. It's also useful to view the full output (leave off --noout). I didn't realize that > and >> would work for embedded redirects, but it appears that they get automatically converted to &gt; and &gt;&gt;.
The < redirect will throw an error, however.
Handy tool.
you want to be careful with the above. if your node XML file contains
a reference to an attribute (e.g., &hostname;), 'xmllint' will fail.
this is because xmllint isn't aware of the rocks attributes.
- gb
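One way to tell such expected failures from real syntax errors is to list the entity references a file uses before running xmllint on it. This is only a sketch: the function name list_custom_entities is hypothetical, and it relies on the -o option of grep, which GNU and BSD grep both provide.

```shell
# list_custom_entities: hypothetical helper, not part of Rocks.
# Prints every entity reference in the file that is not one of the five
# predefined XML entities (amp, lt, gt, quot, apos), one per line,
# with duplicates removed.
list_custom_entities() {
    grep -oE '&[A-Za-z_][A-Za-z0-9_.-]*;' "$1" |
        grep -vxE '&(amp|lt|gt|quot|apos);' |
        sort -u
}

# Possible use:
#   list_custom_entities /export/rocks/install/site-profiles/5.3/nodes/extend-compute.xml
```

If the only names printed are Rocks attributes such as &hostname;, then whatever xmllint reports about those entities can be set aside and the remaining messages examined.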
cd /export/rocks/install
rocks create distro
insert-ethers --remove="compute-X-Y"
rocks sync config
insert-ethers --update
insert-ethers --cabinet=X --rank=Y
and rebooted the compute node. I could see the (*) symbol on the frontend, but
the compute node was still stuck at the language selection page and the
installation did not succeed.
I did use custom partitioning when I first installed all the compute
nodes a year ago, and everything was fine then. Today I checked my
/export/rocks/install/site-profiles/5.1/nodes folder. There was no
extend-compute.xml file, only skeleton.xml and
replace-partition.xml. I don't remember ever deleting the
extend-compute.xml file.
Does anyone know what I should do next? Can I create the extend-compute.xml file
myself?
Thanks,
Luming
V.
luming shen wrote:
Luming
1) reboot the switch.
2) try deleting the HDD partition (and check for errors), make another one,
and try reinstalling the compute node.
V.