[Rocks-Discuss] ssh Connection refused

luming shen

Jan 20, 2010, 7:39:36 PM
to npaci-rocks...@sdsc.edu
Hi All,

I'm using Rocks 5.1. Everything was fine until I restarted one of the
compute nodes by pushing the power button the other day. Now I can't ssh to
this particular node. Here is the error message.

ssh -v compute-0-1
OpenSSH_4.3p2, OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: Connecting to compute-0-1 [192.168.0.253] port 22.
debug1: connect to address 192.168.0.253 port 22: Connection refused
ssh: connect to host compute-0-1 port 22: Connection refused

I can still ping the node. Does anyone know how to fix the problem?

Thanks,

Luming

Vlad Manea

Jan 20, 2010, 7:56:09 PM
to Discussion of Rocks Clusters
If the reinstall failed, I would delete that node and then
rerun insert-ethers:

insert-ethers --remove="compute-X-Y"
rocks sync config
insert-ethers --update
insert-ethers --cabinet=X --rank=Y

Then reboot compute-X-Y.

V.


--
*Dr. Vlad Constantin Manea*
*Professor of Geophysics*
Computational Geodynamics Lab. <http://www.geociencias.unam.mx/geodinamica>
Centro de Geociencias,
Campus UNAM, Juriquilla,
Blvd Juriquilla 3001,
Juriquilla, Querétaro, 76230,
México.
phone: +52 55 5623 4104/ext.133
fax: (55) 5623-4129



Mike Hanby

Jan 20, 2010, 11:12:08 PM
to Discussion of Rocks Clusters
I suspect that if you attach a monitor to the compute node you will
find it stuck in the install, probably at the language selection prompt.

Double-check your extend-compute files (if any), recreate the
distribution (cd /export/rocks/install && rocks create distro), and
restart the compute node.


luming shen

Jan 21, 2010, 10:56:17 PM
to Discussion of Rocks Clusters
Thanks Mike. You are right. I tried to follow the steps suggested by Vlad
Manea to reinstall the compute node, but it got stuck in the install at
the language selection page.

I am going to recreate the distribution. Hopefully it will work this time.

Regards,

Luming


Mike Hanby

Jan 22, 2010, 12:18:27 PM
to Discussion of Rocks Clusters
Identifying the problem in extend-compute.xml can be tricky and frustrating. Go through it line by line and look for missing syntax, terminators, or reserved characters.

For example, I don't think you can use < or > (redirects) or & in the file. Instead you have to use the XML entities &lt; &gt; and &amp; in their place.

I've had installs hang at the language selection prompt as a result of those characters.
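
A minimal sketch of that escaping, assuming you write the command as plain shell first and then paste the escaped line into the XML file. The helper name xml_escape is made up for illustration and is not part of Rocks.

```shell
#!/bin/sh
# xml_escape: hypothetical helper (not a Rocks tool) that rewrites the three
# characters discussed above as XML entities. The ampersand is handled
# first; otherwise the "&" in "&lt;" and "&gt;" would be escaped again.
xml_escape() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Example: escape one shell line before pasting it into extend-compute.xml.
printf '%s\n' 'echo hi >> /tmp/out.log 2>&1' | xml_escape
# prints: echo hi &gt;&gt; /tmp/out.log 2&gt;&amp;1
```

The order of the sed expressions matters: escaping & last would turn an already-written &gt; into &amp;gt;.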

Also, if you are doing custom partitioning, double check the partitioning xml as well.

Hope that helps,

Mike

Tim Carlson

Jan 22, 2010, 1:06:14 PM
to Discussion of Rocks Clusters
On Fri, 22 Jan 2010, Mike Hanby wrote:

Always run your custom XML files through the xmllint program to find
errors :)

This might be something nice to incorporate into "rocks create distro".

Tim

Mike Hanby

Jan 22, 2010, 2:04:42 PM
to Discussion of Rocks Clusters
Tim, thanks for the tip; I'd not used that tool before. I'm adding it to my Rocks notes :-)

It might also be handy to have this tool in the Rocks manual where it talks about extend-compute.xml or any other customization files. For example in chapter 4, section 4.1, something like:

==============================================
Now build a new Rocks distribution, first running a syntax check on the extend-compute.xml file. This will bind the new package into a RedHat compatible distribution in the
directory /export/rocks/install/rocks-dist/....

# xmllint --valid --noout /export/rocks/install/site-profiles/5.3/nodes/extend-compute.xml
# cd /export/rocks/install
# rocks create distro

Now, reinstall your compute nodes.

==============================================

That way new Rocks users will get in the habit of running the check. I suppose the "no DTD found" is a false failure for these files?

Dave Felt

Jan 22, 2010, 4:11:45 PM
to Discussion of Rocks Clusters
All - When I run the xmllint command on Rocks 5.2 to test the
extend-compute.xml file, it fails with the error:

Validation failed: no DTD found !
<kickstart>
..........^

So, how do I get this test to run correctly? I sure agree that being
able to run this test is helpful.
Dave

Greg Bruno

Jan 22, 2010, 4:18:26 PM
to Discussion of Rocks Clusters
On Fri, Jan 22, 2010 at 1:11 PM, Dave Felt <fe...@caltech.edu> wrote:
> All - When I run the xmllint command on Rocks 5.2 to test the
> extend-compute.xml file, it fails with
> the error: Validation failed: no DTD found !
> <kickstart>
> ..........^
> So, how do I get this test to run correctly? I sure agree that being able to
> run this test is helpful..

drop the flags to xmllint, that is, just run:

# xmllint /export/rocks/install/site-profiles/5.3/nodes/extend-compute.xml

- gb

Mason J. Katz

Jan 22, 2010, 4:20:16 PM
to Discussion of Rocks Clusters
We don't have a DTD for the kickstart xml framework. Lazy xml :)

Does your extend-compute.xml file reference a DTD? If so, remove it.

mason katz
+1.240.724.6825


Dave Felt

Jan 22, 2010, 4:40:33 PM
to Discussion of Rocks Clusters
Thanks, Greg, that works ok (modified for 5.3)
-Dave

Mike Hanby

Jan 22, 2010, 5:12:16 PM
to Discussion of Rocks Clusters
Ah, ok so the --valid was causing the DTD error.

Using "--noout" prevents the full XML from dumping to the screen and should only dump errors, if any are present.

For example, as a test I added an error to a temporary copy of extend-compute.xml:

$ xmllint --noout /tmp/extend-compute1.xml
/tmp/extend-compute1.xml:109: parser error : xmlParseEntityRef: no name
/sbin/service sgeexecd.cluster1 stop >> /tmp/blah.log 2>&1
^

In this case, the & in 2>&1 is the error. It's also useful to view the full output (leave off --noout). I didn't realize that > and >> would work for embedded redirects, but it appears that they get automatically converted to &gt; and &gt;&gt;.

The < redirect will throw an error, however.

Handy tool.
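
The check above could be wrapped so that it runs before every rebuild. This is only a rough sketch: check_node_xml is a hypothetical function name, not a Rocks command, and it relies on xmllint --noout returning a non-zero exit status when a file fails to parse.

```shell
#!/bin/sh
# check_node_xml: hypothetical wrapper (not a Rocks command). Runs the
# well-formedness check from this thread on each file given and returns
# non-zero if any file fails to parse.
check_node_xml() {
    status=0
    for f in "$@"; do
        if xmllint --noout "$f"; then
            echo "OK: $f"
        else
            echo "FAILED: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage sketch (path taken from the thread; adjust the version directory):
#   check_node_xml /export/rocks/install/site-profiles/5.3/nodes/extend-compute.xml \
#     && cd /export/rocks/install && rocks create distro
```

Chaining with && means the distribution is only rebuilt when every file parses cleanly.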

Greg Bruno

Jan 25, 2010, 6:03:12 PM
to Discussion of Rocks Clusters
On Fri, Jan 22, 2010 at 11:04 AM, Mike Hanby <mha...@uab.edu> wrote:
> Tim, Thanks for the tip I'd not used that tool before. I'm adding it to my Rocks notes :-)
>
> It might also be handy to have this tool in the Rocks manual where it talks about extend-compute.xml or any other customization files. For example in chapter 4, section 4.1, something like:
>
> ==============================================
> Now build a new Rocks distribution, first running a syntax check on the extend-compute.xml file. This will bind the new package into a RedHat compatible distribution in the
> directory /export/rocks/install/rocks-dist/....
>
> # xmllint --valid --noout /export/rocks/install/site-profiles/5.3/nodes/extend-compute.xml
> # cd /export/rocks/install
> # rocks create distro
>
> Now, reinstall your compute nodes.
>
> ==============================================

you want to be careful with the above. if your node XML file contains
a reference to an attribute (e.g., &hostname;), 'xmllint' will fail.
this is because xmllint isn't aware of the rocks attributes.
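
one way to tell such failures apart from real syntax errors is to list the entity references in the file first. a rough sketch, assuming attribute references always take the &name; form shown above; list_custom_entities is a made-up helper name, not a Rocks tool, and grep -o is assumed to be available (it is in GNU grep).

```shell
#!/bin/sh
# list_custom_entities: hypothetical helper (not a Rocks tool). Prints each
# distinct named entity reference found in a file, except the five that XML
# itself predefines (&amp; &lt; &gt; &quot; &apos;). Anything it prints,
# such as &hostname;, is a name that plain xmllint does not know about.
list_custom_entities() {
    grep -o '&[A-Za-z_][A-Za-z0-9_.-]*;' "$1" \
        | grep -v -x -e '&amp;' -e '&lt;' -e '&gt;' -e '&quot;' -e '&apos;' \
        | sort -u
}

# Usage sketch:
#   list_custom_entities /export/rocks/install/site-profiles/5.3/nodes/extend-compute.xml
```

if the function prints nothing, an xmllint failure on that file points to a genuine syntax problem rather than an unknown attribute reference.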

- gb

luming shen

Jan 25, 2010, 9:01:07 PM
to Discussion of Rocks Clusters
Thanks all for your input. Unfortunately, the problem hasn't been solved.
After I ran

cd /export/rocks/install
rocks create distro


insert-ethers --remove="compute-X-Y"
rocks sync config
insert-ethers --update
insert-ethers --cabinet=X --rank=Y

and rebooted the compute node, I could see the (*) symbol on the frontend, but
the compute node was still stuck at the language selection page and the
installation was not successful.

I did use custom partitioning when I first installed all the compute
nodes a year ago, but everything was fine then. Today I checked my
/export/rocks/install/site-profiles/5.1/nodes folder. There was no
extend-compute.xml file, only skeleton.xml and
replace-partition.xml. I don't remember ever deleting the
extend-compute.xml file.

Does anyone know what I should do next? Can I create the extend-compute.xml
file myself?

Thanks,

Luming


Vlad Manea

Jan 26, 2010, 10:34:18 AM
to Discussion of Rocks Clusters
Do you have another machine available
to try installing as a compute node?

V.


luming shen

Jan 26, 2010, 8:43:25 PM
to Discussion of Rocks Clusters
No, I don't have another machine available at the moment. Do you think it's
a hardware problem with the compute node, or a problem with the frontend,
which can no longer reinstall nodes?

Luming


Vlad Manea

Jan 27, 2010, 12:59:29 PM
to Discussion of Rocks Clusters
Good question. No idea. Maybe some Rocks gurus have better ideas on
how to solve this problem. In the meantime, if you have some time, try
a couple of things:

1) Reboot the switch.
2) Try deleting the HDD partition (and check for errors), make another one,
and try reinstalling the compute node.

V.


