[Rocks-Discuss] Need help Insert-ethers


shiva kumar

Oct 1, 2010, 1:47:29 PM
to npaci-rocks...@sdsc.edu
Hi,
I am building a cluster with 33 compute nodes, each with 32 cores, and I am using
a Netgear managed switch. When I run the insert-ethers command and select the
"Compute" option, I connect the first compute node and the terminal window shows
3 compute nodes, but only one of them has a * next to it. When I connect the
second node it shows 6 names in total, two of them with stars. After the
installation finished I ran "rocks list host", and it shows 6 hosts in total:
two with 32 CPUs and the rest with 1 CPU. I hope that makes sense.

I didn't configure anything on the switch.
When I select the "Ethernet" option I cannot install on the nodes; I get an error.

Please tell me what the other host names with 1 CPU mean. Each has a MAC
address too, and I can ping all the hosts, including the 1-CPU hosts.

On the Ganglia web page, only the hosts with 32 CPUs are shown.

I would appreciate your help.

thanks,
shiva




Jon Forrest

Oct 1, 2010, 2:00:53 PM
to npaci-rocks...@sdsc.edu
On 10/1/2010 10:47 AM, shiva kumar wrote:
> Hi,
> I am building a cluster with 33 compute nodes, each with 32 cores, and I am using
> a Netgear managed switch. When I run the insert-ethers command and select the
> "Compute" option, I connect the first compute node and the terminal window shows
> 3 compute nodes, but only one of them has a * next to it. When I connect the
> second node it shows 6 names in total, two of them with stars. After the
> installation finished I ran "rocks list host", and it shows 6 hosts in total:
> two with 32 CPUs and the rest with 1 CPU.

> Please tell me what the other host names with 1 CPU mean. Each has a MAC
> address too, and I can ping all the hosts, including the 1-CPU hosts.

What's probably happening is that your switch and your IPMI
interface are set to use DHCP to get an address.

Cordially,
--
Jon Forrest
Research Computing Support
College of Chemistry
173 Tan Hall
University of California Berkeley
Berkeley, CA
94720-1460
510-643-1032
jlfo...@berkeley.edu

Bart Brashers

Oct 1, 2010, 2:00:20 PM
to Discussion of Rocks Clusters
When you see more than one node after booting your first compute node, that tells us you have other DHCP requests on that sub-net. I'll bet your managed switch is trying to DHCP, to obtain an IP address.

Try turning off all your compute nodes and your switch. Remove everything from your database using "rocks remove host ...".
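
For the cleanup step, a minimal sketch might look like the following. It assumes the stray entries have names like compute-0-0 (check "rocks list host" first for the real names). The ROCKS_CMD variable and the remove_hosts function are hypothetical helpers, there only so the loop can be dry-run with "echo rocks" before it touches the database.

```shell
# ROCKS_CMD is a hypothetical override hook; it defaults to the real "rocks" CLI.
ROCKS_CMD=${ROCKS_CMD:-rocks}

# Remove each named host from the Rocks database, then regenerate the
# configuration files so the removed entries disappear from them too.
remove_hosts() {
    for h in "$@"; do
        $ROCKS_CMD remove host "$h" || return 1
    done
    $ROCKS_CMD sync config
}

# Dry run:  ROCKS_CMD="echo rocks"; remove_hosts compute-0-0 compute-0-1
# For real: remove_hosts compute-0-0 compute-0-1
```

Running it once with the dry-run setting prints the commands instead of executing them, which makes it easier to confirm the host list before anything is deleted.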

Start "insert-ethers" and select "Ethernet". Turn on your switch, and it should DHCP to obtain an IP address. Stop insert-ethers (F10 is it?) and try to ssh to (or at least ping) the switch.

Start insert-ethers again, and select "Ethernet" again. Wait a few minutes, and see if anything else sends a DHCP request. When you're sure, stop insert-ethers. Start insert-ethers again and pick "Compute". Turn on your first compute node and make it PXE boot.
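
While waiting to see whether anything else sends a DHCP request, one way to tell the devices apart is to count DHCPDISCOVER lines per MAC address in the frontend's syslog. This is a rough sketch, assuming dhcpd logs in the usual ISC format ("DHCPDISCOVER from <mac> via <interface>") and that the log file is /var/log/messages. The function name dhcp_macs is made up for illustration.

```shell
# Read syslog lines on stdin and print each distinct MAC address that sent
# a DHCPDISCOVER, prefixed by how many times it was seen (most frequent first).
dhcp_macs() {
    grep 'DHCPDISCOVER from' \
        | sed -E 's/.*DHCPDISCOVER from ([0-9A-Fa-f:]{17}).*/\1/' \
        | tr 'A-F' 'a-f' \
        | sort | uniq -c | sort -rn
}

# Usage on the frontend (the log path is an assumption):
#   dhcp_macs < /var/log/messages
```

A MAC address that keeps showing up while every compute node is powered off would point at the switch, an IPMI interface, or some other device on that network.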

Could it be that your compute nodes have 2 NICs, and you've hooked up both with Ethernet cables? If that's the case, then both NICs could be sending DHCP requests. You should hook up only one (at least for now).

Could it also be that you have virtual IPMI cards that are on the same NIC, and are also trying to DHCP? That could explain the 3 responses per node (in which case my guess about your switch sending DHCP requests was wrong).

Bart



shiva kumar

Oct 1, 2010, 2:18:50 PM
to Discussion of Rocks Clusters
Thanks,
I will try this.
Do I need to reinstall all the compute nodes again?

 


Bart Brashers

Oct 1, 2010, 2:26:41 PM
to Discussion of Rocks Clusters
I would, if I were you. You can be more sure of what you're doing/getting if you start over.

Bart

shiva kumar

Oct 1, 2010, 3:12:35 PM
to Discussion of Rocks Clusters

I am using a 48-port switch. Each time I run insert-ethers it discovers a new
appliance. How do I stop all the ports from sending DHCP requests?



Bart Brashers

Oct 1, 2010, 3:39:31 PM
to Discussion of Rocks Clusters
Why would you start insert-ethers if you DIDN'T want to discover a new appliance? That's what it's for: inserting new appliances into the Rocks database.

Did you do what I suggested about first discovering your switch?

Bart

shiva kumar

Oct 1, 2010, 4:48:54 PM
to Discussion of Rocks Clusters
Yes, I did. I removed all the hosts using "rocks remove host -", and I turned
off all the compute nodes and also the switch. Then I ran insert-ethers,
selected "Ethernet Switch", and turned on the switch. It discovered 1 switch,
and I hit F9. Then I did the same thing again to check for any more DHCP
requests; that's what your instructions say. Tell me if I am wrong. It is
detecting a few more appliances.

I also factory reset the switch, and I get the same thing.

So are you saying that once the switch gets an IP address for one port through
insert-ethers, if I connect a compute node and select the "Compute" option there
won't be any more additional hosts?


shiva kumar

Oct 1, 2010, 5:06:44 PM
to Discussion of Rocks Clusters
Is there any configuration to be done on the switch? Is it required to configure
"interface vlan 1" and give the switch an IP address?

 


Bart Brashers

Oct 1, 2010, 5:06:34 PM
to Discussion of Rocks Clusters
> Yes, I did. I removed all the hosts using "rocks remove host -", and I turned
> off all the compute nodes and also the switch. Then I ran insert-ethers,
> selected "Ethernet Switch", and turned on the switch. It discovered 1 switch,
> and I hit F9. Then I did the same thing again to check for any more DHCP
> requests; that's what your instructions say. Tell me if I am wrong.

Great, so now your switch has an IP address. Did you try to ping it, and ssh to it? Your manual should say what the default password is. You should be able to ssh to it and change settings (manage it). You could also try to load its IP address in a web browser, e.g. http://10.0.254.254 (or whatever IP address it was assigned). Log in somehow and verify that it thinks its IP address is the same, and that it will quit sending periodic DHCP requests. It's only supposed to request an IP address once, during boot.

> It is detecting a few more appliances.

That means there are other devices on your network that are sending DHCP requests. Are you sure that there's nothing else that's powered up on the switch's network? What else, besides the powered-down compute nodes, is connected?

It could also mean that the switch didn't accept the IP address it was given, and is still trying to DHCP.



> I also factory reset the switch, and I get the same thing.
>
> So are you saying that once the switch gets an IP address for one port through
> insert-ethers

Each DEVICE gets one IP address. The 48 ports on the switch don't each get an IP address. Each port may be connected to a device (a computer, for example) that gets an IP address. The switch should only request and get one IP address.

> and if I connect a compute node and select the "Compute" option there won't be
> any more additional hosts?

That's what I'm guessing, but I'm not there so can't be sure. If you turn off all the compute nodes, and start insert-ethers and pick "Ethernet", and something new shows up (i.e. something sends a DHCP request) you've got to figure out which device it is.
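
To narrow down which entries are the unexpected ones, you could filter the "rocks list host" output for hosts that report a single CPU. This is only a sketch: it assumes the output is column-aligned with a header line that contains a CPUS column, and the function name one_cpu_hosts is hypothetical.

```shell
# Read "rocks list host" style output on stdin and print the names of
# hosts whose CPUS column is exactly 1.
one_cpu_hosts() {
    awk '
        NR == 1 { col = index($0, "CPUS"); next }   # find where the CPUS column starts
        col > 0 {
            cpus = substr($0, col)
            sub(/ .*/, "", cpus)                    # keep only the value at that offset
            if (cpus == "1") {
                host = $1
                sub(/:$/, "", host)                 # drop the trailing colon after the host name
                print host
            }
        }
    '
}

# Usage on the frontend:
#   rocks list host | one_cpu_hosts
```

For each name it prints, "rocks list host interface <name>" should show the MAC address, which you can compare against the labels on the switch, the NICs, and any IPMI cards.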

Bart

Bart Brashers

Oct 1, 2010, 5:28:00 PM
to Discussion of Rocks Clusters
Sorry, but I don't understand your question/statement. Try again to explain what you want to do, or think you need to do.

Bart

shiva kumar

Oct 2, 2010, 4:51:47 PM
to Discussion of Rocks Clusters
When I run a "rocks run host" command I get the message "remote connection
timed out on host compute-0-0".
I can ssh to that host and I get a ping response.
Am I missing any configuration?

Command: rocks run host compute-0-0 "/sbin/shutdown -r now"
Please help me.

