[Rocks-Discuss] Problem installing nodes.


Matt Karas

Nov 3, 2008, 3:07:01 PM
to npaci-rocks...@sdsc.edu
I just re-installed the frontend on my cluster and rebooted the nodes to
re-install them. After a node PXE booted, it started the install process and
asked me where the installation files were, so I am unable to install the
nodes. When I first installed Rocks on this cluster it did not do this. I
am using Rocks 5 with Red Hat 5. The frontend is a Dell PowerEdge 2950
and the nodes are in a Dell PowerEdge M1000e blade enclosure.

Thanks for any help,
Matt
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20081103/8c425013/attachment.html

Deependu Saxena

Nov 3, 2008, 4:51:44 PM
to Discussion of Rocks Clusters
Hello Matt,
Can you please check the connectivity between the compute nodes and the
frontend? I think there is some problem related to connectivity. Secondly,
is there any other DHCP server running except the frontend? It may be
possible that the compute nodes are getting an IP from that DHCP server.


With regards
-Deependus
(community please correct me if I am wrong)


Matt Karas

Nov 4, 2008, 9:26:01 AM
to Discussion of Rocks Clusters
Hi,
Before I re-installed the frontend yesterday, I successfully installed both
the frontend and nodes last week. I had to re-install both the frontend and
nodes to test something we are implementing. When a node PXE boots, it
goes to a screen asking me how I want to install the nodes:
- local cd rom
- hard drive
- NFS Image
- FTP
- HTTP

I tried connecting the node through the router and directly to the frontend.
Both come up with the same thing.
There is no other DHCP server running except the frontend. One thing I did
notice was when I first installed the frontend and nodes last week, it was
giving the nodes a 10.*.*.* IP address. When I tried to install the nodes
today, it was giving them a 192.168.*.* IP address.

Thanks for any help,
Matt


Deependu Saxena

Nov 4, 2008, 12:59:23 PM
to Discussion of Rocks Clusters
Hi Matt,
Did you change the IP addresses given to the Ethernet adapters while
installing the frontend? By default it gives 10.1.1.1/16 to eth0 and
192.168.0.0/16 to eth1. Check which IP address has been given to which
Ethernet adapter. Secondly, please check the connection: whether you are
connecting to eth0 or eth1. By default DHCP starts on eth0. I think you may
have made some changes by mistake while assigning IP addresses to the
adapters during the frontend install. Please check what IP addresses have
been given to eth0 and eth1.


With regards
-Deependus


Bart Brashers

Nov 4, 2008, 1:16:47 PM
to Discussion of Rocks Clusters

When a node asks how you want to install the nodes, use the
Ctrl-Alt-F[1234] screen to look for problems. It could be that the
frontend is refusing to over-write (delete) the old partitions on your
compute node, and can't find enough space to install. If so, you can
use "fdisk" or "parted" on the command line available under Ctrl-Alt-F4
(or F5, I can't remember just now) to delete all the partitions, then
try PXE booting again. You could also check what Rocks has listed in
its database for the partitions it expects to find on each compute node
with "rocks list host partition".

Bart
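A minimal sketch of the clean-up Bart describes, assuming the node's disk is /dev/sda (adjust for your hardware; the parted command destroys the existing partition table):

```shell
# From the installer shell on the node (Ctrl-Alt-F2 or F4):
fdisk -l /dev/sda                  # list the old partitions
parted -s /dev/sda mklabel msdos   # write an empty label, wiping them

# Back on the frontend, check what Rocks expects to find:
rocks list host partition compute-0-0
```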


Juan Carlos Perin

Nov 4, 2008, 1:21:50 PM
to Discussion of Rocks Clusters
Matt,

I had the exact same problem on almost the exact same hardware. What
combination of rolls are you using?

Also run the command: sudo rocks-dist dist and paste the output.

When I did this, it spat out a slew of stuff, essentially pointing out
that the RPMs required for the installation of the nodes were totally
messed up. Initially the installation seemed fine, and things were
running, but at some point in the process the RPMs required for nodes
to install correctly were never created. This was using a Rocks 5.1
beta install. My solution ended up being to fall back to Rocks 5.0,
including using the OS bundled with it, and I had the system running
in about an hour, with all nodes reporting. I had spent all week on
this, and it took an hour to finally fix with the clean install!

Juan

Juan Carlos Perin

Nov 4, 2008, 1:25:39 PM
to Discussion of Rocks Clusters
By the way, I also tried this suggestion on my systems. I went as far as
wiping out the entire hard drive and leaving it completely blank
before trying again, unsuccessfully.

Matt Karas

Nov 4, 2008, 1:39:05 PM
to Discussion of Rocks Clusters
When I restarted the node to test Bart's suggestion, it came up with an
exception error.

17:59:11 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64//disc1/.discinfo: HTTP
Error 500: Internal Server Error
17:59:16 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64//disc1/.discinfo: HTTP
Error 500: Internal Server Error
17:59:21 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64//disc1/.discinfo: HTTP
Error 500: Internal Server Error
17:59:26 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64//disc1/.discinfo: HTTP
Error 500: Internal Server Error
17:59:31 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64//disc1/.discinfo: HTTP
Error 500: Internal Server Error


Here is the output from rocks-dist dist

# rocks-dist dist
Cleaning distribution
Resolving versions (base files)
including "kernel" (5.0,x86_64) roll...
including "java" (5.0,x86_64) roll...
including "Red_Hat_Enterprise_Linux_Server_5" (5.0,x86_64) roll...
including "web-server" (5.0,x86_64) roll...
including "base" (5.0,x86_64) roll...
Including critical RPMS
Resolving versions (RPMs)
including "kernel" (5.0,x86_64) roll...
including "java" (5.0,x86_64) roll...
including "Red_Hat_Enterprise_Linux_Server_5" (5.0,x86_64) roll...
including "web-server" (5.0,x86_64) roll...
including "base" (5.0,x86_64) roll...
Resolving versions (SRPMs)
including "kernel" (5.0,x86_64) roll...
including "java" (5.0,x86_64) roll...
including "Red_Hat_Enterprise_Linux_Server_5" (5.0,x86_64) roll...
including "web-server" (5.0,x86_64) roll...
including "base" (5.0,x86_64) roll...
Creating files (symbolic links - fast)
Applying stage2.img
Applying updates.img
Installing XML Kickstart profiles
installing "web-server" profiles...
installing "base" profiles...
installing "java" profiles...
installing "kernel" profiles...
installing "site" profiles...
Creating repository
making "torrent" files for RPMS
Cleaning distribution
Resolving versions (base files)
including "kernel" (5.0,x86_64) roll...
including "base" (5.0,x86_64) roll...
Including critical RPMS
Resolving versions (RPMs)
including "kernel" (5.0,x86_64) roll...
including "base" (5.0,x86_64) roll...
Resolving versions (SRPMs)
including "kernel" (5.0,x86_64) roll...
including "base" (5.0,x86_64) roll...
Creating files (symbolic links - fast)
Applying stage2.img
Applying updates.img
Installing XML Kickstart profiles
installing "kernel" profiles...
installing "base" profiles...
Creating repository
Linking boot stages from lan
Building Roll Links


Greg Bruno

Nov 4, 2008, 2:05:34 PM
to Discussion of Rocks Clusters
On Tue, Nov 4, 2008 at 6:26 AM, Matt Karas <kara...@gmail.com> wrote:
> Hi,
> Before I re-installed the frontend yesterday, I successfully installed both
> the frontend and nodes last week. I had to re-install both the frontend and
> nodes to test something we are implementing. When the node pxe boots, it
> goes to a screen asking me how I want to install the nodes:
> - local cd rom
> - hard drive
> - NFS Image
> - FTP
> - HTTP
>
> I tried connecting the node through the router and directly to the frontend.
> Both come up with the same thing.
> There is no other DHCP server running except the frontend. One thing I did
> notice was when I first installed the frontend and nodes last week, it was
> giving the nodes a 10.*.*.* IP address. When I tried to install the nodes
> today, it was giving them a 192.168.*.* IP address.

send us the output of:

# rocks list host
# rocks list host interface

- gb

Matt Karas

Nov 4, 2008, 2:17:49 PM
to Discussion of Rocks Clusters
Please ignore the stars I put in.

# rocks list host
HOST MEMBERSHIP CPUS RACK RANK COMMENT
llan: Frontend 1 0 0 -------
compute-0-0: Compute 8 0 0 -------
compute-0-1: Compute 8 0 1 -------

# rocks list host interface

HOST         SUBNET  IFACE MAC               IP             NETMASK       GATEWAY   MODULE NAME
***:         private eth0  00:1e:c9:4c:7a:ef 10.1.1.1       255.0.0.0     --------- bnx2   ***
***:         public  eth1  00:1e:c9:4c:7a:f1 155.*.*.*      255.255.255.0 155.*.*.* bnx2   *.*.*.*
compute-0-0: private eth0  00:1e:4f:33:0b:6e 10.255.255.254 255.0.0.0     --------- bnx2   compute-0-0
compute-0-0: ------- eth1  00:1e:4f:33:0b:70 -------------- ------------- --------- bnx2   -----------
compute-0-1: private eth0  00:1e:4f:38:38:da 10.255.255.253 255.0.0.0     --------- bnx2   compute-0-1
compute-0-1: ------- eth1  00:1e:4f:38:38:dc -------------- ------------- --------- bnx2   -----------


Greg Bruno

Nov 4, 2008, 3:02:50 PM
to Discussion of Rocks Clusters
On Tue, Nov 4, 2008 at 11:17 AM, Matt Karas <kara...@gmail.com> wrote:
> Please ignore the stars I put in.
>
> # rocks list host
> HOST MEMBERSHIP CPUS RACK RANK COMMENT
> llan: Frontend 1 0 0 -------
> compute-0-0: Compute 8 0 0 -------
> compute-0-1: Compute 8 0 1 -------
>
> # rocks list host interface
> HOST         SUBNET  IFACE MAC               IP             NETMASK       GATEWAY   MODULE NAME
> ***:         private eth0  00:1e:c9:4c:7a:ef 10.1.1.1       255.0.0.0     --------- bnx2   ***
> ***:         public  eth1  00:1e:c9:4c:7a:f1 155.*.*.*      255.255.255.0 155.*.*.* bnx2   *.*.*.*
> compute-0-0: private eth0  00:1e:4f:33:0b:6e 10.255.255.254 255.0.0.0     --------- bnx2   compute-0-0
> compute-0-0: ------- eth1  00:1e:4f:33:0b:70 -------------- ------------- --------- bnx2   -----------
> compute-0-1: private eth0  00:1e:4f:38:38:da 10.255.255.253 255.0.0.0     --------- bnx2   compute-0-1
> compute-0-1: ------- eth1  00:1e:4f:38:38:dc -------------- ------------- --------- bnx2   -----------

OK, the fact that your compute nodes are getting 192.168.x.x IP addresses
when they PXE boot tells me that you have another DHCP server on your
network that is answering the compute nodes' DHCP requests.

do you know if there is another ethernet cable that is plugged into
your cluster switch that is connected to your organization's network?

- gb
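If you want to test Greg's theory directly, one hedged approach (assuming tcpdump is available on the frontend and the private network is on eth0) is to watch the DHCP traffic while a node PXE boots:

```shell
# Print every DHCP/BOOTP packet seen on the private interface.
# If any offer comes from a server other than the frontend
# (10.1.1.1 here), something else is answering the nodes.
tcpdump -n -i eth0 port 67 or port 68
```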

Matt Karas

Nov 4, 2008, 3:18:26 PM
to Discussion of Rocks Clusters
All that is plugged into the cluster's switch is the frontend and the other
nodes, only one of which is on. I'm building this cluster to be on its
own private local network, not connected to anything else.


Amato Umberto

Nov 4, 2008, 5:25:47 PM
to Discussion of Rocks Clusters
I'm experiencing the same problem just now (Rocks version 5.0). I also
successfully installed the frontend and a couple of compute nodes. Then I
decided to try different configurations of the same compute nodes
(different partitions), so I deleted the compute nodes from the database
(with rocks remove host myhostname) instead of re-installing the compute
nodes with the new distribution. What happens is what Matt described.
Then, to be sure that no trace of the previous installation was left, I
low-level formatted the disks on the compute nodes. Now when I issue the
command insert-ethers on the frontend and turn on a compute node, the
compute node does not receive any DHCP offer from the frontend, and
therefore the PXE boot is never successful. In practice, my experience is
that after I have installed a compute node for the first time, if for some
reason I remove the node from the database, insert-ethers is no longer
able to answer DHCP requests from that node. When I add a new (that is,
never installed) compute node on the same network through insert-ethers,
it works fine; as soon as I delete that host from the database (and even
"clean" its disk of any trace of the previous installation), insert-ethers
is no longer able to see that node. For the moment I will resolve this by
re-installing the frontend from scratch and then the compute nodes with
the configuration I worked out in the meanwhile.
Thank you for any help
Umberto

Umberto Amato
Istituto per le Applicazioni del calcolo 'Mauro Picone' CNR
Via Pietro Castellino 111
80131 Napoli (Italy)

Juan Carlos Perin

Nov 3, 2008, 6:13:17 PM
to Discussion of Rocks Clusters
Matt,

I just had this same problem. What combination of rolls are you
trying to install? In my case, although installation of 5.1 beta
seemed to be ok, i was missing the actual required files for node
installation to occur. I was getting the same request for an install
location, on the nodes. I decided to use rocks 5.0 and everything
went perfectly well.

Juan

Greg Bruno

Nov 4, 2008, 6:04:11 PM
to Discussion of Rocks Clusters
On Tue, Nov 4, 2008 at 2:25 PM, Amato Umberto <U.A...@na.iac.cnr.it> wrote:
> I'm experiencing the same problem just now (Rocks version 5.0). I also
> installed successfully the frontend and a couple of compute nodes. Then I
> decided to try different configurations of the same compute nodes
> (different partitions), so I deleted the compute nodes from the database
> (with rocks remove host myhostname) instead of re-installing the compute
> nodes with the new distribution. What happens is what is mentioned by
> Matt. Then, to be sure that no track is left of the previous installation,
> I (low level) formatted the disk on the compute nodes. Now when I issue on
> the frontend the command insert-ethers and I turn on a compute node, what
> happens is that the compute node does not receive any DHCP offer from the
> frontend and therefore pxe boot is neversuccessfull. In practice my
> experience is that after I have installed for the first time a compute
> node, then if for some reason I remove the node from the database, then
> insert-ethers is not able anymore to offer DHCP request to that node. When
> I add a new (that is, never installed) compute node on the same network
> through insert-ethers, it works fine; as soon as I delete that host from
> the database (and even "clean" its disk from any track of the previous
> installation) insert-ethers is no longer able to see that node. At moment
> I will solve re-installing the frontend from the beginning and then the
> compute nodes with the configuration I found in the while.

after you execute 'rocks remove host', then execute:

# rocks sync config

then you will need to 'rediscover' your nodes that you removed with
'insert-ethers':

# insert-ethers

then PXE boot the removed nodes.

- gb

Hamilton, Scott L.

Nov 5, 2008, 9:01:29 AM
to Discussion of Rocks Clusters
Crazy idea, but by any chance is your switch a Linksys? If so, it may be a
router/switch with a factory-default configuration to hand out DHCP
addresses. My Linksys router gives out IP addresses in that range. I
am not sure if there are other inexpensive switches out there that might
do the same.

Scott



Hamilton, Scott L.

Nov 5, 2008, 9:03:40 AM
to Discussion of Rocks Clusters
Try running "rocks sync config" after removing the node. It will probably solve your problem.

Scott




Matt Karas

Nov 5, 2008, 9:06:59 AM
to Discussion of Rocks Clusters
The switch I am using is a Nortel Networks - Baystack 5510-48T Switch.


Hamilton, Scott L.

Nov 5, 2008, 9:26:07 AM
to Discussion of Rocks Clusters
The Nortel Baystack 5510 does have the ability to act as a DHCP server.
I am not sure what the default addresses it would hand out are, or even
if it has a default DHCP configuration, but I would check your switch
config. Also check the web for PXE boot problems and Nortel Switches.
Release 4.5.2.4 of their IOS introduces some new issues to PXE boot. We
use Nortel switches here and have had endless problems with switch
configurations affecting the connected systems in seemingly random ways.
Our network problems almost always stem from a switch configuration
problem. I am not a member of the networking team that manages the
switches, and know very little about the Nortels. My previous job was
primarily a Cisco shop when it came to network hardware.

Matt Karas

Nov 5, 2008, 9:45:42 AM
to Discussion of Rocks Clusters
I seem to be running into a different error now: an unhandled exception has
occurred. This happens just after it says it's running the pre-install
scripts. As I'm watching the Inserted Appliances screen to see if it finds
the new nodes, it finds the new MAC addresses, then in the parentheses it
goes from ( ) to (403) to (*).

The only thing I can find in the error report is the following:

17:59:11 WARNING : Unable to find temp path, going to use ramfs path


17:59:11 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64//disc1/.discinfo: HTTP
Error 500: Internal Server Error
17:59:16 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64//disc1/.discinfo: HTTP
Error 500: Internal Server Error
17:59:21 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64//disc1/.discinfo: HTTP
Error 500: Internal Server Error
17:59:26 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64//disc1/.discinfo: HTTP
Error 500: Internal Server Error
17:59:31 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64//disc1/.discinfo: HTTP
Error 500: Internal Server Error


17:59:40 WARNING :
/usr/lib/python2.4/site-packages/pykickstart/parser.py:298:
DeprecationWarning: Ignoring deprecated command on line 65: The mouse
command has been deprecated and no longer has any effect. It may be
removed from future releases, which will result in a fatal error from
kickstart. Please modify your kickstart file to remove this command.
warnings.warn(_("Ignoring deprecated command on line %(lineno)s:
The %(cmd)s command has been deprecated and no longer has any effect.
It may be removed from future releases, which will result in a fatal
error from kickstart. Please modify your kickstart file to remove
this command.") % mapping, DeprecationWarning)

17:59:40 WARNING : step installtype does not exist
17:59:40 WARNING : step complete does not exist
17:59:40 WARNING : step complete does not exist
17:59:40 WARNING : step complete does not exist
17:59:40 WARNING : step complete does not exist
17:59:40 WARNING : step complete does not exist
17:59:40 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64/RELEASE-NOTES-en_US.UTF-8.html: HTTP Error 500: Internal Server Error
17:59:40 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64/RELEASE-NOTES.en_US.UTF-8: HTTP Error 500: Internal Server Error
17:59:40 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64/RELEASE-NOTES-en_US.html: HTTP Error 500: Internal Server Error
17:59:40 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64/RELEASE-NOTES.en_US: HTTP Error 500: Internal Server Error
17:59:40 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64/RELEASE-NOTES-en.html: HTTP Error 500: Internal Server Error
17:59:40 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64/RELEASE-NOTES.en: HTTP Error 500: Internal Server Error
17:59:40 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64/RELEASE-NOTES-C.html: HTTP Error 500: Internal Server Error
17:59:40 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64/RELEASE-NOTES.C: HTTP Error 500: Internal Server Error


Matt Karas

Nov 6, 2008, 1:07:16 PM
to Discussion of Rocks Clusters
Does anyone have any ideas on how to fix this?


Brandon Davidson

Nov 6, 2008, 1:54:49 PM
to Discussion of Rocks Clusters
Hi Matt,

Some files that the installer looks for legitimately don't exist - disc1,
discinfo, release notes, etc... so that's not necessarily indicative of any real
problem. I have seen the rocks HTTP server give 500 errors if it can't reach the
bittorrent tracker running on the frontend. Try doing:

/etc/init.d/rocks-bittorrent stop
/etc/init.d/rocks-bittorrent start

Rocks guys:

1) how about status/restart actions on the rocks-bittorrent service :)

2) Am I accurate in saying that the node HTTP server will error if the tracker
isn't running? Would it perhaps be more useful if it instead failed back to a
direct download from the frontend, or (if that would be too much effort) it
could just send a 302 with the URL of the file on the frontend?

3) It looks like the retry that you guys implemented to make node downloads more
resilient to overloaded frontends also takes effect if the frontend returns a
404 error. Would this maybe make sense to do only if the server returns a
500-level error code? It's not like the file is going to exist if the client
asks enough times...

-Brandon

Matt Karas wrote:
> Does anyone have any ideas on how to fix this?
>
> An unhandled exception has occurred. This happens just after it says its
> running the pre-install scripts. As I'm watching the Inserted Appliances
> screen to see if it finds the new nodes, it finds the new mac addresses,
> then in the parentheses it goes from ( ) to (403) to (*)
>
> The only thing I can find in the error report is the following:
>
> 17:59:11 WARNING : Unable to find temp path, going to use ramfs path
> 17:59:11 CRITICAL: IOError 14 occurred getting
> http://127.0.0.1/install/rocks-dist/lan/x86_64//disc1/.discinfo: HTTP
> Error 500: Internal Server Error


--
Brandon Davidson
Systems Administrator
University of Oregon Neuroinformatics Center
(541) 346-2417 bran...@uoregon.edu
Key Fingerprint 1F08 A331 78DF 1EFE F645 8AE5 8FBE 4147 E351 E139

Matt Karas

Nov 6, 2008, 2:55:07 PM
to Discussion of Rocks Clusters
The bittorrent tracker wasn't the problem. I fixed this problem by logging
into each node and deleting the .rocks-release file from the / directory.

How can I have the /.rocks-release file deleted from each node automatically
when the pre-install scripts are run? I don't want to delete this file each
time I need to re-install a node.

Thanks,
Matt



Greg Bruno

Nov 6, 2008, 3:13:54 PM
to Discussion of Rocks Clusters
On Thu, Nov 6, 2008 at 10:54 AM, Brandon Davidson <bran...@uoregon.edu> wrote:
> Hi Matt,
>
> Some files that the installer looks for legitimately don't exist - disc1,
> discinfo, release notes, etc... so that's not necessarily indicative of any
> real problem. I have seen the rocks HTTP server give 500 errors if it can't
> reach the bittorrent tracker running on the frontend. Try doing:
>
> /etc/init.d/rocks-bittorrent stop
> /etc/init.d/rocks-bittorrent start
>
> Rocks guys:
>
> 1) how about status/restart actions on the rocks-bittorrent service :)

we can do that.

> 2) Am I accurate in saying that the node HTTP server will error if the
> tracker isn't running? Would it perhaps be more useful if it instead failed
> back to a direct download from the frontend, or (if that would be too much
> effort) it could just send a 302 with the URL of the file on the frontend?

if an installing node can't get a torrent file for an RPM, then it
downloads the RPM directly from the frontend.

> 3) It looks like the retry that you guys implemented to make node downloads
> more resilient to overloaded frontends also takes effect if the frontend
> returns a 404 error. Would this maybe make sense to do only if the server
> returns a 500-level error code? It's not like the file is going to exist if
> the client asks enough times...

we use 'wget' to get the torrents and their respective files. if wget
receives a 404, it doesn't retry.

- gb

Greg Bruno

Nov 6, 2008, 3:17:58 PM
to Discussion of Rocks Clusters
On Thu, Nov 6, 2008 at 11:55 AM, Matt Karas <kara...@gmail.com> wrote:
> The bittorent tracker wasn't the problem. I fixed this problem by logging
> into each node and deleting the .rocks-release file from the / directory.
>
> How can I have the /.rocks-release file deleted from each node automatically
> when the pre-install scripts are ran? I don't want to delete this file each
> time I need to re-install a node.

you could create a replace-partition.xml node file and put some code
inside a <pre> section to remove .rocks-release. here's how to create
a replace-partition.xml node file:

http://www.rocksclusters.org/roll-documentation/base/5.0/customization-partitioning.html

keep in mind, in a pre section, the partitions are not mounted, so
you'll need to write code to mount each partition before you try to
remove .rocks-release.

- gb
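For illustration only, a <pre> section along the lines Greg describes might look like the sketch below; the device name /dev/sda1 and the mount point are assumptions for a typical single-disk compute node, not something taken from the Rocks documentation:

```xml
<pre>
<!-- Sketch: mount the old root partition, remove the Rocks
     sentinel file, then unmount. Adjust the device name to
     match your nodes' disks. -->
mkdir -p /mnt/oldroot
mount /dev/sda1 /mnt/oldroot
rm -f /mnt/oldroot/.rocks-release
umount /mnt/oldroot
</pre>
```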
