Thanks for any help,
Matt
With regards
-Deependus
(community please correct me if I am wrong)
I tried connecting the node through the router and directly to the frontend.
Both come up with the same thing.
There is no other DHCP server running except the frontend. One thing I did
notice was when I first installed the frontend and nodes last week, it was
giving the nodes a 10.*.*.* IP address. When I tried to install the nodes
today, it was giving them a 192.168.*.* IP address.
Thanks for any help,
Matt
With regards
-Deependus
Bart
I had the exact same problem on almost the exact same hardware. What
combination of rolls are you using?
Also, run the command 'sudo rocks-dist dist' and paste the output.
When I did this, it printed a long stream of messages, essentially
showing that the RPMs required for installing the nodes were badly
broken. Initially the installation seemed fine, and things were
running, but at some point in the process the RPMs required for nodes
to install correctly were never created. This was using a Rocks 5.1
beta install. My solution ended up being to fall back to Rocks 5.0,
including the OS bundled with it, and I had the system running
in about an hour, with all nodes reporting. I had spent all week on
this, and it took an hour to finally fix with the clean install!
Juan
17:59:11 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64//disc1/.discinfo: HTTP
Error 500: Internal Server Error
17:59:16 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64//disc1/.discinfo: HTTP
Error 500: Internal Server Error
17:59:21 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64//disc1/.discinfo: HTTP
Error 500: Internal Server Error
17:59:26 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64//disc1/.discinfo: HTTP
Error 500: Internal Server Error
17:59:31 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64//disc1/.discinfo: HTTP
Error 500: Internal Server Error
Here is the output from rocks-dist dist
# rocks-dist dist
Cleaning distribution
Resolving versions (base files)
including "kernel" (5.0,x86_64) roll...
including "java" (5.0,x86_64) roll...
including "Red_Hat_Enterprise_Linux_Server_5" (5.0,x86_64) roll...
including "web-server" (5.0,x86_64) roll...
including "base" (5.0,x86_64) roll...
Including critical RPMS
Resolving versions (RPMs)
including "kernel" (5.0,x86_64) roll...
including "java" (5.0,x86_64) roll...
including "Red_Hat_Enterprise_Linux_Server_5" (5.0,x86_64) roll...
including "web-server" (5.0,x86_64) roll...
including "base" (5.0,x86_64) roll...
Resolving versions (SRPMs)
including "kernel" (5.0,x86_64) roll...
including "java" (5.0,x86_64) roll...
including "Red_Hat_Enterprise_Linux_Server_5" (5.0,x86_64) roll...
including "web-server" (5.0,x86_64) roll...
including "base" (5.0,x86_64) roll...
Creating files (symbolic links - fast)
Applying stage2.img
Applying updates.img
Installing XML Kickstart profiles
installing "web-server" profiles...
installing "base" profiles...
installing "java" profiles...
installing "kernel" profiles...
installing "site" profiles...
Creating repository
making "torrent" files for RPMS
Cleaning distribution
Resolving versions (base files)
including "kernel" (5.0,x86_64) roll...
including "base" (5.0,x86_64) roll...
Including critical RPMS
Resolving versions (RPMs)
including "kernel" (5.0,x86_64) roll...
including "base" (5.0,x86_64) roll...
Resolving versions (SRPMs)
including "kernel" (5.0,x86_64) roll...
including "base" (5.0,x86_64) roll...
Creating files (symbolic links - fast)
Applying stage2.img
Applying updates.img
Installing XML Kickstart profiles
installing "kernel" profiles...
installing "base" profiles...
Creating repository
Linking boot stages from lan
Building Roll Links
send us the output of:
# rocks list host
# rocks list host interface
- gb
# rocks list host
HOST         MEMBERSHIP CPUS RACK RANK COMMENT
llan:        Frontend   1    0    0    -------
compute-0-0: Compute    8    0    0    -------
compute-0-1: Compute    8    0    1    -------
# rocks list host interface
HOST         SUBNET  IFACE MAC               IP             NETMASK       GATEWAY      MODULE NAME
***:         private eth0  00:1e:c9:4c:7a:ef 10.1.1.1       255.0.0.0     ------------ bnx2   ***
***:         public  eth1  00:1e:c9:4c:7a:f1 155.*.*.*      255.255.255.0 155.*.*.*    bnx2   *.*.*.*
compute-0-0: private eth0  00:1e:4f:33:0b:6e 10.255.255.254 255.0.0.0     ------------ bnx2   compute-0-0
compute-0-0: ------- eth1  00:1e:4f:33:0b:70 -------------- ------------- ------------ bnx2   ---------------
compute-0-1: private eth0  00:1e:4f:38:38:da 10.255.255.253 255.0.0.0     ------------ bnx2   compute-0-1
compute-0-1: ------- eth1  00:1e:4f:38:38:dc -------------- ------------- ------------ bnx2   ---------------
ok, the fact that your compute nodes are getting 192.168.X.X IP
addresses when they PXE boot tells me that you have another DHCP
server on your network that is answering the compute nodes' DHCP
requests.
do you know if there is another ethernet cable that is plugged into
your cluster switch that is connected to your organization's network?
- gb
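The check behind that diagnosis can be sketched in a few lines of Python. This is only an illustration: it assumes the private cluster network is 10.0.0.0/8 (taken from the 10.1.1.1 / 255.0.0.0 entry for the frontend's eth0), and both the function name lease_source and the 192.168.1.20 sample address are made up for the example.

```python
import ipaddress

# Assumption: the frontend's private network is 10.0.0.0/8, matching the
# 10.1.1.1 / 255.0.0.0 entry reported for eth0 on the frontend.
CLUSTER_NET = ipaddress.ip_network("10.0.0.0/8")


def lease_source(address):
    """Guess whether a DHCP lease plausibly came from the frontend."""
    ip = ipaddress.ip_address(address)
    if ip in CLUSTER_NET:
        return "frontend"
    return "foreign DHCP server"


# 192.168.1.20 is a hypothetical address of the kind Matt described.
for addr in ("10.255.255.254", "192.168.1.20"):
    print(addr, "->", lease_source(addr))
```

An address outside the private range does not prove where the lease came from, but it does show that the frontend's dhcpd did not hand it out.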
Umberto Amato
Istituto per le Applicazioni del calcolo 'Mauro Picone' CNR
Via Pietro Castellino 111
80131 Napoli (Italy)
I just had this same problem. What combination of rolls are you
trying to install? In my case, although the installation of 5.1 beta
seemed to be OK, I was missing the files actually required for node
installation to occur. I was getting the same request for an install
location on the nodes. I decided to use Rocks 5.0 and everything
went perfectly well.
Juan
after you execute 'rocks remove host', then execute:
# rocks sync config
then you will need to 'rediscover' your nodes that you removed with
'insert-ethers':
# insert-ethers
then PXE boot the removed nodes.
- gb
Scott
-----Original Message-----
From: npaci-rocks-dis...@sdsc.edu
[mailto:npaci-rocks-dis...@sdsc.edu] On Behalf Of Matt Karas
Sent: Tuesday, November 04, 2008 2:18 PM
To: Discussion of Rocks Clusters
Subject: Re: [Rocks-Discuss] Problem installing nodes.
The only thing I can find in the error report is the following:
17:59:11 WARNING : Unable to find temp path, going to use ramfs path
17:59:11 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64//disc1/.discinfo: HTTP
Error 500: Internal Server Error
17:59:16 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64//disc1/.discinfo: HTTP
Error 500: Internal Server Error
17:59:21 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64//disc1/.discinfo: HTTP
Error 500: Internal Server Error
17:59:26 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64//disc1/.discinfo: HTTP
Error 500: Internal Server Error
17:59:31 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64//disc1/.discinfo: HTTP
Error 500: Internal Server Error
17:59:40 WARNING :
/usr/lib/python2.4/site-packages/pykickstart/parser.py:298:
DeprecationWarning: Ignoring deprecated command on line 65: The mouse
command has been deprecated and no longer has any effect. It may be
removed from future releases, which will result in a fatal error from
kickstart. Please modify your kickstart file to remove this command.
warnings.warn(_("Ignoring deprecated command on line %(lineno)s:
The %(cmd)s command has been deprecated and no longer has any effect.
It may be removed from future releases, which will result in a fatal
error from kickstart. Please modify your kickstart file to remove
this command.") % mapping, DeprecationWarning)
17:59:40 WARNING : step installtype does not exist
17:59:40 WARNING : step complete does not exist
17:59:40 WARNING : step complete does not exist
17:59:40 WARNING : step complete does not exist
17:59:40 WARNING : step complete does not exist
17:59:40 WARNING : step complete does not exist
17:59:40 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64/RELEASE-NOTES-en_US.UTF-8.html:
HTTP Error 500: Internal Server Error
17:59:40 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64/RELEASE-NOTES.en_US.UTF-8:
HTTP Error 500: Internal Server Error
17:59:40 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64/RELEASE-NOTES-en_US.html:
HTTP Error 500: Internal Server Error
17:59:40 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64/RELEASE-NOTES.en_US:
HTTP Error 500: Internal Server Error
17:59:40 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64/RELEASE-NOTES-en.html:
HTTP Error 500: Internal Server Error
17:59:40 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64/RELEASE-NOTES.en: HTTP
Error 500: Internal Server Error
17:59:40 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64/RELEASE-NOTES-C.html:
HTTP Error 500: Internal Server Error
17:59:40 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64/RELEASE-NOTES.C: HTTP
Error 500: Internal Server Error
Some files that the installer looks for legitimately don't exist (disc1,
.discinfo, release notes, etc.), so those errors are not necessarily indicative
of any real problem. I have seen the Rocks HTTP server give 500 errors if it
can't reach the bittorrent tracker running on the frontend. Try doing:
/etc/init.d/rocks-bittorrent stop
/etc/init.d/rocks-bittorrent start
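To separate that expected noise from anything else in the installer log, the CRITICAL entries can be grouped by URL. The following Python sketch is not part of Rocks; the regular expressions are a guess at how to match the files named above (disc1, .discinfo, release notes), and the sample is two entries copied from the log in this thread.

```python
import re
from collections import Counter

# Files described above as legitimately absent. The pattern is this
# sketch's own guess at how to recognise them in a URL.
EXPECTED_MISSING = re.compile(r"(/disc1/|\.discinfo$|RELEASE-NOTES)")

# One CRITICAL fetch failure: captures the URL and the HTTP status.
ENTRY = re.compile(
    r"CRITICAL: IOError \d+ occurred getting\s+(\S+?):\s+HTTP\s+Error\s+(\d{3})"
)


def summarize(log_text):
    """Count CRITICAL fetch failures, split into expected and unexpected."""
    # The mail client wrapped long lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    expected, unexpected = Counter(), Counter()
    for url, status in ENTRY.findall(flat):
        bucket = expected if EXPECTED_MISSING.search(url) else unexpected
        bucket[(url, int(status))] += 1
    return expected, unexpected


sample = """17:59:11 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64//disc1/.discinfo: HTTP
Error 500: Internal Server Error
17:59:40 CRITICAL: IOError 14 occurred getting
http://127.0.0.1/install/rocks-dist/lan/x86_64/RELEASE-NOTES.C: HTTP
Error 500: Internal Server Error"""

expected, unexpected = summarize(sample)
print(sum(expected.values()), "expected;", sum(unexpected.values()), "unexpected")
```

Anything that lands in the unexpected bucket (an RPM, for example) would be the place to start looking.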
Rocks guys:
1) how about status/restart actions on the rocks-bittorrent service :)
2) Am I accurate in saying that the node HTTP server will error if the tracker
isn't running? Would it perhaps be more useful if it instead fell back to a
direct download from the frontend, or (if that would be too much effort) it
could just send a 302 with the URL of the file on the frontend?
3) It looks like the retry that you guys implemented to make node downloads more
resilient to overloaded frontends also takes effect if the frontend returns a
404 error. Would this maybe make sense to do only if the server returns a
500-level error code? It's not like the file is going to exist if the client
asks enough times...
-Brandon
Matt Karas wrote:
> Does anyone have any ideas on how to fix this?
>
> An unhandled exception has occurred. This happens just after it says it's
> running the pre-install scripts. As I'm watching the Inserted Appliances
> screen to see if it finds the new nodes, it finds the new mac addresses,
> then in the parentheses it goes from ( ) to (403) to (*)
>
> The only thing I can find in the error report is the following:
>
> 17:59:11 WARNING : Unable to find temp path, going to use ramfs path
> 17:59:11 CRITICAL: IOError 14 occurred getting
> http://127.0.0.1/install/rocks-dist/lan/x86_64//disc1/.discinfo: HTTP
> Error 500: Internal Server Error
--
Brandon Davidson
Systems Administrator
University of Oregon Neuroinformatics Center
(541) 346-2417 bran...@uoregon.edu
Key Fingerprint 1F08 A331 78DF 1EFE F645 8AE5 8FBE 4147 E351 E139
How can I have the /.rocks-release file deleted from each node automatically
when the pre-install scripts are run? I don't want to delete this file by hand
each time I need to re-install a node.
Thanks,
Matt
we can do that.
> 2) Am I accurate in saying that the node HTTP server will error if the
> tracker isn't running? Would it perhaps be more useful if it instead failed
> back to a direct download from the frontend, or (if that would be too much
> effort) it could just send a 302 with the URL of the file on the frontend?
if an installing node can't get a torrent file for an RPM, then it
downloads the RPM directly from the frontend.
> 3) It looks like the retry that you guys implemented to make node downloads
> more resilient to overloaded frontends also takes effect if the frontend
> returns a 404 error. Would this maybe make sense to do only if the server
> returns a 500-level error code? It's not like the file is going to exist if
> the client asks enough times...
we use 'wget' to get the torrents and their respective files. if wget
receives a 404, it doesn't retry.
- gb
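The retry rule under discussion (retry on a 500-level error, give up at once on a 404) can be written down as a small Python sketch. This is not the Rocks or wget implementation; fetch is a hypothetical callable returning a (status, body) pair, and the attempt count and delay are arbitrary.

```python
import time


def should_retry(status):
    """Retry only server-side (5xx) failures; a 404 will not fix itself."""
    return 500 <= status <= 599


def fetch_with_retry(fetch, url, attempts=3, delay=0.0):
    """Call fetch(url) until it succeeds, hits a non-retryable status,
    or runs out of attempts. Returns the last (status, body) pair."""
    status, body = fetch(url)
    for _ in range(attempts - 1):
        if not should_retry(status):
            break
        time.sleep(delay)
        status, body = fetch(url)
    return status, body


# Hypothetical stand-in for a real download: always answers 404.
calls = []


def always_missing(url):
    calls.append(url)
    return 404, b""


print(fetch_with_retry(always_missing, "http://frontend.example/missing.rpm"))
print("requests made:", len(calls))
```

With the stand-in above only one request is made, since a 404 is not retried.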
you could create a replace-partition.xml node file and put some code
inside a <pre> section to remove .rocks-release. here's how to create
a replace-partition.xml node file:
http://www.rocksclusters.org/roll-documentation/base/5.0/customization-partitioning.html
keep in mind, in a pre section, the partitions are not mounted, so
you'll need to write code to mount each partition before you try to
remove .rocks-release.
- gb
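The mount-then-remove logic might look roughly like the Python sketch below. Several things here are assumptions, not facts from this thread: that a Python interpreter can be used for the script in the installer environment (Rocks <pre> sections are normally shell), that the caller already knows which devices to try, and the helper names remove_marker and clean_partitions, which are invented for the example.

```python
import os
import subprocess
import tempfile

MARKER = ".rocks-release"


def remove_marker(root):
    """Delete <root>/.rocks-release if present; return True if removed."""
    path = os.path.join(root, MARKER)
    if os.path.isfile(path):
        os.remove(path)
        return True
    return False


def _mount(dev, where):
    subprocess.check_call(["mount", dev, where])


def _umount(where):
    subprocess.check_call(["umount", where])


def clean_partitions(devices, mount=_mount, umount=_umount):
    """Mount each device on a scratch directory, remove the marker,
    then unmount. Returns the devices where the marker was found."""
    cleaned = []
    for dev in devices:
        where = tempfile.mkdtemp(prefix="pre-mnt-")
        try:
            mount(dev, where)
        except (subprocess.CalledProcessError, OSError):
            os.rmdir(where)
            continue  # not mountable (swap, unformatted, ...): skip it
        try:
            if remove_marker(where):
                cleaned.append(dev)
        finally:
            umount(where)
            os.rmdir(where)
    return cleaned
```

The mount and umount parameters exist so the logic can be exercised without root privileges; on a real node the defaults shell out to mount and umount, and the device list (for example the partitions of the first disk) still has to come from somewhere.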