We have recently set up a local Rocks 5.0 cluster and have been having
problems reinstalling the nodes. After a power failure that brought down
several of the nodes, all of the nodes but one were able to kickstart
correctly from the frontend. The one node would stop at the "select
language" screen. Since then, other nodes have needed to be
restarted, and they no longer kickstart correctly. Kickstarting
works if I use the install CD to boot the nodes or if I remove the node
completely and add it back with insert-ethers. I have tried to read all
the posts from other users having problems with kickstarting, but I cannot
find the problem. The second console on the failed installation node says
ROCKS:rocksNetworkUp:no network devices in choose network device!
got to setupCdrom without a CD device
which may indicate that it cannot find a network device to use (these
nodes use the forcedeth driver), but it reports that it loaded the modules
correctly before it gave the error.
Things that I have tried:
- Removed the rocks-dist folder and recreated the distribution
- There are no customized XML files
The frontend node is a 64-bit AMD machine, and the nodes are 64-bit AMD as well.
The installed rolls are the ones included on the Jumbo DVD (base, bio,
ganglia, hpc, java, kernel, os, sge, web-server, xen).
Any help or suggestions would be greatly appreciated.
Sincerely,
Juan M. Vanegas
Biophysics Graduate Group
University of California, Davis
What is the output of
# rocks list host pxeboot
-a
Thanks,
Scott
I've seen this happen when 'insert-ethers' dies. In my case it
died because I didn't run 'rocks sync config' after removing
a node from the database (using rocks remove host).
I don't fully understand this either.
Cordially,
--
Jon Forrest
Research Computing Support
College of Chemistry
173 Tan Hall
University of California Berkeley
Berkeley, CA
94720-1460
510-643-1032
jlfo...@berkeley.edu
> Juan M Vanegas wrote:
>
>> Hi,
>>
>> We have recently setup a local Rocks 5.0 cluster and have been having
>> problems reinstalling the nodes. After a power failure that brought down
>> several of the nodes, all of the nodes but one was able to kickstart
>> correctly from the frontend. The one node would stop at the "select
>> language" screen.
>>
>
> I've seen this happen when 'insert-ethers' dies. In my case it
> died because I didn't run 'rocks sync config' after removing
> a node from the database (using rocks remove host).
That's a bug in insert-ethers, and it has been fixed in CVS for the next
release. The problem happens because of an inconsistency between the
database and the config files on disk. The workaround for now is to run
"rocks sync config" before you run insert-ethers, OR to run
"rocks sync config" after you add, remove, or modify nodes using rocks
commands.
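A minimal sketch of that workaround as a shell function, assuming it is run as root on the Rocks 5.0 frontend. The function name `readd_node` and the hostname `compute-0-3` are hypothetical placeholders; the three commands it calls are the ones named in this thread, and insert-ethers remains the interactive tool that waits for the node to PXE boot.

```shell
#!/bin/sh
# Hypothetical helper (not part of Rocks): remove a node from the database,
# resync the on-disk config files, then re-add it with insert-ethers.
# Assumes it runs as root on the frontend.
readd_node() {
    host="$1"
    if [ -z "$host" ]; then
        echo "usage: readd_node <hostname>" >&2
        return 2
    fi
    # Drop the node's record from the database.
    rocks remove host "$host" || return 1
    # Regenerate the config files from the database so the two agree
    # before insert-ethers runs.
    rocks sync config || return 1
    # Interactive: choose the appliance type, then power on the node
    # so it PXE boots and is discovered again.
    insert-ethers
}
```

Usage would be `readd_node compute-0-3`. Each step stops the sequence on failure, so insert-ethers is never started against a database and config files that may still disagree.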
-P
--
Philip Papadopoulos, PhD
University of California, San Diego
858-822-3628
Juan,
As far as I know, this is normal behavior. After installing, the system
assumes you want to boot from the local hard drive. However, the PXE
boot action isn't the only way that the nodes can install. All the "os"
action does is tell the node to boot from its local hard drive.
If the node was shut down uncleanly, that local hard drive will still
have the appropriate kernel and GRUB configuration in place to start the
install process. However, if the hard drive's partition table, etc.,
is damaged and the drive is no longer bootable, then you'd need to do a
purely PXE install to get the node up again. It really depends on the
damage that the outage caused.
--
Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu