I had all of the nodes and the FE working fine before the weekend. However, I
rebooted the front end, and after the reboot I couldn't ssh into any compute
node. I'd get the error:
ssh: connect to host compute-0-0 port 22: Connection refused
So I manually KVM'ed (verb?) into compute-0-0 and saw the error message:
Unable to locate partition mapper/nvidia_afjdheif3 to use for . Press 'OK'
to reboot system.
Which I did, but after the restart the node just looped back to the same error
message. eth0 is connected and running fine on the FE. Are the FE and nodes
not talking, and why would I be getting this error message?
Mike
--
Michael Vandewege, Ph.D. Student
Graduate Research Assistant
Dept. of Biochemistry and Molecular Biology
Mississippi State University
Mississippi State, MS 39762
Email: mike.va...@gmail.com
Maybe it's a good idea to deactivate it in the BIOS and re-install all
the compute nodes.
To reinstall all the compute nodes, you can execute on the FE:
rocks set host boot compute action=install
and then restart the compute nodes.
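
As a rough sketch of that step, the wrapper below prints the command it would run before touching anything. The names flag_for_reinstall and DRY_RUN are made up for illustration and are not part of Rocks; the only Rocks commands used are "rocks set host boot" from above and "rocks list host boot", which I am assuming is available on your FE to show the current boot action.

```shell
#!/bin/sh
# Sketch only: flag compute nodes for reinstall on their next network boot.
# flag_for_reinstall and DRY_RUN are hypothetical names, not part of Rocks.
DRY_RUN="${DRY_RUN:-1}"

flag_for_reinstall() {
    # $1 is a single host name, or the appliance name "compute" for all compute nodes.
    target="${1:-compute}"
    if [ "$DRY_RUN" = "1" ]; then
        # Dry run: only show the command that would be executed.
        echo "rocks set host boot $target action=install"
    else
        rocks set host boot "$target" action=install
        # Assumed to exist on the FE: lists the boot action now set for the target.
        rocks list host boot "$target"
    fi
}

# Try one node first; pass "compute" to flag every compute node.
flag_for_reinstall compute-0-0
```

Run it with DRY_RUN=0 on the FE once the printed command looks right. Since ssh to the nodes is refused, the restart itself would have to be done from the console or by power-cycling.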
Rebooting the FE should not create any problem on a properly
installed cluster.
Before rebooting the FE I would try rebooting a compute node (just to
verify the compute node can reboot properly).
Sincerely,
Luca