I attempted to upgrade our cluster from 5.2 to 5.3 this evening using the restore roll procedure. This time, in addition to the kernel, core, and os1 - os7 disks, I also added the bio roll and the area51 roll. I added the latter only because, although the roll was not installed, I could see area51-related directories in the file system, so I assumed the roll had been removed much earlier.
After the reboot I did not get the graphical CentOS login display. Instead, the command-line login screen flashed on the monitor about four times, then there was just a cursor line, and finally I could see:
INIT: version 2.86 reloading.
At this point, hitting ctrl cmd F1 - F5 gives me the login prompt, but if I try to log in it kicks me out immediately. Ctrl cmd F6 brings back the INIT... screen.
When I try to log in remotely as root, I see this in the terminal:
Last login: Tue Jul 12 20:39:47 2011
Rocks 5.3 (Rolled Tacos)
Profile built 21:54 12-Jul-2011
Kickstarted 20:13 12-Jul-2011
-----------------------------
Rocks 5.2 (Chimichanga)
Profile built 22:25 07-Mar-2011
Kickstarted 18:42 07-Mar-2011
-----------------------------
Rocks 5.1 (V.I)
Profile built 16:14 11-Mar-2009
Kickstarted 13:39 11-Mar-2009
/bin/bash: Permission denied
Connection to plexus.med.yale.edu closed.
Any hints on what went wrong and how to fix it?
Thanks ahead,
János
Additional info:
Yesterday I also added the following files to /export/site-roll/rocks/src/roll/restore/version.mk:
/export/site-roll/rocks/src/roll/restore/version.mk
/etc/auto.master
/etc/auto.share
/etc/autofs_ldap_auth.conf
/etc/gconf/2/evoldap.conf
/etc/ldap.conf
/etc/openldap/ldap.conf
/etc/krb.conf
/etc/krb5.conf
/share/apps/create_vol.sh
and followed the guide to do the upgrade. This morning I booted into "build rescue" mode, ran chroot /mnt/sysimage, and did the following:
1. Ran system-config-authenticate --test, followed by system-config-authenticate --update, and rebooted by exiting from the shells. Nothing changed; I am still unable to log in.
2. Renamed the files listed above that live in the /etc directory by appending an ".old" suffix, so that /etc/auto.master became /etc/auto.master.old, etc. Rebooted the machine by exiting the shells again. Nothing changed; I am still unable to log in as root or as any other user.
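For reference, the renaming in step 2 can be sketched roughly as follows. This is a minimal Python illustration, not the exact commands that were run: the function name set_aside, the root parameter, and the ETC_FILES list are assumptions introduced here, and the list simply repeats the /etc entries given above.

```python
import os

# The /etc entries from the list above, written relative to the system root.
ETC_FILES = [
    "etc/auto.master",
    "etc/auto.share",
    "etc/autofs_ldap_auth.conf",
    "etc/gconf/2/evoldap.conf",
    "etc/ldap.conf",
    "etc/openldap/ldap.conf",
    "etc/krb.conf",
    "etc/krb5.conf",
]


def set_aside(root, rel_paths, suffix=".old"):
    """Append suffix to each listed file that exists under root.

    Returns the new paths of the files that were actually renamed;
    files that are missing are skipped silently.
    """
    renamed = []
    for rel in rel_paths:
        src = os.path.join(root, rel)
        if os.path.isfile(src):
            dst = src + suffix
            os.rename(src, dst)
            renamed.append(dst)
    return renamed


# Hypothetical usage: inside the chroot the root would be "/",
# from the plain rescue shell it would be "/mnt/sysimage".
# set_aside("/mnt/sysimage", ETC_FILES)
```

Keeping the root as a parameter means the same sketch works both inside and outside the chroot, and undoing the change is just the reverse rename.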
It looks to me as if something is wrong with the X Window System. On my VMware-based test Rocks cluster, the CLI login flashes for a moment and then the GUI comes up to ask for a username and password. On this production cluster, the CLI login flashes four times and then the screen goes to the "INIT: version 2.86 reloading." message on a black background. <ctrl>+<cmd>+<F1...F6> brings the CLI login prompt forward, but I am unable to log in there.
My next step is to look into any logs I can find and after that redo the upgrade using the restore roll, but without the bio and area51 rolls, although I do not see how they could create this problem.
Even when I do chroot /mnt/sysimage, I am still unable to log into the machine either as myself or as root remotely.
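The cause of the "/bin/bash: Permission denied" message is not established here. One thing that could be inspected from the rescue shell is the permission mode of each directory on the way to the shell binary. The Python sketch below is only an illustration of that check: the function names path_modes and lacks_world_execute are made up for this example, and it does not cover other possible causes such as SELinux or mount options.

```python
import os
import stat


def path_modes(path):
    """Return (component, mode string) for every component from / down to path."""
    path = os.path.abspath(path)
    current = os.sep
    report = [(current, stat.filemode(os.stat(current).st_mode))]
    for part in [p for p in path.split(os.sep) if p]:
        current = os.path.join(current, part)
        report.append((current, stat.filemode(os.stat(current).st_mode)))
    return report


def lacks_world_execute(path):
    """List the components of path that do not have the 'other' execute bit set."""
    return [
        component
        for component, _ in path_modes(path)
        if not os.stat(component).st_mode & stat.S_IXOTH
    ]


# Hypothetical usage from the rescue shell, before the chroot:
# for component, mode in path_modes("/mnt/sysimage/bin/bash"):
#     print(mode, component)
```

A directory or binary that shows up in lacks_world_execute is not necessarily the culprit, but it would be a reasonable place to compare against a working machine.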
Any good advice is highly appreciated.
Thanks ahead,
János
Final info: I redid the install without the area51 and bio rolls. Everything worked flawlessly :-) After the upgrade I installed the 5.3 Bio Roll without a problem. As a bonus, the R410 that I had so much trouble bringing into the 5.2 cluster is now a happy member of the 5.3 cluster :-)
Thank you all!!
János