[Rocks-Discuss] Error with rocks sync users


Snyder, David

Sep 1, 2009, 2:46:29 PM
to Discussion of Rocks Clusters
Hello everybody! It's me again with yet another SNAFU.

I have recently added some users to our system so I want to run rocks sync
users to propagate their user information to the compute nodes. But when
the sys-admin-type person ran rocks sync users (logged in as root,
presumably using the root password he used when installing the cluster
software, although I cannot be certain of this -- he may have changed the
root password post-installation), he got a bunch of errors (I've listed them
below following a single asterisk *).

I suspected that the problem was that no one had ever logged into the nodes,
so the /root/.ssh/id_rsa files were never set up on the compute nodes
(indeed those files still don't exist on the compute nodes).

Anyway, I asked again for the root password from our sys-admin-type person
and he pointed out that I could change the root password using my sudo
privileges on the front-end node (I have no such privileges on the compute
nodes, how would I set them up?). But I still cannot log in successfully as
root to the compute nodes (I guess because the root password is unchanged on
those nodes), and hence rocks sync users hangs without doing anything.

Am I correct that I need to log in successfully as root on each of the
compute nodes before running rocks sync users? How do I reset the root
password on the compute nodes to match the password on the frontend node
(and give myself sudo privileges on the compute nodes) so that I can log in
to each compute node, get the /root/.ssh/id_rsa files written, and then run
rocks sync users successfully?

Of course, that assumes I am correct about what's going wrong with rocks
sync users. Am I on the right track in interpreting the errors shown?

Also, while I'm asking ... I installed some additional software in /opt/ on
the frontend. How do I "sync" (as it were) the /opt directory so that the
software I've installed is accessible on all the compute nodes (in the same
way that the software which came in, e.g., the Rocks bio roll is accessible
across all the nodes)? Another odd thing: for some reason, when the
sys-admin-type person added himself and me as users, our home directories
sit right off the root directory (just like root's), so we have, e.g.,
/snyderd as a directory, while the users I have added are in /home/ (e.g.
/home/dasstudent/) and mirrored, of course, in /export/home/. Will it be a
problem to sync the users, since I don't have any directory listed under
/export/home/?

Thanks again for all your help.
- David Snyder

*

[root@spock ~]# rocks sync users
Enter passphrase for key '/root/.ssh/id_rsa': Enter passphrase for key
'/root/.ssh/id_rsa': Enter passphrase for key '/root/.ssh/id_rsa': Enter
passphrase for key '/root/.ssh/id_rsa':
Enter passphrase for key '/root/.ssh/id_rsa':
root@compute-0-0's password:
root@compute-0-0's password:
root@compute-0-2's password:
root@compute-0-2's password:
root@compute-0-3's password:
root@compute-0-3's password:
root@compute-0-4's password:
root@compute-0-0's password:
root@compute-0-2's password:
root@compute-0-3's password:
root@compute-0-4's password:



make: Entering directory `/var/411'
rm -rf /etc/411.d/*
make
make[1]: Entering directory `/var/411'
/opt/rocks/sbin/411put --comment="#" /etc/auto.home
411 Wrote: /etc/411.d/etc.auto..home
Size: 664/313 bytes (encrypted/plain)
Alert: sent on channel 255.255.255.255:8649 with master 192.168.10.200

/opt/rocks/sbin/411put --comment="#" /etc/auto.master
411 Wrote: /etc/411.d/etc.auto..master
Size: 6715/4792 bytes (encrypted/plain)
Alert: sent on channel 255.255.255.255:8649 with master 192.168.10.200

/opt/rocks/sbin/411put --comment="#" /etc/auto.misc
411 Wrote: /etc/411.d/etc.auto..misc
Size: 1462/905 bytes (encrypted/plain)
Alert: sent on channel 255.255.255.255:8649 with master 192.168.10.200

/opt/rocks/sbin/411put --comment="#" /etc/auto.net
411 Wrote: /etc/411.d/etc.auto..net
Size: 2739/1852 bytes (encrypted/plain)
Alert: sent on channel 255.255.255.255:8649 with master 192.168.10.200

/opt/rocks/sbin/411put --comment="#" /etc/auto.share
411 Wrote: /etc/411.d/etc.auto..share
Size: 502/194 bytes (encrypted/plain)
Alert: sent on channel 255.255.255.255:8649 with master 192.168.10.200

/opt/rocks/sbin/411put --comment="#" /etc/auto.smb
411 Wrote: /etc/411.d/etc.auto..smb
Size: 1702/1084 bytes (encrypted/plain)
Alert: sent on channel 255.255.255.255:8649 with master 192.168.10.200

/opt/rocks/sbin/411put --nocomment /etc/passwd
411 Wrote: /etc/411.d/etc.passwd
Size: 13146/9555 bytes (encrypted/plain)
Alert: sent on channel 255.255.255.255:8649 with master 192.168.10.200

/opt/rocks/sbin/411put --nocomment /etc/group
411 Wrote: /etc/411.d/etc.group
Size: 10824/7838 bytes (encrypted/plain)
Alert: sent on channel 255.255.255.255:8649 with master 192.168.10.200

/opt/rocks/sbin/411put --nocomment /etc/shadow
411 Wrote: /etc/411.d/etc.shadow
Size: 3063/2090 bytes (encrypted/plain)
Alert: sent on channel 255.255.255.255:8649 with master 192.168.10.200

make[1]: Leaving directory `/var/411'
make: Leaving directory `/var/411'
### compute-0-0(stat: 255, dur(s): 99.94):
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,gssapi-with-mic,password).
### compute-0-2(stat: 255, dur(s): 100.27):
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,gssapi-with-mic,password).
### compute-0-3(stat: 255, dur(s): 131.02):
Permission denied, please try again.
Permission denied, please try again.
Connection closed by 192.168.10.252
### compute-0-4(stat: 255, dur(s): 157.23):
Permission denied, please try again.
Connection closed by 192.168.10.251

jean-francois prieur

Sep 1, 2009, 5:30:42 PM
to Discussion of Rocks Clusters
For the software installation you should install stuff that you want to
share under /share/apps, this gets shared to all nodes (
http://www.rocksclusters.org/roll-documentation/base/5.2/customization-adding-applications.html).
The other way is the rpm method, also described in the docs (
http://www.rocksclusters.org/roll-documentation/base/5.2/customization-adding-packages.html
)
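As a minimal sketch of the /share/apps approach JF describes (the tool name "mytool" and the temp-directory prefix are stand-ins so the example is self-contained; on a real frontend the prefix would be /share/apps/mytool, which every compute node mounts over NFS):

```shell
# Sketch: install a hypothetical tool under a /share/apps-style prefix.
# A temp directory stands in for /share/apps here; on a real cluster,
# anything placed under /share/apps is visible on all nodes.
PREFIX=$(mktemp -d)/mytool
mkdir -p "$PREFIX/bin"
# Stand-in for the "make install" step of a real package:
printf '#!/bin/sh\necho "mytool ran"\n' > "$PREFIX/bin/mytool"
chmod +x "$PREFIX/bin/mytool"
"$PREFIX/bin/mytool"
```

Users would then add the shared bin directory to their PATH, and every node sees the same install without any per-node copying.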
For your other issues, I don't mean to seem trite, but it seems like there
are too many cooks playing with your cluster. I may be completely off-base
(I am a relative ROCKS newbie myself), but if you (the royal you) administer
it from the viewpoint of "it's just another Linux box", you are heading into
a world of hurt. If the sysadmin changed the root password so that you can't
access your own cluster (which is ridiculous), that could cause a whole lot
of problems if not done properly. ROCKS is tightly integrated with a MySQL
database that contains all the configuration parameters of the cluster, such
as the root password. If that becomes out of sync...

How did the sysadmin add those users that are different? Did he use the GUI?
(In my experience, the only way to configure ROCKS is through the command
line, e.g. useradd <login>, passwd <login>, rocks sync users. GUIs break
things.)

I am sorry I cannot be of more help; I do not know if you need to log in to
each node before running rocks sync users (it would be strange if that were
required -- I don't recall having to do it).

You should have full root access to your cluster and know how people are
using and interacting with it. I will also refrain from commenting about the
"IT dictating research systems use " issue.

If possible, I think you should re-install this cluster with the sysadmin (I
don't think your policies will change that quickly!) while following the
docs. I have found that, especially on your first installation attempts,
ROCKS either works out of the box or it doesn't. This is even more the case
when the configuration of the cluster is in an unknown state.

Good luck and do not hesitate to ask questions, the ROCKS community has been
very friendly and good to me and my basic questions!
Regards,
JF Prieur
Research Assistant, Lamoureux Lab
Department of Chemistry and Biochemistry


Snyder, David

Sep 1, 2009, 5:51:24 PM
to Discussion of Rocks Clusters
JF Prieur,

Thank you for your advice. I actually did manage (as I described) to get
the root password, so policies can change quickly (when the appropriate
people are CCd on e-mails). Anyway, what is the correct way to change the
root password so that it carries properly through the MySQL database? I
will ask the sysadmin what he did, but I suspect he just used passwd to
change it (perhaps only on the front end node) -- is this wrong?

Anything I need to be aware of in re-installing ROCKS (I figure I might as
well install the proper 64 bit version while I am at it) or do I just start
out as if I were doing it the first time and it will reformat the hard
drives for me and everything?

- David Snyder


jean-francois prieur

Sep 1, 2009, 6:26:32 PM
to Discussion of Rocks Clusters
I do not know the exact answer to your root password question because I have
never changed it on our cluster.
I would imagine that a change of the root password using passwd followed by
rocks sync users would work (not in your case obviously), but as I stated I
have never tried.

Yes, start out from scratch and rocks will re-partition your disks. Good
Luck!

Regards,
Jean-Francois Prieur
Research Assistant, Lamoureux Lab
Department of Chemistry and Biochemistry
Concordia University,
Montreal, QC, CANADA


Greg Bruno

Sep 1, 2009, 7:22:36 PM
to Discussion of Rocks Clusters
On Tue, Sep 1, 2009 at 3:26 PM, jean-francois prieur<jfpr...@gmail.com> wrote:
> I do not know the exact answer to your root password question because I have
> never changed it on our cluster.
> I would imagine that a change of the root password using passwd followed by
> rocks sync users would work (not in your case obviously), but as I stated I
> have never tried.

on a rocks 5.2 system, if you want to change the root password, you
need to execute:

# rocks set password

- gb

Snyder, David

Sep 2, 2009, 8:51:30 AM
to Discussion of Rocks Clusters
I guess I really do need to reinstall. When my student and I downloaded the
Rocks software, it was still version 5.1, so changing the root password this
way doesn't work, and I can't figure out (trying to search the listserv
archives) how to similarly change the password (and have it carried to
the compute nodes) in version 5.1.

Thank you all for all your help. I think I can manage (with proper CCing of
e-mails) to ensure I'll at least be allowed to start the install myself.
Now to make sure I can do this without a grumbling IT person hanging around
in the background mumbling about how he doesn't have time for all of this ;)

- David Snyder

Steven Dick

Sep 4, 2009, 8:31:37 AM
to Discussion of Rocks Clusters
Actually, I doubt changing root's password on the head node would have
caused these problems. Of course, changing it on the head node does not
change it on the compute nodes, so it's a bit useless unless you only
wanted to change the password on the head node.

More likely, the ssh config for root is out of sync with the compute nodes,
and/or a passphrase was entered when ssh asked for one on first login.

One way to fix that would be to kill ~root/.ssh on the head node, then log
in as root and hit return a couple of times to initialize ssh without a
passphrase, then reinstall all the compute nodes.

Of course, since rocks doesn't enable sudo on compute nodes by default (easy
to fix though), and since ssh to the compute nodes doesn't work as root,
this may be a bit tricky.

By the way -- on the original question: the password errors with rocks sync
users are mostly harmless. I think they usually just cause a delay before
the nodes pick up the new password information. Before I realized what
caused them, I ran a cluster for about a year with that problem, and it
worked fine.



Steven Dick

Sep 4, 2009, 8:39:36 AM
to Discussion of Rocks Clusters
On Fri, Sep 4, 2009 at 8:31 AM, Steven Dick <kg4...@gmail.com> wrote:

> One way to fix that would be to kill ~root/.ssh on the head node, then log
> in as root and hit return a couple
> of times to initialize ssh without a secondary password, then reinstall all
> the compute nodes.
>

A detail I forgot. In order for the compute nodes to correctly receive
the new .ssh config, the permissions on headnode:~root/.ssh have to be set
in a particular way. I'm not sure if ssh sets it up correctly or if it is
overly restrictive by default. I think id_rsa.pub has to be readable to
apache (or the world). id_rsa should never be readable to anyone but the
owner.
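The permission layout Steven describes can be sketched on a scratch directory standing in for headnode:~root/.ssh. The exact modes below are my assumption of what "readable to apache (or the world)" and "owner only" translate to, not something stated in the thread:

```shell
# Sketch of the /root/.ssh permissions discussed above, using a temp dir
# so nothing on a real head node is touched:
d=$(mktemp -d)
touch "$d/id_rsa" "$d/id_rsa.pub" "$d/authorized_keys"
chmod 700 "$d"                 # .ssh itself: owner only
chmod 600 "$d/id_rsa"          # private key: never readable by anyone else
chmod 644 "$d/id_rsa.pub"      # public key: world-readable, so apache/411 can serve it
chmod 600 "$d/authorized_keys"
stat -c '%a %n' "$d/id_rsa" "$d/id_rsa.pub"
```

Note that sshd itself also enforces restrictive modes on the private side (an id_rsa readable by others is rejected), which is why only id_rsa.pub should be opened up.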



Scott L. Hamilton

Sep 4, 2009, 9:58:35 AM
to Discussion of Rocks Clusters
I have fixed this without a reinstall by doing the following as root on
the head node.
# rm -rf /root/.ssh
Log out and back in.
Do not enter a passphrase for the key when prompted.
Add a user to the sudoers file on the head node.
# 411put /etc/sudoers
Wait for the 411 service to propagate the change to the nodes. It will
probably take about 15 minutes.
# cp -r /root/.ssh /share/apps/.ssh
(You will remove this directory as soon as you are done with the next step.)
Log in as the user with sudo privileges.
$ cluster-fork sudo rm -rf /root/.ssh
(This should remove the bad ssh keys from the nodes.)
$ cluster-fork sudo cp -r /share/apps/.ssh /root
(This should copy the ssh keys to the nodes.)
$ sudo rm -rf /share/apps/.ssh

If it works for you, it saves you having to reinstall all the nodes. Now,
if you don't want the root accounts on the nodes to be able to log in to
the head node, or to each other, without a password, you need to remove the
private key from the nodes. This is the standard way Rocks is configured.
So do the following as root:
# cluster-fork rm /root/.ssh/id_rsa
Now you will be able to log in to the nodes from the head as root without a
password, but not the other way around. This is important: if someone
manages to install a rootkit on one of the compute nodes, you are not
putting the entire cluster at risk.
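Collected in one place, Scott's sequence looks roughly like this. Since cluster-fork and 411put exist only on a Rocks frontend, this sketch just writes the script out and syntax-checks it with bash -n rather than executing anything -- treat it as reference, not a turnkey fix:

```shell
# Reference-only transcript of the fix described above; syntax-checked,
# never run here. On a real cluster, run each section by hand.
cat > /tmp/fix-node-ssh.sh <<'EOF'
# --- as root on the head node ---
rm -rf /root/.ssh            # then log out/in; accept an EMPTY passphrase
# (add your user to /etc/sudoers first, e.g. with visudo)
411put /etc/sudoers          # wait ~15 min for 411 to propagate
cp -r /root/.ssh /share/apps/.ssh   # temporary staging copy
# --- as the sudo-capable user ---
cluster-fork sudo rm -rf /root/.ssh         # remove bad keys on nodes
cluster-fork sudo cp -r /share/apps/.ssh /root  # install good keys
sudo rm -rf /share/apps/.ssh                # remove the staging copy
# --- optional hardening: nodes cannot ssh back as root ---
cluster-fork rm /root/.ssh/id_rsa
EOF
bash -n /tmp/fix-node-ssh.sh && echo "syntax OK"
```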

Thanks,
Scott

Snyder, David

Sep 4, 2009, 12:19:04 PM
to Discussion of Rocks Clusters
Scott,

Thank you for these detailed instructions. I tried to follow them, but they
don't seem to be working (yet). Actually, removing /root/.ssh wiped out
my root password ... that was simple enough to deal with -- I reset it with
sudo passwd root. But then when I tried (logged in as either myself -- with
sudo privileges -- or as root) cluster-fork sudo rm -rf /root/.ssh, it
prompted me to log in to the first node, but when I typed the appropriate
password, I got the error message

sudo: sorry, you must have a tty to run sudo

I am running this all via ssh into my head node from a terminal on my Mac.
Do I need to be doing this directly on the head node or is there some way I
can adjust the settings of my terminal to try your suggestion from the
comfort of my office?

- David

Scott L. Hamilton

Sep 4, 2009, 1:49:54 PM
to Discussion of Rocks Clusters
David,

I forgot about sudo needing a tty and cluster-fork not providing one.
Unfortunately, you will have to ssh to each node and run the sudo commands.

Or you can try this -- I haven't tried it before, but if the sudoers file
updated, this should work.

On the head node do:
# 411put /root/.ssh/authorized_keys

This should update the authorized_keys file on all the nodes to match
the one on the head node. Once this happens you should be able to ssh to
the nodes as root without a password.

Then you will want to do this:
# cd /var/411
# make clean
# make restart

This will remove the authorized_keys file from the 411 push; you
probably don't want to leave it there.

If all this fails, powering off the nodes one at a time to force a
reinstall will update the /root/.ssh folder correctly.
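Scott's alternative, written out as one sequence. As with the earlier sketch, 411put and the /var/411 Makefile only exist on a Rocks frontend, so this is syntax-checked rather than executed:

```shell
# Reference-only transcript of the authorized_keys push; syntax-checked,
# never run here.
cat > /tmp/push-authkeys.sh <<'EOF'
# Push the head node's authorized_keys to every node via 411:
411put /root/.ssh/authorized_keys
# Once it lands, passwordless root ssh to the nodes should work.
# Then drop authorized_keys back out of the 411 file set:
cd /var/411
make clean
make restart
EOF
bash -n /tmp/push-authkeys.sh && echo "syntax OK"
```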


Scott

Steven Berg

Sep 4, 2009, 2:39:32 PM
to Discussion of Rocks Clusters
Hi All,

I recently reinstalled my cluster. While attempting to upgrade my Ethernet
driver from r8169 to r8168, I encountered some problems that I didn't
encounter last time.

I added the rpm kmod-r8168-PAE to extend-compute.xml and ran

# rocks create distro
# rocks set host boot compute-0-0 action=install
# ssh compute-0-0 "shutdown -r now"

When the node reboots, it asks you to select a language.

Did I skip a step or miss something? Somehow I was able to get this to
install properly before.

Thanks in advance.

Steve


Philip Papadopoulos

Sep 4, 2009, 2:54:44 PM
to Discussion of Rocks Clusters
Please send your extend-compute.xml file; it probably has a syntax error
(you can run xmllint extend-compute.xml as a first-level test).
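Philip's xmllint check is just a well-formedness test. The sketch below does the same first-level check with python3's ElementTree (used here only so the example runs where xmllint isn't installed); the file contents are a made-up stand-in for a real extend-compute.xml:

```shell
# First-level XML sanity check, as suggested above. A throwaway file
# stands in for extend-compute.xml; a real one lives in the Rocks
# site-profiles tree on the frontend.
f=$(mktemp)
printf '<kickstart><package>kmod-r8168-PAE</package></kickstart>\n' > "$f"
python3 -c 'import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1]); print("XML OK")' "$f"
```

An unclosed tag or stray character makes the parse fail immediately, which is exactly the class of error that silently breaks a node kickstart and drops it to the interactive language prompt.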

Also
send the output of
# rpm -qip kmod-r8168-PAE*rpm

-P


--
Philip Papadopoulos, PhD
University of California, San Diego
858-822-3628 (Ofc)
619-331-2990 (Fax)



Steven Berg

Sep 4, 2009, 3:21:31 PM
to Discussion of Rocks Clusters
Hi Philip,

Thanks for your quick reply.

I ran xmllint and found my error. The nodes now successfully reinstall with
the r8168 driver.

Steve

Gianluca Cecchi

Sep 7, 2009, 3:29:28 AM
to Discussion of Rocks Clusters
For the message

sudo: sorry, you must have a tty to run sudo

generally speaking, you can first distribute a sudoers file with this line
commented out:

Defaults requiretty

But I don't know if there are any side effects for Rocks specifically.
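A sketch of that change on a throwaway copy (edit the real /etc/sudoers with visudo, then distribute it, e.g. with 411put as discussed earlier in the thread -- never edit the live file in place):

```shell
# Comment out "Defaults requiretty" on a stand-in for /etc/sudoers:
f=$(mktemp)
printf 'Defaults    requiretty\n' > "$f"
sed -i 's/^\(Defaults[[:space:]]\{1,\}requiretty\)/# \1/' "$f"
grep '^# Defaults' "$f"
```

With requiretty disabled, non-interactive invocations like cluster-fork sudo ... no longer fail with "you must have a tty to run sudo".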

Gianluca

On Fri, Sep 4, 2009 at 7:49 PM, Scott L. Hamilton <hamil...@mst.edu>wrote:

> David,
>
> I forgot about sudo needing a tty and cluster-fork not providing one.
> Unfortunately you will have to ssh to each node and run the sudo commands.
>

