[Rocks-Discuss] automount broken

Craig Plaisance

Mar 2, 2011, 11:25:36 PM

to Discussion of Rocks Clusters

I just reinstalled my cluster and then added two entries to auto.share.
For a while, everything worked fine and I was able to access these
mounts, but all of a sudden I can't access any of the automounts (in
/home or /share). When I try to access one from the frontend, I get:

[root@atlas etc]# cd /share/apps
-bash: cd: /share/apps: No such file or directory

I checked that /export/apps does exist. I did two things between the
time it worked and the time it stopped working. First, I appended the
users' entries from a pre-install backup of passwd, group, shadow, and
gshadow to the new versions. Second, I added all the compute nodes, each
with the commands:

rocks add host $node cpus=$cpus rack=$rack rank=$rank membership=$membership
rocks add host interface $node eth0
rocks set host interface ip $node eth0 $ip
rocks set host interface name $node eth0 $name
rocks set host interface mac $node eth0 $mac0
rocks set host interface module $node eth0 tg3
rocks set host interface subnet $node eth0 private
rocks add host interface $node eth1
rocks set host interface mac $node eth1 $mac1
rocks set host interface module $node eth1 tg3
rocks set host boot $node action=install

and then ran "rocks sync users" and "rocks sync config".

I don't see how either of these could affect automounting. Running
"service autofs restart" doesn't fix anything, and neither does rebooting
the frontend. Any ideas on how to fix this? I can't find anything useful
on this list or with Google. Thanks

Here is my auto.master:

[root@atlas etc]# cat auto.master
/share /etc/auto.share --timeout=1200
/home /etc/auto.home --timeout=1200

auto.share:

[root@atlas etc]# cat auto.share
apps atlas.local:/export/&
bigtmp atlas.local:/vault/&
common atlas.local:/vault/&

and auto.home:

[root@atlas etc]# cat auto.home
ac4wd atlas.local:/vault/home/ac4wd
accelrys atlas.local:/vault/home/accelrys
cb2pa atlas.local:/vault/home/cb2pa
cpp6f atlas.local:/vault/home/cpp6f
ddh9r atlas.local:/vault/home/ddh9r
fl3p atlas.local:/vault/home/fl3p
hx4e atlas.local:/vault/home/hx4e
mff7d atlas.local:/vault/home/mff7d
mn4n atlas.local:/vault/home/mn4n
mp5ke atlas.local:/vault/home/mp5ke
nks4a atlas.local:/vault/home/nks4a
ouu5b atlas.local:/vault/home/ouu5b
qq3b atlas.local:/vault/home/qq3b
qt3c atlas.local:/vault/home/qt3c
rr2tn atlas.local:/vault/home/rr2tn
yc6n atlas.local:/vault/home/yc6n
bh8hp atlas.local:/vault/home/bh8hp
bpp6u atlas.local:/vault/home/bpp6u
rc5up atlas.local:/vault/home/rc5up
zt2ba atlas.local:/vault/home/zt2ba
tang atlas.local:/vault/home/tang
lijun atlas.local:/vault/home/lijun
loveless atlas.local:/vault/home/loveless
vasp_user atlas.local:/vault/home/vasp_user
bjork atlas.local:/vault/home/bjork
mah8js atlas.local:/vault/home/mah8js
laymanka atlas.local:/vault/home/laymanka
xf5az atlas.local:/vault/home/xf5az
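
As a side check on the maps above: autofs replaces the & in a map entry with the lookup key, so "apps atlas.local:/export/&" should resolve to atlas.local:/export/apps. Below is a minimal sketch for printing what each entry expands to. The helper name expand_map is made up for illustration; it is not part of autofs or Rocks, and it only handles simple single-location entries like the ones shown here.

```shell
# expand_map: hypothetical helper, not an autofs or Rocks tool.
# Reads an indirect map (key [options] location) on stdin and prints
# each key followed by its location, with "&" replaced by the key.
expand_map() {
    awk '
        /^[ \t]*(#|$)/ { next }    # skip comments and blank lines
        {
            key = $1
            loc = $NF              # assume the location is the last field
            gsub(/&/, key, loc)    # autofs substitutes the key for "&"
            print key, loc
        }
    '
}

# The three auto.share entries from this post, fed in as sample input.
expand_map <<'EOF'
apps atlas.local:/export/&
bigtmp atlas.local:/vault/&
common atlas.local:/vault/&
EOF
```

On the frontend the same function could be fed the real file with "expand_map < /etc/auto.share". If the expanded paths look right, the maps themselves are probably not the problem and the server name (atlas.local) is the next thing to check.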

Richard Chang

Mar 3, 2011, 2:02:11 AM

to npaci-rocks...@sdsc.edu

Is the NFS server running?

Run the following on the frontend:

service nfs status
chkconfig --list |grep nfs
showmount -e

RC

Craig Plaisance

Mar 3, 2011, 8:06:05 AM

to Discussion of Rocks Clusters

Here is the output of those commands, but I don't think it is an NFS
issue, because these directories are not even automounting on the
frontend. For example, I can't even log in as a non-root user, since the
/home mount is not working.

[root@atlas ~]# service nfs status
rpc.mountd (pid 3839) is running...
nfsd (pid 3836 3835 3834 3833 3832 3831 3830 3829) is running...
rpc.rquotad (pid 3783) is running...

[root@atlas ~]# chkconfig --list |grep nfs
nfs 0:off 1:off 2:on 3:on 4:on 5:on 6:off
nfslock 0:off 1:off 2:off 3:on 4:on 5:on 6:off

[root@atlas ~]# showmount -e
Export list for atlas.che.virginia.edu:
/vault 10.1.0.0/255.255.0.0,10.1.1.1
/state/partition1 10.1.0.0/255.255.0.0,10.1.1.1

Craig Plaisance

Mar 3, 2011, 9:06:15 AM

to Discussion of Rocks Clusters

It seems my problem was that I added a compute node and gave it the
same IP address as atlas.local. I removed the compute node from the
database (rocks remove host <compute-node>), but I still can't access
atlas.local, even after rebooting ("ssh atlas.local" fails). How do I
fix this? Thanks
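
One way to look for this kind of address clash is to scan a hosts-format file for IPs claimed by more than one line. The sketch below assumes the conflicting entry ended up in /etc/hosts, which may or may not be the case on a given frontend. The helper name find_duplicate_ips and the sample addresses and node names are made up for illustration; none of them come from Rocks.

```shell
# find_duplicate_ips: hypothetical helper, not a Rocks command.
# Reads hosts-file lines (IP name [aliases...]) on stdin and prints every
# IP that appears on more than one line, followed by the names claiming it.
find_duplicate_ips() {
    awk '
        /^[ \t]*(#|$)/ { next }              # skip comments and blank lines
        {
            count[$1]++
            names[$1] = names[$1] " " $2     # keep the first name on each line
        }
        END {
            for (ip in count)
                if (count[ip] > 1)
                    print ip ":" names[ip]
        }
    '
}

# Made-up sample data; on a real frontend, feed it /etc/hosts instead.
find_duplicate_ips <<'EOF'
127.0.0.1    localhost
10.1.1.1     atlas.local atlas
10.1.255.254 compute-0-0.local compute-0-0
10.1.1.1     compute-0-1.local compute-0-1
EOF
```

With the sample input this reports 10.1.1.1 as shared by atlas.local and compute-0-1.local. If a stale entry like that is still present after the host was removed from the database, the generated files were likely not refreshed; "getent hosts atlas.local" shows what the name currently resolves to on the frontend.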

Laotsao

Mar 3, 2011, 9:28:37 AM

to Discussion of Rocks Clusters

Double-check the DNS for the local zone.
Double-check 411.
Dump out the whole Rocks setup and check it again.
Since you added eth1 to the compute nodes, this will impact SGE.

Sent from my iPad
Laotsao

Laotsao

Mar 3, 2011, 8:49:55 AM

to Discussion of Rocks Clusters

Check or restart autofs on the frontend.

Sent from my iPad
Laotsao
