I'm brand new to cluster computing so I'd first like to apologize for
my ignorance and thank you for your help and patience :). I am a
student at University of Nevada Reno tasked with bringing back to life
an old cluster. Things are going pretty well except for a couple of snags.
When a user logs into any of the compute nodes with ssh we get this:
Could not chdir to home directory /export/home/<usrname>: No such file
or directory
The contents of /etc/auto.master
/share /etc/auto.share --timeout=1200
/home /etc/auto.home --timeout=1200
The contents of /etc/auto.home
install front_end.local:/export/home/&
user1 front_end.local:/export/home/user1/&
user2 front_end.local:/export/home/user2/&
user3 front_end.local:/export/home/user3/&
The contents of /etc/exports
/export 10.1.1.1(rw,async,no_root_squash)
10.1.1.0/255.255.255.0(rw,async)
411 is serving the /etc/auto.* files correctly; I see the same
/etc/auto.* files on all the compute nodes.
I'm just not very familiar with autofs in the first place, is there
something I'm missing?
Thanks very much for your help,
Joel
Did you do "rocks sync users"?
Gus Correa
Joel Larsen wrote:
> Hello,
>
> I'm brand new to cluster computing so I'd first like to apologize for my
> ignorance and thank you for your help and patience :). I am a student
> at University of Nevada Reno tasked with bringing back to life an old
> cluster. Things are going pretty well except for a couple snags.
>
> when a user logs into any of the compute nodes with ssh we get this:
>
> Could not chdir to home directory /export/home/<usrname>: No such file
> or directory
>
> The contents of /etc/auto.master
>
> /share /etc/auto.share --timeout=1200
> /home /etc/auto.home --timeout=1200
>
>
> The contents of /etc/auto.home
>
> install front_end.local:/export/home/&
> user1 front_end.local:/export/home/user1/&
> user2 front_end.local:/export/home/user2/&
> user3 front_end.local:/export/home/user3/&
>
>
> The contents of /etc/exports
>
> /export 10.1.1.1(rw,async,no_root_squash) 10.1.1.0/255.255.255.0(rw,async)
>
> 411 is servicing the /etc/auto.* files correctly, i see the same
> /etc/auto.* on all the compute nodes
What version of ROCKS are you running?
autofs config files vary by version of autofs, so yours
may just not be syntactically correct.
Ian
Sorry, I forgot to mention that.
Joel
I gave that a try, but it did not seem to help.
I also followed it with a 'cluster-fork 411get --all'.
Still scratching my head.
Joel
on a compute node, what is the output of:
# cat /etc/hosts
# cat /etc/auto.home
- gb
#cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
xxx.xxx.xxx.xxx <domain>
10.1.1.253 compute-0-0.local compute-0-0
#cat /etc/auto.home
install front_end.local:/export/home/&
user1 front_end.local:/export/home/user1/&
user2 front_end.local:/export/home/user2/&
user3 front_end.local:/export/home/user3/&
thanks,
Joel
on the compute node, what is the output of:
# host front_end.local
- gb
# host front_end.local
front_end.local has address 10.1.1.1
Joel
-----Original Message-----
From: npaci-rocks-dis...@sdsc.edu
[mailto:npaci-rocks-dis...@sdsc.edu] On Behalf Of Greg Bruno
Sent: Tuesday, April 14, 2009 9:22 PM
To: Discussion of Rocks Clusters
Subject: Re: [Rocks-Discuss] can't find /exports/home on compute nodes
It seems that I have the same situation -- autofs doesn't mount.
For me,
[root@compute-0-0 ~]# host front_end.local
Host front_end.local not found: 3(NXDOMAIN)
--
With best regards,
Grigory Shamov
--- On Tue, 4/14/09, Greg Bruno <greg....@gmail.com> wrote:
> From: Greg Bruno <greg....@gmail.com>
> Subject: Re: [Rocks-Discuss] can't find /exports/home on compute nodes
> To: "Discussion of Rocks Clusters" <npaci-rocks...@sdsc.edu>
> Date: Tuesday, April 14, 2009, 9:21 PM
> On Tue, Apr 14, 2009 at 8:47 PM, Joel
> Larsen <lars...@unr.nevada.edu>
> wrote:
> > #cat /etc/hosts
> >
> > 127.0.0.1 localhost.localdomain localhost
> > xxx.xxx.xxx.xxx <domain>
> > 10.1.1.253 compute-0-0.local compute-0-0
> >
> >
> > #cat /etc/auto.home
Somewhere, your auto.home config got messed up. Based upon
your config, your user home dirs are located in
/export/home/user*/user* - which is most likely incorrect.
Your auto.home should be (for example):
user1 front_end.local:/export/home/&
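To see concretely why those entries break, here is a small shell sketch (automount does this substitution internally; the echo lines just emulate it): autofs replaces every '&' in the location with the map key, so a key of user1 with a location ending in /user1/& produces a doubled path.

```shell
# Emulate autofs '&' substitution: '&' in the location is replaced
# by the map key (the first field of the auto.home line).
key="user1"

bad="front_end.local:/export/home/user1/&"    # entry from the original map
echo "${bad//&/$key}"
# -> front_end.local:/export/home/user1/user1  (path doubled - the bug)

good="front_end.local:/export/home/&"         # the corrected entry
echo "${good//&/$key}"
# -> front_end.local:/export/home/user1       (what was intended)
```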
Were these systems upgraded to ROCKS 5.1?
Ian Kaufman
Research Systems Administrator
UC San Diego, Jacobs School of Engineering
I've just installed Rocks 5.1 and have similar problems.
useradd creates user home directories under /export/home,
which isn't a problem as long as the directory gets exported by NFS, but somehow it doesn't.
--
WBR, Grigory Shamov
--- On Wed, 4/15/09, Kaufman, Ian <ikau...@soe.ucsd.edu> wrote:
On a new install, /export should be a symlink to /state/partition1
by default, and autofs should be configured to mount /home/user
using /export/home/user. Is autofs working correctly? Did you
run "rocks sync users"?
Ian Kaufman
Research Systems Administrator
UC San Diego, Jacobs School of Engineering
> -----Original Message-----
> From: npaci-rocks-dis...@sdsc.edu [mailto:npaci-rocks-
> discussio...@sdsc.edu] On Behalf Of Grigory Shamov
> Sent: Wednesday, April 15, 2009 10:37 AM
> To: Discussion of Rocks Clusters
> Subject: Re: [Rocks-Discuss] can't find /exports/home on compute
> nodes
>
>
try:
# rocks sync users
then, on a compute node, send us the output of:
# cat /etc/auto.home
- gb
I've added one user and did rocks sync users. If I mount /export manually on the nodes, there is no problem -- I can ssh without a password, etc.
How do I check whether autofs works correctly?
It seems to have entries for the user in its /etc/auto.* files,
but it doesn't mount /share or /home or /export/home.
> > Hi Ian
> >
> > I've just installed Rocks 5.1 and have similar
> problems.
> > useradd creates user home directories under
> /export/home.
> > Which isnt a problem as long as the directory gets
> exported by NFS,
> > but it somehow doesnt.
> >
> > --
> > WBR, Grigory Shamov
> [root@compute-0-0 ~]# host front_end.local
> Host front_end.local not found: 3(NXDOMAIN)
On compute-0-0, what is the output of "host front_end.local"?
What is in /etc/hosts on both the node and the head?
Ian Kaufman
Research Systems Administrator
UC San Diego, Jacobs School of Engineering
If they are not being automounted under /home, then the problem is with
autofs.
I'll describe how mine looks on a recent install of Rocks 5.1. In your
/etc/exports file there should be a line which exports the /export
directory to the cluster nodes. Then there should be a file
/etc/auto.master which has a line like:
/home /etc/auto.home --timeout=1200
Finally there should be a /etc/auto.home file which has a line for each
user in the cluster that looks like:
username frontendname.local:/export/home/username
The same files (except /etc/exports) should be present on the compute nodes.
Send the contents of those files if they are all there but things are
still not working. Another option is try restarting the autofs daemon
on the frontend and compute nodes.
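Pulling that description together, the three files might look like this on a stock install (a sketch only; "frontendname" and "username" are placeholders, and the subnet is the one used elsewhere in this thread):

======
# /etc/exports (frontend only)
/export 10.1.1.0/255.255.255.0(rw,async)

# /etc/auto.master (frontend and compute nodes)
/home /etc/auto.home --timeout=1200

# /etc/auto.home (frontend and compute nodes)
username frontendname.local:/export/home/username
======

After any edits, restart the autofs daemon so the new maps take effect.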
-John Wilkinson
What is in /etc/auto.master on both the compute node and the head?
/etc/auto.home?
By using the manual partitioning scheme, and creating /export as
a partition (as opposed to the default of /state/partition1 and
symlinking /export to /state/partition1), you may have broken a
few things.
I had a bad experience in the past (with Rocks 4.3) when I tried to
manually create a frontend partition named "/export".
I eventually reinstalled the whole thing.
Was this what you did?
Not sure if this would be a problem with Rocks 5.1.
As Ian said, /export is a soft link to /state/partition1.
This soft link is created automatically by Rocks, I think.
The User Guide was (still is?) a bit confusing about this.
Gus Correa
Like, for /etc/auto.home
======
apps ruthenium.local:/export/&
bio ruthenium:/export/&
======
--- On Wed, 4/15/09, Greg Bruno <greg....@gmail.com> wrote:
> From: Greg Bruno <greg....@gmail.com>
> Subject: Re: [Rocks-Discuss] can't find /exports/home on compute nodes
> To: "Discussion of Rocks Clusters" <npaci-rocks...@sdsc.edu>
> Date: Wednesday, April 15, 2009, 10:53 AM
> On Wed, Apr 15, 2009 at 10:36 AM,
> Grigory Shamov <ga...@yahoo.com>
> wrote:
> >
> > Hi Ian
> >
> > I've just installed Rocks 5.1 and have similar
> problems.
> > useradd creates user home directories under
> /export/home.
> > Which isnt a problem as long as the directory gets
> exported by NFS, but it somehow doesnt.
>
The command host front_end.local gives the error in the quote below.
On the server, /etc/hosts is
====
127.0.0.1 localhost.localdomain localhost
191.168.11.1 my_cluster_name.local ruthenium # originally frontend-0-0
191.168.11.254 compute-0-0.local compute-0-0
191.168.11.253 compute-0-1.local compute-0-1
191.168.11.252 compute-0-2.local compute-0-2
191.168.11.250 compute-0-4.local compute-0-4
192.168.10.100 my_cluster_name.fqdn
====
On the compute-0-0 it is
====
127.0.0.1 localhost.localdomain localhost
192.168.10.100 my_cluster_name.fqdn
191.168.11.254 compute-0-0.local compute-0-0
====
(The latter is strange -- don't all the computes have to have each other's addresses?)
--
WBR, Grigory Shamov
University of Manitoba
--- On Wed, 4/15/09, Kaufman, Ian <ikau...@soe.ucsd.edu> wrote:
> From: Kaufman, Ian <ikau...@soe.ucsd.edu>
> Subject: Re: [Rocks-Discuss] can't find /exports/home on compute nodes
> To: "Discussion of Rocks Clusters" <npaci-rocks...@sdsc.edu>
> Date: Wednesday, April 15, 2009, 10:56 AM
> Also, it looks like you have a DNS
> problem.
>
> > [root@compute-0-0 ~]# host front_end.local
> > Host front_end.local not found: 3(NXDOMAIN)
>
> On compute-0-0, what is the output of "host
> front_end.local"?
> What is in /etc/hosts on both the node and the head?
>
> Ian Kaufman
> Research Systems Administrator
> UC San Diego, Jacobs School of Engineering
>
>
>
> The output of cat /etc/auto.home is like this:
> ======
> my_user my_server_name.local:/export/home/my_user
> ======
>
> Like, for /etc/auto.home
> ======
> apps ruthenium.local:/export/&
> bio ruthenium:/export/&
> ======
>
That looks fine, but I suspect there is still a DNS
issue.
Again, on the head node and a compute node, what is in
/etc/hosts? Does the compute node properly resolve
"my_server_name.local"? Is the head also called "ruthenium"?
Are /apps and /bio getting mounted correctly on the compute
node?
OK, my earlier response crossed paths ...
>
> The command host front_end.local gives the error in the quote below.
Try "host ruthenium.local" on the compute node (I assume ruthenium is
the frontend's name, and not my_cluster_name ;) )
>
> On the server, /etc/hosts is
> ====
> 127.0.0.1 localhost.localdomain localhost
> 191.168.11.1 my_cluster_name.local ruthenium # originally
> frontend-0-0
> 191.168.11.254 compute-0-0.local compute-0-0
> 191.168.11.253 compute-0-1.local compute-0-1
> 191.168.11.252 compute-0-2.local compute-0-2
> 191.168.11.250 compute-0-4.local compute-0-4
> 192.168.10.100 my_cluster_name.fqdn
> ====
That looks good, although you may run into DNS issues since
the frontend does not talk to an external network. Also, is
191.168.X.X correct for your private subnet? That is a bad
idea - 191.168 is a routable address. If this cluster ever
gets connected to a public network, you may run into problems.
Anyway, I expect you set up your cluster off the Uni's public
network? If so, do you ever plan on moving it?
>
> On the compute-0-0 it is
> ====
> 127.0.0.1 localhost.localdomain localhost
> 192.168.10.100 my_cluster_name.fqdn
> 191.168.11.254 compute-0-0.local compute-0-0
> ====
>
> (The latter is strange -- dont all the computes have to have each
> other's addresses?)
>
Not strange - the nodes will query the frontend's DNS to get the
other node addresses. Try "host compute-0-1" from compute-0-0 -
it will return the correct info (assuming it all works correctly).
-John
The front end _might_ have been upgraded... I don't know, as it was
up when I got here. However, there was only 1 compute node up when I
signed on and it had this problem just like the other 9 nodes I
kickstarted.
I tried changing the entries in /etc/auto.home to the format you
suggest and it does not change the behavior when logging into the
compute nodes.
Thanks very much for all the help... is there anything else I can
share with you that might give us a better clue?
Joel
>
> Hi Ian,
>
> The front end _might_ have been upgraded. . . I don't know as it was
> up when i got here. However, there was only 1 compute node up when
> i
> signed on and it had this problem just like the other 9 nodes i
> kickstarted.
>
> I tried changing the entries in /etc/auto.home to the format you
> suggest and it does not change the behavior when logging into the
> compute nodes.
>
> Thanks very much for all the help. . .is there anything else I can
> share with you that might give us a better clue?
>
> Joel
>
Did you restart autofs after making the changes?
What comes back after "ls -l /" on the frontend?
What about "ls -l /" on the node?
The problem is /export should not exist on the nodes, and yet the
autofs maps are trying to mount /export on the nodes.
Try configuring /etc/auto.home on the node to:
user1 front_end.local:/export/home/user1
Just to force the issue and not use the wildcard setup. Make
sure you restart autofs on the node. Then try "su - user1" and
see what happens. If it fails, check the logs. Better yet, open
a second terminal on the node, and type "tail -f /var/log/messages",
and then restart autofs and try to su. Also examine /var/log/daemon.
So after checking into all those things, I've finally got it. This is
what I did:
First tried to manually mount front_end.local:/export
#mount front_end.local:/export /test
<something> Permission denied.
Aha!
So after changing my /etc/exports to:
/export/home compute-0-0(rw,no_root_squash)
/export/home compute-0-1(rw,no_root_squash)
/export/home compute-0-2(rw,no_root_squash)
/export/home compute-0-3(rw,no_root_squash)
/export/home compute-0-4(rw,no_root_squash)
/export/home compute-0-5(rw,no_root_squash)
/export/home compute-0-6(rw,no_root_squash)
/export/home compute-0-7(rw,no_root_squash)
/export/home compute-0-8(rw,no_root_squash)
/export/home compute-0-9(rw,no_root_squash)
/export/home compute-0-10(rw,no_root_squash)
Users can log into all the compute nodes with no problems with /home
Thanks for everybody's help!
Joel
>
> So after checking into all those things, I've finally got it. This
> is
> what I did:
>
> First tried to manually mount front_end.local:/export
> #mount front_end.local:/export /test
> <something> Permission denied.
>
> Aha!
>
> So after changing my /etc/exports to:
>
> /export/home compute-0-0(rw,no_root_squash)
> /export/home compute-0-1(rw,no_root_squash)
> /export/home compute-0-2(rw,no_root_squash)
> /export/home compute-0-3(rw,no_root_squash)
> /export/home compute-0-4(rw,no_root_squash)
> /export/home compute-0-5(rw,no_root_squash)
> /export/home compute-0-6(rw,no_root_squash)
> /export/home compute-0-7(rw,no_root_squash)
> /export/home compute-0-8(rw,no_root_squash)
> /export/home compute-0-9(rw,no_root_squash)
> /export/home compute-0-10(rw,no_root_squash)
>
> Users can log into all the compute nodes with no problems with /home
>
> Thanks for everybodies help!
>
> Joel
While it's cool you got things working, I suspect something
is still not right. You should not have to list each node
in /etc/exports, ROCKS by default sets up /etc/exports on
the frontend to push things out to the entire private net
automatically, i.e. it should look like:
/export 10.1.1.1(rw,async,no_root_squash) \
10.1.1.0/255.255.255.0(rw,async)
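As a sketch of why that single line covers every node: NFS matches a client by ANDing its address with the netmask and comparing the result to the network address, so every 10.1.1.x node hits the 10.1.1.0/255.255.255.0 entry (addresses below are the ones from this thread):

```shell
# AND a client address with the netmask; if the result equals the
# network address, the client matches the subnet entry in /etc/exports.
ip="10.1.1.253"        # a compute node address from earlier in the thread
mask="255.255.255.0"
IFS=. read -r a b c d <<< "$ip"
IFS=. read -r m1 m2 m3 m4 <<< "$mask"
echo "$((a & m1)).$((b & m2)).$((c & m3)).$((d & m4))"
# -> 10.1.1.0, so this client matches 10.1.1.0/255.255.255.0(rw,async)
```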
Ian
What you list there is exactly what mine previously contained... so
it _was_ correct. What else could I look into as being incorrect?
Joel
I think networking might still be off, or DNS. It sounds like the
system may have been modified from "stock" ROCKS, and you inherited
something that might take quite some time to unravel.
On the front end, what is the output of:
"rocks list network"
"rocks list host"
"rocks list host interface front_end"
"rocks list host interface compute-0-0"
"more /etc/fstab"
If you don't want this info to appear on the net, you can send
it to me directly.
Ian Kaufman
Research Systems Administrator
UC San Diego, Jacobs School of Engineering
ikaufmanATsoeDOTucsd.edu x49716