But we have found that if any one machine is down, no one can log in,
thus negating the whole point of distributing the $HOMEs. Instead they
get an "NFS server...not responding" message for some random other
server.
So I decided to track down this dependency. I ran truss on the login
shell (tcsh) and found that it was lstat()ing some seemingly random
other home directories like:
23753: stat("/", 0xFFBEEEC8) = 0
23753: lstat(".", 0xFFBEEE40) = 0
23753: stat("../", 0xFFBEED30) = 0
23753: open64("../", O_RDONLY|O_NDELAY) = 3
23753: fcntl(3, F_SETFD, 0x00000001) = 0
24157: fstat64(3, 0xFFBEE038) = 0
23753: getdents64(3, 0x00083018, 1048) = 448
23753: lstat("../jimmyg", 0xFFBEEDB8) = 0
23753: lstat("../carol", 0xFFBEEDB8) = 0
23753: lstat("../toddm", 0xFFBEEDB8) = 0
23753: lstat("../clava", 0xFFBEEDB8) = 0
23753: lstat("../csrt", 0xFFBEEDB8) = 0
[...]
Now since ../ is /home and those other logins are on various machines,
any one of them being down stops everything in its tracks.
What I can't figure out is why it is doing this. If I change the
user's login shell to, say, /bin/bash, then this problem doesn't occur.
So it seems to be something particular to [t]csh.
Any ideas? Thanks...
--
NOTE: Remove the temp?? hostname to reply after two weeks.
Jim Gottlieb | E-Mail: jimmy at nccom.com |
V-Mail: +1 619 364 6912 | Fax: +1 858 274 8181
My Home Page URL: http://tokyojim.com/
> What I can't figure out is why it is doing this. If I change the
> user's login shell to, say, /bin/bash, then this problem doesn't occur.
> So it seems to be something particular to [t]csh.
>
> Any ideas? Thanks...
Most likely tcsh is doing the moral equivalent of pwd, but is using
the old-fashioned algorithm that causes things to hang in your case.
Where did you get your tcsh from? The tcsh bundled with Solaris 9 doesn't
seem to do this: at least "truss tcsh -l" shows it opening /etc/mnttab and
avoiding the "old-fashioned algorithm" when it gets to a filing system
boundary in its climb up the directory tree.
Chris Thompson
Email: cet1 [at] cam.ac.uk
>Hi. This is related to a problem I've been trying to figure out for a
>while. We have our users' home directories scattered on different
>machines and we use the Solaris automounter to manage it all.
Are you using the nobrowse option on /home?
>So I decided to track down this dependency. I ran truss on the login
>shell (tcsh) and found that it was lstat()ing some seemingly random
>other home directories like:
>23753: stat("/", 0xFFBEEEC8) = 0
>23753: lstat(".", 0xFFBEEE40) = 0
>23753: stat("../", 0xFFBEED30) = 0
>23753: open64("../", O_RDONLY|O_NDELAY) = 3
>23753: fcntl(3, F_SETFD, 0x00000001) = 0
>24157: fstat64(3, 0xFFBEE038) = 0
>23753: getdents64(3, 0x00083018, 1048) = 448
>23753: lstat("../jimmyg", 0xFFBEEDB8) = 0
>23753: lstat("../carol", 0xFFBEEDB8) = 0
>23753: lstat("../toddm", 0xFFBEEDB8) = 0
>23753: lstat("../clava", 0xFFBEEDB8) = 0
>23753: lstat("../csrt", 0xFFBEEDB8) = 0
>[...]
This is typical for "old getpwd"; in order to find its current
working directory the old getpwd algorithm would recursively go
up one level and then stat all files to find one that matched the
current directory.
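
For anyone who has not seen that algorithm, here is a rough sketch of it in
Python. It is an illustration of the technique only, not the code from tcsh
or libc; the function name and the `start`/`root` parameters are made up for
the example. It reproduces the pattern in the truss output: stat the root,
lstat ".", open "../", read its entries, and lstat each one until the device
and inode numbers match.

```python
import os

def old_style_getcwd(start=".", root="/"):
    """Rebuild the path of `start` by climbing ".." one level at a time.

    Hypothetical helper for illustration; `root` is where the climb stops.
    """
    stop = os.stat(root)
    parts = []
    here = start
    while True:
        cur = os.lstat(here)
        if (cur.st_dev, cur.st_ino) == (stop.st_dev, stop.st_ino):
            break  # reached the top of the tree
        parent = os.path.join(here, "..")
        for name in os.listdir(parent):  # the getdents64() in the trace
            # One lstat() per sibling. If a sibling is an NFS mount whose
            # server is down, this is the call that blocks.
            st = os.lstat(os.path.join(parent, name))
            if (st.st_dev, st.st_ino) == (cur.st_dev, cur.st_ino):
                parts.append(name)
                break
        else:
            raise OSError("current directory not found in its parent")
        here = parent
    return "/" + "/".join(reversed(parts))
```

Run from a home directory under /home, the inner loop touches every sibling
it reads before it finds the match, which is why one dead server is enough
to hang the login.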
Are you using your own build of tcsh?
I don't see this with Solaris tcsh in S8 or later.
Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
>This is typical for "old getpwd";
>[...]
>Are you using your own build of tcsh?
>
>I don't see this with Solaris tcsh in S8 or later.
Yup, I was using a home-grown version of tcsh from 1995. I always
replaced Sun's version with my own because it includes some additional
features compiled in (namely Japanese support).
I have put Sun's version back as /bin/tcsh and that seems to have taken
care of the problem. Thank you all.