why does tcsh lstat() sibling directories at login

Jim Gottlieb

Jul 18, 2004, 7:44:40 PM
Hi. This is related to a problem I've been trying to figure out for a
while. We have our users' home directories scattered on different
machines and we use the Solaris automounter to manage it all.

But we have found that if any one machine is down, no one can log in,
thus negating the whole point of distributing the $HOMEs. Instead they
get an "NFS server...not responding" message for some random other
server.

So I decided to track down this dependency. I ran truss on the login
shell (tcsh) and found that it was lstat()ing some seemingly random
other home directories like:

23753: stat("/", 0xFFBEEEC8) = 0
23753: lstat(".", 0xFFBEEE40) = 0
23753: stat("../", 0xFFBEED30) = 0
23753: open64("../", O_RDONLY|O_NDELAY) = 3
23753: fcntl(3, F_SETFD, 0x00000001) = 0
24157: fstat64(3, 0xFFBEE038) = 0
23753: getdents64(3, 0x00083018, 1048) = 448
23753: lstat("../jimmyg", 0xFFBEEDB8) = 0
23753: lstat("../carol", 0xFFBEEDB8) = 0
23753: lstat("../toddm", 0xFFBEEDB8) = 0
23753: lstat("../clava", 0xFFBEEDB8) = 0
23753: lstat("../csrt", 0xFFBEEDB8) = 0
[...]

Now since ../ is /home and those other logins are on various machines,
any one of them being down stops everything in its tracks.

What I can't figure out is why it is doing this. If I change the
user's login shell to, say, /bin/bash, then this problem doesn't occur.
So it seems to be something particular to [t]csh.

Any ideas? Thanks...
--
NOTE: Remove the temp?? hostname to reply after two weeks.
Jim Gottlieb | E-Mail: jimmy at nccom.com |
V-Mail: +1 619 364 6912 | Fax: +1 858 274 8181
My Home Page URL: http://tokyojim.com/

Paul Eggert

Jul 19, 2004, 4:22:47 AM
At 18 Jul 2004 15:44:40 -0800, ji...@temp01.nccom.com (Jim Gottlieb) writes:

> What I can't figure out is why it is doing this. If I change the
> user's login shell to, say, /bin/bash, then this problem doesn't occur.
> So it seems to be something particular to [t]csh.
>
> Any ideas? Thanks...

Most likely tcsh is doing the moral equivalent of pwd, but is using
the old-fashioned algorithm that causes things to hang in your case.

Chris Thompson

Jul 19, 2004, 9:41:17 AM
In article <7wzn5w7...@sic.twinsun.com>,

Where did you get your tcsh from? The tcsh bundled with Solaris 9 doesn't
seem to do this: at least "truss tcsh -l" shows it opening /etc/mnttab and
avoiding the "old-fashioned algorithm" when it gets to a filing system
boundary in its climb up the directory tree.
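
To illustrate the mount-table shortcut, here is a minimal sketch in Python. It is not the Solaris libc code. It assumes the table is tab-separated with the mount point in the second field and a hexadecimal `dev=` entry in the options field, and that this value can be compared directly with `st_dev` (which glosses over device-number width details). The sample entries, the host names and the function name `parse_mnttab` are made up for the example.

```python
# Made-up sample in the assumed layout: special, mount point, fstype, options, time.
SAMPLE = (
    "hostA:/export/home/jimmyg\t/home/jimmyg\tnfs\trw,dev=4580001\t1090191880\n"
    "hostB:/export/home/carol\t/home/carol\tnfs\trw,dev=4580002\t1090191881\n"
)

def parse_mnttab(text):
    """Map each device number found in a dev= option to its mount point."""
    table = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip lines that do not have an options field
        mount_point, options = fields[1], fields[3]
        for opt in options.split(","):
            if opt.startswith("dev="):
                table[int(opt[4:], 16)] = mount_point
    return table

# At a file system boundary, the device number of the current directory
# is looked up in the table; no sibling directory is touched.
table = parse_mnttab(SAMPLE)
mount_point = table.get(0x4580001)
```

The point of the lookup is that it needs only the device number of the directory being identified, so home directories served by other machines are never stat()ed.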

Chris Thompson
Email: cet1 [at] cam.ac.uk

Casper H.S. Dik

Jul 19, 2004, 10:02:44 AM
ji...@temp01.nccom.com (Jim Gottlieb) writes:

>Hi. This is related to a problem I've been trying to figure out for a
>while. We have our users' home directories scattered on different
>machines and we use the Solaris automounter to manage it all.

Are you using the nobrowse option on /home?

>So I decided to track down this dependency. I ran truss on the login
>shell (tcsh) and found that it was lstat()ing some seemingly random
>other home directories like:

>23753: stat("/", 0xFFBEEEC8) = 0
>23753: lstat(".", 0xFFBEEE40) = 0
>23753: stat("../", 0xFFBEED30) = 0
>23753: open64("../", O_RDONLY|O_NDELAY) = 3
>23753: fcntl(3, F_SETFD, 0x00000001) = 0
>24157: fstat64(3, 0xFFBEE038) = 0
>23753: getdents64(3, 0x00083018, 1048) = 448
>23753: lstat("../jimmyg", 0xFFBEEDB8) = 0
>23753: lstat("../carol", 0xFFBEEDB8) = 0
>23753: lstat("../toddm", 0xFFBEEDB8) = 0
>23753: lstat("../clava", 0xFFBEEDB8) = 0
>23753: lstat("../csrt", 0xFFBEEDB8) = 0
>[...]

This is typical for "old getpwd"; in order to find its current
working directory the old getpwd algorithm would recursively go
up one level and then stat all files to find one that matched the
current directory.
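
As a rough illustration of that description, here is a sketch of the climb-and-scan loop in Python. It is not taken from the tcsh or libc source, and the function name `old_style_getcwd` is made up for the example.

```python
import os

def old_style_getcwd():
    """Rebuild the working directory by climbing with ".." and scanning each parent."""
    parts = []
    here = "."                       # relative path to the directory being identified
    root = os.stat("/")
    while True:
        cur = os.stat(here)
        if (cur.st_dev, cur.st_ino) == (root.st_dev, root.st_ino):
            break                    # reached the root directory
        parent = os.path.join(here, "..")
        for name in os.listdir(parent):           # corresponds to getdents64()
            try:
                # One lstat() per entry in the parent, siblings included.
                st = os.lstat(os.path.join(parent, name))
            except OSError:
                continue             # entry vanished or is unreadable; try the next one
            if (st.st_dev, st.st_ino) == (cur.st_dev, cur.st_ino):
                parts.append(name)   # this entry is the directory we came from
                break
        else:
            raise FileNotFoundError("current directory not found in its parent")
        here = parent
    return "/" + "/".join(reversed(parts))
```

Each pass through the inner loop corresponds to one of the lstat("../name") lines in the truss output. With an automounted /home, any one of those calls can block if the server behind that entry is down.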

Are you using your own build of tcsh?

I don't see this with Solaris tcsh in S8 or later.

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

Jim Gottlieb

Jul 19, 2004, 4:49:47 PM
In article <40fbd484$0$65124$e4fe...@news.xs4all.nl>,
Casper H.S. Dik <Caspe...@Sun.COM> wrote:

>This is typical for "old getpwd";

>[...]


>Are you using your own build of tcsh?
>
>I don't see this with Solaris tcsh in S8 or later.

Yup, I was using a home-grown version of tcsh from 1995. I always
replaced Sun's version with my own because it includes some additional
features compiled in (namely Japanese support).

I have put Sun's version back as /bin/tcsh and that seems to have taken
care of the problem. Thank you all.
