>--
>Bjorn Engsig, Domain: ben...@oracle.nl, ben...@oracle.com
> Path: uunet!mcsun!orcenl!bengsig
-- Mitch Patenaude
g-pa...@steer.calstate.edu
And where do you think this reference resides? Directory entries are
just (filename, inode number) pairs, nothing specialized here. The
symbolic link is thus characterized by the mode bits of the inode
(S_IFLNK from <sys/stat.h>); the 'reference to another file' is
contained in the disk blocks for this special file.
Bjorn's question seems a valid one; a rename() emulated by a
link()/unlink() pair fails for the above reason in the case of a
symbolic link (one is left with a hard link to the original
softlinked-to file, instead of a different name for the previous soft
link).
Leo.
Soft and hard links don't exactly mix well: the semantics are too
different. I vaguely recollect one of the triple [Kernighan|Ritchie|Thompson]
commenting that BSD should have removed the hard links if they were planning
on adding soft ones.
Specifically, the definition of a symlink as "something that when
referenced is evaluated to produce a pathname" leads one to interpret
the creation of a link to a symlink as creation of a link to the referred-to
thing... Yet a symlink has a separate existence, can be seen in an ls, etc.,
which leads the unwary to trip over the (operational) definition and go
crashing into "this makes no sense" land.
If it had a better definition, maybe I could reason about it better...
--dave
--
David Collier-Brown, | dav...@Nexus.YorkU.CA, ...!yunexus!davecb or
72 Abitibi Ave., | {toronto area...}lethe!dave
Willowdale, Ontario, | "And the next 8 man-months came up like
CANADA. 416-223-8968 | thunder across the bay" --david kipling
Sorry, but that just isn't so, at least on the 4.2/4.3 BSD file system
and on other UNIX file systems that implement symbolic links. The
target pathnames of symbolic links are *not* stored in directory
entries, but in data blocks that are reached via real, honest-to-god
symlink inodes. Where do you think the information reported by
lstat(2) comes from?
The answer to Bjorn's question is, sorry, that's just the way the
link(2) system call works. There's no way to tell it *not* to resolve
its first argument into a hard link if it isn't one already. So,
barring the invention of a new system call (llink(2) anyone?),
there's no way to get a handle on a symlink itself, as opposed to
the target of that symlink.
However, it is possible to achieve the same effect, provided you're
willing to go beyond the bounds of system calls and diddle directly
with the on-disk representation of the file system. If you know the
inode number of an existing symlink file (not its target, but the
symlink itself), you could rewrite an existing directory block so that
it has a new entry -- an entry whose link is to the desired symlink
inode. Of course, your system probably makes no guarantees about
handling multiple hard links to symlinks, so you're on your own if
you actually mount the file system and start using the links. In
particular, it would be interesting to observe the effects of
deleting one of the symlinks and then trying to resolve the remaining
one....
------------------------------------------------------------------------
Bob Goudreau +1 919 248 6231
Data General Corporation
62 Alexander Drive goud...@dg-rtp.dg.com
Research Triangle Park, NC 27709 ...!mcnc!rti!xyzzy!goudreau
USA
And article <7...@ehviea.ine.philips.nl> by l...@ehviea.ine.philips.nl
(Leo de Wit) does make me a bit happy, since he says:
|Bjorn's question seems a valid one; a rename() emulated by a
|link()/unlink() pair fails for the above reason in the case of a
|symbolic link (one is left with a hard link to the original
|softlinked-to file, instead of a different name for the previous soft
|link).
Yes, that's also what I get by RTFM, but I'd like to know _why_ it is done
like that. I'd say that the semantics of this are hard to understand:
unlink() does remove a symbolic link and not what it points to (fortunately :-),
so why doesn't link() do the reverse? I'd like to hear if there would be any
bad consequences of this, even though I know that it is probably much too late
to change things.
Bjorn> Yes, that's also what I get by RTFM, but I'd like to know _why_ it is done
Bjorn> like that. I'd say that the semantics of this are hard to understand:
Bjorn> unlink() does remove a symbolic link and not what it points to (fortunately :-),
Bjorn> so why doesn't link() do the reverse?
And what would be the reverse? unlink() removes a reference to a
file. There are two kinds of references to a file: hard links and
symbolic links. Thus, unlink() removes "a symbolic link [which is the
reference] and not what it points to." link() creates a reference to
a file. The old file can be specified either by naming a hard link or
by naming a symbolic link. In either case, this is just a way of
referencing the actual file to be linked to.
--
-steve
------------------------------------------------------------------
Steve Clark
National Institute for Standards and Technology (formerly NBS)
cl...@cme.nist.gov ..uunet!cme-durer!clark
(301)975-3595 / 3544
"I'm workin' on it!"
There's no reason symbolic links have to have their own inode. They
could just as well be done entirely in the directory entry. Probably
cheaper to do this way too.
--
Felix Lee fl...@shire.cs.psu.edu *!psuvax1!flee
I hope not, because symbolic links are dependent on hard links. All that
a hard link is is a directory entry. With no directory entries you have
no way to reach the inodes.
This has been considered before, but there are some tricky
implementation details. People who were going to do that always wound
up putting short symlinks into the i_db[] field of the inode instead.
(This requires only a few changes to the file system code and to fsck,
icheck, dcheck, and dump/rdump. I think. Using i_db+i_ib lets you
store up to 60 bytes in the inode directly. Of course, if you are
going to do this, you might as well do it for *all* files rather than
just symlinks; the only thing that makes symlinks special here is that
they never grow.)
As to the original question (why does link() link to the target of a
symlink rather than the symlink itself): on one level, the answer is
simply that link() does a namei() with `LOOKUP | FOLLOW'. If you take
out the FOLLOW, links to symlinks will link to the symlink.
On another level, though, the answer is `because that seemed to be what
we wanted' (supply whomever you like for `we'). Perhaps this was
because the designers wanted to keep open the possibility of
implementing symlinks entirely in the directory structure.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: ch...@cs.umd.edu Path: uunet!mimsy!chris
The trouble with symlinks is that operations on them mostly refer to
the linked-to file, sometimes to the link itself; since a program can
lstat() and readlink(), it can decide whether or not to follow a
symbolic link. Sometimes it makes sense to do one thing, another time
the other thing, a third time to do both, a fourth time it is not clear
what should be done. It's both a powerful and troublesome mechanism.
The following sequence might illustrate this a bit:
$ mkdir a
$ ln a b
a is a directory # OK, no hard links to a directory.
$ ln -s a b
$ ln b c # Remember Bjorn Engsig's post, this attempts to link c to a.
b is a directory # OK, like above: no hard links to a directory.
$ rmdir b
rmdir: b: Not a directory # Note b's dual nature.
| If it had a better definition, maybe I could reason about it better...
I sometimes like to see a file as a set of data blocks (the actual
contents) having a label attached to it: the filename. Hard links are
just extra labels to the same set of data blocks. Soft links are also
labels, but they are not attached to a set of data blocks, but to other
labels. So hard links are always directly referring to a set of data
blocks (in fact indistinguishable from them), soft links can refer to
the label they're attached to, or to the label that label is attached
to, or ..., or to the set of data blocks a label is (finally) attached
to (a label representing a hard link).
Another way of viewing it is to see the links as pointers, the file
contents as the pointed to data. You can have lots of pointers to the
same data (hard links); the pointers differ, their values do not. Soft
links are like pointers to pointers.
Leo.
read mitch's reply carefully. the fact is that sym links need not be implemented
as peculiar files at all. a particularly elegant implementation for
BSD-style directories is to have the ``content'' of the sym link
follow the entry name. no need to use an inode at all.
of course, if bjorn is saying, as you can be read as implying, that a
necessary requirement for a sym link is that it be represented as an inode,
then you should allow hard links to this inode, as hard links are simply
and only synonyms for a particular inode.
What file would you expect the hard link to refer to? The
same as the original, following the rule that a hard
link is indistinguishable from the original name. Right?
What if that link is in a different directory, and the symlink
contains a relative pathname? You get the same namespace operator,
but *not* the same file. What a mess.
--
Steve Nuchia South Coast Computing Services (713) 964-2462
"To learn which questions are unanswerable, and _not_to_answer_them;
this skill is most needful in times of stress and darkness."
Ursula LeGuin, _The_Left_Hand_of_Darkness_
I know what Mitch was implying, and I don't find the solution of
placing the symlink's content together with the directory entry
'particularly elegant'.
1) It blurs the clean UNIX file model of directories and inodes. Inodes
contain, among other things, the type of a device (be it disk file or
other). Now you move that information to where it doesn't belong (a
directory entry) - you have to, to decide whether it's a symlink or
not. All system calls dealing with inodes and directory entries will
have to make an exception for symlinks.
2) An inode is still needed: if I ln -s a b, and a does not exist,
where does the system put the creation date, owner, permission bits
etc. of b? I hope you're not suggesting to move that to the directory
entry too?
3) I think the main reason for putting a symlink's content in/with a
directory entry is performance; the funny thing is it will probably
cost you performance. It will cost at least one bit per directory entry
to indicate whether or not it is really a symlink (space). It will cost
at least one test for each directory entry lookup to test for a symlink
(time). Given the fact that symlinks are rare with respect to hard links
this will probably cost you more than it yields.
Since inodes are needed anyway, for small symlink contents you can
use the inode itself, as Chris (Torek) suggests; this can be made a
general policy for small files. For large symlink contents you'll
have to resort to file blocks anyway.
Leo.
No, it doesn't. It takes the second name-specific type of file (viz.,
symlinks) and moves its information where it belongs, into the original
name-specific type of file (viz., directories). This cleanly separates
the notions of searching for a file by name and looking it up through
its inode, by combining all name searches into a single type of file.
A different way to understand this is to consider what effect this
change would have on the clean UNIX file system calls. lstat() would
disappear or return garbage (as it basically does now). That's it. How
can removing a syscall (and mangling the directory-reading libraries,
boo hoo) blur anything?
> 2) An inode is still needed: if I ln -s a b, and a does not exist,
> where does the system put the creation date, owner, permission bits
> etc. of b? I hope you're not suggesting to move that to the directory
> entry too?
Who says there has to be a creation date, owner, permission bits, or
anything else? After all, that information is practically useless as is.
I see this as a strong argument for why symlinks make sense in directory
entries rather than separate files.
[ 3. Would lose performance by storing symlinks in directory entries ]
The time and space losses you refer to within the entry are much less
than the (negligible) loss from variable-length directory names. Both of
those losses are dwarfed by the huge gain you'd get from no longer using
an almost wasted inode and fragment per symlink.
[ large symlinks need file blocks anyway ]
Not if symlinks are limited to 255 characters. Or whatever.
---Dan
Not to be nitpicking, but symlinks are path-specific, directories
name-specific; there is a difference.
| This cleanly separates
|the notions of searching for a file by name and looking it up through
|its inode, by combining all name searches into a single type of file.
A file is identified by its path; a path search (that is what you
probably meant when you mentioned name searches), involves both lookups
by directories (name search) _and_ inodes (identifying the next
directory in the path).
|
|A different way to understand this is to consider what effect this
|change would have on the clean UNIX file system calls. lstat() would
|disappear or return garbage (as it basically does now). That's it. How
|can removing a syscall (and mangling the directory-reading libraries,
|boo hoo) blur anything?
Reread what I said. I never said putting symlinks in directory entries
could not have _some_ good effects; the blur is there because directory
entries should just associate names with inodes (hence the name:
linking).
|
|> 2) An inode is still needed: if I ln -s a b, and a does not exist,
|> where does the system put the creation date, owner, permission bits
|> etc. of b? I hope you're not suggesting to move that to the directory
|> entry too?
|
|Who says there has to be a creation date, owner, permission bits, or
|anything else? After all, that information is practically useless as is.
|I see this as a strong argument for why symlinks make sense in directory
|entries rather than separate files.
As my example above (ln -s a b, with `a' non-existent) shows, there are
cases where the symlink does not refer to a file. Another way of
putting this is to say, that symlinks and their associated data have
'separate existence'. This is perfectly valid and gives you the
freedom of creating a link beforehand (or after creation of the actual
file the symlink points to). So, put in this light, a symlink is a
separate object in the file space. You want to see it in an 'ls', I
presume?
|
| [ 3. Would lose performance by storing symlinks in directory entries ]
|
|The time and space losses you refer to within the entry are much less
|than the (negligible) loss from variable-length directory names. Both of
|those losses are dwarfed by the huge gain you'd get from no longer using
|an almost wasted inode and fragment per symlink.
OK, since you didn't provide any basis for your conclusions, I will for
mine. Let's determine the cost of a 'symlink mark' at one bit per
directory entry. This is probably the cheapest you can get (yes/no a
symlink). An inode of 64 bytes at 8 bits/byte is 512 bits per inode.
However, you only lose that in the case of a symlink, while the 1 bit per
dirent is lost for every type of file. So, if the proportion of symlinks
to all links is less than 1 per 512, the marker bits cost you more space
than you win back from unused inodes (the above calculation does not
take into account, in the case of using an inode, whether or not the
symlink's contents will be placed into the inode itself). So the correct
answer to what is cheaper in space is: it depends.
A system that heavily uses symlinks is probably better off storing their
contents in directory entries; this has an additional profit since
the total number of inodes in a file system is fixed. My personal
experience is that symlinks are not used that often, and when they are,
hard links can often be used instead, taking up even less space (and
less time evaluating).
Since I don't have any kernel lookup algorithms for directory scanning,
I can't give any calculations for time performance gains/losses. Note
that the same observation holds here: if you have few symlinks,
having the symlink contents in the directory entry will cost you more
than it yields (consider the extreme case: few == 0).
|
| [ large symlinks need file blocks anyway ]
|
|Not if symlinks are limited to 255 characters. Or whatever.
To the best of my knowledge, inodes are (a lot) less than 255 bytes long
(64 is a better figure). And on the systems I used them on, symlinks can
be sized up to MAXPATHLEN, that being 1024.
On this dual-universe Pyramid system I'm using right now, symlinks can
be conditional. In this case symlinks can be even twice as long (plus
some overhead for the condition).
|
|---Dan
Leo.
gw...@smoke.BRL.MIL (Doug Gwyn) writes:
| I hope not, because symbolic links are dependent on hard links. All that
| a hard link is is a directory entry. With no directory entries you have
| no way to reach the inodes.
The author of the comment may have been very aware of that (:-))
I was using ``name-specific'' as a catch-all, meaning ``the kernel
understands this structure when it's doing name lookups.'' Symlinks and
directories are the two and only name-specific types of files.
> | This cleanly separates
> | the notions of searching for a file by name and looking it up through
> | its inode, by combining all name searches into a single type of file.
[ but searching through a path involves both kinds of lookup ]
So what? I'm saying that all name searches should go through a single
type of structure, namely the directory. Obviously directories need to
be stored somewhere on the disk, and you have to use inodes to allocate
storage; just because X implies Y doesn't mean you can't cleanly
separate out every use of X (together with Y) from uses of Y alone.
[ putting symlinks into inodes wouldn't change *anything* in the ]
[ syscalls except possibly lstat()---so how can it blur anything? ]
> Reread what I said. I never said putting symlinks in directory entries
> could not have _some_ good effects; the blur is there because directory
> entries should just associate names with inodes (hence the name:
> linking).
On what basis do you say ``should''?
[ an inode is still needed: if there's no inode, where do you ]
[ put the creation date, owner, permission bits, etc.? ]
[ fine, just trash the information---it's useless anyway ]
[ but the linked-to file need not exist! ]
So what? Why should a symlink have any more inode-type information than
any other name-specific structure used purely for name searches?
[ symlinks have a ``separate existence'' from ``associated data'' ]
Quite true. Why does this imply that symlinks need inodes?
[ you do want to see a symlink in an ls, right? ]
Yes; that's what readlink() is for. Presumably the read-directory calls
would return symlink information too. (readdir() and friends should be
syscalls; directories shouldn't be read any more than they should be
written.) So what? Are you saying everything ls outputs has to have the
same format? How about reporting the symlink owner as the directory
owner---which would make much more sense for lstat() anyway?
> | [ 3. Would lose performance by storing symlinks in directory entries ]
[ no, would not ]
> OK, since you didn't provide any basis for your conclusions, I will for
> mine. Let's determine the cost of a 'symlink mark' at one bit per
> directory entry.
Sorry, no. Remember, symlinks don't have inodes, so that extra bit is
just a special value of the inode number. (0? -1? Whatever works.) So
there is absolutely no space loss; and the gained inodes and blocks for
each symlink would mean a huge gain.
I don't have timing information either, but as there are already a few
tests for directory-entry inode values, there won't be any time loss,
for exactly the same reason.
> | [ large symlinks need file blocks anyway ]
> | Not if symlinks are limited to 255 characters. Or whatever.
[ but inodes are 64, and symlinks can be 1024 or 2048 or more! ]
So what? Symlinks can and should be treated like all other names in
directories: limited to some convenient length (say 255), and stored in
the directory entry. The relative length of symlinks and inodes is
irrelevant. (Inodes were 128 last time I checked...)
---Dan
> [ fine, just trash the information---it's useless anyway ]
> [ you do want to see a symlink in an ls, right? ]
>
>Yes; that's what readlink() is for. Presumably the read-directory calls
>would return symlink information too.
Why do all this extra work when most programs don't wish to know
about symlinks? They just want to know the names of the directory
entries, and maybe file vs. directory. I don't know that this is
all that bad though. It also makes "struct direct" messier (two
variable length strings).
>
>> | [ 3. Would lose performance by storing symlinks in directory entries ]
> [ no, would not ]
>> OK, since you didn't provide any basis for your conclusions, I will for
>> mine. Let's determine the cost of a 'symlink mark' at one bit per
>> directory entry.
>
>Sorry, no. Remember, symlinks don't have inodes, so that extra bit is
>just a special value of the inode number. (0? -1? Whatever works.) So
>there is absolutely no space loss; and the gained inodes and blocks for
>each symlink would mean a huge gain.
You also need somewhere to store the length of the symlink target.
Unless you wish to allow directory entries to span block boundaries X-(
you will end up eating a lot of space due to fragmentation of the
directories, since the average symlink directory entry will be much
larger (closer to the block size) than typical non-symlink
directory entries.
There would be a performance hit in doing a name lookup in the directory.
1. Possible extra code complexity for finding the next directory entry,
2. More directory blocks to read and scan.
The only gain in speed is when you follow the link, and then only
if it is one of the first symlinks in the directory (because of 1 and 2).
>
>I don't have timing information either, but as there are already a few
>tests for directory-entry inode values, there won't be any time loss,
>for exactly the same reason.
Please elaborate.
>
>> | [ large symlinks need file blocks anyway ]
>> | Not if symlinks are limited to 255 characters. Or whatever.
> [ but inodes are 64, and symlinks can be 1024 or 2048 or more! ]
>
>So what? Symlinks can and should be treated like all other names in
>directories: limited to some convenient length (say 255), and stored in
>the directory entry.
The symlink length would have to be limited to less than 255 if you wish
it to have a maximum name length of 255 and have the whole directory
entry fit in a block. A length limitation like this is pretty bogus
because it means that you can create files that you can't symlink to.
IMHO, the way to go is to store short symlinks (and files) right in
the inode.
--
Don "Truck" Lewis Harris Semiconductor
Internet: d...@mlb.semi.harris.com PO Box 883 MS 62A-028
Phone: (407) 729-5205 Melbourne, FL 32901
Not entirely. The name part of a symlink is allowed to be up to MAXPATHLEN
(in <sys/param.h>; now PATH_MAX in <sys/syslimits.h> in 4.4BSD) bytes
long. MAXPATHLEN is 1024 on most BSD-derived systems, and no doubt
there are people who have 1000-character symlinks . . . .
(This is one of those `sticky issues' I mentioned.)
Apollo's Domain/OS does this in its on-disk filesystem format (which
is not derived from 4.x BSD); in particular, Domain/OS directories are
organized as btrees. Creating a 1KB symlink in an empty directory
expands the directory from 1KB to 3KB.
By the way, I suspect that putting the link target in the directory
will speed up pathname resolution significantly; one need not fetch
the symlink's inode or the symlink's inode's data page. *Most*
symlinks are small, probably under 32 bytes long; why burn a whole 128
byte inode when just 32 or 40 bytes in the directory would do it?
About the only thing that this really mucks up is "ls -lt": since the
mod time on the symlink is the same as the directory's, every time you
add a new file to the directory, all the symlinks "bubble up" to
the top of the list.
- Bill
--
This space for rent. | Bill Sommerfeld at MIT/Project Athena
| somme...@mit.edu
Even more important, this and other ways of implementing file
system objects should be decoupled from file system interface
semantics. So far no one has presented a convincing enough
argument for why, at the FS interface level, hardlinking to a
symlink is a bad idea (presumably the reason for disallowing it?)
More generally, why shouldn't a symlink be a first class object,
with meaningful permission bits and so on?
IMHO, a symlink is only the first of many naming objects people
will think up. Treating it as a first class object makes its
effect modular. No need to clutter up directories. For example,
how about using a regular expression as a symlink -- make x/y
point to either x1/y or failing that x2/y. How about a program as
a link -- find out which file to link during execution.
[ We also need to start thinking about how to deal with
some of these exciting possibilities (not by putting in
hooks for them but by making sure we don't block them
out) ]
*If* a symlink is to be considered a first class object, a new
syscall to update its contents will be needed to keep its
hardlinks intact (yes, you have heard this before in std.unix!).
I also hope that someday the BSD limit of following no more than 8
symlinks is replaced with a real loop detection algorithm.
I might as well bring in `..' w.r.t. symlinks. The current
behavior of entering the *physical* parent of a directory may be
easy to implement but is quite surprising. This also requires
that a `..' be kept in each directory which I'd rather do without.
It is a lot easier to explain that .. means `go back one step';
after all, we arrive at an inode by following a `path'. Then
`<x>/y/../<z>' is equivalent to `<x>/<z>'.
-- bakul shah
Ah, most of the time you don't want the link target anyway. Let's say
you have a directory containing 100 symlinks with 100-character targets
whose names are 10 characters long. With the targets stored elsewhere,
the directory entries will be about 24 bytes long. With the targets
stored in the directory, the directory entries will be about 128 bytes
long. If you want to follow the link in the last directory entry, you
must shuffle through 2400 bytes in the first case, 12800 in the second.
Searching directory entries and skipping those that don't match is
much more common than processing the directory entries that do match
(if there are N entries in a directory, on average you will skip
the first N/2 before you find the one you want), so it makes sense
to optimize directory searches the most. Furthermore, if you often
follow certain symlinks, the buffer cache will be more efficient if
links are not stored in the directory itself. Fewer blocks need
to be cached if the directory is small (and you have to separately
cache the link target), than if the directory is large because it
contains a bunch of links that you are going to skip over anyway.
Actually someone did.
In article <22...@nuchat.UUCP> st...@nuchat.UUCP (Steve Nuchia) writes:
]I was staying out of this since it's all been said before, but it
]just occured to me that there is a good reason to disallow hard
]links to symlinks.
]
]What file would you expect the hard link to refer to? The
]same as the original, following the rule that a hard
]link is indistinguishable from the original name. Right?
]
]What if that link is in a different directory, and the symlink
]contains a relative pathname? You get the same namespace operator,
]but *not* the same file. What a mess.
>More generally, why shouldn't a symlink be a first class object,
>with meaningful permission bits and so on?
>
[description deleted]
Some flavors of symbolic links can conditionally point to different
targets. The main thing is to make fancier links do something
interesting enough (and commonly useful) so that it is worth
both expending the effort to implement them and paying whatever
performance penalty they cause.
>
>I might as well bring in `..' w.r.t. symlinks. The current
>behavior of entering the *physical* parent of a directory may be
>easy to implement but is quite surprising. This also requires
>that a `..' be kept in each directory which I'd rather do without.
>It is a lot easier to explain that .. means `go back one step';
I seem to recall reading somewhere that nfs mounted directories
do not require entries for `.' and `..'. The local host keeps
track of the list of directories, and if it encounters `..' in
the path it backs up one.
A paper on the OS/2 file system says that its directories are also
B-trees. I think this is a waste of effort. It is worth noting that
Encore used to have B-trees for large directories in their Umax 4.2BSD
port, but removed them from their 4.3BSD port (fsck will convert B-tree
directories to plain directories). I think this is because, on the
average, directories are small enough that a fancier mechanism than
linear search is not worthwhile, particularly given name-to-inode
caching.
The program must be compiled with `-DNFS' if your machine supports NFS
(in which case the quota and inode .h files come from <ufs/*> instead
of <sys/*>) and with -DPOSIX if your machine has a POSIX-style readdir()
(`struct dirent' and <dirent.h> rather than `struct direct' and <dir.h>).
#include <sys/types.h>
#include <stdio.h>
#include <string.h>
#if POSIX
#include <dirent.h>
#else
#include <sys/dir.h>
#define dirent direct
#endif
#include <sys/syslimits.h>
#include <sys/stat.h>
#if NFS
#include <ufs/quota.h>
#include <ufs/inode.h>
#else
#include <sys/inode.h>
#endif
struct dinode di;
#define SMALLSIZE (sizeof(di.di_db) + sizeof(di.di_ib))
#define VERYBIGDIR 8192 /* directories > this are `very big' */
int zerofiles, smallfiles, bigfiles;
int zerodirs, smalldirs, bigdirs, verybigdirs;
int zerolinks, smalllinks, biglinks;
int zeroall, smallall, bigall;
int unread, badstat;
struct stat st;
char pathbuf[PATH_MAX];
main(argc, argv)
	int argc;
	char **argv;
{
	register int i;

	if (argc < 2)
		chksize(".");
	else
		for (i = 1; i < argc; i++) {
			pathbuf[0] = 0;
			chksize(argv[i]);
		}
	if (badstat)
		(void) printf("(unable to stat %d pathnames)\n", badstat);
	sum("Directories", zerodirs, smalldirs, bigdirs);
	i = smalldirs + bigdirs;
	(void) printf("\t%d very big (size > %d) => %.2f%% very big\n",
		verybigdirs, VERYBIGDIR, i ? 100.0 * verybigdirs / i : 0.0);
	if (unread)
		(void) printf("\t(%d unreadable)", unread);
	sum("\nRegular files", zerofiles, smallfiles, bigfiles);
	sum("\nSymlinks", zerolinks, smalllinks, biglinks);
	sum("\nOverall", zeroall, smallall, bigall);
	exit(0);
}

sum(what, z, s, b)
	char *what;
	int z;
	register int s, b;
{
	register int total = s + b;

	(void) printf("%s:\n\t%d: %d small (%d empty), %d big => %.2f%% big\n",
		what, total, s, z, b, total ? 100.0 * b / total : 0.0);
}

chksize(file)
	char *file;
{
	register int ty;
	register DIR *d;
	register struct dirent *dp;
	register int i;
	extern int errno;
#define	issmall() (st.st_size <= SMALLSIZE)
#define	BUMP(a,b,c) \
	if (st.st_size == 0) a++; \
	if (issmall()) b++; else c++

	if (lstat(file, &st)) {
		(void) fprintf(stderr, "lstat(%s%s): %s\n", pathbuf, file,
			strerror(errno));
		badstat++;
		return;
	}
	ty = st.st_mode & S_IFMT;
	switch (ty) {

	case S_IFREG:
		BUMP(zerofiles, smallfiles, bigfiles);
		break;

	case S_IFLNK:
		BUMP(zerolinks, smalllinks, biglinks);
		break;

	case S_IFDIR:
		BUMP(zerodirs, smalldirs, bigdirs);
		if (st.st_size > VERYBIGDIR)
			verybigdirs++;
		if (chdir(file)) {
			(void) fprintf(stderr, "chdir(%s%s): %s\n",
				pathbuf, file, strerror(errno));
			break;
		}
		if ((d = opendir(".")) == NULL) {
			(void) fprintf(stderr,
				"cannot read directory %s%s: %s\n",
				pathbuf, file, strerror(errno));
			if (chdir(".."))
				panic("cannot chdir back");
			unread++;
			break;
		}
		i = strlen(pathbuf);
		(void) sprintf(pathbuf + i, "%s/", file);
		while ((dp = readdir(d)) != NULL) {
			/* d_name[0] test rather than d_namlen, since
			   POSIX struct dirent has no d_namlen field */
			if (dp->d_name[0] == '\0' || dp->d_ino == 0)
				continue;
			if (dp->d_name[0] == '.' &&
			    (dp->d_name[1] == 0 ||
			     dp->d_name[1] == '.' && dp->d_name[2] == 0))
				continue;
			chksize(dp->d_name);
		}
		closedir(d);
		if (chdir(".."))
			panic("cannot chdir back (2)");
		pathbuf[i] = 0;
		break;

	case S_IFBLK:
	case S_IFCHR:
		if (st.st_size)
			(void) printf("%s%s is nonempty special\n",
				pathbuf, file);
		return;

	case S_IFSOCK:
		if (st.st_size)
			(void) printf("%s%s is nonempty socket\n",
				pathbuf, file);
		return;

	default:
		(void) printf("ty=%#o\n", ty);
		panic("chksize ty");
		/* NOTREACHED */
	}
	BUMP(zeroall, smallall, bigall);
}

panic(str)
	char *str;
{
	(void) fprintf(stderr, "panic: %s\n", str);
	(void) fprintf(stderr, "(pathbuf = %s)\n", pathbuf);
	exit(2);
}
Last desperate gasp here? :-)
I was thinking of having the symlink targets at the end of the directory
structure. (Of course, readdir() wouldn't tell you the target in this
case.) So this point is moot.
---Dan
Here is a little program to find out how many small files and
symlinks there are in a directory tree....The ratio of small to
big should suggest whether special handling for small, big, or
`very big' files and/or directories would be useful.
What you will find will probably reflect the fact that big directories
on UNIX machines are already known to be undesirable, and are avoided
by software designers, while multiple levels of directories are quite
easy to use and a more efficient substitute. So a conclusion like "big
directories seldom exist, so we need not worry too much about them" is
natural, but probably only because it is a self-fulfilling prophecy.
It turns out that under VMS, directories are often huge. Why? Because
(a) huge directories are efficiently handled by VMS, and (b) multiple
levels of directories are inelegantly handled by VMS. A search on a
typical VMS machine would probably lead us to conclude that "big
directories often exist, so they should be efficiently handled" --
another self-fulfilling prophecy.
UUCP used to like to keep lots of things in one directory. BSD changed
this and introduced an additional directory level. (AT&T, as it often
does, followed suit some years later). We should ask why.
Rahul Dhesi <dhesi%cir...@oliveb.ATC.olivetti.com>
UUCP: oliveb!cirrusl!dhesi
Globally speaking, correct. Of course, there is always a trade-off. If
deeply nested directories are used, this _may_ mean more inode lookup
time (depending on whether complete pathnames are being used and on
how heavily inodes are cached). For given statistics of file name
lengths, inode buffering, and so on, there is probably an ideal number
of files per directory.
|
|It turns out that under VMS, directories are often huge. Why? Because
|(a) huge directories are efficiently handled by VMS, and (b) multiple
|levels of directories are inelegantly handled by VMS. A search on a
|typical VMS machine would probably lead us to conclude that "big
|directories often exist, so they should be efficiently handled" --
|another self-fulfilling prophecy.
Add to this (c) nesting depth is restricted to 9 levels under VMS
(last time I looked). That limit is not reached easily, but I
certainly hit it at least once.
Leo.
Perhaps---but I would note the following:
a. Personally, I prefer to keep my own directories relatively small.
My home directory on mimsy is `too big' (more than one screenful of
`ls' output). It contains 145 entries (214 if you count `.' files)
and fits in a single file system block. So, speaking as a user, I
have no desire to use bigger directories myself.
b. As a programmer, I prefer to keep my programs' directories relatively
small, partly because big directories are inefficient. I have found
this to be quite easy to do---i.e., not an `undue burden'---and so,
speaking as a programmer, I have no great desire to use bigger
directories myself.
c. Rule b is not completely hard-and-fast...:
>UUCP used to like to keep lots of things in one directory. BSD changed
>this and introduced an additional directory level.
UUCP was, and still is, one of the exceptions to `b' above. The
addition of per-system subdirectories was a good idea, not just for
efficiency but also for sheer organisational sanity. But now that
it has been done, there are still some sites with huge UUCP directories
(e.g., uunet).
All in all, though, I am still unconvinced that adding a special case
for big directories would help overall.
Our experience at Pitt has been that (a) is *not* true. In fact, adding
or deleting a file to a directory is very slow, since VMS re-sorts the
directory with every change. We've had our systems absolutely crawl
if our PMDF queue directory got too large. The current PMDF uses
subdirectories for each channel, which makes things much more reasonable.
Carl Fongheiser
c...@unix.cis.pitt.edu
Hmm. Now what happens if I try to make a hard link to a symbolic
link and the referenced file is on another spindle/partition?
What if the referenced file doesn't exist? (I would predict that
link would complain that it can't find the file.)
Not having access to a machine with symbolic links this week,
I can't try it.
--
:!mcr!: | Tellement de lettres, si peu de temps.
Michael Richardson | If Meech passes, no one will understand that.
Play: m...@julie.UUCP Work: mic...@fts1.UUCP Fido: 1:163/109.10 1:163/138
Amiga----^ - Pay attention only to _MY_ opinions. - ^--Amiga--^
We came to the same conclusion when we were writing DG/UX 4.0. We were
looking at a complex hash-based mechanism, so I did an analysis of
the size of directories encountered during lookups (generally, more than 98%
of directory operations are lookups). I found that more than 82% of
the lookups were for directories of 8 blocks or less. If your buffer
size (or page size for SunOS/Mach) is 4K, you're talking just one disk
I/O to read in your directory.
Also, Chris' point about name-to-inode caching is right on. The gain
from a good name cache hit rate will dwarf those from a fancy directory
access method. Why pay for all the code complexity and size of a btree
implementation? It's overkill.
Steve Stukenborg
Data General Corporation
62 Alexander Drive stuke...@dg-rtp.dg.com
Research Triangle Park, NC 27709 ...!mcnc!rti!xyzzy!stukenborg
J. Eliot B. Moss, Assistant Professor
Department of Computer and Information Science
Lederle Graduate Research Center
University of Massachusetts
Amherst, MA 01003
(413) 545-4206; Mo...@cs.umass.edu
Chris Torek pointed out that his home directory was quite small. This is
really to be expected--humans have trouble with large directories.
He also said that his applications directories also tend to be small,
but the inefficiency of large directories on Unix was one of his reasons.
I would assert that the inefficiency of large directories on Unix is
just as limiting as the awkwardness of deep directory trees in VMS.
My best anecdote on this comes from BIX. BIX is implemented using
the CoSy software from the University of Guelph. CoSy was first
implemented in the early '80s, under version 7. It had a wide
user community at Guelph, where it ran on Amdahl's UTS, which was then
a Version 7 implementation. The editors at Byte liked it, so they
bought an Arete box (nice character I/O performance) to run it.
This ran SVR2 (no hissing, please). CoSy was installed, and they
began testing BIX internally. Everything basically worked fine.
Then they opened the system up to the public for beta-test. Lots of
people signed up. Soon, they found their first scale-up problem:
CoSy kept its per-user information as one directory per user. Each
of these directories had the name of the user's login name, and lived
in the users/ directory. Well, System V at the time didn't allow more
than 1000 links to an inode. BIX quickly went over 1000 users, and
all of those '..' links killed it. So, some midnight (literally)
programming was done, and the next day joeuser's per-user files
were in users/j/joeuser, rather than users/joeuser. That got them
going again, but they still had the wall at 26,000 users to worry
about.
I lost touch with the details after this, but I'm pretty sure that they've
gone over the 26,000 user limit since. I think today, they've bagged using
the Unix file system completely--the per-user data is now in some sort
of database.
This effect also shows up in things like SysV's terminfo database,
where you also get somedirectory/a/adm-3 kinds of things.
My point on this is that some implementations probably should do something
about the large-directory problem. If your application works most
naturally with 2000 subdirectories, or 20,000 subdirectories, in a single
directory, you shouldn't have to recode to get around system inefficiencies.
Now, maybe the random workstation doesn't need this capability. But for
the future, at least some implementation of Unix will need to do large
directories well.
--
Craig Jackson
dri...@drilex.dri.mgh.com
{bbn,axiom,redsox,atexnet,ka3ovk}!drilex!{dricej,dricejb}
> all of those '..' links killed it. So, some midnight (literally)
> programming was done, and the next day joeuser's per-user files
> were in users/j/joeuser, rather than users/joeuser. That got them
> going again, but they still had the wall at 26,000 users to worry
> about.
the umich.edu afs cell has taken this one step further -- my account
there is users/e/m/emv. unfortunately the top /afs level is totally
flat, I expect that'll change as more and more cells come on line
on some horrible and painful re-naming day.
one world, one filesystem -- did I hear "one vendor" too?
--Ed
Edward Vielmetti, U of Michigan math dept <e...@math.lsa.umich.edu>
"security through diversity"
this may depend on what you consider to be "big", but I can think of 2
applications where "big" directories are produced:
1. news:
for example, an ls -ld on /usr/spool/news/comp/unix (minus the files
in comp.unix) on my news server:
drwxrwxr-x 2 news 2560 May 28 08:08 aix
drwxrwxr-x 2 news 2560 May 28 00:33 aux
drwxrwxr-x 2 news 512 May 24 00:31 cray
drwxrwxr-x 2 news 16384 May 28 16:16 i386
drwxrwxr-x 2 news 1024 May 27 23:33 microport
drwxrwxr-x 2 news 18432 May 28 17:20 questions
drwxrwxr-x 2 news 6144 May 28 10:07 ultrix
drwxrwxr-x 2 news 11776 May 28 17:20 wizards
drwxrwxr-x 2 news 9728 May 28 14:40 xenix
comp.unix.{i386,questions,wizards,xenix}, for example, are getting big
to the point of unwieldy. (yeah, we could shrink them by rebuilding
the directories, but that's pretty awkward.) the others aren't
exactly scrawny either.
2. mh-based mail
we have more than a few users who have thousands of files (messages)
per folder; this can be really slow. moreover, when you have been
accumulating mail for a decade, it is _easy_ for some people to get
themselves in this kind of bind.
In article <24...@mimsy.umd.edu> ch...@mimsy.umd.edu (Chris Torek) writes:
>Perhaps---but I would note the following:
>
> a. Personally, I prefer to keep my own directories relatively small.
...
> b. As a programmer, I prefer to keep my programs' directories relatively
> small, ...
...
>All in all, though, I am still unconvinced that adding a special case
>for big directories would help overall.
well, I would agree that
a. the problem needs more study -- how can we quantify how much
impact the current directory setup has on overall usage?
b. most naive people learn about directories eventually; in general,
people find it unwieldy to try and manage directories with
thousands of entries. more sophisticated people also seem to
avoid this problem.
but
when the application (news, mh) hides some of that complexity from
you, it is easy for things to get out of hand.
maybe the application needs fixing, maybe it's the filesystem.
(this thread is kinda curious to me, because it stirs memories of an
old Chris Torek posting which I think stated "the filesystem IS the
database" and discussed the advantages of using hashing to build directory
entries.)
--
t. patterson domain: t...@wsl.dec.com path: decwrl!tp
enet: wsl::tp
icbm: 122 9 41 W / 37 26 35 N