Are links as useful as they could be?

Comfy chair

Sep 26, 1986, 11:46:11 PM

First let me make clear the context of my question, ask the question,
then explain why I ask.

I am asking about hard links, not symbolic links. The question is: If
you were designing a new operating system, say a successor to U**x,
would you implement links in the same way, enhance them, or put more
restrictions on them?

Now my thoughts leading up to the question. When I first read how links
worked in U**x, I liked the idea. Mv was just a link() followed by
unlink(), rm was unlink(), and you could share files and have Jekyll
and Hyde programs, &c, &c. Over the years I noticed that the
limitations were such that people couldn't use the full generality of
the concept. You couldn't link across filesystems or link directories,
so symbolic links were invented. To find out where the other links
were, you would have to search the whole filesystem, in the general
case. Utilities like tar and rdist have to remember files with links
and might run out of memory to store the filenames, in theory at
least.

One restriction that would ease the search problem is to restrict links
to the current directory. Most uses of links are to alias things like
"vi" and "ex", "tip" and "cu" and these things usually live close
together. This means rename would have to be made a primitive. But mv
has to do a copy when it can't do a link because link() doesn't
generalize across filesystems.

There still is obviously a need for some kind of indirection mechanism.
I don't like symbolic links, there are some warts, like having to check
for looping, but I can't think of anything better.

Do you have any ideas on this? I'd like to hear them. Please mail. I
don't want to start an OS war.

Ken
--
UUCP: ..!{allegra,decvax,seismo}!rochester!ken ARPA: k...@rochester.arpa
Snail: CS Dept., U. of Roch., NY 14627. Voice: Ken!
"It is absurd to divide people into good or bad. People are either
charming or tedious." -- Oscar Wilde

Robert L Krawitz

Sep 28, 1986, 3:52:16 PM

First of all, mv does not use link(2) followed by unlink(2). It uses
rename(2), which is a system call.

Secondly, "all hard links are equal." Each directory entry is just a
name and a pointer to an inode, and the inode holds the link count.
Link() and unlink() increment and decrement the link count of the
inode. When this reaches zero, the inode is deallocated and the
storage returned to the free list. This mechanism allows the use of
links to protect the existence (although not the contents) of precious
files -- just keep a link to the file around somewhere.
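
To make the link-count behaviour concrete, here is a minimal C sketch (not from the original post). The helper names link_count() and show_link_counts() are made up for illustration; only stat(), link(), unlink() and open() are real calls. It assumes a scratch directory on a filesystem that supports hard links.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return the link count recorded in the inode that `path` names,
 * or -1 if the file cannot be examined. */
long link_count(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}

/* Hypothetical demo: create `a` inside `dir`, link it to `b`, then
 * unlink `a`. The link count seen after each step is stored in
 * counts[0..2]. Returns 0 on success, -1 on any failure. */
int show_link_counts(const char *dir, long counts[3])
{
    char a[4096], b[4096];
    int fd;

    snprintf(a, sizeof a, "%s/a", dir);
    snprintf(b, sizeof b, "%s/b", dir);

    fd = open(a, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    close(fd);
    counts[0] = link_count(a);      /* one name so far */

    if (link(a, b) != 0)
        return -1;
    counts[1] = link_count(a);      /* two names, one inode */

    if (unlink(a) != 0)
        return -1;
    counts[2] = link_count(b);      /* the file survives under its other name */
    return 0;
}
```

On an ordinary local filesystem the three counts come out as 1, 2 and 1: removing the first name only decrements the count, and the storage is kept until the last name goes.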

It should be obvious why hard links between file systems are
impossible. A directory entry refers to a certain inode, not to any
filesystem (there is no guarantee that any other filesystem will be
mounted). The device is implicitly the device that the directory is
in. Links to directories are not impossible, just forbidden to
non-superusers to reduce the chance of filesystem corruption in the
form of a closed loop inaccessible to the rest of the file system
(fsck does detect this, by the way). Actually, mkdir(2) does create
links to directories -- after all, . and .. in each directory are
nothing more than links. On unix systems without the mkdir(2) system
call, a privileged program (mkdir(1)) calls mknod(2) followed by two
calls to link(). This exception is a controlled, safe exception,
since a system call rmdir(2) is needed to unlink a directory, which
takes care of all cleanup.

Restricting all hard links to the same (NOT the "current", which has a
specific meaning) directory would cause far more problems than it
could possibly solve. First of all, all calls to link() would have to
check that other links to a file were in the same directory, which
would require a search of the whole filesystem. Secondly, rename()
would have to check similar conditions. Also, it would weaken the
power of links.

It is true that to find all links to a file you have to search the
entire filesystem. That's one of the problems with the simple link
concept of unix. However, all powerful tools have some drawbacks.

The problems for tar and rdist aren't as bad as you suggest. All that
they have to do is remember which inodes from what filesystems have
already been found. This could be a bit vector. Usually the actual
disk partitions are not readable, but in a pinch df -i can be used to
get the number of inodes in each file system (a fixed quantity).
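
The bookkeeping described above can be sketched in C as follows. This is an illustration only, not how tar or rdist actually do it: it uses a simple growable list of (device, inode) pairs rather than a bit vector, and the names seen_set and seen_before() are made up.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/* A growable list of (device, inode) pairs that were already visited. */
struct seen_set {
    struct { dev_t dev; ino_t ino; } *items;
    size_t len, cap;
};

/* Record the file at `path`. Returns 1 if its inode was recorded earlier
 * (this name is another link to a file already seen), 0 if it is new,
 * -1 on error. */
int seen_before(struct seen_set *s, const char *path)
{
    struct stat st;
    size_t i;

    assert(s != NULL && path != NULL);
    if (lstat(path, &st) != 0)
        return -1;

    for (i = 0; i < s->len; i++)
        if (s->items[i].dev == st.st_dev && s->items[i].ino == st.st_ino)
            return 1;

    if (s->len == s->cap) {
        size_t ncap = s->cap ? s->cap * 2 : 16;
        void *p = realloc(s->items, ncap * sizeof *s->items);
        if (p == NULL)
            return -1;
        s->items = p;
        s->cap = ncap;
    }
    s->items[s->len].dev = st.st_dev;
    s->items[s->len].ino = st.st_ino;
    s->len++;
    return 0;
}
```

Note that only two numbers per file are stored, never a filename. A real archiver could shrink the list further by skipping files whose st_nlink is 1, since they cannot have another name.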

Symbolic links are completely different. They are pointers to
arbitrary pathnames as opposed to pointers to inodes. As far as the
filesystem is concerned, they are just a slightly special type of
file. The only thing special about them is that most system calls
automatically indirect through them (to a certain level to prevent
looping).

Translation: restricting links would be pointless, difficult to
implement, etc. The homogeneity of the unix filesystem is one of its
strengths.
--
Robert^Z

wom...@ccvaxa.uucp

Sep 28, 1986, 9:14:00 PM

My pet peeve with links:

We do some of our development here within "boxes", which are miniature unix
filesystems reached via dressed-up chroot calls. The nice thing about them
is that you can have complete control over your development environment. The
problem is that for each box you set up on a different partition (and almost
all of them are on different partitions), you must also install copies of
standard utilities, including most of /bin and /lib plus some of /usr/lib,
/usr/bin, and /usr/ucb plus emacs. These have to be real copies, because we
don't have nearly enough room on our root disks to hold development projects,
which makes hard links useless, and because symbolic links can't be followed
to a target outside of a box. And there goes half the disk space from your
box. (Idle speculation says it would be nice if symlinks could reach through
a chroot call, but since we also use boxes for security I don't think I
could convince other people we should do it.)

I disagree that "most uses of links are to connect things in the same
directory":

We have a project here that uses a stable, common source tree. The
developers working on it usually need to change only a few files, but we
want them to do it in their own copy of the tree so that they can test it
before checking it into the official tree. So they make a copy of the
official tree which starts off consisting entirely of directories and hard
links to (non-RCS *,v) files in the official tree. When a developer wants to
change a file, they first remove the linked version, then get a true copy
and edit away. Since the source directory is over 10,000 blocks this saves a
lot of disk space. (However, if you ever have to restore the partition from
full+incremental dumps you have a really nasty problem on your hands, as we
have already discovered.)

(You may have guessed from this that our greatest shortage around here is
disk space. B-)

"Our first order of business will be to find a deranged alchemist, which
should not be very difficult. China," said Master Li, "is overstocked
with deranged alchemists."
Barry Hughart, *Bridge of Birds*

Wombat
ihnp4!uiucdcs!ccvaxa!wombat

ECSC68 S Brown CS

Oct 2, 1986, 11:07:17 AM

In article <21...@rochester.ARPA> k...@rochester.UUCP (Comfy chair) writes:
>
>There still is obviously a need for some kind of indirection mechanism.
>I don't like symbolic links, there are some warts, like having to check
>for looping, but I can't think of anything better.

The "check for looping" could be fixed for symbolic links by defining
some primitive that converts a filename into the filename that it
"really is" -- ie, it does the work that it does internally in order
to do things like open(), execle(), etc... on a symbolic link.
lstat() is fine but it only does one level of translation.

--
Simon Brown
Computer Science Dept.
University of Edinburgh.

Tom Kiermaier

Oct 6, 1986, 9:08:58 AM

The mv command on SysV does indeed implement renames as link() followed
by unlink(). The rename() system call doesn't exist on SysV.
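
For readers who have not seen it, a rename built from link() and unlink() looks roughly like the C sketch below. This is an illustration, not the SysV mv source; rename_by_link() is a made-up name and the rollback on failure is an assumption of this sketch.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Rename `from` to `to` without rename(2): add the new name, then remove
 * the old one. Returns 0 on success, -1 with errno set otherwise.
 * Unlike rename(2) this is not atomic: both names exist for a moment,
 * and it fails with EEXIST if `to` already exists. */
int rename_by_link(const char *from, const char *to)
{
    assert(from != NULL && to != NULL);
    if (link(from, to) != 0)
        return -1;                  /* e.g. EXDEV across filesystems */
    if (unlink(from) != 0) {
        int saved = errno;
        unlink(to);                 /* best-effort rollback */
        errno = saved;
        return -1;
    }
    return 0;
}
```

The EXDEV case is where a mv built this way has to fall back to copying the file and removing the original.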

g...@sun.uucp

Oct 7, 1986, 4:14:50 AM

> The "check for looping" could be fixed for symbolic links by defining
> some primitive that converts a filename into the filename that it
> "really is" -- ie, it does the work that it does internally in order
> to do things like open(), execle(), etc... on a symbolic link.

Huh? The "check for looping" is there to prevent calls like "open" from
looping:

ln -s a b
ln -s b a
cat a

I presume the primitive you're referring to would be something like:

int evaluatelink(const char *path, char *buf, int buflen);

which would take the path pointed to by "path" and return the path name of
the file it ultimately refers to in the buffer whose first character is
pointed to by "buf", transferring at most "buflen" characters.

How would this help? If that primitive does the work that the kernel does
internally for things like "open", it would have the same problems as those
calls, and would have to do the same check for looping.
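
To illustrate why, here is a hedged user-space sketch in C of what such a primitive might look like (it is not a real system call). It follows only the final component with readlink(). The MAX_HOPS limit of 8, the PATHBUF size and the return convention (length of the result, or -1 with errno set) are all assumptions of this sketch, not part of the proposal.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_HOPS 8      /* assumed limit on links followed */
#define PATHBUF  4096   /* assumed maximum path length */

/* Follow the symbolic link named by `path` (final component only) until
 * a non-link is reached and copy that name into `buf`. Returns the
 * length of the result, or -1 with errno set (ELOOP after MAX_HOPS). */
int evaluatelink(const char *path, char *buf, int buflen)
{
    char cur[PATHBUF], target[PATHBUF], next[PATHBUF];
    int hops = 0;

    assert(path != NULL && buf != NULL && buflen > 0);
    if (strlen(path) >= sizeof cur) {
        errno = ENAMETOOLONG;
        return -1;
    }
    strcpy(cur, path);

    for (;;) {
        ssize_t n = readlink(cur, target, sizeof target - 1);
        if (n < 0) {
            if (errno == EINVAL)
                break;              /* not a symlink: resolution is done */
            return -1;              /* e.g. ENOENT for a dangling link */
        }
        target[n] = '\0';
        if (++hops > MAX_HOPS) {    /* the same check open() needs */
            errno = ELOOP;
            return -1;
        }
        if (target[0] == '/') {
            strcpy(cur, target);
        } else {
            /* A relative target is interpreted in the link's directory. */
            char *slash = strrchr(cur, '/');
            int dirlen = slash ? (int)(slash - cur) + 1 : 0;
            int len = snprintf(next, sizeof next, "%.*s%s",
                               dirlen, cur, target);
            if (len < 0 || (size_t)len >= sizeof next) {
                errno = ENAMETOOLONG;
                return -1;
            }
            strcpy(cur, next);
        }
    }
    if (strlen(cur) >= (size_t)buflen) {
        errno = ENAMETOOLONG;
        return -1;
    }
    strcpy(buf, cur);
    return (int)strlen(buf);
}
```

Without the hop counter, the a/b pair from the example above would keep this loop running forever, which is exactly the problem the kernel's check exists to prevent.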

> lstat() is fine but it only does one level of translation.

Huh? "lstat" does *no* translation; that's what it's there for. It finds
the file referred to by the path argument, assuming *no* symbolic-link
translation, and returns its file status. "stat" does symbolic-link
translation.
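
The difference is easy to see from C. In this sketch (the helper names are invented for illustration), is_symlink() asks about the name itself via lstat(), while is_directory() asks about whatever the name finally leads to via stat().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return 1 if `path` itself is a symbolic link, 0 if not, -1 on error.
 * lstat() examines the link and does not follow it. */
int is_symlink(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (lstat(path, &st) != 0)
        return -1;
    return S_ISLNK(st.st_mode) ? 1 : 0;
}

/* Return 1 if `path` is a directory once symbolic links have been
 * followed, 0 if not, -1 on error. stat() does the following. */
int is_directory(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    return S_ISDIR(st.st_mode) ? 1 : 0;
}
```

For a symbolic link created with symlink("/", name), is_symlink(name) reports a link while is_directory(name) reports a directory, because only the second call translates the link.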
--
Guy Harris
{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
g...@sun.com (or g...@sun.arpa)

ag...@ccvaxa.uucp

Oct 10, 1986, 9:07:00 PM

> If you are willing to expend large amounts of space, the symlink
> loop checks can be made rigorous, e.g., by remembering each symlink
> inode and requiring that no one appear twice. The eight-links
> limit seems to work well in practice, though, particularly since
> symlinks slow name translation markedly.
>
> Chris Torek

As a compromise between counting and remembering paths, don't fail before
at least N=8 symlinks, and hash the remainder into bits in as long a mask
as you care. The hashing overhead might be worth it if it saves more name
translations - AND if you want to have symlink paths longer than N=8 a lot.

Andy "Krazy" Glew. Gould CSD-Urbana. USEnet: ihnp4!uiucdcs!ccvaxa!aglew
1101 E. University, Urbana, IL 61801 ARPAnet: aglew@gswd-vms

System Mangler

Oct 20, 1986, 12:59:52 AM

In article <21...@rochester.ARPA> k...@rochester.UUCP (Comfy chair) writes:
> I don't like symbolic links, there are some warts, like having to check
> for looping, but I can't think of anything better.

Warts... you can't chmod, chgrp, utime, or link them.
The access time never means much, because doing an
"ls -l" to see it has the side effect of changing it.

Symbolic links are too expensive to use freely. They take up
an inode and 1K of disk space, just to hold a few characters.
They carry all the baggage of a regular inode (atime, mtime,
links, owner, group, mode) but you can't make proper use of
any of it.

Since Berkeley was making directory entries variable length
anyway, why didn't they just make symbolic links a variant
type of directory entry, containing a string instead of an
inode number? They might be twice the size of a normal
directory entry, but the time saved in not having to read
another inode would be a big win.

Don Speck sp...@vlsi.caltech.edu {seismo,rutgers}!cit-vax!speck

g...@sun.uucp

Oct 20, 1986, 12:55:57 PM

> Since Berkeley was making directory entries variable length
> anyway, why didn't they just make symbolic links a variant
> type of directory entry, containing a string instead of an
> inode number? They might be twice the size of a normal
> directory entry, but the time saved in not having to read
> another inode would be a big win.

Because that would have required non-trivial changes to programs that read
directory entries, in order that they understand this new type of directory
entry. The 4.2BSD file system changed the format of directory entries, but
didn't really change their meaning; as far as an application reading the
directory is concerned, they are still <inumber, name> pairs. Converting a
program to use the directory library is a mechanical, albeit not automated,
operation. If this new "indirect" directory entry were introduced, the
conversion process would no longer be mechanical.
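
For reference, a program converted to the directory library sees each entry roughly as in the C sketch below. The function name list_entries() is invented for illustration; opendir(), readdir() and closedir() are the real library calls.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <stdio.h>

/* Print each entry of `dirpath` as an <inumber, name> pair and return
 * the number of entries read, or -1 if the directory cannot be opened. */
int list_entries(const char *dirpath)
{
    DIR *d;
    struct dirent *e;
    int n = 0;

    assert(dirpath != NULL);
    d = opendir(dirpath);
    if (d == NULL)
        return -1;
    while ((e = readdir(d)) != NULL) {
        printf("%lu\t%s\n", (unsigned long)e->d_ino, e->d_name);
        n++;
    }
    closedir(d);
    return n;
}
```

Nothing in this loop needs to know whether a name is a symbolic link, since a link is just another inode number. A string-valued entry type would force a second case into every loop of this kind.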

John Owens

Oct 21, 1986, 11:28:04 AM

In article <1900036@ccvaxa>, ag...@ccvaxa.UUCP writes:
> As a compromise between counting and remembering paths, don't fail before
> at least N=8 symlinks, and hash the remainder into bits in as long a mask
> as you care. The hashing overhead might be worth it if it saves more name
> translations - AND if you want to have symlink paths longer than N=8 a lot.
>
> Andy "Krazy" Glew. Gould CSD-Urbana. USEnet: ihnp4!uiucdcs!ccvaxa!aglew
> 1101 E. University, Urbana, IL 61801 ARPAnet: aglew@gswd-vms

Yeah, but then the behavior is unpredictable to the "casual observer".
It's easy to say "you can't have more than 8 steps", or even "you
can't have a loop, since the kernel remembers the device/inodes
traversed". With this method, sometimes you can have more than 8,
sometimes not; I don't see how you can be completely reliable with
hashing unless you save the actual device/inodes as well, and just use
the hashing for speed.
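
A rough C sketch of that combination (not kernel code): a 64-bit mask answers "definitely not seen" quickly, and the saved device/inode pairs settle the matter whenever the hash collides. The names link_trail and trail_visit(), the hash function and the MAX_TRACKED cap are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 32   /* assumed cap on links followed in one lookup */

struct link_trail {
    uint64_t mask;       /* one bit per hash bucket, used as a fast filter */
    struct { unsigned long dev, ino; } seen[MAX_TRACKED];
    size_t n;
};

/* Record one symlink, identified by device and inode number, as visited.
 * Returns 1 if it is already on the trail (a loop), 0 if it is new,
 * -1 if the trail is full. */
int trail_visit(struct link_trail *t, unsigned long dev, unsigned long ino)
{
    uint64_t bit = (uint64_t)1 << ((dev * 31u + ino) % 64);
    size_t i;

    assert(t != NULL);
    if (t->mask & bit) {
        /* Possible repeat: hashes can collide, so confirm exactly. */
        for (i = 0; i < t->n; i++)
            if (t->seen[i].dev == dev && t->seen[i].ino == ino)
                return 1;
    }
    if (t->n == MAX_TRACKED)
        return -1;
    t->mask |= bit;
    t->seen[t->n].dev = dev;
    t->seen[t->n].ino = ino;
    t->n++;
    return 0;
}
```

With the exact list behind it, the result no longer depends on which inodes happen to share a bucket, so the behaviour stays predictable: a lookup fails only on a true repeat or when the fixed cap is reached.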

John Owens General Electric Company - Charlottesville, VA
j...@edison.GE.COM old arpa: jso%edison...@seismo.CSS.GOV
+1 804 978 5726 old uucp: {seismo,decuac,houxm,calma}!edison!jso

System Mangler

Oct 26, 1986, 5:07:37 AM

In article <83...@sun.uucp>, g...@sun.UUCP writes:
> > why didn't they [Berkeley] just make symbolic links a variant
> > type of directory entry, containing a string instead of an
> > inode number?
>
> Because that would have required non-trivial changes to programs that read
> directory entries,

Many directory-reading programs (ls, tar, find) had to gain explicit
knowledge of symbolic links anyway, the changes were even user-visible.

g...@sun.uucp

Oct 26, 1986, 5:28:34 PM

> Many directory-reading programs (ls, tar, find) had to gain explicit
> knowledge of symbolic links anyway, the changes were even user-visible.

And many didn't. If a new type of directory entry were added, *every*
directory-reading program would have to gain explicit knowledge of symbolic
links, and the change would be more complicated (with symbolic links as they
are, the *directory-reading code* didn't have to change, other than
mechanically replacing explicit "read"/"fread"/whatever calls with "readdir"
calls, etc.). Furthermore, "fsck" would have to be taught about these new
kinds of directory entries (not just about new kinds of inodes, as was the
case with the current symbolic link implementation), as would a bunch of
other utilities that know about file system formats.

I presume they just decided the added benefits weren't worth the hassle.

Chris Torek

Nov 1, 1986, 6:16:25 PM

In article <85...@sun.uucp> g...@sun.UUCP writes:
>I presume they just decided the added benefits weren't worth the hassle.

What are the benefits supposed to be again?

Faster name translation? The namei cache takes care of that.
Indeed, the extra code required to skip over the proposed extra
directory entry when scanning for other names might have more of
a slowing effect than any gain provided by having the contents of
the link at hand.

Less disk space used? Perhaps; but not much, not unless you have
a large number of symlinks.

Fewer inodes used? How often does one run out of inodes? newfs is
very conservative about inode allocation.

All in all, I think writing the contents of the link through an
inode is cleaner. It certainly helps keep namei, already a large
and ugly routine, from being larger and uglier.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP: seismo!umcp-cs!chris
CSNet: chris@umcp-cs ARPA: ch...@mimsy.umd.edu

Guy Harris

Nov 3, 1986, 2:57:05 AM

> >I presume they just decided the added benefits weren't worth the hassle.
>
> What are the benefits supposed to be again?

Well, to quote from the article that claimed they had added benefits:

> From: man...@cit-vax.Caltech.Edu (System Mangler)
> Newsgroups: net.unix,net.unix-wizards
> Subject: Re: Are links as useful as they could be?
> Message-ID: <10...@cit-vax.Caltech.Edu>
> Organization: California Institute of Technology
> Summary: symbolic links shouldn't have been inodes

> In article <21...@rochester.ARPA> k...@rochester.UUCP (Comfy chair) writes:
> > I don't like symbolic links, there are some warts, like having to check
> > for looping, but I can't think of anything better.
>
> Warts... you can't chmod, chgrp, utime, or link them.
> The access time never means much, because doing an
> "ls -l" to see it has the side effect of changing it.
>
> Symbolic links are too expensive to use freely. They take up
> an inode and 1K of disk space, just to hold a few characters.
> They carry all the baggage of a regular inode (atime, mtime,
> links, owner, group, mode) but you can't make proper use of
> any of it.

Since Don Speck was a defender of the "symbolic links as special directory
entry" idea, while I was not, I'll let him argue the point further. Note,
however, that one of the objections - the first one listed - was not one of
resource consumption, but one of transparency.