I had the distribution archives in /dist under "kfs" and decided to do
the following:
% cd /dist
% mkdir plan9
% mv *.9gz plan9
the machine is still busy copying plan9.9gz :-(
In this case, a "rename" would have been appropriate. Is it not
possible to determine whether it would work and resort to cp+rm if
it doesn't?
In fact, a rename command would be an adequate compromise. I see
there's a "kfs" command called "rename" but the man page does not
specify its actual scope. Should I have used that (one file at a
time) instead?
++L
How would you determine whether it would work?
How would you specify it to the underlying 9P server?
Don't use kfs rename; it doesn't move files
between directories.
Russ
mv does try to rename (uses dirwstat, I think), but you can only rename within
the same directory; check mv.c. There was a discussion a while back about how
hard cross-directory renames are.
Sorry to be pig-headed about this, but I guess I'm spoilt (Boyd,
none of your sarcasm, please) with NetBSD doing it all for me.
Thing is, I may be representative of a largish community of spoilt
users, but there are also other considerations, for example, there
may not be enough space for the copy, a situation made worse by
the presence of a large time window in which race conditions can
occur.
Russ asks a pertinent question, how does one tell? I'm wondering,
not being too good at the innards of Plan 9 and/or the filesystem,
whether it would not be worth sacrificing the cleanliness of the
filesystem to the ability to know. Yes, it is a slippery slope:
once you can tell, why not move directories around, and so forth.
All I can say in self-defence is that user convenience is being
sacrificed here (I ought to know, but I can't remember how the
various Windows flavours deal with the problem) and the sacrifice
may be greater than the gain in simplicity.
I'm sure I'm wrong, but I never quite understood the underlying
limitation, so perhaps I can be pointed in the right direction and
I'll shut up for good.
++L
LDR> All I can say in self-defence is that user convenience is being
LDR> sacrificed here (I ought to know, but I can't remember how the
LDR> various Windows flavours deal with the problem) and the sacrifice
LDR> may be greater than the gain in simplicity.
win2k doesn't copy the file
It could be done; it just depends on what you'd like to pay for it.
It's just code, after all.
You'd need to create a new file system message (Tmv) and a new system
call (mv). The kernel would have to start by walking both paths and
determine if they came from the same mount point, error if not. It would
then send a Tmv message with 2 fids. The file system would determine
if it could do it and return an error if not (might not be space, ...).
The file system structure would also have to be changed to accommodate it; we
don't have links, though the copy-on-write makes that not too terrible.
We consciously traded simplicity for it. It hasn't bothered us enough to
regret it, perhaps because a real file server is faster than kfs, perhaps
because we don't do a lot of file moving.
If you think otherwise, that's why the source is open. Have a look and
see what you'd need to change and then convince others that the change
is worth it. Implementations are no big deal. New file system messages
are, since you have to change lots of file servers to go with them.
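In user space, the rough shape of the "same server" test might be this
(a sketch only: comparing the type and dev fields returned by dirstat only
approximates "came from the same mount point", and the helper name is made up):

#include <u.h>
#include <libc.h>

/*
 * Sketch only: a crude user-level approximation of "same mount
 * point" -- stat both paths and compare the type and dev fields,
 * which identify the serving device.  This is not the kernel check
 * a Tmv would need; the helper name is made up.
 */
int
sameserver(char *a, char *b)
{
	Dir *da, *db;
	int r;

	da = dirstat(a);
	if(da == nil)
		return -1;
	db = dirstat(b);
	if(db == nil){
		free(da);
		return -1;
	}
	r = da->type == db->type && da->dev == db->dev;
	free(da);
	free(db);
	return r;
}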
> I had the distribution archives in /dist under "kfs" and decided to do
> the following:
>
> % cd /dist
> % mkdir plan9
> % mv *.9gz plan9
>
> the machine is still busy copying plan9.9gz :-(
>
> In this case, a "rename" would have been appropriate. Is it not
> possible to determine whether it would work and resort to cp+rm if
> it doesn't?
This is indeed what happened: mv determined a rename wouldn't work and
fell back to cp+rm.
Presotto certainly covered the important facets rather nicely; it
would be a pity to drop that particular ball now.
In passing, I can't easily find a reference to the 9P2000 preliminary
information; could someone post a URL? I seem to remember some
sort of version information being designed in which would certainly be
helpful.
++L
PS: Dave, nice domain name.
--jim
like the new email address.
phil
In addition, it may be more consistent for mv not to go partway
and do a cp+rm on my behalf _on_condition_ that the request is not
recursive, but rather not to do it at all. But I suppose my MSDOS
background shows here; the "mv" I'm thinking of is "REN". That, I
ought to be able to fix by crippling "mv".
What wasn't clear from Dave's response is whether we would need
a "Tmv" to do it all, including recursive copies that (shudder!)
may need to understand the entire namespace, or just a "Tren" that
may still change the namespace, but at least won't have to make
decisions on how to follow mounted resources.
I should imagine that BSD "rename" must have been made orders of
magnitude more complex by symbolic links. After all, SysV 3.2 "mv"
could rename directories into other directories in the same
filesystem, couldn't it? It's been a long time since my 3B2 was
turned off, so it's hard for me to check. I'm almost certain SCO
Xenix and Unix could do it without copying.
++L
On Mon, 8 Oct 2001, Lucio De Re wrote:
> On Sun, Oct 07, 2001 at 12:23:59PM -0400, j...@plan9.bell-labs.com wrote:
> >
> > As Dave points out, there are a lot of balls in the air during
> > an atomic 'rename'. 4.2BSD introduced the 'rename' system call,
> > and as an alpha/beta tester for 4.1[abc] and 4.2BSD I can testify
> > to how long it took to get it right and how much ugly code was
> > involved.
So can anybody who has seen current 4.4BSD derivatives. _None_ of them
got it right. The rename()/rename() race (creating a loop and detaching it
from the rest of the tree) is still there.
> There are really only two social factors overriding here. People
> like me have grown to expect it, unfortunate as it may be, and the
> alternative is inconsistent and open to error, besides being much
> more time and resource intensive.
> I should imagine that BSD "rename" must have been made orders of
> magnitude more complex by symbolic links. After all, SysV 3.2 "mv"
> could rename directories into other directories in the same
> filesystem, couldn't it? It's been a long time since my 3B2 was
Same race + a metric buttload of other fun stuff.
And then there is _really_ fun stuff in userland. Like find / mv and
rm -rf / mv races that allow any user to create a deep tree in /tmp,
wait for a cron-run script to start removing it and do something along
the lines of mv /tmp/foo/bar/baz /tmp/quux, tricking find(1) or rm(1)
into walking up into the root and proceeding from there. Works like a
charm on almost all Unices I've seen.
Sure, it's a broken userland and it needs to be fixed whenever such
bugs are found, but the point is that writing a correct tree-walker
that deals gracefully with cross-directory renames is _nasty_.
Often. This is one of the things that most irritates me about Plan 9.
In particular, the oft-cited idiom of using tar to copy the hierarchy
is especially undesirable under Plan 9 because it generates lots of
garbage that will fill up your dump filesystem much faster.
To what extent do you think your expectations and habits are shaped by
the limitations of your tools?
So can you say how often in practice that race condition is exercised?
> > I should imagine that BSD "rename" must have been made orders of
> > magnitude more complex by symbolic links. After all, SysV 3.2 "mv"
> > could rename directories into other directories in the same
> > filesystem, couldn't it? It's been a long time since my 3B2 was
>
> Same race + metric buttload of other fun stuff.
turning a strict tree into a mesh/DAG is always fun. But it's also possibly
useful for some people. Me? I liked the Newcastle Connection but we don't
have /.../ so I guess I lost. Namespaces might be in the same spirit.
I understand it's a deeply held religion, but what IS it with runtime/use-time
reference by name which people find so offensive in the filesystem apart from
implementation complexity? We have call by name/value/reference in code, why
not in namespaces?
-George
--
George Michaelson | APNIC
Email: g...@apnic.net | PO Box 2131 Milton QLD 4064
Phone: +61 7 3367 0490 | Australia
Fax: +61 7 3367 0482 | http://www.apnic.net
++L
Because I'm steeped in SysV (which is where I cut my Unix teeth), I
wouldn't find it hard to use "find . -print | cpio -pvdm dest" as
taught in one of the man pages in my early days. So, to answer your
question truthfully: "Never".
++L
On Mon, 8 Oct 2001, George Michaelson wrote:
>
> > So can anybody who had seen current 4.4BSD derivatives. _None_ of them
> > got it right. rename()/rename() race (creating a loop and detaching it
> > from the rest of tree) is still there.
> >
>
> So can you say how often in practice that race condition is exercised?
Any time the attacker feels like it. A system where nonprivileged users
can cause filesystem corruption is broken. Period.
Umm yes, but Alexander, when was the last time you *saw* one of these?
If I understand the above correctly, my take is that symbolic links
are totally unchecked filesystem GOTOs and deserve a much worse fate
than they get. Especially as they can point to the void, and often do.
As to the rules (-L) for when they are dereferenced or not, that's a
morass of confusion.
++L
++L
Riiight, but that was designed in. It's a goal. They wanted that behaviour.
> As to the rules (-L) for when they are dereferenced or not, that's a
> morass of confusion.
No argument. That could have been done so much better.
But the idea of a filename, booked into the filename space, which says
"I may not resolve all the time, but if I do, I point to <x>"
isn't so inherently evil to me. It seems to me it mimics real-world behaviours
rather well.
Pointing out M$ has .LNK isn't going to help my cause much, is it :-)
cheers
-George
Are you saying that this problem demonstrably exploited the race condition
between cp/mv and rename as implemented in FreeBSD?
I really do mean the question as put:
when was the last time anybody saw a successful exploit of this race condition
or an unstable filesystem they can show came from it, exploit or accident?
I have seen many problems with UFS/FFS, and Softupdates gave me the willies
but I have also not yet seen serious corruption of the on-disk state which
lies directly with problems in the FS code itself. Side-effects of kernel
crashes during meta-state updates, sure. But this sounds to me like FUD which
in practice doesn't exist.
You could probably argue half a million potential race conditions exist in
lots of systems. The frequency with which they occur is different.
Now, now. It's nowhere near the nastiness of remote root compromise, but
yes, "nobody will ever try to screw me" is exactly the attitude that made
them possible.
As for the original question - two weeks ago, when I demonstrated the
effect to a guy who claimed the OpenBSD kernel was "bulletproof". OTOH, in
the case of OpenBSD it's one of the mildest problems - there, being able to do
rfork(RFPROC) means being able to cause a kernel panic (races between
fstat()/close(), dup2()/close(), write()/close() - you name it).
On Mon, 8 Oct 2001, George Michaelson wrote:
> Are you saying that this problem demonstrably exploited the race condition
> between cp/mv and rename as implemented in FreeBSD?
Yes. ufs_rename() is racy and yes, that race is wide enough to be
exploitable. BTDT.
> I really do mean the question as put:
>
> when was the last time anybody saw a successful exploit of this race condition
> or an unstable filesystem they can show came from it, exploit or accident?
>
> I have seen many problems with UFS/FFS, and Softupdates gave me the willies
> but I have also not yet seen serious corruption of the on-disk state which
> lies directly with problems in the FS code itself. Side-effects of kernel
> crashes during meta-state updates, sure. But this sounds to me like FUD which
> in practice doesn't exist.
process 1: current directory in /tmp/a/a/a/a/a, does
rename("/tmp/b/b", "a");
process 2: current directory in /tmp/b/b/b/b/b, does
rename("/tmp/a/a", "b");
Normal outcome: first process to do rename() succeeds, second - fails
with ELOOP. With the right timing _both_ succeed, creating a loop and
detaching it from the rest of filesystem.
Notice that use of relative pathnames is critical here - otherwise
lookup in the second rename() will block on the lock acquired by the first
one. Code in /sys/ufs/ufs/ufs_vnops.c implicitly assumes that lookups
ending in descendants of a directory will have to pass through that directory.
That assumption is obviously false - namei(9) can start in a descendant of
the directory in question.
And no, it's not too narrow - the window includes quite a bit of disk
IO. Figuring out the details of turning that into a full-blown attack (i.e.
what should be done to widen the window) is left as an exercise to anyone
who can RTFS - it's pretty straightforward.
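For concreteness, the setup is just this (a sketch of the two calls only;
the timing needed to actually widen and hit the window is omitted, and it
assumes /tmp/a/a/a/a/a and /tmp/b/b/b/b/b already exist):

/*
 * Sketch of the two racing calls above, not a working exploit:
 * note the relative destination names, which is what lets the
 * second lookup avoid the first rename's lock.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
	if(fork() == 0){
		/* process 1: cwd deep in /tmp/a, relative destination */
		if(chdir("/tmp/a/a/a/a/a") < 0 || rename("/tmp/b/b", "a") < 0)
			perror("process 1");
		exit(0);
	}
	/* process 2: cwd deep in /tmp/b, relative destination */
	if(chdir("/tmp/b/b/b/b/b") < 0 || rename("/tmp/a/a", "b") < 0)
		perror("process 2");
	return 0;
}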
One valid point is that computers are able to mimic real life more
closely, but I'm of the school that was trained on deterministic
behaviour.
Non-deterministic behaviour is a curse and a sufficiently vast
computer system is non-deterministic. Dijkstra had trouble with
interrupts, but I believe they could be assimilated into a
deterministic framework; modern bloat is orders of magnitude too
complex for such treatment.
There's a strange commercial strength in non-determinism, basically
getting the luser to accept failure as if it were essential. I
suppose that was the M$ genius: don't let the user expect perfection
in anything but the _next_ generation of software.
++L
What's this, can't the Plan 9 Thought Police take a little criticism? :-)
More seriously, an extreme response would be something like giving
up Plan 9. If anything, I am using it more often these days.
That said, there are numerous irritations. Some are architectural
and unlikely to be fixed, others are easy to fix. Here's a sample:
* The existence of non-network-transparent kernel devices like /srv
which don't work when accessed by exportfs on behalf of a remote machine.
* The need for /$objtype/bin/ape/psh, which is a #!/bin/rc script.
(Exercise for the reader: explain why this exists even though the
same script can also be found in /rc/bin/ape/psh.)
* Lack of long file names. (Someday to be fixed in 9P2000.)
* Other arbitrary limits in programs. Everyone is familiar with line
length limits in various utilities. Lots more obscure ones, like
max 512 files in ramfs. Avoiding arbitrary limits is something the
GNU folk got really right, because one person's "this is a reasonable limit"
is another's "how could those morons set such a low limit?" It's easy
to get right and you should Just Do It.
* rio windows are not searchable, but on the other hand acme "win" windows
won't let you run graphics programs.
* Lack of find and xargs. I don't need creeping featurism like "chmod -R",
but there should be *some* mechanism for mapping commands over the
file hierarchy.
* Anemic functionality in diff, and lack of patch. For example, GNU
diff will at least tell you whether binary files differ; Plan 9 diff
just gives up.
* Bad release engineering. For example, in /sys/src:
"mk all" builds the commands in /sys/src/cmd, but doesn't
build the commands in subdirectories of /sys/src/cmd.
(you need to also go to /sys/src/cmd and do "mk all.directories")
why?
"mk clean" doesn't remove every generated file. try (say)
"find . -mtime -1 -type f -print" after rebuilding the world.
(oops, forgot--Plan 9 doesn't have "find".)
"mk all" installs some things even though there is a
separate "mk install" target for installing. the lib*
mkfiles are particularly annoying in this regard.
* Poor support for symbolic debugging. This one has always baffled
me because Bell Labs had an amazingly pleasant debugger called "pi"
under v10.
* Poor memory management. On my 1GB desktop system, a ridiculous amount
of kernel memory gets allocated to pixmaps, but kfs still uses only 2MB
for file buffers. (Yes, I know I'm supposed to use the dedicated file
server... someday soon.)
* Lack of a web browser. Sigh, even those of us who should know better
get caught. ("Hey man, you've got to try this stuff... don't worry,
you can't get addicted if you just try it once!")
* /lib vs. /sys/lib
> More seriously, an extreme response would be something like giving
> up Plan 9. If anything, I am using it more often these days.
>
You're right, of course. And it's good that Bell Labs released Plan 9
to a more critical audience, even though we do occasionally get their
goat with our requests.
In your list, a few issues would be very easy to correct, and I'm sure
the right amount of feedback (call it code, if that's what it takes)
will be accepted. So, probably, would be an attempt to provide a
semi-sane "rename" capability.
Looking at Russ making progress on so many fronts, I would say we can
all vote with our code. Even to the extreme of re-introducing ALEF,
if I were willing to complete the port to 3ed.
++L
> How would you determine whether it would work?
> How would you specify it to the underlying 9P server?
We had this problem in the Hurd, and the answer was to give one of the
servers for the two names (it doesn't matter which) your capability
for the other server.
Then if both are handled by the same server (what Unix thinks of as
"same filesystem") it can be done directly. If not, the server could
return EXDEV (as Unix does). Or, two cooperating servers might have a
private protocol for arranging it between them.
I don't know whether this would work in Plan 9, though I'm interested
in hearing details. Can you hand your capability for one server off
to the other?
> As Dave points out, there are a lot of balls in the air during
> an atomic 'rename'. 4.2BSD introduced the 'rename' system call,
> and as an alpha/beta tester for 4.1[abc] and 4.2BSD I can testify
> to how long it took to get it right and how much ugly code was
> involved.
Color me confused, but isn't it a jillion times harder to get it right
in user space?
We should be careful to distinguish between a utility command,
which usually is best if it works as widely as it can to do a
simply described function (from the user point of view; it might
be complicated to implement), and a fundamental operation in the
protocol itself (I/O primitive).
I think it is reasonable for the 9P200x protocol to have a
Trename request, which of course can fail if the operation is
not possible or makes no sense for the particular object.
Then it is up to the (file) server to make it happen; the
important thing is that the file server need only use local
locks while it shuffles things around (if precached, it could
be serviced as an atomic operation).
There is the question of what permissions should be required
for a rename to succeed: write access on the directory, and/or
on the named object itself? This may be the same question as
whether the rename ought to be for a handle in the directory
(i.e. Trename is an operation on a directory) or for a handle
on the named object.
I am inclined to think rename ought to work within a directory
but not in general between distinct path prefixes.
I frequently move large trees around, but often they are not within
the same filesystem.
> So can anybody who had seen current 4.4BSD derivatives. _None_ of them
> got it right. rename()/rename() race (creating a loop and detaching it
> from the rest of tree) is still there.
I'm pretty sure you're incorrect. Can you describe the case in more
detail?
I'm pretty sure of the opposite, and unfortunately the *BSD source happens to
agree with me. Consider the scenario I've described upthread (s/ELOOP/EINVAL/
- sorry about the braino) and think what happens if both renames get to
ufs_checkpath() simultaneously. See /sys/ufs/ufs/ufs_vnops.c::ufs_rename()
for details. Notice that none of the lookups is going to come anywhere near
the vnodes affected by the other rename(), and ufs_checkpath() is called with
fvp unlocked and it unlocks dp (the parent of fvp).
They spent quite a while in rename hell.
> I think it is reasonable for the 9P200x protocol to have a
> Trename request, which of course can fail if the operation is
> not possible or makes no sense for the particular object.
> Then it is up to the (file) server to make it happen; the
> important thing is that the file server need only use local
> locks while it shuffles things around (if precached, it could
> be serviced as an atomic operation).
>
I think there's growing agreement on this score. I certainly feel
it's justified. But Jim's reservations about race conditions have to
be taken into account. Alexander seems to think BSD did it wrong (did
Linux improve on that?) but didn't suggest it can't be done.
> There is the question of what permissions should be required
> for a rename to succeed: write access on the directory, and/or
> on the named object itself? This may be the same question as
> whether the rename ought to be for a handle in the directory
> (i.e. Trename is an operation on a directory) or for a handle
> on the named object.
>
These are rules; they need to be established and documented (and make
sense, of course). But that's about it, really.
> I am inclined to think rename ought to work within a directory
> but not in general between distinct path prefixes.
My choice would be the point where copying the data becomes necessary.
Whatever operation doesn't need to relocate the actual file contents
ought to succeed. But I can't say I understand all the subtleties
involved.
++L
This is because the file system isn't a tree and hasn't been for a very
long time. It's a graph, but people still talk about it as though it were
a tree and write simple recursive algorithms and get themselves in
trouble all the time.
-rob
In my ignorance, I presume that only the root needs to move around and
the rest (hopefully acyclically) will remain unchanged. Does the
possibility exist that we may be creating loops?
++L
Do you? A mv-tree thingy would require the server to know the name
space of the client to get this right. The server doesn't know that
one of the files in the client's tree is somewhere else. I honestly
don't see a reasonable way to do this right, even if we don't worry
about race conditions (and we do).
-rob
Actually, there's another fun issue here:
bind /foo/bar/baz /quux
cd /quux/crap
mv /foo/bar/baz/crap /foo/bar/crap
cd ..
Where should we end up? I have a somewhat reasonable answer for our
semantics of bindings, but I don't see one for the Plan 9 semantics.
there are probably many ways to mess with your own
namespace. but why should the system care? after
all it's your own process-local namespace.
I do it all the time (at least 5, 10 times a month) on Linux, because
I'm kind of compulsive about maintaining just the right names for data
files I care about.
The point is not that it is necessary, but rather that millions are
already used to being able to do it, and letting them do it on Plan 9
(provided of course it is implemented cleanly) will lower the obstacle
to their becoming Plan 9 users.
But Plan 9 is not Unix and the more one tries to make it look like
Unix, feel like Unix, or quack like Unix, the more grief one comes to.
Under Unix I use mv to move trees around fairly frequently (as often
as once or twice a day), although at least half the time it is across
partitions so the point is moot anyway. The urge to do this
disappeared pretty quickly when I started using Plan 9 to the
exclusion of Unix this summer. I'm more of a neat-freak when it comes
to directory structure than most and I still didn't feel the loss.
Perhaps having a real fileserver makes the difference, but I'd suggest
living for a while with Plan 9 as Plan 9 and not Unix and see if you
still find it intolerable. I think the biggest obstacle to using Plan
9 is knowing that it was built in the same room as Unix.
-WJ
it is to me. how do i know if it does or doesn't? what's the condition?
what's the behaviour if it doesn't? hell, what's the behaviour if it _does_
resolve? questions of whether to follow links or not... it's just never
seemed worth it. the _only_ time i've missed symlinks in Plan 9 is in
importing packages from elsewhere. this simpler version of what you
have above, which _wouldn't_ seem inherently evil to me, would be:
"I point to <x>, where x is a set of data."
much simpler, much easier to implement (uh, like, files?), much more
consistent, and thus easier to understand. even multiple hard links
under unix satisfy this: if it's there, it points to a certain set of data.
no conditions on that.
// Pointing out M$ has .LNK isn't going to help my cause much is it
no, not likely. it's been a while since i've poked at any Windows box,
but under at least '95 and '98revA, drop into a shell prompt and try
to cat^Wtype foo.lnk. it's not even built into the FS, it's interpreted by
the tools, like explorer.
-.
Moves that require data to be transferred are forbidden in my model
(they can be handled by user-space operations, I suppose) and damage
to the namespace is the user's responsibility.
Again, in my ignorance, I assume that one would create an anonymous
node at the destination and point it (in some fashion I really
don't know enough about, but I'll be pleased to have my nose rubbed
into the details of, whichever way) to the fileserver entity
originally pointed at by the source. The original connection would
then be removed and the name transfer would take place (roughly). Please
educate me on this, reading the documents and man pages didn't make
the picture any clearer.
I certainly can't agree that the fileserver won't know that a file
is elsewhere: if the nameserver can't tell which files it serves
we have a serious problem. Now, if by the time the fileserver
considers a file it has an id that looks like a local one, we
certainly have a crisis, but I'm hoping that the kernel can prevent
this _at_the_root_ of the rename. Yet I hear continuous references
to the remainder of the graph, which puzzles me because I don't
see rename navigating the graph at all. I'm sure I'm wrong, so
please tell me where.
As for Alexander's example, I really need to look at it more
carefully, I don't understand the details at all.
++L
rio windows are searchable, there just aren't convenient pre-packaged ways
to do it. try `sed 10q /dev/text' from a rio window. "Look" in rio would be
very nice; having /dev/text would be, as well.
// ...acme "win" windows won't let you run graphics programs.
there was work on correcting this, no? i seem to remember rob having a
version of acme that would at least allow one to display static images
within acme. did this ever get any further?
// * Lack of find and xargs.
i've missed find several times. xargs always struck me as compensating for
poor shell design. i still think find would be useful, although a massively
stripped-down version from what's in most unixes would be more than
enough. about 90% of my find usage has been replaced with the fragment
`{du -a . | awk '{print $2}'} and i think a shell script built around that, ls -l,
and grep could be all the find i ever need.
// * Anemic functionality in diff, and lack of patch. For example, GNU diff
// will at least tell you whether binary files differ; Plan 9 diff just gives up.
cmp(1) works fine on binary files. what else are you looking for in diff? i
bind adiff into my /bin even when not in acme, to plumb from anywhere.
// * Poor memory management. On my 1GB desktop system, a ridiculous
// amount of kernel memory gets allocated to pixmaps...
kernel memory is a tunable parameter in plan9.ini(8); see *kernelpercent.
i'm not sure that's what you're talking about, though. are you looking to
adjust individual pools (image, heap, main), like you can in Inferno?
// * Lack of a web browser.
in Inferno, see charon(1). i use it for about 90-95% of my web use. of the
sites it has problems with, the majority (in my experience) are stupid sites
doing bogus checks to make sure i've got appropriate capabilities (most
commonly 128-bit SSL) by checking my browser version, and complaining
if it's not certain version of IE or Netscape. Charon lets you set the agent
string, but that leads to other problems. i've sometimes missed better
JavaScript support, but never Java support.
i use VNC to a remote Solaris box with Netscape installed in the 5-10%
where Charon won't cut it.
// * /lib vs. /sys/lib
i maintain that this is a good distinction that just hasn't been followed.
maybe the names could have been chosen better, but i think there's
good reason to separate a repository of information (/lib) from system
configuration info (/sys/lib). i don't, for example, want the kana tables
or constitution in the same place as system-wide ssh or plumbing
configuration info. of course, Plan 9's already broken this (several things
in /lib, like /lib/namespace, seem like they belong in /sys/lib), but
that doesn't indicate the idea's flawed.
-.
1) avoiding a copy of the data and perhaps some of the metadata
2) on a recursive call, avoiding a message per file
3) getting it right if two people/programs do it simultaneously
The first 2 are performance, the last is adding something that
you can't do with our current mv.
(3) seems to be solving a problem in the wrong place.
If two people simultaneously move a graph, the result is going to
be confusing/scary even if everything resides in the same file
server and we somehow manage to lock the whole file server while
it happens, since in the absolute best case one is going to see
his graph inexplicably disappear. This is more a social problem than
a technical one. If programs are going to do it, they'd better
set locks somewhere and have some agreement on what name space is
being looked at.
In Plan 9, as rob pointed out, there's no way
for the file server to know what your name space is.
Therefore, the application or kernel has to walk the whole
subspace being moved and then try to create a set of requests
to the relevant file servers. Not only will this set not
be locked but you'll have done a large amount of walking
to figure out what to do. Because of this, I think we're stuck with
forgetting about (2) and (3) without majorly changing how mounts
work.
(1), on the other hand, may be possible. It still requires that
you walk both paths and make sure they end up on the same
server. Then you can have a message that says 'mv fid1 to fid2'.
I think that that's the best you can do, i.e., file by file.
You might improve that to a directory mv but you'd have to
know that nothing was mounted below the source directory.
That is probably more trouble than it's worth.
The file server can always say 'error - unimplemented' and
you can fall back to the create/copy/rm. That way you
don't have to require exportfs, ftpfs, upas/fs and a
slurry of other impossible to understand servers to
change.
By the way, think about the following scenario:
mkdir x
> x/y
bind -a x .
mv y z
What do you want to happen here? Should y change directory when
renamed to z or not?
On the whole, I don't think it worth the trouble. As the above
example points out, the result of mv is already pretty confusing
in our namespace. Whatever solution you choose should make things
less confusing, not more.
To increase the usefulness of renamefs, we allow it to store 2 types of
records. Specifically, a record can take the form
D<path>,
which means that the string <path> "no longer lives here"
(ie, renamefs will act as if <path> does not exist), or it can
take the form
A<path><path2>,
which means that <path2> will get rewritten as <path> and does not
carry the implication that <path> no longer lives here. (Yes, yes, I
know: between <path> and <path2> you need some lexeme that does not
occur in paths.)
Then,
Mv <path> <path2>
can be implemented by adding two records, A<path><path2> and D<path>, and
something resembling Unix's
Ln -s <path> <path2>
is implementable by adding one record, A<path><path2>, to the rewrite
table inside renamefs. The main socioeconomic purpose of these features
is to make Unix users feel more at home by providing means to move
directories, means to do inter-directory moves and symlinks. Moreover,
Del <path>,
which just adds D<path> to the rewrite table, is a delete that supports
undelete (via removal of the D record). And resource forks can be
implemented via ... --Just kidding.
No need to modify 9P of course: just have renamefs listen on a specific
file like /foo/rename where foo might be mnt (sorry for my unfamiliarity
with Plan 9 file naming conventions) and have mv write requests
(candidates already in the form of strings for entry into the rewrite
table) to /foo/rename (rather than having mv send rename or copy and
delete requests like it does now).
When everyone is sleeping, or when the user says so, renamefs can empty
the rewrite table by actually copying and deleting files in the
underlying filesystem. Call this operation "update".
There are multiple ways I'm sure to create pathological situations with
renamefs, and I do not know enough about Plan 9 to address them, except
to point out that one way to recover from pathology --assuming the
update operation has not run yet-- is simply to delete the rewrite table
via, eg, echo clear >>/foo/rename or, if the namespace is so screwed up
that that is impossible, to reboot without mounting renamefs.
Advantages of my design:
Users who do not need the functionality do not pay any cost: the code
for renamefs and its clients (eg, the new mv command) lie dormant on
the hard drive where they cannot cause problems. No modification
of extant code is required.
Makes Plan 9 look more like Unix for those who want that sort
of thing, and yes I hear loud and clear that many here do not.
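For concreteness, the lookup against the rewrite table could be as small as
the sketch below (every name here is made up; matching is exact, and a real
renamefs would iterate to handle chained renames):

#include <u.h>
#include <libc.h>

/*
 * Sketch of the rewrite-table lookup described above.  An A record
 * maps the new name (path2) back to where the data still lives
 * (path); a D record marks a path as "no longer lives here".
 */
typedef struct Arec Arec;
struct Arec {
	char	*path;		/* where the data still lives */
	char	*path2;		/* the name clients now use */
};

/* example table: the result of "Mv /dist/plan9.9gz /dist/plan9/plan9.9gz" */
static Arec atab[] = {
	{ "/dist/plan9.9gz", "/dist/plan9/plan9.9gz" },
};
static char *dtab[] = {
	"/dist/plan9.9gz",
};

/* path to hand to the underlying file system, or nil if "deleted" */
char*
rewrite(char *req)
{
	int i;

	for(i = 0; i < nelem(atab); i++)
		if(strcmp(req, atab[i].path2) == 0)
			return atab[i].path;
	for(i = 0; i < nelem(dtab); i++)
		if(strcmp(req, dtab[i]) == 0)
			return nil;
	return req;
}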
Somebody please fix this. All you need
to do is implement a recursive union mount.
We will all be grateful.
> * Lack of long file names. (Someday to be fixed in 9P2000.)
9P2000 doesn't allow long file names. You can't have names bigger
than 65536 bytes.
> "mk clean" doesn't remove every generated file. try (say)
> "find . -mtime -1 -type f -print" after rebuilding the world.
> (oops, forgot--Plan 9 doesn't have "find".)
"mk nuke" does, no?
i thought overlayfs was supposed to do something like this?
maybe it never worked...
> "mk nuke" does, no?
while we're on the subject of removing intermediate files and
criticisms, one of my pet peeves is that object files have a
single-letter extension, thus at a stroke removing many potentially
useful filename extensions and making the choice of a new $O fraught
with danger...
there's no easy way of removing all object files from a directory
without enumerating the whole possible (and changing) set of objtype
extensions.
would it be an enormous hardship to change from
file.$O
to
file.o$O
?
and then at least i can remove all object files in reasonable
comfort and safety with "rm *.o?"...
rog.
geoff got stitch mostly working before he
left, but i haven't gone back to it.
stitch is rather heavyweight for something
that you probably want in every person's
namespace. (for instance, you probably want
stitch -a /rc/bin /bin
stitch -a /$objtype/bin /bin
so that you can do away with /$objtype/bin/ape/psh).
i'm convinced the kernel should do it,
but the implementation is hard. however, once
you have it working, you could write an nfafs
to replace libregexp: just mount the regexp and
then try to open /mnt/regexp/t/e/x/t/t/o/s/e/a/r/c/h.
Kenji
my stance is that OS designers should learn about
capability OSes like EROS and that Plan 9 should
be made more attractive to people used to Unix's mv
command. do you detect an inconsistency in
that?
The other question I have: is all this worth it just to avoid mv
doing a cp+rm? Beware that even if you could move, the
worm would burn a copy unless you also modify it.
Of course, if 9 doesn't have them, it doesn't have the problem either.
-George
> cmp(1) works fine on binary files. what else are you looking for in diff? i
> bind adiff into my /bin even when not in acme, to plumb from anywhere.
I want diff to DTRT for whatever kind of file it is.
Indeed, line-by-line diff can even work for binary files! You have to
make sure you escape things appropriately so that patch (or ed, or
whatever) can deal, but the case is not somehow inconceivable. (Of
course, for binary files a diff will often be as large as
the original: and I don't so much mind just falling back to cmp.)
> // * Lack of a web browser.
>
> in Inferno, see charon(1).
Inferno != Plan 9.
There is no web browser in Plan 9---and that's got to be a big
obstacle.
> I'm pretty sure in the opposite and unfortunately *BSD source
> happens to agree with me. Consider the scenario I've described
> upthread (s/ELOOP/EINVAL/ - sorry about the braino) and think what
> happens if both renames get to ufs_checkpath() simultaneously. See
> /sys/ufs/ufs/ufs_vnops.c::ufs_rename() for details. Notice that
> none of the lookups is going to come anywhere near the vnodes
> affected by another rename() and ufs_checkpath() is called with fvp
> unlocked and it unlocks dp (parent of fvp).
It would certainly be a bug if two processes got to ufs_checkpath
simultaneously.
My recollection is that BSD 4.3 contained code to serialize directory
renames, which solves the problem quite nicely. I can't find an
analog in NetBSD (which of course has totally changed the organization
of the filesystem code since BSD 4.3).
It's adequate to do this on a per-filesystem basis (assuming you
prohibit cross-device renames), so the Hurd can conveniently do it in
individual servers (and does); Plan 9 could do the same I would
suspect.
Thomas
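The serialization itself is structurally tiny; here is a toy sketch
(everything in it is made up for illustration: inside a real kernel or file
server all renames would share the one lock, so the ancestry check and the
rename could never interleave, and the lexical check below ignores "..",
links and so on):

#include <pthread.h>
#include <stdio.h>
#include <string.h>

static pthread_mutex_t renamelock = PTHREAD_MUTEX_INITIALIZER;

/* is path "child" lexically inside directory "dir"? (toy check) */
static int
isunder(const char *dir, const char *child)
{
	size_t n = strlen(dir);

	return strncmp(dir, child, n) == 0 && child[n] == '/';
}

/* all directory renames funnel through one per-filesystem lock */
int
rename_dir(const char *src, const char *dst)
{
	int r = -1;

	pthread_mutex_lock(&renamelock);
	if(!isunder(src, dst))		/* refuse to move a dir under itself */
		r = rename(src, dst);
	pthread_mutex_unlock(&renamelock);
	return r;
}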
? It has a root, and in every directory including the root there
are names that refer to things that are either directories or not.
The only thing that would keep those properties from imposing a
tree structure would be the existence of cycles. Sure, it is
possible to construct Moebius poiuyts, but most of the time if
someone wants to invoke a tree walker it is because he *knows*
that no such shenanigans are present. On UNIX variants these
days, there are symbolic links that allow cycles, but all that
is necessary to keep them from causing problems is to flag an
error when path-walking revisits a node that is in the parent
path. Even on pre-7th Edition Unix, it was fairly easy to make
a cyclic structure using (hard) links. But tree-walking has
been common and useful for decades despite those opportunities
for cyclic behavior.
As the WWW has shown, structures don't have to be perfect to
be very useful. We're still waiting for Xanadu..
> i've missed find several times. xargs always struck me as
> compensating for poor shell design.
Isn't the problem the argv limits on the number of characters and number of
words?
> i still think find would be useful, although a massivly stripped-down
> version from what's in most unixes would be more than enough. about
> 90% of my find usage has been replaced with the fragment `{du -a . |
> awk '{print $2}'} and i think a shell script built around that, ls
> -l, and grep could be all the find i ever need.
Didn't Bell Labs have a stat(1) program on Unix which used to print the
return from stat(2), which could then be used to choose the files using
awk, etc.? This was an improvement on find(1) because you had more
flexibility instead of find's built-in language.
Ralph.
On Tue, 9 Oct 2001, Thomas Bushnell, BSG wrote:
> It's adequate to do this on a per-filesystem basis (assuming you
> prohibit cross-device renames), so the Hurd can conveniently do it in
BTDT (see fs/namei.c::vfs_rename_dir() in Linux 2.4.x). But that's far
from the only annoying piece of shit in rename(2) semantics. E.g., you
get very ugly locking rules if you want to handle the case of
rename() removing an empty directory correctly.
Again, I've really been there and done that. It wasn't pretty and I
would be much happier if rename() couldn't change the topology. We
couldn't do that since it would break a lot of userland stuff; Plan 9
has the luxury of dropping annoying crap and being able to find and fix
the resulting breakage in userland. In this case the crap is not there in the
first place, so introducing it would cause a lot of ugliness in the kernel
for no good reason.
> individual servers (and does); Plan 9 could do the same I would
> suspect.
I suspect that Hurd has slightly, erm, different design philosophy (and
that's about the only printable comment I can make on the thing, so let's
not go there).
Inferno runs hosted under Plan 9 and Charon runs in that environment.
we'll be blurring the boundaries further as well, at least for Plan 9.
i don't think so entirely...
for instance if i want to grep through an entire source tree,
i don't want to wait until the tree has been traversed completely
before the text searching begins. using
du -a . | awk '{print $2}' | xargs grep pattern
often gives me results much faster than
grep pattern `{du -a . | awk '{print $2}'}
the latter usually works. i don't know what the kernel
limit is on command-line arguments, but it's more than sufficient
to pass all the source files in /sys/src (6764 files, ~160K).
rog.
if you had a way to unblock/buffer the pipe, would the difference be that
great?
That aside, I suspect I prefer the xargs one because my frontal lobes
are wired for linear tape-processing methods, and the second one requires
understanding function theory and/or two passes to parse the nested clause.
> the latter usually works. i don't know what the kernel
> limit is on command-line arguments, but it's more than sufficient
> to pass all the source files in /sys/src (6764 files, ~160K).
>
> rog.
I.e., if you deliberately avoid the glob expansion problems which bedevil
"ls * | do a bunch of stuff", you don't need find in flat spaces.
xargs is cheap to implement in a shellscript and I have done that too,
because I didn't know xargs existed for a long time. Likewise I use awk
'{print $fieldnum}' because I didn't know cut existed, and my erstwhile
younger and wiser heads find me laughably inept at shell these days.
So is this yet another iteration of 'why write a command when you can write
a script' ?
-George
PS amusingly enough, the same year I was learning about v7 on a pdp11
in Leeds, I had an Armenian guy try and convince me PICK was the way of
the future, and that all coming filesystems would be database abstraction.
I suppose if I'd been malleable enough I'd have bought that too. Still,
we all laughed at the head of department for being a laser physicist. "What
could *he* know about computing?" we said. And more fool us...
--
George Michaelson | APNIC
Email: g...@apnic.net | PO Box 2131 Milton QLD 4064
Phone: +61 7 3367 0490 | Australia
Fax: +61 7 3367 0482 | http://www.apnic.net
Surely hard links couldn't point to directories, so it was only about
alternate paths to leaf objects. That's not the same as a GOTO loop.
>
> As the WWW has shown, structures don't have to be perfect to
> be very useful. We're still waiting for Xanadu..
I entirely agree. I think that symlinks can be viewed as runtime
evaluated conditionals in the filespace. That it means performing
more work than traversing the hard link is true. That it lets
you perform a large number of tasks that want to walk into partitioned
spaces is also true, I think.
That the only extant implementation has severe flaws and inconsistencies
is a damn shame.
Didn't the guys one stack down on the shoulders of the giants decide to
walk away from symlinks because it was hard to get right, more than
because philosophically they were 'wrong'? Unlike with dead physicists,
we get to ask these questions and sometimes get answers more than conjecture.
-George
they could, but (at least by 5th edition) it was restricted to the super-user.
in fact, that was how directory rename was implemented, by setuid mv using
link and unlink. races? argument checking? ``values of [beta] will give rise to dom!''.
the . and .. names were actual links in the directory, put there by the mkdir command.
the rmdir command did the 3 unlinks (for ., .., and the name itself).
Symlinks were put in to make it possible to link directories from user
mode and to have cross-file system links. Plan 9 does both those
things another way. I have never felt the need for symbolic links in
Plan 9.
They also have some ugly properties. I find the delayed evaluation
mystifying sometimes. Also, although they are used to implement
cross-system links, they live on a single file system. For example,
/usr/rob might be a link to /home/rob even though /usr and /home live
on different disks or even different servers, which means /usr has a
file that talks about /home, a file on a different machine. Besides
the creepiness of that, the delayed evaluation can bite you hard, the
permanence of the link can be troublesome, and this little piece of
name space stored as a time bomb somewhere in the network rather than
in the client's representation of its resources is poor
compartmentalization and therefore lousy design. And why on earth does ls
default to showing you the link rather than what it points to?
This does not imply that Plan 9's name space was designed in reaction
to symbolic links, of course, but at least some of us did observe that
symbolic links were little more than a disgusting hack and that there
had to be a better way. Ken was already thinking about that way when
all this was happening back in the early '80s. I believe that the
design he came up with, Plan 9's name space, is indeed better,
although it has some problems of its own and doesn't exactly cover
symlinks (the lack of permanence of the mount table comes to mind).
The unification of all naming operations into a single data structure
(the mount table), the well-defined evaluation, and the clean
separation of file servers and name spaces, and most important the
user-modifiable, per-process name space were all improvements.
As I said in my `dot-dot' paper, Unix still hasn't recovered from all
the problems wrought by the introduction of symbolic links. It is a
lesson, although not a new one, that the ugly, supposedly temporary
hack of symlinks became a permanent fixture of the system, and one
that people argue for passionately, mistaking the addressing of the
issue for a true solution to the problem (c.f. Plan 9's # notation).
-rob
I have an iMac for that. One OS doesn't have to do everything.
Brantley Coile
I see your point.
In renamefs's defense, I note that most users most of the time are
working on their own stuff, not making any changes to the parts of the
network that everyone relies on. If the renamefs is running on your own
terminal or cpued to some cpu server, and you're working on your own
stuff, a crash of the renamefs will not affect the other users.
Ie, do not use renamefs when you are adminning the network.
// There is no web browser in Plan 9---and that's got to be a big obstacle.
given the presence of Charon for Inferno, i doubt it, actually. the point is to
provide a web browser usable within Plan 9, and Charon fills that need
nicely (not totally, but nicely). do you really care that it's a dis program, or
that it's running within a virtual machine? do you think the Mac OS X users
are bothered by the fact that many of their apps run in Classic (OS X's
emulation of OS 9, for the unaware)?
i have a "charon" rc script in Plan 9 that starts up a seperate emu window
of reasonable size and starts charon running in it. the differences between
that and were Charon running "native" in Plan 9 are minimal, to me as a
user. and, more to the point, given the presence of a web browser usable
directly from within Plan 9, as Charon is, i definatly don't think i'd advocate
anyone spending time on that, while there's still plenty about Plan 9 that
i'd like to see fixed that Inferno _doesn't_ solve (yet?).
and yes, one OS doesn't have to do everything, but it is worth looking at
how much we can make it do (without breaking what's right about it).
-.
funny, and i always considered xargs confusing, given the way my brain understood
shell stuff: "uh, where's grep's other argument?" i found this especially true in rc,
where the nice {} and clean quoting rules make the "nested" version much easier
to read and understand than in many other shells. xargs seems to invert the flow
of the pipe, reminding me more of the "infix" notation originally proposed for
pipes (see http://cm.bell-labs.com/cm/cs/who/dmr/hist.html#pipes) than the
current structure.
rog's point about the buffering of input is a good one, and maybe xargs is just
better there, but it's just never been an issue for me, even before i found Plan 9
and worked exclusively on Unix machines (and thus had xargs as an option).
-.
> Again, I've really been there and done that. It wasn't pretty and I
> would be much happier if rename() couldn't change the topology. We
> couldn't do that since it would break a lot of userland stuff; Plan 9
> has a luxury of dropping annoying crap and being able to find and fix
> resulting breakage in userland. In this case crap is not there in the
> first place, so introducing it would cause a lot of ugliness in the kernel
> for no good reason.
I'm no expert in Plan 9, but it seems to me that it breaks it for the
user.
Is this just an example of the "New Jersey" preference for simplicity
of implementation over the "Cambridge" preference for correctness and
completeness?
[about symlinks]
> and this little piece of
> name space stored as a time bomb somewhere in the network rather than
> in the client's representation of its resources is poor compartment-
> alization and therefore lousy design
This makes me think of something.
Plan 9 puts the mount hierarchy over in the per-process namespace,
which is a nifty idea.
Rob points out here that symlinks also really belong in per-process
places (and the same effect can be more cleanly realized with
"bind"). So clearly, there is no need for symlinks in Plan 9.
(One thing; please clear up my ignorance: sometimes a symlink is used
for the benefit of all the users; where in a Plan 9 system do I put a
"bind" such that it is the automatic default for all users in that
way?)
It seems to me that the same could be true of directory renames, if
the entire naming hierarchy were regarded as a per-process namespace,
rather than just the "mount" portions. Then a given user could move,
mount, unlink, etc, without changing anything on the fileserver at all
(except a ref count). [Actually, let it become more liberal and a ref
count is probably not sufficient any more.]
Yeah, but Plan 9 is "free", at least in scare quotes, and AFAICT
Inferno is not. "It runs under Inferno" is therefore considerably
materially different than Plan 9.
From this it sounds rather as if splufty new applications for Plan 9
are not going to happen, and that new work is all happening over in
Inferno land. Is that accurate?
``correctness and completeness''? this sounds like the result of
a self-assessment exercise.
"The user"?
You mean there's only one?!?:-)
Most of the people who have been complaining are not typical "users".
In reality, I doubt that most users want to move large directories around.
(In my experience, most Windows users just leave files wherever
the application drops them and then complain when they can't find them ...)
Plan 9's ability to present a flexible view of the file system
probably obviates most reasons to want to shuffle directories around.
Also, to be brutal for a minute,
why *do* people want to move directories around:
why not just put them in the right place to start with?
The only time I can remember needing to move large directories
was when I needed to balance NFS servers,
i.e. relocating them from one file server or volume to another ...
> Is this just an example of the "New Jersey" preference for simplicity
> of implementation over the "Cambridge" preference for correctness and
> completeness?
No polite comment available:-).
Cheers,
Dave.
On Tue, 9 Oct 2001, Thomas Bushnell, BSG wrote:
> Is this just an example of the "New Jersey" preference for simplicity
> of implementation over the "Cambridge" preference for correctness and
> completeness?
Taste vs. GNU? That too, but there's more to it.
BTW, while we are at it, check what GNU find(1) (which is the
default on the Hurd, IIRC) does when somebody moves a directory while
find is walking through its subdirectories. And think about the
implications. What was that about correctness and completeness, again?
surely it's "broken for the user" the moment the user can't move a
file from one place to another for any reason at all (e.g. because
the file is on a different disk, or a different location on the
network, or whatever).
under Windows, where disks and networks are visible at the top level
of the file tree to the user, perhaps this is slightly more
understandable, but where the file tree is unified, the difference
between "can't because it's on a different disk" and "can't because
it's in a different directory" is surely one of degree not of kind?
rog.
most of Inferno is as "free" (using the same or lighter quotes as in
Plan 9 being "free"); some is not. the free parts, which include
Charon, can be downloaded from Vita Nuova's site.
// From this it sounds rather as if splufty new applications for Plan 9
// are not going to happen, and that new work is all happening over
// in Inferno land. Is that accurate?
i doubt it. there are lots of intelligent C programmers running around
on this list alone. for various reasons, interesting things continue to
happen in Plan 9. but if something can more easily, simply,
quickly, and correctly (!) be done in Inferno than "native" Plan 9,
and can thus run on Plan 9 and bunches of other OSs, as is the case
with Charon, why duplicate the effort in Plan 9? i'd certainly rather
people interested in working on Plan 9 work towards making the
_other_ interesting things happen, like 9P2000, IPv6, USB support,
better driver support, etc. there's lots of room for interesting work
in both Plan 9 and Inferno.
-.
>> From this it sounds rather as if splufty new applications for Plan 9
>>are not going to happen, and that new work is all happening over in
>>Inferno land. Is that accurate?
not at all, it's just that in this particular case, an HTML 3.x browser (with common 3.x extensions)
and Ecmascript exists, with reasonably small source (and it is being extended to 4.x etc).
it's hard work to start one of those from scratch, and if porting one of the existing ones,
good luck.
That's what /lib/namespace is. It represents a global namespace for
all users on a machine. Of course, the user can then do whatever
he/she wants, but it serves as the default namespace.
/lib/namespace and your profile also represent persistence which is
another useful property you get from symlinks.
plan 9, an operating system for people who never change their minds
or have new ideas.
erik
Thanks
Kenji
> >Also, to be brutal for a minute,
> >why *do* people want to move directories around:
> >why not just put them in the right place to start with?
>
(I almost ignored this as a troll)
Just to be explicit:
* The _old_ idea here is *being* able to rename directories.
* The _new_ idea here is *not being* able to rename directories.
Also, I have yet to see anyone explain *why* moving directories around
is such a big win. All I hear is
"Well, I can do it on this other system ...".
As I said
> > Plan 9's ability to present a flexible view of the file system
> > probably obviates most reasons to want to shuffle directories around.
Cheers,
Dave.
> Also, to be brutal for a minute,
> why *do* people want to move directories around:
Typical reason is wanting to re-arrange a directory structure after a
better layout becomes clearer with use.
Ralph.
> Also, to be brutal for a minute,
> why *do* people want to move directories around:
> why not just put them in the right place to start with?
Unlike people here, apparently, I sometimes make mistakes.
> BTW, while we are at it, check what does GNU find(1) (which is the
> default on Hurd, IIRC) do when somebody moves a directory while
> find is walking through its subdirectories. And think about the
> implications. What was that about correctness and completeness, again?
find does not necessarily traverse every file on an active partition.
But that has nothing to do with the moving of directories? The same
thing happens with the moving of single files too.
> you can download all you need of Inferno, including the source for Charon,
> free of charge and with a licence that's at least as liberal as the Plan 9
> one.
Thank you for correcting my misunderstanding! Where do I find the
download? A web search was pretty unsuccessful, and poking around the
vita nuova site I also failed to find it.
[Moderator's Note: try http://www.vitanuova.com]
Thomas
There are several problems with traditional xargs.
The main one is that it doesn't properly quote the arguments,
so if there are embedded special characters the invoked
command gets bogus arguments. (I usually encounter this in
one of the font directories where files are named after the
characters.)
8th Edition UNIX had an "apply" utility that was pretty
much the same thing. Something of the sort is an important
tool to have in the utility toolkit.
Yes, they could. It was a common form of corruption in 6th Edition
UNIX.
But Plan 9 has been advertised as a superior alternative to the
usual developer's interactive computing environment. A good Web
browser is another essential tool, these days. Why should one
be forced to buy a second computer to do something that ought to
be right up the alley of the first one?