"If a particular name is relative, then it will be joined to the
previous file name argument. Otherwise, any earlier arguments will be
discarded, and joining will proceed from the current argument. For
example,
file join a b /foo bar
returns /foo/bar."
I'm curious as to the justification for this behaviour.
What is the use case for supplying arguments to file join that are
going to be ignored?
Can someone show me an example of how this crops up in code and is
desirable?
It seems to me it would be nicer if 'file join' threw an error if any
component other than the first one were absolute
(with the exception of ~xxx, which, despite a 'file pathtype' of
absolute, probably shouldn't be considered an absolute path in the
context of joining 'segments' like this).
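A minimal sketch of the stricter join proposed here, written as a hypothetical proc strictJoin (not a Tcl command). It assumes "/" as the separator, joins segments as plain strings, treats any segment starting with ~ as an ordinary name, and raises an error for any other non-relative segment after the first.

```tcl
# Hypothetical helper, not part of Tcl: join path segments as plain
# strings and refuse any absolute segment after the first one.
# Segments beginning with ~ are treated as ordinary names, never as a
# home-directory reference. Assumes "/" is the separator.
proc strictJoin {first args} {
    set result $first
    foreach seg $args {
        # Only segments that do not start with ~ are checked.
        if {![string match ~* $seg] && [file pathtype $seg] ne "relative"} {
            error "absolute segment \"$seg\" after the first argument"
        }
        # Avoid doubling the separator if the result already ends in "/".
        if {[string index $result end] eq "/"} {
            append result $seg
        } else {
            append result / $seg
        }
    }
    return $result
}

puts [strictJoin /x ~user etc]
puts [strictJoin x ~user etc]
```

With this rule the two calls give /x/~user/etc and x/~user/etc, the results argued for below, while something like strictJoin a /foo raises an error instead of silently discarding a.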
In particular, the current behaviour seems like an impediment to
improving the whole situation with 'tilde expansion' in Tcl.
~foo can be considered an absolute path or merely a path segment
depending on the context.
The inconsistencies and surprises with Tcl in this area are numerous.
'file join x ~user etc' and 'file join /x ~user etc'
both return
~user/etc
It would be more reasonable to return:
x/~user/etc
and
/x/~user/etc
This would help avoid the problem where a completely tilde-unaware
programmer does something like:
foreach segment [glob -dir $folder -tail *] {
    puts stdout "processing $segment"
    do_something [file join $folder $segment]
}
At first glance they've done the right thing by passing a full path to
their do_something command, and not relying on the current working
directory.
However, if there is a file named ~foo in $folder, the do_something
command will now receive an unqualified ~foo. That is potentially a
nasty situation if 'foo' matches the name of a user, since ~foo will
then be taken to mean that user's home directory.
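One possible guard, assuming you control the join: put ./ in front of each segment so file join never sees a bare leading ~. In this sketch the segment list stands in for the glob result so it runs without a real directory, and do_something is a placeholder that just records its argument. Depending on the Tcl version, the joined path may keep a harmless ./ component, which file normalize would remove.

```tcl
# Placeholder for the original do_something: it only records the path
# it was given, so the result can be inspected afterwards.
proc do_something {path} {
    lappend ::received $path
}

set folder /x
set ::received {}
# Stand-in for [glob -dir $folder -tail *]; includes a tilde name.
foreach segment {notes.txt ~foo} {
    puts stdout "processing $segment"
    # The "./" prefix keeps a leading ~ from being read as a user's
    # home directory, so every path stays inside $folder.
    do_something [file join $folder ./$segment]
}
puts $::received
```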
The situation is exacerbated by the undocumented inconsistency where
'file tail /x/~user' returns ./~user rather than ~user.
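A small workaround sketch: a hypothetical plainTail proc that calls file tail and strips the ./ guard when it precedes a ~, so the caller gets the bare last segment. On versions where file tail does not add the prefix, it returns the tail unchanged.

```tcl
# Hypothetical wrapper, not part of Tcl: the last path segment without
# the "./" that some Tcl versions prepend to a tail beginning with ~.
proc plainTail {path} {
    set tail [file tail $path]
    if {[string match ./~* $tail]} {
        # Drop the two-character "./" guard and keep the ~name itself.
        set tail [string range $tail 2 end]
    }
    return $tail
}

puts [plainTail /x/~user]
puts [plainTail /x/notes.txt]
```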
I use both Windows and Unix platforms, and am aware of the utility of
tilde substitution, but I don't see this particular bit of
babysitting as helpful.
I expect the last segment of the path to be returned, and I'm well
aware that it's not appropriate to plug this tail segment directly
into 'file delete' and the like.
This whole area, including the documentation, surely needs a tidy-up.
Even statements in the filename man page such as "If a file name
starts with a tilde" are not explicit enough. Does that mean only the
final file name component, or the start of an entire path?
Anyway, the behaviour of tilde processing in Tcl is clearly a
minefield and has been thrashed out on C.L.T before, but for now
I'd appreciate some comments on the argument-discarding behaviour of
'file join'.
How much backwards incompatibility would be introduced by dropping
the ./ from 'file tail' and changing the 'file join' behaviour? I'm
guessing that sort of thing is a 'wait until Tcl 9.x' type of change.
Is anyone working on reviewing this whole area? There are outstanding
bugs relating to tilde substitution that IMHO should be blockers for
the 8.6 release; on the other hand, it seems that more than just
patching up the bugs is required.
Julian
The typical use case (after old Mac OS with its very different path
separators is out) is to make a path absolute if it isn't:
set f [file join [pwd] $f]
OK, thanks, that makes sense.
Presumably that's a common enough idiom that it would be undesirable
to break it, even across a major version.
I wonder though if a 'file join -all ...' option might make sense as a
way to improve the situation I described above where 'file join
$folder ~foo' causes surprises.
I'm also curious as to how entrenched (how much would break and how
attached people are to it) the additional two characters are in the
results of 'file tail ~foo' (returning ./~foo). To me this seems a
likely source of errors, even aside from its inconsistency with the
-tail results from glob.
J
> The typical use case (after old Mac OS with its very different path
> separators is out) is to make a path absolute if it isn't:
> set f [file join [pwd] $f]
Not a compelling need when there is [file normalize].
Donald Arseneau
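For comparison, a sketch of the two ways mentioned above of making a possibly relative path absolute. The sample path is made up. file join [pwd] only prefixes the current directory and leaves any .. in place, while file normalize also resolves . and .. (and follows symlinks in directories that exist), so the two results need not be identical strings.

```tcl
# A relative path with a ".." component; the name is illustrative.
set rel sub/../data.txt

# Prefixes the current directory; the ".." stays in the result.
set viaJoin [file join [pwd] $rel]

# Makes the path absolute and also collapses "." and "..".
set viaNormalize [file normalize $rel]

puts "join:      $viaJoin"
puts "normalize: $viaNormalize"
```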
Yes, though I expect there may be resistance to the idea of changing
this sort of thing, even for Tcl 9, on the basis of backwards
compatibility with a large number of scripts going way back.
What I'm hoping to find out here is whether the community, and
the core team in particular, would be amenable to some changes in
this area.
In particular, I think it would make sense to divide the file
subcommands into those which actually need to hit the filesystem and
those which would be better implemented as operating only on the
string, so to speak.
Specifically, the subcommands tail, rootname, extension, dirname, join
and split should IMO all be redone so that they never look at the
filesystem directly. They would make much more sense if thought of
as string convenience functions which take the platform separator
into account. (I'm guessing join and split don't currently 'hit' the
filesystem, but they do take part in mangling ~-prefixed segments
with the leading './' hack.)
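To make the proposal concrete, a sketch of what tilde-unaware, string-only helpers might look like. The strpath namespace and its procs are hypothetical, not existing Tcl commands; they assume "/" is the only separator and never consult the filesystem, so ~foo is just a name.

```tcl
# Hypothetical string-only path helpers (illustrative names only).
namespace eval strpath {
    # Split a path into segments. Empty pieces from repeated or
    # trailing separators are dropped; a leading "/" is kept as the
    # first element, as [file split] does.
    proc segments {path} {
        set parts {}
        if {[string index $path 0] eq "/"} {
            lappend parts /
        }
        foreach piece [split $path /] {
            if {$piece ne ""} {
                lappend parts $piece
            }
        }
        return $parts
    }

    # Last segment: no "./" prefix, no user lookup, no error for ~foo.
    proc tail {path} {
        set last [lindex [segments $path] end]
        # The root itself has no tail.
        if {$last eq "/"} {
            return ""
        }
        return $last
    }
}

puts [strpath::tail ~foo]
puts [strpath::tail /x/~user]
```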
The existing situation has surprises such as:
%file tail ~foo
user "foo" doesn't exist
yet the following gives no error:
%file tail ~foo/blah
blah
Given that 'file normalize' exists, it doesn't seem appropriate for
these commands to peer at the filesystem and check tilde
substitution.
Subcommands such as delete, exists, rename, readable, stat and type
should of course still do the normal tilde substitution if the
leading character of the entire path is ~.
Does anyone else agree that the 'path manipulation' subcommands I
mentioned could reasonably be made deliberately tilde-unaware, and
listed as a group in the documentation as such? Perhaps for Tcl 9.0?
Julian
This is very simple to explain: an absolute path resets the basis for
further path normalizations.
You have to think of each path component in absolute terms. If the
first term is relative, then the relative path is "joined" to the
[pwd], so the actual value is an absolute path. If the next path
component is relative, then it is applied in the context of the
previous absolute path. But if the next path component is an absolute
path, then it replaces the path basis.
The point is that each path component is transformed into an absolute
path. If relative paths are applied to this previously established
absolute path, you get a new absolute path. But if the next component
is an absolute path, it must completely replace the previous absolute
path. You can't add one absolute path to another.
It is actually very simple.
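A rough Tcl model of the rule just described, as a sketch for discussion rather than a description of Tcl's C implementation; joinModel is a hypothetical name. One detail differs from the description: file join itself never consults [pwd], so when every component is relative the result stays relative. Because the model asks file pathtype, it inherits whatever that command reports for ~foo on the Tcl version in use.

```tcl
# Model of the "absolute component resets the basis" rule.
# Assumes "/" as the separator.
proc joinModel {args} {
    set result ""
    foreach comp $args {
        if {$result eq "" || [file pathtype $comp] ne "relative"} {
            # The first component, or any non-relative one, becomes the
            # new basis and discards everything accumulated so far.
            set result $comp
        } else {
            # A relative component is appended to the current basis.
            set result [string trimright $result /]/$comp
        }
    }
    return $result
}

puts [joinModel a b /foo bar]
puts [file join a b /foo bar]
```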
Thanks, but as far as I can see, you've described what it currently
does, but not why it is desirable.
The fact that Tcl always assumes a ~foo component is an absolute
path can make even the supposedly simple behaviour you describe
somewhat surprising.
Files with leading tildes crop up in a few filesystems I deal with,
where a certain percentage of the uploading is done by Windows users.
That [file join /path ~test.doc] treats ~test.doc as an absolute
path could hardly be considered as following the principle of least
surprise (even though it's consistent with [file pathtype ~test.doc]
reporting it as absolute).
Once you know about the issue you can of course hack around it, but
even the initial workaround attempts are likely to cause surprises.
For example, one might think: oh well, I'll examine the first
character and special-case tildes.
But a naive [string range [file tail $f] 0 0], where $f is something
like /path/~test.doc, will return "." instead of "~".
My argument is essentially that more robust (less prone to hidden
mistakes) and even simpler behaviour would have been to raise an
error if any element other than the first were absolute, and for
this command to treat ~foo just like any other string.
It's still not clear to me when it's desirable to throw away the
'previously established absolute path' and begin anew.
What is the use case for the caller supplying an absolute path
after the first element?
In most cases, wouldn't such an element more likely be a mistake,
which will either blow up as a nonexistent path or, worse, resolve
to an unintended path that does exist?
It occurs to me that even for Tcl 9 - changing the subcommand
behaviour of such a fundamental command as 'file' is a pretty major
upheaval.
What about introducing 'path join', 'path tail', 'path rootname' etc.
in the lead-up to 9.x, with a view to excising them from under 'file'
in 9.0?
A use case is for example when you let the user enter a path in an entry widget, which is by default relative to some project directory, but may also be absolute.
Then you can simply use [file join $projectdir $entrypath] as the resulting path.
--Koen
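A sketch of Koen's use case; resolveEntry and the sample paths are illustrative. A relative entry lands inside the project directory, while an absolute entry replaces it entirely.

```tcl
# Resolve a path typed by the user: relative to the project directory
# by default, but an absolute entry overrides it (file join's rule).
proc resolveEntry {projectdir entrypath} {
    return [file join $projectdir $entrypath]
}

puts [resolveEntry /home/me/project src/main.tcl]
puts [resolveEntry /home/me/project /tmp/scratch.tcl]
```

Under the stricter rule proposed earlier in the thread, the second call would raise an error instead of returning /tmp/scratch.tcl.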
Michael