Common Lisp represents pathnames as _structured objects_. On the other
hand, operating systems (at least `mainstream' ones; I don't know about
systems like the Lisp Machine's) have system calls that accept _strings_
as filename arguments.
This means that when I call a function such as CL:OPEN and pass it a
namestring, it would^1 be parsed and a pathname object would be
constructed, whose components would then be joined together to
form the string that is passed to the system call for opening files.
Now, the question is, can I be sure that this second string would be
identical to the first string that was passed to CL:OPEN? I was unable
to find such a requirement in the CLHS but it appears to me a reasonable
thing to expect---or am I missing something? In particular, could
parsing itself (i.e. when the defaults argument has all components as NIL)
introduce defaults for missing components?
__________
^1 this is a simplification; I believe it would be more correct to say that
in certain cases it does get parsed, e.g. with (OPEN #P"foo"), and in
other cases it is implementation-dependent if it gets parsed, e.g. with
(OPEN "foo")
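
One way to examine the last question empirically is to look at the components that parsing alone produces. The following is a minimal sketch, not a statement of what any implementation must do; the function name DESCRIBE-PARSED-COMPONENTS is made up for illustration. It relies on PARSE-NAMESTRING not merging, as I read the CLHS; the result is implementation-dependent.

```lisp
;; Hypothetical helper: show what PARSE-NAMESTRING alone fills in.
(defun describe-parsed-components (namestring)
  "Return a plist of the components produced by parsing NAMESTRING."
  (let ((p (parse-namestring namestring)))
    (list :host (pathname-host p)
          :device (pathname-device p)
          :directory (pathname-directory p)
          :name (pathname-name p)
          :type (pathname-type p)
          :version (pathname-version p))))

;; (describe-parsed-components "foo")
;; Any non-NIL value other than the name was introduced by the parser
;; itself, not by the caller.
```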
Vassil Nikolov
Permanent forwarding e-mail: vnik...@poboxes.com
For more: http://www.poboxes.com/vnikolov
Abaci lignei --- programmatici ferrei.
there is an infinite number of strings that name exactly the same file
under Unix. I don't think it is even remotely reasonable to expect that
the same filename will be returned from a function that has to figure out
the full pathname corresponding to a filename. after all, the operating
system doesn't even _know_ the filename -- all it cares about is a
device and an inode number. now, if the operating system actually knew
the filename, or even the full pathname, you could ask it and get back a
value that you could ask your question about. lacking such functionality
in all Unixen, your question is implementationally meaningless.
there is also an infinite number of namestrings that may be parsed into
the same pathname and vice versa, so there cannot be such a guarantee in
the general case.
the interesting questions are (1) for which namestrings and pathnames
there exists a guarantee, and (2) whether such a guarantee can be
established algorithmically for a given namestring without actually
consulting the operating system.
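
A partial, Lisp-side approach to question (2) can be sketched as follows: parse the namestring, print it back, and compare, all without touching the file system. The name NAMESTRING-ROUND-TRIP is hypothetical, and the check only covers the namestring/pathname conversion inside Lisp; it says nothing about the string the implementation finally hands to the operating system.

```lisp
;; Hypothetical helper: does STRING survive parse + print unchanged?
(defun namestring-round-trip (string)
  "Parse STRING into a pathname and print it back.
Returns two values: true if the regenerated namestring is STRING= to
the input, and the regenerated namestring itself."
  (let ((regenerated (namestring (parse-namestring string))))
    (values (string= string regenerated) regenerated)))

;; (namestring-round-trip "foo.lisp")
;; The answer is implementation-dependent; a false first value shows
;; that the implementation canonicalized the input in some way.
```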
#:Erik
--
(defun pringles (chips)
(loop (pop chips)))
> Recent posts on pathnames have made me think about the following issue.
>
> Common Lisp represents pathnames as _structured objects_. On the other
> hand, operating systems (at least `mainstream' ones; I don't know about
> systems like the Lisp Machine's) have system calls that accept _strings_
> as filename arguments.
>
> This means that when I call a function such as CL:OPEN and pass it a
> namestring, it would^1 be parsed and a pathname object would be
> constructed, whose components would then be joined together to
> form the string that is passed to the system call for opening files.
> Now, the question is, can I be sure that this second string would be
> identical to the first string that was passed to CL:OPEN?
You are not even guaranteed at all that it SHOULD be passed as given.
Consider the EUNICE emulator for Unix-esque filenames under VMS. If memory
serves me, VMS had some problem about case, such that there were no lowercase
names. EUNICE did something that simulated case by storing a $ prefix before
either upper or lowercase (I can't remember) so that a file system file
named $FOO meant "Foo" or "fOO". The idea was that PARSE-NAMESTRING would
not be obliged to send through "Foo" or "fOO" but might be allowed, with
appropriate implementation documentation, to define such a mapping so that
you could use the filenames invisibly from your programs and not know what
really happens. I'm not familiar with DOS/Windows compatibility, but it sure
*looks* like Windows does a similar trick to make FAT file systems work,
so the question of what the real underlying operating-system level filename
is may even be subject to question when the system has two different names
for the same file.
> I was unable
> to find such a requirement in the CLHS but it appears to me a reasonable
> thing to expect---or am I missing something?
It's reasonable to wish some algebraic statements be made about equality
relationships that are at least hoped to hold, but they aren't there,
partly because we worried that making them would accidentally confine some
implementation in a way we weren't sure we had the wisdom to confine it. As
understanding of these issues increases and the number of operating systems
(sigh) decreases,
this might be easier to do in a future standard.
> In particular, could
> parsing itself (i.e. when the defaults argument has all components as NIL)
> introduce defaults for missing components?
That's not specified. Some implementations may prefer to do that rather than
pass NIL. The problem is that on some operating systems, a raw open call like
"foo" will open a file named foo, and on others it will default the type. And
even where it does default the type, the directory is unclear. (root dir?
working dir? home dir?) This is all left to the implementation to resolve
in a manner appropriate to the implementation. You can very easily assure
for a portable program that this situation doesn't come up, and I recommend
you do so.
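
One possible reading of this recommendation, sketched under assumptions: before opening a file, a portable program checks that the pathname already carries a name, a type, and an absolute directory, so that nothing is left for the implementation or the operating system to default. The predicate name FULLY-SPECIFIED-P and the choice of which components to require are illustrative only.

```lisp
;; Hypothetical predicate: nothing left to be defaulted at open time.
(defun fully-specified-p (pathname-designator)
  "True if the pathname has a name, a type, and an absolute directory."
  (let* ((p (pathname pathname-designator))
         (directory (pathname-directory p)))
    (and (stringp (pathname-name p))
         (stringp (pathname-type p))
         (consp directory)
         (eq :absolute (first directory)))))

;; Building the pathname from explicit components avoids relying on
;; namestring syntax as well:
;; (fully-specified-p
;;  (make-pathname :directory '(:absolute "tmp") :name "foo" :type "dat"))
```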
Whether it's reasonable for *DEFAULT-PATHNAME-DEFAULTS* to contain a
relative pathname spec is also open to question, but is surely not portable.
> * Vassil Nikolov <v...@einet.bg>
> | Now, the question is, can I be sure that this second string would be
> | identical to the first string that was passed to CL:OPEN?
[...]
> there cannot be such a guarantee in
> the general case.
I was under a misconception; thanks for correcting me.
> the interesting questions are (1) for which namestrings and pathnames
> there exists a guarantee, and (2) whether such a guarantee can be
> established algorithmically for a given namestring without actually
> consulting the operating system.
I agree, but it seems to me these questions are not interesting enough
for someone to actually spend the time necessary to provide an
answer.
> Vassil Nikolov <v...@einet.bg> writes:
[...]
> > Now, the question is, can I be sure that this second string would be
> > identical to the first string that was passed to CL:OPEN?
>
> You are not even guaranteed at all that it SHOULD be passed as given.
[example about EUNICE introducing a special character as a `lower-case
escape' under VMS]
> you could use the filenames invisibly from your programs and not know what
> really happens.
Yes, that's right; my thinking was mistaken.
Now, however, I see another issue here. Since the two strings may be
different, I think IWBN if the programmer had some way to find out
about the string that gets passed to the operating system (e.g. for
interfacing to programs and/or user actions outside the Lisp world,
such as browsing the file system). I can see several approaches (not
necessarily exclusive of each other):
* have functions that internally pass filenames to the operating system
return an additional value which is the actual filename string as
passed to the operating system;
* have stream objects store this string as an additional attribute, and
provide a reader method called STREAM-OS-FILENAME or some such;
* provide a function that `translates' a Lisp pathname designator into
an operating system filename (this functionality is present in a
Lisp implementation anyway since it must be able to construct
OS-level filenames from Lisp pathnames in order to make system
calls).
> I'm not familiar with DOS/Windows compatibility, but it sure
> *looks* like Windows does a similar trick to make FAT file systems work,
> so the question of what the real underlying operating-system level filename
> is may even be subject to question when the system has two different names
> for the same file.
No, that's another issue. I was not asking about the `real underlying
operating-system level filename' which, as you point out, may not be
well-defined, but rather about the filename that the Lisp implementation
passes to the underlying operating system.
[...]
> It's reasonable to wish some algebraic statements be made about equality
> relationships that are at least hoped to hold, but they aren't there.
Yes, that would be perhaps a good thing, but that's much stronger than
just having access to the result of the conversion from Lisp pathnames
into OS filenames.
[...]
this is what you're supposed to get back from the NAMESTRING function
applied to a physical pathname object.
| Yes, that would be perhaps a good thing, but that's much stronger than
| just having access to the result of the conversion from Lisp pathnames
| into OS filenames.
I'm uncertain why you think this is currently missing. what have you
tried and found wanting?
#:Erik
--
save the children: just say NO to sex with pro-lifers
No, it's not: the namestring syntax is defined by the Lisp
implementation, and while the naming conventions are, as the standard
says, "usually those customary for the file system in which the named
file resides", they are not the same. Furthermore, the namestring
syntax is intended for human consumption, and what needs to be passed
to OS functions might be different (e.g., LWW does some hackery to
remove trailing backslashes, and add some magic to make UNC pathnames
work).
I think he's also asking to see all the stuff that Lisp file functions
do to fill out incomplete pathnames, like merging in the current
directory (although I suppose some might let the OS do that),
translate :VERSION :NEWEST, etc. (BTW, what would "OS-level" mean on
a Symbolics?)
I'm not sure why you would want this, though. For most purposes,
(PROBE-FILE pathname) or (TRUENAME stream) will tell you what file was
accessed. The details aren't interesting unless you're planning to
bypass the Lisp file interfaces and call the OS directly.
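
The PROBE-FILE suggestion can be sketched like this. The function name ACCESSED-FILE-REPORT is hypothetical; what the truename looks like (symbolic links resolved, versions filled in, and so on) is implementation-dependent.

```lisp
;; Hypothetical helper: compare what was asked for with what PROBE-FILE
;; reports.
(defun accessed-file-report (pathname-designator)
  "Return two values: the namestring of the argument, and the namestring
of its truename, or NIL if no such file exists."
  (let ((truename (probe-file pathname-designator)))
    (values (namestring pathname-designator)
            (and truename (namestring truename)))))

;; (accessed-file-report "foo.lisp")
;; The second value is still a Lisp namestring, not necessarily the
;; string that was passed to the operating system.
```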
--
Pekka P. Pirinen Adaptive Memory Management Group, Harlequin Limited
If you don't succeed at first, try again. Then quit. No use of being a
damn fool about it. - W. C. Fields
> Erik Naggum <er...@naggum.no> writes:
> > * Vassil Nikolov <v...@einet.bg>
> > | * provide a function that `translates' a Lisp pathname designator into
> > | an operating system filename
|...|
> > this is what you're supposed to get back from the NAMESTRING function
> > applied to a physical pathname object.
>
> No, it's not: the namestring syntax is defined by the Lisp
> implementation, and while the naming conventions are, as the standard
> says, "usually those customary for the file system in which the named
> file resides", they are not the same. Furthermore, the namestring
> syntax is intended for human consumption, and what needs to be passed
> to OS functions might be different (e.g., LWW does some hackery to
> remove trailing backslashes, and add some magic to make UNC pathnames
> work).
Exactly. Consider, for example, EUNICE as mentioned by Kent Pitman, or
the following. When opening a file with :DIRECTION :OUTPUT and
:IF-EXISTS :SUPERSEDE, an implementation might pass to the OS a
different name (perhaps similar to the one passed to OPEN, e.g.
"fo0" for "foo") at open and rename the file (deleting the existing
one) at close. (This method was used e.g. by LCL, I don't know if
it still does it.)
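
A minimal sketch of such a write-then-rename scheme, to make the point concrete. It is not how LCL or any other implementation actually does it; the names TEMPORARY-SIBLING and SUPERSEDE-VIA-RENAME and the "-tmp" suffix are invented for illustration.

```lisp
;; Hypothetical: the name the OS sees while the file is being written.
(defun temporary-sibling (pathname-designator)
  "Return a pathname like the argument, with \"-tmp\" appended to its name."
  (let ((p (pathname pathname-designator)))
    (make-pathname :name (concatenate 'string (pathname-name p) "-tmp")
                   :defaults p)))

;; Hypothetical: write under the temporary name, then move into place.
(defun supersede-via-rename (pathname-designator writer)
  "Call WRITER on an output stream to a temporary sibling file, then
delete any existing target and rename the temporary file to the target."
  (let* ((target (merge-pathnames pathname-designator))
         (temp (temporary-sibling target)))
    (with-open-file (out temp :direction :output :if-exists :supersede)
      (funcall writer out))
    (when (probe-file target)
      (delete-file target))
    (rename-file temp target)))
```

While WRITER runs, the only name the operating system has seen is the temporary one, which is exactly the situation in which the string passed to OPEN and the string passed to the OS differ.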
In any case, NAMESTRING keeps you within the Lisp world, while
TRANSLATE-PATHNAME-TO-OS-FILENAME (or however it should be
called) crosses (or at least touches) the border between the Lisp
world and the rest of the universe.
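
For readers who want to experiment: some later implementations expose an operation of roughly this kind as an extension. The sketch below wraps two that I believe exist, SB-EXT:NATIVE-NAMESTRING in SBCL and CCL:NATIVE-TRANSLATED-NAMESTRING in Clozure CL; check the implementation's documentation before relying on either. The wrapper name OS-FILENAME is made up, and the fallback branch merely returns the Lisp namestring, which, as argued above, is not guaranteed to be what the OS receives.

```lisp
;; Hypothetical wrapper over implementation-specific extensions.
(defun os-filename (pathname-designator)
  "Return a string intended for use outside Lisp, as far as the
implementation supports that."
  (let ((physical (translate-logical-pathname pathname-designator)))
    #+sbcl (sb-ext:native-namestring physical)
    #+ccl (ccl:native-translated-namestring physical)
    ;; Assumption: elsewhere we can only fall back on NAMESTRING.
    #-(or sbcl ccl) (namestring physical)))

;; (os-filename "foo.txt")
```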
> I think he's also asking to see all the stuff that Lisp file functions
> do to fill out incomplete pathnames, like merging in the current
> directory (although I suppose some might let the OS do that),
> translate :VERSION :NEWEST, etc.
Yes.
> (BTW, what would "OS-level" mean on
> a Symbolics?)
I don't know. I wrote in my previous post that I didn't know about
the Lisp Machine, and that should be understood as including
the Symbolics.
> I'm not sure why you would want this, though. For most purposes,
> (PROBE-FILE pathname) or (TRUENAME stream) will tell you what file was
> accessed. The details aren't interesting unless you're planning to
> bypass the Lisp file interfaces and call the OS directly.
I don't have a particular problem at hand right now, but I would
need it if my Lisp program interacts with another non-Lisp program
e.g. by reading/writing files together.
> Now, however, I see another issue here. Since the two strings may be
> different, I think IWBN if the programmer had some way to find out
> about the string that gets passed to the operating system
Yeah, the Lisp Machine has a :string-for-host message that you can send to a
pathname and it will give you this back. It was left out of CL as "excessive".
> Erik Naggum <er...@naggum.no> writes:
> > * Vassil Nikolov <v...@einet.bg>
> > | * provide a function that `translates' a Lisp pathname designator into
> > | an operating system filename (this functionality is present in a
> > | Lisp implementation anyway since it must be able to construct
> > | OS-level filenames from Lisp pathnames in order to make system
> > | calls).
> >
> > this is what you're supposed to get back from the NAMESTRING function
> > applied to a physical pathname object.
>
> No, it's not: the namestring syntax is defined by the Lisp
> implementation, and while the naming conventions are, as the standard
> says, "usually those customary for the file system in which the named
> file resides", they are not the same.
I concur with this.
STRING-FOR-HOST is a completely different concept not provided by CL.
too bad. It's not like we didn't think of it and ask for it. It was
part of what seemed to me a desire on everyone's part to adopt half of
what the Lisp Machine had and hope that would be "not too bloaty"
without thinking as hard as I think should have been done about what was
really needed. Some things the Lisp Machine had a reputation for overkill
on, but other things it had just studied well and understood well and people
should not have fought so hard. File systems were one. The Lisp Machine
people, because of their desire to integrate all file systems remotely into
one model, really had a remarkably good file system model that had stood up
under heavy stress for years and should have been better trusted by the
committee. IMO, of course.
Note, btw, that the lispmachine "namestring" uses some funny characters
that are lispm only to be placeholders for "unfilled" slots (having NIL
fillers) so that the namestring syntax is invertible even though there
is no native representation necessarily for that. The character
with a double-pointed arrow, which I'll write as "<=>" but was one
character, worked on a unix pathname to do:
#P"u:/foo/<=>.lisp"
even though there was no underlying syntax for this. i think (but can't
check from where I'm typing) that if you did string-for-host on this you'd
get "u:/foo/.lisp" or maybe you'd get the namestring but it'd be bogus or
maybe it'd signal an error if you did string-for-host. either way, the
point is to underscore pekka's point that the requirements of these
functions are indeed different.
> I'm not sure why you would want this, though. For most purposes,
> (PROBE-FILE pathname) or (TRUENAME stream) will tell you what file was
> accessed. The details aren't interesting unless you're planning to
> bypass the Lisp file interfaces and call the OS directly.
if you're writing batch scripts or make files, it's very important to have
the native notation. for example, i feel strongly that lisp namestrings
should fix the bug of having /foo refer to a directory. i think only
/foo/ should refer to a directory and /foo should refer to the file foo
in the parent directory viewed as a file, not a directory WHEN you view
it as a namestring. but STRING-FOR-HOST should show it without the
trailing slash, no matter how conceptually stupid it was for the native
host to want that.
I would suppose that (namestring (truename ***)) should produce the
canonical namestring for both logical and physical pathnames. This, of
course, assumes that the OS defines some canonical namestring for the
file. It may also assume that the lisp implementation makes use of the
OS's canonical namestring as well.
--
Thomas A. Russ, USC/Information Sciences Institute t...@isi.edu
Oh, I see. Excessive...
Thank you for the note.
No further questions your honour...
Vassil Nikolov. (See header for additional contact information.)
> * provide a function that `translates' a Lisp pathname designator into
> an operating system filename (this functionality is present in a
> Lisp implementation anyway since it must be able to construct
> OS-level filenames from Lisp pathnames in order to make system
> calls).
The real problem with this requirement/request is the assumption
that the filename eventually passed to the operating system fully
specifies the desired file.
Consider on unix the following file: /foo/bar and in lisp a
pathname specification that translates in its lowest form to
"bar". If this is passed to open() will /foo/bar be opened?
How about if it is passed to dlopen()?
The answer is varied, since unix has its own "logical pathname"
system that does further translation, different for each of the
two system calls, and which depends on the state of the process's
environment, including the working directory or one of the
environment variables LD_LIBRARY_PATH, LDPATH, or LPATH (depending
on which unix) and sometimes other variables/conditions as well.
Windows has a set of translations of its own, similar but different.
There is no unix or windows function that I know of which is
equivalent to what you are asking for in lisp, which, given a
file name string will return the precise name of the file being
opened (perhaps it would better be specified as a filesystem/inode
pair, for most precision, but even that is a can of worms in the
presence of distributed filesystems). Thus, the idea of getting
"the real file name" is, I believe, misguided, since it will never
yield an answer you can count on.
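
The dependence on context can be modelled inside Lisp, as an analogy only: the same short name, merged against two different directory contexts, denotes two different files. This does not reproduce what open() or dlopen() do; the function name RESOLVE-IN-CONTEXT and the example directories are hypothetical.

```lisp
;; Hypothetical model: resolve a short name against a given context.
(defun resolve-in-context (relative-name context-directory)
  "Fill in the missing directory of RELATIVE-NAME from CONTEXT-DIRECTORY,
a list suitable for the :DIRECTORY argument of MAKE-PATHNAME."
  (merge-pathnames relative-name
                   (make-pathname :directory context-directory)))

;; Same string, two different results:
;; (resolve-in-context "bar" '(:absolute "foo"))
;; (resolve-in-context "bar" '(:absolute "usr" "lib"))
```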
--
Duane Rettig Franz Inc. http://www.franz.com/ (www)
1995 University Ave Suite 275 Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253 du...@Franz.COM (internet)
The "-A" interfaces actually use the current code page. For an
American, the default would be 1252, which is a superset of ASCII. A
Japanese user might have 932, which is a multi-byte encoding almost
identical to shift-JIS. The current code page can be changed on the
fly, provided you have the new page installed on your machine (but
people don't seem to do that much).
> Suppose the wide characters used by the operating system are not
> consistent with the wide characters used to represent extended-char
> in the Lisp implementation.
Then it would be a lousy FFI that didn't offer a way to do the right
translation.
> Should there also be a "bytes-for-host" function? Should the thing
> returned not be a Lisp string at all? After all, if we're talking
> about what gets passed to the OS, maybe the OS byte-string should be
> returned as foreign data.
It might be useful, depending on how you need to process the result.
It would be more modular to get a Lisp string, and have interfaces to
convert that to a foreign string, especially on Windows, where you
might end up calling either a multi-byte (A) or a wide (W) interface.
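
A sketch of that layering for the wide (W) case: start from a Lisp string and convert it to UTF-16LE octets as a separate step. The function name STRING-TO-UTF-16LE is hypothetical, and the code assumes that CHAR-CODE returns Unicode code points, which the standard does not guarantee. A real FFI would normally supply its own conversion.

```lisp
;; Hypothetical converter; assumes CHAR-CODE yields Unicode code points.
(defun string-to-utf-16le (string)
  "Encode STRING as a vector of octets in UTF-16, little-endian."
  (let ((octets (make-array (* 2 (length string))
                            :element-type '(unsigned-byte 8)
                            :adjustable t
                            :fill-pointer 0)))
    (flet ((emit (unit)
             ;; Low byte first, then high byte.
             (vector-push-extend (ldb (byte 8 0) unit) octets)
             (vector-push-extend (ldb (byte 8 8) unit) octets)))
      (loop for char across string
            for code = (char-code char)
            do (if (< code #x10000)
                   (emit code)
                   ;; Code points beyond the BMP become a surrogate pair.
                   (let ((v (- code #x10000)))
                     (emit (+ #xD800 (ldb (byte 10 10) v)))
                     (emit (+ #xDC00 (ldb (byte 10 0) v)))))))
    octets))
```

Keeping the conversion separate means the same Lisp string could instead be passed through a code-page encoder for the multi-byte (A) interfaces.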
--
Pekka P. Pirinen
Adaptive Memory Management Group, Harlequin Limited
Life is complex. It has real and imaginary components.
- Tpotter_voyager.cris.com (Tom_Potter)
Duane Rettig wrote: [1999-08-26 08:48 -0700]
> Vassil Nikolov <v...@einet.bg> writes:
>
> > * provide a function that `translates' a Lisp pathname designator into
> > an operating system filename (this functionality is present in a
> > Lisp implementation anyway since it must be able to construct
> > OS-level filenames from Lisp pathnames in order to make system
> > calls).
>
> The real problem with this requirement/request is the assumption
> that the filename eventually passed to the operating system fully
> specifies the desired file.
There is no such assumption. The reason for this requirement, request,
wish, or whatever, is to have a way of knowing what happens between
the Lisp implementation and the operating system so that one can
apply from that point on knowledge about the behaviour of the operating
system to find out what happens to the file.
|...|
> There is no unix or windows function that I know of which is
> equivalent to what you are asking for in lisp, which, given a
> file name string will return the precise name of the file being
> opened (perhaps it would better be specified as a filesystem/inode
> pair, for most precision, but even that is a can of worms in the
> presence of distributed filesystems). Thus, the idea of getting
> "the real file name" is, I believe, misguided, since it will never
> yield an answer you can count on.
I never imagined that by having such functionality I would get `the
real file name'; if anything I wrote suggested that, that was quite
inadvertent. That functionality would only get me `the real
argument to the file system interface,' so that I can speak the
language of the operating system if I want. That means that
once I have the `string to host' (in Lisp Machine terminology
that I have just learned from Kent Pitman and that I hope I am
using correctly), I only need knowledge about the OS to find out
where my file is (or to tell other non-Lisp programs about my
file). It may require a lot of work to do that, but at least I would
have gotten maximum assistance from Lisp for that.
While there isn't a Unix or Windows function that would provide
the `true file name,' by sufficient reading of the documentation
one can collect enough knowledge about Unix (I dare not say the
same about Windows...) in order to be able to find that true name
at least in most cases. It becomes harder, however, if one does
not have the exact file name with which the operating system
started.
In any case, Lisp's file system interface does call the operating
system's file system interface and does pass file names to it
that it produces from Lisp pathnames, therefore this functionality
exists at least internally in the Lisp implementation.
Howard R. Stearns wrote: [1999-08-26 09:34 -0500]
|...|
> Imagine that Unix open() was not provided, and that applications had to
> look up inode numbers themselves. In this case, there wouldn't even be
> an "operating-system string" used for access.
But this is a hypothetical case. Even if there is an operating system
in use that does not itself provide the mapping from file names to
file objects (inodes or whatever), it would be so exotic that one
would not have to worry about it. (After all, why not imagine that
Lisp would have to access the raw devices themselves (we get a
Lisp Machine I suppose...).)
> On Windows, as I understand it, the open function in the ASCII library
> can be used to access ASCII file names, but I'm not too sure about how
> "character code tables" come into this. Furthermore, there is an open
> function in a wide character library that can be used to access "any"
> file name. Suppose the wide characters used by the operating system are
> not consistent with the wide characters used to represent extended-char
> in the Lisp implementation. (Suppose the Lisp doesn't support
> extended-char at all!) For example, if one is fixed width wide and the
> other is a different fixed width wide or multi-byte, what should
> string-for-host return? Should there also be a "bytes-for-host"
> function? Should the thing returned not be a Lisp string at all? After
> all, if we're talking about what gets passed to the OS, maybe the OS
> byte-string should be returned as foreign data.
The `string-for-host' function could return different kinds of strings
depending on the flavour of the file system interface function that
gets called, e.g. strings of 8-bit characters or strings of 16-bit characters
as appropriate. If the characters available in the Lisp implementation
do not match directly the characters available in the operating system
environment, then the Lisp implementation would have to have some
translation mechanism for characters anyway.
> What about a Lisp system that supports pathname access to a database,
> where individual records in the database are packaged up and stored on
> the OS filesystem as a single big file? (I.e., the database uses its
> own mechanism for accessing individual database "files" within the OS
> file.) Suppose further that the database API operates not by
> namestrings, but by "record numbers".
Isn't this a very hypothetical example? Anyway, I suppose that if
this is implemented properly, the pathnames involved would have
a host and/or device component such that it does not exist in the
OS filesystem world, so there will be a way to identify such a case
programmatically.
|...|