
[PATCH] File Spec


Vladimir Lipskiy

Aug 25, 2003, 8:12:19 PM
to perl6-internals
----- Original Message -----
From: "Leopold Toetsch" <l.to...@nextra.at>
Sent: Thursday, August 07, 2003 12:51 PM
Subject: TWEAKS: Takers Wanted - Effort And Knowledge Sought

> Platform code
> -------------
>We need some functions to deal with paths and files like File::Spec.
>For loading include files or runtime extension some search path should
>be available to locate these files (a la "use lib LIST;").
>For now runtime/parrot/{include,dynext} and the current working
>directory would be sufficient.

I ain't 100% sure what Leo wanted there, and I'm afraid that my patch is out
of place. Still, it presents rudimentary support for File::Spec-like
functions in Parrot, namely curdir, catdir, and catfile.

I should warn you that the patch lacks any documentation. Usage examples
can be found in file_spec.t. Nevertheless, does it need some documentation
for non-Perl folks, and if it does, where should I put it?
The docs directory?

Next. In the future I'll need to be able to do some find 'n' replace
actions in order to clean the trash off of paths. The Perl version
uses regexes like these:

$path =~ s|/+|/|g unless ($^O eq 'cygwin');    # xx////xx  -> xx/xx
$path =~ s|(/\.)+/|/|g;                        # xx/././xx -> xx/xx
$path =~ s|^(\./)+||s unless $path eq "./";    # ./xx      -> xx
$path =~ s|^/(\.\./)+|/|s;                     # /../../xx -> /xx
$path =~ s|/\Z(?!\n)|| unless $path eq "/";    # xx/       -> xx

The question is whether I should take advantage of string_str_index,
string_replace and co., or whether there is a better solution. In any
case it never uses long paths, so we won't be heavily penalized whichever
find 'n' replace scheme we use.
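
One way to approach the regex question without a regex engine is a handful
of in-place scans over a plain C buffer. What follows is only a sketch under
assumptions: it works on a NUL-terminated char array rather than a Parrot
STRING, the name canon_unix_path is made up for illustration, and it always
collapses slashes (a cygwin build would skip that first step).

```c
#include <assert.h>
#include <string.h>

/* Clean a Unix-style path in place, mirroring the five Perl substitutions
 * one by one. Hypothetical helper, not part of the patch. */
static void canon_unix_path(char *path)
{
    char *r, *w;
    size_t len;

    assert(path != NULL);

    /* 1. s|/+|/|g : collapse runs of slashes */
    r = w = path;
    while (*r != '\0') {
        *w++ = *r;
        if (*r == '/')
            while (*r == '/') r++;
        else
            r++;
    }
    *w = '\0';

    /* 2. s|(/\.)+/|/|g : one or more "/." followed by "/" becomes "/" */
    r = w = path;
    while (*r != '\0') {
        char *p = r;
        while (p[0] == '/' && p[1] == '.') p += 2;
        if (p > r && *p != '/') p -= 2;   /* backtrack one "/." pair */
        if (p > r) {
            *w++ = '/';
            r = p + 1;                    /* skip the closing slash too */
        } else {
            *w++ = *r++;
        }
    }
    *w = '\0';

    /* 3. s|^(\./)+|| unless the whole path is "./" */
    if (strcmp(path, "./") != 0) {
        r = path;
        while (r[0] == '.' && r[1] == '/') r += 2;
        memmove(path, r, strlen(r) + 1);
    }

    /* 4. s|^/(\.\./)+|/| : "/../../xx" becomes "/xx" */
    if (path[0] == '/') {
        r = path + 1;
        while (r[0] == '.' && r[1] == '.' && r[2] == '/') r += 3;
        memmove(path + 1, r, strlen(r) + 1);
    }

    /* 5. drop one trailing slash unless the path is just "/" */
    len = strlen(path);
    if (len > 1 && path[len - 1] == '/')
        path[len - 1] = '\0';
}
```

Since every step only ever shortens the string, no allocation is needed. A
version built on string_str_index and string_replace would presumably follow
the same five passes.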

Lastly, I beg to be excused: I couldn't prepare unified diffs
of file.ops, file_spec.c, file_spec.h, and file_spec.t with
diff -N -u. Alas, the best I got was:

cvs server: I know nothing about file.ops
cvs server: I know nothing about file_spec.c
cvs server: I know nothing about include/parrot/file_spec.h
cvs server: I know nothing about t/op/file_spec.t

Probably -N works only with files that have already been added
or removed and I have no write access to add those files to
the repository. I won't be surprised if "oops! I did something
wrong again".

Comments, requests, threats are welcome, you know.

file_spec.diff
file.ops
file_spec.c
file_spec.h
file_spec.t

Leopold Toetsch

Sep 1, 2003, 3:57:07 AM
to Vladimir Lipskiy, perl6-i...@perl.org
Vladimir Lipskiy <fors...@kaluga.ru> wrote:

[ my first answer seems to be missing ]

> From: "Leopold Toetsch" <l.to...@nextra.at>
> Subject: TWEAKS: Takers Wanted - Effort And Knowledge Sought

>> Platform code
>> -------------
>>We need some functions to deal with paths and files like File::Spec.
>>For loading include files or runtime extension some search path should
>>be available to locate these files (a la "use lib LIST;").
>>For now runtime/parrot/{include,dynext} and the current working
>>directory would be sufficient.

> I ain't 100% sure what Leo wanted there, and I'm afraid that my patch is out
> of place. Still, it presents rudimentary support for File::Spec-like
> functions in Parrot, namely curdir, catdir, and catfile.

Although File::Spec uses catfile and catdir, I don't like the function
names ("cat file" is on *nix what "type file" is on Win*). Maybe
concat_pathname and concat_filename are better.

> I should warn you that the patch lacks any documentation. Usage examples
> can be found in file_spec.t. Nevertheless, does it need some documentation
> for non-Perl folks, and if it does, where should I put it?
> The docs directory?

docs/dev is the place for documents about internal functionality and
design decisions.

WRT the patch: could people with experience on different platforms
please have a look at it and check whether the functionality can cope
with all the platform weirdness?

>=3Dhead1 NAME

[ can you switch your mailer to plain text, thanks ]
[ WRT diff: make a copy of your original tree, do modifications there
and then "cd ..; diff -urN parrot parrot-modified" ]

Thanks,
leo

Vladimir Lipskiy

Sep 1, 2003, 7:38:36 AM
to l...@toetsch.at, perl6-i...@perl.org
Leo wrote:
> Although File::Spec uses catfile and catdir, I don't like the function
> names ("cat file" is on *nix what "type file" is on Win*). Maybe
> concat_pathname and concat_filename are better.

Yes, indeed. I'm for having concat_pathname only, since neither this patch
nor the File::Spec module makes any distinction between concatenating paths
and files (though I may be mistaken as regards VMS, Dan? (~:). So catdir
and catfile give the same result. Moreover, catfile is a sort of wrapper
around catdir and does nothing smarter than just calling catdir on all
platforms.

We can bring concat_filename in as well (I don't mind), but as an alias of
concat_pathname. I don't know how to implement this (I mean aliasing)
in terms of Parrot, though. Can we do it in some elegant way?

However, for consistency's sake, I really, really want us to have only
concat_pathname, because whether we concatenate dirs or dirs & a file,
we always do the same thing -- concatenate a path.

> docs/dev is the place for documents about internal functionality and
> design decisions.

Okay.

> WRT the patch: could people with experience on different platforms
> please have a look at it and check whether the functionality can cope
> with all the platform weirdness?

For the time being, it works properly only on Windows and Unix platforms.
Why is that so? I feel I should give some explanation of how it works.

There is only one generic function, catdir, rather than many as in
File::Spec. And there are some filters[1], which we can register in an
array, Filters.

typedef void (*ParrotFSFilter)(struct Parrot_Interp *, STRING **);

ParrotFSFilter Filters[] = {
    filter_1,
    filter_2,
    ... ,
    filter_n
};

When we have PASM code such as

set S0, "foo_dir"
set S1, "bar_dir"
catdir S0, S1

it first calls the file_spec_catdir() function, which merely glues the
parts together with an OS-specific directory separator and passes control
to another function, file_spec_filter(). After the gluing, a path can of
course contain some trash such as successive slashes; that's why we call
file_spec_filter(), which in turn calls each function registered in the
Filters array. Filters can be OS-specific: there is no sense in
registering a filter that makes the xx///xx -> xx/xx change when you are
working on cygwin. Another question is how we can add an OS-specific
filter -- nothing could be easier:

ParrotFSFilter Filters[] = {
    file_spec_some_filter,
#ifndef PARROT_OS_NAME_IS_CYGWIN
    file_spec_successive_slashes_filter,
#endif
    file_spec_filter_which_deletes_redundant_root_direct,
#ifdef VMS
    file_spec_vms_specific_filter,
#endif
    file_spec_yet_another_filter
    /* and so on */
};

If somebody imagines a plan that could manage without macroing,
you know, ideas are always welcome.
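
As one possible answer to the "without macroing" question, the platform
choice could be stored as data next to each filter and checked at run time.
The sketch below makes several assumptions: it uses plain char buffers
instead of Parrot STRINGs, and the names fs_platform, fs_filter_entry,
fs_run_filters and fs_concat are invented for illustration, not taken from
the patch. It also hard-codes "/" as the separator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical platform flags; Parrot's real configuration macros differ. */
enum fs_platform { FS_UNIX = 1, FS_CYGWIN = 2 };

typedef void (*fs_filter)(char *path);

struct fs_filter_entry {
    unsigned  platforms;  /* bitmask of platforms the filter applies to */
    fs_filter run;
};

/* xx////xx -> xx/xx */
static void collapse_slashes(char *path)
{
    char *r = path, *w = path;
    while (*r != '\0') {
        *w++ = *r;
        if (*r == '/')
            while (*r == '/') r++;
        else
            r++;
    }
    *w = '\0';
}

/* xx/ -> xx, but leave a lone "/" alone */
static void strip_trailing_slash(char *path)
{
    size_t len = strlen(path);
    if (len > 1 && path[len - 1] == '/')
        path[len - 1] = '\0';
}

/* One table for all platforms; no #ifdef inside the initializer. */
static const struct fs_filter_entry filters[] = {
    { FS_UNIX,             collapse_slashes     },
    { FS_UNIX | FS_CYGWIN, strip_trailing_slash },
};

/* Run every filter whose mask includes the given platform, in order. */
static void fs_run_filters(unsigned platform, char *path)
{
    size_t i;
    for (i = 0; i < sizeof filters / sizeof filters[0]; i++)
        if (filters[i].platforms & platform)
            filters[i].run(path);
}

/* Glue a and b with '/', then filter. Returns 0, or -1 if buf is too small. */
static int fs_concat(unsigned platform, char *buf, size_t size,
                     const char *a, const char *b)
{
    int n;

    assert(buf != NULL && a != NULL && b != NULL);
    n = snprintf(buf, size, "%s/%s", a, b);
    if (n < 0 || (size_t)n >= size)
        return -1;
    fs_run_filters(platform, buf);
    return 0;
}
```

The trade-off is that every filter gets compiled on every platform, and the
platform value itself still has to come from somewhere, for instance a
single configure-time constant.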

Now that you know how it's supposed to work, I can return to the
question of why it works properly only on Windows and Unix platforms.
The answer is that the filters haven't been implemented yet, because I
am still undecided about the best solution for the find 'n' replace
actions, and I wish I had heard some comments on that. To clarify what
the heck I'm talking about, here is the fragment cut from my initial
mail:

----

Next. In the future I'll need to be able to do some find 'n' replace
actions in order to clean the trash off of paths. The Perl version
uses regexes like these:

$path =~ s|/+|/|g unless ($^O eq 'cygwin');    # xx////xx  -> xx/xx
$path =~ s|(/\.)+/|/|g;                        # xx/././xx -> xx/xx
$path =~ s|^(\./)+||s unless $path eq "./";    # ./xx      -> xx
$path =~ s|^/(\.\./)+|/|s;                     # /../../xx -> /xx
$path =~ s|/\Z(?!\n)|| unless $path eq "/";    # xx/       -> xx

The question is whether I should take advantage of string_str_index,
string_replace and friends, or whether there is a better solution. In any
case it never uses long paths, so we won't be heavily penalized whichever
find 'n' replace scheme we use.

----

There is one more thing to say: in some cases the result obtained with
the Parrot file spec will diverge from the one obtained with the Perl
version. For instance,

set S0, ""
set S1, ""
concat_pathname S0, S1
print S1

prints "", but File::Spec's equivalent

my $path = catdir("", "");
print $path;

prints "/" on Unix, Windows, and so forth. I don't think that's the Right
result, though you can argue with me on that account. I'm gonna document
all the divergences.

> [ can you switch your mailer to plain text, thanks ]

Yep. I regularly do that. But sometimes my MTA outwits me.

> [ WRT diff: make a copy of your original tree, do modifications there
> and then "cd ..; diff -urN parrot parrot-modified" ]

Thanks, indeed. I'll try that as soon as I prepare a new patch.

Vladimir Lipskiy

Sep 1, 2003, 7:40:18 AM
to perl6-internals

Vladimir Lipskiy

Sep 2, 2003, 12:03:02 AM
to perl6-internals, Michael G Schwern
> Though I haven't been following this thread, it seems you're coming up
> with some File::Spec-like thing for Parrot?

Exactly.

> I'd recommend looking at Ken Williams' excellent Path::Class module

Surely, I will.

> So yes, you must distinguish between concatenating directories and files.
>
> You also must worry about volumes.

Yeah .. I'll consider that.

Thanks a lot, Michael

Michael G Schwern

Sep 1, 2003, 7:55:39 PM
to Vladimir Lipskiy, l...@toetsch.at, perl6-i...@perl.org
Though I haven't been following this thread, it seems you're coming up
with some File::Spec-like thing for Parrot?

I'd recommend looking at Ken Williams' excellent Path::Class module
which gives you actual file and directory objects. EXTREMELY useful when
you're in an ultra-cross platform environment such as Parrot. I wish I
had them for MakeMaker instead of fucking around with File::Spec. Consider
using Path::Class for inspiration rather than File::Spec.


On Mon, Sep 01, 2003 at 02:38:36PM +0300, Vladimir Lipskiy wrote:
> Leo wrote:
> > Although File::Spec uses catfile and catdir, I don't like the function
> > names ("cat file" is on *nix what "type file" is on Win*). Maybe
> > concat_pathname and concat_filename are better.
>
> Yes, indeed. I'm for having concat_pathname only, since neither this patch
> nor the File::Spec module makes any distinction between concatenating paths
> and files (though I may be mistaken as regards VMS, Dan? (~:). So catdir
> and catfile give the same result. Moreover, catfile is a sort of wrapper
> around catdir and does nothing smarter than just calling catdir on all
> platforms.

On VMS catfile and catdir do very different things because VMS filepath
syntax distinguishes between files and directories explicitly.

Unix:
/dir1/dir2/dir3
/dir1/dir2/file

Windows:
\dir1\dir2\dir3
\dir1\dir2\file

VMS:
[dir1.dir2.dir3]
[dir1.dir2]file

So yes, you must distinguish between concatenating directories and files.

You also must worry about volumes.

Unix:
No user visible concept of a volume

Windows:
VOLUME:\dir1\dir2\file

VMS:
VOLUME:[dir1.dir2]file
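
The directory/file distinction shown above can be sketched in C as a single
builder that takes the directory components and the file name separately, so
that the VMS case knows where to close the bracket. This is only an
illustration: path_syntax, append and build_path are invented names, volumes
are left out, and real VMS syntax has further rules that this ignores. It
always emits rooted paths shaped like the examples.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum path_syntax { SYNTAX_UNIX, SYNTAX_WIN32, SYNTAX_VMS };

/* Append s to the NUL-terminated string in buf; -1 if it does not fit. */
static int append(char *buf, size_t size, const char *s)
{
    size_t used = strlen(buf), add = strlen(s);
    if (used + add >= size)
        return -1;
    memcpy(buf + used, s, add + 1);
    return 0;
}

/* Build a rooted path from ndirs directory names plus an optional file
 * name (NULL for a pure directory path). Returns 0, or -1 on overflow. */
static int build_path(enum path_syntax syn, char *buf, size_t size,
                      const char *const *dirs, size_t ndirs,
                      const char *file)
{
    const char *sep = syn == SYNTAX_UNIX  ? "/"
                    : syn == SYNTAX_WIN32 ? "\\"
                    : ".";
    size_t i;

    assert(buf != NULL && size > 0);
    buf[0] = '\0';

    if (syn == SYNTAX_VMS && append(buf, size, "[") < 0)
        return -1;
    for (i = 0; i < ndirs; i++) {
        /* VMS puts '.' between directories; the others prefix each one */
        if ((syn != SYNTAX_VMS || i > 0) && append(buf, size, sep) < 0)
            return -1;
        if (append(buf, size, dirs[i]) < 0)
            return -1;
    }
    if (syn == SYNTAX_VMS) {
        /* the bracket closes the directory part; a file follows it bare */
        if (append(buf, size, "]") < 0)
            return -1;
    } else if (file != NULL) {
        if (append(buf, size, sep) < 0)
            return -1;
    }
    if (file != NULL && append(buf, size, file) < 0)
        return -1;
    return 0;
}
```

With a single concat function that only sees strings, the VMS branch could
not tell whether the last component belongs inside or outside the brackets.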


--
Michael G Schwern sch...@pobox.com http://www.pobox.com/~schwern/
Operation Thrusting Peach

mar...@kurahaupo.gen.nz

Sep 3, 2003, 9:00:28 PM
to Michael G Schwern, Vladimir Lipskiy, l...@toetsch.at, perl6-i...@perl.org
On Mon, 1 Sep 2003, Michael G Schwern wrote:
> You also must worry about volumes.
> Unix: No user visible concept of a volume
> Windows: VOLUME:\dir1\dir2\file
> VMS: VOLUME:[dir1.dir2]file

This has been worrying me for some years. The concept of "volume" has
different implications for different platforms.

[please excuse long rambling explanation...]

One could argue that the mount points in Unix, though normally invisible,
are "volumes" in the sense that they do affect the semantics of certain
system calls, most especially "rename" and "link", but depending on mount
options also "open", "write", "ioctl" and others. Making them visible is
normally exorbitantly expensive though, so you don't want to do so unless
absolutely necessary.

It's also clear that the relationships between "volume" and "root directory"
differ. For Mac, volumes are within a pseudo root directory, whereas for Win32
a root directory exists on a volume. So although they share the same names,
they aren't really portable concepts in any meaningful way.

What these various OSes do share is a concept of "current locus" (or loci)
within some filename space.

* On Unix both the "working" and "root" directories can be changed;

* On Windows the current (working) directory is a feature of the current
volume; changing to another volume and back again will bring you to the
same working directory, even if you changed the current directory on
another volume.
(This behaviour changes between different versions of Windows.)

* On Classic Mac (and VMS?) only the "working directory" can be changed; the
"root directory" is faked to be the top of the startup volume;

* On RMX an arbitrary number [*] of "current loci" can be established, and
referred to as if they were independent volumes, or accessed by open
"handles" (much like file descriptors); the standard C library uses these to
fake the behaviour of various POSIX functions, but these loci can be
shared between processes and thus the POSIX emulation can be fooled.

* Similarly, versions of Unix which have "fchdir" and/or "fchroot" allow a
working directory or root directory to be selected from an arbitrary number
of already-opened directories;

* Some (ancient) systems don't have any directory hierarchy, so a root
directory is meaningless

But also importantly, in the general case it is not possible to determine a
path between two loci, and in particular between a root directory and a working
directory.

* In Unices with "fchdir" it is possible to have a current working directory
that is outside the current root directory;

* Filesystem permissions may prevent traversing from one locus to another;
(normally this would prevent construction of a path from one to the other,
but even given such a path, it might not be usable)

The more important question is how do we interpret these things to decide if
certain operations should reasonably be expected to succeed? Give or take
ownership issues of course...

Some of them we already can do somewhat portably:

* How do we take the results of "readdir" and make them usable?

* If we use "chdir", how do we later get back to the same working directory?

* Is a given filename dependent on the working directory?

* Do two pathnames A and B refer to the same entity?
Just by inspecting the pathnames?
By checking whether they're links to the same file (inode)?

* Do two pathnames A and B refer to entities in the same directory?

If so then we can assume that if permissions allow us to access A then they
will probably also allow us to access B. Not that we shouldn't check the
results of both attempts of course, but if one succeeds and the other fails
then we would be excused for just bailing instead of trying harder.
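
On POSIX systems, the "same entity" question (by inode rather than by name)
can be sketched with stat(): two names refer to the same file when both the
device and the inode number match. The helper names same_entity and
same_device are made up here, and the approach is an assumption that only
holds where st_dev and st_ino are meaningful. Note that stat() follows
symbolic links.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/* 1 if a and b name the same file, 0 if not, -1 if either stat() fails. */
static int same_entity(const char *a, const char *b)
{
    struct stat sa, sb;

    assert(a != NULL && b != NULL);
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* 1 if a and b live on the same device (roughly "same volume"),
 * 0 if not, -1 if either stat() fails. */
static int same_device(const char *a, const char *b)
{
    struct stat sa, sb;

    assert(a != NULL && b != NULL);
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev;
}
```

Both checks need the files to exist already, so they cannot decide in
advance whether a name that is yet to be created will land on the same
volume; checking the containing directory is the usual workaround.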

Some of them are a lot harder to do portably:

* Can we rename a file from name A to name B? A directory?
If it's one that we just created? One that we got from "readdir"?

How can we construct A from B or B from A to guarantee that we can?

Roughly this translates to "are A and B on the same volume?" unless
you're on Unix where we pretend that there aren't any volumes...

* How do we do transactional file replacement? That is, either replace
a target file with a complete replacement, or not at all.

On Unix we do this by creating a temporary file in the same directory and
once it has been completely written, renaming it to replace the target
atomically. Or just deleting it to roll back the transaction.

Assuming this method is possible for another OS, how do we construct the
temporary filename from the target filename?

* Can we create a hard link from name A to name B? A symbolic link?

How can we construct A from B or B from A to guarantee that we can?

Given two pathnames A and B, how do we make the shortest relative path C
between them (to use for a relative symbolic link)?

On Unix you can create a hard link anywhere under the same mount point; on
Win-NT4-POSIX links can only be created within the same directory.

* If we rename a symlink from name A and pointing at B, to name C, will it
still refer to the same file?

How can we construct A+B from C or C from A+B to guarantee that it will?

If we can't, how do we create a new symlink D that *will* refer to the same
file? Or a new name E which it will refer to?

And do all the above without requiring A and C to be in the same directory?
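
The Unix method of transactional replacement described above (temporary file
in the same directory, then rename) might look like the following sketch. It
assumes a POSIX system with mkstemp() and an atomic rename(); the function
name replace_file and the ".XXXXXX" suffix appended to the target name are
choices made for this illustration, which is one simple way of constructing
the temporary name from the target name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Replace target with len bytes of data, or leave it untouched.
 * Returns 0 on success, -1 on failure. */
static int replace_file(const char *target, const char *data, size_t len)
{
    char tmp[4096];
    int fd, n;

    assert(target != NULL && data != NULL);

    /* Same directory as the target, so rename() stays on one volume. */
    n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", target);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    fd = mkstemp(tmp);
    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t w = write(fd, data, len);
        if (w <= 0) {                 /* roll back: discard the temp file */
            close(fd);
            unlink(tmp);
            return -1;
        }
        data += w;
        len -= (size_t)w;
    }
    if (fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }

    /* Commit: the target is swapped in one step. */
    if (rename(tmp, target) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

One caveat of this sketch: mkstemp() creates the file with owner-only
permissions, so the replacement does not inherit the mode of the old target.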

I would strongly recommend deprecating any distinction between "volume" and
"path", and instead provide functions which focus on allowing us to answer the
above questions in simple portable ways.

But in the end, since Windows, MacOS, VMS and even RMX all have POSIX
emulation, do we really care? Maybe we should just have functions for "convert
native name to POSIX" and "convert POSIX name to native" and be done with it?

-Martin

[* Ok, an "arbitrary" number really means a 32-bit number -- or smaller]

PS: don't forget, I said "give or take filesystem permissions"


Leopold Toetsch

Sep 4, 2003, 6:36:34 AM
to mar...@kurahaupo.gen.nz, perl6-i...@perl.org
mar...@kurahaupo.gen.nz <mar...@kurahaupo.gen.nz> wrote:

[ snipped a lot of explanations ]

Please keep in mind that the intended usage inside Parrot is just to
locate some standard include or extension files for Parrot
internals. More abstraction and complexity can always be added above
that or implemented by HLLs.
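
The narrow use case described here could be sketched as a fixed search path
that is tried in order. The directory list follows the one named earlier in
the thread (runtime/parrot/include, runtime/parrot/dynext and the current
directory); the function name locate_file, the "/" separator and the use of
fopen() as an existence test are assumptions of this sketch, not Parrot's
actual implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Directories tried in order, relative to the current working directory. */
static const char *const search_dirs[] = {
    "runtime/parrot/include",
    "runtime/parrot/dynext",
    "."
};

/* Write the first openable "<dir>/<name>" into buf.
 * Returns 0 on success, -1 if nothing was found (buf is then empty). */
static int locate_file(const char *name, char *buf, size_t size)
{
    size_t i;

    assert(name != NULL && buf != NULL && size > 0);
    if (strlen(name) == 0) {
        buf[0] = '\0';
        return -1;
    }
    for (i = 0; i < sizeof search_dirs / sizeof search_dirs[0]; i++) {
        FILE *f;
        int n = snprintf(buf, size, "%s/%s", search_dirs[i], name);
        if (n < 0 || (size_t)n >= size)
            continue;             /* candidate does not fit, try the next */
        f = fopen(buf, "r");
        if (f != NULL) {
            fclose(f);
            return 0;
        }
    }
    buf[0] = '\0';
    return -1;
}
```

Opening the file is a crude test (on some systems it also succeeds for a
directory), but it avoids platform-specific calls, which fits the modest
goal stated above.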

leo

Chris Allan

Sep 4, 2003, 4:09:27 PM
to l...@toetsch.at, perl6-i...@perl.org


Is there a plan for operating systems without Unix-like hierarchical
directory structures (eg IBM I-Series, I think z/OS, I'd assume many
other enterprise OSs)? There are further difficulties in that some of
these have multiple filesystems which look totally different from each
other etc.

In general how much effort is it likely to be to get Parrot working on
systems which don't look at all like Unix? I've tried to get Perl 5 to
build on os/400 before and it wasn't a pleasant experience. Any chance
it'll be easier to port Parrot?

Chris

mar...@kurahaupo.gen.nz

Sep 5, 2003, 3:20:53 AM
to perl6-i...@perl.org
On Thu, 4 Sep 2003 mar...@kurahaupo.gen.nz wrote:
> On Mon, 1 Sep 2003, Michael G Schwern wrote:
> > You also must worry about volumes.
[my long explanation snipped]

Sorry, wrong list; this is a standard-module issue, not an implementation
issue or even a core-language issue.

-Martin


Gordon Henriksen

Sep 6, 2003, 10:48:54 AM
to mar...@kurahaupo.gen.nz, perl6-i...@perl.org
Lots of good points.

Something that the Mac OS (even OS X) has which most Unix variants don't
are directory IDs and file IDs. The Carbon APIs use a FSSpec structure,
which is a volume ID, directory ID, and file name. (volume ID, file ID
is good enough to identify a file which exists already, but each of the
volume ID, directory ID, and file name is needed to create a new file.)
It's resilient if the directory is moved, but more importantly actually
offers very significant performance and memory usage improvements in
programs which keep tabs on lots of files (e.g., make). Would be cool if
that functionality could be exposed in a portable way, so that parrot
programs would inherit it without having to do much. Not that I think it
can be. But it would be cool.

Java's tackled this. On Unix platforms, Java represents a single
volume (/), whereas Classic Mac OS and Windows can have multiple
volumes. Mount points are ignored—they're just directories. Each volume
has a root directory. Volume names might not be unique (Mac OS)...

As for pathname equivalence, There Be Dragons Here. In particular, each
directory (when mount points are treated as directories) could
potentially have different equivalence semantics. (e.g., on Mac OS X,
consider a UFS [ASCII, case sensitive] mount point beneath an HFS+ /
[Unicode, case insensitive], or vice versa...) And hard links and
symlinks...

On Wednesday, September 3, 2003, at 09:00 , mar...@kurahaupo.gen.nz
wrote:

Gordon Henriksen
mali...@mac.com
