Beats the hell out of os.path, which is an ugly thing indeed. The OO
interface means you could use the interface nicely to implement other
things, like URLs. The problem? It's just some module. The various os
functions (of which path replaces quite a few) have become idiomatic to
me, and I'm sure others as well. I find myself reluctant to use it in
code that's not essentially private, because it's changing something
small and seemingly trivial, and people won't be familiar with it.
The solution? It should be a builtin! Or, if not a builtin, included
in the os module. But I actually like the idea of it being a builtin --
if open is a builtin, path stands right up there too. It would get rid
of 90% of the use of the os module.
Thoughts? Reactions?
Ian
I would greatly appreciate such a module in the std library, but while
Jason's module has some very cool features, to my taste it goes a bit
too far with overloading operators. I really don't like overloading a/b
to mean os.path.join(a, b) and I don't think the default iterator should
do a listdir (although list(mypath) is indeed cute). If it were toned
down a bit in this area I think we may be able to make a good case for
including it in the std library.
Just
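For readers who have not seen path.py, the two features debated here can be sketched roughly as follows. This is a hypothetical DemoPath class written for current Python (where the operator hook is __truediv__; in 2003 it was __div__), not Jason's actual code. It keeps the / operator but uses an explicit listdir() method rather than overriding __iter__.

```python
import os


class DemoPath(str):
    """Hypothetical str subclass; not the real path.py implementation."""

    def __truediv__(self, other):
        # a / b  ->  os.path.join(a, b)
        return DemoPath(os.path.join(self, other))

    def joinpath(self, *parts):
        # The spelled-out alternative to the / operator.
        return DemoPath(os.path.join(self, *parts))

    def listdir(self):
        # An explicit method instead of a directory-listing __iter__.
        return [self / name for name in os.listdir(self)]


base = DemoPath("projects")
print(base / "spam" / "eggs.py")
print(base.joinpath("spam", "eggs.py"))
```

Both spellings produce the same result; the disagreement in the thread is only about which one reads better.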
I love it and have been using it in a few personal projects. My only
gripe is its monolithic nature :)
Van
Yes, looks nice.
> Beats the hell out of os.path, which is an ugly thing indeed. The OO
> interface means you could use the interface nicely to implement other
> things, like URLs. The problem? It's just some module. The various os
> functions (of which path replaces quite a few) have become idiomatic to
> me, and I'm sure others as well. I find myself reluctant to use it in
> code that's not essentially private, because it's changing something
> small and seemingly trivial, and people won't be familiar with it.
>
> The solution? It should be a builtin! Or, if not a builtin, included
> in the os module. But I actually like the idea of it being a builtin --
> if open is a builtin, path stands right up there too. It would get rid
> of 90% of the use of the os module.
>
> Thoughts? Reactions?
I agree that something like Jason Orendorff's path module should go into
the standard library. I've coded a similar module and i think that
a discussion about certain design decisions would probably improve our
approaches.
For example Jason lets the "path" object inherit from "str" (or unicode)
but i think it's better to provide a "__str__" method so that you can say
str(pathinstance).endswith('.py')
and *not* base the path object on str/unicode.
unicode(pathinstance)
would just fail if your platform doesn't support this. First, i tried
the inheritance approach, btw, but it is ambiguous (e.g. for the
join method: str.join vs. os.path.join).
Also, my module provides most of the os.path.* methods as "filters" so
you can say
dirs = filter(isdir, list_obj_pathobjects)
fnames = filter(AND(nolink, isfile), list_obj_pathobjects)
in addition to
pathobject.isfile()
etc.
Recently, i also did some experimentation with "virtual-fs" features so
that you can transparently access http/ftp/svn files/directories. I even
got that to work with "<tab>-completion" but that was quite a hack :-)
I am pretty sure that virtual-fs-like-extensibility would be a big
"selling" point and would motivate the use of such a module and
finally the inclusion into the stdlib. Of course, the local-fs should
be the convenient case but it shouldn't be hard to use the same methods
for accessing remote "repositories".
Anyway, i am all for going in this direction and would probably
like to participate in such a development and design effort.
cheers,
holger
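A minimal sketch of the design holger describes, assuming a wrapper class with __str__ and module-level predicates that can be combined. Holger's module was not released at the time of this thread, so PathObj and the bodies of isfile, isdir, nolink and AND below are guesses at the idea, written for current Python.

```python
import os


class PathObj:
    """Hypothetical path object that wraps a string instead of subclassing str."""

    def __init__(self, raw):
        self.raw = raw

    def __str__(self):
        return self.raw

    def isfile(self):
        return os.path.isfile(self.raw)

    def isdir(self):
        return os.path.isdir(self.raw)

    def islink(self):
        return os.path.islink(self.raw)


# Module-level predicates mirroring the methods, usable with filter().
def isfile(p):
    return p.isfile()


def isdir(p):
    return p.isdir()


def nolink(p):
    return not p.islink()


def AND(*predicates):
    """Combine predicates; the result is true only if all of them are."""
    def combined(p):
        return all(pred(p) for pred in predicates)
    return combined


candidates = [PathObj(os.curdir), PathObj(os.path.join(os.curdir, "setup.py"))]
dirs = list(filter(isdir, candidates))
fnames = list(filter(AND(nolink, isfile), candidates))
print([str(p) for p in dirs])
print(str(candidates[1]).endswith(".py"))
```

String methods stay available through str(p), while join, iteration and friends are free to mean whatever is right for paths.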
I've never wanted to iterate over a string, and find that an annoying
feature in Python (I'd *much* rather get an exception), so covering up
the previous string behavior doesn't seem a big deal to me. But I can
see why iterating over a path may be a little too magic. But paths are
also containers (at least if they point to a directory), so iterating
over them seems only natural. I could see something like mypath.dir()
being a reasonable alternative (mypath.list() also, but that looks funny
to me because "list" is special to my eye).
I do like the /, though. mypath.joinpath(filename) is rather
long-winded (since he wisely avoids reusing the join method). / has no
meaning for strings, so it's not really overloading the operator, merely
adding it in a specific context. Several operators are reused for
different meanings, % in particular comes to mind, this doesn't seem
that bad. I like the way it looks and feels. It feels better to me
than using + for string concatenation ;) -- maybe because division is
not all that common an operation anyway.
Ian
> I am pretty sure that virtual-fs-like-extensibility would be a big
> "selling" point and would motivate the use of such a module and
> finally the inclusion into the stdlib. Of course, the local-fs should
> be the convenient case but it shouldn't be hard to use the same methods
> for accessing remote "repositories".
Excellent point!
Just
> On Tue, 2003-07-08 at 03:17, Just wrote:
> > I would greatly appreciate such a module in the std library, but
> > while Jason's module has some very cool features, to my taste it
> > goes a bit too far with overloading operators. I really don't like
> > overloading a/b to mean os.path.join(a, b) and I don't think the
> > default iterator should do a listdir (although list(mypath) is
> > indeed cute). If it were toned down a bit in this area I think we
> > may be able to make a good case for including it in the std library.
>
> I've never wanted to iterate over a string, and find that an annoying
> feature in Python (I'd *much* rather get an exception),
It's basically a side effect of having a __getitem__ that takes
integers.
> so covering
> up the previous string behavior doesn't seem a big deal to me.
It's not just that, but iterating over a path could _also_ mean to
iterate over the path elements, so it's not obvious it should iterate
over the directory contents. Also, what if path points to a file?
> But I
> can see why iterating over a path may be a little too magic. But
> paths are also containers (at least if they point to a directory), so
> iterating over them seems only natural. I could see something like
> mypath.dir() being a reasonable alternative (mypath.list() also, but
> that looks funny to me because "list" is special to my eye).
I'd say path.listdir() is most natural.
> I do like the /, though. mypath.joinpath(filename) is rather
> long-winded (since he wisely avoids reusing the join method). / has
> no meaning for strings, so it's not really overloading the operator,
> merely adding it in a specific context.
It'll confuse the hell out of people who expect / to mean "divide" ;-)
> Several operators are reused
> for different meanings, % in particular comes to mind, this doesn't
> seem that bad. I like the way it looks and feels. It feels better
> to me than using + for string concatenation ;) -- maybe because
> division is not all that common an operation anyway.
I tend to think that this whole discussion shows that _maybe_ it's not
such a good idea to subclass str/unicode after all. (Aside: path.py
taught me about the existence of os.path.supports_unicode_filenames in
2.3, but at least on my platform (OSX) it has the wrong value. I opened
a bug, #767645.)
Some string methods are handy on paths, such as .endswith(), while
others are not, like .upper() or .splitlines(). Other string method
names or operators would make sense for paths, but -- as you say -- are
out of the question since their meaning would be quite different, eg.
.join() and +.
I find overloading / unnecessary, since it's not like you'll have to
write
a.join(b).join(c).join(d)
but rather
a.join(b, c, d)
which I don't think is all that bad.
Btw. long ago (2001, before Jason's path module) I posted a balloon to
python-dev about this subject:
http://mail.python.org/pipermail/python-dev/2001-August/016663.html
There was hardly any response, and Guido said he wasn't too enthusiastic
about the idea, so we might have to work hard to pull this off ;-).
Also, with unicode, Guido's suggestion that any object that implements
__str__ can be used with open() seems to be no longer true. At least I
can't get it to work.
Just
Just> a.join(b).join(c).join(d)
Just> but rather
Just> a.join(b, c, d)
Just> which I don't think is all that bad.
Yes, but
a/b/c/d
is nicely analogous to Unix pathname syntax.
Skip
> Just> a.join(b, c, d)
>
> Just> which I don't think is all that bad.
>
> Yes, but
>
> a/b/c/d
>
> is nicely analogous to Unix pathname syntax.
But when the items are variables, what you read is not what you get.
Often you'll want (some) literals, and then you get
path = basePath/"a"/"b"/"c"
...and _that_ I find quite horrible...
(Did I mention that / usually means divide in Python? ;-)
Just
device:<dir.dir.dir>file.ext;version
where most components were optional. The device: part could also be a
'logical name' (basically an alias) for a directory or device, I don't
remember if it could alias a file name too.
The Common Lisp pathname type might be worth looking into,
<http://www.iti.informatik.tu-darmstadt.de/cl-hyperspec/Body/sec_19-2.html>
They have done a lot of work to try to get it right, and from what
I hear they did a good job.
--
Hallvard
Just> Ooh, it _can_ get worse ;-/
Just> Also: this would not be portable on platforms not using / as
Just> os.sep, ...
Not necessarily. My guess (again, without trying it) is that it does the
right thing. Right near the top of
http://www.jorendorff.com/articles/python/path/
Jason writes:
I like for my code to be cross-platform, but I tired of typing
os.path.join in about 1994.
>> Sure, just like '%' means modulo in Python, but it seems to have
>> found a home in printf-style string expansion.
Just> True, but string expansion is quite old (possibly even Python 0.9
Just> or 1.0?), so most people are used to it. (Although, newbies
Just> without a C background are usually baffled by it. I know I was,
Just> back then...).
Just because a better alternative to os.path turns up now is no reason to
discount it.
Nonetheless, before anything like Jason's path module is incorporated into
the standard distribution, a PEP is almost certainly required. I imagine
there are some things which could be done better (or at least differently)
to make the overall module more acceptable.
Skip
> Just> But when the items are variables, what you read is not what you
> Just> get. Often you'll want (some) literals, and then you get
>
> Just> path = basePath/"a"/"b"/"c"
>
> Just> ...and _that_ I find quite horrible...
>
> I don't know for sure, but I suspect the above could also be
>
> path = basePath/"a/b/c"
Ooh, it _can_ get worse ;-/
Also: this would not be portable on platforms not using / as os.sep, so
is almost equivalent to not using os.path at all and doing
path = basePath + "/a/b/c"
> Still not perfect, but in any case, the '/' is meant to be
> suggestive, not literal. Perhaps you would have preferred he use
> ':'? ;-)
Heh...
> Just> (Did I mention that / usually means divide in Python? ;-)
>
> Sure, just like '%' means modulo in Python, but it seems to have
> found a home in printf-style string expansion.
True, but string expansion is quite old (possibly even Python 0.9 or
1.0?), so most people are used to it. (Although, newbies without a C
background are usually baffled by it. I know I was, back then...).
Just
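The portability question can be checked without a second machine, because the standard library ships both rule sets as posixpath and ntpath. The sketch below only shows what a plain join does with an embedded "/"; it says nothing about what path.py itself does.

```python
import ntpath
import posixpath

# The same join, evaluated with the POSIX rules and with the Windows rules.
print(posixpath.join("base", "a/b/c"))        # base/a/b/c
print(ntpath.join("base", "a/b/c"))           # base\a/b/c

# Joining component by component lets each rule set use its own separator.
print(posixpath.join("base", "a", "b", "c"))  # base/a/b/c
print(ntpath.join("base", "a", "b", "c"))     # base\a\b\c
```

With the Windows rules the embedded slashes are passed through unchanged rather than translated to the platform separator, so whether the result still names the intended file depends on the platform accepting "/" as well.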
Just> But when the items are variables, what you read is not what you
Just> get. Often you'll want (some) literals, and then you get
Just> path = basePath/"a"/"b"/"c"
Just> ...and _that_ I find quite horrible...
I don't know for sure, but I suspect the above could also be
path = basePath/"a/b/c"
Still not perfect, but in any case, the '/' is meant to be suggestive, not
literal. Perhaps you would have preferred he use ':'? ;-)
Just> (Did I mention that / usually means divide in Python? ;-)
Sure, just like '%' means modulo in Python, but it seems to have found a
home in printf-style string expansion.
Skip
> (Did I mention that / usually means divide in Python? ;-)
I believe you did... But doesn't % usually mean modulo in Python?
:^)
Nick
--
# sigmask | 0.2 | 2003-01-07 | public domain | feed this to a python
print reduce(lambda x,y:x+chr(ord(y)-1),'Ojdl!Wbshjti!=obwAqbusjpu/ofu?','')
Note that this overlaps a bit with urllib and urllib2. Just something
that would need thinking about.
John
I believe you'll find it's more common for it to mean "format",
but the point is that it is well understood that it means *either*,
depending on context.
Using / for this new concatenation-like behaviour is tantamount
to adding new syntax to Python again... :-(
-Peter
sure, i used *urllib* under the hood :-)
holger
Hallvard> device:<dir.dir.dir>file.ext;version
Hallvard> where most components were optional.
which is (not too surprisingly) very similar to VMS:
device:[dir.dir.dir]file.ext;version
Skip
> Just> Also: this would not be portable on platforms not using / as
> Just> os.sep, ...
>
> Not necessarily.
True, but it takes guessing: "did the author really mean to specify
several path components here, or is he/she intentionally using the unix
path separator in file names on a platform that _doesn't_ use / as the
path separator?". In the face of ambiguity etc.
> My guess (again, without trying it) is that it does
> the right thing.
It doesn't...
> Nonetheless, before anything like Jason's path module is incorporated
> into the standard distribution, a PEP is almost certainly required.
> I imagine there are some things which could be done better (or at
> least differently) to make the overall module more acceptable.
Absolutely. I really like Holger's observation that having path objects
levels the road to virtual file systems (eg. use a zip file as a file
system). Anyone interested in volunteering to write that PEP? I'd like to
contribute in some way, but I'm not going to write more than 1 PEP per
year :-).
Just
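To make the "zip file as a file system" idea concrete, here is a small sketch built on the standard zipfile module. ZipEntry and its method names (join, exists, read) are hypothetical, chosen to look like what a local path object might offer; nothing here comes from an existing path module.

```python
import io
import zipfile


class ZipEntry:
    """Hypothetical path-like object for a member of a zip archive."""

    def __init__(self, archive, name=""):
        self.archive = archive      # an open zipfile.ZipFile
        self.name = name            # member name, "" for the archive root

    def join(self, *parts):
        # Zip member names always use "/" regardless of the host platform.
        pieces = [self.name] if self.name else []
        return ZipEntry(self.archive, "/".join(pieces + list(parts)))

    def exists(self):
        return self.name in self.archive.namelist()

    def read(self):
        return self.archive.read(self.name)


# Build a small archive in memory so the example touches no real files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("docs/readme.txt", "hello from inside the zip")

archive = zipfile.ZipFile(buffer)
readme = ZipEntry(archive).join("docs", "readme.txt")
print(readme.exists(), readme.read())
```

Code written against join/exists/read would not need to know whether it was handed a local path or an archive member.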
True... and as I think about it, a lot of the actually interesting
string methods wouldn't be performed on the entire path anyway. Things
like path.name.startswith('img'), or path.ext == 'jpg'. You could also
do things like override equals, so that two Windows paths would match
case-insensitively, and other things that would be bad to change in a
string subclass.
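A rough sketch of that idea, assuming a wrapper class (here called WinPath, a made-up name) that exposes name and ext and compares case-insensitively. It uses ntpath so that it behaves the same on any platform. Note that os.path.splitext keeps the leading dot, so the comparison would be against '.jpg' rather than 'jpg'.

```python
import ntpath


class WinPath:
    """Hypothetical non-str path object using Windows comparison rules."""

    def __init__(self, raw):
        self.raw = raw

    @property
    def name(self):
        return ntpath.basename(self.raw)

    @property
    def ext(self):
        # splitext keeps the dot: '.jpg', not 'jpg'.
        return ntpath.splitext(self.raw)[1]

    def __eq__(self, other):
        if not isinstance(other, WinPath):
            return NotImplemented
        # normcase lowercases and turns "/" into "\" before comparing.
        return ntpath.normcase(self.raw) == ntpath.normcase(other.raw)

    def __hash__(self):
        # Keep hashing consistent with the relaxed equality.
        return hash(ntpath.normcase(self.raw))


a = WinPath(r"C:\Pics\IMG_1.JPG")
b = WinPath("c:/pics/img_1.jpg")
print(a == b, b.name.startswith("img"), b.ext == ".jpg")
```

Doing the same on a str subclass would change how the object behaves as a dictionary key or in string comparisons, which is the kind of surprise Ian is pointing at.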
> I find overloading / unnecessary, since it's not like you'll have to
> write
>
> a.join(b).join(c).join(d)
>
> but rather
>
> a.join(b, c, d)
>
> which I don't think is all that bad.
And you could reuse the join method, which is better than joinpath (but
joinpath is better than overriding string's join, if you are subclassing
string).
> Btw. long ago (2001, before Jason's path module) I posted a balloon to
> python-dev about this subject:
> http://mail.python.org/pipermail/python-dev/2001-August/016663.html
>
> There was hardly any response, and Guido said he wasn't too enthusiastic
> about the idea, so we might have to work hard to pull this off ;-).
>
> Also, with unicode, Guido's suggestion that any object that implements
> __str__ can be used with open() seems to be no longer true. At least I
> can't get it to work.
You don't want to use open() anyway, that breaks the possibility of
alternate filesystems. There should be an open method, like
path.open('w'). Then a URL object (called maybe url?) would also have
an open method, that obviously would do a much different thing. (And
just as I'm thinking of a url class, things like .exists() would be
surprisingly useful, even though urllib doesn't expose these file-like
operations very directly)
The only other way is if a new magic method -- __open__ -- came into
being. That would be interesting (where "interesting" can be read
several ways ;).
Ian
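The open-as-a-method idea might look something like this. FilePath, Url and first_line are hypothetical names for illustration; urllib.request.urlopen is the real standard-library call in current Python, and only reading is sketched for the URL case.

```python
import os
import tempfile
import urllib.request


class FilePath:
    """Hypothetical local path object."""

    def __init__(self, raw):
        self.raw = raw

    def open(self, mode="r"):
        return open(self.raw, mode)

    def exists(self):
        return os.path.exists(self.raw)


class Url:
    """Hypothetical URL object with the same open() spelling."""

    def __init__(self, raw):
        self.raw = raw

    def open(self, mode="r"):
        # Only reading is sketched; urlopen returns a binary file-like object.
        if mode != "r":
            raise ValueError("this sketch only supports reading")
        return urllib.request.urlopen(self.raw)


def first_line(location):
    """Works with anything that has an .open() method."""
    with location.open() as handle:
        return handle.readline()


# Demonstrate with a temporary local file; no network access is needed.
with tempfile.TemporaryDirectory() as tmp:
    target = FilePath(os.path.join(tmp, "notes.txt"))
    with target.open("w") as handle:
        handle.write("first\nsecond\n")
    line = first_line(target)
print(repr(line))
```

A function like first_line never calls the builtin open(), so it does not care which kind of object it was given.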
I'm starting to think the same thing. Not so much because of join, but
because it doesn't actually offer many advantages. Many methods that
look for a filename will be using "type(arg) is type('')", so you'd have
to pass a real string object in anyway -- and people who say "but you
should use isinstance(arg, str)" are obviously forgetting that you
couldn't do this not very long ago, and lots of code uses type
comparison at this point.
> Also, my module provides most of the os.path.* methods as "filters" so
> you can say
>
> dirs = filter(isdir, list_obj_pathobjects)
> fnames = filter(AND(nolink, isfile), list_obj_pathobjects)
>
> in addition to
>
> pathobject.isfile()
> etc.
That's not necessary with list comprehension, since you can just do:
[p for p in list_obj_pathobjects if p.isdir()]
> Recently, i also did some experimentation with "virtual-fs" features so
> that you can transparently access http/ftp/svn files/directories. I even
> got that to work with "<tab>-completion" but that was quite a hack :-)
>
> I am pretty sure that virtual-fs-like-extensibility would be a big
> "selling" point and would motivate the use of such a module and
> finally the inclusion into the stdlib. Of course, the local-fs should
> be the convenient case but it shouldn't be hard to use the same methods
> for accessing remote "repositories".
Yes, virtual filesystems are certainly an important idea here. Almost
makes me wonder if path() should also recognize URLs by default...
probably not, as that isn't always desired, and a URL is going to create
a significantly different object than a mere filesystem path, even
though its interface will be very similar.
Ian
Which the Amiga's version borrows (borrowed) heavily from:
device:dir/dir/dir/file (no filename extensions)
The nifty part was that the <device:> could be the name
of a physical device (such as "DF0:" meaning "floppy drive 0"),
or a user-defined logical device (such as "LIBS:").
The latter was usually called an "assign", and you could let
it point to *multiple* locations in your filesystem *at once*.
Sorry--way off topic here, but the above path syntax
brings back some good memories ;-)
--Irmen de Jong
Interesting, but I think a bad idea. I don't believe Python has been
ported to Tops-20, and I'm not sure if there's a viable VMS port
either. Most filesystems don't have the complexity that the Lisp
pathname encapsulates. If someone was using VMS paths, I would assume
they would subclass path for that OS, adding the portions that applied.
I think it's unreasonable to expect people programming on normal
platforms to pay attention to components like version, so even including
it in a structured manner is asking for trouble.
On some level such filesystems would probably be supportable, though.
You just wouldn't adapt the filesystem's native structure, though
presumably your os module would know how to parse such a path and emit
such a path. But like you can use / instead of \ for filenames on
Windows, I would expect / to work on most other filesystems as well.
Ian
> Interesting, but I think a bad idea. I don't believe Python has been
> ported to Tops-20, and I'm not sure if there's a viable VMS port
> either. Most filesystems don't have the complexity that the Lisp
> pathname encapsulates. If someone was using VMS paths, I would assume
> they would subclass path for that OS, adding the portions that applied.
> I think it's unreasonable to expect people programming on normal
> platforms to pay attention to components like version, so even including
> it in a structured manner is asking for trouble.
There is talk that Windows will have versioning in its next filesystem
(WinFS). It would surprise me if there weren't similar plans on the
Linux side.
--
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 (800) 735-0555
right. Also i prefer my objects to not have a "polluted" namespace.
> > Also, my module provides most of the os.path.* methods as "filters" so
> > you can say
> >
> > dirs = filter(isdir, list_obj_pathobjects)
> > fnames = filter(AND(nolink, isfile), list_obj_pathobjects)
> >
> > in addition to
> >
> > pathobject.isfile()
> > etc.
>
> That's not necessary with list comprehension, since you can just do:
>
> [p for p in list_obj_pathobjects if p.isdir()]
but i use the same idea (filter-functions) for more advanced walkers:
p = path('/music')
for i in p.filterwalk(AND(nolink, isfile, isplayable, match(repattern))):
    play_mp3(i)
where filterwalk is a generator because i don't want the playscript to
first try to gather *all* files for obvious reasons (as would happen with
list comprehension). This has proven to be incredibly useful and easy to
read (if you don't engage in list-comprehension <-> functional-style
wars). Just because Guido somewhat dislikes "functional support" like
lambda, map, filter and friends to be in the __builtin__ module
doesn't mean it's bad :-)
cheers,
holger
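A lazy walker of this kind can be approximated with os.walk, which is itself a generator. The sketch below is a guess at the shape of the idea, not holger's code: filterwalk and match are stand-in functions working on plain strings, and match here applies a regular expression to the file name only.

```python
import os
import re
import tempfile


def filterwalk(root, predicate):
    """Lazily yield paths under root for which predicate(path) is true."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            candidate = os.path.join(dirpath, name)
            if predicate(candidate):
                yield candidate


def match(pattern):
    """Build a predicate testing the file name against a regular expression."""
    compiled = re.compile(pattern)

    def predicate(path):
        return compiled.search(os.path.basename(path)) is not None
    return predicate


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    for name in ("a.mp3", "b.txt"):
        open(os.path.join(root, name), "w").close()
    found = [os.path.basename(p) for p in filterwalk(root, match(r"\.mp3$"))]
print(found)
```

Because filterwalk yields as it goes, a consumer such as a player can start on the first hit without waiting for the whole tree to be scanned.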
Right. Reiserfs plans this and Subversion has it (speaking about URLs here, not
only local paths). But i think the way you specify versions will be
vastly different so the best bet probably is to pass an additional argument to
a path-like object, e.g. for subversion
fn = svnpath('py.py', rev=1050)
or remotely
fn = svnpath('http://.../py.py', rev=7362)
where 'rev' specifies a revision number (which identifies exactly
one state of a subversion-repository). Other than that, a 'svnpath'
could probably behave like a regular local 'path' object, i guess.
Hmmm, the above wouldn't be hard to do because svn has python bindings
on all levels ... but enough advertisement :-)
Either way, i believe that path/file versioning deserves some thoughts
as it might be a "next big thing" (besides java and .NET, of course :-)
cheers,
holger
from filesystems import fs_win32share # might e.g., use or clone samba infrastructure?
puterfs = fs_win32share.mount(r'\\puter\sharename') #(cf. virtual block device steps below)
p = puterfs.path(r'\music')
for i in p.filterwalk(AND(nolink, isfile, isplayable, match(repattern))):
    play_mp3(i)
... i.e., play mp3's stored as windows shared files via LAN.
Taking a cue from os.path, which is posixpath for slackware linux, and ntpath for NT4, perhaps
they could be callable as os.path('/some/path/to/a/dir') to create a path object suitable for
the default file system, referring to the specified directory. A path object could then
have a file method, and the builtin file function might really be the bound method os.path('').file.
It isn't right now, so I write it out below to be clear. A useful binding might be os.file also.
from filesystems import f_as_blockdev
a_drive_vbd = f_as_blockdev.mount(os.path('').file(r'\\.\A:')) # physical NT floppy A
from filesystems import fs_apple
applefdfs = fs_apple.mount(a_drive_vbd)
srcp = applefdfs.path(r'\music')
dstp = os.path('.')
for i in srcp.filterwalk(AND(nolink, isfile, isplayable, match(repattern))):
    dstp.copy(i)
or maybe looking at a CDROM file as an apple floppy image, and copying some files to the
local file system, e.g.,
from filesystems import f_as_blockdev
a_drive_img_vbd = f_as_blockdev.mount(os.path('').file(r'X:\apple\floppies\fdimg.1')) # CDROM X:
from filesystems import fs_apple
applefdfs = fs_apple.mount(a_drive_img_vbd)
srcp = applefdfs.path(r'\music')
dstp = os.path('.') # assumes callable instantiates path object for default local file system
for i in srcp.filterwalk(AND(nolink, isfile, isplayable, match(repattern))):
    dstp.copy(i) # assumes copy method copying to './filename' for filter-surviving filenames.
Etc., etc., (ok, very short songs on floppy ;-)
I.e., path magic would be file-system-appropriate, yet provide a uniform generic interface
with reasonable (but presumably configurable in some file-system-specific ways) defaults.
File systems would be mounted using virtual block devices, unless the mount method for the
file system can reasonably bypass that to synthesize a file system object using non-block/char
device access, e.g., as a local proxy object for a remote file system (or virtual file-system
view, e.g., of an html href/imgref tree or news thread, or database, etc. etc.).
<ramble>
Note that a virtual file system could well have GUI side effects when written to. The frame buffer
device would be interesting to capture access to via a virtual file system module along the lines of
the above. Note also that GUI windows have a hierarchy that could map to a virtual file object hierarchy.
Imagine a virtual svg file system mounted to an x-window instance, so that when you wrote svg source
to it you would get the visual effects in that window. Alternatively, mount on a full-screen virtual
device, etc. Creating virtual sub-"directories" could create child windows...
I see various plotting packages factored into this form as well. And binary mode writes for fast
stuff. For graphic vfs's IWT their mount methods should accept a virtual frame buffer device, so
that something transparent and fast can ultimately talk almost directly to hardware. Maybe it
could be prototyped using tkinter infrastructure, though. Or pygame/sdl.
I've been thinking of prototyping a simple plotting graphic vfs along the above lines. Maybe it's
calcomp driver writing nostalgia (I wrote a rasterizing driver to plot calcomp command streams on
a versatec 'way back). Of course a random access canvas will make it easier (there wasn't space
to brute force a full image and then feed the raster plotter ;-)
</ramble>
Just a couple of thoughts to throw in the idea hopper (this variation HOTTOMH, so very alpha ;-)
Too many irons in the coals (not to say fire ;-/)
Regards,
Bengt Richter
> Interesting, but I think a bad idea. (...) If someone was using VMS
> paths, I would assume they would subclass path for that OS, adding the
> portions that applied.
It would be pointless to include _data structures_ for components that
are not supported on any system Python is ported to, but for subclassing
to make sense, some of the _interface_ would have to be in place. Like
the possibility of using path.joindir() vs. path.joinfile() or something
depending on whether the result should be a file or directory path. And
just path.join() for people who don't care. Assuming there will be a
join method, of course.
Also, you may need some special handling of 'device:' on Windows.
> I think it's unreasonable to expect people programming on normal
> platforms to pay attention to components like version, so even
> including it in a structured manner is asking for trouble.
I dunno. People have already mentioned coming systems where versions
will be available.
--
Hallvard
Since *no one* will ever use joindir or joinfile, why would it be
helpful? Modern systems just don't make that distinction, and people
aren't going to make that distinction in their code.
> Also, you may need some special handling of 'device:' on Windows.
Yes, and the network portion as well (\\server\...). However, it would
still be handled textually. I.e., path(r'\\server') would happen to
create this network path, and path(r'\something_else') wouldn't. The
Windows implementation of path would presumably have an attribute to get
at "server" (.unc or something), while you'd get an AttributeError on
Posix systems.
> > I think it's unreasonable to expect people programming on normal
> > platforms to pay attention to components like version, so even
> > including it in a structured manner is asking for trouble.
>
> I dunno. People have already mentioned coming systems where versions
> will be available.
But we have no idea what it will look like, or how it may be represented
in a filename (if at all!) -- so implementing something based on that
would be a little optimistic. You're likely to create an interface that
won't make sense. Better to leave it unspecified until there's an
actual system you want to support, at which point the interface will
seem much clearer. Predictive design is a very bad idea.
Ian
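The purely textual handling Ian describes is roughly what the standard library's ntpath.splitdrive does in current Python: it recognises both drive letters and the \\server\share prefix without touching the network. A hypothetical .unc attribute could be built on top of it. The example paths are made up.

```python
import ntpath

examples = (
    r"\\server\share\music\song.mp3",   # UNC path: server and share form the "drive"
    r"\something_else",                 # rooted path without drive or server
    r"C:\music",                        # classic drive letter
)

for raw in examples:
    drive, rest = ntpath.splitdrive(raw)
    print(repr(raw), "->", repr(drive), repr(rest))
```

Since ntpath is importable everywhere, this runs the same on Posix systems; it is only the meaning of the result that is Windows-specific.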
Is it available anywhere? It would be nice to be able to try both, for comparison.
Paul.
> sorry, not right now. I'll try to make a release soonish.
>
> holger
>
I cannot access the path.py module from
http://www.jorendorff.com/articles/python/path/
Could someone be so kind as to email it to me?
Thanks in advance!
---
Paolo Invernizzi
paoloinvernizzi at dmsware.com
Florian
--
Florian Schulze
> Try google with "cache:http://..." this worked for me.
Done.
Thanks for the advice.
---
Paolo
Because we disagree about whether or not anyone will use it:-)
>> Also, you may need some special handling of 'device:' on Windows.
>
> Yes, and the network portion as well (\\server\...). However, it would
> still be handled textually.
Fine by me. I wasn't thinking of what the internals would look like at
all.
>>> I think it's unreasonable to expect people programming on normal
>>> platforms to pay attention to components like version, so even
>>> including it in a structured manner is asking for trouble.
>>
>> I dunno. People have already mentioned coming systems where versions
>> will be available.
>
> But we have no idea what it will look like, or how it may be represented
> in a filename (if at all!) -- so implementing something based on that
> would be a little optimistic. You're likely to create an interface that
> won't make sense. Better to leave it unspecified until there's an
> actual system you want to support, at which point the interface will
> seem much clearer. Predictive design is a very bad idea.
Actually I disagree here. The danger of designing to an existing system
is that another system may come along where the versioning doesn't fit
our design. I think it's a good idea to design it - but not necessarily
implement it - before we see how it works on a real system. Then the
real system, when it comes, will either prove that we did well enough, or
that we didn't. In the latter case, it may be better to leave it out anyway.
--
Hallvard
There was some discussion on it here:
http://groups.google.com/groups?th=42ab4db337b60ce3
Just a few comments:
Ian and Holger wondered why 'path' should subclass 'str'. It's because
a path is a string. Benefit: you can pass 'path' objects to functions
that expect strings (like functions in 'win32file'). I find this
really useful in practice.
I agree with Just that 'path' shouldn't override '__iter__()'. I'll
change this eventually.
I think Just is the first to argue that 'path / filename' is confusing.
I find it intuitive. Other people have chosen / for this purpose,
independently: see the O'Reilly book _Python Cookbook_ [1], recipe 4.17,
and the Boost path object [2].
I do believe 'path' should be in the standard library (if not builtin).
I enjoy it and I use it all the time. My perception is that the Python
core dev team doesn't see any particular need for it. If anyone wants me
to, I'll write the PEP.
for f in path('/music').walkfiles('*.mp3'):
    play_mp3(f)
Cheers,
Jason
[1] http://safari.oreilly.com/?xmlid=0-596-00167-3
[2] http://www.boost.org/libs/filesystem/doc/path.htm#operator_slash
IMO you'll almost never use the following string-methods on a 'Path' object:
capitalize center count decode encode
expandtabs find index isalnum isalpha isdigit
islower isspace istitle isupper
ljust lstrip rjust splitlines startswith
swapcase title translate zfill
and so these methods pollute a Path object's name-space quite a bit.
Also 'join', '__contains__', startswith etc. produce some ambiguity.
I think it's convenient enough to use "str(path)" if passing a 'path'
instance as a string somewhere.
cheers,
holger
If the path object has a __str__ method, apparently it should work
without explicit conversion. However, this seems to fail for me on OSX,
where an attempt is made to convert to unicode. Providing a __unicode__
method doesn't help. But then again, I think we'd be fine if we add the
most used path-taking functions to the path object as methods. I can
even see adding some win-specific methods to it.
Just
Just wrote:
> holger krekel <py...@devel.trillke.net> wrote:
> > I think it's convenient enough to use "str(path)" if passing a 'path'
> > instance as a string somewhere.
>
> If the path object has a __str__ method, apparently it should work
> without explicit conversion. However, this seems to fail for me on OSX,
> where an attempt is made to convert to unicode. Providing a __unicode__
> method doesn't help. But then again, I think we'd be fine if we add the
> most used path-taking functions to the path object as methods. I can
> even see adding some win-specific methods to it.
Yes, i think adding platform specific methods to a Path object makes sense.
A friend and I started working on (local and subversion) Path
implementations last week. Currently a Path instance provides
these "path-taking" methods
open
read
write
visit (a recursive walker)
listdir
stat
load/save (unpickle/pickle object)
setmtime (set modification time, uses os.utime)
apart from all the os.path.* stuff like 'exists', 'dirname' etc.
Providing these "path-taking" methods on the Path object is very important
because otherwise you'll have to convert back and forth for using those
os.* and os.path.* or builtin methods (which is evil).
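To make the idea concrete, here is a small sketch of a path object that is not a string but carries the "path-taking" operations as methods. The method names follow the list above where possible; the class itself, joinpath, and the internal _name attribute are hypothetical and are not taken from holger's implementation.

```python
import os
import tempfile


class Path:
    """Hypothetical sketch of a non-string path object with file methods."""

    def __init__(self, name):
        self._name = str(name)

    def __str__(self):
        return self._name

    def joinpath(self, *parts):
        return Path(os.path.join(self._name, *parts))

    def exists(self):
        return os.path.exists(self._name)

    def dirname(self):
        # Returns a Path, not a plain string.
        return Path(os.path.dirname(self._name))

    def open(self, mode="r"):
        return open(self._name, mode)

    def read(self):
        with self.open() as fh:
            return fh.read()

    def write(self, text):
        with self.open("w") as fh:
            fh.write(text)

    def listdir(self):
        return [self.joinpath(n) for n in sorted(os.listdir(self._name))]

    def stat(self):
        return os.stat(self._name)

    def setmtime(self, mtime):
        # Keep the access time, change only the modification time.
        atime = os.stat(self._name).st_atime
        os.utime(self._name, (atime, mtime))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    note = root.joinpath("note.txt")
    note.write("hello")
    content = note.read()
    names = [os.path.basename(str(p)) for p in root.listdir()]
    note.setmtime(1000000000)
    mtime = int(note.stat().st_mtime)
    existed = note.exists()
```

Nothing in the demo needs str(path) except the one call to os.path.basename, which is exactly the kind of conversion the methods are meant to make unnecessary.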
cheers,
holger
holger> capitalize center count decode encode
holger> expandtabs find index isalnum isalpha isdigit
holger> islower isspace istitle isupper
holger> ljust lstrip rjust splitlines startswith
holger> swapcase title translate zfill
In practice, I rarely use the above methods on string objects. The only
exception is startswith. ;-)
Skip
I feel like this would lead to some annoying behavior in some
circumstances. Most particularly, I'm thinking of:
    def dosomething(file):
        if type(file) is type(""):
            file = open(file)
        ...
This isn't uncommon in functions that take pathnames or file objects.
While isinstance(path, str) works, it was not an option until 2.2. So
you'd be forced to do str(pathname) sometimes anyway, to deal with this.
Ideally, interfaces would be changed to use a .open() method on the path
instead of opening the string representation (as Holger's implementation
does), so in the long term it would be nice to abandon direct string
representations entirely. It would also make it more clear when you had
a real path object and when you just had a string.
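The difference between the two checks is easy to demonstrate. In the sketch below, path is a hypothetical str subclass standing in for any string-derived path type, and the two helper functions are named only for illustration.

```python
class path(str):
    """Hypothetical str subclass standing in for a path type."""


def is_string_exact(obj):
    # The older idiom: passes only for objects whose type is exactly str.
    return type(obj) is type("")


def is_string(obj):
    # The isinstance form also accepts subclasses of str.
    return isinstance(obj, str)


p = path("/tmp/data.txt")
exact = is_string_exact(p)
loose = is_string(p)
print(exact, loose)
```

A function written with the exact-type check would treat p as a file object and skip the open() call, which is the annoying behavior described above.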
Ian
I like read and write too -- I do:
    f = open(filename)
    contents = f.read()
    f.close()
All the time (when I'm uninterested in streaming or performance, which
is most of the time I deal with files). Or just open(filename).read()
and let garbage collection fix it up, even if it seems a little messy.
A single method to encapsulate that would be nice, and of course write
gives symmetry. Hmmm... Jason's distinguishes bytes (binary) and text
(which is potentially encoded). I kind of like that distinction.
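As a sketch of what such one-call helpers might look like, with the bytes/text split made explicit: the function names and the utf-8 default below are assumptions for illustration, not the names used in Jason's module, and the code is written for Python 3.

```python
import os
import tempfile


def read_bytes(filename):
    """Return the raw contents of the file, undecoded."""
    with open(filename, "rb") as fh:
        return fh.read()


def read_text(filename, encoding="utf-8"):
    """Return the contents decoded with the given encoding."""
    with open(filename, "r", encoding=encoding) as fh:
        return fh.read()


def write_text(filename, text, encoding="utf-8"):
    """Encode text and write it, replacing any existing contents."""
    with open(filename, "w", encoding=encoding) as fh:
        fh.write(text)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "menu.txt")
    write_text(target, "caf\u00e9")
    raw = read_bytes(target)
    text = read_text(target)

print(len(raw), len(text))
```

The same file yields five bytes but four characters, which is the distinction that is easy to forget when everything goes through a single read().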
Jason had walkers both for all files, just non-directory files, and
directory files. This seems useful to me, and by making it explicit I
might just start distinguishing text from binary (which I don't now
because I am forgetful). And a globbing walker, though I don't know how
much of an advantage that would be over list comprehension. Actually,
all his walkers have a globbing option.
> apart from all the os.path.* stuff like 'exists', 'dirname' etc.
> Providing these "path-taking" methods on the Path object is very important
> because otherwise you'll have to convert back and forth for using those
> os.* and os.path.* or builtin methods (which is evil).
dirname is a good name, since it should return a path object, not a
"name" (which to me implies a string). I think Jason's module uses a
parent attribute, though it also supports dirname(), and a name
attribute instead of basename() (though that does not return a path
object). And things like dirname make less sense in some non-path
situations, like a URL. Probably not too much renaming should occur,
but at least a little may be appropriate.
Ian
We currently only have one 'visit' method that accepts a filter for returning
results and a filter for recursing into the tree. You can use and
combine multiple filters like so:
    root = Path('...')
    for path in root.visit(AND(isdir, nolink)):
        # iterates over all non-link dirs in the tree (breadth-first)
        ...
or
    for path in root.visit(AND(isfile, endswith('.txt')), nodotfile):
        # iterates over all '*.txt' files but not recursing into ".*"
        ...
and so on. This proved to be flexible and convenient and mostly avoids
the need for multiple walk-methods.
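One way such a visit could be put together is sketched below. It is a guess at the mechanics, not holger's code: it works on plain path strings rather than Path objects, the parameter names fil and rec are invented, and the filters are ordinary callables built on os.path.

```python
import os
import tempfile
from collections import deque


def AND(*filters):
    """Combine filters: the result accepts a path only if all of them do."""
    def combined(p):
        return all(f(p) for f in filters)
    return combined


def isdir(p):
    return os.path.isdir(p)


def isfile(p):
    return os.path.isfile(p)


def nolink(p):
    return not os.path.islink(p)


def endswith(suffix):
    return lambda p: p.endswith(suffix)


def nodotfile(p):
    return not os.path.basename(p).startswith(".")


def visit(root, fil=None, rec=None):
    """Breadth-first walk. Yield entries accepted by fil; descend only
    into non-link directories accepted by rec."""
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for name in sorted(os.listdir(current)):
            p = os.path.join(current, name)
            if fil is None or fil(p):
                yield p
            if os.path.isdir(p) and not os.path.islink(p):
                if rec is None or rec(p):
                    queue.append(p)


with tempfile.TemporaryDirectory() as root:
    for rel in ("a.txt", os.path.join("sub", "b.txt"),
                os.path.join("sub", "d.py"), os.path.join(".hidden", "c.txt")):
        full = os.path.join(root, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "w") as fh:
            fh.write("")
    txt = [os.path.relpath(p, root)
           for p in visit(root, AND(isfile, endswith(".txt")), nodotfile)]
    dirs = [os.path.relpath(p, root)
            for p in visit(root, AND(isdir, nolink))]

print(txt, dirs)
```

Since the filters are plain callables, a lambda or a named function can be passed in place of an AND(...) combination without changing visit.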
cheers,
holger
Yeah... but we know that's not going to get into the standard library.
It requires a big namespace, logic functions (AND, OR, etc.), and it
confuses functions with these filter objects, which are named the same
(and even if the filter objects can be used as functions, it's still
confusing). It's a style that doesn't exist in the standard library,
and it seems unlikely that it would get in here.
The multiple walk methods would only be a shortcut anyway. Again, they
might be difficult in a situation like a URL where directory and file
are intermingled (and maybe ReiserFS 4...?) -- which maybe is okay, a
urlpath object simply wouldn't implement that walker.
Ian
Maybe right. This is not my first priority, anyway, but i also thought that
functional style is just not liked among the builtins.
Anyway, the "filter functions" are indeed just callables which accept
Path objects. You could as well take the unbound method Path.isdir
but this feels ugly and isn't flexible enough.
I don't exactly know what you mean by "big namespace". The filters are
all contained in a 'filter' submodule because they can apply to
multiple Path implementations anyway.
> The multiple walk methods would only be a shortcut anyway. Again, they
> might be difficult in a situation like a URL where directory and file
> are intermingled (and maybe ReiserFS 4...?) -- which maybe is okay, a
> urlpath object simply wouldn't implement that walker.
Yep, URL paths have no notion of directories and files. Thus a general
URL path can't have a 'listdir' method and thus we can't recurse.
You can easily special case it for Apache's "Indexes" view, though :-)
holger
I'm not worried about "namespace pollution", but you're right that
strings and paths are generally used for different things. I also
agree 'join()' is a wart.
> I think it's convenient enough to use "str(path)" if passing a 'path'
> instance as a string somewhere.
Hmmm. If the plan were to convert the whole standard library to accept
path objects for pathnames, I would likely agree. But when you say
"str(p)" is "convenient enough", you're saying I need this rule in my head:
Don't pass path objects to functions that take path arguments.
Pass string objects instead.
This is a type rule. Such a thing has no place in Python.
Furthermore, this rule is counterlogical! I would have to change
"mimetypes.guess_type(mypath)" to "mimetypes.guess_type(str(mypath))".
-- j
Or even better, call the appropriate Path method :-)
> This is a type rule. Such a thing has no place in Python.
Oh, the stdlib has lots of places where it expects certain types in
certain places. Look for e.g. 'isinstance'.
> Furthermore, this rule is counterlogical! I would have to change
> "mimetypes.guess_type(mypath)" to "mimetypes.guess_type(str(mypath))".
I'd just call this a little inconvenient. And i wouldn't mind adding
a guess_type method (which would work even better for URL's or
subversion-urls).
cheers,
holger
> for path in root.visit(AND(isdir, nolink)):
> for path in root.visit(AND(isfile, endswith('.txt')), nodotfile):
I've used the AND trick before, as well as tricks to support "isdir &&
nolink".
Still, as these things get more complicated, it's easier to just do
    for path in root.visit(lambda name: isfile(name) and name.endswith(".txt")):
        ...
-or-
    def myfilter(name):
        return isfile(name) and name.endswith(".txt")
    for path in root.visit(myfilter):
        ...
rather than use a prefix-style function interface.
This doesn't introduce any new programming styles, which makes it
easier to understand.
The exception is if the result builds up some sort of parse tree which
can be further analyzed for performance, which is not the case here.
Andrew
da...@dalkescientific.com
Interesting, I started a project modifying Jason's Path module to work
on subversion trees as well. I didn't get too far before putting the
project on a back-burner so I'm glad to hear someone else is thinking
the same way :)
My extensions to Path included an additional argument to "open" that
included a version number, and a mechanism for retrieving some kind of
"metadata" associated with the file.
I also made another Path module that implements a "poor mans cms" if
subversion/rcs/cvs are not available. It uses hidden files with version
numbers in the filename to emulate a real version control system.
Van Gale
It's interesting that different kinds of filesystems (or
filesystem-like-things) have very different kinds of metadata
available. Like last-modified, last-accessed, inode (identity),
version, title, branch, mimetype, log message, etc. And then there's
information that's not quite metadata... like <link ref> data, or the
volume name, the host, etc.
I feel like a common interface for these different filesystems should
somehow degrade well in terms of metadata, or expedite introspection in
some fashion.
The differences on the client side are probably easier to handle, as
they can be handled by the constructor, which might look different for
different filesystems. Like url('http://whatever', user='bob',
password='secret', proxy='http://myproxy'), or
cvs(pserver='cvs.sourceforge.net', repository='python'). Or should
there be a string-based representation (i.e., URIs)? Of course for
symmetry then __str__ would always return a URI, but for many
circumstances we'd prefer a more concise notation, like a filesystem
path (though most other cases would be acceptable squeezed into URIs).
I'd have placed the version in the object itself, not as an argument to
open. Then you'd want to query for alternate versions, most recent
version -- maybe some version identifier that meant most recent... a
similar situation might be language negotiation with an HTTP file.
Ian
WebDAV does, though, doesn't it? But you can still edit the directory
resource, so it gets overloaded. WebDAV's use of GET is messed up.
And we should specify HTTP, of course, since FTP does have a notion of
directories, and possibly other URL methods would as well.
But this is a digression...
Ian
>On Fri, 2003-07-25 at 20:10, Van Gale wrote:
>> Interesting, I started a project modifying Jason's Path module to work
>> on subversion trees as well. I didn't get too far before putting the
>> project on a back-burner so I'm glad to hear someone else is thinking
>> the same way :)
>>
>> My extensions to Path included an additional argument to "open" that
>> included a version number, and a mechanism for retrieving some kind of
>> "metadata" associated with the file.
>
>It's interesting that different kinds of filesystems (or
>filesystem-like-things) have very different kinds of metadata
>available. Like last-modified, last-accessed, inode (identity),
>version, title, branch, mimetype, log message, etc. And then there's
>information that's not quite metadata... like <link ref> data, or the
>volume name, the host, etc.
>
>I feel like a common interface for these different filesystems should
>somehow degrade well in terms of metadata, or expedite introspection in
>some fashion.
>
IMO a mounted file system per se should be represented by an object, and then
that object should have the methods to deliver generic or file-system-specific
file and path and walking objects etc.
After all, even NT can see DOS partitions, as well as NTFS and raw floppy
and HD images of potentially foreign formats. And my slackware linux sees
one DOS partition that can be booted on its own, but can also be read from
slackware via a mount.
Cf. another post in this thread (which didn't get any response ;-)
Regards,
Bengt Richter
It's even working although i am not sure we stay with the
subversion-python bindings as they are fragile and incomplete at places.
We might switch to using the commandline "svn" utility for the time being.
> My extensions to Path included an additional argument to "open" that
> included a version number, and a mechanism for retrieving some kind of
> "metadata" associated with the file.
We instantiate the Path like so
path = SvnPath('http://codespeak.net/svn/vpath/trunk/dist', rev=X)
where X is either -1 (default) meaning it should grab the latest
revision or some positive revision number. When you 'visit' or
'listdir' or 'open' on that 'path' you stay in the same revision
and thus get a consistent view. This is obviously a nice property.
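The "stay in the same revision" property can be illustrated without any subversion access. The class below is a hypothetical stand-in: RevPath, its join method, and the example URL are all invented for this sketch and say nothing about how SvnPath is really implemented.

```python
class RevPath:
    """Hypothetical revision-pinned path; it does not talk to any server."""

    def __init__(self, url, rev=-1):
        self.url = url.rstrip("/")
        self.rev = rev  # -1 stands for "latest", as in the description above

    def join(self, name):
        # A child path inherits the revision of the path it was derived from.
        return RevPath(self.url + "/" + name, rev=self.rev)

    def __repr__(self):
        return "RevPath(%r, rev=%r)" % (self.url, self.rev)


root = RevPath("http://example.invalid/svn/project/trunk", rev=42)
child = root.join("dist").join("setup.py")
print(child)
```

Every object reached from root carries rev=42, so a walk over the tree cannot mix revisions. With rev=-1 a real implementation would presumably have to resolve "latest" to a concrete number once, at construction time, to give the same guarantee.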
Btw, via the above URL you'll get our current implementation with
lots of unittests. You currently need subversion-python-bindings
which are not exactly easy to get going unless you already have a
server-side install.
> I also made another Path module that implements a "poor mans cms" if
> subversion/rcs/cvs are not available. It uses hidden files with version
> numbers in the filename to emulate a real version control system.
I thought about this too. Right now we just want to make it easy and
complete enough.
cheers,
holger
I am not very familiar with the low-level details of WebDAV but i think
determining if something is a directory is done by a PROPFIND command.
cheers,
holger
Ian Bicking wrote:
> ...
> I feel like a common interface for these different filesystems should
> somehow degrade well in terms of metadata, or expedite introspection in
> some fashion.
I wouldn't try to play too many tricks with meta-data.
> The differences on the client side are probably easier to handle, as
> they can be handled by the constructor, which might look different for
> different filesystems. Like url('http://whatever', user='bob',
> password='secret', proxy='http://myproxy'), or
> cvs(pserver='cvs.sourceforge.net', repository='python'). Or should
> there be a string-based representation (i.e., URIs)?
String-based representations are often not specified. e.g.
subversion/webdav/deltav don't define a URL format to get
to a certain revision. The other approach (keyword-args) is
a more generic and better way IMO.
cheers,
holger
> Jason Orendorff wrote:
[...about passing path objects to library methods that expect a string...]
> > This is a type rule. Such a thing has no place in Python.
>
> Oh, the stdlib has lots of places where it expects certain types in
> certain places. Look for e.g. 'isinstance'.
It's not even a strict type rule. It's just that a path object
wouldn't implement the string interface. I don't know why that would
have 'no place in Python', or be 'counterlogical'.
John