[Python-Dev] PEP 428 - pathlib API questions

Ben Hoyt

Nov 24, 2013, 5:00:09 PM
to Python-Dev
PEP 428 looks nice. Thanks, Antoine!

I have a couple of questions about the module name and API. I think
I've read through most of the previous discussion, but may have missed
some, so please point me to the right place if there have already been
discussions about these things.

1) Someone on reddit.com/r/Python asked "Is the import going to be
'pathlib'? I thought the renaming going on of std lib things with the
transition to Python 3 sought to remove the spurious usage of
appending 'lib' to libs?" I wondered about this too. Has this been
discussed/answered?

2) I think the operation of "suffix" and "suffixes" is good, but not
so much the name. I saw Ben Finney's original suggestion about
multiple extensions etc
(https://mail.python.org/pipermail/python-ideas/2012-October/016437.html).

However, it seems there was no further discussion about why not
"extension" and "extensions"? I have never heard a filename extension
being called a "suffix". I know it is a suffix in the sense of the
English word, but I've never heard it called that in this context, and
I think context is important. Put another way, "extension" is obvious
and guessable, "suffix" isn't.

3) Obviously pathlib isn't going in the stdlib in Python 2.x, but I'm
wondering about writing portable code when you want the string version
of the path. In Python 3.x you'll call str(path_obj), but in Python
2.x that will fail if the path has unicode chars in it, and you'll
need to use unicode(path_obj), which of course doesn't work in 3.x. Is
this just a fact of life, or would .str() or .as_string() help for
2.x/3.x portability?

4) Is path_obj.glob() recursive? In the PEP it looks like it is if the
pattern starts with '**', but in the pep428 branch of the code there
are both glob() and rglob() functions. I've never seen the ** syntax
before (though admittedly I'm a Windows dev), and much prefer the
explicitness of having two functions, or maybe even better,
path_obj.glob('*.py', recursive=True).

Seems much more Pythonic to provide an actual argument (or different
function) for this change in behaviour, rather than stuffing the
"recursive flag" inside the pattern string.

Has this ship already sailed with http://bugs.python.org/issue13968?
Which I think should also be rglob(pattern) or glob(pattern,
recursive=True). Of course, if this ship has already sailed, it's
definitely better for pathlib's glob to match glob.glob.

Thanks,
Ben
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Antoine Pitrou

Nov 24, 2013, 5:29:21 PM
to pytho...@python.org

Hello,

On Mon, 25 Nov 2013 11:00:09 +1300
Ben Hoyt <ben...@gmail.com> wrote:
>
> 1) Someone on reddit.com/r/Python asked "Is the import going to be
> 'pathlib'? I thought the renaming going on of std lib things with the
> transition to Python 3 sought to remove the spurious usage of
> appending 'lib' to libs?" I wondered about this too. Has this been
> discussed/answered?

Well, "path" is much too common already, and it's an obvious variable
name for a filesystem path, so "pathlib" is better to avoid name
clashes.

> 2) I think the operation of "suffix" and "suffixes" is good, but not
> so much the name. I saw Ben Finney's original suggestion about
> multiple extensions etc
> (https://mail.python.org/pipermail/python-ideas/2012-October/016437.html).
>
> However, it seems there was no further discussion about why not
> "extension" and "extensions"? I have never heard a filename extension
> being called a "suffix". I know it is a suffix in the sense of the
> English word, but I've never heard it called that in this context, and
> I think context is important. Put another way, "extension" is obvious
> and guessable, "suffix" isn't.

Well, perhaps :-), but nobody opposed suffix and suffixes at the time.
Note the API is provisional, so we can still make it change, but
obviously the barrier for changes is higher now that the PEP is
accepted and the beta has been cut.

> 3) Obviously pathlib isn't going in the stdlib in Python 2.x, but I'm
> wondering about writing portable code when you want the string version
> of the path. In Python 3.x you'll call str(path_obj), but in Python
> 2.x that will fail if the path has unicode chars in it, and you'll
> need to use unicode(path_obj), which of course doesn't work in 3.x.

The behaviour of unicode paths in Python 2 is erratic
(system-dependent). pathlib can't really fix it: Python 2 doesn't know
about a well-defined filesystem encoding.
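
As a small illustration of the Python 3 side of this question (a sketch only; the file name below is made up for the example), str() on a path object always gives a text string, and the interpreter exposes the filesystem encoding it uses when talking to the OS:

```python
import sys
from pathlib import PurePosixPath

# Hypothetical path containing a non-ASCII character.
p = PurePosixPath("docs") / "résumé.txt"

# On Python 3, str() of a path is always a text (unicode) string.
text = str(p)
print(isinstance(text, str), text == "docs/résumé.txt")

# The encoding Python 3 uses to convert file names to and from bytes.
print(sys.getfilesystemencoding())
```

The printed encoding depends on the platform and locale, so no particular value is assumed here.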

> 4) Is path_obj.glob() recursive?

This is documented:
http://docs.python.org/dev/library/pathlib.html#pathlib.Path.glob
http://docs.python.org/dev/library/pathlib.html#pathlib.Path.rglob

> Seems much more Pythonic to provide an actual argument (or different
> function) for this change in behaviour, rather than stuffing the
> "recursive flag" inside the pattern string.

It's not a flag, it's a different wildcard. This allows e.g. a library
function to call glob() and users to pass a recursive or non-recursive
pattern as they wish.
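
A minimal sketch of that point, assuming a hypothetical helper called find_files() (not part of pathlib) and a throwaway directory tree: the helper simply forwards the pattern to glob(), and the caller decides whether the search is recursive by choosing the wildcard.

```python
import tempfile
from pathlib import Path


def find_files(root, pattern):
    """Hypothetical library helper: forwards the caller's pattern to glob()."""
    return sorted(p.relative_to(root).as_posix() for p in Path(root).glob(pattern))


# Build a small throwaway tree: a.py, pkg/b.py, pkg/sub/c.py
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg" / "sub").mkdir(parents=True)
    for name in ("a.py", "pkg/b.py", "pkg/sub/c.py"):
        (root / name).touch()

    # The caller picks the wildcard; the helper needs no "recursive" flag.
    top_level = find_files(root, "*.py")
    recursive = find_files(root, "**/*.py")

print(top_level)  # ['a.py']
print(recursive)  # ['a.py', 'pkg/b.py', 'pkg/sub/c.py']
```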

> Has this ship already sailed with http://bugs.python.org/issue13968?

This issue is still open, so no :-)

Regards

Antoine.

Ben Hoyt

Nov 24, 2013, 6:03:38 PM
to Antoine Pitrou, Python-Dev
> Well, "path" is much too common already, and it's an obvious variable
> name for a filesystem path, so "pathlib" is better to avoid name
> clashes.

Yep, that makes total sense, thanks.

>> However, it seems there was no further discussion about why not
>> "extension" and "extensions"? I have never heard a filename extension
>> being called a "suffix". I know it is a suffix in the sense of the
>> English word, but I've never heard it called that in this context, and
>> I think context is important. Put another way, "extension" is obvious
>> and guessable, "suffix" isn't.
>
> Well, perhaps :-), but nobody opposed suffix and suffixes at the time.
> Note the API is provisional, so we can still make it change, but
> obviously the barrier for changes is higher now that the PEP is
> accepted and the beta has been cut.

Okay. I won't push hard :-) as "suffix" isn't terrible, but has anyone
else never (or rarely) heard the term "suffix" applied to filename
extensions?

>> 3) Obviously pathlib isn't going in the stdlib in Python 2.x, but I'm
>> wondering about writing portable code when you want the string version
>> of the path. In Python 3.x you'll call str(path_obj), but in Python
>> 2.x that will fail if the path has unicode chars in it, and you'll
>> need to use unicode(path_obj), which of course doesn't work in 3.x.
>
> The behaviour of unicode paths in Python 2 is erratic
> (system-dependent). pathlib can't really fix it: Python 2 doesn't know
> about a well-defined filesystem encoding.

Fair enough.

>> 4) Is path_obj.glob() recursive?
>
> This is documented:
> http://docs.python.org/dev/library/pathlib.html#pathlib.Path.glob
> http://docs.python.org/dev/library/pathlib.html#pathlib.Path.rglob
>
>> Seems much more Pythonic to provide an actual argument (or different
>> function) for this change in behaviour, rather than stuffing the
>> "recursive flag" inside the pattern string.
>
> It's not a flag, it's a different wildcard. This allows e.g. a library
> function to call glob() and users to pass a recursive or non-recursive
> pattern as they wish.

Okay, just saw those docs now -- thanks. Fair enough re "it's a
different wildcard". At the least I don't think there should be two
ways to do it -- in other words, either rglob() or glob('**'); having
both seems very un-PEP 20 to me. My preference is rglob(), but
glob(recursive=True) would be fine too.

>> Has this ship already sailed with http://bugs.python.org/issue13968?
>
> This issue is still open, so no :-)

Same goes for this issue -- there should be OOWTDI, and my preference
is rglob() or glob(recursive=True). But maybe issue 13968's behaviour
can be determined by pathlib's now that pathlib is the one getting
done first.
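
For readers on later Python versions, a sketch of how this looks in the glob module: glob.glob() accepts a recursive keyword from Python 3.5 on, and '**' only spans directories when recursive=True is passed. The directory tree below is invented for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Throwaway tree: a.py, pkg/b.py, pkg/sub/c.py
    os.makedirs(os.path.join(tmp, "pkg", "sub"))
    for parts in (("a.py",), ("pkg", "b.py"), ("pkg", "sub", "c.py")):
        open(os.path.join(tmp, *parts), "w").close()

    def rel(paths):
        # Normalise to forward slashes so the output is the same on Windows.
        return sorted(os.path.relpath(p, tmp).replace(os.sep, "/") for p in paths)

    pattern = os.path.join(tmp, "**", "*.py")
    # Without recursive=True, '**' behaves like a plain '*'.
    flat = rel(glob.glob(pattern))
    # With recursive=True, '**' matches zero or more directory levels.
    deep = rel(glob.glob(pattern, recursive=True))

print(flat)  # ['pkg/b.py']
print(deep)  # ['a.py', 'pkg/b.py', 'pkg/sub/c.py']
```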

Thanks,
Ben.

Greg Ewing

Nov 24, 2013, 6:06:59 PM
to Python-Dev
Ben Hoyt wrote:
> However, it seems there was no further discussion about why not
> "extension" and "extensions"? I have never heard a filename extension
> being called a "suffix".

You can't have read many unix man pages, then! I just
searched for "suffix" in the gcc man page, and found
this:

For any given input file, the file name suffix determines what kind of
compilation is done:

> I know it is a suffix in the sense of the
> English word, but I've never heard it called that in this context, and
> I think context is important.

This probably depends on your background. In my experience,
the term "extension" arose in OSes where it was a formal
part of the filename syntax, often highly constrained.
E.g. RT11, CP/M, early MS-DOS.

Unix has never had a formal notion of extensions like that,
only informal conventions, and has called them suffixes at
least some of the time for as long as I can remember.
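
For reference, a short sketch of what the two attributes under discussion return, using pure paths so nothing touches the filesystem (the file names are just examples):

```python
from pathlib import PurePosixPath

p = PurePosixPath("backups/archive.tar.gz")

print(p.suffix)    # .gz  (only the last suffix)
print(p.suffixes)  # ['.tar', '.gz']
print(p.stem)      # archive.tar  (name without the last suffix)

# A name without a dot has an empty suffix and no suffixes.
bare = PurePosixPath("README")
print(repr(bare.suffix), bare.suffixes)  # '' []
```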

> 4) Is path_obj.glob() recursive? In the PEP it looks like it is if the
> pattern starts with '**',

I don't think it has to *start* with **. Rather, the ** is
a pattern that can span directory separators. It's not a
flag that applies to the whole thing -- a pattern could have
a * in one place and a ** in another.
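
A small sketch of that, with an invented directory layout: the '**' can sit in the middle of a pattern, and it can be combined with a single-level '*' in the same pattern.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src" / "app" / "deep").mkdir(parents=True)
    for name in (
        "src/test_top.py",
        "src/app/test_x.py",
        "src/app/deep/test_y.py",
        "src/app/deep/helper.py",
    ):
        (root / name).touch()

    def rel(paths):
        return sorted(p.relative_to(root).as_posix() for p in paths)

    # '**' in the middle: any number of directories below src, including none.
    anywhere_under_src = rel(root.glob("src/**/test_*.py"))
    # '*' and '**' together: exactly one directory level, then any depth.
    below_first_level = rel(root.glob("src/*/**/test_*.py"))

print(anywhere_under_src)
# ['src/app/deep/test_y.py', 'src/app/test_x.py', 'src/test_top.py']
print(below_first_level)
# ['src/app/deep/test_y.py', 'src/app/test_x.py']
```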

--
Greg

Ben Hoyt

Nov 24, 2013, 6:12:51 PM
to Greg Ewing, Python-Dev
>> However, it seems there was no further discussion about why not
>> "extension" and "extensions"? I have never heard a filename extension
>> being called a "suffix".
>
>
> You can't have read many unix man pages, then!

Huh, no I haven't! Certainly not regularly, as I'm almost exclusively
a Windows user. :-)

> This probably depends on your background. In my experience,
> the term "extension" arose in OSes where it was a formal
> part of the filename syntax, often highly constrained.
> E.g. RT11, CP/M, early MS-DOS.
>
> Unix has never had a formal notion of extensions like that,
> only informal conventions, and has called them suffixes at
> least some of the time for as long as I can remember.

Yes, seems like it definitely is background-dependent. I'm
Windows-centric. I stand corrected, and recant my position on
"suffix". :-)

>> 4) Is path_obj.glob() recursive? In the PEP it looks like it is if the
>> pattern starts with '**',
>
>
> I don't think it has to *start* with **. Rather, the ** is
> a pattern that can span directory separators. It's not a
> flag that applies to the whole thing -- a pattern could have
> a * in one place and a ** in another.

Oh okay, that makes more sense. It definitely needs more thorough
documentation in that case. I would still prefer the simpler and more
explicit rglob() / recursive=True rather than new pattern syntax, but
I don't feel as strongly anymore.

-Ben

Nick Coghlan

Nov 24, 2013, 6:35:29 PM
to Ben Hoyt, pytho...@python.org


On 25 Nov 2013 09:14, "Ben Hoyt" <ben...@gmail.com> wrote:
>
> >> 4) Is path_obj.glob() recursive? In the PEP it looks like it is if the
> >> pattern starts with '**',
> >
> >
> > I don't think it has to *start* with **. Rather, the ** is
> > a pattern that can span directory separators. It's not a
> > flag that applies to the whole thing -- a pattern could have
> > a * in one place and a ** in another.
>
> Oh okay, that makes more sense. It definitely needs more thorough
> documentation in that case. I would still prefer the simpler and more
> explicit rglob() / recursive=True rather than new pattern syntax, but
> I don't feel as strongly anymore.

Using "**" for directory spanning globs is also another case of us borrowing a reasonably common idiom from *nix systems that may not be familiar to Windows users.

Cheers,
Nick.

>
> -Ben

Ben Hoyt

Nov 24, 2013, 6:42:08 PM
to Nick Coghlan, Python-Dev
> Using "**" for directory spanning globs is also another case of us borrowing
> a reasonably common idiom from *nix systems that may not be familiar to
> Windows users.

Okay, *nix wins then. :-) Python's stdlib is already fairly
*nix-oriented (even when it's being cross-platform), so I guess it's
not a big deal.

My only remaining concern then is that there shouldn't be more than
one way to do recursive globbing in a new API like this. Why does
rglob() exist when the documentation simply says "like calling glob()
but with '**' added in front of the pattern"?

http://docs.python.org/dev/library/pathlib.html#pathlib.Path.rglob

-Ben

Nick Coghlan

Nov 24, 2013, 10:15:57 PM
to Ben Hoyt, pytho...@python.org


On 25 Nov 2013 09:42, "Ben Hoyt" <ben...@gmail.com> wrote:
>
> > Using "**" for directory spanning globs is also another case of us borrowing
> > a reasonably common idiom from *nix systems that may not be familiar to
> > Windows users.
>
> Okay, *nix wins then. :-) Python's stdlib is already fairly
> *nix-oriented (even when it's being cross-platform), so I guess it's
> not a big deal.
>
> My only remaining concern then is that there shouldn't be more than
> one way to do recursive globbing in a new API like this. Why does
> rglob() exist when the documentation simply says "like calling glob()
> but with '**' added in front of the pattern"?
>
> http://docs.python.org/dev/library/pathlib.html#pathlib.Path.rglob

Because it's a layered API - embedding ** in the pattern is a strictly more powerful interface, but can be a little tricky to get your head around (especially if you don't use a shell that has the feature). rglob() is simpler, but not as flexible.
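
A minimal sketch of the two layers, on an invented directory tree: rglob(pattern) gives the same results as glob() with '**/' in front of the pattern, while the embedded form can also express searches that a single rglob() call from the same root cannot.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg" / "sub").mkdir(parents=True)
    for name in ("a.py", "pkg/b.py", "pkg/sub/c.py", "pkg/notes.txt"):
        (root / name).touch()

    def rel(paths):
        return sorted(p.relative_to(root).as_posix() for p in paths)

    # Simple layer and its spelled-out equivalent.
    simple = rel(root.rglob("*.py"))
    spelled_out = rel(root.glob("**/*.py"))
    # More flexible layer: skip the top level, then recurse.
    below_top = rel(root.glob("*/**/*.py"))

print(simple == spelled_out)  # True
print(simple)                 # ['a.py', 'pkg/b.py', 'pkg/sub/c.py']
print(below_top)              # ['pkg/b.py', 'pkg/sub/c.py']
```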

We offer that kind of multi-level API fairly often. For example, subprocess.call() and friends are simpler interfaces for particular ways of using the powerful-but-complex subprocess.Popen API. The metaprogramming stack (functions, classes, decorators, descriptors, metaclasses) similarly offers the ability to trade increased complexity for increases in power and flexibility.

In these cases, the "obvious way" is to use the simplest API that covers the use case, and only reach for the more complex API when you genuinely need it.
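
The subprocess comparison can be sketched the same way. The child command here is just the current interpreter printing a word, chosen so the example runs anywhere:

```python
import subprocess
import sys

cmd = [sys.executable, "-c", "print('hello')"]

# Simple layer: run the command, wait for it, and get the exit status.
status = subprocess.call(cmd, stdout=subprocess.DEVNULL)

# Lower layer: Popen hands back the process object, so the caller can
# wire up pipes and decide when and how to read from them.
with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
    out, _ = proc.communicate()

print(status, proc.returncode, out.decode().strip())  # 0 0 hello
```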

Cheers,
Nick.

>
> -Ben

Serhiy Storchaka

Nov 25, 2013, 2:51:02 AM
to pytho...@python.org
25.11.13 01:35, Nick Coghlan wrote:

> Using "**" for directory spanning globs is also another case of us
> borrowing a reasonably common idiom from *nix systems that may not be
> familiar to Windows users.

Rather from the Java world.



Charles-François Natali

Nov 25, 2013, 3:12:37 AM
to Greg Ewing, Python-Dev
2013/11/25 Greg Ewing <greg....@canterbury.ac.nz>:
> Ben Hoyt wrote:
>>
>> However, it seems there was no further discussion about why not
>> "extension" and "extensions"? I have never heard a filename extension
>> being called a "suffix".
>
>
> You can't have read many unix man pages, then! I just
> searched for "suffix" in the gcc man page, and found
> this:
>
> For any given input file, the file name suffix determines what kind of
> compilation is done:
>
>
>> I know it is a suffix in the sense of the
>> English word, but I've never heard it called that in this context, and
>> I think context is important.
>
>
> This probably depends on your background. In my experience,
> the term "extension" arose in OSes where it was a formal
> part of the filename syntax, often highly constrained.
> E.g. RT11, CP/M, early MS-DOS.
>
> Unix has never had a formal notion of extensions like that,
> only informal conventions, and has called them suffixes at
> least some of the time for as long as I can remember.

Indeed.
Just for reference, here's an extract of the POSIX basename(1) man page [1]:
"""
SYNOPSIS

basename string [suffix]

DESCRIPTION

The string operand shall be treated as a pathname, as defined in XBD
Pathname. The string string shall be converted to the filename
corresponding to the last pathname component in string and then the
suffix string suffix, if present, shall be removed.
"""

[1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/basename.html
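
To connect the quoted description to Python, here is a rough sketch of those steps as a function. It is an illustration only, not a complete implementation of the utility, and the function name and sample paths are made up for the example. The last line shows pathlib's stem for comparison.

```python
from pathlib import PurePosixPath


def basename(string, suffix=""):
    """Rough sketch of the basename(1) steps quoted above (not a full implementation)."""
    if string and not string.strip("/"):
        return "/"  # a string made only of slashes
    # Drop trailing slashes, then keep the last pathname component.
    name = string.rstrip("/").rpartition("/")[2]
    # The suffix is removed only when it is not the whole remaining name.
    if suffix and name != suffix and name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


print(basename("/usr/lib/libfoo.so", ".so"))     # libfoo
print(basename("/home/ben/"))                    # ben
print(basename(".so", ".so"))                    # .so
print(PurePosixPath("/usr/lib/libfoo.so").stem)  # libfoo
```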


cf