[Python-Dev] pathlib handling of trailing slash (Issue #21039)

428 views
Skip to first unread message

Isaac Schwabacher

unread,
Aug 6, 2014, 7:46:59 PM8/6/14
to pytho...@python.org
pathlib.Path currently strips trailing slashes from pathnames, but this behavior contradicts POSIX (http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_12), which specifies that the resolution of the pathname of a symbolic link to a directory in the context of a function that operates on symbolic links shall depend on whether the pathname has a trailing slash:

> 4.12 Pathname Resolution
> ========================
>
> [...]
>
> A pathname that contains at least one non- <slash> character and that ends with one or more trailing <slash> characters shall not be resolved successfully unless the last pathname component before the trailing <slash> characters names an existing directory or a directory entry that is to be created for a directory immediately after the pathname is resolved. Interfaces using pathname resolution may specify additional constraints[1] when a pathname that does not name an existing directory contains at least one non- <slash> character and contains one or more trailing <slash> characters.
>
> If a symbolic link is encountered during pathname resolution, the behavior shall depend on whether the pathname component is at the end of the pathname and on the function being performed. If all of the following are true, then pathname resolution is complete:
>
> 1. This is the last pathname component of the pathname.
> 2. The pathname has no trailing <slash>.
> 3. The function is required to act on the symbolic link itself, or certain arguments direct that the function act on the symbolic link itself.
>
> In all other cases, the system shall prefix the remaining pathname, if any, with the contents of the symbolic link. [...]

The following sentence appeared in an earlier version of POSIX (http://pubs.opengroup.org/onlinepubs/009604499/basedefs/xbd_chap04.html#tag_04_11) but has since been removed:

> A pathname that contains at least one non-slash character and that ends with one or more trailing slashes shall be resolved as if a single dot character ( '.' ) were appended to the pathname.

Is this important enough to preserve trailing slashes?

- Isaac Schwabacher
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Antoine Pitrou

unread,
Aug 6, 2014, 8:13:27 PM8/6/14
to pytho...@python.org

Le 06/08/2014 18:36, Isaac Schwabacher a écrit :
>>
>> If a symbolic link is encountered during pathname resolution, the
>> behavior shall depend on whether the pathname component is at the
>> end of the pathname and on the function being performed. If all of
>> the following are true, then pathname resolution is complete:
>>
>> 1. This is the last pathname component of the pathname. 2. The
>> pathname has no trailing <slash>. 3. The function is required to
>> act on the symbolic link itself, or certain arguments direct that
>> the function act on the symbolic link itself.
>>
>> In all other cases, the system shall prefix the remaining pathname,
>> if any, with the contents of the symbolic link. [...]

So the only case where this would make a difference is when calling a
"function acting on the symbolic link itself" (such as lstat() or
unlink()) on a path with a trailing slash:

>>> os.lstat('foo')
os.stat_result(st_mode=41471, st_ino=1981954, st_dev=2050, st_nlink=1,
st_uid=1000, st_gid=1000, st_size=4, st_atime=1407370025,
st_mtime=1407370025, st_ctime=1407370025)
>>> os.lstat('foo/')
os.stat_result(st_mode=17407, st_ino=917505, st_dev=2050, st_nlink=7,
st_uid=0, st_gid=0, st_size=4096, st_atime=1407367916,
st_mtime=1407369857, st_ctime=1407369857)

>>> pathlib.Path('foo').lstat()
os.stat_result(st_mode=41471, st_ino=1981954, st_dev=2050, st_nlink=1,
st_uid=1000, st_gid=1000, st_size=4, st_atime=1407370037,
st_mtime=1407370025, st_ctime=1407370025)
>>> pathlib.Path('foo/').lstat()
os.stat_result(st_mode=41471, st_ino=1981954, st_dev=2050, st_nlink=1,
st_uid=1000, st_gid=1000, st_size=4, st_atime=1407370037,
st_mtime=1407370025, st_ctime=1407370025)

But you can also call resolve() explicitly if you want to act on the
link target rather than the link itself:

>>> pathlib.Path('foo/').resolve().lstat()
os.stat_result(st_mode=17407, st_ino=917505, st_dev=2050, st_nlink=7,
st_uid=0, st_gid=0, st_size=4096, st_atime=1407367916,
st_mtime=1407369857, st_ctime=1407369857)

Am I overlooking other cases?

Regards

Antoine.

Alexander Belopolsky

unread,
Aug 6, 2014, 8:51:10 PM8/6/14
to Antoine Pitrou, Python-Dev

On Wed, Aug 6, 2014 at 8:11 PM, Antoine Pitrou <ant...@python.org> wrote:
Am I overlooking other cases?

There are many interfaces where trailing slash is significant.  For example, rsync uses trailing slash on the target directory to avoid creating an additional directory level at the destination.  Loosing it when passing path strings through pathlib.Path() may be a source of bugs.

Antoine Pitrou

unread,
Aug 6, 2014, 9:57:11 PM8/6/14
to pytho...@python.org

pathlib is generally concerned with filesystem operations written in
Python, not arbitrary third-party tools. Also it is probably easy to
append the trailing slash in your command-line invocation, if so desired.

Ben Finney

unread,
Aug 6, 2014, 10:13:40 PM8/6/14
to pytho...@python.org
Antoine Pitrou <ant...@python.org> writes:

> Le 06/08/2014 20:50, Alexander Belopolsky a écrit :

> > There are many interfaces where trailing slash is significant. […]


> > Loosing it when passing path strings through pathlib.Path() may be a
> > source of bugs.
>
> pathlib is generally concerned with filesystem operations written in
> Python, not arbitrary third-party tools.

The operating system shell is more than an “arbitrary third-party tool”,
though; it preserves paths, and handles invoking commands.

You seem to be saying that ‘pathlib’ is not intended to be helpful for
constructing a shell command. Will its documentation warn that is so?

> Also it is probably easy to append the trailing slash in your
> command-line invocation, if so desired.

The trouble is that one can desire it, and construct a path knowing that
the presence or absence of a trailing slash has semantic significance;
and then have it unaccountably altered by the pathlib.Path code. This is
worse than preserving the semantic value.

--
\ “But Marge, what if we chose the wrong religion? Each week we |
`\ just make God madder and madder.” —Homer, _The Simpsons_ |
_o__) |
Ben Finney

Antoine Pitrou

unread,
Aug 6, 2014, 10:32:41 PM8/6/14
to pytho...@python.org

Le 06/08/2014 22:12, Ben Finney a écrit :
> You seem to be saying that ‘pathlib’ is not intended to be helpful for
> constructing a shell command.

pathlib lets you do operations on paths. It also gives you a string
representation of the path that's expected to designate that path when
talking to operating system APIs. It doesn't give you the possibility to
store other semantic variations ("whether a new directory level must be
created"); that's up to you to add those.

(similarly, it doesn't have separate classes to represent "a file", "a
directory", "a non-existing file", etc.)

Regards

Antoine.

Guido van Rossum

unread,
Aug 7, 2014, 11:09:18 AM8/7/14
to Antoine Pitrou, Python-Dev
Hm. I personally consider a trailing slash significant. It feels semantically different (and in some cases it is) so I don't think it should be normalized. The behavior of os.path.split() here feels right.


Paul Moore

unread,
Aug 8, 2014, 8:28:23 AM8/8/14
to Antoine Pitrou, Python Dev
On 7 August 2014 02:55, Antoine Pitrou <ant...@python.org> wrote:
> pathlib is generally concerned with filesystem operations written in Python,
> not arbitrary third-party tools. Also it is probably easy to append the
> trailing slash in your command-line invocation, if so desired.

I had a use case where I wanted to allow a config file to contain
"path: foo" to create a file called foo, and "path: foo/" to create a
directory. It was a shortcut for specifying an explicit "directory:
true" parameter as well.

The fact that pathlib stripped the slash made coding this mildly
tricky (especially as I wanted to cater for Windows users writing
"foo\\"...) It's not a showstopper, but I agree that semantically,
being able to distinguish whether an input had a trailing slash is
sometimes useful.

Paul

Alexander Belopolsky

unread,
Aug 8, 2014, 9:40:47 AM8/8/14
to Paul Moore, Antoine Pitrou, Python Dev

On Fri, Aug 8, 2014 at 8:27 AM, Paul Moore <p.f....@gmail.com> wrote:
I had a use case where I wanted to allow a config file to contain
"path: foo" to create a file called foo, and "path: foo/" to create a
directory. It was a shortcut for specifying an explicit "directory:
true" parameter as well.

Here is my use case: I have a database application that can save a table in a variety of formats based on the supplied file name.  For example, save('t.csv', t) saves in CSV text format while save('t', t)  saves in the default binary format.  In addition, it supports "splayed" format where a table is saved in multiple files across a directory - one file per column.  The native database save function chooses this format when file name ends with a slash: save('t/', t).   I would like to make the save() function in Python that works like this, but takes pathlib.Path instances instead of str, but in the current version, I cannot supply 't/' as a Path instance.
Reply all
Reply to author
Forward
0 new messages