[Python-Dev] proposed os.fspath() change

90 views
Skip to first unread message

Ethan Furman

unread,
Jun 15, 2016, 12:13:09 PM6/15/16
to Python Dev
I would like to make a change to os.fspath().

Specifically, os.fspath() currently raises an exception if something
besides str, bytes, or os.PathLike is passed in, but makes no checks
if an os.PathLike object returns something besides a str or bytes.

I would like to change that to the opposite: if a non-os.PathLike is
passed in, return it unchanged (so no change for str and bytes); if
an os.PathLike object returns something that is not a str nor bytes,
raise.

An example of the difference in the lzma file:

Current code (has not been upgraded to use os.fspath() yet)
-----------------------------------------------------------

if isinstance(filename, (str, bytes)):
if "b" not in mode:
mode += "b"
self._fp = builtins.open(filename, mode)
self._closefp = True
self._mode = mode_code
elif hasattr(filename, "read") or hasattr(filename, "write"):
self._fp = filename
self._mode = mode_code
else:
raise TypeError(
"filename must be a str or bytes object, or a file"
)

Code change if using upgraded os.fspath() (placed before above stanza):

filename = os.fspath(filename)

Code change with current os.fspath() (ditto):

if isinstance(filename, os.PathLike):
filename = os.fspath(filename)

My intention with the os.fspath() function was to minimize boiler-plate
code and make PathLike objects easy and painless to support; having to
discover if any given parameter is PathLike before calling os.fspath()
on it is, IMHO, just the opposite.

There is also precedent for having a __dunder__ check the return type:

--> class Huh:
... def __int__(self):
... return 'string'
... def __index__(self):
... return b'bytestring'
... def __bool__(self):
... return 'true-ish'
...
--> h = Huh()

--> int(h)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: __int__ returned non-int (type str)

--> ''[h]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: __index__ returned non-int (type bytes)

--> bool(h)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: __bool__ should return bool, returned str

Arguments in favor or against?

--
~Ethan~
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Guido van Rossum

unread,
Jun 15, 2016, 12:48:50 PM6/15/16
to Ethan Furman, Python Dev
These are really two separate proposals.

I'm okay with checking the return value of calling obj.__fspath__; that's an error in the object anyways, and it doesn't matter much whether we do this or not (though when approving the PEP I considered this and decided not to insert a check for this). But it doesn't affect your example, does it? I guess it's easier to raise now and change the API in the future to avoid raising in this case (if we find that raising is undesirable) than the other way around, so I'm +0 on this.

The other proposal (passing anything that's not understood right through) is more interesting and your use case is somewhat compelling. Catching the exception coming out of os.fspath() would certainly be much messier. The question remaining is whether, when this behavior is not desired (e.g. when the caller of os.fspath() just wants a string that it can pass to open()), the condition of passing that's neither a string not supports __fspath__ still produces an understandable error. I'm not sure that that's the case. E.g. open() accepts file descriptors in addition to paths, but I'm not sure that accepting an integer is a good idea in most cases -- it either gives a mystery "Bad file descriptor" error or starts reading/writing some random system file, which it then closes once the stream is closed.

Brett Cannon

unread,
Jun 15, 2016, 2:02:03 PM6/15/16
to gu...@python.org, Ethan Furman, Python Dev
On Wed, 15 Jun 2016 at 09:48 Guido van Rossum <gu...@python.org> wrote:
These are really two separate proposals.

I'm okay with checking the return value of calling obj.__fspath__; that's an error in the object anyways, and it doesn't matter much whether we do this or not (though when approving the PEP I considered this and decided not to insert a check for this). But it doesn't affect your example, does it? I guess it's easier to raise now and change the API in the future to avoid raising in this case (if we find that raising is undesirable) than the other way around, so I'm +0 on this.

+0 from me as well. I know in some code in the stdlib that has been ported which prior to adding support was explicitly checking for str/bytes this will eliminate its own checking (obviously not a motivating factor as it's pretty minor).
 

The other proposal (passing anything that's not understood right through) is more interesting and your use case is somewhat compelling. Catching the exception coming out of os.fspath() would certainly be much messier. The question remaining is whether, when this behavior is not desired (e.g. when the caller of os.fspath() just wants a string that it can pass to open()), the condition of passing that's neither a string not supports __fspath__ still produces an understandable error. I'm not sure that that's the case. E.g. open() accepts file descriptors in addition to paths, but I'm not sure that accepting an integer is a good idea in most cases -- it either gives a mystery "Bad file descriptor" error or starts reading/writing some random system file, which it then closes once the stream is closed.

The FD issue of magically passing through an int was also a concern when Ethan brought this up in an issue on the tracker. My argument is that FDs are not file paths and so shouldn't magically pass through if we're going to type-check anything or claim os.fspath() only works with paths (FDs are already open file objects). So in my view  either we go ahead and type-check the return value of __fspath__() and thus restrict everything coming out of os.fspath() to Union[str, bytes] or we don't type check anything and be consistent that os.fspath() simply does is call __fspath__() if present.

And just  because I'm thinking about it, I would special-case the FDs, not os.PathLike (clearer why you care and faster as it skips the override of __subclasshook__):

# Can be a single-line ternary operator if preferred.
if not isinstance(filename, int):
    filename = os.fspath(filename)

Koos Zevenhoven

unread,
Jun 15, 2016, 2:08:16 PM6/15/16
to Guido van Rossum, Python Dev
My proposal at the point of the first PEP draft solved both of these issues.

That version of the fspath function passed anything right through that
was an instance of the keyword-only `type_constraint`. If not, it
would ask __fspath__, and before returning the result, it would check
that __fspath__ returned an instance of `type_constraint` and
otherwise raise a TypeError. `type_constraint=object` would then have
given the behavior you want. I always wanted fspath to spare the
caller from all the instance checking (most of which it does even
now).

The main problem with setting type_constraint to something broader
than (str, bytes) is that then that parameter would affect the return
type of the function, which would at least complicate the type hinting
issue. Mypy might now support things like

@overload
def fspath(path: T, type_constraint: Type[T] = (str, bytes)) -> T: ...

but then again, isinstance and Union are not compatible (for a
reason?), and PEP484 for a reason does not allow tuples like (str,
bytes) in place of Unions.

Anyway, if we were to go back to this behavior, we would need to
decide whether to officially allow a wider type constraint or whether
to leave that to Stack Overflow, so to speak.

-- Koos
> https://mail.python.org/mailman/options/python-dev/k7hoven%40gmail.com
>



--
--
+ Koos Zevenhoven + http://twitter.com/k7hoven +
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Nick Coghlan

unread,
Jun 15, 2016, 2:31:34 PM6/15/16
to Brett Cannon, Python Dev
On 15 June 2016 at 10:59, Brett Cannon <br...@python.org> wrote:
>
>
> On Wed, 15 Jun 2016 at 09:48 Guido van Rossum <gu...@python.org> wrote:
>>
>> These are really two separate proposals.
>>
>> I'm okay with checking the return value of calling obj.__fspath__; that's
>> an error in the object anyways, and it doesn't matter much whether we do
>> this or not (though when approving the PEP I considered this and decided not
>> to insert a check for this). But it doesn't affect your example, does it? I
>> guess it's easier to raise now and change the API in the future to avoid
>> raising in this case (if we find that raising is undesirable) than the other
>> way around, so I'm +0 on this.
>
> +0 from me as well. I know in some code in the stdlib that has been ported
> which prior to adding support was explicitly checking for str/bytes this
> will eliminate its own checking (obviously not a motivating factor as it's
> pretty minor).

I'd like a strong assertion that the return value of os.fspath() is a
plausible filesystem path representation (so either bytes or str), and
*not* some other kind of object that can also be used for accessing
the filesystem (like a file descriptor or an IO stream)
Note that the LZMA case Ethan cites is one where the code accepts
either an already opened file-like object *or* a path-like object, and
does different things based on which it receives.

In that scenario, rather than introducing an unconditional "filename =
os.fspath(filename)" before the current logic, it makes more sense to
me to change the current logic to use the new protocol check rather
than a strict typecheck on str/bytes:

if isinstance(filename, os.PathLike): # Changed line
filename = os.fspath(filename) # New line
if "b" not in mode:
mode += "b"
self._fp = builtins.open(filename, mode)
self._closefp = True
self._mode = mode_code
elif hasattr(filename, "read") or hasattr(filename, "write"):
self._fp = filename
self._mode = mode_code
else:
raise TypeError(
"filename must be a path-like or file-like object"
)

I *don't* think it makes sense to weaken the guarantees on os.fspath
to let it propagate non-path-like objects.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Guido van Rossum

unread,
Jun 15, 2016, 2:41:30 PM6/15/16
to Nick Coghlan, Python-Dev

OK, so let's add a check on the return of __fspath__() and keep the check on path-like or string/bytes.

--Guido (mobile)

Brett Cannon

unread,
Jun 15, 2016, 2:46:09 PM6/15/16
to gu...@python.org, Nick Coghlan, Python-Dev
On Wed, 15 Jun 2016 at 11:39 Guido van Rossum <gu...@python.org> wrote:

OK, so let's add a check on the return of __fspath__() and keep the check on path-like or string/bytes.


I'll update the PEP.

Ethan, do you want to leave a note on the os.fspath() issue to update the code and go through where we've used os.fspath() to see where we can cut out redundant type checks?

Ethan Furman

unread,
Jun 15, 2016, 2:49:14 PM6/15/16
to Python Dev
On 06/15/2016 10:59 AM, Brett Cannon wrote:
> On Wed, 15 Jun 2016 at 09:48 Guido van Rossum wrote:

>> These are really two separate proposals.
>>
>> I'm okay with checking the return value of calling obj.__fspath__;
>> that's an error in the object anyways, and it doesn't matter much
>> whether we do this or not (though when approving the PEP I
>> considered this and decided not to insert a check for this). But it
>> doesn't affect your example, does it? I guess it's easier to raise
>> now and change the API in the future to avoid raising in this case
>> (if we find that raising is undesirable) than the other way around,
>> so I'm +0 on this.
>
> +0 from me as well. I know in some code in the stdlib that has been
> ported which prior to adding support was explicitly checking for
> str/bytes this will eliminate its own checking (obviously not a
> motivating factor as it's pretty minor).

If we accept both parts of this proposal the checking will have to stay
in place as the original argument may not have been bytes, str, nor
os.PathLike.

>> The other proposal (passing anything that's not understood right
>> through) is more interesting and your use case is somewhat
>> compelling. Catching the exception coming out of os.fspath() would
>> certainly be much messier. The question remaining is whether, when
>> this behavior is not desired (e.g. when the caller of os.fspath()
>> just wants a string that it can pass to open()), the condition of
>> passing that's neither a string not supports __fspath__ still
>> produces an understandable error.

This is no different than before os.fspath() existed -- if the function
wasn't checking that the "filename" was a str but just used it as-is,
then whatever strange, possibly-hard-to-debug error they would get now
is the same as what they would have gotten before.

>> I'm not sure that that's the case.
>> E.g. open() accepts file descriptors in addition to paths, but I'm
>> not sure that accepting an integer is a good idea in most cases --
>> it either gives a mystery "Bad file descriptor" error or starts
>> reading/writing some random system file, which it then closes once
>> the stream is closed.

My vision of os.fspath() is simply to reduce rich-path objects to their
component str or bytes representation, and pass anything else through.

The advantage:

- if os.open accepts str/bytes/fd it can prep the argument by
calling os.fspath() and then do it's argument checking all
in one place;

- if lzma accepts bytes/str/filelike-obj it can prep its argument
by calling os.fspath() and then do it's argument checking all in
one place

- if Path accepts str/os.PathLike it can prep it's argument(s)
with os.fspath() and then do its argument checking all in one
place.

> The FD issue of magically passing through an int was also a concern when
> Ethan brought this up in an issue on the tracker. My argument is that
> FDs are not file paths and so shouldn't magically pass through if we're
> going to type-check anything or claim os.fspath() only works with paths
> (FDs are already open file objects). So in my view either we go ahead
> and type-check the return value of __fspath__() and thus restrict
> everything coming out of os.fspath() to Union[str, bytes] or we don't
> type check anything and be consistent that os.fspath() simply does is
> call __fspath__() if present.

This is better than what os.fspath() currently does as it has all the
advantages listed above, but why is checking the output of __fspath__
incompatible with not checking anything else?

> And just because I'm thinking about it, I would special-case the FDs,
> not os.PathLike (clearer why you care and faster as it skips the
> override of __subclasshook__):
>
> # Can be a single-line ternary operator if preferred.
> if not isinstance(filename, int):
> filename = os.fspath(filename)

That example will not do the right thing in the lzma case.

--
~Ethan~
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Koos Zevenhoven

unread,
Jun 15, 2016, 2:52:52 PM6/15/16
to Nick Coghlan, Python Dev
On Wed, Jun 15, 2016 at 9:29 PM, Nick Coghlan <ncog...@gmail.com> wrote:
> On 15 June 2016 at 10:59, Brett Cannon <br...@python.org> wrote:
>>
>>
>> On Wed, 15 Jun 2016 at 09:48 Guido van Rossum <gu...@python.org> wrote:
>>>
>>> These are really two separate proposals.
>>>
>>> I'm okay with checking the return value of calling obj.__fspath__; that's
>>> an error in the object anyways, and it doesn't matter much whether we do
>>> this or not (though when approving the PEP I considered this and decided not
>>> to insert a check for this). But it doesn't affect your example, does it? I
>>> guess it's easier to raise now and change the API in the future to avoid
>>> raising in this case (if we find that raising is undesirable) than the other
>>> way around, so I'm +0 on this.
>>
>> +0 from me as well. I know in some code in the stdlib that has been ported
>> which prior to adding support was explicitly checking for str/bytes this
>> will eliminate its own checking (obviously not a motivating factor as it's
>> pretty minor).
>
> I'd like a strong assertion that the return value of os.fspath() is a
> plausible filesystem path representation (so either bytes or str), and
> *not* some other kind of object that can also be used for accessing
> the filesystem (like a file descriptor or an IO stream)

I agree, so I'm -0.5 on passing through any object (at least by default).
You are making one of my earlier points here, thanks ;). The point is
that the name PathLike sounds like it would mean anything path-like,
except that os.PathLike does not include str and bytes. And I still
think the naming should be a little different.

So that would be (os.Pathlike, str, bytes) instead of just os.PathLike.

> if "b" not in mode:
> mode += "b"
> self._fp = builtins.open(filename, mode)
> self._closefp = True
> self._mode = mode_code
> elif hasattr(filename, "read") or hasattr(filename, "write"):
> self._fp = filename
> self._mode = mode_code
> else:
> raise TypeError(
> "filename must be a path-like or file-like object"
> )
>
> I *don't* think it makes sense to weaken the guarantees on os.fspath
> to let it propagate non-path-like objects.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
> _______________________________________________
> Python-Dev mailing list
> Pytho...@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/k7hoven%40gmail.com

--
+ Koos Zevenhoven + http://twitter.com/k7hoven +

Ethan Furman

unread,
Jun 15, 2016, 3:10:20 PM6/15/16
to Python-Dev
On 06/15/2016 11:44 AM, Brett Cannon wrote:
> On Wed, 15 Jun 2016 at 11:39 Guido van Rossum wrote:

>> OK, so let's add a check on the return of __fspath__() and keep the
>> check on path-like or string/bytes.
>
> I'll update the PEP.
>
> Ethan, do you want to leave a note on the os.fspath() issue to update
> the code and go through where we've used os.fspath() to see where we can
> cut out redundant type checks?

Will do.

I didn't see this subthread before my last post, so unless you agree
with those other changes feel free to ignore it. ;)

--
~Ethan~

Koos Zevenhoven

unread,
Jun 15, 2016, 3:13:09 PM6/15/16
to Guido van Rossum, Python Dev
>> if isinstance(filename, os.PathLike):

By the way, regarding the line of code above, is there a convention
regarding whether implementing some protocol/interface requires
registering with (or inheriting from) the appropriate ABC for it to
work in all situations. IOW, in this case, is it sufficient to
implement __fspath__ to make your type pathlike? Is there a conscious
trend towards requiring the ABC?

-- Koos
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Brett Cannon

unread,
Jun 15, 2016, 3:16:58 PM6/15/16
to Koos Zevenhoven, Guido van Rossum, Python Dev
On Wed, 15 Jun 2016 at 12:12 Koos Zevenhoven <k7h...@gmail.com> wrote:
>>     if isinstance(filename, os.PathLike):

By the way, regarding the line of code above, is there a convention
regarding whether implementing some protocol/interface requires
registering with (or inheriting from) the appropriate ABC for it to
work in all situations. IOW, in this case, is it sufficient to
implement __fspath__ to make your type pathlike? Is there a conscious
trend towards requiring the ABC?

ABCs like os.PathLike can override __subclasshook__ so that registration isn't required (see https://hg.python.org/cpython/file/default/Lib/os.py#l1136). So registration is definitely good to do to be explicit that you're trying to meet an ABC, but it isn't strictly required.

Ethan Furman

unread,
Jun 15, 2016, 3:19:45 PM6/15/16
to Python Dev
On 06/15/2016 12:10 PM, Koos Zevenhoven wrote:
>>> if isinstance(filename, os.PathLike):
>
> By the way, regarding the line of code above, is there a convention
> regarding whether implementing some protocol/interface requires
> registering with (or inheriting from) the appropriate ABC for it to
> work in all situations. IOW, in this case, is it sufficient to
> implement __fspath__ to make your type pathlike? Is there a conscious
> trend towards requiring the ABC?

The ABC is not required, simply having the __fspath__ attribute is
enough. Of course, to actually work that attribute should be a function
that returns a str or bytes object. ;)

--
~Ethan~

Koos Zevenhoven

unread,
Jun 15, 2016, 3:26:40 PM6/15/16
to Brett Cannon, Python Dev
Ok I suppose that's fine, so I propose we update the ABC part in the
PEP with __subclasshook__.

And the other question could be turned into whether to make str and
bytes also PathLike in __subclasshook__.

-- Koos


--
+ Koos Zevenhoven + http://twitter.com/k7hoven +

Brett Cannon

unread,
Jun 15, 2016, 3:59:28 PM6/15/16
to gu...@python.org, Nick Coghlan, Python-Dev

Ethan Furman

unread,
Jun 15, 2016, 4:00:42 PM6/15/16
to pytho...@python.org
On 06/15/2016 12:24 PM, Koos Zevenhoven wrote:
> On Wed, Jun 15, 2016 at 10:15 PM, Brett Cannon wrote:

>> ABCs like os.PathLike can override __subclasshook__ so that registration
>> isn't required (see
>> https://hg.python.org/cpython/file/default/Lib/os.py#l1136). So registration
>> is definitely good to do to be explicit that you're trying to meet an ABC,
>> but it isn't strictly required.

> And the other question could be turned into whether to make str and
> bytes also PathLike in __subclasshook__.

No, for two reasons.

- most str's and bytes' are not paths;
- PathLike indicates a rich-path object, which str's and bytes' are not.

--
~Ethan~

Koos Zevenhoven

unread,
Jun 15, 2016, 5:00:14 PM6/15/16
to Ethan Furman, Python Dev
On Wed, Jun 15, 2016 at 11:00 PM, Ethan Furman <et...@stoneleaf.us> wrote:
> On 06/15/2016 12:24 PM, Koos Zevenhoven wrote:
>>
>> And the other question could be turned into whether to make str and
>> bytes also PathLike in __subclasshook__.
>
> No, for two reasons.
>
> - most str's and bytes' are not paths;

True. Well, at least most str and bytes objects are not *meant* to be
used as paths, even if they could be.

> - PathLike indicates a rich-path object, which str's and bytes' are not.

This does not count as a reason.

If this were called pathlib.PathABC, I would definitely agree [1]. But
since this is called os.PathLike, I'm not quite as sure. Anyway,
including str and bytes is more of a type hinting issue. And since
type hints will in also act as documentation, the naming of types is
becoming more important.

-- Koos

[1] No, I'm not proposing moving this to pathlib

> --
> ~Ethan~
> _______________________________________________
> Python-Dev mailing list
> Pytho...@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/k7hoven%40gmail.com



--
+ Koos Zevenhoven + http://twitter.com/k7hoven +
Reply all
Reply to author
Forward
0 new messages