
filesystem encoding 'strict' on Windows


iMath

Sep 30, 2016, 1:59:12 AM
The doc for os.fsencode(filename) says it encodes filename to the filesystem encoding with 'strict' on Windows. What does 'strict' mean?

eryk sun

Sep 30, 2016, 4:43:51 AM
On Fri, Sep 30, 2016 at 5:58 AM, iMath <redsto...@163.com> wrote:
> the doc of os.fsencode(filename) says Encode filename to the filesystem encoding 'strict'
> on Windows, what does 'strict' mean ?

"strict" is the error handler for the encoding. It raises a
UnicodeEncodeError for unmapped characters. For example:

>>> 'αβψδ'.encode('mbcs', 'strict')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character

On the other hand, the "replace" error handler is lossy. With the
Windows "mbcs" codec, it substitutes question marks and best-fit
mappings for characters that aren't defined in the system locale's
ANSI codepage (e.g. 1252). For example:

>>> print('αβψδ'.encode('mbcs', 'replace').decode('mbcs'))
aß?d

This lossy substitution is what os.listdir does with bytes paths, which
is why using bytes paths has been deprecated on Windows since 3.3.
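
The contrast between the two handlers can also be sketched off Windows. The snippet below is only an approximation: it uses Python's own "cp1252" codec as a stand-in for "mbcs" (an assumption made for portability, since "mbcs" exists only on Windows). Python's cp1252 codec does no best-fit mapping, so "replace" produces only question marks here rather than the mix shown above.

```python
text = 'αβψδ'

# "strict" refuses to encode characters the codec cannot map.
try:
    text.encode('cp1252', 'strict')
    strict_failed = False
except UnicodeEncodeError as exc:
    strict_failed = True
    print('strict:', exc.reason)

# "replace" silently substitutes '?', so the original text is lost.
lossy = text.encode('cp1252', 'replace')
print('replace:', lossy)
```

Once the name has been encoded with "replace", there is no way to recover the original characters, which is the problem with lossy bytes paths.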

In 3.6, bytes paths are provisionally allowed again because the
filesystem encoding has changed to UTF-8 (internally transcoded to the
native UTF-16LE). It uses the "surrogatepass" error handler to permit
lone surrogate codes, which Windows allows in filenames. See PEP 529
for more information.
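
A minimal sketch of what "surrogatepass" changes, using the plain "utf-8" codec so that it runs the same on any platform. The filename with a lone surrogate is made up for illustration, and sys.getfilesystemencodeerrors() assumes Python 3.6 or later.

```python
import sys

# A hypothetical filename containing a lone surrogate (U+D800), which
# Windows allows but which is not valid in well-formed UTF-16.
name = 'bad\ud800name'

# "strict" rejects the lone surrogate.
try:
    name.encode('utf-8', 'strict')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# "surrogatepass" encodes it anyway and decodes it back unchanged.
raw = name.encode('utf-8', 'surrogatepass')
roundtrip = raw.decode('utf-8', 'surrogatepass')

print(strict_ok, raw, roundtrip == name)

# What this interpreter actually uses for os.fsencode / os.fsdecode.
print(sys.getfilesystemencoding(), sys.getfilesystemencodeerrors())
```

The last line reports the handler in effect on the running system, which differs between Windows and other platforms.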