Changing default encoding on Windows


Bruce Dawson

Feb 23, 2023, 9:20:55 PM
to python
On Windows Python uses the system encoding (usually cp1252, but apparently whatever the user has set) which leads to occasional failures, most recently here. Meanwhile Linux uses utf-8.
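To make the failure mode concrete, here is a minimal sketch. It is not taken from the failing script, and the sample strings and the helper name are made up for illustration. It decodes UTF-8 bytes as cp1252, which is roughly what a text-mode open() without an explicit encoding does to a UTF-8 file on a typical Windows setup:

```python
def decode_as_cp1252(text):
    """Encode text as UTF-8, then decode those bytes as cp1252.

    Hypothetical helper: it mimics reading a UTF-8 file with a cp1252
    default encoding, without touching the file system.
    """
    return text.encode("utf-8").decode("cp1252")

# Case 1: the bytes decode without error but yield the wrong characters.
#   decode_as_cp1252("café")
#
# Case 2: the bytes include a value that Python's cp1252 codec leaves
# undefined (0x9D here), so decoding raises UnicodeDecodeError.
#   decode_as_cp1252("Ý")
```

The first call returns `cafÃ©` silently, and the second raises, which would explain why the failures are only occasional: they depend on which non-ASCII characters happen to be in the file.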

I updated the Python coding guidelines but that scales badly. PEP 686 will make utf-8 the default in Python 3.15, so I think a smarter plan would be to enable utf-8 by default now, thus fixing all of Chrome.

That is, instead of crrev.com/c/4289829 we could pass -X utf8 to Python3, or set PYTHONUTF8=1 in the environment. I have confirmed that these both work as desired.
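For anyone who wants to reproduce the check, a rough sketch of one way to confirm that either switch takes effect is below. The helper name probe_utf8_mode is made up; sys.flags.utf8_mode and locale.getpreferredencoding() are standard library. It assumes the interpreter running the sketch is the one you care about.

```python
import os
import subprocess
import sys

# Code run in the child interpreter: prints the UTF-8 mode flag and the
# encoding that a text-mode open() would use by default.
PROBE = ("import sys, locale; "
         "print(sys.flags.utf8_mode, locale.getpreferredencoding(False).lower())")

def probe_utf8_mode(extra_args=(), extra_env=None):
    """Run a child Python and return (utf8_mode_flag, default_encoding) as strings."""
    env = dict(os.environ)
    env.pop("PYTHONUTF8", None)  # start from a known state
    if extra_env:
        env.update(extra_env)
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", PROBE],
        env=env, capture_output=True, text=True, check=True)
    flag, encoding = result.stdout.split()
    return flag, encoding

if __name__ == "__main__":
    print("plain:       ", probe_utf8_mode())
    print("-X utf8:     ", probe_utf8_mode(extra_args=("-X", "utf8")))
    print("PYTHONUTF8=1:", probe_utf8_mode(extra_env={"PYTHONUTF8": "1"}))
```

With either switch the flag should come back as 1 and the default encoding as utf-8; the plain run shows whatever the machine is configured for.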

Any thoughts or preferences?

I'm also not sure where we would have to hook this. I have confirmed that passing -X utf8 in python3.bat and vpython3.bat does the trick, but I can't tell if that is genius or terrible. Thoughts?

--
Bruce Dawson, he/him

Bruce Dawson

Feb 23, 2023, 9:30:57 PM
to python
This clearly deserves a bug so I filed crbug.com/1418846. Discussion should probably happen there.
--
Bruce Dawson, he/him

Mike Frysinger

Feb 23, 2023, 11:49:42 PM
to Bruce Dawson, python
On Thu, Feb 23, 2023 at 9:20 PM 'Bruce Dawson' via python <pyt...@chromium.org> wrote:
> On Windows Python uses the system encoding (usually cp1252, but apparently whatever the user has set) which leads to occasional failures, most recently here. Meanwhile Linux uses utf-8.

this is incorrect. Linux uses whatever the user uses. it's just that nowadays most users use a UTF-8 compatible locale, although that is certainly not 100%.

> I updated the Python coding guidelines but that scales badly. PEP 686 will make utf-8 the default in Python 3.15, so I think a smarter plan would be to enable utf-8 by default now, thus fixing all of Chrome.

we're detecting this in CrOS via a pylint plugin. we require calls to open APIs to specify an explicit encoding if the interface is text based.

> That is, instead of crrev.com/c/4289829 we could pass -X utf8 to Python3, or set PYTHONUTF8=1 in the environment. I have confirmed that these both work as desired.

doesn't -X also suffer from a scaling problem? every explicit `python` & `vpython` invocation needs updating, as do the shebangs in scripts.

setting PYTHONUTF8 in the env also suffers the same issue.

this also doesn't work for scripts that are invoked directly (e.g. we run some 3rd party tool that runs a python script).

> Any thoughts or preferences?
>
> I'm also not sure where we would have to hook this. I have confirmed that passing -X utf8 in python3.bat and vpython3.bat does the trick, but I can't tell if that is genius or terrible. Thoughts?

if we were better about not allowing `python` or `python3` usage at all, we could have the `vpython3` wrapper always force the env var before invoking the downloaded python binary.
-mike

Bruce Dawson

Feb 24, 2023, 12:24:13 PM
to Mike Frysinger, python
Ah - thanks for the clarification about the Linux behavior.

Can you share more information about the pylint plugin to detect this? I'd love to have a presubmit to warn on existing bad usage and error on new usage.

-X would scale nicely on Windows if we implemented it inside python3.bat and vpython3.bat, because those are supposed to be the choke points that all Python 3 execution goes through. That doesn't solve it for Linux, but Linux is a less serious problem because a utf-8 locale seems to be more common there. I thought that Linux didn't need solving at all, but I was wrong.
--
Bruce Dawson, he/him

Mike Frysinger

Feb 24, 2023, 3:27:29 PM
to Bruce Dawson, python

Bruce Dawson

Feb 24, 2023, 4:18:58 PM
to Mike Frysinger, python
I'd be interested in bringing that to Chromium's presubmit system. If we could opt-in one directory at a time that would be ideal, but either way it would be good.

Another strategy would be to get Chromium building with some unusual code page, and then once we're at that stage have a presubmit that catches any new uses of open without an encoding. That would focus the effort on the open calls that actually matter (most of them don't).
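As a rough idea of what such a check could look like, here is a sketch built on the standard ast module. It is not the chromite pylint plugin and not an existing Chromium presubmit; the function name is made up, and it only recognizes direct calls to the built-in name open (not io.open, pathlib, or aliases).

```python
import ast

def find_open_without_encoding(source, filename="<string>"):
    """Return sorted (line, column) pairs for text-mode open() calls lacking encoding."""
    hits = []
    for node in ast.walk(ast.parse(source, filename)):
        is_open_call = (isinstance(node, ast.Call)
                        and isinstance(node.func, ast.Name)
                        and node.func.id == "open")
        if not is_open_call:
            continue
        # encoding passed by keyword, or possibly hidden inside **kwargs.
        if any(kw.arg in ("encoding", None) for kw in node.keywords):
            continue
        # encoding passed positionally (it is the fourth parameter of open).
        if len(node.args) >= 4:
            continue
        # Find the mode argument, either positional or keyword.
        mode = node.args[1] if len(node.args) >= 2 else None
        for kw in node.keywords:
            if kw.arg == "mode":
                mode = kw.value
        # Binary mode takes no encoding; a non-literal mode is still reported.
        if (isinstance(mode, ast.Constant) and isinstance(mode.value, str)
                and "b" in mode.value):
            continue
        hits.append((node.lineno, node.col_offset))
    return sorted(hits)
```

A presubmit could run this over only the changed files, which would give the "error on new usage" behavior without having to clean up every existing call first.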

BTW, I think the utf8 checking CL that landed was actually crrev.com/c/2744859
--
Bruce Dawson, he/him

Bruce Dawson

Feb 24, 2023, 5:30:10 PM
to python, Bruce Dawson, python, vap...@chromium.org
I decided to experiment with overriding the built-in open function to detect problematic calls, by inserting this code at the beginning of every PRESUBMIT.py script (easily done given how presubmit_support.py works):

import traceback

# Keep a reference to the built-in open before shadowing it.
old_open = open

def open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None):
  # Report text-mode opens that rely on the default encoding.
  if 'b' not in mode and encoding is None:
    print('No-encoding when opening %s with %s at:\n%s\n' % (file, mode, ''.join(traceback.format_stack(None, 8))))
  return old_open(file, mode, buffering, encoding, errors, newline, closefd, opener)

When reading files this could even be smart enough to only report a problem if the file being read contains non-ASCII characters. So far it's found three errors, including this one:
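The "only report if the file contains non-ASCII characters" refinement could be sketched roughly like this. The helper name and chunk size are made up for illustration, and this has not been wired into the override above:

```python
def file_has_non_ascii(path, chunk_size=65536):
    """Return True if the file at path contains any byte outside the ASCII range."""
    # Read in binary mode so that no decoding (and no default encoding) is involved.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            if not chunk.isascii():
                return True
    return False
```

The override would then print its message for reads only when file_has_non_ascii(file) is true, at the cost of reading each file twice.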

No-encoding when opening build/OWNERS.setnoparent with r at:
  File "c:\src\depot_tools\presubmit_support.py", line 2120, in main
    return DoPresubmitChecks(
  File "c:\src\depot_tools\presubmit_support.py", line 1817, in DoPresubmitChecks
    results += executer.ExecPresubmitScript(presubmit_script, filename)
  File "c:\src\depot_tools\presubmit_support.py", line 1586, in ExecPresubmitScript
    return self._execute_with_local_working_directory(script_text,
  File "c:\src\depot_tools\presubmit_support.py", line 1648, in _execute_with_local_working_directory
    self._run_check_function(function_name, context, sink,
  File "c:\src\depot_tools\presubmit_support.py", line 1687, in _run_check_function
    result = eval(function_name + '(*__args)', context)
  File "<string>", line 1, in <module>
  File "c:\src\chromium\src\PRESUBMIT.py", line 3807, in CheckSetNoParent
    with open(allowed_owners_files_file, 'r') as f:
  File "c:\src\chromium\src\PRESUBMIT.py", line 15, in open
    print('No-encoding when opening %s with %s at:\n%s\n' % (file, mode, ''.join(traceback.format_stack(None, 8))))

I'm not sure if there is any easy equivalent to this for scripts used during builds or tests.
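One possible equivalent, not tried in this thread, is the opt-in EncodingWarning that Python 3.10 added (PEP 597), enabled with -X warn_default_encoding or PYTHONWARNDEFAULTENCODING=1. A minimal sketch of what it reports, assuming Python 3.10 or later; the helper name is made up:

```python
import os
import subprocess
import sys

# Child code that opens a file in text mode without an encoding.
CHILD_CODE = "import os; open(os.devnull).close()"

def child_warns_about_default_encoding():
    """Return True if the child interpreter emits an EncodingWarning on stderr."""
    # Drop settings that could change or hide the warning in the child.
    env = {k: v for k, v in os.environ.items()
           if k not in ("PYTHONUTF8", "PYTHONWARNINGS")}
    result = subprocess.run(
        [sys.executable, "-X", "warn_default_encoding", "-c", CHILD_CODE],
        env=env, capture_output=True, text=True)
    return "EncodingWarning" in result.stderr

if __name__ == "__main__":
    print("warning emitted:", child_warns_about_default_encoding())
```

Whether build and test scripts could be run this way depends on the same choke-point question as -X utf8, so this is only a starting point.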

Mike Frysinger

Feb 25, 2023, 10:21:04 AM
to Bruce Dawson, python
the module we have in chromite is meant to not be tied to chromite, so it could be lifted out easily, but we haven't structured things so they could be used directly. something we could consider.
-mike 