On 17 Jun 2018, at 14:02, Stephen J. Turnbull <turnbull....@u.tsukuba.ac.jp> wrote:
Folks. There are standards. "1252" *is not* an alias for
"windows-1252" according to the IANA, while "866" *is* an alias for
"IBM866" according to the same authority. Most 3-digit "IBMxxx" ARE
aliased to both "cpxxx" and just "xxx", but not all. None of
"IBM874", "874", or "cp874" exists according to the IANA.
That doesn't mean that the bug is best fixed by adding an alias.
If the error was failing to find encoding "ltain-1", would we add an
alias or fix the spelling? If 874 is not an official alias, we should
consider it a misspelling and fix the misspelling, not add an alias.
But either way, the point Stephen is making is that even if 874 is a
legitimate alias, that shouldn't give us carte blanche to add numeric
aliases for every encoding.
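The distinction between a registered alias, an unregistered one, and a plain misspelling can be checked against Python's own codec registry. This is a minimal sketch: `canonical_codec_name` is a hypothetical helper, not part of the standard library. Python's alias table is maintained separately from the IANA registry, so whether a bare "874" resolves depends on the Python version; the sketch prints the result rather than assuming one.

```python
import codecs


def canonical_codec_name(name):
    """Return the name of the codec Python resolves `name` to, or None.

    Hypothetical helper for illustration; it only wraps codecs.lookup().
    """
    try:
        return codecs.lookup(name).name
    except LookupError:
        # Raised for unknown names, whether a missing alias or a typo.
        return None


# "ltain-1" is the deliberate misspelling used as an example in the thread.
for name in ("cp874", "windows-1252", "874", "ltain-1"):
    print(f"{name!r:16} -> {canonical_codec_name(name)!r}")
```

A `None` result only says that the running interpreter has no codec or alias registered under that name; it says nothing about what the IANA lists.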
> On 18 Jun 2018, at 02:34, Steven D'Aprano <st...@pearwood.info> wrote:
>
>> Sure, but for at least one user Python 3.6 fails to start because
>> initialising the sys.std* streams fails due to not finding a “874”
>> encoding.
>
> That doesn't mean that the bug is best fixed by adding an alias.
I agree. I’ve mentioned in the issue that I’d like to understand why Python looks for an encoding with this name.
>
> If the error was failing to find encoding "ltain-1", would we add an
> alias or fix the spelling? If 874 is not an official alias, we should
> consider it a misspelling and fix the misspelling, not add an alias.
That depends: if a major platform ships with locales where the encoding is misspelled, we have little choice but to add an alias. To put it too bluntly: standards are fine until they conflict with reality.
>
> But either way, the point Stephen is making is that even if 874 is a
> legitimate alias, that shouldn't give us carte blanche to add numeric
> aliases for every encoding.
Possibly just for the “cp…” encodings, but IMHO only if we confirm that the code to look for the preferred encoding returns a codepage number on Windows and changing that code leads to worse results than adding numeric aliases for the “cp…” encodings.
Ronald
> Possibly just for the “cp…” encodings, but IMHO only if we confirm
> that the code to look for the preferred encoding returns a codepage
> number on Windows and changing that code leads to worse results
> than adding numeric aliases for the “cp…” encodings.
Almost all of the CPxxx encodings have multiple aliases[1], so I just
don't see the point unless numeric-only code page designations are
baked in to default "locales"[2] in official releases by major OS
vendors. And probably not even then, since it should be easy enough
to provide a proper "locale" and/or PYTHONIOENCODING setting.
Of course we should help the reporter figure out what's going on and
help them fix it with appropriate system configuration. If that
doesn't work, then (and *only then*) we could think about doing a
stupid thing.
Footnotes:
[1] Granted, "874" only has "windows-874" registered with the IANA,
so it's kind of salient. Still, if numeric-only aliases were a
"thing", surely we'd have heard about it by now---I first encountered
Thai encodings in 1990 (ok, that was TIS 620, but windows-874 is
basically TIS plus Microsoft punctuation extensions IIRC), and Thais do
use computers in their native language a lot.
[2] Scare quotes to refer to appropriate platform facilities, as
neither Windows nor Mac OS is strictly conformant to POSIX on this.
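To help a reporter figure out where a name such as “874” comes from, one could ask them to run something like the following. It is only a diagnostic sketch: `is_known_codec` and `encoding_report` are hypothetical helper names, and the three sources checked here are an assumption about where the name might originate, not a confirmed account of what the interpreter consults at startup.

```python
import codecs
import locale
import os
import sys


def is_known_codec(name):
    """True if the codec registry can resolve `name` (hypothetical helper)."""
    if not name:
        return False
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


def encoding_report():
    """Map each candidate source to (encoding name, resolvable?)."""
    # PYTHONIOENCODING has the form "encodingname:errorhandler";
    # both parts are optional, so keep only the text before the colon.
    env_value = os.environ.get("PYTHONIOENCODING", "")
    env_encoding = env_value.split(":", 1)[0]
    candidates = {
        "PYTHONIOENCODING": env_encoding or None,
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
    }
    return {
        source: (name, is_known_codec(name))
        for source, name in candidates.items()
    }


for source, (name, known) in encoding_report().items():
    print(f"{source:36} {name!r:20} known={known}")
```

If the interpreter cannot start at all, the script has to be run with a working setting first, for example with `PYTHONIOENCODING=cp874` in the environment, and the unset values compared afterwards.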
On 21 Jun 2018, at 09:17, Stephen J. Turnbull <turnbull....@u.tsukuba.ac.jp> wrote:
Ronald Oussoren writes:
> Possibly just for the “cp…” encodings, but IMHO only if we confirm
> that the code to look for the preferred encoding returns a codepage
> number on Windows and changing that code leads to worse results
> than adding numeric aliases for the “cp…” encodings.
Almost all of the CPxxx encodings have multiple aliases[1], so I just
don't see the point unless numeric-only code page designations are
baked in to default "locales"[2] in official releases by major OS
vendors. And probably not even then, since it should be easy enough
to provide a proper "locale" and/or PYTHONIOENCODING setting.
Of course we should help the reporter figure out what's going on and
help them fix it with appropriate system configuration. If that
doesn't work, then (and *only then*) we could think about doing a
stupid thing.
> The user shouldn’t have to do anything other than install Python. IMHO
> we’re doing something wrong when the Python interpreter doesn’t start up
> with a default system configuration
There's no evidence in the issue that I can see that suggests that the
user installed Python into the default system configuration. I see a
bunch of Python developers who have no access to the OP's system
configuration demonstrating that something that shouldn't work and never
has worked doesn't work, then providing a patch to make it work. This
despite the fact that the OP hasn't provided any configuration details
that would confirm this is a system default setting.
I wouldn't object to making it work if there were any evidence that it
is a real problem that other users will encounter. But there isn't any
such evidence yet; it's a non-standard alias according to Microsoft's
own IANA registration, and Steven D'Aprano's argument that such aliases
may be ambiguous is plausible, though I haven't seen confirmation it
would be a problem in practice.
> (when the user explicitly sets a bogus PYTHONIOENCODING or locale all
> bets are off,
I'm assuming that is the case, based on the fact that none of my two
;-) Thai students ever had this problem, nor have I seen a report of
this problem for any encoding in either Emacs or Python contexts since
about 1990, nor has the OP posted anything about his/her
configuration.
> although even then warning about and then ignoring bad settings
> would be more user-friendly than the current behavior)
If Python is told to talk YTREWQ and it doesn't know how to talk YTREWQ,
ignoring the problem is not possible if any input or output in YTREWQ is
required. The program will crash with a much harder to understand error
message describing "undecodable input" in an encoding the user doesn't
expect. My own experience is that soldiering on is the least user-
friendly thing to do, as typically there's a trivial change that the
user can make to resolve the problem optimally.
The obvious thing to do is to fall back to ASCII, which almost certainly
is compatible with the terminal, the log files, and the user's eyes and
brain, emit a warning, and quit. That is what we do. The warning seems
OK: the OP also diagnosed the missing alias, likely with little trouble.
Steve
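For comparison, the "warn and carry on" alternative raised earlier in the thread could look roughly like this at the application level. It is a sketch under stated assumptions: `usable_encoding` is a hypothetical helper, ASCII is assumed as the fallback, and this is not a description of what the interpreter itself does at startup.

```python
import codecs
import warnings


def usable_encoding(requested, fallback="ascii"):
    """Return a resolvable codec name, warning if `requested` is unknown.

    Hypothetical helper: falls back to `fallback` instead of raising.
    """
    try:
        return codecs.lookup(requested).name
    except LookupError:
        warnings.warn(
            f"unknown encoding {requested!r}; falling back to {fallback!r}",
            RuntimeWarning,
        )
        return fallback


print(usable_encoding("cp874"))
print(usable_encoding("YTREWQ"))
```

The objection made above still applies to this sketch: any later input or output that really is in the requested encoding will fail under the fallback, and the error will name an encoding the user did not ask for.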