You don't need PEP 3131 to have visually confusable identifiers.
MyObject = My0bject = "many fonts use the same glyph for O and 0"
rn = m = 23 # try reading this in Arial with a small font size
x += l
I don't think it's up to Python to protect you from arbitrarily poor choices
in identifiers and typefaces, or against obfuscated code (whether deliberately
so or by accident). Use of confusable identifiers is a code-quality issue,
little different from any other code-quality issue:
    class myfunction:
        def __init__(a, b, c, d, e, f, g, h, i, j, k, l):
            a.b = b-e+k*h
            a.a = i + 1j*j
            a.l = ll + l1 + l
            a.somebodytoldmeishouldusemoredesccriptivevaraiblenames = g+d
            a.somebodytoldmeishouldusemoredesccribtivevaraiblenames = c+f
You surely wouldn't expect Python to protect you from ignorant or obnoxious
programmers who wrote code like that. I likewise don't think Python should
protect you from programmers who do things like this:
py> A = 42
py> Α = 23
py> A == Α
False
Besides, just because you and I can't distinguish A from Α in my editor,
using one particular choice of font, doesn't mean that the author or his
intended audience (Greek programmers perhaps?) can't distinguish them, using
their editor and a more suitable typeface. The two characters are distinct
using Courier or Lucida Typewriter, to mention only two.
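When the font gives no help, the code points still do. The following is a minimal sketch, not part of the original message, that prints the code point and Unicode name of each character in an identifier. The helper name `describe` is made up for illustration.

```python
import unicodedata


def describe(identifier):
    """Return (code point, Unicode name) pairs for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in identifier
    ]


latin_a = "A"            # U+0041
greek_alpha = "\u0391"   # U+0391, renders like A in many fonts

for name in (latin_a, greek_alpha):
    print(describe(name))
```

Running it shows LATIN CAPITAL LETTER A for the first name and GREEK CAPITAL LETTER ALPHA for the second, whatever the typeface.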
> Is the proposal mentioned in the PEP (to use something based on Unicode
> Technical Standard #39 [3]) something that might be implemented at any
> point?
> [3] http://unicode.org/reports/tr39/#Confusable_Detection
I would welcome "confusable detection" in the standard library, possibly a
string method "skeleton" or some other interface to the Confusables file,
perhaps in unicodedata. And I would encourage code checkers like PyFlakes,
PyLint, PyChecker to check for confusable identifiers. But I do not believe
that this should be built into the Python language itself.
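To make the "skeleton" idea concrete, here is a rough sketch of what such an interface might look like. It is an assumption-laden illustration: `SAMPLE_CONFUSABLES` is a tiny hand-picked table, not the real Confusables file, and `skeleton` and `confusable` are hypothetical names, not existing standard-library functions. The normalize, map, normalize shape follows my reading of the skeleton procedure in UTS #39.

```python
import unicodedata

# Hypothetical, hand-picked sample. The real mappings live in the Unicode
# confusables data file, which this sketch does not read.
SAMPLE_CONFUSABLES = {
    "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
    "\u039f": "O",  # GREEK CAPITAL LETTER OMICRON
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
}


def skeleton(text):
    """Reduce text to a form in which look-alike strings compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(SAMPLE_CONFUSABLES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def confusable(a, b):
    """True if a and b are different strings with the same skeleton."""
    return a != b and skeleton(a) == skeleton(b)


print(confusable("A", "\u0391"))  # Latin A vs Greek Alpha
print(confusable("A", "B"))
```

A code checker could compute the skeleton of every identifier in a module and report any two distinct names that collide.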
--
Steven
> py> A = 42
> py> Α = 23
> py> A == Α
> False
It will never be possible to catch all confusables, which is one
reason that the unicode property stalled.
It seems like it would be reasonable to at least warn when identifiers
are not all in the same script -- but real-world examples from Emacs
Lisp made it clear that this is often intentional. There were still
clear word-boundaries, but it wasn't clear how that word-boundary
detection could be properly automated in the general case.
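For what it's worth, a same-script warning could be sketched roughly as below. The standard library does not expose the Unicode Script property, so this sketch assumes the first word of each character's Unicode name is a good enough stand-in. That is a crude proxy (it mislabels fullwidth and mathematical letters, for instance), and the function names are invented for the example.

```python
import unicodedata


def rough_scripts(identifier):
    """Collect an approximate script label for each letter.

    Assumption: the first word of the Unicode character name
    ("LATIN", "GREEK", "CYRILLIC", ...) approximates the script.
    """
    scripts = set()
    for ch in identifier:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(identifier):
    """True if the letters of the identifier span more than one label."""
    return len(rough_scripts(identifier)) > 1


for name in ("payload", "p\u0430yload", "x_\u03b1"):
    print(ascii(name), sorted(rough_scripts(name)), is_mixed_script(name))
```

Note that `x_α` is flagged just like the Cyrillic look-alike, which is the intentional-mixing case described above: without knowing where the word boundaries are, the check cannot tell the two apart.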
> Besides, just because you and I can't distinguish A from Α in my editor,
> using one particular choice of font, doesn't mean that the author or his
> intended audience (Greek programmers perhaps?) can't distinguish them,
In many cases, it does -- for the letters to look different requires
an unnatural font choice, though perhaps not so extreme as the
print-the-hex-code font.
> I would welcome "confusable detection" in the standard library, possibly a
> string method "skeleton" or some other interface to the Confusables file,
> perhaps in unicodedata.
I would too, and agree that it shouldn't be limited to identifiers.
-jJ
http://www.python.org/dev/peps/pep-3131/#rationale
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
But the Python keywords and more importantly the documentation is English. Don't you need to be able
to speak/write English in order to code Python anyway? And if you keep your code+comments English you
can access a much larger developer pool (all developers who speak English should by my hypothesis be
a superset of all developers who speak a certain language).
Please; the PEP has been discussed quite a lot when it was proposed,
and believe me, yours is not an unfamiliar argument :) You're about
5 years late.
Georg
I have the impression that latin-1 chars were/are (unofficially)
accepted in Python2.
> But the Python keywords and more importantly the documentation is
> English.
I know of at least one translation
http://docs.python.org.ar/tutorial/contenido.html
though keeping up with changes is obviously a problem.
There are multiple books in multiple languages. When I went to a
bookstore in Japan, the programming languages section had about 8 for
Python. I suspect that is more than most equivalent US bookstores.
--
Terry Jan Reedy
I didn't want to start a discussion. I just wanted to know why one would implement such a language
feature. Guido's answer cleared it up for me, thanks. I can see the purpose in an educational
setting (not in production code of anything a little bit bigger).
-panzi
Well, in that case I would have said "read the PEP": I think it's well
explained there.
Georg
Those examples would be a lot more compelling if there was an
acceptable way to input those characters. Maybe we could support some
kind of input method that enabled LaTeX style math notation as used by
scientists for writing equations in papers?
--
--Guido van Rossum (python.org/~guido)
On Mon, Oct 1, 2012 at 1:26 PM, Oscar Benjamin
> Non-ascii identifiers have other possible uses. I'll repost the case
> that started this discussion on python-tutor (attached in case it
> doesn't display):
>
> '''
> #!/usr/bin/env python3
> # -*- encoding: utf-8 -*-
>
> # Parameters
> α = 1
> β = 0.1
> γ = 1.5
> δ = 0.075
>
> # Initial conditions
> xₒ = 10
> yₒ = 5
> Zₒ = xₒ, yₒ
>
Very nice!
With the right editor, of course, it's not a problem :)
(Emacs has a TeX input method with which I could type this example without
problems.)
Georg
> Oh but why isn't it named Python für Kinder? :-)
It looks like Germans have adopted "kid" as an abbreviation
for "kinder", just like we use it as an abbreviation for
"child". Or maybe we got it from them -- it's closer to
their original word than ours!
They seem to be using our plural, though -- "kids", not
"kidden"...
--
Greg
Sometimes we use the ...s for plural as well, especially for acronyms, words of English or French
origin and last names. But it would not be ...en, maybe ...er. Is there any German word that uses
...en for plural? I don't think so. Anyway, "kids" is definitely an anglicism, because we pronounce
it "English" and not like it would be pronounced if it were derived from "Kind" (it would be more
like "keed"). German today is full of anglicisms.
But then, there are some German words used by English people as well: gesundheit, kindergarten,
über, blitz(krieg), angst (used with a different meaning than the German word), abseiling ("abseilen" in
German), doppelgänger, gestalt, poltergeist, Zeitgeist...
> I still don't understand why unicode characters are allowed at all
> in identifier names.
"Consenting adults." 'nuff said?
An anecdote. Back when I was first learning Japanese, I maintained an
Emacs interface to EDICT, a free Japanese-English dictionary. The
code was smart enough to parse morphosyntax (inflection of verbs and
adjectives) into dictionary forms, but I wasn't (and according to my
daughter, still am not<wink/>). So I asked my tutor for help.
Although a total non-programmer, he was able to read the grammar
easily because the state names (identifiers for callable objects) were
written in Japanese, using the standard grammatical name for the
inflection. The "easy" part comes in because although his English was
good, it wasn't good enough to disentangle Lisp gobbledygook from the
morphosyntax data had it been written in ASCII. But he was able to
read and comment on the whole grammar in about half an hour because he
could just skip *all* the ASCII!
> On 2012-10-01 21:51, Guido van Rossum wrote:
> > Those examples would be a lot more compelling if there was an
> > acceptable way to input those characters. Maybe we could support
> > some kind of input method that enabled LaTeX style math notation as
> > used by scientists for writing equations in papers?
>
> I think that's up to the OS or the text editor.
Agreed. Many of these identifiers will need to be typed at an OS command
line, after all (e.g. for naming a test case to run, as one which
springs easily to mind).
Solve the keyboard input problem in the OS layer – as someone who
anticipates working with non-ASCII characters must already do – and you
solve it for Python code as well. I don't think it's Python's business
to get involved at the input method level.
--
\ “The apparent lesson of the Inquisition is that insistence on |
`\ uniformity of belief is fatal to intellectual, moral, and |
_o__) spiritual health.” —_The Uses Of The Past_, Herbert J. Muller |
Ben Finney
This page seems to think that some do:
http://german.about.com/od/grammar/a/PluralNounsWithnENEndings.htm
--
Greg
> Solve the keyboard input problem in the OS layer – as someone who
> anticipates working with non-ASCII characters must already do – and you
> solve it for Python code as well.
That simply isn't true for symbol characters and Greek letters. I
still let either TeX or XEmacs translate TeX macros for me. I don't
even know how to type an integral sign in Mac OS X Terminal
(conveniently, that is -- of course there's always the character
palette), and if I wanted directed quotation marks (I don't), I'd just
use ASCII quotes and let XEmacs translate those, too.
There ought to be a standard way to get those symbols and punctuation,
preferably ASCII-based, on any terminal, using the standard Python
interpreter.
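As a rough illustration of what an ASCII-based convention could look like, the sketch below expands a few TeX-style macros into the corresponding Unicode characters. The macro table and the function name `expand_tex_macros` are assumptions made up for this example. Nothing like this is built into the interpreter, and a real input method would cover far more symbols.

```python
import re
import unicodedata

# Hypothetical mapping from TeX-style macro names to Unicode character names.
TEX_TO_UNICODE_NAME = {
    "alpha": "GREEK SMALL LETTER ALPHA",
    "beta": "GREEK SMALL LETTER BETA",
    "gamma": "GREEK SMALL LETTER GAMMA",
    "delta": "GREEK SMALL LETTER DELTA",
    "int": "INTEGRAL",
}


def expand_tex_macros(text):
    """Replace known \\macro sequences with the character they name."""
    def replace(match):
        char_name = TEX_TO_UNICODE_NAME.get(match.group(1))
        # Unknown macros are left exactly as typed.
        return unicodedata.lookup(char_name) if char_name else match.group(0)

    return re.sub(r"\\([A-Za-z]+)", replace, text)


print(expand_tex_macros(r"\alpha = 1"))
print(expand_tex_macros(r"\delta = 0.075"))
```

The same table-driven approach works whether the expansion happens in an editor, in a terminal front end, or in an OS-level input method. The sketch only shows the translation step, not where it should live.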
> I still let either TeX or XEmacs translate TeX macros for me. I don't
> even know how to type an integral sign in Mac OS X Terminal
> (conveniently, that is -- of course there's always the character
> palette), and if I wanted directed quotation marks (I don't), I'd just
> use ASCII quotes and let XEmacs translate those, too.
Right. So you've solved it for one program only, not the OS which is (or
should be) responsible for turning what you type into characters,
uniformly across all applications you have keyboard input for.
> There ought to be a standard way to get those symbols and punctuation,
> preferably ASCII-based, on any terminal
Definitely agreed with this. Indeed, it's my point: the problem should
be solved in one place for the user of the computer, not separately per
application or framework.
> using the standard Python interpreter.
If you mean that the Python interpreter should be aware of the solution,
why? That's solving it at the wrong level, because any non-Python
program (such as a shell or an editor) gets no benefit from that.
If you mean that the single, one-point solution should work across all
programs, including the standard Python interpreter, then yes I agree.
I'm saying the OS is the right place to solve it, by installing an
appropriate input method (or whatever each OS calls them).
--
\ “In economics, hope and faith coexist with great scientific |
`\ pretension and also a deep desire for respectability.” —John |
_o__) Kenneth Galbraith, 1970-06-07 |
Ben Finney
> Ben Finney writes:
> > Right. So you've solved it for one program only, not the OS
>
> You seem to be under a misconception. Emacs *is* an OS […]
… all it needs is a good editor? :-)
(I'm claiming permission for that snark because Emacs is my primary
editor.)
> > I'm saying the OS is the right place to solve it, by installing an
> > appropriate input method (or whatever each OS calls them).
>
> I doubt very many people used to and fond of LaTeX would agree with
> you, since AFAIK there aren't any OSes providing TeX macros as an
> input method.
I've shown several LaTeX-comfortable people IBus on GNOME and/or KDE
(for GNU+Linux), and they were very glad that it has a LaTeX input
method. So anyone who is fond of LaTeX and has IBus or an equivalent
input method engine on their OS can agree.
> AFAICS it's not available on my Mac.
That's a shame. Maybe some OS vendors don't want to support users
extending the OS functionality? Or maybe your OS does have such a thing
available. I haven't been motivated to look for it.
> While I don't particularly favor it, it may be the best compromise, as
> many people are familiar with it, and many many symbols are available
> with familiar, intuitive names so that non-TeXnical typists can often
> guess them.
Agreed. Which is why I advocate installing such an input method in one's
OS input method engine, so that input method is available for all
applications.
--
\ “I thought I'd begin by reading a poem by Shakespeare, but then |
`\ I thought ‘Why should I? He never reads any of mine.’” —Spike |
_o__) Milligan |
Ben Finney