[Python-ideas] Visually confusable unicode characters in identifiers

Oscar Benjamin

Sep 30, 2012, 10:00:23 AM

to python-ideas

Having just discovered that PEP 3131 [1] enables me to use Greek letters to
represent variables in equations, it was pointed out to me that it also allows
visually confusable characters in identifiers [2].

When I previously read the PEP I thought that the normalisation process
resolved these issues but now I see that the PEP leaves it as an open problem.

I also previously thought that the PEP would be irrelevant if I was using
ASCII-only code but now I can see that if a GREEK CAPITAL LETTER ALPHA can
sneak into my code (just like those pesky tab characters) I could still have a
visually undetectable bug.

An example to show how an issue could arise:

"""
#!/usr/bin/env python3

code = '''
{0} = 123
{1} = 456
print('"{0}" == "{1}":', "{0}" == "{1}")
print('{0} == {1}:', {0} == {1})
'''

def test_identifier(identifier1, identifier2):
    exec(code.format(identifier1, identifier2))

test_identifier('\u212b', '\u00c5') # Different Angstrom code points
test_identifier('A', '\u0391') # LATIN/GREEK CAPITAL A/ALPHA
"""

When I run this I get:

$ ./test.py

"Å" == "Å": False

Å == Å: True

"A" == "Α": False

A == Α: False
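The two cases differ because PEP 3131 specifies that identifiers are converted to NFKC while parsing: the two Angstrom code points collapse into one identifier, whereas Latin A and Greek Alpha have no such relationship. A minimal sketch with the standard unicodedata module (the variable and function names are illustrative only):

```python
import unicodedata

angstrom_sign = '\u212b'   # ANGSTROM SIGN
a_with_ring = '\u00c5'     # LATIN CAPITAL LETTER A WITH RING ABOVE
latin_a = 'A'              # LATIN CAPITAL LETTER A
greek_alpha = '\u0391'     # GREEK CAPITAL LETTER ALPHA


def nfkc(text):
    """Return the NFKC form, the normalisation PEP 3131 applies to identifiers."""
    return unicodedata.normalize('NFKC', text)


# The strings differ as data, but normalise to the same identifier.
same_after_nfkc = nfkc(angstrom_sign) == nfkc(a_with_ring)

# Normalisation does nothing for characters that merely look alike.
still_distinct = nfkc(latin_a) != nfkc(greek_alpha)

print(same_after_nfkc, still_distinct)
```

Normalisation only merges code points that Unicode defines as equivalent; it is not a confusable check.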

Is the proposal mentioned in the PEP (to use something based on Unicode
Technical Standard #39 [3]) something that might be implemented at any point?

Oscar

References:
[1] http://www.python.org/dev/peps/pep-3131/#open-issues
[2] http://article.gmane.org/gmane.comp.python.tutor/78116
[3] http://unicode.org/reports/tr39/#Confusable_Detection

Steven D'Aprano

Sep 30, 2012, 11:10:18 AM

to python...@python.org

On 01/10/12 00:00, Oscar Benjamin wrote:
> Having just discovered that PEP 3131 [1] enables me to use greek letters to
> represent variables in equations, it was pointed out to me that it also
> allows visually confusable characters in identifiers [2].

You don't need PEP 3131 to have visually confusable identifiers.

MyObject = My0bject = "many fonts use the same glyph for O and 0"

rn = m = 23 # try reading this in Arial with a small font size

x += l

I don't think it's up to Python to protect you from arbitrarily poor choices
in identifiers and typefaces, or against obfuscated code (whether deliberately
so or by accident). Use of confusable identifiers is a code-quality issue,
little different from any other code-quality issue:

class myfunction:
    def __init__(a, b, c, d, e, f, g, h, i, j, k, l):
        a.b = b-e+k*h
        a.a = i + 1j*j
        a.l = ll + l1 + l
        a.somebodytoldmeishouldusemoredesccriptivevaraiblenames = g+d
        a.somebodytoldmeishouldusemoredesccribtivevaraiblenames = c+f

You surely wouldn't expect Python to protect you from ignorant or obnoxious
programmers who wrote code like that. I likewise don't think Python should
protect you from programmers who do things like this:

py> A = 42
py> Α = 23
py> A == Α
False

Besides, just because you and I can't distinguish A from Α in my editor,
using one particular choice of font, doesn't mean that the author or his
intended audience (Greek programmers perhaps?) can't distinguish them, using
their editor and a more suitable typeface. The two characters are distinct
using Courier or Lucida Typewriter, to mention only two.

> Is the proposal mentioned in the PEP (to use something based on Unicode
> Technical Standard #39 [3]) something that might be implemented at any
> point?

> [3] http://unicode.org/reports/tr39/#Confusable_Detection

I would welcome "confusable detection" in the standard library, possibly a
string method "skeleton" or some other interface to the Confusables file,
perhaps in unicodedata. And I would encourage code checkers like PyFlakes,
PyLint, PyChecker to check for confusable identifiers. But I do not believe
that this should be built into the Python language itself.
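A rough sketch of what such a "skeleton" helper might look like. UTS #39 describes the skeleton as roughly: convert to NFD, map each character through its confusables data, then apply NFD again. The table below is a tiny hand-picked stand-in for that data, and the names toy_skeleton and toy_confusable are hypothetical; nothing like this currently exists in unicodedata.

```python
import unicodedata

# Hypothetical, hand-picked subset for illustration only. A real
# implementation would load the full confusables data from UTS #39.
_TOY_CONFUSABLES = {
    '\u0391': 'A',  # GREEK CAPITAL LETTER ALPHA
    '\u0410': 'A',  # CYRILLIC CAPITAL LETTER A
    '\u039f': 'O',  # GREEK CAPITAL LETTER OMICRON
    '0': 'O',       # DIGIT ZERO
    '1': 'l',       # DIGIT ONE
}


def toy_skeleton(text):
    """Approximate the UTS #39 skeleton: NFD, map confusables, NFD again."""
    decomposed = unicodedata.normalize('NFD', text)
    mapped = ''.join(_TOY_CONFUSABLES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize('NFD', mapped)


def toy_confusable(first, second):
    """True if two different strings share the same toy skeleton."""
    return first != second and toy_skeleton(first) == toy_skeleton(second)


print(toy_confusable('A', '\u0391'))
print(toy_confusable('MyObject', 'My0bject'))
print(toy_confusable('A', 'B'))
```

A code checker could compare the skeletons of all identifiers in a module and report any pair that collides, leaving the language itself unchanged.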

--
Steven

Jim Jewett

Oct 1, 2012, 11:43:04 AM

to Steven D'Aprano, python...@python.org

On 9/30/12, Steven D'Aprano <st...@pearwood.info> wrote:
> On 01/10/12 00:00, Oscar Benjamin wrote:

> py> A = 42
> py> Α = 23
> py> A == Α
> False

It will never be possible to catch all confusables, which is one
reason that the unicode property stalled.

It seems like it would be reasonable to at least warn when identifiers
are not all in the same script -- but real-world examples from Emacs
Lisp made it clear that this is often intentional. There were still
clear word-boundaries, but it wasn't clear how that word-boundary
detection could be properly automated in the general case.
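As a rough illustration of the same-script warning discussed here: the standard unicodedata module does not expose the Unicode Script property, so this sketch falls back on the first word of each character's name as a crude script guess. That heuristic, and the function names guess_scripts and is_mixed_script, are assumptions made for the example, not an existing API.

```python
import unicodedata


def guess_scripts(identifier):
    """Guess the scripts in an identifier from Unicode character names.

    Crude heuristic: the first word of the character name ("LATIN",
    "GREEK", "CYRILLIC", ...) stands in for the real Script property.
    """
    scripts = set()
    for ch in identifier:
        if ch.isalpha():
            name = unicodedata.name(ch, '')
            if name:
                scripts.add(name.split()[0])
    return scripts


def is_mixed_script(identifier):
    """True if the identifier appears to mix letters from several scripts."""
    return len(guess_scripts(identifier)) > 1


print(is_mixed_script('alpha'))        # Latin letters only
print(is_mixed_script('p\u0430ypal'))  # contains CYRILLIC SMALL LETTER A
print(is_mixed_script('\u0394t'))      # Greek Delta plus Latin t
```

The last case is the kind of intentional mixing mentioned above: a name like Δt is flagged even though its author clearly meant it, which is why a plain same-script warning would be noisy.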

> Besides, just because you and I can't distinguish A from Α in my editor,
> using one particular choice of font, doesn't mean that the author or his
> intended audience (Greek programmers perhaps?) can't distinguish them,

In many cases, it does -- for the letters to look different requires
an unnatural font choice, though perhaps not so extreme as the
print-the-hex-code font.

> I would welcome "confusable detection" in the standard library, possibly a
> string method "skeleton" or some other interface to the Confusables file,
> perhaps in unicodedata.

I would too, and agree that it shouldn't be limited to identifiers.

-jJ

Mathias Panzenböck

Oct 1, 2012, 12:07:19 PM

to python...@python.org

I still don't understand why unicode characters are allowed at all in identifier names. Is the
reason for this written down somewhere?

Chris Angelico

Oct 1, 2012, 12:19:40 PM

to python...@python.org

On Tue, Oct 2, 2012 at 2:07 AM, Mathias Panzenböck
<grosser.me...@gmx.net> wrote:
> I still don't understand why unicode characters are allowed at all in
> identifier names. Is the reason for this written down somewhere?

Same reason you're allowed more than two letters in your identifiers:
to allow programmers to make variable names meaningful. The problem
isn't with Unicode, anyway; there are plenty of fonts in which l and 1
are practically identical, and unless your font is monospaced, you
probably will have trouble distinguishing __________rn___ from
__________m___ (just how many underscores IS that?). It's up to the
programmer to be smart about his names.

ChrisA

Robert Kern

Oct 1, 2012, 12:43:40 PM

to python...@python.org

On 10/1/12 5:07 PM, Mathias Panzenböck wrote:
> I still don't understand why unicode characters are allowed at all in identifier
> names. Is the reason for this written down somewhere?

http://www.python.org/dev/peps/pep-3131/#rationale

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Mathias Panzenböck

Oct 1, 2012, 1:02:07 PM

to python...@python.org

On 10/01/2012 06:43 PM, Robert Kern wrote:
> On 10/1/12 5:07 PM, Mathias Panzenböck wrote:
>> I still don't understand why unicode characters are allowed at all in identifier
>> names. Is the reason for this written down somewhere?
>
> http://www.python.org/dev/peps/pep-3131/#rationale
>

But the Python keywords and, more importantly, the documentation are in English. Don't you need to be able
to speak/write English in order to code Python anyway? And if you keep your code+comments English you
can access a much larger developer pool (all developers who speak English should by my hypothesis be
a superset of all developers who speak a certain language).

Massimo DiPierro

Oct 1, 2012, 1:18:31 PM

to Robert Kern, python...@python.org

The great thing about open source is that it has brought the world together. I am not an English speaker and I learned the meaning of IF, THEN, FOR, WHILE, not in the context of the English language, but as keywords of the Basic programming language. The fact that they are English words is accidental. The great thing about code is (used to be) that anybody can read and understand what others write.

When I used to program in Italy, I had to deal with latin-1 characters. This was never a problem. Not even in Cobol, Basic, Clipper, or Paradox because data should be separated from code. Data may contain latin-1 or unicode or whatever. Code always contains ASCII and if one does not mix the two there is never a problem.

Allowing unicode in variable names blurs this separation. It makes code written by people speaking one language unreadable by people speaking a different language.

I should point out that most of my students are Chinese. They do not have any problem with reading and writing code using the English alphabet.

Any one of us could design better power plugs for our homes. That does not mean it would be a good idea to do so.

Massimo

Guido van Rossum

Oct 1, 2012, 1:44:42 PM

to Mathias Panzenböck, python...@python.org

On Mon, Oct 1, 2012 at 10:02 AM, Mathias Panzenböck
<grosser.me...@gmx.net> wrote:
> On 10/01/2012 06:43 PM, Robert Kern wrote:
>>
>> On 10/1/12 5:07 PM, Mathias Panzenböck wrote:
>>>
>>> I still don't understand why unicode characters are allowed at all in
>>> identifier
>>> names. Is the reason for this written down somewhere?
>>
>>
>> http://www.python.org/dev/peps/pep-3131/#rationale
>>
>
> But the Python keywords and more importantly the documentation is English.
> Don't you need to be able to speak/write English in order to code Python
> anyway? And if you keep you code+comments English you can access a much
> larger developer pool (all developers who speak English should by my
> hypothesis be a superset of all developers who speak a certain language).

Hi Mathias,

Your objections go pretty much exactly along the lines of my original
resistance to this proposal (which was proposed many times before it
got to be a PEP). What finally made me change my mind was talking to
educators who were teaching Python in countries where not only English
is not the primary language, the primary language is not even related
to English. (E.g. Chinese or Japanese.)

Teaching the students the necessary language keywords and standard
library names is not that difficult; even if English *is* your primary
language you have to learn what they mean in the context of
programming. (Example: "print" comes from a very ancient mode of using
computers where the only form of output was through a physical
printer.)

But these students often have a very limited English vocabulary, and
their science and math classes (which are often useful starting points
for programming exercises) are usually taught in the native language.
So when teachers show students example programs it helps if they can
name e.g. their variables and functions in the native language.
Comments are also often written in the native language. Here, it
really helps if the students can type their native language directly
rather than having to use the Latin transcription (even if they often
also have to learn the latter, for unrelated pragmatic reasons).

From your name and email it sounds like your native language might be
German. Like me, you probably take pride in your English skills and
like me, you write all your code using English for identifiers and
comments. However, for students just learning to program and not yet
well-versed in English, that would be like trying to teach them
multiple things at once. It may work for the smartest students, but it
probably would be unnecessarily off-putting for many others.

As an example in German, I found a Python book aimed at middle- and
high-schoolers written in German, Python für Kids. You can look inside
it on the Amazon website:
http://www.amazon.com/Python-f%C3%BCr-Kids/dp/3826609514#reader_3826609514
-- the examples use German words for most module and variable names.
Luckily German limited to ASCII is still fairly readable ("fuer"
instead of "für" etc.), so Unicode is not strictly needed for this
case -- but you can understand that in languages whose native alphabet
is not English, Unicode is essential for the same style of
introduction.

I'm sure there are also examples beyond education -- e.g. in a program
for calculating Dutch taxes I would use the Dutch names for the
various technical terms naming concepts in Dutch tax law, and again,
in the case of the Dutch language that doesn't require Unicode, but
for many other languages it would.

I hope this helps. (Also note, as the PEP states explicitly, that the
Python standard library should use only ASCII and English for
identifiers and comments, except in those unittests that are
specifically testing the Unicode identifiers feature.)

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

Oct 1, 2012, 1:51:54 PM

to Massimo DiPierro, Robert Kern, python...@python.org

On Mon, Oct 1, 2012 at 10:18 AM, Massimo DiPierro
<massimo....@gmail.com> wrote:
> The great thing about open source is that is has brought the world together. I am not an english speaker and I learned the meaning of IF, THEN, FOR, WHILE, not in the context of the English language, but as keywords of the Basic programming language. The fact that they are english words has is accidental. The great thing about code is (used to be) that anybody can read and understand what others write.
>
> When I used program in Italy, I had to deal with latin-1 characters. This was never a problem. Not even in Cobol, Basic, Clipper, or Paradox because data should be separated from code. Data may contain latin-1 or unicode or whatever. Code always contains ASCII and if one does not mix the two there is never a problem.
>
> Allowing unicode in variable names blurs this separation. It makes code written people speaking one language unreadable by people speaking a different language.
>
> I should point out that most of my students are Chinese. They do not have any problem with reading and writing code using the english alphabet.
>
> Any one of us could design better power plugs for our homes. That does not mean it would be a good idea to do so.

Our posts crossed. I hope my explanation makes sense to you. The age /
grade level of students probably matters; all classes in middle or
high school are typically taught in the native language, but in
University more and more courses are taught in English (some European
countries are even making English the mandatory teaching language at
the University level).

Not everything you design is meant to be a better power plug for the
world. Sometimes you just need to find a way to fit *your* oven in
*your* cabinet, and cutting up some planks in a way that wouldn't work
for anyone else is fine.

--
--Guido van Rossum (python.org/~guido)

Georg Brandl

Oct 1, 2012, 1:48:44 PM

to python...@python.org

On 10/01/2012 07:02 PM, Mathias Panzenböck wrote:
> On 10/01/2012 06:43 PM, Robert Kern wrote:
>> On 10/1/12 5:07 PM, Mathias Panzenböck wrote:
>>> I still don't understand why unicode characters are allowed at all in identifier
>>> names. Is the reason for this written down somewhere?
>>
>> http://www.python.org/dev/peps/pep-3131/#rationale
>>
>
> But the Python keywords and more importantly the documentation is English. Don't you need to be able
> to speak/write English in order to code Python anyway? And if you keep you code+comments English you
> can access a much larger developer pool (all developers who speak English should by my hypothesis be
> a superset of all developers who speak a certain language).

Please; the PEP has been discussed quite a lot when it was proposed,
and believe me, yours is not an unfamiliar argument :) You're about
5 years late.

Georg

Antoine Pitrou

Oct 1, 2012, 2:04:06 PM

to python...@python.org

On Mon, 1 Oct 2012 10:44:42 -0700
Guido van Rossum <gu...@python.org> wrote:
>
> As an example in German, I found a Python book aimed at middle- and
> high-schoolers written in German, Python für Kids. You can look inside
> it on the Amazon website:
> http://www.amazon.com/Python-f%C3%BCr-Kids/dp/3826609514#reader_3826609514

Oh but why isn't it named Python für Kinder? :-)

Regards

Antoine.

--
Software development and contracting: http://pro.pitrou.net

Guido van Rossum

Oct 1, 2012, 2:10:32 PM

to Antoine Pitrou, python...@python.org

On Mon, Oct 1, 2012 at 11:04 AM, Antoine Pitrou <soli...@pitrou.net> wrote:
> On Mon, 1 Oct 2012 10:44:42 -0700
> Guido van Rossum <gu...@python.org> wrote:
>>
>> As an example in German, I found a Python book aimed at middle- and
>> high-schoolers written in German, Python für Kids. You can look inside
>> it on the Amazon website:
>> http://www.amazon.com/Python-f%C3%BCr-Kids/dp/3826609514#reader_3826609514
>
> Oh but why isn't it named Python für Kinder? :-)

Probably to be "cool" for the "kids". Why is a mobile phone in Germany
called a "Handy" ?

--
--Guido van Rossum (python.org/~guido)

Jakob Bowyer

Oct 1, 2012, 2:12:41 PM

to Guido van Rossum, Antoine Pitrou, python...@python.org

Because it fits in your hand? And it's handy? :)

Terry Reedy

Oct 1, 2012, 2:21:05 PM

to python...@python.org

On 10/1/2012 1:02 PM, Mathias Panzenböck wrote:
> On 10/01/2012 06:43 PM, Robert Kern wrote:
>> On 10/1/12 5:07 PM, Mathias Panzenböck wrote:
>>> I still don't understand why unicode characters are allowed at all in
>>> identifier
>>> names. Is the reason for this written down somewhere?
>>
>> http://www.python.org/dev/peps/pep-3131/#rationale

I have the impression that latin-1 chars were/are (unofficially)
accepted in Python2.

> But the Python keywords and more importantly the documentation is
> English.

I know of at least one translation
http://docs.python.org.ar/tutorial/contenido.html
though keeping up with changes is obviously a problem.

There are multiple books in multiple languages. When I went to a
bookstore in Japan, the programming languages section had about 8 for
Python. I suspect that is more than most equivalent US bookstores.

--
Terry Jan Reedy

Massimo DiPierro

Oct 1, 2012, 3:29:46 PM

to Guido van Rossum, Robert Kern, python...@python.org

Hello Guido,

it does make sense. The only point I tried to make is that, because something is allowed, it does not mean it should be encouraged.
I am sure there are instructors who want to teach to code using Japanese or Chinese variable names. Python gives them a way to do so.
Yet, if they do so, they would be isolating their students and their code from the rest of the world.

Massimo

Mathias Panzenböck

Oct 1, 2012, 3:33:13 PM

to python...@python.org

On 10/01/2012 07:48 PM, Georg Brandl wrote:
> On 10/01/2012 07:02 PM, Mathias Panzenböck wrote:
>> On 10/01/2012 06:43 PM, Robert Kern wrote:
>>> On 10/1/12 5:07 PM, Mathias Panzenböck wrote:
>>>> I still don't understand why unicode characters are allowed at all in identifier
>>>> names. Is the reason for this written down somewhere?
>>>
>>> http://www.python.org/dev/peps/pep-3131/#rationale
>>>
>>
>> But the Python keywords and more importantly the documentation is English. Don't you need to be able
>> to speak/write English in order to code Python anyway? And if you keep you code+comments English you
>> can access a much larger developer pool (all developers who speak English should by my hypothesis be
>> a superset of all developers who speak a certain language).
>
> Please; the PEP has been discussed quite a lot when it was proposed,
> and believe me, yours is not an unfamiliar argument :) You're about
> 5 years late.
>
> Georg
>

I didn't want to start a discussion. I just wanted to know why one would implement such a language
feature. Guido's answer cleared it up for me, thanks. I can see the purpose in an educational
setting (not in production code of anything a little bit bigger).

-panzi

Nick Coghlan

Oct 1, 2012, 3:37:24 PM

to Massimo DiPierro, Robert Kern, python...@python.org

On Tue, Oct 2, 2012 at 12:59 AM, Massimo DiPierro
<massimo....@gmail.com> wrote:
> Hello Guido,
>
> it does make sense. The only point I tried to make is that, because something is allowed, it does mean it should be encouraged.
> I am sure there are instructors who want to teach to code using Japanese of Chinese variable names. Python gives them a way to do so.
> Yet, if they do so, they would be isolating their students and their code from the rest of the world.

Only if they *stop* there. The idea is just to allow the learning
curve to be made gentler - as people learn the standard library and
the tools on PyPI, then yes, it will still be necessary to continue
learning English in order to make use of those tools (especially as
many of them won't have translated documentation).

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Georg Brandl

Oct 1, 2012, 4:03:21 PM

to python...@python.org

On 10/01/2012 09:33 PM, Mathias Panzenböck wrote:
> On 10/01/2012 07:48 PM, Georg Brandl wrote:
>> On 10/01/2012 07:02 PM, Mathias Panzenböck wrote:
>>> On 10/01/2012 06:43 PM, Robert Kern wrote:
>>>> On 10/1/12 5:07 PM, Mathias Panzenböck wrote:
>>>>> I still don't understand why unicode characters are allowed at all in identifier
>>>>> names. Is the reason for this written down somewhere?
>>>>
>>>> http://www.python.org/dev/peps/pep-3131/#rationale
>>>>
>>>
>>> But the Python keywords and more importantly the documentation is English. Don't you need to be able
>>> to speak/write English in order to code Python anyway? And if you keep you code+comments English you
>>> can access a much larger developer pool (all developers who speak English should by my hypothesis be
>>> a superset of all developers who speak a certain language).
>>
>> Please; the PEP has been discussed quite a lot when it was proposed,
>> and believe me, yours is not an unfamiliar argument :) You're about
>> 5 years late.
>>
>> Georg
>>
>
> I didn't want to start a discussion. I just wanted to know why one would implement such a language
> feature.

Well, in that case I would have said "read the PEP": I think it's well
explained there.

Georg

Georg Brandl

Oct 1, 2012, 4:04:51 PM

to python...@python.org

On 10/01/2012 08:10 PM, Guido van Rossum wrote:
> On Mon, Oct 1, 2012 at 11:04 AM, Antoine Pitrou <soli...@pitrou.net> wrote:
>> On Mon, 1 Oct 2012 10:44:42 -0700
>> Guido van Rossum <gu...@python.org> wrote:
>>>
>>> As an example in German, I found a Python book aimed at middle- and
>>> high-schoolers written in German, Python für Kids. You can look inside
>>> it on the Amazon website:
>>> http://www.amazon.com/Python-f%C3%BCr-Kids/dp/3826609514#reader_3826609514
>>
>> Oh but why isn't it named Python für Kinder? :-)
>
> Probably to be "cool" for the "kids". Why is a mobile phone in Germany
> called a "Handy" ?

And why, oh why, do we have to buy our bread rolls at a "Backshop" nowadays...

Georg

Oscar Benjamin

Oct 1, 2012, 4:26:07 PM

to Mathias Panzenböck, python...@python.org

On 1 October 2012 20:33, Mathias Panzenböck
<grosser.me...@gmx.net> wrote:
>
> On 10/01/2012 07:48 PM, Georg Brandl wrote:
>>
>> On 10/01/2012 07:02 PM, Mathias Panzenböck wrote:
>>>
>>> On 10/01/2012 06:43 PM, Robert Kern wrote:
>>>>
>>>> On 10/1/12 5:07 PM, Mathias Panzenböck wrote:
>>>>>
>>>>> I still don't understand why unicode characters are allowed at all in identifier
>>>>> names. Is the reason for this written down somewhere?
>>>>
>>>>
>>>> http://www.python.org/dev/peps/pep-3131/#rationale
>>>>
>>>
>>> But the Python keywords and more importantly the documentation is English. Don't you need to be able
>>> to speak/write English in order to code Python anyway? And if you keep you code+comments English you
>>> can access a much larger developer pool (all developers who speak English should by my hypothesis be
>>> a superset of all developers who speak a certain language).
>>
>>
>> Please; the PEP has been discussed quite a lot when it was proposed,
>> and believe me, yours is not an unfamiliar argument :) You're about
>> 5 years late.
>>
>> Georg
>>
>
> I didn't want to start a discussion. I just wanted to know why one would implement such a language feature. Guido's answer cleared it up for me, thanks. I can see the purpose in an educational setting (not in production code of anything a little bit bigger).

Non-ascii identifiers have other possible uses. I'll repost the case
that started this discussion on python-tutor (attached in case it
doesn't display):

'''
#!/usr/bin/env python3
# -*- encoding: utf-8 -*-

# Parameters
α = 1
β = 0.1
γ = 1.5
δ = 0.075

# Initial conditions
xₒ = 10
yₒ = 5
Zₒ = xₒ, yₒ

# Solution parameters
tₒ = 0
Δt = 0.001
T = 10

# Lotka-Volterra derivative
def f(Z, t):
    x, y = Z
    ẋ = x * (α - β*y)
    ẏ = -y * (γ - δ*x)
    return ẋ, ẏ

# Accumulate results from Euler stepper
tᵢ = tₒ
Zᵢ = Zₒ
Zₜ, t = [], []
while tᵢ <= tₒ + T:
    Zₜ.append(Zᵢ)
    t.append(tᵢ)
    Zᵢ = [Zᵢⱼ+ Δt*Żᵢⱼ for Zᵢⱼ, Żᵢⱼ in zip(Zᵢ, f(Zᵢ, tᵢ))]
    tᵢ += Δt

# Output since I don't have plotting libraries in Python 3
print('t', 'x', 'y')
for tᵢ, (xᵢ, yᵢ) in zip(t, Zₜ):
    print(tᵢ, xᵢ, yᵢ)
'''
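One caveat with the subscripted names in this script: because identifiers are NFKC-normalised while parsing, a subscript letter is folded into its plain counterpart, so a name ending in a subscript "o" becomes the same identifier as one ending in an ordinary "o". A small sketch of the effect follows; the sample names are taken from the script above, while the variable names in the sketch are illustrative.

```python
import unicodedata

# A few identifiers from the script, written with escapes for clarity:
# x + SUBSCRIPT O, Z + SUBSCRIPT I, GREEK DELTA + t, x WITH DOT ABOVE.
sample_names = ['x\u2092', 'Z\u1d62', '\u0394t', '\u1e8b']

normalized_names = {
    name: unicodedata.normalize('NFKC', name) for name in sample_names
}

for name, normalized in normalized_names.items():
    # ascii() shows the escapes so the difference is visible in any font.
    print(ascii(name), name.isidentifier(), ascii(normalized))

# Assigning to the subscripted name actually binds the plain ASCII name.
namespace = {}
exec('x\u2092 = 10', namespace)
collides_with_ascii = 'xo' in namespace
print(collides_with_ascii)
```

So a subscripted name and its plain-letter spelling cannot be used as two separate variables in the same scope.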

Oscar

lv.py

Guido van Rossum

Oct 1, 2012, 4:51:34 PM

to Oscar Benjamin, python...@python.org

Those examples would be a lot more compelling if there was an
acceptable way to input those characters. Maybe we could support some
kind of input method that enabled LaTeX style math notation as used by
scientists for writing equations in papers?

--
--Guido van Rossum (python.org/~guido)

Andre Roberge

Oct 1, 2012, 4:55:30 PM

to python...@python.org

On Mon, Oct 1, 2012 at 5:51 PM, Guido van Rossum <gu...@python.org> wrote:

On Mon, Oct 1, 2012 at 1:26 PM, Oscar Benjamin

SNIP

> Non-ascii identifiers have other possible uses. I'll repost the case
> that started this discussion on python-tutor (attached in case it
> doesn't display):
>
> '''
> #!/usr/bin/env python3
> # -*- encoding: utf-8 -*-
>
> # Parameters
> α = 1
> β = 0.1
> γ = 1.5
> δ = 0.075
>
> # Initial conditions
> xₒ = 10
> yₒ = 5
> Zₒ = xₒ, yₒ
>

SNIP

> Those examples would be a lot more compelling if there was an
> acceptable way to input those characters. Maybe we could support some
> kind of input method that enabled LaTeX style math notation as used by
> scientists for writing equations in papers?

+1000

André Roberge

Oscar Benjamin

Oct 1, 2012, 5:46:50 PM

to Guido van Rossum, python...@python.org

On 1 October 2012 21:51, Guido van Rossum <gu...@python.org> wrote:
> On Mon, Oct 1, 2012 at 1:26 PM, Oscar Benjamin
> <oscar.j....@gmail.com> wrote:
>> # Parameters
>> α = 1
>> β = 0.1
>> γ = 1.5
>> δ = 0.075
>>
>> # Initial conditions
>> xₒ = 10
>> yₒ = 5
>> Zₒ = xₒ, yₒ

>

> Those examples would be a lot more compelling if there was an
> acceptable way to input those characters. Maybe we could support some
> kind of input method that enabled LaTeX style math notation as used by
> scientists for writing equations in papers?

Sympy already has a few of the basic TeX concepts. I imagine that something like Sympy notebooks (a browser-based interface) might one day gain support for this. A readline-ish method to do it would be a great extension to isympy (since it already works for output):

$ isympy
IPython console for SymPy 0.7.1.rc1 (Python 2.7.3-64-bit) (ground types: python)

In [1]: Symbol('beta')
Out[1]: β

In [2]: Symbol('c_1')
Out[2]: c₁

Oscar

Georg Brandl

Oct 1, 2012, 5:54:04 PM

to python...@python.org

On 10/01/2012 10:51 PM, Guido van Rossum wrote:
> On Mon, Oct 1, 2012 at 1:26 PM, Oscar Benjamin

>> Non-ascii identifiers have other possible uses. I'll repost the case
>> that started this discussion on python-tutor (attached in case it
>> doesn't display):

Very nice!

With the right editor, of course, it's not a problem :)

(Emacs has a TeX input method with which I could type this example without
problems.)

Georg

Matthew Woodcraft

Oct 1, 2012, 6:28:09 PM

to python...@python.org

On 2012-10-01 21:51, Guido van Rossum wrote:
> Those examples would be a lot more compelling if there was an
> acceptable way to input those characters. Maybe we could support some
> kind of input method that enabled LaTeX style math notation as used by
> scientists for writing equations in papers?

I think that's up to the OS or the text editor.

In Emacs, this works:
M-x set-input-method tex

-M-

Greg Ewing

Oct 1, 2012, 7:24:27 PM

to python...@python.org

Antoine Pitrou wrote:

> Oh but why isn't it named Python für Kinder? :-)

It looks like Germans have adopted "kid" as an abbreviation
for "kinder", just like we use it as an abbreviation for
"child". Or maybe we got it from them -- it's closer to
their original word than ours!

They seem to be using our plural, though -- "kids", not
"kidden"...

--
Greg

Mathias Panzenböck

Oct 1, 2012, 8:06:35 PM

to python...@python.org

On 10/02/2012 01:24 AM, Greg Ewing wrote:
> Antoine Pitrou wrote:
>
>> Oh but why isn't it named Python für Kinder? :-)
>
> It looks like Germans have adopted "kid" as an abbreviation
> for "kinder", just like we use it as an abbreviation for
> "child". Or maybe we got it from them -- it's closer to
> their original word than ours!
>
> They seem to be using our plural, though -- "kids", not
> "kidden"...
>

Sometimes we use the ...s for plural as well, especially for acronyms, words of English or French
origin and last names. But it would not be ...en, maybe ...er. Is there any German word that uses
...en for plural? I don't think so. Anyway, "kids" is definitely an anglicism, because we pronounce
it "English" and not like it would be pronounced if it were derived from "Kind" (it would be more
like "keed"). German today is full of anglicisms.

But then, there are some German words used by English people as well: gesundheit, kindergarten,
über, blitz(krieg), angst (used as something different than the German word), abseiling ("abseilen" in
German), doppelgänger, gestalt, poltergeist, Zeitgeist...

Steven D'Aprano

Oct 1, 2012, 9:32:06 PM

to python...@python.org

On 02/10/12 05:29, Massimo DiPierro wrote:

> it does make sense. The only point I tried to make is that,
> because something is allowed, it does mean it should be
> encouraged. I am sure there are instructors who want to teach
>to code using Japanese of Chinese variable names. Python gives
> them a way to do so. Yet, if they do so, they would be
>isolating their students and their code from the rest of the
>world.

People very often over-estimate the cost of that isolation, and
over-value access to the rest of the world.

The average open source piece of software has one, maybe two,
contributors. What do they care if millions of English-speaking
programmers can't contribute when they weren't going to contribute
regardless of the language? Perhaps the convenience of being able
to read your own code in your own native language outweighs the
loss of being able to attract contributors that you can't even
talk to.

And for proprietary software, again it is irrelevant. If a Chinese
company writes Chinese software for Chinese users with Chinese
developers, why would they want to write it in English? Perhaps
they have little choice due to the overwhelming trend towards English
in programming languages, but there's no positive benefit to using
a non-native language.

Quite frankly, and I'm saying this as somebody who only speaks
English, I think that the use of English as the single lingua franca
of computer programming is as unnecessary (and ultimately as harmful)
as the use of Latin and then French as the sole lingua franca of
science and mathematics. I expect that it too will be a passing phase.

By the way, are you familiar with ChinesePython and IronPerunis?

http://www.chinesepython.org/english/english.html
http://ironperunis.codeplex.com/

--
Steven

Stephen J. Turnbull

Oct 1, 2012, 11:48:07 PM

to Mathias Panzenböck, python...@python.org

Mathias Panzenböck writes:

> I still don't understand why unicode characters are allowed at all
> in identifier names.

"Consenting adults." 'nuff said?

An anecdote. Back when I was first learning Japanese, I maintained an
Emacs interface to EDICT, a free Japanese-English dictionary. The
code was smart enough to parse morphosyntax (inflection of verbs and
adjectives) into dictionary forms, but I wasn't (and according to my
daughter, still am not<wink/>). So I asked my tutor for help.

Although a total non-programmer, he was able to read the grammar
easily because the state names (identifiers for callable objects) were
written in Japanese, using the standard grammatical name for the
inflection. The "easy" part comes in because although his English was
good, it wasn't good enough to disentangle Lisp gobbledygook from the
morphosyntax data had it been written in ASCII. But he was able to
read and comment on the whole grammar in about half an hour because he
could just skip *all* the ASCII!

Stephen J. Turnbull

Oct 2, 2012, 12:11:58 AM

to Guido van Rossum, python...@python.org

Guido van Rossum writes:

> Those examples would be a lot more compelling if there was an
> acceptable way to input those characters.

Hey!! What's "unacceptable" about Emacs??<duck/>

> Maybe we could support some kind of input method that enabled LaTeX
> style math notation as used by scientists for writing equations in
> papers?

If you're talking about interactive use, Emacs has a method based on
searching the Unicode character database.

LaTeX math notation has a number of potential pitfalls. In
particular, the sub-/superscript notation can be applied to anything,
not just characters that happen to have *script versions in Unicode.
Also, not everything that seems to be a character in LaTeX necessarily
has a corresponding Unicode character.
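
The *script limitation can be checked against the Unicode character database that ships with Python. The following is only a rough sketch: the helper names `build_superscript_table` and `superscript` are made up for illustration, and the approach (scanning for single-character `<super>` compatibility decompositions) is one assumption about what counts as a "superscript version" of a character.

```python
import sys
import unicodedata


def build_superscript_table():
    """Map each base character to the characters that decompose to it as <super>."""
    table = {}
    for cp in range(sys.maxunicode + 1):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Compatibility decompositions look like "<super> 0032".
        # Multi-character ones (e.g. the trade mark sign) are skipped.
        if len(parts) == 2 and parts[0] == "<super>":
            base = chr(int(parts[1], 16))
            table.setdefault(base, []).append(chr(cp))
    return table


SUPERSCRIPTS = build_superscript_table()


def superscript(ch):
    """Return one Unicode superscript form of ch, or None if there is none."""
    forms = SUPERSCRIPTS.get(ch)
    return forms[0] if forms else None


# Looking a character up by its database name, similar in spirit to the
# Emacs input method mentioned above.
alpha = unicodedata.lookup("GREEK SMALL LETTER ALPHA")

print(superscript("2"))                 # a superscript digit exists
print(superscript("\N{INTEGRAL}"))      # None: no superscript integral sign
```

LaTeX can typeset `x^{\int}` without complaint, whereas a plain-text identifier has no way to express it.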

Ben Finney

Oct 2, 2012, 12:25:40 AM

to python...@python.org

Matthew Woodcraft <mat...@woodcraft.me.uk>
writes:

> On 2012-10-01 21:51, Guido van Rossum wrote:
> > Those examples would be a lot more compelling if there was an
> > acceptable way to input those characters. Maybe we could support
> > some kind of input method that enabled LaTeX style math notation as
> > used by scientists for writing equations in papers?
>
> I think that's up to the OS or the text editor.

Agreed. Many of these identifiers will need to be typed at an OS command
line, after all (e.g. for naming a test case to run, as one which
springs easily to mind).

Solve the keyboard input problem in the OS layer – as someone who
anticipates working with non-ASCII characters must already do – and you
solve it for Python code as well. I don't think it's Python's business
to get involved at the input method level.

--
\ “The apparent lesson of the Inquisition is that insistence on |
`\ uniformity of belief is fatal to intellectual, moral, and |
_o__) spiritual health.” —_The Uses Of The Past_, Herbert J. Muller |
Ben Finney

Greg Ewing

Oct 2, 2012, 1:09:14 AM

to python...@python.org

Mathias Panzenböck wrote:
> But it would not be
> ...en, maybe ...er. Is there any German word that uses ...en for plural?
> I don't think so.

This page seems to think that some do:

http://german.about.com/od/grammar/a/PluralNounsWithnENEndings.htm

--
Greg

Stephen J. Turnbull

Oct 2, 2012, 4:04:55 AM

to Ben Finney, python...@python.org

Ben Finney writes:

> Solve the keyboard input problem in the OS layer – as someone who
> anticipates working with non-ASCII characters must already do – and you
> solve it for Python code as well.

That simply isn't true for symbol characters and Greek letters. I
still let either TeX or XEmacs translate TeX macros for me. I don't
even know how to type an integral sign in Mac OS X Terminal
(conveniently, that is -- of course there's always the character
palette), and if I wanted directed quotation marks (I don't), I'd just
use ASCII quotes and let XEmacs translate those, too.

There ought to be a standard way to get those symbols and punctuation,
preferably ASCII-based, on any terminal, using the standard Python
interpreter.
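
To make the idea concrete, here is a minimal sketch of what an ASCII-based translation step might look like. Nothing like this exists in the standard interpreter; the `TEX_MACROS` table and the `expand_tex_macros` function are hypothetical names invented for this illustration, and the table is deliberately tiny.

```python
import re

# Hypothetical macro table; a real input method would need far more entries.
TEX_MACROS = {
    "alpha": "\N{GREEK SMALL LETTER ALPHA}",
    "beta": "\N{GREEK SMALL LETTER BETA}",
    "int": "\N{INTEGRAL}",
    "sum": "\N{N-ARY SUMMATION}",
}

# A backslash followed by a run of ASCII letters, as in TeX control words.
_MACRO_RE = re.compile(r"\\([A-Za-z]+)")


def expand_tex_macros(text):
    """Replace known TeX-style macros with Unicode; leave unknown ones untouched."""
    return _MACRO_RE.sub(
        lambda m: TEX_MACROS.get(m.group(1), m.group(0)), text
    )


print(expand_tex_macros(r"\alpha = \int f(x) dx"))
print(expand_tex_macros(r"\unknown stays as typed"))
```

Unknown macros are passed through unchanged, so a typo stays visible instead of silently disappearing.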

Serhiy Storchaka

Oct 2, 2012, 6:43:07 AM

to python...@python.org

On 01.10.12 23:51, Guido van Rossum wrote:
> Those examples would be a lot more compelling if there was an
> acceptable way to input those characters. Maybe we could support some
> kind of input method that enabled LaTeX style math notation as used by
> scientists for writing equations in papers?

\u03B1

Java already allows this outside of the string literals. And it
sometimes causes unpleasant effects.
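
For comparison, Python currently recognises `\u` escapes only inside string literals. A small check, using a helper named `compiles` that is made up for this sketch:

```python
def compiles(source):
    """Return True if source is syntactically valid Python."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# The six ASCII characters \u03B1 used as a name: outside a string literal
# the backslash is not treated as an escape, so this is a syntax error.
print(compiles("\\u03B1 = 1"))

# The actual character GREEK SMALL LETTER ALPHA is a valid identifier.
print(compiles("\u03b1 = 1"))

# Inside a string literal the escape is processed as usual.
namespace = {}
exec('x = "\\u03B1"', namespace)
print(namespace["x"] == "\N{GREEK SMALL LETTER ALPHA}")
```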

Ben Finney

Oct 2, 2012, 7:39:12 AM

to python...@python.org

"Stephen J. Turnbull" <ste...@xemacs.org>
writes:

> I still let either TeX or XEmacs translate TeX macros for me. I don't
> even know how to type an integral sign in Mac OS X Terminal
> (conveniently, that is -- of course there's always the character
> palette), and if I wanted directed quotation marks (I don't), I'd just
> use ASCII quotes and let XEmacs translate those, too.

Right. So you've solved it for one program only, not the OS which is (or
should be) responsible for turning what you type into characters,
uniformly across all applications you have keyboard input for.

> There ought to be a standard way to get those symbols and punctuation,
> preferably ASCII-based, on any terminal

Definitely agreed with this. Indeed, it's my point: the problem should
be solved in one place for the user of the computer, not separately per
application or framework.

> using the standard Python interpreter.

If you mean that the Python interpreter should be aware of the solution,
why? That's solving it at the wrong level, because any non-Python
program (such as a shell or an editor) gets no benefit from that.

If you mean that the single, one-point solution should work across all
programs, including the standard Python interpreter, then yes I agree.

I'm saying the OS is the right place to solve it, by installing an
appropriate input method (or whatever each OS calls them).

--
\ “In economics, hope and faith coexist with great scientific |
`\ pretension and also a deep desire for respectability.” —John |
_o__) Kenneth Galbraith, 1970-06-07 |
Ben Finney

Stephen J. Turnbull

Oct 3, 2012, 1:31:46 AM

to Ben Finney, python...@python.org

Ben Finney writes:

> "Stephen J. Turnbull" <ste...@xemacs.org>
> writes:
>
> > I still let either TeX or XEmacs translate TeX macros for me. I don't
> > even know how to type an integral sign in Mac OS X Terminal
> > (conveniently, that is -- of course there's always the character
> > palette), and if I wanted directed quotation marks (I don't), I'd just
> > use ASCII quotes and let XEmacs translate those, too.
>
> Right. So you've solved it for one program only, not the OS

You seem to be under a misconception. Emacs *is* an OS, it just runs
on top of the more primitive OSes normally associated with the term. ;-)

> I'm saying the OS is the right place to solve it, by installing an
> appropriate input method (or whatever each OS calls them).

I doubt very many people used to and fond of LaTeX would agree with
you, since AFAIK there aren't any OSes providing TeX macros as an
input method. AFAICS it's not available on my Mac.

While I don't particularly favor it, it may be the best compromise, as
many people are familiar with it, and many many symbols are available
with familiar, intuitive names so that non-TeXnical typists can often
guess them.

Ben Finney

Oct 3, 2012, 6:20:34 PM

to python...@python.org

"Stephen J. Turnbull" <ste...@xemacs.org>
writes:

> Ben Finney writes:
> > Right. So you've solved it for one program only, not the OS
>
> You seem to be under a misconception. Emacs *is* an OS […]

… all it needs is a good editor? :-)

(I'm claiming permission for that snark because Emacs is my primary
editor.)

> > I'm saying the OS is the right place to solve it, by installing an
> > appropriate input method (or whatever each OS calls them).
>
> I doubt very many people used to and fond of LaTeX would agree with
> you, since AFAIK there aren't any OSes providing TeX macros as an
> input method.

I've shown several LaTeX-comfortable people IBus on GNOME and/or KDE
(for GNU+Linux), and they were very glad that it has a LaTeX input
method. So anyone who is fond of LaTeX and has IBus or an equivalent
input method engine on their OS can agree.

> AFAICS it's not available on my Mac.

That's a shame. Maybe some OS vendors don't want to support users
extending the OS functionality? Or maybe your OS does have such a thing
available. I haven't been motivated to look for it.

> While I don't particularly favor it, it may be the best compromise, as
> many people are familiar with it, and many many symbols are available
> with familiar, intuitive names so that non-TeXnical typists can often
> guess them.

Agreed. Which is why I advocate installing such an input method in one's
OS input method engine, so that input method is available for all
applications.

--
\ “I thought I'd begin by reading a poem by Shakespeare, but then |
`\ I thought ‘Why should I? He never reads any of mine.’” —Spike |
_o__) Milligan |
Ben Finney

Stephen J. Turnbull

Oct 4, 2012, 11:11:34 PM

to Ben Finney, python...@python.org

Ben Finney writes:

> I've shown several LaTeX-comfortable people IBus on GNOME and/or KDE
> (for GNU+Linux), and they were very glad that it has a LaTeX input
> method.

I'm happy to be proved wrong!

> > AFAICS it's not available on my Mac.
>
> That's a shame. Maybe some OS vendors don't want to support users
> extending the OS functionality? Or maybe your OS does have such a thing
> available. I haven't been motivated to look for it.

I have looked for it; if it's available on Mac OS X, it's not easy to
find. I suspect the same is true for Windows.

> Agreed. Which is why I advocate installing such an input method in one's
> OS input method engine, so that input method is available for all
> applications.

Whatever makes you think I don't? That's *exactly* why I live in
XEmacs, because it provides me with a portable environment for mixing
English and math with a language whose orthography puts Brainf*ck
syntax to shame.

But pragmatically speaking, Unicode support is a sore point for
Python. "Screw you if you don't know how to conveniently input
integral signs on your OS" is not a message we want to be sending.
