For better or worse, apart from string literals, comments and identifiers, Python is pure ASCII, which simplifies everything. λ (lambda) is not in the first 128 code points of Unicode, so it is highly unlikely to be accepted.
· ‘Lambda’ is exactly as discouraging to type as it needs to be. A more likely to be accepted alternate keyword is ‘whyareyounotusingdef’
· Python doesn’t attempt to look like mathematical formula
· ‘Lambda’ spelling is intuitive to most people who program
· TIMTOWTDI isn’t a religious edict. Python is more pragmatic than that.
· It’s hard to type in ALL editors unless your locale is set to (ancient?) Greek.
· … What are you doing to have an identifier outside of ‘[A-Za-z_][A-Za-z0-9_]*’?
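(A side note, since that last question invites it: Python 3 already accepts identifiers well outside that ASCII range; it is the keywords and operators that remain ASCII-only. A quick self-contained illustration:)

```python
# Python 3 identifiers may contain non-ASCII letters (PEP 3131);
# keywords and operators, by contrast, remain ASCII-only.
λ = 42                          # a legal Python 3 variable name
assert "λ".isidentifier()       # True: λ is a Unicode letter
assert not "->".isidentifier()  # punctuation cannot form identifiers
print(λ)
```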
> Here is my speculative language idea for Python:
Thank you for raising it.
> Allow the following alternative spelling of the keyword `lambda':
> λ
> […]
> Therefore I would really like this to be an official part of the
> Python syntax.
>
> I know people have been clamoring for shorter lambda-syntax in the
> past, I think this is a nice minimal extension.
The question is not whether it would be nice, nor how many people have
clamoured for it. That you feel this supports the proposal isn't a good
sign :-)
The question is: What significant improvements to the language are made
by this proposal, to counteract the significant cost of *any* such
change to the language syntax?
> Advantages:
>
> * The lambda keyword is quite long and distracts from the "meat" of
> the lambda expression. Replacing it by a single-character keyword
> improves readability.
I disagree on this point. Making lambda easier is an attractive
nuisance; the ‘def’ statement is superior (for Python) in most
situations, and so lambda expressions should not be easier than that.
> * The resulting code resembles more closely mathematical notation (in
> particular, lambda-calculus notation), so it brings Python closer to
> being "executable pseudo-code".
How is that an advantage?
> * The alternative spelling λ/lambda is quite intuitive (at least to
> anybody who knows Greek letters.)
I reject this use of “intuitive”: no one knows intuitively what lambda
is, what λ is, what the correspondence between them is, or what they
mean in various contexts. All of that needs to be learned, specifically.
So if this is an advantage, it needs to be expressed somehow other than
“intuitive”. Maybe you mean “familiar”, and avoid that term because it
makes for a weaker argument?
> Disadvantages:
I agree with your assessment of disadvantages, and re-iterate the
inherent disadvantage that any language change brings significant cost
to the Python core developers and the whole Python community. That's why
most such suggestions have a significant hurdle to demonstrate a
benefit.
--
\ “Prediction is very difficult, especially of the future.” |
`\ —Niels Bohr |
_o__) |
Ben Finney
It is a pure-Python project that pre-compiles its "Coconut" program
files to .py.
They already have a shorter syntax for lambda - maybe that could be of
use to you - and maybe you can get them to accept your suggestion; it
would certainly fit there.
"""
Lambdas
Coconut provides the simple, clean -> operator as an alternative to
Python’s lambda statements. The operator has the same precedence as
the old statement.
Rationale
In Python, lambdas are ugly and bulky, requiring the entire word
lambda to be written out every time one is constructed. This is fine
if in-line functions are very rarely needed, but in functional
programming in-line functions are an essential tool.
Example:
dubsums = map((x, y) -> 2*(x+y), range(0, 10), range(10, 20))
"""
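For comparison, the same computation in standard Python, spelled with the lambda keyword (`list()` added so the result is visible in Python 3, where `map` is lazy):

```python
# Standard-Python equivalent of the Coconut one-liner quoted above.
dubsums = list(map(lambda x, y: 2 * (x + y), range(0, 10), range(10, 20)))
print(dubsums)  # [20, 24, 28, 32, 36, 40, 44, 48, 52, 56]
```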
Doesn't this kind of violate Python's "one way to do it"?
(Also, sorry for the top post; I'm on mobile right now...)
--
Ryan
[ERROR]: Your autotools build scripts are 200 lines longer than your program. Something’s wrong.
http://kirbyfan64.github.io/
> A variant Python would be welcome to translate all the operators
> and keywords into single-character tokens, using Unicode symbols
> for NOT EQUAL TO and so on - including using U+03BB in place of
> 'lambda'.
Probably it would not be "welcome", except in the usual sense that
"Python is open source, you can do what you want".
There was extensive discussion about the issues surrounding the
natural languages used by programmers in source documentation (eg,
identifier choice and comments) at the time of PEP 263. The mojibake
(choice of charset) problem has largely improved since then, thanks to
Unicode adoption, especially UTF-8. But the "Tower of Babel" issue
has not. Fundamentally, it's like women's clothes (they wear them to
impress, ie, communicate to, other women -- few men have the interest
to understand what is impressive ;-): programming is about programmers
communicating to other programmers. Maintaining the traditional
spelling of keywords and operators is definitely useful for that
purpose.
This is not to say that individuals who want a personalized[1]
language are wrong, just that it would have a net negative impact on
communication in teams.
BTW, Barry long advocated use of some variant syntaxes (the one I like
to remember inaccurately is "><" instead of "!="), and in fact
provided an easter egg import (barry_is_flufl or something like that)
that changed the syntax to suit him. I believe that module is pure
Python, so people who want to customize the lexical definition of
Python at the language level can do so AFAIK. You could probably even
spell it "import λ等" (to take a very ugly page from the Book of GNU,
mixing scripts in a single word -- the Han character means "etc.").
Footnotes:
[1] I don't have a better word. I mean something like "seasoned to
taste", almost "tasteful" but not quite.
> There was extensive discussion about the issues surrounding the
> natural languages used by programmers in source documentation (eg,
> identifier choice and comments) at the time of PEP 263. The mojibake
> (choice of charset) problem has largely improved since then, thanks to
> Unicode adoption, especially UTF-8. But the "Tower of Babel" issue
> has not. Fundamentally, it's like women's clothes (they wear them to
> impress, ie, communicate to, other women -- few men have the interest
> to understand what is impressive ;-): programming is about programmers
> communicating to other programmers.
With respect Stephen, that's codswallop :-)
It might be true that the average bogan[1] bloke or socially awkward
geek (including myself) might not care about impressive clothes, but
many men do dress to compete. The difference is more socio-economic:
typically women dress to compete across most s-e groups, while men
mostly do so only in the upper-middle and upper classes. And in the
upper classes, competition tends to be more understated and subtle
("good taste"), i.e. expensive Italian suits rather than hot pants.
Historically, it is usually men who dress like peacocks to impress
socially, while women are comparatively restrained. The drab business
suit of Anglo-American influence is no more representative of male
clothing through the ages than is the Communist Chinese "Mao suit".
And as for programmers... the popularity of one-liners, the obfuscated C
competition, code golf, "clever coding tricks" etc is rarely for the
purposes of communication *about code*. Communication is taking place,
but it's about social status and cleverness. There's a very popular
StackOverflow site dedicated to code golf, where you will see people
have written their own custom languages specifically for writing terse
code. Nobody expects these languages to be used by more than a handful
of people. That's not their point.
> Maintaining the traditional
> spelling of keywords and operators is definitely useful for that
> purpose.
Okay, let's put aside the social uses of code-golfing and equivalent,
and focus on quote-unquote "real code", where programmers care more
about getting the job done and keeping it maintainable rather than
competing with other programmers for status, jobs, avoiding being the
sacrificial goat in the next round of stack-ranked layoffs, etc.
You're right of course that traditional spelling is useful, but perhaps
not as much as you think. After all, one person's traditional spelling
is another person's confusing notation and a third person's excessively
verbose spelling. Not too many people like Cobol-like spelling:
add 1 to the_number
over "n += 1". So I think that arguments for keeping "traditional
spelling" are mostly about familiarity. If we learned lambda calculus in
high school, perhaps λ would be less exotic.
I think that there is a good argument to be made in favour of increasing
the amount of mathematical notation used in code, but I would think that
since a lot of my code is mathematical in nature. I can see that makes
my code atypical.
Coming back to the specific change suggested here, λ as an alternative
keyword for lambda, I have a minor and major objection:
The minor objection is that I think that λ is too useful a one-letter
symbol to waste on a comparatively rare usage, anonymous functions. In
mathematical code, I would prefer to keep λ for wavelength, or for the
radioactive decay constant, rather than for anonymous functions.
The major objection is that I think it's still too hard to expect the
average programmer to be able to produce the λ symbol on demand. We
don't all have a Greek keyboard :-)
I *don't* think that expecting programmers to learn λ is too difficult.
It's no more difficult than the word "lambda", or that | means bitwise
OR. Or for that matter, that * means multiplication. Yes, I've seen
beginners stumped by that. (Sometimes we forget that * is not something
you learn in maths class.)
So overall, I'm a -1 on this specific proposal.
[1] Non-Australians will probably recognise similar terms hoser,
redneck, chav, gopnik, etc.
--
Steve
1.
What is the future of coding?
I feel it is not only the language that translates your ideas into
reality. Artificial intelligence in (future) editors (and also vim's
conceal feature) is probably the right way to enhance your coding power
(with lambdas too).
2.
If we would like to enhance Python syntax with Unicode characters,
then I think it is good to see the larger context. There is an ocean
of possibilities for how to do it (probably good possibilities too).
For example, Unicode could help us add new operators. But it also
brings a lot of questions (how do I write Knuth's arrow in my editor?)
and difficulties (how do we let classes implement special methods for
these (*) new operators? How do we make, for example, a triple arrow
possible?).
I propose to be prepared before opening Pandora's box. :)
(*) All of them? And that probably means all of them after future
enhancements of Unicode too?
3.
Questions around "only one way to write it" could probably be
answered with this?
a<b
a.__lt__(b)
> Questions around "only one way to write it" could probably be
> answered with this?
>
> a<b
> a.__lt__(b)
The maxim is not “only one way”. That is a common misconception, but it
is easily dispelled: read the Zen of Python (by ‘import this’ in the
interactive prompt).
Rather, the maxim is “There should be one obvious way to do it”, with a
parenthetical “and preferably only one”.
So the emphasis is on the way being *obvious*, and all other ways being
non-obvious. This leads, of course, to choosing the best way to also be
the one obvious way to do it.
Your example above supports this: the comparison ‘a < b’ is the one
obvious way to compare whether ‘a’ is less than ‘b’.
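Indeed, the two spellings go through the same protocol; roughly speaking (the real dispatch also considers the reflected ‘__gt__’ and subclass overrides), ‘a < b’ ends up calling ‘type(a).__lt__’. A minimal sketch:

```python
class Box:
    """Minimal example: '<' dispatches to the __lt__ special method."""
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        return self.value < other.value

a, b = Box(1), Box(2)
print(a < b)        # True - the one obvious spelling
print(a.__lt__(b))  # True - the non-obvious spelling of the same test
```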
--
\ “It is forbidden to steal hotel towels. Please if you are not |
`\ person to do such is please not to read notice.” —hotel, |
_o__) Kowloon, Hong Kong |
Ben Finney
I don't support this lambda proposal (at this moment - but probably
somebody could convince me).
But if we do accept it, then wouldn't the Unicode version be the
obvious one?
It's probably also relevant in this context that more "modern"
languages tend to avoid the term lambda but embrace "anonymous
functions" with syntax such as
(x, y) -> x+y
or whatever.
So while "better syntax for lambda expressions" is potentially a
reasonable goal, I don't think that perpetuating the concept/name
"lambda" is necessary or valuable.
Paul
Exactly, there's not much value in having yet another way of writing
'lambda:'. Keeping other languages in mind and the conservative stance
Python usually takes, the arrow ('=>') would be the only valid
alternative for a "better syntax for lambda expressions". However, IIRC,
this has been debated and won't happen.
Personally, I have other associations with λ. Thus, I would rather see
it as a variable name in such contexts.
A fair point. But Python has a strong mathematical side (look how big
the numpy/scipy/matplotlib communities are), and we've already seen
how strongly they prefer "a @ b" to "a.matmul(b)". If there's support
for a language variant that uses more and shorter symbols, that would
be where I'd expect to find it.
> And as for programmers... the popularity of one-liners, the
> obfuscated C competition, code golf, "clever coding tricks" etc is
> rarely for the purposes of communication *about code*.
Sure, but *Python* is popular because it's easy to communicate *with*
and (usually) *about* Python code, and it does pretty well on "terse"
for many algorithmic idioms. (Yes, there are other reasons --
reasonable performance, batteries included, etc. That doesn't make
the design of the language *not* a reason for its popularity.)
You seem to be understanding my statements to be much more general
than they are. I'm only suggesting that this applies to Python as we
know and love it, and to Pythonic tradition.
> The major objection is that I think its still too hard to expect the
> average programmer to be able to produce the λ symbol on demand. We
> don't all have a Greek keyboard :-)
So what? If you run Mac OS X, Windows, or X11, you do have a keyboard
capable of producing Greek. And the same chords work in any Unicode-
capable editor, it's just that the Greek letters aren't printed on the
keycaps. Neither are emoticons, nor the CUA gestures (bucky-X[1],
bucky-C, bucky-V, and the oh-so-useful bucky-Z) but those are
everywhere. Any 10-year-old can find them somehow! To the extent
that Python would consider such changes (ie, a half-dozen or so
one-character replacements for multicharacter operators or keywords),
it would be very nearly as learnable to type them as to read them.
The problem (if it exists, of course -- obviously, I believe it does
but YMMV) is all about overloading people's ability to perceive the
meaning of code without reading it token by token.
Footnotes:
[1] Bucky = Control, Alt, Meta, Command, Option, Windows, etc. keys.
Um, as someone significantly older than 10 years old, I don't know how
to type a lambda character on my Windows UK keyboard...
Another identifier could be "/\" that looks like the uppercase lambda.
Great for those using an editor targeted at Python programmers, but most
editors are more general than that. Which means that programmers will
find themselves split into two camps: those who can easily type λ, and
those that cannot.
In the 1980s and 90s, I was a Macintosh user, and one nice feature of
the Macs at the time was the ease of typing non-ASCII characters. (Of
course there were a lot fewer back then: MacRoman is an 8-bit extension
to ASCII, compared to Unicode with its thousands of code points.)
Consequently I've used an Apple-specific language that included
operators like ≠ ≤ ≥ and it is *really nice*.
But Apple has the advantage of controlling the entire platform and they
could ensure that these characters could be input from any application
on any machine using exactly the same key sequence. (By memory, it was
option-= to get ≠.) We don't have that advantage, and frankly I think
you are underestimating the *practical* difficulties for input.
I recently discovered (by accident!) the Linux compose key. So now I
know how to enter µ at the keyboard: COMPOSE mu does the job. So maybe
COMPOSE lambda works? Nope. How about COMPOSE l or shift-l or ll or la
or yy (it's an upside-down y, right, and COMPOSE ee gives ə)?
No, none of these things work on my system. They may work on your
system: since discovering COMPOSE, I keep coming across people who
state "oh, it's easy to type such-and-such a character, just type
COMPOSE key-sequence, it's standard and will work on EVERY LINUX
SYSTEM EVERYWHERE". Not a chance. The key bindings for COMPOSE are
anything but standard.
And COMPOSE is *really* hard to use well: it gives no feedback if you
make a mistake except to silently ignore your keypresses (or insert the
wrong character). So invariably, every time I want to enter a non-ASCII
character, it takes me out of "thinking about code" into "thinking about
how to enter characters", sometimes for minutes at a time as I hunt for
the character in "Character Map" or google for it on the Internet.
It may be reasonable to argue that code is read more than it is written:
- suppose that reading λ has a *tiny* benefit of 1% over "lambda"
(for those who have learned what it means);
- but typing it is (let's say) 50 times harder than typing "lambda";
- but we read code 50 times as often as we type it;
- so the total benefit (50*1.01 - 50) is positive.
Invent your own numbers, and you'll come up with your own results. I
don't think there's any *objective* way to decide this question. And
that's why I don't think that Python should take this step: let other
languages experiment with non-ASCII keywords first, or let people
experiment with translators that transform ≠ into != and λ into lambda.
--
Steve
> On Jul 13, 2016, at 9:44 PM, Steven D'Aprano <st...@pearwood.info> wrote:
>
> Which means that programmers will
> find themselves split into two camps: those who can easily type λ, and
> those that cannot.
We already have two camps: those who don't mind using "lambda" and those who would only use "def."
I would expect that those who will benefit most are people who routinely write expressions that involve a lambda that returns a lambda that returns a lambda. There is a niche for such programming style and using λ instead of lambda will improve the readability of such programs for those who can understand them in the current form.
For the "def" camp, the possibility of a non-ascii spelling will serve as yet another argument to avoid using anonymous functions.
> On Jul 13, 2016, at 9:44 PM, Steven D'Aprano <st...@pearwood.info> wrote:
>
> I think
> you are underestimating the *practical* difficulties for input.
I appreciate those difficulties (I am typing this on an iPhone), but I think they are irrelevant. I can imagine 3 scenarios:
1. (The 99% case) You will never see λ in the code and never write it yourself. You can be happily unaware of this feature.
2. You see λ occasionally, but don't like it. You continue using spelled out "lambda" (or just use "def") in the code that you write.
3. You work on a project where local coding style mandates that lambda is spelled λ. In this case, there will be plenty of places in the code base to copy and paste λ from. (In the worst case you copy and paste it from the coding style manual.) More likely, however, the project that requires λ would have a precommit hook that translates lambda to λ in all new code and you can continue using the 6-character keyword in your input.
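A pre-commit hook of the kind described is easy to sketch with the standard tokenize module. This is a hypothetical illustration, not a real tool; the function name ‘respell’ is made up:

```python
# Hypothetical sketch of such a hook: re-spell the 'lambda' keyword using
# the tokenize module, so contributors can keep typing the ASCII form.
import io
import tokenize

def respell(source, old="lambda", new="λ"):
    """Return `source` with every `old` NAME token replaced by `new`."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        text = new if tok.type == tokenize.NAME and tok.string == old else tok.string
        # 2-tuples force untokenize's compatibility mode, which re-spaces
        # tokens instead of trying to honour the original columns.
        result.append((tok.type, text))
    return tokenize.untokenize(result)

translated = respell("f = lambda x: x + 1\n")
print(translated)                            # the λ spelling
print(respell(translated, "λ", "lambda"))    # valid Python again
```

The reverse direction works because λ already tokenizes as a NAME in Python 3, so the same function translates either way.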
> We already have two camps: those who don't mind using "lambda" and
> those who would only use "def."
I don't know anyone in the latter camp, do you?
I am in the camp that loves ‘lambda’ for some narrowly-specified
purposes *and* thinks ‘def’ is generally a better tool.
--
\ “… correct code is great, code that crashes could use |
`\ improvement, but incorrect code that doesn’t crash is a |
_o__) horrible nightmare.” —Chris Smith, 2008-08-22 |
Ben Finney
> 3. You work on a project where local coding style mandates that lambda is spelled λ. In this case, there will be plenty of places in the code base to copy and paste λ from. (In the worst case you copy and paste it from the coding style manual.) More likely, however, the project that requires λ would have a precommit hook that translates lambda to λ in all new code and you can continue using the 6-character keyword in your input.
> - suppose that reading λ has a *tiny* benefit of 1% over "lambda"
>   (for those who have learned what it means);
> - but typing it is (lets say) 50 times harder than typing "lambda";
> - but we read code 50 times as often as we type it;
> - so the total benefit (50*1.01 - 50) is positive.
But it also has the meaning of "the next character is special", such
as \n for newline or \uNNNN for a Unicode escape. However, I suspect
there might be a parsing conflict:
do_stuff(stuff_with_long_name, more_stuff, what_is_next_arg, \
At that point in the parsing, are you looking at a lambda function or
a line continuation? Sure, style guides would decry this (put the
backslash with its function, dummy!), but the parser can't depend on
style guides being followed.
-1 on using backslash for this.
-0 on λ.
ChrisA
> -1 on using backslash for this.
> -0 on λ.
Thanks,
S
Just to be a small data point, I have written code that uses λ as a
variable name (as someone mentioned elsewhere in the thread, Jupyter
Notebook makes typing Greek characters easy). Because this would
break code that I have written, and I suspect it would break other
code as well, I am -1 on the proposal. How selfish of me!
Cody
On Wed, Jul 13, 2016 at 7:44 PM, Steven D'Aprano <st...@pearwood.info> wrote:
> - suppose that reading λ has a *tiny* benefit of 1% over "lambda"
>   (for those who have learned what it means);
> - but typing it is (lets say) 50 times harder than typing "lambda";
> - but we read code 50 times as often as we type it;
> - so the total benefit (50*1.01 - 50) is positive.

I actually *do* think λ is a little bit more readable. And I have no
idea how to type it directly on my El Capitan system with the ABC
Extended keyboard. But I still get 100% of the benefit in readability
simply by using vim's conceal feature. If I used a different editor I'd
have to hope for a similar feature (or program it myself), but this is
purely a display question. Similarly, I think syntax highlighting makes
my code much more readable, but I don't want colors for keywords built
into the language. That is, and should remain, a matter of tooling, not
core language (I don't want https://en.wikipedia.org/wiki/ColorForth
for Python).
FWIW, my conceal configuration is at the link I give in a moment. I've customized a bunch of special stuff besides lambda; take it or leave it:
--
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons. Intellectual property is
to the 21st century what the slave trade was to the 16th.
On 14.07.2016 08:39, David Mertz wrote:
> [...]
> That is, and should remain, a matter of tooling not core language (I
> don't want https://en.wikipedia.org/wiki/ColorForth for Python).
Very good point. That now is basically the core argument against it at least for me. So, -100 on the proposal from me. :)
> On Jul 13, 2016, at 4:12 PM, John Wong <gokop...@gmail.com> wrote:
>
> Sorry to be blunt. Are we going to add omega, delta, epsilon and the entire Greek alphabet?
Breaking news: the entire Greek alphabet is already available for use in Python. If someone wants to write code that looks like a series of missing character boxes on your screen she already can.
On 14 July 2016 at 23:13, John Wong <gokop...@gmail.com> wrote:
> Why should I write pi in two English characters instead of typing π? Python
> is so popular among the science community, so shouldn't we add that as well?
> Excerpt from the question on
> http://programmers.stackexchange.com/questions/16010/is-it-bad-to-use-unicode-characters-in-variable-names:
>
> t = (µw-µl)/c # those are used in
> e = ε/c # multiple places.
> σw_new = (σw**2 * (1 - (σw**2)/(c**2)*Wwin(t, e)) + γ**2)**.5
I'm not sure what you're saying here. You do realise that the above is
perfectly valid Python 3? The SO question you quote is referring to
the fact that identifiers are restricted to (Unicode) *letters* and
that symbol characters can't be used as variable names.
All of which is tangential to the question here which is about using
Unicode in a *keyword*.
Unicode-as-identifier makes a lot of sense in situations where you
have a data-driven API (like a pandas dataframe or
collections.namedtuple) and the data you're working with contains
Unicode characters. Hence my choice of example in
http://developerblog.redhat.com/2014/09/09/transition-to-multilingual-programming-python/
- it's easy to imagine cases where the named tuple attributes are
coming from a data source like headers in a CSV file, and in
situations like that, folks shouldn't be forced into awkward
workarounds just because their data contains non-ASCII characters.
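A small illustration of the kind of case described here, with field names as they might arrive from a non-English CSV header (the type and data below are invented for the example):

```python
from collections import namedtuple

# Field names taken, say, from the header row of a German CSV file.
Messung = namedtuple("Messung", ["straße", "größe"])
row = Messung(straße="Hauptstraße 1", größe=42)
print(row.größe)  # 42 - no awkward workaround needed
```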
> When Python 3 was cooking I remember there were debates on whether removing
> "lambda". It stayed, and I'm glad it did, but IMO that should tell it's not
> important enough to deserve the breakage of a rule which has never been
> broken (non-ASCII for a keyword).
This I largely agree with, though. The *one* argument for improvement
I see potentially working is the one I advanced back in March when I
suggested that adding support for Java's lambda syntax might be worth
doing: https://mail.python.org/pipermail/python-ideas/2016-March/038649.html
However, any proposals along those lines need to be couched in terms
of how they will advance the Python ecosystem as a whole, rather than
"I like using lambda expressions in my code, but I don't like the
'lambda' keyword", as we have a couple of decades worth of evidence
informing us that the latter isn't sufficient justification for
change.
Cheers,
Nick.
--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
> However, any proposals along those lines need to be couched in terms
> of how they will advance the Python ecosystem as a whole, rather than
> "I like using lambda expressions in my code, but I don't like the
> 'lambda' keyword", as we have a couple of decades worth of evidence
> informing us that the latter isn't sufficient justification for
> change.
I use the vim conceal plugin myself too. It's whimsical, but I like the appearance of it, so I get the sentiment of the original poster. In my conceal configuration, I substitute a bunch of characters visually (if the attachment works, a screenshot example of some, but not all, of them is in this message). And honestly, having my text editor make the substitution is exactly what I want.
On 18 July 2016 at 13:41, Rustom Mody <rusto...@gmail.com> wrote:
> Do consider:
>
>>>> Α = 1
>>>> A = 2
>>>> Α + 1 == A
> True
>>>>
>
> Can (IMHO) go all the way to
> https://en.wikipedia.org/wiki/IDN_homograph_attack

Yes, we know - that dramatic increase in the attack surface is why
PyPI is still ASCII only, even though full Unicode support is
theoretically possible.

It's not a major concern once an attacker already has you running
arbitrary code on your system though, as the main problem there is
that they're *running arbitrary code on your system*. That means the
usability gains easily outweigh the increased obfuscation potential,
as worrying about confusable attacks at that point is like worrying
about a dripping tap upstairs when the Brisbane River is already
flowing through the ground floor of your house :)

Cheers,
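The confusable pair in the example above can be made visible with the standard unicodedata module (a quick, self-contained illustration):

```python
import unicodedata

# The two identifiers in the example render alike but are distinct:
for ch in (chr(0x0391), "A"):  # GREEK CAPITAL LETTER ALPHA vs Latin A
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# U+0391  GREEK CAPITAL LETTER ALPHA
# U+0041  LATIN CAPITAL LETTER A
```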
One solution would be to restrict identifiers to Unicode characters in appropriate categories only. The open quotation mark is in a punctuation category, so it doesn't make sense for it to be part of an identifier.
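For what it's worth, Python 3's identifier rules (PEP 3131) already work this way, keying off Unicode categories; a quick check:

```python
import unicodedata

# PEP 3131 restricts identifier characters by Unicode category:
print(unicodedata.category("“"))  # 'Pi' - initial-quote punctuation
print("“test".isidentifier())     # False: punctuation is excluded
print("µ".isidentifier())         # True: MICRO SIGN is a lowercase letter
```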
> There was this question on the python list a few days ago:
> Subject: SyntaxError: Non-ASCII character
[...]
> I pointed out that the python2 error was more helpful (to my eyes) than
> python3s
And I pointed out how I thought the Python 3 error message could be
improved, but the Python 2 error message was not very good.
> Python3
>
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "/home/ariston/foo.py", line 31
> wf = wave.open(“test.wav”, “rb”)
> ^
> SyntaxError: invalid character in identifier
It would be much more helpful if the caret lined up with the offending
character. Better still, if the offending character was actually stated:
wf = wave.open(“test.wav”, “rb”)
^
SyntaxError: invalid character '“' in identifier
> Python2
>
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "foo.py", line 31
> SyntaxError: Non-ASCII character '\xe2' in file foo.py on line 31, but no
> encoding declared; see http://python.org/dev/peps/pep-0263/ for details
As I pointed out earlier, this is less helpful. The line itself is not
shown (although the line number is given), nor is the offending
character. (Python 2 can't show the character because it doesn't know
what it is -- it only knows the byte value, not the encoding.) But in
the person's text editor, chances are they will see what looks to them
like a perfectly reasonable character, and have no idea which is the
byte \xe2.
> IOW
> 1. The lexer is internally (evidently from the error message) so
> ASCII-oriented that any “unicode-junk” just defaults out to identifiers
> (presumably comments are dealt with earlier) and then if that lexing action
> fails it mistakenly pinpoints a wrong *identifier* rather than just an
> impermissible character like python 2
You seem to be jumping to a rather large conclusion here. Even if you
are right that the lexer considers all otherwise-unexpected characters
to be part of an identifier, why is that a problem?
I agree that it is mildly misleading to say
invalid character '“' in identifier
when “ is not part of an identifier:
py> '“test'.isidentifier()
False
but I don't think you can jump from that to your conclusion that
Python's unicode support is somewhat "wrongheaded". Surely a much
simpler, less inflammatory response would be to say that this one
specific error message could be improved?
But... is it REALLY so bad? What if we wrote it like this instead:
py> result = my§function(arg)
File "<stdin>", line 1
result = my§function(arg)
^
SyntaxError: invalid character in identifier
Isn't it more reasonable to consider that "my§function" looks like it is
intended as an identifier, but it happens to have an illegal character
in it?
> combine that with
> 2. matrix mult (@) Ok to emulate perl but not to go outside ASCII
How does @ emulate Perl?
As for your second part, about not going outside of ASCII, yes, that is
official policy for Python operators, keywords and builtins.
> makes it seem (to me) python's unicode support is somewhat wrongheaded.
--
Steve
--
---
You received this message because you are subscribed to a topic in the Google Groups "python-ideas" group.
On Tue, Jul 19, 2016 at 7:21 AM Steven D'Aprano wrote:
> On Mon, Jul 18, 2016 at 10:29:34PM -0700, Rustom Mody wrote:
> > IOW
> > 1. The lexer is internally (evidently from the error message) so
> > ASCII-oriented that any “unicode-junk” just defaults out to identifiers
> > (presumably comments are dealt with earlier) and then if that lexing action
> > fails it mistakenly pinpoints a wrong *identifier* rather than just an
> > impermissible character like python 2
>
> You seem to be jumping to a rather large conclusion here. Even if you
> are right that the lexer considers all otherwise-unexpected characters
> to be part of an identifier, why is that a problem?

It's a problem because those characters could never be part of an identifier. So it seems like a bug.
On Tuesday, July 19, 2016 at 5:06:17 PM UTC+5:30, Neil Girdhar wrote:
> It's a problem because those characters could never be part of an
> identifier. So it seems like a bug.
An armchair-design solution would say: we should give the most appropriate answer for every possible Unicode character category.
This would need to take all the Unicode character categories and Python lexical categories and 'cross-product' them — a humongous task to little advantage.
A more practical solution would be to take the best of the current Python 2 and Python 3 approaches:
"Invalid character XX in line YY"
and reveal nothing about which lexical category — like identifier — Python thinks the character belongs to.
The XX is like Python 2 and the YY like Python 3.
If it can do better than '\xe2' — i.e. report an actual codepoint rather than a raw byte — that's a bonus, but not strictly necessary.
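The proposed message shape is easy to sketch; for illustration (the function name is invented, and `unicodedata.name` supplies the "bonus" character name):

```python
import unicodedata

def lex_error(ch, lineno):
    # The proposed shape: name the offending character and the line,
    # while saying nothing about which lexical category the lexer was in.
    return "Invalid character %r (U+%04X %s) in line %d" % (
        ch, ord(ch), unicodedata.name(ch, "UNKNOWN"), lineno)

print(lex_error("“", 31))
```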
On Tue, Jul 19, 2016 at 8:18 AM Rustom Mody wrote:
> An armchair-design solution would say: we should give the most
> appropriate answer for every possible Unicode character category.
> This would need to take all the Unicode character categories and Python
> lexical categories and 'cross-product' them — a humongous task to
> little advantage.

I don't see why this is a "humongous task". Anyway, your solution boils down to the simplest fix in the lexer, which is to block some characters from matching any category, does it not?
There's historically been relatively little work put into designing
the error messages coming out of the lexer, so if it's a task you're
interested in stepping up and taking on, you could probably find
someone willing to review the patches.
But if you perceive "Volunteers used their time as efficiently as
possible whilst fully Unicode enabling the CPython compilation
toolchain, since it was a dependency that needed to be addressed in
order to permit other more interesting changes, rather than an
inherently rewarding activity in its own right" as "wrongheaded", you
may want to spend some time considering the differences between
community-driven and customer-driven development.
Cheers,
Nick.
--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
> My suggested solution involved this:
> Currently the lexer — basically an automaton — reveals which state its in
> when it throws error involving "identifier"
> Suggested change:
>
> if in_ident_state:
> if current_char is allowable as ident_char:
> continue as before
> elif current_char is ASCII:
> Usual error
> else:
> throw error eliding the "in_ident state"
> else:
> as is...
I'm sorry, you've lost me. Is this pseudo-code (1) of the current
CPython lexer, (2) what you imagine the current CPython lexer does, or
(3) what you think it should do? Because you call it a "change", but
you're only showing one state, so it's not clear if its the beginning or
ending state.
Basically I guess what I'm saying is that if you are suggesting a
concrete change to the lexer, you should be more precise about what
needs to actually change.
> BTW after last post I tried some things and found other unsatisfactory (to
> me) behavior in this area; to wit:
>
> >>> x = 0o19
> File "<stdin>", line 1
> x = 0o19
> ^
> SyntaxError: invalid syntax
>
> Of course the 9 cannot come in an octal constant but "Syntax Error"??
> Seems a little over general
>
> My preferred fix:
> make a LexicalError sub exception to SyntaxError
What's the difference between a LexicalError and a SyntaxError?
Under what circumstances is it important to distinguish between them?
It would be nice to have a more descriptive error message, but why
should I care whether the invalid syntax "0o19" is caught by a lexer or
a parser or the byte-code generator or the peephole optimizer or
something else? Really all I need to care about is:
- it is invalid syntax;
- why it is invalid syntax (9 is not a legal octal digit);
- and preferably, that it is caught at compile-time rather than run-time.
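For what it's worth, the proposed subclass would cost little in backward compatibility, since existing `except SyntaxError` handlers would keep catching it. A minimal sketch (the class is hypothetical, not part of CPython):

```python
class LexicalError(SyntaxError):
    """Hypothetical: raised for errors detected during tokenization."""

# Existing handlers that catch SyntaxError keep working unchanged.
try:
    raise LexicalError("invalid digit '9' in octal literal 0o19")
except SyntaxError as exc:
    message = str(exc)
```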
--
Steve
The codepoint '“' doesn't match either of them, which is a good hint
that Python shouldn't really be saying "invalid character in identifier"
(it's the first character, but it can't be part of an identifier).
On 7/20/16, Danilo J. S. Bellini <danilo....@gmail.com> wrote:
> 4. Unicode have more than one codepoint for some symbols that look alike,
> for example "Σ𝚺𝛴𝜮𝝨𝞢" are all valid uppercase sigmas. There's also "∑",
> but this one is invalid in Python 3. The italic/bold/serif distinction
> seems enough for a distinction, and when editing a code with an Unicode
> char like that, most people would probably copy and paste the symbol
> instead of typing it, leading to a consistent use of the same symbol.
I am not sure what you are trying to say, so just to be sure, some info:
PEP-3131 (https://www.python.org/dev/peps/pep-3131/): "All identifiers
are converted into the normal form NFKC while parsing; comparison of
identifiers is based on NFKC."
From this point of view, all the sigmas are the same:
set(unicodedata.normalize('NFKC', i) for i in "Σ𝚺𝛴𝜮𝝨𝞢") == {'Σ'}
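A runnable illustration of that normalization (the codepoints are the sigma lookalikes from above; the parser applies the same NFKC step to identifiers, so assigning to the bold sigma binds the plain-sigma name):

```python
import unicodedata

# All six sigma lookalikes NFKC-normalize to the plain Greek sigma.
sigmas = "Σ𝚺𝛴𝜮𝝨𝞢"
assert {unicodedata.normalize("NFKC", c) for c in sigmas} == {"Σ"}

# The parser normalizes identifiers the same way (PEP 3131): an
# assignment to MATHEMATICAL BOLD CAPITAL SIGMA ends up bound to
# GREEK CAPITAL LETTER SIGMA.
ns = {}
exec("\U0001D6BA = 5", ns)
assert ns["\u03A3"] == 5
```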
2016-07-21 1:53 GMT-03:00 Pavol Lisy <pavol...@gmail.com>:
> PEP-3131 (https://www.python.org/dev/peps/pep-3131/): "All identifiers
> are converted into the normal form NFKC while parsing; comparison of
> identifiers is based on NFKC."
> From this point of view all sigmas are same:
> set(unicodedata.normalize('NFKC', i) for i in "Σ𝚺𝛴𝜮𝝨𝞢") == {'Σ'}
In this item I just said that most programmers would probably keep the same character in a source code file due to copying and pasting, and that even when the copy-and-paste action doesn't happen, visual differences like italic/bold/serif are enough for one to notice (when using another input method).

At first, I was thinking of code with one of those symbols as a variable name (any of them), but PEP 3131 challenges that. Actually, any conversion to a normal form means that one should never use Unicode identifiers outside the chosen normal form. It would be better to raise an error instead of converting.
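The raise-instead-of-convert suggestion amounts to a one-line check; a hedged sketch (the function name is invented, and this is not what CPython does):

```python
import unicodedata

def check_identifier(name):
    # Sketch of the suggestion: instead of silently normalizing,
    # reject identifiers that are not already in NFKC normal form.
    if unicodedata.normalize("NFKC", name) != name:
        raise SyntaxError("identifier %r is not in NFKC normal form" % name)
    return name

check_identifier("\u03A3")    # plain Σ is already NFKC: accepted
```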
So should we disable the lowercase 'l', the uppercase 'I', and the
digit '1', because they can be confused? What about the confusability
of "m" and "rn"? O and 0 are similar in some fonts. And case
insensitivity brings its own problems - is "ss" equivalent to "ß", and
is "ẞ" equivalent to either? Turkish distinguishes between "i", which
upper-cases to "İ", and "ı", which upper-cases to "I".
We already have interminable debates about letter similarities across
scripts. I'm sure everyone agrees that Cyrillic "и" is not the same
letter as Latin "i", but we have "AАΑ" in three different scripts.
Should they be considered equivalent? I think not, because in any
non-trivial context, you'll know whether the program's been written in
Greek, a Slavic language, or something using the Latin script. But
maybe you disagree. Okay; are "BВΒ" all to be considered equivalent
too? What about "СC"? "XХΧᚷ"? They're visually similar, but they're
not equivalent in any other way. And if you're going to say things
should be considered equivalent solely on the basis of visuals, you
get into a minefield - should U+200B ZERO WIDTH SPACE be completely
ignored, allowing "AB" to be equivalent to "A\u200bB" as an
identifier?
This debate should probably continue on python-list (if anywhere). I
doubt Python is going to change its normalization rules any time soon,
and if it does, it'll need a very solid reason (and probably a PEP
with all the pros and cons).
ChrisA
[getattr(obj, i) for i in dir(obj) if i in "Σ𝚺𝛴𝜮𝝨𝞢"] # [0, 1, 2, 3, 4, 5]
but:
[obj.Σ, obj.𝚺, obj.𝛴, obj.𝜮, obj.𝝨, obj.𝞢, ] # [0, 0, 0, 0, 0, 0]
So you could mix any of them while editing identifiers. (but you could
not mix them while writing parameters in getattr, setattr and type)
But getattr, setattr and type are other beasts, because they can use
"non identifiers", non letter characters too:
setattr(obj,'+', 7)
dir(obj) # ['+', ...] # but obj.+ is syntax error
setattr(obj,u"\udcb4", 7)
dir(obj) # [..., '\udcb4' ,...]
obj = type("SomeClass", (object,), {c: i for i, c in enumerate("+-*/")})()
Maybe there is still some Babel curse here, and some sort of
normalize_dir, normalize_getattr, normalize_setattr or normalize_type
could help? I am not sure. They would probably make things more
complicated rather than simpler.
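One of those helpers is simple to sketch, for what it's worth (the name `nfkc_setattr` is invented here, mirroring the normalize_setattr idea):

```python
import unicodedata

def nfkc_setattr(obj, name, value):
    # Hypothetical helper: normalize the attribute name the same way
    # the parser normalizes identifiers (PEP 3131), so that
    # setattr(obj, <bold sigma>, v) and obj.Σ refer to the same attribute.
    setattr(obj, unicodedata.normalize("NFKC", name), value)

class Obj:
    pass

obj = Obj()
nfkc_setattr(obj, "\U0001D6BA", 7)   # MATHEMATICAL BOLD CAPITAL SIGMA
assert getattr(obj, "\u03A3") == 7   # stored under the plain Σ
```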
No; I'm not saying that. I'm completely disagreeing with #1's value. I
don't think the language interpreter should concern itself with
visually-confusing identifiers. Unicode normalization is about
*equivalent characters*, not confusability, and I think that's as far
as Python should go.
> 1. Using SyntaxError for lexical errors sounds as strange as saying a
> misspell/typo is a syntax mistake in a natural language.
Why? Regardless of whether the error is found by the tokeniser, the
lexer, the parser, or something else, it is still a *syntax error*. Why
would the programmer need to know, or care, what part of the
compiler/interpreter detects the error?
Also consider that not all Python interpreters will divide up the task
of interpreting code exactly the same way. Tokenisers, lexers and
parsers are very closely related and not necessarily distinct. Should
the *exact same typo* generate TokenError in one Python, LexerError in
another, and ParserError in a third? What is the advantage of that?
> 2. About those lexical error messages, the caret is worse than the lack of
> it when it's not aligned, but unless I'm missing something, one can't
> guarantee that the terminal is printing the error message with the right
> encoding. Including the row and column numbers in the message would be
> helpful.
It would be nice for the caret to point to the illegal character, but
it's not *wrong* to point past it to the end of the token that contains
the illegal character.
> 4. Unicode have more than one codepoint for some symbols that look alike,
> for example "Σ𝚺𝛴𝜮𝝨𝞢" are all valid uppercase sigmas. [...]
Not really. Look at their names:
GREEK CAPITAL LETTER SIGMA
MATHEMATICAL BOLD CAPITAL SIGMA
MATHEMATICAL ITALIC CAPITAL SIGMA
MATHEMATICAL BOLD ITALIC CAPITAL SIGMA
MATHEMATICAL SANS-SERIF BOLD CAPITAL SIGMA
MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL SIGMA
Personally, I don't understand why the Unicode Consortium has included
all these variants. But whatever the reason, the names hint strongly
that they have specialised purposes, and shouldn't be used when you want
the letter Σ.
But, if you do, Python will normalise them all to Σ, so there's no real
harm done, except to the readability of your code.
[...]
> when editing a code with an Unicode
> char like that, most people would probably copy and paste the symbol
> instead of typing it, leading to a consistent use of the same symbol.
You are assuming that the programmer's font includes glyphs for all of
six of those code points. More likely, the programmer will see Σ for the
first code point, and the other five will display as a pair of "missing
glyph" boxes. (That's exactly what I see in my mail client, and in the
Python interpreter.)
Why a pair of boxes? Because they are code points in the Supplementary
Multilingual Planes, and require *two* 16-bit code units in UTF-16. So
naive Unicode software with poor support for the SMPs will display two
boxes, one for each surrogate code point.
Even if the code points display correctly, with distinct glyphs, your
comment that most people will be forced to copy and paste the symbol is
precisely why I am reluctant to see Python introduce non-ASCII keywords
or operators. It's a pity, because I think that non-ASCII operators at
least can make a much richer language (although I wouldn't want to see
anything as extreme as APL). Perhaps I will change my mind in a few more
years, as the popularity of emoji encourage more applications to have
better support for non-ASCII and the SMPs.
[...]
> 6. Python 3 code is UTF-8 and Unicode identifiers are allowed. Not having
> Unicode keywords is merely contingent on Python 2 behavior that emphasized
> ASCII-only code (besides comments and strings).
No, it is a *policy decision*. It is not because Python 2 didn't support
them. Python 2 didn't support non-ASCII identifiers either, but Python 3
intentionally broke with that.
> 7. The discussion isn't about lambda or anti-lambda bias, it's about
> keyword naming and readability. Who gains/loses with that resource? It
> won't hurt those who never uses lambda and never uses Unicode identifiers.
It will hurt those who have to read code with a mystery λ that they
don't know what it means and they have no idea how to search for it. At
least "python lambda" is easy to search for.
It will hurt those who want to use λ as an identifier. I include myself
in that category. I don't want λ to be reserved as a keyword.
I look at it like this: using λ as a keyword makes as much sense as
making f a keyword so that we can save a few characters by writing:
f myfunction(arg, x, y):
pass
instead of def. I use f as an identifier in many places, e.g.:
for f in list_of_functions:
...
or in functional code:
compose(f, g)
Yes, I can *work around it* by naming things f_ instead of f, but that's
ugly. Even though it saves a few keystrokes, I wouldn't want f to be
reserved as a keyword, and the same goes for λ as lambda.
> 8. I don't know if any consensus can emerge in this matter about lambdas,
> but there's another subject that can be discussed together: macros.
I'm pretty sure that Guido has ruled "Over My Dead Body" to anything
resembling macros in Python.
However, we can experiment with adding keywords and macro-like
facilities without Guido's permission. For example:
http://www.staringispolite.com/likepython/
It's a joke, of course, but the technology is real.
Imagine, if you will, that you could declare a "dialect" at the
start of Python modules, just after the optional coding cookie:
# -*- coding: utf-8 -*-
# -*- dialect math -*-
which would tell importlib to run the code through some sort of
source/AST transformation before importing it. That will allow us to
localise the keywords, introduce new operators, and all the other things
Guido hates *wink* and still be able to treat the code as normal Python.
A bad idea? Probably an awful one. But it's worth experimenting with.
It will be fun, and it *just might* turn out to be a good idea.
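A toy version of that dialect transformation might look like the following. The "math" dialect and its λ-to-lambda mapping are invented here, and a real version would hook into importlib with a custom loader rather than transform a string:

```python
import re

# Invented example dialect: spell `lambda` as λ.
DIALECTS = {"math": {"λ": "lambda "}}

def apply_dialect(source):
    """Rewrite dialect spellings back to standard Python before compiling.
    Note this naive text replacement would also touch strings and comments."""
    m = re.search(r"-\*-\s*dialect\s+(\w+)\s*-\*-", source)
    if not m:
        return source
    for spelling, standard in DIALECTS[m.group(1)].items():
        source = source.replace(spelling, standard)
    return source

src = '# -*- dialect math -*-\nsquare = λ x: x * x\n'
ns = {}
exec(compile(apply_dialect(src), "<dialect>", "exec"), ns)
assert ns["square"](5) == 25
```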
For the record, in the 1980s and 1990s, Apple used a similar idea for
two of their scripting languages, Hypertalk and Applescript, allowing
users to localise keywords. Hypertalk is now defunct, and Applescript
has dropped that feature, which suggests that it is a bad idea. Or maybe
it was just ahead of its time.
--
Steve
This idea of "visually confusable" seems like a very silly thing to worry about, as others have noted.
It's not just that completely different letters from different alphabets may "look similar", it's also that the similarity is completely dependent on the specific font used for display. My favorite font might have clearly distinguished glyphs for the Cyrillic, Roman, and Greek "A", even if your font uses identical glyphs.
So in this crazy scenario, Python would have to gain awareness of the fonts installed in every text editor and display device of every user.
On Jul 21, 2016 7:26 AM, "Steven D'Aprano" <st...@pearwood.info> wrote:
> You are assuming that the programmer's font includes glyphs for all of
> six of those code points. More likely, the programmer will see Σ for the
> first code point, and the other five will display as a pair of "missing
> glyph" boxes. (That's exactly what I see in my mail client, and in the
> Python interpreter.)
Fwiw, on my OSX laptop, with whatever particular fonts I have installed there, using a particular webmail service in the particular browser I use, I see all six glyphs.
If I were to copy-paste into a text editor, all bets would be off, and depend on the editor and its settings. Same for interactive shells run in particular terminal apps.
Viewing right now, on my Android tablet and the Gmail app, I see a bunch of missing glyph markers. But quite likely I could install fonts or change settings on this device to render them.
> >>> А = 1
> >>> A = A + 1
>
> because the A's look more indistinguishable than the sigmas and are
> internally more distinct
> If the choice is to simply disallow the confusables that’s probably the
> best choice
>
> IOW
> 1. Disallow co-existence of confusables (in identifiers)
That would require disallowing 1, l and I, as well as O and 0. Or are
you, after telling us off for taking an ASCII-centric perspective, going
to exempt ASCII confusables?
In a dynamic language like Python, how do you prohibit these
confusables? Every time Python does a name binding operation, is it
supposed to search the entire namespace for potential confusables?
That's going to be awful expensive.
Confusables are a real problem in URLs, because they can be used for
phishing attacks. While even the most tech-savvy user is vulnerable, it
is especially the *least* savvy users who are at risk, which makes it
all the more important to protect against confusables in URLs.
But in programming code? Your demonstration with the Latin A and the
Greek alpha Α or Cyrillic А is just a party trick. In a world where most
developers do something like:
pip install randompackage
python -m randompackage
without ever once looking at the source code, I think we have bigger
problems. Or rather, even the bigger problems are not that big.
If you're worried about confusables, there are alternatives other than
banning them: your editor or linter might highlight them. Or rather than
syntax highlighting, perhaps editors should use *semantic highlighting*
and colour-code variables:
https://medium.com/@evnbr/coding-in-color-3a6db2743a1e
in which case your A and A will be highlighted in completely different
colours, completely ruining the trick.
(Aside: this may also help with the "oops I misspelled my variable and
the compiler didn't complain" problem. If "self.dashes" is green and
"self.dahses" is blue, you're more likely to notice the typo.)
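A linter-level confusable check along those lines is easy to sketch. The script detection via character names below is a crude stand-in for the real Unicode Script property, and both function names are invented:

```python
import unicodedata

SCRIPTS = ("LATIN", "GREEK", "CYRILLIC")

def scripts_used(identifier):
    # Crude script detection: infer each character's script from the
    # start of its Unicode character name.
    found = set()
    for ch in identifier:
        name = unicodedata.name(ch, "")
        for script in SCRIPTS:
            if name.startswith(script):
                found.add(script)
    return found

def is_suspicious(identifier):
    # Flag identifiers that mix scripts, e.g. Cyrillic А with Latin "pple".
    return len(scripts_used(identifier)) > 1

assert is_suspicious("\u0410pple")   # Cyrillic А followed by Latin letters
assert not is_suspicious("Apple")    # all Latin: fine
```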
--
Steve