Using non-ascii symbols

Christoph Zwerschke

Jan 23, 2006, 10:09:00 PM

On the page http://wiki.python.org/moin/Python3%2e0Suggestions
I noticed an interesting suggestion:

"These operators ≤ ≥ ≠ should be added to the language having the
following meaning:

<= >= !=

this should improve readability (and make the language more accessible
to beginners).

This should be an evolution similar to the digraphs and trigraphs
of the C and C++ languages."

How do people on this group feel about this suggestion?

The symbols above are not even in Latin-1; you need UTF-8.

(There are not many useful symbols in Latin-1. Maybe one could use ×
for Cartesian products...)
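A quick way to check which of these symbols fit in Latin-1 is simply to try encoding them. This is a minimal sketch using only `str.encode`; the helper name `fits_latin1` is made up for the example.

```python
def fits_latin1(ch: str) -> bool:
    """Return True if the character can be encoded in Latin-1 (ISO 8859-1)."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

for sym in "≤≥≠×":
    if fits_latin1(sym):
        print(sym, "is in Latin-1")
    else:
        print(sym, "needs a Unicode encoding such as UTF-8:", sym.encode("utf-8"))
```

Only × (U+00D7) passes; each of the three comparison symbols takes three bytes in UTF-8.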

And while they are more readable, they are not easier to type (at
least with most current editors).

Is this idea absurd, or will our children one day think that restricting
ourselves to 7-bit ASCII was absurd?

Are there similar attempts in other languages? I can only think of APL,
but that was a long time ago.

Once you open your mind to using non-ASCII symbols, I'm sure one can
find a bunch of useful applications. Variable names could be allowed to
be non-ASCII, as in XML. Think class names in Arabic... Or you could
use Greek letters if you run out of one-letter variable names, just as
mathematicians do. Would this be desirable, or rather a horror scenario?
Opinions?

-- Christoph

James Stroud

Jan 23, 2006, 10:16:52 PM

I can't find "≤, ≥, or ≠" on my keyboard.

James

Robert Kern

Jan 23, 2006, 10:45:47 PM

James Stroud wrote:

> I can't find "≤, ≥, or ≠" on my keyboard.

Get a better keyboard? or OS?

On OS X,

≤ is Alt-,
≥ is Alt-.
≠ is Alt-=

Fewer keystrokes than <= or >= or !=.

--
Robert Kern
rober...@gmail.com

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

Giovanni Bajo

Jan 24, 2006, 2:54:21 AM

Robert Kern wrote:

>> I can't find "?, ?, or ?" on my keyboard.
>
> Get a better keyboard? or OS?
>
> On OS X,
>
> ? is Alt-,
> ? is Alt-.
> ? is Alt-=
>
> Fewer keystrokes than <= or >= or !=.

Sure, but I can't find OS X listed as a prerequisite for using Python. So,
while I don't give a damn if those symbols are going to be supported by Python,
I don't think the plain ASCII version should be deprecated. There are too many
situations where it's still useful (coding across old terminals and whatnot).
--
Giovanni Bajo

Steven D'Aprano

Jan 24, 2006, 6:56:01 AM

On Tue, 24 Jan 2006 04:09:00 +0100, Christoph Zwerschke wrote:

> On the page http://wiki.python.org/moin/Python3%2e0Suggestions
> I noticed an interesting suggestion:
>
> "These operators ≤ ≥ ≠ should be added to the language having the
> following meaning:
>
> <= >= !=
>
> this should improve readability (and make the language more accessible
> to beginners).
>
> This should be an evolution similar to the digraphs and trigraphs
> of the C and C++ languages."
>
> How do people on this group feel about this suggestion?
>
> The symbols above are not even in Latin-1; you need UTF-8.
>
> (There are not many useful symbols in Latin-1. Maybe one could use ×
> for Cartesian products...)

Or for multiplication :-)

> And while they are more readable, they are not easier to type (at
> least with most current editors).
>
> Is this idea absurd, or will our children one day think that restricting
> ourselves to 7-bit ASCII was absurd?
>
> Are there similar attempts in other languages? I can only think of APL,
> but that was a long time ago.

My earliest programming was on the (classic) Macintosh, which supported a
number of special characters including ≤ ≥ ≠ with the obvious
meanings. They were easy to enter too: the Mac keyboard had (has?) an
option key, and holding the option key down while typing a character would
enter a special character. E.g. option-s gave Greek sigma, option-p gave
pi, option-less-than gave ≤, and so forth. Much easier than trying to
memorize character codes.

I greatly miss the Mac's ease of entering special characters, and I miss
the ability to use proper mathematical symbols for (e.g.) pi, not equal,
and so forth.

> Once you open your mind to using non-ASCII symbols, I'm sure one can
> find a bunch of useful applications. Variable names could be allowed to
> be non-ASCII, as in XML. Think class names in Arabic... Or you could
> use Greek letters if you run out of one-letter variable names, just as
> mathematicians do. Would this be desirable, or rather a horror scenario?
> Opinions?

I think the use of digraphs like != for not equal is a poor substitute for
a real not-equal symbol. I think the reliance on 7-bit ASCII is horrible
and primitive, but without easier, more intuitive ways of entering
non-ASCII characters, and better support for displaying non-ASCII
characters in the console, I can't see this suggestion going anywhere.

--
Steven.

Claudio Grondi

Jan 24, 2006, 7:07:32 AM

One of the issues in Python is cross-platform portability. Limiting the
range of symbols to lower ASCII, and specifying ASCII as the code table,
is a good deal here. I think that Unicode is not yet everywhere, and as
long as that is the case it doesn't make much sense to go for it in Python.

Claudio

Ido Yehieli

Jan 24, 2006, 8:59:20 AM

>> Is this idea absurd or will one day our children think
>> that restricting to 7-bit ascii was absurd?

Both... this idea will only become non-absurd when Unicode becomes
as prevalent as ASCII, i.e. Unicode keyboards, universal support in
almost every application, and so on. Even if you can easily type it on
your Macintosh, good luck using it while using said Macintosh to ssh or
telnet to a remote server and trying to type Unicode...

Juho Schultz

Jan 24, 2006, 9:33:16 AM

Christoph Zwerschke wrote:

> "These operators ≤ ≥ ≠ should be added to the language having the
> following meaning:
>
> <= >= !=
>
> this should improve readability (and make the language more accessible
> to beginners).
>

I assume most Python beginners know some other programming language and
are familiar with >= and friends. Those learning Python as their first
programming language will benefit from knowing >= when they go on to
learn another language.

Unicode is not yet supported everywhere, so some editors/terminals might
display the suggested one-char operators as something else, effectively
"guess what operator I was thinking".

Fortran 90 allowed >, >= instead of Fortran 77's .GT., .GE. But F90
uses ! as the comment symbol and therefore needs /= instead of != for
inequality. I guess just because they wanted to. However, it is one more
needless detail to remember. Same with the suggested operators.

Rocco Moretti

Jan 24, 2006, 10:16:17 AM

Giovanni Bajo wrote:
> Robert Kern wrote:
>
>
>>>I can't find "?, ?, or ?" on my keyboard.

Posting code to newsgroups might get harder too. :-)

Robert Kern

Jan 24, 2006, 10:58:24 AM

Rocco Moretti wrote:

[James Stroud wrote:]

>>>>I can't find "?, ?, or ?" on my keyboard.
>
> Posting code to newsgroups might get harder too. :-)

His post made it through fine. Your newsreader messed it up.

Christoph Zwerschke

Jan 24, 2006, 11:02:01 AM

Giovanni Bajo wrote:
> Sure, but I can't find OS X listed as a prerequisite for using Python. So,
> while I don't give a damn if those symbols are going to be supported by Python,
> I don't think the plain ASCII version should be deprecated. There are too many
> situations where it's still useful (coding across old terminals and whatnot).

I think we should limit the discussion to allowing non-ASCII symbols
as *alternatives* to (combinations of) ASCII chars. Nobody should be
forced to use them, since not all editors, OSs and keyboards support them.

Think about moving from ASCII to Latin-1 or UTF-8 as similar to moving
from ISO 646 to ASCII (http://en.wikipedia.org/wiki/C_trigraph).

I think it is a legitimate question, now that UTF-8 is becoming more
and more widely supported.

Editors could provide means to easily enter these symbols once
programming languages start supporting them: automatic expansion of
ASCII combinations, Alt combinations (as in OS X), or popup menus with
all supported symbols.
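The expansion idea can also run in reverse, as a preprocessing step that turns the symbols back into the ASCII operators Python already understands. This is a naive sketch assuming a hypothetical `to_ascii` helper; `str.translate` knows nothing about Python syntax, so it also rewrites symbols inside string literals and comments.

```python
# Hypothetical mapping from the proposed symbols to today's ASCII operators.
SYMBOL_TO_ASCII = str.maketrans({"≤": "<=", "≥": ">=", "≠": "!="})

def to_ascii(source: str) -> str:
    """Naively replace every occurrence, with no knowledge of Python syntax."""
    return source.translate(SYMBOL_TO_ASCII)

src = "ok = 3 ≤ 4 and 5 ≠ 6\nlabel = 'x ≥ y'\n"
namespace = {}
exec(to_ascii(src), namespace)
print(namespace["ok"])     # True
print(namespace["label"])  # x >= y  (the string literal was rewritten too)
```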

-- Christoph

Paul Watson

Jan 24, 2006, 11:05:09 AM

This will eventually happen in some form. The problem is that we are
still in the infancy of computing. We are using stones and chisels to
express logic. We are currently faced with text characters with which
to express intent. There will come a time when we are able to represent
a program in another form that is readily portable to many platforms.

In the meantime (probably 50 years or so), it would be advantageous to
use a universal character set for coding programs. To that end, the
input to the Python interpreter should be ISO 10646 or a subset such as
Unicode. If the # -*- coding: ? -*- line specifies something other than
UCS-4, then a preprocessor should convert it to UCS-4. When it is
desirable to avoid the overhead of the preprocessor, developers will
find a way to save source code in UCS-4 encoding.

The problem with using Unicode in its UTF-8 and UTF-16 forms is that
code will forever need to be written, and will forever execute,
additional processing to handle the MBCS and MSCS (Multiple-Short
Character Set) situation.
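The variable-width issue is easy to see by counting bytes per character. This small sketch uses only standard codecs; the `-le` variants are chosen so that no byte-order mark is added. (For reference, Python 3 later made UTF-8 the default source encoding, per PEP 3120.)

```python
def byte_widths(ch: str) -> dict:
    """Bytes needed to store one character in each encoding (no BOM)."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

# The last character lies outside the Basic Multilingual Plane.
for ch in ["a", "×", "≤", "\U0001D703"]:
    print(f"U+{ord(ch):04X}", byte_widths(ch))
```

Only UTF-32 (UCS-4) gives every character the same width, which is the property argued for above. The trade-off is four bytes even for plain ASCII.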

Ok. Maybe computing is past infancy. But most development environments
are not much past toddler stage.

Robert Kern

Jan 24, 2006, 11:03:15 AM

[~]$ ssh rk...@192.168.1.66
rk...@192.168.1.66's password:
Linux rkernx2 2.6.12-9-amd64-generic #1 Mon Oct 10 13:27:39 BST 2005 x86_64
GNU/Linux

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
Last login: Mon Jan 9 12:40:28 2006 from 192.168.1.141
[~]$ cat > utf-8.txt
x + y ≥ z
[~]$ cat utf-8.txt
x + y ≥ z

Luck isn't involved.

Christoph Zwerschke

Jan 24, 2006, 11:31:34 AM

Juho Schultz wrote:
> Fortran 90 allowed >, >= instead of Fortran 77's .GT., .GE. But F90
> uses ! as the comment symbol and therefore needs /= instead of != for
> inequality. I guess just because they wanted to. However, it is one more
> needless detail to remember. Same with the suggested operators.

The point is that it is just *not* the same. The suggested operators are
universal symbols (Unicode). Nobody would use ≠ as a comment sign. No
need to remember whether it was .NE. or -ne or <> or != or /= ...

There is also the old dispute about using "=" for both assignment
and equality, and how it can confuse newcomers and cause errors.
Consistent use of Unicode could solve this problem:

a ← b # Assignment (now "a = b" in Python, a := b in Pascal)
a = b # Equality (now "a == b" in Python, a = b in Pascal)
a ≡ b # Identity (now "a is b" in Python, @a = @b in Pascal)
a ≈ b # Approximately equal (may be interesting for floats)

(I know this goes one step further, as it is incompatible with the existing
use of the = sign in Python.)
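For comparison, here is how the four proposed notations map onto what Python already offers. The approximate-equality line uses `math.isclose` (in the standard library since Python 3.5) as a stand-in for `≈`; its default relative tolerance of 1e-9 is only one possible choice.

```python
import math

b = [0.1 + 0.2]
a = b                # "a ← b": bind the name a to the object b refers to
c = [0.3]

print(a == b)        # "a = b"  (equality)  -> True
print(a is b)        # "a ≡ b"  (identity)  -> True
print(a == c)        # False: 0.1 + 0.2 is not exactly 0.3 in binary floating point
print(a is c)        # False: c is a different list object
print(math.isclose(a[0], c[0]))  # "a ≈ b" -> True
```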

Another aspect: Supporting such symbols would also be in accord with
Python's trait of being "executable pseudo code."

-- Christoph

Dave Hansen

Jan 24, 2006, 11:32:25 AM

On Tue, 24 Jan 2006 16:33:16 +0200 in comp.lang.python, Juho Schultz
<juho.s...@helsinki.fi> wrote:

[...]

>
>Fortran 90 allowed >, >= instead of Fortran 77's .GT., .GE. But F90
>uses ! as the comment symbol and therefore needs /= instead of != for
>inequality. I guess just because they wanted to. However, it is one more
>needless detail to remember. Same with the suggested operators.

C uses ! as a unary logical "not" operator, so != for "not equal" just
seems to follow, um, logically.

Pascal used <>, which intuitively (to me, anyway ;-) read "less than
or greater than," i.e., "not equal." Perl programmers might see a
spaceship.

Modula-2 used # for "not equal." I guess that wouldn't work well in
Python...

Regards,
-=Dave

--
Change is inevitable, progress is not.

Dave Hansen

Jan 24, 2006, 11:38:56 AM

On Tue, 24 Jan 2006 04:09:00 +0100 in comp.lang.python, Christoph
Zwerschke <ci...@online.de> wrote:

[...]

>Once you open your mind to using non-ASCII symbols, I'm sure one can
>find a bunch of useful applications. Variable names could be allowed to
>be non-ASCII, as in XML. Think class names in Arabic... Or you could
>use Greek letters if you run out of one-letter variable names, just as
>mathematicians do. Would this be desirable, or rather a horror scenario?

The latter, IMHO. Especially variable names. Consider i vs. ì vs. í
vs. î vs. ï vs. ...

Rocco Moretti

Jan 24, 2006, 11:44:51 AM

Robert Kern wrote:
> Rocco Moretti wrote:
>
> [James Stroud wrote:]
>
>>>>>I can't find "?, ?, or ?" on my keyboard.
>>
>>Posting code to newsgroups might get harder too. :-)
>
>
> His post made it through fine. Your newsreader messed it up.

I'm not exactly sure what happened - I can see the three characters
just fine in your (Robert's) and the original (Christoph's) post. In
Giovanni's post, they're rendered as question marks.

My point still stands: _somewhere_ along the way the rendering got messed
up for _some_ people - something that wouldn't have happened with the
<=, >= and != digraphs.
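One plausible way the symbols turn into question marks (an assumption; the thread does not establish which program did it) is a client re-encoding the text into a charset that lacks them, with unencodable characters replaced by `?`:

```python
text = 'I can\'t find "≤, ≥, or ≠" on my keyboard.'

# Encode to 7-bit ASCII, substituting "?" for anything that doesn't fit.
downgraded = text.encode("ascii", errors="replace").decode("ascii")
print(downgraded)  # I can't find "?, ?, or ?" on my keyboard.

# The ASCII digraphs survive the same round trip unchanged.
digraphs = "<=, >=, !="
assert digraphs.encode("ascii", errors="replace").decode("ascii") == digraphs
```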

(FWIW, my newsreader is Thunderbird 1.0.6.)

Claudio Grondi

Jan 24, 2006, 12:13:55 PM

Christoph Zwerschke wrote:
> Juho Schultz wrote:
>
>> Fortran 90 allowed >, >= instead of Fortran 77's .GT., .GE. But F90
>> uses ! as the comment symbol and therefore needs /= instead of != for
>> inequality. I guess just because they wanted to. However, it is one more
>> needless detail to remember. Same with the suggested operators.
>
>
> The point is that it is just *not* the same. The suggested operators are
> universal symbols (Unicode). Nobody would use ≠ as a comment sign. No
> need to remember whether it was .NE. or -ne or <> or != or /= ...
>
> There is also the old dispute about using "=" for both assignment
> and equality, and how it can confuse newcomers and cause errors.
> Consistent use of Unicode could solve this problem:
>

Having been involved in the discussion about assignment, and having looked
for new terms that don't cause confusion when explaining what assignment
does, I find this proposal to be a kind of solution:

> a ← b # Assignment (now "a = b" in Python, a := b in Pascal)

^-- this still seems open to further proposals and discussion. No
symbol comes to mind, but I would be glad if it expressed that 'a'
becomes a reference to the Python object currently referred to by the
identifier 'b' (maybe some kind of <-> ?).

> a = b # Equality (now "a == b" in Python, a = b in Pascal)
> a ≡ b # Identity (now "a is b" in Python, @a = @b in Pascal)
> a ≈ b # Approximately equal (may be interesting for floats)

^-- these three seem obvious to me and don't need further discussion
(only implementation, once the time for such things comes).

Claudio

Christoph Zwerschke

Jan 24, 2006, 1:01:12 PM

Rocco Moretti wrote:

> My point still stands: _somewhere_ along the way the rendering got messed
> up for _some_ people - something that wouldn't have happened with the
> <=, >= and != digraphs.

Yes, but Python is already a bit handicapped concerning posting code
anyway because of its significant whitespace. Also, I believe once
Python supports this, editors will allow converting "digraphs"
<=, >= and != to symbols back and forth, just as all editors learned to
convert tabs to spaces back and forth... And newsreaders and mailers are
also improving. Some years ago, I used to write all German umlauts as
digraphs because you could never be sure how they would arrive. Nowadays,
I use umlauts as something completely normal.

-- Christoph

Fredrik Lundh

Jan 24, 2006, 1:14:01 PM

Christoph Zwerschke wrote:

> > My point still stands: _somewhere_ along the way the rendering got messed
> > up for _some_ people - something that wouldn't have happened with the
> > <=, >= and != digraphs.
>
> Yes, but Python is already a bit handicapped concerning posting code
> anyway because of its significant whitespace. Also, I believe once
> Python supports this, editors will allow converting "digraphs"
> <=, >= and != to symbols back and forth

umm. if you have an editor that can convert things back and forth, you
don't really need language support for "digraphs"...

</F>

Christoph Zwerschke

Jan 24, 2006, 1:24:07 PM

UTF-8 has also been the standard encoding of SuSE Linux since version 9.1.
Both Vim and Emacs provide ways to enter Unicode. Vim even supports
digraph input, which would be particularly sensible in this case.

-- Christoph

Christoph Zwerschke

Jan 24, 2006, 1:38:54 PM

Claudio Grondi wrote:
> No symbol comes to mind, but I would be glad if it expressed
> that 'a' becomes a reference to the Python object currently
> referred to by the identifier 'b' (maybe some kind of <-> ?).

With Unicode, you have many ways to express this:

a ← b # a = b
a ⇐ b # a = copy(b)
a ⇚ b # a = deepcopy(b)
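The three arrows line up with Python's plain binding, `copy.copy`, and `copy.deepcopy`. A short sketch of how they differ on a nested list:

```python
import copy

b = [[1, 2], [3, 4]]
alias = b                  # a ← b : same object, no copy at all
shallow = copy.copy(b)     # a ⇐ b : new outer list, inner lists shared
deep = copy.deepcopy(b)    # a ⇚ b : new outer list and new inner lists

b[0].append(99)            # mutate an inner list through the original name

print(alias is b)          # True
print(shallow[0])          # [1, 2, 99]  shares the mutated inner list
print(deep[0])             # [1, 2]      unaffected
```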

-- Christoph

Christoph Zwerschke

Jan 24, 2006, 1:44:28 PM

Dave Hansen wrote:
> C uses ! as a unary logical "not" operator, so != for "not equal" just
> seems to follow, um, logically.

Consequently, C should have used !> for <= and !< for >= ...

-- Christoph

Christoph Zwerschke

Jan 24, 2006, 1:56:50 PM

Dave Hansen wrote:
>> Once you open your mind to using non-ASCII symbols, I'm sure one can
>> find a bunch of useful applications. Variable names could be allowed to
>> be non-ASCII, as in XML. Think class names in Arabic... Or you could
>> use Greek letters if you run out of one-letter variable names, just as
>> mathematicians do. Would this be desirable, or rather a horror scenario?
>
> The latter, IMHO. Especially variable names. Consider i vs. ì vs. í
> vs. î vs. ï vs. ...

There could be conventions discouraging the use of ambiguous symbols.
Even today, you wouldn't use a lowercase "l" or an uppercase "O" as a
name because they can be confused with the digits 1 and 0. But you're
right that this problem would become much greater with Unicode chars.
This kind of pitfall was already overlooked with the introduction of
internationalized domain names, which are exploitable for phishing attacks...
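A convention like this could be enforced by a linter. This is a minimal sketch assuming Python 3, which accepts non-ASCII identifiers (PEP 3131). The helper `non_ascii_names` is hypothetical; it uses the standard `tokenize` and `unicodedata` modules to report identifiers that contain non-ASCII characters.

```python
import io
import tokenize
import unicodedata

def non_ascii_names(source: str) -> list:
    """Return (line, identifier, names of its non-ASCII chars) for each suspicious name."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not tok.string.isascii():
            odd = [unicodedata.name(ch) for ch in tok.string if not ch.isascii()]
            hits.append((tok.start[0], tok.string, odd))
    return hits

src = "limit = 10\nl\u00ecmit = 20\n"   # the second name uses ì (U+00EC)
for line, name, chars in non_ascii_names(src):
    print(line, name, chars)            # 2 lìmit ['LATIN SMALL LETTER I WITH GRAVE']
```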

-- Christoph

Claudio Grondi

Jan 24, 2006, 1:59:50 PM

^-- with the above, the notation

a ← b # a = b

also starts to seem obvious to me, as it covers some of the specifics
of Python.

Nice idea.

Claudio

Christoph Zwerschke

Jan 24, 2006, 2:05:54 PM

Fredrik Lundh wrote:
> umm. if you have an editor that can convert things back and forth, you
> don't really need language support for "digraphs"...

It would just be very impractical to convert back and forth every time
you want to run a program. Python also supports both tabs AND spaces,
even though you can easily convert between them.

But indeed, in 100 years or so ;-) if people get accustomed to using
these symbols and input becomes easy, digraph support could become
optional and then be phased out... just as is now happening with C trigraphs.

-- Christoph

Dave Hansen

Jan 24, 2006, 2:06:49 PM

Well, actually, no.

"Less (than) or equal" is <=. "Greater (than) or equal" is >=. "Not
equal" is !=.

If you want to write code for the IOCCC, you could use !(a>b) instead
of a<=b...
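The rewrite `!(a>b)` for `a<=b` only holds when values are totally ordered. Here is a sketch in Python, where `!` is spelled `not`, showing that it breaks for a floating-point NaN. The name `le_via_not_gt` is made up for the example.

```python
def le_via_not_gt(a, b):
    """The IOCCC-style spelling of a <= b."""
    return not (a > b)

nan = float("nan")
for a, b in [(1, 2), (2, 2), (3, 2), (nan, 2.0)]:
    print(a, b, a <= b, le_via_not_gt(a, b))
# The last row prints "nan 2.0 False True": every ordering comparison with
# NaN is False, so negating ">" does not give "<=".
```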

Steven D'Aprano

Jan 24, 2006, 4:26:16 PM

On Tue, 24 Jan 2006 10:38:56 -0600, Dave Hansen wrote:

> The latter, IMHO. Especially variable names. Consider i vs. ì vs. í
> vs. î vs. ï vs. ...

Agreed, but that's the programmer's fault for choosing stupid variable
names. (One character names are almost always a bad idea. Names which can
be easily misread are always a bad idea.) Consider how easy it is to
shoot yourself in the foot with plain ASCII:

l1 = 0
l2 = 4
...
pages of code
...
assert 11 + l2 = 4

--
Steven.

Dave Hansen

Jan 24, 2006, 4:58:35 PM

On Wed, 25 Jan 2006 08:26:16 +1100 in comp.lang.python, Steven
D'Aprano <st...@REMOVETHIScyber.com.au> wrote:

>On Tue, 24 Jan 2006 10:38:56 -0600, Dave Hansen wrote:
>
>> The latter, IMHO. Especially variable names. Consider i vs. ì vs. í
>> vs. î vs. ï vs. ...
>
>Agreed, but that's the programmer's fault for choosing stupid variable
>names. (One character names are almost always a bad idea. Names which can
>be easily misread are always a bad idea.) Consider how easy it is to

I wasn't necessarily expecting single-character names. Indeed, the
difference between i and ì is easier to see than the difference
between, say, long_variable_name and long_varìable_name. For me,
anyway.
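Python 3 does accept such names (PEP 3131) and normalizes identifiers to NFKC. However, NFKC does not strip accents, so the two spellings remain two distinct variables. A minimal sketch:

```python
import unicodedata

plain = "long_variable_name"
accented = "long_var\u00ecable_name"   # ì (U+00EC) in place of i

print(plain == accented)                                   # False
print(unicodedata.normalize("NFKC", accented) == plain)   # False: NFKC keeps the accent

namespace = {}
exec(f"{plain} = 1\n{accented} = 2", namespace)
print(namespace[plain], namespace[accented])               # 1 2
```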

>shoot yourself in the foot with plain ASCII:
>
>
>l1 = 0
>l2 = 4
>...
>pages of code
>...
>assert 11 + l2 = 4

You've shot yourself twice, there. Python would tell you about the
second error, though.
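Both shots can be checked directly. In this sketch, the digit-for-letter slip silently changes the sum, while the `=` inside the `assert` is rejected at compile time. The exact error message varies between Python versions, so it is not shown.

```python
l1 = 0
l2 = 4
print(l1 + l2)   # 4: the intended sum
print(11 + l2)   # 15: the literal 11 typed instead of the name l1

try:
    compile("assert 11 + l2 = 4", "<example>", "exec")
except SyntaxError:
    print("SyntaxError: '=' is assignment; comparison is '=='")
```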

Steven D'Aprano

Jan 24, 2006, 5:15:29 PM

On Tue, 24 Jan 2006 15:58:35 -0600, Dave Hansen wrote:

> On Wed, 25 Jan 2006 08:26:16 +1100 in comp.lang.python, Steven
> D'Aprano <st...@REMOVETHIScyber.com.au> wrote:
>
>>On Tue, 24 Jan 2006 10:38:56 -0600, Dave Hansen wrote:
>>
>>> The latter, IMHO. Especially variable names. Consider i vs. ì vs. í
>>> vs. î vs. ï vs. ...
>>
>>Agreed, but that's the programmer's fault for choosing stupid variable
>>names. (One character names are almost always a bad idea. Names which can
>>be easily misread are always a bad idea.) Consider how easy it is to
>
> I wasn't necessarily expecting single-character names. Indeed, the
> difference between i and ì is easier to see than the difference
> between, say, long_variable_name and long_varìable_name. For me,
> anyway.

Sure. But that's no worse than pxfoobrtnamer and pxfoobtrnamer.

I'm not saying that adding more characters to the mix won't increase the
opportunity to pick bad names. But this isn't a new problem, it is an old
problem.

>>shoot yourself in the foot with plain ASCII:
>>
>>
>>l1 = 0
>>l2 = 4
>>...
>>pages of code
>>...
>>assert 11 + l2 = 4
>
> You've shot yourself twice, there.

Deliberately so. The question is, in real code without the assert, should
the result of the addition be 4, 12, 15 or 23?

--
Steven.

James Stroud

Jan 25, 2006, 12:11:49 AM

Robert Kern wrote:
> James Stroud wrote:
>
>>I can't find "≤, ≥, or ≠" on my keyboard.
>
> Get a better keyboard? or OS?

Please talk to my boss. Tell him I want a Quad G5 with about 2 GB of RAM.
I'll buy the keyboard myself, no problemo.

> On OS X,
>
> ≤ is Alt-,
> ≥ is Alt-.
> ≠ is Alt-=
>
> Fewer keystrokes than <= or >= or !=.
>

James

Robert Kern

Jan 25, 2006, 2:14:22 AM

James Stroud wrote:
> Robert Kern wrote:
>
>>James Stroud wrote:
>>
>>>I can't find "≤, ≥, or ≠" on my keyboard.
>>
>>Get a better keyboard? or OS?
>
> Please talk to my boss. Tell him I want a Quad G5 with about 2 GB of RAM.
> I'll buy the keyboard myself, no problemo.

Alternatively, you can simply learn to use the tools in front of you.

http://www.cl.cam.ac.uk/~mgk25/unicode.html#input

Bengt Richter

Jan 25, 2006, 6:28:18 AM

On Tue, 24 Jan 2006 04:09:00 +0100, Christoph Zwerschke <ci...@online.de> wrote:

>On the page http://wiki.python.org/moin/Python3%2e0Suggestions
>I noticed an interesting suggestion:
>
>"These operators ≤ ≥ ≠ should be added to the language having the
>following meaning:
>
> <= >= !=
>
>this should improve readability (and make the language more accessible
>to beginners).
>
>This should be an evolution similar to the digraphs and trigraphs
>of the C and C++ languages."
>
>How do people on this group feel about this suggestion?
>
>The symbols above are not even in Latin-1; you need UTF-8.
>

Maybe we need a Python unisource type that is abstract like unicode and,
through encoding, can be rendered in various ways. Of course it would have
an internal representation in some encoding, probably UTF-16LE, but glyphs
for operators and such would be normalized, and could then be rendered
as multi-glyph sequences or special characters as desired. This means that
unisource would not just be the result of decoding a character encoding
like Latin-1, but the result of decoding source in a Python-syntax-sensitive
way, distinguishing <= as a relational operator from '<=' in a string
literal or comment, etc.
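Python's standard `tokenize` module already provides the syntax-sensitive part of this idea. The sketch below (the `render` helper and the glyph table are hypothetical) swaps comparison operators for Unicode glyphs only where the tokenizer sees an operator token, leaving string literals and comments alone. It is a display transform, not a new source type: the output is not valid Python.

```python
import io
import tokenize

# Hypothetical display table: ASCII operator -> Unicode glyph.
GLYPHS = {"<=": "≤", ">=": "≥", "!=": "≠"}

def render(source: str) -> str:
    """Replace comparison operators with glyphs, but only in OP tokens."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.OP and tok.string in GLYPHS:
            tok = tok._replace(string=GLYPHS[tok.string])
        out.append(tok)
    # With full 5-tuples, untokenize rebuilds spacing from the original positions.
    return tokenize.untokenize(out)

src = "ok = a <= b != c  # keep <= here\ns = '>='\n"
print(render(src))
```

The operators in the first line become ≤ and ≠, while the `<=` in the comment and the `'>='` literal are left as they were.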

>(There are not many useful symbols in Latin-1. Maybe one could use ×
>for Cartesian products...)
>
>And while they are more readable, they are not easier to type (at
>least with most current editors).
>

>Is this idea absurd, or will our children one day think that restricting
>ourselves to 7-bit ASCII was absurd?

I think it's important to have readable ASCII representations available,
at least for programming elements.

>
>Are there similar attempts in other languages? I can only think of APL,
>but that was a long time ago.
>

>Once you open your mind to using non-ASCII symbols, I'm sure one can
>find a bunch of useful applications. Variable names could be allowed to
>be non-ASCII, as in XML. Think class names in Arabic... Or you could
>use Greek letters if you run out of one-letter variable names, just as
>mathematicians do. Would this be desirable, or rather a horror scenario?
>Opinions?

I think there are pros and cons. What if the "href" in HTML could be
spelled in any characters? I.e., some things are part of a standard
encoding and representation system. Some of Python is like that. "True"
should not be spelled "Vrai" or "Sant," except in localized messages,
IMO, unless perhaps there is a unisource type that normalizes these
things too, and can render in localized formats. ... I guess China is a
pretty big market, so I wonder what they will do.

Someone has to get really excited about it, and have the expertise or willingness
to slog their way to expertise, and the persistence to get something done. And all
that in the face of the fact that much of the problem will be engineering consensus,
not engineering technical solutions. So are you excited? Good luck ;-)

Probably the best anyone with any excitement to spare could do is ask Martin
what he could use help with, if anything. He'd probably not like muddying any
existing clear visions and plans with impractical ramblings though ;-)

Regards,
Bengt Richter

Ido Yehieli

Jan 25, 2006, 8:07:46 AM

I still remember it not being supported on most or all big iron servers
at my previous uni (mostly SunOS and Digital UNIX, among others).

Peter Hansen

Jan 25, 2006, 11:14:06 AM

Dave Hansen wrote:
> C uses ! as a unary logical "not" operator, so != for "not equal" just
> seems to follow, um, logically.
>
> Pascal used <>, which intuitively (to me, anyway ;-) read "less than
> or greater than," i.e., "not equal."

For quantitative data, anyway, or things which can be ordered consistently.

It's unclear to me how well this concept maps to other sorts of data.
Complex numbers, for example.

I think "not equal", at least the way our brains handle it in general,
is not equivalent to "less than or greater than".

That is, I think the concept "not equal" is less than or greater than
the concept "less than or greater than". <wink>

-Peter

Steven D'Aprano

Jan 25, 2006, 6:23:11 PM

On Wed, 25 Jan 2006 11:14:06 -0500, Peter Hansen wrote:

> I think "not equal", at least the way our brains handle it in general,
> is not equivalent to "less than or greater than".
>
> That is, I think the concept "not equal" is less than or greater than
> the concept "less than or greater than". <wink>

For objects that don't have total ordering, "not equal" != is not the
same as "less than or greater than" <>.

The two obvious examples are complex numbers, where C1 != C2 can be
evaluated, but C1 <> C2 is not defined, and NaNs, where NaN != NaN is
always true but NaN <> NaN is undefined.
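Both examples can be tried in Python 3. Since Python 3 dropped the `<>` spelling, "less than or greater than" is written out here as `a < b or a > b`. A sketch:

```python
nan = float("nan")
print(nan != nan)               # True: NaN is unequal to everything, itself included
print(nan < nan or nan > nan)   # False: NaN is unordered, so both comparisons fail

z1, z2 = 1 + 2j, 3 + 4j
print(z1 != z2)                 # True: equality is defined for complex numbers
try:
    z1 < z2                     # ordering is not
except TypeError as err:
    print("TypeError:", err)
```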

--
Steven.

Terry Hancock

Jan 25, 2006, 6:43:21 PM

On Tue, 24 Jan 2006 04:09:00 +0100
Christoph Zwerschke <ci...@online.de> wrote:
> On the page
> http://wiki.python.org/moin/Python3%2e0Suggestions
> I noticed an interesting suggestion:
>
> "These operators ≤ ≥ ≠ should be added to the
> language having the following meaning:
>
> <= >= !=
>
> this should improve readability (and make the language more
> accessible to beginners).
>
> This should be an evolution similar to the digraphs and
> trigraphs of the C and C++ languages."
>
> How do people on this group feel about this suggestion?

In principle, and in the long run, I am definitely for it.

Pragmatically, though, there are still a lot of places
where it would cause me pain. For example, it exposes
problems even in reading this thread in my mail client
(which is ironic, considering that it manages to correctly
render Russian and Japanese spam messages. Grrr.).
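Garbling of this kind, where "≤" shows up as "â‰¤", is the classic signature of UTF-8 bytes being decoded as Windows-1252. That diagnosis is an assumption, not something the thread confirms. The effect, and its reversal, can be reproduced with the standard codecs:

```python
text = "≤ ≥ ≠"

# UTF-8 bytes misread as Windows-1252 (cp1252) produce the familiar mojibake.
garbled = text.encode("utf-8").decode("cp1252")
print(repr(garbled))  # 'â‰¤ â‰¥ â‰\xa0'

# Undoing the wrong decode step recovers the original characters.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == text)  # True
```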

OTOH, there will *always* be backwards systems, so you
can't wait forever to move to using newer features.

> The symbols above are not even in Latin-1; you need UTF-8.
>
> And while they are more readable, they are not easier
> to type (at least with most current editors).

They're not that bad. I manage to get kana and kanji working
correctly when I really need them.

> Are there similar attempts in other languages? I can only
> think of APL, but that was a long time ago.

I'm pretty sure that there are. The idea of allowing UTF-8 for
use in identifiers and stuff has been around for a while for
Python. I'm pretty sure you can do this already in Java,
can't you? (I think I read this somewhere, but I don't
think it gets used much.)

> Once you open your mind to using non-ASCII symbols, I'm
> sure one can find a bunch of useful applications.
> Variable names could be allowed to be non-ASCII, as in
> XML. Think class names in Arabic... Or you could use
> Greek letters if you run out of one-letter variable names,
> just as mathematicians do. Would this be desirable, or
> rather a horror scenario? Opinions?

Greek letters would be a real relief in writing scientific
software. There's something deeply annoying about variables
named THETA, theta, and Theta. Or "w" meaning "omega".

People coming from other programming backgrounds may object
that these uses are less informative. But in the sciences,
some of these symbols have as much recognizability as "+" or
"$" do to other people. Reading math notation from a
scientist, I can be pretty darned certain that "c" is "the
speed of light" or that "epsilon" is a small, allowable
variation in a variable. And so on. It's true that there are
occasional problems when problem domains merge, but that's
true of words, too.

It would also reduce the difficulty of going back and forth
between the paper describing the math, and the program
using it.
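Python 3 later adopted exactly this for identifiers (PEP 3131), though not for operators: `≤` is still a syntax error. A small sketch; `rotation_angle` is a made-up example function.

```python
import math

def rotation_angle(θ0: float, ω: float, t: float) -> float:
    """θ(t) = θ0 + ω·t for constant angular velocity ω (in rad/s)."""
    return θ0 + ω * t

print(rotation_angle(0.0, math.pi, 2.0))  # 6.283185307179586, i.e. 2π

# Greek letters are valid identifier characters, but math symbols are not operators.
try:
    compile("a ≤ b", "<example>", "eval")
except SyntaxError:
    print("≤ is not a Python operator")
```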

One thing that I also think would be good is to open up the
operator set for Python. Right now you can overload the
existing operators, but you can't easily define new ones.
And even if you do, you are very limited in what you can
use, and understandability suffers.

But Unicode provides code blocks for the special operators
that mathematicians use ("circle-times", etc.). That would
both reduce confusion for people bothered by weird choices
of overloading "*" and "+" and give people who need these
features the ability to use them.
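Python still cannot define new operator symbols, but Python 3.5 did add one extra overloadable operator, `@` (PEP 465), intended for matrix multiplication. The sketch below uses it to keep two kinds of product apart instead of overloading `*` twice. The `Vec` class is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vec:
    x: float
    y: float

    def __mul__(self, k: float) -> "Vec":
        """Scalar multiplication: v * k."""
        return Vec(self.x * k, self.y * k)

    def __matmul__(self, other: "Vec") -> float:
        """Dot product: v @ w, via the operator added by PEP 465."""
        return self.x * other.x + self.y * other.y

v, w = Vec(1, 2), Vec(3, 4)
print(v * 2)   # Vec(x=2, y=4)
print(v @ w)   # 11
```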

It's also relevant that scientists in China and Saudi Arabia
probably use a Roman "c" for the speed of light, or a "mu"
to represent a mass, so it's likely more understandable
internationally than using, say "lightspeed" and "mass".

OTOH, using identifiers in many different languages would
have the opposite effect. Right now, English is accepted as
a lingua franca for programming (and I admit that as a
native speaker of English, I benefit from that), but if it
became common practice to use lots of different languages,
cooperation might suffer.

But then, that's probably why English still dominates with
Java. I suspect that just means people wouldn't use it as
much. And I've certainly dealt with source code commented
in Spanish or German. It didn't kill me.

So, I'd say that in the long run:

1) Yes it will be adopted

2) The math and greek-letter type symbols will be the big
win

3) Localized variable names will be useful to some people,
but not widely popular, especially for cooperative free
software projects (of course, in the Far East, for example,
han character names might become very popular as they span
several languages). But I bet it will remain underused so
long as English remains the most popular international trade
language.

In the meantime, though, I predict many luddites will
scream "But it doesn't work on my vintage VT-220 terminal!"
(And I may even be one of them).

Cheers,
Terry

--
Terry Hancock (han...@AnansiSpaceworks.com)
Anansi Spaceworks http://www.AnansiSpaceworks.com

Christoph Zwerschke

Jan 25, 2006, 11:23:03 PM

These were some interesting remarks, Terry.

I just asked myself how Chinese programmers feel about this. I don't
know Chinese, but probably they could write a whole program using only
one-character names for variables, and it would still be readable (at
least for Chinese)... Would this be used, or would they rather write
in English on account of compatibility issues (technical, and human
readability in international projects), or because typing these chars is
more cumbersome than ASCII chars? Any Chinese here?

-- Christoph

Terry Hancock

Jan 26, 2006, 10:44:52 AM

to pytho...@python.org

On Thu, 26 Jan 2006 01:12:10 -0600
Runsun Pan <pytho...@gmail.com> wrote:
> For the tests that I tried earlier, using han characters
> as the variable names doesn't seem to be possible (Syntax
> Error) in python. I'd love to see if I can use han char
> for all those keywords like import, but it doesn't work.

Yeah, I'm pretty sure we're talking about the future here.
:-)

> That depends. People who are middle-aged or older
> probably have very little experience typing han
> characters. But with the popularity of computers,
> the development of excellent input packages,
> and most importantly,
> the online chats that many teenagers are hooked on, the
> next several generations can type han chars easily and
> comfortably.

That's interesting. I think many people in the West tend to
imagine han/kanji characters as archaisms that will
disappear (because to most Westerners they seem impossibly
complex to learn and use, "not suited for the modern
world"). I used to think this was likely, although I always
thought the characters were beautiful, so it would be a
shame.

After taking a couple of semesters of Japanese, though, I've
come to appreciate why they are preferred. Getting rid of
them would be like convincing English people to kunvurt to
pur fonetik spelin'.

Which isn't happening either, I can assure you. ;-)

> One thing that is lacking in other languages is "phrase
> input" -- almost every han input package provides this
> customizable feature. With all these combined, many
> youngsters can type as fast as they talk. I believe many
> of them input han characters much faster than they input
> English.

I guess this is like Canna/SKK server for typing Japanese.
I've never tried to localize my desktop to Japanese (and I
don't think I want to -- I can't read it all that well!),
but I've used kanji input in Yudit and a kanji-enabled
terminal.

I'm not sure I understand how this works, but surely if
Python can provide readline support in the interactive
shell, it ought to be able to handle "phrase input"/"kanji
input." Come to think of it, you probably can do this by
running the interpreter in a kanji terminal -- but Python
just doesn't know what to do with the characters yet.

> The "side effect" of this technological advance might be
> that in the future the simplified Chinese characters will
> be deprecated, because there's no need to simplify them
> any more.

Heh. I must say the traditional characters are easier for
*me* to read. But that's probably because the Japanese kanji
are based on them, and that's what I learned. I never could
get the hang of "grass hand" or the "cursive" Chinese han
character style.

I would like to point out also, that as long as Chinese
programmers don't go "hog wild" and use obscure characters,
I suspect that I would have much better luck reading their
programs with han characters, than with, say, the Chinese
phonetic names! Possibly even better than what they thought
were the correct English words, if their English isn't that
good.

Rocco Moretti

Jan 26, 2006, 10:38:56 AM

Terry Hancock wrote:

> One thing that I also think would be good is to open up the
> operator set for Python. Right now you can overload the
> existing operators, but you can't easily define new ones.
> And even if you do, you are very limited in what you can
> use, and understandability suffers.

One of the issues that would need to be dealt with in allowing new
operators to be defined is how to work out precedence rules for the new
operators. Right now you can redefine the meaning of addition and
multiplication, but you can't change the order of operations. (Witness
%, which must have the precedence of multiplication whether it means
modulo or string formatting.)

If you allow (semi)arbitrary characters to be used as operators, some
scheme must be chosen for assigning a place in the precedence hierarchy.
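Rocco's `%` example is easy to check: whichever meaning `%` has at run time, it parses at the precedence of `*` and groups left to right. A quick Python check; the variable names are only for illustration:

```python
# "%" always has the precedence of "*", whatever it means at run time.
formatted = "%d-" % 2 * 3    # ("%d-" % 2) * 3, i.e. "2-" repeated 3 times
grouped = "%d-" % (2 * 3)    # parentheses are needed to format the product
remainder = 2 + 10 % 3       # 2 + (10 % 3)

print(formatted, grouped, remainder)   # 2-2-2- 6- 3
```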

Claudio Grondi

Jan 26, 2006, 11:47:51 AM

Speaking maybe only for myself:
I don't like implicit rules, so I also don't like relying on any precedence
hierarchy, and for safety reasons I always write even
8+6*2 (==20) as 8+(6*2) to be sure all will go the way I expect it.
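For anyone who prefers to see the grouping explicitly, the standard `ast` module can show how Python actually parsed an expression. A small sketch, assuming Python 3.8 or later (where numbers parse as `ast.Constant`); the helper name `parenthesize` is made up and handles only names, numbers, and a few binary operators:

```python
import ast

# Map AST operator classes back to their source symbols.
SYMBOLS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*",
           ast.Div: "/", ast.Mod: "%", ast.Pow: "**"}


def parenthesize(expr: str) -> str:
    """Return expr with every binary operation wrapped in parentheses,
    showing the grouping that Python's precedence rules produce."""
    def walk(node: ast.AST) -> str:
        if isinstance(node, ast.BinOp):
            op = SYMBOLS[type(node.op)]
            return f"({walk(node.left)}{op}{walk(node.right)})"
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.Name):
            return node.id
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")
    return walk(ast.parse(expr, mode="eval").body)


print(parenthesize("8+6*2"))     # (8+(6*2))
print(parenthesize("2**3**2"))   # (2**(3**2)), since ** groups right to left
```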

Claudio

Christoph Zwerschke

Jan 26, 2006, 6:32:05 PM

Claudio Grondi wrote:
> Speaking maybe only for myself:
> I don't like implicit rules, so I also don't like relying on any precedence
> hierarchy, and for safety reasons I always write even
> 8+6*2 (==20) as 8+(6*2) to be sure all will go the way I expect it.

But for people who often use mathematical formulas this looks pretty
weird. If it wasn't a programming language, you wouldn't write an
asterisk at all, but either a mid dot or nothing. The latter is possible
because, unlike programming languages, formulas usually use one-letter
names, so it is clear that ab means a*b, and does not
designate a variable with the name "ab". x**2+y**2+(2*pi*r) looks way
uglier than x²+y²+2πr (another application for Greek letters). Maybe
providing a "formula" or "math style" mode would be sometimes helpful.
Or maybe not, because other conventions of mathematical formulas (long
fraction strokes, using subscript indices and superscript exponents
etc.) couldn't be solved so easily anyway. You would need editors with
the ability to display and input "formula sections" in Python programs
differently. Python would become something like "executable TeX" rather
than "executable pseudo code"...

-- Christoph

Bengt Richter

Jan 27, 2006, 3:11:24 AM

Maybe you would like the unambiguousness of
(+ 8 (* 6 2))
or
6 2 * 8 +
?

Hm, ... ISTM you could have a concept of all objects as potential operator
objects as now, but instead of selecting methods of the objects according
to special symbols like + - * etc, allow method selection by rules applied
to a sequence of objects for selecting methods. E.g., say
a, X, b, Y, c
is a sequence of objects (happening to be contained in a tuple expression here).
Now let's define seqeval such that
seqeval((a, X, b, Y, c))
looks at the objects to see if they have certain methods, and then calls some of
those methods with some of the other objects as arguments, and applies rules of
precedence and association to do something useful, producing a final result.

I'm just thinking out loud here, but what I'm getting at is being able to write
8+6*2
as
seqeval((8, PLUS, 6, TIMES, 2))
with the appropriate definitions of seqeval and PLUS and TIMES. This is with a view
to having seqeval as a builtin that does standard processing, and then having
a language change to make white-space-separated expressions like
8 PLUS 6 TIMES 2
be syntactic sugar for an implicit
seqeval((8, PLUS, 6, TIMES, 2))
where PLUS and TIMES may be arbitrary user-defined objects suitable for seqeval.
I'm thinking out loud, so I anticipate syntactic ambiguities in expressions and the need to
use parens etc., but this would in effect let us define arbitrarily named operators.
Precedence might be established by looking for PLUS.__precedence__. But as usual,
parens would control precedence dominantly. E.g.,
(8 PLUS 6) TIMES 2
would be sugar for
seqeval((seqeval((8, PLUS, 6)), TIMES, 2))

IOW, we have an object sequence expression analogous to a tuple expression without commas.
I guess generator expressions might be somewhat of a problem to disambiguate sometimes, we'll see
how bad that gets ;-)

One way to detect operator objects would be to test callable(obj), which would allow
for functions and types and bound methods etc. Now there needs to be a way of
handling UNARY_PLUS vs PLUS functionality (obviously the name bindings are just mnemonic
and aren't seen by seqeval unless they're part of the operator object). ...

A sketch:

>>> def seqeval(objseq):
...     """evaluate an object sequence. rules tbd."""
...     args = []
...     ops = []
...     for obj in objseq:
...         if callable(obj):
...             # reduce every pending operator of equal or higher precedence
...             while ops and obj.__precedence__ <= ops[-1].__precedence__:
...                 args[-2:] = [ops.pop()(*args[-2:])]
...             ops.append(obj)
...             continue
...         elif isinstance(obj, tuple):
...             obj = seqeval(obj)  # parenthesized sub-sequence
...         while len(args) == 0 and ops:  # unary
...             obj = ops.pop()(obj)
...         args.append(obj)
...     while ops:
...         args[-2:] = [ops.pop()(*args[-2:])]
...     return args[-1]
...
>>> def PLUS(x, y=None):
...     print('PLUS(%s, %s)' % (x, y))
...     if y is None: return x
...     else: return x + y
...
>>> PLUS.__precedence__ = 1
>>>
>>> def MINUS(x, y=None):
...     print('MINUS(%s, %s)' % (x, y))
...     if y is None: return -x
...     else: return x - y
...
>>> MINUS.__precedence__ = 1
>>>
>>> def TIMES(x, y):
...     print('TIMES(%s, %s)' % (x, y))
...     return x * y
...
>>> TIMES.__precedence__ = 2
>>>
>>> seqeval((8, PLUS, 6, TIMES, 2))
TIMES(6, 2)
PLUS(8, 12)
20
>>> seqeval(((8, PLUS, 6), TIMES, 2))
PLUS(8, 6)
TIMES(14, 2)
28
>>> seqeval(((8, PLUS, 6), TIMES, (MINUS, 2)))
PLUS(8, 6)
MINUS(2, None)
TIMES(14, -2)
-28
>>> seqeval((MINUS, (8, PLUS, 6), TIMES, (MINUS, 2)))
PLUS(8, 6)
MINUS(14, None)
MINUS(2, None)
TIMES(-14, -2)
28
>>> list(seqeval((i, TIMES, j, PLUS, k)) for i in (2,3) for j in (10,100) for k in (5,7))
TIMES(2, 10)
PLUS(20, 5)
TIMES(2, 10)
PLUS(20, 7)
TIMES(2, 100)
PLUS(200, 5)
TIMES(2, 100)
PLUS(200, 7)
TIMES(3, 10)
PLUS(30, 5)
TIMES(3, 10)
PLUS(30, 7)
TIMES(3, 100)
PLUS(300, 5)
TIMES(3, 100)
PLUS(300, 7)
[25, 27, 205, 207, 35, 37, 305, 307]

Regards,
Bengt Richter

Claudio Grondi

Jan 27, 2006, 3:47:15 AM

At first glance I like this concept very much and think it is very
Pythonic in the sense of the term as I understand it. I would be glad to
see it implemented if it does not result in any side effects or other
problems I can't currently anticipate.

Claudio

Magnus Lycka

Jan 27, 2006, 5:05:15 AM

Terry Hancock wrote:
> That's interesting. I think many people in the West tend to
> imagine han/kanji characters as archaisms that will
> disappear (because to most Westerners they seem impossibly
> complex to learn and use, "not suited for the modern
> world").

I don't know about "the West". Isn't it more typical for the
US that people believe that "everybody really wants to be like
us". Here in Sweden, *we* obviously want to be like you, even
if we don't admit it openly, but we don't suffer from the
misconception that this applies to all of the world. ;)

> After taking a couple of semesters of Japanese, though, I've
> come to appreciate why they are preferred. Getting rid of
> them would be like convincing English people to kunvurt to
> pur fonetik spelin'.
>
> Which isn't happening either, I can assure you. ;-)

The Germans just had a spelling reform. Norway had a major
language reform in the mid 19th century to get rid of the old
Danish influences (and still have two completely different ways
of spelling everything). You never know what will happen. You
are also embracing the metric system, inch by inch... ;)

Actually, it seems that the recent habit of sending text messages
via mobile phones is the prime driver for reformed spelling
these days.

> I'm not sure I understand how this works, but surely if
> Python can provide readline support in the interactive
> shell, it ought to be able to handle "phrase input"/"kanji
> input." Come to think of it, you probably can do this by
> running the interpreter in a kanji terminal -- but Python
> just doesn't know what to do with the characters yet.

I'm sure the same principles could be used to make a very fast
and less misspelling prone editing environment though. That
could actually be a reason to step away from vi or Emacs (but
I assume it would soon work in Emacs too...)

> I would like to point out also, that as long as Chinese
> programmers don't go "hog wild" and use obscure characters,
> I suspect that I would have much better luck reading their
> programs with han characters, than with, say, the Chinese
> phonetic names! Possibly even better than what they thought
> were the correct English words, if their English isn't that
> good.

You certainly have a point there. Even when I don't work in an
English speaking environment as I do now, I try to write all
comments and variable names etc in English. You never know when
you need to show a code snippet to people who don't read Swedish.
Also, ASCII lacks three of our letters, and a properly translated
name is often better than one written with the wrong letters.

On the other hand, if the target users describe their problem
domain with e.g. a Swedish terminology, translating all terms
will take time and increase confusion. Also, there are plenty
of programmers who don't write English so well...

Dave Hansen

Jan 27, 2006, 10:27:10 AM

Just a couple half-serious responses to your comment...

On Fri, 27 Jan 2006 11:05:15 +0100 in comp.lang.python, Magnus Lycka
<ly...@carmen.se> wrote:

>Terry Hancock wrote:
>> That's interesting. I think many people in the West tend to
>> imagine han/kanji characters as archaisms that will
>> disappear (because to most Westerners they seem impossibly
>> complex to learn and use, "not suited for the modern
>> world").
>I don't know about "the West". Isn't it more typical for the
>US that people believe that "everybody really wants to be like
>us". Here in Sweden, *we* obviously want to be like you, even
>if we don't admit it openly, but we don't suffer from the
>misconception that this applies to all of the world. ;)

1) Actually, we don't think "everyone wants to be like us." More like
"anyone who doesn't want to be like us is weird."

2) This extends to our own fellow citizens.

Dave Hansen

Jan 27, 2006, 10:28:23 AM

On Fri, 27 Jan 2006 08:11:24 GMT in comp.lang.python, bo...@oz.net
(Bengt Richter) wrote:

[...]

>Maybe you would like the unambiguousness of
> (+ 8 (* 6 2))
>or
> 6 2 * 8 +
>?

Well, I do like lisp and Forth, but would prefer Python to remain
Python.

Though it's hard to fit Python into 1k on an 8-bit microcontroller...

Runsun Pan

Jan 27, 2006, 1:50:03 PM

to pytho...@python.org

On 1/27/06, Magnus Lycka <ly...@carmen.se> wrote:
> > After taking a couple of semesters of Japanese, though, I've
> > come to appreciate why they are preferred. Getting rid of
> > them would be like convincing English people to kunvurt to
> > pur fonetik spelin'.
> >
> > Which isn't happening either, I can assure you. ;-)
> The Germans just had a spelling reform. Norway had a major
> language reform in the mid 19th century to get rid of the old
> Danish influences (and still have two completely different ways
> of spelling everything). You never know what will happen. You
> are also embracing the metric system, inch by inch... ;)

Simplified Chinese exists because of the call for modernization of the
language decades ago. That involved turning almost the entire culture
'upside-down' -- nowadays people in China can't even read most of
the documents written just 70~80 years ago. Imagine the damage
to the 'historical sense' of modern Chinese!!! The "anti-simplification"
force was, as you can imagine, huge. Actually, not only was the original
plan of simplification never completed (it only proceeded to the 1st
stage; the 2nd stage was put off), there have lately been calls for a
reversal -- back to the traditional forms. Obviously, language reform is not
trivial; especially for Asian countries, it is probably not as easy as it
is for Western countries.

China is still a centralized authoritarian country. Even with that government
they were unable to push this through. If anyone even dreamed of
language reform in democratic Taiwan, I bet the proposal wouldn't even
pass the first step in the congress.

> Actually, it seems that the recent habit of sending text messages
> via mobile phones is the prime driver for reformed spelling
> these days.

Well, to solve the problem you can either (1) reform the spelling
of a language to meet the limitations of mobile phones, or (2)
advance the input devices on mobile phones so that they
can input the language of your choice. For most Asian languages,
(1) is certainly out of the question.

> > I'm not sure I understand how this works, but surely if
> > Python can provide readline support in the interactive
> > shell, it ought to be able to handle "phrase input"/"kanji
> > input." Come to think of it, you probably can do this by
> > running the interpreter in a kanji terminal -- but Python
> > just doesn't know what to do with the characters yet.
> I'm sure the same principles could be used to make a very fast
> and less misspelling prone editing environment though. That
> could actually be a reason to step away from vi or Emacs (but
> I assume it would soon work in Emacs too...)

True. Actually Google, Answers.com and some other desktop
applications already use an 'auto-complete' feature. It might seem
impressive to most Western users, but where I come from
(Taiwan), this 'phrase input', as well as listing candidates in
order of how frequently each specific user picks them, has been
around for about 20 years.
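The frequency ordering described here is simple to model: keep a per-user count of chosen phrases and sort the candidates for a typed prefix by it. A toy Python 3 sketch; the class name, the romanized codes, and the three-entry dictionary are illustrative assumptions, and real input methods use far larger dictionaries and smarter matching:

```python
from collections import Counter


class PhraseInput:
    """Toy phrase input: typed code prefix -> candidate phrases, most used first."""

    def __init__(self, dictionary):
        self.dictionary = dictionary  # code -> list of phrases, in default order
        self.usage = Counter()        # phrase -> times this user chose it

    def candidates(self, prefix):
        """Phrases whose code starts with prefix, most frequently chosen first."""
        found = [phrase
                 for code, phrases in self.dictionary.items()
                 if code.startswith(prefix)
                 for phrase in phrases]
        # sorted() is stable, so phrases used equally often keep dictionary order.
        return sorted(found, key=lambda phrase: -self.usage[phrase])

    def choose(self, phrase):
        """Record the user's pick so it ranks higher next time."""
        self.usage[phrase] += 1


ime = PhraseInput({"dian nao": ["電腦"], "dian hua": ["電話"], "dian ying": ["電影"]})
ime.choose("電影")
ime.choose("電影")
ime.choose("電話")
print(ime.candidates("dian"))   # ['電影', '電話', '電腦']
```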

>> I would like to point out also, that as long as Chinese
>> programmers don't go "hog wild" and use obscure characters,
>> I suspect that I would have much better luck reading their
>> programs with han characters, than with, say, the Chinese
>> phonetic names! Possibly even better than what they thought
>> were the correct English words, if their English isn't that
>> good.
> You certainly have a point there. Even when I don't work in an
> English speaking environment as I do now, I try to write all
> comments and variable names etc in English. You never know when
> you need to show a code snippet to people who don't read Swedish.
> Also, ASCII lacks three of our letters, and a properly translated
> name is often better than one written with the wrong letters.

If someday a programming language can be written in an encoding
like Big5, I believe its intended audience will ONLY be people who
use Big5 exclusively. That means that, if it exists, the chance of
showing such code to users of other languages is practically nil.
Think about this: there are still a whole lot of people who don't
know English at all. Without such a 'Big5-specific' programming
tool, their chances of learning programming are completely
wiped out.
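Python 3 later went part of the way: PEP 3131 allows han characters in identifiers (keywords such as `import` remain English), and a PEP 263 coding declaration lets the source file itself be stored in Big5. A minimal sketch that compiles Big5-encoded source; the variable names are just examples:

```python
# Source text with han identifiers, stored as Big5 bytes. The PEP 263
# coding line tells the compiler how to decode those bytes.
source = "# -*- coding: big5 -*-\n電腦 = 40\n結果 = 電腦 + 2\n".encode("big5")

namespace = {}
exec(compile(source, "<big5-demo>", "exec"), namespace)
print(namespace["結果"])   # 42
```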

--
~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~
Runsun Pan, PhD
pytho...@gmail.com
Nat'l Center for Macromolecular Imaging
http://ncmi.bcm.tmc.edu/ncmi/
~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~

Ivan Voras

Jan 27, 2006, 3:36:55 PM

Robert Kern wrote:

> On OS X,
>
> ≤ is Alt-,
> ≥ is Alt-.
> ≠ is Alt-=

Thumbs up on the unicode idea, but national (i.e. non-English) keyboards
have already used almost every key combination not strictly defined on
English keyboards for their own
characters. In particular, the key combinations above are reprogrammed
to something else in my language/keyboard.

But, the idea that Python could be made ready for when the keyboards and
editors start supporting such characters is a good one (i.e. keep both
<= and ≤ for several decades).

It's not a far-out idea. About a year ago I stumbled upon a programming
language that INSISTED on unicode characters like ≤ as well as the rest
of the mathematical/logical symbols; I don't remember its name but the
source code with characters like that looked absolutely beautiful. I
suppose that one day, when unicode becomes more used than 7-bit ASCII, "old"
code like current C and Python will be considered ugly and inelegant in
appearance :)
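Ivan's "keep both" suggestion can be tried without any language change: a preprocessor can map the proposed symbols onto the ASCII operators Python already has. A deliberately naive Python 3 sketch; `to_ascii` is a hypothetical helper, not an existing tool:

```python
# Proposed symbol -> ASCII operator that Python already understands.
ASCII_OPERATORS = str.maketrans({"≤": "<=", "≥": ">=", "≠": "!="})


def to_ascii(source: str) -> str:
    """Rewrite ≤ ≥ ≠ as <= >= != so the result can be compiled.

    Caveat: this is plain text substitution, so it would also rewrite
    these symbols inside string literals and comments.
    """
    return source.translate(ASCII_OPERATORS)


expr = to_ascii("1 ≤ 2 ≠ 3")
print(expr, "->", eval(expr))   # 1 <= 2 != 3 -> True
```

A serious version would need to understand Python's token structure so that strings and comments are left untouched.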

Rocco Moretti

Jan 27, 2006, 4:31:05 PM

Ivan Voras wrote:

> It's not a far-out idea. About a year ago I stumbled upon a programming
> language that INSISTED on unicode characters like ≤ as well as the rest
> of the mathematical/logical symbols; I don't remember its name but the
> source code with characters like that looked absolutely beautiful.

Could it be APL?

http://en.wikipedia.org/wiki/APL_programming_language

Although saying it used "Unicode characters" is a bit of a stretch - APL
predated Unicode by some 30+ years.

Neil Hodgson

Jan 27, 2006, 5:29:20 PM

Having a bit of a play with some of my spam reduction code.

Original:

def isMostlyCyrillic(u):
    if type(u) != type(u""):
        u = unicode(u, "UTF-8")
    cnt = float(sum(0x400 <= ord(c) < 0x500 for c in u))
    return (cnt > 1) and ((cnt / len(u)) > 0.5)

Using more mathematical operators:

def isMostlyCyrillic(u):
    if type(u) ≠ type(u""):
        u ← unicode(u, "UTF-8")
    cnt ← float(∑(0x400 ≤ ord(c) < 0x500 ∀ c ∈ u))
    return (cnt > 1) ∧ ((cnt ÷ len(u)) > 0.5)

The biggest win for me is "≠" with "←" also an improvement. I'm so
used to "/" for division that "÷" now looks strange.
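For comparison, in Python 3, where `str` is already Unicode and `/` is true division, the original check needs neither `unicode()` nor `float()`. A sketch, renamed to `is_mostly_cyrillic` for illustration; bytes input is assumed to be UTF-8, as in the original:

```python
def is_mostly_cyrillic(text) -> bool:
    """True if more than one character, and more than half of all
    characters, lie in the Cyrillic block U+0400..U+04FF."""
    if isinstance(text, bytes):       # raw bytes: assume UTF-8
        text = text.decode("utf-8")
    count = sum(0x400 <= ord(c) < 0x500 for c in text)
    # "count > 1" is checked first, so empty text never reaches the division.
    return count > 1 and count / len(text) > 0.5
```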

Neil

Terry Hancock

Jan 27, 2006, 7:12:23 PM

to pytho...@python.org

On Fri, 27 Jan 2006 12:50:03 -0600
Runsun Pan <pytho...@gmail.com> wrote:
> On 1/27/06, Magnus Lycka <ly...@carmen.se> wrote:

> > Actually, it seems that the recent habit of sending text
> > messages via mobile phones is the prime driver for
> > reformed spelling these days.

OMG ru kdng?

Make it stop!

Well, let's just say, I think there should be different
standards for "write once / read once" versus "write once /
read many". The mere use of written language once implied
the latter, but I suppose text messaging breaks that rule.

> Well, to solve the problem you can either (1) reform the
> spelling of a language to meet the limitations of mobile
> phones, or (2) advance the input devices on mobile
> phones so that they can input the language of your
> choice. For most Asian languages, (1) is certainly out of
> the question.

IIRC, back in the 1990s there was a *lot* of work in Japan
on optical character recognition, and especially "digital
ink" or "stroke" recognition. With all the pen tablets out
these days, it seems like that would be an awfully good way
to handle ideograms.

First of all, they are, much more than Western alphabets,
strict about stroke order and direction (technically the
Roman alphabet is supposed to be drawn a certain way, but
many people "cheat" -- I think that's harder to get away
with for Asian characters, because they tend not to look
right when drawn wrong). And when you have the actual
stroke sequence data as input, recognition is easier and
more reliable (I think that was the point behind the
"graffiti" system for the Palm Pilot).

Dan Sommers

Jan 27, 2006, 8:27:40 PM

On Fri, 27 Jan 2006 22:29:20 GMT,
Neil Hodgson <nyamatong...@gmail.com> wrote:

> ... I'm so used to "/" for division that "÷" now looks strange.

Strange, indeed, and too close to + for me (at least within my
newsreader).

Regards,
Dan

--
Dan Sommers
<http://www.tombstonezero.net/dan/>

Runsun Pan

Jan 28, 2006, 12:57:16 AM

to pytho...@python.org

On 1/27/06, Terry Hancock <han...@anansispaceworks.com> wrote:
> Well, let's just say, I think there should be different
> standards for "write once / read once" versus "write once /
> read many". The mere use of written language once implied
> the latter, but I suppose text messaging breaks that rule.

Since we are on this subject, let me share with you guys a little 'tip of the
iceberg' of how the younger generations in Taiwan communicate:

A: why did you tell av8d that I am a bmw ?
B: Well, you are just like one of those ogs or obs ...
A: oic, you think you are much q than I ?
B: ...
A: I would 3q if you stop doing so.
B: ok.
A: Orz
B: 88
A: 881

Can you guys figure out the details ?

Here is the decoded version:

A: why did you tell av8d that I am a bmw ?
[8 in our language is pronounced as "ba", so av8d = everybody]

B: Well, you are just like one of those ogs or obs ...
[ogs= oh-ji-sang, obs=oh-ba-sang, Japanese, means old guy, old
woman, respectively]

A: oic, you think you are much q than I ?
[oic=Oh I see; q = cute]

A: I would 3q if you stop doing so.
[ 3q = thank you ]

B: ok.

A: Orz
[ appreciate very much -- it looks like a guy kneeling down before an emperor ]

B: 88
[ bye-bye ]

A: 881
[ bye-bye with a tone, sometimes 886 = bye-bye-loh ]

The above example is just an extremely simple one. In the real world,
they combine all sorts of language sources -- Mandarin, Japanese,
English, Taiwanese ... as well as "shapes" like Orz.

This kind of mixture-of-everything is widely used by the young
generations, and is sometimes called "net terms", sometimes "Martian
words". It facilitates online activity among young people, but
creates huge 'generation gaps' -- some dictionaries were published
for high school teachers to study so that they can talk with and
understand their students.

IMO, a language is a living organism, it has its own life and often
evolves with unexpected turns. Maybe in the future some of those
Martian Words will become part of formal Taiwanese, who knows ? :)

> First of all, they are, much more than Western alphabets,
> strict about stroke order and direction (technically the
> Roman alphabet is supposed to be drawn a certain way, but
> many people "cheat" -- I think that's harder to get away
> with for Asian characters, because they tend not to look
> right when drawn wrong). And when you have the actual
> stroke sequence data as input, recognition is easier and
> more reliable (I think that was the point behind the
> "graffiti" system for the Palm Pilot).

But ... to my knowledge, all of the input tablets that use OCR have a
training feature. You can teach the program to recognize your own
order of strokes. The ability to train (be trained) is a very key
element of such an input device.
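Terry's stroke-order point and the training point fit together in a toy model: compare the drawn sequence of stroke types against stored templates by edit distance, and let training add a user's own stroke orders. A Python 3 sketch; the one-letter stroke codes and the six templates are assumptions for illustration, and real recognizers also use stroke shape and position:

```python
# Assumed stroke codes: H horizontal, V vertical, P left-falling, N right-falling.
templates = {
    "一": ["H"],
    "二": ["HH"],
    "三": ["HHH"],
    "十": ["HV"],
    "人": ["PN"],
    "大": ["HPN"],
}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two stroke sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # extra stroke drawn
                           cur[j - 1] + 1,            # stroke missing
                           prev[j - 1] + (x != y)))   # match or wrong stroke
        prev = cur
    return prev[-1]


def recognize(strokes: str) -> str:
    """Character whose closest template is nearest to the drawn sequence."""
    return min(templates,
               key=lambda ch: min(edit_distance(strokes, t) for t in templates[ch]))


def train(char: str, strokes: str) -> None:
    """Remember this user's own stroke order for char."""
    templates.setdefault(char, []).append(strokes)
```

Drawing 十 with the vertical stroke first ("VH") is at distance 2 from the standard "HV" template, so it loses to closer candidates such as 一 until `train("十", "VH")` records that habit.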

Ivan Voras

Jan 28, 2006, 9:05:35 AM

Rocco Moretti wrote:

> Could it be APL?

No, it was much newer... someone did it as a hobby language.

Jorge Godoy

Jan 28, 2006, 10:53:22 AM

Runsun Pan <pytho...@gmail.com> writes:

> Can you guys figure out the details ?
>
> Here is the decoded version:

It looks like, at all of 26 years old, I'm too old to understand something like
that... All I can say is OMG... :-)

> IMO, a language is a living organism, it has its own life and often
> evolves with unexpected turns. Maybe in the future some of those
> Martian Words will become part of formal Taiwanese, who knows ? :)

I am strongly against that for pt_BR (Brazilian Portuguese). There's a TV
channel here that has some movies with "net terms" instead of pt_BR for the
translation...

--
Jorge Godoy <go...@ieee.org>

"Quidquid latine dictum sit, altum sonatur."
- Qualquer coisa dita em latim soa profundo.
- Anything said in Latin sounds smart.

Magnus Lycka

Jan 29, 2006, 11:48:57 AM

Runsun Pan wrote:
> Simplified Chinese exists because of the call for modernization of
> the language decades ago. That involved turning almost the
> entire culture 'upside-down'

This is in some ways quite the opposite compared to Nynorsk
in Norway, which was an attempt to revive the old and pure
Norwegian, after being dominated (in politics as well as in
grammar) by Denmark from 1387-1814. (I guess it was a
complicating factor that the end of the union with Denmark
led to a union with Sweden. The Norwegians probably had some
difficulties deciding what neighbour they disliked most. When
they broke out of the union with Sweden in 1905, they actually
elected a Danish prince to be their king.) Anyway, only a
fraction of the Norwegians use Nynorsk today, and the majority
still speak the Danish-based bokmål. On the other hand, the
spelling of bokmål has also been modernized a lot, with a
series of spelling reforms of both languages.

Terry Hancock

Jan 29, 2006, 9:32:20 PM

to pytho...@python.org

On Fri, 27 Jan 2006 23:57:16 -0600
Runsun Pan <pytho...@gmail.com> wrote:
> But ... to my knowledge, all of the input tablets that
> use OCR have a training feature. You can teach the
> program to recognize your own order of strokes. The
> ability to train (be trained) is a very key element of
> such an input device.

Yeah, but I would think that would be a real drawback when
there's something like 2000 to 10,000 characters to train
on! I think you'd need some kind of short cut (maybe you
could share radical information between characters?).

But I guess I assumed this would already be a solved problem
by now. Maybe it was a lot harder than expected.

Dave Hansen

Jan 30, 2006, 11:03:38 AM

On Fri, 27 Jan 2006 20:27:40 -0500 in comp.lang.python, Dan Sommers
<m...@privacy.net> wrote:

>On Fri, 27 Jan 2006 22:29:20 GMT,
>Neil Hodgson <nyamatong...@gmail.com> wrote:
>
>> ... I'm so used to "/" for division that "÷" now looks strange.

Indeed, I don't think I've used ÷ for division since about 7th grade,
when I first started taking Algebra (over 30 years ago).

>
>Strange, indeed, and too close to + for me (at least within my
>newsreader).
>

FWIW, it looks closer to - than + in mine. And as you say, _too_
close. IMHO.

Roel Schroeven

Jan 30, 2006, 11:32:37 AM

Dave Hansen wrote:

> On Fri, 27 Jan 2006 20:27:40 -0500 in comp.lang.python, Dan Sommers
> <m...@privacy.net> wrote:
>
>> On Fri, 27 Jan 2006 22:29:20 GMT,
>> Neil Hodgson <nyamatong...@gmail.com> wrote:
>>
>>> ... I'm so used to "/" for division that "÷" now looks strange.
>
> Indeed, I don't think I've used ÷ for division since about 7th grade,
> when I first started taking Algebra (over 30 years ago).

I have never even used it, except that it's printed on calculators. In
school we used ":" and afterwards "/".

--
If I have been able to see further, it was only because I stood
on the shoulders of giants. -- Isaac Newton

Roel Schroeven

Alex Martelli

Jan 31, 2006, 12:35:51 AM

Dave Hansen <id...@hotmail.com> wrote:

> On Fri, 27 Jan 2006 20:27:40 -0500 in comp.lang.python, Dan Sommers
> <m...@privacy.net> wrote:
>
> >On Fri, 27 Jan 2006 22:29:20 GMT,
> >Neil Hodgson <nyamatong...@gmail.com> wrote:
> >
> >> ... I'm so used to "/" for division that "÷" now looks strange.
>
> Indeed, I don't think I've used ÷ for division since about 7th grade,
> when I first started taking Algebra (over 30 years ago).

I used it in APL, and the last time was less than 20 years ago;-).

Alex

Runsun Pan

Jan 30, 2006, 12:03:13 PM

to pytho...@python.org

From 1387 to 1814 -- a ~430-year period, that's quite a long time.
That's about the total recorded history of Taiwan... :)

In its 400-some years of history Taiwan has been occupied by several
foreign powers, including the Dutch, the Tsing Dynasty from China, Japan,
and the KMT party from China again. The long fight against foreign
powers was futile every time, resulting in a 'macro-personality' of being
used to being slaves.

The mentality of being slaves is that when you have the chance to
play master yourself, you still look up to the old master to either get
approval or beg for mercy. This has resulted in a bizarre situation in current
Taiwan: even though a locally based, democratic government was elected,
the old foreign power is still the underground power that truly controls
all aspects of Taiwan. They reject whatever policies the democratic
government plans. Many nation-wide construction projects that the old power
planned and supported when it was in power, it now rejects.

The slave mentality of the public is something that helps the old power
paralyze the society. Given that, a language reform to reduce the cultural
influence of the foreign power is hopeless in Taiwan (at least
currently).

Maybe Norwegians have some sort of that mentality too? Considering
that they would rather elect people from the old foreign power ...
