
Assignment Versus Equality


Lawrence D’Oliveiro

Jun 26, 2016, 3:37:00 AM
One of Python’s few mistakes was that it copied the C convention of using “=” for assignment and “==” for equality comparison.

It should have copied the old convention from Algol-like languages (including Pascal), where “:=” was assignment, so “=” could keep a meaning closer to its mathematical usage.

For consider, the C usage isn’t even consistent. What is the “not equal” operator? Is it the “not” operator concatenated with the “equal” operator? No it’s not! It is “!” followed by “=” (assignment), of all things! This fits in more with the following pattern:

A += B <=> A = A + B
A *= B <=> A = A * B

in other words

A != B

should be equivalent to

A = A ! B

BartC

Jun 26, 2016, 6:48:46 AM
On 26/06/2016 08:36, Lawrence D’Oliveiro wrote:
> One of Python’s few mistakes was that it copied the C convention of using “=” for assignment and “==” for equality comparison.

One of C's many mistakes. Unfortunately C has been very influential.

However, why couldn't Python have used "=" both for assignment, and for
equality? Since I understand assignment ops can't appear in expressions.
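
(For reference, a minimal sketch in Python: mistyping "=" for "==" inside a
condition is caught at compile time, because assignment is a statement:)

x = 0
# "if x = 0:" would be a SyntaxError: assignment cannot appear
# where an expression is expected, so it can't be confused with ==.
if x == 0:
    print("equal")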

> It should have copied the old convention from Algol-like languages (including Pascal), where “:=” was assignment, so “=” could keep a meaning closer to its mathematical usage.

(I think Fortran and PL/I also used "=" for assignment. Both were more
commercially successful than Algol or Pascal.)

> For consider, the C usage isn’t even consistent. What is the “not equal”
> operator? Is it the “not” operator concatenated with the “equal” operator?
> No it’s not! It is “!” followed by “=” (assignment), of all things!

I thought "!" /was/ the logical not operator (with "~" being bitwise not).

> This fits in more with the following pattern:
>
> A += B <=> A = A + B
> A *= B <=> A = A * B
>
> in other words
>
> A != B
>
> should be equivalent to
>
> A = A ! B

Yes, that's another inconsistency in C. Sometimes "<>" was used for
"not equals", or "≠", except there was limited keyboard support for that.
("/=" would have the same problem as "!=".)

But again, that doesn't apply in Python: since assignment can't appear in
expressions, "A != B" can never be read as the assignment "A = A ! B".

--
Bartc

Steven D'Aprano

Jun 26, 2016, 9:21:58 AM
On Sun, 26 Jun 2016 08:48 pm, BartC wrote:

> On 26/06/2016 08:36, Lawrence D’Oliveiro wrote:
>> One of Python’s few mistakes was that it copied the C convention of using
>> “=” for assignment and “==” for equality comparison.
>
> One of C's many mistakes. Unfortunately C has been very influential.
>
> However, why couldn't Python have used "=" both for assignment, and for
> equality? Since I understand assignment ops can't appear in expressions.

Personally, I think that even if there is no *syntactical* ambiguity between
assignment and equality, programming languages should still use different
operators for them. I must admit that my first love is still Pascal's :=
for assignment and = for equality, but C's = for assignment and == for
equality is *almost* as good.

(It loses a mark because absolute beginners confuse the assignment = for the
= in mathematics, which is just different enough to cause confusion.)

But the BASIC style = for both assignment and equality is just begging for
confusion. Even though = is not ambiguous given BASIC's rules, it can still
be ambiguous to beginners who haven't yet digested those rules and made
them second nature.

And even experts don't always work with complete statements. Here is a
snippet of BASIC code:

X = Y

Is it an assignment or an equality comparison? Without seeing the context,
it is impossible to tell:

10 X = Y + 1
20 IF X = Y GOTO 50


Now obviously BASIC was a very popular and successful language, for many
years, despite that flaw. But I wouldn't repeat it in a new language.


>> It should have copied the old convention from Algol-like languages
>> (including Pascal), where “:=” was assignment, so “=” could keep a
>> meaning closer to its mathematical usage.
>
> (I think Fortran and PL/I also used "=" for assignment. Both were more
> commercially successful than Algol or Pascal.)

Fortran 77 used .EQ. for equality. I'm not sure about PL/I.

I'm also not sure I'd agree about the commercial success. Fortran certainly
has been extremely popular, albeit almost entirely in numerical computing.
But PL/I has virtually disappeared from the face of the earth, while Pascal
still has a small but dedicated community based on FreePascal, GNU Pascal,
and Delphi.

(Of the three, FreePascal and Delphi appear to still be getting regular
releases.)




--
Steven
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

Rustom Mody

Jun 26, 2016, 10:26:48 AM
This is a tad unfair (I think).
Initially Basic (BASIC, as things were spelt then) used

LET X = Y

for assignment.

The general success of the succinct but confusing approach, starting with
Fortran and exploding with C, I guess prompted the shortening.

[My impression: don't know the history exactly]

MRAB

Jun 26, 2016, 11:42:07 AM
On 2016-06-26 11:48, BartC wrote:
> On 26/06/2016 08:36, Lawrence D’Oliveiro wrote:
>> One of Python’s few mistakes was that it copied the C convention of using “=” for assignment and “==” for equality comparison.
>
> One of C's many mistakes. Unfortunately C has been very influential.
>
> However, why couldn't Python have used "=" both for assignment, and for
> equality? Since I understand assignment ops can't appear in expressions.
>
[snip]

Python supports chained assignments. For example, "a = b = 0" assigns 0
to both a and b.

I'm not sure how common it is, though. I virtually never use it myself.
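
(A minimal sketch of those semantics: the right-hand side is evaluated once,
then bound to each target from left to right:)

a = b = 0          # both names now refer to 0
y = z = [0]        # both names refer to the *same* list
y.append(1)
print(z)           # [0, 1] -- one reason some people avoid the form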

Cousin Stanley

Jun 26, 2016, 11:47:42 AM
Dennis Lee Bieber wrote:

> ....
> but I'm sure we'd have a revolt
> if Python comparison operators looked like:
>
> a .eq. b
> a .ne. b
> a .gt. b .or. c .lt. d
> a .le. b .and. c .ge. d
> ....

As someone who learned fortran in the mid 1960s
and pounded a lot of fortran code in the 1970s,
the code above seems very readable ....


--
Stanley C. Kitching
Human Being
Phoenix, Arizona

BartC

Jun 26, 2016, 11:56:32 AM
On 26/06/2016 16:47, Cousin Stanley wrote:
> Dennis Lee Bieber wrote:
>
>> ....
>> but I'm sure we'd have a revolt
>> if Python comparison operators looked like:
>>
>> a .eq. b
>> a .ne. b
>> a .gt. b .or. c .lt. d
>> a .le. b .and. c .ge. d
>> ....
>
> As someone who learned fortran in the mid 1960s
> and pounded a lot of fortran code in the 1970s,
> the code above seems very readable ....

I did a year of it in the 1970s. Looks funny in lower case though.

(Note, for those who don't know (old) Fortran, that spaces and tabs are
not significant. So those dots are needed, otherwise "a eq b" would be
parsed as "aeqb".)

--
Bartc



BartC

Jun 26, 2016, 12:09:05 PM
Well, if it's allowed, then it doesn't matter how common it is.

So "=" couldn't be used with a different meaning inside expressions as
it would make this ambiguous.

It also raises the possibility of a bug when someone intends to write
"a=b==0" but writes "a=b=c" instead.

In that case I would have supported the use of ":=" for assignment.
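
(A minimal sketch of that near-miss, with hypothetical values:)

b = 5
a = b == 0     # comparison: a is False
a = b = 0      # chained assignment: a and b are both 0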

--
Bartc

Marko Rauhamaa

Jun 26, 2016, 12:11:27 PM
Dennis Lee Bieber <wlf...@ix.netcom.com>:

> It did... but I'm sure we'd have a revolt if Python comparison
> operators looked like:
>
> a .eq. b
> a .ne. b
> a .gt. b .or. c .lt. d
> a .le. b .and. c .ge. d

Yuck, who'd ever want to look at an eyesore like that. In Python, we
will always stick to the pleasant elegance of

__eq__
__ne__
__gt__
__ge__
__lt__
__le__
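
(Those are indeed what the operators dispatch to; a quick sketch:)

# a == b is (roughly) sugar for a.__eq__(b), and so on:
(1).__eq__(2)    # False
(1).__lt__(2)    # True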


Marko

Christopher Reimer

Jun 26, 2016, 3:47:47 PM
I started writing a BASIC interpreter in Python. The rudimentary version
for 10 PRINT "HELLO, WORLD!" and 20 GOTO 10 ran well. The next version
to read each line into a tree structure left me feeling over my head. So
I got "Writing Compilers & Interpreters: An Applied Approach" by Ronald
Mak (1991 edition) from Amazon, which uses C for coding and Pascal as
the target language. I know a little bit of C and nothing of Pascal.
Translating an old dialect of C into modern C, learning Pascal and
figuring out the vagaries of BASIC should make for an interesting
learning experience.

Chris R.

Christopher Reimer

Jun 26, 2016, 3:53:42 PM
On 6/26/2016 8:41 AM, MRAB wrote:

> On 2016-06-26 11:48, BartC wrote:
>> On 26/06/2016 08:36, Lawrence D’Oliveiro wrote:
>>> One of Python’s few mistakes was that it copied the C convention of
>>> using “=” for assignment and “==” for equality comparison.
>>
>> One of C's many mistakes. Unfortunately C has been very influential.
>>
>> However, why couldn't Python have used "=" both for assignment, and for
>> equality? Since I understand assignment ops can't appear in expressions.
>>
> [snip]
>
> Python supports chained assignments. For example, "a = b = 0" assigns
> 0 to both a and b.
>
> I'm not sure how common it is, though. I virtually never use it myself.

How can you not use chained assignments? I thought Python was the art of
the clever one-liners. :)

Chris R.

Michael Torrie

Jun 26, 2016, 5:29:10 PM
On 06/26/2016 12:47 PM, Christopher Reimer wrote:
> I started writing a BASIC interpreter in Python. The rudimentary version
> for 10 PRINT "HELLO, WORLD!" and 20 GOTO 10 ran well. The next version
> to read each line into a tree structure left me feeling over my head. So
> I got "Writing Compilers & Interpreters: An Applied Approach" by Ronald
> Mak (1991 edition) from Amazon, which uses C for coding and Pascal as
> the target language. I know a little bit of C and nothing of Pascal.
> Translating an old dialect of C into modern C, learning Pascal and
> figuring out the vagaries of BASIC should make for an interesting
> learning experience.

Sounds like fun. Every aspiring programmer should write an interpreter
for some language at least once in his life!

I imagine that any modern dialect of BASIC has a very complex grammar.
The syntax is full of ambiguities, of which the "=" operator is the
least of them. In many dialects there are several versions of END to
contend with, for example. And then there are a lot of legacy
constructs with special syntax such as LINE (0,0)-(100,100),3,BF.

I have a soft spot in my heart for BASIC, since that's what I grew up
on. I still follow FreeBASIC development. It's a very mature language
and compiler now, though it struggles to find a reason to exist I think.
It can't decide if it's C with a different syntax, or C++ with a
different syntax (object-oriented and everything) or maybe something in
between or completely different.

Gregory Ewing

Jun 26, 2016, 7:22:39 PM
BartC wrote:
> On 26/06/2016 08:36, Lawrence D’Oliveiro wrote:
>
>> One of Python’s few mistakes was that it copied the C convention of
>> using “=” for assignment and “==” for equality comparison.
>
> One of C's many mistakes. Unfortunately C has been very influential.

I'm not sure it's fair to call it a mistake. C was
designed for expert users, and a tradeoff was likely
made based on the observation that assignment is
used much more often than equality testing.

> However, why couldn't Python have used "=" both for assignment, and for
> equality?

Because an expression on its own is a valid statement,
so

a = b

would be ambiguous as to whether it meant assigning b
to a or evaluating a == b and discarding the result.
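
(A minimal sketch of the point; both lines below are valid statements today:)

x, y = 1, 2
x == y    # legal expression statement: evaluates False, result discarded
x = y     # assignment; with "=" doing double duty this would be ambiguous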

--
Greg

Gregory Ewing

Jun 26, 2016, 7:38:36 PM
BartC wrote:
> I did a year of it in the 1970s. Looks funny in lower case though.

It's interesting how our perceptions of such things change.
Up until my second year of university, my only experiences
of computing had all been in upper case. Then we got a
lecturer who wrote all his Pascal on the blackboard in
lower case, and it looked extremely weird. Until then the
idea of writing code in anything other than upper case
hadn't even occurred to me.

I quickly got used to it though, and nowadays, code written
in upper case looks very quaint and old-fashioned!

--
Greg

BartC

Jun 26, 2016, 7:39:29 PM
On 27/06/2016 00:22, Gregory Ewing wrote:
> BartC wrote:
>> On 26/06/2016 08:36, Lawrence D’Oliveiro wrote:
>>
>>> One of Python’s few mistakes was that it copied the C convention of
>>> using “=” for assignment and “==” for equality comparison.
>>
>> One of C's many mistakes. Unfortunately C has been very influential.
>
> I'm not sure it's fair to call it a mistake. C was
> designed for expert users, and a tradeoff was likely
> made based on the observation that assignment is
> used much more often than equality testing.

You mean the rationale was based on saving keystrokes?

A shame they didn't consider that when requiring parentheses around
conditionals, semicolons, /*...*/ around comments, %d format codes and
elaborate for-statements then!

But you might be right in that it was probably based on existing usage
of Fortran, PL/I and maybe even Basic. (C's predecessor 'B' used "=",
but B came from BCPL, which I believe used ":="; perhaps the mistake
was in discarding that.)

>> However, why couldn't Python have used "=" both for assignment, and
>> for equality?
>
> Because an expression on its own is a valid statement,
> so
>
> a = b
>
> would be ambiguous as to whether it meant assigning b
> to a or evaluating a == b and discarding the result.

And that would be another reason why == is needed for equality.

--
Bartc

Gregory Ewing

Jun 26, 2016, 7:41:54 PM
Christopher Reimer wrote:
> How can you not use chained assignments? I thought Python was the art of
> the clever one-liners. :)

No, Python is the art of writing clever one-liners
using more than one line.

--
Greg

Bob Gailer

Jun 27, 2016, 8:23:07 AM
On Jun 26, 2016 5:29 PM, "Michael Torrie" <tor...@gmail.com> wrote:
>
> On 06/26/2016 12:47 PM, Christopher Reimer wrote:

> Sounds like fun. Every aspiring programmer should write an interpreter
> for some language at least once in his life!

In the mid 1970s I helped maintain an installation of IBM's APL
interpreter at Boeing Computer Services. APL uses its own special character
set, making code unambiguous and terse. It used a left-arrow for
assignment, which was treated as just another operator. I will always miss
"embedded assignment".

A year later I worked on a project that ran on a CDC 6600? where only
FORTRAN was available. The program's job was to apply users' commands to
manage a file system. FORTRAN was not the best language for that task, so I
designed my own language, and wrote an interpreter for it in FORTRAN. In
retrospect a very good decision. That program was in use for 10 years!

Rustom Mody

Jun 27, 2016, 8:48:38 AM
On Monday, June 27, 2016 at 5:53:07 PM UTC+5:30, Bob Gailer wrote:
> On Jun 26, 2016 5:29 PM, "Michael Torrie" wrote:
> >
> > On 06/26/2016 12:47 PM, Christopher Reimer wrote:
>
> > Sounds like fun. Every aspiring programmer should write an interpreter
> > for some language at least once in his life!
>
> In the mid 1970s I helped maintain an installation of IBM's APL
> interpreter at Boeing Computer Services. APL uses its own special character
> set, making code unambiguous and terse. It used a left-arrow for
> assignment, which was treated as just another operator. I will always miss
> "embedded assignment".


In the past, APL's ← may not have been practicable (i.e.
- without committing to IBM... which meant
- $$$
- also it was a hardware commitment (typeball?)
- etc.)

Today that '←' costs asymptotically close to '=':
To type :: 3 chars CAPSLOCK '<' '-'
To set up :: [On Ubuntu Unity]
System-Settings → Keyboard → Shortcuts-Tab → Typing → Make Compose CAPSLOCK
To see :: It's ONE CHAR, just like '=' and ½ of ':='

tl;dr Anyone opposing richer charsets is guaranteed to be using arguments from 1970

PS Google Groups is wise enough to jump through hoops trying to encode my message
above as latin-1, then as Windows 1252 and only when that does not work as
UTF-8
ie it garbles ← into <- etc
So some हिंदी (ie Hindi) to convince GG to behave

Steven D'Aprano

Jun 27, 2016, 9:28:26 AM
On Mon, 27 Jun 2016 10:48 pm, Rustom Mody wrote:

> PS Google Groups is wise enough to jump through hoops trying to encode my
> message above as latin-1, then as Windows 1252 and only when that does not
> work as UTF-8


There is nothing admirable about GG (or any other newsreader or email
client) defaulting to legacy encodings like Latin-1 and especially not
Windows 1252.

Certainly the user should be permitted to explicitly set the encoding, but
otherwise the program should default to UTF-8.

Marko Rauhamaa

Jun 27, 2016, 9:58:36 AM
Steven D'Aprano <st...@pearwood.info>:

> On Mon, 27 Jun 2016 10:48 pm, Rustom Mody wrote:
>
>> PS Google Groups is wise enough to jump through hoops trying to
>> encode my message above as latin-1, then as Windows 1252 and only
>> when that does not work as UTF-8
>
> There is nothing admirable about GG (or any other newsreader or email
> client) defaulting to legacy encodings like Latin-1 and especially not
> Windows 1252.
>
> Certainly the user should be permitted to explicitly set the encoding,
> but otherwise the program should default to UTF-8.

The users should be completely oblivious to such technicalities as
character encodings.

As for those technicalities, a MIME-compliant client is free to use any
well-defined, widely-used character encoding as long as it is properly
declared.


Marko

Grant Edwards

Jun 27, 2016, 10:00:05 AM
On 2016-06-26, BartC <b...@freeuk.com> wrote:

> (Note, for those who don't know (old) Fortran, that spaces and tabs are
> not significant. So those dots are needed, otherwise "a eq b" would be
> parsed as "aeqb".)

I've always been baffled by that.

Were there other languages that did something similar?

Why would a language designer think it a good idea?

Did the poor sod who wrote the compiler think it was a good idea?

--
Grant Edwards               grant.b.edwards at gmail.com
Yow! I left my WALLET in the BATHROOM!!

Marko Rauhamaa

Jun 27, 2016, 10:09:35 AM
Grant Edwards <grant.b...@gmail.com>:

> On 2016-06-26, BartC <b...@freeuk.com> wrote:
>
>> (Note, for those who don't know (old) Fortran, that spaces and tabs
>> are not significant. So those dots are needed, otherwise "a eq b"
>> would be parsed as "aeqb".)
>
> I've always been baffled by that.
>
> Were there other languages that did something similar?

In XML, whitespace between tags is significant unless the document type
says otherwise. On the other hand, leading and trailing space in
attribute values is insignificant unless the document type says
otherwise.

> Why would a language designer think it a good idea?
>
> Did the poor sod who wrote the compiler think it was a good idea?

Fortran is probably not too hard to parse. XML, on the other hand, is
impossible to parse without the document type at hand. The document type
not only defines the whitespace semantics but also the availability and
meaning of the "entities" (e.g., &copy; for ©). Add namespaces to that,
and the mess is complete.


Marko

Rustom Mody

Jun 27, 2016, 10:10:22 AM
On Monday, June 27, 2016 at 7:30:05 PM UTC+5:30, Grant Edwards wrote:
> On 2016-06-26, BartC
> > (Note, for those who don't know (old) Fortran, that spaces and tabs are
> > not significant. So those dots are needed, otherwise "a eq b" would be
> > parsed as "aeqb".)
>
> I've always been baffled by that.
>
> Were there other languages that did something similar?
>
> Why would a language designer think it a good idea?
>
> Did the poor sod who wrote the compiler think it was a good idea?

I think modern ideas like lexical analysis preceding parsing
and so on came a decade or so after Fortran.
My guess is that Fortran was first implemented -- 'somehow or other' --
then these properties emerged: more or less bugs that had got so entrenched
that they had to be dignified as 'features'.

Analogy: Python's bool is 1½-class because bool came into Python a good decade
after Python itself, and breaking old code is a bigger issue than fixing control
constructs to be bool-strict.

Rustom Mody

Jun 27, 2016, 10:23:16 AM
On Monday, June 27, 2016 at 6:58:26 PM UTC+5:30, Steven D'Aprano wrote:
> On Mon, 27 Jun 2016 10:48 pm, Rustom Mody wrote:
>
> > PS Google Groups is wise enough to jump through hoops trying to encode my
> > message above as latin-1, then as Windows 1252 and only when that does not
> > work as UTF-8
>
>
> There is nothing admirable about GG (or any other newsreader or email
> client) defaulting to legacy encodings like Latin-1 and especially not
> Windows 1252.
>
> Certainly the user should be permitted to explicitly set the encoding, but
> otherwise the program should default to UTF-8.

It's called sarcasm...

Also how is GG deliberately downgrading clear unicode content to be kind to
obsolete clients at recipient end different from python 2 → 3 making breaking
changes but not going beyond ASCII lexemes?

Just think: Some poor mono-lingual ASCII-user may suffer an aneurism...
Completely atrocious!

Alain Ketterlin

Jun 27, 2016, 10:49:05 AM
Grant Edwards <grant.b...@gmail.com> writes:

> On 2016-06-26, BartC <b...@freeuk.com> wrote:
>
>> (Note, for those who don't know (old) Fortran, that spaces and tabs are
>> not significant. So those dots are needed, otherwise "a eq b" would be
>> parsed as "aeqb".)
>
> I've always been baffled by that.
> Were there other languages that did something similar?

Probably a lot at that time.

> Why would a language designer think it a good idea?

Because when you punch characters one by one on a card, you quickly get
bored with less-than-useful spaces.

> Did the poor sod who wrote the compiler think it was a good idea?

I don't know, but he has a good excuse: he was one of the first to ever
write a compiler (see https://en.wikipedia.org/wiki/Compiler, the
section on History).

You just called John Backus a "poor sod". Think again.

-- Alain.

MRAB

Jun 27, 2016, 11:27:40 AM
On 2016-06-27 14:59, Grant Edwards wrote:
> On 2016-06-26, BartC <b...@freeuk.com> wrote:
>
>> (Note, for those who don't know (old) Fortran, that spaces and tabs are
>> not significant. So those dots are needed, otherwise "a eq b" would be
>> parsed as "aeqb".)
>
> I've always been baffled by that.
>
> Were there other languages that did something similar?
>
Algol 60 and Algol 68.

> Why would a language designer think it a good idea?
>
It let you have identifiers like "grand total"; there was no need for
camel case or underscores to separate the parts of the name.

Grant Edwards

Jun 27, 2016, 11:43:18 AM
On 2016-06-27, MRAB <pyt...@mrabarnett.plus.com> wrote:
> On 2016-06-27 14:59, Grant Edwards wrote:
>> On 2016-06-26, BartC <b...@freeuk.com> wrote:
>>
>>> (Note, for those who don't know (old) Fortran, that spaces and tabs are
>>> not significant. So those dots are needed, otherwise "a eq b" would be
>>> parsed as "aeqb".)
>>
>> I've always been baffled by that.
>>
>> Were there other languages that did something similar?
>
> Algol 60 and Algol 68.

Ah, I never knew that Algol ignored spaces also. I had a vague
recollection that the keyword namespace and the variable namespace were
separate, which allowed some rather odd-looking (by modern standards)
code.

> It let you have identifiers like "grand total"; there was no need for
> camel case or underscores to separate the parts of the name.

It's interesting how completely that concept has disappeared from
modern languages.

--
Grant Edwards               grant.b.edwards at gmail.com
Yow! Send your questions to ``ASK ZIPPY'', Box 40474,
San Francisco, CA 94140, USA

Gene Heskett

Jun 27, 2016, 11:44:35 AM
On Monday 27 June 2016 09:28:00 Steven D'Aprano wrote:

> On Mon, 27 Jun 2016 10:48 pm, Rustom Mody wrote:
> > PS Google Groups is wise enough to jump through hoops trying to
> > encode my message above as latin-1, then as Windows 1252 and only
> > when that does not work as UTF-8
>
> There is nothing admirable about GG (or any other newsreader or email
> client) defaulting to legacy encodings like Latin-1 and especially not
> Windows 1252.
>
> Certainly the user should be permitted to explicitly set the encoding,
> but otherwise the program should default to UTF-8.
>
Both of you mentioned 2 bad words, now go and warsh yer fungers with some
of grandma's lye soap.
>
> --
> Steven
> “Cheer up,” they said, “things could be worse.” So I cheered up, and
> sure enough, things got worse.


Cheers, Gene Heskett
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page <http://geneslinuxbox.net:6309/gene>

Lawrence D’Oliveiro

Jun 27, 2016, 5:56:59 PM
On Tuesday, June 28, 2016 at 3:27:40 AM UTC+12, MRAB wrote:
>
> On 2016-06-27 14:59, Grant Edwards wrote:
>>
>> Were there other languages that did something similar?
>>
> Algol 60 and Algol 68.

Algol 68 was actually slightly different. There were two separate alphabets: one used for names of constants, variables, routines and labels, where spaces were ignored, and a different one used for type names (called “modes”) and reserved words, where spaces were significant.

The convention was to use lowercase for the former and uppercase for the latter.

Example here <http://www.codecodex.com/wiki/Perform_simple_mathematical_operations_on_two_matrices#Algol_68>.

Lawrence D’Oliveiro

Jun 27, 2016, 6:45:35 PM
On Tuesday, June 28, 2016 at 3:27:40 AM UTC+12, MRAB wrote:
> On 2016-06-27 14:59, Grant Edwards wrote:
>> Why would a language designer think it a good idea?
>>
> It let you have identifiers like "grand total"; there was no need for
> camel case or underscores to separate the parts of the name.

Another nifty thing (well, I thought so at the time) was that FORTRAN had no reserved words.

Though I wondered, in statements like

FORMAT(...complex expression with lots of nested parentheses...) = ...

how much work the parser would have to do before deciding that it was an array assignment, not a FORMAT statement?

Then some FORTRAN dialects allowed constant definitions using syntax like

PARAMETER N = 3

which broke the no-reserved-words convention. Luckily, this was standardized as the much less headache-inducing (for the compiler writer)

PARAMETER(N = 3)

PL/I (which was almost named “FORTRAN VI” at one stage) added significant whitespace, but managed to keep the no-reserved-words convention--almost. There was just one peculiar set of exceptions...

BartC

Jun 27, 2016, 7:08:16 PM
On 27/06/2016 23:45, Lawrence D’Oliveiro wrote:
> On Tuesday, June 28, 2016 at 3:27:40 AM UTC+12, MRAB wrote:
>> On 2016-06-27 14:59, Grant Edwards wrote:
>>> Why would a language designer think it a good idea?
>>>
>> It let you have identifiers like "grand total"; there was no need for
>> camel case or underscores to separate the parts of the name.
>
> Another nifty thing (well, I thought so at the time) was that FORTRAN had no reserved words.
>
> Though I wondered, in statements like
>
> FORMAT(...complex expression with lots of nested parentheses...) = ...
>
> how much work the parser would have to do before deciding that it was an array assignment, not a FORMAT statement?

You just design the compiler to do the same processing in each case, i.e.
parse a <name> followed by (<expression>), then mark the resulting AST
fragment as either an array term or a FORMAT statement, depending on what
follows and whether the name is "format".

I suppose the compiler could decide to backtrack and re-parse based on
the knowledge that it is one or the other, but that's a messy way of doing it.

--
Bartc

sohca...@gmail.com

Jun 27, 2016, 8:33:55 PM
XML isn't a programming language. I don't think it's relevant to the conversation.

Steven D'Aprano

Jun 27, 2016, 9:06:06 PM
On Tue, 28 Jun 2016 12:23 am, Rustom Mody wrote:

> On Monday, June 27, 2016 at 6:58:26 PM UTC+5:30, Steven D'Aprano wrote:
>> On Mon, 27 Jun 2016 10:48 pm, Rustom Mody wrote:
>>
>> > PS Google Groups is wise enough to jump through hoops trying to encode
>> > my message above as latin-1, then as Windows 1252 and only when that
>> > does not work as UTF-8
>>
>>
>> There is nothing admirable about GG (or any other newsreader or email
>> client) defaulting to legacy encodings like Latin-1 and especially not
>> Windows 1252.
>>
>> Certainly the user should be permitted to explicitly set the encoding,
>> but otherwise the program should default to UTF-8.
>
> It's called sarcasm...


Ah, sorry about that, I didn't realise.

Some human languages have native support for flagging sarcasm, e.g. there's
a sarcasm marker called temherte slaqî used by some Ethiopic languages to
indicate sarcasm and other unreal statements. It apparently looks somewhat
like an upside down exclamation mark (¡).

Another common solution is to use "scare quotes" around the sarcastic key
words. Or you could tag the sentence with <sarcasm> </sarcasm> tags. Most
people I see using this last one just show the close tag, often
abbreviating it to just /s on its own.


> Also how is GG deliberately downgrading clear unicode content to be kind
> to obsolete clients at recipient end different from python 2 → 3 making
> breaking changes but not going beyond ASCII lexemes?

Oh yes, I completely agree, obviously GvR is literally worse than Hitler
because he hasn't added a bunch of Unicode characters with poor support for
input and worse support for output as essential syntactic elements to
Python.

/s

Lawrence D’Oliveiro

Jun 27, 2016, 9:14:41 PM
On Tuesday, June 28, 2016 at 2:23:16 AM UTC+12, Rustom Mody wrote:
> python 2 → 3 making breaking changes but not going beyond ASCII lexemes?

You do know Python 3 allows Unicode letters in identifiers, right?
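
(A quick sketch; this is valid Python 3, per PEP 3131:)

π = 3.141592653589793
naïve = True
print(π, naïve)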

Steven D'Aprano

Jun 27, 2016, 9:28:35 PM
On Tue, 28 Jun 2016 01:27 am, MRAB wrote:

> On 2016-06-27 14:59, Grant Edwards wrote:
>> On 2016-06-26, BartC <b...@freeuk.com> wrote:
>>
>>> (Note, for those who don't know (old) Fortran, that spaces and tabs are
>>> not significant. So those dots are needed, otherwise "a eq b" would be
>>> parsed as "aeqb".)
>>
>> I've always been baffled by that.
>>
>> Were there other languages that did something similar?
>>
> Algol 60 and Algog 68.

Are you sure about that? I'd like to see a citation, as everything I've seen
suggests that Algol treats spaces like modern languages.

http://www.masswerk.at/algol60/algol60-syntaxversions.htm

Space is listed as a separator, and *not* in identifiers.

Steven D'Aprano

Jun 27, 2016, 9:35:11 PM
On Mon, 27 Jun 2016 11:59 pm, Grant Edwards wrote:

> On 2016-06-26, BartC <b...@freeuk.com> wrote:
>
>> (Note, for those who don't know (old) Fortran, that spaces and tabs are
>> not significant. So those dots are needed, otherwise "a eq b" would be
>> parsed as "aeqb".)
>
> I've always been baffled by that.
>
> Were there other languages that did something similar?
>
> Why would a language designer think it a good idea?
>
> Did the poor sod who wrote the compiler think it was a good idea?

I don't know if it was a deliberate design decision or not, but I don't
believe that it survived very many releases of the Fortran standard.

Remember that Fortran was THE first high-level language. Its creator, John
Backus, was breaking new ground and doing things that had never been done
before[1], so the things that we take for granted about high-level
programming languages were still being invented. If early Fortran got a few
things wrong, we shouldn't be surprised.

Also the earliest Fortran code was not expected to be typed into a computer.
It was expected to be entered via punched cards, which eliminates the need
for spaces.



[1] Almost. He had previously created a high-level assembly language,
Speedcoding, for IBM, which can be considered the predecessor of Fortran.

Marko Rauhamaa

Jun 28, 2016, 12:25:39 AM
sohca...@gmail.com:

> On Monday, June 27, 2016 at 7:09:35 AM UTC-7, Marko Rauhamaa wrote:
>> Grant Edwards <grant.b...@gmail.com>:
>> > Were there other languages that did something similar?
>>
>> In XML, whitespace between tags is significant unless the document type
>> says otherwise. On the other hand, leading and trailing space in
>> attribute values is insignificant unless the document type says
>> otherwise.
>>
>> > Why would a language designer think it a good idea?
>> >
>> > Did the poor sod who wrote the compiler think it was a good idea?
>>
>> Fortran is probably not too hard to parse. XML, on the other hand, is
>> impossible to parse without the document type at hand. The document type
>> not only defines the whitespace semantics but also the availability and
>> meaning of the "entities" (e.g., &copy; for ©). Add namespaces to that,
>> and the mess is complete.
>
> XML isn't a programming language. I don't think it's relevant to the
> conversation.

The question was about (formal) languages, not only programming
languages.

However, there are programming languages with XML syntax:

<URL: https://en.wikipedia.org/wiki/XSLT>
<URL: http://www.o-xml.org/spec/langspec.html>
<URL: http://xplusplus.sourceforge.net/>


Marko

Rustom Mody

Jun 28, 2016, 12:31:28 AM
Gratuitous Godwin acceleration produceth poor sarcasm -- try again
And while you are at it try and answer the parallel:
Unicode has a major pro and con
Pro: It's a superset of ASCII and enormously richer
Con: It is costly and implementations are spotty

GG downgrades posts containing unicode if it can, thereby increasing reach to
recipients with unicode-broken clients

Likewise this:

> a bunch of Unicode characters with poor support for
> input and worse support for output as essential syntactic elements to
> Python.

sounds like the same logic applied to python

JFTR I am not quarrelling with Guido's choices; just pointing out your
inconsistencies

Rustom Mody

Jun 28, 2016, 1:00:43 AM
On Monday, June 27, 2016 at 8:19:05 PM UTC+5:30, Alain Ketterlin wrote:
> Grant Edwards writes:
> > Did the poor sod who wrote the compiler think it was a good idea?
>
> I don't know, but he has a good excuse: he was one of the first to ever
> write a compiler (see https://en.wikipedia.org/wiki/Compiler, the
> section on History).
>
> You just called John Backus a "poor sod". Think again.

The irony is bigger than you are conveying
1957: Backus made Fortran
20 years later: [1977] He won the Turing award, citation explicitly mentioning
his creation of Fortran.
His Turing award lecture
makes a demand for an alternative functional language (first usage of FP that
I know of) and lambasts traditional imperative programming languages.
http://worrydream.com/refs/Backus-CanProgrammingBeLiberated.pdf

However in addition to lambasting current languages in general he owns up to
his own contribution to the imperative-programming-goofup:

| I refer to conventional languages as "von Neumann languages" to take note of
| their origin and style, I do not, of course, blame the great mathematician for
| their complexity. In fact, some might say that I bear some responsibility for
| that problem.

I conjecture that it was Backus' clarion call to think more broadly about
paradigms and not merely about syntax details that prompted the next Turing
talk: Floyd's title (1978) *is* Paradigms of Programming though he did not use
the word quite as we do today

Likewise Backus' call to dump the imperative 'word-at-a-time' model and look
to APL for inspiration probably made it possible for an outlier like Iverson to
win the Turing award in '79.

All these taken together have inched CS slowly away from the imperative paradigm:
This and other titbits of history: http://blog.languager.org/2015/04/cs-history-1.html

In short, for someone in 2016 to laugh at Backus for 1957 mistakes that he had
already realized and crossed over in 1977, and yet continue to use the
imperative paradigm, i.e. the '57 mistake... well, the joke is in the opposite direction.

Steven D'Aprano

Jun 28, 2016, 1:42:59 AM
On Tuesday 28 June 2016 14:31, Rustom Mody wrote:

> On Tuesday, June 28, 2016 at 6:36:06 AM UTC+5:30, Steven D'Aprano wrote:
>> On Tue, 28 Jun 2016 12:23 am, Rustom Mody wrote:
>> > Also how is GG deliberately downgrading clear unicode content to be kind
>> > to obsolete clients at recipient end different from python 2 → 3 making
>> > breaking changes but not going beyond ASCII lexemes?
>>
>> Oh yes, I completely agree, obviously GvR is literally worse than Hitler
>> because he hasn't added a bunch of Unicode characters with poor support for
>> input and worse support for output as essential syntactic elements to
>> Python.
>>
>> /s
>
> Gratuitous Godwin acceleration produceth poor sarcasm -- try again
> And while you are at it try and answer the parallel:
> Unicode has a major pro and con
> Pro: It's a superset of ASCII and enormously richer

Correct.

> Con: It is costly and implementations are spotty

That's a matter of opinion. What do you mean by "spotty"?

It seems to me that implementations are mostly pretty good, at least as good as
Python 2 narrow builds. Support for astral characters is not as good, but
(apart from some Han users, and a few specialist niches) not as important either.

The big problem is poor tooling: fonts still have many missing characters, and
editors don't make it easy to enter anything not visible on the keyboard.


> GG downgrades posts containing unicode if it can, thereby increasing reach to
> recipients with unicode-broken clients

And how does that encourage clients to support Unicode? It just enables
developers to tell themselves "It's just a few weirdos and foreigners who use
Unicode, ASCII [by which they mean Latin 1] is good enough for everyone."

It's 2016, and it is *way* past time that application developers stop pandering
to legacy encodings by making them the default. If developers saw that 99% of
emails were UTF-8, they would be less likely to think they could avoid learning
about Unicode.


> Likewise this:
>
>> a bunch of Unicode characters with poor support for
>> input and worse support for output as essential syntactic elements to
>> Python.
>
> sounds like the same logic applied to python
>
> JFTR I am not quarrelling with Guido's choices; just pointing out your
> inconsistencies

Oh, it's inconsistencies plural is it? So I have more than one? :-)

In Python 3, source files are treated as UTF-8 by default. That means, if you
want to use Unicode characters in your source code (for variable names,
comments, or in strings) you can, and you don't have to declare a special
encoding. Just save the file in an editor that defaults to UTF-8, and Python is
satisfied. If, for some reason, you need some legacy encoding, you can still
explicitly set it with a coding cookie at the top of the file.
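
For example, the standard PEP 263 cookie, here declaring Latin-1:

# -*- coding: latin-1 -*-
s = "café"   # the file on disk is Latin-1; Python decodes it via the cookie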

That behaviour is exactly analogous to my position that mail and news clients
should default to UTF-8. But in neither case would people be *required* to
include Unicode characters in their text.


--
Steve

Chris Angelico

Jun 28, 2016, 2:05:13 AM
On Tue, Jun 28, 2016 at 3:42 PM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> And how does that encourage clients to support Unicode? It just enables
> developers to tell themselves "It's just a few weirdos and foreigners who use
> Unicode, ASCII [by which they mean Latin 1] is good enough for everyone."
>

Or Windows-1252, but declared as Latin-1. (Bane of my life.)

ChrisA

Marko Rauhamaa

Jun 28, 2016, 2:12:27 AM
Chris Angelico <ros...@gmail.com>:

> Or Windows-1252, but declared as Latin-1. (Bane of my life.)

J


Marko

Rustom Mody

Jun 28, 2016, 2:14:28 AM
On Tuesday, June 28, 2016 at 11:12:59 AM UTC+5:30, Steven D'Aprano wrote:
> On Tuesday 28 June 2016 14:31, Rustom Mody wrote:
>
> > On Tuesday, June 28, 2016 at 6:36:06 AM UTC+5:30, Steven D'Aprano wrote:
> >> On Tue, 28 Jun 2016 12:23 am, Rustom Mody wrote:
> >> > Also how is GG deliberately downgrading clear unicode content to be kind
> >> > to obsolete clients at recipient end different from python 2 → 3 making
> >> > breaking changes but not going beyond ASCII lexemes?
> >>
> >> Oh yes, I completely agree, obviously GvR is literally worse than Hitler
> >> because he hasn't added a bunch of Unicode characters with poor support for
> >> input and worse support for output as essential syntactic elements to
> >> Python.
> >>
> >> /s
> >
> > Gratuitous Godwin acceleration produceth poor sarcasm -- try again
> > And while you are at it try and answer the parallel:
> > Unicode has a major pro and con
> > Pro: It's a superset of ASCII and enormously richer
>
> Correct.
>
> > Con: It is costly and implementations are spotty
>
> That's a matter of opinion. What do you mean by "spotty"?

We've had this conversation before.
I've listed these spottinesses.
See http://blog.languager.org/2015/03/whimsical-unicode.html
Specifically the section on ½-assed unicode support

>
> It seems to me that implementations are mostly pretty good, at least as good as
> Python 2 narrow builds. Support for astral characters is not as good, but
> (apart from some Han users, and a few specialist niches) not as import either.
>
> The big problem is poor tooling: fonts still have many missing characters, and
> editors don't make it easy to enter anything not visible on the keyboard.
>
>
> > GG downgrades posts containing unicode if it can, thereby increasing reach to
> > recipients with unicode-broken clients
>
> And how does that encourage clients to support Unicode? It just enables
> developers to tell themselves "It's just a few weirdos and foreigners who use
> Unicode, ASCII [by which they mean Latin 1] is good enough for everyone."
>
> Its 2016, and it is *way* past time that application developers stop pandering
> to legacy encodings by making them the default. If developers saw that 99% of
> emails were UTF-8, they would be less likely to think they could avoid learning
> about Unicode.
>
>
> > Likewise this:
> >
> >> a bunch of Unicode characters with poor support for
> >> input and worse support for output as essential syntactic elements to
> >> Python.
> >
> > sounds like the same logic applied to python
> >
> > JFTR I am not quarrelling with Guido's choices; just pointing out your
> > inconsistencies
>
> Oh, it's inconsistencies plural is it? So I have more than one? :-)

Here's one (below)
>
> In Python 3, source files are treated as UTF-8 by default. That means, if you
> want to use Unicode characters in your source code (for variable names,
> comments, or in strings) you can, and you don't have to declare a special
> encoding. Just save the file in an editor that defaults to UTF-8, and Python is
> satisfied. If, for some reason, you need some legacy encoding, you can still
> explicitly set it with a coding cookie at the top of the file.
>
> That behaviour is exactly analogous to my position that mail and news clients
> should default to UTF-8. But in neither case would people be *required* to
> include Unicode characters in their text.

Python 2 had byte strings and unicode strings u"...".
Python 3 has char-strings and byte-strings b"...", with the char-strings
uniformly spanning all of Unicode.

Not just a significant change in implementation but in mindset.
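
(A two-line sketch of the Python 3 mindset:)

s = "π = 3.14159"        # str: Unicode code points, ASCII not privileged
b = s.encode("utf-8")    # bytes: encoding is an explicit step, not a default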

Yet the way you use Unicode in the sentence above implies that while you
*say* 'Unicode' you mean the set Unicode minus ASCII, which is exactly the
Python 2 mindset.

So which mindset do you subscribe to?

Gregory Ewing

Jun 28, 2016, 2:15:41 AM
BartC wrote:
> You mean the rationale was based on saving keystrokes?

Maybe disk space as well -- bytes were expensive
in those days!

--
Greg

Gregory Ewing

Jun 28, 2016, 2:18:01 AM
Dennis Lee Bieber wrote:
> Or my favorite example of a parser headache: which is the loop
> instruction and which is the assignment
>
> DO10I=3,14
> DO 10 I = 3.14

And if the programmer and/or compiler gets it wrong,
your spacecraft crashes into the planet.

--
Greg

Rustom Mody

Jun 28, 2016, 2:55:10 AM
On Tuesday, June 28, 2016 at 9:55:39 AM UTC+5:30, Marko Rauhamaa wrote:
> sohcahtoa82:
>
> > On Monday, June 27, 2016 at 7:09:35 AM UTC-7, Marko Rauhamaa wrote:
> >> Grant Edwards :
> >> > Were there other languages that did something similar?
> >>
> >> In XML, whitespace between tags is significant unless the document type
> >> says otherwise. On the other hand, leading and trailing space in
> >> attribute values is insignificant unless the document type says
> >> otherwise.
> >>
> >> > Why would a language designer think it a good idea?
> >> >
> >> > Did the poor sod who wrote the compiler think it was a good idea?
> >>
> >> Fortran is probably not too hard to parse. XML, on the other hand, is
> >> impossible to parse without the document type at hand. The document type
> >> not only defines the whitespace semantics but also the availability and
> >> meaning of the "entities" (e.g., &copy; for ©). Add namespaces to that,
> >> and the mess is complete.
> >
> > XML isn't a programming language. I don't think it's relevant to the
> > conversation.
>
> The question was about (formal) languages, not only programming
> languages.
>
> However, there are programming languages with XML syntax:
>
> <URL: https://en.wikipedia.org/wiki/XSLT>
> <URL: http://www.o-xml.org/spec/langspec.html>
> <URL: http://xplusplus.sourceforge.net/>

Seriously?!
You need to justify talking XML on a python list?

Which kind of 'python' is this list about?

https://www.facebook.com/nixcraft/photos/a.431194973560553.114666.126000117413375/1338469152833126/?type=3

Gregory Ewing

Jun 28, 2016, 3:00:05 AM
BartC wrote:
> On 27/06/2016 23:45, Lawrence D’Oliveiro wrote:
>
>> FORMAT(...complex expression with lots of nested parentheses...) =
>
> You just design the compiler to do the same processing in each case, i.e.
> parse a <name> followed by (<expression>), then mark the resulting AST
> fragment as either an array term or a FORMAT statement, depending on what
> follows and whether the name is "format".

Except that the contents of FORMAT statements have their
own unique syntax that's very different from that of
argument lists or array indexes. So processing them
both the same way would introduce its own level of
messiness.

Starting all over again from the beginning of the
statement is probably the least messy way to handle it.

--
Greg

Lawrence D’Oliveiro

Jun 28, 2016, 3:17:07 AM
On Tuesday, June 28, 2016 at 6:14:28 PM UTC+12, Rustom Mody wrote:
> Ive listed these spottinesses
> See http://blog.languager.org/2015/03/whimsical-unicode.html
> Specifically the section on ½-assed unicode support

Remember how those UTF-16-using pieces of software got sucked into it. They were assured it was UCS-2.

As for remembering to correctly free() after malloc(), it’s not that hard to do <https://github.com/ldo/dvd_menu_animator/blob/master/spuhelper.c>.

Chris Angelico

Jun 28, 2016, 3:26:30 AM
On Tue, Jun 28, 2016 at 5:12 PM, Lawrence D’Oliveiro
<lawren...@gmail.com> wrote:
> <https://github.com/ldo/dvd_menu_animator/blob/master/spuhelper.c>.

do { /* once */
    if (error) break;
    ...
} while (false);
do_cleanup;

Why not:

if (error) goto cleanup;
...
cleanup:
do_cleanup;

Oh, right. XKCD 292. I still think it's better to use the goto, though
- you just need velociraptor repellent.

ChrisA

Jussi Piitulainen

Jun 28, 2016, 4:04:41 AM
Marko Rauhamaa writes:

> Chris Angelico wrote:
>
>> Or Windows-1252, but declared as Latin-1. (Bane of my life.)
>
> J

J [1] uses ASCII.

References:

[1] https://en.wikipedia.org/wiki/J_(programming_language)

Lawrence D’Oliveiro

Jun 28, 2016, 4:49:12 AM
On Tuesday, June 28, 2016 at 7:26:30 PM UTC+12, Chris Angelico wrote:
> Why not:
>
> if (error) goto cleanup;
> ...
> cleanup:
> do_cleanup;

They all fall into that same trap. Doesn’t scale. Try one with allocation inside a loop, e.g. lines 488 onwards.

Chris Angelico

Jun 28, 2016, 6:35:21 AM
How is that different? You can use a goto inside a for loop just fine.
(You can even, if you are absolutely insane and begging to be murdered
by the future maintainer, use a goto outside a for loop targeting a
label inside. See for example Duff's Device, although that's a switch
rather than an actual goto.) You have a loop, and inside that loop,
the exact same error handling pattern; so it should be possible to
perform the exact same transformation, albeit with a differently-named
local cleanup label.

So, yes. It doesn't scale, if by "scale" you mean "so many separate
local instances of error handling that you lose track of your cleanup
labels". But you should be able to keep half a dozen labels in your
head (if they're named appropriately), and if you have that many local
error handlers, you probably want something to be refactored.

ChrisA

BartC

Jun 28, 2016, 6:35:31 AM
On 28/06/2016 01:11, Dennis Lee Bieber wrote:
> On Tue, 28 Jun 2016 00:08:00 +0100, BartC <b...@freeuk.com> declaimed the
> following:
>
>>
>> You just design the compiler to do the same processing in each case, i.e.
>> parse a <name> followed by (<expression>), then mark the resulting AST
>> fragment as either an array term or a FORMAT statement, depending on what
>> follows and whether the name is "format".
>>
> You're expecting an AST in a compiler from the 50s?
>

Well, FORTRAN in the 50s would have been much simpler too. No nested
subscripts for example. And ASTs are just one of many methods of compiling.

> That might have involved having to punch an output deck of cards for
> each compile, only to then feed that deck back into the reader for the next
> phase of compilation.

We have to assume that at least one complete line could be held in
memory. Then you can reinterpret it as many times as you like. (I don't
know if 1950s FORTRAN had continuation lines, which would make that a
little harder.)

However, not allowing FORMAT as a variable name probably /would/ have
been simpler!

--
Bartc

Random832

Jun 28, 2016, 10:13:32 AM
On Tue, Jun 28, 2016, at 00:31, Rustom Mody wrote:
> GG downgrades posts containing unicode if it can, thereby increasing
> reach to recipients with unicode-broken clients

That'd be entirely reasonable, except for the excessively broad
application of "if it can".

Certainly it _can_ do it all the time. Just replace anything that
doesn't fit with question marks or hex notation or \N{NAME} or some
human readable pseudo-representation a la unidecode. It could have done
any of those with the Hindi that you threw in to try to confound it (or
it could have chosen ISCII, which likewise lacks arrow characters, as
the encoding to downgrade to).

It should pick an encoding which it expects recipients to support and
which contains *all* of the characters in the message, as proper
characters and not as pseudo-representations, and downgrade to that if
and only if such an encoding can be found. For most messages, it can use
US-ASCII. For most of the remainder it can use some ISO-8859 or
Windows-125x encoding.

Or include the UTF-8 and some other character set as
multipart/alternative representations.
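
(A minimal sketch of that selection rule; the candidate list here is
illustrative, not whatever any real client actually uses:)

def pick_encoding(text):
    # Try the narrowest widely-supported encodings first; fall back to
    # UTF-8 only if none of them can represent *all* the characters.
    for enc in ("us-ascii", "iso-8859-1", "windows-1252"):
        try:
            text.encode(enc)
            return enc
        except UnicodeEncodeError:
            pass
    return "utf-8"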

Marko Rauhamaa

Jun 28, 2016, 12:39:59 PM

(sorry for the premature previous post)

Random832 <rand...@fastmail.com>:
> All objects, not just black holes, have those properties. The point
> here is that we are in fact observing those properties of an object
> that is not yet (and never will be) a black hole in our frame of
> reference.

A physicist once clarified to me that an almost-black-hole is
practically identical with a black hole because all information about
anything falling in is very quickly red-shifted to oblivion.

However, there is some information that (to my knowledge) is not
affected by the red shift. Here's a thought experiment:

      ----------
     /          \
    /  (almost)  \       N
   |    black     |      |
   |    hole      |      S
    \            /
     \          /
      ----------

We have a stationary, uncharged (almost) black hole in our vicinity and
decide to send in a probe. We first align the probe so it is perfectly
still wrt the black hole and let it fall in. Inside the probe, we have a
powerful electrical magnet that our compass can detect from a safe
distance away. The probe is also sending us a steady ping over the
radio.

As the probe approaches the event horizon, the ping frequency falls
drastically and the signal frequency is red-shifted below our ability to
receive. However, our compass still points to the magnet and notices
that it "floats" on top of the event horizon:

      ----------
     /          \
    /  (almost)  \ N
   |    black    ||
   |    hole     |S
    \            /
     \          /
      ----------


          /
         /   compass needle
        /

The compass needle shows that the probe is "frozen" and won't budge no
matter how long we wait.


Marko

Marko Rauhamaa

Jun 28, 2016, 12:41:29 PM
Marko Rauhamaa <ma...@pacujo.net>:

> (sorry for the premature previous post)

Screw it! Wrong thread!


Marko

Random832

Jun 28, 2016, 1:19:19 PM
On Tue, Jun 28, 2016, at 12:39, Marko Rauhamaa wrote:
> A physicist once clarified to me that an almost-black-hole is
> practically identical with a black hole because all information about
> anything falling in is very quickly red-shifted to oblivion.

Subject to some definition of "quickly" and "oblivion", I expect.

Ian Kelly

Jun 28, 2016, 2:28:36 PM
I'm skeptical of this. As the ping frequency falls drastically due to
relativistic effects, so too does the observed current powering the
electromagnet, does it not?

Marko Rauhamaa

Jun 28, 2016, 3:41:02 PM
Ian Kelly <ian.g...@gmail.com>:

> On Tue, Jun 28, 2016 at 10:39 AM, Marko Rauhamaa <ma...@pacujo.net> wrote:
>> Inside the probe, we have a powerful electrical magnet that our
>> compass can detect from a safe distance away.
>>
>> [...]
>>
>> The compass needle shows that the probe is "frozen" and won't budge no
>> matter how long we wait.
>
> I'm skeptical of this. As the ping frequency falls drastically due to
> relativistic effects, so too does the observed current powering the
> electromagnet, does it not?

Actually, that would be a great question for a physicist to resolve.
Next question: would a permanent magnet make any difference?

I admit I changed my thought experiment at the last minute to use a
magnet instead of a charge because I could more realistically imagine a
powerful magnet and a simple detector. That may have been a mistake.

A charge, however, would do the "floating" I presume. It's difficult to
find a straight answer online. The topic of a charge falling into a
black hole is addressed from one angle at:

<URL: http://adsabs.harvard.edu/full/1971PASP...83..633R>


This is from an answer by a guy who says he's got a PhD in general
relativity:

there's no problem with information falling IN to a black hole, which
is allowed to externally display it's mass, charge, angular momentum
and linear momentum, all of which get inprinted on the horizon as
matter falls in

<URL: http://physics.stackexchange.com/questions/6432/gravitational-redshift-of-virtual-photons>

Again, I'd like a physicist to give a straight answer.


Marko

Gene Heskett

Jun 28, 2016, 5:11:45 PM
I am not a physicist, but consider this:

At the event horizon, which is that point where the mass of the probe has
become very near infinite because it is moving very very close to the
speed of light, time also becomes stretched by the same effect, so that
in the probe as it falls thru the event horizon, time is so stretched
that for the people in the probe, everything matches up and to them it
took perhaps a microsecond to fall past the horizon. But to an external
observer, it's entirely possible that the probe is frozen at the horizon
because the infinite mass prevents the infall from completing for
billions of our years. In fact, I'd say that it will eventually sink
thru and disappear, but it will do so because the hole's event horizon
will grow as it absorbs the mass of other things it's pulling in,
placing the probe inside the horizon without the probe moving inward.

Somewhere in all that math that breaks down inside the event horizon is
probably the reason that well-fed black holes are also ejecting matter
from their rotational axis: because the event horizon is also subject to
the infinite limitation, and because it's "stuck", the surplus matter over
and above that which creates the event horizon is ejected from the poles
of its spin axis. We have millions of examples of that in the visible
universe. Just be glad as can be that we don't have one for a neighbor
that could point one of those beams at earth from 10 ly away. Instant
planet-wide sterilization.

Cheers, Gene Heskett
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page <http://geneslinuxbox.net:6309/gene>

Lawrence D’Oliveiro

Unread,
Jun 28, 2016, 8:43:09 PM
To:
On Tuesday, June 28, 2016 at 10:35:21 PM UTC+12, Chris Angelico wrote:
> On Tue, Jun 28, 2016 at 6:49 PM, Lawrence D’Oliveiro wrote:
> > On Tuesday, June 28, 2016 at 7:26:30 PM UTC+12, Chris Angelico wrote:
>>> Why not:
>>>
>>> if (error) goto cleanup;
>>> ...
>>> cleanup:
>>> do_cleanup;
>>
>> They all fall into that same trap. Doesn’t scale. Try one with
>> allocation inside a loop, e.g. lines 488 onwards.
>
> How is that different? You can use a goto inside a for loop just fine.

You have my code; show us how you can do better.

Steven D'Aprano

Unread,
Jun 28, 2016, 10:28:04 PM
To:
On Wed, 29 Jun 2016 12:13 am, Random832 wrote:

> On Tue, Jun 28, 2016, at 00:31, Rustom Mody wrote:
>> GG downgrades posts containing unicode if it can, thereby increasing
>> reach to recipients with unicode-broken clients
>
> That'd be entirely reasonable, except for the excessively broad
> application of "if it can".
>
> Certainly it _can_ do it all the time. Just replace anything that
> doesn't fit with question marks or hex notation or \N{NAME} or some
> human readable pseudo-representation a la unidecode. It could have done
> any of those with the Hindi that you threw in to try to confound it, (or
> it could have chosen ISCII, which likewise lacks arrow characters, as
> the encoding to downgrade to).

Are you suggesting that email clients and newsreaders should silently mangle
the text of your message behind your back? Because that's what it sounds
like you're saying.

I understand technical limitations. If I'm using a client that can't cope
with anything but (say) ISCII or Latin-1, then I'm all out of luck if I
want to write an email containing Greek or Cyrillic. I get that.

But if the client allows me to type Greek or Cyrillic into the editor, and
then accepts that message for sending, and it mangles it into "question
marks or hex notation or \N{NAME}" (for example), that's a disgrace and
completely unacceptable.

Yes, software *is capable of doing so*, in the same way that software is
capable of deleting all the vowels from your post, or replacing the
word "medieval" with "medireview":

http://northernplanets.blogspot.com.au/2007/01/medireview.html

This is not a good idea.


> It should pick an encoding which it expects recipients to support and
> which contains *all* of the characters in the message,

That would be UTF-8. That's a no-brainer. Why would you use any other
encoding?

If you use UTF-8, it just works. It supports the *entire* Unicode character
set, which is a superset of virtually all code pages and encodings you are
likely to encounter in practice. (No, your software probably isn't running
on a 1980s vintage Atari, and if you're in Japan using TRON you've got your
own software.) And your text widget or editor surely supports Unicode,
because if it didn't, the user couldn't type those Hindi or Greek letters.

So there's an obvious, sensible algorithm:

- take the user's Unicode text, and encode it to UTF-8

In pseudo-code:

content = text.encode('utf-8')


And there's the actual algorithm used by mail clients and newsreaders:

- take the user's Unicode text, and try encoding it as a variety of
different encodings (US-ASCII, Latin-1, maybe a few others); if they fail,
then fall back to UTF-8

Or in pseudo-code:

list_of_encodings = ['US-ASCII', 'Latin-1', ...]
for encoding in list_of_encodings:
    try:
        content = text.encode(encoding)
        break
    except UnicodeEncodeError:
        pass
else:
    content = text.encode('utf-8')


Why would you write the second instead of the first? It's just *dumb code*.
Maybe 20-year-old applications could be excused for thinking that this
newfangled Unicode thing should be the last resort instead of the code page
system, but it's 2016 now and code pages are just holding us back.




This is *especially* egregious since UTF-8 text containing only ASCII
characters is (by design) indistinguishable from US-ASCII, so even if there
is some application out there from 1980 that can only cope with ASCII, your
UTF-8 email will be perfectly readable to the degree that it only
uses "plain text".


> as proper
> characters and not as pseudo-representations, and downgrade to that if
> and only if such an encoding can be found. For most messages, it can use
> US-ASCII. For most of the remainder it can use some ISO-8859 or
> Windows-125x encoding.

There's never any need to downgrade to a non-Unicode encoding, at least not
by default. Well, maybe in Asia, I don't know how well Asian software
supports Unicode.




--
Steven
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

Steven D'Aprano

Unread,
Jun 28, 2016, 10:36:10 PM
To:
On Tue, 28 Jun 2016 12:10 am, Rustom Mody wrote:

> Analogy: Python's bool as 1½-class because bool came into python a good
> decade after python and breaking old code is a bigger issue than fixing
> control constructs to be bool-strict

That analogy fails because Python bools being implemented as ints is not a
bug to be fixed, but a useful feature.

There are downsides, of course, but there are also benefits. It comes down
to a matter of personal preference whether you think that bools should be
abstract True/False values or concrete 1/0 values. Neither decision is
clearly wrong, it's a matter of what you value.

Whereas some decisions are just dumb:

https://www.jwz.org/blog/2010/10/every-day-i-learn-something-new-and-stupid/

Rustom Mody

Unread,
Jun 28, 2016, 11:57:44 PM
To:
Yeah
I remember it silently converted «guillemets» to <<guillemets>>
[This is an experiment :-) ]

Rustom Mody

Unread,
Jun 29, 2016, 12:03:36 AM
To:
On Wednesday, June 29, 2016 at 9:27:44 AM UTC+5:30, Rustom Mody wrote:
> On Wednesday, June 29, 2016 at 7:58:04 AM UTC+5:30, Steven D'Aprano wrote:
> > This is not a good idea.
> [This is an experiment :-) ]

Um...
So now it's working,
i.e. the offensive behavior (to me also) that Steven described is not the
case any more??

Or did I misremember which characters it mauls and which it leaves alone.
Maybe a ‹single guillemet›?

Chris Angelico

Unread,
Jun 29, 2016, 12:20:24 AM
To:
On Wed, Jun 29, 2016 at 12:35 PM, Steven D'Aprano <st...@pearwood.info> wrote:
>
> Whereas some decisions are just dumb:
>
> https://www.jwz.org/blog/2010/10/every-day-i-learn-something-new-and-stupid/

"""It would also be reasonable to assume that any sane language
runtime would have integers transparently degrade to BIGNUMs, making
the choice of accuracy over speed, but of course that almost never
happens..."""

Python 2 did this, but Python 3 doesn't. Does this mean that:

1) The performance advantage of native integers is negligible?
2) The performance benefit of having two representations for integers
isn't worth the complexity of one data type having two
representations?
3) The advantage of merging the types was so great that it was done in
the most straight-forward way, and then nobody got around to doing
performance testing?
4) Something else?

ChrisA

Lawrence D’Oliveiro

Unread,
Jun 29, 2016, 12:51:58 AM
To:
On Wednesday, June 29, 2016 at 4:20:24 PM UTC+12, Chris Angelico wrote:
>> https://www.jwz.org/blog/2010/10/every-day-i-learn-something-new-and-stupid/
>
> """It would also be reasonable to assume that any sane language
> runtime would have integers transparently degrade to BIGNUMs, making
> the choice of accuracy over speed, but of course that almost never
> happens..."""
>
> Python 2 did this, but Python 3 doesn't.

Huh?

ldo@theon:~> python3
Python 3.5.1+ (default, Jun 10 2016, 09:03:40)
[GCC 5.4.0 20160603] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 2 ** 64
18446744073709551616
>>> type(2 ** 64)
<class 'int'>

Random832

Unread,
Jun 29, 2016, 12:52:52 AM
To:
On Tue, Jun 28, 2016, at 22:27, Steven D'Aprano wrote:
> Are you suggesting that email clients and newsreaders should silently
> mangle the text of your message behind your back? Because that's what
> it sounds like you're saying.

That was how I was characterizing Rustom Mody's position; he seemed to
be justifying a supposed "feature" of Google Groups to mangle Unicode
arrow characters into ASCII pseudorepresentations such as <- and ->.

Chris Angelico

Unread,
Jun 29, 2016, 1:07:25 AM
To:
The transparent shift from machine-word to bignum is what no longer
exists. Both Py2 and Py3 will store large integers as bignums; Py2 has
two separate data types (int and long), with ints generally
outperforming longs, but Py3 simply has one (called int, but
functionally like Py2's long), and does everything with bignums.
There's no longer a boundary - instead, everything gets the "bignum
tax". How steep is that tax? I'm not sure, but microbenchmarking shows
that there definitely is one. How bad is it in real-world code? No
idea.
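
(A minimal sketch of the kind of microbenchmark meant here -- run it
under each interpreter; the absolute numbers are illustrative only and
vary by machine and build:)

# Compare arithmetic on small ints (machine words in Py2, bignums in Py3)
# against arithmetic on unambiguously large ints (bignums in both).
import timeit
print(timeit.timeit("x + y", setup="x, y = 3, 4"))            # small values
print(timeit.timeit("x + y", setup="x, y = 2**100, 2**100"))  # bignum values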

ChrisA

Steven D'Aprano

Unread,
Jun 29, 2016, 1:26:46 AM
To:
On Wednesday 29 June 2016 14:51, Lawrence D’Oliveiro wrote:

> On Wednesday, June 29, 2016 at 4:20:24 PM UTC+12, Chris Angelico wrote:
>>> https://www.jwz.org/blog/2010/10/every-day-i-learn-something-new-and-stupid/
>>
>> """It would also be reasonable to assume that any sane language
>> runtime would have integers transparently degrade to BIGNUMs, making
>> the choice of accuracy over speed, but of course that almost never
>> happens..."""
>>
>> Python 2 did this, but Python 3 doesn't.
>
> Huh?


Chris is referring to the fact that in Python 2, there were two distinct
integer types, int and long. Originally they were entirely independent, and int
overflow would give you an exception:

# Python 1.5
>>> type(2147483647)
<type 'int'>
>>> 2147483647 + 1
Traceback (innermost last):
File "<stdin>", line 1, in ?
OverflowError: integer addition

To do BIGNUM arithmetic, you needed to explicitly specify at least one argument
was a long:

>>> 2147483647 + 1L
2147483648L



but from Python 2.2 (PEP 237) onwards, Python would automatically promote
ints to longs when a calculation got too big:

# Python 2.7, 32-bit build
py> type(2147483647)
<type 'int'>
py> type(2147483647 + 1)
<type 'long'>


which is the behaviour jwz is referring to.

BUT in Python 3, the distinction between int and long is gone by dropping int
and renaming long as "int". So all Python ints are BIGNUMs.

In principle Python might use native 32 or 64 bit ints for small values and
secretly promote them to BIGNUMs when needed, but as far as I know no
implementation of Python currently does this.
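
You can at least watch the unified int grow, though. A small
demonstration (exact byte counts depend on the build; the 30-bit
internal digits are a CPython implementation detail):

import sys
for n in (1, 2**30, 2**60, 2**600):
    # getsizeof reports the whole object; it grows with the number of
    # 30-bit internal "digits" CPython uses to store the value.
    print(n.bit_length(), sys.getsizeof(n))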




--
Steve

Lawrence D’Oliveiro

Unread,
Jun 29, 2016, 1:52:08 AM
To:
On Wednesday, June 29, 2016 at 5:26:46 PM UTC+12, Steven D'Aprano wrote:
> BUT in Python 3, the distinction between int and long is gone by dropping
> int and renaming long as "int". So all Python ints are BIGNUMs.

I don’t understand what the problem is with this. Is there supposed to be some issue with performance? Because I can’t see it.

Rustom Mody

Unread,
Jun 29, 2016, 1:56:08 AM
To:
On Wednesday, June 29, 2016 at 10:37:25 AM UTC+5:30, Chris Angelico wrote:
New to me -- thanks.
I thought it did an FSR-style covert machine word → BigInt conversion under the hood.
Tax is one question
Justification for this change is another

Chris Angelico

Unread,
Jun 29, 2016, 2:27:03 AM
To:
On Wed, Jun 29, 2016 at 3:55 PM, Rustom Mody <rusto...@gmail.com> wrote:
>> The transparent shift from machine-word to bignum is what no longer
>> exists. Both Py2 and Py3 will store large integers as bignums; Py2 has
>> two separate data types (int and long), with ints generally
>> outperforming longs, but Py3 simply has one (called int, but
>> functionally like Py2's long), and does everything with bignums.
>> There's no longer a boundary - instead, everything gets the "bignum
>> tax". How steep is that tax? I'm not sure, but microbenchmarking shows
>> that there definitely is one. How bad is it in real-world code? No
>> idea.
>>
>> ChrisA
>
> New to me -- thanks.
> I thought it did an FSR type covert machine word → BigInt conversion under the hood.
> Tax is one question
> Justification for this change is another

CPython doesn't currently do anything like that, but it would be
perfectly possible to do it invisibly, and thus stay entirely within
the language spec. I'm not aware of any Python implementation that
does this, but it wouldn't surprise me if PyPy has some magic like
that. It's PyPy's kind of thing.

It's also entirely possible that a future CPython will have this kind
of optimization too. It all depends on someone doing the
implementation work and then proving that it's worth the complexity.

ChrisA

Rustom Mody

Unread,
Jun 29, 2016, 2:33:30 AM
To:
On Wednesday, June 29, 2016 at 11:57:03 AM UTC+5:30, Chris Angelico wrote:
Huh? I thought I was just describing python2's behavior:

$ python
Python 2.7.11+ (default, Apr 17 2016, 14:00:29)
[GCC 5.3.1 20160413] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x=2
>>> type(x)
<type 'int'>
>>> y=x ** 80
>>> y
1208925819614629174706176L
>>> type(y)
<type 'long'>

Rustom Mody

Unread,
Jun 29, 2016, 2:38:02 AM
To:
Um, ok, I see the ...L there at the end of 2 ** 80.
So it's not exactly 'under the hood'.

Steven D'Aprano

Unread,
Jun 29, 2016, 2:46:04 AM
To:
If there is a performance hit, it's probably pretty small. It may have been
bigger back in Python 3.0 or 3.1.

[steve@ando ~]$ python2.7 -m timeit -s "n = 0" "for i in xrange(10000): n += i"
100 loops, best of 3: 1.87 msec per loop

[steve@ando ~]$ python3.3 -m timeit -s "n = 0" "for i in range(10000): n += i"
1000 loops, best of 3: 1.89 msec per loop


Although setting debugging options does make it pretty slow:

[steve@ando ~]$ python/python-dev/3.6/python -m timeit -s "n = 0" "for i in range(10000): n += i"
100 loops, best of 3: 13.7 msec per loop



--
Steve

Lawrence D’Oliveiro

Unread,
Jun 29, 2016, 4:07:29 AM
To:
On Wednesday, June 29, 2016 at 6:46:04 PM UTC+12, Steven D'Aprano wrote:
> On Wednesday 29 June 2016 15:51, Lawrence D’Oliveiro wrote:
>
>> On Wednesday, June 29, 2016 at 5:26:46 PM UTC+12, Steven D'Aprano wrote:
>>> BUT in Python 3, the distinction between int and long is gone by dropping
>>> int and renaming long as "int". So all Python ints are BIGNUMs.
>>
>> I don’t understand what the problem is with this. Is there supposed to be
>> some issue with performance? Because I can’t see it.
>
> If there is a performance hit, it's probably pretty small. It may have been
> bigger back in Python 3.0 or 3.1.
>
> [steve@ando ~]$ python2.7 -m timeit -s "n = 0" "for i in xrange(10000): n += i"
> 100 loops, best of 3: 1.87 msec per loop
>
> [steve@ando ~]$ python3.3 -m timeit -s "n = 0" "for i in range(10000): n += i"
> 1000 loops, best of 3: 1.89 msec per loop

Here is what I tried:

ldo@theon:python_try> python2 int_speed_test.py
2 ** 6 is 2 ** 6: True
1000000 iterations of “a = 2 ** 6 // 2 ** 4” took 0.0624719s = 6.24719e-08s/iteration
2 ** 9 is 2 ** 9: False
1000000 iterations of “a = 2 ** 9 // 2 ** 6” took 0.0506701s = 5.06701e-08s/iteration
2 ** 20 is 2 ** 20: False
1000000 iterations of “a = 2 ** 20 // 2 ** 12” took 0.0441589s = 4.41589e-08s/iteration
2 ** 64 is 2 ** 64: False
1000000 iterations of “a = 2 ** 64 // 2 ** 32” took 0.138092s = 1.38092e-07s/iteration
2 ** 96 is 2 ** 96: False
1000000 iterations of “a = 2 ** 96 // 2 ** 64” took 0.1142s = 1.142e-07s/iteration
ldo@theon:python_try> python3 int_speed_test.py
2 ** 6 is 2 ** 6: True
1000000 iterations of “a = 2 ** 6 // 2 ** 4” took 0.0230309s = 2.30309e-08s/iteration
2 ** 9 is 2 ** 9: False
1000000 iterations of “a = 2 ** 9 // 2 ** 6” took 0.0231234s = 2.31234e-08s/iteration
2 ** 20 is 2 ** 20: False
1000000 iterations of “a = 2 ** 20 // 2 ** 12” took 0.020053s = 2.0053e-08s/iteration
2 ** 64 is 2 ** 64: False
1000000 iterations of “a = 2 ** 64 // 2 ** 32” took 0.0182259s = 1.82259e-08s/iteration
2 ** 96 is 2 ** 96: False
1000000 iterations of “a = 2 ** 96 // 2 ** 64” took 0.0173797s = 1.73797e-08s/iteration

As you can see, Python 3 is actually *faster* than Python 2, particularly with smaller-magnitude integers.
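
(The script itself isn't posted in the thread. For anyone wanting to
repeat the experiment, here is a hypothetical reconstruction that
produces output of the same shape -- the helper names, the use of
time.time(), and the loop structure are my guesses, not Lawrence's
actual code:)

# -*- coding: utf-8 -*-
# Hypothetical reconstruction of int_speed_test.py, inferred from its
# output above; runs under both Python 2 and Python 3.
import time

def power(base, exp):
    # Computed at run time, so the interpreter cannot constant-fold
    # the result; 2 ** 6 lands in the small-int cache, 2 ** 9 doesn't,
    # which reproduces the True/False "is" lines above.
    return base ** exp

def trial(e1, e2, n=1000000):
    print("2 ** %d is 2 ** %d: %s" % (e1, e1, power(2, e1) is power(2, e1)))
    start = time.time()
    for _ in range(n):
        a = power(2, e1) // power(2, e2)
    elapsed = time.time() - start
    print("%d iterations of “a = 2 ** %d // 2 ** %d” took %gs = %gs/iteration"
          % (n, e1, e2, elapsed, elapsed / n))

for e1, e2 in [(6, 4), (9, 6), (20, 12), (64, 32), (96, 64)]:
    trial(e1, e2)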

Chris Angelico

Unread,
Jun 29, 2016, 4:10:13 AM
To:
That's not necessarily fair - you're comparing two quite different
Python interpreters, so there might be something entirely different
that counteracts the integer performance. (For example: You're
creating and disposing of large numbers of objects, so the performance
of object creation could affect things hugely.) To make it somewhat
fairer, add long integer performance to the mix. Starting by redoing
your test:

rosuav@sikorsky:~$ python2.7 -m timeit -s "n = 0" "for i in xrange(10000): n += i"
10000 loops, best of 3: 192 usec per loop
rosuav@sikorsky:~$ python2.7 -m timeit -s "n = 1<<100" "for i in xrange(10000): n += i"
1000 loops, best of 3: 478 usec per loop
rosuav@sikorsky:~$ python3.4 -m timeit -s "n = 0" "for i in range(10000): n += i"
1000 loops, best of 3: 328 usec per loop
rosuav@sikorsky:~$ python3.4 -m timeit -s "n = 1<<100" "for i in range(10000): n += i"
1000 loops, best of 3: 337 usec per loop
rosuav@sikorsky:~$ python3.5 -m timeit -s "n = 0" "for i in range(10000): n += i"
1000 loops, best of 3: 369 usec per loop
rosuav@sikorsky:~$ python3.5 -m timeit -s "n = 1<<100" "for i in range(10000): n += i"
1000 loops, best of 3: 356 usec per loop
rosuav@sikorsky:~$ python3.6 -m timeit -s "n = 0" "for i in range(10000): n += i"
1000 loops, best of 3: 339 usec per loop
rosuav@sikorsky:~$ python3.6 -m timeit -s "n = 1<<100" "for i in range(10000): n += i"
1000 loops, best of 3: 343 usec per loop

(On this system, python3.4 and python3.5 are Debian-shipped builds of
CPython, and python3.6 is one I compiled from hg today. There's no
visible variance between them, but just in case. I don't have a
python3.3 on here for a fair comparison with your numbers, sorry.)

The way I read this, Python 2.7 is noticeably slower with bignums, but
visibly faster with machine words. Python 3, on the other hand, has
consistent performance whether the numbers fit within a machine word
or not - which is to be expected, since it uses bignums for all
integers. PyPy's performance shows an even more dramatic gap:

rosuav@sikorsky:~$ pypy -m timeit -s "n = 0" "for i in xrange(10000): n += i"
100000 loops, best of 3: 7.59 usec per loop
rosuav@sikorsky:~$ pypy -m timeit -s "n = 1<<100" "for i in xrange(10000): n += i"
10000 loops, best of 3: 119 usec per loop
rosuav@sikorsky:~$ pypy --version
Python 2.7.10 (5.1.2+dfsg-1, May 17 2016, 18:03:30)
[PyPy 5.1.2 with GCC 5.3.1 20160509]

Sadly, Debian doesn't ship a pypy3 yet, so for consistency, I picked
up the latest available pypy2 and pypy3 from pypy.org.

rosuav@sikorsky:~/tmp$ pypy2-v5.3.1-linux64/bin/pypy -m timeit -s "n = 0" "for i in xrange(10000): n += i"
100000 loops, best of 3: 7.58 usec per loop
rosuav@sikorsky:~/tmp$ pypy2-v5.3.1-linux64/bin/pypy -m timeit -s "n = 1<<100" "for i in xrange(10000): n += i"
10000 loops, best of 3: 115 usec per loop
rosuav@sikorsky:~/tmp$ pypy3.3-v5.2.0-alpha1-linux64/bin/pypy3 -m timeit -s "n = 0" "for i in range(10000): n += i"
100000 loops, best of 3: 7.56 usec per loop
rosuav@sikorsky:~/tmp$ pypy3.3-v5.2.0-alpha1-linux64/bin/pypy3 -m timeit -s "n = 1<<100" "for i in range(10000): n += i"
10000 loops, best of 3: 115 usec per loop

Performance comparable to each other (and to the Debian-shipped one,
which is nice - as Adam Savage said, I love consistent data!), and
drastically different between machine words and bignums. So it looks
like PyPy *does* have some sort of optimization going on here, without
ever violating the language spec.

ChrisA

BartC

Unread,
Jun 29, 2016, 5:49:23 AM
To:
On 29/06/2016 06:26, Steven D'Aprano wrote:

>
> BUT in Python 3, the distinction between int and long is gone by dropping int
> and renaming long as "int". So all Python ints are BIGNUMs.
>
> In principle Python might use native 32 or 64 bit ints for small values and
> secretly promote them to BIGNUMs when needed, but as far as I know no
> implementation of Python currently does this.

Presumably the implementation of BIGNUMs would already do something like
this: a number that fits into 64 bits would only use 64 bits. The
overheads of dealing with both small BIGNUMs and big ones, or a mix,
might be lost in the other overheads of CPython.

But I remember, when playing with my tokeniser benchmarks earlier this
year, that switching from dealing with strings to integers didn't make
things much faster (I think it sometimes actually made them slower).

Even if Python has extremely efficient string handling, we know that
low-level string ops normally take longer than low-level integer ops.

So maybe small-integer handling already had enough overhead that
implementing them as small BIGNUMs didn't make much difference, but it
simplified the language.

--
Bartc

Lawrence D’Oliveiro

Unread,
Jun 29, 2016, 5:56:25 AM
To:
On Wednesday, June 29, 2016 at 9:49:23 PM UTC+12, BartC wrote:
> Even if Python has extremely efficient string handling, we know that
> low-level string ops normally take longer than low-level integer ops.

Maybe part of the general principle that, on modern machines, memory is cheap, but accessing memory is expensive?

BartC

Unread,
Jun 29, 2016, 6:10:51 AM
To:
No, it's just fewer instructions. If you do the equivalent of a==b where
both are integers, it might be a couple of instructions in native code.

If both are strings, even of one character each (say the code is
choosing to compare "A" with "B" instead of ord("A") with ord("B")), then
it's a /lot/ more than two instructions.

(With Python there's the side-issue of actually getting the integer
values. Having to call ord() doesn't help the case for using integers.)
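
(A quick, rough way to put numbers on that comparison; timeit results
vary by machine and Python version, so treat this as a sketch:)

# Cost of int equality vs. one-character string equality.
import timeit
print(timeit.timeit("a == b", setup="a, b = 65, 66"))    # int compare
print(timeit.timeit("a == b", setup="a, b = 'A', 'B'"))  # str compare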

--
Bartc

Rustom Mody

Unread,
Jun 29, 2016, 6:24:15 AM
To:
On Wednesday, June 29, 2016 at 8:06:10 AM UTC+5:30, Steven D'Aprano wrote:
> On Tue, 28 Jun 2016 12:10 am, Rustom Mody wrote:
>
> > Analogy: Python's bool as 1½-class because bool came into python a good
> > decade after python and breaking old code is a bigger issue than fixing
> > control constructs to be bool-strict
>
> That analogy fails because Python bools being implemented as ints is not a
> bug to be fixed, but a useful feature.
>
> There are downsides, of course, but there are also benefits. It comes down
> to a matter of personal preference whether you think that bools should be
> abstract True/False values or concrete 1/0 values. Neither decision is
> clearly wrong, it's a matter of what you value.
>
> Whereas some decisions are just dumb:
>
> https://www.jwz.org/blog/2010/10/every-day-i-learn-something-new-and-stupid/

Answered in "Operator Precedence/Boolean" thread where this is more relevant

Steven D'Aprano

Unread,
Jun 29, 2016, 8:36:19 AM
To:
Um, the two code snippets do the same thing. Comparing two different
versions of the same interpreter is *precisely* what I intended to do:

- is CPython using boxed native ints faster than CPython using
boxed BigNums, post unification?


No, my test doesn't precisely compare performance of boxed native ints
versus boxed BigNums for the same version, but I don't care about that. I
care about whether the Python interpreter is slower at int arithmetic since
unifying int and long, and my test shows that it isn't.

> (For example: You're
> creating and disposing of large numbers of objects, so the performance
> of object creation could affect things hugely.)

Sure. But in real life code, you're likely to be creating and disposing of
large numbers of objects. And both versions create and dispose of the same
objects, so the test is fair to both versions.


> To make it somewhat
> fairer, add long integer performance to the mix. Starting by redoing
> your test:

Why? That's irrelevant. The comparison I'm looking at is whether arithmetic
was faster using boxed native ints in older versions. In other words, has
there been a performance regression between 2.7 and 3.3?

For int arithmetic, the answer is No. I can make guesses and predictions
about why there is no performance regression:

- native ints were amazingly fast in Python 2.7, and BigNums in Python 3.3
are virtually as fast;

- native ints were horribly slow in Python 2.7, and changing to BigNums is
no slower;

- native ints were amazingly fast in Python 2.7, and BigNums in Python 3.3
are horribly slow, BUT object creation and disposal was horribly slow in
2.7 and is amazingly fast in 3.3, so overall it works out about equal;

- int arithmetic is so fast in Python 2.7, and xrange() so slow, that what I
actually measured was just the cost of calling xrange, and by mere
coincidence it happened to be almost exactly the same speed as bignum
arithmetic in 3.3.

But frankly, I don't really care that much. I'm not so much interested in
micro-benchmarking individual features of the interpreter as caring about
the overall performance, and for that, I think my test was reasonable and
fair.

> rosuav@sikorsky:~$ python2.7 -m timeit -s "n = 0" "for i in xrange(10000): n += i"
> 10000 loops, best of 3: 192 usec per loop
> rosuav@sikorsky:~$ python2.7 -m timeit -s "n = 1<<100" "for i in xrange(10000): n += i"
> 1000 loops, best of 3: 478 usec per loop

Now *that's* an unfair benchmark, because we know that BigNums get slower as
they get bigger. A BigNum with 30+ digits is not going to perform like a
BigNum with 8 digits.

The right test here would be:

python2.7 -m timeit -s "n = 0L" "for i in xrange(10000): n += i"

On my machine, I get these figures:

[steve@ando ~]$ python2.7 -m timeit -s "n = 0" "for i in xrange(10000): n += i"
1000 loops, best of 3: 2.25 msec per loop
[steve@ando ~]$ python2.7 -m timeit -s "n = 0L" "for i in xrange(10000): n += i"
100 loops, best of 3: 2.33 msec per loop

which suggests that even in 2.7, the performance difference between native
ints and BigNums was negligible for smallish numbers. But of course if we
use huge BigNums, they're more expensive:

[steve@ando ~]$ python2.7 -m timeit -s "n = 1 << 100" "for i in xrange(10000): n += i"
100 loops, best of 3: 2.44 msec per loop

although apparently not *that* much more expensive on my machine. Let's try
something bigger:

[steve@ando ~]$ python2.7 -m timeit -s "n = 1 << 1000" "for i in xrange(10000): n += i"
100 loops, best of 3: 4.23 msec per loop

Now you can see the cost of really BigNums. But still, that's about 300
digits, so not too shabby.

BartC

Unread,
Jun 29, 2016, 9:25:16 AM
To:
On 29/06/2016 13:36, Steven D'Aprano wrote:
> On Wed, 29 Jun 2016 06:09 pm, Chris Angelico wrote:

>> That's not necessarily fair - you're comparing two quite different
>> Python interpreters, so there might be something entirely different
>> that counteracts the integer performance.

> No, my test doesn't precisely compare performance of boxed native ints
> versus boxed BigNums for the same version, but I don't care about that. I
> care about whether the Python interpeter is slower at int arithmetic since
> unifying int and long, and my test shows that it isn't.

> For int arithmetic, the answer is No. I can make guesses and predictions
> about why there is no performance regression:
>
> - native ints were amazingly fast in Python 2.7, and BigNums in Python 3.3
> are virtually as fast;
>
> - native ints were horribly slow in Python 2.7, and changing to BigNums is
> no slower;
>
> - native ints were amazingly fast in Python 2.7, and BigNums in Python 3.3
> are horribly slow, BUT object creation and disposal was horribly slow in
> 2.7 and is amazingly fast in 3.3, so overall it works out about equal;
>
> - int arithmetic is so fast in Python 2.7, and xrange() so slow, that what I
> actually measured was just the cost of calling xrange, and by mere
> coincidence it happened to be almost exactly the same speed as bignum
> arithmetic in 3.3.
>
> But frankly, I don't really care that much. I'm not so much interested in
> micro-benchmarking individual features of the interpreter as caring about
> the overall performance, and for that, I think my test was reasonable and
> fair.

I think there are too many things going on in CPython that would
dominate matters beyond the actual integer arithmetic.

I used this little benchmark:

def fn():
    n = 0
    for i in range(1000000):
        n += i

for k in range(100):
    fn()

With CPython, Python 2 took 21 seconds (20 with xrange), while Python 3
was 12.3 seconds (fastest times).

I then ran the equivalent code under my own non-Python interpreter (but
a version using 100% C to keep the test fair), and it was 2.3 seconds.

(That interpreter keeps 64-bit integers and bigints separate. The 64-bit
integers are also value-types, not reference-counted objects.)

When I tried optimising versions, then PyPy took 7 seconds, while mine
took 0.5 seconds.

Testing the same code as C, then unoptimised it was 0.4 seconds, and
optimised, 0.3 seconds (but n was declared 'volatile' to stop the loop
being eliminated completely).

So the actual work involved takes 0.3 seconds. That means Python 3 is
spending 12.0 seconds dealing with overheads. The extra ones of dealing
with bigints would get lost in there!

(If I test that same code using an explicit bigint for n, then it's a
different story. It's too complicated to test for C, but it will likely
be a lot more than 0.3 seconds. And my bigint library is hopelessly
slow, taking some 35 seconds.

So from that point of view, Python is doing a good job of managing a
12-second time using a composite integer/bigint type.

However, the vast majority of integer code /can be done within 64 bits/.
Within 32 bits probably. But like I said, it's possible that other
overheads come into play than just the ones of using bigints, which I
would imagine are streamlined.)

--
Bartc

Chris Angelico

Unread,
Jun 29, 2016, 9:35:11 AM
To:
On Wed, Jun 29, 2016 at 10:36 PM, Steven D'Aprano <st...@pearwood.info> wrote:
>> rosuav@sikorsky:~$ python2.7 -m timeit -s "n = 0" "for i in xrange(10000): n += i"
>> 10000 loops, best of 3: 192 usec per loop
>> rosuav@sikorsky:~$ python2.7 -m timeit -s "n = 1<<100" "for i in xrange(10000): n += i"
>> 1000 loops, best of 3: 478 usec per loop
>
> Now *that's* an unfair benchmark, because we know that BigNums get slower as
> they get bigger. A BigNum with 30+ digits is not going to perform like a
> BigNum with 8 digits.

On its own, perhaps. But then I do the exact same tests on Python 3,
and the numbers are virtually identical - suggesting that the bignum
slowdown isn't all that significant at all. But in case you're
worried, I'll do it your way too:

rosuav@sikorsky:~$ python2.7 -m timeit -s "n = 0" "for i in xrange(10000): n += i"
10000 loops, best of 3: 192 usec per loop
rosuav@sikorsky:~$ python2.7 -m timeit -s "n = 1<<100" "for i in xrange(10000): n += i"
1000 loops, best of 3: 476 usec per loop
rosuav@sikorsky:~$ python2.7 -m timeit -s "n = 0L" "for i in xrange(10000): n += i"
1000 loops, best of 3: 476 usec per loop

So, once again, my system shows that there's a definite slowdown from
using bignums - and it's the same whether I start with 1<<100 or 0L.
(In this particular run, absolutely precisely the same, but other runs
showed numbers like 479 and 486.) What's different about your system
that you see short ints as performing exactly the same as long ints?
Obviously you're running on a slower computer than mine (you're seeing
msec values compared to my usec), but that shouldn't be significant.
Is there a massive architectural difference?

rosuav@sikorsky:~$ uname -a
Linux sikorsky 4.6.0-1-amd64 #1 SMP Debian 4.6.1-1 (2016-06-06) x86_64 GNU/Linux

ChrisA

Chris Angelico

Unread,
Jun 29, 2016, 9:36:15 AM
To:
On Wed, Jun 29, 2016 at 11:24 PM, BartC <b...@freeuk.com> wrote:
> I used this little benchmark:
>
> def fn():
> n=0
> for i in range(1000000):
> n+=i
>
> for k in range(100):
> fn()

Add, up the top:

try: range = xrange
except NameError: pass

Otherwise, your Py2 tests are constructing a million-element list,
which is a little unfair.

ChrisA

BartC

Unread,
Jun 29, 2016, 10:47:41 AM
To:
It made little difference (21 seconds instead of 20 seconds).

But that was on Windows. I remember that Python was much more sluggish
on Windows than under Ubuntu on the same machine. (Maybe the Windows
version was 32-bits or something.)

Trying it on Ubuntu, Py2 takes 6 seconds (using xrange; otherwise it's 9
seconds), while pypy (2.7) manages 0.35 seconds.

pypy normally excels with such loops, but I recall also that it had some
trouble with this particular benchmark, which this version must have fixed.

--
Bartc

Gregory Ewing

Unread,
Jun 29, 2016, 8:30:11 PM
To:
Marko Rauhamaa wrote:

>       ----------
>      /          \
>     / (almost)   \   N
>    |   black      |  |
>    |   hole       |  S
>     \            /
>      \          /
>       ----------
>
>
>                  /
>                 /  compass needle
>                /
>
> The compass needle shows that the probe is "frozen" and won't budge no
> matter how long we wait.

All your experiment shows is that the last information we had
about the magnet is that it was nearly stationary just above
the horizon.

It doesn't prove that the probe itself is frozen, any more than
the fact that a photograph you took of something last month
doesn't move proves that the object you photographed is
stuck in the state it was in a month ago.

Keep in mind that changes in the magnetic field propagate at
the speed of light and are subject to the same redshift, etc.
as any other signal. It doesn't matter whether you use a
permanent magnet, an electric charge, or coconuts banged
together in morse code, relativity still applies.

--
Greg

Steven D'Aprano

Unread,
Jun 29, 2016, 10:50:26 PM
To:
On Thu, 30 Jun 2016 10:29 am, Gregory Ewing wrote:

> All your experiment shows is that the last information we had
> about the magnet is that it was nearly stationary just above
> the horizon.
>
> It doesn't prove that the probe itself is frozen, any more than
> the fact that a photograph you took of something last month
> doesn't move proves that the object you photographed is
> stuck in the state it was in a month ago.

The easy way to see that it isn't frozen in place is to try to fly down to
meet it.


> Keep in mind that changes in the magnetic field propagate at
> the speed of light and are subject to the same redshift, etc.
> as any other signal. It doesn't matter whether you use a
> permanent magnet, an electric charge, or coconuts banged
> together in morse code, relativity still applies.

An electric charge is a much better approach. Because it is a monopole, it
is detectable from a distance more easily than a magnetic dipole, and while
magnets are going to be vaporised into plasma (hence losing their magnetic
field), electrons are electrons (at least until you get into the quantum
gravity regime, at which point we don't know what happens).

Marko Rauhamaa

Unread,
Jun 30, 2016, 2:22:12 AM
To:
Gregory Ewing <greg....@canterbury.ac.nz>:

> All your experiment shows is that the last information we had about
> the magnet is that it was nearly stationary just above the horizon.
>
> It doesn't prove that the probe itself is frozen, any more than the
> fact that a photograph you took of something last month doesn't move
> proves that the object you photographed is stuck in the state it was
> in a month ago.

Interaction defines scientific reality. Things that don't interact don't
exist.

Call it cosmic duck-typing: you are what you appear to be.


Marko