From D

bearoph...@lycos.com

Jul 24, 2007, 6:19:53 AM
There are various things I like about the D language that I think
Python too may enjoy. Here are a few bits (mostly syntactical ones):

1) (we have discussed part of this in the past) You can put
underscores inside number literals, like 1_000_000, the compiler
doesn't enforce the position of such underscores, so you can also put
them like this: 1_00_000. You can put them in literals of decimals,
binary, hex, etc. I think it's quite useful, because when in Python
code I have a line like:
for i in xrange(1000000):
I need some time to count the zeros, because the lower levels of my
visual system can't count/group them quickly (perceptually). While in
a syntax like:
for i in xrange(1_000_000):
my eyes help me group them at once.
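For comparison, Python itself later adopted this idea. Since Python 3.6 (PEP 515), a single underscore may sit between any two digits of a numeric literal, and `int()`/`float()` accept the same spelling in strings. The following is a minimal sketch that assumes Python 3.6 or newer, where `range` replaces the Python 2 `xrange`:

```python
# Requires Python 3.6+ (PEP 515). Underscores are purely visual;
# the compiler discards them when building the number.
one_million = 1_000_000
lakh_style = 10_00_000        # positions are not restricted to groups of 3
flags = 0b_0100_0011          # binary, underscore allowed after the prefix
mask = 0xFF_FF                # hex works the same way

print(one_million, lakh_style, flags, mask)   # 1000000 1000000 67 65535

# The same spelling is accepted when converting strings at runtime.
print(int("1_000_000"), float("1_000.5"))     # 1000000 1000.5

count = 0
for i in range(1_000_000):    # range() is the Python 3 name for xrange()
    count += 1
print(count)                  # 1000000
```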


2) Base 2 number literals, and base 2 "%b" printing with writefln.
Base-2 numbers are less common in Python code, but once in a while I
use them. For example:
import std.stdio;
void main() {
auto x = 0b0100_0011;
writefln("%b", x);
writefln("%.8b", x);
writefln(x);
}
Prints:
1000011
01000011
67
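The same output can be produced in Python 3 with `format()`. This is a short sketch that assumes Python 3.6+ for the underscore in the literal; the `0b` prefix itself comes from PEP 3127, which is cited later in the thread. The `b` format code stands in for `%b`, and `08b` (zero-padded to width 8) stands in for `%.8b`:

```python
x = 0b0100_0011            # 0b prefix (PEP 3127); underscore needs 3.6+

print(format(x, "b"))      # 1000011
print(format(x, "08b"))    # 01000011, zero-padded to width 8
print(x)                   # 67
print(bin(x))              # 0b1000011, bin() keeps the prefix
```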


3) All string literals are multi-line. So you can write:
a = "how are
you";
There's no need for """ """.
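Python behaves differently: an ordinary quoted literal cannot contain a raw newline. Here is a short sketch in Python 3 of the two usual alternatives, triple quotes and implicit concatenation of adjacent literals inside parentheses:

```python
# A plain "..." literal may not span lines, so compiling one fails.
try:
    compile('a = "how are\nyou"', "<example>", "exec")
    single_line_only = False
except SyntaxError:
    single_line_only = True
print(single_line_only)   # True

# Triple quotes keep the embedded newline.
a = """how are
you"""

# Adjacent literals are joined at compile time; the newline must be explicit.
b = ("how are\n"
     "you")

print(a == b)             # True
```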


4) With D I have created an xsplit() generator, and in my tests it's
much faster than split(), especially if the strings/lines you want
to split are a few hundred chars long or more (it's not faster if you
want to split very short strings). So I think Python could enjoy such a
string method too (you can probably simulate an xsplit with a regular
expression, but the same is true of some other string methods too).
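Here is a minimal Python sketch of such a lazy splitter. It assumes an explicit, non-empty separator. `xsplit` and `xsplit_whitespace` are hypothetical helpers, not built-ins, and no speed claim is made for them. For whitespace splitting, `re.finditer` gives the regular-expression version mentioned above.

```python
import re
from typing import Iterator


def xsplit(text: str, sep: str) -> Iterator[str]:
    """Lazily yield the same pieces as text.split(sep) (hypothetical helper)."""
    if not sep:
        raise ValueError("empty separator")
    start = 0
    while True:
        end = text.find(sep, start)
        if end == -1:
            yield text[start:]          # last piece, possibly empty
            return
        yield text[start:end]
        start = end + len(sep)


def xsplit_whitespace(text: str) -> Iterator[str]:
    """Lazily yield runs of non-whitespace, like text.split() for ordinary whitespace."""
    return (m.group() for m in re.finditer(r"\S+", text))


print(list(xsplit("a,b,,c", ",")))              # ['a', 'b', '', 'c']
print(list(xsplit_whitespace("  12  34 5 ")))   # ['12', '34', '5']
```

Because both functions produce one piece at a time, a caller that stops early never pays for splitting the rest of a long line.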

Bye,
bearophile

Stargaming

Jul 24, 2007, 10:10:53 AM
On Tue, 24 Jul 2007 03:19:53 -0700, bearophileHUGS wrote:

> There are various things I like about the D language that I think Python
> too may enjoy. Here are a few bits (mostly syntactical ones):
>
> 1) (we have discussed part of this in the past) You can put underscores
> inside number literals, like 1_000_000, the compiler doesn't enforce the
> position of such underscores, so you can also put them like this:
> 1_00_000. You can put them in literals of decimals, binary, hex, etc. I
> think it's quite useful, because when in Python code I have a line like:
> for i in xrange(1000000):
> I need some time to count the zeros, because the lower levels of my
> visual system can't count/group them quickly (perceptually). While in a
> syntax like:
> for i in xrange(1_000_000):
> my eyes help me group them at once.

Sounds like a good thing to be but the arbitrary positioning doesn't make
any sense. Additionally, I'd suggest 10**n in such cases (eg. 10**6).



> 2) Base 2 number literals, and base 2 "%b" printing with the writefln.
> Base-2 numbers are less common in Python code, but once in a while I use
> them. For example:
> import std.stdio;
> void main() {
> auto x = 0b0100_0011;
> writefln("%b", x);
> writefln("%.8b", x);
> writefln(x);
> }
> Prints:
> 1000011
> 01000011
> 67

Accepted. http://www.python.org/dev/peps/pep-3127/#abstract

> 3) All string literals are multi line. So you can write: a = "how are
> you";
> There's no need for """ """.

Well, I think triple quotes are just easier to recognize visually. If you read ``foo =
"""...``, it's clear you can skip the next few lines because they're most
likely a chunk of data, not program code. A single pair of quotation marks makes
it clear this is just a very small token in the whole line. (Yes, there may
be exceptions, and there may be syntax highlighting.)

> 4) With D I have created an xsplit() generator, and from my tests it's
> quite faster than the split(), especially if the string/lines you want
> to split are few hundred chars long or more (it's not faster if you want
> to split very little strings). So I think Python can enjoy such string
> method too (you can probably simulate an xsplit with a regular
> expression, but the same is true for some other string methods too).

Yea, that's a good idea -- fits into the current movement towards
generator'ing everything. But (IIRC) this idea came up earlier and there
has been a patch, too. A quick search at sf.net didn't turn up anything
relevant, tho.

> Bye,
> bearophile

Regards,
Stargaming

Bjoern Schliessmann

Jul 24, 2007, 2:09:00 PM
Stargaming wrote:
> On Tue, 24 Jul 2007 03:19:53 -0700, bearophileHUGS wrote:

>> While in a syntax like:
>> for i in xrange(1_000_000):
>> my eyes help me group them at once.
>
> Sounds like a good thing to be but the arbitrary positioning
> doesn't make any sense.

Checking underscore positions would only add complexity. Why not
just ignore them, no matter where they are?

> Additionally, I'd suggest 10**n in such cases (eg. 10**6).

This fails as soon as the number has anything other than zeros on the
right side.

Regards,


Björn

--
BOFH excuse #97:

Small animal kamikaze attack on power supplies

Steven D'Aprano

Jul 24, 2007, 7:08:37 PM
On Tue, 24 Jul 2007 20:09:00 +0200, Bjoern Schliessmann wrote:

> Stargaming wrote:
>> On Tue, 24 Jul 2007 03:19:53 -0700, bearophileHUGS wrote:
>
>>> While in a syntax like:
>>> for i in xrange(1_000_000):
>>> my eyes help me group them at once.
>>
>> Sounds like a good thing to be but the arbitrary positioning
>> doesn't make any sense.
>
> Checking underscore positions would only add complexity. Why not
> just ignore them, no matter where they are?


Underscores in numerics are UGLY. Why not take a leaf out of implicit
string concatenation and allow numeric literals to implicitly concatenate?

Python already does:
"hello-" "world" => "hello-world"

Propose:
123 456 789 => 123456789
123.456 789 => 123.456789
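Both halves of this comparison can be checked in Python 3. Adjacent string literals are already merged by the compiler, while adjacent numeric literals are currently rejected by the parser. `compiles` is a small helper written for this illustration:

```python
# Implicit concatenation of adjacent string literals is existing behaviour.
print("hello-" "world")                # hello-world


def compiles(source: str) -> bool:
    """Return True if source parses as a Python expression."""
    try:
        compile(source, "<proposal>", "eval")
    except SyntaxError:
        return False
    return True


print(compiles('"hello-" "world"'))    # True
print(compiles("123 456 789"))         # False: not valid Python today
print(compiles("123.456 789"))         # False
```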


--
Steven.

Eric_...@msn.com

Jul 24, 2007, 7:26:35 PM

I think there is a language bridge so that you can compile D for
Python. It looks really easy, but I have Python 2.5 and Panda, and it
tries to go for the Panda installation. It looks much easier than C to
use with Python, in fact. I don't know whether being in a library would
change its speed, though.

https://sourceforge.net/projects/dex-tracker

Gabriel Genellina

Jul 24, 2007, 8:47:51 PM
to pytho...@python.org
On Tue, 24 Jul 2007 11:10:53 -0300, Stargaming <starg...@gmail.com>
wrote:

> On Tue, 24 Jul 2007 03:19:53 -0700, bearophileHUGS wrote:
>
>> There are various things I like about the D language that I think Python
>> too may enjoy. Here are a few bits (mostly syntactical ones):
>>
>> 1) (we have discussed part of this in the past) You can put underscores
>> inside number literals, like 1_000_000, the compiler doesn't enforce the
>> position of such underscores, so you can also put them like this:
>> 1_00_000. You can put them in literals of decimals, binary, hex, etc. I
>

> Sounds like a good thing to be but the arbitrary positioning doesn't make
> any sense. Additionally, I'd suggest 10**n in such cases (eg. 10**6).

Why not? Because in English major numbers are labeled in thousands?
(thousand, million, billion...)
In India, they're grouped by two after the first thousand; in China,
they're grouped every 4 digits (that is, there is a single word for "ten
thousand" = wan4 = 万, and the next required word is for 10**8 = yi4 = 亿)
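The groupings described above can be sketched with a small helper. `group_digits` is a hypothetical function written for this illustration, and it assumes a non-negative integer. Python's own `format(n, ',')` covers only the three-digit Western style.

```python
def group_digits(n: int, first: int, rest: int, sep: str = ",") -> str:
    """Group the decimal digits of a non-negative int.

    `first` is the size of the rightmost group, `rest` the size of
    every group to its left (3/3 Western, 3/2 Indian, 4/4 Chinese).
    """
    if n < 0:
        raise ValueError("non-negative integers only")
    digits = str(n)
    groups = []
    size = first
    while len(digits) > size:
        groups.append(digits[-size:])   # peel a group off the right end
        digits = digits[:-size]
        size = rest
    groups.append(digits)               # whatever is left on the far left
    return sep.join(reversed(groups))


n = 12345678
print(group_digits(n, 3, 3))   # 12,345,678
print(group_digits(n, 3, 2))   # 1,23,45,678
print(group_digits(n, 4, 4))   # 1234,5678
print(format(n, ","))          # 12,345,678
```

Under PEP 515 (Python 3.6+), all three spellings are also valid literals with underscores in place of commas, for example `1_23_45_678 == 12_345_678 == 1234_5678`.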

--
Gabriel Genellina

Jakub Stolarski

Jul 25, 2007, 3:32:12 AM
On Jul 25, 1:08 am, Steven D'Aprano
<st...@REMOVE.THIS.cybersource.com.au> wrote:
> Underscores in numerics are UGLY. Why not take a leaf out of implicit
> string concatenation and allow numeric literals to implicitly concatenate?
>
> Python already does:
> "hello-" "world" => "hello-world"
>
> Propose:
> 123 456 789 => 123456789
> 123.456 789 => 123.456789
>

I like that.

Wildemar Wildenburger

Jul 25, 2007, 11:32:52 AM
to pytho...@python.org
Steven D'Aprano wrote:
> Python already does:
> "hello-" "world" => "hello-world"
>
> Propose:
> 123 456 789 => 123456789
> 123.456 789 => 123.456789
>
>
I second that!

/W

mensa...@aol.com

Jul 25, 2007, 1:22:46 PM
On Jul 24, 6:08 pm, Steven D'Aprano

So, spaces will no longer be delimiters? Won't that cause
much wailing and gnashing of teeth?

>
> --
> Steven.


Paddy

Jul 25, 2007, 1:47:33 PM
On Jul 25, 1:47 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
> On Tue, 24 Jul 2007 11:10:53 -0300, Stargaming <stargam...@gmail.com>
> wrote:
>
> > On Tue, 24 Jul 2007 03:19:53 -0700, bearophileHUGS wrote:
>
> >> There are various things I like about the D language that I think Python
> >> too may enjoy. Here are a few bits (mostly syntactical ones):
>
> >> 1) (we have discussed part of this in the past) You can put underscores
> >> inside number literals, like 1_000_000, the compiler doesn't enforce the
> >> position of such underscores, so you can also put them like this:
> >> 1_00_000. You can put them in literals of decimals, binary, hex, etc. I
>
> > Sounds like a good thing to be but the arbitrary positioning doesn't make
> > any sense. Additionally, I'd suggest 10**n in such cases (eg. 10**6).
>
> Why not? Because in English major numbers are labeled in thousands?
> (thousand, million, billion...)
> In India, they're grouped by two after the first thousand; in China,
> they're grouped every 4 digits (that is, there is a single word for "ten
> thousand" = wan4 = 万, and the next required word is for 10**8 = yi4 = 亿)
>
> --
> Gabriel Genellina

But then, what would _0 be: the number 0, or the name _0, analogous to
a0?

- Pad.

star....@gmail.com

Jul 25, 2007, 1:47:47 PM
On Jul 25, 1:22 pm, "mensana...@aol.com" <mensana...@aol.com> wrote:
>
> So, spaces will no longer be delimiters? Won't that cause
> much wailing and gnashing of teeth?
>

I can't think of a circumstance in which

48 1906

is valid, so . . .

I like it, too :)

--
Star Weaver

Marc 'BlackJack' Rintsch

Jul 25, 2007, 2:09:55 PM
On Wed, 25 Jul 2007 10:47:33 -0700, Paddy wrote:

> But then, what would _0 be: the number 0, or the name _0, analogous to
> a0?

Of course the name because numbers have to start with a digit or a dot.
Otherwise this would break backwards compatibility.
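Python 3.6+ (PEP 515) resolved it this way. A leading underscore always starts a name, and inside a number underscores may only appear singly between digits. The sketch below uses the standard `ast` module and assumes Python 3.8+, where numbers parse to `ast.Constant`. `classify` is a helper written for this illustration.

```python
import ast


def classify(source: str) -> str:
    """Describe how Python parses a short expression."""
    try:
        node = ast.parse(source, mode="eval").body
    except SyntaxError:
        return "syntax error"
    if isinstance(node, ast.Name):
        return f"name {node.id}"
    if isinstance(node, ast.Constant):
        return f"number {node.value}"
    return type(node).__name__


for src in ["_0", "a0", "1_0", "1__0", "1_"]:
    print(f"{src!r:8} -> {classify(src)}")
```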

Ciao,
Marc 'BlackJack' Rintsch

Ben Finney

Jul 25, 2007, 9:00:04 PM
"mensa...@aol.com" <mensa...@aol.com> writes:

I don't see how you get that conclusion from Steven's proposal. If the
latter implied that "spaces will no longer be delimiters", then surely
the former must also imply that.

The former already exists, spaces are still delimiters when syntax
allows, so your conclusion is baseless.

--
\ "I got food poisoning today. I don't know when I'll use it." |
`\ -- Steven Wright |
_o__) |
Ben Finney

mensa...@aol.com

Jul 25, 2007, 9:17:19 PM
On Jul 25, 8:00 pm, Ben Finney <bignose+hates-s...@benfinney.id.au>
wrote:

> "mensana...@aol.com" <mensana...@aol.com> writes:
> > On Jul 24, 6:08 pm, Steven D'Aprano
> > <st...@REMOVE.THIS.cybersource.com.au> wrote:
> > > Python already does:
> > > "hello-" "world" => "hello-world"
>
> > > Propose:
> > > 123 456 789 => 123456789
> > > 123.456 789 => 123.456789
>
> > So, spaces will no longer be delimiters?
>
> I don't see how you get that conclusion from Steven's proposal.

IDLE 1.2c1
>>> s = '123 456'
>>> s.split()
['123', '456']

The only way to get '123 456' would be to treat a space as a
non-delimiter. But what if those actually WERE two different numbers?

Steven D'Aprano

Jul 25, 2007, 9:54:56 PM

Did you miss the bit where Python ALREADY does this for strings?

Yes, whitespace will still delimit tokens. No, it won't be a problem,
because two int tokens can be "concatenated" to make a single int token,
exactly as happens for strings.

(I say "no problem", but of course I don't know how much _actual_ coding
effort will be needed to Make This Work. It might be a little, it might be
a lot.)

Currently, 234 567 is a syntax error in Python, so there are no problems
with backward compatibility or breaking code that relies on the meaning of
whitespace between two ints.


--
Steven

Steven D'Aprano

Jul 25, 2007, 10:04:01 PM
On Wed, 25 Jul 2007 18:17:19 -0700, mensa...@aol.com wrote:

> On Jul 25, 8:00 pm, Ben Finney <bignose+hates-s...@benfinney.id.au>
> wrote:
>> "mensana...@aol.com" <mensana...@aol.com> writes:
>> > On Jul 24, 6:08 pm, Steven D'Aprano
>> > <st...@REMOVE.THIS.cybersource.com.au> wrote:
>> > > Python already does:
>> > > "hello-" "world" => "hello-world"
>>
>> > > Propose:
>> > > 123 456 789 => 123456789
>> > > 123.456 789 => 123.456789
>>
>> > So, spaces will no longer be delimiters?
>>
>> I don't see how you get that conclusion from Steven's proposal.
>
> IDLE 1.2c1
>>>> s = '123 456'
>>>> s.split()
> ['123', '456']
>
> The only way to get '123 456' would be to treat a space as a
> non-delimiter. But what if those actually WERE two different numbers?

That makes no sense at all. Your example is about splitting a _string_.
You can construct and split the string any way you like:

>>> s = '123SURPRISE456'
>>> s.split('SURPRISE')
['123', '456']

Notice that the results aren't ints, they are strings.

To get an int literal, you currently type something like 123456. 123 456
is currently not valid in Python; it raises a SyntaxError. Try it for
yourself:

>>> 123 456
File "<stdin>", line 1
123 456
^
SyntaxError: invalid syntax

If you want two numbers, you would do exactly the same thing you would now:

>>> x, y = 123, 456
>>> print "x is %d and y is %d" % (x, y)
x is 123 and y is 456

--
Steven.

mensa...@aol.com

Jul 26, 2007, 12:51:03 AM
On Jul 25, 8:54 pm, Steven D'Aprano
<st...@REMOVE.THIS.cybersource.com.au> wrote:

> On Wed, 25 Jul 2007 10:22:46 -0700, mensana...@aol.com wrote:
> > On Jul 24, 6:08 pm, Steven D'Aprano
> > <st...@REMOVE.THIS.cybersource.com.au> wrote:
> >> On Tue, 24 Jul 2007 20:09:00 +0200, Bjoern Schliessmann wrote:
> >> > Stargaming wrote:
> >> >> On Tue, 24 Jul 2007 03:19:53 -0700, bearophileHUGS wrote:
>
> >> >>> While in a syntax like:
> >> >>> for i in xrange(1_000_000):
> >> >>> my eyes help me group them at once.
>
> >> >> Sounds like a good thing to be but the arbitrary positioning
> >> >> doesnt make any sense.
>
> >> > Checking underscore positions would only add complexity. Why not
> >> > just ignore them, no matter where they are?
>
> >> Underscores in numerics are UGLY. Why not take a leaf out of implicit
> >> string concatenation and allow numeric literals to implicitly concatenate?
>
> >> Python already does:
> >> "hello-" "world" => "hello-world"
>
> >> Propose:
> >> 123 456 789 => 123456789
> >> 123.456 789 => 123.456789
>
> > So, spaces will no longer be delimiters? Won't that cause
> > much wailing and gnashing of teeth?
>
> Did you miss the bit where Python ALREADY does this for strings?

Did you miss the bit where I agreed this was a GOOD feature?
You didn't miss it because I didn't say it.

>
> Yes, whitespace will still delimit tokens. No, it won't be a problem,
> because two int tokens can be "concatenated" to make a single int token,
> exactly as happens for strings.

Any number of whitespace characters? Just spaces or all
whitespace characters?

>
> (I say "no problem", but of course I don't know how much _actual_ coding
> effort will be needed to Make This Work. It might be a little, it might be
> a lot.)
>
> Currently, 234 567 is a syntax error in Python, so there are no problems
> with backward compatibility or breaking code that relies on the meaning of
> whitespace between two ints.

That's the ONLY issue? What about searching source
code files? What's the regular expression for
locating a number with an arbitrary number of digits
separated into an arbitrary number of blocks of an
arbitrary number of digits with an arbitrary number
of whitespace characters between each block?

>
> --
> Steven

mensa...@aol.com

Jul 26, 2007, 1:01:06 AM
On Jul 25, 9:04 pm, Steven D'Aprano
<st...@REMOVE.THIS.cybersource.com.au> wrote:

> On Wed, 25 Jul 2007 18:17:19 -0700, mensana...@aol.com wrote:
> > On Jul 25, 8:00 pm, Ben Finney <bignose+hates-s...@benfinney.id.au>
> > wrote:
> >> "mensana...@aol.com" <mensana...@aol.com> writes:
> >> > On Jul 24, 6:08 pm, Steven D'Aprano
> >> > <st...@REMOVE.THIS.cybersource.com.au> wrote:
> >> > > Python already does:
> >> > > "hello-" "world" => "hello-world"
>
> >> > > Propose:
> >> > > 123 456 789 => 123456789
> >> > > 123.456 789 => 123.456789
>
> >> > So, spaces will no longer be delimiters?
>
> >> I don't see how you get that conclusion from Steven's proposal.
>
> > IDLE 1.2c1
> >>>> s = '123 456'
> >>>> s.split()
> > ['123', '456']
>
> > The only way to get '123 456' would be to treat a space as a
> > non-delimiter. But what if those actually WERE two different numbers?
>
> That makes no sense at all. Your example is about splitting a _string_.

Why does it make no sense? Have you never had to
scrape a web page or read a CSV file?

> You can construct and split the string any way you like:
>
> >>> s = '123SURPRISE456'
> >>> s.split('SURPRISE')
>
> ['123', '456']
>
> Notice that the results aren't ints, they are strings.

Duh. I took for granted you knew how to convert
a string to an integer.

>
> To get an int literal, you currently type something like 123456. 123 456
> is currently not valid in Python; it raises a SyntaxError. Try it for
> yourself:

So this proposal would only apply to string literals
at compile time, not running programs?

>
> >>> 123 456
>
> File "<stdin>", line 1
> 123 456
> ^
> SyntaxError: invalid syntax

And I want the same error to occur if my CSV parser
tries to convert '123 456' into a single number.
I don't want it to assume the number is '123456'.

Ben Finney

Jul 26, 2007, 1:18:32 AM
"mensa...@aol.com" <mensa...@aol.com> writes:

> IDLE 1.2c1
> >>> s = '123 456'
> >>> s.split()
> ['123', '456']

The str.split method has no bearing on this discussion, which is about
the Python language syntax, and numeric literal values in particular.

--
\ "Pinky, are you pondering what I'm pondering?" "Wuh, I think |
`\ so, Brain, but burlap chafes me so." -- _Pinky and The Brain_ |
_o__) |
Ben Finney

Ben Finney

Jul 26, 2007, 2:21:16 AM
"mensa...@aol.com" <mensa...@aol.com> writes:

> On Jul 25, 8:54 pm, Steven D'Aprano
> <st...@REMOVE.THIS.cybersource.com.au> wrote:
> Any number of whitespace characters? Just spaces or all whitespace
> characters?

> What about searching source code files? What's the regular
> expression for locating a number with an arbitrary number of digits
> separated into an arbitrary number of blocks of an arbitrary number
> of digits with an arbitrary number of whitespace characters between
> each block?

These issues all exist for implicitly concatenated string literals. It
seems the same rules would apply; that would both make sense from an
implementation standpoint, and result in consistency for the Python
programmer too.

--
\ "Are you pondering what I'm pondering?" "I think so, Brain, but |
`\ wouldn't his movies be more suitable for children if he was |
_o__) named Jean-Claude van Darn?" -- _Pinky and The Brain_ |
Ben Finney

Ben Finney

Jul 26, 2007, 2:24:32 AM
"mensa...@aol.com" <mensa...@aol.com> writes:

> On Jul 25, 9:04 pm, Steven D'Aprano
> <st...@REMOVE.THIS.cybersource.com.au> wrote:
> Why does it make no sense? Have you never had to scrape a web page
> or read a CSV file?

Again, unrelated to the way the Python compiler syntactically treats
the source code.

> So this proposal would only apply to string literals at compile
> time, not running programs?

Exactly the same way that it works for string literals in source code:
once the source code is compiled, the literal is indistinguishable
from the same value written a different way.

> And I want the same error to occur if my CSV parser tries to convert
> '123 456' into a single number. I don't want it to assume the
> number is '123456'.

Once again, this is a discussion about Python syntax, not the
behaviour of the csv module.

--
\ "I always had a repulsive need to be something more than |
`\ human." -- David Bowie |
_o__) |
Ben Finney

bearoph...@lycos.com

Jul 26, 2007, 6:14:58 AM
Sorry for the slow feedback.

Stargaming>Sounds like a good thing to be but the arbitrary
positioning doesn't make any sense.<

The arbitrary positioning allows you to denote 4-digit groups too in
binary/hex literals, like in my example:
auto x = 0b0100_0011;


Stargaming>fits into the current movement towards generator'ing
everything. But (IIRC) this idea came up earlier and there has been a
patch, too.<

Python is old so most simple ideas aren't new :-)


Steven D'Aprano>Underscores in numerics are UGLY.<

I presume it's a matter of taste too. I use them often in D code, and
the _ symbol is very different from the 0..F/0..f digits so you can
tell them apart with no problems.


Steven D'Aprano>Why not take a leaf out of implicit string
concatenation and allow numeric literals to implicitly concatenate?<

The "_" helps my eyes see that those digit groups are part of the same
number. With spaces I think my eyes may need a bit of extra time to
decide if they are parts of the same number literal.


Eric Dexter>I think there is a language bridge so that you can compile
D for Python. It looks really easy, but I have Python 2.5 and Panda, and
it tries to go for the Panda installation. It looks much easier than C
to use with Python, in fact.<

Are you talking about "Pyd"? It's a good bridge, and I like it. It's
actively updated, soon in version 1.0.

Bye,
bearophile

Paul Rubin

Jul 26, 2007, 6:58:20 AM
Steven D'Aprano <st...@REMOVE.THIS.cybersource.com.au> writes:
> Propose:
> 123 456 789 => 123456789
> 123.456 789 => 123.456789

+1

Leo Petr

Jul 26, 2007, 11:39:21 AM
On Jul 24, 10:10 am, Stargaming <stargam...@gmail.com> wrote:
> On Tue, 24 Jul 2007 03:19:53 -0700, bearophileHUGS wrote:
> > There are various things I like about the D language that I think Python
> > too may enjoy. Here are a few bits (mostly syntactical ones):
>
> > 1) (we have discussed part of this in the past) You can put underscores
> > inside number literals, like 1_000_000, the compiler doesn't enforce the
> > position of such underscores, so you can also put them like this:
> > 1_00_000. You can put them in literals of decimals, binary, hex, etc. I
> > think it's quite useful, because when in Python code I have a line like:
> > for i in xrange(1000000):
> > I need some time to count the zeros, because the lower levels of my
> > visual system can't count/group them quickly (perceptually). While in a
> > syntax like:
> > for i in xrange(1_000_000):
> > my eyes help me group them at once.
>
> Sounds like a good thing to be but the arbitrary positioning doesn't make
> any sense. Additionally, I'd suggest 10**n in such cases (eg. 10**6).
>

http://blogs.msdn.com/oldnewthing/archive/2006/04/17/577483.aspx

Digits are grouped in 2s in India and in 4s in China and Japan.

Regards,

Leons Petrazickis
http://lpetr.org/blog/

mensa...@aol.com

Jul 26, 2007, 2:02:26 PM
On Jul 26, 1:24 am, Ben Finney <bignose+hates-s...@benfinney.id.au>
wrote:

> "mensana...@aol.com" <mensana...@aol.com> writes:
> > On Jul 25, 9:04 pm, Steven D'Aprano
> > <st...@REMOVE.THIS.cybersource.com.au> wrote:
> > Why does it make no sense? Have you never had to scrape a web page
> > or read a CSV file?
>
> Again, unrelated to the way the Python compiler syntactically treats
> the source code.

That's what I was enquiring about.

So, just as

>>> int('123' '456')
123456

is not an error, the proposal is that

>>> a = 123 456
SyntaxError: invalid syntax

will not be an error either.

Yet,

>>> a = int('123 456')
Traceback (most recent call last):
File "<pyshell#7>", line 1, in <module>
a = int('123 456')
ValueError: invalid literal for int() with base 10: '123 456'

will still be an error. Just trying to be clear on this. Wouldn't
want that syntax behavior to carry over into run-time.

>
> > So this proposal would only apply to string literals at compile
> > time, not running programs?
>
> Exactly the same way that it works for string literals in source code:
> once the source code is compiled, the literal is indistinguishable
> from the same value written a different way.
>
> > And I want the same error to occur if my CSV parser tries to convert
> > '123 456' into a single number. I don't want it to assume the
> > number is '123456'.
>
> Once again, this is a discussion about Python syntax, not the
> behaviour of the csv module.

Who said I was using the csv module?

mensa...@aol.com

Jul 26, 2007, 2:10:41 PM
On Jul 26, 12:18 am, Ben Finney <bignose+hates-s...@benfinney.id.au>
wrote:

> "mensana...@aol.com" <mensana...@aol.com> writes:
> > IDLE 1.2c1
> > >>> s = '123 456'
> > >>> s.split()
> > ['123', '456']
>
> The str.split method has no bearing on this discussion,

It most certainly does. To make '123 456' into an integer,
you split it and then join it.
>>> z = '123 456'
>>> y = z.split()
>>> x = ''.join(y)
>>> w = int(x)
>>> w
123456

Just wanted to be sure that this must still be done explicitly
and that the language won't do it for me behind my back.

> which is about
> the Python language syntax,

Provided it is confined to the language syntax.

> and numeric literal values in particular.

Fine, as long as int('123 456') continues to be an error.

Kay Schluehr

Jul 26, 2007, 3:42:44 PM

Nope. Just replace the current grammar rule

atom: ... NAME | STRING+ | NUMBER

by

atom: ... NAME | STRING+ | NUMBER+

The resulting grammar is still free of ambiguities. The tokenizer
doesn't complain about it anyway, not even now.
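Kay's point about the tokenizer can be checked with the standard `tokenize` module: `123 456` already tokenizes into two NUMBER tokens, and only the grammar rejects it. The helpers below are a hypothetical sketch of the `NUMBER+` idea. They glue the NUMBER tokens' text together and evaluate the result. They assume the input contains nothing but numbers, and they are not how CPython would actually implement the change.

```python
import ast
import io
import tokenize

SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}


def number_tokens(source: str) -> list:
    """Return the text of each NUMBER token; reject any other token."""
    parts = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if tok.type != tokenize.NUMBER:
            raise ValueError(f"not a number token: {tok.string!r}")
        parts.append(tok.string)
    return parts


def concat_numbers(source: str):
    """Hypothetical NUMBER+ rule: join adjacent number tokens into one literal."""
    return ast.literal_eval("".join(number_tokens(source)))


print(number_tokens("123 456 789"))     # ['123', '456', '789']
print(concat_numbers("123 456 789"))    # 123456789
print(concat_numbers("123.456 789"))    # 123.456789
```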

Ryan Ginstrom

Jul 26, 2007, 6:45:13 PM
to pytho...@python.org
> On Behalf Of Leo Petr

> Digits are grouped in 2s in India and in 4s in China and Japan.

This is not entirely true in Japan's case. When written without Japanese
characters, Japan employs the same format as the US, for example:

1,000,000
(However, they would read this as 百万 (hyaku man), literally 100 ten
thousands.)

Raymond is correct in that Japan traditionally groups in fours (and still
reads it that way regardless, as shown above), but in an ordinary
programming context, this almost never comes into play.

On the original topic of the thread, I personally like the underscore idea
from D, and I like it better than the "concatenation" idea, even though I
agree that it is more consistent with Python's string-format rules.

Regards,
Ryan Ginstrom

Tim Williams

Jul 26, 2007, 7:26:23 PM
to mensa...@aol.com, pytho...@python.org
On 26/07/07, mensa...@aol.com <mensa...@aol.com> wrote:

>> The str.split method has no bearing on this discussion,

> It most certainly does. To make '123 456' into an integer,
> you split it and then join it.
> >>> z = '123 456'
> >>> y = z.split()
> >>> x = ''.join(y)
> >>> w = int(x)
> >>> w
> 123456

...but it doesn't if you use replace!! <wink>

>>> z = '123 456'
>>> int(z.replace(' ', ''))
123456


> Propose:
> 123 456 789 => 123456789
> 123.456 789 => 123.456789

+1 for me too

--

Tim Williams

Ben Finney

Jul 26, 2007, 7:44:27 PM
"mensa...@aol.com" <mensa...@aol.com> writes:

> So, just as
>
> >>> int('123' '456')
> 123456
>
> is not an error, the proposal is that
>
> >>> a = 123 456
> SyntaxError: invalid syntax
>
> will not be an error either.

More directly: Just as these three statements create the same literal
value:

>>> 'abc' 'def'
'abcdef'
>>> 'ab' 'cd' 'ef'
'abcdef'
>>> 'abcdef'
'abcdef'

the proposal is that these three statements create the same literal
value:

>>> 12 345.678 90
12345.6789
>>> 123 45.67 890
12345.6789
>>> 12345.67890
12345.6789

and not be a syntax error.

> Yet,
>
> >>> a = int('123 456')
> Traceback (most recent call last):
> File "<pyshell#7>", line 1, in <module>
> a = int('123 456')
> ValueError: invalid literal for int() with base 10: '123 456'
>
> will still be an error.

Since that value, '123 456', is one that is rejected by the 'int'
constructor. Nothing to do with this proposal.

> Just trying to be clear on this. Wouldn't want that syntax behavior
> to carry over into run-time.

The distinction you need to be clear on is between the Python syntax
for writing literal values in code (which is proposed to change by
this), and the behaviour of operations on arbitrary values at runtime
(which is outside the scope of this proposal).
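The split between source literals and runtime conversion can be seen in current Python 3. `int()` still rejects an embedded space, so turning `'123 456'` into a number stays an explicit step, using either of the two ways shown in this thread. Since Python 3.6, `int()` does accept PEP 515 underscores, mirroring the literal syntax.

```python
text = "123 456"

# Runtime conversion is outside the proposal: int() rejects embedded spaces.
try:
    int(text)
    rejected = False
except ValueError:
    rejected = True
print(rejected)                         # True

# Explicit conversions from the thread: split/join, or replace.
print(int("".join(text.split())))       # 123456
print(int(text.replace(" ", "")))       # 123456

# int() tolerates surrounding whitespace, but not embedded whitespace.
print(int("  123456\n"))                # 123456

# Python 3.6+: underscores are accepted in strings too (PEP 515).
print(int("123_456"))                   # 123456
```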

--
\ "I bought a dog the other day. I named him Stay. It's fun to |
`\ call him. 'Come here, Stay! Come here, Stay!' He went insane. |
_o__) Now he just ignores me and keeps typing." -- Steven Wright |
Ben Finney

Ben Finney

Jul 26, 2007, 7:45:41 PM
"mensa...@aol.com" <mensa...@aol.com> writes:

> > The str.split method has no bearing on this discussion,
>
> It most certainly does. To make '123 456' into an integer,
> you split it and then join it.

Indeed. Which has nothing to do with the Python syntax for creating a
numeric literal in code.

--
\ "God forbid that any book should be banned. The practice is as |
`\ indefensible as infanticide." -- Dame Rebecca West |
_o__) |
Ben Finney

fdu.x...@gmail.com

Jul 30, 2007, 10:50:00 PM
to pytho...@python.org
Gabriel Genellina wrote:
> On Tue, 24 Jul 2007 11:10:53 -0300, Stargaming <starg...@gmail.com>
> wrote:

>
>> On Tue, 24 Jul 2007 03:19:53 -0700, bearophileHUGS wrote:
>>
>>> There are various things I like about the D language that I think Python
>>> too may enjoy. Here are a few bits (mostly syntactical ones):
>>>
>>> 1) (we have discussed part of this in the past) You can put underscores
>>> inside number literals, like 1_000_000, the compiler doesn't enforce the
>>> position of such underscores, so you can also put them like this:
>>> 1_00_000. You can put them in literals of decimals, binary, hex, etc. I
>> Sounds like a good thing to be but the arbitrary positioning doesn't make
>> any sense. Additionally, I'd suggest 10**n in such cases (eg. 10**6).
>
> Why not? Because in English major numbers are labeled in thousands?
> (thousand, million, billion...)
> In India, they're grouped by two after the first thousand; in China,
> they're grouped every 4 digits (that is, there is a single word for "ten
> thousand" = wan4 = 万, and the next required word is for 10**8 = yi4 = 亿)
>

Yes, in China numbers are grouped each 4 digits while it is different in
other countries, so I think it would be better if we could put arbitrary
whitespace inside number literals.

Alex Martelli

Jul 31, 2007, 11:52:26 AM
mensa...@aol.com <mensa...@aol.com> wrote:

> code files? What's the regular expression for
> locating a number with an arbitrary number of digits
> separated into an arbitrary number of blocks of an
> arbitrary number of digits with an arbitrary number
> of whitespace characters between each block?

For a decimal integer (or octal) number, I'd use something similar to:
r'\d[\d\s]+'

This also gets trailing whitespace, but that shouldn't be much of a
problem in most practical cases. Of course, just like today, it becomes
a bit hairier if you also want to find hex, oct (to be 0o777 in the
future), other future notations such as binary, floats, complex numbers,
&c:-) -- but the simple fact that a [\d\s] is accepted where today only
a \d would be, per se, would not contribute to that hair in any
significant way, it seems to me.
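Alex's pattern can be tried with the `re` module. As he notes, it also swallows trailing whitespace. Because `\s` matches newlines, it can also run across lines. It needs at least two characters, so a lone final digit with nothing after it is missed. The second pattern is an assumed variant, not from the thread. It allows only blanks and tabs between digit groups and requires a digit after each run.

```python
import re

source = "a = 7\nb = 12 345 678\n"

alex = re.compile(r"\d[\d\s]+")
variant = re.compile(r"\d+(?:[ \t]+\d+)*")   # assumed variant: blanks/tabs only

print(alex.findall(source))      # ['7\n', '12 345 678\n']
print(variant.findall(source))   # ['7', '12 345 678']
print(alex.findall("x=7"))       # []
print(variant.findall("x=7"))    # ['7']
```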


Alex

Ben Finney

Sep 1, 2008, 9:13:27 PM
bearoph...@lycos.com writes:

> For Python 2.7/3.1 I'd now like to write a PEP regarding the
> underscores in number literals, like: 0b_0101_1111, 268_435_456
> etc.

+1 on such a capability.

-1 on underscore as the separator.

When you proposed this last year, the counter-proposal was made
<URL:http://groups.google.com/group/comp.lang.python/msg/18123d100bba63b8?dmode=source>
to instead use white space for the separator, exactly as one can now
do with string literals.

I don't see any good reason (other than your familiarity with the D
language) to use underscores for this purpose, and much more reason
(readability, consistency, fewer arbitrary differences in syntax,
perhaps simpler implementation) to use whitespace just as with string
literals.

--
\ “When in doubt tell the truth. It will confound your enemies |
`\ and astound your friends.” —Mark Twain, _Following the Equator_ |
_o__) |
Ben Finney

bearoph...@lycos.com

Sep 1, 2008, 9:34:40 PM
Ben Finney:

> I don't see any good reason (other than your familiarity with the D
> language) to use underscores for this purpose, and much more reason
> (readability, consistency, fewer arbitrary differences in syntax,
> perhaps simpler implementation) to use whitespace just as with string
> literals.

It's not just my familiarity, Ada language too uses underscore for
that purpose, I think, so there's a precedent, and Ada is a language
designed to always minimize programming errors, simple code mistakes
too.

And another thing to consider is that they so far have given me zero
problems...

Consider:

a = 125 125 125

a = 125, 125, 125

a = 125_125_125

For me the gestalt of the first line looks too much like the second
one, that is, three separate things (note that this depends on the
font you use; I am using a really good free font, Inconsolata, the
very best I have found for programming (better than Consolas), which
separates things well). While in the third case the _ helps glue the
parts, creating a single gestalt to my eyes.
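For comparison, this sketch asks the parser of a modern Python (3.6 or later, where underscores in numeric literals are accepted) how it reads the three lines above. `describe` is an illustrative helper built on the standard `ast` module.

```python
import ast


def describe(src: str) -> str:
    """Report how the Python parser treats a one-line assignment."""
    try:
        tree = ast.parse(src)
    except SyntaxError:
        return "SyntaxError"
    # The right-hand side is a literal, so literal_eval can evaluate the node.
    value = ast.literal_eval(tree.body[0].value)
    return f"{type(value).__name__}: {value!r}"


for line in ("a = 125 125 125", "a = 125, 125, 125", "a = 125_125_125"):
    print(f"{line:<18} -> {describe(line)}")
```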

Note that it's not just a matter of font and familiarity, it's also a
matter of brains. Your brain may be different from mine, so it may be
possible that what's better for you isn't better for me. So in such
a situation a popular vote may be the only way to choose. But for me
having spaces to split number literals in parts is _worse_ than not
having any way at all to split them. So I'm strongly opposed to your
suggestion, and I may not even propose the PEP if a lot of people agree
with your tastes.

Bye,
bearophile

Ben Finney

Sep 1, 2008, 11:51:16 PM
bearoph...@lycos.com writes:

> Ben Finney:
> > I don't see any good reason (other than your familiarity with the
> > D language) to use underscores for this purpose, and much more
> > reason (readability, consistency, fewer arbitrary differences in
> > syntax, perhaps simpler implementation) to use whitespace just as
> > with string literals.
>
> It's not just my familiarity, Ada language too uses underscore for
> that purpose, I think, so there's a precedent, and Ada is a language
> designed to always minimize programming errors, simple code mistakes
> too.

I would argue that the precedent, already within Python, for using a
space to separate pieces of a string literal, is more important than
precedents from other programming languages.

> Consider:
>
> a = 125 125 125
>
> a = 125, 125, 125
>
> a = 125_125_125
>
> For me the gestalt of the first line looks too much like the second
> one, that is three separated things

This is no more the case than for literal strings:

a = "spam" "eggs" "ham"

a = "spam", "eggs", "ham"

Yet this is already a valid way in Python to specify, respectively, a
single literal string and a literal tuple of strings.
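The string case Ben describes, as a runnable Python snippet: adjacent literals are concatenated at compile time, while commas build a tuple.

```python
# Adjacent string literals are joined at compile time into one string...
a = "spam" "eggs" "ham"
# ...whereas commas build a tuple of three separate strings.
b = "spam", "eggs", "ham"

print(repr(a))  # 'spameggsham'
print(repr(b))  # ('spam', 'eggs', 'ham')
```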

> While in the third case the _ helps glue the parts, creating a
> single gestalt to my eyes.

To my eyes, it's needlessly hard to read, and looks too similar to an
identifier, not a literal. On the other hand, the spaces version is
easy to see as analogous to the same syntax rules that already exist
for strings.

> Note that it's not just a matter of font and familiarity, it's also a
> matter of brains. Your brain may be different from mine, so it may be
> possible that what's better for you isn't better for me. So in such
> a situation a popular vote may be the only way to choose. But for me
> having spaces to split number literals in parts is _worse_ than not
> having any way at all to split them. So I'm strongly opposed to your
> suggestion, and I may not even propose the PEP if a lot of people agree
> with your tastes.

Thanks for making your position clear.

--
\ “The WWW is exciting because Microsoft doesn't own it, and |
`\ therefore, there's a tremendous amount of innovation |
_o__) happening.” —Steve Jobs |
Ben Finney

Fredrik Lundh

Sep 2, 2008, 12:34:53 AM
to pytho...@python.org
Ben Finney wrote:

> I would argue that the precedent, already within Python, for using a
> space to separate pieces of a string literal, is more important than
> precedents from other programming languages.

that precedent also tells us that the whitespace approach is a common
source of errors. taking an approach that's known to be error-prone and
applying it to more cases isn't necessarily a great way to build a
better language.

</F>


Steven D'Aprano

Sep 2, 2008, 2:10:51 AM
On Tue, 02 Sep 2008 11:13:27 +1000, Ben Finney wrote:

> bearoph...@lycos.com writes:
>
>> For Python 2.7/3.1 I'd now like to write a PEP regarding the
>> underscores into the number literals, like: 0b_0101_1111, 268_435_456
>> etc.
>
> +1 on such a capability.
>
> -1 on underscore as the separator.
>
> When you proposed this last year, the counter-proposal was made
> <URL:http://groups.google.com/group/comp.lang.python/msg/18123d100bba63b8?dmode=source>
> to instead use white space for the separator, exactly as one can now do
> with string literals.
>
> I don't see any good reason (other than your familiarity with the D
> language) to use underscores for this purpose, and much more reason
> (readability, consistency, fewer arbitrary differences in syntax,
> perhaps simpler implementation) to use whitespace just as with string
> literals.

At the risk of bike-shedding, I think that allowing arbitrary whitespace
between string literals is fine, because it aids readability to write
this:

do_something(
"first part of the string"
"another part of the string"
"yet more of the string"
"and a bit more"
"and so on..."
)

but I'm not sure that it is desirable to allow this:

do_something(
142325
93.8012
7113
)


-1/2 on arbitrary whitespace, +1/2 on a single space, and +0 on
underscores. If semi-colons didn't already have a use, I'd propose using
them to break up numeric literals:

14;232;593.801;271;13

--
Steven

Steven D'Aprano

Sep 2, 2008, 2:56:24 AM
On Mon, 01 Sep 2008 22:11:13 -0700, Dennis Lee Bieber wrote:

> On Tue, 02 Sep 2008 13:51:16 +1000, Ben Finney
> <bignose+h...@benfinney.id.au> declaimed the following in
> comp.lang.python:


>
>> This is no more the case than for literal strings:
>>
>> a = "spam" "eggs" "ham"
>>
>> a = "spam", "eggs", "ham"
>>

> But... Literal strings still have the " (or ') delimiters around the
> components. Such does not exist for your example with integers.
>
> Consider
>
> a = "spam, eggs", "ham"
> vs
> a = "spam, eggs" "ham"


Quite frankly, I think that it's a stretch to say that leaving out a
tuple delimiter is a problem with whitespace inside numeric literals.
That's hardly unique to whitespace:

atuple = 5,6,7,8
vs
atuple = 5,67,8

Look Ma, no whitespace!


But even if allowing whitespace inside numeric literals did create a new
avenue for errors which never existed before, it is a mistake to only
consider the downside without the upside. In my opinion, that would be
rather like declaring that the syntax for attribute access is a mistake
because you might do this:

x = MyClass()
xy = 4

instead of this:

x = MyClass()
x.y = 4

At some point the programmer has to take responsibility for typos instead
of blaming the syntax of the language. I agree that we should avoid
syntax that *encourages* typos, but I don't believe that allowing
whitespace inside numeric literals does that.

--
Steven

Nick Craig-Wood

Sep 2, 2008, 7:35:41 AM
bearoph...@lycos.com <bearoph...@lycos.com> wrote:
> Ben Finney:
> > I don't see any good reason (other than your familiarity with the D
> > language) to use underscores for this purpose, and much more reason
> > (readability, consistency, fewer arbitrary differences in syntax,
> > perhaps simpler implementation) to use whitespace just as with string
> > literals.
>
> It's not just my familiarity, Ada language too uses underscore for
> that purpose, I think, so there's a precedent, and Ada is a language
> designed to always minimize programming errors, simple code mistakes
> too.

And perl also

*ducks*
--
Nick Craig-Wood <ni...@craig-wood.com> -- http://www.craig-wood.com/nick

Peter Pearson

Sep 2, 2008, 12:02:33 PM
On 02 Sep 2008 06:10:51 GMT, Steven D'Aprano wrote:
> At the risk of bike-shedding,
[snip]

(startled noises) It is a delight to find a reference to
that half-century-old essay (High Finance) by the wonderful
C. Northcote Parkinson, but how many readers will catch the
allusion?

--
To email me, substitute nowhere->spamcop, invalid->net.

Alan G Isaac

Sep 2, 2008, 1:18:58 PM
> On 02 Sep 2008 06:10:51 GMT, Steven D'Aprano wrote:
>> At the risk of bike-shedding,
> [snip]


Peter Pearson wrote:
> (startled noises) It is a delight to find a reference to
> that half-century-old essay (High Finance) by the wonderful
> C. Northcote Parkinson, but how many readers will catch the
> allusion?


It is pretty common geek speak:
http://en.wikipedia.org/wiki/Color_of_the_bikeshed

Cheers,
Alan Isaac

Fredrik Lundh

Sep 2, 2008, 1:25:06 PM
to pytho...@python.org
Peter Pearson wrote:

> (startled noises) It is a delight to find a reference to
> that half-century-old essay (High Finance) by the wonderful
> C. Northcote Parkinson, but how many readers will catch the
> allusion?

anyone that's been involved in open source on the development side for
more than, say, ten minutes.

http://www.bikeshed.com/

</F>

Peter Pearson

Sep 2, 2008, 2:54:19 PM

Ah, the wondrous Wiki.

I thought I was a geek, for the past 40 years; but maybe it's
time for me to be demoted to the dad on whose bookshelf
you'll find that old book.

Christian Heimes

Sep 2, 2008, 4:41:51 PM
to pytho...@python.org
Fredrik Lundh wrote:

> Peter Pearson wrote:
>
>> (startled noises) It is a delight to find a reference to
>> that half-century-old essay (High Finance) by the wonderful
>> C. Northcote Parkinson, but how many readers will catch the
>> allusion?
>
> anyone that's been involved in open source on the development side for
> more than, say, ten minutes.

Indeed! Thus speaks the experienced developer -- effbot :)

On some mailing lists the bikeshed issue comes hand in hand with the
Dunning-Kruger-effect. [1] *sigh*

Christian

[1] http://en.wikipedia.org/wiki/Dunning-Kruger_effect

Ben Finney

Sep 2, 2008, 7:52:07 PM
Peter Pearson <ppea...@nowhere.invalid> writes:

> I thought I was a geek, for the past 40 years; but maybe it's time
> for me to be demoted to the dad on whose bookshelf you'll find that
> old book.

Once a geek, always a geek. You either stay sharp or get sloppy, but
you never stop being a geek :-)

--
\ “The best ad-libs are rehearsed.” —Graham Kennedy |
`\ |
_o__) |
Ben Finney

Patrick Maupin

Sep 2, 2008, 10:31:39 PM

On Sep 2, 6:35 am, Nick Craig-Wood <n...@craig-wood.com> wrote:

> bearophileH...@lycos.com <bearophileH...@lycos.com> wrote:
> >  It's not just my familiarity, Ada language too uses underscore for
> >  that purpose, I think, so there's a precedent, and Ada is a language
> >  designed to always minimize programming errors, simple code mistakes
> >  too.
>
> And perl also

Add Verilog to that list. The ability to embed underscores in numeric
literals, which the parser discards, is sometimes very useful in
hardware description, especially when dealing with binary bit vectors
which can sometimes be 32 bits or more long.
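A sketch of the bit-vector use case in Python 3.6 or later, where underscores are accepted both in literals and by `int()` with an explicit base. The 32-bit value is made up for illustration.

```python
# A made-up 32-bit value written in nibble-sized groups, Verilog style.
STATUS_WORD = 0b1010_0000_1111_0000_0000_0000_0101_0011

# int() accepts the same underscore grouping when parsing text in base 2.
parsed = int("1010_0000_1111_0000_0000_0000_0101_0011", 2)
print(parsed == STATUS_WORD)   # True

# The "_" format option re-inserts an underscore every 4 binary digits.
print(f"{STATUS_WORD:_b}")     # 1010_0000_1111_0000_0000_0000_0101_0011
```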

Underscores are great. I have actually wished for this in Python
myself, for those cases when I am doing binary. Spaces, not so much
-- as others have pointed out, this is error prone, partly because
spaces are "light weight" visually, and partly because the parser does
not currently distinguish between different kinds of whitespace. I
can't count how often I've forgotten a trailing comma on a line of
items.
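The forgotten-comma pitfall Patrick mentions, shown with implicit string concatenation (the list contents are made up): the missing comma silently merges two items instead of raising an error.

```python
# Intended: a list of four items, one per line.
colors = [
    "red",
    "green"   # trailing comma forgotten here...
    "blue",
    "yellow",
]
print(len(colors), colors)  # 3 ['red', 'greenblue', 'yellow']
```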

To the complaints about the underscores getting in the way -- if the
number is short, you don't need either underscores or spaces, and if
the number is long, it's much easier to count underscores to find your
position than it is to count spaces. Also, on long numbers (where
this is most useful), the issue with mistaking a number for an
identifier is much less likely to happen in real life.

I think the issue of location sensitivity has already been flogged
enough, but I will give it one last hit -- long numbers, where this is
most useful, are often encountered in domain-specific mini languages,
where the number of digits in each portion of a number might have some
specific meaning. If the proposal were restricted to "once every 3
digits" or something similar, it would not be worth doing at all.
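An illustration of field-dependent grouping, assuming a hypothetical 16-bit instruction word made of a 4-bit opcode, a 4-bit register number and an 8-bit immediate. The underscores (Python 3.6 or later) follow the field boundaries rather than a fixed digit count, which is the kind of flexibility Patrick argues for.

```python
# Hypothetical layout: 4-bit opcode | 4-bit register | 8-bit immediate.
# Grouping by field, not every 3 or 4 digits, documents the layout in the literal.
WORD = 0b0011_0101_11110000

opcode = (WORD >> 12) & 0b1111
register = (WORD >> 8) & 0b1111
immediate = WORD & 0b1111_1111

print(opcode, register, immediate)  # 3 5 240
```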

+1 on the original proposal.

Pat

Ben Finney

Sep 2, 2008, 11:14:03 PM
Ben Finney <bignose+h...@benfinney.id.au> writes:

> I don't see any good reason (other than your familiarity with the D
> language) to use underscores for this purpose, and much more reason
> (readability, consistency, fewer arbitrary differences in syntax,
> perhaps simpler implementation) to use whitespace just as with string
> literals.

Another reason in support of spaces (rather than underscores) to
separate digit groups: it's the only separator that follows the SI
standard for representing numbers:

… for numbers with many digits the digits may be divided into
groups of three by a thin space, in order to facilitate reading.
Neither dots nor commas are inserted in the spaces between groups
of three.

<URL:http://www.bipm.org/en/si/si_brochure/chapter5/5-3-2.html#5-3-4>

This isn't binding upon Python, of course. However, it should be a
consideration in choosing what separator convention to follow.
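For output, as opposed to literal syntax, Python's format specification already supports `,` and (since 3.6) `_` as grouping characters, and SI-style grouping can be approximated by substituting U+2009 THIN SPACE. A minimal sketch:

```python
THIN_SPACE = "\u2009"  # Unicode THIN SPACE

n = 1234567
with_commas = format(n, ",")                     # '1,234,567'
with_underscores = format(n, "_")                # '1_234_567' (Python 3.6+)
si_style = with_commas.replace(",", THIN_SPACE)  # groups of three, thin-space separated

print(with_commas, with_underscores, si_style)
```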

--
\ “If you ever catch on fire, try to avoid seeing yourself in the |
`\ mirror, because I bet that's what REALLY throws you into a |
_o__) panic.” —Jack Handey |
Ben Finney

Grant Edwards

Sep 3, 2008, 10:14:03 AM

That paper is really very interesting -- it explains a lot of
what one sees in corporate life.

--
Grant Edwards grante Yow! I just remembered
at something about a TOAD!
visi.com

Grant Edwards

Sep 3, 2008, 10:15:05 AM
On 2008-09-03, Ben Finney <bignose+h...@benfinney.id.au> wrote:
> Ben Finney <bignose+h...@benfinney.id.au> writes:
>
>> I don't see any good reason (other than your familiarity with the D
>> language) to use underscores for this purpose, and much more reason
>> (readability, consistency, fewer arbitrary differences in syntax,
>> perhaps simpler implementation) to use whitespace just as with string
>> literals.
>
> Another reason in support of spaces (rather than underscores) to
> separate digit groups: it's the only separator that follows the SI
> standard for representing numbers:
>
> … for numbers with many digits the digits may be divided into
> groups of three by a thin space, in order to facilitate reading.
> Neither dots nor commas are inserted in the spaces between groups
> of three.

But my keyboard doesn't _have_ a thin-space key!

--
Grant Edwards grante Yow! One FISHWICH coming
at up!!
visi.com

Grant Edwards

Sep 3, 2008, 11:01:28 AM
On 2008-09-03, Ben Finney <bignose+h...@benfinney.id.au> wrote:

> Another reason in support of spaces (rather than underscores) to
> separate digit groups: it's the only separator that follows the SI
> standard for representing numbers:
>
> … for numbers with many digits the digits may be divided into
> groups of three by a thin space, in order to facilitate reading.
> Neither dots nor commas are inserted in the spaces between groups
> of three.
>
> <URL:http://www.bipm.org/en/si/si_brochure/chapter5/5-3-2.html#5-3-4>
>
> This isn't binding upon Python, of course. However, it should
> be a consideration in choosing what separator convention to
> follow.

I don't think that standard is applicable. It's a typesetting
style guide. It also references superscripts, half-high
centered dots, the "cross" multiplication symbol, the degree
symbol and tons of other things which, like the thin space,
can't be represented using the most common text encodings.

It's quite explicit that the separator is a thin space, which
one presumes would not be considered "white space" for
tokenizing purposes. We don't have a thin-space, and allowing
spaces within numerical literals would throw a major
monkey-wrench into a lot of things (like data files whose
values are separated by a single space).
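A sketch of the data-file concern, using a made-up space-separated record. In Python 3.6 and later, `int()` rejects embedded spaces but accepts underscore-grouped digits, so splitting on whitespace keeps working. `parses_as_int` is an illustrative helper.

```python
def parses_as_int(text: str) -> bool:
    """Return True if int() accepts `text` in base 10."""
    try:
        int(text)
    except ValueError:
        return False
    return True


# A made-up data record whose fields are separated by single spaces.
record = "1000 2000 3000"
values = [int(field) for field in record.split(" ")]
print(values)                   # [1000, 2000, 3000]

print(parses_as_int("1 000"))   # False: the embedded space is rejected
print(parses_as_int("1_000"))   # True on Python 3.6+
```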

I suppose you could have a different format for literals in
program source and for the operands to int() and float(), but
that feels like a bad idea.

--
Grant Edwards grante Yow! Pardon me, but do you
at know what it means to be
visi.com TRULY ONE with your BOOTH!

bearoph...@lycos.com

Sep 3, 2008, 11:37:07 AM
Ben Finney:

> … for numbers with many digits the digits may be divided into
> groups of three by a thin space, in order to facilitate reading.
> Neither dots nor commas are inserted in the spaces between groups
> of three.
> <URL:http://www.bipm.org/en/si/si_brochure/chapter5/5-3-2.html#5-3-4>
> This isn't binding upon Python, of course. However, it should be a
> consideration in choosing what separator convention to follow.

It confirms what I say :-) A thin space doesn't break the gestalt of
the number, while a normal space, especially if you use a monospaced
font with good readability (and well-spaced characters), breaks the
single gestalt of the number.

Bye,
bearophile

Cliff

Sep 3, 2008, 12:00:55 PM

Also a source of mental complexity. The two proposals (whitespace vs.
underscores) are not just a question of what character to use, it's a
question of whether to create an integer (and possibly other numeric
type) literal that allows delimiters, or to allow separate literals to
be concatenated. In the second case, which of the following would be
proper syntax?

0b1001 0110
0b1001 0b0110

In the first case, the second literal, on its own, is an octal
literal, but we expect it to behave as a binary literal. In the
second case, we have more consistency with string literals (with which
you can do this: "abc" r'''\def''') but we lose the clarity of using
the concatenation to make the whole number more readable.
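Cliff's comparison can be checked against Python 3, where a bare `0110` is no longer an octal literal (octal literals now need the `0o` prefix) and is rejected outright. `is_valid_python` is an illustrative helper built on the `ast` module; the underscore literal at the end assumes Python 3.6 or later.

```python
import ast


def is_valid_python(src: str) -> bool:
    """Return True if `src` parses as Python source."""
    try:
        ast.parse(src)
    except SyntaxError:
        return False
    return True


# Adjacent string literals may mix prefixes; each piece keeps its own rules.
s = "abc" r'''\def'''
print(s, len(s))                         # abc\def 7

# Adjacent numeric literals are never concatenated, whatever their prefixes.
print(is_valid_python("0b1001 0110"))    # False
print(is_valid_python("0b1001 0b0110"))  # False

# One underscore-grouped literal has a single, clear meaning.
print(0b1001_0110)                       # 150
```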

On the other hand, 0b1001_0110 has one clear meaning. It is one
literal that stands alone. I'm not super thrilled about the look (or
keyboard location) of the underscore, but it's better than anything
else that is available, and works within a single numeric literal.
For this reason I am +0 on the underscore and -1 on the space.

Alexander Schmolck

Sep 3, 2008, 8:22:22 PM
Ben Finney <bignose+h...@benfinney.id.au> writes:

> bearoph...@lycos.com writes:
>
>> For Python 2.7/3.1 I'd now like to write a PEP regarding the
>> underscores into the number literals, like: 0b_0101_1111, 268_435_456
>> etc.
>
> +1 on such a capability.
>
> -1 on underscore as the separator.
>
> When you proposed this last year, the counter-proposal was made
> <URL:http://groups.google.com/group/comp.lang.python/msg/18123d100bba63b8?dmode=source>
> to instead use white space for the separator, exactly as one can now
> do with string literals.
>
> I don't see any good reason (other than your familiarity with the D
> language) to use underscores for this purpose, and much more reason
> (readability, consistency, fewer arbitrary differences in syntax,
> perhaps simpler implementation) to use whitespace just as with string
> literals.

It seems to me that the right choice for thousands separator is the
apostrophe. It doesn't suffer from brittleness and editing problems as
whitespace does (e.g. consider filling and auto-line breaking). It is already
used in some locales for this function but never for the decimal point (so no
ambiguity, unlike '.' and ','). It also reads well, unlike the underscore
which is visually obtrusive and ugly (compare 123'456'789 to 123_456_789).

Having said that, I'd still have 123_456_789 over 123456789 any day.

It's amazing that after over half a century of computing we still can't denote
numbers with more than 4 digits readably in the vast majority of contexts.

'as

bearoph...@lycos.com

Sep 3, 2008, 9:27:05 PM
Alexander Schmolck:

> It also reads well, unlike the underscore
> which is visually obtrusive and ugly (compare 123'456'789 to 123_456_789).

I like that enough, in my language that symbol is indeed the standard
one to separate thousands, in large numbers. It's light, looks
natural, and as you say it's visually unobtrusive.

But in my language ' means just thousands, so it's used only in blocks
of 3 digits, not in blocks of any length, so something like this looks
a bit strange/wrong:

0b'0000'0000

While the underscore has no meaning, so it can be used in both
situations.

A problem is that '1234' in Python is a string, so using ' in numbers
looks a bit dangerous to me (and my editor will color those numbers as
alternated strings, I think).
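The highlighting worry can be seen with Python's own `tokenize` module, used here as a rough stand-in for what a syntax highlighter sees. Under current rules `123'456'789` splits into a number, a string and another number.

```python
import io
import tokenize

source = "123'456'789\n"
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    # Skip the bookkeeping tokens at the end of the line and file.
    if tok.type not in (tokenize.NEWLINE, tokenize.ENDMARKER)
]
print(tokens)  # [('NUMBER', '123'), ('STRING', "'456'"), ('NUMBER', '789')]
```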

Note that for other people the ' denotes feet, while in my language it
denotes minutes, while I think the underscore has no meaning.

So for me the underscore is better :-)

Bye,
bearophile

Steven D'Aprano

Sep 3, 2008, 9:30:39 PM
On Thu, 04 Sep 2008 01:22:22 +0100, Alexander Schmolck wrote:

> It seems to me that the right choice for thousands separator is the
> apostrophe.

You mean the character already used as a string delimiter?

--
Steven

Delaney, Timothy (Tim)

Sep 3, 2008, 10:43:36 PM
to pytho...@python.org
Steven D'Aprano wrote:

Hey - I just found a new use for the backtick!

123`456`7890
0b`1001`0110

Note: Guido has stated that the backtick will *not* be given a new
meaning in any future version of Python ...

Tim Delaney

Alexander Schmolck

Sep 4, 2008, 5:37:02 AM

Yup. No ambiguity or problem here; indeed unlike space separation or '_' it
would work straightforwardly as a syntax extension in pretty much any
programming language I can think of, as well as end-user output (I think that
writing e.g. 1'000'000 on a website would be perfectly acceptable; unlike
1_000_000).

'as

Alexander Schmolck

Sep 4, 2008, 5:47:07 AM
bearoph...@lycos.com writes:

> A problem is that '1234' in Python is a string, so using ' in numbers
> looks a bit dangerous to me (and my editor will color those numbers as
> alternated strings, I think).

Yeah, editors, especially those with crummy syntax highlighting (like emacs)
might get it wrong. This should be easy enough to fix though. Indeed unlike
raw and triple-quoted strings which were adopted without major hitches this
new syntax wouldn't have any bearing on what's a valid string.

'as

Fredrik Lundh

Sep 4, 2008, 10:53:53 AM
to pytho...@python.org
Alexander Schmolck wrote:

>> A problem is that '1234' in Python is a string, so using ' in numbers
>> looks a bit dangerous to me (and my editor will color those numbers as
>> alternated strings, I think).
>
> Yeah, editors, especially those with crummy syntax highlighting (like emacs)
> might get it wrong. This should be easy enough to fix though.

instead of forcing all editor developers to change their Python modes to
allow you to use a crude emulation of a typographic convention in your
Python source code, why not ask a few of them to implement the correct
typographic convention (thin spaces) in their Python mode?

</F>

Alan G Isaac

Sep 6, 2008, 7:30:03 PM
> bearoph...@lycos.com writes:
>
>> For Python 2.7/3.1 I'd now like to write a PEP regarding the
>> underscores into the number literals, like: 0b_0101_1111, 268_435_456
>> etc.
>
> +1 on such a capability.
>
> -1 on underscore as the separator.


On 9/1/2008 9:13 PM Ben Finney apparently wrote:
> When you proposed this last year, the counter-proposal was made
> <URL:http://groups.google.com/group/comp.lang.python/msg/18123d100bba63b8?dmode=source>
> to instead use white space for the separator, exactly as one can now
> do with string literals.

Yuck.
Repeating a mistake means two mistakes.

But I would hate less the use of nobreak spaces,
since any decent editor can reveal them.

Alan Isaac

Steven D'Aprano

Sep 7, 2008, 12:04:29 AM
On Sat, 06 Sep 2008 23:30:03 +0000, Alan G Isaac wrote:

>> bearoph...@lycos.com writes:
>>
>>> For Python 2.7/3.1 I'd now like to write a PEP regarding the
>>> underscores into the number literals, like: 0b_0101_1111, 268_435_456
>>> etc.
>>
>> +1 on such a capability.
>>
>> -1 on underscore as the separator.
>
>
> On 9/1/2008 9:13 PM Ben Finney apparently wrote:
>> When you proposed this last year, the counter-proposal was made
>> <URL:http://groups.google.com/group/comp.lang.python/msg/18123d100bba63b8?dmode=source>
>> to instead use white space for the separator, exactly as one can now do
>> with string literals.
>
> Yuck.
> Repeating a mistake means two mistakes.

A lot of us don't think that white space between string literals was a
mistake. A lot of us consider it a desirable feature.


> But I would hate less the use of nobreak spaces, since any decent editor
> can reveal them.

How do you type a nobreak space?

It's also probably a bad idea for Python the language to depend on
developers using "a decent editor", since many people disagree on what a
decent editor is, and many other people don't have access to whatever you
consider "a decent editor".

--
Steven

Tom Harris

Sep 8, 2008, 6:32:29 PM
to pytho...@python.org
On Thu, Sep 4, 2008 at 10:22 AM, Alexander Schmolck
<a.sch...@gmail.com> wrote:
>
> It's amazing that after over half a century of computing we still can't denote
> numbers with more than 4 digits readably in the vast majority of contexts.
>

I agree. So did Forth's early designers. That is why Forth's number
parser considers a word that starts with a number and has embedded
punctuation to be a 32 bit integer, and simply ignores the
punctuation. I haven't used Forth in years, but it seems a neat
solution to the problem of decoding a long string of numbers: let the
user put in whatever they want, the parser ignores it. I usually used
a comma (with no surrounding whitespace of course), but it was your
choice. You could also do this in whatever base you were working in,
so you could punctuate a 32 bit hex number to correspond to the bit
fields inside it. Of course not applicable to Python.
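A minimal Python sketch of the Forth-style rule Tom describes: drop punctuation between digits and parse in the chosen base. `forth_style_int` is hypothetical and only loosely modelled on that description. It does not emulate the fixed 32-bit width, and it relies on `int()` to reject whitespace and invalid digits.

```python
import string


def forth_style_int(text: str, base: int = 10) -> int:
    """Parse an integer, discarding any punctuation between its digits.

    Hypothetical helper; not a description of any particular Forth system.
    """
    if not text or not text[0].isalnum():
        raise ValueError(f"number must start with a digit: {text!r}")
    digits = "".join(ch for ch in text if ch not in string.punctuation)
    return int(digits, base)  # int() still rejects spaces and bad digits


print(forth_style_int("1,000,000"))           # 1000000
print(hex(forth_style_int("DEAD:BEEF", 16)))  # 0xdeadbeef
print(forth_style_int("1010.0011", 2))        # 163
```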

--

Tom Harris <celephicus(AT)gmail(DOT)com>

glen stark

Sep 9, 2008, 3:31:12 AM
On Tue, 09 Sep 2008 08:32:29 +1000, Tom Harris wrote:

> I agree. So did Forth's early designers. That is why Forth's number
> parser considers a word that starts with a number and has embedded
> punctuation to be a 32 bit integer, and simply ignores the punctuation.
> I haven't used Forth in years, but it seems a neat solution to the
> problem of decoding a long string of numbers: let the user put in
> whatever they want, the parser ignores it. I usually used a comma (with
> no surrounding whitespace of course), but it was your choice. You could
> also do this in whatever base you were working in, so you could
> punctuate a 32 bit hex number to correspond to the bit fields inside it.
> Of course not applicable to Python.


That sounds like a great idea, except I'd specify non-period (.)
punctuation, so it would work for floating-point numbers as well.

Is there a language design guru who can say why inputs like 123,456.00
couldn't be handled as above? The only problem I can see is an ambiguity
in argument lists (e.g. mult(2,4)), which could be handled by the
inclusion of whitespace.
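The ambiguity glen raises is already visible in today's Python, where the comma has a meaning in both places; `mult` is his example name, defined here only for illustration.

```python
def mult(a, b):
    """Illustrative two-argument function from the example above."""
    return a * b


x = 123,456.00    # in today's Python this is a 2-tuple, not one number
print(x)          # (123, 456.0)
print(mult(2,4))  # 8: here the comma separates two arguments
```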
