[Python-ideas] Improve readability of long numeric literals


Manuel Cerón

Feb 9, 2016, 4:41:08 PM
to Python-Ideas
Hi everyone!

Sometimes it's hard to read long numbers. For example:

>>> opts.write_buffer_size = 67108864

Some languages (Ruby, Perl, Swift) allow the use of underscores in numeric literals, which are ignored. They are typically used as thousands separators. The example above would look like this:

>>> opts.write_buffer_size = 67_108_864

Which helps to quickly identify that this is around 67 million.
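
A minimal sketch of the underscore form, assuming an interpreter that accepts underscores in numeric literals (CPython 3.6 and later do; when this thread was written the syntax was only a proposal). The names `plain` and `grouped` are illustrative only:

```python
# Assumes Python 3.6+, where underscores in numeric literals are accepted.
plain = 67108864
grouped = 67_108_864

# The underscores are dropped by the compiler: both names hold the same int.
print(plain == grouped)     # True
print(grouped == 2 ** 26)   # True
```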

Another option is to use spaces instead of underscores:

>>> opts.write_buffer_size = 67 108 864

This has two advantages: 1. it is analogous to the way string literals work, which are concatenated if put next to each other. 2. spaces are already used as thousands separators in many European languages [1].

The disadvantage is that, as far as I know, no other languages do this.

I have seen some old discussions around this, but nothing on this list or a PEP. With Python being used more and more for scientific and numeric computation, this is a small change that will help with readability a lot. And, as far as I can tell, it doesn't break compatibility in any way.

Thoughts?

Manuel.

Ian Kelly

Feb 9, 2016, 5:39:15 PM
to python...@python.org
On Tue, Feb 9, 2016 at 2:40 PM, Manuel Cerón <cero...@gmail.com> wrote:
> Another option is to use spaces instead of underscores:
>
>>>> opts.write_buffer_size = 67 108 864
>
> This has two advantages: 1. is analog to the way string literals work, which
> are concatenated if put next to each other. 2. spaces are already used as
> thousands separator in many european languages [1].
>
> The disadvantage is that, as far as I known, no other languages do this.

Another disadvantage to using spaces is that it could mask an
inadvertently omitted operator that previously would have been a
SyntaxError.
_______________________________________________
Python-ideas mailing list
Python...@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Ethan Furman

Feb 9, 2016, 6:02:34 PM
to python...@python.org
On 02/09/2016 01:40 PM, Manuel Cerón wrote:

> Sometimes it's hard to read long numbers. For example:
>
> >>> opts.write_buffer_size = 67108864
>
> Some languages (Ruby, Perl, Swift) allow the use of underscores in
> numeric literals, which are ignored. They are typically used as
> thousands separators. The example above would look like this:
>
> >>> opts.write_buffer_size = 67_108_864

As I recall, a number of years ago we had this discussion and Guido
approved the idea. The only email I could locate at the moment, though,
shows his support of the idea, but not outright approval. [1]

I dare say if somebody submitted a patch it would fare well. (As in: be
accepted, not gone forever.)

--
~Ethan~

[1] https://mail.python.org/pipermail/python-ideas/2011-May/010157.html

Guido van Rossum

Feb 9, 2016, 6:18:44 PM
to Ethan Furman, Python-Ideas
Indeed, "123 456" is a no-no, but "123_456" sounds good. (Not sure
about "12_34_56" but there are probably use cases for that too.)

--
--Guido van Rossum (python.org/~guido)

Oscar Benjamin

Feb 9, 2016, 6:28:10 PM
to python-ideas

On 9 Feb 2016 23:18, "Guido van Rossum" <gu...@python.org> wrote:
>
> Indeed, "123 456" is a no-no, but "123_456" sound good. (Not sure
> about "12_34_56" but there are probably use cases for that too.)
>

It would be useful for hex literals. There are other more confusing possibilities such as 1_._0_e_-_1_0.

--
Oscar

Rob Cliffe

Feb 9, 2016, 6:49:14 PM
to python...@python.org


On 09/02/2016 23:17, Guido van Rossum wrote:
> Indeed, "123 456" is a no-no, but "123_456" sound good. (Not sure
> about "12_34_56" but there are probably use cases for that too.)
Looks to me like a bank sort code. Probably not Guido's, to judge by
his comment.

Guido van Rossum

Feb 9, 2016, 6:59:38 PM
to Rob Cliffe, Python-Ideas
I don't know what a bank sort code is (maybe a UK thing?)

FWIW there are some edge cases to be decided: is _123 valid? or 123_?
or 123__456?
--
--Guido van Rossum (python.org/~guido)

Ben Finney

Feb 9, 2016, 7:07:10 PM
to python...@python.org
Ian Kelly <ian.g...@gmail.com> writes:

> On Tue, Feb 9, 2016 at 2:40 PM, Manuel Cerón <cero...@gmail.com> wrote:
> > Another option is to use spaces instead of underscores:
> >
> >>>> opts.write_buffer_size = 67 108 864
> >

> > […]


> > The disadvantage is that, as far as I known, no other languages do
> > this.
>
> Another disadvantage to using spaces is that it could mask an
> inadvertently omitted operator that previously would have been a
> SyntaxError.

The exact same fact – that a proposed new syntax was previously a syntax
error – is commonly presented as a positive. We know there are no valid
Python programs already using the construct to mean something else.

So I don't think it's reasonable to present that now as though it were
negative.

(good sigmonster, have a cookie)

--
\ “… correct code is great, code that crashes could use |
`\ improvement, but incorrect code that doesn’t crash is a |
_o__) horrible nightmare.” —Chris Smith, 2008-08-22 |
Ben Finney

Guido van Rossum

Feb 9, 2016, 7:17:27 PM
to Ben Finney, Python-Ideas
On Tue, Feb 9, 2016 at 4:06 PM, Ben Finney <ben+p...@benfinney.id.au> wrote:
> Ian Kelly <ian.g...@gmail.com> writes:
>
>> On Tue, Feb 9, 2016 at 2:40 PM, Manuel Cerón <cero...@gmail.com> wrote:
>> > Another option is to use spaces instead of underscores:
>> >
>> >>>> opts.write_buffer_size = 67 108 864
>> >
>> > […]
>> > The disadvantage is that, as far as I known, no other languages do
>> > this.
>>
>> Another disadvantage to using spaces is that it could mask an
>> inadvertently omitted operator that previously would have been a
>> SyntaxError.
>
> The exact same fact – that a proposed new syntax was previously a syntax
> error – is commonly presented as a positive. We know there are no valid
> Python programs already using the construct to mean something else.
>
> So I don't think it's reasonable to present that now as though it were
> negative.

I think you misunderstand. The argument (which I agree with) is that
the syntax error was considered beneficial, since it would catch
common typos -- so that we're loath to make that valid code.

--
--Guido van Rossum (python.org/~guido)

Oscar Benjamin

Feb 9, 2016, 7:18:02 PM
to Guido van Rossum, Python-Ideas
On 9 February 2016 at 23:51, Guido van Rossum <gu...@python.org> wrote:
> I don't know what a bank sort code is (maybe a UK thing?)

It is a UK thing. It identifies the bank you opened your account with.

> FWIW there are some edge cases to be decided: is _123 valid? or 123_?
> or 123__456?

_123 is currently a valid identifier:

>>> _123 = 1
>>> _123
1

123_ is not. There's no good reason to allow either though. If the
purpose is to separate the digits for clarity then the underscore
doesn't need to be at the beginning or the end.
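
The identifier point can be checked without touching the grammar. A small sketch using `str.isidentifier()` from the standard library (the variable names are illustrative only):

```python
# str.isidentifier() reports whether a string is a valid Python identifier.
leading = "_123".isidentifier()
trailing = "123_".isidentifier()

print(leading)   # True: an underscore may start a name
print(trailing)  # False: a name cannot start with a digit
```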

--
Oscar

Manuel Cerón

Feb 9, 2016, 7:23:22 PM
to Guido van Rossum, Python-Ideas
On Wed, Feb 10, 2016 at 12:51 AM, Guido van Rossum <gu...@python.org> wrote:
I don't know what a bank sort code is (maybe a UK thing?)

FWIW there are some edge cases to be decided: is _123 valid? or 123_?
or 123__456?

_123 is a valid identifier name, so no. For consistency, I think the leading underscore should be out too.

Multiple underscores in the middle might be useful for separating millions and thousands:

700__000_000

but perhaps it's too much.

MRAB

Feb 9, 2016, 7:46:32 PM
to python...@python.org
On 2016-02-09 23:27, Oscar Benjamin wrote:
> On 9 Feb 2016 23:18, "Guido van Rossum" <gu...@python.org
> <mailto:gu...@python.org>> wrote:
> >
> > Indeed, "123 456" is a no-no, but "123_456" sound good. (Not sure
> > about "12_34_56" but there are probably use cases for that too.)
> >
>
> It would be useful for hex literals. There are other more confusing
> possibilities such as 1_._0_e_-_1_0.
>
The Ada programming language allows underscores in numerals, but
requires there to be a digit on both sides of the underscore.

Ethan Furman

Feb 9, 2016, 7:55:01 PM
to python...@python.org
On 02/09/2016 04:06 PM, Ben Finney wrote:
> Ian Kelly writes:

>> Another disadvantage to using spaces is that it could mask an
>> inadvertently omitted operator that previously would have been a
>> SyntaxError.
>
> The exact same fact – that a proposed new syntax was previously a
> syntax error – is commonly presented as a positive. We know there are
> no valid Python programs already using the construct to mean
> something else.
>
> So I don't think it's reasonable to present that now as though it were
> negative.

If the SyntaxError is also a common mistake, then suddenly having it be
working, but wrong, code is a bad thing.


> (good sigmonster, have a cookie)

“… correct code is great, code that crashes could use |
improvement, but incorrect code that doesn’t crash is a |

horrible nightmare.” —Chris Smith, 2008-08-22 |

Yes, good sigmonster - "incorrect code that doesn't crash is a horrible
nightmare" -- which is what could happen when a SyntaxError suddenly
becomes syntactically correct.

--
~Ethan~

Steven D'Aprano

Feb 9, 2016, 7:55:16 PM
to python...@python.org
On Wed, Feb 10, 2016 at 12:16:43AM +0000, Oscar Benjamin wrote:

> _123 is currently a valid identifier:
>
> >>> _123 = 1
> >>> _123
> 1
>
> 123_ is not. There's no good reason to allow either though. If the
> purpose is to separate the digits for clarity then the underscore
> doesn't need to be at the beginning or the end.

Agreed.

Disallow leading and trailing underscores, otherwise allow and ignore
any number of underscores in integer literals so that all of these are
legal:

123_456_789
0x1234_ABCD
0b1111_0000_1010_0101
0o12_34

For avoidance of doubt, there must be at least one digit before the
first underscore. These are not allowed:

-_123_456
+_123_456

(actually, they are allowed, since they're legal identifiers).
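
That parenthetical can be confirmed with the standard `ast` module. A rough sketch that parses the text without evaluating it (the variable names are illustrative only):

```python
import ast

# Parse the expression without evaluating it.
tree = ast.parse("-_123_456", mode="eval")
node = tree.body

# The parser sees unary minus applied to a name, not a numeric literal.
is_unary_minus = isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub)
operand_is_name = isinstance(node.operand, ast.Name)

print(is_unary_minus, operand_is_name)  # True True
print(node.operand.id)                  # _123_456
```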

Consecutive underscores will be allowed:

1234____5678

but the docs (PEP 8?) should say "don't do that". Likewise for excessive
underscores:

1_2_3_4_5_6_7_8_9_0

These sorts of abuses are a style issue, not a syntax issue.

Floats are more complex:

123_456.000_001e-23

looks okay to me, but what about this?

123_456_._000_001_e_-_23


I think that's ugly. Should we restrict underscores to being only
between digits? Or just call that a style issue too?



--
Steve

Ethan Furman

Feb 9, 2016, 8:08:22 PM
to Python-Ideas
To excerpt the email I referred to earlier:

Guido said:
----------
> Fine points about _ in floats: IMO the _ should be allowed to appear
> between any two digits, or between the last digit and the 'e' in the
> exponent, or between the 'e' and a following digit. But not adjacent
> to the '.' or to the '+' or '-' in the exponent. So 3.141_593 yes,
> 3_.14 no.
>
> Fine points about _ in bin/oct/hex literals: 0x_dead_beef yes, 0_xdeadbeef no.
>
> (The overall rule seems to be that it must be internal to alphanumeric
> strings, except that leading 0x, 0o or 0b must not be separated --
> somehow I find 0_x_dead_beef would be a disservice to human readers.)

and in talking about int() accepting underscored inputs:

> It seems entirely harmless here. Also for float().

--
~Ethan~

Andrew Barnert via Python-ideas

Feb 9, 2016, 8:36:31 PM
to Manuel Cerón, Python-Ideas
One possible objection that nobody's raised:

Separating groups of three is all well and good; to my western eyes, 10_000_000_000 is obviously 10 billion.*

But someone from China is likely to use groups of four, and 100_0000_0000 is not obviously anything--my first thought is around 100 billion, but that can't be right, so I have to count up the digits.

I still think this is a good suggestion, because 100000000000 is even more useless to me than 100_0000_0000, and far more likely to be concealing a typo. I just wanted to make sure everyone knew the issue.


* If you're going to say "no, it's a milliard, you stupid American", go back to the early 70s, and bring your non-decimal currency with you. It's billion in English, and has been for 40+ years. Leave the fighting to the languages where it's ambiguous, like Portuguese or Finnish.


Oscar Benjamin

Feb 9, 2016, 8:37:07 PM
to Ethan Furman, Python-Ideas
On 10 February 2016 at 01:07, Ethan Furman <et...@stoneleaf.us> wrote:
>
> and in talking about int() accepting underscored inputs:
>
>> It seems entirely harmless here. Also for float().

I don't agree with either of those. Syntax accepted by int() is less
permissive than for int literals (e.g. int('0x1')) which is good
because int is often used to process data from external sources. In
this vein I'm not sure how I feel about int accepting non-ascii
characters - perhaps there should be a separate int.from_ascii
function for this purpose but that's a different subject.
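
A short sketch of the difference between literal syntax and what `int()` accepts, using only built-in behaviour (the variable names are illustrative only):

```python
# The literal 0x1 is valid source code...
literal_value = 0x1

# ...but int() with its default base of 10 rejects the same text.
try:
    int("0x1")
    rejected = False
except ValueError:
    rejected = True

# Passing base 0 asks int() to honour the prefix, as the compiler would.
parsed_with_base_0 = int("0x1", 0)

print(literal_value, rejected, parsed_with_base_0)  # 1 True 1
```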

Having float() accept underscored inputs violates IEEE754. That
doesn't mean it's impossible but why bother?

--
Oscar

Chris Angelico

Feb 9, 2016, 8:39:36 PM
to Python-Ideas
On Wed, Feb 10, 2016 at 12:35 PM, Oscar Benjamin
<oscar.j....@gmail.com> wrote:
> On 10 February 2016 at 01:07, Ethan Furman <et...@stoneleaf.us> wrote:
>>
>> and in talking about int() accepting underscored inputs:
>>
>>> It seems entirely harmless here. Also for float().
>
> I don't agree with either of those. Syntax accepted by int() is less
> permissive than for int literals (e.g. int('0x1')) which is good
> because int is often used to process data form external sources.

+1. Keep int() as it is, so we don't get weird stuff happening and
causing confusion.

But I'm +1 on allowing underscores between digits (not straight after
a decimal point in a float, though, as it looks like attribute
access).

ChrisA

Andrew Barnert via Python-ideas

Feb 9, 2016, 8:46:00 PM
to MRAB, python...@python.org
On Feb 9, 2016, at 16:45, MRAB <pyt...@mrabarnett.plus.com> wrote:
>
> The Ada programming language allows underscores in numerals, but requires there to be a digit on both sides of the underscore.

I think Swift, Ruby, and most other languages allow runs of multiple underscores, and even trailing underscores.

It seems like it's a lot easier to lex "digit (digit-or-underscore)*", "0x (hex-digit-or-underscore)+", etc. than to try to add restrictions. And not just for Python itself, but for anyone who wants to write a Python tokenizer or parser. And it's a shorter rule to document, and easier to remember. So, unless there's a really compelling reason for an extra restriction, I think it's better to leave the restrictions out (and make them style issues).
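
As a rough illustration of how small the permissive rule is, here is a sketch for decimal integers only. The pattern and the helper name `is_permissive_decimal` are hypothetical and are not taken from any real tokenizer:

```python
import re

# Hypothetical pattern for "digit (digit-or-underscore)*", ASCII digits only.
PERMISSIVE_DECIMAL = re.compile(r"[0-9][0-9_]*")

def is_permissive_decimal(text):
    """Return True if text is one digit followed by digits or underscores."""
    return PERMISSIVE_DECIMAL.fullmatch(text) is not None

for sample in ["67_108_864", "1__000", "123_", "_123"]:
    print(sample, is_permissive_decimal(sample))
```

Under this rule only the leading underscore is rejected; doubled and trailing underscores pass and would be left to style guides.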

Joshua Landau

Feb 9, 2016, 9:20:04 PM
to python-ideas
On 10 February 2016 at 00:45, MRAB <pyt...@mrabarnett.plus.com> wrote:
> The Ada programming language allows underscores in numerals, but requires
> there to be a digit on both sides of the underscore.

+1 to this.

Nobody's given an important use-case for any of the odd cases (doubled
or trailing underscores, or those oddly placed in floats) but there
are legitimate reasons to want groups of different sizes, especially
with non-decimal bases. Saying a digit is needed either side is both
obvious and sufficient.

Chris Angelico

Feb 9, 2016, 9:46:43 PM
to python-ideas
On Wed, Feb 10, 2016 at 1:18 PM, Joshua Landau <jos...@landau.ws> wrote:
> On 10 February 2016 at 00:45, MRAB <pyt...@mrabarnett.plus.com> wrote:
>> The Ada programming language allows underscores in numerals, but requires
>> there to be a digit on both sides of the underscore.
>
> +1 to this.
>
> Nobody's given an important use-case for any of the odd cases (doubled
> or trailing underscores, or those oddly placed in floats) but there
> are legitimate reasons to want groups of different sizes, especially
> with non-decimal bases. Saying a digit is needed either side is both
> obvious and sufficient.

I'm not sure how that would get encoded into the grammar, but
certainly that's the advice I would recommend for style guides.

ChrisA

Joshua Landau

Feb 9, 2016, 10:34:18 PM
to python-ideas
On 10 February 2016 at 02:46, Chris Angelico <ros...@gmail.com> wrote:
> On Wed, Feb 10, 2016 at 1:18 PM, Joshua Landau <jos...@landau.ws> wrote:
>> On 10 February 2016 at 00:45, MRAB <pyt...@mrabarnett.plus.com> wrote:
>>> The Ada programming language allows underscores in numerals, but requires
>>> there to be a digit on both sides of the underscore.
>>
>> +1 to this.
>
> I'm not sure how that would get encoded into the grammar, but
> certainly that's the advice I would recommend for style guides.

You do the BNF equivalent of turning \d\d* to \d(\d|_\d)*. It should
be really simple.
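
A sketch of that rule as a Python regular expression, for decimal integers only. It uses `[0-9]` instead of `\d` so that only ASCII digits match; the helper name `has_digit_on_both_sides` is hypothetical:

```python
import re

# \d(\d|_\d)* written with ASCII digits: every underscore must be followed
# by a digit, and the literal must start with a digit.
STRICT_DECIMAL = re.compile(r"[0-9](?:[0-9]|_[0-9])*")

def has_digit_on_both_sides(text):
    """Return True if every underscore in text sits between two digits."""
    return STRICT_DECIMAL.fullmatch(text) is not None

for sample in ["23", "1_000", "1__000", "123_", "_123"]:
    print(sample, has_digit_on_both_sides(sample))
```

Here "23" and "1_000" are accepted, while the doubled, trailing, and leading underscore forms are rejected.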

Stephen J. Turnbull

Feb 9, 2016, 11:12:11 PM
to Andrew Barnert, Python-Ideas
Andrew Barnert via Python-ideas writes:

> One possible objection that nobody's raised:
>
> Separating groups of three is all well and good; to my western
> eyes, 10_000_000_000 is obviously 10 billion.*
>
> But someone from China is likely to use groups of four, and
> 100_0000_0000 is not obviously anything--my first thought is around
> 100 billion, but that can't be right, so I have to count up the
> digits.

Not a problem for me any more, that's quite obviously "100-oku" (and
if the unit is "yen", that roughly converts to USD100 million, very
convenient). I suspect others will learn equally quickly if this
becomes at all common and they need to read "Chinese" or "Japanese"
code where it's used.

Anyway, for me (YMMV) this really is a for-the-writer readability
problem. Personally I can't imagine using it except interactively.
If I think a number needs checking, I make it a named value, and often
computed. (Eg, the OP's example would be 1 << 26).
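
A minimal sketch of the named, computed alternative. The constant name `WRITE_BUFFER_SIZE` is hypothetical:

```python
# Named, computed constant instead of a long literal.
WRITE_BUFFER_SIZE = 1 << 26  # 2**26

print(WRITE_BUFFER_SIZE)                      # 67108864
print(WRITE_BUFFER_SIZE == 64 * 1024 * 1024)  # True
```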

Dan Sommers

Feb 9, 2016, 11:30:44 PM
to python...@python.org
On Wed, 10 Feb 2016 11:50:41 +1100, Steven D'Aprano wrote:

> Floats are more complex:
>
> 123_456.000_001e-23

Floats are less complex. Complexes are more complex:

22j

123_456.000_001e-23j

:-)

> looks okay to me, but what about this?
>
> 123_456_._000_001_e_-_23

Agreed: that is no longer more readable.

All I can think of for a use case for leading underscores would be to
line up values of different lengths:

x4 = __4
x5 = _33
x6 = __4
x7 = 220

milli = 1e__-3
micro = 1e__-6
nano = 1e__-9
pico = 1e_-12
femto = 1e_-15

But PEP 8 already suggests otherwise:

milli = 1e-3
micro = 1e-6
nano = 1e-9
pico = 1e-12
femto = 1e-15

Guido van Rossum

Feb 9, 2016, 11:53:59 PM
to Dan Sommers, python...@python.org
Let me show you how silly this looked to me... 
--
--Guido (mobile)

Victor Stinner

Feb 10, 2016, 3:53:56 AM
to Manuel Cerón, Python-Ideas
2016-02-09 22:40 GMT+01:00 Manuel Cerón <cero...@gmail.com>:
> Hi everyone!
>
> Sometimes it's hard to read long numbers. For example:
>
>>>> opts.write_buffer_size = 67108864
>
> Some languages (Ruby, Perl, Swift) allow the use of underscores in numeric
> literals, which are ignored.

Yeah, I saw this (in Perl) and I think that it's a good idea.


> Another option is to use spaces instead of underscores:
>
>>>> opts.write_buffer_size = 67 108 864

It sounds error-prone to me. It's common that I forget a comma in a tuple like:

x = (1,
     2
     3)

I expect a SyntaxError, not x = (1, 23).

It can also occur on a single line: x = (1, 2 3)
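
A small sketch of the behaviour described above, using the built-in `compile()`. It also shows the string-literal case, where a forgotten comma is already silently accepted (the variable names are illustrative only):

```python
# Today a missing comma between numbers is caught at compile time.
source = "x = (1, 2 3)"
try:
    compile(source, "<example>", "exec")
    raised = False
except SyntaxError:
    raised = True
print(raised)  # True

# Adjacent string literals are concatenated, so the same slip goes unnoticed.
names = ("a", "b" "c")
print(names)       # ('a', 'bc')
print(len(names))  # 2
```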

> I have seen some old discussions around this, but nothing on this list or a
> PEP. With Python being use more and more for scientific and numeric
> computation, this is a small change that will help with readability a lot.
> And, as far as I can tell, it doesn't break compatibility in any way.

I'm not sure that a PEP is required. We just have to clarify where
underscores are allowed exactly. See the discussion above: there are
corner cases on float and complex numbers.

Victor

Paul Moore

Feb 10, 2016, 4:44:27 AM
to Joshua Landau, python-ideas
On 10 February 2016 at 03:32, Joshua Landau <jos...@landau.ws> wrote:
> On 10 February 2016 at 02:46, Chris Angelico <ros...@gmail.com> wrote:
>> On Wed, Feb 10, 2016 at 1:18 PM, Joshua Landau <jos...@landau.ws> wrote:
>>> On 10 February 2016 at 00:45, MRAB <pyt...@mrabarnett.plus.com> wrote:
>>>> The Ada programming language allows underscores in numerals, but requires
>>>> there to be a digit on both sides of the underscore.
>>>
>>> +1 to this.
>>
>> I'm not sure how that would get encoded into the grammar, but
>> certainly that's the advice I would recommend for style guides.
>
> You do the BNF equivalent of turning \d\d* to \d(\d|_\d)*. It should
> be really simple.

It's possible to get it right, but I think keeping the grammar simple
and making the rest a style issue is the best approach. We don't
disallow 0x6AfEbbC for example, but mixing case like that is ugly to
read too.

(I was originally going to say "Under that change, "23" becomes
invalid" but then I realised I'd misread the grammar. Which sort of
makes my point that we want to keep the rules simple :-))

Paul

Guido van Rossum

Feb 10, 2016, 5:53:57 AM
to Dan Sommers, python...@python.org
Let me show you how silly this looked to me... 

On Tuesday, February 9, 2016, Dan Sommers <d...@tombstonezero.net> wrote:
--
--Guido (mobile)
IMG_0253.jpg

Alexander Heger

Feb 10, 2016, 6:37:56 AM
to Manuel Cerón, Python-Ideas
>>>> opts.write_buffer_size = 67108864
>
> The disadvantage is that, as far as I known, no other languages do this.

This is not true. It is absolutely legal in FORTRAN

program f
print*, 123 456
end

will just print the number 123456. Hence for me as a FORTRAN programmer
this would seem a natural thing to do. Better than the underscores ... I
would associate them with the LaTeX maths mode index operator, and then
I would read 100_002 as the number 4.

-Alexander

Greg Ewing

Feb 10, 2016, 6:57:57 AM
to Python-Ideas
Guido van Rossum wrote:
> (Not sure
> about "12_34_56" but there are probably use cases for that too.)

I think the Chinese group by 10,000s rather than 1000s,
so they might want to write 1234_5678.

--
Greg

Nick Coghlan

Feb 10, 2016, 7:39:29 AM
to Greg Ewing, Python-Ideas
On 10 February 2016 at 21:57, Greg Ewing <greg....@canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
>>
>> (Not sure
>> about "12_34_56" but there are probably use cases for that too.)
>
> I think the Chinese group by 10,000s rather than 1000s,
> so they might want to write 1234_5678.

As others have suggested, I like the idea of keeping the grammar
simple (i.e. numeric literals must start with a base appropriate
digit, but may subsequently contain digits or underscores). I'd even
apply that to float literals, with the avoidance of putting an
underscore just before the floating point being a style issue, rather
than a syntactic one.

What kind of numeric grouping to use is also a style question - if
it's an English-language project or an international project using
metric values, then it would make sense to group by thousands. If it's
a project written assuming maintainers can follow Chinese or Japanese,
then it would make sense to group according to the conventions of
those language communities, just as folks may already decide to do
with variable names and comments.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Serhiy Storchaka

Feb 10, 2016, 7:59:38 AM
to python...@python.org
On 10.02.16 13:37, Alexander Heger wrote:
> will just print the number 123456. Hence for me as a FORTRAN this
> would seem a natural thing to do. Better than the underscores ... I
> would associate the with the LaTeX maths mode index operator, and then
> I would read 100_002 as the number 4.

No, 100_{002} is the number 4.

Serhiy Storchaka

Feb 10, 2016, 8:10:05 AM
to python...@python.org
On 09.02.16 23:40, Manuel Cerón wrote:
> Sometimes it's hard to read long numbers. For example:
>
> >>> opts.write_buffer_size = 67108864
>
> Some languages (Ruby, Perl, Swift) allow the use of underscores in
> numeric literals, which are ignored. They are typically used as
> thousands separators.

Some languages allow the use of ' (an apostrophe) as thousands separators:

67'108'864

But I prefer underscores as the more common variant.

Paul Moore

Feb 10, 2016, 8:11:13 AM
to Alexander Heger, Python-Ideas
On 10 February 2016 at 11:37, Alexander Heger <pyt...@2sn.net> wrote:
>>>>> opts.write_buffer_size = 67108864
>>
>> The disadvantage is that, as far as I known, no other languages do this.
>
> This is not true. It is absolutely legal in FORTRAN
>
> program f
> print*, 123 456
> end
>
> will just print the number 123456. Hence for me as a FORTRAN this
> would seem a natural thing to do. Better than the underscores ... I
> would associate the with the LaTeX maths mode index operator, and then
> I would read 100_002 as the number 4.

But to be fair, in older fortrans at least (I'd like to hope it's got
more sane these days) wasn't

pro gr a m f
p r i n t*,1 2 3 4 5 6
e nd

just as valid? (IIRC, whitespace was ignored everywhere, even within
keywords...)

Paul

Alexander Heger

Feb 10, 2016, 8:21:40 AM
to Serhiy Storchaka, python-ideas

>> will just print the number 123456.  Hence for me as a FORTRAN this
>> would seem a natural thing to do.  Better than the underscores ... I
>> would associate the with the LaTeX maths mode index operator, and then
>> I would read 100_002 as the number 4.
>
> No, 100_{002} is the number 4.

Well, as you know, even more strictly speaking, in a LaTeX text you'd have to write $100_{002}$. But in more colloquial use of LaTeX syntax, e.g., in plain text abstracts, the braces are often dropped for the sake of readability. Besides, base 0 or base 1 make little sense.

-Alexander



Stephen J. Turnbull

Feb 10, 2016, 9:51:28 AM
to Alexander Heger, Python-Ideas
Alexander Heger writes:

> This is not true. It is absolutely legal in FORTRAN
>
> program f
> print*, 123 456
> end

In traditional FORTRAN, so is this:

print*,123456

and this

p r i n t * , 1 2 3 4 5 6

and all three print statements have exactly the same meaning, because
in traditional FORTRAN spaces are completely insignificant. That's
kinda un-Pythonic.<wink/>

Sven R. Kunze

Feb 10, 2016, 12:35:49 PM
to python...@python.org
That's one reason I am -1 on this proposal.

Random832

Feb 10, 2016, 12:42:39 PM
to python...@python.org
On Tue, Feb 9, 2016, at 18:17, Guido van Rossum wrote:
> Indeed, "123 456" is a no-no, but "123_456" sound good. (Not sure
> about "12_34_56" but there are probably use cases for that too.)

For one, not all languages use the same divisions. In many Indian
languages, the natural divisions would be like 1_23_45_678, whereas in
Chinese and Japanese some people may want to use 1_2345_6789, though the
western system is also common.

Also, dare I suggest, 0x_0123_4567_89AB_CDEF? Arbitrary grouping may be
useful in binary constants representing bit fields.

For that matter, for an integer representing a fixed-point fractional
quantity, one may want to use 1_234_567__0000, where the last separator
represents the decimal place. Mathematical publications sometimes group
digits after a decimal place into groups of five, e.g.
Decimal("3.14159_26535_89793_84626") or combine these ideas for
Fraction(3__14159_26535_89793_84626, 10**20)

I don't know that there's any reason to build restrictions into the
language, any more than to require certain integer constants to be in
decimal and others to be in hexadecimal.

Random832

Feb 10, 2016, 12:47:42 PM
to python...@python.org
On Wed, Feb 10, 2016, at 12:42, Random832 wrote:
> 3__14159_26535_89793_84626

...And before anyone else points it out, I copied the fourth digit group
from the wrong position. (I happen to have the first 17 digits
memorized, so I typed the first three groups, and then mistakenly copied
from after the last one I had memorized rather than the last one I'd
typed).

Georg Brandl

Feb 10, 2016, 12:51:34 PM
to python...@python.org
On 02/09/2016 10:40 PM, Manuel Cerón wrote:
> Hi everyone!
>
> Sometimes it's hard to read long numbers. For example:
>
>>>> opts.write_buffer_size = 67108864
>
> Some languages (Ruby, Perl, Swift) allow the use of underscores in numeric
> literals, which are ignored. They are typically used as thousands separators.
> The example above would look like this:
>
>>>> opts.write_buffer_size = 67_108_864
>
> Which helps to quickly identify that this is around 67 million.

> Thoughts?

I like it, and for everybody to try it out, I posted a draft patch here:

http://bugs.python.org/issue26331

Underscores are allowed anywhere in numeric literals, except:

* at the beginning of a literal (obviously)
* at the end of a literal
* directly after a dot (since the underscore could start an attribute name)
* directly after a sign in exponents (for consistency with leading signs)
* in the middle of the "0x", "0o" or "0b" base specifiers

Reviewers welcome!

cheers,
Georg

Random832

Feb 10, 2016, 1:00:47 PM
to python...@python.org
On Wed, Feb 10, 2016, at 12:51, Georg Brandl wrote:
> Underscores are allowed anywhere in numeric literals, except:
>
> * at the beginning of a literal (obviously)
> * at the end of a literal
> * directly after a dot (since the underscore could start an attribute
> name)

I don't think it's particularly important to support this case, but the
sequence digit/dot/name with no spaces between is a syntax error now,
because the digit/dot is interpreted as a floating point constant.

> * directly after a sign in exponents (for consistency with leading signs)
> * in the middle of the "0x", "0o" or "0b" base specifiers

Do you allow multiple underscores in a row? I mentioned a couple
possible use cases for that.

Georg Brandl

Feb 10, 2016, 1:11:29 PM
to python...@python.org
On 02/10/2016 07:00 PM, Random832 wrote:
> On Wed, Feb 10, 2016, at 12:51, Georg Brandl wrote:
>> Underscores are allowed anywhere in numeric literals, except:
>>
>> * at the beginning of a literal (obviously)
>> * at the end of a literal
>> * directly after a dot (since the underscore could start an attribute
>> name)
>
> I don't think it's particularly important to support this case, but the
> sequence digit/dot/name with no spaces between is a syntax error now,
> because the digit/dot is interpreted as a floating point constant.

Don't forget that float literals can also start with just the dot. Therefore
this case can get quite ambiguous.

expr ._2 is an attribute access
expr 0._2 is a syntax error

>> * directly after a sign in exponents (for consistency with leading signs)
>> * in the middle of the "0x", "0o" or "0b" base specifiers
>
> Do you allow multiple underscores in a row? I mentioned a couple
> possible use cases for that.

Yes, there is no restriction.

Georg

Greg Ewing

Feb 10, 2016, 3:26:33 PM
to python-ideas
Alexander Heger wrote:
> Well, as you know, even more strictly speaking, in a LaTeX text you'd
> have to write $100_{002}$.

+1 on allowing Python expressions to be written in
LaTeX math mode.

This is obviously why Guido has reserved $ for such
a long time. He's just popped back in the time machine
and told himself about this thread!

--
Greg

Chris Angelico

Feb 10, 2016, 4:42:41 PM
to python-ideas
On Thu, Feb 11, 2016 at 5:00 AM, Random832 <rand...@fastmail.com> wrote:
> On Wed, Feb 10, 2016, at 12:51, Georg Brandl wrote:
>> Underscores are allowed anywhere in numeric literals, except:
>>
>> * at the beginning of a literal (obviously)
>> * at the end of a literal
>> * directly after a dot (since the underscore could start an attribute
>> name)
>
> I don't think it's particularly important to support this case, but the
> sequence digit/dot/name with no spaces between is a syntax error now,
> because the digit/dot is interpreted as a floating point constant.

Only if that's the only dot.

>>> 1._
  File "<stdin>", line 1
    1._
      ^
SyntaxError: invalid syntax
>>> 1.1._
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'float' object has no attribute '_'

I don't want to see the first one have an arbitrary new meaning. It'd
be confusing.

ChrisA

Random832

Feb 10, 2016, 11:55:50 PM
to python...@python.org
On Wed, Feb 10, 2016, at 16:42, Chris Angelico wrote:
> On Thu, Feb 11, 2016 at 5:00 AM, Random832 <rand...@fastmail.com>
> wrote:
> > I don't think it's particularly important to support this case, but the
> > sequence digit/dot/name with no spaces between is a syntax error now,
> > because the digit/dot is interpreted as a floating point constant.
>
> Only if that's the only dot.

But only the first dot is part of the numeric literal and therefore even
theoretically eligible to have an underscore before or after it accepted
as part of the literal, so that goes without saying. I was just pointing
out that the underscore this rule disallows can't actually start an
attribute name.