[Python-Dev] PEP 515: Underscores in Numeric Literals

Georg Brandl

Feb 10, 2016, 5:22:02 PM

to pytho...@python.org

This came up in python-ideas, and has met mostly positive comments,
although the exact syntax rules are up for discussion.

cheers,
Georg

--------------------------------------------------------------------------------

PEP: 515
Title: Underscores in Numeric Literals
Version: $Revision$
Last-Modified: $Date$
Author: Georg Brandl
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 10-Feb-2016
Python-Version: 3.6

Abstract and Rationale
======================

This PEP proposes to extend Python's syntax so that underscores can be used in
integral and floating-point number literals.

This is a common feature of other modern languages, and can aid readability of
long literals, or literals whose value should clearly separate into parts, such
as bytes or words in hexadecimal notation.

Examples::

    # grouping decimal numbers by thousands
    amount = 10_000_000.0

    # grouping hexadecimal addresses by words
    addr = 0xDEAD_BEEF

    # grouping bits into bytes in a binary literal
    flags = 0b_0011_1111_0100_1110
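As a quick check of what the examples mean, the sketch below compares each literal with its underscore-free spelling. It assumes an interpreter that already accepts this syntax (Python 3.6 or later, where the feature eventually shipped); on older interpreters these lines are a ``SyntaxError``. The ``same_*`` names are only illustrative.

```python
# Assumes an interpreter that accepts underscores in numeric literals
# (Python 3.6+). The underscores are purely visual and do not change the value.
amount = 10_000_000.0
addr = 0xDEAD_BEEF
flags = 0b_0011_1111_0100_1110

# Compare each literal with the same number written without underscores.
same_amount = amount == 10000000.0
same_addr = addr == 0xDEADBEEF
same_flags = flags == 0b0011111101001110
print(same_amount, same_addr, same_flags)
```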

Specification
=============

The current proposal is to allow underscores anywhere in numeric literals, with
these exceptions:

* Leading underscores cannot be allowed, since they already introduce
  identifiers.
* Trailing underscores are not allowed, because they look confusing and don't
  contribute much to readability.
* The number base prefixes ``0x``, ``0o``, and ``0b`` cannot be split up,
  because they are fixed strings and not logically part of the number.
* No underscore allowed after a sign in an exponent (``1e-_5``), because
  underscores can also not be used after the signs in front of the number
  (``-1e5``).
* No underscore allowed after a decimal point, because this leads to ambiguity
  with attribute access (the lexer cannot know that there is no number literal
  in ``foo._5``).

There appears to be no reason to restrict the use of underscores otherwise.

The production list for integer literals would therefore look like this::

    integer: decimalinteger | octinteger | hexinteger | bininteger
    decimalinteger: nonzerodigit [decimalrest] | "0" [("0" | "_")* "0"]
    nonzerodigit: "1"..."9"
    decimalrest: (digit | "_")* digit
    digit: "0"..."9"
    octinteger: "0" ("o" | "O") (octdigit | "_")* octdigit
    hexinteger: "0" ("x" | "X") (hexdigit | "_")* hexdigit
    bininteger: "0" ("b" | "B") (bindigit | "_")* bindigit
    octdigit: "0"..."7"
    hexdigit: digit | "a"..."f" | "A"..."F"
    bindigit: "0" | "1"

For floating-point literals::

    floatnumber: pointfloat | exponentfloat
    pointfloat: [intpart] fraction | intpart "."
    exponentfloat: (intpart | pointfloat) exponent
    intpart: digit (digit | "_")*
    fraction: "." intpart
    exponent: ("e" | "E") "_"* ["+" | "-"] digit [decimalrest]
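One way to see what these productions accept is to transcribe them into regular expressions and try a few strings. The following is only a sketch of that idea, not the posted patch: the constant names mirror the grammar, and the helper ``accepts`` is a hypothetical name introduced here.

```python
import re

# Direct transcription of the draft productions into regular expressions.
# `accepts` is a hypothetical helper that exists only for this sketch.
DECIMALREST = r"[0-9_]*[0-9]"
DECIMALINTEGER = rf"(?:[1-9](?:{DECIMALREST})?|0(?:[0_]*0)?)"
OCTINTEGER = r"0[oO][0-7_]*[0-7]"
HEXINTEGER = r"0[xX][0-9a-fA-F_]*[0-9a-fA-F]"
BININTEGER = r"0[bB][01_]*[01]"
INTEGER = rf"(?:{DECIMALINTEGER}|{OCTINTEGER}|{HEXINTEGER}|{BININTEGER})"

INTPART = r"[0-9][0-9_]*"
FRACTION = rf"\.{INTPART}"
POINTFLOAT = rf"(?:(?:{INTPART})?{FRACTION}|{INTPART}\.)"
EXPONENT = rf"[eE]_*[+-]?[0-9](?:{DECIMALREST})?"
EXPONENTFLOAT = rf"(?:{INTPART}|{POINTFLOAT}){EXPONENT}"
FLOATNUMBER = rf"(?:{EXPONENTFLOAT}|{POINTFLOAT})"

NUMBER = re.compile(rf"(?:{INTEGER}|{FLOATNUMBER})")


def accepts(literal):
    """Return True if the transcribed draft grammar matches the whole string."""
    return NUMBER.fullmatch(literal) is not None


# Print which sample spellings the transcription accepts.
for text in ["10_000_000.0", "0b_0011_1111_0100_1110", "123_", "1e-_5",
             "1_.2", "1.2_", "1e_+5", "1e-5_0"]:
    print(f"{text!r:28} {accepts(text)}")
```

Under this transcription ``123_`` and ``1e-_5`` are rejected, while ``1_.2``, ``1.2_`` and ``1e-5_0`` are accepted, so the grammar is somewhat more permissive than the bullet list reads, for instance on trailing underscores.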

Alternative Syntax
==================

Underscore Placement Rules
--------------------------

Instead of the liberal rule specified above, the use of underscores could be
limited. Common rules are (see the "other languages" section):

* Only one consecutive underscore allowed, and only between digits.
* Multiple consecutive underscores allowed, but only between digits.

Different Separators
--------------------

A proposed alternate syntax was to use whitespace for grouping. Although
strings are a precedent for combining adjoining literals, the behavior can lead
to unexpected effects which are not possible with underscores. Also, no other
language is known to use this rule, except for languages that generally
disregard any whitespace.

C++14 introduces apostrophes for grouping, which is not considered due to the
conflict with Python's string literals. [1]_

Behavior in Other Languages
===========================

Those languages that do allow underscore grouping implement a large variety of
rules for allowed placement of underscores. This is a listing placing the known
rules into three major groups. In cases where the language spec contradicts the
actual behavior, the actual behavior is listed.

**Group 1: liberal (like this PEP)**

* D [2]_
* Perl 5 (although docs say it's more restricted) [3]_
* Rust [4]_
* Swift (although textual description says "between digits") [5]_

**Group 2: only between digits, multiple consecutive underscores**

* C# (open proposal for 7.0) [6]_
* Java [7]_

**Group 3: only between digits, only one underscore**

* Ada [8]_
* Julia (but not in the exponent part of floats) [9]_
* Ruby (docs say "anywhere", in reality only between digits) [10]_

Implementation
==============

A preliminary patch that implements the specification given above has been
posted to the issue tracker. [11]_

References
==========

.. [1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3499.html

.. [2] http://dlang.org/spec/lex.html#integerliteral

.. [3] http://perldoc.perl.org/perldata.html#Scalar-value-constructors

.. [4] http://doc.rust-lang.org/reference.html#number-literals

.. [5]
https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/LexicalStructure.html

.. [6] https://github.com/dotnet/roslyn/issues/216

.. [7]
https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html

.. [8] http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#2.4

.. [9]
http://docs.julialang.org/en/release-0.4/manual/integers-and-floating-point-numbers/

.. [10] http://ruby-doc.org/core-2.3.0/doc/syntax/literals_rdoc.html#label-Numbers

.. [11] http://bugs.python.org/issue26331

Copyright
=========

This document has been placed in the public domain.


Brett Cannon

Feb 10, 2016, 5:37:02 PM

to Georg Brandl, pytho...@python.org

I assume all of these examples are possible in either the liberal or restrictive approaches?

Is the implementation made easier or harder if we went with the Group 2 or 3 approaches? Are there any reasonable examples that the Group 1 approach allows that Group 3 doesn't that people have used in other languages?

I'm +1 on the idea, but which approach I prefer is going to be partially dependent on the difficulty of implementing (else I say Group 3 to make it easier to explain the rules).

-Brett

Glenn Linderman

Feb 10, 2016, 5:44:08 PM

to pytho...@python.org

+1

You don't mention potential restrictions that decimal numbers should permit them only every three places, or hex ones only every 2 or 4, and your binary example mentions grouping into bytes, but actually groups into nybbles.

But such restrictions would be annoying: if it is useful to the coder to use them, that is fine. But different situations may find other placements more useful... particularly in binary, as it might want to match widths of various bitfields.

Adding that as a rejected consideration, with justifications, would be helpful.

Paul Moore

Feb 10, 2016, 5:54:08 PM

to Georg Brandl, Python Dev

On 10 February 2016 at 22:20, Georg Brandl <g.br...@gmx.net> wrote:
> This came up in python-ideas, and has met mostly positive comments,
> although the exact syntax rules are up for discussion.

+1 on the PEP. Is there any value in allowing underscores in strings
passed to the Decimal constructor as well? The same sorts of
justifications would seem to apply. It's perfectly arguable that the
change for Decimal would be so rarely used as to not be worth it,
though, so I don't mind either way in practice.

Paul

Victor Stinner

Feb 10, 2016, 6:06:14 PM

to Georg Brandl, Python Dev

It looks like the implementation https://bugs.python.org/issue26331
only changes the Python parser.

What about other functions converting strings to numbers at runtime
like int(str) and float(str)? Paul also asked for Decimal(str).

Victor

MRAB

Feb 10, 2016, 6:09:44 PM

to pytho...@python.org

On 2016-02-10 22:35, Brett Cannon wrote:

[snip]

>
> Examples::
>
> # grouping decimal numbers by thousands
> amount = 10_000_000.0
>
> # grouping hexadecimal addresses by words
> addr = 0xDEAD_BEEF
>
> # grouping bits into bytes in a binary literal
> flags = 0b_0011_1111_0100_1110
>
>
> I assume all of these examples are possible in either the liberal or
> restrictive approaches?
>

[snip]
Strictly speaking, "0b_0011_1111_0100_1110" wouldn't be valid if an
underscore was allowed only between digits because the "b" isn't a digit.

Similarly, "0x_FF_FF" wouldn't be valid, but "0xFF_FF" would.

Steven D'Aprano

Feb 10, 2016, 6:16:35 PM

to pytho...@python.org

On Wed, Feb 10, 2016 at 10:53:09PM +0000, Paul Moore wrote:
> On 10 February 2016 at 22:20, Georg Brandl <g.br...@gmx.net> wrote:
> > This came up in python-ideas, and has met mostly positive comments,
> > although the exact syntax rules are up for discussion.
>
> +1 on the PEP. Is there any value in allowing underscores in strings
> passed to the Decimal constructor as well? The same sorts of
> justifications would seem to apply. It's perfectly arguable that the
> change for Decimal would be so rarely used as to not be worth it,
> though, so I don't mind either way in practice.

Let's delay making any change to string conversions for now, and that
includes Decimal. We can also do this:

Decimal("123_456_789.00000_12345_67890".replace("_", ""))

for those who absolutely must include underscores in their numeric
strings. The big win is for numeric literals, not numeric string
conversions.

--
Steve
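The workaround in the message above can be wrapped in a small function. This is a minimal sketch; ``parse_decimal`` is a hypothetical helper name, not something the decimal module provides.

```python
from decimal import Decimal


def parse_decimal(text):
    """Strip grouping underscores, then hand the plain digits to Decimal."""
    # Hypothetical helper: removes every underscore without validating
    # where the underscores were placed.
    return Decimal(text.replace("_", ""))


value = parse_decimal("123_456_789.00000_12345_67890")
print(value)
```

Because every underscore is removed unconditionally, odd placements such as a trailing underscore pass through silently. That is acceptable for a stopgap but looser than any of the literal rules under discussion.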

Andrew Barnert via Python-Dev

Feb 10, 2016, 6:47:33 PM

to Georg Brandl, pytho...@python.org

On Feb 10, 2016, at 14:20, Georg Brandl <g.br...@gmx.net> wrote:

First, general questions: should the PEP mention the Decimal constructor? What about int and float (I'd assume int(s) continues to work as always, while int(s, 0) gets the new behavior, but if that isn't obviously true, it may be worth saying explicitly).

> * Trailing underscores are not allowed, because they look confusing and don't
> contribute much to readability.

Why is "123_456_" so ugly that we have to catch it, when "1___2_345______6" is just fine, or "123e__+456"? More to the point, if we really need an extra rule, and more complicated BNF, to outlaw this case, I don't think we want a liberal design at all.

Also, notice that Swift, Rust, and D all show examples with trailing underscores in their references, and they don't look particularly out of place with the other examples.

> There appears to be no reason to restrict the use of underscores otherwise.

What other restrictions are there? I think the only place you've left that's not between digits is between the e and the sign. A dead-simple rule like Swift's seems better than five separate rules that I have to learn and remember that make lexing more complicated and that ultimately amount to the conservative rule plus one other place I can put underscores where I'd never want to.

> **Group 1: liberal (like this PEP)**
>
> * D [2]_
> * Perl 5 (although docs say it's more restricted) [3]_
> * Rust [4]_
> * Swift (although textual description says "between digits") [5]_

I don't think any of these are liberal like this PEP.

For example, Swift's actual grammar rule allows underscores anywhere but leading in the "digits" part of int literals and all three potential digit parts of float literals. That's the whole rule. It's more conservative than this PEP in not allowing them outside of digit parts (like between E and +), more liberal in allowing them to be trailing, but I'm pretty sure the reason behind the design wasn't specifically about how liberal or conservative they wanted to be, but about being as simple as possible. Rust's rule seems to be equivalent to Swift's, except that they forgot to define exponents anywhere. I don't think either of them was trying to be more liberal or more conservative; rather, they were both trying to be as simple as possible.

D does go out of its way to be as liberal as possible, e.g., allowing things like "0x_1_" that the others wouldn't (they'd treat the "_1_" as a digit part, which can't have leading underscores), but it's also more conservative than this spec in not allowing underscores between e and the sign.

I think Perl is the only language that allows them anywhere but in the digits part.

Steven D'Aprano

Feb 10, 2016, 7:06:55 PM

to pytho...@python.org

On Wed, Feb 10, 2016 at 11:20:38PM +0100, Georg Brandl wrote:
> This came up in python-ideas, and has met mostly positive comments,
> although the exact syntax rules are up for discussion.

Nicely done. But I would change the restrictions to a simpler version.
Instead of five rules to learn:

> The current proposal is to allow underscores anywhere in numeric literals, with
> these exceptions:
>
> * Leading underscores cannot be allowed, since they already introduce
> identifiers.
> * Trailing underscores are not allowed, because they look confusing and don't
> contribute much to readability.
> * The number base prefixes ``0x``, ``0o``, and ``0b`` cannot be split up,
> because they are fixed strings and not logically part of the number.
> * No underscore allowed after a sign in an exponent (``1e-_5``), because
> underscores can also not be used after the signs in front of the number
> (``-1e5``).
> * No underscore allowed after a decimal point, because this leads to ambiguity
> with attribute access (the lexer cannot know that there is no number literal
> in ``foo._5``).

change to a single rule "one or more underscores may appear between two
(hex)digits, but otherwise nowhere else". That's much simpler to
understand than a series of restrictions as given above.

That would be your second restrictive rule:

"Multiple consecutive underscores allowed, but only between digits."

That forbids leading and trailing underscores, underscores inside or
immediately after the leading number base (since x, o and b aren't
digits), and immediately before or after the sign, decimal point or e|E
exponent symbol.
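A rough sketch of that single rule as a regular expression, to make its consequences concrete. It is a simplification that covers only decimal integers, floats of the form digits.digits with an optional exponent, and hexadecimal integers; the names ``DIGITS``, ``HEXDIGITS`` and ``allowed`` are hypothetical.

```python
import re

# One or more underscores may appear, but only between two (hex) digits.
# Simplified sketch: ".5", "5.", octal and binary forms are left out.
DIGITS = r"[0-9]+(?:_+[0-9]+)*"
HEXDIGITS = r"[0-9a-fA-F]+(?:_+[0-9a-fA-F]+)*"
BETWEEN_DIGITS = re.compile(
    rf"(?:0[xX]{HEXDIGITS}|{DIGITS}(?:\.{DIGITS})?(?:[eE][+-]?{DIGITS})?)"
)


def allowed(literal):
    """Return True if the whole string fits the between-digits rule."""
    return BETWEEN_DIGITS.fullmatch(literal) is not None


# Print the verdict for a few spellings discussed in the thread.
for text in ["1_000_000", "1__000", "0xDEAD_BEEF", "0x_FF", "123_",
             "123_.000_456", "1.234_e-89", "1e_+5"]:
    print(f"{text!r:16} {allowed(text)}")
```

The first three strings pass; the remaining five are rejected, each because an underscore touches something other than a digit on one side.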

> There appears to be no reason to restrict the use of underscores otherwise.

I don't like underscores immediately before the . or e|E in floats
either: 123_.000_456

The dot is already visually distinctive enough, as is the e|E, and
placing an underscore immediately before them doesn't aid in grouping
the digits.

> Instead of the liberal rule specified above, the use of underscores could be
> limited. Common rules are (see the "other languages" section):
>
> * Only one consecutive underscore allowed, and only between digits.
> * Multiple consecutive underscore allowed, but only between digits.

I don't think there is any need to restrict it to only a single
underscore. There are uses for more than one:

Fraction(3__141_592_654, 1_000_000_000)

hints that the 3 is somewhat special (for obvious reasons).

--
Steve

Greg Ewing

Feb 10, 2016, 7:09:51 PM

to pytho...@python.org

The Mersenne Twister is no longer regarded as quite state-of-the art
because it can get into states that produce long sequences that are
not very random.

There is a variation on MT called WELL that has better properties
in this regard. Does anyone think it would be a good idea to replace
MT with WELL as Python's default rng?

https://en.wikipedia.org/wiki/Well_equidistributed_long-period_linear

--
Greg

Ethan Furman

Feb 10, 2016, 7:14:47 PM

to pytho...@python.org

On 02/10/2016 04:04 PM, Steven D'Aprano wrote:

> change to a single rule "one or more underscores may appear between
> two (hex)digits, but otherwise nowhere else". That's much simpler to
> understand than a series of restrictions as given above.

I like the simpler rule, but I would also allow for an underscore
between the base and the first digit:

0x_1ef9_ab22

is easier (at least, for me ;)
to parse than

0x1ef9_ab22

However, since Georg is doing the work, I'm not going to argue too hard.

--
~Ethan~

Steven D'Aprano

Feb 10, 2016, 7:23:13 PM

to pytho...@python.org

On Wed, Feb 10, 2016 at 03:45:48PM -0800, Andrew Barnert via Python-Dev wrote:
> On Feb 10, 2016, at 14:20, Georg Brandl <g.br...@gmx.net> wrote:
>
> First, general questions: should the PEP mention the Decimal constructor? What about int and float (I'd assume int(s) continues to work as always, while int(s, 0) gets the new behavior, but if that isn't obviously true, it may be worth saying explicitly).
>
> > * Trailing underscores are not allowed, because they look confusing and don't
> > contribute much to readability.
>
> Why is "123_456_" so ugly that we have to catch it, when
> "1___2_345______6" is just fine,

It's not just fine, it's ugly as sin, but it shouldn't be a matter for
the parser to decide a style-issue.

Just as we allow people to write ugly tuples:

t = ( 1 , 2, 3 ,4, 5, )

so we should allow people to write ugly ints rather than try to enforce
good taste in the parser. There are uses for allowing multiple
underscores, and odd groupings, so rather than a blanket ban, we trust
that people won't do stupid things.

> or "123e__+456"?

That I would prohibit. I think that the decimal point and exponent sign
provide sufficient visual distinctiveness that putting underscores
around them doesn't gain you anything. In some cases it looks like
you might have missed a group of digits:

1.234_e-89

hints that perhaps there ought to be more digits after the 4.

I'd be okay with a rule "no underscores in the exponent at all", but I
don't particularly see the need for it since that's pretty much covered
by the style guide saying "don't use underscores unnecessarily". For
floats, exponents have a practical limitation of three digits, so
there's not much need for grouping them.

+1 on allowing underscores between digits
+0 on prohibiting underscores in the exponent

> More to the point,
> if we really need an extra rule, and more complicated BNF, to outlaw
> this case, I don't think we want a liberal design at all.

I think "underscores can occur between any two digits" is pretty
liberal, since it allows multiple underscores, and allows grouping in
any size group (including mixed sizes, and stupid sizes like 1).

To me, the opposite of a liberal rule is something like "underscores may
only occur between groups of three digits".

> Also, notice that Swift, Rust, and D all show examples with trailing
> underscores in their references, and they don't look particularly out
> of place with the other examples.

That's a matter of opinion.

--
Steve

Martin Panter

Feb 10, 2016, 8:17:20 PM

to Georg Brandl, python-dev

I have occasionally wondered about this missing feature.

On 10 February 2016 at 22:20, Georg Brandl <g.br...@gmx.net> wrote:
> Abstract and Rationale
> ======================
>
> This PEP proposes to extend Python's syntax so that underscores can be used in
> integral and floating-point number literals.

This should extend complex or imaginary literals like 10_000j for consistency.

> Specification
> =============

>
> * Trailing underscores are not allowed, because they look confusing and don't
> contribute much to readability.

> * No underscore allowed after a sign in an exponent (``1e-_5``), because
> underscores can also not be used after the signs in front of the number
> (``-1e5``).

> [. . .]

This allows trailing underscores such as 1_.2, 1.2_, 1.2_e-5. Your
bullet point above suggests at least some of these are not desired.

> fraction: "." intpart
> exponent: ("e" | "E") "_"* ["+" | "-"] digit [decimalrest]

This allows underscores in the exponent (1e-5_0), contradicting the
other bullet point.

Andrew Barnert via Python-Dev

Feb 10, 2016, 11:43:10 PM

to Steven D'Aprano, pytho...@python.org

On Feb 10, 2016, at 16:21, Steven D'Aprano <st...@pearwood.info> wrote:
>
>> On Wed, Feb 10, 2016 at 03:45:48PM -0800, Andrew Barnert via Python-Dev wrote:
>> On Feb 10, 2016, at 14:20, Georg Brandl <g.br...@gmx.net> wrote:
>>
>> First, general questions: should the PEP mention the Decimal constructor? What about int and float (I'd assume int(s) continues to work as always, while int(s, 0) gets the new behavior, but if that isn't obviously true, it may be worth saying explicitly).
>>
>>> * Trailing underscores are not allowed, because they look confusing and don't
>>> contribute much to readability.
>>
>> Why is "123_456_" so ugly that we have to catch it, when
>> "1___2_345______6" is just fine,
>
> It's not just fine, it's ugly as sin, but it shouldn't be a matter for
> the parser to decide a style-issue.

Exactly. So why should it be any more of a matter for the parser to decide that "123_456_" is illegal? Leave that in the style guide, and keep the parser, and the reference documentation, as simple as possible.

>> or "123e__+456"?
>
> That I would prohibit.

The PEP allows that. The simpler rule used by Swift and Rust prohibits it.

>> More to the point,
>> if we really need an extra rule, and more complicated BNF, to outlaw
>> this case, I don't think we want a liberal design at all.
>
> I think "underscores can occur between any two digits" is pretty
> liberal, since it allows multiple underscores, and allows grouping in
> any size group (including mixed sizes, and stupid sizes like 1).

The PEP calls that a type-2 conservative proposal, and uses "liberal" to mean that underscores can appear in places that aren't between digits. I don't think we want that liberalism, especially if it requires 5 rules instead of 1 to get it right.

Again, Swift and Rust only allow underscores in the digit part of integers, and the up to three digit parts of floats, and the only rule they impose is no leading underscore. (In some cases they lead to ambiguity, in others they don't, but it's easier to just always ban them.) I don't see anything wrong with that rule. The fact that it doesn't allow "1.2e_+3" seems fine. The fact that it doesn't prevent "123_" seems fine also. It's not about being as liberal as possible, or as restrictive as possible, because those edge cases just don't matter, so being as simple as possible seems like an obvious win.

>> Also, notice that Swift, Rust, and D all show examples with trailing
>> underscores in their references, and they don't look particularly out
>> of place with the other examples.
>
> That's a matter of opinion.

Sure, but it's apparently the opinion of the people who designed and/or documented this feature in three out of the four languages I looked at (aka every language but Perl), not mine.

And honestly, are you really claiming that in your opinion, "123_456_" is worse than all of their other examples, like "1_23__4"?

They're both presented as something the syntax allows, and neither one looks like something I'd ever want to write, much less promote in a style guide or something, but neither one screams out as something that's so heinous we need to complicate the language to ensure it raises a SyntaxError. Yes, that's my opinion, but do you really have a different opinion about any part of that?

Georg Brandl

Feb 11, 2016, 2:26:29 AM

to pytho...@python.org

On 02/11/2016 02:16 AM, Martin Panter wrote:
> I have occasionally wondered about this missing feature.
>
> On 10 February 2016 at 22:20, Georg Brandl <g.br...@gmx.net> wrote:
>> Abstract and Rationale
>> ======================
>>
>> This PEP proposes to extend Python's syntax so that underscores can be used in
>> integral and floating-point number literals.
>
> This should extend complex or imaginary literals like 10_000j for consistency.

Yes, that was always the case, but I guess it should be explicit.

The middle one isn't, indeed. I updated the grammar accordingly.

>> fraction: "." intpart
>> exponent: ("e" | "E") "_"* ["+" | "-"] digit [decimalrest]
>
> This allows underscores in the exponent (1e-5_0), contradicting the
> other bullet point.

I clarified the bullet points. An "immediately" was missing.

Thanks for the feedback!
Georg

Georg Brandl

Feb 11, 2016, 2:38:23 AM

to pytho...@python.org

On 02/11/2016 12:45 AM, Andrew Barnert via Python-Dev wrote:
> On Feb 10, 2016, at 14:20, Georg Brandl <g.br...@gmx.net> wrote:
>
> First, general questions: should the PEP mention the Decimal constructor?
> What about int and float (I'd assume int(s) continues to work as always,
> while int(s, 0) gets the new behavior, but if that isn't obviously true, it
> may be worth saying explicitly).
>
>> * Trailing underscores are not allowed, because they look confusing and
>> don't contribute much to readability.
>
> Why is "123_456_" so ugly that we have to catch it, when "1___2_345______6"
> is just fine, or "123e__+456"? More to the point, if we really need an extra
> rule, and more complicated BNF, to outlaw this case, I don't think we want a
> liberal design at all.
>
> Also, notice that Swift, Rust, and D all show examples with trailing
> underscores in their references, and they don't look particularly out of
> place with the other examples.

That's a point. I'll look into the implementation.

>> There appears to be no reason to restrict the use of underscores
>> otherwise.
>
> What other restrictions are there? I think the only place you've left that's
> not between digits is between the e and the sign.

There are other places left:

* between 0x and the digits
* between the digits and "j"
* before and after the decimal point

> A dead-simple rule like
> Swift's seems better than five separate rules that I have to learn and
> remember that make lexing more complicated and that ultimately amount to the
> conservative rule plus one other place I can put underscores where I'd never
> want to.

Not quite, see above.

>> **Group 1: liberal (like this PEP)**
>>
>> * D [2]_ * Perl 5 (although docs say it's more restricted) [3]_ * Rust
>> [4]_ * Swift (although textual description says "between digits") [5]_
>
> I don't think any of these are liberal like this PEP.
>
> For example, Swift's actual grammar rule allows underscores anywhere but
> leading in the "digits" part of int literals and all three potential digit
> parts of float literals. That's the whole rule. It's more conservative than
> this PEP in not allowing them outside of digit parts (like between E and +),
> more liberal in allowing them to be trailing, but I'm pretty sure the reason
> behind the design wasn't specifically about how liberal or conservative they
> wanted to be, but about being as simple as possible. Rust's rule seems to be
> equivalent to Swift's, except that they forgot to define exponents anywhere.
> I don't think either of them was trying to be more liberal or more
> conservative; rather, they were both trying to be as simple as possible.

I actually modelled this PEP closely on Rust. It has restrictions as in this
PEP, except that trailing underscores are allowed, and that "1.0e_+5" is not
allowed (allowed by the PEP), and "1.0e+_5" is (not allowed by the PEP).

I don't think you can argue that it's simpler.

(If the PEP and our lexical reference were as loosely worded as Rust's, one
could probably say it's "simple", too.)

Also, both Swift and Rust don't have the baggage of allowing ".5" style
literals, which makes the grammar simpler in Swift's case.

> D does go out of its way to be as liberal as possible, e.g., allowing things
> like "0x_1_" that the others wouldn't (they'd treat the "_1_" as a digit
> part, which can't have leading underscores), but it's also more conservative
> than this spec in not allowing underscores between e and the sign.
>
> I think Perl is the only language that allows them anywhere but in the digits
> part.

Thanks for the feedback!
Georg

Georg Brandl

Feb 11, 2016, 2:46:51 AM

to pytho...@python.org

On 02/10/2016 11:35 PM, Brett Cannon wrote:

>> Examples::
>>
>> # grouping decimal numbers by thousands
>> amount = 10_000_000.0
>>
>> # grouping hexadecimal addresses by words
>> addr = 0xDEAD_BEEF
>>
>> # grouping bits into bytes in a binary literal
>> flags = 0b_0011_1111_0100_1110
>>
>
> I assume all of these examples are possible in either the liberal or restrictive
> approaches?

The last one isn't for restrictive -- its first underscore isn't between digits.

>>
>> Implementation
>> ==============
>>
>> A preliminary patch that implements the specification given above has been
>> posted to the issue tracker. [11]_
>>
>
> Is the implementation made easier or harder if we went with the Group 2 or 3
> approaches? Are there any reasonable examples that the Group 1 approach allows
> that Group 3 doesn't that people have used in other languages?

Group 3 is probably a little more work than group 2, since you have to make sure
only one consecutive underscore is present. I don't see a point to that.

> I'm +1 on the idea, but which approach I prefer is going to be partially
> dependent on the difficulty of implementing (else I say Group 3 to make it
> easier to explain the rules).

Based on the feedback so far, I have an easier rule in mind that I will base
the next PEP revision on. It's basically

"One or more underscores allowed anywhere after a digit or a base specifier."

This preserves my preferred non-restrictive cases (0b_1111_0000, 1.5_j) and
disallows more controversial versions like "1.5e_+_2".

cheers,
Georg
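The one-line rule in the message above can be approximated with a regular expression. This is a sketch of one possible reading, not the revised patch: it ignores the existing ban on leading zeros in decimal integers, and the names ``INTPART``, ``BASED``, ``DECIMAL`` and ``allowed`` are hypothetical.

```python
import re

# Underscores may follow any digit or a base specifier, and nothing else.
# Sketch only: the leading-zero restriction on decimal integers is ignored.
INTPART = r"[0-9][0-9_]*"
BASED = (r"0(?:[xX]_*[0-9a-fA-F][0-9a-fA-F_]*"
         r"|[oO]_*[0-7][0-7_]*"
         r"|[bB]_*[01][01_]*)")
DECIMAL = (rf"(?:(?:{INTPART})?\.{INTPART}|{INTPART}\.?)"
           rf"(?:[eE][+-]?{INTPART})?[jJ]?")
AFTER_DIGIT_OR_BASE = re.compile(rf"(?:{BASED}|{DECIMAL})")


def allowed(literal):
    """Return True if the whole string fits the simplified rule."""
    return AFTER_DIGIT_OR_BASE.fullmatch(literal) is not None


# Print the verdict for spellings mentioned in the thread.
for text in ["0b_1111_0000", "1.5_j", "123_", "1.5e_+_2", "1e+_5", "_123"]:
    print(f"{text!r:16} {allowed(text)}")
```

With this reading, ``0b_1111_0000``, ``1.5_j`` and the trailing-underscore form ``123_`` are accepted, while ``1.5e_+_2``, ``1e+_5`` and ``_123`` are rejected because an underscore would follow ``e``, a sign, or nothing at all.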

Georg Brandl

Feb 11, 2016, 3:11:11 AM

to pytho...@python.org

I added a short paragraph.

Thanks for the feedback,
Georg

Georg Brandl

Feb 11, 2016, 3:16:03 AM

to pytho...@python.org

On 02/11/2016 12:04 AM, Victor Stinner wrote:
> It looks like the implementation https://bugs.python.org/issue26331
> only changes the Python parser.
>
> What about other functions converting strings to numbers at runtime
> like int(str) and float(str)? Paul also asked for Decimal(str).

I added these as "Open Questions" to the PEP.

For Decimal, it's probably a good idea. For int(), it should only be
allowed with base argument = 0. For float() and complex(), probably.

Georg
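As background for why base 0 is singled out: with ``base=0``, ``int()`` interprets its string argument the way the parser interprets an integer literal, so the prefix selects the base. The sketch below shows only that existing prefix handling and makes no claim about how underscores are treated; the variable names are illustrative.

```python
# With base 0, int() reads the string like a source-code integer literal,
# so the prefix decides the base.
from_hex = int("0xFF", 0)
from_bin = int("0b1010", 0)
from_dec = int("255", 0)

# With the default base of 10, a prefixed string is rejected.
try:
    int("0xFF")
    default_base_ok = True
except ValueError:
    default_base_ok = False

print(from_hex, from_bin, from_dec, default_base_ok)
```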

Georg Brandl

Feb 11, 2016, 3:24:14 AM

to pytho...@python.org

Hey all,

based on the feedback so far, I revised the PEP. There is now
a much simpler rule for allowed underscores, with no exceptions.
This made the grammar simpler as well.

---------------------------------------------------------------------------

PEP: 515
Title: Underscores in Numeric Literals
Version: $Revision$
Last-Modified: $Date$
Author: Georg Brandl
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 10-Feb-2016
Python-Version: 3.6

Abstract and Rationale
======================

This PEP proposes to extend Python's syntax so that underscores can be used in
integral, floating-point and complex number literals.

This is a common feature of other modern languages, and can aid readability of
long literals, or literals whose value should clearly separate into parts, such
as bytes or words in hexadecimal notation.

Examples::

    # grouping decimal numbers by thousands
    amount = 10_000_000.0

    # grouping hexadecimal addresses by words
    addr = 0xDEAD_BEEF

    # grouping bits into bytes in a binary literal
    flags = 0b_0011_1111_0100_1110

    # making the literal suffix stand out more
    imag = 1.247812376e-15_j

Specification
=============

The current proposal is to allow one or more consecutive underscores following
digits and base specifiers in numeric literals.

The production list for integer literals would therefore look like this::

    integer: decimalinteger | octinteger | hexinteger | bininteger

    octdigit: "0"..."7"
    hexdigit: digit | "a"..."f" | "A"..."F"
    bindigit: "0" | "1"

For floating-point and complex literals::

    floatnumber: pointfloat | exponentfloat
    pointfloat: [intpart] fraction | intpart "."
    exponentfloat: (intpart | pointfloat) exponent
    intpart: digit (digit | "_")*
    fraction: "." intpart
    exponent: ("e" | "E") ["+" | "-"] intpart
    imagnumber: (floatnumber | intpart) ("j" | "J")
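
As a rough illustration only (not part of the proposal or of the patch), the
floating-point and imaginary productions above can be transcribed into regular
expressions. All names below are hypothetical and simply mirror the grammar
symbols; the real tokenizer is not regex-based.

```python
import re

# Each pattern is a direct transcription of the draft production of the
# same name; the helper names are made up for this sketch.
INTPART = r"[0-9][0-9_]*"
FRACTION = rf"\.{INTPART}"
EXPONENT = rf"[eE][+-]?{INTPART}"
POINTFLOAT = rf"(?:(?:{INTPART})?{FRACTION}|{INTPART}\.)"
EXPONENTFLOAT = rf"(?:{POINTFLOAT}|{INTPART}){EXPONENT}"
FLOATNUMBER = rf"(?:{EXPONENTFLOAT}|{POINTFLOAT})"
IMAGNUMBER = rf"(?:{FLOATNUMBER}|{INTPART})[jJ]"


def is_draft_float(text):
    """True if text matches the draft floatnumber production."""
    return re.fullmatch(FLOATNUMBER, text) is not None


def is_draft_imag(text):
    """True if text matches the draft imagnumber production."""
    return re.fullmatch(IMAGNUMBER, text) is not None
```

Under this transcription ``10_000_000.0`` and ``1.247812376e-15_j`` are
accepted, while ``1._5`` and ``1e-_5`` are rejected because an underscore may
only follow a digit.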

Alternative Syntax
==================

Underscore Placement Rules
--------------------------

Instead of the liberal rule specified above, the use of underscores could be
limited. Common rules are (see the "other languages" section):

* Only single underscores allowed, and only between digits.
* Multiple consecutive underscores allowed, but only between digits.

A less common rule would be to allow underscores only every N digits (where N
could be 3 for decimal literals, or 4 for hexadecimal ones). This is
unnecessarily restrictive, especially considering the separator placement is
different in different cultures.
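
For comparison, the two common restrictive rules listed above could be checked
roughly as follows. This is a sketch under the assumption that "digit" means a
decimal digit; the pattern and function names are invented for illustration.

```python
import re

# Assumption: decimal digits only; a hex variant would use [0-9a-fA-F].
SINGLE_BETWEEN_DIGITS = re.compile(r"[0-9]+(?:_[0-9]+)*")
MULTIPLE_BETWEEN_DIGITS = re.compile(r"[0-9]+(?:_+[0-9]+)*")


def allowed_single(digits):
    """Rule 1: at most one underscore in a row, only between digits."""
    return SINGLE_BETWEEN_DIGITS.fullmatch(digits) is not None


def allowed_multiple(digits):
    """Rule 2: runs of underscores allowed, but only between digits."""
    return MULTIPLE_BETWEEN_DIGITS.fullmatch(digits) is not None
```

Both rules reject leading and trailing underscores; they differ only on
inputs such as ``1__000``.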

Different Separators
--------------------

A proposed alternate syntax was to use whitespace for grouping. Although
strings are a precedent for combining adjoining literals, the behavior can lead
to unexpected effects which are not possible with underscores. Also, no other
language is known to use this rule, except for languages that generally
disregard any whitespace.

C++14 introduces apostrophes for grouping, which is not considered due to the
conflict with Python's string literals. [1]_

Behavior in Other Languages
===========================

Those languages that do allow underscore grouping implement a large variety of
rules for allowed placement of underscores. This is a listing placing the known
rules into three major groups. In cases where the language spec contradicts the
actual behavior, the actual behavior is listed.

**Group 1: liberal**

This group is the least homogeneous: the rules vary slightly between languages.
All of them allow trailing underscores. Some allow underscores after non-digits
like the ``e`` or the sign in exponents.

* D [2]_
* Perl 5 (underscores basically allowed anywhere, although docs say it's more
  restricted) [3]_
* Rust (allows between exponent sign and digits) [4]_
* Swift (although textual description says "between digits") [5]_

**Group 2: only between digits, multiple consecutive underscores**

* C# (open proposal for 7.0) [6]_
* Java [7]_

**Group 3: only between digits, only one underscore**

* Ada [8]_
* Julia (but not in the exponent part of floats) [9]_
* Ruby (docs say "anywhere", in reality only between digits) [10]_

Implementation
==============

A preliminary patch that implements the specification given above has been
posted to the issue tracker. [11]_

Open Questions
==============

This PEP currently only proposes changing the literal syntax. The following
extensions are open for discussion:

* Allowing underscores in string arguments to the ``Decimal`` constructor. It
  could be argued that these are akin to literals, since there is no Decimal
  literal available (yet).

* Allowing underscores in string arguments to ``int()`` with base argument 0,
  ``float()`` and ``complex()``.

References
==========

.. [1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3499.html

.. [2] http://dlang.org/spec/lex.html#integerliteral

.. [3] http://perldoc.perl.org/perldata.html#Scalar-value-constructors

.. [4] http://doc.rust-lang.org/reference.html#number-literals

.. [5] https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/LexicalStructure.html

.. [6] https://github.com/dotnet/roslyn/issues/216

.. [7] https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html

.. [8] http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#2.4

.. [9] http://docs.julialang.org/en/release-0.4/manual/integers-and-floating-point-numbers/

.. [10] http://ruby-doc.org/core-2.3.0/doc/syntax/literals_rdoc.html#label-Numbers

.. [11] http://bugs.python.org/issue26331

Copyright
=========

This document has been placed in the public domain.

Paul Moore

Feb 11, 2016, 4:13:05 AM

to Steven D'Aprano, Python Dev

On 10 February 2016 at 23:14, Steven D'Aprano <st...@pearwood.info> wrote:
> On Wed, Feb 10, 2016 at 10:53:09PM +0000, Paul Moore wrote:
>> On 10 February 2016 at 22:20, Georg Brandl <g.br...@gmx.net> wrote:
>> > This came up in python-ideas, and has met mostly positive comments,
>> > although the exact syntax rules are up for discussion.
>>
>> +1 on the PEP. Is there any value in allowing underscores in strings
>> passed to the Decimal constructor as well? The same sorts of
>> justifications would seem to apply. It's perfectly arguable that the
>> change for Decimal would be so rarely used as to not be worth it,
>> though, so I don't mind either way in practice.
>
> Let's delay making any change to string conversions for now, and that
> includes Decimal. We can also do this:
>
> Decimal("123_456_789.00000_12345_67890".replace("_", ""))
>
> for those who absolutely must include underscores in their numeric
> strings. The big win is for numeric literals, not numeric string
> conversions.

Good point. Maybe add this as an example in the PEP to explain why
conversions are excluded. But I did only mean the Decimal constructor,
which I think of more as a "decimal literal" - whereas int() and
float() are (in my mind at least) conversion functions and as such
should not be coupled to literal format (for example, 0x0001 notation
isn't supported by int())

Paul

Steven D'Aprano

Feb 11, 2016, 4:40:13 AM

to pytho...@python.org

On Thu, Feb 11, 2016 at 01:08:41PM +1300, Greg Ewing wrote:
> The Mersenne Twister is no longer regarded as quite state-of-the art
> because it can get into states that produce long sequences that are
> not very random.
>
> There is a variation on MT called WELL that has better properties
> in this regard. Does anyone think it would be a good idea to replace
> MT with WELL as Python's default rng?
>
> https://en.wikipedia.org/wiki/Well_equidistributed_long-period_linear

I'm not able to judge the claims about which PRNG is better (perhaps Tim
Peters has an opinion?) but if we do change, I'd like to see the
existing random.Random moved to random.MT_Random for backwards
compatibility and compatibility with other software which uses MT. Not
necessarily saying that we have to keep it around forever (after all, we
did dump the Wichmann-Hill PRNG some time ago) but we ought to keep it
for at least a couple of releases.

--
Steve

Georg Brandl

Feb 11, 2016, 4:41:04 AM

to pytho...@python.org

On 02/11/2016 10:10 AM, Paul Moore wrote:
> On 10 February 2016 at 23:14, Steven D'Aprano <st...@pearwood.info> wrote:
>> On Wed, Feb 10, 2016 at 10:53:09PM +0000, Paul Moore wrote:
>>> On 10 February 2016 at 22:20, Georg Brandl <g.br...@gmx.net> wrote:
>>> > This came up in python-ideas, and has met mostly positive comments,
>>> > although the exact syntax rules are up for discussion.
>>>
>>> +1 on the PEP. Is there any value in allowing underscores in strings
>>> passed to the Decimal constructor as well? The same sorts of
>>> justifications would seem to apply. It's perfectly arguable that the
>>> change for Decimal would be so rarely used as to not be worth it,
>>> though, so I don't mind either way in practice.
>>
>> Let's delay making any change to string conversions for now, and that
>> includes Decimal. We can also do this:
>>
>> Decimal("123_456_789.00000_12345_67890".replace("_", ""))
>>
>> for those who absolutely must include underscores in their numeric
>> strings. The big win is for numeric literals, not numeric string
>> conversions.
>
> Good point. Maybe add this as an example in the PEP to explain why
> conversions are excluded. But I did only mean the Decimal constructor,
> which I think of more as a "decimal literal" - whereas int() and
> float() are (in my mind at least) conversion functions and as such
> should not be coupled to literal format (for example, 0x0001 notation
> isn't supported by int())

Actually, it is. Just not without a base argument, because the default
base is 10. But both with base 0 and base 16, '0x' prefixes are allowed.
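
For reference, a quick check of this behaviour in current Python 3, using the
``0x0001`` string from the quoted message (the variable names are only for
illustration):

```python
# Base 16 and base 0 both accept the "0x" prefix.
with_base_16 = int("0x0001", 16)
with_base_0 = int("0x0001", 0)

# With the default base 10, the same string is rejected.
try:
    int("0x0001")
except ValueError:
    rejected_in_base_10 = True
else:
    rejected_in_base_10 = False
```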

That's why I'm leaning towards supporting the underscores. In any case
I'm preparing the implementation.

Georg

Victor Stinner

Feb 11, 2016, 5:00:37 AM

to Georg Brandl, Python Dev

2016-02-11 9:11 GMT+01:00 Georg Brandl <g.br...@gmx.net>:
> On 02/11/2016 12:04 AM, Victor Stinner wrote:
>> It looks like the implementation https://bugs.python.org/issue26331
>> only changes the Python parser.
>>
>> What about other functions converting strings to numbers at runtime
>> like int(str) and float(str)? Paul also asked for Decimal(str).
>
> I added these as "Open Questions" to the PEP.

Ok nice. Now another question :-)

Would it be useful to add an option to repr(int) and repr(float), or a
formatter to int.__format__() and float.__format__(), to add an
underscore for thousands? Currently, we have the "n" format, which
depends on the current LC_NUMERIC locale:

>>> '{:n}'.format(1234)
'1234'
>>> import locale; locale.setlocale(locale.LC_ALL, '')
'fr_FR.UTF-8'
>>> '{:n}'.format(1234)
'1 234'

My idea:

>>> (1234).__repr__(pep515=True)
'1_234'
>>> (1234.0).__repr__(pep515=True)
'1_234.0'

or maybe:

>>> '{:pep515}'.format(1234)
'1_234'
>>> '{:pep515}'.format(1234.0)
'1_234.0'

I don't think that it would be a good idea to modify repr() default
behaviour, it would likely break a lot of applications.

Victor

Nick Coghlan

Feb 11, 2016, 5:08:53 AM

to Victor Stinner, Georg Brandl, Python Dev

On 11 February 2016 at 19:59, Victor Stinner <victor....@gmail.com> wrote:
> 2016-02-11 9:11 GMT+01:00 Georg Brandl <g.br...@gmx.net>:
>> On 02/11/2016 12:04 AM, Victor Stinner wrote:
>>> It looks like the implementation https://bugs.python.org/issue26331
>>> only changes the Python parser.
>>>
>>> What about other functions converting strings to numbers at runtime
>>> like int(str) and float(str)? Paul also asked for Decimal(str).
>>
>> I added these as "Open Questions" to the PEP.
>
> Ok nice. Now another question :-)
>
> Would it be useful to add an option to repr(int) and repr(float), or a
> formatter to int.__format__() and float.__format__() to add an
> underscore for thousands.

Given that str.format supports a thousands separator:

>>> "{:,d}".format(100000000)
'100,000,000'

it might be reasonable to permit "_" in place of "," in the format specifier.

However, I'm not sure when you'd use it aside from code generation,
and you can already insert the thousands separator and then replace
"," with "_".
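
A minimal sketch of that workaround, wrapped in a helper whose name is made up
for this example:

```python
def format_with_underscores(value):
    """Group an integer by thousands, using "_" instead of ","."""
    # Format with the existing "," separator, then swap the character.
    return "{:,d}".format(value).replace(",", "_")


print(format_with_underscores(100000000))  # 100_000_000
```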

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Steven D'Aprano

Feb 11, 2016, 5:14:27 AM

to pytho...@python.org

On Wed, Feb 10, 2016 at 08:41:27PM -0800, Andrew Barnert wrote:

> And honestly, are you really claiming that in your opinion, "123_456_"
> is worse than all of their other examples, like "1_23__4"?

Yes I am, because 123_456_ looks like you've forgotten to finish typing
the last group of digits, while 1_23__4 merely looks like you have no
taste.

> They're both presented as something the syntax allows, and neither one
> looks like something I'd ever want to write, much less promote in a
> style guide or something, but neither one screams out as something
> that's so heinous we need to complicate the language to ensure it
> raises a SyntaxError. Yes, that's my opinion, but do you really have a
> different opinion about any part of that?

I don't think the rule "underscores must occur between digits" is
complicating the specification. It is *less* complicated to explain this
rule than to give a whole lot of special cases:

- can you use a leading or trailing underscore?
- can an underscore follow the base prefix 0b 0o 0x?
- can an underscore precede or follow the decimal place?
- can an underscore precede or follow a + or - sign?
- can an underscore precede or follow the e|E exponent symbol?
- can an underscore precede or follow the j suffix for complex numbers?

versus

- underscores can only appear between (hex)digits.

I'm not sure why you seem to think that "only between digits" is more
complex than the alternative -- to me it is less complex, with no
special cases to memorise, just one general rule.

Of course, if (generic) you think that it is a feature to be able to put
underscores before the decimal point, after the E exponent, etc. then
you will dislike my suggested rule. That's okay, but in that case, it is
not because of "simplicity|complexity" but because (generic) you want to
be able to write things which my rule would prohibit.

--
Steve

Serhiy Storchaka

Feb 11, 2016, 5:18:09 AM

to pytho...@python.org

On 11.02.16 00:20, Georg Brandl wrote:
> **Group 1: liberal (like this PEP)**
>

> * D [2]_
> * Perl 5 (although docs say it's more restricted) [3]_
> * Rust [4]_

> * Swift (although textual description says "between digits") [5]_
>
> **Group 2: only between digits, multiple consecutive underscores**
>
> * C# (open proposal for 7.0) [6]_
> * Java [7]_
>
> **Group 3: only between digits, only one underscore**
>
> * Ada [8]_
> * Julia (but not in the exponent part of floats) [9]_
> * Ruby (docs say "anywhere", in reality only between digits) [10]_

C++ is in this group too.

The documentation of Perl explicitly says that Perl is in this group too
(23__500 is not legal). Perhaps there is a bug in the Perl implementation.
And maybe Swift is intended to be in this group.

I think we should follow the majority of languages and use a simple rule:
"only between digits".

I have provided an implementation.

Petr Viktorin

Feb 11, 2016, 5:26:07 AM

to pytho...@python.org

On 02/11/2016 11:07 AM, Nick Coghlan wrote:
> On 11 February 2016 at 19:59, Victor Stinner <victor....@gmail.com> wrote:
>> 2016-02-11 9:11 GMT+01:00 Georg Brandl <g.br...@gmx.net>:
>>> On 02/11/2016 12:04 AM, Victor Stinner wrote:
>>>> It looks like the implementation https://bugs.python.org/issue26331
>>>> only changes the Python parser.
>>>>
>>>> What about other functions converting strings to numbers at runtime
>>>> like int(str) and float(str)? Paul also asked for Decimal(str).
>>>
>>> I added these as "Open Questions" to the PEP.
>>
>> Ok nice. Now another question :-)
>>
>> Would it be useful to add an option to repr(int) and repr(float), or a
>> formatter to int.__format__() and float.__format__() to add an
>> underscore for thousands.
>
> Given that str.format supports a thousands separator:
>
>>>> "{:,d}".format(100000000)
> '100,000,000'
>
> it might be reasonable to permit "_" in place of "," in the format specifier.
>
> However, I'm not sure when you'd use it aside from code generation,
> and you can already insert the thousands separator and then replace
> "," with "_".

It would make "SI style" [0] numbers a little bit more straightforward
to generate, since the order of operations wouldn't matter.
Currently it's:

"{:,}".format(1234.5678).replace(',', ' ').replace('.', ',')

Also it would make numbers with decimal comma and dot as separator a bit
easier to generate. Currently, that's (from PEP 378):

format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")

[0] https://en.wikipedia.org/wiki/Decimal_mark#Examples_of_use
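
To make the ordering problem concrete, here is a small sketch using the two
expressions above; the variable names are only for illustration:

```python
n = 1234.5678
grouped = "{:,}".format(n)  # '1,234.5678'

# Correct order: turn group separators into spaces first, then swap the
# decimal point for a comma.
si_style = grouped.replace(",", " ").replace(".", ",")

# Wrong order: the new decimal comma is indistinguishable from the group
# separators, so the second replace destroys it.
wrong_order = grouped.replace(".", ",").replace(",", " ")

# The PEP 378 idiom needs a temporary placeholder for the same reason.
dot_grouped = (
    format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")
)
```

Here ``si_style`` is ``'1 234,5678'``, ``wrong_order`` is ``'1 234 5678'`` and
``dot_grouped`` is ``'1.234,567800'``.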

Steven D'Aprano

Feb 11, 2016, 5:30:53 AM

to pytho...@python.org

On Thu, Feb 11, 2016 at 08:07:56PM +1000, Nick Coghlan wrote:

> Given that str.format supports a thousands separator:
>
> >>> "{:,d}".format(100000000)
> '100,000,000'
>
> it might be reasonable to permit "_" in place of "," in the format specifier.

+1

> However, I'm not sure when you'd use it aside from code generation,
> and you can already insert the thousands separator and then replace
> "," with "_".

It's not always easy or convenient to call .replace(",", "_") on the
output of format:

"With my help, the {} caught {:,d} ants.".format("aardvark", 100000000)

would need to be re-written as something like:

py> "With my help, the {} caught {} ants.".format("aardvark",
"{:,d}".format(100000000).replace(",", "_"))
'With my help, the aardvark caught 100_000_000 ants.'

--
Steve

Chris Angelico

Feb 11, 2016, 6:13:37 AM

to python-dev

On Thu, Feb 11, 2016 at 7:22 PM, Georg Brandl <g.br...@gmx.net> wrote:
> * Allowing underscores in string arguments to the ``Decimal`` constructor. It
> could be argued that these are akin to literals, since there is no Decimal
> literal available (yet).
>
> * Allowing underscores in string arguments to ``int()`` with base argument 0,
> ``float()`` and ``complex()``.

I'm -0.5 on both of these, with the caveat that if either gets done,
both should be. Decimal() shouldn't be different from int() just
because there's currently no way to express a Decimal literal; if
Python 3.7 introduces such a literal, there'd be this weird rule
difference that has to be maintained for backward compatibility, and
has no justification left.

(As a side point, I would be fully in favour of Decimal literals. I'd
also be in favour of something like "from __future__ import
fraction_literals" so 1/2 would evaluate to Fraction(1,2) rather than
0.5. Hence I'm inclined *not* to support underscores in Decimal().)

ChrisA

Robert Kern

Feb 11, 2016, 6:59:00 AM

to pytho...@python.org

On 2016-02-11 00:08, Greg Ewing wrote:
> The Mersenne Twister is no longer regarded as quite state-of-the art
> because it can get into states that produce long sequences that are
> not very random.
>
> There is a variation on MT called WELL that has better properties
> in this regard. Does anyone think it would be a good idea to replace
> MT with WELL as Python's default rng?
>
> https://en.wikipedia.org/wiki/Well_equidistributed_long-period_linear

There was a side-discussion about this during the secrets module proposal
discussion.

WELL would not be my first choice. It escapes the excess-0 islands faster than
MT, but still suffers from them. More troubling to me is that it is a linear
feedback shift register, like MT, and all LFSRs quickly fail the linear
complexity test in BigCrush.

xorshift* shares some of these flaws, but is significantly stronger and
dominates WELL in most (all?) relevant dimensions.

http://xorshift.di.unimi.it/

I'm favorable to the PCG family these days, though xorshift* and Random123 are
reasonable alternatives.

http://www.pcg-random.org/
https://www.deshawresearch.com/resources_random123.html

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Georg Brandl

Feb 11, 2016, 7:15:56 AM

to pytho...@python.org

On 02/11/2016 11:17 AM, Serhiy Storchaka wrote:

>> **Group 3: only between digits, only one underscore**
>>
>> * Ada [8]_
>> * Julia (but not in the exponent part of floats) [9]_
>> * Ruby (docs say "anywhere", in reality only between digits) [10]_
>
> C++ is in this group too.
>
> The documentation of Perl explicitly says that Perl is in this group too
> (23__500 is not legal). Perhaps there is a bug in Perl implementation.
> And may be Swift is intended to be in this group.
>
> I think we should follow the majority of languages and use simple rule:
> "only between digits".
>
> I have provided an implementation.

Thanks for the alternate patch. I used the two-function approach you took
in ast.c for my latest revision.

I still think that some cases (like two of the examples in the PEP,
0b_1111_0000 and 1.5_j) are worth having, and therefore a more relaxed
rule is preferable.

cheers,
Georg

Barry Warsaw

Feb 11, 2016, 9:52:48 AM

to pytho...@python.org

On Feb 11, 2016, at 09:22 AM, Georg Brandl wrote:

>based on the feedback so far, I revised the PEP. There is now
>a much simpler rule for allowed underscores, with no exceptions.
>This made the grammar simpler as well.

I'd be +1, but there's something missing from the PEP: what the underscores
*mean*. You describe the syntax nicely, but not the semantics.

From reading the examples, I'd guess that the underscores are semantically
transparent, meaning that the resulting value is the same if you just removed
the underscores and interpreted the resulting literal.

Right or wrong, could you please add a paragraph explaining the meaning of the
underscores?
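
The reading guessed at above could be modelled like this. It is only a sketch
of the presumed semantics, with a hypothetical helper name; it says nothing
about how the tokenizer would actually implement it.

```python
import ast


def value_without_underscores(literal_text):
    """Evaluate a numeric literal as if its underscores were not there."""
    # Strip every underscore, then let Python evaluate the plain literal.
    return ast.literal_eval(literal_text.replace("_", ""))
```

With this model, the PEP's ``0xDEAD_BEEF`` example has the same value as
``0xDEADBEEF``, and ``10_000_000.0`` the same value as ``10000000.0``.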

Cheers,
-Barry

Andrew Barnert via Python-Dev

Feb 11, 2016, 11:37:36 AM

to Steven D'Aprano, pytho...@python.org

On Feb 11, 2016, at 02:13, Steven D'Aprano <st...@pearwood.info> wrote:
>
>> On Wed, Feb 10, 2016 at 08:41:27PM -0800, Andrew Barnert wrote:

>> They're both presented as something the syntax allows, and neither one
>> looks like something I'd ever want to write, much less promote in a
>> style guide or something, but neither one screams out as something
>> that's so heinous we need to complicate the language to ensure it
>> raises a SyntaxError. Yes, that's my opinion, but do you really have a
>> different opinion about any part of that?
>
> I don't think the rule "underscores must occur between digits" is
> complicating the specification.

That rule isn't in the specification in the PEP, except as one of the alternatives rejected for being "too restrictive". It's also not the rule you were suggesting in your previous email, where you insisted that you wanted something "more liberal". I also don't understand why you're presenting this whole thing as an argument against my response, which suggested that whatever rule we choose should be simpler than what's in the PEP, when that's also (apparently, now) your position.

> It is *less* complicated to explain this
> rule than to give a whole lot of special cases

Sure. Your rule is about as complicated as the Swift rule, and both are much less complicated than the PEP. I'm fine with either one, because, as I said, the edge cases don't matter to me nearly as much as having a rule that's easy to keep in my head and easy to lex. The only reason I specifically proposed the Swift rule instead of one of the other simple rules is that it seemed the most "liberal", which the PEP was in favor of, and it has precedent in more other languages. But, in favor of your version, almost every language uses some variation of "you can put underscores between digits" as the "tutorial-level" explanation and rationale.

Steve Dower

Feb 11, 2016, 11:53:59 AM

to pytho...@python.org

On 11Feb2016 0651, Barry Warsaw wrote:
> On Feb 11, 2016, at 09:22 AM, Georg Brandl wrote:
>
>> based on the feedback so far, I revised the PEP. There is now
>> a much simpler rule for allowed underscores, with no exceptions.
>> This made the grammar simpler as well.
>
> I'd be +1, but there's something missing from the PEP: what the underscores
> *mean*. You describe the syntax nicely, but not the semantics.
>
> From reading the examples, I'd guess that the underscores are semantically
> transparent, meaning that the resulting value is the same if you just removed
> the underscores and interpreted the resulting literal.
>
> Right or wrong, could you please add a paragraph explaining the meaning of the
> underscores?

Glad I kept reading the thread this far - just pretend I also wrote
exactly the same thing as Barry.

Cheers,
Steve

Andrew Barnert via Python-Dev

Feb 11, 2016, 11:57:27 AM

to Georg Brandl, pytho...@python.org

On Feb 11, 2016, at 00:22, Georg Brandl <g.br...@gmx.net> wrote:

> * Allowing underscores in string arguments to the ``Decimal`` constructor. It
> could be argued that these are akin to literals, since there is no Decimal
> literal available (yet).

I'm +1 on this. Partly for consistency (see below)--but also, one of the use cases for Decimal is when you need more precision than float, meaning you'll often have even more digits to separate.

> * Allowing underscores in string arguments to ``int()`` with base argument 0,
> ``float()`` and ``complex()``.

+1, because these are actually defined in terms of literals. For example, under int, "Base 0 means to interpret exactly as a code literal". This isn't actually quite true, because "-2" is not an integer literal but is accepted here--but see float for an example that *is* rigorously defined, and still defers to literal syntax and semantics.
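
The ``"-2"`` wrinkle can be seen with a short check like the one below (names
are illustrative only): ``int()`` with base 0 accepts the signed string,
whereas in source code ``-2`` parses as a unary minus applied to the literal
``2``.

```python
import ast

# int() with base 0 accepts a leading sign...
parsed_value = int("-2", 0)

# ...but in source code, "-2" is not a single literal: it parses as a
# unary minus applied to the literal 2.
expr = ast.parse("-2", mode="eval").body
is_unary_minus = isinstance(expr, ast.UnaryOp) and isinstance(expr.op, ast.USub)
```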

Georg Brandl

Feb 11, 2016, 11:59:02 AM

to pytho...@python.org

On 02/11/2016 05:52 PM, Steve Dower wrote:
> On 11Feb2016 0651, Barry Warsaw wrote:
>> On Feb 11, 2016, at 09:22 AM, Georg Brandl wrote:
>>
>>> based on the feedback so far, I revised the PEP. There is now
>>> a much simpler rule for allowed underscores, with no exceptions.
>>> This made the grammar simpler as well.
>>
>> I'd be +1, but there's something missing from the PEP: what the underscores
>> *mean*. You describe the syntax nicely, but not the semantics.
>>
>> From reading the examples, I'd guess that the underscores are semantically
>> transparent, meaning that the resulting value is the same if you just removed
>> the underscores and interpreted the resulting literal.
>>
>> Right or wrong, could you please add a paragraph explaining the meaning of the
>> underscores?
>
> Glad I kept reading the thread this far - just pretend I also wrote
> exactly the same thing as Barry.

D'oh :) I added (hopefully) clarifying wording.

Thanks,
Georg

Barry Warsaw

Feb 11, 2016, 12:07:40 PM

to pytho...@python.org

On Feb 11, 2016, at 05:57 PM, Georg Brandl wrote:

>D'oh :) I added (hopefully) clarifying wording.

I saw the diff - perfect! Thanks.

-Barry

Serhiy Storchaka

Feb 11, 2016, 12:20:24 PM

to pytho...@python.org

On 11.02.16 14:14, Georg Brandl wrote:
> On 02/11/2016 11:17 AM, Serhiy Storchaka wrote:
>
>>> **Group 3: only between digits, only one underscore**
>>>
>>> * Ada [8]_
>>> * Julia (but not in the exponent part of floats) [9]_
>>> * Ruby (docs say "anywhere", in reality only between digits) [10]_
>>
>> C++ is in this group too.
>>
>> The documentation of Perl explicitly says that Perl is in this group too
>> (23__500 is not legal). Perhaps there is a bug in Perl implementation.
>> And may be Swift is intended to be in this group.
>>
>> I think we should follow the majority of languages and use simple rule:
>> "only between digits".
>>
>> I have provided an implementation.
>
> Thanks for the alternate patch. I used the two-function approach you took
> in ast.c for my latest revision.
>
> I still think that some cases (like two of the examples in the PEP,
> 0b_1111_0000 and 1.5_j) are worth having, and therefore a more relaxed
> rule is preferable.

Should I write an alternative PEP for the strong rule?

Terry Reedy

Feb 11, 2016, 12:40:27 PM

to pytho...@python.org

On 2/11/2016 2:45 AM, Georg Brandl wrote:

Thanks for grabbing this issue and moving it forward. I will like being
able to write or read 200_000_000 and be sure I am right without
counting 0s.

> Based on the feedback so far, I have an easier rule in mind that I will base
> the next PEP revision on. It's basically
>
> "One or more underscores allowed anywhere after a digit or a base specifier."
>
> This preserves my preferred non-restrictive cases (0b_1111_0000, 1.5_j) and
> disallows more controversial versions like "1.5e_+_2".

I like both choices above. I don't like trailing underscores for two
reasons.

1. The stated purpose of adding '_'s is to visually separate. Trailing
underscores do not do that. They serve no purpose.
2. Trailing _s are used to turn keywords (class) into identifiers
(class_). To me, 123_ mentally clashes with this usage.

If trailing _ is allowed, to simplify the implementation, I would like
PEP 8, while on the subject, to say something like "While trailing _s on
numbers are allowed, to simplify the implementation, they serve no
purpose and are strongly discouraged".

--
Terry Jan Reedy

Georg Brandl

Feb 11, 2016, 12:41:38 PM

to pytho...@python.org

On 02/11/2016 06:19 PM, Serhiy Storchaka wrote:

>> Thanks for the alternate patch. I used the two-function approach you took
>> in ast.c for my latest revision.
>>
>> I still think that some cases (like two of the examples in the PEP,
>> 0b_1111_0000 and 1.5_j) are worth having, and therefore a more relaxed
>> rule is preferable.
>
> Should I write an alternative PEP for strong rule?

That seems excessive for a minor point. Let's collect feedback for
a few days, and we can also collect some informal votes.

In the end, I suspect that Guido will let us know about his preference for
one of the possibilities, and when he does, I will update the PEP accordingly.

cheers,
Georg

Ethan Furman

Feb 11, 2016, 12:43:41 PM

to pytho...@python.org

On 02/11/2016 09:19 AM, Serhiy Storchaka wrote:
> On 11.02.16 14:14, Georg Brandl wrote:

>> I still think that some cases (like two of the examples in the PEP,
>> 0b_1111_0000 and 1.5_j) are worth having, and therefore a more relaxed
>> rule is preferable.
>
> Should I write an alternative PEP for strong rule?

Please don't.

A style guide recommendation which allows for variations when necessary
is much better -- consenting adults, remember?

--
~Ethan~

Brett Cannon

Feb 11, 2016, 1:05:27 PM

to Steven D'Aprano, pytho...@python.org

On Thu, 11 Feb 2016 at 02:13 Steven D'Aprano <st...@pearwood.info> wrote:

> On Wed, Feb 10, 2016 at 08:41:27PM -0800, Andrew Barnert wrote:
>
> > And honestly, are you really claiming that in your opinion, "123_456_"
> > is worse than all of their other examples, like "1_23__4"?
>
> Yes I am, because 123_456_ looks like you've forgotten to finish typing
> the last group of digits, while 1_23__4 merely looks like you have no
> taste.

OK, but the keyword in your sentence is "taste". If we update PEP 8 for our needs to say "Numerical literals should not have multiple underscores in a row or have a trailing underscore" then this is taken care of. We get a dead-simple rule for when underscores can be used, the implementation is simple, and we get to have more tasteful usage in the stdlib w/o forcing our tastes upon everyone or complicating the rules or implementation.

Brett Cannon

Feb 11, 2016, 1:08:15 PM

to Georg Brandl, pytho...@python.org

+1 from me. Nice and simple! And we can always update PEP 8 to disallow any usage that we deem ugly.

Andrew Barnert via Python-Dev

Feb 11, 2016, 1:17:29 PM

to Terry Reedy, pytho...@python.org

On Feb 11, 2016, at 09:39, Terry Reedy <tjr...@udel.edu> wrote:
>
> If trailing _ is allowed, to simplify the implementation, I would like PEP 8, while on the subject, to say something like "While trailing _s on numbers are allowed, to simplify the implementation, they serve no purpose and are strongly discouraged".

That's a good point: we need style rules for PEP 8.

But I think everything that's just obviously pointless (like putting an underscore between every pair of digits, or sprinkling underscores all over a huge number to make ASCII art), or already handled by other guidelines (e.g., using a ton of underscores to "line up a table" is the same as using a ton of spaces, which is already discouraged) doesn't really need to be covered. And I think trailing underscores probably fall into that category.

It might be simpler to write a "whitelist" than a "blacklist" of all the ugly things people might come up with, and then just give a bunch of examples instead of a bunch of rules. Something like this:

While underscores can legally appear anywhere in the digit string, you should never use them for purposes other than visually separating meaningful digit groups like thousands, bytes, and the like.

123456_789012: ok (millions are groups, but thousands are more common, and 6-digit groups are readable, but on the edge)
123_456_789_012: better
123_456_789_012_: bad (trailing)
1_2_3_4_5_6: bad (too many)
1234_5678: ok if code is intended to deal with east-Asian numerals (where 10000 is a standard grouping), bad otherwise
3__141_592_654: ok if this represents a fixed-point fraction (obviously bad otherwise)
123.456_789e123: good
123.456_789e1_23: bad (never useful in exponent)
0x1234_5678: good
0o123_456: good
0x123_456_789: bad (3 hex digits is usually not a meaningful group)

The one case that seems contentious is "123_456_j". Honestly, I don't care which way that goes, and I'd be fine if the PEP left out any mention of it, but if people feel strongly one way or the other, the PEP could just give it as a good or a bad example and that would be enough to clarify the intention.
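
As a rough illustration of the "meaningful digit groups" guideline above, the following sketch reports the size of each underscore-separated group so that a reviewer can see at a glance whether a grouping is regular. The helper name `group_sizes` is hypothetical and not part of any tool discussed in this thread; it inspects the source text of an integer-style literal, not its value.

```python
def group_sizes(literal):
    """Return the lengths of the underscore-separated digit groups.

    Hypothetical review helper: it only looks at integer-style literals
    given as text, and it ignores a leading 0x/0o/0b base prefix.
    """
    text = literal.lower()
    if text[:2] in ("0x", "0o", "0b"):
        text = text[2:]
    # An empty group (length 0) marks a doubled or trailing underscore.
    return [len(group) for group in text.split("_")]


for example in ("123_456_789_012", "123_456_789_012_", "1_2_3_4_5_6",
                "1234_5678", "3__141_592_654", "0x123_456_789"):
    print(example, group_sizes(example))
```

A zero in the result flags a doubled or trailing underscore, and a run of ones flags the "too many" case; whether a given regular size is meaningful is still a judgment call for the reader.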

Serhiy Storchaka

Feb 11, 2016, 1:30:23 PM

to pytho...@python.org

On 11.02.16 19:40, Georg Brandl wrote:
> On 02/11/2016 06:19 PM, Serhiy Storchaka wrote:
>
>>> Thanks for the alternate patch. I used the two-function approach you took
>>> in ast.c for my latest revision.
>>>
>>> I still think that some cases (like two of the examples in the PEP,
>>> 0b_1111_0000 and 1.5_j) are worth having, and therefore a more relaxed
>>> rule is preferable.
>>
>> Should I write an alternative PEP for strong rule?
>
> That seems excessive for a minor point. Let's collect feedback for
> a few days, and we can also collect some informal votes.

I suspect that my arguments can be lost otherwise.

Andrew Barnert via Python-Dev

Feb 11, 2016, 1:32:41 PM

to Andrew Barnert, pytho...@python.org, Terry Reedy

On Feb 11, 2016, at 10:15, Andrew Barnert via Python-Dev <pytho...@python.org> wrote:
>
> That's a good point: we need style rules for PEP 8.

One more point: should the tutorial mention underscores? It looks like the intro docs for a lot of the other languages do. And it would only take one short sentence in 3.1.1 Numbers to say that you can use underscores to make large numbers like 123_456.789_012 more readable.

Jeff Hardy

Feb 11, 2016, 1:36:49 PM

to Andrew Barnert, pytho...@python.org, Terry Reedy

On Thu, Feb 11, 2016 at 10:15 AM, Andrew Barnert via Python-Dev <pytho...@python.org> wrote:

> On Feb 11, 2016, at 09:39, Terry Reedy <tjr...@udel.edu> wrote:
>>
>> If trailing _ is allowed, to simplify the implementation, I would like PEP 8, while on the subject, to say something like "While trailing _s on numbers are allowed, to simplify the implementation, they serve no purpose and are strongly discouraged".
>
> That's a good point: we need style rules for PEP 8.
>
> But I think everything that's just obviously pointless (like putting an underscore between every pair of digits, or sprinkling underscores all over a huge number to make ASCII art), or already handled by other guidelines (e.g., using a ton of underscores to "line up a table" is the same as using a ton of spaces, which is already discouraged) doesn't really need to be covered. And I think trailing underscores probably fall into that category.
>
> It might be simpler to write a "whitelist" than a "blacklist" of all the ugly things people might come up with, and then just give a bunch of examples instead of a bunch of rules. Something like this:
>
> While underscores can legally appear anywhere in the digit string, you should never use them for purposes other than visually separating meaningful digit groups like thousands, bytes, and the like.
>
> 123456_789012: ok (millions are groups, but thousands are more common, and 6-digit groups are readable, but on the edge)
> 123_456_789_012: better
> 123_456_789_012_: bad (trailing)
> 1_2_3_4_5_6: bad (too many)
> 1234_5678: ok if code is intended to deal with east-Asian numerals (where 10000 is a standard grouping), bad otherwise
> 3__141_592_654: ok if this represents a fixed-point fraction (obviously bad otherwise)
> 123.456_789e123: good
> 123.456_789e1_23: bad (never useful in exponent)
> 0x1234_5678: good
> 0o123_456: good
> 0x123_456_789: bad (3 hex digits is usually not a meaningful group)
>
> The one case that seems contentious is "123_456_j". Honestly, I don't care which way that goes, and I'd be fine if the PEP left out any mention of it, but if people feel strongly one way or the other, the PEP could just give it as a good or a bad example and that would be enough to clarify the intention.

I imagine that for whatever "bad" grouping you can suggest, someone, somewhere, has a legitimate reason to use it. Any rule more complex than "Use underscores in numeric literals only when they improve clarity" is unnecessarily prescriptive.

- Jeff

Serhiy Storchaka

Feb 11, 2016, 1:51:15 PM

to pytho...@python.org

On 11.02.16 10:22, Georg Brandl wrote:
> Abstract and Rationale
> ======================
>
> This PEP proposes to extend Python's syntax so that underscores can be used in
> integral, floating-point and complex number literals.
>
> This is a common feature of other modern languages, and can aid readability of
> long literals, or literals whose value should clearly separate into parts, such
> as bytes or words in hexadecimal notation.

I have a strong preference for a stricter and simpler rule, used by most
other languages -- "only between two digits". Main arguments:

1. A simple rule is easier to understand, remember and recognize. I care
not about the complexity of the implementation (there is no large
difference), but about cognitive complexity.

2. Most languages use this rule. It is better to follow an informal
standard than to invent a rule that differs from the rules in every other
language. This will help programmers who use multiple languages.

I have provided an alternative patch and can provide an alternative PEP
if it is needed.

> The production list for integer literals would therefore look like this::
>
> integer: decimalinteger | octinteger | hexinteger | bininteger
> decimalinteger: nonzerodigit (digit | "_")* | "0" ("0" | "_")*
> nonzerodigit: "1"..."9"
> digit: "0"..."9"
> octinteger: "0" ("o" | "O") "_"* octdigit (octdigit | "_")*

octinteger: "0" ("o" | "O") octdigit (["_"] octdigit)*

> hexinteger: "0" ("x" | "X") "_"* hexdigit (hexdigit | "_")*

hexinteger: "0" ("x" | "X") hexdigit (["_"] hexdigit)*

> bininteger: "0" ("b" | "B") "_"* bindigit (bindigit | "_")*

bininteger: "0" ("b" | "B") bindigit (["_"] bindigit)*

> octdigit: "0"..."7"
> hexdigit: digit | "a"..."f" | "A"..."F"
> bindigit: "0" | "1"
>
> For floating-point and complex literals::
>
> floatnumber: pointfloat | exponentfloat
> pointfloat: [intpart] fraction | intpart "."
> exponentfloat: (intpart | pointfloat) exponent
> intpart: digit (digit | "_")*

intpart: digit (["_"] digit)*

> fraction: "." intpart
> exponent: ("e" | "E") ["+" | "-"] intpart
> imagnumber: (floatnumber | intpart) ("j" | "J")

> **Group 1: liberal**
>
> This group is the least homogeneous: the rules vary slightly between languages.
> All of them allow trailing underscores. Some allow underscores after non-digits
> like the ``e`` or the sign in exponents.
>
> * D [2]_
> * Perl 5 (underscores basically allowed anywhere, although docs say it's more
> restricted) [3]_
> * Rust (allows between exponent sign and digits) [4]_

> * Swift (although textual description says "between digits") [5]_
>
> **Group 2: only between digits, multiple consecutive underscores**
>
> * C# (open proposal for 7.0) [6]_
> * Java [7]_
>

> **Group 3: only between digits, only one underscore**
>
> * Ada [8]_
> * Julia (but not in the exponent part of floats) [9]_
> * Ruby (docs say "anywhere", in reality only between digits) [10]_

This classification is misleading. The difference between groups 2 and 3
is less than that between different languages in group 1. To be fair, groups
2 and 3 should be united in one group. C++ should be included in this
group. Perl 5 and Swift should be either included in both groups or
excluded from any group, because they have inconsistencies between the
documentation and the implementation or between different parts of the
documentation.

With correct classification it is obvious which variant is the most popular.
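
To make the difference between the two hexinteger productions above concrete, here is a minimal sketch that transcribes each one into a regular expression and tries a few spellings. The pattern names `LIBERAL_HEX` and `STRICT_HEX` and the helper `accepted` are made up for this illustration; the regexes are a hand translation of the grammar lines quoted in this message, not code taken from either patch.

```python
import re

HEX = "[0-9a-fA-F]"

# PEP draft: "0" ("x" | "X") "_"* hexdigit (hexdigit | "_")*
LIBERAL_HEX = re.compile(r"0[xX]_*" + HEX + r"(?:" + HEX + r"|_)*")

# Alternative: "0" ("x" | "X") hexdigit (["_"] hexdigit)*
STRICT_HEX = re.compile(r"0[xX]" + HEX + r"(?:_?" + HEX + r")*")


def accepted(pattern, text):
    """True if the whole text matches the given production."""
    return pattern.fullmatch(text) is not None


for text in ("0xDEAD_BEEF", "0x_DEAD", "0xDEAD_", "0xDE__AD", "0_xDEAD"):
    print(f"{text:12} liberal={accepted(LIBERAL_HEX, text)} "
          f"strict={accepted(STRICT_HEX, text)}")
```

Both patterns accept `0xDEAD_BEEF` and reject `0_xDEAD`; only the liberal one accepts an underscore right after the prefix, a trailing underscore, or a doubled underscore.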

Ethan Furman

Feb 11, 2016, 2:01:53 PM

to pytho...@python.org

On 02/11/2016 10:50 AM, Serhiy Storchaka wrote:
> I have a strong preference for a stricter and simpler rule, used by
> most other languages -- "only between two digits". Main arguments:

> 2. Most languages use this rule. It is better to follow an informal
> standard than to invent a rule that differs from the rules in every other
> language. This will help programmers who use multiple languages.

If Python followed other languages in everything:

1) Python would not need to exist; and
2) Python would suck ;)

If our rule is more permissive than other languages' rules, then cross-language
developers can still use the same style in both languages, without
penalizing those who want to use the extra freedom in Python.

--
~Ethan~

Glenn Linderman

Feb 11, 2016, 2:10:00 PM

to pytho...@python.org

On 2/11/2016 11:01 AM, Ethan Furman wrote:

> On 02/11/2016 10:50 AM, Serhiy Storchaka wrote:
>> I have a strong preference for a stricter and simpler rule, used by
>> most other languages -- "only between two digits". Main arguments:
>
>> 2. Most languages use this rule. It is better to follow an informal
>> standard than to invent a rule that differs from the rules in every other
>> language. This will help programmers who use multiple languages.
>
> If Python followed other languages in everything:
>
> 1) Python would not need to exist; and
> 2) Python would suck ;)
>
> If our rule is more permissive than other languages' rules, then cross-language developers can still use the same style in both languages, without penalizing those who want to use the extra freedom in Python.

Ditto.

If people need an idea to shoot down, regarding literal constants, and because I couldn't find a Python-Non-Ideas list to post this in, here is one. Note that it is unambiguous, does not conflict with existing binary literals, but otherwise sucks. Please vote this idea down with emphasis:

Base 64 decoding literals:

print( 0b64_CjMy_NTM0_Mjkw_NQ )
325342905
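
For readers wondering where that output comes from, here is a small sketch of what such a literal would have to do, written as an ordinary function. The helper `decode_b64_literal` is hypothetical and exists only to illustrate the joke: it drops the `0b64_` prefix and the underscores, restores the padding that the standard `base64` module expects, and reads the decoded ASCII text as a decimal integer.

```python
import base64


def decode_b64_literal(literal):
    """Decode the tongue-in-cheek 0b64_... spelling into an int.

    Hypothetical helper; no such literal syntax exists in Python.
    """
    prefix = "0b64_"
    if not literal.startswith(prefix):
        raise ValueError("not a 0b64_ literal")
    payload = literal[len(prefix):].replace("_", "")
    payload += "=" * (-len(payload) % 4)  # restore the stripped padding
    text = base64.b64decode(payload).decode("ascii")
    return int(text.strip())  # the decoded text carries a leading newline


print(decode_b64_literal("0b64_CjMy_NTM0_Mjkw_NQ"))
```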

Glenn Linderman

Feb 11, 2016, 2:12:11 PM

to pytho...@python.org

On 2/11/2016 12:22 AM, Georg Brandl wrote:

> Hey all,
>
> based on the feedback so far, I revised the PEP.  There is now
> a much simpler rule for allowed underscores, with no exceptions.
> This made the grammar simpler as well.

+1 overall

> Examples::
>
>     # grouping decimal numbers by thousands
>     amount = 10_000_000.0
>
>     # grouping hexadecimal addresses by words
>     addr = 0xDEAD_BEEF
>
>     # grouping bits into bytes in a binary literal

nybbles, not bytes, are shown... which is more readable, and does group into bytes also.

>     flags = 0b_0011_1111_0100_1110

+1 on 0b_ and 0X_ and, especially, 0O_ (but why anyone would use uppercase base designators is beyond me, as it is definitely less readable)

>     # making the literal suffix stand out more
>     imag = 1.247812376e-15_j

+1 on _j

Steven D'Aprano

Feb 11, 2016, 6:05:22 PM

to pytho...@python.org

On Thu, Feb 11, 2016 at 08:50:09PM +0200, Serhiy Storchaka wrote:

> I have a strong preference for a stricter and simpler rule, used by most
> other languages -- "only between two digits". Main arguments:
>
> 1. A simple rule is easier to understand, remember and recognize. I care
> not about the complexity of the implementation (there is no large
> difference), but about cognitive complexity.
>
> 2. Most languages use this rule. It is better to follow an informal
> standard than to invent a rule that differs from the rules in every other
> language. This will help programmers who use multiple languages.
>
> I have provided an alternative patch and can provide an alternative PEP
> if it is needed.

I don't think an alternative PEP is needed, but I hope that your
alternative gets a fair treatment in the PEP.

To me, Serhiy's versions (starting with single > symbols) are not only
simpler to learn, but have a simpler (or at least shorter)
implementation too.

[...]

> >**Group 3: only between digits, only one underscore**
> >
> >* Ada [8]_
> >* Julia (but not in the exponent part of floats) [9]_
> >* Ruby (docs say "anywhere", in reality only between digits) [10]_
>
> This classification is misleading. The difference between groups 2 and 3
> is less than that between different languages in group 1. To be fair, groups
> 2 and 3 should be united in one group. C++ should be included in this
> group. Perl 5 and Swift should be either included in both groups or
> excluded from any group, because they have inconsistencies between the
> documentation and the implementation or between different parts of the
> documentation.
>
> With correct classification it is obvious which variant is the most popular.

It is not obvious to me what you think the correct classification is.

If you disagree with Georg's classification, would you reclassify the
languages, and if there is agreement that you are correct, he can update
the PEP?

--
Steve

Steven D'Aprano

Feb 11, 2016, 7:19:18 PM

to pytho...@python.org

On Thu, Feb 11, 2016 at 06:03:34PM +0000, Brett Cannon wrote:
> On Thu, 11 Feb 2016 at 02:13 Steven D'Aprano <st...@pearwood.info> wrote:
>
> > On Wed, Feb 10, 2016 at 08:41:27PM -0800, Andrew Barnert wrote:
> >
> > > And honestly, are you really claiming that in your opinion, "123_456_"
> > > is worse than all of their other examples, like "1_23__4"?
> >
> > Yes I am, because 123_456_ looks like you've forgotten to finish typing
> > the last group of digits, while 1_23__4 merely looks like you have no
> > taste.
> >
>
> OK, but the keyword in your sentence is "taste".

I disagree. The key *idea* in my sentence is that the trailing
underscore looks like a programming error. In my opinion, avoiding that
impression is important enough to make trailing underscores a syntax
error.

I've seen a few people vote +1 for things like 123_j and 1.23_e99, but I
haven't seen anyone in favour of trailing underscores. Does anyone think
there is a good case for allowing trailing underscores?

> If we update PEP 8 for our
> needs to say "Numerical literals should not have multiple underscores in a
> row or have a trailing underscore" then this is taken care of. We get a
> dead-simple rule for when underscores can be used, the implementation is
> simple, and we get to have more tasteful usage in the stdlib w/o forcing
> our tastes upon everyone or complicating the rules or implementation.

I think this is a misrepresentation of the alternative. As I see it, we
have two alternatives:

- one or more underscores can appear AFTER the base specifier or any digit;
- one or more underscores can appear BETWEEN two digits.

To describe the second alternative as "complicating the rules" is, I
think, grossly unfair. And if Serhiy's proposal is correct, the
implementation is also no more complicated:

# underscores after digits

octinteger: "0" ("o" | "O") "_"* octdigit (octdigit | "_")*

hexinteger: "0" ("x" | "X") "_"* hexdigit (hexdigit | "_")*

bininteger: "0" ("b" | "B") "_"* bindigit (bindigit | "_")*

# underscores between digits

octinteger: "0" ("o" | "O") octdigit (["_"] octdigit)*

hexinteger: "0" ("x" | "X") hexdigit (["_"] hexdigit)*

bininteger: "0" ("b" | "B") bindigit (["_"] bindigit)*

The idea that the second alternative "forc[es] our tastes on everyone"
while the first does not is bogus. The first alternative also prohibits
things which are a matter of taste:

# prohibited in both alternatives
0_xDEADBEEF
0._1234
1.2e_99
-_1
1j_

I think that there is broad agreement that:

- the basic idea is sound
- leading underscores followed by digits are currently legal
identifiers and this will not change
- underscores should not follow the sign - +
- underscores should not follow the decimal point .
- underscores should not follow the exponent e|E
- underscores will not be permitted inside the exponent (even if
it is harmless, it's silly to write 1.2e9_9)
- underscores should not follow the complex suffix j

and only minor disagreement about:

- whether or not underscores will be allowed after the base
specifier 0x 0o 0b
- whether or not underscores will be allowed before the decimal
point, exponent and complex suffix.

Can we have a show of hands, in favour or against the above two? And
then perhaps Guido can rule on this one way or the other and we can get
back to arguing about more important matters? :-)

In case it isn't obvious, I prefer to say No to allowing underscores
after the base specifier, or before the decimal point, exponent and
complex suffix.
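
The point that a leading underscore already means something else can be checked with the standard `ast` module. The sketch below is only an illustration (the variable names are arbitrary): it shows that `_1` is a valid identifier, that `-_1` parses as unary minus applied to that name, and that `foo._5` parses as attribute access, which is why the lexer cannot treat either spelling as a number.

```python
import ast

# "_1" is an ordinary identifier, so it cannot also be a numeric literal.
print("_1".isidentifier())

# "-_1" is unary minus applied to the name "_1".
node = ast.parse("-_1", mode="eval").body
print(type(node).__name__, type(node.op).__name__, node.operand.id)

# "foo._5" is attribute access, not "foo" followed by the float ".5".
attr = ast.parse("foo._5", mode="eval").body
print(type(attr).__name__, attr.attr)
```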

Martin Panter

Feb 11, 2016, 7:59:58 PM

to python-dev

On 11 February 2016 at 11:12, Chris Angelico <ros...@gmail.com> wrote:
> On Thu, Feb 11, 2016 at 7:22 PM, Georg Brandl <g.br...@gmx.net> wrote:

The following extensions are open for discussion:

>> * Allowing underscores in string arguments to the ``Decimal`` constructor. It
>> could be argued that these are akin to literals, since there is no Decimal
>> literal available (yet).
>>
>> * Allowing underscores in string arguments to ``int()`` with base argument 0,
>> ``float()`` and ``complex()``.
>
> I'm -0.5 on both of these, with the caveat that if either gets done,
> both should be. Decimal() shouldn't be different from int() just
> because there's currently no way to express a Decimal literal; if
> Python 3.7 introduces such a literal, there'd be this weird rule
> difference that has to be maintained for backward compatibility, and
> has no justification left.

I would be weakly in favour of all relevant constructors being updated
to match the new syntax. The main reason is just consistency, and that
the documentation already kind of guarantees that the literal syntax
is supported (definitely for int and float; for complex it is too
vague).

To be consistent, the following minor extensions of the syntax should
be allowed, which are not legal Python literals: int("0_001"),
int("J_00", 20), float("0_001"), complex("0_001").

Maybe also with non-ASCII digits. However, I tried writing Arabic-Indic
digits (U+0660 etc) and my web browser split the number apart when I
inserted an underscore. Maybe a right-to-left thing. But using
Devanagari digits U+0966, U+0967: int("१_०००") (= 1_000). Non-ASCII
digits are apparently intentionally supported, but not documented:
<https://bugs.python.org/issue10581>.

> (As a side point, I would be fully in favour of Decimal literals. I'd
> also be in favour of something like "from __future__ import
> fraction_literals" so 1/2 would evaluate to Fraction(1,2) rather than
> 0.5. Hence I'm inclined *not* to support underscores in Decimal().)

Seems more like an argument to have the support in Decimal()
consistent with float() etc, i.e. all or nothing.
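
For reference, this is roughly what the constructor extension looks like in use. The sketch assumes an interpreter whose string constructors accept underscores between digits (CPython 3.6 and later is assumed here); on older versions every call below raises `ValueError`. The helper `rejected` is hypothetical and only wraps the error check.

```python
from decimal import Decimal

# Strings that are not legal literals but follow the "between digits" idea.
print(int("0_001"))        # leading zeros are fine for int() in base 10
print(int("J_00", 20))     # "J" is digit 19 in base 20, so 19 * 20**2
print(float("0_001"))
print(Decimal("1_000"))


def rejected(text):
    """True if int() refuses the string (hypothetical convenience helper)."""
    try:
        int(text)
    except ValueError:
        return True
    return False


# Trailing and doubled underscores are refused under the same assumption.
print(rejected("1_000_"), rejected("1__000"))
```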

Martin Panter

Feb 11, 2016, 8:30:32 PM

to Steven D'Aprano, python-dev

> - one or more underscores can appear AFTER the base specifier or any digit;

+1

> - one or more underscores can appear BETWEEN two digits.

-0

Having underscores between digits is the main usage, but I don’t see
much harm in the more liberal version, unless that makes the
specification or implementation too complex. Allowing stuff like
0x_100, 4.7_e3, and 1_j seems of slightly more benefit IMO than
disallowing 1_000_.

> To describe the second alternative as "complicating the rules" is, I
> think, grossly unfair. And if Serhiy's proposal is correct, the
> implementation is also no more complicated:
>
> # underscores after digits
> octinteger: "0" ("o" | "O") "_"* octdigit (octdigit | "_")*
> hexinteger: "0" ("x" | "X") "_"* hexdigit (hexdigit | "_")*
> bininteger: "0" ("b" | "B") "_"* bindigit (bindigit | "_")*
>
> # underscores between digits
> octinteger: "0" ("o" | "O") octdigit (["_"] octdigit)*
> hexinteger: "0" ("x" | "X") hexdigit (["_"] hexdigit)*
> bininteger: "0" ("b" | "B") bindigit (["_"] bindigit)*
>
>
> The idea that the second alternative "forc[es] our tastes on everyone"
> while the first does not is bogus. The first alternative also prohibits
> things which are a matter of taste:
>
> # prohibited in both alternatives
> 0_xDEADBEEF
> 0._1234
> 1.2e_99
> -_1

This one is already a valid variable identifier name.

> 1j_
>
>
> I think that there is broad agreement that:
>
> - the basic idea is sound
> - leading underscores followed by digits are currently legal
> identifiers and this will not change

+1 to both

> - underscores should not follow the sign - +
> - underscores should not follow the decimal point .
> - underscores should not follow the exponent e|E

No strong opinion on these from me

> - underscores will not be permitted inside the exponent (even if
> it is harmless, it's silly to write 1.2e9_9)

-0, it seems like a needless inconsistency, unless it somehow hurts
the implementation

> - underscores should not follow the complex suffix j

No opinion

> and only minor disagreement about:
>
> - whether or not underscores will be allowed after the base
> specifier 0x 0o 0b

+0

> - whether or not underscores will be allowed before the decimal
> point, exponent and complex suffix.

No opinion about directly before decimal point; +0 before exponent or
imaginary (complex) suffix.

> Can we have a show of hands, in favour or against the above two? And
> then perhaps Guido can rule on this one way or the other and we can get
> back to arguing about more important matters? :-)
>
> In case it isn't obvious, I prefer to say No to allowing underscores
> after the base specifier, or before the decimal point, exponent and
> complex suffix.

Andrew Barnert via Python-Dev

Feb 11, 2016, 8:40:12 PM

to Jeff Hardy, pytho...@python.org, Terry Reedy

On Thursday, February 11, 2016 10:35 AM, Jeff Hardy <jdh...@gmail.com> wrote:

>On Thu, Feb 11, 2016 at 10:15 AM, Andrew Barnert via Python-Dev <pytho...@python.org> wrote:
>
>>That's a good point: we need style rules for PEP 8.

...

>>It might be simpler to write a "whitelist" than a "blacklist" of all the ugly things people might come up with, and then just give a bunch of examples instead of a bunch of rules. Something like this:
>>
>>While underscores can legally appear anywhere in the digit string, you should never use them for purposes other than visually separating meaningful digit groups like thousands, bytes, and the like.
>>
>> 123456_789012: ok (millions are groups, but thousands are more common, and 6-digit groups are readable, but on the edge)
>> 123_456_789_012: better
>> 123_456_789_012_: bad (trailing)
>> 1_2_3_4_5_6: bad (too many)
>> 1234_5678: ok if code is intended to deal with east-Asian numerals (where 10000 is a standard grouping), bad otherwise
>> 3__141_592_654: ok if this represents a fixed-point fraction (obviously bad otherwise)
>> 123.456_789e123: good
>> 123.456_789e1_23: bad (never useful in exponent)
>> 0x1234_5678: good
>> 0o123_456: good
>> 0x123_456_789: bad (3 hex digits is usually not a meaningful group)
>

>I imagine that for whatever "bad" grouping you can suggest, someone, somewhere, has a legitimate reason to use it.

That's exactly why we should just have bad examples in the style guide, rather than coming up with style rules that try to strongly discourage them (or making them syntax errors).

>Any rule more complex than "Use underscores in numeric literals only when they improve clarity" is unnecessarily prescriptive.

Your rule doesn't need to be stated at all. It's already a given that you shouldn't add semantically-meaningless characters anywhere unless they improve clarity....

I don't think saying that they're for "visually separating meaningful digit groups like thousands, bytes, and the like" is unnecessarily prescriptive. If someone comes up with a legitimate use for something we've never anticipated, it will almost certainly just be a way of grouping digits that's meaningful in a way we didn't anticipate. And, if not, it's just a style guideline, so it doesn't have to apply 100% of the time. If someone really comes up with something that has nothing to do with grouping digits, all the style guideline will do is make them stop and think about whether it really is a good use of underscores--and, if it is, they'll go ahead and do it.

Glenn Linderman

Feb 11, 2016, 9:03:45 PM

to pytho...@python.org

# underscores after digits
octinteger: "0" ("o" | "O") (octdigit | "_")*
hexinteger: "0" ("x" | "X") (hexdigit | "_")*
bininteger: "0" ("b" | "B") (bindigit | "_")*

An extra side effect is that there are more ways to write zero. 0x, 0b, 0o, 0X, 0B, 0O, 0x_, 0b_, 0o_, etc.
But most people write 0 anyway, so those would be bad style, anyway, but it makes the implementation simpler.

+1 to allow underscores after the base specifier.

> - whether or not underscores will be allowed before the decimal
>   point, exponent and complex suffix.

+1 to allow them. There may be cases where they are useful, and if it is not useful, it would not be used. I really liked someone's style guide proposal: use of underscore within numeric constants should only be done to aid readability. However, pre-judging what aids readability to one person's particular taste is inappropriate.

> Can we have a show of hands, in favour or against the above two? And
> then perhaps Guido can rule on this one way or the other and we can get
> back to arguing about more important matters? :-)
>
> In case it isn't obvious, I prefer to say No to allowing underscores
> after the base specifier, or before the decimal point, exponent and
> complex suffix.

I think it was obvious :) And I think we disagree. And yes, there are more important matters. But it was just a couple days ago when I wrote a big constant in some new code that I was thinking how nice it would be if I could put a delimiter in there... so I'll be glad for the feature when it is available.

Stephen J. Turnbull

Feb 11, 2016, 10:18:33 PM

to Serhiy Storchaka, pytho...@python.org

Serhiy Storchaka writes:

> I suspect that my arguments can be lost [without a competing PEP].

Send Georg a patch for his PEP, that's where they belong, since only
one of the two PEPs could be approved, and they would be 95% the same
otherwise. If he doesn't apply it (he's allowed to move it to the
"rejected arguments" section, though), or the decision silently goes
against you, speak up then -- that would be a problem IMO.

Or you could offer to BD1P! (If you're selected, I hope you change
your mind! :-)

Stephen J. Turnbull

Feb 11, 2016, 10:20:16 PM

to Steven D'Aprano, pytho...@python.org

Steven D'Aprano writes:

> Peters has an opinion?) but if we do change, I'd like to see the
> existing random.Random moved to random.MT_Random for backwards
> compatibility and compatibility with other software which uses MT. Not
> necessarily saying that we have to keep it around forever (after all, we
> did dump the Wichmann-Hill PRNG some time ago) but we ought to keep it
> for at least a couple of releases.

I think we should keep it around forever. Even my slowest colleagues
are learning that they should record their seeds and PRNG algorithms
for reproducibility's sake. :-) For that matter, restore Wichmann-Hill.
Both should be clearly marked as "use only for reproducing previous
bitstreams" (eg, in a package random.deprecated_generators).
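
A minimal sketch of the "record your seed" practice mentioned above, using only the standard `random` module. The seed value and the function name `simulate` are placeholders invented for this example; the point is that a private `random.Random` instance built from a recorded seed replays the same stream, as long as the generator algorithm behind it stays available.

```python
import random

SEED = 20160211  # hypothetical seed, recorded alongside the published results


def simulate(seed, draws=5):
    """Draw a few values from a private generator seeded with `seed`."""
    rng = random.Random(seed)  # independent of the module-level generator
    return [rng.random() for _ in range(draws)]


first_run = simulate(SEED)
second_run = simulate(SEED)
print(first_run == second_run)
```

The comparison only holds within one generator algorithm, which is the argument for keeping old generators importable.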

David Mertz

Feb 11, 2016, 10:57:46 PM

to Glenn Linderman, Python-Dev

Great PEP overall. We definitely don't want the restriction to grouping numbers only in threes. South Asian crore use grouping in twos.

https://en.m.wikipedia.org/wiki/Crore

_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev

Unsubscribe: https://mail.python.org/mailman/options/python-dev/mertz%40gnosis.cx

Glenn Linderman

Feb 11, 2016, 11:10:28 PM

to David Mertz, Python-Dev

On 2/11/2016 7:56 PM, David Mertz wrote:

> Great PEP overall. We definitely don't want the restriction to grouping numbers only in threes. South Asian crore use grouping in twos.
>
> https://en.m.wikipedia.org/wiki/Crore

Interesting... 3 digits in the least significant group, and _then_ by twos. Wouldn't have predicted that one! Never bumped into that notation before!
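
The grouping described here (three digits in the least significant group, then groups of two) can be sketched as a small formatter. The function name `group_indian` is hypothetical; it produces the underscore spelling of an integer in that style and is meant only to show what such literals would look like.

```python
def group_indian(n):
    """Format an int with a 3-digit last group and 2-digit groups before it."""
    digits = str(abs(n))
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while head:
        groups.append(head[-2:])  # peel off two digits at a time
        head = head[:-2]
    groups.reverse()
    groups.append(tail)
    sign = "-" if n < 0 else ""
    return sign + "_".join(groups)


for value in (999, 100000, 10000000):
    grouped = group_indian(value)
    # Removing the underscores recovers the original number.
    print(value, grouped, int(grouped.replace("_", "")) == value)
```

One lakh comes out as `1_00_000` and one crore as `1_00_00_000`.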

Andrew Barnert via Python-Dev

Feb 11, 2016, 11:14:12 PM

to Stephen J. Turnbull, Steven D'Aprano, pytho...@python.org

On Thursday, February 11, 2016 7:20 PM, Stephen J. Turnbull <ste...@xemacs.org> wrote:

> I think we should keep it around forever. Even my slowest colleagues
> are learning that they should record their seeds and PRNG algorithms
> for reproducibility's sake. :-)

+1

> For that matter, restore Wichmann-Hill.

So you can write code that works on 2.3 and 3.6, but not 3.5?

I agree that it shouldn't have gone away, but I think it may be too late for adding it back to help too much.

> Both should be clearly marked as "use only for reproducing previous
> bitstreams" (eg, in a package random.deprecated_generators).

I like the random.deprecated_generators idea.

Tim Peters

Feb 11, 2016, 11:16:15 PM

to Greg Ewing, Python Dev

[Greg Ewing <greg....@canterbury.ac.nz>]
> The Mersenne Twister is no longer regarded as quite state-of-the art
> because it can get into states that produce long sequences that are
> not very random.
>
> There is a variation on MT called WELL that has better properties
> in this regard. Does anyone think it would be a good idea to replace
> MT with WELL as Python's default rng?

I don't think so, because I've seen no groundswell of discontent about
the Twister among Python users. Perhaps I'm missing some? Changes
are disruptive and people argue about RNGs with religious zeal, so I
favor making a change in this area only when it's compelling. It was
compelling to move away from Wichmann-Hill when the Twister was
introduced: WH was waaaaaay behind the state of the art at the time,
its limitations were causing real problems, and there was
near-universal adoption of the Twister around the world. The Twister
was a game changer.

When the time comes for a change, I'd be more inclined to (as Robert
Kern already said) look at PCG and Random123. Like the Twister, WELL
requires massive internal state, and fails the same kinds of
randomness tests (while the suggested alternatives fail none to
date). WELL does escape "zeroland" faster, but still much slower than
PCG or Random123 (which appear to have no systematic attractors). The
alternatives require much smaller state, and at least PCG much simpler
code.

Note that the seeding function used by Python doesn't take the
user-supplied seed as-is (only __setstate__ does): it runs rounds of
pseudo-random bit dispersion, to make it highly unlikely that an
initial state with lots of zeroes is produced. While the Twister
escapes zeroland very slowly, the flip side is that it also
transitions _to_ zeroland very slowly. It's quite possible that
nobody has ever fallen into such a state (short of contriving to via
__setstate__). Falling into zeroland was a very real problem in the
Twister's very early days, which is why its authors added the
bit-dispersal code to the seeding function. Python was wise to wait
until they did.

It's prudent to wait for someone else to find the early surprises in
PCG and Random123 too ;-)
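
The seeding behaviour described above can be observed from Python itself. The sketch below assumes CPython's layout for the tuple returned by `getstate()` (a version number, a tuple of 624 state words plus a position index, and a cached Gaussian value); that layout is an implementation detail and is used here only for illustration.

```python
import random

rng = random.Random(0)  # a seed that is "all zeroes" as an integer
version, internal, gauss_next = rng.getstate()

words = internal[:-1]  # assumed layout: 624 state words, then the index
zero_words = sum(1 for w in words if w == 0)
print(len(words), zero_words)  # seeding disperses bits, so few or no zeros

# By contrast, setstate() installs exactly the state it is given.
saved = rng.getstate()
expected = [rng.random() for _ in range(3)]
rng.setstate(saved)
replayed = [rng.random() for _ in range(3)]
print(replayed == expected)
```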

Andrew Barnert via Python-Dev

Feb 11, 2016, 11:24:35 PM

to Glenn Linderman, David Mertz, Python-Dev

The first time I used underscore separators in any language, it was a test script for a server that wanted social security numbers as integers instead of strings, like 123_45_6789.[^1]

Which is why I suggested the style guideline should just say "meaningful grouping of digits", rather than try to predict what counts as "meaningful" for every program.

[^1] Of course in Python, it's usually trivial to stick a shim in between the database and the model thingy so I could just pass in "123-45-6789", so I don't expect to ever need this specific example.
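
The shim mentioned in the footnote could be as small as the following sketch. The function name `ssn_to_int` and the strict AAA-GG-SSSS shape check are assumptions made for this example; note that an integer form cannot preserve leading zeros, which is one reason strings are usually the better representation.

```python
def ssn_to_int(text):
    """Convert 'AAA-GG-SSSS' text to the integer form a server might expect."""
    parts = text.split("-")
    if [len(part) for part in parts] != [3, 2, 4]:
        raise ValueError("expected AAA-GG-SSSS")
    if not all(part.isdigit() for part in parts):
        raise ValueError("expected digits only")
    return int("".join(parts))


print(ssn_to_int("123-45-6789"))
```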

Glenn Linderman

Feb 11, 2016, 11:29:14 PM

to Python-Dev

On 2/11/2016 8:22 PM, Andrew Barnert wrote:

On Thursday, February 11, 2016 8:10 PM, Glenn Linderman <v+py...@g.nevcal.com> wrote:

On 2/11/2016 7:56 PM, David Mertz wrote:

Great PEP overall. We definitely don't want the restriction to grouping numbers only in threes. South Asian crore use grouping in twos.

https://en.m.wikipedia.org/wiki/Crore

Interesting... 3 digits in the least significant group, and _then_ by twos. Wouldn't have predicted that one! Never bumped into that notation before!


The first time I used underscore separators in any language, it was a test script for a server that wanted social security numbers as integers instead of strings, like 123_45_6789.[^1] 

Which is why I suggested the style guideline should just say "meaningful grouping of digits", rather than try to predict what counts as "meaningful" for every program.


[^1] Of course in Python, it's usually trivial to stick a shim in between the database and the model thingy so I could just pass in "123-45-6789", so I don't expect to ever need this specific example.

Yes, I had thought of the Social Security Number possibility also, although having them as constants in a program seems a bit unusual. Test script, fake numbers, yeah, I guess so.

Chris Angelico

Feb 11, 2016, 11:46:34 PM2/11/16

to pytho...@python.org

On Fri, Feb 12, 2016 at 3:12 PM, Andrew Barnert via Python-Dev
<pytho...@python.org> wrote:
> On Thursday, February 11, 2016 7:20 PM, Stephen J. Turnbull <ste...@xemacs.org> wrote:
>
>
>
>> I think we should keep it around forever. Even my slowest colleagues
>> are learning that they should record their seeds and PRNG algorithms
>> for reproducibility's sake. :-)
>
> +1
>
>> For that matter, restore Wichmann-Hill.
>
> So you can write code that works on 2.3 and 3.6, but not 3.5?
>
> I agree that it shouldn't have gone away, but I think it may be too late for adding it back to help too much.

You're probably right, but the point isn't to make the same code run,
necessarily. It's to make things verifiable. Suppose I do some
scientific research that involves a pseudo-random number component,
and I publish my results ("Monte Carlo analysis produced these
results, blah blah, using this seed, etc, etc"). If you want to come
back later and say "I think there was a bug in your code", you need to
be able to generate the exact same PRNG sequence. I published my
algorithm and my seed, so you should in theory be able to recreate
that sequence; but if you have to reimplement the same algorithm,
that's a lot of unnecessary work that could have been replaced with
"from random.deprecated_generators import WichmannHill as Random".
(Plus there's the whole question of "was your reimplemented PRNG
buggy" - or, for that matter, "was the original PRNG buggy". Using the
exact same code eliminates even that.)
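
As a minimal sketch of the "publish the algorithm and the seed" workflow: the seed value and the function name `estimate_pi` below are hypothetical, and the toy Monte Carlo run merely stands in for real research code. It only shows that a dedicated, explicitly seeded generator replays the same sequence when the same generator implementation is used.

```python
import random

PUBLISHED_SEED = 20160211  # hypothetical seed recorded alongside the results


def estimate_pi(seed, samples=10000):
    """Toy Monte Carlo run driven by its own, explicitly seeded generator."""
    rng = random.Random(seed)
    inside = sum(
        1
        for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / samples


original = estimate_pi(PUBLISHED_SEED)
rerun = estimate_pi(PUBLISHED_SEED)
print(original, rerun)
```

Using a private `random.Random(seed)` instance rather than the module-level functions keeps the run independent of whatever else in the program draws random numbers.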

So I'm +1 on keeping Mersenne Twister even after it's been replaced as
the default PRNG, -0 on reinstating something that hasn't been used in
well over a decade, and -1 on replacing MT today - I'm not seeing
strong arguments in favour of changing.

ChrisA

Paul Moore

Feb 12, 2016, 4:01:16 AM2/12/16

to Steven D'Aprano, Python Dev

On 12 February 2016 at 00:16, Steven D'Aprano <st...@pearwood.info> wrote:
> I think that there is broad agreement that:
>
> - the basic idea is sound
> - leading underscores followed by digits are currently legal
> identifiers and this will not change
> - underscores should not follow the sign - +
> - underscores should not follow the decimal point .
> - underscores should not follow the exponent e|E
> - underscores will not be permitted inside the exponent (even if
> it is harmless, it's silly to write 1.2e9_9)
> - underscores should not follow the complex suffix j
>
> and only minor disagreement about:
>
> - whether or not underscores will be allowed after the base
> specifier 0x 0o 0b
> - whether or not underscores will be allowed before the decimal
> point, exponent and complex suffix.
>
> Can we have a show of hands, in favour or against the above two? And
> then perhaps Guido can rule on this one way or the other and we can get
> back to arguing about more important matters? :-)
>
> In case it isn't obvious, I prefer to say No to allowing underscores
> after the base specifier, or before the decimal point, exponent and
> complex suffix.

I have no opinion on anything other than that whatever syntax is
implemented as long as it allows single underscores between digits,
such as

1_000_000

Everything else is irrelevant to me, and if I read code that uses
anything else, I'd judge it based on readability and style, and
wouldn't care about arguments that "it's allowed by the grammar".
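
For anyone who wants to check which of the contested forms a given interpreter actually accepts, here is a small sketch. The helper name `accepts_literal` is made up, and the candidate list is just the set of examples from this discussion. Results depend on the Python version: underscores in literals need 3.6 or later, where, as far as I know, only the first two candidates compile.

```python
def accepts_literal(source):
    """Return True if this interpreter compiles `source` as an expression."""
    try:
        compile(source, "<literal>", "eval")
    except SyntaxError:
        return False
    return True


candidates = [
    "1_000_000",  # single underscores between digits
    "0x_FF",      # underscore right after the base specifier
    "1__000",     # double underscore
    "1_000_",     # trailing underscore
    "1_.5",       # underscore before the decimal point
    "1._5",       # underscore after the decimal point
    "1e_5",       # underscore after the exponent marker
    "1_j",        # underscore before the complex suffix
]

for text in candidates:
    verdict = "accepted" if accepts_literal(text) else "rejected"
    print("{:>10}  {}".format(text, verdict))
```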

Paul

Robert Kern

Feb 12, 2016, 6:29:07 AM2/12/16

to pytho...@python.org

On 2016-02-12 04:15, Tim Peters wrote:
> [Greg Ewing <greg....@canterbury.ac.nz>]
>> The Mersenne Twister is no longer regarded as quite state-of-the art
>> because it can get into states that produce long sequences that are
>> not very random.
>>
>> There is a variation on MT called WELL that has better properties
>> in this regard. Does anyone think it would be a good idea to replace
>> MT with WELL as Python's default rng?
>
> I don't think so, because I've seen no groundswell of discontent about
> the Twister among Python users. Perhaps I'm missing some?

Well me, but I'm mostly focused on numpy's PRNG, which is proceeding apace.

https://github.com/bashtage/ng-numpy-randomstate

While I am concerned about MT's BigCrush failures, what makes me most
discontented is not having multiple guaranteed-independent streams.

> It's prudent to wait for someone else to find the early surprises in
> PCG and Random123 too ;-)

Quite so!

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Chris Barker

Feb 12, 2016, 3:07:45 PM2/12/16

to Paul Moore, Python Dev

On Fri, Feb 12, 2016 at 1:00 AM, Paul Moore <p.f....@gmail.com> wrote:

I have no opinion on anything other than that whatever syntax is
implemented as long as it allows single underscores between digits,
such as

1_000_000

Everything else is irrelevant to me, and if I read code that uses
anything else, I'd judge it based on readability and style, and
wouldn't care about arguments that "it's allowed by the grammar".

I totally agree -- and it's clear that other cultures group digits differently, so we should allow that, but while I'll live with it either way, I'd rather have it be as restrictive as possible rather than as unrestricted as possible. As in:

no double underscores

no underscore right before or after a period

no underscore at the beginning or end.

....

As Paul said, as long as I can do the above, I'll be fine, but I think everyone's source code will be a lot cleaner in the long run if you don't have the option of doing who knows what weird arrangement....
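
To make the restrictive variant concrete, here is a sketch of those three rules as a regular expression over plain decimal literals only (no exponents, base prefixes, or complex suffixes). The names `RESTRICTIVE_DECIMAL` and `is_restrictive_decimal` are hypothetical; this illustrates the rule and is not the grammar from the PEP.

```python
import re

# Digits with single underscores strictly between digits, optionally
# followed by a fractional part that obeys the same rule.
RESTRICTIVE_DECIMAL = re.compile(
    r"[0-9]+(?:_[0-9]+)*(?:\.[0-9]+(?:_[0-9]+)*)?"
)


def is_restrictive_decimal(text):
    """True if `text` is a decimal literal under the restrictive rules."""
    return RESTRICTIVE_DECIMAL.fullmatch(text) is not None


for text in ["1_000_000", "1_00_00_000", "1__000", "1_.5", "1._5", "_1", "1_"]:
    print("{:>12}  {}".format(text, is_restrictive_decimal(text)))
```

Note that the pattern does not constrain group sizes, so groupings other than thousands still pass.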

As for the SS# example -- it seems a bad idea to me to store a SS# number as an integer anyway -- so all the weird IDs etc. formats aren't really relevant...

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115   (206) 526-6317   main reception

Chris....@noaa.gov

MRAB

Feb 12, 2016, 3:40:30 PM2/12/16

to pytho...@python.org

On 2016-02-12 20:06, Chris Barker wrote:
> On Fri, Feb 12, 2016 at 1:00 AM, Paul Moore <p.f....@gmail.com
> <mailto:p.f....@gmail.com>> wrote:
>
>
> I have no opinion on anything other than that whatever syntax is
> implemented as long as it allows single underscores between digits,
> such as
>
> 1_000_000
>
> Everything else is irrelevant to me, and if I read code that uses
> anything else, I'd judge it based on readability and style, and
> wouldn't care about arguments that "it's allowed by the grammar".
>
>
> I totally agree -- and it's clear that other cultures group digits
> differently, so we should allow that, but while I'll live with it either
> way, I'd rather have it be as restrictive as possible rather than as
> unrestricted as possible. As in:
>
> no double underscores
> no underscore right before or after a period
> no underscore at the beginning or end.
> ....
>
> As Paul said, as long as I can do the above, I'll be fine, but I think
> everyone's source code will be a lot cleaner in the long run if you
> don't have the option of doing who knows what weird arrangement....
>
> As for the SS# example -- it seems a bad idea to me to store a SS#
> number as an integer anyway -- so all the weird IDs etc. formats aren't
> really relevant...
>

That also applies to telephone numbers, account numbers, etc. They
aren't really numbers (you wouldn't do arithmetic on them) and might
have leading zeros.

Glenn Linderman

Feb 12, 2016, 3:59:21 PM2/12/16

to pytho...@python.org

On 2/12/2016 12:06 PM, Chris Barker wrote:

On Fri, Feb 12, 2016 at 1:00 AM, Paul Moore <p.f....@gmail.com> wrote:

I have no opinion on anything other than that whatever syntax is
implemented as long as it allows single underscores between digits,
such as

1_000_000

Everything else is irrelevant to me, and if I read code that uses
anything else, I'd judge it based on readability and style, and
wouldn't care about arguments that "it's allowed by the grammar".

I totally agree -- and it's clear that other cultures group digits differently, so we should allow that, but while I'll live with it either way, I'd rather have it be as restrictive as possible rather than as unrestricted as possible. As in:

no double underscores

Useful for really long binary constants... one _ for nybble or field divisions, two __ for byte divisions.

Of course, really long binary constants might be a bad idea.
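
A sketch of that two-level convention, done on strings rather than literals so it does not depend on how the grammar question gets settled. The helper names `format_grouped_binary` and `parse_grouped_binary` are made up for illustration.

```python
def format_grouped_binary(value, bits=16):
    """Render `value` with `_` between nybbles and `__` between bytes."""
    if bits <= 0 or bits % 8:
        raise ValueError("bits must be a positive multiple of 8")
    if value < 0 or value >= 1 << bits:
        raise ValueError("value does not fit in the requested width")
    digits = format(value, "0{}b".format(bits))
    nybbles = [digits[i:i + 4] for i in range(0, bits, 4)]
    byte_groups = ["_".join(nybbles[i:i + 2]) for i in range(0, len(nybbles), 2)]
    return "__".join(byte_groups)


def parse_grouped_binary(text):
    """Parse the grouped string back to an int, ignoring the separators."""
    digits = text.replace("_", "")
    if not digits or set(digits) - {"0", "1"}:
        raise ValueError("not a grouped binary string: {!r}".format(text))
    return int(digits, 2)


flags_text = format_grouped_binary(0x3F4E)
print(flags_text)
print(hex(parse_grouped_binary(flags_text)))
```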

no underscore right before or after a period

no underscore at the beginning or end.

You get your wish for the beginning... it would be ambiguous with identifiers. And your style guide can include whatever restrictions you like, for your code.

....

As Paul said, as long as I can do the above, I'll be fine, but I think everyone's source code will be a lot cleaner in the long run if you don't have the option of doing who knows what weird arrangement....

As for the SS# example -- it seems a bad idea to me to store a SS# number as an integer anyway -- so all the weird IDs etc. formats aren't really relevant...

SS#... why not integer? Phone#... why not integer? There's a lot of nice digit-division conventions for phone#s in different parts of the world.

The only ambiguity is if such numbers have leading zeros, you have to "know" (or record) how many total digits are expected.
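
A minimal sketch of what "knowing the total digits" means in practice, using the SSN case. The constant `SSN_DIGITS` and the function `format_ssn` are hypothetical names; the point is only that the width has to be supplied from outside the integer.

```python
SSN_DIGITS = 9  # the expected width is not recoverable from the int itself


def format_ssn(number):
    """Restore leading zeros and the 3-2-4 grouping from an integer."""
    if number < 0 or number >= 10 ** SSN_DIGITS:
        raise ValueError("expected at most {} digits".format(SSN_DIGITS))
    digits = "{:0{width}d}".format(number, width=SSN_DIGITS)
    return "{}-{}-{}".format(digits[:3], digits[3:5], digits[5:])


print(format_ssn(1234567))    # leading zeros come back only via the known width
print(format_ssn(123456789))
```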

Andrew Barnert via Python-Dev

Feb 12, 2016, 4:33:20 PM2/12/16

to Glenn Linderman, pytho...@python.org

I'm the one who brought up the SSN example--and, as I said at the time, I almost certainly wouldn't have done that in Python. I was maintaining tests for a service that stored SSNs as integers (which I think is a mistake, but I couldn't change it), an automatically generated, strongly typed interface to that service (which is good), and no easy way to wrap or hook that interface (which is bad). In Python, it's hard to imagine how I'd end up with a situation where I couldn't wrap or hook the interface and treat SSNs as strings in my test code. (In fact, for complicated tests, I did exactly that in Python to make sure they were correct, then ported them over to integrate with the test suite...)

And anyway, the only point was that I've actually used a grouping that isn't "every 3 digits" and it didn't end the world. I think everyone agrees that some such groupings will come up--even if not every specific example is good, there are some that are. Even the people who want something more conservative than the PEP don't seem to be taking that position--they may not want double underscores, or "123_456_j", but they're fine with "if yuan > 9999_9999:".

So, either we try to anticipate every possible way people might want to group numbers and decide which ones are good or bad, or we just let the style guide say "meaningful group of digits" and let each developer decide what counts as "meaningful" for their application. Does anyone really want to argue for the former?

If not, why not just settle that and go back to bikeshedding the cases that *are* contended, like "123_456_j"? (I'm happy either way, as long as the grammar rule is dead simple and the PEP 8 rule is pretty simple, but I know others have strong, and conflicting, opinions on that.)
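
For concreteness, the groupings that have come up in this thread, written as literals. This is a sketch that assumes an interpreter accepting single underscores between digits (Python 3.6 or later); the variable names are only illustrative.

```python
ssn = 123_45_6789        # grouped 3-2-4, as in the test-script anecdote
yuan_limit = 9999_9999   # grouped by fours
one_crore = 1_00_00_000  # three digits in the last group, then twos

# The underscores change nothing about the values themselves.
print(ssn, yuan_limit, one_crore)
```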

Paul Moore

Feb 12, 2016, 6:18:17 PM2/12/16

to Chris Barker, Python Dev

On 12 February 2016 at 20:06, Chris Barker <chris....@noaa.gov> wrote:
> As Paul said, as long as I can do the above, I'll be fine, but I think
> everyone's source code will be a lot cleaner in the long run if you don't
> have the option of doing who knows what weird arrangement....

Just to be clear, I'm personally in favour of less restrictions rather
than more (as a general principle) - consenting adults and all that.
But I'm also in favour of less debate rather than more on this issue,
so I'll shut up at this point :-)
