Allowing underscores in string arguments to the ``Decimal`` constructor. It
could be argued that these are akin to literals, since there is no Decimal
literal available (yet).
* Allowing underscores in string arguments to ``int()`` with base argument 0,
``float()`` and ``complex()``.
On Wed, Feb 10, 2016 at 08:41:27PM -0800, Andrew Barnert wrote:
> And honestly, are you really claiming that in your opinion, "123_456_"
> is worse than all of their other examples, like "1_23__4"?
Yes I am, because 123_456_ looks like you've forgotten to finish typing
the last group of digits, while 1_23__4 merely looks like you have no
taste.
On Feb 11, 2016, at 09:39, Terry Reedy <tjr...@udel.edu> wrote:
>
> If trailing _ is allowed, to simplify the implementation, I would like PEP 8, while on the subject, to say something like "While trailing _s on numbers are allowed, to simplify the implementation, they serve no purpose and are strongly discouraged".
That's a good point: we need style rules for PEP 8.
But I think everything that's just obviously pointless (like putting an underscore between every pair of digits, or sprinkling underscores all over a huge number to make ASCII art), or already handled by other guidelines (e.g., using a ton of underscores to "line up a table" is the same as using a ton of spaces, which is already discouraged) doesn't really need to be covered. And I think trailing underscores probably fall into that category.
It might be simpler to write a "whitelist" than a "blacklist" of all the ugly things people might come up with, and then just give a bunch of examples instead of a bunch of rules. Something like this:
While underscores can legally appear anywhere in the digit string, you should never use them for purposes other than visually separating meaningful digit groups like thousands, bytes, and the like.
123456_789012: ok (millions are groups, but thousands are more common, and 6-digit groups are readable, but on the edge)
123_456_789_012: better
123_456_789_012_: bad (trailing)
1_2_3_4_5_6: bad (too many)
1234_5678: ok if code is intended to deal with east-Asian numerals (where 10000 is a standard grouping), bad otherwise
3__141_592_654: ok if this represents a fixed-point fraction (obviously bad otherwise)
123.456_789e123: good
123.456_789e1_23: bad (never useful in exponent)
0x1234_5678: good
0o123_456: good
0x123_456_789: bad (3 hex digits is usually not a meaningful group)
The one case that seems contentious is "123_456_j". Honestly, I don't care which way that goes, and I'd be fine if the PEP left out any mention of it, but if people feel strongly one way or the other, the PEP could just give it as a good or a bad example and that would be enough to clarify the intention.
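For readers who want to try the examples above, here is a minimal sketch that asks the running interpreter whether each spelling is accepted as a literal at all. It assumes an interpreter that already implements underscores in numeric literals (Python 3.6 or later, which is not something this thread had settled), and the helper name `accepted` is made up for illustration. It says nothing about style, only about what the grammar lets through.

```python
def accepted(source: str) -> bool:
    """Return True if the interpreter compiles `source` as an expression."""
    try:
        compile(source, "<literal>", "eval")
    except SyntaxError:
        return False
    return True


# Spellings taken from the style list above.
examples = [
    "123456_789012",
    "123_456_789_012",
    "123_456_789_012_",   # trailing underscore
    "1_2_3_4_5_6",
    "1234_5678",
    "3__141_592_654",     # doubled underscore
    "123.456_789e123",
    "0x1234_5678",
    "0o123_456",
    "0x123_456_789",
]

for text in examples:
    print(f"{text:20} accepted by grammar: {accepted(text)}")
```

On such an interpreter the trailing and doubled underscores are rejected outright, while spellings like 1_2_3_4_5_6 and 0x123_456_789 are accepted, so those remain a matter for a style guide rather than the grammar.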
On 02/11/2016 10:50 AM, Serhiy Storchaka wrote:
> I have strong preference for more strict and simpler rule, used by
> most other languages -- "only between two digits". Main arguments:
> 2. Most languages use this rule. It is better to follow non-formal
> standard that invent the rule that differs from rules in every other
> language. This will help programmers that use multiple languages.
If Python followed other languages in everything:
1) Python would not need to exist; and
2) Python would suck ;)
If our rule is more permissive than other languages then cross-language developers can still use the same style in both languages, without penalizing those who want to use the extra freedom in Python.
Hey all, based on the feedback so far, I revised the PEP. There is now a much simpler rule for allowed underscores, with no exceptions. This made the grammar simpler as well.
Examples::

    # grouping decimal numbers by thousands
    amount = 10_000_000.0

    # grouping hexadecimal addresses by words
    addr = 0xDEAD_BEEF

    # grouping bits into bytes in a binary literal
    flags = 0b_0011_1111_0100_1110

    # making the literal suffix stand out more
    imag = 1.247812376e-15_j

The following extensions are open for discussion:
>> * Allowing underscores in string arguments to the ``Decimal`` constructor. It
>> could be argued that these are akin to literals, since there is no Decimal
>> literal available (yet).
>>
>> * Allowing underscores in string arguments to ``int()`` with base argument 0,
>> ``float()`` and ``complex()``.
>
> I'm -0.5 on both of these, with the caveat that if either gets done,
> both should be. Decimal() shouldn't be different from int() just
> because there's currently no way to express a Decimal literal; if
> Python 3.7 introduces such a literal, there'd be this weird rule
> difference that has to be maintained for backward compatibility, and
> has no justification left.
I would be weakly in favour of all relevant constructors being updated
to match the new syntax. The main reason is just consistency, and that
the documentation already kind of guarantees that the literal syntax
is supported (definitely for int and float; for complex it is too
vague).
To be consistent, the following minor extensions of the syntax should
be allowed, which are not legal Python literals: int("0_001"),
int("J_00", 20), float("0_001"), complex("0_001").
Maybe also with non-ASCII digits. However I tried writing Arabic-Indic
digits (U+0660 etc) and my web browser split the number apart when I
inserted an underscore. Maybe a right-to-left thing. But using
Devanagari digits U+0966, U+0967: int("१_०००") (= 1_000). Non-ASCII
digits are apparently intentionally supported, but not documented:
<https://bugs.python.org/issue10581>.
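As a concrete illustration of the constructor calls listed above, the sketch below runs them on an interpreter whose constructors accept underscores in string arguments. That behaviour is an assumption about the interpreter (it holds for Python 3.6 and later, not for the version current when this was written), and the `results` name is only for illustration.

```python
from decimal import Decimal

# Each call passes a string that is NOT a legal Python literal as written
# (leading zeros, or digits above 9 in base 20), yet groups digits with "_".
results = {
    'int("0_001")': int("0_001"),
    'int("J_00", 20)': int("J_00", 20),      # J is digit 19 in base 20
    'int("1_000", 0)': int("1_000", 0),      # base 0 follows literal syntax
    'float("0_001")': float("0_001"),
    'complex("0_001")': complex("0_001"),
    'Decimal("1_000")': Decimal("1_000"),
}

for call, value in results.items():
    print(f"{call:20} -> {value!r}")
```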
> (As a side point, I would be fully in favour of Decimal literals. I'd
> also be in favour of something like "from __future__ import
> fraction_literals" so 1/2 would evaluate to Fraction(1,2) rather than
> 0.5. Hence I'm inclined *not* to support underscores in Decimal().)
Seems more like an argument to have the support in Decimal()
consistent with float() etc, i.e. all or nothing.
> - one or more underscores can appear BETWEEN two digits.
-0
Having underscores between digits is the main usage, but I don’t see
much harm in the more liberal version, unless that makes the
specification or implementation too complex. Allowing stuff like
0x_100, 4.7_e3, and 1_j seems of slightly more benefit IMO than
disallowing 1_000_.
> To describe the second alternative as "complicating the rules" is, I
> think, grossly unfair. And if Serhiy's proposal is correct, the
> implementation is also no more complicated:
>
> # underscores after digits
> octinteger: "0" ("o" | "O") "_"* octdigit (octdigit | "_")*
> hexinteger: "0" ("x" | "X") "_"* hexdigit (hexdigit | "_")*
> bininteger: "0" ("b" | "B") "_"* bindigit (bindigit | "_")*
>
> # underscores between digits
> octinteger: "0" ("o" | "O") octdigit (["_"] octdigit)*
> hexinteger: "0" ("x" | "X") hexdigit (["_"] hexdigit)*
> bininteger: "0" ("b" | "B") bindigit (["_"] bindigit)*
>
>
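One way to see how the two quoted alternatives differ is to transcribe the hexinteger rules into regular expressions and try a few spellings against each. This is only a sketch: the names `HEX_AFTER_DIGITS`, `HEX_BETWEEN_DIGITS` and `classify` are invented here, and only the hex rule is covered.

```python
import re
from typing import Tuple

# "underscores after digits":
#   "0" ("x" | "X") "_"* hexdigit (hexdigit | "_")*
HEX_AFTER_DIGITS = re.compile(r"0[xX]_*[0-9a-fA-F][0-9a-fA-F_]*")

# "underscores between digits":
#   "0" ("x" | "X") hexdigit (["_"] hexdigit)*
HEX_BETWEEN_DIGITS = re.compile(r"0[xX][0-9a-fA-F](?:_?[0-9a-fA-F])*")


def classify(literal: str) -> Tuple[bool, bool]:
    """Return (matches 'after digits' rule, matches 'between digits' rule)."""
    return (
        HEX_AFTER_DIGITS.fullmatch(literal) is not None,
        HEX_BETWEEN_DIGITS.fullmatch(literal) is not None,
    )


for text in ["0xDEAD_BEEF", "0x_100", "0x1__2", "0x12_", "0_xDEADBEEF"]:
    after, between = classify(text)
    print(f"{text:12} after-digits={after!s:5} between-digits={between}")
```

Both patterns are about the same size, which is the point being made about implementation complexity. The difference is only in which spellings (leading, doubled, or trailing underscores) get through.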
> The idea that the second alternative "forc[es] our tastes on everyone"
> while the first does not is bogus. The first alternative also prohibits
> things which are a matter of taste:
>
> # prohibited in both alternatives
> 0_xDEADBEEF
> 0._1234
> 1.2e_99
> -_1
This one is already a valid variable identifier name.
> 1j_
>
>
> I think that there is broad agreement that:
>
> - the basic idea is sound
> - leading underscores followed by digits are currently legal
> identifiers and this will not change
+1 to both
> - underscores should not follow the sign - +
> - underscores should not follow the decimal point .
> - underscores should not follow the exponent e|E
No strong opinion on these from me
> - underscores will not be permitted inside the exponent (even if
> it is harmless, it's silly to write 1.2e9_9)
-0, it seems like a needless inconsistency, unless it somehow hurts
the implementation
> - underscores should not follow the complex suffix j
No opinion
> and only minor disagreement about:
>
> - whether or not underscores will be allowed after the base
> specifier 0x 0o 0b
+0
> - whether or not underscores will be allowed before the decimal
> point, exponent and complex suffix.
No opinion about directly before decimal point; +0 before exponent or
imaginary (complex) suffix.
> Can we have a show of hands, in favour or against the above two? And
> then perhaps Guido can rule on this one way or the other and we can get
> back to arguing about more important matters? :-)
>
> In case it isn't obvious, I prefer to say No to allowing underscores
> after the base specifier, or before the decimal point, exponent and
> complex suffix.
# underscores after digits
octinteger: "0" ("o" | "O") (octdigit | "_")*
hexinteger: "0" ("x" | "X") (hexdigit | "_")*
bininteger: "0" ("b" | "B") (bindigit | "_")*
Great PEP overall. We definitely don't want the restriction to grouping numbers only in threes. South Asian crore use grouping in twos.
https://en.m.wikipedia.org/wiki/Crore
On Thursday, February 11, 2016 8:10 PM, Glenn Linderman <v+py...@g.nevcal.com> wrote:
> On 2/11/2016 7:56 PM, David Mertz wrote:
>> Great PEP overall. We definitely don't want the restriction to grouping numbers only in threes. South Asian crore use grouping in twos.
>> https://en.m.wikipedia.org/wiki/Crore
> Interesting... 3 digits in the least significant group, and _then_ by twos. Wouldn't have predicted that one! Never bumped into that notation before!
The first time I used underscore separators in any language, it was a test script for a server that wanted social security numbers as integers instead of strings, like 123_45_6789.[^1]
Which is why I suggested the style guideline should just say "meaningful grouping of digits", rather than try to predict what counts as "meaningful" for every program.
[^1] Of course in Python, it's usually trivial to stick a shim in between the database and the model thingy so I could just pass in "123-45-6789", so I don't expect to ever need this specific example.
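To make the crore-style grouping concrete (three digits in the least significant group, then groups of two), here is a small sketch. The helper name `group_indian` is hypothetical and not part of any proposal in this thread. It just shows that the grouping cannot be expressed under a "threes only" rule.

```python
def group_indian(n: int, sep: str = "_") -> str:
    """Group digits as 3 in the last group, then 2 at a time to the left."""
    digits = str(abs(n))
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while head:
        groups.insert(0, head[-2:])  # peel off two digits from the right
        head = head[:-2]
    groups.append(tail)
    sign = "-" if n < 0 else ""
    return sign + sep.join(groups)


print(group_indian(10_000_000))   # one crore
print(group_indian(12345678))
print(group_indian(999))
```

For example, one crore (ten million) comes out as 1_00_00_000.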
I have no opinion on anything other than that whatever syntax is
implemented as long as it allows single underscores between digits,
such as
1_000_000
Everything else is irrelevant to me, and if I read code that uses
anything else, I'd judge it based on readability and style, and
wouldn't care about arguments that "it's allowed by the grammar".
On Fri, Feb 12, 2016 at 1:00 AM, Paul Moore <p.f....@gmail.com> wrote:
> I have no opinion on anything other than that whatever syntax is
> implemented as long as it allows single underscores between digits,
> such as
> 1_000_000
> Everything else is irrelevant to me, and if I read code that uses
> anything else, I'd judge it based on readability and style, and
> wouldn't care about arguments that "it's allowed by the grammar".
I totally agree -- and it's clear that other cultures group digits differently, so we should allow that, but while I'll live with it either way, I'd rather have it be as restrictive as possible rather than as unrestricted as possible. As in:
no double underscores
no underscore right before or after a period
no underscore at the beginning or end.
....
As Paul said, as long as I can do the above, I'll be fine, but I think everyone's source code will be a lot cleaner in the long run if you don't have the option of doing who knows what weird arrangement....
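The restrictive rules listed above can be sketched as a single regular expression for plain decimal numbers with an optional fraction. This is an illustration under stated assumptions, not a proposed grammar: the names `STRICT_DECIMAL` and `is_strict` are made up, and exponents, base prefixes and the complex suffix are deliberately left out.

```python
import re

# digits, then any number of "_digits" groups; optionally "." followed by
# the same shape. Every underscore therefore sits between two digits.
STRICT_DECIMAL = re.compile(r"[0-9]+(?:_[0-9]+)*(?:\.[0-9]+(?:_[0-9]+)*)?")


def is_strict(text: str) -> bool:
    """True if `text` obeys: no doubled, leading, trailing, or period-adjacent '_'."""
    return STRICT_DECIMAL.fullmatch(text) is not None


for text in ["1_000_000", "3.141_592", "1__000", "1_.5", "1._5", "_1000", "1000_"]:
    print(f"{text:10} strict: {is_strict(text)}")
```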
As for the SS# example -- it seems a bad idea to me to store a SS# number as an integer anyway -- so all the weird IDs etc. formats aren't really relevant...