Proposal: Allow literal separators (')

Peter Kasting

Jan 9, 2018, 4:26:57 PM
to cxx
By popular request in https://groups.google.com/a/chromium.org/forum/#!topic/cxx/zsGhgaKLmIk .

This proposes to allow literal separators in integral/floating point constants, a la:

int a = 1'234'567'890;
long long b = 0x01234567'89abcdef;
float f = 1'000'000.000'1;

The primary use for these is probably 64-bit constants and binary literals, but they can be useful for any long constant.  In testing, Visual Studio's editor has no problem parsing/highlighting these appropriately; I would assume other editors are likely to be fine as well.
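As a minimal sketch of what the proposal would permit (assuming a C++14 compiler; the constant name kNanosecondsPerDay is made up for illustration), the separators are purely visual and do not change a literal's value:

```cpp
#include <cstdint>

// The compiler ignores the separators; only the digits determine the value.
static_assert(1'234'567'890 == 1234567890, "decimal");
static_assert(0x01234567'89abcdef == 0x0123456789abcdef, "hexadecimal");
static_assert(0b1010'0101 == 0xA5, "binary");
static_assert(1'000'000.000'1 == 1000000.0001, "floating point");

// Hypothetical 64-bit constant: the grouping makes the magnitude easy to check.
constexpr std::uint64_t kNanosecondsPerDay = 86'400'000'000'000;
```

If any of these assertions were false, the file would fail to compile, so no runtime check is needed.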


PK

Daniel Bratell

Jan 10, 2018, 4:57:03 AM
to cxx, Peter Kasting
Emacs 24.5's C++ mode treats those as string delimiters, and if there is an odd number of them, it highlights the rest of the file as one long string.

This doesn't have to prevent them from being used, but it might be good to (manually) restrict their use to places where it makes a big difference.

http://cc-mode.sourceforge.net/changes-533.php claims that the newest version has support for "Separators in integer literals" so it might just be a case of getting the latest version of emacs or the cc-mode package. Emacs 24.5 is the version provided by some common distributions but it's not the newest one.

/Daniel

--
/* Opera Software, Linköping, Sweden: CET (UTC+1) */

Peter Kasting

Jan 10, 2018, 3:51:28 PM
to Daniel Bratell, cxx
On Wed, Jan 10, 2018 at 1:56 AM, Daniel Bratell <bra...@opera.com> wrote:
http://cc-mode.sourceforge.net/changes-533.php claims that the newest version has support for "Separators in integer literals" so it might just be a case of getting the latest version of emacs or the cc-mode package. Emacs 24.5 is the version provided by some common distributions but it's not the newest one.

I'm not familiar enough with how Emacs versioning works.  It looks like CC Mode is versioned separately and supports several versions of Emacs, i.e. it functions more like a separate extension.  So is "Emacs 24.5" really the issue, or "CC Mode < 5.33"?  I notice that same page also lists added support for a variety of other things we allow and use, e.g. lambdas, parameter packs, raw strings, ">>" template enders; if older versions don't properly support these things, it seems like the problems are larger than just literal separators.

PK

Avi Drissman

Jan 10, 2018, 10:14:37 PM
to Peter Kasting, Chris Blume, Daniel Bratell, cxx
As in the other thread, I don't see syntax highlighting as an issue that should stop us.

I still see this as a good example:

    image.red_mask = 0b11111'000000'00000;
    image.green_mask = 0b00000'111111'00000;
    image.blue_mask = 0b00000'000000'11111;

Note that this specifically does not split on the octet or nibble or whatever. This is raw bit-bashing code and the reason we're using binary literals here at all is specifically because the 5-6-5 structure of these 16-bit RGB values doesn't fall neatly on the octet so that it's less clear with hex literals what's going on. Having the freedom to use literal separators, and furthermore, having the freedom to use them anywhere, is critical here. The way they are used here makes it very obvious what the underlying structure of the binary value is: there's no way anyone could look at that and not notice that the middle block has more bits than the other two blocks.
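To illustrate how masks like these are typically used, here is a rough sketch of packing and unpacking a 16-bit 5-6-5 pixel. The names kRedMask, PackRgb565, RedOf and so on are hypothetical and not taken from any Chromium code; only the mask values come from the example above.

```cpp
#include <cstdint>

// Hypothetical constants mirroring the masks above (RGB 5-6-5 layout).
constexpr std::uint16_t kRedMask   = 0b11111'000000'00000;
constexpr std::uint16_t kGreenMask = 0b00000'111111'00000;
constexpr std::uint16_t kBlueMask  = 0b00000'000000'11111;

// Packs three channel values into one 16-bit pixel. Bits that do not fit
// into a channel's field are discarded by the mask.
constexpr std::uint16_t PackRgb565(unsigned r, unsigned g, unsigned b) {
  return static_cast<std::uint16_t>(((r << 11) & kRedMask) |
                                    ((g << 5) & kGreenMask) |
                                    (b & kBlueMask));
}

// Extracts one channel: mask first, then shift down to bit 0.
constexpr unsigned RedOf(std::uint16_t pixel) { return (pixel & kRedMask) >> 11; }
constexpr unsigned GreenOf(std::uint16_t pixel) { return (pixel & kGreenMask) >> 5; }
constexpr unsigned BlueOf(std::uint16_t pixel) { return pixel & kBlueMask; }
```

The shift amounts (11 and 5) are the widths of the groups to the right of each field, which is exactly what the 5-6-5 separators make visible.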

I agree with what Chris wrote on the other thread: we can rely on reviewers to stop people from doing silly things like (in his example) 0b11'0'1'10'1. There are plenty of things in C++ that are abusable in a silly way like that that aren't prohibited by the style guide, and I would be pretty disappointed in ourselves if we couldn't trust CL authors to be reasonable.

My vote, for both binary literals and number literal separators, is for them to be "approved for use where appropriate for clarity" and for us to place trust in our committers.

Avi

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.
To post to this group, send email to c...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/cxx/CAAHOzFA7gdE_%2BvwG_DQBUbnqJPjh81NqpGxeGCEcFvDsQPDWgQ%40mail.gmail.com.

Karl Wiberg

Jan 11, 2018, 4:53:23 AM
to Avi Drissman, Peter Kasting, Chris Blume, Daniel Bratell, cxx
On Thu, Jan 11, 2018 at 4:14 AM, Avi Drissman <a...@chromium.org> wrote:
As in the other thread, I don't see syntax highlighting as an issue that should stop us.

I still see this as a good example:

    image.red_mask = 0b11111'000000'00000;
    image.green_mask = 0b00000'111111'00000;
    image.blue_mask = 0b00000'000000'11111;

Note that this specifically does not split on the octet or nibble or whatever. This is raw bit-bashing code and the reason we're using binary literals here at all is specifically because the 5-6-5 structure of these 16-bit RGB values doesn't fall neatly on the octet so that it's less clear with hex literals what's going on. Having the freedom to use literal separators, and furthermore, having the freedom to use them anywhere, is critical here. The way they are used here makes it very obvious what the underlying structure of the binary value is: there's no way anyone could look at that and not notice that the middle block has more bits than the other two blocks.

[The bold formatting was added by me.]

For what it's worth, I'll have to respectfully disagree—I would be surprised if I encountered literals whose digits were separated into groups of unequal size, and in the present case I really did miss it until it was pointed out in the discussion. However, calling it out would have made the problem go away:

    // Note: 5-6-5 grouping of bits.
    image.red_mask = 0b11111'000000'00000;
    image.green_mask = 0b00000'111111'00000;
    image.blue_mask = 0b00000'000000'11111;

But I'd argue that something like this is much better, since it clearly shows that we're dealing with 16-bit values where exactly one of the three has a 1 in any given position:

    // clang-format off
    image.red_mask =   0b1111'1000'0000'0000;
    image.green_mask = 0b0000'0111'1110'0000;
    image.blue_mask =  0b0000'0000'0001'1111;
    // clang-format on
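Whichever grouping is preferred, both spellings denote the same values. A small sketch (the constant names are made up for illustration) that checks this at compile time, along with the "exactly one mask has a 1 in any given position" property:

```cpp
#include <cstdint>

// Hypothetical names; the values are the masks written both ways.
constexpr std::uint16_t kRed565       = 0b11111'000000'00000;
constexpr std::uint16_t kGreen565     = 0b00000'111111'00000;
constexpr std::uint16_t kBlue565      = 0b00000'000000'11111;
constexpr std::uint16_t kRedNibbles   = 0b1111'1000'0000'0000;
constexpr std::uint16_t kGreenNibbles = 0b0000'0111'1110'0000;
constexpr std::uint16_t kBlueNibbles  = 0b0000'0000'0001'1111;

// The grouping is cosmetic: both forms equal the usual hex spelling.
static_assert(kRed565 == kRedNibbles && kRedNibbles == 0xF800, "red");
static_assert(kGreen565 == kGreenNibbles && kGreenNibbles == 0x07E0, "green");
static_assert(kBlue565 == kBlueNibbles && kBlueNibbles == 0x001F, "blue");

// Every bit position is covered, and no position is claimed twice.
static_assert((kRed565 | kGreen565 | kBlue565) == 0xFFFF, "masks cover 16 bits");
static_assert((kRed565 & kGreen565) == 0 && (kGreen565 & kBlue565) == 0 &&
                  (kRed565 & kBlue565) == 0,
              "masks are disjoint");
```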

I agree with what Chris wrote on the other thread: we can rely on reviewers to stop people from doing silly things like (in his example) 0b11'0'1'10'1. There are plenty of things in C++ that are abusable in a silly way like that that aren't prohibited by the style guide, and I would be pretty disappointed in ourselves if we couldn't trust CL authors to be reasonable.

My vote, for both binary literals and number literal separators, is for them to be "approved for use where appropriate for clarity" and for us to place trust in our committers.

This sounds like a fine rule to me.
 


Avi Drissman

Jan 11, 2018, 10:41:25 AM
to Karl Wiberg, Peter Kasting, Chris Blume, Daniel Bratell, cxx
I don't personally see your version as being "better", just different in how it explains what's going on. In any case, though, I would happily LG your version without hesitation if I were reviewing it.

Either way, having and using digit separators provides a clear benefit over not having them.

Avi

Joe Mason

Jan 11, 2018, 11:29:26 AM
to Karl Wiberg, Avi Drissman, Peter Kasting, Chris Blume, Daniel Bratell, cxx
I agree that a separation that's not on a 4-bit boundary deserves a comment explaining why. (I also like using clang-format off to line up the bits in this case.)

However, they SHOULDN'T be artificially split into groups of 4 here. The RGB format is explicitly 5 bits for R, 6 bits for G, 5 bits for B. Splitting on the 5-6-5 boundaries shows where those boundaries actually are. Showing where the 4-bit boundaries are isn't useful here, since the format doesn't actually care about them!

This seems easiest to read to me:

// clang-format off
// Note: 5-6-5 grouping of bits.
image.red_mask =   0b11111'000000'00000;
image.green_mask = 0b00000'111111'00000;
image.blue_mask =  0b00000'000000'11111;
// clang-format on

So I'd suggest that as guidance: separators should be allowed, and normally used to separate literals at the expected boundaries (4 or 8 bits for binary, 4 or 8 bytes for hex and octal, every 3 digits for decimal). If they are used at a different boundary there should be a comment drawing attention to this.
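A rough sketch of what that guidance could look like in practice. All constant names here are invented for illustration, and the date-style grouping is just one possible example of a commented non-standard split:

```cpp
#include <cstdint>

// Expected boundaries: no comment needed.
constexpr std::uint8_t kLowNibbleMask = 0b0000'1111;            // binary, 4 bits
constexpr std::uint64_t kHighWordMask = 0xFFFFFFFF'00000000;    // hex, 4 bytes
constexpr std::int64_t kMicrosecondsPerHour = 3'600'000'000;    // decimal, 3 digits

// Note: YYYY'MM'DD grouping, not the usual groups of three digits.
constexpr int kThreadStartDate = 2018'01'11;
```

Under this rule a reviewer would only need to look twice at literals whose grouping departs from the default, and the accompanying comment explains why.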


Chris Blume

Jan 11, 2018, 1:48:19 PM
to joenot...@google.com, kwi...@webrtc.org, Avi Drissman, Peter Kasting, bra...@opera.com, cxx
I strongly agree with:
 
So I'd suggest that as guidance: separators should be allowed, and normally used to separate literals at the expected boundaries (4 or 8 bits for binary, 4 or 8 bytes for hex and octal, every 3 digits for decimal). If they are used at a different boundary there should be a comment drawing attention to this.

The example of the 5-6-5 comment made it very clear to me.


Possible spin-off conversation: What is the guideline on vertical alignment and // clang-format off ? I don't see any mention in the Google and Chromium style guides. But I completely agree that it is the most readable. Although the extra overhead is unfortunate.


Chris Blume

Peter Kasting

Jan 11, 2018, 1:48:26 PM
to Joe Mason, Karl Wiberg, Avi Drissman, Chris Blume, Daniel Bratell, cxx
What I'm taking from this discussion so far is that people have varying opinions on precisely the best usage of separators, but there seems to be enthusiastic support for allowing them at all.

PK

Jeremy Roman

Jan 12, 2018, 10:22:19 AM
to Peter Kasting, Joe Mason, Karl Wiberg, Avi Drissman, Chris Blume, Daniel Bratell, cxx
On Thu, Jan 11, 2018 at 1:48 PM, 'Peter Kasting' via cxx <c...@chromium.org> wrote:
What I'm taking from this discussion so far is that people have varying opinions on precisely the best usage of separators, but there seems to be enthusiastic support for allowing them at all.

My instinct here is to leave prevention of abuse to reviewers, who should have good judgment about what makes the code they own more or less readable.

I'm not an emacs user; how bad is the syntax stuff in emacs and how hard is it to fix/upgrade? (When would the upgraded version hit Debian testing and similar distros?)


Daniel Bratell

Jan 12, 2018, 11:02:30 AM
to Peter Kasting, Jeremy Roman, Joe Mason, Karl Wiberg, Avi Drissman, Chris Blume, cxx
On Fri, 12 Jan 2018 16:22:16 +0100, Jeremy Roman <jbr...@chromium.org> wrote:

On Thu, Jan 11, 2018 at 1:48 PM, 'Peter Kasting' via cxx <c...@chromium.org> wrote:
What I'm taking from this discussion so far is that people have varying opinions on precisely the best usage of separators, but there seems to be enthusiastic support for allowing them at all.

My instinct here is to leave prevention of abuse to reviewers, who should have good judgment about what makes the code they own more or less readable.

I'm not an emacs user; how bad is the syntax stuff in emacs and how hard is it to fix/upgrade? (When would the upgraded version hit Debian testing and similar distros?)

I brought up emacs so I'll add another 2 cents even though I have no version history. I happen to be on an Ubuntu 14.04 LTS (the dist recommended by https://chromium.googlesource.com/chromium/src/+/master/docs/linux_build_instructions.md ) where there is no support but:

a) I don't expect this to be a wildly used feature.
b) It's possible to manually update necessary libs or emacs itself.
c) Files are still editable even if the highlight and some auto-functions are wrong.
d) You can locally work around it by making sure there is an even number of ' characters.

I might end up regretting this if a) turns out to be wrong but if the recommendation includes a suggestion to not go wild with this just because you can, then I think it's fine to go. There are other reasons, like grep, regexps and other tools, to default to numbers written with only digits.
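To make the grep concern concrete: a plain text search for 1234567890 will not match a line that spells the number 1'234'567'890. Below is a minimal sketch of a separator-aware search. StripDigitSeparators and ContainsNumber are hypothetical helpers, and the rule "an apostrophe directly between two hex digits is a separator" is a naive heuristic, not a real C++ lexer.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: drops every ' that sits directly between two
// hexadecimal digits. Apostrophes elsewhere (e.g. around 'a') are kept.
std::string StripDigitSeparators(const std::string& text) {
  std::string out;
  out.reserve(text.size());
  for (std::size_t i = 0; i < text.size(); ++i) {
    const bool is_separator =
        text[i] == '\'' && i > 0 && i + 1 < text.size() &&
        std::isxdigit(static_cast<unsigned char>(text[i - 1])) &&
        std::isxdigit(static_cast<unsigned char>(text[i + 1]));
    if (!is_separator) out.push_back(text[i]);
  }
  return out;
}

// Returns true if `digits` occurs in `line` once separators are ignored.
bool ContainsNumber(const std::string& line, const std::string& digits) {
  return StripDigitSeparators(line).find(digits) != std::string::npos;
}
```

For example, ContainsNumber("int a = 1'234'567'890;", "1234567890") is true, while a plain std::string::find on the same line finds nothing. Tools that are not written this way would miss the separated spelling.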

TL;DR: I don't think emacs is a show-stopper.

Avi Drissman

Jan 12, 2018, 11:09:17 AM
to Daniel Bratell, Peter Kasting, Jeremy Roman, Joe Mason, Karl Wiberg, Chris Blume, cxx
I use Sublime. Raw strings don't syntax highlight correctly in it, but that didn't stop us from allowing them in Chromium code.

Tools will catch up. As long as there are no functional issues with a feature (as, say, there are with raw strings mixing with macros in GCC) or serious impediments to workflow (as in the line miscounting that GCC used to do with raw strings which was IIRC the reason we delayed in allowing them) then I think the benefits of allowing them far outweigh the disadvantages.

Avi

Joe Mason

Jan 12, 2018, 11:53:09 AM
to Jeremy Roman, Peter Kasting, Karl Wiberg, Avi Drissman, Chris Blume, Daniel Bratell, cxx
I think it would be good for there to be some guidance to reviewers on whether splits on non-standard boundaries are allowed at all - some people seem to have felt they were obviously non-controversial and some people seem to have bounced off the idea, and that's a big enough difference of opinion that individual reviewers might have confusingly different takes. But yeah, if they're allowed at all, individual reviewers can easily decide "this non-standard split makes sense; this one's too weird; this one makes sense but needs a comment".

On the other hand, low-level bit fiddling is often domain-specific so maybe it's acceptable for different parts of the code to have different policies. I could see gfx reviewers allowing 5-6-5 RGB splits while base/memory reviewers require splitting on alignment boundaries.

Peter Kasting

Jan 18, 2018, 5:03:33 PM
to cxx