C++17 Proposal: Allow std::byte


K. Moon

Jan 6, 2022, 1:06:55 PM
to cxx
https://chromium.googlesource.com/chromium/src/+/HEAD/styleguide/c++/c++11.md#std_byte-tbd

Allowing std::byte seems harmless to me. I'm guessing the main concern would be knowing when it's appropriate to use std::byte over something like char or uint8_t. I imagine most of the time it would be used for blob-y data types, though.

Roland Bock

Jan 6, 2022, 1:22:20 PM
to K. Moon, cxx
+1

std::byte indicates intent: Storing data (as opposed to textual information).



On Thu, Jan 6, 2022 at 7:06 PM K. Moon <km...@chromium.org> wrote:
https://chromium.googlesource.com/chromium/src/+/HEAD/styleguide/c++/c++11.md#std_byte-tbd

Allowing std::byte seems harmless to me. I'm guessing the main concern would be knowing when it's appropriate to use std::byte over something like char or uint8_t. I imagine most of the time it would be used for blob-y data types, though.


Peter Kasting

Jan 6, 2022, 1:22:29 PM
to K. Moon, cxx
On Thu, Jan 6, 2022 at 10:06 AM K. Moon <km...@chromium.org> wrote:
https://chromium.googlesource.com/chromium/src/+/HEAD/styleguide/c++/c++11.md#std_byte-tbd

Allowing std::byte seems harmless to me. I'm guessing the main concern would be knowing when it's appropriate to use std::byte over something like char or uint8_t. I imagine most of the time it would be used for blob-y data types, though.

Is there a particular place you'd like to use this?

I was worried about aliasing, but it looks like the standard carves out the same exceptions for std::byte as for char and unsigned char.  So it's presumably valid to cast-T*-to-byte* and back.  I'm not sure what other gotchas might exist.
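A minimal sketch of what that carve-out permits (C++17; the Widget type here is purely illustrative):

#include <cstddef>

struct Widget {
  int id = 0;
  float weight = 0.f;
};

// Viewing an object's bytes through std::byte* relies on the same aliasing
// exception that covers char and unsigned char, so this is well-defined.
void InspectBytes(const Widget& w) {
  const std::byte* bytes = reinterpret_cast<const std::byte*>(&w);
  for (std::size_t i = 0; i < sizeof(Widget); ++i) {
    std::byte b = bytes[i];  // Reads the object representation byte by byte.
    (void)b;
  }
  // Casting back yields a pointer to the original object.
  const Widget* back = reinterpret_cast<const Widget*>(bytes);
  (void)back;
}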

PK

K. Moon

Jan 6, 2022, 1:25:01 PM
to Peter Kasting, cxx
I'm reviewing some code at the moment that uses a std::vector<uint8_t> to hold a bag of bytes, and I was thinking, "Hey, we have C++17 now. Could we use a std::byte here?" Turns out the answer is TBD, so here's my proposal. :-)

Peter Kasting

Jan 6, 2022, 1:28:38 PM
to K. Moon, cxx
On Thu, Jan 6, 2022 at 10:25 AM K. Moon <km...@chromium.org> wrote:
I'm reviewing some code at the moment that uses a std::vector<uint8_t> to hold a bag of bytes, and I was thinking, "Hey, we have C++17 now. Could we use a std::byte here?" Turns out the answer is TBD, so here's my proposal. :-)

Tentative +1, but it'd be good to see how the code is using uint8_t to see if "byte" would remove any aliasing violations or otherwise add clarity.  If the author actually wants numeric values out the other end, std::byte could just add more hassle...

PK 

K. Moon

Jan 6, 2022, 1:30:29 PM
to Peter Kasting, cxx, Lei Zhang
The review in question is https://crrev.com/c/3368635. In this case, we're taking a blob of pixels, which we then transfer using postMessage(), which then gets interpreted on the other side of the IPC boundary. No other operations are performed on the data.

Roland McGrath

Jan 6, 2022, 1:51:02 PM
to K. Moon, Peter Kasting, cxx, Lei Zhang
I think it's a good style distinction to use std::byte for cases of "moving opaque data around", and use uint8_t for "arrays of 8-bit integer values".  There is no real distinction to the compiler wrt aliasing rules and all that.  The only real distinction is in the front-end experience of requiring `static_cast<some integer type>(byte)` before you can do arithmetic, which IMHO is a good thing in distinguishing the intent of each use.  If you find yourself writing those casts, it's probably a case where using uint8_t instead may have been appropriate.
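For illustration, a short sketch of that front-end difference (values arbitrary):

#include <cstddef>
#include <cstdint>

void Demo() {
  std::byte b{0x2A};
  // int bad = b + 1;                    // Does not compile: no arithmetic on std::byte.
  int n = std::to_integer<int>(b) + 1;   // Explicit conversion before arithmetic.
  std::byte m = b & std::byte{0x0F};     // Bitwise operations are allowed.

  std::uint8_t u = 0x2A;
  int k = u + 1;                         // uint8_t participates in arithmetic directly.
  (void)n; (void)m; (void)k;
}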


dan...@chromium.org

Jan 6, 2022, 1:54:00 PM
to Roland McGrath, K. Moon, Peter Kasting, cxx, Lei Zhang
On Thu, Jan 6, 2022 at 1:51 PM Roland McGrath <mcgr...@chromium.org> wrote:
I think it's a good style distinction to use std::byte for cases of "moving opaque data around", and use uint8_t for "arrays of 8-bit integer values".  There is no real distinction to the compiler wrt aliasing rules and all that.

There is a difference. You can legally cast a T* to a char* or byte* but not to a uint8_t*.

For pixel data, it's numeric 8-bit channels. I am not sure that byte makes sense there. I think it's better suited to opaque data, or to byte streams of more complex types (IPC pickling, for example).
 
  The only real distinction is in the front-end experience of requiring `static_cast<some integer type>(byte)` before you can do arithmetic, which IMHO is a good thing in distinguishing the intent of each use.  If you find yourself writing those casts, it's probably a case where using uint8_t instead may have been appropriate.

On Thu, Jan 6, 2022 at 10:30 AM K. Moon <km...@chromium.org> wrote:
The review in question is https://crrev.com/c/3368635. In this case, we're taking a blob of pixels, which we then transfer using postMessage(), which then gets interpreted on the other side of the IPC boundary. No other operations are performed on the data.

On Thu, Jan 6, 2022 at 10:28 AM Peter Kasting <pkas...@google.com> wrote:
On Thu, Jan 6, 2022 at 10:25 AM K. Moon <km...@chromium.org> wrote:
I'm reviewing some code at the moment that uses a std::vector<uint8_t> to hold a bag of bytes, and I was thinking, "Hey, we have C++17 now. Could we use a std::byte here?" Turns out the answer is TBD, so here's my proposal. :-)

Tentative +1, but it'd be good to see how the code is using uint8_t to see if "byte" would remove any aliasing violations or otherwise add clarity.  If the author actually wants numeric values out the other end, std::byte could just add more hassle...

PK 


K. Moon

Jan 6, 2022, 1:55:48 PM
to dan...@chromium.org, Roland McGrath, Peter Kasting, cxx, Lei Zhang
In this case, there's no manipulation of the pixels as pixels: We have a buffer generated elsewhere (which happens to be pixels), and it needs to be passed across an IPC boundary. So I think it fits precisely into the case you mentioned. :-)

Daniel Cheng

Jan 6, 2022, 2:02:09 PM
to K. Moon, dan...@chromium.org, Roland McGrath, Peter Kasting, cxx, Lei Zhang
We have a bunch of code that assumes binary blobs are std::vector<std::uint8_t> or base::span<std::uint8_t>. Would we change those to std::vector<std::byte> and base::span<std::byte>? Or would we have to add yet more overloads?

We already have this problem for std::vector<char> and base::span<char>, and as_bytes() also currently returns a base::span<std::uint8_t>.

Daniel

K. Moon

Jan 6, 2022, 2:39:07 PM
to Daniel Cheng, dan...@chromium.org, Roland McGrath, Peter Kasting, cxx, Lei Zhang
I think converging on std::byte in as many places as makes sense might be useful, but ultimately I think that's a library design question, orthogonal to whether or not std::byte should be allowed as a language feature. There are plenty of cases where we allow a C++ feature, but it wouldn't be a good choice in every design scenario.

Where I think this becomes a valid concern is if the uncontrolled proliferation of std::byte creates harm, but I think Peter's earlier point about std::byte following the same aliasing rules as char and unsigned char actually gives it an advantage over uint8_t in that respect: casting provides a safety valve for cheaply converting from std::byte* to char* (and to std::byte* from anything else). In that respect, uint8_t was a suboptimal choice in the first place, since there's no such guarantee about its aliasing behavior.

Daniel Cheng

Jan 6, 2022, 3:14:38 PM
to K. Moon, dan...@chromium.org, Roland McGrath, Peter Kasting, cxx, Lei Zhang
Sure, but we had no choice prior to C++17 :)

I do think that allowing it means making a decision for base::as_bytes(). It also matters a bit for things that take std::vectors, because we can't just cast std::vector<uint8_t> to std::vector<std::byte>.

Daniel

Jan Wilken Dörrie

Jan 7, 2022, 6:37:29 AM
to Daniel Cheng, K. Moon, dan...@chromium.org, Roland McGrath, Peter Kasting, cxx, Lei Zhang
+1 to allow std::byte.

On Thu, Jan 6, 2022 at 9:14 PM Daniel Cheng <dch...@chromium.org> wrote:
Sure, but we had no choice prior to C++17 :)

I do think that allowing it means making a decision for base::as_bytes().

Assuming we eventually want to adopt std::span and its byte conversion utilities I suggest we change base::as_bytes()'s return type sooner rather than later.
 
It also matters a bit for things that take std::vectors, because we can't just cast std::vector<uint8_t> to std::vector<std::byte>.

That unfortunately is true (w/o invoking UB). At least the conversion can be done via memcpy, though.
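For concreteness, a sketch of that memcpy-based conversion (the helper name is made up):

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// There is no legal way to reinterpret a whole std::vector<uint8_t> as a
// std::vector<std::byte>, so the contents are copied into a new vector.
std::vector<std::byte> ToByteVector(const std::vector<std::uint8_t>& in) {
  std::vector<std::byte> out(in.size());
  if (!in.empty())
    std::memcpy(out.data(), in.data(), in.size());
  return out;
}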
 

Daniel

On Thu, 6 Jan 2022 at 11:39, K. Moon <km...@chromium.org> wrote:
I think converging on std::byte in as many places as makes sense might be useful, but ultimately I think that's a library design question, orthogonal to whether or not std::byte should be allowed as a language feature. There are plenty of cases where we allow a C++ feature, but it wouldn't be a good choice in every design scenario.

Where I think this becomes a valid concern is if the uncontrolled proliferation of std::byte creates harm, but I think Peter's earlier point about std::byte following the same aliasing rules as char and unsigned char actually gives it an advantage over uint8_t in that respect: casting provides a safety valve for cheaply converting from std::byte* to char* (and to std::byte* from anything else). In that respect, uint8_t was a suboptimal choice in the first place, since there's no such guarantee about its aliasing behavior.

While aliasing is not guaranteed, uint8_t should just be an alias for unsigned char on all platforms we care about. So you could statically assert that these are the same types if you rely on uint8_t aliasing.
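A one-line guard for that assumption might look like this (the message text is illustrative):

#include <cstdint>
#include <type_traits>

// If uint8_t ever stops being an alias for unsigned char, code that relies on
// its aliasing behavior stops compiling instead of silently becoming UB.
static_assert(std::is_same_v<std::uint8_t, unsigned char>,
              "uint8_t is assumed to be unsigned char");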
 

Chris Palmer

Jan 7, 2022, 3:07:40 PM
to Jan Wilken Dörrie, Daniel Cheng, K. Moon, Dana Jansens, Roland McGrath, Peter Kasting, cxx, Lei Zhang
Generic reminder that adopting more `std`, including `span`, has the potential to increase memory unsafety, unless we can reliably get and test an implementation that checks bounds and checks for other UB. I have confidence in Abseil, though.

David Benjamin

Jan 7, 2022, 3:12:13 PM
to Jan Wilken Dörrie, Daniel Cheng, K. Moon, dan...@chromium.org, Roland McGrath, Peter Kasting, cxx, Lei Zhang
tl;dr: We should not use std::byte for binary data, only object representations in memory. The correct type for binary data is uint8_t. And, if we change base::as_bytes, we'll need a replacement span<char> to span<uint8_t> converter.

I don't think std::byte is appropriate for binary data, and switching to it will increase complexity. By binary data, I mean 8-bit values we read/write from network/filesystem/IPC. To distinguish it from another use case later, call these "octets" instead of "bytes". It is true that octets in C++ are a mess. We constantly reinterpret_cast between char* and uint8_t*. But std::byte only adds more casts. Indefinitely so because some code (due to backwards- or C-compatibility) cannot move to std::byte.

Moreover, std::byte is worse than uint8_t for octets. It is neither signed nor unsigned. It only supports bit operations. You need to call std::to_integer<uint8_t>(b) to get the numeric value. That only makes sense if [0,255] vs [-128,127] is a rare, case-by-case decision. Yet we're in this mess because there is a common, natural interpretation (unsigned) that's not the default (char is usually signed)! Protocols and encodings, including UTF-8, are always specified in terms of 0-255. Even C's isdigit() is UB on negative values and requires you to cast to unsigned char first. I've never seen a use case for signed octets.
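The isdigit() point, spelled out as the usual idiom:

#include <cctype>

// std::isdigit() takes an int whose value must be representable as unsigned
// char (or be EOF); passing a plain, possibly-signed char is UB for negative
// values, hence the usual cast.
bool IsAsciiDigit(char c) {
  return std::isdigit(static_cast<unsigned char>(c)) != 0;
}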

IMO, the root problem is that std::string and string literals should have used uint8_t, or uint8_t == char == unsigned char. Most uses of base::as_bytes are just working around this problem. To compare to other languages, Go programs use "byte" for binary data, but it's an alias to "uint8". Rust just has its succinct "u8" for binary data.

Now, one could argue it's good that char != uint8_t, to capture UTF-8-ness, or text vs binary, in the type system. (IMO this is a lost cause for char; everything puts binary data in std::string. C++20 instead adds char8_t, for even more reinterpret_cast.) Even so, we frequently need to work with the underlying byte sequence to serialize text. Thus, we need some other span<const char> to span<const uint8_t> converter. That's currently base::as_bytes.

Aside: Even to capture UTF-8, I think uint8_t vs char8_t is incorrect. "Valid UTF-8" is a property of the octet sequence, not the individual octet. To compare, Rust captures UTF-8 in &str, the whole sequence. A &str still has an underlying &[u8]. The problem is just C++ is stuck making u8"foo" be some T[N] rather than a language-level span type.

Another aside: Go has both []byte and string, but it's more about immutability than guaranteed UTF-8. It likewise makes it easy to convert between string and []byte.
https://go.dev/ref/spec#Conversions_to_and_from_a_string_type

So, with all that, what's going on with std::byte and std::as_bytes? I think C++17 was trying to address a different, more niche use in std::byte. It's not a unit of binary data, but a unit of C++ object representations in memory. Here's the original paper, and a follow-up:

In C++ (the abstract language), objects have a representation in "memory". This representation is complicated (look up "pointer provenance"), but it is exposed to the programmer. Code can memcpy random objects, view object representations by casting to char*, etc. I think std::byte was trying to be a more type-safe version of that. This also explains why std::as_bytes takes any span<T> instead of span<char>.

Assuming that's right, I strongly disagree with P0583R0. The colloquial definition of "byte" is octet, not memory unit. I also don't understand why std::byte has bit ops; seems to me bit ops are just as invalid here as arithmetic. But it is what it is.

So, what does this mean? I think:

1. We should not use std::byte for binary data.

2. We can use std::byte for object representation manipulation if those folks want it. (Malloc implementations and such.)

3. std::byte is badly named, so if we use it for object representations, we should document clearly in the style guide that it is not for binary data.

4. Either base::as_bytes should remain uint8_t, or we replace it with a new span<char>-specific function. I'm inclined to do the latter. It'll actually be more convenient to be char-specific: the template messes with type inference a bit and we have to sprinkle base::make_span (base::span with CTAD) everywhere. (A rough sketch of the char-specific version follows this list.)

5. I wish char, char8_t, and uint8_t were the same type, but that's a deeply-entwined language bug and will probably never get better. :-)
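A rough sketch of the char-specific converter from point 4, kept to std types so it stays self-contained (in Chromium it would presumably return base::span<const uint8_t>; ByteView and AsByteView are hypothetical names):

#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical pointer/length pair standing in for base::span<const uint8_t>.
struct ByteView {
  const std::uint8_t* data;
  std::size_t size;
};

// Char-specific: no template, so no type-inference surprises at call sites.
// Casting char* to uint8_t* relies on uint8_t being unsigned char, which
// Chromium already assumes (via BoringSSL).
inline ByteView AsByteView(std::string_view s) {
  return {reinterpret_cast<const std::uint8_t*>(s.data()), s.size()};
}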

On Fri, Jan 7, 2022 at 6:37 AM Jan Wilken Dörrie <jdoe...@chromium.org> wrote:
+1 to allow std::byte.

On Thu, Jan 6, 2022 at 9:14 PM Daniel Cheng <dch...@chromium.org> wrote:
Sure, but we had no choice prior to C++17 :)

I do think that allowing it means making a decision for base::as_bytes().

Assuming we eventually want to adopt std::span and its byte conversion utilities I suggest we change base::as_bytes()'s return type sooner rather than later.
 
It also matters a bit for things that take std::vectors, because we can't just cast std::vector<uint8_t> to std::vector<std::byte>.

That unfortunately is true (w/o invoking UB). At least the conversion can be done via memcpy, though.
 

Daniel

On Thu, 6 Jan 2022 at 11:39, K. Moon <km...@chromium.org> wrote:
I think converging on std::byte in as many places as makes sense might be useful, but ultimately I think that's a library design question, orthogonal to whether or not std::byte should be allowed as a language feature. There are plenty of cases where we allow a C++ feature, but it wouldn't be a good choice in every design scenario.

Where I think this becomes a valid concern is if the uncontrolled proliferation of std::byte creates harm, but I think Peter's earlier point about std::byte following the same aliasing rules as char and unsigned char actually gives it an advantage over uint8_t in that respect: casting provides a safety valve for cheaply converting from std::byte* to char* (and to std::byte* from anything else). In that respect, uint8_t was a suboptimal choice in the first place, since there's no such guarantee about its aliasing behavior.

While aliasing is not guaranteed, uint8_t should just be an alias for unsigned char on all platforms we care about. So you could statically assert that these are the same types if you rely on uint8_t aliasing.

Chromium, by way of BoringSSL, will already not compile if uint8_t is a different type from unsigned char. I suspect quite a lot of other code makes this assumption too.
 

K. Moon

Jan 7, 2022, 4:13:25 PM
to David Benjamin, Jan Wilken Dörrie, Daniel Cheng, dan...@chromium.org, Roland McGrath, Peter Kasting, cxx, Lei Zhang
It seems inconsistent to argue on one hand that uint8_t is always going to be unsigned char on all the platforms we care about, but on the other hand that std::byte somehow isn't an 8-bit unit; std::byte is defined as "enum class byte : unsigned char {}", so it necessarily follows that it's exactly the same size as unsigned char, which we've assumed is exactly the same size as uint8_t.

For all practical purposes, I think we can assume that char, signed char, unsigned char, std::byte, and uint8_t all essentially describe octets, so let's ignore that potential difference.

That just leaves whether or not the type supports arithmetic operations (char, signed char, unsigned char, uint8_t) or not (std::byte). The right type to use feels more like tip-of-the-week territory to me than something you'd put in the style guide: There are pros and cons to each type in different situations.

Incidentally, someone actually filed a bug against Clang for not applying strict aliasing rules to uint8_t (https://bugs.llvm.org/show_bug.cgi?id=31410), although the bug ultimately was closed as WONTFIX for ABI compatibility reasons.

David Benjamin

Jan 7, 2022, 4:53:52 PM
to K. Moon, Jan Wilken Dörrie, Daniel Cheng, dan...@chromium.org, Roland McGrath, Peter Kasting, cxx, Lei Zhang
Right, the issue isn't whether it'll work, given casts (we even build without strict aliasing, so most things work, setting pointer provenance questions aside), but what the type is suited for, and what to use for binary data. We're already paying a cost in having a mishmash of char vs uint8_t by needing reinterpret_cast everywhere. It's even a safety issue as reinterpret_cast<uint8_t*>(t) doesn't check that t was a character type. (base::as_bytes allows any T, but at least the size always matches.)

Adding a third type to the binary data rotation would make the problem worse, so I don't think we should do it lightly. Add in it not being very useful (lack of arithmetic ops) and the papers' references to accessing object representations, and I think it's clearly the wrong type for the job.

Roland McGrath

Jan 7, 2022, 5:58:26 PM
to David Benjamin, K. Moon, Jan Wilken Dörrie, Daniel Cheng, Dana Jansens, Peter Kasting, cxx, Lei Zhang
I've always taken the purpose of `std::byte` to be for data as an opaque blob. I frankly don't know why bitwise operations on it are in the standard; that seems useless to me. `span<byte>` is simply the modern replacement for `void *, size_t`, and `byte` is the type to use for things that are opaque except for the granularity at which they can be copied around in a well-defined manner. For any kind of data that is not meant to be wholly opaque at the layer in question, I think it's clear that `std::byte` is not appropriate. If it's textual data, then `char` may be appropriate (depending on encodings and such), but in general I think it's clear that no kind of `char` type is the spelling one should be using for non-textual data.

It so happens that `uint8_t` is actually just an alias for `unsigned char`, but IMHO it's better not to think of it that way, and to instead think of it as the thing that holds 8 bits when you want to manipulate bits (e.g. as well-defined [0,255] values with wrap-around behavior). So for "bags of data" I think that both `std::byte` and `uint8_t` have sensible uses, and `std::byte` is a laudable attempt to consolidate on a single better answer for things where actually accessing 8 bits of data you understand as individual bits in any sense was never the intention.

Having nothing that looks in the source like `char` ever be used for anything but text is a win IMHO. Having `std::byte` as the unambiguous C++-kosher thing for byte-by-byte copying, and using only that for that purpose per se, is a win IMHO. Stamping out use of `reinterpret_cast` or other kinds of type-punning that isn't fully C++17-kosher use of `std::byte` (and eschewing the standard's allowed use of `char` and `unsigned char` for that purpose) is a win IMHO.
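To make the `void *, size_t` comparison concrete, a minimal sketch (std::span is C++20 and appears only to keep the sketch standard; Chromium code would presumably use base::span, and WriteBlob is an illustrative name):

#include <cstddef>
#include <span>  // C++20; base::span plays the same role in C++17 Chromium.

// Legacy C-style shape: an opaque buffer passed as pointer plus length.
void WriteBlob(const void* data, std::size_t size);

// The "modern replacement" described above: the same opaque buffer,
// but carrying its own element type and size.
void WriteBlob(std::span<const std::byte> data);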

Honglin Yu

Jan 7, 2022, 7:44:04 PM
to Roland McGrath, David Benjamin, K. Moon, Jan Wilken Dörrie, Daniel Cheng, Dana Jansens, Peter Kasting, cxx, Lei Zhang
Just curious: if I understand it correctly, there are many mojo interfaces using "array<uint8>" as data blobs, and there seems to be no exact counterpart of std::byte in mojo's IDL yet. If we enable std::byte, does that mean we need to extend mojo too?

Roland McGrath

Jan 7, 2022, 8:12:43 PM
to Honglin Yu, David Benjamin, K. Moon, Jan Wilken Dörrie, Daniel Cheng, Dana Jansens, Peter Kasting, cxx, Lei Zhang
Issues like that are very often what make this kind of migration and consolidation on new "best practices" difficult and keep it from getting done. So that may well be a barrier here. And there's often a case to be made that if there will be barriers to migrating fully and uniformly to a new thing, then perhaps it's better not to introduce the new thing at all if you can't use it to get rid of all the old things it ideally should replace.

Honglin Yu

Jan 7, 2022, 8:41:03 PM
to Roland McGrath, David Benjamin, K. Moon, Jan Wilken Dörrie, Daniel Cheng, Dana Jansens, Peter Kasting, cxx, Lei Zhang
Yeah, changing mojo interfaces in Chrome may still be straightforward because the receiver/remote are in the same repo and can be modified at the same time (although there will be lots of work). But it will be a different story when one side of the mojo connection is in chromeOS --- we should carefully evaluate the effort needed.

Daniel Cheng

Jan 7, 2022, 8:44:45 PM
to Honglin Yu, Roland McGrath, David Benjamin, K. Moon, Jan Wilken Dörrie, Dana Jansens, Peter Kasting, cxx, Lei Zhang
We can probably treat array<uint8_t> and array<byte> as a special case of being wire compatible if needed from the Mojo perspective. My bigger concern is to make sure we have some consistency with uint8_t vs std::byte vs char so we don't just end up with more ways to do things and even more conversions marshalling things from vector<uint8_t> to vector<std::byte> to vector<char>.

Daniel 

Honglin Yu

Jan 7, 2022, 8:51:42 PM
to Daniel Cheng, Roland McGrath, David Benjamin, K. Moon, Jan Wilken Dörrie, Dana Jansens, Peter Kasting, cxx, Lei Zhang
We can probably treat array<uint8_t> and array<byte> as a special case of being wire compatible if needed from the Mojo perspective. My bigger concern is to make sure we have some consistency with uint8_t vs std::byte vs char so we don't just end up with more ways to do things and even more conversions marshalling things from vector<uint8_t> to vector<std::byte> to vector<char>.
Yeah, if we cannot do the replacement uniformly, I feel we will probably end up with that.

David Benjamin

Jan 10, 2022, 11:22:41 AM
to Honglin Yu, Daniel Cheng, Roland McGrath, K. Moon, Jan Wilken Dörrie, Dana Jansens, Peter Kasting, cxx, Lei Zhang
We definitely will not be able to do the replacement uniformly. Any APIs that need to go through C can never migrate to std::byte. std::byte is also not suitable when you actually need to interpret the values. (Even though most code, with respect to all values, octet sequences or otherwise, just passes stuff around, presumably some layer ultimately interprets them; otherwise, what are we maintaining the values for?)

This should be expected since P0583R0 describes an entirely different use case for std::byte than binary data. (Note "object representation" in this context doesn't mean some arbitrary wire format, like Mojo, but how a C++ object is stored in memory.)

K. Moon

Jan 10, 2022, 11:32:42 AM
to David Benjamin, Honglin Yu, Daniel Cheng, Roland McGrath, Jan Wilken Dörrie, Dana Jansens, Peter Kasting, cxx, Lei Zhang
What about the reverse issue, with APIs that require std::byte? This feels to me like trying to ban size_t; it's just not tenable to ban such a fundamental type, because other C++ code (including in the standard library) is going to assume it's available (and we'll end up on a tech island).

As previously discussed, I also think the in-memory object representation thing is a red herring, as we've already decided that a byte is always going to be 8 bits. Blocks of char or uint8_t or std::byte are all going to have the same problem, and none of them is suitable as a wire format.

Unlike std::any (which breaks the component build), I haven't seen a technical reason to ban std::byte yet, only "best practice" ones.

Peter Kasting

Jan 10, 2022, 11:38:01 AM
to K. Moon, David Benjamin, Honglin Yu, Daniel Cheng, Roland McGrath, Jan Wilken Dörrie, Dana Jansens, cxx, Lei Zhang
I think it's telling that the upstream style guide doesn't ban std::byte.  Maybe we should allow it with a comment that API owners should think carefully about the knock-on effects before using it anywhere?

PK

dan...@chromium.org

Jan 11, 2022, 1:02:24 PM
to K. Moon, David Benjamin, Honglin Yu, Daniel Cheng, Roland McGrath, Jan Wilken Dörrie, Peter Kasting, cxx, Lei Zhang
On Mon, Jan 10, 2022 at 11:32 AM K. Moon <km...@chromium.org> wrote:
What about the reverse issue, with APIs that require std::byte? This feels to me like trying to ban size_t; it's just not tenable to ban such a fundamental type, because other C++ code (including in the standard library) is going to assume it's available (and we'll end up on a tech island).

Any API taking a `std::byte*` would be fine to pass &object to, without requiring us to use byte ourselves.

Any API taking vector<byte> is doing it wrong, I think; that's the conclusion that follows from what byte represents. And if an API wanted to receive that, we'd have to convert from vector<thingswecanconstruct> to vector<byte> at some point anyway, so just do that at the edge, like we do when we need to pass a std::function.
 
As previously discussed, I also think the in-memory object representation thing is a red herring, as we've already decided that a byte is always going to be 8 bits. Blocks of char or uint8_t or std::byte are all going to have the same problem, and none of them is suitable as a wire format.

I don't think the point is about the size of the type. A byte is meant to be an opaque representation of an object, not a byte stream, just as uint8_t* is meant to be a byte stream, not text, and char* is meant to be text, not a byte stream. They're all the same size, but that doesn't mean they make logical sense in any given place. Our code is a mess in using these wrong, and adding byte won't improve the situation; we'll just have N+1 problems. Folks like David have spent a lot of time trying to reduce the mess between uint8_t* and char*, which is why, I presume, there are strong feelings about backsliding by introducing yet another type, one that, if used for random "blobs of data", is even harder to reason about when to use vs uint8_t. "Will I read the bytes" is a question of semi-global knowledge, and not the kind of information that necessarily makes sense in the type system in this way.

If, instead, byte* is not referring to intent to read/write the data, but instead refers to being a pointer to an object (or array of objects) rather than a byte stream, you can have a more clear rule of when to use it or not.

And in Chrome, we've already spent person-years trying to move away from char* for this purpose to uint8_t*. Is it really worth adding inconsistency and spending many more years trying to change uint8_t* to byte*, but only when you're not using the data...? I'd wager it's a poor use of our time.

Using a char* as a "pointer to anything" is not great, and using byte* to represent that makes sense there. I expect that's why it's not banned. But using it as a concrete type `byte` vs `byte*` makes about as much sense as using a `void` instead of a `void*`, except that the former compiles now.

Of course templates complicate things. span<byte> is in fact a byte* inside, but that's hidden behind an abstraction. It makes it hard to have a rule, but if I could: As a first pass, I would say you can use byte* (only as a pointer) in places where you'd have written void* or char* before. Thus span<byte> may technically be okay but makes little sense because you can't have a vector<byte>.

K. Moon

Jan 12, 2022, 1:47:43 PM
to dan...@chromium.org, David Benjamin, Honglin Yu, Daniel Cheng, Roland McGrath, Jan Wilken Dörrie, Peter Kasting, cxx, Lei Zhang
TL;DR: It seems clear that std::byte is controversial enough that it shouldn't be in the initial set of things that are allowed from C++17. That said, I'm less in love with std::byte, and more against banning things: My ideal end state is that we (eventually) ban as few things from C++17 as possible, and I don't see a good reason to exclude std::byte because we think some APIs would be better using char or uint8_t or whatever. As Peter suggested earlier, I think this could/should be an API-by-API decision.

I think what would be even better is if we clearly articulated somewhere what our end goal is, as I don't think even char vs. uint8_t is written down anywhere. (It might also be a good opportunity to get ahead of the char8_t mess in C++20.)

For reference, http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0298r3.pdf is the most recent version of P0298 (the std::byte proposal).

The Motivation and Scope section reads (emphasis mine):
Many programs require byte-oriented access to memory. Today, such programs must use either the char, signed char, or unsigned char types for this purpose. However, these types perform a “triple duty”. Not only are they used for byte addressing, but also as arithmetic types, and as character types. This multiplicity of roles opens the door for programmer error – such as accidentally performing arithmetic on memory that should be treated as a byte value – and confusion for both programmers and tools.

Having a distinct byte type improves type-safety, by distinguishing byte-oriented access to memory from accessing memory as a character or integral value. It improves readability. Having the type would also make the intent of code clearer to readers (as well as tooling for understanding and transforming programs). It increases type-safety by removing ambiguities in expression of programmer’s intent, thereby increasing the accuracy of analysis tools.

Whether or not one agrees with this analysis, the intent was to increase type safety. For example, uint8_t may be the wrong type if you actually have a buffer of int8_t, and vice versa; std::byte doesn't make that assumption, so it can safely contain either (requiring explicit conversion). You also can't assign or compare 'a' to a std::byte.
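A minimal sketch of that last point (values arbitrary):

#include <cstddef>

void TypeSafety() {
  std::byte b{0x61};
  // std::byte c = 'a';            // Does not compile: no implicit conversion from char.
  // bool eq = (b == 'a');         // Does not compile: no comparison with char.
  bool eq = (b == std::byte{'a'});  // Intent has to be spelled out explicitly.
  (void)eq;
}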

On Tue, Jan 11, 2022 at 10:02 AM <dan...@chromium.org> wrote:
On Mon, Jan 10, 2022 at 11:32 AM K. Moon <km...@chromium.org> wrote:
What about the reverse issue, with APIs that require std::byte? This feels to me like trying to ban size_t; it's just not tenable to ban such a fundamental type, because other C++ code (including in the standard library) is going to assume it's available (and we'll end up on a tech island).

Any API taking a `std::byte*` would be fine to pass &object to, without requiring us to use byte ourselves.

Any API taking vector<byte> is doing it wrong, I think; that's the conclusion that follows from what byte represents. And if an API wanted to receive that, we'd have to convert from vector<thingswecanconstruct> to vector<byte> at some point anyway, so just do that at the edge, like we do when we need to pass a std::function.

For third-party types like absl::Status, I think it's acceptable to draw that line, but if it's part of the standard library, I think the bar for that is higher. As you note, there's precedent for banning types like std::function, but even when there's a strong technical reason, it makes interop awkward. We can't predict if the standard committee is going to make some standard type we've banned a key part of some critical future API.

I don't think "vector<byte> is automatically doing it wrong" is a correct reading of the standard proposal: If you want a collection of uninterpreted bytes, vector<byte> is exactly the right type for that. The only question in my mind is how often one needs that (vs. the (u)int8_t interpretation), but I don't think the style guide should be taking a position on that.
 
As previously discussed, I also think the in-memory object representation thing is a red herring, as we've already decided that a byte is always going to be 8 bits. Blocks of char or uint8_t or std::byte are all going to have the same problem, and none of them is suitable as a wire format.

I don't think the point is about the size of the type. A byte is meant to be an opaque representation of an object, not a byte stream, just as uint8_t* is meant to be a byte stream, not text, and char* is meant to be text, not a byte stream. They're all the same size, but that doesn't mean they make logical sense in any given place. Our code is a mess in using these wrong, and adding byte won't improve the situation; we'll just have N+1 problems. Folks like David have spent a lot of time trying to reduce the mess between uint8_t* and char*, which is why, I presume, there are strong feelings about backsliding by introducing yet another type, one that, if used for random "blobs of data", is even harder to reason about when to use vs uint8_t. "Will I read the bytes" is a question of semi-global knowledge, and not the kind of information that necessarily makes sense in the type system in this way.

std::byte is allowed to access raw object representations (because it supports aliasing), but it's not solely meant for that; the proposal only has one sentence mentioning that, almost as an aside. Its main stated purpose is "byte-oriented access to memory," which is also true of arrays/buffers/strings.

I think the argument that the current situation is a mess (and we don't want to make it messier) is a strong one, but if there are any conceivable situations where std::byte is clearly the optimal type, I think that argues against a style-level ban.

If, instead, byte* is not referring to intent to read/write the data, but instead refers to being a pointer to an object (or array of objects) rather than a byte stream, you can have a more clear rule of when to use it or not.

And in Chrome, we've already spent person-years trying to move away from char* for this purpose to uint8_t*. Is it really worth adding inconsistency and spending many more years trying to change uint8_t* to byte*, but only when you're not using the data...? I'd wager it's a poor use of our time.

Using a char* as a "pointer to anything" is not great, and using byte* to represent that makes sense there. I expect that's why it's not banned. But using it as a concrete type `byte` vs `byte*` makes about as much sense as using a `void` instead of a `void*`, except that the former compiles now.

Of course templates complicate things. span<byte> is in fact a byte* inside, but that's hidden behind an abstraction. It makes it hard to have a rule, but if I could: As a first pass, I would say you can use byte* (only as a pointer) in places where you'd have written void* or char* before. Thus span<byte> may technically be okay but makes little sense because you can't have a vector<byte>.

The aliasing rules allow easy conversion between char* and std::byte* (and we'll assume uint8_t*), so I agree that consumers are the least problematic use case, as it'll always be possible to convert between these types with 0 cost. I'd be fine with starting there, and seeing whether it causes horrible problems that need to be backed out.

A span<T> can be constructed from something other than a vector<T>, so I don't agree with that part. :-) I also don't agree that you would never want a vector<byte>; I think a reasonable design could include a vector<byte> as a private member, and a public facade for interpreting those bytes in multiple ways. (Is it the best design? I'd leave that up to code review.)

dan...@chromium.org

Jan 14, 2022, 3:06:53 PM
to K. Moon, David Benjamin, Honglin Yu, Daniel Cheng, Roland McGrath, Jan Wilken Dörrie, Peter Kasting, cxx, Lei Zhang
On Wed, Jan 12, 2022 at 1:47 PM K. Moon <km...@chromium.org> wrote:
TL;DR: It seems clear that std::byte is controversial enough that it shouldn't be in the initial set of things that are allowed from C++17. That said, I'm less in love with std::byte, and more against banning things: My ideal end state is that we (eventually) ban as few things from C++17 as possible, and I don't see a good reason to exclude std::byte because we think some APIs would be better using char or uint8_t or whatever. As Peter suggested earlier, I think this could/should be an API-by-API decision.

I think what would be even better is if we clearly articulated somewhere what our end goal is, as I don't think even char vs. uint8_t is written down anywhere. (It might also be a good opportunity to get ahead of the char8_t mess in C++20.)

Here's some context on char vs uint8_t if you want to know more: https://bugs.chromium.org/p/chromium/issues/detail?id=559302 (tl;dr: a lot of code uses std::string when vector<uint8_t> would be better). Our base library struggles with this, and I see no appetite to introduce byte to //base. That probably doesn't mean it should be banned, but I wish (personally) that we had a bit more coherence between our core libraries, which implicitly push a style on the rest of the codebase, and our styleguide, which does the same and often has to choose between one thing in std:: and a different one in our own core libraries.
 

For reference, http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0298r3.pdf is the most recent version of P0298 (the std::byte proposal).

The Motivation and Scope section reads (emphasis mine):
Many programs require byte-oriented access to memory. Today, such programs must use either the char, signed char, or unsigned char types for this purpose. However, these types perform a “triple duty”. Not only are they used for byte addressing, but also as arithmetic types, and as character types. This multiplicity of roles opens the door for programmer error – such as accidentally performing arithmetic on memory that should be treated as a byte value – and confusion for both programmers and tools.

Having a distinct byte type improves type-safety, by distinguishing byte-oriented access to memory from accessing memory as a character or integral value. It improves readability. Having the type would also make the intent of code clearer to readers (as well as tooling for understanding and transforming programs). It increases type-safety by removing ambiguities in expression of programmer’s intent, thereby increasing the accuracy of analysis tools.

Whether or not one agrees with this analysis, the intent was to increase type safety. For example, uint8_t may be the wrong type if you actually have a buffer of int8_t, and vice versa; std::byte doesn't make that assumption, so it can safely contain either (requiring explicit conversion). You also can't assign or compare 'a' to a std::byte.

On Tue, Jan 11, 2022 at 10:02 AM <dan...@chromium.org> wrote:
On Mon, Jan 10, 2022 at 11:32 AM K. Moon <km...@chromium.org> wrote:
What about the reverse issue, with APIs that require std::byte? This feels to me like trying to ban size_t; it's just not tenable to ban such a fundamental type, because other C++ code (including in the standard library) is going to assume it's available (and we'll end up on a tech island).

Any API taking a `std::byte*` would be fine to pass &object to, without requiring us to use byte ourselves.

Any API taking vector<byte> is doing it wrong, I think; that's the conclusion that follows from what byte represents. And if an API wanted to receive that, we'd have to convert from vector<thingswecanconstruct> to vector<byte> at some point anyway, so just do that at the edge, like we do when we need to pass a std::function.

For third-party types like absl::Status, I think it's acceptable to draw that line, but if it's part of the standard library, I think the bar for that is higher.

I don't see a big distinction between absl and std, really. Both are standard libraries and in fact we prioritize absl over std for security/safety reasons.
 
As you note, there's precedent for banning types like std::function, but even when there's a strong technical reason, it makes interop awkward. We can't predict if the standard committee is going to make some standard type we've banned a key part of some critical future API.

We would deal with that if it happened, somehow, but hypotheticals for such a slow-moving target don't carry a lot of weight for me. This is again a weird conflict between opinionated "core library" motivations and "style guide" motivations, where the fact that we wrote a safer std::function for our purposes means the styleguide bans std::function. Any time we introduce something in //base, we should not duplicate it and should avoid using the std:: version. But when a new thing comes out in std, does that imply we drop the thing in //base? It shouldn't 100% of the time. If std::function were part of C++17, we still wouldn't take it.

I'm going in a bit of a meta direction here: we don't have a competing equivalent of std::byte in //base, but perhaps you get my point that we don't really have a coherent body that sets direction for how to write code, and sometimes there's a conflict as a result, as there is here.

I do think that if //base has no appetite for std::byte that could be a good indication that we should not use it, if having it around elsewhere will then introduce pain and suffering and codegen costs and duplication.

Peter Kasting

Jan 19, 2022, 8:08:48 PM
to dan...@chromium.org, K. Moon, David Benjamin, Honglin Yu, Daniel Cheng, Roland McGrath, Jan Wilken Dörrie, cxx, Lei Zhang
It doesn't seem like we have any consensus here.

Dana mentions that "//base has no appetite for std::byte".  I'd like to understand whether that means "we are opposed because using it would be detrimental" or "we might support it in theory, but the benefit doesn't seem to justify the cost of trying to update all the APIs" or "we have too much on our plates to really think deeply about this at all".  (I know thestig@, for example, is in the third camp.)

I tend to feel:
* The Google styleguide doesn't ban std::byte, and I haven't heard a compelling reason why Chrome is different.
* A byte is not the same as an octet.
* Bag-of-binary-data APIs should in principle use std::byte instead of std::uint8_t, because it is semantically more correct and eliminates some types of aliasing problems.
* Bag-of-not-known-to-be-text-data APIs should in principle use std::byte instead of char/string, because it is semantically more correct and may reduce casting given the above bullet.
* The practical benefit of eliminating aliasing problems is small given that Chrome will never compile with strict aliasing on.
* Changes to core APIs are viral and thus harder.
* Inconsistencies in APIs are stumbling blocks to readability.

Personal conclusion: Allow std::byte; see if someone (e.g. me) has appetite to audit and convert a large fraction of applicable APIs to use it; either convert all those or don't convert anything.

PK

dan...@chromium.org

Jan 20, 2022, 9:46:33 AM
to Peter Kasting, K. Moon, David Benjamin, Honglin Yu, Daniel Cheng, Roland McGrath, Jan Wilken Dörrie, cxx, Lei Zhang
On Wed, Jan 19, 2022 at 8:08 PM Peter Kasting <pkas...@google.com> wrote:
It doesn't seem like we have any consensus here.

Dana mentions that "//base has no appetite for std::byte".  I'd like to understand whether that means "we are opposed because using it would be detrimental" or "we might support it in theory, but the benefit doesn't seem to justify the cost of trying to update all the APIs" or "we have too much on our plates to really think deeply about this at all".  (I know thestig@, for example, is in the third camp.)

I think davidben@ is the domain expert here and I would defer to him about using it instead of string/span<uint8_t>/etc. But given the amount of code using string still, it doesn't seem feasible to try to force the conversion to byte. We already failed to get everything to uint8_t instead of char. So, we end up with 3 options instead of 2.

Roland Bock

Jan 20, 2022, 9:57:38 AM
to danakj, Peter Kasting, K. Moon, David Benjamin, Honglin Yu, Daniel Cheng, Roland McGrath, Jan Wilken Dörrie, cxx, Lei Zhang
On Thu, Jan 20, 2022 at 3:46 PM <dan...@chromium.org> wrote:
On Wed, Jan 19, 2022 at 8:08 PM Peter Kasting <pkas...@google.com> wrote:
It doesn't seem like we have any consensus here.

Dana mentions that "//base has no appetite for std::byte".  I'd like to understand whether that means "we are opposed because using it would be detrimental" or "we might support it in theory, but the benefit doesn't seem to justify the cost of trying to update all the APIs" or "we have too much on our plates to really think deeply about this at all".  (I know thestig@, for example, is in the third camp.)

I think davidben@ is the domain expert here and I would defer to him about using it instead of string/span<uint8_t>/etc. But given the amount of code using string still, it doesn't seem feasible to try to force the conversion to byte. We already failed to get everything to uint8_t instead of char. So, we end up with 3 options instead of 2.

3 instead of 2 might actually be helpful, if and when some third party libraries start using std::byte.

To me this sounds like we should allow std::byte, but not try to forcefully convert everything.
 


I tend to feel:
* The Google styleguide doesn't ban std::byte, and I haven't heard a compelling reason why Chrome is different.
* A byte is not the same as an octet.
* Bag-of-binary-data APIs should in principle use std::byte instead of std::uint8_t, because it is semantically more correct and eliminates some types of aliasing problems.
* Bag-of-not-known-to-be-text-data APIs should in principle use std::byte instead of char/string, because it is semantically more correct and may reduce casting given the above bullet.
* The practical benefit of eliminating aliasing problems is small given that Chrome will never compile with strict aliasing on.
* Changes to core APIs are viral and thus harder.
* Inconsistencies in APIs are stumbling blocks to readability.

Personal conclusion: Allow std::byte; see if someone (e.g. me) has appetite to audit and convert a large fraction of applicable APIs to use it; either convert all those or don't convert anything.

PK


dan...@chromium.org

Jan 20, 2022, 9:59:08 AM
to Roland Bock, Peter Kasting, K. Moon, David Benjamin, Honglin Yu, Daniel Cheng, Roland McGrath, Jan Wilken Dörrie, cxx, Lei Zhang
On Thu, Jan 20, 2022 at 9:57 AM Roland Bock <rb...@google.com> wrote:
On Thu, Jan 20, 2022 at 3:46 PM <dan...@chromium.org> wrote:
On Wed, Jan 19, 2022 at 8:08 PM Peter Kasting <pkas...@google.com> wrote:
It doesn't seem like we have any consensus here.

Dana mentions that "//base has no appetite for std::byte".  I'd like to understand whether that means "we are opposed because using it would be detrimental" or "we might support it in theory, but the benefit doesn't seem to justify the cost of trying to update all the APIs" or "we have too much on our plates to really think deeply about this at all".  (I know thestig@, for example, is in the third camp.)

I think davidben@ is the domain expert here and I would defer to him about using it instead of string/span<uint8_t>/etc. But given the amount of code using string still, it doesn't seem feasible to try to force the conversion to byte. We already failed to get everything to uint8_t instead of char. So, we end up with 3 options instead of 2.

3 instead of 2 might actually be helpful, if and when some third party libraries start using std::byte.

I would very much expect first-party code to cause this long before third-party code.