bit_cast?

1,460 views
Skip to first unread message

Avi Drissman

unread,
Nov 28, 2023, 6:48:26 PM11/28/23
to cxx
C++20 has std::bit_cast. We do not yet have a bug for converting to it. base::bit_cast uses the compiler intrinsics, and notes that NaCl may require its own implementation, but AFAIK NaCl supports C++20.

Any objections to migrating to the standard version?

Avi

Anton Bikineev

unread,
Nov 29, 2023, 4:56:36 AM11/29/23
to Avi Drissman, cxx
SGTM (and we also get the missing [[nodiscard]] as a bonus :) )

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/cxx/CACWgwAZAVW-B61MXUrCQVZaFeAo-jp5n9KOGX3gTJkJeVZATgg%40mail.gmail.com.

Dana Jansens

unread,
Dec 1, 2023, 1:26:07 PM12/1/23
to Anton Bikineev, Avi Drissman, cxx

Avi Drissman

unread,
Dec 1, 2023, 1:30:54 PM12/1/23
to Dana Jansens, Anton Bikineev, cxx
Thanks!

Tracking in https://crbug.com/1506769. I'm planning on leaving /v8 alone, and while I already have a conversion ready for PA they've asked me to put that on hold until they can figure out the PDFium situation.

Dana Jansens

unread,
Dec 1, 2023, 1:35:09 PM12/1/23
to Avi Drissman, Anton Bikineev, cxx
Yeah, PA is still C++17 since it's got a wider audience than Chrome. But they also don't use //base, so they should have their own impl if they need it.

Avi Drissman

unread,
Dec 1, 2023, 1:41:21 PM12/1/23
to Dana Jansens, Anton Bikineev, cxx
👀

I have landed several C++20 changes to PA and PA's copy of base. I sent them for review. Was that something I should not have done?

Dana Jansens

unread,
Dec 1, 2023, 2:09:33 PM12/1/23
to Avi Drissman, Anton Bikineev, cxx
On Fri, Dec 1, 2023 at 1:41 PM Avi Drissman <a...@google.com> wrote:
👀

I have landed several C++20 changes to PA and PA's copy of base. I sent them for review. Was that something I should not have done?

Maybe. AFAIK there's an explicit desire to support c++17 there. I guess the "figure out the PDFium situation" should hopefully address this.

Avi Drissman

unread,
Dec 1, 2023, 3:23:26 PM12/1/23
to Dana Jansens, Anton Bikineev, cxx
I did a quick pass of the uses of bit_cast in our source, and I'm not sure I understand a lot of uses.

There are plenty of uses on pointers (e.g.):

Is that a useful cast? We shouldn't be accessing data through a pointer of the wrong type, but I thought you had to use bit_cast on the actual data, not on the pointer itself.

Avi

Peter Kasting

unread,
Dec 1, 2023, 3:28:04 PM12/1/23
to Avi Drissman, Dana Jansens, Anton Bikineev, cxx
Yeah, that's likely not sane, for the reason you identify.

PK

Dana Jansens

unread,
Dec 1, 2023, 3:31:18 PM12/1/23
to Peter Kasting, Avi Drissman, Anton Bikineev, cxx
pointers should be going through static_cast or reinterpret_cast. Might be nice if bit_cast prevented use on pointers :/

Avi Drissman

unread,
Dec 1, 2023, 3:33:26 PM12/1/23
to Dana Jansens, Peter Kasting, Anton Bikineev, cxx
static/reinterpret cast unless we're not talking about void or char pointers in which case they should be dereferenced and have bit_cast used.

I may well have to do a preparatory audit to fix the usage.

John Admanski

unread,
Dec 1, 2023, 3:35:21 PM12/1/23
to Dana Jansens, Peter Kasting, Avi Drissman, Anton Bikineev, cxx
Unfortunately I think it's not very unusual for people to think of bit_cast as just "like reinterpret_cast, but now it's safe!" without really understanding the underlying issue. So you get code that bit casts an X* to a Y* without really understanding that this does nothing to address the underlying issue, that you're using an X as if it's a Y.

Avi Drissman

unread,
Dec 1, 2023, 3:41:39 PM12/1/23
to John Admanski, Dana Jansens, Peter Kasting, Anton Bikineev, cxx
Indeed.

I don't know if this is something that we need Yet Another Presubmit for. :/

(FYI, my earlier email should be a char, unsigned char, or std::byte pointer; see https://eel.is/c++draft/basic.lval#11.3 for the spec.)

Avi Drissman

unread,
Dec 1, 2023, 3:50:36 PM12/1/23
to John Admanski, Dana Jansens, Peter Kasting, Anton Bikineev, cxx
At a quick glance, no one has any idea how to use this. I'm seeing this in google3 code imported in third_party. This is all Just Wrong.

Avi

Dana Jansens

unread,
Dec 1, 2023, 4:04:17 PM12/1/23
to Avi Drissman, John Admanski, Peter Kasting, Anton Bikineev, cxx
Since changing the API is out of scope maybe a clang-tidy could be proposed upstream.

Avi Drissman

unread,
Dec 1, 2023, 4:08:12 PM12/1/23
to Dana Jansens, John Admanski, Peter Kasting, Anton Bikineev, cxx
That would happen in the Clang bug tracker? Do they require an implementation, or would just a writeup of the problem do?

Dana Jansens

unread,
Dec 1, 2023, 4:41:14 PM12/1/23
to Avi Drissman, John Admanski, Peter Kasting, Anton Bikineev, cxx
On Fri, Dec 1, 2023 at 4:08 PM Avi Drissman <a...@google.com> wrote:
That would happen in the Clang bug tracker? Do they require an implementation, or would just a writeup of the problem do?

It would yeah. Filing a bug is a start but I wouldn't expect anyone else to write the implementation.

Avi Drissman

unread,
Dec 1, 2023, 4:42:29 PM12/1/23
to Dana Jansens, John Admanski, Peter Kasting, Anton Bikineev, cxx
OK, I did a quick audit of the ~2500 calls in ~400 files, and I honestly am shooketh. Seriously rattled.


There is a lot of usage of "bit cast this pointer into a different pointer type and read/write through it". A lot. By a lot of folks who are knowledgeable.

I can imagine that bit_cast is the only way to get a pointer with one set of bits to another in some of these cases. But I don't see how that fixes the UB issue. Am I missing some subtle way that bit casting a pointer is OK?

Avi

Peter Kasting

unread,
Dec 1, 2023, 4:50:23 PM12/1/23
to Avi Drissman, Dana Jansens, John Admanski, Anton Bikineev, cxx
You are not. The absl::bit_cast documentation even explicitly notes that bit_casting a pointer and then dereffing it is no better than using reinterpret_cast (both are UB). People who wanted this should have been reaching for memcpy() in many cases.

PK

Avi Drissman

unread,
Dec 1, 2023, 4:53:30 PM12/1/23
to Peter Kasting, Dana Jansens, John Admanski, Anton Bikineev, cxx
Do I file upstream bugs against tflite? tflite's third_party dependency xla? v8? 

John Admanski

unread,
Dec 1, 2023, 4:54:06 PM12/1/23
to Avi Drissman, Dana Jansens, Peter Kasting, Anton Bikineev, cxx
No, I don't think you're missing anything. There's not really any reason to ever bit_cast from one pointer type to another, or between an integer (like uintptr_t) and a pointer. That's already valid to do with reinterpret_cast, and so bit_cast doesn't add anything. It also doesn't add any benefit, as bit_cast does nothing to make dereferencing the casted-to pointer any safer.

FWIW I'll note that the abseil version of bit_cast explicitly calls this out as well, although that was only added last year. But the previous version of the documentation was a lot more vague, and also suggested using it for several cases where it doesn't really add any benefit over reinterpret_cast.

Dana Jansens

unread,
Dec 1, 2023, 4:57:02 PM12/1/23
to Avi Drissman, John Admanski, Peter Kasting, Anton Bikineev, cxx
On Fri, Dec 1, 2023 at 4:42 PM Avi Drissman <a...@google.com> wrote:
OK, I did a quick audit of the ~2500 calls in ~400 files, and I honestly am shooketh. Seriously rattled.


There is a lot of usage of "bit cast this pointer into a different pointer type and read/write through it". A lot. By a lot of folks who are knowledgeable.

I can imagine that bit_cast is the only way to get a pointer with one set of bits to another in some of these cases. But I don't see how that fixes the UB issue. Am I missing some subtle way that bit casting a pointer is OK?


26.5.3 Function template bit_cast says nothing about casting pointers.

6.7.5.4.3 Safely-derived pointers

A pointer value is a safely-derived pointer to an object with dynamic storage duration only if the pointer value has an object pointer type and is one of the following:
- a bunch of things that do include reinterpret_cast, but don't include bit_cast. Also it does allow conversion through the following 4 sections:

7.3.11 Pointer conversions
- these are implicit conversions, not bit_cast

7.6.1.3 Explicit type conversion (functional notation)
- these are explicit conversions, not bit_cast

7.6.1.8 Static cast
- not bit_cast

7.6.3 Explicit type conversion (cast notation)
- 2. An explicit type conversion can be expressed using functional notation (7.6.1.3), a type conversion operator (dynamic_cast, static_cast, reinterpret_cast, const_cast), or the cast notation.
- Talks more about casts, but not bit_cast.

AFAICT this is indeed meaning that:
- bit_cast of a pointer makes it not safely-derived.
- bit_cast makes the conversion constexpr which reinterpret_cast does not which is... bad.

On safely-derived pointers:

An implementation may have relaxed pointer safety, in which case the validity of a pointer value does not depend on whether it is a safely-derived pointer value. Alternatively, an implementation may have strict pointer safety, in which case a pointer value referring to an object with dynamic storage duration that is not a safely-derived pointer value is an invalid pointer value unless the referenced complete object has previously been declared reachable.

Avi Drissman

unread,
Dec 1, 2023, 5:08:29 PM12/1/23
to Dana Jansens, John Admanski, Peter Kasting, Anton Bikineev, cxx
Note that the entire section about "safely derived pointers" was removed by P2186R0 from the current standard as part of the removal of the GC support.

Dana Jansens

unread,
Dec 1, 2023, 6:00:11 PM12/1/23
to Avi Drissman, John Admanski, Peter Kasting, Anton Bikineev, cxx
On Fri, Dec 1, 2023 at 5:08 PM Avi Drissman <a...@google.com> wrote:
Note that the entire section about "safely derived pointers" was removed by P2186R0 from the current standard as part of the removal of the GC support.

Oh nice. Then I guess the other sections stand on their own and none of them give bit_cast as a way to convert a pointer, even though they list out the other casts.

Adam Rice

unread,
Dec 6, 2023, 2:12:49 AM12/6/23
to Dana Jansens, Avi Drissman, John Admanski, Peter Kasting, Anton Bikineev, cxx
I propose we ban std::bit_cast and make base::bit_cast static_assert telling you to use reinterpret_cast if you try to bit_cast between pointers or between pointers and {u,}intptr_t,

Peter Boström

unread,
Dec 6, 2023, 2:45:06 AM12/6/23
to Adam Rice, Dana Jansens, Avi Drissman, John Admanski, Peter Kasting, Anton Bikineev, cxx
Would it make more sense to pursue a -Werror-prone-bit-cast or put something in the chromium clang plugin (if anyone's willing to do so)? Using a base::bit_cast to introduce a compiler warning seems hacky.

Hans Wennborg

unread,
Dec 6, 2023, 4:39:51 AM12/6/23
to Peter Boström, Adam Rice, Dana Jansens, Avi Drissman, John Admanski, Peter Kasting, Anton Bikineev, cxx
I think it'd be worth filing this as a suggestion for a regular Clang warning, if nothing else to get a discussion going. (I suppose it may well end up as a clang-tidy warning instead though.)

Dana Jansens

unread,
Dec 6, 2023, 11:53:02 AM12/6/23
to Adam Rice, Avi Drissman, John Admanski, Peter Kasting, Anton Bikineev, cxx
On Wed, Dec 6, 2023 at 2:12 AM Adam Rice <ri...@chromium.org> wrote:
I propose we ban std::bit_cast and make base::bit_cast static_assert telling you to use reinterpret_cast if you try to bit_cast between pointers or between pointers and {u,}intptr_t,

I would support it, if we are not going to end up with a warning that we can enforce in a short time.

Avi Drissman

unread,
Dec 6, 2023, 12:09:51 PM12/6/23
to Dana Jansens, Adam Rice, John Admanski, Peter Kasting, Anton Bikineev, cxx
reinterpret_casting of pointers is also in most cases UB, though I suppose that's a different issue.

Avi Drissman

unread,
Dec 6, 2023, 12:18:59 PM12/6/23
to Dana Jansens, Adam Rice, John Admanski, Peter Kasting, Anton Bikineev, cxx
PTAL at my audit of bit_cast use. We could add static_assert (or concepts or whatever) to base::bit_cast, but that would break a ton of code, very hard. This architectural issue goes very deep, and would require the redesign of core 3p libraries from scratch.

Peter Kasting

unread,
Dec 6, 2023, 1:30:07 PM12/6/23
to Hans Wennborg, Peter Boström, Adam Rice, Dana Jansens, Avi Drissman, John Admanski, Anton Bikineev, cxx
On Wed, Dec 6, 2023 at 1:39 AM Hans Wennborg <ha...@chromium.org> wrote:
I think it'd be worth filing this as a suggestion for a regular Clang warning, if nothing else to get a discussion going. (I suppose it may well end up as a clang-tidy warning instead though.)

Avi Drissman

unread,
Dec 6, 2023, 1:40:45 PM12/6/23
to Peter Kasting, Hans Wennborg, Peter Boström, Adam Rice, Dana Jansens, John Admanski, Anton Bikineev, cxx
To be clear, I don't want to say that "most" usage in Chromium is dubious. Lots of it is, but there is actually a decent amount that looks correct at first glance.

Peter Kasting

unread,
Dec 6, 2023, 1:43:06 PM12/6/23
to Avi Drissman, Hans Wennborg, Peter Boström, Adam Rice, Dana Jansens, John Admanski, Anton Bikineev, cxx
On Wed, Dec 6, 2023 at 10:40 AM Avi Drissman <a...@google.com> wrote:
To be clear, I don't want to say that "most" usage in Chromium is dubious. Lots of it is, but there is actually a decent amount that looks correct at first glance.

Please do correct any misstatements I make on bugs :)

PK

Dana Jansens

unread,
Dec 6, 2023, 1:43:38 PM12/6/23
to Avi Drissman, Adam Rice, John Admanski, Peter Kasting, Anton Bikineev, cxx
On Wed, Dec 6, 2023 at 12:19 PM Avi Drissman <a...@google.com> wrote:
PTAL at my audit of bit_cast use. We could add static_assert (or concepts or whatever) to base::bit_cast, but that would break a ton of code, very hard. This architectural issue goes very deep, and would require the redesign of core 3p libraries from scratch.

Why do you say a redesign would be needed, as opposed to using reinterpret_cast and/or memcpy?

Avi Drissman

unread,
Dec 6, 2023, 1:47:41 PM12/6/23
to Dana Jansens, Adam Rice, John Admanski, Peter Kasting, Anton Bikineev, cxx
OK, being fair, I don't know. I was thinking on the more extreme end, where the pointer requirements of a library might not be able to be satisfied without involving UB. But each of these would probably need to be fully thought through; I don't see a quick fix for many of these uses.

Avi Drissman

unread,
Dec 6, 2023, 2:45:49 PM12/6/23
to Dana Jansens, Adam Rice, John Admanski, Peter Kasting, Anton Bikineev, cxx
For giggles I uploaded https://chromium-review.googlesource.com/c/chromium/src/+/5094596 with the static asserts as proposed. This does not come close to building locally, and I'm sure will have plenty of red trybots.

Avi Drissman

unread,
Dec 8, 2023, 3:28:46 PM12/8/23
to Dana Jansens, Adam Rice, John Admanski, Peter Kasting, Anton Bikineev, cxx
Still building https://crrev.com/c/5094596. What's fun is how bit_cast on pointers magically makes const go away. It's like a C cast but so much worse.

Avi Drissman

unread,
Dec 8, 2023, 5:41:28 PM12/8/23
to Dana Jansens, Adam Rice, John Admanski, Peter Kasting, Anton Bikineev, cxx
OK, https://chromium-review.googlesource.com/c/chromium/src/+/5094596 seems to be about where I want it to be. Do we have consensus to ban std::bit_cast, and land something like this CL?

I'll then write up a PSA.

Dana Jansens

unread,
Dec 8, 2023, 5:57:37 PM12/8/23
to Avi Drissman, Adam Rice, John Admanski, Peter Kasting, Anton Bikineev, cxx
LGTM yeah, and thank you for fixing these.

David Benjamin

unread,
Jan 29, 2024, 6:11:14 PMJan 29
to Dana Jansens, Avi Drissman, Adam Rice, John Admanski, Peter Kasting, Anton Bikineev, cxx
To circle back to this thread, I think we may have overblown the pitfalls of bit_cast with pointers. First, std::bit_cast does not let you play tricks with pointers in constant evaluation, because it's only constexpr when pointers aren't involved. (Which makes sense because I don't think the compiler could play tricks with pointers in constant evaluation if it wanted to. Pointers don't have addresses yet.)

We also don't build on a platform where T* and U* have different representations, so the fact that the representations are changing bit-by-bit doesn't make much of a difference here. Which means I think bit_cast<T*> isn't practically the same as reinterpret_cast<T*> here. (reinterpret_cast between pointers is defined as a pair of static casts: https://eel.is/c++draft/expr.reinterpret.cast#7 )

One edge case I can see: as the pointer provenance [archive.org because it's offline for me right now] mess firms up, there'll be a difference there. I'd expect the pair of static casts to be provenance-preserving, while the bit-by-bit conversion of bit_cast presumably isn't. I assume that'll settle to PNVI-ae-udi's "angelic nondeterminism" semantics and end up saying that it's the same as turning it into an integer and back. I found this paper which proposes as much. If so, that'll, at worst, cause a missed optimization, but even that would surprise me.

Anyway, I think all this means that bit_cast with pointers is "merely" silly. The programmer should have written reinterpret_cast, but reinterpret_cast on pointers is as dangerous as bit_cast (subclassing gets messed up, most pointer casts aren't allowed, the disaster that is char vs unsigned char vs uint8_t vs char8_t vs std::byte adds too much noise, etc).

If we want our casts to be a bit more uniform, maybe we get a clang-tidy or warning about silly uses of bit_cast, I dunno. I don't think we really need a wrapper.

John Admanski

unread,
Jan 29, 2024, 6:26:41 PMJan 29
to David Benjamin, Dana Jansens, Avi Drissman, Adam Rice, Peter Kasting, Anton Bikineev, cxx
Maybe I'm misunderstanding what's overblown? The problem as far as I know is that there are cases where people are trying to fix unsafe T* -> U* casts by using bit_cast on the pointers as an alternative to reinterpret_cast which does not in fact fix the underlying aliasing issues.

I don't know if it's actually more dangerous in some way than just doing a reinterpret_cast, but it does much more strongly suggest that someone was using the cast without understanding how it's actually supposed to be used.

David Benjamin

unread,
Jan 29, 2024, 11:17:25 PMJan 29
to John Admanski, Dana Jansens, Avi Drissman, Adam Rice, Peter Kasting, Anton Bikineev, cxx
So, some context is that it was asserted to me that this thread concluded std::bit_set is less safe and requires more manual auditing than base::bit_set. I do not think that's true.

We certainly should encourage reinterpret_cast over bit_cast for pointers, but we shouldn't tell ourselves the reason is unsafety or aliasing. It seems equally unsafe and doesn't do anything for aliasing. It's just we shouldn't have two ways to do the same thing, and bit_cast for pointers is pointless (hah). That sounds like a clang-tidy warning.

In particular, I think it's far more likely that programmers are unaware of the aliasing rules, rather than thinking bit_cast absolves them of it. If bit_cast didn't work, they would just use reinterpret_cast, not rethink the function.

Indeed, looking at the links in Peter's audit, every single one of them is just casting between char* and uint8_t*. There is no[*] an aliasing concern here in the first place. (Also Chromium builds with -fno-strict-aliasing.) It's just to work around the disaster that is char vs unsigned char vs signed char vs int8_t vs uint8_t. C++ has too many byte types and, with std::byte and char8_t, they have just made it worse.

To improve the aliasing problem, we need to fix the underlying cause, which is that needing to cast pointers is a commonplace event in our codebase. In my experience, the two most common causes are, in order:

1. The above byte types problem. We don't have a time machine, so I don't think there's any real way out of this one. Signed octet types are unusable, so binary data needs to work with unsigned char or uint8_t. But viewing a string as octets is a common-place operation, and both std::string and string literals use char. Even with -funsigned-char, char and unsigned char are still distinct types. Unless those underlying language rules budge, we will need to cast. So the best we can do is provide convenient helpers to go between all pairs of these, and make them easier to use than the casts. We have base::as_bytes, but that's not quite enough because it only goes one direction. I think we need to stop pretending that "char*" will ever mean "valid UTF-8" and just freely allow u8 spans to go to char spans.

2. Often people cast pointers to decode binary structures. This is invalid by both strict aliasing and alignment rules. Hopefully the spanification work will help establish better patterns here. But as long as problem 1 remains, casts will be too commonplace to get people to recognize that sometimes pointer casts are dangerous.

David

[*] Yes, it is legal to make a C++ implementation where uint8_t is a distinct type from unsigned char, and then make it non-aliasing. This is irrelevant to Chromium. None of our supported C++ implementations do this, and it would be completely unreasonable for an implementation to make that choice. uint8_t is simply an alias for unsigned char as far as we're concerned.


Adam Rice

unread,
Jan 30, 2024, 3:50:41 AMJan 30
to David Benjamin, John Admanski, Dana Jansens, Avi Drissman, Peter Kasting, Anton Bikineev, cxx
If it makes people stop and think about what they are doing, then base::bit_cast is valuable.

> I think we need to stop pretending that "char*" will ever mean "valid UTF-8" and just freely allow u8 spans to go to char spans.

I disagree. Mojo requires strings to be UTF-8, and this is a valuable security precaution. As such, we should continue to push back against the use of char* for non-UTF-8, at least at interface boundaries. //net seemed to be a lost cause, but some progress has been made on IOBuffer, so there is hope.
Reply all
Reply to author
Forward
0 new messages