WTF Vector, String size_t and unsigned

83 views
Skip to first unread message

Dave Tapuska

unread,
Jun 12, 2018, 9:44:44 AM6/12/18
to platform-architecture-dev
In my down times of other work I've been plugging away at trying to reduce some of the warnings in blink with regards to 64 bit to 32 bit truncation.

Now the main issues left are those with Vector and String. I thought I'd ask about guidance here before I attempt any patches.

Vector and String are using unsigned as the backing store size of the data but on an API level they are kind of incongruent.

For example String takes an unsigned value yet returns a size_t for find.
Likewise for Vector there is an implicit expansion to size_t on the size() API.

These cause all kinds of truncation errors because consumers of the API might be using unsigned indexes but the return value is a size_t.

Thoughts on this? I'm on the fence either way. The API could either solely use size_t (although I don't know for what purpose) or they could be converted to unsigned (32 bit always, perhaps there are perf benefits?)

With my OO hat on I'd argue that VectorBufferBase should really be defined having a template class attribute for the storage. But that is likely overkill.

I found an old bug yosin@ opened up before.

dave.

Jeremy Roman

unread,
Jun 12, 2018, 11:08:52 AM6/12/18
to Dave Tapuska, platform-architecture-dev
On Tue, Jun 12, 2018 at 9:44 AM, Dave Tapuska <dtap...@chromium.org> wrote:
In my down times of other work I've been plugging away at trying to reduce some of the warnings in blink with regards to 64 bit to 32 bit truncation.

Now the main issues left are those with Vector and String. I thought I'd ask about guidance here before I attempt any patches.

Vector and String are using unsigned as the backing store size of the data but on an API level they are kind of incongruent.

For example String takes an unsigned value yet returns a size_t for find.
Likewise for Vector there is an implicit expansion to size_t on the size() API.

These cause all kinds of truncation errors because consumers of the API might be using unsigned indexes but the return value is a size_t.

Just compile warnings, or are you seeing actual problems at runtime? I had a vague impression that we never allocate a StringImpl or vector buffer that is >4 GiB large.
 
Thoughts on this? I'm on the fence either way. The API could either solely use size_t (although I don't know for what purpose) or they could be converted to unsigned (32 bit always, perhaps there are perf benefits?)

I'd lean mildly toward unsigned in general. I think some of those were changed to correspond more closely with STL types, but I doubt it makes much of a difference.
 
With my OO hat on I'd argue that VectorBufferBase should really be defined having a template class attribute for the storage. But that is likely overkill.

There isn't enough template magic yet in Vector/VectorBuffer/VectorBufferBase/PartitionAllocator/etc? :P
 
I found an old bug yosin@ opened up before.

dave.

--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architecture-dev+unsub...@chromium.org.
To post to this group, send email to platform-architecture-dev@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/platform-architecture-dev/CAHXv1w%3D1_WPgAcKqAH%2BBpjoKtRTSVXbxcuDNTaAPJbJ6SjF4nA%40mail.gmail.com.

Dave Tapuska

unread,
Jun 12, 2018, 11:27:05 AM6/12/18
to Jeremy Roman, platform-architecture-dev
No it is just compile warnings. You can't allocate a size larger than 4GiB. One more thing to add personally I'd prefer using uint32_t over unsigned to make it explicitly clear. 


On Tue, Jun 12, 2018 at 11:08 AM Jeremy Roman <jbr...@chromium.org> wrote:
On Tue, Jun 12, 2018 at 9:44 AM, Dave Tapuska <dtap...@chromium.org> wrote:
In my down times of other work I've been plugging away at trying to reduce some of the warnings in blink with regards to 64 bit to 32 bit truncation.

Now the main issues left are those with Vector and String. I thought I'd ask about guidance here before I attempt any patches.

Vector and String are using unsigned as the backing store size of the data but on an API level they are kind of incongruent.

For example String takes an unsigned value yet returns a size_t for find.
Likewise for Vector there is an implicit expansion to size_t on the size() API.

These cause all kinds of truncation errors because consumers of the API might be using unsigned indexes but the return value is a size_t.

Just compile warnings, or are you seeing actual problems at runtime? I had a vague impression that we never allocate a StringImpl or vector buffer that is >4 GiB large.
 
Thoughts on this? I'm on the fence either way. The API could either solely use size_t (although I don't know for what purpose) or they could be converted to unsigned (32 bit always, perhaps there are perf benefits?)

I'd lean mildly toward unsigned in general. I think some of those were changed to correspond more closely with STL types, but I doubt it makes much of a difference.
 
With my OO hat on I'd argue that VectorBufferBase should really be defined having a template class attribute for the storage. But that is likely overkill.

There isn't enough template magic yet in Vector/VectorBuffer/VectorBufferBase/PartitionAllocator/etc? :P
 
I found an old bug yosin@ opened up before.

dave.

--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architect...@chromium.org.
To post to this group, send email to platform-arc...@chromium.org.

Dmitry Gozman

unread,
Jun 12, 2018, 12:41:58 PM6/12/18
to Dave Tapuska, Jeremy Roman, platform-architecture-dev
I have no detailed knowledge here, but from a contributor standpoint aligning with STL sounds better.

- Dmitry

Dave Tapuska

unread,
Jun 12, 2018, 12:56:52 PM6/12/18
to Dmitry Gozman, Jeremy Roman, platform-architecture-dev
Aligning with STL is to template it. As vector returns Allocatior::size_type and you can define that to be whatever you like. A lot of people use size_t with a vector as the default but generally indexes and what not with vectors should be using vector::size_type.


Chris Palmer

unread,
Jun 12, 2018, 1:46:51 PM6/12/18
to dtap...@chromium.org, dgo...@chromium.org, Jeremy Roman, platform-architecture-dev
My view is:

  • No more template magic please
  • Use size_t at the API, and (if it really does save you struct size), uint32_t internally
    • Have we gained all the struct size wins we can, btw? Packing, ordering, bitfields instead of 5 bools?
  • Check for truncation internally; it's an implementation detail; don't make callers have to worry about it
  • uint32_t for clarity; the vaguely-sized types are bad and should feel bad
    • I suspect maybe they do feel bad. They know what they've done.

Kentaro Hara

unread,
Jun 12, 2018, 8:13:34 PM6/12/18
to Chris Palmer, Dave Tapuska, Dmitry Gozman, Jeremy Roman, platform-architecture-dev
A couple of thoughts:

- Vector and String cannot support >4 GB allocations in any way because PartitionAlloc & Oilpan don't support them.

- I want to avoid doing a size_t => uint32_t truncation in the API because it adds one if branch and also increases the binary size (Note: most of the String / Vector APIs are inlined).

- So, I'd prefer using the same type (size_t or uint32_t) both at the API and internally.




--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architect...@chromium.org.
To post to this group, send email to platform-arc...@chromium.org.


--
Kentaro Hara, Tokyo, Japan

Yuta Kitamura

unread,
Jun 13, 2018, 1:10:57 AM6/13/18
to Kentaro Hara, Chris Palmer, dtap...@chromium.org, Dmitry Gozman, Jeremy Roman, platform-architecture-dev
I would say let's use just unsigned. unsigned is our size_t, essentially.

Chris Palmer

unread,
Jun 13, 2018, 1:13:35 PM6/13/18
to Yuta Kitamura, Kentaro Hara, Dave Tapuska, Dmitry Gozman, Jeremy Roman, platform-architecture-dev
On Tue, Jun 12, 2018 at 10:10 PM Yuta Kitamura <yu...@chromium.org> wrote:

I would say let's use just unsigned. unsigned is our size_t, essentially.

I'd really, really like the clarity that the name `uint32_t` provides, though.

Dave Tapuska

unread,
Jun 25, 2018, 10:17:52 AM6/25/18
to pal...@chromium.org, Yuta Kitamura, Kentaro Hara, Dmitry Gozman, Jeremy Roman, platform-architecture-dev
I've looked more at this and I think typedef'ng a "wtf_size_t" first  and then all the callsite that are using size_t can be changed.

There are a number of sites in layout, paint that all return a size_t but really just end up fetching this from a vector (that would need to be changed to deal with a uint32_t instead). And if vector returns a uint32_t all of these sites need to be changed. 

So I propose that first I define a wtf_size_t as size_t, change all the call sites that are using size_t or unsigned right now. Then I can redefine wtf_size_t to be a uin32_t. Does anyone have any objections to this? Or ideas for an alternate name?

dave.

Yuta Kitamura

unread,
Jun 26, 2018, 5:43:17 AM6/26/18
to Dave Tapuska, Chris Palmer, Kentaro Hara, Dmitry Gozman, Jeremy Roman, platform-architecture-dev
Hmm, introducing wtf_size_t seems a bit too exaggerated. I think just replacing the use sites with uint32_t would be sufficient -- but others may think differently though.

Dave Tapuska

unread,
Jun 26, 2018, 8:47:41 AM6/26/18
to Yuta Kitamura, pal...@chromium.org, Kentaro Hara, Dmitry Gozman, Jeremy Roman, platform-architecture-dev
I prefer using uint32_t but the problem is I've changed some of it locally and working through the remaining warnings but it is becoming a mega patch. Whereas if I had a typedef I could probably land it incrementally. Perhaps someone else has a different idea.

dave.

--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architect...@chromium.org.
To post to this group, send email to platform-arc...@chromium.org.

Dave Tapuska

unread,
Jul 20, 2018, 9:17:30 AM7/20/18
to Yuta Kitamura, pal...@chromium.org, Kentaro Hara, Dmitry Gozman, Jeremy Roman, platform-architecture-dev
I've posted a first change in the sequence here:
https://chromium-review.googlesource.com/c/chromium/src/+/1145045

Take a look at let me know what you think.

dave.

pkas...@chromium.org

unread,
Jul 21, 2018, 8:48:13 PM7/21/18
to platform-architecture-dev
Sorry to be late here, but I was just made aware of this thread.

Please use size_t in all APIs for these sorts of cases.  The Chromium Style Guide covers this explicitly in https://chromium.googlesource.com/chromium/src/+/master/styleguide/c++/c++.md#types (as well as the issue of size_t versus size_type):

"Use size_t for object and allocation sizes, object counts, array and pointer offsets, vector indices, and so on. The signed types are incorrect and unsafe for these purposes (e.g. integer overflow behavior for signed types is undefined in the C and C++ standards, while the behavior is defined for unsigned types.) The C++ STL is a guide here: they use size_t and foo::size_type for very good reasons.

"Use size_t directly in preference to std::string::size_type and similar.

"Occasionally classes may have a good reason to use a type other than size_t for one of these concepts, e.g. as a storage space optimization. In these cases, continue to use size_t in public-facing function declarations."

Per the above, if you have structs that contribute significantly to overall memory usage from using size_t, you can cast to something smaller to store in the object, though it is clearer, safer, and IMO better in most cases to just use size_t internally, and only optimize for the memory use in the most critical instances.  Casting in such cases should not introduce a branch; I'd like to understand better what Kentaro means by that.  Maybe thinking of doing a checked cast?

Meanwhile, the Google style guide bans uint32_t for these cases in http://google.github.io/styleguide/cppguide.html#Integer_Types :

"You should not use the unsigned integer types such as uint32_t, unless there is a valid reason such as representing a bit pattern rather than a number, or you need defined overflow modulo 2^N. In particular, do not use unsigned types to say a number will never be negative. Instead, use assertions for this."

"unsigned" is banned by both the above and an earlier bit from that section.

So if we wanted a sized type, I'd think we'd need to use int32_t and not uint32_t.

I think the justification that our allocators can't return >4GB regions is not very future-proof, honestly; I expect the Chromium codebase to live for decades, and I would not be willing to bet on that assertion being true in decades.  To me the best objection to size_t is taking an actual memory-usage hit.  If that's a serious concern, I'd want to better understand how much memory we are potentially saving in typical cases by representing things as explicitly-32-bit types.

PK

Dave Tapuska

unread,
Jul 21, 2018, 9:05:03 PM7/21/18
to Peter Kasting, platform-architecture-dev
Size_t is an unsigned type. So you have the exact same overflow problems with using uint32_t. 

The guidance actually shows that we really should be using int64_t in these cases. I personally still prefer my original idea of a wtf_size type. Then we can change it in the future easily. The max heap allocation is 2^27 so that would the max index today. 

Dave

--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architect...@chromium.org.
To post to this group, send email to platform-arc...@chromium.org.

Peter Kasting

unread,
Jul 22, 2018, 6:40:13 PM7/22/18
to Dave Tapuska, platform-architecture-dev
On Sat, Jul 21, 2018 at 6:05 PM Dave Tapuska <dtap...@chromium.org> wrote:
Size_t is an unsigned type. So you have the exact same overflow problems with using uint32_t.

size_t is not designed to be used in scenarios where overflow is possible.  It's supposed to be able to contain the size of any memory-allocated sequence of units on your platform, and be used to iterate through them.  If used in that way, overflow should never occur, so overflow being undefined is not a problem.

The guidance actually shows that we really should be using int64_t in these cases.

I'd be curious why you say that.  I think the Chromium style guide contradicts that directly, and the Google style guide a little less explicitly (but this sentence was added several years back to try and make things more clear: "When appropriate, you are welcome to use standard types like size_t".)

PK

Yuta Kitamura

unread,
Jul 23, 2018, 12:44:46 AM7/23/18
to Peter Kasting, Dave Tapuska, platform-architecture-dev
Hi Peter,

WTF containers use unsigned int (aka uint32_t) intentionally. AFAIK using size_t in WTF containers is NOT acceptable for us, as it would churn the performance numbers. I haven't tried to measure the effect myself, though.

--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architect...@chromium.org.
To post to this group, send email to platform-arc...@chromium.org.

Peter Kasting

unread,
Jul 23, 2018, 1:25:31 AM7/23/18
to Dave Tapuska, platform-architecture-dev
On Sun, Jul 22, 2018 at 3:40 PM Peter Kasting <pkas...@chromium.org> wrote:
On Sat, Jul 21, 2018 at 6:05 PM Dave Tapuska <dtap...@chromium.org> wrote:
Size_t is an unsigned type. So you have the exact same overflow problems with using uint32_t.

size_t is not designed to be used in scenarios where overflow is possible.  It's supposed to be able to contain the size of any memory-allocated sequence of units on your platform, and be used to iterate through them.  If used in that way, overflow should never occur, so overflow being undefined is not a problem.

The above is confusing since I got crossed up compared to my initial email -- overflow is well-defined for size_t since it's an unsigned type.  This was stated clearly in my first email but somehow I got it wrong in my head when writing this one.

PK

Peter Kasting

unread,
Jul 23, 2018, 1:25:34 AM7/23/18
to Yuta Kitamura, Dave Tapuska, platform-architecture-dev
On Sun, Jul 22, 2018 at 9:44 PM Yuta Kitamura <yu...@chromium.org> wrote:
Hi Peter,

WTF containers use unsigned int (aka uint32_t) intentionally. AFAIK using size_t in WTF containers is NOT acceptable for us, as it would churn the performance numbers. I haven't tried to measure the effect myself, though.

Can you elaborate on what "churn the performance numbers" means?

In most cases, it would help to have measurements if your plan is intentionally violating both style guides.  I do think it's possible to make exceptions (although if you must have an explicitly-32-bit type, there's an argument that it should be int32_t rather than uint32_t), but they'd normally be done as last resorts -- and in that case, I'd definitely recommend a type alias like wtf_size_t.

PK

Peter Kasting

unread,
Jul 23, 2018, 2:08:37 AM7/23/18
to Yuta Kitamura, Dave Tapuska, platform-architecture-dev
Here's a possible route forward that could satisfy everyone:

* Introduce a wtf_size_t type alias.  Rewrite everything in terms of this alias.
* Perf test with the alias set to e.g. size_t vs. uint32_t.
* If there aren't significant losses from using size_t, then it's trivial to use a plugin to mechanically rewrite wtf_size_t to size_t and be done without having taken much more work than to use size_t to start with.
* If there are significant losses, then the type alias definition point becomes the place where this can be documented, and the use of types in the code will be clear as to the intent -- since one of the advantages of size_t over uint32_t is that its meaning is more specific and clear.

WDYT?

PK

Yuta Kitamura

unread,
Jul 23, 2018, 2:24:21 AM7/23/18
to Peter Kasting, Dave Tapuska, platform-architecture-dev
I don't disagree on your plan, but I personally don't have time to do this, and it seems like the plan has a limited benefit with a lot of cost (merely conversions to wtf_size_t would require a lot of effort), so it doesn't look beneficial for us overall.

Peter Kasting

unread,
Jul 23, 2018, 2:36:26 AM7/23/18
to Yuta Kitamura, Dave Tapuska, platform-architecture-dev
On Sun, Jul 22, 2018 at 11:24 PM Yuta Kitamura <yu...@chromium.org> wrote:
On Mon, Jul 23, 2018 at 3:08 PM Peter Kasting <pkas...@chromium.org> wrote:
Here's a possible route forward that could satisfy everyone:

* Introduce a wtf_size_t type alias.  Rewrite everything in terms of this alias.
* Perf test with the alias set to e.g. size_t vs. uint32_t.
* If there aren't significant losses from using size_t, then it's trivial to use a plugin to mechanically rewrite wtf_size_t to size_t and be done without having taken much more work than to use size_t to start with.
* If there are significant losses, then the type alias definition point becomes the place where this can be documented, and the use of types in the code will be clear as to the intent -- since one of the advantages of size_t over uint32_t is that its meaning is more specific and clear.

WDYT?

I don't disagree on your plan, but I personally don't have time to do this,

I thought Dave was the one raising the issue to begin with and doing the engineering work, and wanted to do the type alias route already anyway.  I wasn't trying to suggest that this should be added to your plate.

and it seems like the plan has a limited benefit with a lot of cost (merely conversions to wtf_size_t would require a lot of effort),

I don't understand what you mean by "a lot of cost", since the plan here is to convert existing type usage anyway.  How is it more work to convert to a consistent alias than to convert to a consistent use of another type? 

so it doesn't look beneficial for us overall.

I'd be happy to hear alternate plans that don't have the downsides that a straight conversion to uint32_t does, but the above is the best suggestion I have right now to try and balance the different concerns that have been expressed.

PK

Kentaro Hara

unread,
Jul 23, 2018, 3:09:41 AM7/23/18
to Peter Kasting, Yuta Kitamura, Dave Tapuska, platform-architecture-dev
Just help me understand: What are the main practical reasons for banning uint32_t?

If overflow is the main reason, I have the same question as Dave. size_t would have the same concern...



--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architect...@chromium.org.
To post to this group, send email to platform-arc...@chromium.org.

Dave Tapuska

unread,
Jul 23, 2018, 12:06:53 PM7/23/18
to Kentaro Hara, Peter Kasting, Yuta Kitamura, platform-architecture-dev
Peter just so my interpretation is correct you expect StringView and friends would change to be wtf_size_t based right? That will be a big undertaking. As what I've only tried right now is only to make the externals of WTF match what the internals actually are.

I think I can perhaps try adjusting some of the string natives to be size_t based and run some of the benchmarks locally and report back.

I do agree with Peter's points about using wtf_size_t either way though.

dave.


Kentaro Hara

unread,
Jul 23, 2018, 12:20:40 PM7/23/18
to Dave Tapuska, Peter Kasting, Yuta Kitamura, platform-architecture-dev
I'd disagree with changing the type of the internal storage of String / Vector etc from uint32_t to size_t. WTF is a fundamental library of Blink and thus must be extremely optimized for performance and memory. Even though it doesn't show regressions in some representative benchmarks, I'd prefer NOT making the change unless we have a very strong reason to do so (e.g., using size_t matters for security).

I'm open for ideas about the type used in their APIs though. I think we have three options.

a) uint32_t
b) size_t
c) wtf_size_t (alias of uint32_t)

Today we're using b), and the type mismatch between the internal implementation and the external API is exactly the problem Dave is addressing now. So b) is not an option.

If a) is not an option from the coding guideline perspective, yeah it seems like c) is the only solution...

Peter Kasting

unread,
Jul 23, 2018, 12:54:24 PM7/23/18
to Dave Tapuska, Kentaro Hara, Yuta Kitamura, platform-architecture-dev
On Mon, Jul 23, 2018 at 9:06 AM Dave Tapuska <dtap...@chromium.org> wrote:
Peter just so my interpretation is correct you expect StringView and friends would change to be wtf_size_t based right? That will be a big undertaking.

I don't think the win from type correctness is large enough to go change a whole bunch of working, warning-free code to a different type just so everything can be consistent.

My comments only apply in areas where you were already planning on making some set of changes.  In those cases, it would be nice to make changes such that they move closer to being clear, consistent, future-proof, and style-guide-compliant.

Think of it kind of like const-correctness: Blink has big problems with it now, but it's not a good idea to go try to fix all of it everywhere at once.  Better to just get incrementally more const-correct over time.

So in an ideal world, yeah, I think StringView would do better to be wtf_size_t based, but I'm not trying to mandate that.  Indeed, I'm not in a position to mandate anything; I'm trying to bring y'all the guide's perspective so you can make the best decisions possible.  What that decision is is up to you.

PK

Peter Kasting

unread,
Jul 23, 2018, 1:05:56 PM7/23/18
to Kentaro Hara, Yuta Kitamura, Dave Tapuska, platform-architecture-dev
On Mon, Jul 23, 2018 at 12:09 AM Kentaro Hara <har...@chromium.org> wrote:
Just help me understand: What are the main practical reasons for banning uint32_t?

There are two different considerations: uint32_t as an underlying type, and uint32_t as a visible type.

For uint32_t as an underlying type -- as is being proposed with aliasing wtf_size_t to that by default -- the biggest concerns are as follows:
  • If it would ever be possible to store a >32-bit value in these, you have problems.  My concern here would not be "now" but "ten years from now".  Since this is only a concern on >32-bit systems, size_t is sufficient to avoid overflow here (because it expands to 64-bit on those systems); something like int64_t/uint64_t is overkill.
  • Since Chromium and the STL use size_t, you have type narrowings at the boundaries.  Depending on how your boundaries are structured, these might be true bugs, but likely most just result in a bunch of compiler warnings that lead to inserting messy typecasts that reduce readability.
For uint32_t as a visible type, you add in the following concerns:
  • The meaning of the type is not only unclear, but somewhat misleading.  "size_t" means "count of memory-allocated objects", which is in fact what you have.  But "uint32_t" at best means "a 32-bit value", which is much more vague; and given the Google style guide's restrictions, it is normally only used for (and thus read to mean) "bit pattern".  Using it in such a different sense, when there's already a type that means exactly what you want to readers, is confusing, especially when moving between Blink and Chromium.
  • Choosing to use uint32_t because the 32-bit size is important immediately raises the question of why not int32_t, which is the normal type for "a 32-bit value".  The best answer I could give here is that conversions between uint32_t and size_t (at the boundaries mentioned above) are safer and cleaner than between int32_t and size_t, because there's no sign conversion.  But for readers this reinforces the "why not use size_t" question.  It's an implementation detail that's being exposed in the API.
PK

Peter Kasting

unread,
Jul 23, 2018, 1:12:57 PM7/23/18
to Kentaro Hara, Dave Tapuska, Yuta Kitamura, platform-architecture-dev
On Mon, Jul 23, 2018 at 9:20 AM Kentaro Hara <har...@chromium.org> wrote:
I'd disagree with changing the type of the internal storage of String / Vector etc from uint32_t to size_t. WTF is a fundamental library of Blink and thus must be extremely optimized for performance and memory. Even though it doesn't show regressions in some representative benchmarks, I'd prefer NOT making the change unless we have a very strong reason to do so (e.g., using size_t matters for security).

OK.

I'm open for ideas about the type used in their APIs though. I think we have three options.

a) uint32_t
b) size_t
c) wtf_size_t (alias of uint32_t)

Today we're using b), and the type mismatch between the internal implementation and the external API is exactly the problem Dave is addressing now. So b) is not an option.

If the problem being addressed is warnings, it seems like it would be sufficient to insert explicit casts in the implementation when an incoming size_t needs to be truncated.  Would that make (b) work?

That's the route the Chromium style guide suggests, it's less effort than changing consumers of these APIs to use a different type if they're already using size_t, and it's reasonably future-proof (since changing to use size_t someday would only require modifying the implementations of these classes and not the API or users thereof).

I guess I'm still not clear on why this isn't a viable path forward.

PK

Kentaro Hara

unread,
Jul 23, 2018, 1:26:44 PM7/23/18
to Peter Kasting, Dave Tapuska, Yuta Kitamura, platform-architecture-dev
On Mon, Jul 23, 2018 at 7:13 PM Peter Kasting <pkas...@chromium.org> wrote:
On Mon, Jul 23, 2018 at 9:20 AM Kentaro Hara <har...@chromium.org> wrote:
I'd disagree with changing the type of the internal storage of String / Vector etc from uint32_t to size_t. WTF is a fundamental library of Blink and thus must be extremely optimized for performance and memory. Even though it doesn't show regressions in some representative benchmarks, I'd prefer NOT making the change unless we have a very strong reason to do so (e.g., using size_t matters for security).

OK.

I'm open for ideas about the type used in their APIs though. I think we have three options.

a) uint32_t
b) size_t
c) wtf_size_t (alias of uint32_t)

Today we're using b), and the type mismatch between the internal implementation and the external API is exactly the problem Dave is addressing now. So b) is not an option.

If the problem being addressed is warnings, it seems like it would be sufficient to insert explicit casts in the implementation when an incoming size_t needs to be truncated.  Would that make (b) work?

Then we need to use checked casts (because it would not be a good idea to implicitly cast size_t to uint32_t). Dave tried this approach a couple of weeks ago but we had to revert the CL because it caused performance regression :/

 

That's the route the Chromium style guide suggests, it's less effort than changing consumers of these APIs to use a different type if they're already using size_t, and it's reasonably future-proof (since changing to use size_t someday would only require modifying the implementations of these classes and not the API or users thereof).

I guess I'm still not clear on why this isn't a viable path forward.

PK

Dave Tapuska

unread,
Jul 23, 2018, 1:29:53 PM7/23/18
to Peter Kasting, Kentaro Hara, Yuta Kitamura, platform-architecture-dev
Using explicit casts isn't necessarily that simple when the bounds checks (if relevant) are delegated into the implementation layer and that is working on narrow types.

For example (this is a contrived example; not sure it happens in our current code base)

vector[0]
vector[0xf00df00df00000000]

In the current world both return the exact same vector[0] value. But they really shouldn't. A DCHECK really should fire.

This is the main problem with a lot of the UMA histograms. They are using int64 types for calculations (mainly based on time) but histograms work only on ints. So the correct answer for the histograms is to insert a saturated cast. The correct solution here would be to implement the bounds check at the widest layer in the contrived example.

dave.


Peter Kasting

unread,
Jul 23, 2018, 1:36:44 PM7/23/18
to Kentaro Hara, Dave Tapuska, Yuta Kitamura, platform-architecture-dev
On Mon, Jul 23, 2018 at 10:26 AM Kentaro Hara <har...@chromium.org> wrote:
On Mon, Jul 23, 2018 at 7:13 PM Peter Kasting <pkas...@chromium.org> wrote:
On Mon, Jul 23, 2018 at 9:20 AM Kentaro Hara <har...@chromium.org> wrote:
I'd disagree with changing the type of the internal storage of String / Vector etc from uint32_t to size_t. WTF is a fundamental library of Blink and thus must be extremely optimized for performance and memory. Even though it doesn't show regressions in some representative benchmarks, I'd prefer NOT making the change unless we have a very strong reason to do so (e.g., using size_t matters for security).

OK.

I'm open for ideas about the type used in their APIs though. I think we have three options.

a) uint32_t
b) size_t
c) wtf_size_t (alias of uint32_t)

Today we're using b), and the type mismatch between the internal implementation and the external API is exactly the problem Dave is addressing now. So b) is not an option.

If the problem being addressed is warnings, it seems like it would be sufficient to insert explicit casts in the implementation when an incoming size_t needs to be truncated.  Would that make (b) work?

Then we need to use checked casts (because it would not be a good idea to implicitly cast size_t to uint32_t).

The claim is that using uint32_t is safe because no input values can ever be larger than 32 bits.  If this is true, you can static_cast.  If it's not true, you can't safely store the value in a uint32_t anyway -- a checked_cast just changes the sort of failure that results.

I would certainly expect a checked_cast route to cause perf regressions, and I wouldn't go that route.  static_cast is better here.

PK

Kentaro Hara

unread,
Jul 23, 2018, 1:44:46 PM7/23/18
to Peter Kasting, Dave Tapuska, Yuta Kitamura, platform-architecture-dev
IIUC that's the exact problem Dave is addressing here. Currently String, Vector etc are using uint32_t as an internal type but caller sites are using size_t as a visible type. This may result in an incorrect behavior, so we want to match the two types or add checked_assert.



I would certainly expect a checked_cast route to cause perf regressions, and I wouldn't go that route.  static_cast is better here.

PK

Peter Kasting

unread,
Jul 23, 2018, 1:50:54 PM7/23/18
to Kentaro Hara, Dave Tapuska, Yuta Kitamura, platform-architecture-dev
I thought Dave was addressing compiler warnings, not behavioral incorrectness.

This sort of solution is explicitly recommended in the Chromium style guide, and when we've employed it in Chromium we've frequently avoided checked_casts due to precisely the sort of performance concerns you're describing.  We static_cast, because not passing in larger values is part of the contract that callers are expected to guarantee on their own.

The incongruity for me here is that you're telling me "this may result in incorrect behavior", i.e. you believe that at runtime someone can actually pass a >32-bit value, yet you want to use 32-bit storage types.  This seems analogous to handling DCHECK failure: either the situation really can't happen, so don't handle it, or it really can, in which case the underlying types should change.  I suspect the reality for you is that it really can't happen and you should just static_cast, which will eliminate the compiler warnings and let everyone move on.  But if that's not true, and you can't add checked_casts due to perf, and you can't use size_t as the storage type due to perf, then using wtf_size_t everywhere is the only option I can come up with.

PK

Chris Palmer

unread,
Jul 24, 2018, 5:00:56 AM7/24/18
to Peter Kasting, Kentaro Hara, Dave Tapuska, Yuta Kitamura, platform-architecture-dev
Please use checked_cast. A checked_cast in the callee is the cheapest way to achieve minimally correct code.

First of all, we know we have a problem in that people expect the fundamental classes to be safe/well-tested/heavily-fuzzed/"bug-free after all these years", but in fact that is not true, as my team keeps finding out. (It should be true, and we feel your pain, and we're working on it, but it is not yet true. For reasons that this thread has illuminated. ;) )

Therefore, we (Security Team) keep calling for things like "sandbox that unsafe JSON parser!" which is far more expensive (and more work for you) than a checked_cast. Obviously, for WTF and Blink, everything is sandboxed, but it still matters that fundamental classes be minimally safe. Renderer compromise is still rated as a High severity security bug.

If the interface is size_t but the internal implementation is uint32_t, callers cannot be expected to know that. That's what interface means. :) Thus it is plainly the callee's duty to assert that its own internal space efficiency hack does not result in incorrect or attacker-friendly behavior.

Additionally, putting checks at every call site is error-prone and results in more source and object code vs. centralizing the check in the callee. So it would be overall less efficient and less effective than a checked_cast inside the callee.

On 32-bit platforms, of course, the checked_cast will compile away into nothing (because size_t is uint32_t there). It's sad that the space hack might cost some time on 64-bit, but time/space trade-offs should be nothing new. As far as I can tell, it also remains to be seen just how much space using uint32_t saves. Intuitively it makes sense that it would save some space, but intuition is insufficient.

Kentaro Hara

unread,
Jul 24, 2018, 5:56:57 AM7/24/18
to Chris Palmer, Peter Kasting, Dave Tapuska, Yuta Kitamura, platform-architecture-dev
So... I think we have the following options:

a) internal type = size_t, external type = size_t (We need to prove that this doesn't regress anything.)
b) internal type = uint32_t, external type = size_t (We need to introduce checked_cast. We confirmed that this is not acceptable for performance.)
c) internal type = uint32_t, external type = uint32_t (This is not nice per coding guideline.)
d) internal type = uint32_t, external type = wtf_size_t (defined as uint32_t)

Then my suggestion would be d).



Peter Kasting

unread,
Jul 24, 2018, 11:58:04 AM7/24/18
to Kentaro Hara, Chris Palmer, Dave Tapuska, Yuta Kitamura, platform-architecture-dev
On Tue, Jul 24, 2018 at 2:56 AM Kentaro Hara <har...@chromium.org> wrote:
So... I think we have the following options:

a) internal type = size_t, external type = size_t (We need to prove that this doesn't regress anything.)
b) internal type = uint32_t, external type = size_t (We need to introduce checked_cast. We confirmed that this is not acceptable for performance.)
c) internal type = uint32_t, external type = uint32_t (This is not nice per coding guideline.)
d) internal type = uint32_t, external type = wtf_size_t (defined as uint32_t)

If you do (d), I's make the internal type be wtf_size_t as well, rather than straight uint32_t.  That's not only clearer, it gives you the ability to test a change to (a) by changing one line of code, to confirm the performance impact.

Also, you might be able to do something clever here to ensure type safety and avoid compiler warnings at wtf_size_t/size_t boundaries: make wtf_size_t a class that wraps a uint32_t, with an implicit constructor that converts from size_t by checked_casting (and, of course, implicit copy/move constructors and constructor from uint32_t).  It may be (you'd have to check) that this can achieve zero overhead while ensuring all 64->32-bit downconversions get checked_casted.

PK

Dave Tapuska

unread,
Jul 24, 2018, 2:14:27 PM7/24/18
to Peter Kasting, Kentaro Hara, Chris Palmer, Yuta Kitamura, platform-architecture-dev
As per option d; I've replaced uint32_t with wtf_size_t in the patch and added a header file defining wtf_size_t.  (https://chromium-review.googlesource.com/c/chromium/src/+/1145045 has been updated)

dave.

Kentaro Hara

unread,
Jul 24, 2018, 2:18:33 PM7/24/18
to Dave Tapuska, Peter Kasting, Chris Palmer, Yuta Kitamura, platform-architecture-dev
Thanks Dave!

Let's wait one more day to see if there's any objection in going with d) :)


Reply all
Reply to author
Forward
0 new messages