std::string_view, absl::string_view, and base::StringPiece

1,553 views
Skip to first unread message

David Benjamin

unread,
Mar 21, 2023, 9:22:42 PM3/21/23
to cxx
Hi cxx@,

After we've enabled libc++'s safe mode, and updated past https://reviews.llvm.org/D145981, I believe libc++ now contains every safety check currently contained in absl::string_view and base::StringPiece. (See CL descriptions in links below for a breakdown.) I'd like to discuss switching one or both types to std::string_view.

I put together some notes here, and some links to CLs:

Thoughts?

David

David Benjamin

unread,
Mar 21, 2023, 9:26:46 PM3/21/23
to cxx
> See CL descriptions in links below for a breakdown.

Er, sorry, pretend that this said "See document for a breakdown". What used to be a long email and long CL description turned into a document and then I copy-pasted without paying attention. :-)

Peter Kasting

unread,
Mar 21, 2023, 10:05:24 PM3/21/23
to David Benjamin, cxx
Pros:
  • -90KiB binary
  • 2 -> 1 vocabulary types (simplifies things if we ever re-fork someday)
  • Stricter about ADL and (eventually) IWYU
  • Kills StringPieceHash
Cons:
  • Would require re-forking if in the future we want to change string_view in a way that cannot be upstreamed to libc++
IMO, clear win, +1

PK

dan...@chromium.org

unread,
Mar 22, 2023, 10:48:41 AM3/22/23
to Peter Kasting, David Benjamin, cxx
We have no means to use raw_ptr in std::string_view, do we?
 

PK

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/cxx/CAAHOzFA1CWu73s0T9y-E7NFxicJzvBmbKb6h%2BXJGujiY1_S3eA%40mail.gmail.com.

David Benjamin

unread,
Mar 22, 2023, 11:31:52 AM3/22/23
to dan...@chromium.org, Peter Kasting, cxx
On Wed, Mar 22, 2023 at 10:48 AM <dan...@chromium.org> wrote:
On Tue, Mar 21, 2023 at 10:05 PM 'Peter Kasting' via cxx <c...@chromium.org> wrote:
On Tue, Mar 21, 2023 at 6:22 PM David Benjamin <davi...@chromium.org> wrote:
After we've enabled libc++'s safe mode, and updated past https://reviews.llvm.org/D145981, I believe libc++ now contains every safety check currently contained in absl::string_view and base::StringPiece. (See CL descriptions in links below for a breakdown.) I'd like to discuss switching one or both types to std::string_view.

I put together some notes here, and some links to CLs:

Pros:
  • -90KiB binary
  • 2 -> 1 vocabulary types (simplifies things if we ever re-fork someday)
  • Stricter about ADL and (eventually) IWYU
  • Kills StringPieceHash
Cons:
  • Would require re-forking if in the future we want to change string_view in a way that cannot be upstreamed to libc++
IMO, clear win, +1

We have no means to use raw_ptr in std::string_view, do we?

I mean, it's all code ultimately. There's nothing technical stopping us from patching raw_ptr into std::string_view if we want.

I'd even argue that's the Correct(TM) option. You can think of all the STL APIs and C++ itself as an API contract between our code and our toolchain. We don't get to define the contract, but we own our toolchain, so we ultimately own both sides of that contract. I'd expect raw_ptr to be compatible with the standard's std::string_view API contract. If so, we don't need our own vocabulary type for it. (If the standard says something is UB, we're free to make our toolchain define it to have any behavior we like. That's basically what the hardening mode is. Would be nice if the standard had the right behavior from the start, but that one's out of our hands.)

Doing it in the STL version also means we apply it to all of Chromium, including our implementations of HTTP/2, TLS, QUIC, WebRTC, and (soon) X.509, which are maintained as external projects to share code with non-Chromium projects.  And it relieves some of the tension between Chromium-specific hardening vs. reusing code across projects. This particularly matters for networking features. Every networking feature requires both sides to implement some protocol, so there'll be at least two users.

Of course, Correct(TM) isn't the same as correct. :-) I doubt we can upstream that, and carrying a libc++ patch is much more expensive than just patching base::StringPiece. So, yes, that is a cost of turning base::StringPiece into std::string_view. (Though I do wonder if we could come up with better patching strategies. The base::StringPiece story is that we maintain an entire parallel std::string_view implementation under another name, and yet that's less expensive right now...)

(Somewhat of a tangent, but are we expecting to use raw_ptr for stack variables? I would guess most StringPieces are on the stack, rather than on the heap. Putting raw_ptr in there would mean applying it to a bunch of stack values.)

David

dan...@chromium.org

unread,
Mar 22, 2023, 11:41:43 AM3/22/23
to David Benjamin, Peter Kasting, cxx
Yeah, it is much much cheaper to write a vocab type in base, especially when there's an API to copy from and it's already used everywhere, than to change upstream, FBOFW.
 
(Somewhat of a tangent, but are we expecting to use raw_ptr for stack variables? I would guess most StringPieces are on the stack, rather than on the heap. Putting raw_ptr in there would mean applying it to a bunch of stack values.)

Right, the rewrite and enforcement is that all field-pointers are raw_ptr, as an approximation for "may end up on the heap and probably points to the heap", and to scope the amount of smart pointers introduced.


It's hard to argue that we should make such a large change for 40 StringPieces.

David

David Benjamin

unread,
Mar 22, 2023, 11:52:24 AM3/22/23
to dan...@chromium.org, Peter Kasting, cxx
I guess that means there's a third option: if our aim is to use raw_ptrs in fields, but most StringPieces are not in fields, we can make the rewriter treat std::string_view as a pointer type. And then we have string_view_for_fields (insert better name here) that is to std::string_view what base::raw_ptr is for raw pointers. And then the rewriter just rewrites those 40 string_views into string_view_for_fields.

David 

Peter Kasting

unread,
Mar 23, 2023, 2:17:37 PM3/23/23
to David Benjamin, dan...@chromium.org, cxx, Arthur Sonzogni
I reached out to Louis @ libc++. He's receptive to finding a way to let us use raw_ptr inside things like std::span and std::string_view. I also reached out to Arthur (+CCed) on our side, who'd be a candidate to drive this if we want to do it. I think others question whether it's worth the effort to try to make this happen. I have done my best to make a case for doing this, so if you really want to see this happen you can join the discussion on Slack #base or debate more here.

In the meantime, I think your idea of string_view_for_fields would be a reasonable alternative for string_view usage in members. The question of base::span -> std::span is tougher, but since this thread isn't about that, we can punt that question for now :)

To me, this suggests we have reasonable future paths and there's not too much reason not to migrate to std::string_view. I don't know whether others are convinced.

PK

David Benjamin

unread,
Mar 24, 2023, 6:29:16 PM3/24/23
to Peter Kasting, Adrian Taylor, dan...@chromium.org, cxx, Arthur Sonzogni
+Adrian Taylor bumped into a case which, I believe, is another reason to at least turn absl::string_view into std::string_view. Over time, as libraries we use move to C++17, they'll start assuming the Abseil polyfills have become typedefs.

In particular, https://github.com/google/centipede is at C++17, so it naturally uses std::string_view. It also uses Abseil's string functions, like absl::StrCat. Abseil's string functions take absl::string_view instead of std::string_view, to support projects (like us) that haven not yet made them equivalent. When they're the same type, this works fine. When they're not, this breaks. That means centipede implicitly requires that absl::string_view and std::string_view are the same type. This is . Centipede requires C++17 and a C++17 Abseil is normally expected to make them match. But our C++17 Abseil is weird because we've intentionally kept the absl implementations.

Conversely, us being late to flipping the Abseil polyfills around means we prevent libraries that support us from making the same move. I believe QUICHE has a few absl -> std moves that are blocked on us right now. (IIRC, I think they have additional blockers for string_view, but we're the sole blocker for optional and variant. optional and variant should, I hope, be viable for us after https://reviews.llvm.org/D146190 lands. And, of course, their other string_view blocker will eventually clear, at which point we'll be the sole blocker.)

This is also, I think, a good demonstration of the costs of multiple vocabulary types. Different implementations of string_view don't implicitly convert between each other, and so divergence means calls across projects means keeping track of which string_view variant is used on each side.

How are we doing on this question? It didn't seem like anyone had reasons not to do the absl::string_view -> std::string_view change, so we could start with just that and unblock Ade. But if we expect to do base::StringPiece -> std::string_view too, I'd lightly prefer to get to a decision and then do both together. Then we don't need to figure out PRESUBMITs, etc. for the intermediate state. Also we can just remove base/strings/abseil_string_conversions.h entirely, rather than take a pit stop at renaming it to base/strings/std_string_conversions.h.

(But if the base question is still thorny, let's start by answering the absl question, since it's easier.)

David

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.

Roland Bock

unread,
Mar 25, 2023, 9:09:56 AM3/25/23
to Peter Kasting, David Benjamin, cxx
On Wed, Mar 22, 2023 at 3:05 AM 'Peter Kasting' via cxx <c...@chromium.org> wrote:
On Tue, Mar 21, 2023 at 6:22 PM David Benjamin <davi...@chromium.org> wrote:
After we've enabled libc++'s safe mode, and updated past https://reviews.llvm.org/D145981, I believe libc++ now contains every safety check currently contained in absl::string_view and base::StringPiece. (See CL descriptions in links below for a breakdown.) I'd like to discuss switching one or both types to std::string_view.

I put together some notes here, and some links to CLs:

Pros:
  • -90KiB binary
  • 2 -> 1 vocabulary types (simplifies things if we ever re-fork someday)
  • Stricter about ADL and (eventually) IWYU
  • Kills StringPieceHash
I am a fan of this since it will allow the use of string_view in Heterogeneous Lookup in Associative Containers.

For instance, this would allow PrefService::FindPreference to use string_view as parameter. Since most call sites actually pass a char*, this would be a nice win.
 
Cons:
  • Would require re-forking if in the future we want to change string_view in a way that cannot be upstreamed to libc++
IMO, clear win, +1

PK

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.


--
--

Roland Bock

Software Engineering Manager

rb...@google.com


Google Germany GmbH

Erika-Mann-Straße 33

80636 München


Geschäftsführer: Paul Manicle, Liana Sebastian

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg


This e-mail is confidential. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it has gone to the wrong person.


The above terms reflect a potential business arrangement, are provided solely as a basis for further discussion, and are not intended to be and do not constitute a legally binding obligation. No legally binding obligations will be created, implied, or inferred until an agreement in final form is executed in writing by all parties involved.

Arthur Sonzogni

unread,
Mar 27, 2023, 4:47:16 AM3/27/23
to cxx, rb...@google.com, David Benjamin, cxx, pkas...@google.com
That's interesting!
I wrote a doc about it. I hope it captures what everyone said here and on slack. I added some potential concerns in the doc too.
I am going to share it with chrome-memory-safety@ to get their opinion.

David Benjamin

unread,
Mar 27, 2023, 10:43:38 AM3/27/23
to Arthur Sonzogni, cxx, rb...@google.com, pkas...@google.com
To clarify, we are not using raw_ptr into base::StringPiece today. Switching base::StringPiece to std::string_view would not regress any hardening today.

It merely makes it harder to introduce raw_ptr into our string piece/view type later. Whether this matters depends on:
(a) How likely we would have been to find raw_ptr in base::StringPiece worthwhile and viable
(b) How much harder it is to customize libc++'s std::string_view over base::StringPiece

I do not think it makes sense to treat raw_ptr in std::string_view as a prerequisite for this change. It makes sense to discuss the likelihood of (a) and whether we can reduce the cost gap in (b), but we don't need to actually do it. There may be non-std::string_view reasons we don't want to do this. For instance, most of our base::StringPiece instances are stack-allocated. We already excluded pointers on the stack from raw_ptr. Putting raw_ptr in StringPiece would mostly impact stack bits.

Peter Kasting

unread,
Apr 5, 2023, 2:18:20 PM4/5/23
to David Benjamin, cxx
This proposal feels a bit stalled and I'd like to try to reach a conclusion.

Daniel investigated the size win of string_view somewhat and believes it's due to differing inlinining decisions. In theory, we might be able to get the same win without switching with some judicious NOINLINEs in StringPiece. So while this is a "for free" benefit of string_view, it's not likely to be a benefit _only_ achievable by switching to string_view.

Dana recently landed work to make raw_ptr constexpr in C++20. That would make it more feasible, if desired, to use raw_ptr in StringPiece, at least once C++20 lands (current ETA EOQ2).

Arthur wrote a doc on MiraclePtr use in libc++; upstream is theoretically receptive, but if we rely on this, Dana notes that non-Chrome projects that compile our code with libstdc++ will lose the benefits (unless we _also_ got libstdc++ to add something similar, and we made the easy/default way to build our code enable that support). In short, there are questions both about costs and about whether this would actually be a monotonic benefit.

All that said, the case for raw_ptr in StringPiece is not as compelling as for Span, and as David mentioned, if we really wanted protection specifically for fields we do have other rewriter routes. So perhaps we could sidestep the thorny raw_ptr issue for string_view.

I think the most compelling argument for making the switch are the set of issues David notes around third-party (and "first-party third-party" code like BoringSSL and QUICHE) code using these types, assuming the absl types are polyfills to them, etc. This would reduce some friction for things like migrating net/ code to third-party locations that's reused elsewhere without worrying about interop/regressions with Chromium.

To me, this is still something with "small but unevenly-distributed benefits, reasonably small costs" and I vote for allowing David to proceed.

PK

dan...@chromium.org

unread,
Apr 5, 2023, 2:46:05 PM4/5/23
to Peter Kasting, David Benjamin, cxx
Yeah I mean to reply to this earlier, thanks.

Given how little StringPieces are stored as fields, I don't think it feels critical that we transitively apply raw_ptr to the inside of StringPiece right now. That means string_view should be roughly similar to StringPiece for Google Chrome, though not for all Chromiums.

On the other hand, I have fairly low confidence that we won't find more things we'd like to add to our vocabulary types like StringPiece in the future, as C++ memory safety is very far from solved. Migrating to string_view would give up a huge lever we have over an unowned reference type. While libc++ gives us some room to do things internally, they do not allow us to change API. Any small API change would require first the large burden of (re-)introducing a different first-party type after this change.

I am not in favour of giving away our leverage over our vocabulary types as a general principle given:
-  how the standards committee is treating safety,
- the commitment to 100% backward compat from the committee,
- the fact that while we sometimes act otherwise, libc++ is not actually part of Chromium.

As you mentioned Peter, hardening libc++ is great for existing code and third-party code, but it feels like a less powerful strategy for hardening our first-party code, not only because that hardening does not reach all of our (Chromium's) users now or in the future. But also because it ties us to fixed APIs entirely outside of our control.

On net, I can see the argument that Google Chrome will be unchanged by the migration, at least for today. But I still don't come out in favour of doing it, myself. I would rather improve interop performance between string_view and StringPiece (fully willing to do performance-based improvements that depend on tight libc++ coupling) or consider pushing some vocab types into a third_party library, as these feel like more durable strategies to me.
--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.

Chris Palmer

unread,
Apr 5, 2023, 2:59:49 PM4/5/23
to dan...@chromium.org, Peter Kasting, David Benjamin, cxx
I agree with not giving away leverage, and I also think putting vocabulary types in a third_party library is a durable strategy.

Peter Kasting

unread,
Apr 5, 2023, 3:11:44 PM4/5/23
to dan...@chromium.org, David Benjamin, cxx
On Wed, Apr 5, 2023 at 11:46 AM <dan...@chromium.org> wrote:
That means string_view should be roughly similar to StringPiece for Google Chrome, though not for all Chromiums.

I'm not sure what the last clause here means. I'm assuming this has something to do with the libc++ vs. libstdc++ distinction; maybe "libstdc++ may not be built with similar hardening asserts"? It's possible libstdc++ has assertions like libc++'s, and that we should tweak our .gn files so that if people are linking with that we #define the relevant things by default. I do think that since gcc/libstdc++ are "community-supported" that this falls into the "we would take patches for this but wouldn't necessarily do it ourselves" bucket, and transitively, I think that means we shouldn't have it as a factor to _not_ use libc++ here.

(As an aside, you mentioned elsewhere that you consider users of these products to be "our users", and I think it may be worth both figuring out how large this population is -- is it just various Linux distros' flavors of Chromium? -- and whether we should have any sort of project policy about support for them. To me, this set of users is "small" and "clearly not something we should either support or make decisions based on as previously stated by eng review", so I don't think I'm in your camp, but perhaps other stakeholders would disagree? Still, this seems possibly worthy of escalation to a larger or higher-authority group, since it seems to be a significant factor in your decision-making.)

While libc++ gives us some room to do things internally, they do not allow us to change API. Any small API change would require first the large burden of (re-)introducing a different first-party type after this change.

Is this a larger burden than the proposed migration away from StringPiece (which seems like a reasonable small burden)? I'm thinking in "YAGNI" terms right now -- can we design for what we need today if changing course in the future is feasible when future needs change?

I am not in favour of giving away our leverage over our vocabulary types as a general principle given:
-  how the standards committee is treating safety,
- the commitment to 100% backward compat from the committee,
- the fact that while we sometimes act otherwise, libc++ is not actually part of Chromium.

This seems very broad. As an example, it would also suggest we should not migrate from base::Clamp to std::clamp because even though libc++ hardens, the standard simply says certain behaviors are UB. I think what you are arguing is that we should not use anything from the STL (vocabulary type, API, etc.) that has any UB (or any other less-than-optimal API characteristics). Is that what you are saying or is there some sort of way to draw a "bright line"? Should we begin by reimplementing all vocabulary types (string, vector, etc.) so we maintain the leverage we want? Is StringPiece more susceptible to these arguments than some of the other types? (TBH it seems _less_ susceptible to me...)

There are nontrivial amounts of ongoing perf improvements, bug fixes, and API feature additions that happen to the core vocabulary types over time. Are we prepared to assume the costs of those for whatever set of things we fork?

How do you view the other costs of your position (e.g. interop with third_party code and with STL pieces we do continue to use, need to continue improving libc++ hardening/perf/etc. or else give up those benefits for non-first-party code)? I can understand your stated benefits, but without explicitly saying why you think the downsides are insignificant or overcomable I'm concerned about being guided by this.

PK

David Benjamin

unread,
Apr 5, 2023, 8:40:23 PM4/5/23
to Peter Kasting, dan...@chromium.org, cxx
On Thu, Apr 6, 2023 at 4:11 AM Peter Kasting <pkas...@google.com> wrote:
On Wed, Apr 5, 2023 at 11:46 AM <dan...@chromium.org> wrote:
That means string_view should be roughly similar to StringPiece for Google Chrome, though not for all Chromiums.

I'm not sure what the last clause here means. I'm assuming this has something to do with the libc++ vs. libstdc++ distinction; maybe "libstdc++ may not be built with similar hardening asserts"? It's possible libstdc++ has assertions like libc++'s, and that we should tweak our .gn files so that if people are linking with that we #define the relevant things by default. I do think that since gcc/libstdc++ are "community-supported" that this falls into the "we would take patches for this but wouldn't necessarily do it ourselves" bucket, and transitively, I think that means we shouldn't have it as a factor to _not_ use libc++ here.

I'm traveling right now but, briefly to this point, libstdc++ indeed has similar hardening assertions. Fedora even enables them system-wide!

If we're concerned about libstdc++, we should enable that in our build. Otherwise we haven't even yet exhausted what we can get out of the standard vocabulary type.

MSVC has this thing, but it talks about iterators and may be too heavyweight. But if it's impacting us, have we made that feature request to them? Whatever we do for first-party code, to the extent that we worry about our MSVC build, we should care about this for third-party code. (Truly third-party and first-party code that exists as a standalone project.)

Peter Kasting

unread,
Apr 6, 2023, 2:34:31 PM4/6/23
to David Benjamin, cxx
On Wed, Apr 5, 2023 at 11:18 AM Peter Kasting <pkas...@google.com> wrote:
This proposal feels a bit stalled and I'd like to try to reach a conclusion.

I think my reply to Dana's mail yesterday wasn't conducive to reaching a good conclusion. Sorry about that.

This raises good project governance/larger plan questions. I listed some in a doc. Most of these are too large to hash out here. It would be good to have answers. Should I ask ATLs, ask Chrome eng management, discuss on a new cxx@ thread, discuss on a chromium-dev@ thread, ...?

In the meantime, here are not-mutually-exclusive actions we could take:
  • Fix the ADL and IWYU errors we can identify.
  • Use NOINLINE and/or generalized inliner heuristic improvements/flag changes to replicate the std:: binary size win.
  • Change our .gn flags and instructions to enable libstdc++ hardening. Maybe run tests/set up bots to ensure we get the same level/quality of hardening as libc++.
  • Migrate to the std:: type but leave everything as a Chromium-side type alias to facilitate later forking.
  • Build string_view_for_fields that uses raw_ptr<>, and rewrite field usage of string_view.
  • Work with libc++ upstream (possibly also libstdc++ upstream) to propose/implement MiraclePtr support. (Needs answers to cost/benefit, "who does the work", "what's the timeframe")
  • Migrate StringPiece to a new repo, assuming BoringSSL, QUICHE, etc. can depend on it. (Need answers to policy, plan, maintenance)
David, you're driving. What are you willing to do? What are you willing to do if you had help?

Dana and Chris, you've shared concerns. Would any combination of the above (save the last) let you assent to a migration? Are there any you would help with?

FWIW, I am willing to help with the first five (of seven) bullets.

PK

Victor Vasiliev

unread,
Apr 6, 2023, 3:12:59 PM4/6/23
to Peter Kasting, David Benjamin, cxx
As a maintainer of one of the third-party libraries that integrates with Chrome, I am really not a fan of Chrome-specific forks of standard C++ vocabulary types.  Each one of those creates a lot of costs for integration (decreased code readability, having to type conversion calls, having to get approval for presubmit exceptions, etc).

On Fri, Apr 7, 2023 at 3:34 AM 'Peter Kasting' via cxx <c...@chromium.org> wrote:
  • Migrate StringPiece to a new repo, assuming BoringSSL, QUICHE, etc. can depend on it. (Need answers to policy, plan, maintenance)
I feel like that repo would have to be Abseil, since that's what all other Google C++ projects use for their vocabulary types.  For string_view, I feel like the ship has sailed when StringPiece -> string_view migration was decided.  Abseil still has its own version of span that is explicitly non-STL compatible, so you may want to talk to them about their plans for it.

Chris Palmer

unread,
Apr 6, 2023, 3:28:22 PM4/6/23
to Peter Kasting, David Benjamin, cxx
Your doc of questions looks complete and right to me. (Thanks!)

C++ `std` is not ideal from either safety or performance perspectives, and that reality has led us to build alternatives (WTF, //base/containers, et c.) throughout Chromium's lifetime. Additionally, various callers inside and outside of Google wish that //base, or parts of it, were a reusable library. The Chromium project has never supported that, and has explicitly disclaimed support, but people grab copies of //base anyway. (And then those copies age...)

So, long term, I think library-ifying selected parts of //base, or WTF, or other things, is a good way to go for Chrome, Chromium, and the general C++ world. It would be paving an existing cow-path. At the same time, it lets us raise the bar for C++ design and implementation.

To a certain extent, Abseil already is that long-term dream, which is part of why I like Abseil.

That said, I'm happy to accept a library, including an implementation of `std`, that can pass whatever security and/or efficiency tests we deem necessary. Passing the tests may be contingent on aspects of build configuration, as with libc++ and Abseil. That is OK as long as we ship to production only with the passing build configuration. For WTF and //base, we already are, so I'm confident the approach can work.

I would be happy to help design and/or implement such tests, and/or to work on migrating existing code to whatever library/ies we decide on.

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.

Peter Kasting

unread,
Apr 6, 2023, 3:33:09 PM4/6/23
to Victor Vasiliev, David Benjamin, cxx
Two questions for clarity:

On Thu, Apr 6, 2023 at 12:13 PM Victor Vasiliev <vas...@google.com> wrote:
I feel like that repo would have to be Abseil, since that's what all other Google C++ projects use for their vocabulary types.

Can you define "have to be"? Is this "would be a good idea" or "my project formally cannot take a dependency on a vocabulary type repo that is not Abseil"? I don't think anyone would contest that Abseil would be a nice place to have various things, but it also has many non-Chromium consumers (and goals, and OWNERS), and it's not clear it's a much better solution for Dana/Chris' concerns than the STL implementations.
 
  For string_view, I feel like the ship has sailed when StringPiece -> string_view migration was decided.

You mean, the Google StringPiece -> Abseil string_view internal migration? Note that internally Google is already using the "absl::string_view is just a wrapper for std::string_view" route, while for the non-wrapper impl Abseil differs from std:: in ways Chromium can't work with (e.g. not based on a basic_string_view so no string16_view). Abseil is willing to consider a reimplementation of absl::string_view that is more like base::StringPiece, but to date we've not considered making that happen to be worth the effort (since it would require supporting all compilers and C++ versions Abseil supports).

PK

Peter Kasting

unread,
Apr 6, 2023, 3:37:38 PM4/6/23
to Chris Palmer, David Benjamin, cxx
On Thu, Apr 6, 2023 at 12:28 PM Chris Palmer <pal...@chromium.org> wrote:
That said, I'm happy to accept a library, including an implementation of `std`, that can pass whatever security and/or efficiency tests we deem necessary. Passing the tests may be contingent on aspects of build configuration, as with libc++ and Abseil. That is OK as long as we ship to production only with the passing build configuration.

This sounds like you're saying you'd be OK migrating to std::string_view if we have hardening tests for it (which, AFAIK, we do), Google Chrome (distinct from third-party Chromium builds) passes those tests, and we are willing to migrate away from std::string_view in the future if we decide that we need additional security properties that are impractical to realize in it. Is that correct? If not please clarify :)

PK

dan...@chromium.org

unread,
Apr 6, 2023, 4:01:40 PM4/6/23
to Victor Vasiliev, Peter Kasting, David Benjamin, cxx
It's almost the long weekend for me, so I will reply now rather than thinking more about a longer reply.

I want to avoid us ratholing on the hardening issue. I support us getting hardening into libc++ and using it in other stdlibs, 100%. This is good for our existing code and it's good for our third party libraries.

Hardening the implementation of stdlibs prevents some bugs but it does not go far enough because the APIs are fixed in time and cause UB, or cause significant performance cost to catch UB which we can not pay. Having control of our APIs allows us to consider and make changes that improve our software development from a security and stability perspective that we could not otherwise. For instance our span constructors take a StrictNumeric<size_t> instead of a size_t, in order to catch signed integer conversions and prevent bugs at compile time. That's one of the smallest possible API changes you can imagine, but we can't do that in libc++.

I would like to see the life of third-party library maintainers get easier, by making conversions easier and cheaper, or making presubmits smarter. I want Chrome to be able to make the best choice for development of our code though as well, and std doesn't look like that to me, at least for these reasons. Familiarity is certainly its advantage.

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.

Victor Vasiliev

unread,
Apr 6, 2023, 4:25:02 PM4/6/23
to Peter Kasting, David Benjamin, cxx
On Thu, Apr 6, 2023 at 3:33 PM Peter Kasting <pkas...@google.com> wrote:
Two questions for clarity:

On Thu, Apr 6, 2023 at 12:13 PM Victor Vasiliev <vas...@google.com> wrote:
I feel like that repo would have to be Abseil, since that's what all other Google C++ projects use for their vocabulary types.

Can you define "have to be"? Is this "would be a good idea" or "my project formally cannot take a dependency on a vocabulary type repo that is not Abseil"? I don't think anyone would contest that Abseil would be a nice place to have various things, but it also has many non-Chromium consumers (and goals, and OWNERS), and it's not clear it's a much better solution for Dana/Chris' concerns than the STL implementations.

It's more of "I don't expect this to be practical".

Assume we move base::StringPiece into a new library (let's call it chrome_base).  As a maintainer of an open source C++ library, I could switch from std::string_view/absl::string_view to chrome_base::StringPiece; the problem is, now all of my non-Chrome users have to either do conversion calls everywhere (which means I am asking them to do work), or to switch to chrome_stl version (which also means work, and also feels like an unreasonable request).

 
  For string_view, I feel like the ship has sailed when StringPiece -> string_view migration was decided.

You mean, the Google StringPiece -> Abseil string_view internal migration? Note that internally Google is already using the "absl::string_view is just a wrapper for std::string_view" route, while for the non-wrapper impl Abseil differs from std:: in ways Chromium can't work with (e.g. not based on a basic_string_view so no string16_view). Abseil is willing to consider a reimplementation of absl::string_view that is more like base::StringPiece, but to date we've not considered making that happen to be worth the effort (since it would require supporting all compilers and C++ versions Abseil supports).

The point I was trying to make is that if Abseil has kept an API-incompatible version of string_view, we would still have some degree of flexibility with the APIs.

Peter Kasting

unread,
Apr 6, 2023, 4:39:36 PM4/6/23
to dan...@chromium.org, Victor Vasiliev, David Benjamin, cxx
On Thu, Apr 6, 2023 at 1:01 PM <dan...@chromium.org> wrote:
I want to avoid us ratholing on the hardening issue.

Fair.

Hardening the implementation of stdlibs prevents some bugs but it does not go far enough because the APIs are fixed in time and cause UB, or cause significant performance cost to catch UB which we can not pay. Having control of our APIs allows us to consider and make changes that improve our software development from a security and stability perspective that we could not otherwise.

This is fair, but it's a general sentiment for the doc I wrote. Can we keep this thread scoped specifically to the std::string_view type? I want to avoid us ratholing on the general question of how to view the STL.

PK

Peter Kasting

unread,
Apr 6, 2023, 5:39:36 PM4/6/23
to Dana Jansens, Victor Vasiliev, David Benjamin, cxx
I wish I had not phrased that that way; it sounds really snarky, and I didn't mean it that way at all. I honestly agree with some of the stl concerns and I think a direction going forward would be good to work out. I mainly want to understand what specific concerns we have on this API, so we can see if they're addressable. If there are no specific concerns, but we still have the general concern about future changes, I want to understand the perceived cost of forking later to address them and how to minimize that, since as several people have highlighted there are real costs to staying in our current world.

PK

Chris Palmer

unread,
Apr 6, 2023, 8:31:43 PM4/6/23
to Peter Kasting, David Benjamin, cxx
Yes, that is correct.

(To be clear, it's my fall-back position: I do believe, for reasons like those Dana gives, that migrating away from `std::*` might likely come sooner rather than later.)

But, as you say, for the moment let's worry about `string_view` specifically. I can live with a hardened and tested `std::string_view`. :)

David Benjamin

unread,
Apr 10, 2023, 3:38:58 PM4/10/23
to Chris Palmer, Peter Kasting, cxx
(Sorry for the late response, I've been OOO.)

So, I started this thread to talk about two things:
1. Make absl::string_view a typedef for std::string_view
2. Make base::StringPiece a typedef for std::string_view

I'd hoped to do them together to avoid the extra work of setting up an intermediate state, but it sounds like (2) is a bit controversial. :-) I don't think anything said in the thread so far applies to absl::string_view. Does anyone have a reason not to do (1)? If not, I'll polish off https://crrev.com/c/4295063


Okay, onto the more exciting one. :-)

Strong agreement to focus on std::string_view here. The STL isn't uniform, and our decisions on it should not be uniform. Many considerations are common---the cost of forked vocabulary types vs. the costs of staying within the constraints the STL imposes---but those costs will vary by the type. Something like string_view will appear at API boundaries far more often than unordered_map. STL APIs also vary in what API mistakes they've made. string_view is "just" manipulating a ptr/len pair, while unordered_map and deque made some unfortunate pointer stability promises.

This is a trade-off between costs. The broader STL context is useful to have in mind, but we need to get back to specifics when we evaluate these tradeoffs. For specifically std::string_view, provided we enable hardening to fix the spatial safety UB (which we've done), I'd argue there's not a whole lot of constraints we're incurring. Conversely, std::string_view is very, very much an API boundary type, so the costs of forking it are higher than most STL types. Moreover, unlike span, string_view/StringPiece cannot be constructed out of any container with data()/size(), so different string_view implementations do not implicitly convert between each other the way spans do.

Conversely, I'm not proposing that we replace base::Time with std::chrono. :-)

To respond to a few things up the thread:

> or consider pushing some vocab types into a third_party library, as these feel like more durable strategies to me.

That may help some external libraries, but not all. Adding dependencies gets more expensive the deeper in the dep tree you are. For BoringSSL, where we're moving Chromium's certificate verifier, we're used in enough places that I cannot see us depending on such a library. We don't even depend on Abseil right now, and that'd be the easiest dep for us to add, due to how much Google stuff already uses it. It sounds like QUICHE would face similar tensions.

> In the meantime, here are not-mutually-exclusive actions we could take:
> * Fix the ADL and IWYU errors we can identify.
> * Use NOINLINE and/or generalized inliner heuristic improvements/flag changes to replicate the std:: binary size win.
> * Change our .gn flags and instructions to enable libstdc++ hardening. Maybe run tests/set up bots to ensure we get the same level/quality of hardening as libc++.
> * Migrate to the std:: type but leave everything as a Chromium-side type alias to facilitate later forking.
> * Build string_view_for_fields that uses raw_ptr<>, and rewrite field usage of string_view.
> * Work with libc++ upstream (possibly also libstdc++ upstream) to propose/implement MiraclePtr support. (Needs answers to cost/benefit, "who does the work", "what's the timeframe")
> * Migrate StringPiece to a new repo, assuming BoringSSL, QUICHE, etc. can depend on it. (Need answers to policy, plan, maintenance)

> David, you're driving. What are you willing to do? What are you willing to do if you had help?

The work I've done so far was mostly out of some procrastinating. I probably need to limit the additional time I spend on this. :-)

I've fixed the ADL issues already. I'm happy to migrate but leave the Chromium-side type alias if that's what folks want. Though it seems an odd compromise to me. Also happy to enable libstdc++ hardening. We won't have automated testing, but that's our baseline for libstdc++ anyway. Beyond that, I would suggest a few other not-mutally-exclusive actions first:

- Switch absl::string_view to std::string_view.

- Instead of renaming base/strings/abseil_string_conversions.h to std_string_conversions.h or so, add implicit base::StringPiece::StringPiece(std::string_view) and base::StringPiece::operator std::string_view() conversions. This will help with some of the costs of the forked vocabulary type. (Though it doesn't eliminate them. If every project had its own string_view, it doesn't work. We're assuming the boundaries are always std::string_view, and that Chromium is an outlier in forking.)

Happy to do one or both of these if there is agreement to do it.

> Hardening the implementation of stdlibs prevents some bugs but it does not go far enough because the APIs are fixed in time and cause UB, or cause significant performance cost to catch UB which we can not pay. Having control of our APIs allows us to consider and make changes that improve our software development from a security and stability perspective that we could not otherwise. For instance our span constructors take a StrictNumeric<size_t> instead of a size_t, in order to catch signed integer conversions and prevent bugs at compile time. That's one of the smallest possible API changes you can imagine, but we can't do that in libc++.

The size_t conversion problem is actually one where, in two different ways, the problem we want to solve is perfectly compatible with using the STL, just not that exact solution. It's not really an API change.

base::StringPiece already briefly took a CheckedNumeric<size_t> in https://crrev.com/c/4021423. But we can get the same thing without it, done in https://crrev.com/c/4300852. absl::string_view even already had that check. (They actually had it before we did, so we'd have avoided some issues had we used absl instead of base. :-) )

libc++ did not have this check, but adding it was just a one line change (plus test). I landed that upstream in https://reviews.llvm.org/D145981, and it is already in Chromium. This worked because that check is API-compatible with std::string_view. std::string_view says the num characters from ptr must be valid (same as absl::string_view and base:::StringPiece. That requirement is fundamental any string-view type that accepts bare pointers), and any size_t with the high bit set cannot satisfy this requirement due to C's pointer arithmetic rules.

Now, that is just CheckedNumeric. StrictNumeric is a bit more interesting. (Keep in mind that, per https://chromium-review.googlesource.com/c/chromium/src/+/4300852/comment/9a338370_b6d6e241/, we can't actually get to StrictNumeric<size_t> in base::StringPiece right now anyway.) This is ultimately about preventing C/C++'s excessive implicit integer conversions. This language bug affects more than just StringPiece, and violations are pervasive in Chromium, so we need strategies to apply the diagnostic to avoid a long slog. StrictNumeric<T> is one such strategy. But since we ultimately just want to act as if this existing warning were enabled, this is really API-compatible. std::string_view over base::StringPiece means more work to divide up responsibilities across Chromium, Clang, and libc++, but it's not fundamentally impossible. (E.g. we could add a way to target the warning in Clang. Or we could just carry a patch. We'd only need to patch 3p code as much as we'd have to for the warning anyway. Or we could find a way to do it, such as an attribute, that would be palatable to upstream libc++.)

dan...@chromium.org

unread,
Apr 12, 2023, 11:49:13 AM4/12/23
to David Benjamin, Chris Palmer, Peter Kasting, cxx
On Mon, Apr 10, 2023 at 3:39 PM David Benjamin <davi...@chromium.org> wrote:
(Sorry for the late response, I've been OOO.)

So, I started this thread to talk about two things:
1. Make absl::string_view a typedef for std::string_view
2. Make base::StringPiece a typedef for std::string_view

I'd hoped to do them together to avoid the extra work of setting up an intermediate state, but it sounds like (2) is a bit controversial. :-) I don't think anything said in the thread so far applies to absl::string_view. Does anyone have a reason not to do (1)? If not, I'll polish off https://crrev.com/c/4295063


Okay, onto the more exciting one. :-)

Strong agreement to focus on std::string_view here. The STL isn't uniform, and our decisions on it should not be uniform. Many considerations are common---the cost of forked vocabulary types vs. the costs of staying within the constraints the STL imposes---but those costs will vary by the type. Something like string_view will appear at API boundaries far more often than unordered_map. STL APIs also vary in what API mistakes they've made. string_view is "just" manipulating a ptr/len pair, while unordered_map and deque made some unfortunate pointer stability promises.

This is a trade-off between costs. The broader STL context is useful to have in mind, but we need to get back to specifics when we evaluate these tradeoffs. For specifically std::string_view, provided we enable hardening to fix the spatial safety UB (which we've done), I'd argue there's not a whole lot of constraints we're incurring. Conversely, std::string_view is very, very much an API boundary type, so the costs of forking it are higher than most STL types. Moreover, unlike span, string_view/StringPiece cannot be constructed out of any container with data()/size(), so different string_view implementations do not implicitly convert between each other the way spans do.

Thanks for this framing about API boundary types. I think this is a useful tool to think about this space.

Conversely, I'm not proposing that we replace base::Time with std::chrono. :-)

To respond to a few things up the thread:

> or consider pushing some vocab types into a third_party library, as these feel like more durable strategies to me.

That may help some external libraries, but not all. Adding dependencies gets more expensive the deeper in the dep tree you are. For BoringSSL, where we're moving Chromium's certificate verifier, we're used in enough places that I cannot see us depending on such a library. We don't even depend on Abseil right now, and that'd be the easiest dep for us to add, due to how much Google stuff already uses it. It sounds like QUICHE would face similar tensions.

> In the meantime, here are not-mutually-exclusive actions we could take:
> * Fix the ADL and IWYU errors we can identify.
> * Use NOINLINE and/or generalized inliner heuristic improvements/flag changes to replicate the std:: binary size win.
> * Change our .gn flags and instructions to enable libstdc++ hardening. Maybe run tests/set up bots to ensure we get the same level/quality of hardening as libc++.
> * Migrate to the std:: type but leave everything as a Chromium-side type alias to facilitate later forking.
> * Build string_view_for_fields that uses raw_ptr<>, and rewrite field usage of string_view.
> * Work with libc++ upstream (possibly also libstdc++ upstream) to propose/implement MiraclePtr support. (Needs answers to cost/benefit, "who does the work", "what's the timeframe")
> * Migrate StringPiece to a new repo, assuming BoringSSL, QUICHE, etc. can depend on it. (Need answers to policy, plan, maintenance)
> David, you're driving. What are you willing to do? What are you willing to do if you had help?

The work I've done so far was mostly out of some procrastinating. I probably need to limit the additional time I spend on this. :-)

I've fixed the ADL issues already. I'm happy to migrate but leave the Chromium-side type alias if that's what folks want. Though it seems an odd compromise to me. Also happy to enable libstdc++ hardening. We won't have automated testing, but that's our baseline for libstdc++ anyway. Beyond that, I would suggest a few other not-mutally-exclusive actions first:

- Switch absl::string_view to std::string_view.

- Instead of renaming base/strings/abseil_string_conversions.h to std_string_conversions.h or so, add implicit base::StringPiece::StringPiece(std::string_view) and base::StringPiece::operator std::string_view() conversions. This will help with some of the costs of the forked vocabulary type. (Though it doesn't eliminate them. If every project had its own string_view, it doesn't work. We're assuming the boundaries are always std::string_view, and that Chromium is an outlier in forking.)

These are very easy to agree on doing.
 
Happy to do one or both of these if there is agreement to do it.

> Hardening the implementation of stdlibs prevents some bugs but it does not go far enough because the APIs are fixed in time and cause UB, or cause significant performance cost to catch UB which we can not pay. Having control of our APIs allows us to consider and make changes that improve our software development from a security and stability perspective that we could not otherwise. For instance our span constructors take a StrictNumeric<size_t> instead of a size_t, in order to catch signed integer conversions and prevent bugs at compile time. That's one of the smallest possible API changes you can imagine, but we can't do that in libc++.

The size_t conversion problem is actually one where, in two different ways, the problem we want to solve is perfectly compatible with using the STL, just not that exact solution. It's not really an API change.

base::StringPiece already briefly took a CheckedNumeric<size_t> in https://crrev.com/c/4021423. But we can get the same thing without it, done in https://crrev.com/c/4300852. absl::string_view even already had that check. (They actually had it before we did, so we'd have avoided some issues had we used absl instead of base. :-) )

libc++ did not have this check, but adding it was just a one line change (plus test). I landed that upstream in https://reviews.llvm.org/D145981, and it is already in Chromium. This worked because that check is API-compatible with std::string_view. std::string_view says the num characters from ptr must be valid (same as absl::string_view and base:::StringPiece. That requirement is fundamental any string-view type that accepts bare pointers), and any size_t with the high bit set cannot satisfy this requirement due to C's pointer arithmetic rules.

Now, that is just CheckedNumeric. StrictNumeric is a bit more interesting. (Keep in mind that, per https://chromium-review.googlesource.com/c/chromium/src/+/4300852/comment/9a338370_b6d6e241/, we can't actually get to StrictNumeric<size_t> in base::StringPiece right now anyway.) This is ultimately about preventing C/C++'s excessive implicit integer conversions. This language bug affects more than just StringPiece, and violations are pervasive in Chromium, so we need strategies to apply the diagnostic to avoid a long slog. StrictNumeric<T> is one such strategy. But since we ultimately just want to act as if this existing warning were enabled, this is really API-compatible. std::string_view over base::StringPiece means more work to divide up responsibilities across Chromium, Clang, and libc++, but it's not fundamentally impossible. (E.g. we could add a way to target the warning in Clang. Or we could just carry a patch. We'd only need to patch 3p code as much as we'd have to for the warning anyway. Or we could find a way to do it, such as an attribute, that would be palatable to upstream libc++.)

I am pretty wary about trying to do this in the compiler, and I believe we can do it entirely through the regular C++ type system anyway. StrictNumeric is not the rich type we need for this, but sus::num::usize could be. It converts from primitives that fit in the type, including signed values that are known at compile time. By using a type we can see exactly where integer correctness exists or not, and changes in the code are more clear, and don't require a bunch of noisy static_cast<> and checked_cast<> stuff. Experimenting with this is waiting for C++20 support.

Whatever the path is here, I think there needs to be _some_ path and I don't think it's sufficiently clear what is feasible or desirable. I am continuing to work toward types that will eliminate integer bugs with access to unchecked performance when it is needed and can be documented/explained.

I do want to lean harder into compiler errors, or tidy checks, over runtime checks whenever we can. Shifting left, as Titus put it.

On Thu, Apr 6, 2023 at 8:31 PM Chris Palmer <pal...@chromium.org> wrote:
Yes, that is correct.

(To be clear, it's my fall-back position: I do believe, for reasons like those Dana gives, that migrating away from `std::*` might likely come sooner rather than later.)

But, as you say, for the moment let's worry about `string_view` specifically. I can live with a hardened and tested `std::string_view`. :)

On Thu, Apr 6, 2023 at 12:37 PM Peter Kasting <pkas...@google.com> wrote:
On Thu, Apr 6, 2023 at 12:28 PM Chris Palmer <pal...@chromium.org> wrote:
That said, I'm happy to accept a library, including an implementation of `std`, that can pass whatever security and/or efficiency tests we deem necessary. Passing the tests may be contingent on aspects of build configuration, as with libc++ and Abseil. That is OK as long as we ship to production only with the passing build configuration.

This sounds like you're saying you'd be OK migrating to std::string_view if we have hardening tests for it (which, AFAIK, we do), Google Chrome (distinct from third-party Chromium builds) passes those tests, and we are willing to migrate away from std::string_view in the future if we decide that we need additional security properties that are impractical to realize in it. Is that correct? If not please clarify :)

PK

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.

David Benjamin

unread,
Apr 13, 2023, 8:28:13 PM4/13/23
to dan...@chromium.org, Chris Palmer, Peter Kasting, cxx
Dana and I chatted about various safety topics out-of-band, and I came away with a better appreciation of the tradeoffs we're evaluating here, and the nature of vocabulary types. Some disorganized thoughts, in case they're as enlightening to others as they were to me. (This is all me typing off the cuff at way too late in the day, so definitely don't take this as me speaking for Dana, and blame all mistakes on me. To that end, Dana, let me know if I've gotten something wrong or missed something you think is important!)

APIs like string_view and span are interesting because they're "just" pointer/length pairs. But, were this a better organized language like Rust, all of the APIs that decompose to pointer/length would be marked `unsafe` anyway. Because no matter what conversion checks we put on the size_t in string_view(const char *, size_t), it will never fix the UB if the length is inconsistent with the pointer's bounds. Focusing on the size_t can, at best, catch common mistakes like passing in -1. Really the entire API is unsafe.

This unsafety is actually observing that C/C++ got pointers (to arrays) wrong. Pointers should never be divorced from their bounds. Bounds are intrinsically part of a correct "pointer" concept.

Which means we really should be avoiding that string_view constructor in the first place, and instead staying within bounds-aware types throughout the process. https://crbug.com/1393166 has some examples where we should have used StringPiece::substr() instead of StringPiece(str.data(), len). The problem is:

1. We do not have a good way to classify "safe" vs. "unsafe" code in C++, and encourage "safe" code, the way Rust does. (For many notions of "safe". Ideally we'd include temporal safety like Rust, but this is hard without Rust's reference semantics. But C++ is so bad at this, it cannot even capture spatial safety, which needs nothing so high-powered!)

2. Far, far too much code needs to decompose to ptr/len to interop with other ptr/len systems.

(1) would be a lot of work, from just getting from A to B, to the actual mechanism for flagging unsafe patterns. For the latter, Dana has an idea with unsafe functions taking an unsafe_fn tag parameter, which is nice in that it avoids language changes. But it's an API break and an example of how, even though string_view is just a ptr/len pair, there's still room for us to do interesting things if we owned the API surface. Another would be to use attributes or so, which is STL API-compatible but requires deeper language integration and probably more work. (E.g. it might require we work on clang-tidy-based tooling, or do some toolchain work.)

(2) is related because if we're pervasively decomposing string_views, flagging things won't be useful. A lot is just pre-StringPiece code, but I, completely unscientifically (they were the first ones alphabetically in xrefs as I was idly searching), happened to stumble upon examples that were very relevant to this discussion. First:

This is decomposing one string-view-like type into ptr/len and then stitching them back up into another string-view-like type. We need to do that because v8_inspector::StringView takes and returns ptr/len instead of base::StringPiece/std::string_view/whatever.

And then I stumbled on this, which is truly spectacular in how many times views were decomposed and recomposed!

I have no authoritative insight into why these were originally written that way, but even fixing them now is hard: v8_inspector::StringView lives in //v8. crdtp lives in //third_party/inspector_protocol. //content lives in Chromium. This is precisely the situation where we have bits of first-party code, that's been extracted into separate projects for ease of reuse. So they can't use base::StringPiece. As for alternatives, T* and size_t have a huge advantage in ubiquity. They're in the language itself. So it's no wonder that, when no other string_view type is readily available, people just decompose into pointer/length. At an individual call site, the cost of decomposing is really really low, not really enough to justify adding a new third-party libchrome_base dependency, even if such a thing existed.

But the result in aggregate is a huge cost to Chromium, because it makes it hard to flag the actually dangerous patterns, e.g. https://crbug.com/1393166. We can't flag those with some flavor of "unsafe" if the whole project would need to be marked unsafe. (As an aside: this is also why I don't like the mess C++ made with char/uint8_t/unsigned char/std::byte/char8_t. Having so many means code invariably needs to convert between them, and the only patterns for converting are not only verbose, but hard to distinguish from actually unsafe pointer casts. There's a bit of that in the second example too.)

If we want to fix this, the only vocabulary type with a fighting chance at project boundaries is std::string_view. The STL is privileged in being as ubiquitous as ptr/len. Happily, libc++ hardening means we have successfully made it spatially safe, apart from lacking a good way to flag ptr-based APIs. And so, at boundaries, we should be willing to use std::string_view, and make it as frictionless as possible. Looking at a couple options from this thread in that lens:

Option 1: use and encourage std::string_view everywhere
Pros:
+ Very straightforward
+ Developers don't need to keep track of different types in Chromium-only code and code at the boundaries
+ Easier to move code from Chromium-only to shared and vice versa
Cons:
- We cede control of the API to the C++ committee
- API-compatible but still invasive changes, like raw_ptr, are more difficult
- Something like Dana's unsafe_fn API becomes impossible. We'd instead need to use alternate strategies, which may be more expensive
- Possibly to undo this later if we change our mind, but harder

Option 2: allow and encourage std::string_view in shared code, use base::StringPiece instead in Chromium-only code, use implicit conversions to reduce friction at boundaries
Pros:
+ We retain API control and can apply benefits, at least in Chromium-only code
+ We can make changes without extra overhead, such as raw_ptr
+ We can even change the API to do things like unsafe_fn API parameters
+ If we start with this option, we easily can go to option 1 later, whereas the reverse direction is harder
Cons:
- Developers need to keep track of two different string_view types
- Moving code from Chromium-only to shared and vice versa becomes more complex
- Not all cases are covered by implicit conversions. E.g. Chromium-only code may implement a string_view-based 3p interface
- PRESUBMIT checks for std::string_view become very complex. We cannot easily distinguish between std::string_view uses that should be base::StringPiece and those that are forced to be std::string_view. We currently only deal with two APIs worth of std::string_view, and the PRESUBMITs already have very wide wildcards. I don't think our current PRESUBMIT architecture is suited for this option in the long term.

This still ultimately boils down to which tradeoff we find more palatable, but I think this is a useful angle to focus on. (Obviously I prefer option 1, but I think preferring option 2 is also defensible.)

David

David Benjamin

unread,
Apr 17, 2023, 11:26:14 AM4/17/23
to dan...@chromium.org, Chris Palmer, Peter Kasting, cxx
Since it's blocking Ade's fuzzer work, and the absl::string_view changes seem to uncontroversial (but let me know if I'm misreading things and this is premature!), I've sent these two CLs out for review:


The first makes absl::string_view into a typedef for std::string_view but doesn't change PRESUBMITs or guidance. The second changes PRESUBMITs and guidance accordingly, and also adds the implicit conversions mentioned in the thread, so that we don't need a replacement for base/strings/abseil_string_conversions.h.

David

Peter Kasting

unread,
Apr 18, 2023, 1:10:31 PM4/18/23
to David Benjamin, dan...@chromium.org, Chris Palmer, cxx
I have pondered this this weekend.

I think if Dana says there are meaningful security wins possible with our own forked string_view, we should believe it.

I wonder whether "possible" and "planned, scheduled" are the same thing. My suspicion is that many areas have possible wins, but some are smaller and lower-priority than others, and this may be one of the latter. It would be good to know how likely we would be to regret using std::string_view in the next two years.

The biggest factor I see in this is the cost of changing our minds in the future. Sticking with StringPiece is reasonably easy to reverse later on the Chromium side. People seem to be asserting that using string_view is harder to reverse. I question that assertion. I think using string_view in first_party code ought to be reasonably easy (not fully trivial) to reverse later; it's largely search-and-replace. The main issues I see are:
  • Third-party calls that take/return something implicitly converted from/to a string_view
  • Third-party APIs that take a string_view*.
  • Implementations of third-party abstract interfaces that use string_view.
These seem uncommon and would all have reasonably easy workarounds for the person doing the conversion. I feel as if converting back to StringPiece ought not to take more than a day. What am I missing?

If I am correct and we have escape hatches from either decision, then we can afford to be wrong here. And in that case I would suggest taking the route that has concrete benefits immediately over the route that has potential benefits down the road.

PK

Roland Bock

unread,
Apr 18, 2023, 2:28:34 PM4/18/23
to Peter Kasting, David Benjamin, dan...@chromium.org, Chris Palmer, cxx
On Tue, Apr 18, 2023 at 7:10 PM 'Peter Kasting' via cxx <c...@chromium.org> wrote:
I have pondered this this weekend.

I think if Dana says there are meaningful security wins possible with our own forked string_view, we should believe it.

I wonder whether "possible" and "planned, scheduled" are the same thing. My suspicion is that many areas have possible wins, but some are smaller and lower-priority than others, and this may be one of the latter. It would be good to know how likely we would be to regret using std::string_view in the next two years.

The biggest factor I see in this is the cost of changing our minds in the future. Sticking with StringPiece is reasonably easy to reverse later on the Chromium side. People seem to be asserting that using string_view is harder to reverse. I question that assertion. I think using string_view in first_party code ought to be reasonably easy (not fully trivial) to reverse later; it's largely search-and-replace. The main issues I see are:
  • Third-party calls that take/return something implicitly converted from/to a string_view
  • Third-party APIs that take a string_view*.
  • Implementations of third-party abstract interfaces that use string_view.
These seem uncommon and would all have reasonably easy workarounds for the person doing the conversion. I feel as if converting back to StringPiece ought not to take more than a day. What am I missing?

The hash functions are different. Folks would start using unordered containers with transparent lookup once we have std::string_view.



If I am correct and we have escape hatches from either decision, then we can afford to be wrong here. And in that case I would suggest taking the route that has concrete benefits immediately over the route that has potential benefits down the road.

PK

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.

Peter Kasting

unread,
Apr 18, 2023, 3:14:46 PM4/18/23
to Roland Bock, David Benjamin, dan...@chromium.org, Chris Palmer, cxx
On Tue, Apr 18, 2023 at 11:28 AM Roland Bock <rb...@google.com> wrote:
On Tue, Apr 18, 2023 at 7:10 PM 'Peter Kasting' via cxx <c...@chromium.org> wrote:
The biggest factor I see in this is the cost of changing our minds in the future. Sticking with StringPiece is reasonably easy to reverse later on the Chromium side. People seem to be asserting that using string_view is harder to reverse. I question that assertion. I think using string_view in first_party code ought to be reasonably easy (not fully trivial) to reverse later; it's largely search-and-replace. The main issues I see are:
  • Third-party calls that take/return something implicitly converted from/to a string_view
  • Third-party APIs that take a string_view*.
  • Implementations of third-party abstract interfaces that use string_view.
These seem uncommon and would all have reasonably easy workarounds for the person doing the conversion. I feel as if converting back to StringPiece ought not to take more than a day. What am I missing?

The hash functions are different. Folks would start using unordered containers with transparent lookup once we have std::string_view.

Doesn't that work just as well with StringPiece as long as it implicitly converts to the key type for the container? You might have to ELI5 why this is a problem for me :)

PK

David Benjamin

unread,
Apr 18, 2023, 3:43:42 PM4/18/23
to Peter Kasting, Roland Bock, dan...@chromium.org, Chris Palmer, cxx
 We might need to make sure that std::hash<base::StringPiece> forwards to the std::string_view implementation so that it generates its own value. Or maybe we make those use a separate[*] is_transparent-capable hash that supports both std::string and base::StringPiece, I dunno.

One way or another, the reason folks would start using heterogeneous lookup is because it's useful! If the "base::StringPiece remains separate" story does not support this somehow, that's a flaw in that story and it should find a way to support it. We could always rewrite it to whatever that answer is, if we want to switch back.

David

[*] Historically, we didn't allow std::hash specializations because the Google C++ style guide didn't allow them, in favor of something that, at the time, didn't exist yet, hence base::StringPieceHash. I think it was in anticipation of absl::Hash, which does now exist, but which we aren't using. I don't believe that guidance is no longer in the public style guide, and I think it never actually made sense for it to apply to Chromium anyway. So if we switched back, I think we should just specialize std::hash<base::StringPiece> and not worry about it too much. We can evaluate absl::Hash orthogonally.

K. Moon

unread,
Apr 18, 2023, 4:22:36 PM4/18/23
to David Benjamin, Peter Kasting, Roland Bock, dan...@chromium.org, Chris Palmer, cxx
For context, the style guide removed the prohibition against specializing std::hash in https://github.com/google/styleguide/commit/840e5839f6518a5e9f484ee95490aa4b2cad7abf. I think the justification was along the lines that everyone was using absl::Hash anyway, though, so some of the caveats may still apply.

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.

John Admanski

unread,
Apr 18, 2023, 4:54:51 PM4/18/23
to cxx, km...@chromium.org, pkas...@google.com, rb...@google.com, danakj, Chris Palmer, cxx, David Benjamin
Yes, to be clear the prohibition on specializing std::hash was removed because it was considered moot, not because the concerns about specializing std::hash no longer applied. Basically not wanting to spend style guide space banning something that nobody should ever want or need anymore. That may or may not really have any bearing on what Chromium should be doing here.

David Benjamin

unread,
Apr 18, 2023, 5:06:43 PM4/18/23
to John Admanski, cxx, km...@chromium.org, pkas...@google.com, rb...@google.com, danakj, Chris Palmer
I assume it was considered moot because it's expected consumers were using absl::Hash. Is that right? Chromium, for better or worse, is not and especially was not at the time. (I'm not sure Abseil existed at the time we adopted that guidance, and it was sometime before Abseil was usable for us even after it existed.) That would suggest the guidance should have been:

- We think you should use absl::Hash and friends
- If you are using absl::Hash and friends, don't specialize std::hash because it's not relevant here
- (Implicit) if you aren't using absl::Hash and friends for some reason, and instead you use std::unordered_map, then specialize std::hash because that's what you're expected to do with std::unordered_map

We're currently in that third bucket. I certainly have no objections moving Chromium to the second bucket. (I believe that's https://crbug.com/1121345.) But, until/unless we do, it seems our std::hash opinions should match our current state. (This is perhaps a good example of how our STL preferences needn't be uniform. std::unordered_map appears at API boundaries much less frequently than std::string_view. Also the STL really messed up std::unordered_map in promising things like pointer stability, which make it incompatible with the nice things that absl does.)

But this is perhaps a digression off the original heterogeneous lookup topic. :-)

John Admanski

unread,
Apr 18, 2023, 5:30:54 PM4/18/23
to cxx, David Benjamin, cxx, km...@chromium.org, pkas...@google.com, rb...@google.com, danakj, Chris Palmer, John Admanski
Oops, my attempt adding some background clarity was not very clear. Right, once absl::Hash was established (along with the abseil swiss tables) and using it instead of the stdlib version was normalized, it just ceased to be a thing that came up. The only situation where it would really be relevant is if you _needed_ to use the stdlib hash containers, and _couldn't_ specify the hash function as a parameter...which is not a thing that really happens anymore (in google3).

I would suggest that the original concerns with std::hash (basically that it's really difficult to specialize correctly and efficiently) are still relevant. But the original style rule was also about trying to prevent the proliferation of a large number of specializations that would then have to be migrated once abseil existed; adding some specific specializations to core libraries is probably less of a problem in that regard, even if Chromium were to eventually decide to adopt absl::Hash and the swiss tables.

Roland Bock

unread,
Apr 19, 2023, 3:08:33 PM4/19/23
to John Admanski, cxx, David Benjamin, km...@chromium.org, pkas...@google.com, danakj, Chris Palmer
With string_view you can to something like this:

```
#include <string>
#include <string_view>

#include "absl/container/flat_hash_map.h"

int main() {
  absl::flat_hash_map<std::string, int> sample;
  sample.find("something");
  sample.find(std::string("something"));
  sample.find(absl::string_view("something")); // Not creating a temporary string
}
```

I can get the same with StringPiece, but I need to provide a specific hash and equal class. 
So it is not just search and replace, but also less of an issue than I thought yesterday.

Cheers,
Roland

Roland Bock

unread,
Apr 20, 2023, 7:48:58 AM4/20/23
to John Admanski, cxx, David Benjamin, km...@chromium.org, pkas...@google.com, danakj, Chris Palmer
Scratch that. We don't even allow absl::flat_hash_map as of today...
Created https://crrev.com/c/4453587 in case that changes one day.

Sorry for the noise.

David Benjamin

unread,
May 24, 2023, 12:54:42 PM5/24/23
to Roland Bock, John Admanski, cxx, km...@chromium.org, pkas...@google.com, danakj, Chris Palmer
Circling back here, how do we make a decision on base::StringPiece?

Another thing we miss out on is we don't pick up new C++ features, and lose consistency with std::string, which Chromium uses pervasively today. C++20 adds std::string_view::starts_with and std::string_view::ends_with, but we don't have them. If we did, we could replace[*] base::StartsWith and base::EndsWith with more idiomatic and more convenient methods. Right now we have std::string::starts_with and std::string::ends_with. Of course, we could (and should) just add them to base::StringPiece, but it's another thing that we have to maintain ourselves.

[*] Partially replace. The INSENSITIVE_ASCII mode would need to say, but that's much rarer. We can keep around a base::StartsWithCaseInsensitiveASCII and base::EndWithCaseInsensitiveASCII pair.

The thread on this CL is a recent example of thorniness that would have been avoided if we hadn't forked the standard vocab type. Happily, it's in service of re2::StringPiece transitioning to being the same as absl::string_view / std::string_view, and might be fine anyway because we're C++20, so it'll all settle somewhere reasonable. But it highlights the high cost of forking a vocabulary type that is so pervasive at API boundaries. It's weird that we simultaneously prefer other projects adopt the standard string_view over their own (per the example earlier in the thread, we pay a safety cost when other projects do this), but to insist ourselves on having our own.

Quoting up the thread a bit:

On Tue, Apr 18, 2023 at 7:10 PM 'Peter Kasting' via cxx <c...@chromium.org> wrote:
The biggest factor I see in this is the cost of changing our minds in the future. Sticking with StringPiece is reasonably easy to reverse later on the Chromium side. People seem to be asserting that using string_view is harder to reverse. I question that assertion. I think using string_view in first_party code ought to be reasonably easy (not fully trivial) to reverse later; it's largely search-and-replace. The main issues I see are:
  • Third-party calls that take/return something implicitly converted from/to a string_view
  • Third-party APIs that take a string_view*.
  • Implementations of third-party abstract interfaces that use string_view.
These seem uncommon and would all have reasonably easy workarounds for the person doing the conversion. I feel as if converting back to StringPiece ought not to take more than a day. What am I missing?

I agree with Peter that this is reasonably easy to reverse. Indeed, the places where it's hard to reverse are exactly the places where the forked vocabulary type is particularly expensive! Right now, we're not getting anything out of this cost, so it doesn't make sense to keep the split. In the future, it may be that we get enough out of StringPiece to be worth that cost, but we can always choose to take on that cost then. (We can also evaluate it against the costs/benefits of applying our goals directly to libc++'s std::string_view. :-) ) This is just the same decision we'd be making around replacing the existing uses of std::string or std::vector or absl::optional or absl::variant, and those are less pervasive at boundaries than std::string_view.

Further up the thread, we had a discussion about whether relying on libc++ hardening vs. people building Chromium against other STLs. The security FAQ now says we don't treat those as security bugs, so I think we can consider that question answered. We now implicitly believe that, if you use another STL, it should have comparable hardening to how we configure our libc++. Places where others (or ourselves) don't configure that, are bugs in those configurations.

Peter Kasting

unread,
May 25, 2023, 1:55:22 PM5/25/23
to David Benjamin, Roland Bock, John Admanski, cxx, km...@chromium.org, danakj, Chris Palmer
I think there have been very reasonable concerns raised here, and I also think David has given a plausible response to those concerns. I suggest we keep this open until at least next Wednesday to hear more responses and opinions, especially if people don't feel things are addressed.

If at that point there is still no further response, then I would go ahead and move forward. Not trying to railroad anything here -- but the final authority/decision-making process on these sorts of "aim to achieve consensus" things is murky, so that's my best attempt to avoid stalling :)

PK

David Benjamin

unread,
Jun 9, 2023, 8:05:53 PM6/9/23
to Peter Kasting, Roland Bock, John Admanski, cxx, km...@chromium.org, danakj, Chris Palmer
I guess it's now been a week since [what was then] next Wednesday, so I've sent https://chromium-review.googlesource.com/c/chromium/src/+/4294483 out for review!
Reply all
Reply to author
Forward
0 new messages