Re: [chromium-dev] Re: [blink-dev] C++11 Feature Proposal: Allowing/Requiring u"" for constant UTF-16 strings

Daniel Cheng

Aug 11, 2017, 5:20:36 AM
to Rachel Blum, Peter Kasting, cxx, Scott Graham, Dana Jansens, Avi Drissman, Victor Khimenko
Bumping a rather old thread, since it's sad to see ASCIIToUTF16("my literal") written out*.

It looks like one of the main issues with using u"..." is that char16_t and wchar_t aren't implicitly convertible (boo).

Given that std::wstring / wchar_t is (almost?) always Windows specific, how about just using u"..." for places that use ASCIIToUTF16("...") today? Windows can use L"..." where a wide string is required.

The other big issue is base::string16, but perhaps we could just use char16_t as the character type for base::string16, and provide a Windows-only helper to reinterpret cast the string to a wchar_t? Today, we would write:
  base::string16 text = LoadResource(...);
  ::SetWindowTextW(hwnd, text.data());

And instead we would now write:
  base::string16 text = LoadResource(...);
  ::SetWindowTextW(hwnd, text.wdata());

Would this be sufficient to make life less painful?

Alternatively, we could provide a macro that either uses L"..." or u"..." as appropriate =)
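
For illustration, a sketch of what that macro could look like (the name and platform guard here are hypothetical, and the u-branch assumes base::string16's character type becomes char16_t as proposed above):
  // Hypothetical sketch: pick the string literal prefix per platform,
  // assuming wchar_t (Windows) and char16_t are both 16-bit UTF-16 units.
  #if defined(OS_WIN)
  #define STRING16_LITERAL(x) L##x
  #else
  #define STRING16_LITERAL(x) u##x
  #endif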

Daniel

* a quick search shows that there's only ~758 non-test instances in Chrome, so that's actually not as bad as I thought.

On Tue, Sep 13, 2016 at 2:28 PM Rachel Blum <gr...@chromium.org> wrote:

On Tue, Sep 13, 2016 at 1:18 PM, Peter Kasting <pkas...@google.com> wrote:
the question is what the solution is.

All Windows versions we support use UTF16 (instead of UCS2), so we don't need UTF16ToWide<>; a reinterpret_cast will suffice. I'd lean towards automated code rewriting instead of adding an implicit cast, though. (That's what I meant by 'rewrite tools', and that was supposed to address the second part :)

Also: Even if we inject into the std:: namespace, I don't think there's a way to make it work implicitly - assignments, cast operators, and converting constructors cannot be freestanding.

The question is, do we want to invest that time, or is u"" such a minor gain that we don't care right now?

Avi Drissman

Aug 11, 2017, 2:38:33 PM
to Daniel Cheng, Rachel Blum, Peter Kasting, cxx, Scott Graham, Dana Jansens, Victor Khimenko
I personally think we should work on migrating base::string16 to std::u16string as a longer-term goal, and I see some work like that as a step towards it.

Jeffrey Yasskin

Aug 11, 2017, 2:52:59 PM
to Daniel Cheng, Rachel Blum, Peter Kasting, cxx, Scott Graham, Dana Jansens, Avi Drissman, Victor Khimenko
On Fri, Aug 11, 2017 at 2:20 AM, Daniel Cheng <dch...@chromium.org> wrote:
> Bumping a rather old thread, since it's sad to see ASCIIToUTF16("my
> literal") written out*.
>
> It looks like one of the main issues with using u"..." is that char16_t and
> wchar_t aren't implicitly convertible (boo).
>
> Given that std::wstring / wchar_t is (almost?) always Windows specific, how
> about just using u"..." for places that use ASCIIToUTF16("...") today?
> Windows can use L"..." where a wide string is required.
>
> The other big issue is base::string16, but perhaps we could just use
> char16_t as the character type for base::string16, and provide a
> Windows-only helper to reinterpret cast the string to a wchar_t?

Note, of course, that reinterpret_cast<wchar_t>(my_char16_t) yields
undefined behavior when you proceed to use the buffer. MSVC and/or
clang-on-windows could go beyond the standard to give us special
permission to do that. Do you know if they have?

Jeffrey

Nico Weber

Aug 11, 2017, 2:57:27 PM
to Jeffrey Yasskin, Daniel Cheng, Rachel Blum, Peter Kasting, cxx, Scott Graham, Dana Jansens, Avi Drissman, Victor Khimenko
On Fri, Aug 11, 2017 at 2:52 PM, Jeffrey Yasskin <jyas...@chromium.org> wrote:
On Fri, Aug 11, 2017 at 2:20 AM, Daniel Cheng <dch...@chromium.org> wrote:
> Bumping a rather old thread, since it's sad to see ASCIIToUTF16("my
> literal") written out*.
>
> It looks like one of the main issues with using u"..." is that char16_t and
> wchar_t aren't implicitly convertible (boo).
>
> Given that std::wstring / wchar_t is (almost?) always Windows specific, how
> about just using u"..." for places that use ASCIIToUTF16("...") today?
> Windows can use L"..." where a wide string is required.
>
> The other big issue is base::string16, but perhaps we could just use
> char16_t as the character type for base::string16, and provide a
> Windows-only helper to reinterpret cast the string to a wchar_t?

Note, of course, that reinterpret_cast<wchar_t>(my_char16_t) yields
undefined behavior when you proceed to use the buffer.

You mean reinterpret_cast<wchar_t*>(my_char16_t*)?

If so, we build with -fno-strict-aliasing and generally assume this to work.
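
For concreteness, the cast under discussion has this shape (a sketch, valid only on Windows, where wchar_t and char16_t are both 16-bit UTF-16 code units):
  // Formally undefined behavior under strict aliasing, but Chromium builds
  // with -fno-strict-aliasing, so the reinterpreted buffer is usable.
  std::u16string text = u"hello";
  const wchar_t* wide = reinterpret_cast<const wchar_t*>(text.data());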
 
MSVC and/or
clang-on-windows could go beyond the standard to give us special
permission to do that. Do you know if they have?

Jeffrey


Jeffrey Yasskin

Aug 11, 2017, 2:58:42 PM
to Nico Weber, Jeffrey Yasskin, Daniel Cheng, Rachel Blum, Peter Kasting, cxx, Scott Graham, Dana Jansens, Avi Drissman, Victor Khimenko
On Fri, Aug 11, 2017 at 11:57 AM, Nico Weber <tha...@chromium.org> wrote:
> On Fri, Aug 11, 2017 at 2:52 PM, Jeffrey Yasskin <jyas...@chromium.org>
> wrote:
>>
>> On Fri, Aug 11, 2017 at 2:20 AM, Daniel Cheng <dch...@chromium.org> wrote:
>> > Bumping a rather old thread, since it's sad to see ASCIIToUTF16("my
>> > literal") written out*.
>> >
>> > It looks like one of the main issues with using u"..." is that char16_t
>> > and
>> > wchar_t aren't implicitly convertible (boo).
>> >
>> > Given that std::wstring / wchar_t is (almost?) always Windows specific,
>> > how
>> > about just using u"..." for places that use ASCIIToUTF16("...") today?
>> > Windows can use L"..." where a wide string is required.
>> >
>> > The other big issue is base::string16, but perhaps we could just use
>> > char16_t as the character type for base::string16, and provide a
>> > Windows-only helper to reinterpret cast the string to a wchar_t?
>>
>> Note, of course, that reinterpret_cast<wchar_t>(my_char16_t) yields
>> undefined behavior when you proceed to use the buffer.
>
>
> You mean reinterpret_cast<wchar_t*>(my_char16_t*)?

Yep.

> If so, we build with -fno-strict-aliasing and generally assume this to work.

SGTM.

Peter Kasting

Aug 12, 2017, 3:21:32 AM
to Avi Drissman, Daniel Cheng, Rachel Blum, cxx, Scott Graham, Dana Jansens, Victor Khimenko
On Fri, Aug 11, 2017 at 11:38 AM, Avi Drissman <a...@chromium.org> wrote:
I personally think we should work on migrating base::string16 to std::u16string as a longer-term goal, and I see some work like that as a step towards it.

On Fri, Aug 11, 2017 at 5:20 AM, Daniel Cheng <dch...@chromium.org> wrote:
Given that std::wstring / wchar_t is (almost?) always Windows specific, how about just using u"..." for places that use ASCIIToUTF16("...") today? Windows can use L"..." where a wide string is required.

The other big issue is base::string16, but perhaps we could just use char16_t as the character type for base::string16, and provide a Windows-only helper to reinterpret cast the string to a wchar_t? Today, we would write:
  base::string16 text = LoadResource(...);
  ::SetWindowTextW(hwnd, text.data());

And instead we would now write:
  base::string16 text = LoadResource(...);
  ::SetWindowTextW(hwnd, text.wdata());

Would this be sufficient to make life less painful?

So, to be clear, you need to do the second thing (using char16_t as the base for string16) to allow the first to compile (using u""), right?

To do that you need to be able to access string16 as wstring/wchar_t*.  Adding a wdata() member to do this seems tricky.  string16 is typedefed to wstring on Windows.  I think you'd have to change that typedef (because you can't inject a member function into std::basic_string) and I imagine there will be many more than 700+ places that will break, because we've removed most of the explicit conversions-to-wide we used to have in the Windows code, and now assume the two types are equivalent everywhere.  We don't just pass wstrings as wchar_t*s, we actually pass them as wstrings.

After you change the typedef, you'll need to actually introduce the helper(s).  This means you'll need to define string16 as... I dunno, something that inherits from u16string?  I thought extending STL classes like this via inheritance was generally considered a Bad Idea (see e.g. Effective C++ which tells you not to do it).  I think it probably works better if we do this via an external helper function than by a member.  At that point, I think we've gotten back to the ol' Wide<->UTF16 conversion functions route.  The actual implementation of these might be a reinterpret cast that we assume won't cause strict-aliasing problems (since we turn that off), but we still have to actually have somewhere that implements them.
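
A sketch of what such an external helper might look like (illustrative, not an existing base/ API):
  // Windows-only: view a UTF-16 buffer as a wide-character buffer. Relies on
  // wchar_t being 16 bits there and on building with -fno-strict-aliasing.
  inline const wchar_t* UTF16AsWide(const char16_t* s) {
    static_assert(sizeof(wchar_t) == sizeof(char16_t),
                  "only valid where wchar_t is 16 bits");
    return reinterpret_cast<const wchar_t*>(s);
  }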

Basically, I think in the end the plan has to be something like:
* ensure we use wstring/wchar_t as little as possible (hopefully mostly done)
* reintroduce all the WideToUTF16/UTF16ToWide machinery that got removed (maybe the necessary places can be found locally by changing the string16 typedef away from wstring)
* switch string16 to be a typedef to u16string
* convert ASCIIToUTF16("") to u""
* convert the typenames everywhere and remove the typedef

I think this would be nice to do, but the cost/benefit on it doesn't convince me it's worth my own time to contribute to.

PK

Daniel Cheng

Aug 12, 2017, 4:25:46 PM
to Peter Kasting, Avi Drissman, Dana Jansens, Rachel Blum, Scott Graham, Victor Khimenko, cxx
On Sat, Aug 12, 2017, 00:21 Peter Kasting <pkas...@google.com> wrote:
So, to be clear, you need to do the second thing (using char16_t as the base for string16) to allow the first to compile (using u""), right?

To do that you need to be able to access string16 as wstring/wchar_t*.  Adding a wdata() member to do this seems tricky.  string16 is typedefed to wstring on Windows.  I think you'd have to change that typedef (because you can't inject a member function into std::basic_string) and I imagine there will be many more than 700+ places that will break, because we've removed most of the explicit conversions-to-wide we used to have in the Windows code, and now assume the two types are equivalent everywhere.  We don't just pass wstrings as wchar_t*s, we actually pass them as wstrings.

Yeah, I was thinking about this more, and I think an external helper would be better. I think the mechanical changes would actually be pretty easy to automate with a Python script.

Re: passing wstring: we don't usually pass C++ objects outside of Chrome code so I imagine this would only be an issue within Chrome code. Fixing up Chrome to always use string16 internally doesn't seem too bad.


After you change the typedef, you'll need to actually introduce the helper(s).  This means you'll need to define string16 as... I dunno, something that inherits from u16string?  I thought extending STL classes like this via inheritance was generally considered a Bad Idea (see e.g. Effective C++ which tells you not to do it).  I think it probably works better if we do this via an external helper function than by a member.  At that point, I think we've gotten back to the ol' Wide<->UTF16 conversion functions route.  The actual implementation of these might be a reinterpret cast that we assume won't cause strict-aliasing problems (since we turn that off), but we still have to actually have somewhere that implements them.

Just to make sure we're on the same page: UTF16AsWide would accept a const char16_t* or a u16string, but only return a wchar_t* from both overloads, right? Ideally we'd disallow implicit conversion of the result to a wstring as well (which I think should be achievable using some tricks).

(Similarly for WideAsUTF16)

Peter Kasting

Aug 23, 2017, 8:59:03 PM
to Daniel Cheng, Avi Drissman, Dana Jansens, Rachel Blum, Scott Graham, Victor Khimenko, cxx
On Sat, Aug 12, 2017 at 1:25 PM, Daniel Cheng <dch...@chromium.org> wrote:
Re: passing wstring: we don't usually pass C++ objects outside of Chrome code so I imagine this would only be an issue within Chrome code. Fixing up Chrome to always use string16 internally doesn't seem too bad.

I don't know offhand if we pass wstrings as wstrings (not wchar_t*) to Windows APIs anywhere.

Just to make sure we're on the same page: UTF16AsWide would accept a const char16_t* or a u16string, but only return a wchar_t* from both overloads, right? Ideally we'd disallow implicit conversion of the result to a wstring as well (which I think should be achievable using some tricks).

I'd have done pointer->pointer and object->object, but it depends on whether we actually need wstring anywhere (see above).  We can always construct a wstring from a wchar_t* though.

PK

Jan Wilken Dörrie

Dec 4, 2018, 2:25:13 PM
to Peter Kasting, Daniel Cheng, Avi Drissman, Dana Jansens, Rachel Blum, Scott Graham, Victor Khimenko, cxx
Bumping up this thread, since thakis@ and I just talked about this offline.

On Sat, Aug 12, 2017 at 3:21 AM 'Peter Kasting' via cxx <c...@chromium.org> wrote:
...
Basically, I think in the end the plan has to be something like:
* ensure we use wstring/wchar_t as little as possible (hopefully mostly done)
* reintroduce all the WideToUTF16/UTF16ToWide machinery that got removed (maybe the necessary places can be found locally by changing the string16 typedef away from wstring)
* switch string16 to be a typedef to u16string
* convert ASCIIToUTF16("") to u""
* convert the typenames everywhere and remove the typedef

...

This plan SGTM. Do we have an existing tracking bug for the wstring -> string16 change or should I create a new one?


I believe this could be done using a proxy object that implicitly converts to and from const wchar_t*, but does not allow a conversion to wstring, e.g. like so: https://godbolt.org/z/WD8JQZ 
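
A minimal sketch of that shape (assuming Windows, where wchar_t is 16 bits; the linked example may differ in detail):
  // Converts implicitly wherever a const wchar_t* is expected (Win32 calls),
  // while conversion to an owning std::wstring is deliberately deleted.
  class WCharPtr {
   public:
    explicit WCharPtr(const std::u16string& s)
        : ptr_(reinterpret_cast<const wchar_t*>(s.c_str())) {}
    operator const wchar_t*() const { return ptr_; }
    operator std::wstring() const = delete;
   private:
    const wchar_t* ptr_;
  };
  // Usage sketch: ::SetWindowTextW(hwnd, WCharPtr(text));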
 



Peter Kasting

Dec 4, 2018, 2:29:42 PM
to Jan Wilken, Daniel Cheng, Avi Drissman, Dana Jansens, Rachel Blum, Scott Graham, Victor Khimenko, cxx
On Tue, Dec 4, 2018 at 11:25 AM Jan Wilken Dörrie <jdoe...@chromium.org> wrote:
Bumping up this thread, since thakis@ and I just talked about this offline.

This plan SGTM. Do we have an existing tracking bug for the wstring -> string16 change or should I create a new one?

No bug on file that I'm aware of -- this never reached the (informal) "intent to implement" stage; it was just discussion.

I'd be happy to see it pushed forward, if you're interested in driving it.

PK 

Avi Drissman

Dec 4, 2018, 2:31:28 PM
to Peter Kasting, jdoe...@chromium.org, Daniel Cheng, Dana Jansens, Rachel Blum, Scott Graham, Victor Khimenko, cxx
I'm super happy to see this moving forward. Back when we introduced string16, I pushed for the wstring equivalence on Windows, and that seems to have been a mistake. Thank you for fixing it.

Avi

Jan Wilken Dörrie

Dec 6, 2018, 2:17:15 PM
to cxx, pkas...@google.com, dch...@chromium.org, dan...@chromium.org, gr...@chromium.org, sco...@chromium.org, kh...@chromium.org
I played around with typedeffing base::string16 to std::u16string and base::char16 to char16_t in https://crrev.com/c/1361705. Here are my findings:

This change will break conversions between base::char16 and character types that have so far been the same underlying type. All of these conversions can be addressed by adding appropriate reinterpret_casts. Only ICU's UChar is used relatively widely, and we could ease adoption by dropping the corresponding entry from ICU's BUILD.gn file and then rolling the update into Chrome at the same time we switch the typedefs. However, adding the required reinterpret_casts first will allow us to update ICU independently, which might be desirable here. Apart from these casts, switching the types was relatively painless on all non-Windows platforms, which brings me to my next point.
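
Before moving on, the UChar casts would be of this shape (a sketch; str16, ubuf, and len are placeholder names, and UChar is assumed to remain a 16-bit type):
  // Passing a base::string16 (std::u16string) buffer to an ICU API:
  const UChar* icu_ptr = reinterpret_cast<const UChar*>(str16.data());
  // Wrapping an ICU-produced buffer back into a base::string16:
  base::string16 result(reinterpret_cast<const char16_t*>(ubuf), len);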

Windows breaks left, right, and center. This should not be too surprising, as code like base::string16 my_str = L"FooBar"; will no longer compile. Thus we first need to identify all the base::string16 uses that are in Windows-specific code and make them std::wstring (std::wstring usage will likely go up with this change). Also, we likely want to reintroduce some of the machinery we removed in https://crbug.com/23581, such as base::ASCIIToWide and other related string conversion functions. However, this time they should be Windows-exclusive, and could live in the base/win subfolder (or we add a lot of #if defined(OS_WIN) guards). Luckily, transforming Windows code from base::string16 to std::wstring can be done in pieces, as right now they are exactly the same type. Once everything is migrated, we add reinterpret_casts for shared code and flip the typedef switch.

Does this still sound good to you, or do you have concerns with the plan outlined above?

Best,
Jan

Nico Weber

Dec 6, 2018, 2:20:20 PM
to Jan Wilken Dörrie, cxx, Peter Kasting, Daniel Cheng, Dana Jansens, Rachel Blum, Scott Graham, Victor Khimenko
On Thu, Dec 6, 2018 at 2:17 PM Jan Wilken Dörrie <jdoe...@chromium.org> wrote:
I played around with typedeffing base::string16 to std::u16string and base::char16 to char16_t in https://crrev.com/c/1361705. Here are my findings:

Windows breaks left, right, and center. This should not be too surprising, as code like base::string16 my_str = L"FooBar"; will no longer compile. Thus we first need to identify all the base::string16 uses that are in Windows-specific code and make them std::wstring (std::wstring usage will likely go up with this change). Also, we likely want to reintroduce some of the machinery we removed in https://crbug.com/23581, such as base::ASCIIToWide and other related string conversion functions. However, this time they should be Windows-exclusive, and could live in the base/win subfolder (or we add a lot of #if defined(OS_WIN) guards). Luckily, transforming Windows code from base::string16 to std::wstring can be done in pieces, as right now they are exactly the same type. Once everything is migrated, we add reinterpret_casts for shared code and flip the typedef switch.

Moving from string16 back to wstring seems like a pretty big step backwards to me. Maybe this means u16string isn't a good fit for us :-/
 

Avi Drissman

Dec 6, 2018, 2:43:10 PM
to Nico Weber, jdoe...@chromium.org, cxx, Peter Kasting, Daniel Cheng, Dana Jansens, Rachel Blum, Scott Graham, Victor Khimenko
But this is a move from string16 to wstring only in specifically targeted parts of the Windows code that talk to Win32.

We'll get a standard std::u16string everywhere except platform code. Today, we have NSStrings and CFStringRefs in Chromium: not everywhere, but just at the edges to interoperate with the platform. I see std::wstring in the same role: not everywhere, but just at the edges to interoperate with the platform.

A big win is being able to get rid of the weirdness of a custom type: "string16 is maybe our own type and maybe wstring".

Avi

Peter Kasting

Dec 6, 2018, 4:08:13 PM
to Avi Drissman, Nico Weber, Jan Wilken, cxx, Daniel Cheng, Dana Jansens, Rachel Blum, Scott Graham, Victor Khimenko
On Thu, Dec 6, 2018 at 11:43 AM Avi Drissman <a...@chromium.org> wrote:
But this is a move from string16 to wstring only in specifically targeted parts of the Windows code that talk to Win32.

Right.  And it's confusing today that we can use string16 in places but pass it to native APIs that expect a wstring.  You would expect to potentially need a conversion there; not converting makes it misleading as to what the underlying type is.

I see this as a win overall.  What Jan is describing exactly matches what I expected when this was roughly sketched out last time, so it's not surprising.  I support moving forward.

PK

Avi Drissman

Dec 6, 2018, 4:45:13 PM
to Peter Kasting, Nico Weber, jdoe...@chromium.org, cxx, Daniel Cheng, Dana Jansens, Rachel Blum, Scott Graham, Victor Khimenko
To be clear, I see this as a win as well. We get rid of a custom type, we make it explicit where we are using wstring (rather than hiding it behind a typedef), and we use one standard type everywhere.

Daniel Cheng

Dec 6, 2018, 5:10:46 PM
to Nico Weber, Jan Wilken Dörrie, cxx, Peter Kasting, Dana Jansens, Rachel Blum, Scott Graham, Victor Khimenko
On Thu, Dec 6, 2018 at 11:20 AM Nico Weber <tha...@chromium.org> wrote:
Moving from string16 back to wstring seems like a pretty big step backwards to me. Maybe this means u16string isn't a good fit for us :-/

It's not super clear to me why we'd need to use std::wstring. When we previously talked about this, we talked about having some external helper to expose the contents of a u16string as wchar_t* on Windows. Do we really need std::wstring?

Daniel

Jan Wilken Dörrie

Dec 6, 2018, 6:28:11 PM
to Daniel Cheng, Nico Weber, cxx, Peter Kasting, Dana Jansens, Rachel Blum, Scott Graham, Victor Khimenko
I suppose that could be an option as well, yes. We commit to never using std::wstring anymore and just use std::u16string everywhere instead. The moment we interact with native APIs that consume or return wchar_t*s, we do the appropriate reinterpret_casts from or to char16_t*. This only fails if there are some APIs that take an actual std::wstring instead of a wchar_t*. Do these APIs exist? If not, good; if so, we should allow limited usage of std::wstring there. WDYT?
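
Concretely, a call site would then look something like this sketch (reusing the SetWindowTextW example from earlier in the thread):
  std::u16string text = LoadResource(...);
  ::SetWindowTextW(hwnd, reinterpret_cast<const wchar_t*>(text.c_str()));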

Jan Wilken Dörrie

Dec 11, 2018, 3:45:40 PM
to Daniel Cheng, Nico Weber, cxx, Peter Kasting, Dana Jansens, Rachel Blum, Scott Graham, Victor Khimenko
To add some concrete examples: I am currently running into compilation errors that involve abstractions around OS primitives, such as base::CommandLine, base::Environment, and base::FilePath. Each of these contains typedefs for native string types [1, 2, 3] that differ between Windows and non-Windows platforms. We will need to decide what these types are going to be going forward, and update code accordingly. Note that given this is platform-dependent code, the choice of string type here only matters on Windows. Out of the two options of std::wstring and std::u16string I lean toward std::wstring, as it is the de facto native string type on Windows, and IMO it makes sense to use it for these OS-specific abstractions.

On the other hand, using std::wstring will require additional overloads for string manipulation functions such as TrimWhitespace. However, as I already said, these will be Windows-specific, and thus could live in the base/win subfolder, making them unavailable to other platforms. What are your thoughts on this? Should we continue using std::wstring for native Windows strings, or should we switch all of them to std::u16string, implying we switch most string literals from L"Foo" to u"Foo"?

Cheers,
Jan

Peter Kasting

Dec 11, 2018, 3:52:23 PM
to Jan Wilken, Daniel Cheng, Nico Weber, cxx, Dana Jansens, Rachel Blum, Scott Graham, Victor Khimenko
On Tue, Dec 11, 2018 at 12:45 PM Jan Wilken Dörrie <jdoe...@chromium.org> wrote:
To add some concrete examples: I am currently running into compilation errors that involve abstractions around OS primitives, such as base::CommandLine, base::Environment, and base::FilePath. Each of these contains typedefs for native string types [1, 2, 3] that differ between Windows and non-Windows platforms. We will need to decide what these types are going to be going forward, and update code accordingly. Note that given this is platform-dependent code, the choice of string type here only matters on Windows. Out of the two options of std::wstring and std::u16string I lean toward std::wstring, as it is the de facto native string type on Windows, and IMO it makes sense to use it for these OS-specific abstractions.

On the other hand, using std::wstring will require additional overloads for string manipulation functions such as TrimWhitespace. However, as I already said, these will be Windows-specific, and thus could live in the base/win subfolder, making them unavailable to other platforms. What are your thoughts on this? Should we continue using std::wstring for native Windows strings, or should we switch all of them to std::u16string, implying we switch most string literals from L"Foo" to u"Foo"?

We could avoid the additional overloads if we want to convert across when calling utility functions, but that might be ugly.

It's a little tricky to know what to do in these cases without looking closely at how frequently the underlying code actually needs to interface with native APIs vs. Chrome APIs.  Given that the typedefs here already exist and we already have conversion functions (e.g. to construct a FilePath from a narrow string), my inclination matches yours -- use wstring for these types.  But perhaps u16string could make sense, with inserted conversions at the native boundaries.  Would have to try it to see.
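
For instance, a boundary conversion for FilePath might look like this sketch (GetSomePath is a placeholder; WideToUTF16 is part of the conversion machinery discussed above):
  base::FilePath path = GetSomePath();
  // On Windows, value() would stay wide; convert at the Chrome-facing edge:
  std::u16string pretty = base::WideToUTF16(path.value());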

PK

Jan Wilken Dörrie

Dec 18, 2018, 12:27:34 PM
to Peter Kasting, Daniel Cheng, Nico Weber, cxx, Dana Jansens, Rachel Blum, Scott Graham, Victor Khimenko
Coming back to Daniel's initial point in the OP: what do y'all think about introducing a temporary STRING16_LITERAL macro to string16.h? This would already allow us to get rid of most ASCIIToUTF16 usages and would aid the transition to std::u16string. Once base::string16 is std::u16string on all platforms, we can remove the macro and simply use u"" char16_t literals.

A possible definition of the macro could look like this: #define STRING16_LITERAL(x) bit_cast<const base::char16*>(u##x). This should work, given that all involved character types (wchar_t / uint16_t and char16_t) are UTF16.

Let me know what you think.

Cheers,
Jan 

Jan Wilken Dörrie

Jan 12, 2021, 6:29:34 AM
to cxx, Jan Wilken Dörrie, Daniel Cheng, Nico Weber, cxx, danakj, Rachel Blum, Scott Graham, Victor Khimenko, Peter Kasting
Apologies for digging up this old thread, but I wanted to share an update:

As of https://crrev.com/841439, base::string16 is std::u16string except where wchar_t is UTF16 (most notably Windows). While I still plan on converting Windows's base::string16 as well, you are finally able to construct string literals of type const base::char16[], replacing most needs for base::ASCIIToUTF16. For now you need the STRING16_LITERAL macro to do this, but eventually you'll be able to just use the u"..." string literal:

Today:
constexpr const base::char16 kFoo[] = STRING16_LITERAL("foo");
constexpr base::StringPiece16 kBar = STRING16_LITERAL("bar");

(Hopefully) soon:
constexpr const base::char16 kFoo[] = u"foo";
constexpr base::StringPiece16 kBar = u"bar";

Best regards,
Jan
