On Tue, Sep 13, 2016 at 1:18 PM, Peter Kasting <pkas...@google.com> wrote:the question is what the solution is.All Windows version we support use UTF16 (instead of UCS2), so we don't need UTF16ToWide<>, a reinterpret_cast will suffice. I'd lean towards automated code rewriting instead of adding an implicit cast, though. (That's what I meant by 'rewrite tools', and that was supposed to address the second part :)Also: Even if we inject in the std:: namespace, I don't think there's a way to make it work implicitly - assignments, cast operators, and converting constructors cannot be freestanding.The question is, do we want to invest that time, or is u"" such a minor gain, we don't care right now?
On Fri, Aug 11, 2017 at 2:20 AM, Daniel Cheng <dch...@chromium.org> wrote:
> Bumping a rather old thread, since it's sad to see ASCIIToUTF16("my
> literal") written out*.
>
> It looks like one of the main issues with using u"..." is that char16_t and
> wchar_t aren't implicitly convertible (boo).
>
> Given that std::wstring / wchar_t is (almost?) always Windows specific, how
> about just using u"..." for places that use ASCIIToUTF16("...") today?
> Windows can use L"..." where a wide string is required.
>
> The other big issue is base::string16, but perhaps we could just use
> char16_t as the character type for base::string16, and provide a
> Windows-only helper to reinterpret cast the string to a wchar_t?
Note, of course, that reinterpret_cast<wchar_t>(my_char16_t) yields
undefined behavior when you proceed to use the buffer.
MSVC and/or
clang-on-windows could go beyond the standard to give us special
permission to do that. Do you know if they have?
Jeffrey
--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.
To post to this group, send email to c...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/cxx/CANh-dXk9fAUKR-XPZPBQfRnA2v3TO%2BG52Wi_D8ioCfpFtowa4Q%40mail.gmail.com.
I personally think we should work on migrating base::string16 to std::u16string as a longer-term goal, and I see some work like that as a step towards it.On Fri, Aug 11, 2017 at 5:20 AM, Daniel Cheng <dch...@chromium.org> wrote:Given that std::wstring / wchar_t is (almost?) always Windows specific, how about just using u"..." for places that use ASCIIToUTF16("...") today? Windows can use L"..." where a wide string is required.The other big issue is base::string16, but perhaps we could just use char16_t as the character type for base::string16, and provide a Windows-only helper to reinterpret cast the string to a wchar_t? Today, we would write:base::string16 text = LoadResource(...);::SetWindowTextW(hwnd, text.data());And instead we would now write:base::string16 text = LoadResource(...);::SetWindowTextW(hwnd, text.wdata());Would this be sufficient to make life less painful?
On Fri, Aug 11, 2017 at 11:38 AM, Avi Drissman <a...@chromium.org> wrote:I personally think we should work on migrating base::string16 to std::u16string as a longer-term goal, and I see some work like that as a step towards it.On Fri, Aug 11, 2017 at 5:20 AM, Daniel Cheng <dch...@chromium.org> wrote:Given that std::wstring / wchar_t is (almost?) always Windows specific, how about just using u"..." for places that use ASCIIToUTF16("...") today? Windows can use L"..." where a wide string is required.The other big issue is base::string16, but perhaps we could just use char16_t as the character type for base::string16, and provide a Windows-only helper to reinterpret cast the string to a wchar_t? Today, we would write:base::string16 text = LoadResource(...);::SetWindowTextW(hwnd, text.data());And instead we would now write:base::string16 text = LoadResource(...);::SetWindowTextW(hwnd, text.wdata());Would this be sufficient to make life less painful?So, to be clear, you need to do the second thing (using char16_t as the base for string16) to allow the first to compile (using u""), right?To do that you need to be able to access string16 as wstring/wchar_t*. Adding a wdata() member to do this seems tricky. string16 is typedefed to wstring on Windows. I think you'd have to change that typedef (because you can't inject a member function into std::basic_string) and I imagine there will be many more than 700+ places that will break, because we've removed most of the explicit conversions-to-wide we used to have in the Windows code, and now assume the two types are equivalent everywhere. We don't just pass wstrings as wchar_t*s, we actually pass them as wstrings.
After you change the typedef, you'll need to actually introduce the helper(s). This means you'll need to define string16 as... I dunno, something that inherits from u16string? I thought extending STL classes like this via inheritance was generally considered a Bad Idea (see e.g. Effective C++ which tells you not to do it). I think it probably works better if we do this via an external helper function than by a member. At that point, I think we've gotten back to the ol' Wide<->UTF16 conversion functions route. The actual implementation of these might be a reinterpret cast that we assume won't cause strict-aliasing problems (since we turn that off), but we still have to actually have somewhere that implements them.
Re: passing wstring: we don't usually pass C++ objects outside of Chrome code so I imagine this would only be an issue within Chrome code. Fixing up Chrome to always use string16 internally doesn't seem too bad.
After you change the typedef, you'll need to actually introduce the helper(s). This means you'll need to define string16 as... I dunno, something that inherits from u16string? I thought extending STL classes like this via inheritance was generally considered a Bad Idea (see e.g. Effective C++ which tells you not to do it). I think it probably works better if we do this via an external helper function than by a member. At that point, I think we've gotten back to the ol' Wide<->UTF16 conversion functions route. The actual implementation of these might be a reinterpret cast that we assume won't cause strict-aliasing problems (since we turn that off), but we still have to actually have somewhere that implements them.Just to make sure we're on the same page, the UTF16AsWide would accept a const char16_t* or ustring16, but only return a wchar_t* from both overloads, right? Ideally we'd disallow implicit conversion of the result to a wstring as well (which I think should be achievable using some tricks).
Basically, I think in the end the plan has to be something like:* ensure we use wstring/wchar_t as little as possible (hopefully mostly done)* reintroduce all the WideToUTF16/UTF16ToWide machinery that got removed (maybe the necessary places can be found locally by changing the string16 typedef away from wstring)* switch string16 to be a typedef to u16string* convert ASCIIToUTF16("") to u""* convert the typenames everywhere and remove the typedef
...
PK
--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.
To post to this group, send email to c...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/cxx/CAAHOzFBGpn8D3QH-z5MfYVPR%2BFm3RFHkX7Ke15QO%3D6yLs%2BFXCg%40mail.gmail.com.
Jan Wilken Dörrie
Software Engineer
jdoe...@google.com
+49 89 839300973
Google Germany GmbH
Erika-Mann-Straße 33
80636 München
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Diese E-Mail ist vertraulich. Falls sie diese fälschlicherweise erhalten haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter, löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen, dass die E-Mail an die falsche Person gesendet wurde.
This e-mail is confidential. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it has gone to the wrong person.
Bumping up this thread, since thakis@ and I just talked about this offline.On Sat, Aug 12, 2017 at 3:21 AM 'Peter Kasting' via cxx <c...@chromium.org> wrote:...Basically, I think in the end the plan has to be something like:* ensure we use wstring/wchar_t as little as possible (hopefully mostly done)* reintroduce all the WideToUTF16/UTF16ToWide machinery that got removed (maybe the necessary places can be found locally by changing the string16 typedef away from wstring)* switch string16 to be a typedef to u16string* convert ASCIIToUTF16("") to u""* convert the typenames everywhere and remove the typedef...This plan SGTM. Do we have an existing tracking bug for the wstring -> string16 change or should I create a new one?
I played around with typedeffing base::string16 to std::u16string and base::char16 to char16_t in https://crrev.com/c/1361705. Here are my findings:This change will break conversion between base::char16 and character types that so far have been the same. These are:All of these conversions can be addressed with adding appropriate reinterpret_casts. Only ICU's UChar is used relatively widely, and we could ease adoption by dropping the corresponding entry from ICU's BUILD.gn file. We then roll the update into Chrome at the same time we switch the typedefs. However, adding the required reinterpret_casts first will allow us to update ICU independently, which might be desirable here. Apart from these casts switching the types was relatively painless on all non-Windows platforms, which brings me to my next point.Windows breaks left, right and center. This should not be too surprising, as code like base::string16 my_str = L"FooBar"; will not compile anymore. Thus we first need to identify all base::string16 that are in Windows specific code and make them std::wstring (std::wstring usage will likely go up with this change). Also, we likely want to reintroduce some of the machinery we removed in https://crbug.com/23581, such as base::ASCIIToWide and other related string conversion functions. However, this time they should be Windows exclusive, and could live in the base/win subfolder (or we add a lot of #if defined(OS_WIN) guards). Luckily, transforming Windows code from base::string16 to std::wstring can be done in pieces, as right now they are the exact same type. Once everything is migrated, we add reinterpret_casts for shared code and flip the typedef switch.
--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.
To post to this group, send email to c...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/cxx/CALB5StadYfxYcLgY4OqKf_5U00AZ8R%3DMBO11Cs2Qomm7DrZ4sg%40mail.gmail.com.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/cxx/CAMGbLiFAkMDhEQFybbsKNPbZLqPkii_N-1nTnOH8seXN6U9OpA%40mail.gmail.com.
But this is a move from string16 to wstring only in specifically targeted parts of the Windows code that talk to Win32.
On Thu, Dec 6, 2018 at 2:17 PM Jan Wilken Dörrie <jdoe...@chromium.org> wrote:I played around with typedeffing base::string16 to std::u16string and base::char16 to char16_t in https://crrev.com/c/1361705. Here are my findings:This change will break conversion between base::char16 and character types that so far have been the same. These are:All of these conversions can be addressed with adding appropriate reinterpret_casts. Only ICU's UChar is used relatively widely, and we could ease adoption by dropping the corresponding entry from ICU's BUILD.gn file. We then roll the update into Chrome at the same time we switch the typedefs. However, adding the required reinterpret_casts first will allow us to update ICU independently, which might be desirable here. Apart from these casts switching the types was relatively painless on all non-Windows platforms, which brings me to my next point.Windows breaks left, right and center. This should not be too surprising, as code like base::string16 my_str = L"FooBar"; will not compile anymore. Thus we first need to identify all base::string16 that are in Windows specific code and make them std::wstring (std::wstring usage will likely go up with this change). Also, we likely want to reintroduce some of the machinery we removed in https://crbug.com/23581, such as base::ASCIIToWide and other related string conversion functions. However, this time they should be Windows exclusive, and could live in the base/win subfolder (or we add a lot of #if defined(OS_WIN) guards). Luckily, transforming Windows code from base::string16 to std::wstring can be done in pieces, as right now they are the exact same type. Once everything is migrated, we add reinterpret_casts for shared code and flip the typedef switch.Moving from string16 back to wstring seems like a pretty big step backwards to me. Maybe this means u16string isn't a good fit for us :-/
To add some concrete examples: I am currently running into compilation errors that involve abstractions around OS primitives, such as base::CommandLine, base::Environment and base::FilePath. Each of these contain typedefs for native string types [1, 2, 3], that differ between Windows and non-Windows platforms. We will need to decide what these types are going to be going forward, and update code accordingly. Note that given this is platform dependent code, the choice of the string type for these should be limited to the Windows platform. Out of the two options of std::wstring and std::u16string I lean toward std::wstring, as this is de-facto native string type for Windows, and IMO it makes sense to use it for these OS-specific abstractions.On the other hand, using std::wstring will require additional overloads for string manipulation functions, such as TrimWhitespace. However, as I already said, these will be Windows specific, and thus could live in the base/win subfolder, making them unavailable for other platforms. What are your thoughts on this? Should we continue using std::wstring for native Windows strings or should we switch all of them to std::u16string, implying we switch most string literals from L"Foo" to u"Foo"?