Reminder: wstrings are deprecated - use string16 instead


Jeremy Moskovich

Feb 24, 2010, 12:44:08 PM
to Chromium-dev
(Sending this out again since it seems people are still introducing new wstrings).

To quote the original mail on the subject:

[A bunch of the team met up today to hammer out some decisions.] 
In brief: for strings that are known to be Unicode (that is, not 
random byte strings read from a file), we will migrate towards using 
string16.  This means all places we use wstring should be split into 
the appropriate types: 
 - byte strings should be string or vectors of chars 
 - paths should be FilePath 
 - urls should be GURL 
 - UI strings, etc. should be string16. 

string16 uses UTF-16 underneath.  It's equivalent to wstring on 
Windows, but wstring involves 4-byte characters on Linux/Mac. 
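
(For reference, a rough sketch of the shape of that type -- the real definition lives in base/string16.h, and the names below are approximate:)

#include <string>

#if defined(WCHAR_T_IS_UTF16)
// Windows: wchar_t is already 2 bytes, so string16 can simply be wstring.
typedef wchar_t char16;
typedef std::wstring string16;
#else
// Linux/Mac: wchar_t is 4 bytes, so use a separate 16-bit character type.
// (In practice the real header supplies custom char_traits for it.)
typedef unsigned short char16;
typedef std::basic_string<char16> string16;
#endif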

Some important factors were: 
- we don't have too many strings in this category (with the huge 
exception of WebKit), so memory usage isn't much of an issue 
- it's the native string type of Windows, Mac, and WebKit 
- we want it to be explicit (i.e. a compile error) when you 
accidentally use a byte string in a place where we should know the 
encoding (which std::string and UTF-8 don't allow) 
- we still use UTF-8 in some places (like the history full-text 
database) where space is more of a concern 

http://crbug.com/23581 is on file to track this.

Thanks,
Avi, tvl & Jeremy

Brett Wilson

Feb 24, 2010, 1:16:31 PM
to jer...@chromium.org, Chromium-dev

The question that always comes up as a result is "what to do about literals?"

Use ASCIIToUTF16("foo"), assuming of course your literal is ASCII.
This is basically the same speed as constructing a wstring out of a
wchar_t* literal, since it's just a copy with a zero-extend on each character.
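
For example (a minimal sketch; the conversion helpers live in base, in utf_string_conversions.h if memory serves):

#include "base/string16.h"
#include "base/utf_string_conversions.h"

// Instead of:  std::wstring title = L"Downloads";
string16 title = ASCIIToUTF16("Downloads");

// Non-ASCII text needs a real conversion, e.g. from a UTF-8 literal:
string16 euro = UTF8ToUTF16("\xE2\x82\xAC");  // U+20AC EURO SIGN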

Brett

George Yakovlev

Feb 24, 2010, 1:46:15 PM
to bre...@chromium.org, jer...@chromium.org, Chromium-dev
For Windows it is inefficient, both speed-wise and memory-wise, as
there are many places where we need wchar_t literals (all of the API
calls, for example), so creating an extra std::(w)string is wasteful.
Is there a way to create a macro that declares a UTF-16 literal on gcc?
AFAIK, the u16 prefix is not supported yet.
George.


Victor Khimenko

Feb 24, 2010, 1:50:33 PM
to geo...@google.com, bre...@chromium.org, jer...@chromium.org, Chromium-dev
"u16" prefix is actually "u" prefix (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2442.htm) and it's supported starting from GCC 4.5... For GCC before 4.5 (i.e.: all released versions of GCC) there are no such way.

Peter Kasting

Feb 24, 2010, 1:53:30 PM
to geo...@google.com, bre...@chromium.org, jer...@chromium.org, Chromium-dev
On Wed, Feb 24, 2010 at 10:46 AM, George Yakovlev <geo...@google.com> wrote:
For Windows it is inefficient, both speed-wise and memory-wise, as
there are many places where we need wchar_t literals (all of the API
calls, for example), so creating an extra std::(w)string is wasteful.

The rule is actually more nuanced: wstring is supposed to be gone in all cross-platform code, where it's not really possible to use wchar_t* anyway.  If you have Windows-only code, then within that scope you can use wchar_t* instead if it truly makes sense.

But more importantly, I've never seen a case in our code where this "inefficiency" matters.  Constructing a string16 (which is typedefed to wstring on Windows) is incredibly cheap and the resulting object does not use significantly more memory than the raw literal.  A primary rule when coding is "don't optimize the wrong things".  If you can avoid wchar_t* and wstring in favor of char*, string, and string16, please do so.  If you think you have a perf-critical case, we can consider things on a one-off basis, but in general you don't need to care about this.

PK

George Yakovlev

Feb 24, 2010, 1:56:34 PM
to Peter Kasting, bre...@chromium.org, jer...@chromium.org, Chromium-dev
But why not have it if we can (as Victor pointed out)?
A little waste is still waste. It adds up :).

G.

Scott Hess

Feb 24, 2010, 2:04:48 PM
to geo...@google.com, Peter Kasting, bre...@chromium.org, jer...@chromium.org, Chromium-dev
Having code that is incorrect because wstring is not the same on all
platforms adds up even faster.

-scott

Peter Kasting

Feb 24, 2010, 2:05:29 PM
to George Yakovlev, bre...@chromium.org, jer...@chromium.org, Chromium-dev
On Wed, Feb 24, 2010 at 10:56 AM, George Yakovlev <geo...@google.com> wrote:
But why not have it if we can (as Victor pointed out)?
A little waste is still waste. It adds up :).

No, it doesn't add up.  You're the second developer I've seen recently with this misconception.  When you're not in perf-critical code, losing a few cycles simply _does not matter_.  Losing 4 bytes of memory on a handful of objects _does not matter_.  Short, readable, consistent code is not just "more important", it's the only thing that matters at all in this case.

C++ strings like string and string16 simply are not significantly more heavyweight to construct and get literals out of compared to keeping the raw literal everywhere, and they're safer, simpler and clearer to use.  Besides, as I said before, if you're touching cross-platform functions with any of your code surface, you're not going to be able to use raw literals everywhere anyway, as we don't have a type for that -- wchar_t* certainly isn't it.

PK

George Yakovlev

Feb 24, 2010, 2:07:12 PM
to Scott Hess, Peter Kasting, bre...@chromium.org, jer...@chromium.org, Chromium-dev
Yes, but what I am saying is that if we have a STRING16() macro for
literals, it will be a string16 on all platforms, thus adding clarity
and reducing both bugs and inefficiency!
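
Something along these lines, say (purely a sketch -- nothing like this exists in the tree, and the WCHAR_T_IS_UTF16 define and the cast are assumptions on my part):

#if defined(WCHAR_T_IS_UTF16)
// Windows: a wchar_t literal is already UTF-16.
#define STRING16(x) string16(L##x)
#else
// Needs GCC >= 4.5 with -std=c++0x; the char16_t literal has to be
// reinterpreted as Chromium's 16-bit char16 type.
#define STRING16(x) string16(reinterpret_cast<const char16*>(u##x))
#endif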
G.

Peter Kasting

Feb 24, 2010, 2:12:08 PM
to George Yakovlev, Scott Hess, bre...@chromium.org, jer...@chromium.org, Chromium-dev
On Wed, Feb 24, 2010 at 11:07 AM, George Yakovlev <geo...@google.com> wrote:
Yes, but what I am saying is that if we have a STRING16() macro for
literals, it will be a string16 on all platforms, thus adding clarity
and reducing both bugs and inefficiency!

I don't see why ASCIIToUTF16("foo") is any less clear or any more bug-prone than STRING16("foo").

And as I've said multiple times now, I've never seen a case in our code where the extra cost you pay in constructing the object from the literal here matters in the slightest.  We don't use literals very often outside of testing, since most strings come from WebKit or the localized resource files.

PK

George Yakovlev

Feb 24, 2010, 2:12:42 PM
to Peter Kasting, bre...@chromium.org, jer...@chromium.org, Chromium-dev
Again, let me repeat myself: I am *not* for using wchar_t literals
everywhere. I am for UTF-16 literals, which happen to be efficient on
Windows. And code will be more consistent and *clear* if we have them
instead of wchar_t. We need literals not only in UI-related code.
G.

Jeremy Moskovich

Feb 24, 2010, 2:13:03 PM
to George Yakovlev, Scott Hess, Peter Kasting, bre...@chromium.org, Chromium-dev
Please stop.

We've had this discussion before on at least two occasions; people have argued for and against the conversion to string16.  The performance argument has been made and other string types have been suggested.

I suggest you review the original thread, and barring any new performance data, let's stick with our previous decision.

Best regards,
Jeremy

George Yakovlev

Feb 24, 2010, 2:14:58 PM
to Peter Kasting, Scott Hess, bre...@chromium.org, jer...@chromium.org, Chromium-dev
Static array of literals. In your case we are *forced* to use
std::*string, with a destructor being called on program exit.
In my case it is just static POD.
G.

Peter Kasting

Feb 24, 2010, 2:19:03 PM
to George Yakovlev, Scott Hess, bre...@chromium.org, jer...@chromium.org, Chromium-dev
On Wed, Feb 24, 2010 at 11:14 AM, George Yakovlev <geo...@google.com> wrote:
Static array of literals. In your case we are *forced* to use
std::*string, with a destructor being called on program exit.
In my case it is just static POD.

What?

const char* foo[] = {
  "abc",
  "def",
  "ghi",
};

void bar() {
  // foo is a plain POD array, so no static constructors or destructors.
  for (size_t i = 0; i < arraysize(foo); ++i) {
    doSomethingWith(foo[i]);
    doSomethingElseWith(ASCIIToUTF16(foo[i]));
  }
}

Using a static array of objects is banned by the Google style guide.  I am not proposing changing that, and unless I'm missing something, the code above demonstrates why it's not hard to be compliant without needing a STRING16() macro.

PK

Jeremy Orlow

Feb 24, 2010, 2:22:52 PM
to pkas...@google.com, George Yakovlev, Scott Hess, bre...@chromium.org, jer...@chromium.org, Chromium-dev
Please listen to Jeremy?  (The other one...  :-)
