Overloading std::to_string() with std::string and std::wstring.


turbi...@gmail.com

May 12, 2014, 2:57:18 AM
to std-pr...@isocpp.org
You may be wondering why anyone would want to convert a string into a string. Long story short: generic type handling.

Currently, the ways to handle this are any of the following, as suitable:
  • Create a template function that handles the conversion of all types through std::to_string, then overload it for strings to just return the string itself. Call this function instead of std::to_string().
  • Specialise template class methods by overloading different types with suitable methods.
  • Put the characters through a std::stringstream, which is a fair bit slower.
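
The first workaround in the list above can be sketched roughly as follows. This is only an illustration: the namespace `sketch` and the names `stringify` and `join_with_equals` are made up for this example and are not part of any proposal.

```cpp
#include <string>

namespace sketch {

// Generic case: defer to std::to_string for the arithmetic types it supports.
template <typename T>
std::string stringify(const T& value) {
    return std::to_string(value);
}

// Strings are already strings, so this overload just returns a copy.
inline std::string stringify(const std::string& value) {
    return value;
}

// Generic code can now call stringify() without caring about the type.
template <typename A, typename B>
std::string join_with_equals(const A& key, const B& value) {
    return stringify(key) + "=" + stringify(value);
}

}  // namespace sketch
```

With a std::string overload of std::to_string itself, the wrapper would be unnecessary and `join_with_equals` could call std::to_string directly.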
Benefits:
  • The conversion from std::wstring and vice versa is handled transparently.
  • std::to_string can be used in generic type conversions, instead of creating yet another implementation of something so mundane.
I hope you have a good day.

Thiago Macieira

May 12, 2014, 3:08:42 AM
to std-pr...@isocpp.org
On Sunday, 11 May 2014, at 23:57:18, turbi...@gmail.com wrote:
> - The conversion from std::wstring and vice versa is handled transparently.

That will be most welcome. Using mbstowcs is quite annoying.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358

turbi...@gmail.com

May 12, 2014, 4:38:15 AM
to std-pr...@isocpp.org
On Monday, 12 May 2014 19:08:42 UTC+12, Thiago Macieira wrote:
> That will be most welcome. Using mbstowcs is quite annoying.

There's std::wstring_convert available; you may want to use that rather than the C function mbstowcs. It does require C++11, though.
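
For readers who have not used it, a minimal sketch of std::wstring_convert might look like this. It assumes the narrow string holds UTF-8, which the thread does not state; the function names are invented for the example. Note that from_bytes and to_bytes throw std::range_error on malformed input, and that the facility was deprecated in C++17.

```cpp
#include <codecvt>
#include <locale>
#include <string>

namespace sketch {

// Assumption: the narrow string is UTF-8 encoded.
inline std::wstring widen_utf8(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.from_bytes(utf8);  // throws std::range_error on bad input
}

inline std::string narrow_utf8(const std::wstring& wide) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.to_bytes(wide);  // throws std::range_error on bad input
}

}  // namespace sketch
```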

Jim Porter

May 12, 2014, 6:45:07 PM
to std-pr...@isocpp.org
On Monday, May 12, 2014 1:57:18 AM UTC-5, Chris Ridgway wrote:
> You may be wondering why anyone would want to convert a string into a string. Long story short: generic type handling.

I've done similar things with homegrown stringification functions, and I agree that this is pretty useful.

I'm a bit less convinced of the use of converting std::wstring to std::string via std::to_string(), but that's more because I think wstring is a bit of a mess. Fixing that would probably be a long and difficult process, though.

- Jim

David Krauss

May 12, 2014, 8:01:40 PM
to std-pr...@isocpp.org

On 2014–05–13, at 6:45 AM, Jim Porter <jvp...@g.rit.edu> wrote:

> I'm a bit less convinced of the use of converting std::wstring to std::string via std::to_string(), but that's more because I think wstring is a bit of a mess. Fixing that would probably be a long and difficult process, though.

As someone recently noted here, to_string doesn’t respect localization. It’s better suited to serialization than user or OS interfaces.

Although Unicode and UTF-8 are fairly ubiquitous regardless of locale, the specificity vs. genericity of wstring_convert is probably best.

Chris Ridgway

May 12, 2014, 8:02:47 PM
to std-pr...@isocpp.org
On Tuesday, 13 May 2014 10:45:07 UTC+12, Jim Porter wrote:
> I'm a bit less convinced of the use of converting std::wstring to std::string via std::to_string(), but that's more because I think wstring is a bit of a mess. Fixing that would probably be a long and difficult process, though.

Don't think of it as converting a data type to just a string; it's std::basic_string<char> after all, and the same approach could be used to convert std::string into std::wstring. It'd certainly be a logical step, although the character encoding support is quite a mess, as you say.

Jim Porter

May 12, 2014, 11:24:38 PM
to std-pr...@isocpp.org
On Monday, May 12, 2014 7:01:40 PM UTC-5, David Krauss wrote:

> On 2014–05–13, at 6:45 AM, Jim Porter <jvp...@g.rit.edu> wrote:
>
> > I'm a bit less convinced of the use of converting std::wstring to std::string via std::to_string(), but that's more because I think wstring is a bit of a mess. Fixing that would probably be a long and difficult process, though.
>
> As someone recently noted here, to_string doesn’t respect localization. It’s better suited to serialization than user or OS interfaces.

Localization (e.g. the punctuation you'd use for a floating-point number) is one thing; encoding is another. If it were up to me, std::basic_string would be an array of code units (e.g. bytes), and there'd be an std::text class that handles encodings and represents a sequence of code points. That's certainly beyond the scope of something like this, so I wouldn't argue too hard against an std::to_string(std::wstring) overload; it's just not something I particularly care about.
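
To make the code unit versus code point distinction concrete, here is a small sketch. It assumes the std::string holds well-formed UTF-8, which std::string itself never guarantees, and the function names are invented for the example.

```cpp
#include <cstddef>
#include <string>

namespace sketch {

// Number of code units: for std::string this is simply the number of bytes.
inline std::size_t code_unit_count(const std::string& utf8) {
    return utf8.size();
}

// Number of code points, assuming the input is well-formed UTF-8:
// count every byte that is not a continuation byte (bit pattern 10xxxxxx).
inline std::size_t code_point_count(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

}  // namespace sketch
```

For the two-character text "h" followed by U+00E9, encoded as the bytes 68 C3 A9, the first function reports 3 and the second reports 2.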

One of these days, I really want to make a proof-of-concept that starts over from scratch on I/O and character encodings for C++, although that might never be standardizable due to the entrenchment of std::string and I/O streams. I've been tinkering with a more printf-like way of formatting output, but that's only a small part of the problem.

- Jim

Thiago Macieira

May 12, 2014, 11:54:10 PM
to std-pr...@isocpp.org
On Monday, 12 May 2014, at 20:24:38, Jim Porter wrote:
> Localization (e.g. the punctuation you'd use for a floating-point number)
> is one thing; encoding is another. If it were up to me, std::basic_string
> would be an array of code units (e.g. bytes), and there'd be an std::text
> class that handles encodings and represents a sequence of code points.
> That's certainly beyond the scope of something like this, so I wouldn't
> argue too hard against an std::to_string(std::wstring) overload; it's just
> not something I particularly care about.
>
> One of these days, I really want to make a proof-of-concept that starts
> over from scratch on I/O and character encodings for C++, although that
> might never be standardizable due to the entrenchment of std::string and
> I/O streams. I've been tinkering with a more printf-like way of formatting
> output, but that's only a small part of the problem.

You've described QByteArray and QString: one is an array of bytes, the other
contains human text, in a sequence of code points, and can transform them
according to the Unicode rules. There are two choices for storing text: either
keep the encoding along with the object or convert everything to a single,
known encoding. QString chooses the latter.

Anyway, I agree with you that std::string is pretty much entrenched as an
array of arbitrary bytes, of any encoding. Attempting to do text
transformations with it might lead to trouble, or at least poorly-written code
(like QByteArray's toLower function: it's documented to behave strangely for
non-ASCII bytes).

However, std::u16string, std::u32string and std::wstring aren't affected.

Daniel Krügler

May 13, 2014, 1:35:07 AM
to std-pr...@isocpp.org
2014-05-13 2:01 GMT+02:00 David Krauss <pot...@gmail.com>:
> On 2014-05-13, at 6:45 AM, Jim Porter <jvp...@g.rit.edu> wrote:
>
> As someone recently noted here, to_string doesn't respect localization. It's better suited to serialization than user or OS interfaces.
>

Unfortunately, to_string still has a locale dependency: not on
grouping symbols, but it is not guaranteed to be "pure". It is still
affected by locale settings that influence the decimal separator, for
example.
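
A rough way to observe this is sketched below. The helper names are invented for the example, and the locale name "de_DE.UTF-8" mentioned afterwards is only an assumption about what a given system has installed.

```cpp
#include <clocale>
#include <string>

namespace sketch {

// Returns the character std::to_string places right after the leading "1"
// of 1.5, i.e. the decimal separator currently in effect.
inline char to_string_decimal_separator() {
    const std::string text = std::to_string(1.5);
    return text.at(1);
}

// Tries to switch LC_NUMERIC to the named locale; returns false if the
// platform does not provide that locale.
inline bool switch_numeric_locale(const char* name) {
    return std::setlocale(LC_NUMERIC, name) != nullptr;
}

}  // namespace sketch
```

In the "C" locale the separator is '.'. If `switch_numeric_locale("de_DE.UTF-8")` succeeds on a system with the sprintf-based behaviour discussed here, the separator would typically become ','.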

The to_(w)string functions were provided just as convenience functions;
they were not intended as the basis for a family of functions found
via ADL.

If we want such a family, I would strongly encourage determining what
the intent is here:

a) Unaffected by any localization/global effects (except memory allocation)
-> This is an important property that I would like to see somewhere. We
cannot use to_string for this, because it doesn't hold for the
built-in types.

b) ?

Java's toString implementations of the "primitive" types and their
corresponding Object reference types guarantee (a), which I consider
a very reasonable strategy.

- Daniel

Jim Porter

May 13, 2014, 1:51:01 AM
to std-pr...@isocpp.org
(Google Groups ate my draft, so I'll keep this one brief...)


On Monday, May 12, 2014 10:54:10 PM UTC-5, Thiago Macieira wrote:
> Anyway, I agree with you that std::string is pretty much entrenched as an
> array of arbitrary bytes, of any encoding. Attempting to do text
> transformations with it might lead to trouble, or at least poorly-written code
> (like QByteArray's toLower function: it's documented to behave strangely for
> non-ASCII bytes).
>
> However, std::u16string, std::u32string and std::wstring aren't affected.

I'd agree for std::u(16|32)string, but with std::wstring, you're not even guaranteed a particular character width. While you might have some reasonable expectations of the encoding for std::wstring on a given platform, it's difficult to write cross-platform code with it unless you rely on platform-specific code to pick the right encoding for each platform. Especially given the lack of a defined encoding for std::string, I'd be worried about converting a std::wstring to a std::string unless the process was "ignore all non-ASCII characters".

That said, this is getting a bit off-topic from the original proposal, which I think is overall a good idea. I'd just have to see what the spec is for std::to_string(std::wstring). I doubt it would make things any worse than they already are, though.

- Jim

Thiago Macieira

May 13, 2014, 1:57:58 AM
to std-pr...@isocpp.org
On Monday, 12 May 2014, at 22:51:01, Jim Porter wrote:
> I'd agree for std::u(16|32)string, but with std::wstring, you're not even
> guaranteed a particular character width. While you might have some
> reasonable expectations of the encoding for std::wstring on a given
> platform, it's difficult to write cross-platform code with it unless you
> rely on platform-specific code to pick the right encoding for each
> platform. Especially given the lack of a defined encoding for std::string,
> I'd be worried about converting a std::wstring to a std::string unless the
> process was "ignore all non-ASCII characters".

Any conversion to std::string must have possible failure modes. Whether that's
an exception, a return value or silent replacement with U+FFFD, it's something
to be decided later.

Technically speaking, the wide character execution charset is also left to the
implementation, which means std::wstring could have any encoding too. So,
strictly speaking, conversion to std::wstring should also be allowed to fail.

Then again, converting from std::u16string or std::u32string can also fail if
they contain bad data (improperly paired surrogate codepoints for UTF-16, non-
Unicode entries in std::u32string) and conversion to them can also fail if the
source is improperly encoded.

In other words, any conversion between string formats needs to have a
reporting system for failures.
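
As one example of the failure cases listed above, here is a sketch that detects improperly paired surrogates in a std::u16string before any conversion is attempted. The function name is invented, and returning a bool is just one of the possible reporting styles mentioned (exception, return value, or replacement character).

```cpp
#include <cstddef>
#include <string>

namespace sketch {

// Returns true if every surrogate code unit in `text` is part of a valid
// high/low pair, false otherwise.
inline bool is_well_formed_utf16(const std::u16string& text) {
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char16_t unit = text[i];
        if (unit >= 0xD800 && unit <= 0xDBFF) {
            // A high surrogate must be followed by a low surrogate.
            if (i + 1 >= text.size()) {
                return false;
            }
            const char16_t next = text[i + 1];
            if (next < 0xDC00 || next > 0xDFFF) {
                return false;
            }
            ++i;  // skip the low surrogate that was just validated
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            return false;  // low surrogate without a preceding high surrogate
        }
    }
    return true;
}

}  // namespace sketch
```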

Jim Porter

May 13, 2014, 1:59:54 AM
to std-pr...@isocpp.org
On Tuesday, May 13, 2014 12:35:07 AM UTC-5, Daniel Krügler wrote:
> If we want such a family I would strongly encourage to determine what
> is the intent here:
>
> a) Unchanged against any localization/global effects (except memory allocation)
> ->
> This is some important property that I would like to see somewhere. We
> cannot use to_string for this, because it doesn't hold for the
> built-in types.
>
> b) ?
>
> Java's toString implementations of the "primitive" types and their
> corresponding Object reference types guarantees (a), which I consider
> as a very reasonable strategy.

Yes, something locale-independent would be nice. In some of my experiments, I've tried creating a set of functions of the form to_string<StringType>(ArgType), e.g. to_string<std::string>(123) => std::string("123"). This expands on the original proposal's notion of using this for generic functions, since you can now parameterize on the string type as well as the argument type.
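
The shape of such a to_string<StringType>(ArgType) family could be sketched as below. This is not the experiment described above, just one possible layout: the names `string_maker` and `to_string_as` are invented, and because it forwards to std::to_string and std::to_wstring it inherits whatever locale behaviour those have rather than being locale-independent.

```cpp
#include <string>

namespace sketch {

// Primary template is left undefined: only known string types are supported.
template <typename StringType>
struct string_maker;

template <>
struct string_maker<std::string> {
    template <typename Arg>
    static std::string make(Arg value) {
        return std::to_string(value);
    }
};

template <>
struct string_maker<std::wstring> {
    template <typename Arg>
    static std::wstring make(Arg value) {
        return std::to_wstring(value);
    }
};

// Parameterized on the result string type as well as the argument type.
template <typename StringType, typename Arg>
StringType to_string_as(Arg value) {
    return string_maker<StringType>::make(value);
}

}  // namespace sketch
```

A call such as `sketch::to_string_as<std::wstring>(123)` then yields a wide string, and generic code can pass the string type along as a template parameter.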

For locale-dependent stringification, I think using I/O streams makes more sense, since you can define the locale via the stream. Globally-defined locales are really just a good way to introduce machine-specific bugs.

- Jim

David Krauss

May 13, 2014, 5:58:44 AM
to std-pr...@isocpp.org
On 2014–05–13, at 1:35 PM, Daniel Krügler <daniel....@gmail.com> wrote:

> 2014-05-13 2:01 GMT+02:00 David Krauss <pot...@gmail.com>:
>
> > As someone recently noted here, to_string doesn't respect localization. It's better suited to serialization than user or OS interfaces.
>
> Unfortunately to_string still provides locale-dependency, albeit not
> to grouping symbols, but is not guaranteed to be "pure". It still is
> affected by locale effects that influence the decimal separator, for
> example.

I’ve never used a compliant locales implementation (my OS of choice, Mac, is barely functional), but my impression of the spec is that to_string depends on the C library locale as set by setlocale, which is supposed to be independent of the C++ standard library (but isn’t, at least on my OS, where the C library is the *only* way to set the C++ locale).

> the to_(w)string functions were provided just as convenience functions
> and were not intended to derive a family of ADL-lookup functions from
> this.
>
> If we want such a family I would strongly encourage to determine what
> is the intent here:
>
> a) Unchanged against any localization/global effects (except memory allocation)
> ->
> This is some important property that I would like to see somewhere. We
> cannot use to_string for this, because it doesn't hold for the
> built-in types.

Moreover, locale dependencies if any should be specified, along with general functionality, in terms of the C++ library, not C. The only reason I can imagine for definition in terms of sprintf was as a lazy attempt to avoid locale dependency.

There could be an optional locale argument which, if omitted, defaults to the “C” locale rather than the default locale. This would allow for elimination of dynamic dispatches, which is the only way to obtain satisfactory performance in string generation intensive applications.
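
The interface suggested here, an optional locale argument defaulting to the "C" locale, might look roughly like the sketch below. The name `to_string_in` is invented, and the stream-based body is only a stand-in for illustration: it still goes through locale facets, so it does not deliver the performance benefit described above.

```cpp
#include <locale>
#include <sstream>
#include <string>

namespace sketch {

// Formats `value` using `loc`; when no locale is given, the classic "C"
// locale is used instead of the global one.
template <typename T>
std::string to_string_in(const T& value,
                         const std::locale& loc = std::locale::classic()) {
    std::ostringstream out;
    out.imbue(loc);
    out << value;
    return out.str();
}

}  // namespace sketch
```

Callers that want localized output pass a locale explicitly; everyone else gets output that does not change when the global locale does.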

Daniel Krügler

May 13, 2014, 6:49:56 AM
to std-pr...@isocpp.org
2014-05-13 11:58 GMT+02:00 David Krauss <pot...@gmail.com>:
> On 2014-05-13, at 1:35 PM, Daniel Krügler <daniel....@gmail.com> wrote:
> 2014-05-13 2:01 GMT+02:00 David Krauss <pot...@gmail.com>:
> As someone recently noted here, to_string doesn't respect localization. It's
> better suited to serialization than user or OS interfaces.
>
> Unfortunately to_string still provides locale-dependency, albeit not
> to grouping symbols, but is not guaranteed to be "pure". It still is
> affected by locale effects that influence the decimal separator, for
> example.
>
> I've never used a compliant locales implementation (my OS of choice, Mac, is
> barely functional), but my impression of the spec is that to_string depends
> on the C library locale as set by setlocale, which is supposed to be
> independent of the C++ standard library (but isn't, at least on my OS, where
> the C library is the *only* way to set the C++ locale).

It is exactly the dependency on std::setlocale() I was referring to.
It is part of the C++ Standard Library by inclusion, and whether this
is a C dependency or a C++ dependency is irrelevant to me. And any
(named) changes to the global std::locale can also impact the
std::sprintf behaviour (22.3.1.5 [locale.statics] p2):

"Effects: Causes future calls to the constructor locale() to return a
copy of the argument. If the argument has a name, does

std::setlocale(LC_ALL, loc.name().c_str());

otherwise, the effect on the C locale, if any, is implementation-defined."
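
The quoted effect can be observed with a sketch like the following. The function name is invented, and the exact string the C library reports for a given locale is platform-specific; only the classic locale's name "C" is relied on here.

```cpp
#include <clocale>
#include <locale>
#include <string>

namespace sketch {

// Installs `loc` as the global C++ locale and returns the name the C
// library reports for LC_ALL afterwards.
inline std::string c_locale_after_global(const std::locale& loc) {
    std::locale::global(loc);
    const char* name = std::setlocale(LC_ALL, nullptr);  // query only
    return name ? name : "";
}

}  // namespace sketch
```

Since std::locale::classic() has the name "C", passing it makes std::locale::global call std::setlocale(LC_ALL, "C"), so the C locale changes together with the C++ one.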

- Daniel