Deprecating to_string() for floating point types and introducing a replacement


u97...@gmail.com

Feb 28, 2016, 8:06:29 AM
to ISO C++ Standard - Future Proposals
This post continues the thread "Deprecate to_string() for floating point types?"

The original thread served its purpose, and it seemed more appropriate to start a new thread in this forum with a more proposal-like text for evaluation.

------------------------

Summary
--------------

This document describes the current behaviour of std::to_string(), points out its fundamental problems with floating point types, and suggests a change to the standard.

Introduction
-----------------

Converting numeric values to strings is a common task in contexts like UIs and human readable files, common enough to be listed as a newbie question in the C++ FAQ [6]. Before C++11 there was no clear choice for the conversion, and some of the options suggested [4] [5] for the task were:
  1. std::ostringstream
  2. sprintf
  3. boost::lexical_cast
These have problems ranging from complexity and performance to buffer handling hazards and external dependency. With std::to_string() the general answer got easy: use std::to_string().

This, however, is not a good answer for floating point types.

Floating point types are by their nature trickier to deal with and it's harder to define what std::to_string() should do with floating point types.

The papers introducing std::to_string() [1] [2] [3] do not discuss the purpose of std::to_string() (i.e. what problem it is intended to solve), and the current standard simply defines that the string returned by std::to_string(double val) is identical to what sprintf(buf, "%f", val) would generate with a buffer buf of sufficient size.

This paper argues that the decision is highly questionable, impractical and a source of bugs thus suggesting a change to the standard.

Problem
------------

In form of examples, the definition of std::to_string(double val) essentially as sprintf(buf, "%f", val) has the following implications:
  1. std::to_string(double(0)) == "0.000000"
  2. std::to_string(double(10000)) == "10000.000000"
  3. std::to_string(double(1e300)) == "1000000000000000052504760255204420248704468581108159154915854115511802457988908195786371375080447864043704443832883878176942523235360430575644792184786706982848387200926575803737830233794788090059368953234970799945081119038967640880074652742780142494579258788820056842838115669472196386865459400540160.000000"
  4. std::to_string(double(1e-9)) == "0.000000"
While it's not obvious how the conversion should work for floating point types, it's hard to see that any acceptable implementation would have behaviour as shown above. The problems are:
  • Redundant characters from information preservation point of view (1, 2, 3, 4).
  • Unreadable output (3)
  • Failing to preserve even a single significant digit (4).
While redundant decimal characters may even be desired in some use cases, the failure to preserve any significant digits is a serious issue, especially as it is not a rare corner case but affects a big proportion of the whole double domain. How big a proportion? At least in common implementations of double, the base 10 exponent can take values in the range [-308, 308]. All values whose exponent is in the range [-308, -8] are converted to "0.000000" by std::to_string(). And given that readability of values with tens of digits is bad, as demonstrated in example 3, the exponent range where std::to_string() creates an acceptable string representation is quite small, although the good range may well cover the most frequent use cases.
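
The "%f" behaviour described above can be reproduced with a small sketch. The helper name format_f is hypothetical; it calls snprintf with "%f" directly, which is how this text says std::to_string(double) is specified, and sizes the buffer from a first measuring call.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats val with "%f", the rule quoted above
// for std::to_string(double).
std::string format_f(double val)
{
    // First call only measures the required length (excluding the terminator).
    const int n = std::snprintf(nullptr, 0, "%f", val);
    if (n < 0)
        return std::string();

    std::string s(static_cast<std::size_t>(n) + 1, '\0');
    std::snprintf(&s[0], s.size(), "%f", val);
    s.resize(static_cast<std::size_t>(n)); // drop the terminator
    return s;
}
```

With this helper, format_f(1e-9) and format_f(4e-7) both give "0.000000", while format_f(1e300) is several hundred characters long.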

For comparison, here's a short survey of related implementations using double (or similar type) value 1.23456789e-9:

  • Visual Basic (VS2015) ToString() : "1.23456789E-09"
  • C# (VS2015) ToString() : "1.23456789E-09"
  • Java toString() : "1.23456789E-9"
  • JavaScript toString() : "1.23456789e-9"
  • boost::lexical_cast<std::string>(1.23456789e-9): "1.2345678899999999e-09"
  • Qt 5.4 QString::number(1.23456789e-9) : "1.23457e-09"
  • Qt 5.4 QVariant(1.23456789e-9).toString() : "1.23456789e-09"
  • C++ to_string(1.23456789e-9) : "0.000000"

Of the 8 implementations, 5 give a result that is precision-wise identical to the value written in the source code, 1 gives a result that is non-lossy but longer than the source code version, 1 rounds the value to 6 significant digits, and then there's std::to_string(), which simply wipes out all significant digits.
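
For comparison with the survey above, here is a sketch of what standard C++ streams do with the same value. The helper names are hypothetical. The default stream precision is 6 significant digits, which matches the rounded QString::number() line; raising the precision to max_digits10 gives a non-lossy but longer string, presumably similar to the boost::lexical_cast line (that correspondence is an assumption, not something verified here).

```cpp
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: stream formatting with the default precision (6).
std::string stream_default(double x)
{
    std::ostringstream os;
    os << x;
    return os.str();
}

// Hypothetical helper: stream formatting with max_digits10 significant
// digits, which is enough for a text round trip of any finite double.
std::string stream_max_digits(double x)
{
    std::ostringstream os;
    os << std::setprecision(std::numeric_limits<double>::max_digits10) << x;
    return os.str();
}
```

stream_default(1.23456789e-9) yields "1.23457e-09", whereas parsing the result of stream_max_digits(1.23456789e-9) gives back the original value.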

Ideal
-------

If std::to_string() for floating point types was created from scratch, the question would be: what should it do? The following starting point is assumed:

  • std::to_string() for integer types is defined and their implementations are what they currently are.
  • to_string() is required to print (human readable) decimal representation also of floating point values.

Some ideas:

  1. With integer types, the following condition is true: x != y <=> to_string(x) != to_string(y). Also, for a related from_string() implementation such as stoul(), stoul(to_string(x)) == x for all x of type T. In plain words this means that one can convert an integer to a string and read it back to get exactly the same value from which the string representation was created. To keep the semantics the same for floating point types, require the same property for them.
  2. std::cout -like implementation that uses a standard defined default number of significant digits that is less than needed for non-lossy conversion.
  3. Use fixed decimal count.
  4. Print value as close to mathematically exact value as possible.
Option 4 would, for example, imply hugely long string representations as demonstrated earlier, so it's not a viable option.

Option 3 is the current one and is obviously out of the question.

In option 2 to_string() is allowed to create an approximation, meaning that to_string(x) == to_string(y) for many different x and y, thus changing the semantics compared to the integer implementation. This is an essential difference, so there should be a very good reason for choosing this. The argument is that there's none: prettiness and practicality arguments are use case specific and can't be defined on the standard level.

With this reasoning the only reasonable choice would be option 1. Notes on its implications:

  • The string representation is allowed to be ambiguous.
  • The string representation is not required to be mathematically exact: e.g. "1e300" is accepted because from_string(to_string(1e300)) == from_string("1e300") even though mathematically double(1e300) != 1e300.
  • Resulting string may be long but even in worst case is much shorter than what current to_string() prints for big values. For smaller values the resulting string may be shorter than with current to_string() because redundant decimals can be shortened (e.g. "1.0" vs "1.000000")
In this context the ideal could be formulated as follows:

to_string() with floating point types shall create such a string representation that for a related from_string() implementation such as stod(), from_string(to_string(x)) == x for all x in the floating point type's domain. The string representation shall be portable across all implementations that use the same floating point implementation, in the sense that reading the string in another implementation shall result in exactly the same value as the one from which the string was created. The string representations are not, however, required to be identical on such implementations, but it is recommended that they be the shortest possible, using a scheme similar to the %g-family formats in sprintf().

How to change the standard
----------------------------------------

A big complication is that to_string() can't be changed without introducing a (breaking) change. The change could be of various types:

  1. Behaviour change at runtime: may break any existing code using the current definition.
  2. Compilation breaking or deprecation change: safer alternative implemented by API-change causing a compilation failure.
  3. Keep current behaviour but introduce parameters to make it possible to get a better behaviour.

Not particularly nice choices, but the runtime behaviour change can be ruled out for starters, leaving only two options. Breaking compilation or deprecation is undesired, but preferable to having numerous programmers introduce bugs because of this for decades to come. Thus the proposal is to deprecate the current to_string() for floating point types and introduce a new function named to_string_f() or similar that behaves as described earlier. to_string_f() shall also take optional parameters that allow the developer to fine tune the conversion, as implemented e.g. in QString::number() [7]. Options for fine tuning are important in the sense that making to_string_f() too simple and restricted will too quickly cause programmers to resort to the old and lacking alternatives, although at this point the purpose of to_string_f() is not to act as a full featured sprintf-like formatting implementation.

Early draft suggestion
-------------------------------

(float, long double and wstring omitted for clarity):

std::string to_string_f(double d, int precision = special_value_requesting_shortest_nonlossy_representation, char format = 'g') noexcept;

Precision: Meaning depends on format-specifier as in sprintf in addition to special items.
Format: Possible values: a, A, e, E, f, F, g, G as available for sprintf().

If precision or format is not within accepted range, to_string_f() returns empty string.
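
One way the draft signature could be prototyped on top of snprintf is sketched below. Everything here is an assumption made for illustration: the sentinel shortest_roundtrip (-1) stands in for special_value_requesting_shortest_nonlossy_representation, noexcept is left out because std::string may allocate, and the default mode simply raises the precision until strtod() reads the same value back. Its output follows printf conventions, so it will not match every illustrative string in this draft (for example it yields "1" rather than "1.0").

```cpp
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical sentinel for the draft's "shortest non-lossy" default.
constexpr int shortest_roundtrip = -1;

// Sketch only: not noexcept, and invalid arguments give an empty string
// as in the draft text.
std::string to_string_f(double d, int precision = shortest_roundtrip, char format = 'g')
{
    if (std::string("aAeEfFgG").find(format) == std::string::npos
        || precision < shortest_roundtrip)
        return std::string();

    // Builds "%.*<format>"; the precision is passed as an argument.
    const char spec[] = {'%', '.', '*', format, '\0'};
    auto print = [&](int p) {
        const int n = std::snprintf(nullptr, 0, spec, p, d);
        std::string s(n > 0 ? static_cast<std::size_t>(n) + 1 : 1, '\0');
        std::snprintf(&s[0], s.size(), spec, p, d);
        s.pop_back(); // drop the terminator
        return s;
    };

    if (precision != shortest_roundtrip)
        return print(precision);

    // A negative precision means "printf default"; for %a that is exact.
    if (!std::isfinite(d) || format == 'a' || format == 'A')
        return print(-1);

    // Raise the precision until the text parses back to the same value.
    for (int p = 0; p < 1100; ++p) {
        std::string s = print(p);
        if (std::strtod(s.c_str(), nullptr) == d)
            return s;
    }
    return print(1100);
}
```

With this sketch, to_string_f(1.0, 2, 'f') gives "1.00", to_string_f(1.23456789e-9, 2) gives "1.2e-09", and to_string_f(1e300) gives "1e+300".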


Notes and open questions
--------------------------------------

-Usage examples and possible return values:
    to_string_f(1.0) -> "1.0"
    to_string_f(1.0, 10) -> "1.0"
    to_string_f(1.0, 2, 'f') -> "1.00"
    to_string_f(1.23456789e-9) -> "1.23456789e-9"
    to_string_f(1.23456789e-9, 2) -> "1.2e-9"
    to_string_f(1e300) -> "1e300"


-The essential question: is it feasible to implement the from_string(to_string(x)) == x requirement given all the possible floating point implementations permitted by the C++ standard? If not, can the condition be fulfilled by not requiring it for some corner cases such as NaNs (e.g. having only a single NaN instead of multiple)?

-Having int and char as parameters instead of a string such as "%.3f" is chosen for safety and simplicity: it eliminates the need for string parsing and avoids the possibility of errors in a user-defined format string.

-Shortest necessary string representation can't be acquired using sprintf() with maximal precision, but such implementation would be acceptable because shortest possible string is not required. This is chosen to avoid the need to define "shortest possible" which is not expected to have essential relevance for the usage of to_string_f(). Also the implementation effort from library implementers to guarantee the shortest possible on all standard conforming C++ implementations might be unnecessarily high.

-For portability of string representation, see the end of section 'Ideal'

-Should the default value for precision be, say, -1 or an enum given the conventions used for enums in std-namespace?

-Handling of out-of-range parameter values should be revised and defined. The noexcept-property should however be preserved.

-How to handle locale: implicitly use the same as sprintf or require that the output is locale independent? Locale-dependency would cause portability issues so having locale-independent output would be desirable and in accordance with the design intention as to_string_f() is not intended to do full featured locale aware formatting.

-Would to_string_f() be appropriate name?


Details on examples
-----------------------------

Visual Basic (VS2015)
    Dim aDouble As Double
    aDouble = 0.00000000123456789
    Dim s As String = aDouble.ToString()
    Console.WriteLine(s)
   
C# (VS2015)
    double a = 1.23456789e-9;
    String s = a.ToString();
    Console.Write(s);
   
boost::lexical_cast (boost 1.55)
    auto s = boost::lexical_cast<std::string>(1.23456789e-9);
    std::cout << s;
   
Java (sun-jdk-8u51)
    Double a = 1.23456789e-9;
    String s = a.toString();
    System.out.println(s);
   
JavaScript (tested in Firefox 44)
    var dA = 1.23456789e-9;
    var sA = dA.toString();
   
Qt (5.4)
    QString s = QString::number(1.23456789e-9);
    auto s2 = QVariant(1.23456789e-9).toString();


References
----------

[1] N1803: Simple Numeric Access. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1803.html
[2] N1982: Simple Numeric Access Revision 1. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n1982.html
[3] N2408: Simple Numeric Access Revision 2. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2408.html
[4] How to convert a number to string and vice versa in C++. http://stackoverflow.com/questions/5290089/how-to-convert-a-number-to-string-and-vice-versa-in-c
[5] How do I convert a double into a string in C++?. http://stackoverflow.com/questions/332111/how-do-i-convert-a-double-into-a-string-in-c
[6] How do I convert an integer to a string? https://isocpp.org/wiki/faq/newbie#int-to-string
[7] QString::number(). http://doc.qt.io/qt-5/qstring.html#number-6
   

Moritz Klammler

Feb 28, 2016, 11:38:51 AM
to std-pr...@isocpp.org
I have loosely followed the previous discussion and this is a nice
write-up. I have a few questions about your proposal.

It seems to me that you consider the ability to have lossless
round-trips important. I don't think I agree that a lack of this
property is an inconsistency with the `to_string` overloads for
integers. It's rather inherent to the nature of floating-point. If an
application needs value-preserving round-trips (as would be highly
desirable for storing numeric values in a text-based database, for
example), using "hexfloat" format would be an economic alternative that
is almost perfect except that a mere mortal human cannot make sense out
of the gibberish. It seems to me that formatting floating-point values
for storing in text format and parsing again later without loss and
formatting floating-point values for presentation to humans are very
distinct use-cases that should not be confused. `std::to_string` should
decide which use-case it wants to serve. I opt for the human interface
but then the value-preserving property of round-trips is less important
an argument. If we choose the other use-case, I'd really expect
"hexfloat" as an option. Your current draft doesn't mention "hexfloat"
at all.

How likely is it that actual software would be broken by changing
`std::to_string`'s definition from using `%f` to `%g`? Given all of its
problems discussed in your text, it seems to me that the author of any
application that uses `std::to_string` today probably didn't think too
much about corner cases anyway. Switching from `%f` to `%g` would be a
win for both, presenting to humans and storing for later re-use. I
think the potential damage would be very limited. Having `to_string`
overloaded for all built-in types is useful, especially for writing
generic code, so I wouldn't like to deprecate it easily if the
undeniable deficits can also be fixed reasonably.

> std::string to_string_f(double d,
> int precision = special_value_requesting_shortest_nonlossy_representation,
> char format = 'g') noexcept;
>
> [...]
>
> If precision or format is not within accepted range, to_string_f()
> returns empty string.
>
> - Handling of out-of-range parameter values should be revised and
> defined. The noexcept-property should however be preserved.

None of the `std::to_string` overloads today is `noexcept` and given
that `std::string`'s constructor isn't, they cannot realistically be.
Of course, in practice, SSO will often prevent the need for memory
allocation but besides the fact that the standard doesn't mandate it,
especially with very long floating-point representations, dynamic memory
allocation could easily be required.

I also don't think that silently returning an empty string is a good way
to communicate invalid inputs. And if I remember correctly, the
accepted practice is to only declare functions `noexcept` if they have
no preconditions.

Many of the conversion functions in § 21.5 [string.conversions] are
specified to throw `std::invalid_argument` so I don't see why the
proposed `std::to_string_f` should deviate from this.
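
As an illustration of the existing convention referred to here, std::stod reports failures by throwing. The wrapper below is a hypothetical sketch showing both error paths and how a caller can turn them into a return value if that style is preferred.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical wrapper: converts the exceptions thrown by std::stod
// into a boolean result.
bool try_parse_double(const std::string& text, double& out)
{
    try {
        out = std::stod(text);
        return true;
    } catch (const std::invalid_argument&) {
        return false; // no conversion could be performed
    } catch (const std::out_of_range&) {
        return false; // the value does not fit in a double
    }
}
```

For example, try_parse_double("abc", v) and try_parse_double("1e999", v) both return false, while try_parse_double("1.5", v) returns true and sets v to 1.5.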

Finally, why does it have to be a new function? Couldn't the same
benefit be achieved by adding those default arguments to the existing
`to_string`? I see that you didn't want to change the `%f` requirement
but do like to use `%g` as default for the new function. But as
discussed above, I think that this is a risk we might well consider
worth taking.

d.h.go...@gmail.com

Feb 29, 2016, 10:47:51 AM
to ISO C++ Standard - Future Proposals, u97...@gmail.com
Have you read the paper "How to print floating point numbers accurately", e.g. from here: http://kurtstephens.com/files/p372-steele.pdf
This seems to cover some of the problems you described.

u97...@gmail.com

Feb 29, 2016, 6:26:49 PM
to ISO C++ Standard - Future Proposals, u97...@gmail.com
Thanks for the responses.

@d.h.go...

I haven't, might be a good idea to do so.

@Moritz Klammler

About the inconsistency with integer versions, importance of round-tripping and hexfloat

The inconsistency is in some sense hard to evaluate because, as mentioned, the original paper did not specify the purpose of to_string(). In many use cases, such as human readable files, using hexfloat might fundamentally change usability: for example, I would reckon that for interoperability with other tools (spreadsheets etc.) it makes quite a difference whether a file has %g-formatted values or hexfloats. And using hexfloat is even harder to justify when one can use a non-lossy decimal representation. But the argument for requiring round-tripping might actually be indirect: what if it is not required? What should to_string() do then? If the suggestion is a lossy precision, the arguments can directly address the question of why the chosen number of digits can be considered a "one size fits all" choice. Of course a round-tripping format does not fit all either, but I consider loss of information a much bigger issue than an inconvenience in the output; the standard can't guarantee that the formatted output looks nice and has the right number of digits for everyone, but it could guarantee some aspects of information preservation for everyone.

hexfloat is by my understanding available through options 'a' and 'A'.


Likelihood of causing breaking changes

Generally speaking if a developer has created an application relying on a standard guaranteed behaviour, any change can be a breaking one (here for example one could be assuming that the result does not contain character 'e'). Especially the silent runtime behaviour change is potentially hazardous.

Use of exceptions

Unfortunately noexcept seems to be out of the question due to the potential allocation of std::string. Thanks for pointing that out. The reason the (direct) use of exceptions such as std::invalid_argument does not seem appropriate is that it seems a bit too severe a consequence that failing to create a string would in practice often crash the application; i.e. people would forget to use try-catch with to_string(). It's not a serious error if the conversion fails, as the function can simply report "I got garbage, I can't give an answer and you can check this from the return value if interested". If the failure is a serious error for the application, it should handle it accordingly. But this is more a question of when and why the standard considers it appropriate to use exceptions, and how to apply that in this case if this ever proceeds to such a stage.

"Finally, why does it have to be a new function?"

Because any change is potentially a breaking change; if experienced standard people feel the expected consequences are negligible, personally I would probably use to_string() more happily than to_string_f(). And while "%g" would have been better than "%f" in some sense, %g implies only 6 significant digits which also has a severe problem:

    const auto t = std::time(nullptr);
    const auto d = static_cast<double>(t);
    if (t == d)
    {
        char szBuffer[128];
        std::cout << "Values are identical\n"
                  << "to_string(t) = " << std::to_string(t) << '\n'
                  << "to_string(d) = " << std::to_string(d) << '\n';
        sprintf(szBuffer, "%g", d);
        std::cout << "to_string(d) if %g instead of %f = " << szBuffer << '\n';
    }


Example output:
Values are identical
to_string(t) = 1456783429
to_string(d) = 1456783429.000000
to_string(d) if %g instead of %f = 1.45678e+09

So with %g, identical values would result in fundamentally different strings depending on type. Simply changing %f -> %g might therefore result in loss of precision and severely break existing code at runtime. And if a higher precision is set to avoid this, the question is: how precise? The proposal would make it easy: it's precise enough that there's no need to worry about it.
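
The loss can also be seen from the other direction: with the default "%g" precision, distinct values collapse to the same string. The helper format_g below is a hypothetical sketch around snprintf; the 17-digit precision used for contrast assumes IEEE 754 doubles.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: "%g" formatting with a selectable number of
// significant digits (6 is the printf default).
std::string format_g(double val, int significant_digits = 6)
{
    char buf[64]; // ample for up to 17 significant digits plus exponent
    std::snprintf(buf, sizeof buf, "%.*g", significant_digits, val);
    return buf;
}
```

format_g(1456783429.0) and format_g(1456783430.0) both give "1.45678e+09", so timestamps one second apart (in fact thousands of seconds apart) become indistinguishable, whereas format_g(1456783429.0, 17) gives "1456783429".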

Moritz Klammler

Feb 29, 2016, 7:10:36 PM
to std-pr...@isocpp.org
> The inconsistency is in some sense hard to evaluate because, as
> mentioned, the original paper did not specify the purpose of
> to_string().

That's true. In order to overcome this, I think that your proposal
would benefit from putting a discussion at the beginning what purpose
you want the (new) function to serve. I'm opting for "having something
simple to show a 'nice' string to humans" but you might disagree.
Without defining a use-case for this function, any further decision will
be debatable because, as you say, there is no solution that seems right
for all cases.

> hexfloat is by my understanding available through options 'a' and 'A'.

You are right. I didn't realize that.

> Generally speaking if a developer has created an application relying
> on a standard guaranteed behaviour, any change can be a breaking one
> (here for example one could be assuming that the result does not
> contain character 'e'). Especially the silent runtime behaviour change
> is potentially hazardous.

I would consider any software that uses `to_string` (as specified today)
for data exchange with another application or for internal textual
storage flawed. I don't think that this should restrict us too much in
thinking about ways to improve the standard library. Better fix a
broken tool now than carrying it along (even if deprecated) for years to
come. The beauty of the `to_string` family of functions really shows in
generic code and it would be a pity if we had to special-case
floating-point every time. It is already too bad that the parsing
functions all have different names instead of being generic in the form
`from_string<T>` but I believe I have seen a proposal to address this.

I would prefer to keep the `to_string` functions *simple* tools for
presenting numbers to humans in a "reasonable" format. It is true that
"reasonable" is hard to define but the standard already did this a long
time ago when specifying `%g` so it seems logical to me to re-use this
definition instead of inventing a new one. If an application needs
finer control, it can always use `sprintf` directly with all the
flexibility it offers. (Or you can use streams if you prefer.)

> Why the (direct) use of exceptions such as std::invalid_argument does
> not seem appropriate is that it seems a bit too severe a consequence
> that failing to create a string would in practice often crash the
> application; i.e. people would forget to use try-catch with
> to_string(). It's not a serious error if conversion fails as the
> function can simply report "I got garbage, I can't give an answer and
> you can check this from the return value if interested". If the
> failure is a serious error for the application, it should handle it
> accordingly.

I think that passing an invalid format specifier is a bug and the
function should fail as loudly as possible in that case. Potentially
crashing the application seems appropriate to me.

Nicol Bolas

Feb 29, 2016, 11:38:15 PM
to ISO C++ Standard - Future Proposals
On Monday, February 29, 2016 at 7:10:36 PM UTC-5, Moritz Klammler wrote:
> > Generally speaking if a developer has created an application relying
> > on a standard guaranteed behaviour, any change can be a breaking one
> > (here for example one could be assuming that the result does not
> > contain character 'e'). Especially the silent runtime behaviour change
> > is potentially hazardous.
>
> I would consider any software that uses `to_string` (as specified today)
> for data exchange with another application or for internal textual
> storage flawed.

Flawed or not, I don't think it's reasonable to dismiss this use-case. Reality has taught us that people will use the simplest tool available to them that seemingly does the job, without looking too closely at it. `to_string` became the simplest way of turning a float into a string. Therefore, people have and will use it for every form of turning a float into a string.

That's not to say we shouldn't just change the behavior, even if it causes breakage. But we shouldn't ignore the fact that it will cause breakage; we can't just declare that these users are wrong for using the tool we made for them. The standards committee was the one who was wrong here, for creating a simple tool without any real direction in what it means or how it should be used.

u97...@gmail.com

Mar 2, 2016, 5:14:43 PM
to ISO C++ Standard - Future Proposals
@ Moritz Klammler

So far I've seen no good arguments why to_string() should use something other than round-tripping precision, but luckily that doesn't mean the result would be hard to read: in many (common) cases the result will/should be identical to what %g produces and be way better than the result of %f. Sure, there will be cases where it means more characters than with %g, but as demonstrated with the time_t example, 6 significant digits is not enough; trading correctness for vaguely defined readability is like trading correctness for performance. What's the point of evaluating the readability of a string representation that misses essential information of the source value? As for the 6 digits, I don't know the history but the decision seems strange: with uint32 the number of significant digits goes up to 10 and with uint64 up to 20, so why should 6 be enough for floating point types?

The need to create wrappers for various functions in order to be used in templates is indeed an annoyance as is the idea that the standard would introduce more reasons to write such wrappers, so from that point of view I agree that maintaining to_string() would be desirable. And speaking of templates, it's a good example why to_string<T> should have well defined and from that point of view identical behaviour for all T.

One can easily come up with realistic examples why passing invalid arguments to to_string_f() is not a bug, but if the conventions in the standard would still insist on using exceptions in situations like this, which I hope it doesn't, then it must be so. I'll have to look into that more closely.

Edward Catmur

Mar 2, 2016, 6:27:38 PM
to ISO C++ Standard - Future Proposals, u97...@gmail.com
On Thursday, 3 March 2016 06:14:43 UTC+8, u97...@gmail.com wrote:
> @ Moritz Klammler
>
> So far I've seen no good arguments why to_string() should use something else than round-tripping precision, but luckily that doesn't mean the result would be hard to read: in many (common) cases the result will/should be identical to what %g produces and be way better than the result of %f. Sure there will be cases where it means more characters than with %g, but as demonstrated with the time_t-example, 6 significant digits is not enough; trading correctness for vaguely defined readability is like trading correctness for performance.

I agree entirely. It's not possible to separate the use cases of data interchange and human readability; textual interchange formats are mostly read by machines but are also monitored and in some cases edited by humans. Another example would be logging; usually log files are ignored, occasionally they are read (by humans) and occasionally a human will need to feed the logged values back into the program and be confident of getting the same behavior.

Currently I use Milo Yip's implementation of Florian Loitsch's Grisu2 algorithm (https://github.com/miloyip/dtoa-benchmark/blob/master/readme.md), but it would be great to be able to use a standard facility instead.

This would mean that to_string couldn't be specified in terms of printf format specifiers, but I don't see that as much of a problem - those were standardized before the utility of perfect representation was understood.

u97...@gmail.com

May 20, 2016, 9:34:23 AM
to ISO C++ Standard - Future Proposals, u97...@gmail.com
There's now a preliminary proposal draft available for comments. It is by no means ready for submission and there are many open questions that need to be addressed.

A quick summary of current views:
  • Default std::to_string() must provide base 10 representation with round-tripping possibility at least for non-NaN's, NaN's are yet to be decided.
  • std::to_string() has overload for easy formatting similar to QString::number()
  • String format is independent of locales.
  • Exact string representation is implementation specific as long as it fulfils these requirements.
  • The way how to change the standard is open: every option evaluated has major drawbacks.

Nicol Bolas

May 20, 2016, 4:36:01 PM
to ISO C++ Standard - Future Proposals, u97...@gmail.com


On Friday, May 20, 2016 at 9:34:23 AM UTC-4, u97...@gmail.com wrote:
> There's now a preliminary proposal draft available for comments.

It isn't really showing up as HTML in my browser. Or rather, it is showing up as HTML. I can see the tags. Google Drive seems to think it was a text file.

Also, you seem to have been beaten to the punch by a fairly wide margin.

u97...@gmail.com

May 21, 2016, 5:10:56 AM
to ISO C++ Standard - Future Proposals, u97...@gmail.com
The file can be viewed correctly by downloading it. Big thanks for pointing out P0067; it is a great proposal addressing the fundamental machinery needed, but it doesn't discuss the future of std::to_string(), so these would be complementary instead of competing.
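
For context, P0067 was later adopted into C++17 as the <charconv> header. A minimal sketch of how its std::to_chars could serve as the machinery for a round-tripping conversion is given below; the helper name shortest_roundtrip is hypothetical, and the code assumes a C++17 standard library that implements the floating point overloads.

```cpp
#include <charconv>
#include <string>
#include <system_error>

// Hypothetical helper built on std::to_chars (C++17, <charconv>).
// The overload without a format or precision argument produces the
// shortest representation that reads back as exactly the same value.
std::string shortest_roundtrip(double x)
{
    char buf[64];
    const auto res = std::to_chars(buf, buf + sizeof buf, x);
    if (res.ec != std::errc())
        return std::string(); // buffer too small; not expected for double
    return std::string(buf, res.ptr);
}
```

Unlike a fixed 17-digit "%g", this gives "0.1" for 0.1, and the output does not depend on the current locale.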