This post continues the thread "Deprecate to_string() for floating point types?". The original thread served its purpose, and it seemed more appropriate to start a new thread in this forum with a more proposal-like text for evaluation.
------------------------
Summary
--------------
This document describes the current behaviour of std::to_string(), points out its fundamental problems with floating point types and suggests a change to the standard.
Introduction
-----------------
Converting numeric values to strings is a common task in contexts such as UIs and human-readable files, common enough to be listed as a newbie question in the C++ FAQ [6]. Before C++11 there was no clear choice for the conversion, and some of the options suggested [4] [5] for the task were:
- std::ostringstream
- sprintf
- boost::lexical_cast
These have problems ranging from complexity and performance to buffer handling hazards and external dependency. With std::to_string() the general answer got easy: use std::to_string().
This, however, is not a good answer for floating point types.
Floating point types are by their nature trickier to deal with and it's harder to define what std::to_string() should do with floating point types.
The papers introducing std::to_string() [1] [2] [3] do not discuss the purpose of std::to_string() (i.e. what problem it is intended to solve), and the current standard simply defines that the string returned by std::to_string(double val) is identical to what sprintf(buf, "%f", val) would generate with a buffer buf of sufficient size.
This paper argues that the decision is highly questionable, impractical and a source of bugs, and therefore suggests a change to the standard.
Problem
------------
In the form of examples, defining std::to_string(double val) essentially as sprintf(buf, "%f", val) has the following implications:
- std::to_string(double(0)) == "0.000000"
- std::to_string(double(10000)) == "10000.000000"
- std::to_string(double(1e300)) ==
"1000000000000000052504760255204420248704468581108159154915854115511802457988908195786371375080447864043704443832883878176942523235360430575644792184786706982848387200926575803737830233794788090059368953234970799945081119038967640880074652742780142494579258788820056842838115669472196386865459400540160.000000"
- std::to_string(double(1e-9)) == "0.000000"
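The examples above can be re-checked with a small sketch. It assumes a standard library that follows the "%f" definition quoted in the introduction (later revisions of the standard may specify something different). The helper name format_percent_f() is hypothetical and only exists to make the sprintf() equivalence explicit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: formats val with "%f" into a buffer that is measured
// first, so it is always large enough.
std::string format_percent_f(double val) {
    const int len = std::snprintf(nullptr, 0, "%f", val);
    if (len < 0) return std::string();  // encoding error; not expected for "%f"
    std::vector<char> buf(static_cast<std::size_t>(len) + 1);
    std::snprintf(buf.data(), buf.size(), "%f", val);
    return std::string(buf.data(), static_cast<std::size_t>(len));
}

// Re-checks the four examples listed above.
void check_examples() {
    assert(std::to_string(0.0) == "0.000000");
    assert(std::to_string(10000.0) == "10000.000000");
    assert(std::to_string(1e300).size() == 308);  // 301 integer digits + ".000000"
    assert(std::to_string(1e-9) == "0.000000");
    assert(std::to_string(1e300) == format_percent_f(1e300));
}
```

Calling check_examples() should pass silently on an implementation that matches the definition discussed here.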
While it is not obvious how the conversion should work for floating point types, it is hard to see how any acceptable implementation could behave as shown above. The problems are:
- Redundant characters from information preservation point of view (1, 2, 3, 4).
- Unreadable output (3)
- Failing to preserve even a single significant digit (4).
While redundant decimals may even be desired in some use cases, the failure to preserve any significant digits is a serious issue, especially as it is not a rare corner case but affects a big proportion of the whole double domain. How big a proportion? At least in common implementations of double, the base 10 exponent can take values in the range [-308, 308]. All values whose exponent is in the range [-308, -8] are converted to "0.000000" by std::to_string(). And given that the readability of values with tens of digits is bad, as demonstrated in example 3, the exponent range where std::to_string() creates an acceptable string representation is quite small, although that range may well cover the most frequent use cases.
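As a rough illustration of the proportion, the following sketch samples one value per decade (powers of ten only, which is a coarse proxy for the whole domain) and counts how many of them lose every significant digit. It assumes an IEEE 754 style double and the "%f"-based std::to_string(); the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// True when std::to_string() drops every significant digit of a non-zero value.
bool collapses_to_zero(double x) {
    return x != 0.0 && std::to_string(std::fabs(x)) == "0.000000";
}

// Counts the powers of ten 1e-307 ... 1e308 that print as "0.000000".
int count_collapsed_powers_of_ten() {
    int collapsed = 0;
    for (int e = -307; e <= 308; ++e) {
        if (collapses_to_zero(std::pow(10.0, e))) ++collapsed;
    }
    return collapsed;
}
```

Under those assumptions the count is expected to be 301 out of the 616 sampled decades, i.e. roughly half of the sampled exponent range.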
For comparison, here's a short survey of related implementations using double (or similar type) value 1.23456789e-9:
- Visual Basic (VS2015) ToString() : "1.23456789E-09"
- C# (VS2015) ToString() : "1.23456789E-09"
- Java toString() : "1.23456789E-9"
- JavaScript toString() : "1.23456789e-9"
- boost::lexical_cast<std::string>(1.23456789e-9): "1.2345678899999999e-09"
- Qt 5.4 QString::number(1.23456789e-9) : "1.23457e-09"
- Qt 5.4 QVariant(1.23456789e-9).toString() : "1.23456789e-09"
- C++ to_string(1.23456789e-9) : "0.000000"
Of the 8 implementations, 5 produce a result that is precision-wise identical to the value written in the source code, 1 produces a non-lossy result that is longer than the source code version, 1 rounds the value to 6 significant digits, and then there is std::to_string(), which simply wipes out all significant digits.
Ideal
-------
If std::to_string() for floating point types was created from scratch, the question would be: what should it do? The following starting point is assumed:
- std::to_string() for integer types is defined and their implementations are what they currently are.
- to_string() is also required to print a (human-readable) decimal representation of floating point values.
Some ideas:
- With integer types, the following condition is true: x != y <=> to_string(x) != to_string(y). Also, for a related from_string() implementation such as stoul(), stoul(to_string(x)) == x for all x of type T. In plain words, one can convert an integer to a string and read it back to get exactly the value from which the string representation was created. To keep the semantics the same, require the same property for floating point types.
- A std::cout-like implementation that uses a standard-defined default number of significant digits that is less than needed for non-lossy conversion.
- Use fixed decimal count.
- Print value as close to mathematically exact value as possible.
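The round-trip property from the first idea can be written down as a pair of checks. This is only a sketch: the function names are hypothetical, and the double version deliberately uses the current "%f"-based std::to_string() to show where the property breaks.

```cpp
#include <cassert>
#include <string>

// Integer round trip: expected to hold for every unsigned long value.
bool round_trips_integer(unsigned long x) {
    return std::stoul(std::to_string(x)) == x;
}

// The same check for double with the current to_string(): it holds only for
// values that happen to be exactly representable with six decimals.
bool round_trips_double(double x) {
    return std::stod(std::to_string(x)) == x;
}
```

For instance, round_trips_double(0.5) holds, while round_trips_double(1.23456789e-9) does not, because the string "0.000000" reads back as zero.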
Option 4 (printing the exact value) would imply hugely long string representations, as demonstrated earlier, so it is not a viable option.
Option 3 (fixed decimal count) is the current one and is obviously out of the question.
In option 2 (the std::cout-like one), to_string() is allowed to create an approximation, meaning that to_string(x) == to_string(y) for many different x and y, which changes the semantics compared to the integer implementation. This is an essential difference, so there should be a very good reason for choosing it. The argument here is that there is none: prettiness and practicality arguments are use case specific and can't be settled at the standard level.
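The collision that option 2 allows is easy to reproduce with a stream in its default state. The helper name via_ostream() is hypothetical; the sketch only relies on the default stream precision of 6 significant digits.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Formats x the way a default-configured output stream would.
std::string via_ostream(double x) {
    std::ostringstream os;
    os.imbue(std::locale::classic());  // keep the output independent of the global locale
    os << x;                           // default precision: 6 significant digits
    return os.str();
}
```

With this helper, the distinct values 1.2345671 and 1.2345674 both come out as "1.23457", so the string no longer identifies the value it was created from.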
With this reasoning the only reasonable choice would be option 1 (the round-trip requirement). Notes on its implications:
- The string representation is allowed to be ambiguous.
- The string representation is not required to be mathematically exact:
e.g. "1e300" is accepted because from_string(to_string(1e300)) ==
from_string("1e300") even though mathematically double(1e300) != 1e300.
- The resulting string may be long, but even in the worst case it is much shorter than
what the current to_string() prints for big values. For smaller values the
resulting string may be shorter than with the current to_string() because
redundant decimals can be dropped (e.g. "1.0" vs "1.000000").
In this context the ideal could be formulated as follows:
to_string() with floating point types shall create a string representation such that, for a related from_string() implementation such as stod(), from_string(to_string(x)) == x for all x in the floating point type's domain. The string representation shall be portable across all implementations that use the same floating point implementation, in the sense that reading the string in another implementation shall result in exactly the value from which the string was created. The string representations are not, however, required to be identical on such implementations, but it is recommended that they be the shortest possible, using a scheme similar to the one defined for the %g family of formats in sprintf().
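One way to approximate this ideal with existing tools is to try increasing %g precisions until the string reads back as the original value. The following is a minimal sketch under that assumption, not a proposed implementation: the name to_string_roundtrip() is hypothetical, the result is the shortest %g precision that round-trips rather than a guaranteed shortest string, and NaN values never compare equal, so they simply fall through to the maximum precision.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <string>

// Hypothetical helper: smallest "%.*g" precision whose output reads back as x.
std::string to_string_roundtrip(double x) {
    // max_digits10 significant digits are always enough (17 for IEEE 754 binary64).
    const int max_digits = std::numeric_limits<double>::max_digits10;
    char buf[64];
    for (int digits = 1; digits <= max_digits; ++digits) {
        std::snprintf(buf, sizeof buf, "%.*g", digits, x);
        if (std::strtod(buf, nullptr) == x) break;  // round trip succeeded
    }
    return std::string(buf);
}
```

On a typical C library this yields strings such as "0.1" and "1.23456789e-09", with the exponent spelling following sprintf() conventions.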
How to change the standard
----------------------------------------
A big complication is that to_string() can't be changed without introducing a (breaking) change. The change could be of various types:
- Behaviour change at runtime: may break any existing code using the current definition.
- Compilation-breaking or deprecation change: a safer alternative, implemented as an API change that causes a compilation failure.
- Keep the current behaviour but introduce parameters that make it possible to get better behaviour.
Not particularly nice choices, but the runtime behaviour change can be ruled out for a start, leaving only two options. Breaking compilation or deprecation is undesirable, but preferable to having numerous programmers write bugs because of this for decades to come. Thus the proposal is to deprecate the current to_string() for floating point types and introduce a new function named to_string_f() or similar that behaves as described earlier. to_string_f() shall also take optional parameters that allow the developer to fine-tune the conversion, as implemented e.g. in QString::number() [7]. Options for fine-tuning are important in the sense that making to_string_f() too simple and restricted would too quickly cause programmers to resort to the old and lacking alternatives, although at this point the purpose of to_string_f() is not to act as a full-featured sprintf-like formatting implementation.
Early draft suggestion
-------------------------------
(float, long double and wstring omitted for clarity):
std::string to_string_f(double d, int precision = special_value_requesting_shortest_nonlossy_representation, char format = 'g') noexcept;
Precision: The meaning depends on the format specifier as in sprintf(), with the addition of the special value.
Format: Possible values: a, A, e, E, f, F, g, G as available for sprintf().
If precision or format is not within the accepted range, to_string_f() returns an empty string.
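To make the draft more concrete, here is a minimal sketch of how such a function could be layered on snprintf(). Several things are assumptions rather than part of the draft: the sentinel kNonLossyPrecision and its value -1, the upper limit kMaxPrecision, using max_digits10 digits for the non-lossy default (which is non-lossy but not the shortest), and returning an empty string for the sentinel combined with 'f' or 'F', which the draft does not define.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <limits>
#include <string>

// Hypothetical sentinel for "shortest non-lossy"; the draft leaves its value open.
constexpr int kNonLossyPrecision = -1;
// Hypothetical upper limit for an explicit precision; not part of the draft.
constexpr int kMaxPrecision = 1074;

// Note: std::string may allocate, so an allocation failure inside this
// noexcept function would terminate the program.
std::string to_string_f(double d, int precision = kNonLossyPrecision,
                        char format = 'g') noexcept {
    // Reject unknown format characters ('\0' would match the terminator).
    if (format == '\0' || std::strchr("aAeEfFgG", format) == nullptr)
        return std::string();
    if (precision != kNonLossyPrecision &&
        (precision < 0 || precision > kMaxPrecision))
        return std::string();

    if (precision == kNonLossyPrecision) {
        const int digits = std::numeric_limits<double>::max_digits10;
        switch (format) {
            case 'g': case 'G': precision = digits; break;      // significant digits
            case 'e': case 'E': precision = digits - 1; break;  // digits after the point
            case 'a': case 'A': break;  // stays negative: printf treats it as omitted, exact for %a
            default: return std::string();  // 'f'/'F': left undefined in this sketch
        }
    }

    char spec[] = "%.*g";
    spec[3] = format;  // substitute the requested conversion character
    const int len = std::snprintf(nullptr, 0, spec, precision, d);
    if (len < 0) return std::string();
    std::string out(static_cast<std::size_t>(len), '\0');
    std::snprintf(&out[0], out.size() + 1, spec, precision, d);
    return out;
}
```

Because the sketch delegates to snprintf(), its output spelling follows sprintf() conventions (for example "1" and "1.2e-09") and it inherits the C locale handling, so it is not locale independent.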
Notes and open questions
--------------------------------------
-Usage examples and possible return values:
to_string_f(1.0) -> "1.0"
to_string_f(1.0, 10) -> "1.0"
to_string_f(1.0, 2, 'f') -> "1.00"
to_string_f(1.23456789e-9) -> "1.23456789e-9"
to_string_f(1.23456789e-9, 2) -> "1.2e-9"
to_string_f(1e300) -> "1e300"
-The essential question: is it feasible to implement the from_string(to_string(x)) == x requirement given all the possible floating point implementations permitted by the C++ standard? If not, can the condition be fulfilled by not requiring it for some corner cases such as NaNs (e.g. having only a single NaN instead of multiple)?
-Having int and char as parameters instead of a string such as "%.3f" is chosen for safety and simplicity: it eliminates the need for string parsing and avoids the possibility of errors in a user-defined format string.
-The shortest necessary string representation can't be obtained using sprintf() with maximal precision, but such an implementation would be acceptable because the shortest possible string is not required. This is chosen to avoid the need to define "shortest possible", which is not expected to have essential relevance for the usage of to_string_f(). Also, the effort required from library implementers to guarantee the shortest possible string on all standard-conforming C++ implementations might be unnecessarily high.
-For portability of the string representation, see the end of section 'Ideal'.
-Should the default value for precision be, say, -1 or an enum given the conventions used for enums in std-namespace?
-Handling of out-of-range parameter values should be revised and defined. The noexcept property should, however, be preserved.
-How to handle locale: implicitly use the same locale as sprintf(), or require that the output is locale independent? Locale dependency would cause portability issues, so locale-independent output would be desirable and in accordance with the design intention, as to_string_f() is not intended to do full-featured locale-aware formatting.
-Would to_string_f() be an appropriate name?
Details on examples
-----------------------------
Visual Basic (VS2015)
Dim aDouble As Double
aDouble = 0.00000000123456789
Dim s As String = aDouble.ToString()
Console.WriteLine(s)
C# (VS2015)
double a = 1.23456789e-9;
String s = a.ToString();
Console.Write(s);
boost::lexical_cast (boost 1.55)
auto s = boost::lexical_cast<std::string>(1.23456789e-9);
std::cout << s;
Java (sun-jdk-8u51)
Double a = 1.23456789e-9;
String s = a.toString();
System.out.println(s);
JavaScript (tested in Firefox 44)
var dA = 1.23456789e-9;
var sA = dA.toString();
Qt (5.4)
QString s = QString::number(1.23456789e-9);
auto s2 = QVariant(1.23456789e-9).toString();
References
----------
[1] N1803: Simple Numeric Access.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1803.html
[2] N1982: Simple Numeric Access Revision 1.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n1982.html
[3] N2408: Simple Numeric Access Revision 2.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2408.html
[4] How to convert a number to string and vice versa in C++.
http://stackoverflow.com/questions/5290089/how-to-convert-a-number-to-string-and-vice-versa-in-c
[5] How do I convert a double into a string in C++?
http://stackoverflow.com/questions/332111/how-do-i-convert-a-double-into-a-string-in-c
[6] How do I convert an integer to a string?
https://isocpp.org/wiki/faq/newbie#int-to-string
[7] QString::number().
http://doc.qt.io/qt-5/qstring.html#number-6