Deprecate to_string() for floating point types?

u97...@gmail.com

Nov 8, 2015, 9:07:26 AM
to ISO C++ Standard - Discussion
(bikeshedding note to avoid unwanted content: the topic is a somewhat small detail in the standard, and this may turn out to be bikeshedding for people with more significant responsibilities)

For starters, please consider an answer to the following question: what would you consider to be the string representation of the double value 1e-9?

C++11 added std::to_string(), which converts a numeric value to a string representation. Setting aside historical ballast from C, what would, generally speaking, be reasonable behaviour for such a function? Given that numeric literals are written as text in the source code, it's natural to expect that numeric_to_string() would produce a string from which the original numeric value could be recovered without losing information. What to_string() does with floating point values, however, is not merely lose information: it can in practice wipe out the numeric information completely, making it a potential, or I would argue likely, cause of bugs and a generally highly dubious implementation.

Examples:
1.
std::cout << std::to_string(1e-9);            // output: "0.000000"
std::cout << std::to_string(6.67408e-11);     // output: "0.000000"
std::cout << std::to_string(6.626070040e-34); // output: "0.000000"

All of the values are certainly different, but their string representations are identical; to_string() essentially wipes out the numeric information. It's also worth noting that the resulting string may even be longer than a valid and arguably most natural representation (e.g. "1e-9"), even though to_string() has destroyed the information.

2.
std::cout << std::to_string(1.23e100); // output:
"1229999999999999945619502435678791882061496502770990950045684429327960298864608335541984218516600989160291306221939122973741400364055485571676274743695192965637069768948118175959863951770799435358111025735195134313314113829815221797071926323389168215764573082356023275727273837119288529943287157489664.000000"
std::cout << std::to_string(1.23e-300); // output: "0.000000"
std::cout << 1.23e300;   // output: "1.23e+300"
std::cout << 1.23e-300;  // output: "1.23e-300"

The values above are quite similar: both have the same numeric precision, so it would be fair to expect their string representations to be similar as well. In addition, the representation of 1.23e300 is impractical due to its length. Note also the contrast with the default representation in std::cout.

3.
std::cout << 1e-9;                 // output: "1e-09"
std::cout << std::to_string(1e-9); // output: "0.000000"

Fundamentally different behaviour (i.e. not merely a precision or formatting difference) between default to-stream printing and to_string() is probably unexpected, as both lines essentially do (or should do) at least approximately the same thing: convert a double to a string representation.

Given these examples, the current definition of std::to_string() for floating point types can fairly be considered fundamentally flawed and a likely cause of bugs, especially as the original(?) papers (N1803, N2408) did not mention this problem in no way, indicating that this behaviour wasn't even thought about. Changing the behaviour is a breaking change, so deprecating might be the best way to minimize the damage.


Example code:

#include <iostream>
#include <string>

int main()
{
    std::cout << std::to_string(1e-9) << '\n';
    std::cout << std::to_string(6.67408e-11) << '\n';
    std::cout << std::to_string(6.626070040e-34) << '\n';

    std::cout << std::to_string(1.23e300) << '\n';
    std::cout << std::to_string(1.23e-300) << '\n';
    std::cout << 1.23e300 << '\n';
    std::cout << 1.23e-300 << '\n';

    std::cout << 1e-9 << '\n';
    std::cout << std::to_string(1e-9) << '\n';

    return 0;
}

u97...@gmail.com

Nov 8, 2015, 9:17:15 AM
to ISO C++ Standard - Discussion, u97...@gmail.com
Corrections:
  • first line in example 2 should have 1.23e300 instead of 1.23e100
  • "did not mention this problem in no way" -> "did not mention this problem at all"

Nicol Bolas

Nov 8, 2015, 10:13:12 AM
to ISO C++ Standard - Discussion, u97...@gmail.com
I see no reason to deprecate these functions. Sure, it's not a completely backwards-compatible change, but fixing the functions to behave more reasonably is preferable to adding a bunch of new functions and telling people not to use the old ones.

Also, it's really hard to write code where this change isn't backwards compatible, unless you're explicitly relying on the output having a particular number of characters.

Andrey Semashev

Nov 8, 2015, 4:54:13 PM
to std-dis...@isocpp.org
On 2015-11-08 17:07, u97...@gmail.com wrote:
> (bikeshedding note to avoid unwanted content
> <https://groups.google.com/a/isocpp.org/d/msg/std-discussion/A4fhAM0OlhY/hNV47qdgbzIJ>:
> topic is about a somewhat small detail in the standard and this may turn
> out to be bikeshedding for people with more significant responsibilities)
>
> For starters, please think an answer to the following question: what
> would you consider to be the string representation of double value 1e-9?

The standard defines the format of the string returned by to_string
rather clearly. In the case of double, the formatting is equivalent to
snprintf's %f format.

> Given
> that numeric literals are written as text in the source code, it's
> natural to expect that numeric_to_string() would produce such a string
> that the original numeric value could be obtained from the string
> without losing information.

I wouldn't say this is a fair expectation with regard to floating point
numbers. The standard doesn't give you that guarantee, even with
iostreams, and given how volatile FP numbers are, a lossless roundtrip
cannot be implied.

> What, however, to_string() does with
> floating point values is that it does not only lose information, but it
> can in practice wipe out the numeric information completely making it
> potential or, I would argue, likely cause of bugs and be generally a
> highly dubious implementation.

I'd say every design that relies on FP<->string<->FP to be a lossless
roundtrip is likely broken, whether it employs to_string/stod or
iostreams or C equivalents. Even if you set formatting flags correctly
(i.e. infinite width and precision, locale-agnostic formatting, etc.),
the formatting and parsing process itself may introduce error. Then
there are denormal FP values, which may or may not be rounded to zero
during any arithmetic operations that are applied during formatting and
parsing.

> Examples:
> 1.
> |
> std::cout <<std::to_string(1e-9);// output: "0.000000"
> std::cout <<std::to_string(6.67408e-11);// output: "0.000000"
> std::cout <<std::to_string(6.626070040e-34)// output: "0.000000"
> |
>
> All values are certainly different but their string representations are
> identical. As such to_string() essentially wipes out the numeric
> information. It's also worth noting that the resulting string may even
> be longer than a valid and likely the most natural string presentation
> (e.g. "1e-9") even though to_string() has destroyed the information.

Again, the standard defines the format of the strings, and it does not
allow for "1e-9" output. That does not make the function incorrect or
useless. Personally, I do find the std::fixed style more readable, even
though it leads to information loss as in the above. That makes
to_string more useful for diagnostic purposes to me.

If you want a different format then you can use snprintf or iostreams.

> 2.
> |
> std::cout <<std::to_string(1.23e100);// output:
> "1229999999999999945619502435678791882061496502770990950045684429327960298864608335541984218516600989160291306221939122973741400364055485571676274743695192965637069768948118175959863951770799435358111025735195134313314113829815221797071926323389168215764573082356023275727273837119288529943287157489664.000000"
> std::cout <<std::to_string(1.23e-300);// output:"0.000000"
> std::cout <<1.23e300;// " output: "1.23e+300"
> std::cout <<1.23e-300;// " output: "1.23e-300"
> |
>
> Values in above are quite similar: both have the same numeric precision
> and it would be fair to expect that their string representations would
> also be similar. In addition the representation of 1.23e300 is
> impractical due to it's length. Note also the contrast to default
> representation in std::cout.

The 1.23e300 number is not representable exactly as a double value. You
get information loss either way; it's just less apparent in the last two
lines. I even suspect the first line gives you a more accurate
representation of the number that you have in your code at run time.

> 3.
> |
> std::cout <<1e-9;// output: "1e-09"
> std::cout <<std::to_string(1e-9);// output: "0.000000"
> |
>
> Fundamentally different behaviour (i.e. not only precision or formatting
> difference) of default to-stream printing and to_string() is probably
> unexpected as both lines essentially does (or should do) at least
> approximately the same thing: convert double to string representation.

And so they do.

> With these examples the current definition of std::to_string() with
> floating points is fair to be considered fundamentally flawed and likely
> cause of bugs especially as the original(?) papers (N1803
> <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1803.html>,
> N2408
> <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2408.html>)
> did not mention this problem in no way indicating that this behaviour
> wasn't even though about. Changing the behaviour is a breaking change,
> so deprecating might be the best way to minimize the damages.

I don't think to_string is flawed, its behavior is well defined and
useful in certain contexts. I'm certainly not in favor of deprecating or
removing it.

FP numbers are tricky.

Howard Hinnant

Nov 8, 2015, 5:26:07 PM
to std-dis...@isocpp.org
On Nov 8, 2015, at 4:54 PM, Andrey Semashev <andrey....@gmail.com> wrote:
>
> I'd say every design that relies on FP<->string<->FP to be a lossless roundtrip is likely broken, whether it employs to_string/stod or iostreams or C equivalents. Even if you set formatting flags correctly (i.e. infinite width and precision, locale-agnostic formatting, etc.), the formatting and parsing process itself may introduce error.

std::hexfloat is pretty awesome, unless you’re a human… ;-)

Howard

u97...@gmail.com

Nov 8, 2015, 6:01:59 PM
to ISO C++ Standard - Discussion
@Nicol Bolas

I'm not a standards expert, so whether deprecation or modification would be better should be evaluated by more experienced people. Do you have an example of a function whose semantics have been changed in the past?

@Andrey Semashev

I'm quite astounded that someone finds nothing wrong in a value-to-string mapping such as 1e-9 -> "0.000000". I'll clarify a few points, although disagreement will likely remain:

1. There was no claim that the behaviour mandated by the standard was not clearly defined; the argument was that the definition is error prone and dubious.
2. I see no reason why floating point values couldn't be represented as text in a non-lossy way (please feel free to prove me wrong). Your job is actually easy: it's enough to give one double value that can't be presented as text which maps exactly back to the original double value (let's ignore NaNs etc. for now). Take the value 1.23e300, which you said "is not representable exactly as a double value": true, but the relevant part is that if you have a double defined in the source as 1.23e300, converting it to the string "1.23e300" (e.g. with the "%g" format) and then back to double gives exactly the same double value as the original; whether or not the original value can be exactly represented as a double is irrelevant. And "%g" needs fewer than 10 chars instead of over 300.
3. Whether exactly non-lossy or not isn't the main point; the values presented in the examples are not exceptional corner cases, and still to_string() fails to convert them reasonably.

Yes, floating point is tricky, and that's exactly why it shouldn't be made any trickier by implementations like this. The title of the referenced papers was "Simple numeric access", but it's not particularly simple if, instead of just using to_string(), one always has to check whether a given value is in the range where to_string() does not malfunction.

Andrey Semashev

Nov 8, 2015, 6:43:39 PM
to std-dis...@isocpp.org
On 2015-11-09 02:01, u97...@gmail.com wrote:
>
> I'm quite astounded that someone finds nothing wrong in value-to-string
> mapping such as 1e-9 -> "0.000000".

You're missing precision from the picture, while it plays the key role
in the task of representation. First, when you type 1e-9 in your source
code, that textual representation is transformed into a double (let's
assume it's a 64-bit IEEE 754 FP number), which is luckily possible
without loss (i.e. the double's precision is enough to hold that
number). Next you convert that double to a string with a given format,
%f, which also has a precision. It so happens that the format is
restricted to 6 digits after the decimal point and thus is not capable
of representing the original number. As a result you get the closest
representable number. There's nothing wrong or new about it; it's how
any conversion works. You will get a similar result if you try to store
1e-2000 into a double - the number will truncate to 0.

> I'll clarify a few point, although
> disagreement will likely remain:
>
> 1. There was no claim that the behaviour mandated by the standard was
> not clearly defined, the argument was that the definition was error
> prone and dubious.

Like I said, FP numbers are tricky (read - error prone). Care must be
taken when working with them, including when you format them. The
to_string function is well defined (not dubious) and has valid uses,
some of which I mentioned in my previous reply.

> 2. I see no reason why floating point values couldn't be represented as
> text in non-lossy way (please feel free to show me wrong). Your job is
> actually easy, it's enough to give one double value that can't be
> presented as text which would map exactly to the original double value
> (let's ignore NaN's etc. for now).

It's not a question of a particular number but rather a reasonable
implementation that is allowed by the standard.

> For example the value 1.23e300 which
> you said "is not representable exactly as a double value": true, but the
> relevant part is that if you have a double defined in the source as
> 1.23e300, converting it to string "1.23e300" e.g. by format "%g" and
> then back to double will give exactly the same double value as the
> original; whether or not the original value can be exactly represented
> as double is irrelevant.

I'm sorry, but no. What you're seeing in your example is just two
computational errors cancelling each other. That's
implementation-specific behavior at best.

While programming math, you care about exactly the numbers that your
program operates on. If you write 1.23e300 and expect your program to
operate on that number, then you have a bug. You may not care about the
precision error that occurs when your 1.23e300 is converted into a
double, but then why do you care when the same error is exposed when you
format the double to a string?

> And "%g" needs less than 10 chars instead of
> over 300 hundred.

Different formats have different pros and cons. At this point you're
basically saying that you like %g more than %f.

> 3. Whether exactly non-lossy or not isn't the main point; values
> presented in the examples are not exceptional corner cases and still
> to_string() fails to convert them reasonably.

Like I said, I find %f a quite reasonable format for diagnostic purposes
(i.e. when a human is supposed to read those numbers). For other
purposes I'd choose a different format, which might be less suited for
humans but offer other characteristics, such as conciseness and range of
representable values. But frankly, I find the latter cases vanishingly
rare.

Andrey Semashev

Nov 8, 2015, 6:44:45 PM
to std-dis...@isocpp.org
Hmm, I forgot about that flag. :)

Nicol Bolas

Nov 8, 2015, 8:17:04 PM
to ISO C++ Standard - Discussion

I suggest you take that up with pretty much every user of Collada or similar text-based technologies. Whatever your personal feelings on the matter may be, the facts on the ground are that round-tripping is important to many users.

Even if you set formatting flags correctly
(i.e. infinite width and precision, locale-agnostic formatting, etc.),
the formatting and parsing process itself may introduce error. Then
there are denormal FP values, which may or may not be rounded to zero
during any arithmetic operations that are applied during formatting and
parsing.

Yes, perfect round-tripping is not viable.

How about round-tripping that isn't ridiculously broken? It's one thing to have a value that loses a couple of digits of precision. It's quite another to lose the entire value.

We don't need the round trip to guarantee perfect precision reproduction. But that's no excuse for not improving what we have.

It should also be noted that most other languages have much better round-tripping precision than `to_string` provides.
 
> Examples:
> 1.
> |
> std::cout <<std::to_string(1e-9);// output: "0.000000"
> std::cout <<std::to_string(6.67408e-11);// output: "0.000000"
> std::cout <<std::to_string(6.626070040e-34)// output: "0.000000"
> |
>
> All values are certainly different but their string representations are
> identical. As such to_string() essentially wipes out the numeric
> information. It's also worth noting that the resulting string may even
> be longer than a valid and likely the most natural string presentation
> (e.g. "1e-9") even though to_string() has destroyed the information.

Again, the standard defines the format of the strings and it does not
allow for "1e-9" output. It does not make the function work incorrectly
or not useful. Personally, I do find the std::fixed style more readable,
even though it leads to information loss like in the above. That makes
to_string more useful for diagnostic purposes to me.

How useful that is for "diagnostic purposes" rather depends on what you are diagnosing, don't you think? I've been in situations in graphics work where getting only 2 digits of precision in text is not even close to acceptable even as a diagnostic.

Andrey Semashev

Nov 9, 2015, 4:04:34 AM
to std-dis...@isocpp.org
On 2015-11-09 04:17, Nicol Bolas wrote:
>
>
> On Sunday, November 8, 2015 at 4:54:13 PM UTC-5, Andrey Semashev wrote:
>
> I'd say every design that relies on FP<->string<->FP to be a lossless
> roundtrip is likely broken, whether it employs to_string/stod or
> iostreams or C equivalents.
>
>
> I suggest you take that up with pretty much every user of Collada or
> similar text-based technologies. Whatever your personal feelings on the
> matter may be, the facts on the ground are that round-tripping is
> /important/ to many users.

I'm not sure what exactly I have to take to those users. I don't think
the problems I'm pointing out will be a surprise for them.

I understand that the lossless roundtrip is desirable to some, and up to
some amount of error it is possible, given the required precautions are
taken. If you're willing to accept those errors, then go ahead. But you
have to be aware that you're losing something along the way, and that
something may affect the result of your program.

I say this design is likely broken because people (myself included)
typically assume no information loss across a (reliable) network channel
or serialization/deserialization roundtrip, where FP<->string<->FP
conversions are likely to happen. And that is a fair expectation, IMO -
loss should not happen in those areas.

> How about round-tripping that isn't ridiculously broken? It's one thing
> to have a value that loses a couple of digits of precision. It's quite
> another to lose /the entire value/.

No, it's really not. A loss is a loss; only with %f you have a constant
factor of error, and in the case of %g you get an error that depends on
the value itself. There will always be a case where the representation
loses more than you're willing to accept.

> We don't need the round trip to guarantee perfect precision
> reproduction. But that's no excuse for not /improving/ what we have.

I didn't say I'm against progress. :) What kind of improvement do you
have in mind?

> How useful that is for "diagnostic purposes" rather depends on what you
> are diagnosing, don't you think? I've been in situations in graphics
> work where getting only 2 digits of precision in text is not even close
> to acceptable even as a diagnostic.

Fair enough. If you're dealing with numbers that tend to be very large
or very small, applying %f directly may be suboptimal. Thing is, %g is
also not a very good choice, because this format is difficult to
comprehend. Can you immediately tell whether 1.23847623e10 is greater
than 1.23843754e11? I'd say an improved format would be a %g with a
fixed, user-definable exponent so that the two numbers could be
presented as 1.23847623e10 and 12.3843754e10.

Andrey Semashev

Nov 9, 2015, 4:09:57 AM
to std-dis...@isocpp.org
On 2015-11-09 12:04, Andrey Semashev wrote:
>
> I say this design is likely broken because people (myself included)
> typically assume no information loss across a (reliable) network channel
> or serialization/deserialization roundtrip, where FP<->string<->FP
> conversions are likely to happen. And that is a fair expectation, IMO -
> loss should not happen in those areas.

I meant "serialization/deserialization to a file" above.

Nicol Bolas

Nov 9, 2015, 8:31:29 AM
to ISO C++ Standard - Discussion


On Monday, November 9, 2015 at 4:04:34 AM UTC-5, Andrey Semashev wrote:
On 2015-11-09 04:17, Nicol Bolas wrote:
>
>
> On Sunday, November 8, 2015 at 4:54:13 PM UTC-5, Andrey Semashev wrote:
>
>     I'd say every design that relies on FP<->string<->FP to be a lossless
>     roundtrip is likely broken, whether it employs to_string/stod or
>     iostreams or C equivalents.
>
>
> I suggest you take that up with pretty much every user of Collada or
> similar text-based technologies. Whatever your personal feelings on the
> matter may be, the facts on the ground are that round-tripping is
> /important/ to many users.

I'm not sure what exactly I have to take to those users. I don't think
the problems I'm pointing out will be a surprise for them.

You said such designs are "broken". I pointed out that a huge number of people rely on such designs, and thus by your standards are relying on "broken" designs. Whether you like it or not, other people don't consider them to be "broken".
> How about round-tripping that isn't ridiculously broken? It's one thing
> to have a value that loses a couple of digits of precision. It's quite
> another to lose /the entire value/.

No, it's really not.

Yes, it is. Outputting 0.000000 when I gave it a non-zero number is not acceptable. It is far worse than outputting 0.000001 when I gave it 0.0000008.
 
A loss is a loss, only with %f you have a constant
factor of error

I'm not sure what you mean by "factor of error". The amount of precision you lose with `%f` is certainly not constant with respect to the size of the number. If you output 1234500.0f, you will lose no digits of precision. If you output `0.00012345`, you will lose two decimal digits of precision.

That's not "constant".

and in case of %g you get error that depends on the
value itself.

... How does it depend on the value? That's the whole point of %g; it retains X digits of precision, regardless of the input number.

There will always be the case when the representation
looses more than you're willing to accept.

> We don't need the round trip to guarantee perfect precision
> reproduction. But that's no excuse for not /improving/ what we have.

I didn't say I'm against progress. :) What kind of improvement do you
have in mind?

Anything that makes the actual precision loss better. Preferably where the precision loss has a maximum of 1-2 digits of least significance.

> How useful that is for "diagnostic purposes" rather depends on what you
> are diagnosing, don't you think? I've been in situations in graphics
> work where getting only 2 digits of precision in text is not even close
> to acceptable even as a diagnostic.

Fair enough. If you're dealing with numbers that tend to be very large
or very small, applying %f directly may be suboptimal. Thing is, %g is
also not a very good choice because this format is difficult to
comprehend. Can you immediately tell if 1.23847623e10 is greater than
1.23843754e11?

Yes. And even if I were the sort who would say "no", I can still tell which is bigger. It just takes a bit longer.

Consider instead 0.000000 and 0.000000. Can you tell which is bigger? Nobody can, no matter how long you look at it.

Comprehension may take longer with %g, but it is at least possible.
 
I'd say an improved format would be a %g with a fixed
user-definable e so that the two numbers could be presented as
1.23847623e10 and 12.3843754e10.

This goes to the point of having a generic "to_string" function that takes as few arguments as possible. That is, what exactly is the point of such a function?

It's the default case. It's the one that works. It's the one that's quick, simple, effective, doesn't require a lot of fiddling around, and it almost always gives you the right answer.

Outputting 0.000000 for an input value that had many digits of significance is not "working". It is not "effective". And it certainly is not the "right answer".

If a user wants pretty formatted output so that the numbers are all aligned and easy to read, or all have some exponent of precision or something, we have tools for that. `to_string` doesn't need to be that. It needs to work. And it currently does not.

snprintf can be used for special case needs. `to_string` is for when you need the value in a string. And outputting 0.000000 for 0.00000011232 is not putting the value into a string.

Andrey Semashev

Nov 9, 2015, 9:33:48 AM
to std-dis...@isocpp.org
On 2015-11-09 16:31, Nicol Bolas wrote:
>
>
> On Monday, November 9, 2015 at 4:04:34 AM UTC-5, Andrey Semashev wrote:
>
> On 2015-11-09 04:17, Nicol Bolas wrote:
> >
> >
> > On Sunday, November 8, 2015 at 4:54:13 PM UTC-5, Andrey Semashev
> wrote:
> >
> > I'd say every design that relies on FP<->string<->FP to be a
> lossless
> > roundtrip is likely broken, whether it employs to_string/stod or
> > iostreams or C equivalents.
> >
> >
> > I suggest you take that up with pretty much every user of Collada or
> > similar text-based technologies. Whatever your personal feelings
> on the
> > matter may be, the facts on the ground are that round-tripping is
> > /important/ to many users.
>
> I'm not sure what exactly I have to take to those users. I don't think
> the problems I'm pointing out will be a surprise for them.
>
>
> You said such designs are "broken". I pointed out that a huge number of
> people /rely/ on such designs, and thus by your standards are relying on
> "broken" designs. Whether you like it or not, other people don't
> consider them to be "broken".

Well, people do all kinds of crazy stuff, it doesn't make this stuff any
less crazy. :)

> > How about round-tripping that isn't ridiculously broken? It's one
> thing
> > to have a value that loses a couple of digits of precision. It's
> quite
> > another to lose /the entire value/.
>
> No, it's really not.
>
>
> Yes, it is. Outputting 0.000000 when I gave it a non-zero number is not
> acceptable.

Well, we disagree on that. My point is that whatever format you choose,
unless it is 100% perfect, you will always have a case where something
gets rounded and you don't get what you want.

> It is far worse than outputting 0.000001 when I gave it
> 0.0000008.

So is the problem in the rounding mode? What if the number is 0.0000001,
is it allowed to be formatted as 0.000000?

> A loss is a loss, only with %f you have a constant
> factor of error
>
>
> I'm not sure what you mean by "factor of error". The amount of error you
> lose with `%f` is certainly not constant with respect to the size of the
> number. If you output 1234500.0f, you will lose no digits of precision.
> If you output `0.00012345`, you will lose two decimal digits of precision.
>
> That's not "constant".

Maybe my wording was poor; English is not my native language, sorry.
What I mean is that whatever number you format with %f, you will get a
fixed precision up to the 6th digit after the decimal point. You lose
anything more precise than that. In other words, the represented number
is N±1e-6, and the representation error is constant. That is not the
case with %g, which only keeps a fixed portion of the most significant
digits and loses the rest (thus the amount of error depends on the number).

> Consider instead 0.000000 and 0.000000. Can you tell which is bigger?
> /Nobody/ can, no matter how long you look at it.

I can - the numbers are equal. And yes, this outcome, while probably
surprising to some, is perfectly logical. It's the same way '(int)1.2 ==
(int)1.3' holds true: you're comparing not the doubles but ints.

> Comprehension may take longer with %g, but it is at least /possible/.

You will have the same kind of problem with %g if you compare formatted
numbers 1e50+1 and 1e50.

Viacheslav Usov

Nov 9, 2015, 9:59:45 AM
to std-dis...@isocpp.org
On Sun, Nov 8, 2015 at 10:54 PM, Andrey Semashev <andrey....@gmail.com> wrote:

[...]

> I'd say every design that relies on FP<->string<->FP to be a lossless roundtrip is likely broken, whether it employs to_string/stod or iostreams or C equivalents.

Given that just about every floating point unit in the world operates in accordance with IEEE 754 [1], there is no good excuse for not having a standardized textual representation of floating point numbers that is lossless. At the very least, the C++ standard could require an implementation-defined format that is lossless when used within that particular implementation.

It should probably be more straightforward to extend [lex.fcon] by including a format or formats that can represent either IEEE 754 or implementation-specific numbers exactly. Then standard library converters can be defined in terms of that format or formats.

Cheers,
V.

Nicol Bolas

Nov 9, 2015, 10:05:52 AM
to ISO C++ Standard - Discussion

Nobody is asking for 100% perfect. What we're asking for is something that preserves digits as much as is possible.

Outputting 0.000000 for a non-zero number is not preserving significant digits as much as possible.

> It is far worse than outputting 0.000001 when I gave it
> 0.0000008.

So is the problem in the rounding mode? What if the number is 0.0000001,
is it allowed to be formatted as 0.000000?

>     A loss is a loss, only with %f you have a constant
>     factor of error
>
>
> I'm not sure what you mean by "factor of error". The amount of error you
> lose with `%f` is certainly not constant with respect to the size of the
> number. If you output 1234500.0f, you will lose no digits of precision.
> If you output `0.00012345`, you will lose two decimal digits of precision.
>
> That's not "constant".

Maybe my wording was poor; English is not my native language, sorry.
What I mean is that whatever number you format with %f you will get a
fixed precision up to 6-th digit after the decimal point.

And that might be fine... if I were using a fixed-point number. I am instead using a floating-point number. The entire purpose of which is that it retains a specific number of significant digits, regardless of the location of its digits relative to the decimal point. Obviously within the limits of the numeric precision of the floating point type, of course.

If your `to_string` for floating-point acts like `to_string` for fixed-point, you're doing it wrong.
 
You lose anything more precise than that. In other words, the represented number
is N±1e-6, and the representation error is constant. That is not the
case with %g, which only keeps a fixed portion of the most significant
digits and loses the rest (thus the amount of error depends on the number).

> Consider instead 0.000000 and 0.000000. Can you tell which is bigger?
> /Nobody/ can, no matter how long you look at it.

I can - the numbers are equal.

But one of them was in fact bigger than the other. Therefore, information has been lost.

Or to say it this way, if the floating point value of x is greater than the floating point value of y, then the to_string representation should preserve that information as much as is reasonable.
 
And yes, this outcome, while probably surprising to some, is
perfectly logical. The same way '(int)1.2 == (int)1.3' holds true
because you're comparing not the doubles but ints.

> Comprehension may take longer with %g, but it is at least /possible/.

You will have the same kind of problem with %g if you compare formatted
numbers 1e50+1 and 1e50.

But that's only because you broke floating-point precision. Within the boundaries of floating point precision, %g will print the right answer.

Thus far, the only positive defense you've been able to make for the current behavior is about visual formatting. It looks nicer and more readable to some people.

Nicol Bolas

unread,
Nov 9, 2015, 10:09:37 AM11/9/15
to ISO C++ Standard - Discussion
On Monday, November 9, 2015 at 9:59:45 AM UTC-5, Viacheslav Usov wrote:
On Sun, Nov 8, 2015 at 10:54 PM, Andrey Semashev <andrey....@gmail.com> wrote:

[...]

> I'd say every design that relies on FP<->string<->FP to be a lossless roundtrip is likely broken, whether it employs to_string/stod or iostreams or C equivalents.

Given that just about every floating point unit in the world operates in accordance with IEEE 754 [1], there is no good excuse for not having a standardized textual representation of floating point numbers that is lossless. At the very least, the C++ standard could require an implementation-defined format that is lossless when used within that particular implementation.

No, it is not.

It is not possible to convert every binary IEEE-754 representable number into a decimal version that will convert back to exactly that float. Just as there are many decimal float numbers that do not convert into an exact binary IEEE-754 number.

The only way to do such perfect textual conversion is to turn it into a hex-float. And that's not what we're talking about. We want `to_string` to convert it into a readable decimal number.

Oh, and the C++ standard does not require `float` or any other floating-point type to be IEEE-754.

We're not asking for perfection. We're asking for not-stupidity, not losing precision for arbitrary reasons. Not losing precision where precision does not have to be lost.

Jean-Marc Bourguet

unread,
Nov 9, 2015, 10:24:48 AM11/9/15
to ISO C++ Standard - Discussion
Le lundi 9 novembre 2015 16:09:37 UTC+1, Nicol Bolas a écrit :
On Monday, November 9, 2015 at 9:59:45 AM UTC-5, Viacheslav Usov wrote:
On Sun, Nov 8, 2015 at 10:54 PM, Andrey Semashev <andrey....@gmail.com> wrote:

[...]

> I'd say every design that relies on FP<->string<->FP to be a lossless roundtrip is likely broken, whether it employs to_string/stod or iostreams or C equivalents.

Given that just about every floating point unit in the world operates in accordance with IEEE 754 [1], there is no good excuse for not having a standardized textual representation of floating point numbers that is lossless. At the very least, the C++ standard could require an implementation-defined format that is lossless when used within that particular implementation.

No, it is not.

It is not possible to convert every binary IEEE-754 representable number into a decimal version that will convert back to exactly that float. Just as there are many decimal float numbers that do not convert into an exact binary IEEE-754 number.

Every non-infinite, non-NaN IEEE-754 number has an exact decimal representation.  The algorithm to print the "best" decimal representation (i.e. one that gives 0.1 and not something like 0.099999999987689) which will read back as the same FP value is even known; see _How to print floating-point numbers accurately_ by G. Steele and J. White, http://dl.acm.org/citation.cfm?id=93559, and I'd be surprised if that algorithm has not been improved since.  It does use multi-precision arithmetic, though, but it's already common for stdlib implementations to use multi-precision arithmetic when reading the number.

-- 
Jean-Marc

Viacheslav Usov

unread,
Nov 9, 2015, 10:30:42 AM11/9/15
to std-dis...@isocpp.org
On Mon, Nov 9, 2015 at 4:09 PM, Nicol Bolas <jmck...@gmail.com> wrote:

> It is not possible to convert every binary IEEE-754 representable number into a decimal version that will convert back to exactly that float.

You are disputing a statement that I never made.
 
> And that's not what we're talking about.

You are mistaken, because we are very obviously talking about that.

We're asking for not-stupidity, not losing precision for arbitrary reasons.

Base 10 is entirely arbitrary.

Cheers,
V.

Andrey Semashev

unread,
Nov 9, 2015, 10:31:49 AM11/9/15
to std-dis...@isocpp.org
On 2015-11-09 18:05, Nicol Bolas wrote:
>
> Maybe my wording was poor; English is not my native language, sorry.
> What I mean is that whatever number you format with %f you will get a
> fixed precision up to the 6th digit after the decimal point.
>
>
> And that might be fine... if I were using a /fixed-point/ number. I am
> instead using a /floating-point/ number. The entire purpose of which is
> that it retains a specific number of significant digits, regardless of
> the location of its digits relative to the decimal point. Obviously
> within the limits of the numeric precision of the floating point type,
> of course.
>
> If your `to_string` for floating-point acts like `to_string` for
> fixed-point, /you're doing it wrong/.

The problem is that humans are used to reading numbers in fixed point
format.

> > Consider instead 0.000000 and 0.000000. Can you tell which is
> bigger?
> > /Nobody/ can, no matter how long you look at it.
>
> I can - the numbers are equal.
>
>
> But one of them was in fact bigger than the other. Therefore,
> information has been lost.

Exactly. And it will always be lost as long as FP to string conversion
is lossy.

> Or to say it this way, if the floating point value of x is greater than
> the floating point value of y, then the to_string representation should
> preserve that information as much as is reasonable.

You can't define 'reasonable'. I mean, I find the current %f behavior
reasonable.

> > Comprehension may take longer with %g, but it is at least
> /possible/.
>
> You will have the same kind of problem with %g if you compare formatted
> numbers 1e50+1 and 1e50.
>
>
> But that's only because you broke floating-point precision. Within the
> boundaries of floating point precision, %g will print the right answer.

Sorry, I miscalculated the numbers. Here's the correct example:

#include <cstdio>

int main()
{
    double n1 = (1e15) + 1.0;
    double n2 = 1e15;

    std::printf("%g\n%g\n%d\n", n1, n2, (int)(n1 == n2));

    return 0;
}

The output is:

1e+15
1e+15
0

Nicol Bolas

unread,
Nov 9, 2015, 10:53:35 AM11/9/15
to ISO C++ Standard - Discussion
On Monday, November 9, 2015 at 10:30:42 AM UTC-5, Viacheslav Usov wrote:
On Mon, Nov 9, 2015 at 4:09 PM, Nicol Bolas <jmck...@gmail.com> wrote:

> It is not possible to convert every binary IEEE-754 representable number into a decimal version that will convert back to exactly that float.

You are disputing a statement that I never made.
 
> And that's not what we're talking about.

You are mistaken, because we are very obviously talking about that.

No, we aren't. If we just wanted to turn an IEEE-754 number into a string of text to be read by another program and converted back into a float, we wouldn't bother to use decimal at all. We'd use Base64 or whatever to encode the binary bytes directly.

The implicit assumption with `to_string` is that a human ought to be able to read the results. Just like every other float-to-string functionality that other languages provide.
 
We're asking for not-stupidity, not losing precision for arbitrary reasons.

Base 10 is entirely arbitrary.

Then take it up with the world's school systems for teaching everyone base 10. That's what people know, so that's what `to_string` ought to output.

Nicol Bolas

unread,
Nov 9, 2015, 11:16:38 AM11/9/15
to ISO C++ Standard - Discussion

If unnecessarily losing precision in float-to-string conversions is "reasonable" to you, then I would say that your definition of "reasonable" doesn't coincide with those of others.

And there's substantial evidence for that:

Lua:

print(0.000000001)

Yields:

1e-9

Python:

print(0.000000001)

Yields

1e-9

C#:

using System;

public class Test
{
    public static void Main()
    {
        Console.WriteLine("{0}", 0.000000001);
    }
}

Yields:

1e-9

Java:

import java.io.*;

class Ideone
{
    public static void main (String[] args) throws java.lang.Exception
    {
        System.out.println(Float.toString(0.000000001f));
    }
}

Yields:

1e-9

C++:

#include <iostream>
#include <string>
using namespace std;

int main() {
    cout << to_string(0.000000001f) << "\n";
    return 0;
}

Yields:

0.000000

The rest of the world seems to have decided what "reasonable" means. And it's not what `to_string` does.
 
>      > Comprehension may take longer with %g, but it is at least
>     /possible/.
>
>     You will have the same kind of problem with %g if you compare formatted
>     numbers 1e50+1 and 1e50.
>
>
> But that's only because you broke floating-point precision. Within the
> boundaries of floating point precision, %g will print the right answer.

Sorry, I miscalculated the numbers.

OK, you're right; %g won't always give the right answer.

However, I just checked this for both Lua and Python. And neither of them give the right answer either. It seems that the rest of the world decided that %g's failings were more tolerable than %f's.

Thiago Macieira

unread,
Nov 9, 2015, 12:24:50 PM11/9/15
to std-dis...@isocpp.org
On Monday 09 November 2015 12:04:30 Andrey Semashev wrote:
> Thing is, %g is
> also not a very good choice because this format is difficult

And %g is also insufficient. You want %.19g so that you have enough digits to
make the round-trip conversion lossless.

Then you'll get bug reports from your users that you're writing too many
digits and that some imprecise FP numbers like 1.3 are actually shown as a
very long number close to 1.3 but different from it (same FP number though).

This is not a wild prediction on my part. It has already happened. I changed
the QVariant conversion of double to string to %.19g and we got several bug
reports about the too-precise number. Our conclusion: we'll import a whole new
library into QtCore that is better at converting doubles to string than
snprintf.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358

Thiago Macieira

unread,
Nov 9, 2015, 12:32:30 PM11/9/15
to std-dis...@isocpp.org
On Monday 09 November 2015 15:59:43 Viacheslav Usov wrote:
> Given that just about every floating point unit in the world operates in
> accordance with IEEE 754 [1], there is no good excuse for not having a
> standardized textual representation of floating point numbers that is
> loseless.

Yes, there is. See these bug reports about using a too-precise number:

https://bugreports.qt.io/browse/QTBUG-47371
https://bugreports.qt.io/browse/QTBUG-47192
https://bugreports.qt.io/browse/QTBUG-47575

Those were caused by fixing
https://bugreports.qt.io/browse/QTBUG-42574

which asked for lossless conversion. Once we implemented the lossless
conversion, people started complaining that the FP number they were getting
was not the number they were expecting.

u97...@gmail.com

unread,
Nov 9, 2015, 12:40:28 PM11/9/15
to ISO C++ Standard - Discussion, u97...@gmail.com
@ Nicol Bolas

Don't let Andrey fool you with the nonsense ;) Of course %g will "fail" in the example because it has 6-digit precision by default; using e.g. "%.16g" shows the whole value, and it can often, but not always, avoid showing the redundant digits (see example below).

Example:

#include <cstdio>
#include <iostream>

int main()
{
    char buffer[32];
    sprintf(buffer, "%f", 123.3);
    std::cout << buffer << '\n';
    sprintf(buffer, "%g", 123.3);
    std::cout << buffer << '\n';
    sprintf(buffer, "%.16g", 123.3);
    std::cout << buffer << '\n';
    sprintf(buffer, "%.19g", 123.3);
    std::cout << buffer << '\n';

    double n1 = (1e15) + 1.0;
    double n2 = 1e15;

    std::printf("%.19g\n%.19g", n1, n2);
}


Output
123.300000
123.3
123.3
123.2999999999999972
1000000000000001
1000000000000000




loic.act...@numericable.fr

unread,
Nov 9, 2015, 12:42:31 PM11/9/15
to Thiago Macieira, std-dis...@isocpp.org
The situation might be different: you are talking about a UI framework that displays strings to the end user. std::to_string is a more general facility, and I think round-tripping is important in this situation.

---
Loïc



---- Original message ----
From: "Thiago Macieira" <thi...@macieira.org>
To: std-dis...@isocpp.org
Subject: Re: [std-discussion] Deprecate to_string() for floating point types?
Date: 09/11/2015 18:32:20 CET

Jean-Marc Bourguet

unread,
Nov 9, 2015, 12:46:12 PM11/9/15
to ISO C++ Standard - Discussion, thi...@macieira.org
Again, since at least Steele's paper it has been known how to produce strings which read back as the original yet are not surprising to the user. I'd not be surprised if Scheme or Java mandated their use, BTW.  And IIRC Lawrence Crowl hinted in the numerics group that he wanted to have such a thing standardized as well; I don't remember having seen a paper.

-- 
Jean-Marc

Andrey Semashev

unread,
Nov 9, 2015, 12:46:53 PM11/9/15
to std-dis...@isocpp.org
On 2015-11-09 20:40, u97...@gmail.com wrote:
> @ Nicol Bolas
>
> Don't let Andrey fool you with the nonsense ;)

Fooling anyone is the opposite of what I was trying to do in this
discussion. Oh, well...

Viacheslav Usov

unread,
Nov 9, 2015, 12:48:52 PM11/9/15
to std-dis...@isocpp.org
On Mon, Nov 9, 2015 at 4:53 PM, Nicol Bolas <jmck...@gmail.com> wrote:

> No, we aren't.

I am talking about that. I am part of 'we'. Can you complete this syllogism?

> If we just wanted to turn an IEEE-754 number into a string of text to be read by another program and converted back into a float, we wouldn't bother to use decimal at all. We'd use Base64 or whatever to encode the binary bytes directly.

I would have no problem with that, if it were standardized and portable. The problem is that there is no standard way to represent floating point numbers as literals and strings exactly and portably. Which is extremely weird given that modern hardware de facto uses the same standard for floating point numbers.

> Then take it up with the world's school systems for teaching everyone base 10.

This is a non sequitur, because we are not talking about everyone.

> That's what people know, so that's what `to_string` ought to output.

People know a few other things, too. Feel free to apply your reasoning to this premise.

The need to represent floating point literals in C++ exactly is different from the need to represent floating point numbers as taught by the world's school system. And by the way, the subject of this thread is about deprecating to_string, so to_string can happily continue losing precision.

Cheers,
V.

Viacheslav Usov

unread,
Nov 9, 2015, 12:54:33 PM11/9/15
to std-dis...@isocpp.org
On Mon, Nov 9, 2015 at 6:46 PM, Jean-Marc Bourguet <jm.bo...@gmail.com> wrote:
Again, since at least Steele's paper it has been known how to produce strings which read back as the original yet are not surprising to the user. I'd not be surprised if Scheme or Java mandated their use, BTW.  And IIRC Lawrence Crowl hinted in the numerics group that he wanted to have such a thing standardized as well; I don't remember having seen a paper.

I have not read the paper, but you mentioned certain limitations, such as no infinities or NaNs. There is no reason to have those limitations. Just for the record.

Cheers,
V.

Viacheslav Usov

unread,
Nov 9, 2015, 12:56:53 PM11/9/15
to std-dis...@isocpp.org
On Mon, Nov 9, 2015 at 6:32 PM, Thiago Macieira <thi...@macieira.org> wrote:
On Monday 09 November 2015 15:59:43 Viacheslav Usov wrote:
> Given that just about every floating point unit in the world operates in
> accordance with IEEE 754 [1], there is no good excuse for not having a
> standardized textual representation of floating point numbers that is
> loseless.

Yes, there is. See these bug reports about using a too-precise number:

I did not suggest making a breaking change anywhere.

Cheers,
V.

Nicol Bolas

unread,
Nov 9, 2015, 1:28:52 PM11/9/15
to ISO C++ Standard - Discussion


On Monday, November 9, 2015 at 12:48:52 PM UTC-5, Viacheslav Usov wrote:
On Mon, Nov 9, 2015 at 4:53 PM, Nicol Bolas <jmck...@gmail.com> wrote:
> That's what people know, so that's what `to_string` ought to output.

People know a few other things, too. Feel free to apply your reasoning to this premise.

The need to represent floating point literals in C++ exactly is different from the need to represent floating point numbers as taught by the world's school system.

Right. That's why I'm wondering why you brought it up in a discussion about `to_string`. The purpose of `to_string` is not to guarantee a full round-trip with no loss of precision. The purpose of `to_string` is to be a simple and effective way of making a numerical value printable and readable.
 
And by the way, the subject of this thread is about deprecating to_string, so to_string can happily continue losing precision.

No, the thread is about the problems of `to_string`, with the suggestion to deprecate it in favor of something else as a solution.

Nicol Bolas

unread,
Nov 9, 2015, 1:41:12 PM11/9/15
to ISO C++ Standard - Discussion


On Monday, November 9, 2015 at 12:24:50 PM UTC-5, Thiago Macieira wrote:
On Monday 09 November 2015 12:04:30 Andrey Semashev wrote:
> Thing is, %g is
> also not a very good choice because this format is difficult

And %g is also insufficient. You want %.19g so that you have enough digits to
make the round-trip conversion lossless.

Then you'll get bug reports from your users that you're writing too many
digits and that some imprecise FP numbers like 1.3 are actually shown as a
very long number close to 1.3 but different from it (same FP number though).

This is not a wild prediction on my part. It has already happened. I changed
the QVariant conversion of double to string to %.19g and we got several bug
reports about the too-precise number. Our conclusion: we'll import a whole new
library into QtCore that is better at converting doubles to string than
snprintf.

Hmm... That's unfortunate, but understandable.

So if we want to improve `to_string`, we need to come up with a way to format floats that is compact, while still representing the value.

I think this would best be done by making it a QOI issue, with the standard making it clear that the resulting string from `to_string` should have "at least X significant digits", and that the resulting string should be formatted so that it can be fed back through stof/d/ld.

Matthew Woehlke

unread,
Nov 9, 2015, 2:17:24 PM11/9/15
to std-dis...@isocpp.org
On 2015-11-09 12:24, Thiago Macieira wrote:
> On Monday 09 November 2015 12:04:30 Andrey Semashev wrote:
>> Thing is, %g is
>> also not a very good choice because this format is difficult
>
> And %g is also insufficient. You want %.19g so that you have enough digits to
> make the round-trip conversion lossless.
>
> Then you'll get bug reports from your users that you're writing too many
> digits and that some imprecise FP numbers like 1.3 are actually shown as a
> very long number close to 1.3 but different from it (same FP number though).
>
> This is not a wild prediction on my part. It has already happened. I changed
> the QVariant conversion of double to string to %.19g and we got several bug
> reports about the too-precise number. Our conclusion: we'll import a whole new
> library into QtCore that is better at converting doubles to string than
> snprintf.

Heh. Please make that library available to inkscape, which has the same
problem ;-).

I think what we humans really want is the decimal floating-point (%g)
number that has as few digits as possible while not altering the binary
representation more than 1-2 of the lowermost bits of precision. Note
that that includes preferring "1.5" over "1.500000"; trailing zeros are
not always desirable.

--
Matthew

Thiago Macieira

unread,
Nov 9, 2015, 2:58:34 PM11/9/15
to loic.act...@numericable.fr, std-dis...@isocpp.org
On Monday 09 November 2015 18:42:28 loic.act...@numericable.fr wrote:
> The situation might be different: You are talking about a UI framework, that
> displays strings to the end user. std::to_string is a more general
> facility. And I think round-trip is something important in this situation.

No, I'm not. I'm talking about a change made in QtCore, inside the QVariant
code. There's nothing GUI about that, but it affected user-visible strings.

That's a very good parallel to std::to_string.