floating-point to/from_chars: halfway values, underspecification, buffer size


Edward Catmur

Apr 10, 2017, 5:52:42 AM
to ISO C++ Standard - Discussion
Firstly, I want to say that I think to_chars and from_chars are a great addition to the Standard and I look forward to using them in C++17. I have a few questions regarding their behavior on floating point types.

(As background for the first few questions: for each floating-point type there are a (relatively) small number of large integers that are exactly halfway between two adjacent values of that type, and which have a relatively short scientific decimal representation. For example, 1e23 has hexadecimal floating-point representation 0x1.52d02c7e14af68p76, which is exactly halfway between the adjacent IEEE 754 (64-bit) double values 0x1.52d02c7e14af6p76 and 0x1.52d02c7e14af7p76. Parsing the string "1e23" into double using from_chars [utility.from.chars] is required to produce one of those two values.)
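The halfway claim can be checked with exact integer arithmetic. Since 10^23 = 5^23 * 2^23, only the odd factor 5^23 decides whether the value fits a significand. The sketch below assumes an IEEE 754 binary64 `double` (53-bit significand); the helper names are illustrative and not part of any library.

```cpp
#include <cstdint>

// Exact integer power of five; 5^23 still fits in 64 bits.
constexpr std::uint64_t pow5(unsigned n) {
    std::uint64_t result = 1;
    for (unsigned i = 0; i < n; ++i) result *= 5;
    return result;
}

// Number of bits needed to write v in binary.
constexpr int bits_needed(std::uint64_t v) {
    int bits = 0;
    for (; v != 0; v >>= 1) ++bits;
    return bits;
}

constexpr std::uint64_t odd_part = pow5(23);  // 10^23 == odd_part * 2^23

// 54 bits and odd: one bit too many for a 53-bit significand, and that
// extra bit is set, so 10^23 lies exactly halfway between two doubles.
static_assert(bits_needed(odd_part) == 54);
static_assert(odd_part % 2 == 1);

// The neighbours have the 53-bit significands (odd_part -/+ 1) / 2,
// each scaled by 2^24. Every step here is exact in binary64.
constexpr double two_pow_24 = 16777216.0;
constexpr double below_1e23() {
    return static_cast<double>((odd_part - 1) / 2) * two_pow_24;
}
constexpr double above_1e23() {
    return static_cast<double>((odd_part + 1) / 2) * two_pow_24;
}
```

Under these assumptions `below_1e23()` and `above_1e23()` are the two hexadecimal values quoted above, and they differ by exactly 2^24.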

Firstly, is from_chars expected to have idempotent behavior, or is it allowed to be dependent on e.g. floating-point environment or the use of 80-bit floating point (32-bit Linux on x86)?

Secondly, is from_chars expected or encouraged to have the same behavior as the compiler? i.e. for double d; auto s = "1e23" should we expect 1e23 == (from_chars(s, s + 4, d), d) ever to fail?

Most importantly, is to_chars permitted to produce an overlong output where the shorter output round-trips on the same implementation but is not guaranteed to do so globally? For example, if an implementation always reads "1e23" as 0x1.52d02c7e14af6p76, is it permitted to output 0x1.52d02c7e14af6p76 as "9.999999999999999e22" on the basis that this is guaranteed to be read correctly by a different implementation that might read "1e23" as 0x1.52d02c7e14af7p76?

In addition, I would be interested in knowing whether the following underspecification is intentional:

Is the result of to_chars() required to represent the closest to the input value among strings of that length that round-trip? For example 0x1.0000000000001p0 is approx. 1.000000000000000222045, so is 1.0000000000000003 an acceptable output from to_chars, or only 1.0000000000000002? Or consider the smallest positive subnormal IEEE double, 0x1p-1074, approx. 4.94e-324 - is 4e-324 an acceptable output, or only 5e-324? (In Florian Loitsch [1], this is the "closeness" property of Grisu3.)

I hope the above questions don't come across as overly pedantic; I would be perfectly satisfied to be told that all of the above are QOI matters, but I'd hope to know what to expect before retiring our current code using Google double-conversion[2].

Finally, it would be useful to know the minimum buffer size necessary to guarantee successful conversion in all cases. I would guess this is something like 4 + numeric_limits<T>::max_digits10 + max(log10(numeric_limits<T>::max_exponent10), 1 + log10(-numeric_limits<T>::min_exponent10)) but it would be useful to have confirmation of this calculation or indeed to have it available in the Standard as a constant.

Thanks!

Bo Persson

Apr 10, 2017, 7:16:20 AM
to std-dis...@isocpp.org
The results are not required to be portable, only to roundtrip on the
same implementation. The standard says:

"The functions that take a floating-point value but not a precision
parameter ensure that the string representation consists of the smallest
number of characters such that there is at least one digit before the
radix point (if present) and parsing the representation using the
corresponding from_chars function recovers value exactly.

[Note: This guarantee applies only if to_chars and from_chars are
executed on the same implementation. —end note ]"
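A minimal sketch of the guarantee quoted above: convert with the precision-less `to_chars` overload, parse with `from_chars` on the same implementation, and compare. It assumes a standard library that ships the floating-point `<charconv>` overloads; the function names are illustrative.

```cpp
#include <cassert>
#include <charconv>
#include <string>
#include <system_error>

// Shortest representation chosen by the implementation (no precision argument).
std::string shortest_repr(double value) {
    char buffer[64];  // generous for double; no terminating NUL is written
    const auto result = std::to_chars(buffer, buffer + sizeof buffer, value);
    assert(result.ec == std::errc{});
    return std::string(buffer, result.ptr);
}

// True if from_chars on the same implementation recovers the value exactly.
bool round_trips(double value) {
    const std::string text = shortest_repr(value);
    double parsed = 0.0;
    const auto result =
        std::from_chars(text.data(), text.data() + text.size(), parsed);
    return result.ec == std::errc{}
        && result.ptr == text.data() + text.size()
        && parsed == value;
}
```

The check says nothing about a second implementation: it only exercises the same-implementation guarantee from the note.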


Bo Persson



Edward Catmur

Apr 10, 2017, 9:05:49 AM
to std-dis...@isocpp.org
Thanks. My question (with regard to the round-trip guarantee) was whether a conforming implementation is *permitted* to ensure portability.

In other words, is a conforming implementation permitted to prioritise portability over the shortest-string guarantee? This does not seem clear to me from the quoted text.

Nicol Bolas

Apr 10, 2017, 9:29:11 AM
to ISO C++ Standard - Discussion

It seems clear enough to me: the implementation shall generate the smallest number of characters to permit round-tripping. If the "smallest number of characters" is insufficient to guarantee inter-implementation portability, then inter-implementation portability isn't going to happen.

"Smallest" means smallest.

Edward Catmur

Apr 10, 2017, 10:51:36 AM
to std-dis...@isocpp.org
OK, I see your point. That's a little unfortunate.

I wonder, would that change if from_chars has nondeterministic behavior or if its behavior is dependent on environment or configuration settings?

Nicol Bolas

Apr 10, 2017, 11:32:11 AM
to ISO C++ Standard - Discussion
On Monday, April 10, 2017 at 10:51:36 AM UTC-4, Edward Catmur wrote:
> I wonder, would that change if from_chars has nondeterministic behavior or if its behavior is dependent on environment or configuration settings?

`from_chars` is required to have deterministic behavior (round-tripping), within its implementation. Matters of "environment" or "configuration settings" define what the implementation is. So those are extra-specification matters.

Edward Catmur

Apr 10, 2017, 5:41:13 PM
to std-dis...@isocpp.org


On 10 Apr 2017 16:32, "Nicol Bolas" <jmck...@gmail.com> wrote:
> On Monday, April 10, 2017 at 10:51:36 AM UTC-4, Edward Catmur wrote:
>> I wonder, would that change if from_chars has nondeterministic behavior or if its behavior is dependent on environment or configuration settings?
>
> `from_chars` is required to have deterministic behavior (round-tripping), within its implementation.

I'm not sure I understand. Certainly to_chars followed by from_chars is required to round-trip. But why does it follow that from_chars is required to be deterministic? Surely it is only required to be deterministic on values in the codomain of to_chars?

> Matters of "environment" or "configuration settings" define what the implementation is. So those are extra-specification matters.

OK, I see that for compiler flags etc. What about floating point environment; isn't that within the purview of the standard? 


Richard Smith

Apr 10, 2017, 7:00:40 PM
to std-dis...@isocpp.org
On 10 April 2017 at 14:41, 'Edward Catmur' via ISO C++ Standard - Discussion <std-dis...@isocpp.org> wrote:
> On 10 Apr 2017 16:32, "Nicol Bolas" <jmck...@gmail.com> wrote:
>> On Monday, April 10, 2017 at 10:51:36 AM UTC-4, Edward Catmur wrote:
>>> I wonder, would that change if from_chars has nondeterministic behavior or if its behavior is dependent on environment or configuration settings?
>>
>> `from_chars` is required to have deterministic behavior (round-tripping), within its implementation.
>
> I'm not sure I understand. Certainly to_chars followed by from_chars is required to round-trip. But why does it follow that from_chars is required to be deterministic? Surely it is only required to be deterministic on values in the codomain of to_chars?
>
>> Matters of "environment" or "configuration settings" define what the implementation is. So those are extra-specification matters.
>
> OK, I see that for compiler flags etc. What about floating point environment; isn't that within the purview of the standard?

Well, <cfenv> is part of the C++ standard, but it occupies a somewhat dubious position since a conforming C++ implementation is not required to support "#pragma STDC FENV_ACCESS ON", and without that, the functions to modify the floating-point environment in <cfenv> (might) result in UB.

My reading of the current wording is: to_chars must produce a string that round-trips, regardless of changes to global state between the to_chars call and the from_chars call (even across invocations of the program, as far as I can see). Thus if a particular implementation supports modifying the floating-point environment at runtime, and from_chars depends on the floating-point environment, then to_chars must produce a[1] shortest representation that will produce the correct value regardless of the floating-point environment at the point of the call to from_chars.

I don't know if that matches the intent or not (or even whether this was considered).

 [1]: I see no requirement as to how to choose between multiple shortest representations, nor that this choice even be deterministic.

Nicol Bolas

Apr 10, 2017, 7:46:21 PM
to ISO C++ Standard - Discussion
On Monday, April 10, 2017 at 5:41:13 PM UTC-4, Edward Catmur wrote:
> On 10 Apr 2017 16:32, "Nicol Bolas" <jmck...@gmail.com> wrote:
>> On Monday, April 10, 2017 at 10:51:36 AM UTC-4, Edward Catmur wrote:
>>> I wonder, would that change if from_chars has nondeterministic behavior or if its behavior is dependent on environment or configuration settings?
>>
>> `from_chars` is required to have deterministic behavior (round-tripping), within its implementation.
>
> I'm not sure I understand. Certainly to_chars followed by from_chars is required to round-trip. But why does it follow that from_chars is required to be deterministic? Surely it is only required to be deterministic on values in the codomain of to_chars?

It depends on what you mean by "deterministic". The specification says that a `to_chars`/`from_chars` loop "recovers `value` exactly". I can't think of something that would be more "deterministic" than that.

After all, the standard doesn't specify the exact representation of floating-point values. So there's no way that it can guarantee the exact behavior when given a particular string, except to say that it will represent that value and that if it were generated by `to_chars`, you'll get the exact same value back.

Richard Smith

Apr 10, 2017, 9:25:14 PM
to std-dis...@isocpp.org
On 10 April 2017 at 16:46, Nicol Bolas <jmck...@gmail.com> wrote:
> On Monday, April 10, 2017 at 5:41:13 PM UTC-4, Edward Catmur wrote:
>> On 10 Apr 2017 16:32, "Nicol Bolas" <jmck...@gmail.com> wrote:
>>> On Monday, April 10, 2017 at 10:51:36 AM UTC-4, Edward Catmur wrote:
>>>> I wonder, would that change if from_chars has nondeterministic behavior or if its behavior is dependent on environment or configuration settings?
>>>
>>> `from_chars` is required to have deterministic behavior (round-tripping), within its implementation.
>>
>> I'm not sure I understand. Certainly to_chars followed by from_chars is required to round-trip. But why does it follow that from_chars is required to be deterministic? Surely it is only required to be deterministic on values in the codomain of to_chars?
>
> It depends on what you mean by "deterministic". The specification says that a `to_chars`/`from_chars` loop "recovers `value` exactly". I can't think of something that would be more "deterministic" than that.

Guaranteeing that from_chars always produces the same floating-point value from the same sequence of characters would be more deterministic than that.

> After all, the standard doesn't specify the exact representation of floating-point values. So there's no way that it can guarantee the exact behavior when given a particular string, except to say that it will represent that value and that if it were generated by `to_chars`, you'll get the exact same value back.


Jens Maurer

Apr 11, 2017, 8:15:42 AM
to std-dis...@isocpp.org

Thanks for your questions.

On 04/10/2017 11:52 AM, Edward Catmur wrote:
> Firstly, I want to say that I think to_chars and from_chars are a
> great addition to the Standard and I look forward to using them in
> C++17. I have a few questions regarding their behavior on floating
> point types.

As an initial comment, I'd like to point out that C++ is woefully
underspecified in its floating-point semantics, so there's lots
of room for QoI.

The round-trip guarantees for to_chars / from_chars are for the
same implementation only, because a different implementation might
not even have enough bits in their "double" to represent the
original number.

> Firstly, is from_chars expected to have idempotent behavior, or is it
> allowed to be dependent on e.g. floating-point environment or the use
> of 80-bit floating point (32-bit Linux on x86)?

The question seems to be whether the floating-point environment
is part of the (allowed and unavoidable) implementation divergence
under the hood, or whether it's exposed. Richard makes a good
point here: We do expose the floating-point environment using
<cfenv>, and I don't think there's anything in a string <-> double
conversion that would intrinsically depend on the floating-point
environment (as opposed to, say, floating-point multiplication,
where the rounding mode is taken into consideration).

So, my understanding is that from_chars applied to a given string
should always yield the same "double" value, regardless of floating-
point environment. (That might mean for the implementation to
temporarily switch back to "round to nearest" while parsing the
string.)
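One way to picture "temporarily switch back to round to nearest" is a scope guard around the parsing call. This is only a sketch: `std::strtod` stands in for `from_chars`, the class name is hypothetical, and modifying the floating-point environment without `#pragma STDC FENV_ACCESS ON` support is itself on shaky ground.

```cpp
#include <cassert>
#include <cfenv>
#include <cstdlib>

// Hypothetical RAII helper: force round-to-nearest, restore on scope exit.
class round_to_nearest_scope {
public:
    round_to_nearest_scope() : saved_mode_(std::fegetround()) {
        const int status = std::fesetround(FE_TONEAREST);
        assert(status == 0);  // fesetround returns 0 on success
        (void)status;
    }
    ~round_to_nearest_scope() { std::fesetround(saved_mode_); }

    round_to_nearest_scope(const round_to_nearest_scope&) = delete;
    round_to_nearest_scope& operator=(const round_to_nearest_scope&) = delete;

private:
    int saved_mode_;
};

// Parse independently of the caller's rounding mode.
// strtod is a stand-in for from_chars (an assumption of this sketch).
double parse_in_round_to_nearest(const char* text) {
    round_to_nearest_scope scope;
    return std::strtod(text, nullptr);
}
```

With a guard like this the caller's rounding mode is unchanged after the call, at the cost of two environment switches per conversion.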

(No, that particular question was not considered before.)

Regarding the 80-bit FP on x86 issue, this seems dubious to me to
start with, because storing a "double" computation result in
(e.g. volatile) memory might mean that the double value I read
from that memory doesn't compare equal to the (in-register)
computation result. And of course, it's totally uncontrollable
when the compiler decides to spill some values to memory. So,
arithmetic in 80-bit precision seems ok, but once comparisons
with other doubles happen, it seems least surprising to
truncate / round to 64-bit at that point.

That said, I'm not sure which accidents you envision when
80-bit floating-point numbers are used in from_chars returning
a double.

> Secondly, is from_chars expected or encouraged to have the same
> behavior as the compiler? i.e. for double d; auto s = "1e23" should
> we expect 1e23 == (from_chars(s, s + 4, d), d) ever to fail?

It's desirable, yes, but not prescribed. We also allow
"constexpr" floating-point evaluations (at compile-time)
to yield different results from runtime evaluations of
the same expression.
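The comparison in question can be written out as follows. This is a sketch that assumes floating-point `from_chars` is available; the helper names are illustrative, and the result of `literal_agrees_with_from_chars()` is deliberately left unstated because nothing in the wording fixes it.

```cpp
#include <cassert>
#include <charconv>
#include <string_view>
#include <system_error>

// Parse a complete string as double using the library's runtime conversion.
double parse_double(std::string_view text) {
    double value = 0.0;
    const auto result =
        std::from_chars(text.data(), text.data() + text.size(), value);
    assert(result.ec == std::errc{});
    assert(result.ptr == text.data() + text.size());
    return value;
}

// Does the compiler's conversion of the literal agree with the library's
// conversion of the same characters for the halfway case?
bool literal_agrees_with_from_chars() {
    return 1e23 == parse_double("1e23");
}
```

What can be relied on for an IEEE 754 binary64 `double` is only that `parse_double("1e23")` is one of the two neighbouring values.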

If you feel that a non-normative note would be helpful,
this could be accommodated, I believe.

> Most importantly, is to_chars permitted to produce an overlong output
> where the shorter output round-trips on the same implementation but
> is not guaranteed to do so globally?

No, it's required to produce the shortest output so that it can
round-trip on the same implementation.

> For example, if an
> implementation always reads "1e23" as 0x1.52d02c7e14af6p76, is it
> permitted to output 0x1.52d02c7e14af6p76 as "9.999999999999999e22" on
> the basis that this is guaranteed to be read correctly by a different
> implementation that might read "1e23" as 0x1.52d02c7e14af7p76?

No, 1e23 is shorter than the 9.99..999e22 number you gave, so 1e23
takes precedence.

Note that your argument is flawed in that another implementation might
use 128-bit decimal floating-point numbers, where "9.999999999999999e22"
is actually a different number than 1e23, so the round-trip guarantee
is violated right then and there.
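For binary64 specifically, the two spellings can be compared directly. The sketch below uses `std::strtod` as a stand-in for `from_chars` and assumes a correctly rounded conversion in the default round-to-nearest mode; the helper names are illustrative.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Parse a complete string with strtod (stand-in for from_chars).
double parse_with_strtod(const char* text) {
    char* end = nullptr;
    const double value = std::strtod(text, &end);
    assert(end == text + std::strlen(text));  // whole string consumed
    return value;
}

// True if both decimal strings convert to the same double.
bool same_double(const char* a, const char* b) {
    return parse_with_strtod(a) == parse_with_strtod(b);
}
```

Under those assumptions `same_double("1e23", "9.999999999999999e22")` holds, because the tie at 1e23 is broken towards the lower neighbour. An implementation that resolved the tie upwards would make the two strings differ, which is the portability concern raised in the question.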

Should we give non-normative encouragement to round towards zero
for these cases, so that we get practical portability across
64-bit IEEE double platforms?

> In addition, I would be interested in knowing whether the following
> underspecification is intentional:
>
> Is the result of to_chars() required to represent the closest to the
> input value among strings of that length that round-trip? For example
> 0x1.0000000000001p0 is approx. 1.000000000000000222045, so is
> 1.0000000000000003 an acceptable output from to_chars, or only
> 1.0000000000000002? Or consider the smallest positive subnormal IEEE
> double, 0x1p-1074, approx. 4.94e-324 - is 4e-324 an acceptable
> output, or only 5e-324? (In Florian Loitsch [1], this is the
> "closeness" property of Grisu3.)

If the output has a choice between two strings that are both
equally short and would both round-trip to the same (original)
number, there is no specification which one to use. I think
that would be an area where we could improve the specification
normatively by prescribing minimal numeric distance to the "true"
number.
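The ambiguity is easy to reproduce: several equally short strings can map back to the same value. A small sketch, again using `std::strtod` as a stand-in for `from_chars` and assuming IEEE 754 binary64 with correctly rounded parsing; `round_tripping` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Keep only the candidate strings that strtod maps back to `target`.
std::vector<std::string> round_tripping(
        double target, const std::vector<std::string>& candidates) {
    std::vector<std::string> kept;
    for (const std::string& text : candidates) {
        char* end = nullptr;
        const double parsed = std::strtod(text.c_str(), &end);
        assert(*end == '\0');  // candidate must be a complete number
        if (parsed == target) kept.push_back(text);
    }
    return kept;
}
```

For `0x1.0000000000001p0`, both "1.0000000000000002" and "1.0000000000000003" survive the filter; for the smallest subnormal, both "4e-324" and "5e-324" do. A "minimal numeric distance" rule would pick the first of each pair's closest member, 1.0000000000000002 and 5e-324.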

> Finally, it would be useful to know the minimum buffer size necessary
> to guarantee successful conversion in all cases. I would guess this
> is something like 4 + numeric_limits<T>::max_digits10 +
> max(log10(numeric_limits<T>::max_exponent10), 1 +
> log10(-numeric_limits<T>::min_exponent10)) but it would be useful to
> have confirmation of this calculation or indeed to have it available
> in the Standard as a constant.

The calculation seems right to me, but I wouldn't burden the standard
with it. (I do wonder why we have "4 +" at the start, given
that we need to account for the sign, the decimal point, and the
"e" only, which is just 3 characters.)

Jens

Matthew Woehlke

Apr 11, 2017, 10:12:31 AM
to std-dis...@isocpp.org
On 2017-04-11 08:15, Jens Maurer wrote:
> I do wonder why we have "4 +" at the start, given
> that we need to account for the sign, the decimal point, and the
> "e" only, which is just 3 characters.

Trailing NUL?

--
Matthew

Edward Catmur

Apr 11, 2017, 10:34:39 AM
to std-dis...@isocpp.org
On Tue, Apr 11, 2017 at 1:15 PM, Jens Maurer <Jens....@gmx.net> wrote:
>> Firstly, is from_chars expected to have idempotent behavior, or is it
>> allowed to be dependent on e.g. floating-point environment or the use
>> of 80-bit floating point (32-bit Linux on x86)?
>
> The question seems to be whether the floating-point environment
> is part of the (allowed and unavoidable) implementation divergence
> under the hood, or whether it's exposed. Richard makes a good
> point here: We do expose the floating-point environment using
> <cfenv>, and I don't think there's anything in a string <-> double
> conversion that would intrinsically depend on the floating-point
> environment (as opposed to, say, floating-point multiplication,
> where the rounding mode is taken into consideration).
>
> So, my understanding is that from_chars applied to a given string
> should always yield the same "double" value, regardless of floating-
> point environment. (That might mean for the implementation to
> temporarily switch back to "round to nearest" while parsing the
> string.)
>
> (No, that particular question was not considered before.)

Hm. According to https://sourceware.org/bugzilla/show_bug.cgi?id=14518 glibc strtod() respects the rounding mode (and failure to do so was considered a bug). I'm not saying that from_chars would have to behave identically to strtod, but it might be considered odd if it didn't.

> That said, I'm not sure which accidents you envision when
> 80-bit floating-point numbers are used in from_chars returning
> a double.

Thinking about it more closely, the values at issue are exactly representable in 80-bit double so any sensible implementation of from_chars wouldn't be affected by double-rounding.

>> Secondly, is from_chars expected or encouraged to have the same
>> behavior as the compiler? i.e. for double d; auto s = "1e23" should
>> we expect 1e23 == (from_chars(s, s + 4, d), d) ever to fail?
>
> It's desirable, yes, but not prescribed. We also allow
> "constexpr" floating-point evaluations (at compile-time)
> to yield different results from runtime evaluations of
> the same expression.
>
> If you feel that a non-normative note would be helpful,
> this could be accommodated, I believe.

I guess we already have [expr.const]/6 and footnote 89 thereto, so the intent is pretty clear.
 
> Note that your argument is flawed in that another implementation might
> use 128-bit decimal floating-point numbers, where "9.999999999999999e22"
> is actually a different number than 1e23, so the round-trip guarantee
> is violated right then and there.

Ah, of course. Thanks!
 
> Should we give non-normative encouragement to round towards zero
> for these cases, so that we get practical portability across
> 64-bit IEEE double platforms?

Well, I'd prefer round-to-nearest (which breaks ties as round-to-even).
 
> I do wonder why we have "4 +" at the start, given
> that we need to account for the sign, the decimal point, and the
> "e" only, which is just 3 characters.

You need floor(log10(k)) + 1 characters to represent an exponent k (e.g. for k = 100 you need 3 characters).
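Putting the pieces of the estimate together with integer digit counts instead of `log10` gives something like the sketch below. It follows the formula from the original question, with the "4" split into sign, decimal point, "e", and the "+ 1" of the digit count. The function names are hypothetical, and it assumes subnormal exponents need no more digits than `min_exponent10`, which holds for the usual IEEE formats.

```cpp
#include <limits>

// Number of decimal digits in a positive integer: floor(log10(k)) + 1.
constexpr int decimal_digits(int k) {
    int digits = 1;
    for (; k >= 10; k /= 10) ++digits;
    return digits;
}

// Estimated upper bound on the characters written by the precision-less
// to_chars overload: sign, '.', 'e', max_digits10 significand digits, and
// the longer of the two exponent spellings (the negative one carries a '-').
// No terminating NUL is included, since to_chars does not write one.
template <class T>
constexpr int shortest_to_chars_bound() {
    using limits = std::numeric_limits<T>;
    const int positive_exponent = decimal_digits(limits::max_exponent10);
    const int negative_exponent = 1 + decimal_digits(-limits::min_exponent10);
    const int exponent = positive_exponent > negative_exponent
                             ? positive_exponent
                             : negative_exponent;
    return 3 + limits::max_digits10 + exponent;
}
```

For an IEEE 754 `double` this evaluates to 24, matching a worst case such as "-1.2345678901234567e-100"; for `float` it evaluates to 15.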

Jens Maurer

Apr 11, 2017, 2:56:25 PM
to std-dis...@isocpp.org
On 04/11/2017 04:34 PM, 'Edward Catmur' via ISO C++ Standard - Discussion wrote:
> On Tue, Apr 11, 2017 at 1:15 PM, Jens Maurer <Jens....@gmx.net> wrote:

>> So, my understanding is that from_chars applied to a given string
>> should always yield the same "double" value, regardless of floating-
>> point environment. (That might mean for the implementation to
>> temporarily switch back to "round to nearest" while parsing the
>> string.)
>
> Hm. According to
> https://sourceware.org/bugzilla/show_bug.cgi?id=14518 glibc strtod()
> respects the rounding mode (and failure to do so was considered a
> bug). I'm not saying that from_chars would have to behave identically
> to strtod, but it might be considered odd if it didn't.

Hm. I'd really like to keep the round-trip property irrespective
of the active rounding mode. Doesn't that remove the freedom to
respect the rounding mode in from_chars?

>>> Secondly, is from_chars expected or encouraged to have the same
>>> behavior as the compiler? i.e. for double d; auto s = "1e23" should
>>> we expect 1e23 == (from_chars(s, s + 4, d), d) ever to fail?
>>
>> It's desirable, yes, but not prescribed. We also allow
>> "constexpr" floating-point evaluations (at compile-time)
>> to yield different results from runtime evaluations of
>> the same expression.
>>
>> If you feel that a non-normative note would be helpful,
>> this could be accommodated, I believe.
>
>
> I guess we already have [expr.const]/6 and footnote 89 thereto, so the intent is pretty clear.

But that's far away from to_chars / from_chars, I'd say.

>> Should we give non-normative encouragement to round towards zero
>> for these cases, so that we get practical portability across
>> 64-bit IEEE double platforms?
>
>
> Well, I'd prefer round-to-nearest (which breaks ties as round-to-even).

... and should we consider the active rounding mode for these cases?

>
>> I do wonder why we have "4 +" at the start, given
>> that we need to account for the sign, the decimal point, and the
>> "e" only, which is just 3 characters.
>
>
> You need floor(log10(k)) + 1 characters to represent an exponent k (e.g. for k = 100 you need 3 characters).

Right, thanks.

Jens

Edward Catmur

Apr 19, 2017, 11:27:38 AM
to std-dis...@isocpp.org
On Tue, Apr 11, 2017 at 7:56 PM, Jens Maurer <Jens....@gmx.net> wrote:
> On 04/11/2017 04:34 PM, 'Edward Catmur' via ISO C++ Standard - Discussion wrote:
>> On Tue, Apr 11, 2017 at 1:15 PM, Jens Maurer <Jens....@gmx.net> wrote:
>>
>>> So, my understanding is that from_chars applied to a given string
>>> should always yield the same "double" value, regardless of floating-
>>> point environment. (That might mean for the implementation to
>>> temporarily switch back to "round to nearest" while parsing the
>>> string.)
>>
>> Hm. According to
>> https://sourceware.org/bugzilla/show_bug.cgi?id=14518 glibc strtod()
>> respects the rounding mode (and failure to do so was considered a
>> bug). I'm not saying that from_chars would have to behave identically
>> to strtod, but it might be considered odd if it didn't.
>
> Hm. I'd really like to keep the round-trip property irrespective
> of the active rounding mode. Doesn't that remove the freedom to
> respect the rounding mode in from_chars?

Yes, unless either:
* to_chars also respects the active rounding mode (that is, it outputs an in-between representation only if that representation rounds to the desired value in the current rounding mode), or
* to_chars never outputs in-between representations (violating the shortness requirement, by some lights).

>>> Should we give non-normative encouragement to round towards zero
>>> for these cases, so that we get practical portability across
>>> 64-bit IEEE double platforms?
>>
>> Well, I'd prefer round-to-nearest (which breaks ties as round-to-even).
>
> ... and should we consider the active rounding mode for these cases?

Maybe. I'd certainly expect glibc to do so.
