Thanks for your questions.
On 04/10/2017 11:52 AM, Edward Catmur wrote:
> Firstly, I want to say that I think to_chars and from_chars are a
> great addition to the Standard and I look forward to using them in
> C++17. I have a few questions regarding their behavior on floating
> point types.
As an initial comment, I'd like to point out that C++ is woefully
underspecified in its floating-point semantics, so there's lots
of room for quality-of-implementation (QoI) variation.
The round-trip guarantees for to_chars / from_chars apply to the
same implementation only, because a different implementation might
not even have enough bits in its "double" to represent the
original number.
> Firstly, is from_chars expected to have idempotent behavior, or is it
> allowed to be dependent on e.g. floating-point environment or the use
> of 80-bit floating point (32-bit Linux on x86)?
The question seems to be whether the floating-point environment
is part of the (allowed and unavoidable) implementation divergence
under the hood, or whether it's exposed. Richard makes a good
point here: We do expose the floating-point environment using
<cfenv>, and I don't think there's anything in a string <-> double
conversion that would intrinsically depend on the floating-point
environment (as opposed to, say, floating-point multiplication,
where the rounding mode is taken into consideration).
So, my understanding is that from_chars applied to a given string
should always yield the same "double" value, regardless of floating-
point environment. (That might mean the implementation has to
temporarily switch to "round to nearest" while parsing the
string.)
(No, that particular question was not considered before.)
Regarding the 80-bit FP on x86 issue, this seems dubious to me to
start with, because storing a "double" computation result in
(e.g. volatile) memory might mean that the double value I read
from that memory doesn't compare equal to the (in-register)
computation result. And of course, it's totally uncontrollable
when the compiler decides to spill some values to memory. So,
arithmetic in 80-bit precision seems ok, but once comparisons
with other doubles happen, it seems least surprising to
truncate / round to 64-bit at that point.
That said, I'm not sure what kind of accidents you envision when
80-bit floating-point numbers are involved in a from_chars call
that returns a double.
> Secondly, is from_chars expected or encouraged to have the same
> behavior as the compiler? i.e. for double d; auto s = "1e23" should
> we expect 1e23 == (from_chars(s, s + 4, d), d) ever to fail?
It's desirable, yes, but not prescribed. We also allow
"constexpr" floating-point evaluations (at compile-time)
to yield different results from runtime evaluations of
the same expression.
If you feel that a non-normative note would be helpful,
this could be accommodated, I believe.
> Most importantly, is to_chars permitted to produce an overlong output
> where the shorter output round-trips on the same implementation but
> is not guaranteed to do so globally?
No, it's required to produce the shortest output so that it can
round-trip on the same implementation.
> For example, if an
> implementation always reads "1e23" as 0x1.52d02c7e14af6p76, is it
> permitted to output 0x1.52d02c7e14af6p76 as "9.999999999999999e22" on
> the basis that this is guaranteed to be read correctly by a different
> implementation that might read "1e23" as 0x1.52d02c7e14af7p76?
No, "1e23" is shorter than the "9.999999999999999e22" string you
gave, so "1e23" takes precedence.
Note that your argument is flawed in that another implementation might
use 128-bit decimal floating-point numbers, where "9.999999999999999e22"
is actually a different number than 1e23, so the round-trip guarantee
is violated right then and there.
Should we give non-normative encouragement to round towards zero
for these cases, so that we get practical portability across
64-bit IEEE double platforms?
> In addition, I would be interested in knowing whether the following
> underspecification is intentional:
>
> Is the result of to_chars() required to represent the closest to the
> input value among strings of that length that round-trip? For example
> 0x1.0000000000001p0 is approx. 1.000000000000000222045, so is
> 1.0000000000000003 an acceptable output from to_chars, or only
> 1.0000000000000002? Or consider the smallest positive subnormal IEEE
> double, 0x1p-1074, approx. 4.94e-324 - is 4e-324 an acceptable
> output, or only 5e-324? (In Florian Loitsch [1], this is the
> "closeness" property of Grisu3.)
If the output has a choice between two strings that are both
equally short and would both round-trip to the same (original)
number, there is no specification which one to use. I think
that would be an area where we could improve the specification
normatively by prescribing minimal numeric distance to the "true"
number.
> Finally, it would be useful to know the minimum buffer size necessary
> to guarantee successful conversion in all cases. I would guess this
> is something like 4 + numeric_limits<T>::max_digits10 +
> max(log10(numeric_limits<T>::max_exponent10), 1 +
> log10(-numeric_limits<T>::min_exponent10)) but it would be useful to
> have confirmation of this calculation or indeed to have it available
> in the Standard as a constant.
The calculation seems right to me, but I wouldn't burden the standard
with it. (The "4 +" at the start presumably accounts for the sign,
the decimal point, the "e", and the exponent's sign, as in
"-1.7976931348623157e+308".)
Jens