On Thursday, October 18, 2018 at 9:04:41 AM UTC-4,
luca.b...@gmail.com wrote:
> Hi Ben!
...
> > > Author says: "Now is when things start to get fun. The largest int is
> > > 2147483647 (2^31–1). If you convert that int to a float it rounds up
> > > to 2147483650f."
> >
> > (That's an odd result. I'd expect 2147483648 but then C offers very few
> > guarantees about exactly what floating point implementations should do.)
> >
>
> The somewhat counter intuitive fact about floating point is that at
> some point you start missing entire units, so the gap between
> subsequent floating point numbers is not less than 1, but starts to be
> higher than 1.
>
> As outlined in the blog post, the next floating point subsequent to
> 1000000000f is not "about" 1000000001, but "about" 1000000064 (!) so
> here the difference between a floating point number and its subsequent
> is about 64. And 1000000000 is a valid 32 bit signed integer!
> That's because IEEE 754 floating point is made by mantissa and
> exponent. If they were implemented by storing integral part and
> fractional part then you'd always have that the difference between
> subsequents is less than 1.
The issue you raise is a valid one, but is unlikely to apply in this
case. What the C standard says about such conversions is "When a value
of integer type is converted to a real floating type, if the value being
converted can be represented exactly in the new type, it is unchanged. If
the value being converted is in the range of values that can be
represented but cannot be represented exactly, the result is either the
nearest higher or nearest lower representable value, chosen in an
implementation-defined manner. ..." (6.3.1.4p2).
In the case you're talking about, 1000000000 == 5^9 * 2^9 ==
1953125 * 2^9. If FLT_RADIX == 2 (by far the most common case), then the
mantissa needs no more than 21 bits to represent 1953125 exactly. IEEE
754 single precision has 23 stored mantissa bits, plus an assumed
leading bit of 1, so that number can be represented exactly. However,
1000000001 isn't divisible by 2, so it requires a full 30 bits of
mantissa to be represented exactly, which is more than the format
provides. The next representable value is 1000000064, as you say.
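
For anyone who wants to check the 1000000064 figure, here is a minimal sketch. It assumes a binary float narrower than 64 bits (IEEE 754 single precision on most systems) and values that fit in int64_t. The helper names is_exact_in_float and next_exact_in_float are made up for illustration. The idea is that an integer is exactly representable precisely when it survives a round trip through float unchanged.

```c
#include <stdint.h>

/* Returns nonzero if n survives a round trip through float unchanged,
   i.e. if n is exactly representable as a float.
   Assumption: (float)n stays within the range of int64_t. */
int is_exact_in_float(int64_t n)
{
    float f = (float)n;      /* 6.3.1.4p2: exact, or one of the two neighbours */
    return (int64_t)f == n;
}

/* Returns the smallest integer greater than n that float holds exactly. */
int64_t next_exact_in_float(int64_t n)
{
    do {
        ++n;
    } while (!is_exact_in_float(n));
    return n;
}
```

On an implementation where float is IEEE 754 single precision, I'd expect next_exact_in_float(1000000000) to return 1000000064, matching the gap of 64 described above.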
However, the number in this case is 2147483647, which is not exactly
representable as an IEEE 754 single precision number. 2147483648, on the
other hand, is exactly 2^31, and should therefore be exactly
representable if FLT_RADIX is a power of 2, in any floating point format
that conforms to the requirements of the C standard. Therefore, on such
a system, the only values that are permitted by 6.3.1.4p2 for
(float)2147483647 are 2147483648.0 or the representable value just below
that one. An implementation that produces 2147483650 must either have a
FLT_RADIX that is not a power of 2 (pretty uncommon, nowadays) or be
non-conforming.
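
As a rough way to test a given implementation against that claim, here is a sketch that assumes int is at least 32 bits wide and float is IEEE 754 single precision. Under those assumptions the value just below 2^31 is 2^31 - 128 = 2147483520, because floats between 2^30 and 2^31 are spaced 128 apart. The macro and function names are hypothetical, chosen only for this example.

```c
/* The two results 6.3.1.4p2 permits for (float)2147483647, assuming
   float is IEEE 754 single precision: 2^31 and the value just below it. */
#define UPPER_CANDIDATE 2147483648.0   /* 2^31 */
#define LOWER_CANDIDATE 2147483520.0   /* 2^31 - 128 */

/* Returns 1 if the conversion yields one of the two permitted values. */
int int_max_conversion_is_permitted(void)
{
    volatile int n = 2147483647;    /* volatile: force a run-time conversion */
    double d = (double)(float)n;    /* widening float to double is exact */
    return d == UPPER_CANDIDATE || d == LOWER_CANDIDATE;
}
```

If this returns 0 on a system where FLT_RADIX is 2, that would point to the kind of non-conformance described above.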