Jeffrey Cunningham <jef...@jkcunningham.com> writes:
> Isn't an "extra density of floats around 0" -- by definition -- a
> non-uniform distribution?
Not in the sense that I mean. Or possibly "yes", but welcome to the
world of floating point: because floating point numbers are a discrete
set, not a continuum, there's no such thing as a continuous uniform
distribution using floats, only particular approximations.
Briefly, a floating point number is represented by sign, mantissa and
exponent. Sign is always +1 or -1. What remains is the mantissa, which
is a representation of the bits after the binary point in the binary
number 1.xxxxxxxxxxxxxx..., and the exponent, which gives the power of 2
by which that binary number is multiplied to get the number you want.
(I am eliding lots of details here; there are references where you can
read them.)
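To make the sign/mantissa/exponent split concrete, here is a small Python sketch that pulls the three fields out of an IEEE 754 single-float using the standard `struct` module. The function name `decompose_single` is made up for illustration, and it assumes a normal number (no zero, subnormals, infinities or NaNs), so the leading 1 before the binary point is implicit and not stored.

```python
import struct

def decompose_single(x: float) -> tuple[int, str, int]:
    """Split a normal IEEE 754 single-float into (sign, fraction bits, exponent).

    Assumes x is a normal number, so the leading '1.' of the mantissa
    is implicit and only the 23 bits after the binary point are stored.
    """
    # Round x to single precision and read back its raw 32-bit pattern.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = -1 if bits >> 31 else 1
    biased_exponent = (bits >> 23) & 0xFF   # 8-bit exponent field, bias 127
    fraction = bits & 0x7FFFFF              # 23 stored bits after the binary point
    return sign, format(fraction, "023b"), biased_exponent - 127

for x in (0.5, 0.75, 0.875):
    print(x, decompose_single(x))
```

It reproduces the triples given for 0.5, 0.75 and 0.875 in the next paragraph, with the 23-bit fraction written out in full.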
So for example, 0.5 is represented as the sign/mantissa/exponent triple
(1, 0, -1); 0.75 is (1, 10000000..._2, -1), 0.875 is (1, 11000000..._2,
-1), and so on. The point here is that the density of representable
floating points /changes/ as the exponent changes: there are as many
machine floats between 0.25 and 0.5 as there are between 0.5 and 1, and
this isn't some kind of transfinite sleight of hand: this is a finite,
countable set.
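The claim that each binade holds the same number of floats can be checked directly. This is a minimal Python sketch assuming IEEE 754 single precision; `float_bits` and `count_singles` are hypothetical helper names. It relies on the fact that, for non-negative floats, the bit patterns read as unsigned integers are ordered the same way as the values, so counting the floats in [lo, hi) is a subtraction.

```python
import struct

def float_bits(x: float) -> int:
    """Raw 32-bit pattern of x stored as an IEEE 754 single-float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def count_singles(lo: float, hi: float) -> int:
    """Number of single-floats f with lo <= f < hi, for 0 <= lo < hi.

    lo and hi are assumed to be exactly representable as single-floats.
    """
    return float_bits(hi) - float_bits(lo)

print(count_singles(0.25, 0.5))   # 8388608, i.e. 2**23
print(count_singles(0.5, 1.0))    # 8388608
print(count_singles(1.0, 2.0))    # 8388608
print(count_singles(0.0, 1.0))    # 1065353216 (includes 0.0 and the subnormals)
```

Each binade holds 2**23 single-floats, while [0, 1) as a whole holds over a billion, which is the sense in which there are far more floats near zero than near 1.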
So, near zero, there are more possible floats than there are near 1.
Obviously, the RNG compensates for that, by having the possible floats
near zero be generated with lower probability than those near one. But
the point is that there is more than one way of performing that
compensation: the appropriate probability mass could be distributed
evenly over all possible floats within an evenly-spaced region, or a
single representative in a region could be selected as an archetype --
and if so, which representative?
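The first of those compensation strategies (spreading the probability mass evenly over all floats within each evenly spaced binade) can be sketched as follows. This is an illustration under stated assumptions, not what SBCL does: it picks the binade [2**e, 2**(e+1)) with probability 2**e by drawing random bits until the first 1, then fills the 23 fraction bits uniformly. The function names are hypothetical, Python's `random.Random` stands in for the bit source, and for simplicity the tiny subnormal range is collapsed to 0.0.

```python
import random
import struct

def single_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def uniform_every_single(rng: random.Random) -> float:
    """Sketch: a value in [0, 1) where every normal single-float can occur.

    Binade [0.5, 1) is chosen with probability 1/2, [0.25, 0.5) with 1/4,
    and so on, matching each binade's width; within the chosen binade the
    2**23 evenly spaced floats are equally likely.
    """
    e = -1
    while rng.getrandbits(1) == 0:   # each 0 bit moves down one binade
        e -= 1
        if e < -126:                 # below the normal range:
            return 0.0               # simplification, subnormals ignored
    fraction = rng.getrandbits(23)   # uniform over the floats in the binade
    return single_from_bits(((e + 127) << 23) | fraction)
```

Unlike the approach discussed next, this can return floats that are not multiples of 2**-23, such as values in [0.25, 0.5) with odd fraction bits.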
SBCL's strategy for generating numbers between 0 and 1 isn't so utterly
stupid as you seem to think; it makes one particular choice, by
selecting floats between 1 and 2 (an interval in which representable
floats have a uniform density), and then subtracting 1. This has the
effect of raising the probability of generating 0 relative to the other
floating point numbers that are in fact representable near 0, many of
which can never be generated this way; but that effect has potential
virtues too (such as preserving the basic symmetry of the region near 0
and near 1 in the conceptual uniform distribution).
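The [1, 2) strategy can be sketched in the same style. This is not SBCL's source, just a Python illustration assuming IEEE 754 single-floats, with a hypothetical `uniform_via_one_to_two` function: fixing the exponent field at 127 (unbiased exponent 0) and filling the fraction with 23 random bits picks uniformly among the 2**23 evenly spaced floats in [1, 2), and subtracting 1 maps the result onto [0, 1).

```python
import random
import struct

def single_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def uniform_via_one_to_two(rng: random.Random) -> float:
    """Sketch of the 'pick in [1, 2), subtract 1' strategy for single-floats."""
    fraction = rng.getrandbits(23)                      # uniform 23-bit fraction
    one_to_two = single_from_bits((127 << 23) | fraction)  # 1 + fraction * 2**-23
    return one_to_two - 1.0   # exact: the result is fraction * 2**-23
```

Every result is a multiple of 2**-23, so only 2**23 distinct values can come out, and 0.0 is returned with probability 2**-23, the same as every other possible result. Smaller floats such as 1e-10, though representable, are never produced. The output lattice is the same near 0 as near 1, which is the symmetry mentioned above.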
> I would think this is highly undesirable behavior in a uniform RNG and
> should be corrected.
Tell you what: please specify unambiguously, with reference to the
hardware representations, the behaviour of the RNG when given a
single-float 1.0 argument, and justify why the specified behaviour is
better than all other behaviours in all circumstances.