[erlang-questions] On OTP rand module difference between OTP 19 and OTP 20


Kenji Rikitake

Aug 28, 2017, 9:53:53 PM
to Erlang Questions
I finally reviewed the OTP rand module changes between 19.3 and 20.0.
While most users won't notice the changes,
the details are considerably different.

The good news is that if you are not concerned with the PRNG algorithm,
you can keep your code untouched. The new default algorithm
works well, even better than the OTP 19 one, and with fewer bugs.

On the other hand, if you explicitly specify the algorithm in rand:seed/1,
you should switch to one of the newly available algorithms
described in the rand module manual of OTP 20.0. In fact,
all the old (OTP 18 and 19) algorithms are now deprecated.
See the rand module manual of OTP 20. Use exrop (the OTP 20 default) in most cases;
if you use exs1024, change it to exs1024s;
if you stick with exsplus (the default until OTP 19), use exsp instead.
If you use any other algorithm, consider converting to exrop.
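A sketch of what this migration looks like in code (the algorithm atoms are those listed in the OTP 20 rand manual; the function name is mine):

```erlang
%% OTP 19:  rand:seed(exsplus)  -- exsplus is now deprecated.
%% OTP 20 equivalents:
migrate_seed() ->
    rand:seed(exsp),   % non-deprecated counterpart of exsplus
    _ = rand:uniform(),
    rand:seed(exrop),  % the OTP 20 default; preferred in most cases
    rand:uniform().
```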

Also, if your code depends on the output range of
rand:uniform/0 or rand:uniform_s/1, note that it has changed as follows:
OTP 18 and 19: 0.0 < X < 1.0
OTP 20: 0.0 =< X < 1.0 (note the =< operator)
where X is the output of the functions.
In short, since OTP 20 the functions may return exactly 0.0.
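A sketch of the kind of code this range change can bite, with a hedged workaround (the function name is mine, not from OTP):

```erlang
%% math:log(rand:uniform()) was safe under the OTP 18/19 documented
%% range, but since OTP 20 it can raise badarith when 0.0 comes out.
log_uniform() ->
    case rand:uniform() of
        0.0 -> log_uniform();   % retry: 0.0 is a documented output now
        X   -> math:log(X)
    end.
```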

I noticed the changes shortly after the OTP 20.0 release.
They were made at the last minute (19-MAR-2017 - 26-APR-2017),
just before the OTP 20 release
and after my Erlang and Elixir Factory 2017 trip, so I missed them.
I was not notified by Raimo Niskanen or the other contributors
about these changes either,
so I had to take the time and opportunity to review the code again.
I know OTP is open source software and the OTP Team can modify
the code to improve features and remove bugs without notifying anyone,
so nobody is to blame for these changes.
I am sure Raimo and the other contributors have done a great job
changing the details while maintaining compatibility
and fixing the trivial bugs which I left in the code.

I would also like to note that since OTP 20 the crypto module
utilizes the rand module API to simplify access to cryptographically
strong random numbers. See crypto:rand_seed/{0,1}
and crypto:rand_uniform/2. This is a good example of making use of
the rand module's extensible plugins.
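For example (a minimal sketch, per the OTP 20 crypto documentation; the wrapper name is mine):

```erlang
%% Install crypto's cryptographically strong generator as the
%% rand plugin for this process, then draw numbers via rand.
strong_uniform() ->
    _ = crypto:rand_seed(),   % subsequent rand calls in this process
    rand:uniform(100).        % now use strong random bytes
```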

Thanks to Tuncer Ayaz for giving me a chance to review this.

And always remember: if you are still dependent on the random module,
migrate to the rand module now.

Regards,
Kenji Rikitake

Raimo Niskanen

Aug 29, 2017, 4:35:41 AM
to Erlang Questions
Thank you Kenji for reviewing the changes and summarizing the implications.

Sorry about not getting you into the loop during the rewrite!

As you guessed I was focused on plugging in 'crypto', fixing the flaws,
incorporating the new algorithm from Prof. Vigna and keeping it
as backwards compatible as possible, so I forgot about you...



Regarding the changed uniform float behaviour: it is the functions
rand:uniform/0 and rand:uniform_s/1 this concerns. They were previously
(OTP-19.3) documented to output a value 0.0 < X < 1.0 and are now
(OTP-20.0) documented to return 0.0 =< X < 1.0.

Previously they _could_ return exactly 0.0, even though not stated in the
documentation. But the probability of exactly 0.0 was about 2048 times
less than it is now. This is because the float generation has been fixed
to generate equidistant floats of the form K * 2^-53, so exactly 0.0 now
has the same probability as every other possible float value.
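As an illustration (a sketch only, not the actual rand internals): with 53 random bits K, the generated float is K * 2^-53, so every representable output, 0.0 included, has probability 2^-53:

```erlang
%% Sketch of equidistant float generation from a raw integer V
%% with at least 53 random bits (not the real rand implementation).
to_float(V) ->
    K = V band ((1 bsl 53) - 1),   % keep 53 bits: K in 0..2^53-1
    K * math:pow(2, -53).          % equidistant values in [0.0, 1.0)
```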

I'd rather say that the documentation of rand:uniform/0 and rand:uniform_s/1
has been corrected to match their behaviour, and that the probability for
exactly 0.0 has increased from about 1/64 to about 1/53.

Despite this the distribution of generated numbers has actually not changed
- it is still uniform over the range 0.0 =< X < 1.0.

That is my view of the float changes, of which I am fairly certain. :-)


/ Raimo
> _______________________________________________
> erlang-questions mailing list
> erlang-q...@erlang.org
> http://erlang.org/mailman/listinfo/erlang-questions


--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

Raimo Niskanen

Aug 29, 2017, 4:46:03 AM
to Erlang Questions
On Tue, Aug 29, 2017 at 10:35:28AM +0200, Raimo Niskanen wrote:
> Thank you Kenji for reviewing the changes and summarizing the implications.
>
> Sorry about not getting you into the loop during the rewrite!
>
> As you guessed I was focused on plugging in 'crypto', fixing the flaws,
> incorporating the new algorithm from Prof. Vignia and keeping it
> as backwards compatible as possible, so I forgot about you...
>
>
>
> Regarding the changed uniform float behaviour: it is the functions
> rand:uniform/0 and rand:uniform_s/1 this concerns. They were previously
> (OTP-19.3) documented to output a value 0.0 < X < 1.0 and are now
> (OTP-20.0) documented to return 0.0 =< X < 1.0.
>
> Previously they _could_ return exactly 0.0, even though not stated in the
> documentation. But the probability for exactly 0.0 was about 2048 times
> less than it is now. This is because the float generation has been fixed
> to generate equidistant floats on the form K * 2^-53 so exactly 0.0 has now
> got the same probability as all other possible float values.
>
> I'd rather say that the documentation of rand:uniform/0 and rand:uniform_s/1
> has been corrected to match their behaviour, and that the probability for
> exactly 0.0 has increased from about 1/64 to about 1/53.

Sorry! That should be: from about 1/2^64 to about 1/2^53.

/ Raimo

Raimo Niskanen

Aug 29, 2017, 7:23:08 AM
to erlang-q...@erlang.org
On Tue, Aug 29, 2017 at 10:45:51AM +0200, Raimo Niskanen wrote:
> On Tue, Aug 29, 2017 at 10:35:28AM +0200, Raimo Niskanen wrote:
> > Thank you Kenji for reviewing the changes and summarizing the implications.
> >
> > Sorry about not getting you into the loop during the rewrite!
> >
> > As you guessed I was focused on plugging in 'crypto', fixing the flaws,
> > incorporating the new algorithm from Prof. Vignia and keeping it
> > as backwards compatible as possible, so I forgot about you...
> >
> >
> >
> > Regarding the changed uniform float behaviour: it is the functions
> > rand:uniform/0 and rand:uniform_s/1 this concerns. They were previously
> > (OTP-19.3) documented to output a value 0.0 < X < 1.0 and are now
> > (OTP-20.0) documented to return 0.0 =< X < 1.0.
> >
> > Previously they _could_ return exactly 0.0, even though not stated in the
> > documentation. But the probability for exactly 0.0 was about 2048 times
> > less than it is now. This is because the float generation has been fixed
> > to generate equidistant floats on the form K * 2^-53 so exactly 0.0 has now
> > got the same probability as all other possible float values.
> >
> > I'd rather say that the documentation of rand:uniform/0 and rand:uniform_s/1
> > has been corrected to match their behaviour, and that the probability for
> > exactly 0.0 has increased from about 1/64 to about 1/53.
>
> Sorry! That should be: from about 1/2^64 to about 1/2^53.

And that was for the 64-bit generators. For the _default_ 58-bit generator
the probability of exactly 0.0 has changed from about 1/2^58 to about
1/2^53, i.e. increased by a factor of 32.

/ Raimo

Richard A. O'Keefe

Aug 29, 2017, 7:45:14 PM
to erlang-q...@erlang.org


On 29/08/17 8:35 PM, Raimo Niskanen wrote:
>
> Regarding the changed uniform float behaviour: it is the functions
> rand:uniform/0 and rand:uniform_s/1 this concerns. They were previously
> (OTP-19.3) documented to output a value 0.0 < X < 1.0 and are now
> (OTP-20.0) documented to return 0.0 =< X < 1.0.

There are applications of random numbers for which it is important
that 0 never be returned. Of course, nothing stops me writing

uniform_nonzero() ->
    X = rand:uniform(),
    if X > 0.0 -> X
     ; X =< 0.0 -> uniform_nonzero()
    end.

but it would be nice to have this already in the library.

(I know the old library had the same gap. But with such a major
reworking of random number generation, it's time to close it.)

Raimo Niskanen

Aug 30, 2017, 2:42:21 AM
to erlang-q...@erlang.org
On Wed, Aug 30, 2017 at 11:44:57AM +1200, Richard A. O'Keefe wrote:
>
>
> On 29/08/17 8:35 PM, Raimo Niskanen wrote:
> >
> > Regarding the changed uniform float behaviour: it is the functions
> > rand:uniform/0 and rand:uniform_s/1 this concerns. They were previously
> > (OTP-19.3) documented to output a value 0.0 < X < 1.0 and are now
> > (OTP-20.0) documented to return 0.0 =< X < 1.0.
>
> There are applications of random numbers for which it is important
> that 0 never be returned. Of course, nothing stops me writing

What kind of applications? I would like to get a grip on how needed this
function is.

>
> uniform_nonzero() ->
> X = rand:uniform(),
> if X > 0.0 -> X
> ; X =< 0.0 -> uniform_nonzero()
> end.
>
> but it would be nice to have this already in the library.
>
> (I know the old library had the same gap. But with such a major
> reworking of random number generation, it's time to close it.)

We chose 0.0 =< X < 1.0 because it was most like the integer generator, i.e.
including the lower bound but excluding the upper. And as you say it is
easy to exclude the lower bound.

If we were to implement 0.0 < X < 1.0, would then 0.0 =< X =< 1.0 also be
missing, and for completeness 0.0 < X =< 1.0?
Which of the four are worth implementing?

You could argue that including both bounds is the most general
because it is easy to retry if you want to exclude a value.

Maybe something like:
uniform(Opts) -> float()
uniform_s(Opts, State) -> float()
Opts :: [(exclude_zero | exclude_one)]
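A minimal sketch of how such an option list could work, built on the existing rand:uniform/0 (the uniform/1-with-options function itself is hypothetical, not part of OTP):

```erlang
%% Hypothetical option-based wrapper around rand:uniform/0.
%% exclude_zero retries until the lower bound is excluded.
uniform(Opts) ->
    X = rand:uniform(),                       % 0.0 =< X < 1.0
    case lists:member(exclude_zero, Opts) of
        true when X == 0.0 -> uniform(Opts);  % retry away from 0.0
        _ -> X
    end.
```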



--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

zxq9

Aug 30, 2017, 2:48:39 AM
to erlang-q...@erlang.org
On 2017年08月30日 水曜日 08:42:02 Raimo Niskanen wrote:
> On Wed, Aug 30, 2017 at 11:44:57AM +1200, Richard A. O'Keefe wrote:
> >
> >
> > On 29/08/17 8:35 PM, Raimo Niskanen wrote:
> > >
> > > Regarding the changed uniform float behaviour: it is the functions
> > > rand:uniform/0 and rand:uniform_s/1 this concerns. They were previously
> > > (OTP-19.3) documented to output a value 0.0 < X < 1.0 and are now
> > > (OTP-20.0) documented to return 0.0 =< X < 1.0.
> >
> > There are applications of random numbers for which it is important
> > that 0 never be returned. Of course, nothing stops me writing
>
> What kind of applications? I would like to get a grip on how needed this
> function is?

Any function where a zero would propagate.

This can be exactly as bad as accidentally comparing a NULL in SQL.

-Craig

Raimo Niskanen

Aug 30, 2017, 2:54:38 AM
to erlang-q...@erlang.org
On Wed, Aug 30, 2017 at 03:48:16PM +0900, zxq9 wrote:
> On 2017年08月30日 水曜日 08:42:02 Raimo Niskanen wrote:
> > On Wed, Aug 30, 2017 at 11:44:57AM +1200, Richard A. O'Keefe wrote:
> > >
> > >
> > > On 29/08/17 8:35 PM, Raimo Niskanen wrote:
> > > >
> > > > Regarding the changed uniform float behaviour: it is the functions
> > > > rand:uniform/0 and rand:uniform_s/1 this concerns. They were previously
> > > > (OTP-19.3) documented to output a value 0.0 < X < 1.0 and are now
> > > > (OTP-20.0) documented to return 0.0 =< X < 1.0.
> > >
> > > There are applications of random numbers for which it is important
> > > that 0 never be returned. Of course, nothing stops me writing
> >
> > What kind of applications? I would like to get a grip on how needed this
> > function is?
>
> Any function where a zero would propagate.
>
> This can be exactly as bad as accidentally comparing a NULL in SQL.

That's vague for me.

Are you saying it is a common enough use pattern to divide by a
random number? Are there other reasons why a float() =:= 0.0 is fatal?


>
> -Craig

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

zxq9

Aug 30, 2017, 3:15:29 AM
to erlang-q...@erlang.org
On 2017年08月30日 水曜日 08:54:30 Raimo Niskanen wrote:
> On Wed, Aug 30, 2017 at 03:48:16PM +0900, zxq9 wrote:
> > On 2017年08月30日 水曜日 08:42:02 Raimo Niskanen wrote:
> > > On Wed, Aug 30, 2017 at 11:44:57AM +1200, Richard A. O'Keefe wrote:
> > > >
> > > >
> > > > On 29/08/17 8:35 PM, Raimo Niskanen wrote:
> > > > >
> > > > > Regarding the changed uniform float behaviour: it is the functions
> > > > > rand:uniform/0 and rand:uniform_s/1 this concerns. They were previously
> > > > > (OTP-19.3) documented to output a value 0.0 < X < 1.0 and are now
> > > > > (OTP-20.0) documented to return 0.0 =< X < 1.0.
> > > >
> > > > There are applications of random numbers for which it is important
> > > > that 0 never be returned. Of course, nothing stops me writing
> > >
> > > What kind of applications? I would like to get a grip on how needed this
> > > function is?
> >
> > Any function where a zero would propagate.
> >
> > This can be exactly as bad as accidentally comparing a NULL in SQL.
>
> That's vague for me.
>
> Are you saying it is a common enought use pattern to divide with a
> random number? Are there other reasons when a float() =:= 0.0 is fatal?

It is relatively common whenever it is guaranteed to be safe! Otherwise it becomes a guarded expression.

Sure, that is a case of "well, just write it so that it can't do that" -- but the original function spec told us we didn't need to do that, so there is code out there that would rely on not using a factor of 0.0. I've probably written some in game servers, actually.

Propagating the product of multiplication by 0.0 is the more common problem I've seen, by the way, as opposed to division.

Consider: character stat generation in games, offset-by-random-factor calculations where accidentally getting exactly the same result is catastrophic, anti-precision routines in some aiming devices and simulations, adding wiggle to character pathfinding, unstuck() type routines, mutating a value in evolutionary design algorithms, and so on.

Very few of these cases are catastrophic and many would simply be applied again if the initial attempt failed, but a few can be very bad depending on how the system in which they are used is designed. The problem isn't so much that "there aren't many use cases" or that "the uses aren't common" as that the API was originally documented one way and has changed for no apparent reason. Zero has a very special place in mathematics and should be treated carefully.

I think ROK would have objected a lot less had the original spec been 0.0 =< X =< 1.0 (which is different from being 0.0 =< X < 1.0; which is another point of potentially dangerous weirdness). I'm curious to see what examples he comes up with. The ones above are just off the top of my head, and like I mentioned most of my personal examples don't happen to be really catastrophic in most cases because many of them involve offsetting from a known value (which would be relatively safe to reuse) or situations where failures are implicitly assumed to provoke retries.

-Craig

Raimo Niskanen

Aug 30, 2017, 4:29:28 AM
to erlang-q...@erlang.org
On Wed, Aug 30, 2017 at 04:14:56PM +0900, zxq9 wrote:
> On 2017年08月30日 水曜日 08:54:30 Raimo Niskanen wrote:
> > On Wed, Aug 30, 2017 at 03:48:16PM +0900, zxq9 wrote:
> > > On 2017年08月30日 水曜日 08:42:02 Raimo Niskanen wrote:
> > > > On Wed, Aug 30, 2017 at 11:44:57AM +1200, Richard A. O'Keefe wrote:
> > > > >
> > > > >
> > > > > On 29/08/17 8:35 PM, Raimo Niskanen wrote:
> > > > > >
> > > > > > Regarding the changed uniform float behaviour: it is the functions
> > > > > > rand:uniform/0 and rand:uniform_s/1 this concerns. They were previously
> > > > > > (OTP-19.3) documented to output a value 0.0 < X < 1.0 and are now
> > > > > > (OTP-20.0) documented to return 0.0 =< X < 1.0.
> > > > >
> > > > > There are applications of random numbers for which it is important
> > > > > that 0 never be returned. Of course, nothing stops me writing
> > > >
> > > > What kind of applications? I would like to get a grip on how needed this
> > > > function is?
> > >
> > > Any function where a zero would propagate.
> > >
> > > This can be exactly as bad as accidentally comparing a NULL in SQL.
> >
> > That's vague for me.
> >
> > Are you saying it is a common enought use pattern to divide with a
> > random number? Are there other reasons when a float() =:= 0.0 is fatal?
>
> It is relatively common whenever it is guaranteed to be safe! Otherwise it becomes a guarded expression.
>
> Sure, that is a case of "well, just write it so that it can't do that" -- but the original function spec told us we didn't need to do that, so there is code out there that would rely on not using a factor of 0.0. I've probably written some in game servers, actually.
>
> Propagating the product of multiplication by 0.0 is the more common problem I've seen, by the way, as opposed to division.
>
> Consider: character stat generation in games, offset-by-random-factor calculations where accidentally getting exactly the same result is catastrophic, anti-precision routines in some aiming devices and simulations, adding wiggle to character pathfinding, unstuck() type routines, mutating a value in evolutionary design algorithms, and so on.
>
> Very few of these cases are catastrophic and many would simply be applied again if the initial attempt failed, but a few can be very bad depending on how the system in which they are used is designed. The problem isn't so much that "there aren't many use cases" or "the uses aren't common" as much as the API was originally documented that way, and it has changed for no apparent reason. Zero has a very special place in mathematics and should be treated carefully.

The spec did not match the reality. Either had to be corrected.
It is in general safer to change the documentation to match the reality.

So I do not agree that the spec changed for no apparent reason.

Furthermore, Java's Random.nextFloat(), Python's random.random() and
Ruby's Random.rand all generate in the same interval:
http://docs.oracle.com/javase/6/docs/api/java/util/Random.html#nextFloat()
https://docs.python.org/3/library/random.html#random.random
http://ruby-doc.org/core-2.0.0/Random.html#method-i-rand

I think this all boils down to the fact that digital floating point values
(IEEE 754) have limited precision, and in the interval 0.0 to 1.0 they are
better regarded as 53-bit fixed point values.

A half open interval matches integer random number generators that also
in general use half open intervals.

With half open intervals you can generate numbers in [0.0,1.0) and other
numbers in [1.0,2.0), where the number 1.0 belongs to only one of these intervals.

This I think is a good default behaviour.
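As a tiny illustration of why the half-open choice composes nicely (plain rand calls; the function names are mine):

```erlang
%% [0.0, 1.0) and [1.0, 2.0) tile the line without overlap:
%% 1.0 can only ever come from the second generator.
unit()    -> rand:uniform().        % 0.0 =< X < 1.0
shifted() -> 1.0 + rand:uniform().  % 1.0 =< X < 2.0
```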


>
> I think ROK would have objected a lot less had the original spec been 0.0 =< X =< 1.0 (which is different from being 0.0 =< X < 1.0; which is another point of potentially dangerous weirdness). I'm curious to see what examples he comes up with. The ones above are just off the top of my head, and like I mentioned most of my personal examples don't happen to be really catastrophic in most cases because many of them involve offsetting from a known value (which would be relatively safe to reuse) or situations where failures are implicitly assumed to provoke retries.
>
> -Craig

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

zxq9

Aug 30, 2017, 6:33:35 AM
to erlang-q...@erlang.org
On 2017年08月30日 水曜日 10:29:12 Raimo Niskanen wrote:

> It is in general safer to change the documentation to match the reality.

Wow.

Richard A. O'Keefe

Aug 30, 2017, 6:34:36 PM
to erlang-q...@erlang.org


On 30/08/17 6:42 PM, Raimo Niskanen wrote:
> On Wed, Aug 30, 2017 at 11:44:57AM +1200, Richard A. O'Keefe wrote:

>> There are applications of random numbers for which it is important
>> that 0 never be returned. Of course, nothing stops me writing
>
> What kind of applications? I would like to get a grip on how needed this
> function is?

I'll give you just one example. There are two ways I know to generate
normally distributed random numbers. One of them goes like this:

sqrt(-2 * ln(randnz())) * cos(pi * random())

where random() is in [0,1) but randnz() must be in (0,1).

OK, I'll give you another example. This is part of an algorithm for
generating gamma variates, one of the best known.

U <- random()
if U <= r then
    z <- -ln(U/r)
else
    z <- ln(random()/lambda)
end

You will notice that both of the calls to ln will go wrong if
random() can return 0.

These aren't the only examples, but I have an appointment.
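For concreteness, a direct Erlang transcription of the first formula above (a sketch; randnz/0 here is a hypothetical retry-based non-zero variant, not an OTP function):

```erlang
%% Non-zero uniform: retry until the value is strictly inside (0,1).
randnz() ->
    case rand:uniform() of
        0.0 -> randnz();
        X   -> X
    end.

%% One normally distributed variate per the formula quoted above.
%% math:log(randnz()) is safe because randnz() never returns 0.0.
normal() ->
    math:sqrt(-2.0 * math:log(randnz()))
        * math:cos(math:pi() * rand:uniform()).
```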

Raimo Niskanen

Aug 31, 2017, 8:30:20 AM
to erlang-q...@erlang.org
On Thu, Aug 31, 2017 at 10:34:16AM +1200, Richard A. O'Keefe wrote:
>
>
> On 30/08/17 6:42 PM, Raimo Niskanen wrote:
> > On Wed, Aug 30, 2017 at 11:44:57AM +1200, Richard A. O'Keefe wrote:
>
> >> There are applications of random numbers for which it is important
> >> that 0 never be returned. Of course, nothing stops me writing
> >
> > What kind of applications? I would like to get a grip on how needed this
> > function is?
>
> I'll give you just one example. There are two ways I know to generate
> normally distributed random numbers. One of them goes like this:
>
> sqrt(-2 * ln(randnz()) * cos(pi * random())
>
> where random() is in [0,1) but randnz() must be in (0,1).
>
> OK, I'll give you another example. This is part of an algorithm for
> generating gamma variates, one of the best known.
>
> U <- random()
> if U <= r then
> z <- -ln(U/r)
> else
> z <- ln(random()/lambda)
> end
>
> You will notice that both of the calls to ln will go wrong if
> random() can return 0.
>
> These aren't the only examples, but I have an appointment.

Thank you!

Should I make a pull request of this?

https://github.com/erlang/otp/compare/OTP-20.0...RaimoNiskanen:raimo/stdlib/rand-uniformNZ

Is the name uniformNZ good enough?
Are uniform floats complete enough with this addition?

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

Loïc Hoguin

Aug 31, 2017, 9:42:26 AM
to zxq9, erlang-q...@erlang.org
On 08/30/2017 12:33 PM, zxq9 wrote:
> On 2017年08月30日 水曜日 10:29:12 Raimo Niskanen wrote:
>
>> It is in general safer to change the documentation to match the reality.
>
> Wow.

I certainly hope this is not the general policy for OTP. We program
against the documentation. The documentation *is* our reality.

It also seems it's not even listed in the release notes. We program
against the documentation; if the documentation has a breaking change,
it would be great to know about it.

--
Loïc Hoguin
https://ninenines.eu

Jesper Louis Andersen

Aug 31, 2017, 10:21:18 AM
to Loïc Hoguin, zxq9, erlang-q...@erlang.org
On Thu, Aug 31, 2017 at 3:42 PM Loïc Hoguin <es...@ninenines.eu> wrote:

I certainly hope this is not the general policy for OTP. We program
against the documentation. The documentation *is* our reality.


I think it is fair to evaluate on a case by case basis. Sometimes the documentation and the implementation do not match up. This means either the documentation or the implementation is wrong (not xor here!). Which of the two is wrong depends a bit on the case, and there are definitely borderline situations where it is very hard to determine which way you should let the thing fall.

I don't think you can make blanket statements on which way you should lean because there are good counterexamples in both "camps" so to speak.

Another view is that the documentation is the specification. But again, both the specification and the implementation can be wrong, and sometimes the correct operation is to change the specification. When I worked with formal semantics, it was quite common that you altered the specification in ways that let you prove a meta-theoretic property about the specification. Not altering it would simply make the proof way too complicated and hard. Perhaps even impossible. It is an extreme variant of letting the documentation match the reality, in a certain sense.

Loïc Hoguin

Aug 31, 2017, 10:57:35 AM
to Jesper Louis Andersen, zxq9, erlang-q...@erlang.org
On 08/31/2017 04:20 PM, Jesper Louis Andersen wrote:
> On Thu, Aug 31, 2017 at 3:42 PM Loïc Hoguin <es...@ninenines.eu
> <mailto:es...@ninenines.eu>> wrote:
>
>
> I certainly hope this is not the general policy for OTP. We program
> against the documentation. The documentation *is* our reality.
>
>
> I think it is fair to evaluate on a case by case basis.

I'm not arguing against that, but rather against the claim that it's "in
general safer to change the documentation".

It should be a case by case basis but it's also important to recognize
that users write software against the documentation, and to take this
into account when making breaking changes.

To give an example, if such a thing were to happen in Cowboy, which
follows semver, two cases could happen:

* The documentation is wrong but the next version is a patch release:
fix the code to match the documentation. The rule is: don't break
people's programs.

* The documentation is wrong but the next version is a major release:
fix the documentation AND announce it as a breaking change (with all
details; and probably release a patch release for the old version as
above). The rule is: breaking people's programs is OK, just make sure
you tell them about it!

> I don't think you can make blanket statements on which way you should
> lean because there are good counterexamples in both "camps" so to speak.

Properly matching people's expectations is a lot more important than
whatever counterexamples may exist.

Attila Rajmund Nohl

Aug 31, 2017, 11:13:45 AM
to Loïc Hoguin, erlang-q...@erlang.org
2017-08-31 15:42 GMT+02:00 Loïc Hoguin <es...@ninenines.eu>:
> On 08/30/2017 12:33 PM, zxq9 wrote:
>>
>> On 2017年08月30日 水曜日 10:29:12 Raimo Niskanen wrote:
>>
>>> It is in general safer to change the documentation to match the reality.
>>
>>
>> Wow.
>
>
> I certainly hope this is not the general policy for OTP. We program against
> the documentation. The documentation *is* our reality.

I disagree. Take this example:
https://lwn.net/SubscriberLink/732420/9b9f8f2825f1877f/ The printk()
function in the Linux kernel was documented to print new logs on new
lines unless the KERN_CONT option was passed. In reality it didn't
always start new lines and people expected (maybe even relied on)
this - and when the code was updated to match the documentation, they
were genuinely surprised when their code was broken.

zxq9

Aug 31, 2017, 11:20:05 AM
to erlang-q...@erlang.org

The other route is to make existing functions do what they say they are going to whenever possible, add functions that provide the prescribed functionality, and deprecate and annotate (with warnings where appropriate) the ones that cannot provide whatever they originally claimed to. And be quite noisy about all of this.

OTP has many, many examples of this. It prevents surprise breakage of old code that depends on some particular (and occasionally peculiar) behavior while forging a path ahead -- allowing users to make an informed decision to review and update old code or stick with an older version of the runtime (which tends to be the more costly choice in many cases, but at least it can be an informed decision).

Consider what happened with now/0, for example. Now we have a more complex family of time functions but never was it viewed as an acceptable approach to simply shift documentation around a bit here and there in a rather quiet manner while adding in contextual execution features (that is to say, hidden states) that would cause now/0 to behave in a new way. And now/0 is deprecated.

> I think it is fair to evaluate on a case by case basis.

OK. I'll buy that.

In an EXTREMELY limited number of cases you will have a function that simply cannot live up to its spec without a ridiculous amount of nitpicky work that wouldn't really matter to anyone. This is not one of those cases. And in this case we are talking about providing a largely pure API in the standard library, not some meta behavior that acts indirectly through a rewrite system based on some proofing mechanics where the effects of improper definitions are magnified with each transformation.

So I get what you're saying, but this is not one of those cases, and for those odd cases it is much safer to deprecate functions, mark them as unsafe, provide compiler warnings and so on if the situation is just THAT BAD, and write a new function that is properly documented in a way that won't suddenly change later. For a functional language's standard library the majority of functions aren't going to be magically tricky, and specs are concrete promises while implementations are ephemeral.

At least this change happened in a major release, not a minor one. If it is forgivable anywhere, it is in a major release. The tricky bit is that the promises a language's standard libs make to authors are a bit more sticky than those made by separate libraries provided in a given language. And yes, that is at least as much part of the social contract inherent in the human part of the programming world as it is a part of the technical contract implicit in published documentation. The social part of the contract is more important, from what I've seen. Consider why Ruby and many previously popular JS frameworks are considered to be cancer now -- it's not just that things changed, it is that the way they changed jerked people around.

The issue I am addressing is a LOT more important than whether `0 =< X < 1.0`, of course (yeah, on this one issue, we'll figure it out). It is a general attitude that is absolutely dangerous.

>> It is in general safer to change the documentation to match the reality.

This is as corrosive a statement as can be. We need to think very carefully about that before this sort of thinking starts becoming common in other areas of OTP in general.

-Craig

Loïc Hoguin

Aug 31, 2017, 11:27:28 AM
to Attila Rajmund Nohl, erlang-q...@erlang.org
On 08/31/2017 05:13 PM, Attila Rajmund Nohl wrote:
> 2017-08-31 15:42 GMT+02:00 Loïc Hoguin <es...@ninenines.eu>:
>> On 08/30/2017 12:33 PM, zxq9 wrote:
>>>
>>> On 2017年08月30日 水曜日 10:29:12 Raimo Niskanen wrote:
>>>
>>>> It is in general safer to change the documentation to match the reality.
>>>
>>>
>>> Wow.
>>
>>
>> I certainly hope this is not the general policy for OTP. We program against
>> the documentation. The documentation *is* our reality.
>
> I disagree. Take this example:
> https://lwn.net/SubscriberLink/732420/9b9f8f2825f1877f/ The printk()
> function in the Linux kernel was documented to print new logs to new
> lines unless the KERN_CONT option was passed. In reality it didn't
> always started new lines and people expected (maybe even relied on)
> this - and when the code was updated to match the documentation, they
> were genuinely surprised when their code was broken.

This story is not about people following the documentation and then have
the documentation be "fixed" under their feet without them noticing, it
is in fact the complete opposite.

--
Loïc Hoguin
https://ninenines.eu

zxq9

Aug 31, 2017, 11:29:05 AM
to erlang-q...@erlang.org
On 2017年09月01日 金曜日 00:19:26 zxq9 wrote:
> `0 =< X < 1.0`

DHOH! Yes, I know, I know...

0.0 =< X < 1.0

/(>.<)\

Attila Rajmund Nohl

Aug 31, 2017, 11:32:54 AM
to Loïc Hoguin, erlang-q...@erlang.org
2017-08-31 17:27 GMT+02:00 Loïc Hoguin <es...@ninenines.eu>:
> On 08/31/2017 05:13 PM, Attila Rajmund Nohl wrote:
>>
>> 2017-08-31 15:42 GMT+02:00 Loïc Hoguin <es...@ninenines.eu>:
[...]

>>> I certainly hope this is not the general policy for OTP. We program
>>> against
>>> the documentation. The documentation *is* our reality.
>>
>>
>> I disagree. Take this example:
>> https://lwn.net/SubscriberLink/732420/9b9f8f2825f1877f/ The printk()
>> function in the Linux kernel was documented to print new logs to new
>> lines unless the KERN_CONT option was passed. In reality it didn't
>> always started new lines and people expected (maybe even relied on)
>> this - and when the code was updated to match the documentation, they
>> were genuinely surprised when their code was broken.
>
>
> This story is not about people following the documentation and then have the
> documentation be "fixed" under their feet without them noticing, it is in
> fact the complete opposite.

The moral of the story: people are programming against the
behavior/implementation, not the documentation. In these cases, fixing the
implementation instead of the documentation has a very real possibility
of breaking existing programs. Of course, one can tell the users that
"it's your fault you haven't followed the documentation!", but it
doesn't necessarily make those users happy...

Loïc Hoguin

Aug 31, 2017, 11:35:54 AM
to Attila Rajmund Nohl, erlang-q...@erlang.org
On 08/31/2017 05:32 PM, Attila Rajmund Nohl wrote:
> 2017-08-31 17:27 GMT+02:00 Loïc Hoguin <es...@ninenines.eu>:
>> On 08/31/2017 05:13 PM, Attila Rajmund Nohl wrote:
>>>
>>> 2017-08-31 15:42 GMT+02:00 Loïc Hoguin <es...@ninenines.eu>:
> [...]
>>>> I certainly hope this is not the general policy for OTP. We program
>>>> against
>>>> the documentation. The documentation *is* our reality.
>>>
>>>
>>> I disagree. Take this example:
>>> https://lwn.net/SubscriberLink/732420/9b9f8f2825f1877f/ The printk()
>>> function in the Linux kernel was documented to print new logs to new
>>> lines unless the KERN_CONT option was passed. In reality it didn't
>>> always started new lines and people expected (maybe even relied on)
>>> this - and when the code was updated to match the documentation, they
>>> were genuinely surprised when their code was broken.
>>
>>
>> This story is not about people following the documentation and then have the
>> documentation be "fixed" under their feet without them noticing, it is in
>> fact the complete opposite.
>
> The moral of the story: people are programming against
> behavior/implementation, not documentation. In these cases fixing the
> implementation instead of the documentation has very real possibility
> of breaking existing programs. Of course, one can tell its users that
> "it's your fault you haven't followed the documentation!" but it
> doesn't necessarily make those users happy...

Maybe in the Linux kernel. Outside, where there is such a thing as
documentation (comments are not documentation), if the code behaves
differently than the documentation, you open a ticket... And in that
case, yes, for a limited time, you will program against the behavior and
not against the documentation. But it's the exception, not the rule.

--
Loïc Hoguin
https://ninenines.eu

zxq9

Aug 31, 2017, 11:53:05 AM
to erlang-q...@erlang.org
On Thursday, 2017-08-31 17:32:44, Attila Rajmund Nohl wrote:
> 2017-08-31 17:27 GMT+02:00 Loïc Hoguin <es...@ninenines.eu>:
> > On 08/31/2017 05:13 PM, Attila Rajmund Nohl wrote:
> >>
> >> 2017-08-31 15:42 GMT+02:00 Loïc Hoguin <es...@ninenines.eu>:
> [...]
> >>> I certainly hope this is not the general policy for OTP. We program
> >>> against
> >>> the documentation. The documentation *is* our reality.
> >>
> >>
> >> I disagree. Take this example:
> >> https://lwn.net/SubscriberLink/732420/9b9f8f2825f1877f/ The printk()
> >> function in the Linux kernel was documented to print new logs to new
> >> lines unless the KERN_CONT option was passed. In reality it didn't
> >> always started new lines and people expected (maybe even relied on)
> >> this - and when the code was updated to match the documentation, they
> >> were genuinely surprised when their code was broken.
> >
> >
> > This story is not about people following the documentation and then have the
> > documentation be "fixed" under their feet without them noticing, it is in
> > fact the complete opposite.
>
> The moral of the story: people are programming against
> behavior/implementation, not documentation. In these cases fixing the
> implementation instead of the documentation has very real possibility
> of breaking existing programs. Of course, one can tell its users that
> "it's your fault you haven't followed the documentation!" but it
> doesn't necessarily make those users happy...

There was once a boy who always rode his bike on the right side of the streets in his neighborhood. Sure, the signs all said "keep left" but, well, everyone just ignores the signs where he lives.

One day a new sign was in its place that said "keep right".

Now what should he do?

-Craig

zxq9

Aug 31, 2017, 11:58:24 AM
to erlang-q...@erlang.org
On Thursday, 2017-08-31 17:32:44, you wrote:
> The moral of the story: people are programming against
> behavior/implementation, not documentation. In these cases fixing the
> implementation instead of the documentation has very real possibility
> of breaking existing programs. Of course, one can tell its users that
> "it's your fault you haven't followed the documentation!" but it
> doesn't necessarily make those users happy...

There was once a boy who always rode his bike on the right side of the streets in his neighborhood. Sure, the signs all said "keep left" but, well, everyone just ignores the signs where he lives.

One day a new sign was in its place that said "keep right".

Now what should he do?

-Craig

Michael Truog

Aug 31, 2017, 2:35:37 PM
to erlang-q...@erlang.org
As I argued in the original pull request for these recent 20.0 random number changes, a uniform distribution is much more intuitive if it is inclusive: [0,1]

For example, if you are dealing with probabilities, it is simpler to think in percentages from 0.00 to 1.00

An example from the python documentation is at https://docs.python.org/3/library/random.html#random.uniform though it is ambiguous about whether the highest value is included, due to a rounding problem they have.

I have had my own dependency for uniform as [0,1] at https://github.com/okeuday/quickrand/blob/fc5e21ec70ee94dd4ce1c5ee02b55ceea03f9008/src/quickrand.erl#L294-L309 so I have been working around this absence. Though I would assume other people would benefit from the addition of a [0,1] function in Erlang/OTP.
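A closed-interval uniform can be sketched on top of rand's integer API; the function name and the 53-bit grid here are my illustration, not OTP or quickrand code:

```erlang
%% Sketch: a uniform draw over the closed interval [0.0, 1.0].
%% rand:uniform(N) returns an integer in 1..N, so with B = 2^53 this
%% yields the grid K/B for K = 0..B, and both endpoints can appear.
uniform_inclusive() ->
    B = 1 bsl 53,
    (rand:uniform(B + 1) - 1) / B.
```

Every grid point is an exact IEEE double (an integer of at most 53 bits divided by a power of two), so including both endpoints costs no precision.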

Best Regards,
Michael

Fred Hebert

Aug 31, 2017, 4:30:53 PM
to Loïc Hoguin, erlang-q...@erlang.org
On 08/31, Loïc Hoguin wrote:
>Maybe in the Linux kernel. Outside, where there is such a thing as
>documentation (comments are not documentation), if the code behaves
>differently than the documentation, you open a ticket... And in that
>case, yes, for a limited time, you will program against the behavior
>and not against the documentation. But it's the exception, not the
>rule.
>

I think 'it depends' truly *is* the best way to go about it. Let's see a
few examples:

- A function does what is in the doc, but also does a bit more at the
same time. Do you fix by removing the additional functionality people
may now rely on, or by updating the doc to match the implementation?
- The documentation specifies that by sending the atom 'tsl1.3' you can
  set up a TLS 1.3 connection, but the implementation only accepts
  'tls1.3' and crashes on 'tsl1.3'. Do you update the documentation
  to fix what was a typo, or do you add support for 'tsl1.3' as a
  parameter? If anybody relied on that behaviour, they relied on the
  code not working!
- A function for socket handling returns values such as `{error, emfile
| enfile | econnrefused}`. Someone finds out that the syscalls it
relays data to also may return `{error, eperm | eaccess}` on top of
what was specified before. Do you swallow the errors and mask them, or
update the docs? Or is it not just a bug in the docs?
- A function's doc says it sorts in a stable manner but it does not.
Which one should you change? There seems to be no winner on this one.
- A function logs information while it operates on data, but the
documentation makes no reference to it. Someone working with it in a
specific environment has issues with that behaviour.*

There's plenty of cases where the doc and the implementation may be
wrong individually, whether as a mistake on either side, by omission, or
through a bug. Usually you have to take a pragmatic approach, asking
which of the fixes will cause the lowest conflict or impact for people
using the code.

Now, is the case of the random behaviour similar to specifying a bad type
boundary, to "not supporting all the errors", to a breach of a
well-established contract, etc.? That's a really hard question, but
saying either "the doc always wins" or "the code always wins"
unequivocally sounds like a really bad policy to me.

Regards,
Fred.

* In adding shell history to Erlang, we got tripped up on disk_log doing
just that. We added a new option to force it to be silent when needed,
for example, so both the code and the doc required a fix!

Loïc Hoguin

Aug 31, 2017, 5:57:25 PM
to Fred Hebert, erlang-q...@erlang.org
On 08/31/2017 10:30 PM, Fred Hebert wrote:
> On 08/31, Loïc Hoguin wrote:
>> Maybe in the Linux kernel. Outside, where there is such a thing as
>> documentation (comments are not documentation), if the code behaves
>> differently than the documentation, you open a ticket... And in that
>> case, yes, for a limited time, you will program against the behavior
>> and not against the documentation. But it's the exception, not the rule.
>>
>
> I think 'it depends' truly *is* the best way to go about it.

Well it's a good thing I agreed with this in an email sent almost 6
hours ago then. :-)

--
Loïc Hoguin
https://ninenines.eu

Richard A. O'Keefe

Aug 31, 2017, 9:11:45 PM
to erlang-q...@erlang.org
Sometimes people program against the documentation.
Sometimes people program against the behaviour.
Sometimes people program against their expectations
and the hell with the docs and the code.

In fact you are never going to please everyone.
My case study:
The Quintus Prolog documentation had always said
"do not add or remove clauses of an interpreted
predicate while it is running; the behaviour is
not defined."
In major release 3, as part of implementing modules,
we defined the behaviour to be "every call acts as if
it made a snapshot of the predicate on entry and used
that snapshot; changes to the predicate will affect
later calls but not earlier ones".
One customer complained that we had broken his code.
We not only pointed out that the manual had warned
against what he was doing, I rewrote his code to
run 700 times faster. (I kid you not.)

Of course we lost that customer.

I wish there was a blanket rule I could recommend.
There is one, of course, and that is "test before you
release". In this particular case, it's quite hard
to test because the bad outcome is extremely rare.
(Although I do have a photocopy of the library of an
old programming language where the Gaussian random
number generator could not possibly have worked,
because 0 turned up once every 2^16 calls.)

There is another rule, which is that changes to the
documentation and changes to the code both need to
be clearly communicated in release notes, and as a
user, it's my job to *read* the release notes.

The irony here, of course, is that Erlang copied as183
from Prolog, and I wrote the Prolog version of the
Wichmann-Hill 3-cycle generator, and it doesn't take
any care to avoid 0.0, and that's one reason why I was
glad to see it replaced...

Richard A. O'Keefe

Aug 31, 2017, 10:58:05 PM
to erlang-q...@erlang.org


On 1/09/17 6:35 AM, Michael Truog wrote:
> As I argued in the original pull request for these recent 20.0 random
> number changes, a uniform distribution is much more intuitive if it is
> inclusive: [0,1]

Intuitive? For integers, I'll grant that. For reals? Not so much.
I certainly cannot grant "MUCH more intuitive".
I've had occasion to do (random next * 2 - 1) arcTanh, which of
course breaks down if *either* 0 or 1 is returned by (random next).

> For example, if you are dealing with probabilities, it is simpler to
> think in percentages from 0.00 to 1.00

Actually, when I'm dealing with probabilities, I never think
about them as percentages. Now the interesting thing here
is this. Suppose you want to get a true [false] outcome
with probability p [1-p]. Then random next < p does the
job perfectly, but ONLY if 1 is excluded.

The trick of generating a random integer from 1 to N by
doing (in C): (int)(random() * N) + 1 can of course give
you N+1 if random() can return 1.0, and this is a thing I very
often do. (Yes, if 0.0 is excluded, the probability of getting
1 is very slightly skewed, but it's _very_ slightly.)
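Both idioms look roughly like this in Erlang (illustrative helper names; rand:uniform/0 in OTP 20 returns a float in [0.0, 1.0), so both depend on 1.0 being excluded):

```erlang
%% True with probability P: exact only because rand:uniform/0 draws
%% from [0.0, 1.0); if 1.0 could appear the test would be biased.
flip(P) ->
    rand:uniform() < P.

%% Integer in 1..N via the float trick: if the generator could
%% return 1.0, trunc(1.0 * N) + 1 would give the out-of-range N+1.
int_up_to(N) ->
    trunc(rand:uniform() * N) + 1.
```

(In Erlang one would normally call rand:uniform(N) directly for integers; the float version is shown only to make the endpoint hazard concrete.)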

> An example from the python documentation is at
> https://docs.python.org/3/library/random.html#random.uniform though they
> have ambiguity about the highest value due to a rounding problem they have.

Oh, the bit where they say "The end-point value b may or may not be
included in the range." Worst of both worlds. You cannot rely on it
being included and you cannot rely on it being excluded.

Let's face it, the usual expectation is that a uniform random number
generator will return a value in the half-open range [0,1).

I have uses for (0.0, 1.0).
Michael Truog has uses for [0.0,1.0], although I wasn't able to tell
from a quick scan of his code what they are.

I could personally live with a warning in the documentation that says
that the random number generator could return 0.0, and here's a little
loop you might use to avoid that, and another suggestion in the code
about how to get the result Michael Truog wants.

I just want it to be obvious that it's dangerous to assume that the
result will not be 0.

By the way, given that a common way to make random floats is to
generate a bitvector, consider
(0 to: 15) collect: [:each | ((each / 15) * 256) truncated].
You will notice that the spacing between the values is *almost*
uniform, but not at the end.
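For readers who don't recognize the syntax, the snippet above is Smalltalk; the same computation transliterated to Erlang:

```erlang
%% The 4-bit grid 0..15 mapped onto 0..256: steps of 17 throughout,
%% except the final step of 18 (238 -> 256).
Grid = [trunc((Each / 15) * 256) || Each <- lists:seq(0, 15)].
%% [0,17,34,51,68,85,102,119,136,153,170,187,204,221,238,256]
```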

Richard A. O'Keefe

Aug 31, 2017, 11:40:14 PM
to erlang-q...@erlang.org


On 1/09/17 8:30 AM, Fred Hebert wrote:
> I think 'it depends' truly *is* the best way to go about it.

+1. The universal expert's answer. (The other is,
"some do, some don't.") I'm going to nitpick gently.

> - A function does what is in the doc, but also does a bit more at the
> same time. Do you fix by removing the additional functionality people
> may now rely on, or by updating the doc to match the implementation?

There is no pressing need to do either.
Perhaps adding a note to the documentation that
"In some releases this function may accept additional
arguments and appear to do sensible things. This is
subject to change."
is all that is required.

> - The documentation specifies that by sending the atom 'tsl1.3' you can
> set up a TLS 1.3 connection, but the implementation only accepts
> 'tsl1.3' and crashes on 'tsl1.3'. Do you not update the documentation
> for what was a typo, or do you add support for 'tsl1.3' as a
> parameter?

You do both, and you add a note to the documentation that
"Earlier releases incorrectly accepted 'tsl1.3' instead of
'tls1.3'. This will be corrected in release N+2."

> - A function for socket handling returns values such as `{error, emfile
> | enfile | econnrefused}`. Someone finds out that the syscalls it
> relays data to also may return `{error, eperm | eaccess}` on top of
> what was specified before. Do you swallow the errors and mask them, or
> update the docs? Or is it not just a bug in the docs?

This one requires serious digging to find out. I've been there,
though not with sockets. Discovering that the C program you wrote
for "UNIX" version x reports an error in "UNIX" version y that
doesn't even exist in version x is a pain. When I was working at
Quintus I spent weeks trawling through Unix manuals carefully
checking the error returns of every system call we used, and tried
to figure out what to do for each. My aim was to classify errors
as - the program should not have done this (EINVAL &c)
- the resource doesn't exist (ENOENT, ECHILD, ...) when it should
or does exist (EEXIST) when it shouldn't
- the resource named does exist but is the wrong kind of resource
(ENOTDIR, EISDIR, ENOTBLK, ...)
- the program doesn't have permission to do this (EPERM, ...)
- the resource is busy (EBUSY, EDEADLK, ...)
- some resource ran out (ENOMEM, ENFILE, EMFILE, EFBIG, ENOSPC,...)
- something went wrong in the system (ENETDOWN, EHOSTDOWN, ...)
I eventually had to stop, but did write a little script using
the C preprocessor to dump all the macros obtained by including
<errno.h> and find all the E[A-Z0-9]+ ones, and check them against
the list of known ones. It was a lot of work that I hoped POSIX
would render unnecessary in the future. It didn't.

So this raises the quality-of-documentation issue, "Why did we ever
document this as only raising these specific errors?"

> - A function's doc says it sorts in a stable manner but it does not.
> Which one should you change? There seems to be no winner on this one.

There is a very clear winner: fix the code. When quicksort was
invented, it was known to do more comparisons than merge sort.
Hoare's problem was that he needed to sort a bunch of numbers on
a machine with 256 words of data memory, so that (a) he *really*
couldn't afford extra memory, and (b) comparisons were cheap.
Sedgewick's PhD thesis (which I did read once) did a very thorough
examination of quicksort performance and several variations on it,
BUT assumed that a comparison cost 1/4 as much as a memory reference.
Most comparisons of quicksort vs mergesort I've seen were misleading
because they compared numbers rather than strings or richer data
and because they compared a relatively smart quicksort implementation
against a pretty dumb merge sort. You can, for example,
- do *optimal* sorting for small subfiles
- not only ping-pong between the two arrays but run alternating
passes backwards (this gets a bit more leverage out of the cache)
- do 4-way merge instead of 2-way merge (does as many comparisons
but is nicer to the memory subsystem)
and you can use a variant of the natural merge like samsort to
exploit existing order.

If you want an efficient algorithm with minimal space overhead,
there's a variant of heapsort that gets NlgN + O(N) comparisons
worst case, which beats quicksort.

> - A function logs information while it operates on data, but the
> documentation makes no reference to it. Someone working with it in a
> specific environment has issues with that behaviour.*

The documentation and the code both have to be changed.
The logging has to be made conditional, and the documentation
has to mention it.
>
> * In adding shell history to Erlang, we got tripped up on disk_log doing
> just that. We added a new option to force it to be silent when needed,
> for example, so both the code and the doc required a fix!

Ah, so I agreed with you on that one. For the 2nd half of my concurrent
programming course, I have strongly recommended LYSE, so it's nice to
find myself on your side...

Richard A. O'Keefe

Sep 1, 2017, 12:57:38 AM
to erlang-q...@erlang.org
For what it's worth, here is an extract from the R documentation.

runif(n, min = 0, max = 1) # min and max have defaults
...
'runif' will not generate either of the extreme values unless
'max = min' or 'max-min' is small compared to 'min',
and in particular not for the default arguments.

Michael Truog

Sep 1, 2017, 1:29:46 AM
to Richard A. O'Keefe, erlang-q...@erlang.org
On 08/31/2017 07:57 PM, Richard A. O'Keefe wrote:
>
> On 1/09/17 6:35 AM, Michael Truog wrote:
>> As I argued in the original pull request for these recent 20.0 random
>> number changes, a uniform distribution is much more intuitive if it is
>> inclusive: [0,1]
>
> Intuitive? For integers, I'll grant that. For reals? Not so much.
> I certainly cannot grant "MUCH more intuitive".
> I've had occasion to do (random next * 2 - 1) arcTanh, which of
> course breaks down if *either* 0 or 1 is returned by (random next).

A uniform distribution should be uniformly distributed. I understand that the woes of floating point prevent a perfectly uniform distribution, but we could at least try to pay attention to the limits involved; if we did, that would make the idea much more intuitive.

My belief is that the [0,1) distribution is the most common because it is the easiest to implement given the IEEE floating point standard format. However, I would also like to be proven wrong, to have more faith in the current situation.
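The construction this refers to can be sketched as follows (my illustration of the common approach, not the actual OTP implementation):

```erlang
%% Why [0.0, 1.0) is the cheap interval for IEEE doubles: 53 random
%% bits scaled by 2^-53 land exactly on the representable grid
%% K / 2^53 for K = 0 .. 2^53 - 1, and 1.0 can never be produced.
uniform_53() ->
    K = rand:uniform(1 bsl 53) - 1,   % K in 0 .. 2^53 - 1
    K / (1 bsl 53).
```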

>> For example, if you are dealing with probabilities, it is simpler to
>> think in percentages from 0.00 to 1.00
>
> Actually, when I'm dealing with probabilities, I never think
> about them as percentages. Now the interesting thing here
> is this. Suppose you want to get a true [false] outcome
> with probability p [1-p]. Then random next < p does the
> job perfectly, but ONLY if 1 is excluded.

I see this as much simpler when it is possible to have random =< p , not that it matters much in this context, only when things get more complex.

>
>
> The trick of generating a random integer from 1 to N by
> doing (in C): (int)(random() * N) + 1 can of course give
> you N+1 if random() can return 1.0, and this is a thing I very
> often do. (Yes, if 0.0 is excluded, the probability of getting
> 1 is very slightly skewed, but it's _very_ slightly.)
>
>> An example from the python documentation is at
>> https://docs.python.org/3/library/random.html#random.uniform though they
>> have ambiguity about the highest value due to a rounding problem they have.
>
> Oh, the bit where they say "The end-point value b may or may not be
> included in the range." Worst of both worlds. You cannot rely on it
> being included and you cannot rely on it being excluded.
>
> Let's face it, the usual expectation is that a uniform random number
> generator will return a value in the half-open range [0,1).
>
> I have uses for (0.0, 1.0).
> Michael Truog has uses for [0.0,1.0], although I wasn't able to tell
> from a quick scan of his code what they are.

I have some examples that can make this desire a bit clearer:

https://github.com/CloudI/cloudi_core/blob/a1c10a02245f0f4284d701a2ee5f07aad17f6e51/src/cloudi_core_i_runtime_testing.erl#L139-L149

% use Box-Muller transformation to generate Gaussian noise
% (G. E. P. Box and Mervin E. Muller,
% A Note on the Generation of Random Normal Deviates,
% The Annals of Mathematical Statistics (1958),
% Vol. 29, No. 2 pp. 610–611)
X1 = random(),
X2 = PI2 * random(),
K = StdDev * math:sqrt(-2.0 * math:log(X1)),
Result1 = erlang:max(erlang:round(Mean + K * math:cos(X2)), 1),
Result2 = erlang:max(erlang:round(Mean + K * math:sin(X2)), 1),
sleep(Result2),


https://github.com/CloudI/cloudi_core/blob/a1c10a02245f0f4284d701a2ee5f07aad17f6e51/src/cloudi_core_i_runtime_testing.erl#L204-L210

X = random(),
if
X =< Percent ->
erlang:exit(monkey_chaos);
true ->
ok
end,

with:
random() ->
quickrand:strong_float().

These are code segments used for the CloudI service configuration options monkey_latency and monkey_chaos, so that normal-distribution latency values and random service deaths can occur, respectively. (The more common names are Latency Monkey and Chaos Monkey; the words are switched to make the concepts easier to find and associate.) For the Box-Muller transformation, it really does want a definite range [0,1], and the [0,1] range also makes the monkey_chaos service death easier to understand at a glance.


>
> I could personally live with a warning in the documentation that says
> that the random number generator could return 0.0, and here's a little
> loop you might use to avoid that, and another suggestion in the code
> about how to get the result Michael Truog wants.
>
> I just want it to be obvious that it's dangerous to assume that the
> result will not be 0.
>
> By the way, given that a common way to make random floats is to
> generate a bitvector, consider
> (0 to: 15) collect: [:each | ((each / 15) * 256) truncated].
> You will notice that the spacing between the values is *almost*
> uniform, but not at the end.
>
I agree, but I still think the use of the word uniform here is better suited to the extremes. We know it is IEEE floating-point, so we know it is inexact.

Raimo Niskanen

Sep 1, 2017, 3:58:30 AM
to erlang-q...@erlang.org
On Thu, Aug 31, 2017 at 03:42:06PM +0200, Loïc Hoguin wrote:
> On 08/30/2017 12:33 PM, zxq9 wrote:
> > On Wednesday, 2017-08-30 10:29:12, Raimo Niskanen wrote:
> >
> >> It is in general safer to change the documentation to match the reality.
> >
> > Wow.
>
> I certainly hope this is not the general policy for OTP. We program
> against the documentation. The documentation *is* our reality.
>
> It also seems it's not even listed in the release notes. We program
> against the documentation, if the documentation has a breaking changes
> it would be great to know about it.

I had no idea that statement would be so flammable. :-)

I simply wanted to point out that from the point of view of a developer of
a mature product like Erlang/OTP, it has happened too many times that
a subtle behaviour change breaks something for a customer.

And that is something that programmers writing new code often do not
appreciate since they simply want the libraries to be "right" where it is
a very reasonable view that the documentation defines what is "right".

I also realize that in this particular case to stop returning 0.0 from
rand:uniform() would also have been a safe choice since that would be
almost impossible to detect and almost certainly cause no harm.

And no, I did not state an OTP policy. We decide from case to case.

>
> --
> Loïc Hoguin
> https://ninenines.eu

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

Raimo Niskanen

Sep 1, 2017, 4:41:28 AM
to erlang-q...@erlang.org
On Thu, Aug 31, 2017 at 10:29:34PM -0700, Michael Truog wrote:
> On 08/31/2017 07:57 PM, Richard A. O'Keefe wrote:
> >
> > On 1/09/17 6:35 AM, Michael Truog wrote:
> >> As I argued in the original pull request for these recent 20.0 random
> >> number changes, a uniform distribution is much more intuitive if it is
> >> inclusive: [0,1]
> >
> > Intuitive? For integers, I'll grant that. For reals? Not so much.
> > I certainly cannot grant "MUCH more intuitive".
> > I've had occasion to do (random next * 2 - 1) arcTanh, which of
> > course breaks down if *either* 0 or 1 is returned by (random next).
>
> A uniform distribution should be uniformly distributed. I understand the woes of floating-point prevent perfect uniform distribution, but we could at least try to pay attention to the limits involved, and if we did, that would make the idea much more intuitive.

If I try to be philosophical, picking a random number in the range
0.0 to 1.0 of real numbers, the probability of getting a number exactly 0.0
(or exactly 1.0) is infinitely low. Therefore the range (0.0,1.0) is more
natural.

>
> My belief is that the [0,1) distribution is the most common because it is the easiest to implement given the IEEE floating point standard format. However, I would also like to be proven wrong, to have more faith in the current situation.

I think that is very possible.

We can not forget the fact that digital floating point numbers will always
be some kind of integer values in disguise.

:


>
> I have some examples that can make this desire a bit clearer:
>
> https://github.com/CloudI/cloudi_core/blob/a1c10a02245f0f4284d701a2ee5f07aad17f6e51/src/cloudi_core_i_runtime_testing.erl#L139-L149
>
> % use Box-Muller transformation to generate Gaussian noise
> % (G. E. P. Box and Mervin E. Muller,
> % A Note on the Generation of Random Normal Deviates,
> % The Annals of Mathematical Statistics (1958),
> % Vol. 29, No. 2 pp. 610–611)
> X1 = random(),
> X2 = PI2 * random(),
> K = StdDev * math:sqrt(-2.0 * math:log(X1)),

math:log(X1) will badarith if X1 =:= 0.0. You need a generator for X1
that does not return 0.0, just as RO'K says.

> Result1 = erlang:max(erlang:round(Mean + K * math:cos(X2)), 1),
> Result2 = erlang:max(erlang:round(Mean + K * math:sin(X2)), 1),

If random() for X2 is in [0.0,1.0] then both 0.0 and 1.0 will produce the
same value after math:cos(X2) or math:sin(X2), which I am convinced will
bias the result since that particular value will have twice the probability
compared to all other values. I think you should use a generator for X2
that only can return one of the endpoints.

Actually, it seems a generator for (0.0,1.0] would be more appropriate
here...
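One way to get a (0.0, 1.0] draw out of a [0.0, 1.0) generator is simple reflection (an illustrative helper, not a rand API):

```erlang
%% Reflecting a [0.0, 1.0) draw gives (0.0, 1.0]: safe to feed into
%% math:log/1, which raises badarith on 0.0.
open_closed() ->
    1.0 - rand:uniform().
```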

> sleep(Result2),
>
>
> https://github.com/CloudI/cloudi_core/blob/a1c10a02245f0f4284d701a2ee5f07aad17f6e51/src/cloudi_core_i_runtime_testing.erl#L204-L210
>
> X = random(),
> if
> X =< Percent ->
> erlang:exit(monkey_chaos);
> true ->
> ok
> end,

In this kind of code, I think that (thinking in integers, since we are
talking about integers in disguise) half-open intervals are more correct.

The interval [0.0,0.1] contains, say, N+1 numbers, and the interval [0.0,0.2]
contains 2*N+1 numbers, so subtracting the first interval from the second
gives the interval (0.1,0.2], which has N numbers. So you get a bias
because you include both endpoints.

In this case I believe more in a generator that gives [0.0,1.0) and the
test X < Percent, since that is what I would have written using integers to
avoid off-by-one errors.
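The half-open convention composes without overlap or gaps; for example (illustrative code, not from the thread):

```erlang
%% With X drawn from [0.0, 1.0), consecutive strict upper bounds
%% partition the interval exactly: each band is [Lo, Hi), and no
%% value can fall into two bands or into none.
classify(X) when X < 0.10 -> kill;    % first 10% of the interval
classify(X) when X < 0.30 -> delay;   % next 20%
classify(_X)              -> ok.      % remaining 70%
```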

>
> with:
> random() ->
> quickrand:strong_float().
>
> These are code segments used for the CloudI service configuration options monkey_latency and monkey_chaos so that normal distribution latency values and random service deaths can occur, respectively (with the more common names as Latency Monkey and Chaos Monkey, but the words switched to make the concepts easier to find and associate). For the Box-Muller transformation, it really does want a definite range [0,1] and it helps make the monkey_chaos service death easier to understand at a glance.

Please explain why the Box-Muller transformation needs a definite range
[0.0,1.0].

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

Raimo Niskanen

Sep 1, 2017, 4:50:06 AM
to erlang-q...@erlang.org
On Fri, Sep 01, 2017 at 02:57:49PM +1200, Richard A. O'Keefe wrote:
:
>
> I could personally live with a warning in the documentation that says
> that the random number generator could return 0.0, and here's a little
> loop you might use to avoid that, and another suggestion in the code
> about how to get the result Michael Truog wants.
>
> I just want it to be obvious that it's dangerous to assume that the
> result will not be 0.

That can surely merit a warning, even though the interval is documented.

>
> By the way, given that a common way to make random floats is to
> generate a bitvector, consider
> (0 to: 15) collect: [:each | ((each / 15) * 256) truncated].
> You will notice that the spacing between the values is *almost*
> uniform, but not at the end.

That sounds interesting but I do not understand. Is that Elixir code?

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

Raimo Niskanen

Sep 1, 2017, 4:54:47 AM
to erlang-q...@erlang.org
On Thu, Aug 31, 2017 at 10:29:34PM -0700, Michael Truog wrote:
:

>
> I have some examples that can make this desire a bit clearer:
>
> https://github.com/CloudI/cloudi_core/blob/a1c10a02245f0f4284d701a2ee5f07aad17f6e51/src/cloudi_core_i_runtime_testing.erl#L139-L149
>
> % use Box-Muller transformation to generate Gaussian noise
> % (G. E. P. Box and Mervin E. Muller,
> % A Note on the Generation of Random Normal Deviates,
> % The Annals of Mathematical Statistics (1958),
> % Vol. 29, No. 2 pp. 610–611)
> X1 = random(),
> X2 = PI2 * random(),
> K = StdDev * math:sqrt(-2.0 * math:log(X1)),
> Result1 = erlang:max(erlang:round(Mean + K * math:cos(X2)), 1),
> Result2 = erlang:max(erlang:round(Mean + K * math:sin(X2)), 1),
> sleep(Result2),

Why not use rand:normal/3?

It uses the Ziggurat Method and is supposed to be much faster and
numerically more stable than the basic Box-Muller method.
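A hedged sketch of that suggestion (module and function names are illustrative, not from CloudI): rand:normal/0 returns a standard normal deviate, so scaling by StdDev and shifting by Mean replaces the hand-rolled Box-Muller pair.

```erlang
-module(latency).
-export([normal_delay/2]).

%% One normally distributed delay, clamped to a minimum of 1,
%% mirroring the erlang:max(erlang:round(...), 1) pattern in the
%% quoted CloudI code but using the rand module's generator.
normal_delay(Mean, StdDev) ->
    erlang:max(erlang:round(Mean + StdDev * rand:normal()), 1).
```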

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

Hugo Mills

unread,
Sep 1, 2017, 5:36:48 AM9/1/17
to erlang-q...@erlang.org
On Fri, Sep 01, 2017 at 10:41:15AM +0200, Raimo Niskanen wrote:
> On Thu, Aug 31, 2017 at 10:29:34PM -0700, Michael Truog wrote:
> > On 08/31/2017 07:57 PM, Richard A. O'Keefe wrote:
> > >
> > > On 1/09/17 6:35 AM, Michael Truog wrote:
> > >> As I argued in the original pull request for these recent 20.0 random
> > >> number changes, a uniform distribution is much more intuitive if it is
> > >> inclusive: [0,1]
> > >
> > > Intuitive? For integers, I'll grant that. For reals? Not so much.
> > > I certainly cannot grant "MUCH more intuitive".
> > > I've had occasion to do (random next * 2 - 1) arcTanh, which of
> > > course breaks down if *either* 0 or 1 is returned by (random next).
> >
> > A uniform distribution should be uniformly distributed. I understand the woes of floating-point prevent perfect uniform distribution, but we could at least try to pay attention to the limits involved, and if we did, that would make the idea much more intuitive.
>
> If I try to be philosophical, picking a random number in the range
> 0.0 to 1.0 of real numbers, the probability of getting a number exactly 0.0
> (or exactly 1.0) is infinitely low. Therefore the range (0.0,1.0) is more
> natural.

Mathematically, there's a distinction. What you've just described
is that in a random variable over the interval [0.0, 1.0], 0.0 and 1.0
happen *almost never* (which is a very specific technical term), and
that values in the open interval (0.0, 1.0) occur *almost surely*.

Being discrete, the computer implementation based on floating point
numbers ensures that the probability of getting 0.0 or 1.0 in that
case is measurably non-zero, whereas in the ideal version over the
reals, above, it is infinitesimally small. In that distinction lie
most of the problems that people are talking about here, I think.

> > My belief is that the [0,1) distribution is the most common
> > because it is the easiest to implement given the IEEE floating
> > point standard format. However, I would also like to be proven
> > wrong, to have more faith in the current situation.

> I think that is very possible.

From my relatively limited practical experience, either I've wanted
[0, 1) or I don't care. Example:

Red = int(random() * 256)

where I don't want the value 256, because it's out of range for my
8-bit graphics mode, but I do want the probability of 255 to be the
same as every other value. So I want [0, 1) as my range.

Alternatively:

P = random(),
if
P =< 0.3 -> ...;
P =< 0.7 -> ...;
P > 0.7 -> ...
end

where, in general, I don't care if I could get 0.0 or 1.0 or not,
because the differences are immeasurably small for all practical
purposes.

I think it's clear to me that _several_ functions are needed for
different cases, with fully-closed, fully-open and half-open
intervals. IMO, the fully-closed and half-open are probably the most
useful (and, modulo any floating-point issues which I'm not qualified
to talk about, [0,1) can be turned into (0,1] with
1-random_halfopen()).

With a fully-closed interval, it should be possible to write
helpers for generating the other three by simply calling
random_closed() again if you get an undesirable end-point. You can't
easily extend the range of the half-open or open intervals to give you
the closed ones. So I'd say at minimum, there should be a function
giving the closed interval.
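The derivations described above can be sketched as hypothetical helpers (names invented for illustration; the closed-interval generator is passed in as a fun):

```erlang
-module(intervals).
-export([to_halfopen/1, to_open/1, reflect/1]).

%% Given a fun Gen over the closed interval [0.0, 1.0], derive the
%% half-open and open intervals by test-and-retry, and turn [0,1)
%% into (0,1] by reflection (1.0 - X is exact for these values).
to_halfopen(Gen) ->                 % [0,1] -> [0,1)
    case Gen() of
        1.0 -> to_halfopen(Gen);
        X -> X
    end.

to_open(Gen) ->                     % [0,1] -> (0,1)
    case Gen() of
        X when X == 0.0; X == 1.0 -> to_open(Gen);
        X -> X
    end.

reflect(Gen) ->                     % [0,1) -> (0,1]
    1.0 - Gen().
```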

Whether the "test and retry" approach is the best implementation
or not is a matter for discussion, as is the question of whether all
or some of these functions should be in the standard lib, or they are
expected to be hacked together on an as-needed basis.

Hugo.

--
Hugo Mills | Anyone who says their system is completely secure
hugo@... carfax.org.uk | understands neither systems nor security.
http://carfax.org.uk/ |
PGP: E2AB1DE4 | Bruce Schneier

Michael Truog

unread,
Sep 1, 2017, 6:53:52 AM9/1/17
to erlang-q...@erlang.org
On 09/01/2017 01:41 AM, Raimo Niskanen wrote:
>> I have some examples that can make this desire a bit clearer:
>>
>> https://github.com/CloudI/cloudi_core/blob/a1c10a02245f0f4284d701a2ee5f07aad17f6e51/src/cloudi_core_i_runtime_testing.erl#L139-L149
>>
>> % use Box-Muller transformation to generate Gaussian noise
>> % (G. E. P. Box and Mervin E. Muller,
>> % A Note on the Generation of Random Normal Deviates,
>> % The Annals of Mathematical Statistics (1958),
>> % Vol. 29, No. 2 pp. 610–611)
>> X1 = random(),
>> X2 = PI2 * random(),
>> K = StdDev * math:sqrt(-2.0 * math:log(X1)),
> math:log(X1) will badarith if X1 =:= 0.0. You need a generator for X1
> that does not return 0.0, just as RO'K says.
>
>> Result1 = erlang:max(erlang:round(Mean + K * math:cos(X2)), 1),
>> Result2 = erlang:max(erlang:round(Mean + K * math:sin(X2)), 1),
> If random() for X2 is in [0.0,1.0] then both 0.0 and 1.0 will produce the
> same value after math:cos(X2) or math:sin(X2), which I am convinced will
> bias the result since that particular value will have twice the probability
> compared to all other values. I think you should use a generator for X2
> that only can return one of the endpoints.
>
> Actually, it seems a generator for (0.0,1.0] would be more appropriate
> here...
>
>> sleep(Result2),
>>
>> with:
>> random() ->
>> quickrand:strong_float().
>>
>> These are code segments used for the CloudI service configuration options monkey_latency and monkey_chaos so that normal distribution latency values and random service deaths can occur, respectively (with the more common names as Latency Monkey and Chaos Monkey, but the words switched to make the concepts easier to find and associate). For the Box-Muller transformation, it really does want a definite range [0,1] and it helps make the monkey_chaos service death easier to understand at a glance.
> Please explain why the Box-Muller transformation needs a definite range
> [0.0,1.0].

That was my understanding after not having modified that routine for a long time, though I must have been mistaken. I will need to fix this source code, and I regret not seeing these problems in the Box-Muller transformation source code. Thank you for pointing them out. At least this shows the need for a (0.0,1.0] function.

Thanks,
Michael

Michael Truog

unread,
Sep 1, 2017, 7:01:16 AM9/1/17
to erlang-q...@erlang.org
On 09/01/2017 01:54 AM, Raimo Niskanen wrote:
> On Thu, Aug 31, 2017 at 10:29:34PM -0700, Michael Truog wrote:
> :
>> I have some examples that can make this desire a bit clearer:
>>
>> https://github.com/CloudI/cloudi_core/blob/a1c10a02245f0f4284d701a2ee5f07aad17f6e51/src/cloudi_core_i_runtime_testing.erl#L139-L149
>>
>> % use Box-Muller transformation to generate Gaussian noise
>> % (G. E. P. Box and Mervin E. Muller,
>> % A Note on the Generation of Random Normal Deviates,
>> % The Annals of Mathematical Statistics (1958),
>> % Vol. 29, No. 2 pp. 610–611)
>> X1 = random(),
>> X2 = PI2 * random(),
>> K = StdDev * math:sqrt(-2.0 * math:log(X1)),
>> Result1 = erlang:max(erlang:round(Mean + K * math:cos(X2)), 1),
>> Result2 = erlang:max(erlang:round(Mean + K * math:sin(X2)), 1),
>> sleep(Result2),
> Why not use rand:normal/3?
>
> It uses the Ziggurat Method and is supposed to be much faster and
> numerically more stable than the basic Box-Muller method.
>
The Box-Muller is simpler and produces 2 results instead of 1. I believe I looked at the source code for rand:normal/3 and expected the Box-Muller to be faster only because it creates 2 results, though I should check that. I will have to investigate it more.

Raimo Niskanen

unread,
Sep 1, 2017, 8:06:54 AM9/1/17
to erlang-q...@erlang.org

Yes. That is easily produced by (pointed out earlier in this thread):
1.0 - rand:uniform()

>
> Thanks,
> Michael

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

Raimo Niskanen

unread,
Sep 1, 2017, 8:13:55 AM9/1/17
to erlang-q...@erlang.org
On Fri, Sep 01, 2017 at 04:00:59AM -0700, Michael Truog wrote:
> On 09/01/2017 01:54 AM, Raimo Niskanen wrote:
> > On Thu, Aug 31, 2017 at 10:29:34PM -0700, Michael Truog wrote:
> > :
> >> I have some examples that can make this desire a bit clearer:
> >>
> >> https://github.com/CloudI/cloudi_core/blob/a1c10a02245f0f4284d701a2ee5f07aad17f6e51/src/cloudi_core_i_runtime_testing.erl#L139-L149
> >>
> >> % use Box-Muller transformation to generate Gaussian noise
> >> % (G. E. P. Box and Mervin E. Muller,
> >> % A Note on the Generation of Random Normal Deviates,
> >> % The Annals of Mathematical Statistics (1958),
> >> % Vol. 29, No. 2 pp. 610–611)
> >> X1 = random(),
> >> X2 = PI2 * random(),
> >> K = StdDev * math:sqrt(-2.0 * math:log(X1)),
> >> Result1 = erlang:max(erlang:round(Mean + K * math:cos(X2)), 1),
> >> Result2 = erlang:max(erlang:round(Mean + K * math:sin(X2)), 1),
> >> sleep(Result2),
> > Why not use rand:normal/3?
> >
> > It uses the Ziggurat Method and is supposed to be much faster and
> > numerically more stable than the basic Box-Muller method.
> >
> The Box-Muller is simpler and produces 2 results instead of 1. I believe I looked at the source code for rand:normal/3 and expected the Box-Muller to be faster only because it creates 2 results, though I should check that. I will have to investigate it more.

Simpler - yes.

The basic benchmark in rand_SUITE indicates that rand:normal() is only
about 50% slower than rand:uniform(1 bsl 58) (internal word size),
which I think is a very good number.

The Box-Muller transform method needs 4 calls to the 'math' module for
non-trivial floating point functions i.e log(), sqrt(), cos() and sin(),
which is why I think that "must" be slower.

But I have also not measured... :-/

Looking forward to hear your results!
--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

Raimo Niskanen

unread,
Sep 1, 2017, 8:53:48 AM9/1/17
to erlang-q...@erlang.org
Precisely!

>
> > > My belief is that the [0,1) distribution is the most common
> > > because it is the easiest to implement given the IEEE floating
> > > point standard format. However, I would also like to be proven
> > > wrong, to have more faith in the current situation.
>
> > I think that is very possible.
>
> From my relatively limited practical experience, either I've wanted
> [0, 1) or I don't care. Example:
>
> Red = int(random() * 256)
>
> where I don't want the value 256, because it's out of range for my
> 8-bit graphics mode, but I do want the probability of 255 to be the
> same as every other value. So I want [0, 1) as my range.
>
> Alternatively:
>
> P = random(),
> if
> P =< 0.3 -> ...;
> P =< 0.7 -> ...;
> P > 0.7 -> ...
> end
>
> where, in general, I don't care if I could get 0.0 or 1.0 or not,
> because the differences are immeasurably small for all practical
> purposes.

Especially since decimal numbers below 1.0 have no exact representation as
IEEE floating point numbers.

>
> I think it's clear to me that _several_ functions are needed for
> different cases, with fully-closed, fully-open and half-open
> intervals. IMO, the fully-closed and half-open are probably the most
> useful (and, modulo any floating-point issues which I'm not qualified
> to talk about, [0,1) can be turned into (0,1] with
> 1-random_halfopen()).

There should be no issues in this case with the updated algorithms in
the 'rand' module, which produce numbers of the form N * 2^-53, and due to
the fact that IEEE floating point arithmetic is defined to produce
correctly rounded results for the simple operations +, -, *, /, ...

>
> With a fully-closed interval, it should be possible to write
> helpers for generating the other three by simply calling
> random_closed() again if you get an undesirable end-point. You can't
> easily extend the range of the half-open or open intervals to give you
> the closed ones. So I'd say at minimum, there should be a function
> giving the closed interval.

The closed interval [0.0,1.0] may be the most general for the reasons you
mention. Unfortunately it is cumbersome to implement "efficiently".

You would have to generate integers in [0, 2^53] (that is, 2^53 + 1
values), and to do that with a 58 bit generator (the default) you generate
one number and have roughly a 1/32 chance of landing in the high interval
where you need to retry.

The amortized impact of this is only about 3% slower, but the code has to
do all the testing for these corner cases which adds up to maybe 20% slower.

But once in a million times it will take 4 full attempts or more to get a
number, which is rather annoying, and the worst case is infinity (but will
never happen ;-).

Therefore the half-open interval [0.0,1.0) is probably the most useful one,
and the question is whether the closed interval [0.0,1.0] (and the open
interval (0.0,1.0)) are worth implementing...

>
> Whether the "test and retry" approach is the best implementation
> or not is a matter for discussion, as is the question of whether all

I have never heard of a better alternative than "test and retry" when you
are limiting the interval, except possibly when the probability of retry is
high (approaching 50%) - then you may consider generating twice as many
bits and doing a simple 'mod' operation for the result; the skew would be
impossible to notice. The "test and retry" method is still expected to be
as fast or faster, amortized, but the "generate double and 'mod'" method
has more or less fixed execution time.
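A sketch of the test-and-retry technique for an unbiased bounded integer, assuming a 58-bit word source as in the rand module's default generators (module name and structure are illustrative):

```erlang
-module(retry).
-export([uniform_int/1]).

%% Unbiased integer in [0, Range-1]: accept a 58-bit draw only if it
%% falls below the largest multiple of Range that fits in 2^58, so
%% every residue class mod Range is equally likely. Only draws in the
%% truncated top interval are rejected, so the expected number of
%% draws stays just above one.
uniform_int(Range) when is_integer(Range), Range > 0 ->
    Limit = (1 bsl 58) div Range * Range,  % largest multiple of Range =< 2^58
    X = rand:uniform(1 bsl 58) - 1,        % a uniform 58-bit value
    if
        X < Limit -> X rem Range;
        true -> uniform_int(Range)         % retry on the biased tail
    end.
```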

> or some of these functions should be in the standard lib, or they are
> expected to be hacked together on an as-needed basis.

Yes. That is the question.

>
> Hugo.
>
> --
> Hugo Mills | Anyone who says their system is completely secure
> hugo@... carfax.org.uk | understands neither systems nor security.
> http://carfax.org.uk/ |
> PGP: E2AB1DE4 | Bruce Schneier


Raimo Niskanen

unread,
Sep 1, 2017, 10:52:37 AM9/1/17
to erlang-q...@erlang.org
On Thu, Aug 31, 2017 at 02:30:08PM +0200, Raimo Niskanen wrote:
:
> Should I make a pull request of this?
>
> https://github.com/erlang/otp/compare/OTP-20.0...RaimoNiskanen:raimo/stdlib/rand-uniformNZ
>
> Is the name uniformNZ good enough?
> Are uniform floats complete enough with this addition?

I have updated the suggestion above. Comments?

Michael Truog

unread,
Sep 2, 2017, 7:35:54 PM9/2/17
to erlang-q...@erlang.org
I have some interesting results.

These results use https://github.com/okeuday/erlbench which includes a copy of the source code at https://github.com/okeuday/quickrand :

TEST pseudo_randomness
N == 10000 (10 runs)
         18_bxor_abs get:     1612.7 us (  1.3)
18_erlang:system_tim get:     1254.1 us (  1.0)
        18_monotonic get:     1372.5 us (  1.1)
 18_os:system_time/1 get:     1221.7 us (  1.0)
19_os:perf_counter/1 get:     3752.2 us (  3.1)
      20_rand:normal get:     6832.0 us (  5.6)
       20_rand_exrop get:     3949.3 us (  3.2)
    20_rand_exs1024s get:    12073.3 us (  9.9)
        20_rand_exsp get:     3390.4 us (  2.8)
      os:timestamp/0 get:     1392.3 us (  1.1)
os_time:perf_counter get:     4109.4 us (  3.4)
quickrand_c:floatR/0 get:     5776.0 us (  4.7)
quickrand_c:floatR/1 get:     5704.3 us (  4.7)
   quickrand_c:uni/1 get:     4015.2 us (  3.3)
   quickrand_c:uni/2 get:     3960.7 us (  3.2)
quickrand_c_normal/2 get:     9329.5 us (  7.6)
quickrand_c_normal/3 get:     8917.7 us (  7.3)
random_wh06_int:unif get:    10777.5 us (  8.8)
random_wh82:uniform/ get:     4750.0 us (  3.9)
random_wh82_int:unif get:     4866.4 us (  4.0)

The function names that are relevant for a normal distribution are:
      20_rand:normal -> rand:normal/0   (when using rand:seed(exsp, _))
        20_rand_exsp -> rand:uniform/1  (when using rand:seed(exsp, _))
quickrand_c:floatR/0 -> quickrand_cache:floatR/0
quickrand_c:floatR/1 -> quickrand_cache:floatR/1
quickrand_c_normal/2 -> quickrand_cache_normal:box_muller/2
quickrand_c_normal/3 -> quickrand_cache_normal:box_muller/3

The rand module exsp algorithm was used here because it is the fastest pseudo-random number generator in the rand module.

A rough look at the latency associated with the normal distribution method,
ignoring the latency for the random number source, is:
rand:normal/0
  3441.6 us = 6832.0 us - (rand:uniform/1 3390.4 us)
quickrand_cache_normal:box_muller/2
  3553.5 us = 9329.5 us - (quickrand_cache:floatR/0 5776.0 us)
quickrand_cache_normal:box_muller/3
  3213.4 us = 8917.7 us - (quickrand_cache:floatR/1 5704.3 us)

So, this helps to show that the latency with both methods is very similar if you ignore the random number generation. However, it likely requires some explanation: the quickrand_cache module is what I am using here for random number generation, which stores cached data from crypto:strong_rand_bytes/1 with a default size of 64KB for the cache. The difference between the functions quickrand_cache_normal:box_muller/2 and quickrand_cache_normal:box_muller/3 is that the first uses the process dictionary while the second uses a state variable.

Using the large amount of cached random data, the latency associated with individual calls to crypto:strong_rand_bytes/1 is avoided at the cost of the extra memory consumption, and the use of the cache makes the speed of random number generation similar to the speed of pseudo-random number generation that occurs in the rand module.

In CloudI, I instead use quickrand_normal:box_muller/2 to avoid the use of cached data and keep the memory use minimal. The use case there doesn't require avoiding the latency of crypto:strong_rand_bytes/1, because it is adding latency for testing (at https://github.com/CloudI/cloudi_core/blob/299df02e6d22103415c8ba14379e90ca8c3d3b82/src/cloudi_core_i_runtime_testing.erl#L138), and a cryptographic random source keeps the functionality widely applicable. However, the same function calls occur in the quickrand Box-Muller transformation source code, so the overhead is the same.
 
I used Erlang/OTP 20.0 (without HiPE) using the hardware below:
Core i7 2670QM 2.2GHz 1 cpu, 4 cores/cpu, 2 hts/core
L2:4×256KB L3:6MB RAM:8GB:DDR3-1333MHz
Sandy Bridge-HE-4 (Socket G2)

Best Regards,
Michael

Richard A. O'Keefe

unread,
Sep 3, 2017, 8:38:03 PM9/3/17
to erlang-q...@erlang.org


On 1/09/17 8:49 PM, Raimo Niskanen wrote:
>> By the way, given that a common way to make random floats is to
>> generate a bitvector, consider
>> (0 to: 15) collect: [:each | ((each / 15) * 256) truncated].
>> You will notice that the spacing between the values is *almost*
>> uniform, but not at the end.
>
> That sounds interesting but I do not understand. Is that Elixir code?

Nope, Smalltalk. I wanted to use rational arithmetic. In fact I did
not need to. Here it is in Haskell:
> [(x * 256) `div` 15 | x <- [0..15]]
[0,17,34,51,68,85,102,119,136,153,170,187,204,221,238,256]

Let's push that a bit further. Let's generate all possible 10-bit
integers and map them to the range [0..63]. We find again that
the gap sizes are not all the same. They can't be. If you
consider all vectors of N bits and map them to the range
[0..2**M] they *cannot* be uniformly distributed no matter what
method you use because (2**M+1) does not divide 2**N. You can
fix this by rejecting some of the bit vectors, but that would
be asking everyone to pay extra for a result they don't have any
particular need for.
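The same mapping transcribed to Erlang makes the uneven spacing easy to see: every gap between consecutive images is 17, except the last, which is 18.

```erlang
%% All 4-bit patterns scaled onto [0, 256]; integer division plays
%% the role of truncation in the Smalltalk and Haskell versions.
Images = [X * 256 div 15 || X <- lists:seq(0, 15)].
%% Images =:= [0,17,34,51,68,85,102,119,136,153,170,187,204,221,238,256]
```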

Richard A. O'Keefe

unread,
Sep 3, 2017, 9:42:35 PM9/3/17
to erlang-q...@erlang.org


On 2/09/17 12:53 AM, Raimo Niskanen wrote:

>> P = random(),
>> if
>> P =< 0.3 -> ...;
>> P =< 0.7 -> ...;
>> P > 0.7 -> ...
>> end

> Especially since decimal numbers below 1.0 have no exact representation as
> IEEE floating point numbers.

There is in fact an IEEE standard for *decimal* floating point numbers.
If memory serves me correctly, z/Series and POWER support it. There is
a supplement to the C standard for it. Software emulation is available.
See decimal32, decimal64, decimal128 in
https://en.wikipedia.org/wiki/IEEE_754
or the actual IEEE 754-2008 if you have a copy (which I sadly don't).

In any case,
P = 10*rand(),
if P < 3 -> ...
; P < 7 -> ...
; P >= 7 -> ...
end
evades that issue. (With the numbers from drand48() the multiplication
is exact; with 53-bit random numbers it is not.)

Raimo Niskanen

unread,
Sep 4, 2017, 4:48:28 AM9/4/17
to erlang-q...@erlang.org

Thank you for sharing these numbers!

>
> A rough look at the latency associated with the normal distribution method, ignoring the latency for random number source is:
> rand:normal/0
> 3441.6 us = 6832.0 us - (rand:uniform/1 3390.4 us)

Should not the base value come from rand:uniform/0 instead? I know the
difference is not big - rand_SUITE:measure/1 suggests 3% - but it also
suggests that rand:normal/0 is about 50% slower than rand:uniform/0, while
your numbers suggest 100% slower. Slightly strange...

> quickrand_cache_normal:box_muller/2
> 3553.5 us = 9329.5 us - (quickrand_cache:floatR/0 5776.0 us)
> quickrand_cache_normal:box_muller/3
> 3213.4 us = 8917.7 us - (quickrand_cache:floatR/1 5704.3 us)

It is really interesting to see that the calls to the 'math' module
do not slow that algorithm down very much (hardly noticeable)!

>
> So, this helps to show that the latency with both methods is very similar if you ignore the random number generation. However, it likely requires some explanation: The quickrand_cache module is what I am using here for random number generation, which stores cached data from crypto:strong_rand_bytes/1 with a default size of 64KB for the cache. The difference between the functions quickrand_cache_normal:box_muller/2 and quickrand_cache_normal:box_muller/3 is that the first uses the process dictionary while the second uses a state variable. Using the large amount of cached random data, the latency associated with individual calls to crypto:strong_rand_bytes/1 is avoided at the cost of the extra memory consumption, and the use of the cache makes the speed of random number generation similar to the speed of pseudo-random number generation that occurs in the rand module.

We should add a 'rand' plugin to the 'crypto' module that does this
buffered crypto:strong_rand_bytes/1 trick. There is something like that
in rand_SUITE, but we should really have an official one.

I also wonder where the sweet spot is? 64 KB seems like a lot of buffer.
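A sketch of that buffering idea (this is an assumed layout for illustration, not quickrand's actual implementation): fetch a chunk of strong random bytes once, slice 53-bit integers N out of it to build floats of the form N * 2^-53, and refill when fewer than 53 bits remain.

```erlang
-module(strong_cache).
-export([new/1, float_r/1]).

%% Create a buffer of strong random bytes.
new(Bytes) when is_integer(Bytes), Bytes >= 7 ->
    crypto:strong_rand_bytes(Bytes).

%% Slice one float in [0.0, 1.0) off the buffer, returning the float
%% and the remaining bits; refill (discarding the leftover bits, for
%% simplicity) when the buffer runs dry.
float_r(Buffer) when bit_size(Buffer) >= 53 ->
    <<N:53, Rest/bitstring>> = Buffer,
    {N * math:pow(2, -53), Rest};
float_r(_Exhausted) ->
    float_r(new(65536)).                % 64 KB refill, as in the text
```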

>
> In CloudI, I instead use quickrand_normal:box_muller/2 to avoid the use of cached data to keep the memory use minimal (the use-case there doesn't require avoiding the latency associated with crypto:strong_rand_bytes/1 because it is adding latency for testing (at https://github.com/CloudI/cloudi_core/blob/299df02e6d22103415c8ba14379e90ca8c3d3b82/src/cloudi_core_i_runtime_testing.erl#L138) and it is best using a cryptographic random source to keep the functionality widely applicable). However, the same function calls occur in the quickrand Box-Muller transformation source code, so the overhead is the same.
>
> I used Erlang/OTP 20.0 (without HiPE) using the hardware below:
> |Core i7 2670QM 2.2GHz 1 cpu, 4 cores/cpu, 2 hts/core
> L2:4×256KB L3:6MB RAM:8GB:DDR3-1333MHz
> Sandy Bridge-HE-4 (Socket G2)
>
> Best Regards,
> Michael
> |

Best regards

Raimo Niskanen

unread,
Sep 4, 2017, 5:49:17 AM9/4/17
to erlang-q...@erlang.org
On Mon, Sep 04, 2017 at 12:37:50PM +1200, Richard A. O'Keefe wrote:
>
>
> On 1/09/17 8:49 PM, Raimo Niskanen wrote:
> >> By the way, given that a common way to make random floats is to
> >> generate a bitvector, consider
> >> (0 to: 15) collect: [:each | ((each / 15) * 256) truncated].
> >> You will notice that the spacing between the values is *almost*
> >> uniform, but not at the end.
> >
> > That sounds interesting but I do not understand. Is that Elixir code?
>
> Nope, Smalltalk. I wanted to use rational arithmetic. In fact I did
> not need to. Here it is in Haskell:
> > [(x * 256) `div` 15 | x <- [0..15]]
> [0,17,34,51,68,85,102,119,136,153,170,187,204,221,238,256]
>
> Let's push that a bit further. Let's generate all possible 10-bit
> integers and map them to the range [0..63]. We find again that
> the gap sizes are not all the same. They can't be. If you
> consider all vectors of N bits and map them to the range
> [0..2**M] they *cannot* be uniformly distributed no matter what
> method you use because (2**M+1) does not divide 2**N. You can
> fix this by rejecting some of the bit vectors, but that would
> be asking everyone to pay extra for a result they don't have any
> particular need for.
>

I see, but do not quite understand what you are getting at.

The current left-closed float generator starts with 58 random bits and
puts 53 of these into the mantissa in an IEEE 754 double binary float,
so that would not be it.

I guess it is a generator for the closed interval [0.0,1.0] or the open
(0.0,1.0) you talk about. If so:

This one-liner generates over [0.0,1.0]:
(rand:uniform((1 bsl 53) + 1) - 1) * math:pow(2, -53)
and it uses an integer range R = (2^53 + 1), which is not divisible by 2.

The implementation for that range will generate a 58 bit number and then
check if the number is 0 =< X < R*31 and if so return the number mod R
(31 repetitions of the range is all that fits completely in 58 bits).

If the generated number is R*31 =< X that is in the top truncated interval
then we start over with a new number.

The implementation may in theory retry forever before finding a good
number, but the odds for retry is about 1/32 for each round so the
accumulated time is about 32/31 times one round. And only once in a million
times will it take 4 attempts or more.
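The scheme just described can be sketched as follows (an illustrative module, not the rand module's internal code): R = 2^53 + 1 outcomes, drawn by reducing a 58-bit word mod R and retrying in the truncated top interval above 31 * R.

```erlang
-module(closed).
-export([uniform_closed/0]).

%% A float in the closed interval [0.0, 1.0], of the form N * 2^-53
%% with N uniform in [0, 2^53].
uniform_closed() ->
    R = (1 bsl 53) + 1,
    Limit = (1 bsl 58) div R * R,        % = 31 * R
    X = rand:uniform(1 bsl 58) - 1,      % a uniform 58-bit value
    if
        X < Limit ->
            (X rem R) * math:pow(2, -53);
        true ->
            uniform_closed()             % about a 1/32 chance of retry
    end.
```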

I discussed a different implementation with Prof. Vigna: always generate
one word too many and then use mod R on that, which would bury the skew in
one word of random bits; hence the difference in probability between
generated numbers should be about (2^58 - 1)/2^58, which would take quite
some effort to measure with statistical significance. But he considered
that a bad idea since it would halve the speed for most ranges.

So this is an already solved problem, as I see it.

We *can* write efficient and good generators for open, closed and
half-closed intervals, if we want.

So far I have only seen the need for a (0.0,1.0] generator, which can be
implemented with:
1.0 - rand:uniform()
but in some applications such as 1.0/X and math:log(X) I think that the form
N * 2^-53 might be less than optimal, so I have a new suggestion:

https://github.com/erlang/otp/compare/OTP-20.0...RaimoNiskanen:raimo/stdlib/rand-uniformNZ

This variant never returns exactly 0.0 and has better precision for low
values. Comments? Especially about the name.

And so far I have not seen any actual need for (0.0,1.0) nor [0.0,1.0].

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

Kenji Rikitake

unread,
Sep 6, 2017, 8:30:54 AM9/6/17
to Erlang Questions, Raimo Niskanen
Raimo and all:

I got late to follow the thread.

I think the new function name should be
rand:uniform_non_zero/{1,2}
because I've rarely seen somethingUPPERCASE
names in Erlang functions.
(I might be wrong.)

On ranges:

I'm concerned about how OTP pre-20 (i.e., <= OTP 19.3.x) rand:uniform/1 code
might crash or cause bugs when running on the OTP 20 implementation.
At least how to write the OTP pre-20-equivalent code should be documented.

I have no firm idea on what should be the default behavior on ranges
and whether the borders should be inclusive/exclusive to the limit values.
In fact, the behavior differs between languages and implementations.
Some examples I've found follow:

JavaScript math.random(): [0.0, 1.0)
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/random

Python random.uniform(a, b): [a, b] (when a <= b)
https://stackoverflow.com/questions/6088077/how-to-get-a-random-number-between-a-float-range

C++ std::uniform_real_distribution<> gen(a, b): [a, b)
http://en.cppreference.com/w/cpp/numeric/random/uniform_real_distribution

Ruby 2.4.1 Random class: rand(max): [0.0, max)
https://ruby-doc.org/core-2.4.1/Random.html
"When max is a Float, rand returns a random floating point number
 between 0.0 and max, including 0.0 and excluding max."

R runif(min=0.0, max=1.0): [0.0, 1.0] (See Note)
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Uniform.html
Note: "runif will not generate either of the extreme values
       unless max = min or max-min is small compared to min,
       and in particular not for the default arguments."

MySQL 5.7 RAND(): [0.0, 1.0)
https://dev.mysql.com/doc/refman/5.7/en/mathematical-functions.html#function_rand

PostgreSQL 10 random(): [0.0, 1.0)
https://www.postgresql.org/docs/10/static/functions-math.html#functions-math-random-table

MS SQL Server: (0.0, 1.0)
https://docs.microsoft.com/en-us/sql/t-sql/functions/rand-transact-sql
"Returns a pseudo-random float value from 0 through 1, exclusive."

dSFMT: "[1, 2), [0, 1), (0, 1] and (0, 1)"
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/#dSFMT
(dSFMT: SFMT PRNG implementation for double-precision floating point only)

It took me an hour to investigate this.
Lesson learned: don't take the range definitions for granted.

Regards,
Kenji Rikitake

Raimo Niskanen

unread,
Sep 6, 2017, 9:33:46 AM9/6/17
to erlang-q...@erlang.org
On Wed, Sep 06, 2017 at 09:30:17PM +0900, Kenji Rikitake wrote:
> Raimo and all:
>
> I got late to follow the thread.
>
> I think the new function name should be
> rand:uniform_non_zero/{1,2}
> because I've rarely seen somethingUPPERCASE
> names in Erlang functions.
> (I might be wrong.)

At least it would be the first (only) in the rand module, so you surely
have a point.

Another possible name would be rand:uniform_right_closed/{1,2}, or
rand:uniform_left_open/{1,2} but the mathematical reference might be
a bit obscure.

>
> On ranges:
>
> I'm concerned about how OTP pre-20 (i.e., <= OTP 19.3.x) rand:uniform/1 code
> might crash or cause bugs when running on the OTP 20 implementation.
> At least how to write the OTP pre-20-equivalent code should be documented.

If you mean pre-20 code using rand:uniform/1 that later runs on 20.0 or
later, then I am not aware of any drastic changes, at least not crashes.

Code calling rand:seed/{1,2} will select the explicit algorithm from pre-20
and behave exactly as before. Code not calling rand:seed/{1,2} will as
before get a random number sequence it has never seen before.

The same applies for code using rand:uniform/0 - it could pre-20 get
exactly 0.0 and the same on 20.0 and later.

You can, of course, only use the explicit algorithms that existed pre-20
when executing pre-20, and the same algorithms still exists but are not
documented on OTP 20.0 and should behave exactly the same.

So since writing pre-20 code requires nothing more than reading the
documentation for the release in question, and such code should work just
as before on 20.0 and later, I do not think there is any need for
documenting that. This should be exactly as expected.
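A small illustration of that point (hypothetical module name): explicitly seeding the pre-20 default algorithm reproduces the same sequence from the same seed, which is what keeps such code behaving identically on OTP 20, where exsplus still exists though undocumented.

```erlang
-module(repro).
-export([same_sequence/0]).

%% Two states seeded identically with exsplus yield identical draws.
same_sequence() ->
    S0 = rand:seed_s(exsplus, {1, 2, 3}),
    {A, _} = rand:uniform_s(1000000, S0),
    S1 = rand:seed_s(exsplus, {1, 2, 3}),
    {B, _} = rand:uniform_s(1000000, S1),
    A =:= B.
```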

>
> I have no firm idea on what should be the default behavior on ranges
> and whether the borders should be inclusive/exclusive to the limit values.
> In fact, the behavior differs between languages and implementations.
> Some examples I've found follow:
>
> JavaScript math.random(): [0.0, 1.0)
> https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/random
>
> Python random.uniform(a, b): [a, b] (when a <= b)
> https://stackoverflow.com/questions/6088077/how-to-get-a-random-number-between-a-float-range

There is a note that the upper bound may, depending on rounding, not be
returned. Really nasty.

>
> C++ std::uniform_real_distribution<> gen(a, b): [a, b)
> http://en.cppreference.com/w/cpp/numeric/random/uniform_real_distribution
>
> Ruby 2.4.1 Random class: rand(max): [0.0, max)
> https://ruby-doc.org/core-2.4.1/Random.html
> "When max is a Float, rand returns a random floating point number
> between 0.0 and max, including 0.0 and excluding max."
>
> R runif(min=0.0, max=1.0): [0.0, 1.0] (See Note)
> https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Uniform.html
> Note: "runif will not generate either of the extreme values
> unless max = min or max-min is small compared to min,
> and in particular not for the default arguments."

That is a really sloppy definition!

>
> MySQL 5.7 RAND(): [0.0, 1.0)
> https://dev.mysql.com/doc/refman/5.7/en/mathematical-functions.html#function_rand
>
> PostgreSQL 10 random(): [0.0, 1.0)
> https://www.postgresql.org/docs/10/static/functions-math.html#functions-math-random-table
>
> MS SQL Server: (0.0, 1.0)
> https://docs.microsoft.com/en-us/sql/t-sql/functions/rand-transact-sql
> "Returns a pseudo-random float value from 0 through 1, exclusive."
>
> dSFMT: "[1, 2), [0, 1), (0, 1] and (0, 1)"
> http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/#dSFMT
> (dSFMT: SFMT PRNG implementation for double-precision floating point only)
>
> It took me an hour to investigate this.
> Lesson learned: don't take the range definitions for granted.

You searched better than I did and found some that exclude 0.0.
Good to know!

It seems [0.0, 1.0) dominates but does not rule.

Best regards
/ Raimo

Richard A. O'Keefe

Sep 6, 2017, 8:02:53 PM
to erlang-q...@erlang.org


On 7/09/17 12:30 AM, Kenji Rikitake wrote:
> Raimo and all:
>
> I got late to follow the thread.
>
> I think the new function name should be
> rand:uniform_non_zero/{1,2}
> because I've rarely seen somethingUPPERCASE
> names in Erlang functions.
> (I might be wrong.)

"nonzero" is one word, so rand:uniform_nonzero/{1,2}
would be better still.
Ad fontes! Check the official documentation.
https://docs.python.org/3/library/random.html
says quite explicitly
"Almost all module functions depend on the basic function random(),
which generates a random float uniformly in the semi-open range
[0.0, 1.0)."

The actual source code for random.uniform(a, b) is

def uniform(self, a, b):
    "Get a random number in the range [a, b) or [a, b] depending on rounding."
    return a + (b-a) * self.random()

Now of course in exact real arithmetic, if 0 <= u < 1
and a < b then a <= (b-a)*u + a < b. So they are using
an algorithm that *would* exclude b except for roundoff.
And they are leaving it to users to deal with the
consequences of the numerical error, and weaselling out
of it by blaming the computer.

In any case, Python is a [0,1) example.
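That roundoff is easy to reproduce deterministically. A minimal sketch in
plain Python, with hand-picked a and b one ulp apart and u standing in for
a value random() could return:

```python
# With b one ulp above a, any u > 0.5 makes a + (b - a)*u round up to b,
# so Python's uniform(a, b) really can return the "excluded" endpoint.
a = 1.0
b = 1.0 + 2.0 ** -52   # the next representable double after 1.0
u = 0.6                # a possible output of random(), strictly < 1.0
x = a + (b - a) * u
print(x == b)          # True: rounding reaches the upper bound
```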

> C++ std::uniform_real_distribution<> gen(a, b): [a, b)
> http://en.cppreference.com/w/cpp/numeric/random/uniform_real_distribution
>
> Ruby 2.4.1 Random class: rand(max): [0.0, max)
> https://ruby-doc.org/core-2.4.1/Random.html
> "When max is a Float, rand returns a random floating point number
> between 0.0 and max, including 0.0 and excluding max."
>
> R runif(min=0.0, max=1.0): [0.0, 1.0] (See Note)
> https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Uniform.html
> Note: "runif will not generate either of the extreme values
> unless max = min or max-min is small compared to min,
> and in particular not for the default arguments."
>
> MySQL 5.7 RAND(): [0.0, 1.0)
> https://dev.mysql.com/doc/refman/5.7/en/mathematical-functions.html#function_rand
>
> PostgreSQL 10 random(): [0.0, 1.0)
> https://www.postgresql.org/docs/10/static/functions-math.html#functions-math-random-table
>
> MS SQL Server: (0.0, 1.0)
> https://docs.microsoft.com/en-us/sql/t-sql/functions/rand-transact-sql
> "Returns a pseudo-random float value from 0 through 1, exclusive."

All of the systems you have mentioned to this point use [0,1)
as the building block (or, in the case of R, (0,1)).
>
> dSFMT: "[1, 2), [0, 1), (0, 1] and (0, 1)"
> http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/#dSFMT
> (dSFMT: SFMT PRNG implementation for double-precision floating point only)

That is to say, it has these functions in the interface:
dsfmt_genrand_close1_open2() => [1,2) -- their primitive
dsfmt_genrand_close_open() => [0,1)
dsfmt_genrand_open_close() => (0,1]
dsfmt_genrand_open_open() => (0,1)

You *can't* take the range for granted because you have to choose.
The one range it does not offer is [0,1].
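The close_open and open_close variants are just exact arithmetic on the
[1,2) primitive. A sketch in Python (the function names here are mine,
mirroring the dSFMT interface, not the dSFMT API itself):

```python
def close_open(r):
    # r in [1.0, 2.0)  ->  [0.0, 1.0)
    # The subtraction is exact (Sterbenz: the operands are within a
    # factor of two of each other), so no rounding can reach 1.0.
    return r - 1.0

def open_close(r):
    # r in [1.0, 2.0)  ->  (0.0, 1.0]
    # Also exact; r == 1.0 maps to 1.0, and r -> 2.0 approaches 0.0
    # from above without ever reaching it.
    return 2.0 - r
```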

The key lesson is that none of these systems offers [0,1] as a
basic building block, and that to the extent that it is possible
to tell, the ones that offer [a,b] as a derived version -- R and
Python -- do so as a sloppy implementation of [a,b).

For what it's worth, my Smalltalk library now has
aRandom
next [0,1)
nextNonzero (0,1]
nextBetween: a andExclusive: b [a,b)
nextBetweenExclusive: a and: b (a,b]

Raimo Niskanen

Sep 8, 2017, 8:26:42 AM
to erlang-q...@erlang.org
To conclude
===========

It would be convenient to have functions in the rand module that
generate on the interval (0.0, 1.0], e.g. rand:uniform_nonzero/0
and rand:uniform_nonzero_s/1, or maybe rand:uniform_nz/0
and rand:uniform_nz_s/1.

But since it is very easy to use e.g. (1.0 - rand:uniform()), I see
little value in adding them. Adding hints in the documentation could
suffice.
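The suggested workaround is just a reflection of the half-open interval.
A sketch in Python, with random.random() (which generates on [0.0, 1.0),
like rand:uniform/0 on OTP 20) standing in for the Erlang call:

```python
import random

def uniform_nonzero():
    # random.random() is on [0.0, 1.0); reflecting with 1.0 - u gives
    # (0.0, 1.0]: u == 0.0 maps to 1.0, and the largest u < 1.0 maps to
    # 2^-53 > 0.0 exactly, so 0.0 can never come back.
    return 1.0 - random.random()
```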

However, I think we could add functions with the same names and
interval that also have a different uniformity, i.e. still uniform,
but not equidistant, instead increasing precision towards 0. This would
have greater value. Such a uniformity would work better for some
suggested algorithms such as Box-Muller.

Ironically, the implementation by Kenji that I changed was a few bits in
that direction, i.e. it used the generators' extra bits over 53 (5 or 11)
for increased precision, but I want at least 53 extra bits, which I
hope is close enough to infinity.
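A tiny sketch of the extra-bits idea in Python - illustrative only, not
the OTP code or the GitHub-repo implementation:

```python
import random

def uniform_extra_precision():
    # Draw 53 + 53 = 106 random bits and let the int/int division round
    # to the nearest double: small outputs then keep a full 53-bit
    # mantissa instead of being multiples of 2^-53.
    while True:
        x = random.getrandbits(106) / (1 << 106)
        if x < 1.0:  # the very largest draws round up to exactly 1.0
            return x

# e.g. 3 * 2^-106 is now a possible, exactly representable output,
# far below the 2^-53 granularity of the equidistant generator.
```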

I have implemented such functions in my own GitHub repo, but ran into
problems with the number distribution that I suspect is caused by
rounding errors in the current implementation of erlang:float/1.

So I will try to find the time to investigate that further...

/ Raimo
--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

Raimo Niskanen

Sep 25, 2017, 5:05:36 AM
to erlang-q...@erlang.org
On Mon, Sep 04, 2017 at 10:48:15AM +0200, Raimo Niskanen wrote:
:
> We should add a 'rand' plugin to the 'crypto' module that does this
> buffered crypto:strong_random_bytes/1 trick. There is something like that
> in rand_SUITE, but we should really have an official one.
>
> I also wonder where the sweet spot is? 64 KB seems like a lot of buffer.
>

I have created a pull request for such an extension to the crypto module:
https://github.com/erlang/otp/pull/1573

Raimo Niskanen

Sep 25, 2017, 5:07:30 AM
to erlang-q...@erlang.org
On Fri, Sep 08, 2017 at 02:26:28PM +0200, Raimo Niskanen wrote:
> To conclude
> ===========
>
> It would be convenient to have functions in the rand module that
> generate on the interval (0.0, 1.0], e.g. rand:uniform_nonzero/0
> and rand:uniform_nonzero_s/1, or maybe rand:uniform_nz/0
> and rand:uniform_nz_s/1.
>
> But since it is very easy to use e.g. (1.0 - rand:uniform()), I see
> little value in adding them. Adding hints in the documentation could
> suffice.
>
> However, I think we could add functions with the same names and
> interval that also have a different uniformity, i.e. still uniform,
> but not equidistant, instead increasing precision towards 0. This would
> have greater value. Such a uniformity would work better for some
> suggested algorithms such as Box-Muller.
>
> Ironically, the implementation by Kenji that I changed was a few bits in
> that direction, i.e. it used the generators' extra bits over 53 (5 or 11)
> for increased precision, but I want at least 53 extra bits, which I
> hope is close enough to infinity.
>
> I have implemented such functions in my own GitHub repo, but ran into
> problems with the number distribution that I suspect is caused by
> rounding errors in the current implementation of erlang:float/1.
>
> So I will try to find the time to investigate that further...

I have created a pull request with such an extension to the rand module:
https://github.com/erlang/otp/pull/1574