It is an interesting approach, but the question is how it would work
or be made to work for Erlang...
The range algorithm in rand.erl seems to be what Lemire's paper calls
"The Java Approach", I agree.
Lemire's "Avoiding Division" approach avoids integer division; instead
it multiplies the random number by the range and bit-shifts the result,
assuming that multiplication and bit shifts are significantly faster
operations than integer division.
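As I understand the paper, the core reduction is just a multiply and a
shift. Here is a minimal sketch of that arithmetic in Python (not Erlang,
just to show the integer operations), assuming a 58-bit random word as the
'rand' module produces; the function name is mine:

```python
BITS = 58  # word size of the default 'rand' generators

def lemire_reduce(x, range_):
    # x is a uniform random integer in [0, 2**58). The high 58 bits of
    # x * range_ form a value in [0, range_), with a small bias that is
    # not yet removed here.
    return (x * range_) >> BITS
```

Note that the product x * range_ is what would become a bignum on the
BEAM as soon as it exceeds 59 bits.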
The default PRNGs in the 'rand' module generate 58-bit integers. This
size is carefully selected to avoid creating bignums, since those are much
slower to operate on. You can add three 58-bit integers and get a 59-bit
result that is still not a bignum, but a 60-bit integer is a bignum. All
this assumes a 64-bit VM with the current term tag scheme.
So, multiplying a generated 58-bit integer by any range larger than 3
would create a bignum, and the subsequent shift by 58 bits would
have to operate on that bignum. I reckon that would be a significant
performance penalty.
To remove the bias, the fastest approach I can think of right now would
be to mask out the low 58 bits of the product, check whether it falls in
the biased region, and if so reject and reiterate. Or generate more bits
and do a new bignum multiplication and shift.
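To make the rejection idea concrete, here is one step of that scheme
sketched in Python, with the word size as a parameter so it can be checked
exhaustively at a small size. This is my reading of the paper's rejection
test, not the rand.erl code; the function name is mine:

```python
from collections import Counter

def lemire_step(x, range_, bits=58):
    # One step of the debiased reduction: x is uniform in [0, 2**bits).
    # Returns a value in [0, range_), or None when x falls in the biased
    # region and must be rejected (generate a new x and try again).
    m = x * range_
    low = m & ((1 << bits) - 1)         # mask out the low bits of the product
    if low < range_:                    # cheap pre-test before the mod
        if low < (1 << bits) % range_:  # the biased region
            return None
    return m >> bits

# Exhaustive check with 8-bit words instead of 58-bit ones: among the
# accepted inputs, every output is equally likely.
counts = Counter(lemire_step(x, 10, bits=8) for x in range(256))
```

With bits=8 and range 10, exactly 256 mod 10 = 6 inputs are rejected and
each of the 10 outputs is produced by 25 inputs, so the accepted results
are exactly uniform.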
For ranges larger than 58 bits you are in bignum land to start with,
so following Lemire's approach we could just generate one extra
58-bit random word, multiply, shift, and say the bias is buried.
This might be simpler and faster than the current approach.
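Something like the following sketch (Python again, hypothetical function
name): gather 58-bit words until we have one extra word beyond the range
width, then do one big multiply and shift, leaving a bias on the order
of 2**-58:

```python
import random

def uniform_big(next_word, range_, bits=58):
    # next_word() returns a fresh uniform integer in [0, 2**bits).
    # Collect enough words to exceed the range width by a full extra
    # word, then multiply and shift; the residual bias is ~2**-58.
    need = range_.bit_length() + bits
    x = got = 0
    while got < need:
        x = (x << bits) | next_word()
        got += bits
    return (x * range_) >> got
```

For a 100-bit range this draws three 58-bit words and does one bignum
multiplication and one shift, instead of a bignum division.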
An interesting idea.
I think the "Avoiding Division" approach might be efficient when the
range is less than 26 bits, because then we could use only 26 bits
from the random number and keep the multiplication and shifts within
smallnum integers. This could be interesting to try out.
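That is, something like this sketch (my guess at the arithmetic, in
Python): take 26 bits from the 58-bit word, so the product of two
26-bit numbers stays within 52 bits, comfortably below the smallnum
limit on a 64-bit VM:

```python
def small_range(x58, range_):
    # Assumes range_ < 2**26. Using only 26 bits of the 58-bit word
    # keeps the product under 2**52, which is still a smallnum on a
    # 64-bit Erlang VM.
    x26 = x58 >> 32              # keep the top 26 bits of the word
    return (x26 * range_) >> 26
```

The bias would still have to be handled, but the rejection test would
also stay in smallnum arithmetic.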
Those are my first impressions and guesses. They are not the truth.
The truth has to be measured...
Also, if we come up with a new way of generating integers in a range,
it would produce new sequences of integers for the same seed, so we
would have to either create new algorithm names - duplicating all
existing ones - or introduce new API functions besides rand:uniform/1
and rand:uniform_s/2. So the performance gain would have to merit
such a change...
Cheers
/ Raimo Niskanen
On Tue, Sep 14, 2021 at 10:40:29AM +0200, Thomas Depierre wrote:
> In theory, yes. You would just need to generate more bits of randomness to
> have enough entropy.
>
> There is a catch though. If you begin to need so much entropy that
> generating that is far slower than divisions, then you should look at
> another approach. The Lemire paper offers a reference to other approaches
> more suited to these through less well known and mostly forgotten
> algorithms from the 70s.
> This particular paper is long (60 pages) and is not super applicable to my
> needs, so I have not taken the time to dive in details yet.
>
> In particular, you would need to have generated enough random bits to be
> slower than the big int divisions you need to do. That is probably not
> happening under any realistic integer range we would deal with on the BEAM
> imho.
>
> Thomas Depierre
> Schedule a meeting with me <https://calendly.com/thomas-depierre/30min>
--
/ Raimo Niskanen, Erlang/OTP, Ericsson AB