Hello, I was introduced to the method for generating a double in the interval [0, 1) from a 64-bit int, outlined here:
double d = (x >> 11) * 0x1.0p-53;
I was wondering if someone could explain the choice of shift and exponent values here. According to the text:
A standard double (64-bit) floating-point number in IEEE floating point format has 52 bits of significand, plus an implicit bit at the left of the significand. Thus, the representation can actually store numbers with 53 significant binary digits.
So I can see that that's where 53 is coming from, and 11 + 53 = 64. But I really don't have a strong understanding of why this is justified. I notice that other shift/exponent pairs also return a double in the same range, but I assume there's some quirk of floating point that means the example above gives the best results in some way?
Any help shedding light on this would be much appreciated. Also, going by the same logic, I assume the following is the "correct" way to generate a float from a 32-bit int? Thanks!
float f = (x >> 8) * 0x1.0p-24f;