Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Getting a random integer

71 views
Skip to first unread message

Cecil Westerhof

unread,
May 31, 2018, 2:59:04 AM5/31/18
to
I am going to write a function to generate an UUID on Linux. For this
I need to generate random integers. At the moment I have the
following:
proc getRandomLnx {{secure False}} {
if ${secure} {
set randFile /dev/random
} else {
set randFile /dev/urandom
}
set randDev [open ${randFile} rb]
set random [read ${randDev} 8]
binary scan ${random} H8 randomX
set randomI [scan ${randomX} %x]
close ${randDev}
return ${randomI}
}

Is this a good way, or is there a better way?


For the moment 32 bits is good enough. But probably a good idea to add
a parameter that I want a 63 bits version instead of a 32 bits
version.

--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

Cecil Westerhof

unread,
May 31, 2018, 5:44:05 AM5/31/18
to
Cecil Westerhof <Ce...@decebal.nl> writes:

> I am going to write a function to generate an UUID on Linux. For this
> I need to generate random integers. At the moment I have the
> following:
> proc getRandomLnx {{secure False}} {
> if ${secure} {
> set randFile /dev/random
> } else {
> set randFile /dev/urandom
> }
> set randDev [open ${randFile} rb]
> set random [read ${randDev} 8]
> binary scan ${random} H8 randomX
> set randomI [scan ${randomX} %x]
> close ${randDev}
> return ${randomI}
> }
>
> Is this a good way, or is there a better way?
>
>
> For the moment 32 bits is good enough. But probably a good idea to add
> a parameter that I want a 63 bits version instead of a 32 bits
> version.

I made it and 64 bits instead of 63 bits:
proc getRandomIntLnx {{secure False} {longVersion False}} {
if ${secure} {
set randFile /dev/random
} else {
set randFile /dev/urandom
}
if ${longVersion} {
set bytes 8
set format H16
} else {
set bytes 4
set format H8
}
set randDev [open ${randFile} rb]
set random [read ${randDev} ${bytes}]
close ${randDev}
binary scan ${random} ${format} randomX
set randomI [scan ${randomX} %x]
# I do not want to return negative values (only a problem with long version)
if {${randomI} < 0} {
incr randomI [expr 2 ** 64]
}
return ${randomI}

Rich

unread,
May 31, 2018, 7:30:27 AM5/31/18
to
Cecil Westerhof <Ce...@decebal.nl> wrote:
> I am going to write a function to generate an UUID on Linux. For this
> I need to generate random integers. At the moment I have the
> following:
> proc getRandomLnx {{secure False}} {

> if ${secure} {
> set randFile /dev/random
> } else {
> set randFile /dev/urandom
> }

This is unnecessary, just use /dev/urandom
(https://www.2uo.de/myths-about-urandom/). The Linux manpage is
incorrect in their implication that 'random' generates 'more random'
values. The same generator is used for both, the only difference is
'random' will block when its 'estimate' of entropy is too low (and this
is simply a black magic 'estimate' without much foundation in reality).

> set randDev [open ${randFile} rb]
> set random [read ${randDev} 8]
> binary scan ${random} H8 randomX
> set randomI [scan ${randomX} %x]
> close ${randDev}
> return ${randomI}
> }
>
> Is this a good way, or is there a better way?

There is no need to binary scan then scan. If you want a 32 bit
integer, use the i or I conversions of 'binary scan'. If you want the
integer to be unsigned, add the 'u' qualifier.

> For the moment 32 bits is good enough.

Not for a proper type 4 UUID (note, I'm presuming the above is headed towards
UUID generation due to your other thread).

> But probably a good idea to add a parameter that I want a 63 bits
> version instead of a 32 bits version.

In which case, look at the w or W conversion available from binary
scan.

Cecil Westerhof

unread,
May 31, 2018, 8:28:04 AM5/31/18
to
Rich <ri...@example.invalid> writes:

> Cecil Westerhof <Ce...@decebal.nl> wrote:
>> I am going to write a function to generate an UUID on Linux. For this
>> I need to generate random integers. At the moment I have the
>> following:
>> proc getRandomLnx {{secure False}} {
>
>> if ${secure} {
>> set randFile /dev/random
>> } else {
>> set randFile /dev/urandom
>> }
>
> This is unnecessary, just use /dev/urandom
> (https://www.2uo.de/myths-about-urandom/). The Linux manpage is
> incorrect in their implication that 'random' generates 'more random'
> values. The same generator is used for both, the only difference is
> 'random' will block when its 'estimate' of entropy is too low (and this
> is simply a black magic 'estimate' without much foundation in
> reality).

That is why default secure is False. I do not want to block, but when
the randomness is really important (encryption, passwords, …) I go for
random with secure is True.


>
>> set randDev [open ${randFile} rb]
>> set random [read ${randDev} 8]
>> binary scan ${random} H8 randomX
>> set randomI [scan ${randomX} %x]
>> close ${randDev}
>> return ${randomI}
>> }
>>
>> Is this a good way, or is there a better way?
>
> There is no need to binary scan then scan. If you want a 32 bit
> integer, use the i or I conversions of 'binary scan'. If you want the
> integer to be unsigned, add the 'u' qualifier.

OK, thanks: I will look into that.


>> For the moment 32 bits is good enough.
>
> Not for a proper type 4 UUID (note, I'm presuming the above is headed towards
> UUID generation due to your other thread).

Not only, but also yes.


The way I build the UUID I work with random values with 16 and 12
bits, so 32 bits is more as enough.

I already made the 64 bits version also. I was thinking that a 32 bits
version would be a good idea because in most cases it is enough. And
it would not exhaust /dev/random.

>> But probably a good idea to add a parameter that I want a 63 bits
>> version instead of a 32 bits version.
>
> In which case, look at the w or W conversion available from binary
> scan.

I will do that.

Cecil Westerhof

unread,
May 31, 2018, 8:59:04 AM5/31/18
to
Rich <ri...@example.invalid> writes:

> In which case, look at the w or W conversion available from binary
> scan.

Where do I information about the different formats? I found the
letters (except w), but did not find a description about there
meaning.

It is true that w and W means different things?
binary scan ${random} w randomI; puts $randomI -> 7360756988818333942
binary scan ${random} W randomI; puts $randomI -> -675401970501474714

binary scan ${random} Wu randomI; puts $randomI -> 17771342103208076902

17771342103208076902 - -675401970501474714 = 18446744073709551616 = 2 ^ 64
So the unsigned bit works correct.

Cecil Westerhof

unread,
May 31, 2018, 9:14:04 AM5/31/18
to
I changed it to:
proc getRandomIntLnx {{secure False} {longVersion False}} {
if ${secure} {
set randFile /dev/random
} else {
set randFile /dev/urandom
}
if ${longVersion} {
set bytes 8
set format wu
} else {
set bytes 4
set format iu
}
set randDev [open ${randFile} rb]
set random [read ${randDev} ${bytes}]
close ${randDev}
binary scan ${random} ${format} randomI
return ${randomI}

Cecil Westerhof

unread,
May 31, 2018, 10:44:04 AM5/31/18
to
Cecil Westerhof <Ce...@decebal.nl> writes:

> I am going to write a function to generate an UUID on Linux. For this
> I need to generate random integers. At the moment I have the
> following:
> proc getRandomLnx {{secure False}} {
> if ${secure} {
> set randFile /dev/random
> } else {
> set randFile /dev/urandom
> }
> set randDev [open ${randFile} rb]
> set random [read ${randDev} 8]
> binary scan ${random} H8 randomX
> set randomI [scan ${randomX} %x]
> close ${randDev}
> return ${randomI}
> }
>
> Is this a good way, or is there a better way?
>
>
> For the moment 32 bits is good enough. But probably a good idea to add
> a parameter that I want a 63 bits version instead of a 32 bits
> version.

I created a page about this:
http://wiki.tcl.tk/55342

I renamed the proc to getRandomIntNix and there is also a proc called
getRandomIntInRangeNix.

Rich

unread,
May 31, 2018, 10:59:46 AM5/31/18
to
Cecil Westerhof <Ce...@decebal.nl> wrote:
> Rich <ri...@example.invalid> writes:
>
>> Cecil Westerhof <Ce...@decebal.nl> wrote:
>>> I am going to write a function to generate an UUID on Linux. For
>>> this I need to generate random integers. At the moment I have the
>>> following:
>>> proc getRandomLnx {{secure False}} {
>>
>>> if ${secure} {
>>> set randFile /dev/random
>>> } else {
>>> set randFile /dev/urandom
>>> }
>>
>> This is unnecessary, just use /dev/urandom
>> (https://www.2uo.de/myths-about-urandom/). The Linux manpage is
>> incorrect in their implication that 'random' generates 'more random'
>> values. The same generator is used for both, the only difference is
>> 'random' will block when its 'estimate' of entropy is too low (and
>> this is simply a black magic 'estimate' without much foundation in
>> reality).
>
> That is why default secure is False. I do not want to block, but when
> the randomness is really important (encryption, passwords, ?) I go
> for random with secure is True.

Did you read through the URL I referenced? Both return the same
"randomness". The only difference is blocking on "/dev/random" using a
mystical black box of "perceived amounts" of entropy.

But "/dev/random" is not more, or less random that "/dev/urandom".
Both give you the same amount of randomness.

Rich

unread,
May 31, 2018, 11:00:39 AM5/31/18
to
Cecil Westerhof <Ce...@decebal.nl> wrote:
> Rich <ri...@example.invalid> writes:
>
>> In which case, look at the w or W conversion available from binary
>> scan.
>
> Where do I information about the different formats? I found the
> letters (except w), but did not find a description about there
> meaning.
>
> It is true that w and W means different things?
> binary scan ${random} w randomI; puts $randomI -> 7360756988818333942
> binary scan ${random} W randomI; puts $randomI -> -675401970501474714
>
> binary scan ${random} Wu randomI; puts $randomI -> 17771342103208076902
>
> 17771342103208076902 - -675401970501474714 = 18446744073709551616 = 2 ^ 64
> So the unsigned bit works correct.

man n binary

And start reading under the "binary scan" section. However the
"unsigned" modifier "u" is defined at the start of the section, not for
each individual conversion modifier.

Rich

unread,
May 31, 2018, 11:06:36 AM5/31/18
to
Cecil Westerhof <Ce...@decebal.nl> wrote:
> Cecil Westerhof <Ce...@decebal.nl> writes:
>
>> I am going to write a function to generate an UUID on Linux. For this
>> I need to generate random integers. At the moment I have the
>> following:
>> proc getRandomLnx {{secure False}} {
>> if ${secure} {
>> set randFile /dev/random
>> } else {
>> set randFile /dev/urandom
>> }
>> set randDev [open ${randFile} rb]
>> set random [read ${randDev} 8]
>> binary scan ${random} H8 randomX
>> set randomI [scan ${randomX} %x]
>> close ${randDev}
>> return ${randomI}
>> }
>>
>> Is this a good way, or is there a better way?
>>
>>
>> For the moment 32 bits is good enough. But probably a good idea to add
>> a parameter that I want a 63 bits version instead of a 32 bits
>> version.
>
> I created a page about this:
> http://wiki.tcl.tk/55342
>
> I renamed the proc to getRandomIntNix and there is also a proc called
> getRandomIntInRangeNix.

Here:

expr {${random} % ${modulo} + ${min}}

You've fallen for a common pitfall. You have a biased output unless
the modulo happens to fall exactly on a unit that divides the input
random range into exact equal sized units (half, quarters, thirds,
etc.).

Consider this to be your random input range (and the caret (^) to be
the point where the modulo operator will 'divide' the range):

|------------------------------|
^

So what comes out is this set of values:

|---------------------| if input is less than the break point
|-------| if the input is larger than the break point

^^^^^^^^^

The small subrange at the start has a slightly higher probability of
occurrence than the rest. Therefore, a biased output.

Note that this may not matter for your uses, but for some situations it
can matter significantly (crypto being one of those areas).

Cecil Westerhof

unread,
May 31, 2018, 2:14:05 PM5/31/18
to
You are right. I should put something about that on the page.


By the way: it is not completely true. ;-)
When I call:
getRandomIntInRangeNix 0 6

It is not exactly sized, but the bias is (in my opinion) negligible.

I will also extend it (with a parameter) to circumvent this problem.

Rich

unread,
May 31, 2018, 6:36:44 PM5/31/18
to
Cecil Westerhof <Ce...@decebal.nl> wrote:
> Rich <ri...@example.invalid> writes:
>
>> Cecil Westerhof <Ce...@decebal.nl> wrote:
>>> I created a page about this:
>>> http://wiki.tcl.tk/55342
>>>
>> Here:
>>
>> expr {${random} % ${modulo} + ${min}}
>>
>> You've fallen for a common pitfall. You have a biased output unless
>> the modulo happens to fall exactly on a unit that divides the input
>> random range into exact equal sized units (half, quarters, thirds,
>> etc.).
>>
>> Consider this to be your random input range (and the caret (^) to be
>> the point where the modulo operator will 'divide' the range):
>>
>> |------------------------------|
>> ^
>>
>> So what comes out is this set of values:
>>
>> |---------------------| if input is less than the break point
>> |-------| if the input is larger than the break point
>>
>> ^^^^^^^^^
>>
>> The small subrange at the start has a slightly higher probability of
>> occurrence than the rest. Therefore, a biased output.
>>
>> Note that this may not matter for your uses, but for some situations it
>> can matter significantly (crypto being one of those areas).
>
> You are right. I should put something about that on the page.
>
>
> By the way: it is not completely true. ;-)
> When I call:
> getRandomIntInRangeNix 0 6
>
> It is not exactly sized, but the bias is (in my opinion) negligible.

Well, it is 'true', it *is* biased. But the bias might be negligible,
and that might not matter, or it might matter. For usage in a game,
the bias might not matter at all. To a cryptographer, *any* bias in a
random number generator is a significant deal.

So it all depends upon the intended usage.

Cecil Westerhof

unread,
Jun 1, 2018, 1:59:04 AM6/1/18
to
expr 2 ** 32 / 7 -> 613566756
So the bias is really small. ;-)

I created the following test script:
array set count {}
set maxVal [expr {2 ** 20}]
for {set n 1} {${n} < ${maxVal}} {incr n} {
incr count([getRandomIntInRangeNix 0 6])
}
parray count

Which when running five times gave the following output:
count(0) = 150224
count(1) = 150241
count(2) = 150094
count(3) = 150000
count(4) = 149069
count(5) = 149568
count(6) = 149379

count(0) = 149837
count(1) = 149143
count(2) = 149973
count(3) = 149667
count(4) = 149864
count(5) = 150357
count(6) = 149734

count(0) = 149216
count(1) = 149725
count(2) = 149813
count(3) = 149469
count(4) = 150461
count(5) = 149564
count(6) = 150327

count(0) = 149650
count(1) = 149083
count(2) = 149426
count(3) = 149610
count(4) = 149501
count(5) = 150764
count(6) = 150541

count(0) = 149645
count(1) = 149660
count(2) = 150073
count(3) = 149571
count(4) = 149572
count(5) = 150173
count(6) = 149881

So in most cases this would be quit acceptable. But I make also a
variant where I am going to solve this problem.

And I have to implement something that raises an error when there is a
real problem. For example:
getRandomIntInRangeNix 0 [expr {2 ** 32 - 2}]

Should not be allowed.

I think that when (with the not long version}
expr {2 ** 32 / modulo}

is lower as 10 that I should raise an error.


Another possibility (and that is maybe even better) is to create a
solution that is never biased. I think about it this weekend.


To show that there is a real problem I also created:
array set count {}
set maxVal [expr {2 ** 20}]
for {set n 1} {${n} < ${maxVal}} {incr n} {
set index [expr {[getRandomIntInRangeNix 0 [expr {2 ** 28 * 10}]] / 2 ** 28}]
incr count(${index})
}
parray count

This gives:
count(0) = 131094
count(1) = 130876
count(2) = 130931
count(3) = 131093
count(4) = 130746
count(5) = 131215
count(6) = 65935
count(7) = 65561
count(8) = 65283
count(9) = 65841

That is a bias that is even not acceptable in a game.

Cecil Westerhof

unread,
Jun 1, 2018, 4:28:04 AM6/1/18
to
Cecil Westerhof <Ce...@decebal.nl> writes:

> Another possibility (and that is maybe even better) is to create a
> solution that is never biased. I think about it this weekend.

Done. Have some other things to do now, but will publish it later.

Cecil Westerhof

unread,
Jun 2, 2018, 10:59:05 AM6/2/18
to
It makes one exception for random, so I have dropped it now from
getRandom. (Did not update the WiKi yet.)
0 new messages