Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

rand() and PRNG “quality”

3,014 views
Skip to first unread message

Morris Dovey

unread,
Sep 9, 2014, 2:53:43 AM9/9/14
to
Back in “Re: OT: printf 64-bit and gcc” Tim Rentsch showed an example
PRNG algorithm that sent me off to take a closer look at the rand()
function and some of the algorithms that might be used to implement a
rand64() function.

My use of rand() has nearly always involved scaling the generated value
to fit in a smaller range than RAND_MAX, and my investigation led me to
this question for which I couldn’t find an answer:

If a PRNG algorithm used for rand() produces a sequence of “quality” Q,
does application of the modulo operator as in

rand() % n

yield, in general, a sequence of the same “quality”?

--
Morris Dovey
http://www.iedu.com/Solar/
http://www.facebook.com/MorrisDovey

Robert Wessel

unread,
Sep 9, 2014, 3:30:03 AM9/9/14
to
On Tue, 09 Sep 2014 01:53:30 -0500, Morris Dovey <mrd...@iedu.com>
wrote:

>Back in “Re: OT: printf 64-bit and gcc” Tim Rentsch showed an example
>PRNG algorithm that sent me off to take a closer look at the rand()
>function and some of the algorithms that might be used to implement a
>rand64() function.
>
>My use of rand() has nearly always involved scaling the generated value
>to fit in a smaller range than RAND_MAX, and my investigation led me to
>this question for which I couldn’t find an answer:
>
>If a PRNG algorithm used for rand() produces a sequence of “quality” Q,
>does application of the modulo operator as in
>
> rand() % n
>
>yield, in general, a sequence of the same “quality”?


No. At the very least, if (RAND_MAX+1) is not an exact multiple of n,
the result of that modulus operation will introduce a bias. Simple
example: let's say RAND_MAX was 15, and n was 7. The result would be
that (on average) for every 16 generated numbers, you'd have two
instances of 2-6, but *three* each of 0 and 1.

Even if the above is not an issue (IOW, (RAND_MAX+1) *is* an exact
multiple of n), some (P)RNGs exhibit different amounts of bias in
different parts of the returned number. For example, some generators
(fortunately not often used) alternate between even and odd numbers.
So while the original (not great) sequence might be fairly random,
except for the last bit, rand()%2 would result in the horrible:
0,1,0,1,0,1,0,1...

Morris Dovey

unread,
Sep 9, 2014, 4:01:25 AM9/9/14
to
Yes – I can see that. I’ve been tinkering with the xorshift64* algorithm
shown at https://en.wikipedia.org/wiki/Xorshift and may need to re-seed
at a frequency determined by n.

Thanks for your clear, informative response!

Morris Dovey

unread,
Sep 9, 2014, 4:29:44 AM9/9/14
to
On 9/9/14 3:01 AM, Morris Dovey wrote:

> Yes – I can see that. I’ve been tinkering with the xorshift64* algorithm
> shown at https://en.wikipedia.org/wiki/Xorshift and may need to re-seed
> at a frequency determined by n.

Oops! xorshift64* should have been xorshift1024* (brainfart)

Robert Wessel

unread,
Sep 9, 2014, 4:36:34 AM9/9/14
to
On Tue, 09 Sep 2014 03:01:13 -0500, Morris Dovey <mrd...@iedu.com>
If you care about the quality of the pseudo-random numbers, ignore the
C library function and acquire one of the many implementations of
Mersenne Twister. Most of those include functions that allow you to
specify a range as well (and will return a well distributed number in
that range), so you don't have to do the problematic reduction
yourself. If you happen to be using (or have available) a C++11
compiler, the stuff in <random> will do the trick.

Ben Bacarisse

unread,
Sep 9, 2014, 5:32:09 AM9/9/14
to
Morris Dovey <mrd...@iedu.com> writes:

> On 9/9/14 2:29 AM, Robert Wessel wrote:
<snip>
>> [...] For example, some generators
>> (fortunately not often used) alternate between even and odd numbers.
>> So while the original (not great) sequence might be fairly random,
>> except for the last bit, rand()%2 would result in the horrible:
>> 0,1,0,1,0,1,0,1...
>
> Yes – I can see that. I’ve been tinkering with the xorshift64*
> algorithm shown at https://en.wikipedia.org/wiki/Xorshift and may need
> to re-seed at a frequency determined by n.

This sounds dodgy. In general, regularly (or even pseudo-randomly),
re-seeding a PRNG reduced the quality of the sequence unless the quality
of your stream of seeds is at least as good as that of the PRNG.

--
Ben.

Malcolm McLean

unread,
Sep 9, 2014, 5:57:21 AM9/9/14
to
On Tuesday, September 9, 2014 7:53:43 AM UTC+1, Morris Dovey wrote:
> Back in "Re: OT: printf 64-bit and gcc" Tim Rentsch showed an example
> PRNG algorithm that sent me off to take a closer look at the rand()
> function and some of the algorithms that might be used to implement a
> rand64() function.
>
> My use of rand() has nearly always involved scaling the generated value
> to fit in a smaller range than RAND_MAX, and my investigation led me to
> this question for which I couldn't find an answer:
>
> If a PRNG algorithm used for rand() produces a sequence of "quality" Q,
> does application of the modulo operator as in
>
> rand() % n
>
> yield, in general, a sequence of the same "quality"?
>
>
All RNGs fail some statistical tests for randomness. Fiddling with them almost always makes
the failures worse. If an RNG produces a million uniformly-distributed numbers, and you
take modulus which is not a factor of a million, e.g., 600000, obviously you're going
to get a skewed distribution. If you take modulus 3, the skew will be insignificant.

But modulus also tends to reveal other statistical non-randomness, like cycling or anti-
cycling (sequences like 0101010101 coming up less often than expected) in the lower
bits.

Richard Heathfield

unread,
Sep 9, 2014, 7:40:56 AM9/9/14
to
Malcolm McLean wrote:

> On Tuesday, September 9, 2014 7:53:43 AM UTC+1, Morris Dovey wrote:
>> Back in "Re: OT: printf 64-bit and gcc" Tim Rentsch showed an example
>> PRNG algorithm that sent me off to take a closer look at the rand()
>> function and some of the algorithms that might be used to implement a
>> rand64() function.
>>
>> My use of rand() has nearly always involved scaling the generated value
>> to fit in a smaller range than RAND_MAX, and my investigation led me to
>> this question for which I couldn't find an answer:
>>
>> If a PRNG algorithm used for rand() produces a sequence of "quality" Q,
>> does application of the modulo operator as in
>>
>> rand() % n
>>
>> yield, in general, a sequence of the same "quality"?
>>
>>
> All RNGs fail some statistical tests for randomness.

If you made that "PRNGs", you'd stand a better chance of being right. A
truly random number generator would only fail a statistical randomness test
by chance. For example, if your Geiger counter produces a sequence of eight
hundred set bits - which it could, by chance - then it's likely to fail a
test, despite the fact that the next eight hundred bits are all over the
place. But, on the whole, it would pass all such tests, provided that the
tests were worth the time it took to design and implement them.

<snip>

--
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within

Richard Heathfield

unread,
Sep 9, 2014, 7:43:30 AM9/9/14
to
Morris Dovey wrote:

> Back in “Re: OT: printf 64-bit and gcc” Tim Rentsch showed an example
> PRNG algorithm that sent me off to take a closer look at the rand()
> function and some of the algorithms that might be used to implement a
> rand64() function.
>
> My use of rand() has nearly always involved scaling the generated value
> to fit in a smaller range than RAND_MAX, and my investigation led me to
> this question for which I couldn’t find an answer:
>
> If a PRNG algorithm used for rand() produces a sequence of “quality” Q,
> does application of the modulo operator as in
>
> rand() % n
>
> yield, in general, a sequence of the same “quality”?

I've read the other responses, and have nothing to add or take away from
them on their own terms... but nobody seems to have mentioned

int randrange(int low, int high)
{
return (high + 1 - low) * (rand() / (RAND_MAX + 1.0)) + low;
}

which may at least give you better quality than % n.

Jorgen Grahn

unread,
Sep 9, 2014, 8:00:46 AM9/9/14
to
On Tue, 2014-09-09, Morris Dovey wrote:
> Back in ?Re: OT: printf 64-bit and gcc? Tim Rentsch showed an example
> PRNG algorithm that sent me off to take a closer look at the rand()
> function and some of the algorithms that might be used to implement a
> rand64() function.
>
> My use of rand() has nearly always involved scaling the generated value
> to fit in a smaller range than RAND_MAX, and my investigation led me to
> this question for which I couldn?t find an answer:
>
> If a PRNG algorithm used for rand() produces a sequence of ?quality? Q,
> does application of the modulo operator as in
>
> rand() % n
>
> yield, in general, a sequence of the same ?quality??

No. Disregarding the other (good) replies, let's assume you have a
PRNG which generates

xxxxxxxxxxxxxxxxxxxxxxxxxxxxx000

where 'x' are very nicely random bits.

That rand() arguably has higher quality than rand() % 8, which always
returns zero and has no quality at all.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Ben Bacarisse

unread,
Sep 9, 2014, 8:23:40 AM9/9/14
to
Richard Heathfield <inv...@see.sig.invalid> writes:

> Morris Dovey wrote:
<snip>
>> If a PRNG algorithm used for rand() produces a sequence of “quality” Q,
>> does application of the modulo operator as in
>>
>> rand() % n
>>
>> yield, in general, a sequence of the same “quality”?
>
> I've read the other responses, and have nothing to add or take away from
> them on their own terms... but nobody seems to have mentioned
>
> int randrange(int low, int high)
> {
> return (high + 1 - low) * (rand() / (RAND_MAX + 1.0)) + low;
> }
>
> which may at least give you better quality than % n.

How? It seems to me to be in many ways worse. It doesn't solve the
bias problem; high + 1 might overflow; the "+ 1.0" could be lost,
leading to an out-of-range result; and it seems to offer more than it
can deliver when RAND_MAX is lower than INT_MAX.

--
Ben.

Malcolm McLean

unread,
Sep 9, 2014, 9:32:14 AM9/9/14
to
On Tuesday, September 9, 2014 12:40:56 PM UTC+1, Richard Heathfield wrote:
> Malcolm McLean wrote:
>
> If you made that "PRNGs", you'd stand a better chance of being right. A
> truly random number generator would only fail a statistical randomness test
> by chance. For example, if your Geiger counter produces a sequence of eight
> hundred set bits - which it could, by chance - then it's likely to fail a
> test, despite the fact that the next eight hundred bits are all over the
> place. But, on the whole, it would pass all such tests, provided that the
> tests were worth the time it took to design and implement them.
>
Quantum events are maybe truly random. I know there's substantial discussion
amongst theoretical physicists on that point. I couldn't summarise the
arguments much less have a qualified view.

Robert Wessel

unread,
Sep 9, 2014, 9:59:09 AM9/9/14
to
Most (all?) hardware entropy sources have severe biases and
correlations, and require considerable processing before the random
stream meets even the minimal statistical tests. The raw stream is
basically never usable directly. For example, Intel's hardware RNG
("RDRAND") in some newer x86s, uses an AES based conditioner on the
raw source to generate a good stream for input in to the CSPRNG that
then feeds the RDRAND instructions.

David Brown

unread,
Sep 9, 2014, 10:10:20 AM9/9/14
to
On 09/09/14 15:31, Malcolm McLean wrote:
> On Tuesday, September 9, 2014 12:40:56 PM UTC+1, Richard Heathfield wrote:
>> Malcolm McLean wrote:
>>
>> If you made that "PRNGs", you'd stand a better chance of being right. A
>> truly random number generator would only fail a statistical randomness test
>> by chance. For example, if your Geiger counter produces a sequence of eight
>> hundred set bits - which it could, by chance - then it's likely to fail a
>> test, despite the fact that the next eight hundred bits are all over the
>> place. But, on the whole, it would pass all such tests, provided that the
>> tests were worth the time it took to design and implement them.
>>

This demonstrates that there are no true tests for randomness - any test
will give both false positives and false negatives.

It's easy to prove that (this is only a very rough indication of a
proof, of course).

In order for the test to give a result, it must finish within a finite
number of samples. Call the maximum number tested N. Take a sequence
that is viewed as random by the test, take the first N samples, and then
repeat those indefinitely. This repeating sequence is not random, but
will get a false positive from the test.

Any truly random sequence will contain any finite subsequence (in the
appropriate range) at some point. It will therefore contain a sequence
of N zeros. The test will give a false positive given that subsequence
of a truly random sequence.

That's why randomness tests are only statistical - they show the
likelihood of a sequence being random, and there is /always/ a chance of
judging incorrectly.


> Quantum events are maybe truly random. I know there's substantial discussion
> amongst theoretical physicists on that point. I couldn't summarise the
> arguments much less have a qualified view.
>

I don't think there is much disagreement that many quantum effects are
truly random. Radioactive decay is one such effect. (There are some
physicists who believe that everything has a fixed course, so there is
/some/ discussion on the topic - but I think they are very much in the
minority.)

Thermal noise is another source of physical randomness, and often used
in hardware random number generation.

You could also use the digits of a "normal" number, such as pi, as a
source of randomness. The interesting thing here is that the numbers
are arguably truly random, yet they are pre-determined and repeatable,
and are not "Kolmogorov random" (a "Kolmogorov" random sequence is one
which is uncompressable by a program smaller than the sequence itself).


Robert Wessel

unread,
Sep 9, 2014, 10:13:09 AM9/9/14
to
On Tue, 09 Sep 2014 12:43:19 +0100, Richard Heathfield
<inv...@see.sig.invalid> wrote:

>Morris Dovey wrote:
>
>> Back in “Re: OT: printf 64-bit and gcc” Tim Rentsch showed an example
>> PRNG algorithm that sent me off to take a closer look at the rand()
>> function and some of the algorithms that might be used to implement a
>> rand64() function.
>>
>> My use of rand() has nearly always involved scaling the generated value
>> to fit in a smaller range than RAND_MAX, and my investigation led me to
>> this question for which I couldn’t find an answer:
>>
>> If a PRNG algorithm used for rand() produces a sequence of “quality” Q,
>> does application of the modulo operator as in
>>
>> rand() % n
>>
>> yield, in general, a sequence of the same “quality”?
>
>I've read the other responses, and have nothing to add or take away from
>them on their own terms... but nobody seems to have mentioned
>
>int randrange(int low, int high)
>{
> return (high + 1 - low) * (rand() / (RAND_MAX + 1.0)) + low;
>}
>
>which may at least give you better quality than % n.


That doesn't help with the introduced bias. To simplify, we'll assume
low is zero (which just means we're going to generate a number
[0..high]). Then let's say RAND_MAX is 15 and high is 6. You're
still going to be (consistently) mapping each of the inputs ([0..15])
from rand() to a specific output ([0..6]). That cannot work out
evenly. For example, if rand() returns 3, you'll always get
((6+1)*(3/(15+1.0)), or 1.3145, which will then truncate to 1.
Specifically you'll get 0's and 3s three times per sixteen calls (when
rand() generates 0/1/2 and 7/8/9, respectively), while 1/2/4/5/6 will
each get two instances per-sixteen.

You get a slightly different bias (extra 0s and 3s, instead of 0s and
1s for the basic modulus approach), but the statistic are the same -
two of the outputs will be 50% more common than the other five.

jadi...@gmail.com

unread,
Sep 9, 2014, 10:56:47 AM9/9/14
to
On Tuesday, September 9, 2014 2:53:43 AM UTC-4, Morris Dovey wrote:
> Back in "Re: OT: printf 64-bit and gcc" Tim Rentsch showed an example
> PRNG algorithm that sent me off to take a closer look at the rand()
> function and some of the algorithms that might be used to implement a
> rand64() function.
>
> My use of rand() has nearly always involved scaling the generated value
> to fit in a smaller range than RAND_MAX, and my investigation led me to
> this question for which I couldn't find an answer:
>
> If a PRNG algorithm used for rand() produces a sequence of "quality" Q,
> does application of the modulo operator as in
>
> rand() % n
>
> yield, in general, a sequence of the same "quality"?

Bias is injected into the distribution anytime the pseudo-random value
is greater than or equal to the largest evenly-divisible value of 'n'.
For example, if RAND_MAX is 32767, and 'n' is 2, there are 16383 values
of '0' and 16384 values of '1'. If 'n' is 10, then any value above
32759 introduces bias with the modulus because the values 0 thru 7 occur
1 more time than 8 and 9 (there is no 32768 and 32769 to make the
distribution even).

\code
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SAMPLES 1000000

int main( void )
{
int count[3] = { 0 };
int r;
double theoretical_probability;
double sample_frequency;
double diff;
size_t itr;
size_t count_idx;

srand( (unsigned int)time( NULL ) );

for ( itr = 0; itr < SAMPLES; ++itr )
{
/* Uses the bitwise 'and' operator to limit RAND_MAX to 3 bits,
which reduces its maximum value to 7. */
r = ( rand() & 0x7 ) % 3;
++count[r];
}

printf( " Bias Example\n" );
printf( " ------------------------------------------\n" );
printf( "| Value | Count | Frequency | Difference | (Samples %lu)\n",
SAMPLES );
printf( " ------------------------------------------\n" );

theoretical_probability = 1. / 3.;
for ( count_idx = 0;
count_idx < sizeof (count) / sizeof (*count); ++count_idx )
{
sample_frequency = (double)count[count_idx] / SAMPLES;
diff = sample_frequency - theoretical_probability;
printf( " %-5d %-7d %-9f % f\n",
count_idx, count[count_idx], sample_frequency, diff );
}
printf( "\n" );

printf( "RAND_MAX = %d\n", RAND_MAX & 0x7 );

return 0;
}
\endcode

I wrote this sample program a while ago to show the impact of bias on a
smaller scale to make it easier to visualize to those new to the concept.

The simplest method to eliminate bias is to simply "re-roll the dice".
In the example where RAND_MAX is 32767 and 'n' is 2, you would re-roll
if the output value is 32767. If 'n' is 10, you would re-roll for the
values 32760-32767.

Of course, with any random variable, it is possible but increasingly
unlikely that sequences of observations would continue to appear in the
bias or "re-roll" zone. Here is a snippet of what I use.

\code snippet
int c_random( int n )
{
int limit;
int r;
int bias_retry_count;

c_return_value_if_fail( 0 <= n && n <= RAND_MAX, rand() );

limit = RAND_MAX - RAND_MAX % n;

bias_retry_count = GC_MAXIMUM_RAND_SAMPLES;
do {
r = rand();
} while ( bias_retry_count-- > 0 && r >= limit );

c_unexpected( bias_retry_count < 0 );

return r % n;
}
\endcode

The c_unexpected and c_return_value_if_fail are macros I use that output
diagnostics if those conditions trigger. For example, if you #define
'GC_MAXIMUM_RAND_SAMPLES' to 1, you get one shot at rolling an unbiased
value after modulus without a warning, but as you increase it, the
probability of finding a value in the unbiased range gets smaller and
smaller.

Best regards,
John D.

jadi...@gmail.com

unread,
Sep 9, 2014, 12:00:16 PM9/9/14
to
On Tuesday, September 9, 2014 10:56:47 AM UTC-4, jadi...@gmail.com wrote:
>
> Bias is injected into the distribution anytime the pseudo-random value
> is greater than or equal to the largest evenly-divisible value of 'n'.
> For example, if RAND_MAX is 32767, and 'n' is 2, there are 16383 values
> of '0' and 16384 values of '1'. If 'n' is 10, then any value above
> 32759 introduces bias with the modulus because the values 0 thru 7 occur
> 1 more time than 8 and 9 (there is no 32768 and 32769 to make the
> distribution even).

Upon re-reading what I posted, I realized that '2' is a poor example
since there are actually even values of '0' and '1'.

0 % 2 = 0
1 % 2 = 1
2 % 2 = 0
3 % 2 = 1
...
32766 % 2 = 0
32767 % 2 = 1

16384 0's and 1's since the range is inclusive of 0 and 32767, meaning a
total domain of 32768 possible values.

Sorry for any confusion. I should have used the value '3' instead. I
suppose I need to re-read at least 2 times before posting.

Best regards,
John D.

Richard Heathfield

unread,
Sep 9, 2014, 12:53:30 PM9/9/14
to
Ben Bacarisse wrote:

> Richard Heathfield <inv...@see.sig.invalid> writes:
>
>> Morris Dovey wrote:
> <snip>
>>> If a PRNG algorithm used for rand() produces a sequence of “quality” Q,
>>> does application of the modulo operator as in
>>>
>>> rand() % n
>>>
>>> yield, in general, a sequence of the same “quality”?
>>
>> I've read the other responses, and have nothing to add or take away from
>> them on their own terms... but nobody seems to have mentioned
>>
>> int randrange(int low, int high)
>> {
>> return (high + 1 - low) * (rand() / (RAND_MAX + 1.0)) + low;
>> }
>>
>> which may at least give you better quality than % n.
>
> How? It seems to me to be in many ways worse.

In normal usage it's actually better.

> It doesn't solve the bias problem;

Let's leave that aside for now.

> high + 1 might overflow;

I would argue that this is an extreme condition.

> the "+ 1.0" could be lost, leading to an out-of-range result;

I don't see how you can lose the +1.0. Surely this could only happen if
RAND_MAX's digit count approached or exceeded double's precision, which
again sounds rather extreme.

> and it seems to offer more than it
> can deliver when RAND_MAX is lower than INT_MAX.

So does rand(), which (you will recall) returns an int.

Okay, the bias problem. Let's say you have a bias towards an odd-even
pattern, such as you might get from this PRNG:

r = (r * 3) + 3;

(I'm restricting myself to a four-bit word and no overflow considerations
for reasons of brevity and a simple argument.)

Assuming RAND_MAX is 15, this gives us a pattern of:

0 -> 3 -> 12 -> 7 -> 8 -> 11 -> 4 -> 15 -> 0 -> 3 (repeats)

Note some biases: alternation of 0 and 1 in the lowest bit, only half the
values used.

Let's toss a coin using the %n method:

result = rand() % 2; will give us T H T H T H T H

result = 2 * rand() / (RAND_MAX + 1.0); T T H T H H T H

which has at least got rid of the simple alternation.

This example is far from conclusive, obviously, but it does suggest that the
division technique *can* be better than the mod n technique. Everything you
say about ranges is true, but I think if we're getting towards those kinds
of numbers we would in any case be heading towards MT. For normal die-
rolling, coin-tossing, roulette-wheel-spinning, door-choosing usage, rand()
/ (RAND_MAX + 1.0) is fine.

See also Q13.16 of the FAQ.

Ben Bacarisse

unread,
Sep 9, 2014, 1:02:29 PM9/9/14
to
The most dramatic cases are when the requested range is large. Try, for
example, requesting numbers in the range 0 to 2*RAND_MAX/3.

--
Ben.

Nobody

unread,
Sep 9, 2014, 1:48:03 PM9/9/14
to
On Tue, 09 Sep 2014 02:29:49 -0500, Robert Wessel wrote:

> For example, some generators
> (fortunately not often used) alternate between even and odd numbers. So
> while the original (not great) sequence might be fairly random, except for
> the last bit, rand()%2 would result in the horrible: 0,1,0,1,0,1,0,1...

It's not just the last bit.

For a "raw" linear congruential generator (where all bits of the state
are returned), the last bit will have a cycle with period at most 2, the
next-to-last bit will have a cycle with period at most 4, and so on. I.e.
bit N (where the last bit is N=0) will have a cycle with period at most
2^(N+1).

With most modern C implementations, if rand() uses a LCG, only the
top-most bits will be returned. The C99 standard gives the following as an
example:

static unsigned long int next = 1;

int rand(void) // RAND_MAX assumed to be 32767
{
next = next * 1103515245 + 12345;
return (unsigned int)(next/65536) % 32768;
}

Examining the entire state ("next") of the above LCG, we can see that the
last bit cycles 1,0,1,0,1,..., the last two bits cycle 01,10,11,00,01,...,
the last 3 bits cycle 001,110,111,100,101,010,011,000,001,..., and so on.

By omitting the last 16 bits from the returned value, the last bit of the
returned value will have a cycle with period 131072.

Some older implementations returned the entire state. This gave the user
the option of using as many bits as were necessary for their purpose.
The risk was that users unfamiliar with the properties of LCGs would use
the lowest bits rather than the highest.

Tim Rentsch

unread,
Sep 9, 2014, 2:18:03 PM9/9/14
to
> Yes - I can see that. I've been tinkering with the xorshift64*
> algorithm shown at https://en.wikipedia.org/wiki/Xorshift and may
> need to re-seed at a frequency determined by n.

Some practical tips for 64-bit PRNGs -

1. xorshift64 is pretty good but not good enough by itself.
(I have not looked at the modified/extended versions on
the wikipedia page.)

2. The amount of saved state doesn't need to be huge but
it should be bigger than 64 bits - at least 96, 128 is
better, and anything over 256 is probably wasted.

3. I recommend combining two or three methods whose periods
are relatively prime; see next and next next. (Note
that the period of xorshift64 is 2**64 - 1.)

4. One of those should be one whose period matches the size
of random numbers generated, eg, 2**64 for a 64-bit PRNG.
Even just incrementing a 64-bit counter and adding it to
xorshift64 produces a better source of PRN's than just
xorshift64 alone.

5. A good third method is a Park-Miller style generator
with modulus slightly less than 2**64. (If you would
like details feel free to send me an email.)

6. To produce a uniform distribution in a limited range,
it's okay to use mod (ie, and not worry about problems
in the low order bits) with this scheme, especially if
suggestion (5) is included, but the result does need to
be screened to avoid unequal distribution of values.
Here is code illustrating that (disclaimer: typed in,
not compiled):

unsigned long long
uniform64_under( unsigned long long k ){
unsigned long long n, z = 0xFFFFFFFFFFFFFFFFuLL + 1 - k % k;
do n = uniform64(); while( n < z );
return (n-z) % k;
}

Morris Dovey

unread,
Sep 9, 2014, 2:22:00 PM9/9/14
to
Thanks to everyone who responded! I won’t be taking the % route, and I
have a /lot/ of food for thought. :-)

The context is encrypted communication with semi-autonomous remote
devices with “lightweight” controllers and my aim is to prevent immature
types from interfering with command/data streams and causing an
otherwise benevolent device to present a safety hazard.

I implemented the encryption first with rand(), then with random(), and
now I’m exploring other options.

Thankfully, I don’t need perfect randomness (although the first device
does count gamma rays and beta particles) and I expect to re-seed with
sufficient frequency to discourage hackers.

Your comments are much appreciated and highly valued.

Richard Heathfield

unread,
Sep 9, 2014, 3:20:22 PM9/9/14
to
Morris Dovey wrote:

> Thanks to everyone who responded! I won’t be taking the % route, and I
> have a /lot/ of food for thought. :-)
>
> The context is encrypted communication

STOP RIGHT THERE! I gotta know right now...
Before we go any further...

(With apologies to Jim Steinman)

If you intend to use this PRNG for cryptographic purposes, there's a whole
world of pain^bcwresearch awaiting you. If you thought some of the responses
here were bordering on the pickily technical, it gets a lot lot worse if you
have to resist attacks on the sequence.

<snip>

> I implemented the encryption first with rand(), then with random(), and
> now I’m exploring other options.

If you are encrypting, it is highly likely that you will need to use a
CSPRNG.

> Thankfully, I don’t need perfect randomness (although the first device
> does count gamma rays and beta particles) and I expect to re-seed with
> sufficient frequency to discourage hackers.

If you're actually using gamma rays and beta particles, how does Bob (the
legitimate recipient of the information) duplicate your number stream for
decryption? Quantum entanglement? :-)

I don't get how that can work. That may just be because I'm slow today, but
I really don't see it.

Note, also, that frequent re-seeding can have the opposite effect to the one
you intend. If the re-seeding forms a pattern, it can be attacked.

I really, really, really think you should take this to sci.crypt, especially
if your threat model is non-trivial (eg if the code is for a financial
institution or application). That group has suffered a huge amount of damage
in recent years, but there are still some experts kicking around. Talk to
them.

Ben Bacarisse

unread,
Sep 9, 2014, 3:54:58 PM9/9/14
to
Richard Heathfield <inv...@see.sig.invalid> writes:

> Ben Bacarisse wrote:
>
>> Richard Heathfield <inv...@see.sig.invalid> writes:
>>
>>> Morris Dovey wrote:
>> <snip>
>>>> If a PRNG algorithm used for rand() produces a sequence of “quality” Q,
>>>> does application of the modulo operator as in
>>>>
>>>> rand() % n
>>>>
>>>> yield, in general, a sequence of the same “quality”?
>>>
>>> I've read the other responses, and have nothing to add or take away from
>>> them on their own terms... but nobody seems to have mentioned
>>>
>>> int randrange(int low, int high)
>>> {
>>> return (high + 1 - low) * (rand() / (RAND_MAX + 1.0)) + low;
>>> }
>>>
>>> which may at least give you better quality than % n.
>>
>> How? It seems to me to be in many ways worse.
>
> In normal usage it's actually better.

I don't think you've offered much in the way of benefit.

<snip>
>> high + 1 might overflow;
>
> I would argue that this is an extreme condition.

In the literal sense, sure, but there are two sorts of extreme
condition. When counting, the possibility that x++ might overflow could
simply be ignored as an unlikely extreme case. But when you write a
function parameterised with an int range, you can't really assume that 0
to INT_MAX, or INT_MIN to INT_MAX are that sort of extreme condition.
Both seem to me quite reasonable calls. It's more like a corner case
than an extreme condition.

(It's actually worse than I thought because you do the whole range
calculation in integer arithmetic. high + 1 - low can overflow for a
wider range of, er, ranges than I spotted first time round.)

Anyway, you are doing floating point arithmetic, so there's no need to
take the risk.

>> the "+ 1.0" could be lost, leading to an out-of-range result;
>
> I don't see how you can lose the +1.0. Surely this could only happen if
> RAND_MAX's digit count approached or exceeded double's precision, which
> again sounds rather extreme.

Again, literally true, but a system with RAND_MAX being 2**63-1 and IEEE
double is hardly extreme, and in those cases RAND_MAX + 1.0 == RAND_MAX.
The returns from rand() that actually cause a problem are indeed very
rare, but I don't like programming with my fingers crossed.

At the very least, the reader has to reassure themselves that they are
willing to take the chance, and if this is just a published API, they
don't even get that opportunity.

>> and it seems to offer more than it
>> can deliver when RAND_MAX is lower than INT_MAX.
>
> So does rand(), which (you will recall) returns an int.

I thought you were comparing it with rand() % n which seems to me to be
more up-front about the limitations. And I missed something else the
first time round: advertising the full range of int goes wrong even when
RAND_MAX is *equal* to INT_MAX (quite a common case, I imagine).

> Okay, the bias problem. Let's say you have a bias towards an odd-even
> pattern, such as you might get from this PRNG:

That's not the bias problem I was talking about. rand() % n does not
always produce uniformly distributed results and there's a common
misconception that all that arithmetic somehow fixes that bias.

> r = (r * 3) + 3;
>
> (I'm restricting myself to a four-bit word and no overflow considerations
> for reasons of brevity and a simple argument.)
>
> Assuming RAND_MAX is 15, this gives us a pattern of:
>
> 0 -> 3 -> 12 -> 7 -> 8 -> 11 -> 4 -> 15 -> 0 -> 3 (repeats)
>
> Note some biases: alternation of 0 and 1 in the lowest bit, only half the
> values used.
>
> Let's toss a coin using the %n method:
>
> result = rand() % 2; will give us T H T H T H T H
>
> result = 2 * rand() / (RAND_MAX + 1.0); T T H T H H T H

(aside: that's not equivalent to what you wrote because 2 * rand() can
overflow in perfectly ordinary situations -- just a "beware" to anyone
who might copy that line.)

> which has at least got rid of the simple alternation.
>
> This example is far from conclusive, obviously, but it does suggest that the
> division technique *can* be better than the mod n technique.

It's not obvious (at least to me) that hiding a problem with rand()
makes the code better. Would you not rather have the obvious,
repeatable, behaviour? (Didn't you ask for that very thing only a post
or two ago?)

<snip>
> See also Q13.16 of the FAQ.

Which bit are you directing my attention to?

--
Ben.

Ben Bacarisse

unread,
Sep 9, 2014, 4:37:05 PM9/9/14
to
Richard Heathfield <inv...@see.sig.invalid> writes:

> Morris Dovey wrote:
>
>> Thanks to everyone who responded! I won’t be taking the % route, and I
>> have a /lot/ of food for thought. :-)
>>
>> The context is encrypted communication
>
> STOP RIGHT THERE! I gotta know right now...
> Before we go any further...

See below (but if I'm wrong below, I agree wholeheartedly).

<snip>
>> Thankfully, I don’t need perfect randomness (although the first device
>> does count gamma rays and beta particles) and I expect to re-seed with
>> sufficient frequency to discourage hackers.
>
> If you're actually using gamma rays and beta particles, how does Bob (the
> legitimate recipient of the information) duplicate your number stream for
> decryption? Quantum entanglement? :-)
>
> I don't get how that can work. That may just be because I'm slow today, but
> I really don't see it.

Very likely the PRNG is being used to generate a "nonce" or some such
token. I doubt he's planning to the PRNG as a cipher.

<snip>
--
Ben.

Richard Heathfield

unread,
Sep 9, 2014, 4:40:11 PM9/9/14
to
Ben Bacarisse wrote:

> Richard Heathfield <inv...@see.sig.invalid> writes:
>
>> Ben Bacarisse wrote:
>>
>>> Richard Heathfield <inv...@see.sig.invalid> writes:
>>>
>>>> Morris Dovey wrote:
>>> <snip>
>>>>> If a PRNG algorithm used for rand() produces a sequence of “quality”
>>>>> Q, does application of the modulo operator as in
>>>>>
>>>>> rand() % n
>>>>>
>>>>> yield, in general, a sequence of the same “quality”?
>>>>
>>>> I've read the other responses, and have nothing to add or take away
>>>> from them on their own terms... but nobody seems to have mentioned
>>>>
>>>> int randrange(int low, int high)
>>>> {
>>>> return (high + 1 - low) * (rand() / (RAND_MAX + 1.0)) + low;
>>>> }
>>>>
>>>> which may at least give you better quality than % n.
>>>
>>> How? It seems to me to be in many ways worse.
>>
>> In normal usage it's actually better.
>
> I don't think you've offered much in the way of benefit.

The amount of benefit will depend on the quality of the PRNG, I think.
Dividing by (RAND_MAX + 1.0) to give a number in the range [0-1) places more
emphasis on the higher bits. For PRNGs that have poor randomness in the low
bits, this is useful. For better PRNGs, less so.

<snip - sorry, short on time - consider your points well-made and well-
taken>

>> See also Q13.16 of the FAQ.
>
> Which bit are you directing my attention to?

The bit from which I stole the RAND_MAX + 1.0 thing, I suppose. If I'm wrong
about that, I am at least in good company. :-)

Morris Dovey

unread,
Sep 9, 2014, 5:02:12 PM9/9/14
to
On 9/9/14 2:20 PM, Richard Heathfield wrote:
> Morris Dovey wrote:
>
>> Thanks to everyone who responded! I won’t be taking the % route, and I
>> have a /lot/ of food for thought. :-)
>>
>> The context is encrypted communication
>
> STOP RIGHT THERE! I gotta know right now...
> Before we go any further...
>
> (With apologies to Jim Steinman)
>
> If you intend to use this PRNG for cryptographic purposes, there's a whole
> world of pain^bcwresearch awaiting you. If you thought some of the responses
> here were bordering on the pickily technical, it gets a lot lot worse if you
> have to resist attacks on the sequence.

Fortunately, it’s unlikely that anyone could find financial advantage in
a successful attack - but they could screw up some good research. I’m a
lot less worried about bad actors than with interference from high
school wannabe hackers out for kicks.

> If you're actually using gamma rays and beta particles, how does Bob (the
> legitimate recipient of the information) duplicate your number stream for
> decryption? Quantum entanglement? :-)

QE would be a great solution, and could obviate the need for encryption
altogether. My thought is to derive a shared next seed /hint/ from the
(unencoded) data stream. NSA/GCHQ probably wouldn’t approve, but I’m not
really interested in playing the game at that level again.

> Note, also, that frequent re-seeding can have the opposite effect to the one
> you intend. If the re-seeding forms a pattern, it can be attacked.

Well, yes - pretty much everything forms patterns, and there are folks
who’ll attack whatever’s in reach, just because...

> I really, really, really think you should take this to sci.crypt, especially
> if your threat model is non-trivial (eg if the code is for a financial
> institution or application). That group has suffered a huge amount of damage
> in recent years, but there are still some experts kicking around. Talk to
> them.

Actually, I’ve had some good expert input already - and I’m comfortable
with the approach chosen. Let’s not tempt ourselves into over-thinking
the problem. :-)

Richard Heathfield

unread,
Sep 10, 2014, 3:25:11 AM9/10/14
to
Ben Bacarisse wrote:

<snip>

> Very likely the PRNG is being used to generate a "nonce" or some such
> token. I doubt he's planning to the PRNG as a cipher.

Or salt, yes. That would certainly be much less problematic.

Tim Rentsch

unread,
Sep 17, 2014, 9:18:09 AM9/17/14
to
jadi...@gmail.com writes:

> [Using rand() to generate a number between 0 and n-1]
>
> The simplest method to eliminate bias is to simply "re-roll the
> dice". In the example where RAND_MAX is 32767 and 'n' is 2, you
> would re-roll if the output value is 32767 0subseuqently
> corrected]. If 'n' is 10, you would re-roll for the values
(I expect you meant "biased range" at the end of the last
sentence.)

The code for c_random() looks a bit off. Not wrong for the most
part, but not quite right: initial range test, setting/testing
of limit, and logic of retry counter.

The range tested in c_return_value_if_fail() goes too far at
the low end and not as far as it could at the high end. The
largest range that can work is 1 <= n && n-1 <= RAND_MAX.

How limit is set/tested rules out more values than it has to for
some values of n, eg, if n is 16384 and RAND_MAX is 32767. To
fix this both the value and the while() test need changing:

largest = RAND_MAX - (RAND_MAX - (n-1)) % n;
...
} while ( ... r > largest ... );

This way of setting/testing allows using all values up to
RAND_MAX for those values of n that are compatible with that.

The retry counter logic isn't wrong necessarily but it is at
least somewhat misleading. If GC_MAXIMUM_RAND_SAMPLES is 1,
rand() may be called twice. Also a warning can be triggered
even when the value being used is in the good range. Better
would be, eg,

retries = GC_MAXIMUM_RAND_RETRIES;
do {
...
} while ( r > largest && retries-- > 0 );
c_unexpected( retries < 0 );

This way the macro name better reflects how its value is used,
and c_unexpected() is triggered only when the value of r
actually used is in the biased range rather than the unbiased
range.

jadi...@gmail.com

unread,
Sep 18, 2014, 10:55:38 AM9/18/14
to
On Wednesday, September 17, 2014 9:18:09 AM UTC-4, Tim Rentsch wrote:
Yes.

> The code for c_random() looks a bit off. Not wrong for the most
> part, but not quite right: initial range test, setting/testing
> of limit, and logic of retry counter.
>
> The range tested in c_return_value_if_fail() goes too far at
> the low end and not as far as it could at the high end. The
> largest range that can work is 1 <= n && n-1 <= RAND_MAX.
>
> How limit is set/tested rules out more values than it has to for
> some values of n, eg, if n is 16384 and RAND_MAX is 32767. To
> fix this both the value and the while() test need changing:

I knew this had its issues for modulus values that were evenly
divisible into 32768 and you highlighted the worst one :-)

> largest = RAND_MAX - (RAND_MAX - (n-1)) % n;
> ...
> } while ( ... r > largest ... );
>
> This way of setting/testing allows using all values up to
> RAND_MAX for those values of n that are compatible with that.

Very nice! Works great for all the powers of two I tried.

> The retry counter logic isn't wrong necessarily but it is at
> least somewhat misleading. If GC_MAXIMUM_RAND_SAMPLES is 1,
> rand() may be called twice. Also a warning can be triggered
> even when the value being used is in the good range. Better
> would be, eg,
>
> retries = GC_MAXIMUM_RAND_RETRIES;
> do {
> ...
> } while ( r > largest && retries-- > 0 );
>
> c_unexpected( retries < 0 );
>
> This way the macro name better reflects how its value is used,
> and c_unexpected() is triggered only when the value of r
> actually used is in the biased range rather than the unbiased
> range.

I agree that it's a definite improvement. I'd just like say thanks
for spending your free time for a well-thought code review.

Best regards,
John D.

James Dow Allen

unread,
Sep 19, 2014, 3:53:35 AM9/19/14
to
I've placed some code online
http://fabpedigree.com/james/j_random.c
which addresses a variety of issues related to PNRG's.
(You may also want j_random.h and j_rdemo.c from
the same directory.) Comments welcome.

The function rand_index() may be of interest. It
returns random integers in a specified range
WHILE PRESERVING left-over "entropy." Fo example,
if your PRNG returns 32-bit numbers you should be
able to get eight variates from {0,1,2,3,...,14}
from one PRNG call, and that function achieves
that.

James Dow Allen

Morris Dovey

unread,
Sep 19, 2014, 5:57:57 AM9/19/14
to
Thank you.

All this help is going to keep me occupied for a while.) :-)

Ben Bacarisse

unread,
Sep 19, 2014, 7:27:12 AM9/19/14
to
James Dow Allen <jdall...@yahoo.com> writes:

> I've placed some code online
> http://fabpedigree.com/james/j_random.c

In rand_combo you have some potentially undefined behaviour:

result += k - 1;
while (k) {
if (m == k || rand_decide(k / (double)m)) {
*result-- = m - 1;
k--;
}
m--;
}

The 'result' pointer will end up being set to point one element before
the pointer that was originally passed (so it's UB in the most likely
calling pattern). It goes away if you decrement first:

result += k;
while (k) {
if (m == k || rand_decide(k / (double)m)) {
*--result = m - 1;
k--;
}
m--;
}

<snip>
--
Ben.

James Dow Allen

unread,
Sep 19, 2014, 8:39:23 AM9/19/14
to
On Friday, September 19, 2014 6:27:12 PM UTC+7, Ben Bacarisse wrote:
> In rand_combo you have some potentially undefined behaviour:
>
> The 'result' pointer will end up being set to point one element before
> the pointer that was originally passed (so it's UB in the most likely
> calling pattern). It goes away if you decrement first:

Thanks Ben. Elegant that the fixed code also reduces
the keystroke count!

I feel complimented that you scrutinised the code in this much
detail. Or do you find such things easily?

James

Ben Bacarisse

unread,
Sep 19, 2014, 10:35:26 AM9/19/14
to
James Dow Allen <jdall...@yahoo.com> writes:
<snip>
> I feel complimented that you scrutinised the code in this much
> detail. Or do you find such things easily?

I think, maybe, I do. People have very different perceptual skills. My
mother, for example, could scan a double-page spread from a broadsheet
newspaper for a few seconds and pick out spelling mistakes. As my posts
show, I can't see one in a single sentence, even if I stare at it for
hours. To make up for that, I do seem to be good at bug spotting.

But years of marking programming coursework might have played a part
too: *++p and *p-- are red flags that always need another look (though
they are not always wrong, of course).

--
Ben.

Tim Rentsch

unread,
Sep 23, 2014, 1:20:50 AM9/23/14
to
James Dow Allen <jdall...@yahoo.com> writes:

> I've placed some code online
> http://fabpedigree.com/james/j_random.c
> which addresses a variety of issues related to PNRG's.
> (You may also want j_random.h and j_rdemo.c from
> the same directory.) Comments welcome.

sigh...

It's a good illustration of Hoare's aphorism.

James Dow Allen

unread,
Sep 23, 2014, 5:20:27 AM9/23/14
to
Tim Rentsch <t...@alumni.caltech.edu> might have writ, in
news:kfn38bi...@x-alumni2.alumni.caltech.edu:
I wonder what you think you mean. The PNRG's in that file, which do
appear very complicated, are taken directly from some of the best
available.

If you're referring to rand_index() with its complicated-looking
ir_range/ir_vari, then it's obvious you fail to grasp what problem
the mechanism solves -- despite that I mentioned it upthread. The
special handling is useful only when one is using a very expensive RNG,
but it's a FUN problem and, of course its complexity (assuming that's
what your "sigh" was about) is irrelevant, since the code is done.

So:
(1)What does your nasty little sigh refer to?
(2) Assuming it's rand_index(), try to grasp the purpose for the
complexity; then present your own improved solution.
(3) Feel free to criticise the rest of that code in that module. For
example, present your improved code for the setup for Walker's
algorithm.

Thak you.
James

Malcolm McLean

unread,
Sep 23, 2014, 6:02:48 AM9/23/14
to
On Tuesday, September 23, 2014 10:20:27 AM UTC+1, James Dow Allen wrote:
> Tim Rentsch <t...@alumni.caltech.edu> might have writ, in
>
> > It's a good illustration of Hoare's aphorism.
>
> I wonder what you think you mean. The PNRG's in that file, which do
> appear very complicated, are taken directly from some of the best
> available.
>
PRNGs are a bit like cryptography. If you just need to hide some data, a simple XOR mask
will do fine. If your activities have attracted the interest of government intelligence
agencies, you need a very complicated method.

Osmium

unread,
Sep 23, 2014, 8:00:30 AM9/23/14
to
"Malcolm McLean" wrote:

> PRNGs are a bit like cryptography. If you just need to hide some data, a
> simple XOR mask
> will do fine. If your activities have attracted the interest of government
> intelligence
> agencies, you need a very complicated method.

In a sensible world Radio Shack would sell a white noise based RNG that
connects to the USB port for about $10. No P needed. A simple resistor
can provide all the 1's and 0's you want for the foreseeable future.

I wrote the paragraph above and then looked at google and not many hits. I
wonder if some US government action, or concern about possible of action,
has squelched such a thing?


Richard Heathfield

unread,
Sep 23, 2014, 8:06:19 AM9/23/14
to
If you need to hide stuff from governments, write it in big letters on your
tax return. They will never, ever read it. Guaranteed.

But if you need to hide stuff from their intelligence agencies, a simple XOR
mask will *still* do fine, as long as it's as long as the plaintext,
randomly chosen, kept a secret, and never re-used.

If you don't like the restrictions, you can just use something like AES,
which isn't all that complicated really.

Robert Wessel

unread,
Sep 23, 2014, 8:14:33 PM9/23/14
to
On Tue, 23 Sep 2014 07:00:16 -0500, "Osmium" <r124c...@comcast.net>
wrote:
There are a number of cryptographic accelerators that include hardware
RNGs. People do keep popping up to sell (relatively) low cost
single-purpose TRNGs, but they don't seem to last very long. I
suspect that few people actually think they need enough really random
bits who aren't likely to want cryptographic acceleration as well. In
any event, hardware RNGs always require considerable (and non-trivial)
conditioning before they're usable (and this sort of thing is pretty
hard to test - really bad output can be indistinguishable from high
quality stuff).

FWIW, Intel includes a hardware random number generator on a number of
recent x86 CPUs, accessible via the RDRAND facility. Get the right
CPU, and the facility is "free". So there's not that much of market
between the "free" built in stuff (which is incidentally, quite high
performance) and high end devices, and that middle range is going to
have trust issues.

Keith Thompson

unread,
Sep 23, 2014, 9:34:50 PM9/23/14
to
Robert Wessel <robert...@yahoo.com> writes:
[...]
> FWIW, Intel includes a hardware random number generator on a number of
> recent x86 CPUs, accessible via the RDRAND facility. Get the right
> CPU, and the facility is "free". So there's not that much of market
> between the "free" built in stuff (which is incidentally, quite high
> performance) and high end devices, and that middle range is going to
> have trust issues.

See http://en.wikipedia.org/wiki/RdRand, particularly the "Reception"
section.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Osmium

unread,
Sep 24, 2014, 9:47:57 AM9/24/14
to
"Keith Thompson" wrote:

> Robert Wessel <robert...@yahoo.com> writes:
> [...]
>> FWIW, Intel includes a hardware random number generator on a number of
>> recent x86 CPUs, accessible via the RDRAND facility. Get the right
>> CPU, and the facility is "free". So there's not that much of market
>> between the "free" built in stuff (which is incidentally, quite high
>> performance) and high end devices, and that middle range is going to
>> have trust issues.
>
> See http://en.wikipedia.org/wiki/RdRand, particularly the "Reception"
> section.

That article simply confirms my long held belief that certain fields are
dominated by kooks. PRNG, RNG ( now with the god-awful modifier known as
TRNG) and encryption are near the top of the list.

I had in mind a resistor, an amplifier, a zero cross detector, followed by
whitening. I can imagine a Collection of Kooks hovering over it, and
debating "Is that resistor looking thing REALLY a resistor?"

I don't know if electronic hobbyist magazines even exist anymore. If they
do, the back issues certainly must contain something pretty close to what I
had in mind.


Udyant Wig

unread,
Sep 24, 2014, 10:40:49 AM9/24/14
to
"Osmium" <r124c...@comcast.net> writes:

| That article simply confirms my long held belief that certain fields
| are dominated by kooks. PRNG, RNG ( now with the god-awful modifier
| known as TRNG) and encryption are near the top of the list.

How might one determine that a given field is dominated by kooks?

--
Udyant Wig
GitHub: https://github.com/udyant
Poetry: http://www.writing.com/main/profile/biography/frosthrone

Robert Wessel

unread,
Sep 24, 2014, 10:57:54 AM9/24/14
to
On Tue, 23 Sep 2014 18:34:31 -0700, Keith Thompson <ks...@mib.org>
wrote:

>Robert Wessel <robert...@yahoo.com> writes:
>[...]
>> FWIW, Intel includes a hardware random number generator on a number of
>> recent x86 CPUs, accessible via the RDRAND facility. Get the right
>> CPU, and the facility is "free". So there's not that much of market
>> between the "free" built in stuff (which is incidentally, quite high
>> performance) and high end devices, and that middle range is going to
>> have trust issues.
>
>See http://en.wikipedia.org/wiki/RdRand, particularly the "Reception"
>section.


While I'm not willing to fully trust Intel on this, it's still going
to be a useful source of entropy in almost all cases, even if it does
require some additional cooking. And in most cases, it's probably
good enough as is, even if the NSA has back-doored it.

Robert Wessel

unread,
Sep 24, 2014, 11:09:02 AM9/24/14
to
On Wed, 24 Sep 2014 08:47:45 -0500, "Osmium" <r124c...@comcast.net>
wrote:

>"Keith Thompson" wrote:
>
>> Robert Wessel <robert...@yahoo.com> writes:
>> [...]
>>> FWIW, Intel includes a hardware random number generator on a number of
>>> recent x86 CPUs, accessible via the RDRAND facility. Get the right
>>> CPU, and the facility is "free". So there's not that much of market
>>> between the "free" built in stuff (which is incidentally, quite high
>>> performance) and high end devices, and that middle range is going to
>>> have trust issues.
>>
>> See http://en.wikipedia.org/wiki/RdRand, particularly the "Reception"
>> section.
>
>That article simply confirms my long held belief that certain fields are
>dominated by kooks. PRNG, RNG ( now with the god-awful modifier known as
>TRNG) and encryption are near the top of the list.


They sure are.


>I had in mind a resistor, an amplifier, a zero cross detector, followed by
>whitening. I can imagine a Collection of Kooks hovering over it, and
>debating "Is that resistor looking thing REALLY a resistor?"


Two of my personal favorites are Lavarand (sadly now gone, but they
pointed a camera at a lavalamp and watched the blobs bubbling), and
the Dice-O-Matic mark II:

http://gamesbyemail.com/news/diceomatic

A *wonderful* exercise in over-engineering. You must watch the video.

Still for a limited number of random bits, why is noise from a
resistor any better than timing jitter from network packets, hard disk
I/Os or keystrokes? And for a lot of bits, you'll need considerable
hardware to process that before feeding it into your computer, and
then you're back to trusting a complex design by someone else.


>I don't know if electronic hobbyist magazines even exist anymore. If they
>do, the back issues certainly must contain something pretty close to what I
>had in mind.


I'm sure I've seen that sort of thing as part of a home-built white
noise generator, or something like that. Decades ago, of course.

Robert Wessel

unread,
Sep 24, 2014, 11:09:42 AM9/24/14
to
On Wed, 24 Sep 2014 20:10:33 +0530, Udyant Wig <udy...@gmail.com>
wrote:

>"Osmium" <r124c...@comcast.net> writes:
>
>| That article simply confirms my long held belief that certain fields
>| are dominated by kooks. PRNG, RNG ( now with the god-awful modifier
>| known as TRNG) and encryption are near the top of the list.
>
> How might one determine that a given field is dominated by kooks?


In this case, just read sci.crypt for a while.

Osmium

unread,
Sep 24, 2014, 11:16:08 AM9/24/14
to
"Udyant Wig" wrote:

> "Osmium" <r124c...@comcast.net> writes:
>
> | That article simply confirms my long held belief that certain fields
> | are dominated by kooks. PRNG, RNG ( now with the god-awful modifier
> | known as TRNG) and encryption are near the top of the list.
>
> How might one determine that a given field is dominated by kooks?

By applying common sense standards to what you read. For example, did you
actually read the link that is topical here?

Look at this:

"Relying solely on the hardware random number generator which is using an
implementation sealed inside a chip which is impossible to audit is a BAD
idea."

This suggests that someone at Intel could (or did) conspire to produce an
RNG that had some kind of magical back door. Can't you see the need for
magic? The correlation problem? The encrypted data, unknown to the Intel
mole, is unknown. Yet a receiver of the encrypted message recognizes the
Intel bit stream and synchronizes with it. I didn't capitalize "bad", they
did.


Osmium

unread,
Sep 24, 2014, 11:24:26 AM9/24/14
to
"Robert Wessel" wrote:


> On Wed, 24 Sep 2014 08:47:45 -0500, "Osmium" <r124c...@comcast.net>
> wrote:

>>I had in mind a resistor, an amplifier, a zero cross detector, followed by
>>whitening. I can imagine a Collection of Kooks hovering over it, and
>>debating "Is that resistor looking thing REALLY a resistor?"

> Still for a limited number of random bits, why is noise from a
> resistor any better than timing jitter from network packets, hard disk
> I/Os or keystrokes? And for a lot of bits, you'll need considerable
> hardware to process that before feeding it into your computer, and
> then you're back to trusting a complex design by someone else.

It's not necessarily any better, it just happens to be something I know how
to do and trust completely. If some higher power has been encoding the hiss
I heard on radio or the snow on a B&W TV I will admit I was wrong and
promise to commit hara-kari for the collapse of the world wide financial
system.

So there.


Richard Heathfield

unread,
Sep 24, 2014, 12:28:09 PM9/24/14
to
It should be said that, although kooks are quite prevalent on sci.crypt,
there are still enough non-kooks there to at least puncture some of the
kookery. At present, anyway.

It's actually a very good example, because cryptography is one of those
subjects that seems to attract non-professionals. Amateurs are a different
matter. An amateur cryptographer studies cryptography out of sheer love of
the subject, and is therefore likely to learn quite a lot about it. Non-
professionals, however, are in it for the money, and can't be bothered to do
all that tedious research stuff. Nope - just stir up the bits a bit, post a
ciphertext-only challenge (preferably in a very provocative way), fail to
motivate anyone to even try, and then boast that the folks on sci.crypt
can't crack the code.

Of course, the field is really dominated by NSA (Never Say Anything), GCHQ
(Generate Cryptographic High Quality), and the sigint folks in what used to
be KGB (Keep Guessing, Barack, perhaps?), but since they do indeed Never Say
Anything, the second string would be Daemen, Rijmen, Schneier, and so on.
These are most certainly not kooky people.

Alas, they don't seem to be Usenetty people either, at least not now. Maybe
in the halcyon days of the Usenet summer, when halcyons soared from group to
group, and the sun shone out of I think I've rambled on quite enough, don't
you?

Random numbers are easy, by the way. Just pick 2^n truly or nearly truly
random sources (where n depends on how paranoid you are, and I can imagine
some foil hat types taking it as high as 8), such as white noise, the
current number of processor ticks mod some bizarre prime, the Geiger counter
that practically everyone seems to have lying around, a video feed pointed
at a busy street, network latencies, mouse movements, etc. XOR them all
together and, for all practical purposes, you are done.

Osmium

unread,
Sep 24, 2014, 1:02:44 PM9/24/14
to
"Richard Heathfield" wrote:

> It's actually a very good example, because cryptography is one of those
> subjects that seems to attract non-professionals. Amateurs are a different
> matter. An amateur cryptographer studies cryptography out of sheer love of
> the subject, and is therefore likely to learn quite a lot about it. Non-
> professionals, however, are in it for the money, and can't be bothered to
> do
> all that tedious research stuff.

Is there some garble there? A distinction between a non-professional and an
amateur?

I got hooked on this stuff in the fourth grade (ten years old) when we read
_The Gold Bug_ by Poe and never looked back.

But I seem to have misplaced my Little Orphan Annie decoder badge ...




Richard Heathfield

unread,
Sep 24, 2014, 1:30:21 PM9/24/14
to
Osmium wrote:

> "Richard Heathfield" wrote:
>
>> It's actually a very good example, because cryptography is one of those
>> subjects that seems to attract non-professionals. Amateurs are a
>> different matter. An amateur cryptographer studies cryptography out of
>> sheer love of the subject, and is therefore likely to learn quite a lot
>> about it. Non- professionals, however, are in it for the money, and can't
>> be bothered to do
>> all that tedious research stuff.
>
> Is there some garble there? A distinction between a non-professional and
> an amateur?

No garble, just care over terms. An amateur is someone who pursues an
activity for the sheer joy of it - from the Latin 'amare', meaning 'to
love'. The term is often misused to mean "someone who isn't very good at
it". Patrick Moore and Will Hay were both amateur astronomers, but neither
would have been considered "not very good at it" by their professional
contemporaries. W G Grace was an amateur cricketer, but who could question
his skill at the game? James Gillogly (Erdos Number: 3) is an amateur
cryptographer, but I know from personal experience that he's capable of
astounding (to me, anyway) feats of cryptanalysis.

I am using the term "non-professional" in direct contrast to "amateur". A
non-professional *does* get paid, so he *should* be good at it... but isn't.

> I got hooked on this stuff in the fourth grade (ten years old) when we
> read _The Gold Bug_ by Poe and never looked back.

That, and "The Dancing Men".

> But I seem to have misplaced my Little Orphan Annie decoder badge ...

I doubt whether it will take you as long as ten minutes to write a C
replacement for it. :-)

James Kuyper

unread,
Sep 24, 2014, 1:50:47 PM9/24/14
to
On 09/24/2014 01:30 PM, Richard Heathfield wrote:
...
> I am using the term "non-professional" in direct contrast to "amateur". A
> non-professional *does* get paid, so he *should* be good at it... but isn't.

Your definition of a "non-professional" means that every person it
applies to also qualifies as a professional: "A person who earns his
living from a specified activity"
<http://en.wiktionary.org/wiki/professional>. They don't qualify despite
also matching your definition of "non-professional" - they qualify as
"professional" precisely because they meet your definition of a
"non-professional". That is a very confusing way for you to define the
term, and completely inconsistent with the normal usage of "non-".

Keith Thompson

unread,
Sep 24, 2014, 2:25:51 PM9/24/14
to
Like "non-directive" (N1570 6.10)?

Robert Wessel

unread,
Sep 24, 2014, 2:39:07 PM9/24/14
to
On Wed, 24 Sep 2014 17:27:49 +0100, Richard Heathfield
<inv...@see.sig.invalid> wrote:

>Random numbers are easy, by the way. Just pick 2^n truly or nearly truly
>random sources (where n depends on how paranoid you are, and I can imagine
>some foil hat types taking it as high as 8), such as white noise, the
>current number of processor ticks mod some bizarre prime, the Geiger counter
>that practically everyone seems to have lying around, a video feed pointed
>at a busy street, network latencies, mouse movements, etc. XOR them all
>together and, for all practical purposes, you are done.


I think you're underestimating just how bad many raw sources of "real"
random numbers are. Sure do a simple xor of enough of even those
sources, and you'll probably be OK (so long as there are no
unfortunate correlations), but it's going to be a pretty big number.
Much better to do a good job cooking ("whitening") those raw sources
into high quality streams, and then xor's a few of them together.

And not only do raw sources tend to have all sorts of biases and
correlations, they tend to be very fragile. For example, the resistor
noise based generator that Osmium mentioned will be rather sensitive
to temperature, unless you take considerable pains to compensate for
that. So not only do you have to process raw random streams pretty
heavily before you can use them, you also have to monitor them for
performance.

Now if you don't mind being a bit wasteful of your input entropy, it's
not that hard, at least once you understand just how much entropy is
actually being delivered per raw bit (quite often well under .5,
frequently much less than that). As a simple example, you can gather
enough raw bits for 256 bits of entropy, plus a bit of margin, then
SHA-2 those bits into 256 bits of output.

James Kuyper

unread,
Sep 24, 2014, 2:43:11 PM9/24/14
to
On 09/24/2014 02:25 PM, Keith Thompson wrote:
> James Kuyper <james...@verizon.net> writes:
>> On 09/24/2014 01:30 PM, Richard Heathfield wrote:
>> ...
>>> I am using the term "non-professional" in direct contrast to "amateur". A
>>> non-professional *does* get paid, so he *should* be good at it... but isn't.
>>
>> Your definition of a "non-professional" means that every person it
>> applies to also qualifies as a professional: "A person who earns his
>> living from a specified activity"
>> <http://en.wiktionary.org/wiki/professional>. They don't qualify despite
>> also matching your definition of "non-professional" - they qualify as
>> "professional" precisely because they meet your definition of a
>> "non-professional". That is a very confusing way for you to define the
>> term, and completely inconsistent with the normal usage of "non-".
>
> Like "non-directive" (N1570 6.10)?

Very like. :-) That does NOT count as "normal usage". The C committee
should not be used as a role model when coining new terms.

Osmium

unread,
Sep 24, 2014, 2:49:18 PM9/24/14
to
"Robert Wessel" wrote:

> And not only do raw sources tend to have all sorts of biases and
> correlations, they tend to be very fragile. For example, the resistor
> noise based generator that Osmium mentioned will be rather sensitive
> to temperature, unless you take considerable pains to compensate for
> that.

The whitening was provided to solve that problem.


Richard Heathfield

unread,
Sep 24, 2014, 6:41:36 PM9/24/14
to
James Kuyper wrote:

> On 09/24/2014 01:30 PM, Richard Heathfield wrote:
> ...
>> I am using the term "non-professional" in direct contrast to "amateur". A
>> non-professional *does* get paid, so he *should* be good at it... but
>> isn't.
>
> Your definition of a "non-professional" means that every person it
> applies to also qualifies as a professional: "A person who earns his
> living from a specified activity"
> <http://en.wiktionary.org/wiki/professional>.

I'm not convinced that Wiktionary makes for a particularly authoritative
dictionary. But let's see where we can get with it.

Wiktionary defines "unprofessional" as "unbecoming of a professional". There
are people who are paid for what they do (which fits your meaning of
"professional" if not mine), but who nevertheless behave in a way that is
not appropriate or becoming for someone who is paid for what they do (for
example, charges for expertise they do not possess). According to
Wiktionary's definitions, then, it is possible to have an unprofessional
professional. If you're okay with that, why wouldn't you be okay with a non-
professional professional?

> They don't qualify despite
> also matching your definition of "non-professional" - they qualify as
> "professional" precisely because they meet your definition of a
> "non-professional". That is a very confusing way for you to define the
> term, and completely inconsistent with the normal usage of "non-".

But bang alongside the meaning of "un-", apparently. You may not have
thought this one through. :-)

Morris Dovey

unread,
Sep 24, 2014, 8:24:13 PM9/24/14
to
On 9/9/14 1:53 AM, Morris Dovey wrote:
> Back in “Re: OT: printf 64-bit and gcc” Tim Rentsch showed an example
> PRNG algorithm that sent me off to take a closer look at the rand()
> function and some of the algorithms that might be used to implement a
> rand64() function.
>
> My use of rand() has nearly always involved scaling the generated value
> to fit in a smaller range than RAND_MAX, and my investigation led me to
> this question for which I couldn’t find an answer:
>
> If a PRNG algorithm used for rand() produces a sequence of “quality” Q,
> does application of the modulo operator as in
>
> rand() % n
>
> yield, in general, a sequence of the same “quality”?
>


--
Morris Dovey
http://www.iedu.com/Solar/
http://www.facebook.com/MorrisDovey

James Kuyper

unread,
Sep 24, 2014, 9:37:32 PM9/24/14
to
On 09/24/2014 06:41 PM, Richard Heathfield wrote:
> James Kuyper wrote:
>
>> On 09/24/2014 01:30 PM, Richard Heathfield wrote:
>> ...
>>> I am using the term "non-professional" in direct contrast to "amateur". A
>>> non-professional *does* get paid, so he *should* be good at it... but
>>> isn't.
>>
>> Your definition of a "non-professional" means that every person it
>> applies to also qualifies as a professional: "A person who earns his
>> living from a specified activity"
>> <http://en.wiktionary.org/wiki/professional>.
>
> I'm not convinced that Wiktionary makes for a particularly authoritative
> dictionary.

I don't really care - I've been reading (and, to a lesser extent,
listening to) large quantities of English for more than a half century,
and in my personal experience that definition fits a very large fraction
of all the cases where I've read/heard "professional" used as a noun.
The cases that don't match that definition do match one of the other two
definitions mentioned by wiktionary. If you show me a supposedly
more-authoritative source that provides no definition for "professional"
that is equivalent to the one above, I would consider that evidence
against that source's accuracy.
--
James Kuyper

Richard Heathfield

unread,
Sep 24, 2014, 11:24:35 PM9/24/14
to
James Kuyper wrote:

> On 09/24/2014 06:41 PM, Richard Heathfield wrote:
>> James Kuyper wrote:
>>
>>> On 09/24/2014 01:30 PM, Richard Heathfield wrote:
>>> ...
>>>> I am using the term "non-professional" in direct contrast to "amateur".
>>>> A non-professional *does* get paid, so he *should* be good at it... but
>>>> isn't.
>>>
>>> Your definition of a "non-professional" means that every person it
>>> applies to also qualifies as a professional: "A person who earns his
>>> living from a specified activity"
>>> <http://en.wiktionary.org/wiki/professional>.
>>
>> I'm not convinced that Wiktionary makes for a particularly authoritative
>> dictionary.
>
> I don't really care

Neither do I, at least not very much, as I trust that my reply made clear.
By the way, you appear to have ignored the substantive part of my reply.
(Since the whole thing is off-topic anyway, that doesn't particularly
matter. No big deal.)

Udyant Wig

unread,
Sep 25, 2014, 2:03:36 AM9/25/14
to
"Osmium" <r124c...@comcast.net> writes:
| By applying common sense standards to what you read.

I thought that each domain had its own local meaning of common sense.
So I asked for some general criteria to determine whether a field is
dominated by kooks, if such criteria can be found.

But I could be mistaken in my assumption in the above.

| For example, did you actually read the link that is topical here?

I did. It was an alarming read.

| Look at this:
|
| "Relying solely on the hardware random number generator which is using
| an implementation sealed inside a chip which is impossible to audit is
| a BAD idea."
|
| This suggests that someone at Intel could (or did) conspire to produce
| an RNG that had some kind of magical back door. Can't you see the
| need for magic? The correlation problem? The encrypted data, unknown
| to the Intel mole, is unknown. Yet a receiver of the encrypted
| message recognizes the Intel bit stream and synchronizes with it. I
| didn't capitalize "bad", they did.

Has any project used this, given that it is a bad idea?

christ...@cbau.wanadoo.co.uk

unread,
Sep 25, 2014, 7:05:46 AM9/25/14
to
A professional is by definition a person who does something to make money. Some manage to actually lose money in the process. Some are rubbish at what they are doing. But they are all professionals.

An amateur is by definition a person who does something because they like doing it. Some are good at what they are doing, some are not. What they have in common is that they don't try to make money.

christ...@cbau.wanadoo.co.uk

unread,
Sep 25, 2014, 7:12:47 AM9/25/14
to
On Tuesday, September 9, 2014 12:43:30 PM UTC+1, Richard Heathfield wrote:
> int randrange(int low, int high)
> {
> return (high + 1 - low) * (rand() / (RAND_MAX + 1.0)) + low;
> }
>
> which may at least give you better quality than % n.

Any method that maps any of n possible values to any of m possible values, where n is not divisible by m, _must_ be biased. The only solution is to map (km) of the n possible values to m values, for the largest possible k, and if the original value is not one of those k*m values, choose another one and try again (possibly using the information which one of n - k*m values was rejected).

Osmium

unread,
Sep 25, 2014, 7:27:56 AM9/25/14
to
"Morris Dovey" wrote:

> On 9/9/14 1:53 AM, Morris Dovey wrote:
>> Back in “Re: OT: printf 64-bit and gcc” Tim Rentsch showed an example
>> PRNG algorithm that sent me off to take a closer look at the rand()
>> function and some of the algorithms that might be used to implement a
>> rand64() function.
>>
>> My use of rand() has nearly always involved scaling the generated value
>> to fit in a smaller range than RAND_MAX, and my investigation led me to
>> this question for which I couldn’t find an answer:
>>
>> If a PRNG algorithm used for rand() produces a sequence of “quality” Q,
>> does application of the modulo operator as in
>>
>> rand() % n
>>
>> yield, in general, a sequence of the same “quality”?

No. See a similar current thread in comp.lang.c for details.


Osmium

unread,
Sep 25, 2014, 7:54:02 AM9/25/14
to
"Udyant Wig" wrote:

> "Osmium" <r124c...@comcast.net> writes:
> | By applying common sense standards to what you read.
>
> I thought that each domain had its own local meaning of common sense.
> So I asked for some general criteria to determine whether a field is
> dominated by kooks, if such criteria can be found.
>
> But I could be mistaken in my assumption in the above.

I just meant generalized common sense,

>the link that is topical here?
>
> I did. It was an alarming read.
>
> | Look at this:
> |
> | "Relying solely on the hardware random number generator which is using
> | an implementation sealed inside a chip which is impossible to audit is
> | a BAD idea."
> |
> | This suggests that someone at Intel could (or did) conspire to produce
> | an RNG that had some kind of magical back door. Can't you see the
> | need for magic? The correlation problem? The encrypted data, unknown
> | to the Intel mole, is unknown. Yet a receiver of the encrypted
> | message recognizes the Intel bit stream and synchronizes with it. I
> | didn't capitalize "bad", they did.
>
> Has any project used this, given that it is a bad idea?

By "this" I assume you mean exploited the vulnerability of the Intel RNG.
No, it has not been used. I can only assume the code breakers are waiting
for some ginormous inter-bank transfer and are then going to make a
fantastic profit on the outcome of that one episode. That's what I would
do.

:-(

It's kind of like a perpetual; motion machine, once you have proof of
concept, anyone can do it.


Malcolm McLean

unread,
Sep 25, 2014, 8:17:58 AM9/25/14
to
Professional is used in three ways.

Properly it means membership of a professional body, which has a charter or similar recognition
from government, which is controlled by its members, and which has the power to regulate the
activity for the protection of the public and the maintenance of standards. Which means that only
a full member can describe himself as a "pharmacist" or "solicitor" or whatever, and can be struck
off for serious misconduct. Membership is usually by examination, administered by the body.

It can be used as an alternative to "amateur". Until recently, this was reserved for activities which
were attractive to amateurs, like acting or football. The professional derives his living from
the activity which others do for pleasure. But recently it's been used of professional cleaners
the like.

The third use is to denote an attitude to the activity. A "professional" is one who takes personal
responsibility for maintaining high standards. It's not quite synonymous with "middle class",
but is nearly. Even things like video games joysticks are sometimes sold as "professional".

Morris Dovey

unread,
Sep 25, 2014, 11:21:39 AM9/25/14
to
Okay. Thank you. Will do.

--
Morris Dovey
http://www.iedu.com/Solar/
(usenet is a *very* strange place)

glen herrmannsfeldt

unread,
Sep 25, 2014, 12:24:27 PM9/25/14
to
There is discussion in another group about the meaning of
the term "prosumer".

In some cases equipment, such as audio or photographic, is
classed as professional or consumer grade, and not usually
amateur grade. So, prosumer grade is somewhere in between in
features and reliability.

High-end amateurs in many fields like to buy and use professional
grade equipment. Professionals sometimes would rather have buy
and throw away lower quality equipment, than to worry about
the cost and upkeep of higher quality.

So, the prosumer.

What do you call a professional in a field, when doing the same
thing for hobby or amateur reasons?

-- glen

Malcolm McLean

unread,
Sep 25, 2014, 12:28:20 PM9/25/14
to
On Thursday, September 25, 2014 5:24:27 PM UTC+1, glen herrmannsfeldt wrote:
>
> What do you call a professional in a field, when doing the same
> thing for hobby or amateur reasons?
>
In computer programming, we use the term "hobby project".

For example I work as a programmer, but Baby X is a hobby project. It's not part of any paid job, and I don't have any plans to make it into a living.

Osmium

unread,
Sep 25, 2014, 12:41:57 PM9/25/14
to
"glen herrmannsfeldt" wrote:

<g...@ugcs.caltech.edu> wrote in message
news:m01fjc$urh$1...@speranza.aioe.org...
Metro guy.


Richard Heathfield

unread,
Sep 25, 2014, 12:43:34 PM9/25/14
to
That's actually a very good summary of "professional".

It should, therefore, be clearer now how a professional (sense 2) can be
non- (or, if you prefer, un-) professional (sense 3) without contradiction.

Tim Rentsch

unread,
Sep 26, 2014, 2:43:42 PM9/26/14
to
James Dow Allen <gm...@jamesdowallen.nospam> writes:

> Tim Rentsch <t...@alumni.caltech.edu> might have writ, in
> news:kfn38bi...@x-alumni2.alumni.caltech.edu:
>
>> James Dow Allen <jdall...@yahoo.com> writes:
>>
>>> I've placed some code online
>>> http://fabpedigree.com/james/j_random.c
>>> which addresses a variety of issues related to PNRG's.
>>> (You may also want j_random.h and j_rdemo.c from
>>> the same directory.) Comments welcome.
>>
>> sigh...
>>
>> It's a good illustration of Hoare's aphorism.

Let me preface any further comments by saying the sigh was only
an expression of an internal non-analytic response. It wasn't
meant to be nasty, or even critical, just how I felt trying to
write a reply, and I'm sorry that how it came across was
something other than that.

> I wonder what you think you mean. The PNRG's in that file, which
> do appear very complicated, are taken directly from some of the
> best available.

I didn't look closely at the ma_rand() and me_rand() function
bodies. I assumed they are faithful implementations of some
well-known RNG's. It would be nice to know which ones, ie, it
would be nice if the code gave references or even just the names
in a comment, but I didn't consider the internals of these
functions to be of any concern. Going back and looking at them
now, I have mostly the same reaction, but I wonder what parts are
taken directly from the originals and what parts were added (by
you or someone else) that were not included in the orginal
algorithms. But that's about documentation, not about the code
per se.

> If you're referring to rand_index() with its complicated-looking
> ir_range/ir_vari, then it's obvious you fail to grasp what problem
> the mechanism solves -- despite that I mentioned it upthread.

I believe I understand the problem exactly. A few months ago I
posted some code in c.l.c for a simplified version of the same
concern. (The upthread description was given in the post I was
responding to - I wouldn't have left that part out of what was
quoted if my response were specifically about that area.)

> The special handling is useful only when one is using a very
> expensive RNG, but it's a FUN problem and,

I agree, it is a fun problem.

> of course its complexity (assuming that's what your "sigh" was
> about) is irrelevant, since the code is done.

In my perception this isn't a hard problem. It's a tricky
problem, but not really a hard problem.

The code though I think is more complicated than what the problem
calls for. That condition is always relevant IMO, even for code
that is already written. The result may not merit revising the
existing code, but it's always worth noting.


> So:
>
> (1) What does your nasty little sigh refer to?

A combination of disappointment and feeling unable to respond in
a constructive way. I was looking forward to seeing how you
solved the problem, especially knowing you are a smart guy. It
was frustrating trying to understand how that was done and not
being able to (at least not without putting in a lot more effort
than it seems like should have been necessary). Then I wanted
to write some constructive comments in response, but found the
prospect overwhelming: to figure out what to say (let alone how
to say it) looked like it would take just an enormous amount of
time. The result of all that came out as just a sigh.


> (2) Assuming it's rand_index(), try to grasp the purpose for the
> complexity; then present your own improved solution.

I will give code below.


> (3) Feel free to criticise the rest of that code in that module.
> For example, present your improved code for the setup for Walker's
> algorithm.

I didn't really look at this part of the code before. Looking at
it now (and making an educated guess about what it's supposed to
do, which wasn't obvious to me before), it would take me a little
while to write out a good solution (ie, one I had confidence in
that it works, and never screws up). I can say, I probably would
change the approach slightly, in two ways. One, even if the
input likelihoods are given in floating point, I would want to do
the calculations using integers, both to be sure the results are
repeatable and that the "filling" algorithm works correctly. (In
situations like this using floating point makes me nervous.)
Two, when making a choice (ie, in rand_walker()), it would be
nice to make a choice without using floating point, but instead
making a sequence of binary decisions, stopping as soon as the
outcome is determined. This approach uses somewhat mroe entropy
(I believe two bits on the average, versus one or less (yes?)
with how rand_walker()/rand_decide() are written now), but is
certainly going to be repeatable, which the floating point
calculation may not be. Alternatively, maybe the same method
that rand_decide() uses now could be used, except with scaled
integers rather than floating point so the results will be
repeatable. (Obviously I need to give more thought to this
before proposing any concrete alternatives, which is kinda where
I started in talking about this.)

Switching to more general comments...

My high order bit is that the code is opaque. I don't mean
it's incomprehensible, which is something very different. For
example, the ma_rand() is incomprehensible, by which I mean how
it works or what it's supposed to be doing are not evident.
However this is primarily a problem of documentation, not
code quality or organization.

Code that is opaque has a different quality. It's hard to
understand just how everything fits together. I feel like I
have to understand everything all at once before I understand
anything; there isn't good high-level organization. When
reading code, normally I take little bits and understand them a
piece at a time. Reading this code I am unable to do that -
there is too much inter-connection and inter-dependency. The
code seems to have problems, but the problems aren't obvious,
and it takes a lot of effort to figure out whether it works.
For example, I think there is a dependence on unspecifed
behavior, specifically order of evaluation; depending on which
compiler is used, or what optimization settings, there may be
different results out of, eg, rand32(). But it's hard to be
sure of that, because there isn't much layering, and the
cross-function linkages make it hard to reason about how things
work. I can't say the code is wrong, but /for sure/ I can say
it's hard to tell if it's right. IMO that derives mostly from
there being a lot of cross-function dependencies and not enough
structure in how the code is organized.

I'm sorry if the above seems critical but not helpful. It's
partly because I was having so much trouble expressing my
reaction in a constructive way that I wrote what I did before.
A lot of effort went into composing this response, and for now
it's the best I can do.

Returning to the earlier topic of rand_index(), two items.
First I would write rand_bits() in terms of rand_index() rather
than the other way around, as for example (and please excuse
minor changes in function or type names)

U32
uniform_bits_of_width( unsigned w ){
return w > 31 ? uniform_32() : uniform_32_below( 1uL << w );
}

This change simplifies the implementation of this function, and
allows the other function to be self-contained.

For the entropy-preserving "choose a number less than k", I
might write it this way, at least to illustrate an approach:


U32
uniform_32_below( U32 k ){
return k < 2 ? 0 : uu32( k, - (-k%k + 1) );
}

U32
uu32( U32 k, U32 z ){
static U64 W = 1, X = 0; // invariant: X < W
U32 u, r, m, y;
return k > W
? u = uniform_32(), u > z
? y = z+1, W *= -y, X = X*-y + (u-y), uu32( k, z )
: (m = 1 + z/k, W *= m, X = X*m + u/k, u%k)
: X >= (y = W - W%k)
? W -= y, X -= y, uu32( k, z )
: (r = X % k, W /= k, X /= k, r)
;
}

Obviously the style is rather terse, and some comments are needed
to make what's going on comprehensible (mainly the roles of W, X,
y, and z). But the organization is not complicated. There are
four cases, corresponding to whether the entropy already stored
suffices or a new random number needs to be generated, and in
each case whether the value being used is in the biased range or
the unbiased range. The two sub-cases where the value is in the
biased range update W and X and go again via a recursive call.

By contrast, to see what rand_index is doing, more than 100 lines
of code, including four function definitions, need examining. I
can't help feeling that the resulting complications are more than
the problem warrants.

Morris Dovey

unread,
Sep 27, 2014, 2:52:25 AM9/27/14
to
I went wandering to find a source for initial seeding for a PRNG and
ended up downloading pi and e to a million digits. One of the first
things I did was to take an inventory of the digits in each of the files
because I was curious about digit distribution. (The file for e had some
extra digits.)

In case anyone might be interested:

* 0 1 2 3 4 5 6 7 8 9
pi 99959 99758 100026 100230 100230 100359 99548 99800 99985 100106
e 99425 100136 99848 100231 100390 100089 100481 99913 99816 99692

James Dow Allen

unread,
Sep 27, 2014, 3:05:55 AM9/27/14
to
Tim Rentsch <t...@alumni.caltech.edu> might have writ, in
news:kfnsije...@x-alumni2.alumni.caltech.edu:

> James Dow Allen <gm...@jamesdowallen.nospam> writes:
>
>> Tim Rentsch <t...@alumni.caltech.edu> might have writ, in
>> news:kfn38bi...@x-alumni2.alumni.caltech.edu:
>>
>>> James Dow Allen <jdall...@yahoo.com> writes:
>>>
>>>> I've placed some code online
>>>> http://fabpedigree.com/james/j_random.c
>>>> which addresses a variety of issues related to PNRG's.
>>>> (You may also want j_random.h and j_rdemo.c from
>>>> the same directory.) Comments welcome.
>>>
>>> sigh...
>>>
>>> It's a good illustration of Hoare's aphorism.

Thanks for your detailed reply.
Let me respond to you in the same spirit.


> I didn't look closely at the ma_rand() and me_rand() function
> bodies.

The "me" and "ma" are mnemonics. :-)

me_rand() implements a very standard ME-rsenne Twister with minor
coding changes but no functional difference introduced by me.
ma_rand() similarly implements MA-rsaglia's final(?) masterpiece.
These are overkill for many functions but
(a) As seen I do have devices to reduce calls to the PRNG's,
sometimes consuming just 1 or 2 bits where ordinary code
would consume 32.
(b) It seemed good to have these functions available in a convenient place.
(c) A fast_random() is present in the module for those who prefer a
faster basic PRNG.

I agree there should be more comments! I think I had such a version,
decided it had too many, so erred the opposite way. As excuse, let me
offer that if I got interest in the code via e-mail, I'd invest more work
documenting etc. If I had strong interest in promulgating the code
I'd use sourceforge or such, but I did it mainly for me. No one at c.l.c
has shown interest in using the code, with the possible exception of the OP
of this thread. You're the only one who's taken a stab at improving it.
We'll see in a moment what your improvements consist of.

>> If you're referring to rand_index() with its complicated-looking
>> ir_range/ir_vari, then it's obvious you fail to grasp what problem
>> the mechanism solves -- despite that I mentioned it upthread.
>
> I believe I understand the problem exactly. A few months ago I
> posted some code in c.l.c for a simplified version of the same
> concern.

Did that code, whether "simplified" or not, work correctly?
The code you posted here did NOT.

>> The special handling is useful only when one is using a very
>> expensive RNG, but it's a FUN problem and,
>
> I agree, it is a fun problem.

> My high order bit is that the code is opaque.

The code follows my usual self-imposed convention that functions
are defined before they're referenced. In other words, I don't really need
to include the header with its function declarations; the module progresses
nicely from low-level to high-level. (This was explicit before I stripped
most of the comments. ::whack:: )

I strongly disagree that rand_bits should call rand_index. The former
is lower-level. The latter calls it for a reason.

Once you grok rand_bits(k) -- hardly a problem -- the entirety
of rand_index and its single subroutine prime_ir() (which "primes"
the "ir" mechanism) comprises 47 non-blank lines. (And please note
that prime_ir() is called from a completely different routine,
rand_decide(), which also needs a minimum number of random bits to work
with but doesn't want to waste.


> ... This change simplifies the implementation of this function, and
> allows the other function to be self-contained.
>
> For the entropy-preserving "choose a number less than k", I
> might write it this way, at least to illustrate an approach:
>
>
> U32
> uniform_32_below( U32 k ){
> return k < 2 ? 0 : uu32( k, - (-k%k + 1) );
> }
>
> U32
> uu32( U32 k, U32 z ){
> static U64 W = 1, X = 0; // invariant: X < W
> U32 u, r, m, y;
> return k > W
> ? u = uniform_32(), u > z
> ? y = z+1, W *= -y, X = X*-y + (u-y), uu32( k, z )
> : (m = 1 + z/k, W *= m, X = X*m + u/k, u%k)
> : X >= (y = W - W%k)
> ? W -= y, X -= y, uu32( k, z )
> : (r = X % k, W /= k, X /= k, r)
> ;
> }

When I change this into the sort of code I write, where
return A ? B, C, D : E ? F, G : H;
becomes
if (A) {
B;
C;
return D;
} else if (E) {
F;
return G;
} else
return H;
Your code occupies 39 blank lines compared with my 47. I'd still want to
cringe in shame at my excess verbosity ... except that my code is longer
BECAUSE YOURS OMITS IMPORTANT FUNCTIONALITY.

And ... I'm much more tolerant than the average C programmer of constructs
like
return A ? B, C, D : E ? F, G : H;
but your usage here makes me wonder. Is this how you usually write? Do
you think it's easier to grok than the alternative?

Finally, but MOST IMPORTANTLY, your code does NOT work.

Try
uniform_32_below(2);
It should return 1 about 50% of the time, no? Instead it returns
1 about 66% of the time. The problem occurs with every argument I tried
but was most visible with k=2 and k=4.

I'd try to debug it for you, but your code is hard to read. :-)
Anyway, I've grokked it enough to suspect you're basic approach is wrong.

> By contrast, to see what rand_index is doing, more than 100 lines
> of code, including four function definitions, need examining. I
> can't help feeling that the resulting complications are more than
> the problem warrants.

47 lines, 2 function definitions. I find it LESS complicated than yours
and, more importantly, functionally correct. :-)

James Dow Allen

Richard Heathfield

unread,
Sep 27, 2014, 3:24:56 AM9/27/14
to
James Dow Allen wrote:

> The "me" and "ma" are mnemonics. :-)
>
> me_rand() implements a very standard ME-rsenne Twister with minor
> coding changes but no functional difference introduced by me.
> ma_rand() similarly implements MA-rsaglia's final(?) masterpiece.

James, I have a certain sympathy for the spirit of your reply, so I'm
somewhat reluctant to pick nits. Nevertheless, nits must be picked. After
all, who wants nits?

So here we go. If me_ and ma_ are mnemonics, they're not very good ones.
There are a couple of mnemonics that are far superior:

mersenne_twister_rand and marsaglia_rand

In gvim, this is very fast to type: mers^P or mars^P respectively will do it
every time except the first. And what you end up with is readable code.

To me, the name me_rand conjures up an image of a PRNG I wrote myself and
named at 4:30am, which is never a good idea. My mother never wrote C
programs, so I don't know where ma_rand comes from, but it's nowhere good.

We don't talk enough here about good name choices. They can make a huge
amount of difference to the readability of code. Far too much C code I see
nowadays is written with the express intent, or so it seems, of being
difficult to read, as if the author wanted his code only to be understood by
some kind of exclusive club.

James Dow Allen

unread,
Sep 27, 2014, 3:33:16 AM9/27/14
to
Richard Heathfield <inv...@see.sig.invalid> might have writ, in
news:2DtVv.464141$PG2.3...@fx12.am4:

> So here we go. If me_ and ma_ are mnemonics, they're not very good
> ones. There are a couple of mnemonics that are far superior:
>
> mersenne_twister_rand and marsaglia_rand

Valid point. Perhaps you'd also like to compare Tim's and James'
implementations of rand_index() for readability, and ... help Tim find his
biug.

James

Richard Heathfield

unread,
Sep 27, 2014, 4:09:24 AM9/27/14
to
His first bug is impenetrability. He needs to fix that bug *first*. Once he
has done so, others can help him to find the other bugs.

Of course, impenetrability may be his intent. But it's still a bug.

Malcolm McLean

unread,
Sep 27, 2014, 4:16:46 AM9/27/14
to
On Saturday, September 27, 2014 8:24:56 AM UTC+1, Richard Heathfield wrote:
> James Dow Allen wrote:
>
> > The "me" and "ma" are mnemonics. :-)
>
> > me_rand() implements a very standard ME-rsenne Twister with minor
> > coding changes but no functional difference introduced by me.
> > ma_rand() similarly implements MA-rsaglia's final(?) masterpiece.
>
>
> So here we go. If me_ and ma_ are mnemonics, they're not very good ones.
> There are a couple of mnemonics that are far superior:
>
> mersenne_twister_rand and marsaglia_rand
>
They're statics. So it doesn't matter to much.
But what you do need is a reference of sorts to whoever developed
the algorithm. In this day and age, anyone can type it into Wikipedia
and get the technical explanation.
>
> We don't talk enough here about good name choices. They can make a huge
> amount of difference to the readability of code. Far too much C code I see
> nowadays is written with the express intent, or so it seems, of being
> difficult to read, as if the author wanted his code only to be understood by
> some kind of exclusive club.
>
It's got to be short, meaningful, and unlikely to clash.
So three requirements pulling in different directions.

Baby X reserves bbx or BBX as a prefix. But there's only a limited
number of combinations of three letters.


Richard Heathfield

unread,
Sep 27, 2014, 4:32:52 AM9/27/14
to
Malcolm McLean wrote:

> On Saturday, September 27, 2014 8:24:56 AM UTC+1, Richard Heathfield
> wrote:
>> James Dow Allen wrote:
>>
>> > The "me" and "ma" are mnemonics. :-)
>>
>> > me_rand() implements a very standard ME-rsenne Twister with minor
>> > coding changes but no functional difference introduced by me.
>> > ma_rand() similarly implements MA-rsaglia's final(?) masterpiece.
>>
>>
>> So here we go. If me_ and ma_ are mnemonics, they're not very good ones.
>> There are a couple of mnemonics that are far superior:
>>
>> mersenne_twister_rand and marsaglia_rand
>>
> They're statics. So it doesn't matter to much.

So it doesn't hurt to give them descriptive names, then, does it?

> But what you do need is a reference of sorts to whoever developed
> the algorithm.

I propose that the reference for the MT function be incorporated into its
name. For example: mersenne_twister_rand. As for Marsaglia's function, I
propose instead that the reference (in a cunning twist) be incorporated into
its very name! For example: marsaglia_rand.

> In this day and age, anyone can type it into Wikipedia
> and get the technical explanation.

Which may even be right. Unless little Jimmy's been at the Edit screen
again, of course, in which case it may be subtly wrong.

Wikipedia has been characterised as on a par (for accuracy) with printed
encyclopaediae (at the time of printing, obviously). I'm not convinced. I
don't find myself wincing when I flick through Encyclopaedia Britannica.

<choosing a name>

> It's got to be short,

Why?

> meaningful,

This is far more important.

> and unlikely to clash.

Not such a big deal. A library prefix will normally sort this out.

> So three requirements pulling in different directions.

*One* requirement. The name must be meaningful.

> Baby X reserves bbx or BBX as a prefix. But there's only a limited
> number of combinations of three letters.

Worse still, you can't actually reserve bbx or BBX, no matter how much you'd
like to. Anyone else is free to use that combination themselves. The best
you can hope for is good manners.

You could, of course, go to four letters.

Malcolm McLean

unread,
Sep 27, 2014, 7:15:08 AM9/27/14
to
On Saturday, September 27, 2014 9:32:52 AM UTC+1, Richard Heathfield wrote:
> Malcolm McLean wrote:
>
> > But what you do need is a reference of sorts to whoever developed
> > the algorithm.
>
> I propose that the reference for the MT function be incorporated into its
> name. For example: mersenne_twister_rand. As for Marsaglia's function, I
> propose instead that the reference (in a cunning twist) be incorporated into
> its very name! For example: marsaglia_rand.
>
From the wiki entry for Marsaglia: (eliminating obvious other uses
of the same name).

"He is also known for developing some of the most commonly used
methods for generating random numbers and using them to produce
random samples from various distributions. Some of the most widely
used being the multiply-with-carry, subtract-with-borrow, Xorshift,
KISS and Mother methods for random numbers, and the ziggurat algorithm
for generating normally or other unimodally distributed random variables".
>
This has the ring of truth.

But he developed several algorithms. Since he sadly passed away a couple
of years ago, he won't be developing any more, but a living author can
easily publish a refinement to a method. So whilst it's not useless,
you need a little more that just the name.

rand() is one of those functions like sine and exponent that gets
incorporated a lot into expressions. So it's more important that
it be short. A function call which would normally be essentially
a line of source by itself doesn't need a very short identifier.

Tim Rentsch

unread,
Sep 27, 2014, 7:16:28 AM9/27/14
to
Richard Heathfield <inv...@see.sig.invalid> writes:

> James Dow Allen wrote:
>
>> Richard Heathfield <inv...@see.sig.invalid> might have writ, in
>> news:2DtVv.464141$PG2.3...@fx12.am4:
>>
>>> So here we go. If me_ and ma_ are mnemonics, they're not very
>>> good ones. There are a couple of mnemonics that are far
>>> superior:
>>>
>>> mersenne_twister_rand and marsaglia_rand
>>
>> Valid point. Perhaps you'd also like to compare Tim's and James'
>> implementations of rand_index() for readability, and ... help Tim
>> find his biug.
>
> His first bug is impenetrability. He needs to fix that bug
> *first*. Once he has done so, others can help him to find the
> other bugs.

I am wondering how much of this reaction is to style and how
much is to something more substantial. Would you mind if I
asked you to take a stab at refactoring my function (or both
if you prefer), keeping the same functional behavior, but
written in a way that you find less impenetrable? That
would help me better understand your objection.

Note btw that I already acknowledged that the code needs
explanatory comments for some things, prominently what some
of the variables are used for. So there may be a related
question here, which is do you have a sense of how much of
the problem could be addressed by commentary and how much
needs different code? I realize this is a pretty vague
question, but any information you can provide would help
me better understand what you're reacting to.

Ben Bacarisse

unread,
Sep 27, 2014, 7:34:34 AM9/27/14
to
Richard Heathfield <inv...@see.sig.invalid> writes:
<snip>
> Wikipedia has been characterised as on a par (for accuracy) with printed
> encyclopaediae (at the time of printing, obviously). I'm not convinced. I
> don't find myself wincing when I flick through Encyclopaedia
> Britannica.

What does Encyclopaedia Britannica have to say about the Mersenne
twister PRNG and any of Marsaglia's PRNGs? Is a lack of information an
error you'd count when determining accuracy?

<snip>
--
Ben.

Richard Heathfield

unread,
Sep 27, 2014, 8:05:52 AM9/27/14
to
Malcolm McLean wrote:

<snip>

> But [Marsaglia] developed several algorithms. Since he sadly passed
> away a couple of years ago, he won't be developing any more, but a
> living author can easily publish a refinement to a method. So whilst
> it's not useless, you need a little more that just the name.

Fair enough. In fact, he may have published such a refinement during his
lifetime. If it's library code (as opposed to static, say), the library docs
will make clear which of Marsaglia's PRNGs is being used. (If it's static, a
code comment will be sufficient.)

If several (two or more) of Marsaglia's PRNGs are implemented, you need an
even longer name. It is certainly true that ma_rand() just won't cut it, for
the very reason you gave.

> rand() is one of those functions like sine and exponent that gets
> incorporated a lot into expressions. So it's more important that
> it be short.

Why? (Second time of asking.) The fact that a name is much-used is not a
sufficient reason for making it short. In fact, it could be argued that it's
a reason for making it long. I understand why sin(), cos(), and tan() have
been chosen, rather than sine(), cosine(), and tangent() - this is in line
with standard mathematical usage. But there is no good reason for, say,
shortening create() to creat(), a fact that kt pointed out himself.

I am not arguing that function names should be excessively long, by the way.
I am just arguing that they should not be excessively short.

> A function call which would normally be essentially
> a line of source by itself doesn't need a very short identifier.

Nor do I think a function call which is used within expressions needs a very
short identifier, and you have not yet explained why you think otherwise.

It is my view that a function name should, on the whole, reflect its purpose
in a way that is reasonably clear and readable. Excessively short names do
not achieve this objective. Whole words are always a good idea, because
people type and parse whole words more easily than they type and parse
cntrctns.

On the subject of typing, the keyboard has come to be so important to our
daily lives that I am continually surprised at how few people can touch-
type. It is a huge time-saver, so I would have thought that bright, busy
people would see the advantage in taking a touch-typing course. If you are
locked in a room with an electric typewriter (computers are too distracting
for this, unless they are heavily restricted), a goodly supply of paper, and
an old battle-axe of a supervisor with a glare like a laser and a voice like
an angry wasp (to keep you focussed), learning to touch-type shouldn't take
more than three weeks and probably will take less, and it will pay dividends
for the rest of your career.

It may be because I can touch-type (and spell!) that I tend to use longer
names in identifiers. I find them quicker to type than shorter names. For
example, I can type create() more quickly than I can type creat(), because
for create() I don't have to backspace over the second 'e'. What's more, in
vim, ^p is your friend.

By way of illustration, here are some function names from one of my home-
grown libraries:

pcl_ArrayCopyElement
pcl_ArrayCopyElementToRange
pcl_ArrayCopyRange
pcl_ArrayCreate
pcl_ArrayDeleteElement
pcl_ArrayDeleteRange
pcl_ArrayDestroy
pcl_ArrayGetCapacity
pcl_ArrayGetCount
pcl_ArrayGetMaxCapacity
pcl_ArrayGetObjectSize
pcl_ArrayInsertElement
pcl_ArrayInsertRange
pcl_ArrayIterate
pcl_ArrayOrderedInsertItem
pcl_ArrayResize
pcl_ArrayRetrieveElement
pcl_ArraySearch
pcl_ArraySetComparatorFunction
pcl_ArraySetExecutorFunction
pcl_ArraySort
pcl_ArraySwap

Personally, I feel that these names are well-chosen. What would you have
called pcl_ArrayOrderedInsertItem, for example? pcl_arordiit?

Richard Heathfield

unread,
Sep 27, 2014, 8:32:13 AM9/27/14
to
Tim Rentsch wrote:

> Richard Heathfield <inv...@see.sig.invalid> writes:
>
>> James Dow Allen wrote:
>>
<snip>

>>>
>>> Valid point. Perhaps you'd also like to compare Tim's and James'
>>> implementations of rand_index() for readability, and ... help Tim
>>> find his biug.
>>
>> His first bug is impenetrability. He needs to fix that bug
>> *first*. Once he has done so, others can help him to find the
>> other bugs.
>
> I am wondering how much of this reaction is to style and how
> much is to something more substantial. Would you mind if I
> asked you to take a stab at refactoring my function (or both
> if you prefer), keeping the same functional behavior, but
> written in a way that you find less impenetrable?

Tim, I was contributing to clc for about a decade, then I took a few years
off. I don't recall your name from first time around, but from what I've
seen of your articles in the last three months or so, I have acquired a
certain amount of respect for your knowledge of the C language and the
clarity of your thought. I wanted to make that clear first, because what I'm
about to say may seem a little provocative, but I can't think of a better
way to say it, and I wanted to soften the blow a little in advance.

Ready? Okay, here we go:

I invite you to consider the meaning of the word "impenetrable".

There. That could have been worse, couldn't it? Right, what do I mean? Let's
take a look at your code again:

U32
uniform_bits_of_width( unsigned w ){
return w > 31 ? uniform_32() : uniform_32_below( 1uL << w );
}
U32
uniform_32_below( U32 k ){
return k < 2 ? 0 : uu32( k, - (-k%k + 1) );
}

U32
uu32( U32 k, U32 z ){
static U64 W = 1, X = 0; // invariant: X < W
U32 u, r, m, y;
return k > W
? u = uniform_32(), u > z
? y = z+1, W *= -y, X = X*-y + (u-y), uu32( k, z )
: (m = 1 + z/k, W *= m, X = X*m + u/k, u%k)
: X >= (y = W - W%k)
? W -= y, X -= y, uu32( k, z )
: (r = X % k, W /= k, X /= k, r)
;
}

How does one go about reading this code? Short answer: one doesn't. It's
perfectly safe from inspection. The excessive use of one-letter names, the
complete absence of documentation as to their meaning, the vast over-use of
the comma operator, the density of each expression... This is a good IOCCC
entry, but it is not a good advertisement for C. In short, it's
impenetrable.


> That
> would help me better understand your objection.
>
> Note btw that I already acknowledged that the code needs
> explanatory comments for some things, prominently what some
> of the variables are used for. So there may be a related
> question here, which is do you have a sense of how much of
> the problem could be addressed by commentary and how much
> needs different code? I realize this is a pretty vague
> question, but any information you can provide would help
> me better understand what you're reacting to.

Perhaps the best way I can explain what I mean is to show you some of my own
code, as it is entirely possible that you may not have seen any examples of
my own style.

In keeping with the thread subject, I have chosen some PRNG code. I am not
making any great claims for this code. It may be the worst PRNG you've seen
since int rand() { return ++seed % RAND_MAX; } or it may have bugs falling
out of its bugs. That's irrelevant. Don't worry about that. Just try to
figure out what the code *does*. At the end, there's a test question for
you.

#include "pcl.h"
#include <stdlib.h>
#include <time.h>

pcl_lcprng *pcl_LcprngCreate(unsigned long seed)
{
pcl_lcprng *new = pcl_Malloc(sizeof *new);
if(new != NULL)
{
new->p1 = 1103515245UL;
new->p2 = 12345UL;
new->next = seed;
pcl_LcprngRandomise(new); /* warm it up */
}
return new;
}

void pcl_LcprngDestroy(pcl_lcprng **lc)
{
if(lc != NULL)
{
pcl_Free(*lc);
*lc = NULL;
}

return;
}

double pcl_LcprngRandomise(pcl_lcprng *lc)
{
double r = 0.0;

if(lc != NULL)
{
r = (lc->next >> 8) / 16777217.0;
lc->next *= lc->p1;
lc->next &= 0xFFFFFFFF;
lc->next += lc->p2;
lc->next &= 0xFFFFFFFF;
}

return r;
}

unsigned long pcl_LcprngGetRandInRange(pcl_lcprng *lc,
unsigned long low,
unsigned long high)
{
double r = 0.0;
unsigned long result = 0;

if(lc != NULL)
{
if(low <= high)
{
r = pcl_LcprngRandomise(lc);
result = r * (high + 1 - low) + low;
}
}
return result;
}

void pcl_LcprngShuffle(pcl_array *arr,
pcl_lcprng *lc)
{
size_t i = 0;
unsigned long r = 0;
size_t count = arr->count;

for(i = 0; i < count; i++)
{
r = pcl_LcprngGetRandInRange(lc, i, count - 1);
if(i != r)
{
pcl_ArraySwap(arr, i, r);
}
}

return;
}

void pcl_LcprngSetSeed(pcl_lcprng *r, unsigned long seed)
{
r->next = seed;
pcl_LcprngRandomise(r); /* warm it up */
return;
}

/* Choose and return a random seed based on the current seed
setting and the current time. Based on code posted to Usenet's
comp.lang.c group by Lawrence Kirby.
*/
void pcl_LcprngRandomiseTimer(pcl_lcprng *r)
{
time_t timeval = {0};
unsigned char *ptr = (unsigned char *)&timeval;
size_t i = 0;

timeval = time(NULL);

while(i < sizeof timeval)
{
r->next *= UCHAR_MAX + 2U;
r->next += ptr[i++];
}

/* warm it up */
for(i = 0; i < 8; i++)
{
pcl_LcprngRandomise(r);
}

return;
}

Here's the test question: if we asked every C programmer in the world to
vote on whether they'd rather maintain your code or mine, how do you think
they'd vote?

Richard Heathfield

unread,
Sep 27, 2014, 8:41:57 AM9/27/14
to
As I'm sure you are only too well aware, I was speaking more generally than
that. Whilst I can and do accept that Wikipedia articles can be very useful
as a quick heads-up on a subject, when it comes to the nitty-gritty detail I
have often found them to be erroneous, and sometimes seriously so, when
discussing subjects I happen to know something about.

A few years ago I spent some considerable time fixing some of the more
egregious errors on their C page, but the changes were undone by someone who
must have learned C from Schildt.

You want an example? Here's an example (or at least it is at the moment -
you never can tell when Wikipedia will change underneath you):

"A struct in the C programming language (and many derivatives) is a complex
data type declaration that defines a physically grouped list of variables to
be placed under one name in a block of memory, allowing the different
variables to be accessed via a single pointer, or the struct declared name
which returns the same address. The struct can contain many other complex
and simple data type in an association, so is a natural organizing type for
records like the mixed data types in lists of directory entries reading a
hard drive (file length, name, extension, physical (cylinder, disk, head
indexes) address, etc.), or other mixed record type (patient names, address,
telephone... insurance codes, balance, etc.)."

The game, just like in the old days with "C - The Complete Reference", is to
count the errors.

James Dow Allen

unread,
Sep 27, 2014, 9:05:47 AM9/27/14
to
On Saturday, September 27, 2014 2:05:55 PM UTC+7, James Dow Allen wrote:
> Tim Rentsch <t...@alumni.caltech.edu> might have writ, in

> Try
> uniform_32_below(2);
>
> It should return 1 about 50% of the time, no? Instead it returns
> 1 about 66% of the time.

I need to apologize to Tim for this. This was NOT his bug but
rather a cockpit failure by me. :-(
My only excuse is that I wasn't so rusty
at the C keyboard years ago when I wrote my code!
And, while I may not be as diligent at triple-checking
BEFORE publication as I once was, at least I did triple-check!

I did notice something of slight interest. When I count the actual
bits of PRNG consumed, I find
.......... James ........ Tim
k=127 ... 6.988736 ... 7.036160
k=128 ... 7.000000 ... 7.000000
k=129 ... 7.011264 ... 7.132160

No big deal. But the perfectionsism to approach optimality
explains the extra eight lines of my code that Tim was
concerned enough to eliminate.
(Tim preserves entropy better when k is very large,
but that case was of no interest to me.)

Sorry again.

James

Tim Rentsch

unread,
Sep 27, 2014, 9:28:03 AM9/27/14
to
James Dow Allen <gm...@jamesdowallen.nospam> writes:

> Tim Rentsch <t...@alumni.caltech.edu> might have writ, in
> news:kfnsije...@x-alumni2.alumni.caltech.edu:
>
>> James Dow Allen <gm...@jamesdowallen.nospam> writes:
>>
>>> Tim Rentsch <t...@alumni.caltech.edu> might have writ, in
>>> news:kfn38bi...@x-alumni2.alumni.caltech.edu:
>>>
>>>> James Dow Allen <jdall...@yahoo.com> writes:
>>>>
>>>>> I've placed some code online
>>>>> http://fabpedigree.com/james/j_random.c
>>>>> which addresses a variety of issues related to PNRG's.
>>>>> (You may also want j_random.h and j_rdemo.c from
>>>>> the same directory.) Comments welcome.
>>>>
>>>> sigh...
>>>>
>>>> It's a good illustration of Hoare's aphorism.
>
> Thanks for your detailed reply.
> Let me respond to you in the same spirit.

ditto, and ditto. (I may edit out a bit more.)

>> I didn't look closely at the ma_rand() and me_rand() function
>> bodies.
>
> The "me" and "ma" are mnemonics. :-) [snip elaboration]

These names didn't bother me.

> I agree there should be more comments! [snip]

Please note that I wasn't complaining about there not being
enough comments. I did want to know what RNG's are being
used, but everythiing else I said was independent of that.

>>> If you're referring to rand_index() with its complicated-looking
>>> ir_range/ir_vari, then it's obvious you fail to grasp what problem
>>> the mechanism solves -- despite that I mentioned it upthread.
>>
>> I believe I understand the problem exactly. A few months ago I
>> posted some code in c.l.c for a simplified version of the same
>> concern.
>
> Did that code, whether "simplified" or not, work correctly?

I believe so, yes.

> The code you posted here did NOT.

I will return to this question further on.

>>> The special handling is useful only when one is using a very
>>> expensive RNG, but it's a FUN problem and,
>>
>> I agree, it is a fun problem.
>
>> My high order bit is that the code is opaque.
>
> The code follows my usual self-imposed convention that functions
> are defined before they're referenced. In other words, I don't
> really need to include the header with its function declarations;
> the module progresses nicely from low-level to high-level. [snip]

IMO this is bad practice. I find reading code that is given
in bottom-up order to be much harder than reading code that
is given in top-down order. I tried to ignore that aspect
when looking over your code, but I may have been affected by
it despite my efforts not to be.

> I strongly disagree that rand_bits should call rand_index. The
> former is lower-level. The latter calls it for a reason.

Can you say something about what the reason is? It seems like
needless duplication of effort, without any significant benefit.

> Once you grok rand_bits(k) -- hardly a problem -- the entirety of
> rand_index and its single subroutine prime_ir() (which "primes"
> the "ir" mechanism) comprises 47 non-blank lines.

Right, plus one for the global variables declaration line.

> (And please note that prime_ir() is called from a completely
> different routine, rand_decide(), which also needs a minimum
> number of random bits to work with but doesn't want to waste.

I noticed that at some point, not sure when. To my way of
thinking that's another indication that the design is not yet
sufficiently factored. Not a certainty, but an indication.
Of course you meant non-blank lines there. I don't see how you
get 39 non-blank lines. Following the scheme you outline, I get
only 37 (or maybe 38 - not sure what the rule is for when
unnecessary braces may be omitted). More important though this
description glosses over an important consideration - namely,
that six or seven of those lines are the outside function whose
only purpose in life is to handle a trivial boundary case, or
call the inner function where the real work is done. Except for
computing the value for z (which is just one expression in the
caller), these functions don't interact in any significant way;
ie, they are not "coupled" in their use of global data. The same
does not hold with prime_ir() and rand_index().

> I'd still want to cringe in shame at my excess verbosity ...
> except that my code is longer BECAUSE YOURS OMITS IMPORTANT
> FUNCTIONALITY.

Are you talking about functionality of rand_index(), or something
else? If only about rand_index(), what functionality are you
talking about? My intention was only to provide a different
implementation for a rand_index()-like function, not to mimic all
of its behavior in terms of how it couples/interacts with other
functions.

> And ... I'm much more tolerant than the average C programmer of
> constructs like
> return A ? B, C, D : E ? F, G : H;
> but your usage here makes me wonder. Is this how you usually
> write? Do you think it's easier to grok than the alternative?

When reading and understanding code, I try not to pay attention
to surface style issues. I'd rather not get involved now in a
discussion (or debate) about such issues, because I think it
would tend to obscure other more important aspects. If there is
a straightforward local transformation you can make (such as the
one you described earlier) that will remedy problems with my
style being unfamiliar, please make them (and I am happy to read
transformed definitions if they are posted). My comments and
interests here are deeper than surface style issues.

Having said that, for the remainder of the discussion I will
make an effort to write in a style closer to one likely to
be more familiar.

> Finally, but MOST IMPORTANTLY, your code does NOT work.
>
> Try
> uniform_32_below(2);

> It should return 1 about 50% of the time, no? Instead it returns
> 1 about 66% of the time. The problem occurs with every argument I
> tried but was most visible with k=2 and k=4.

I believe your testing went awry somewhere. Running my own
tests, in every case I tried the results were always average
values up to about half the number of digits, which I think is
what we would expect statistically.

> I'd try to debug it for you, but your code is hard to read. :-)

Didn't you re-write it to see how many lines it would be in your
usual statement-by-statement style? Did you try to debug that
version? Or do you find that version equally hard to read?

> Anyway, I've grokked it enough to suspect you're basic approach is
> wrong.

This seems a curious statement. You don't understand the code
well enough to debug it, but you do understand it enough to judge
the basic approach? Does this mean you can say what the basic
approach is?

>> By contrast, to see what rand_index is doing, more than 100 lines
>> of code, including four function definitions, need examining. I
>> can't help feeling that the resulting complications are more than
>> the problem warrants.
>
> 47 lines, 2 function definitions. I find it LESS complicated than
> yours

Do you mean less complicated, or easier to understand? I would
expect you would find it easier to understand; after all, you
wrote it. On the other hand, there certainly are objective
metrics under which the code in j_random.c is more complicated,
with data coupling being one example.

> and, more importantly, functionally correct. :-)

I don't yet see any evidence that the code I posted is not
correct. What can you say that might convince me otherwise?
My own testing does not exhibit anything like the behavior you
described.

(At this point I'm sure the TDD folks are having a good
laugh..... :)

James Dow Allen

unread,
Sep 27, 2014, 10:11:28 AM9/27/14
to
On Saturday, September 27, 2014 8:28:03 PM UTC+7, Tim Rentsch wrote:

I posted a retraction to the bug claim which you apparently
missed cross-posting. I'll still add some comments.

> Can you say something about what the reason is? It seems like
> needless duplication of effort, without any significant benefit.

The rand_bits() is too trivial to worry about "duplication."
But, yes, prime_ir() does call it for excellent reasons.



> > the "ir" mechanism) comprises 47 non-blank lines.
> Right, plus one for the global variables declaration line.

No. It was 46 until I added that line.
But are we going to quibble about exact line counts?

> > (And please note that prime_ir() is called from a completely
> > different routine, rand_decide(), which also needs a minimum
> > number of random bits to work with but doesn't want to waste.
>
> To my way of
> thinking that's another indication that the design is not yet
> sufficiently factored.

Just the opposite is true! the ir/prime_ir() mechanism
is one of general utility. You bury it as statics inside
ONE of the routines that wants to use it.

> > When I change this into the sort of code I write, where
> > Your code occupies 39 blank lines compared with my 47.
>
> Of course you meant non-blank lines there. I don't see how you
> get 39 non-blank lines. Following the scheme you outline, I get
> only 37 (or maybe 38 - not sure what the rule is ...

Oh my. Is this precise line-counting necessary? Note that my
routine starts by "wasting" 4 lines on unnecessary tests. You
started by sighing about Hoare's aphorism and now worry about a
9 LOC advantage instead of just a 5 LOC advantage. Puleeeze.


> (At this point I'm sure the TDD folks are having a good
> laugh..... :)

I miss this joke. What's TDD?

James

Tim Rentsch

unread,
Sep 27, 2014, 4:14:52 PM9/27/14
to
Richard Heathfield <inv...@see.sig.invalid> writes:

> Tim Rentsch wrote:
>
>> Richard Heathfield <inv...@see.sig.invalid> writes:
>>
>>> James Dow Allen wrote:
>>>
> <snip>
>
>>>>
>>>> Valid point. Perhaps you'd also like to compare Tim's and James'
>>>> implementations of rand_index() for readability, and ... help Tim
>>>> find his biug.
>>>
>>> His first bug is impenetrability. He needs to fix that bug
>>> *first*. Once he has done so, others can help him to find the
>>> other bugs.
>>
>> I am wondering how much of this reaction is to style and how
>> much is to something more substantial. Would you mind if I
>> asked you to take a stab at refactoring my function (or both
>> if you prefer), keeping the same functional behavior, but
>> written in a way that you find less impenetrable?
>
> Tim, I was contributing to clc for about a decade, then I took a
> few years off. I don't recall your name from first time around,
> but from what I've seen of your articles in the last three months
> or so, I have acquired a certain amount of respect for your
> knowledge of the C language and the clarity of your thought. [snip]

Thank you for the compliment.

> I invite you to consider the meaning of the word "impenetrable".
>
> There. That could have been worse, couldn't it? Right, what do I
> mean? Let's take a look at your code again: [snip code and
> subsequent commentary and the long section following.]

Thank you for the commentary and the example.

In my posting I asked a particular question which I now ask again
(paraphrasing): How would you refactor the function I wrote,
keeping the same functional behavior, but expressed so that you
would find it more readable and/or understandable? Please
consider this question and try to provide the best approximation
you can; eg, it would be okay to make up new variable names
or comment some with question marks if their use seems completely
inscrutable. It would be most helpful if you could do this.
Thanks.

Richard Heathfield

unread,
Sep 27, 2014, 4:49:11 PM9/27/14
to
Tim Rentsch wrote:

> Richard Heathfield <inv...@see.sig.invalid> writes:
>
<snip>
>
>> I invite you to consider the meaning of the word "impenetrable".
>>
>> There. That could have been worse, couldn't it? Right, what do I
>> mean? Let's take a look at your code again: [snip code and
>> subsequent commentary and the long section following.]
>
> Thank you for the commentary and the example.
>
> In my posting I asked a particular question which I now ask again
> (paraphrasing): How would you refactor the function I wrote,
> keeping the same functional behavior, but expressed so that you
> would find it more readable and/or understandable?

I think you may need to think a bit harder about the meaning of the word
"impenetrable".

Keith Thompson

unread,
Sep 27, 2014, 7:09:37 PM9/27/14
to
Richard Heathfield <inv...@see.sig.invalid> writes:
[...]
> Why? (Second time of asking.) The fact that a name is much-used is not a
> sufficient reason for making it short. In fact, it could be argued that it's
> a reason for making it long. I understand why sin(), cos(), and tan() have
> been chosen, rather than sine(), cosine(), and tangent() - this is in line
> with standard mathematical usage. But there is no good reason for, say,
> shortening create() to creat(), a fact that kt pointed out himself.

Whereas there must be a very good reason to shorten Ken Thompson to kt.
8-)}

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Malcolm McLean

unread,
Sep 27, 2014, 7:22:17 PM9/27/14
to
On Saturday, September 27, 2014 9:14:52 PM UTC+1, Tim Rentsch wrote:
>
> In my posting I asked a particular question which I now ask again
> (paraphrasing): How would you refactor the function I wrote,
> keeping the same functional behavior, but expressed so that you
> would find it more readable and/or understandable?
>
Here's my shot.


/*
uniform random number on 0 to k -1;
*/
int uniform_below(int k)
{
U32 zero = 0;
U32 uk;

assert(k > 0);
assert(k <= ~zero);
if(k == 1)
return 0;
else
{
uk = (U32) k;
return (int) uunder32(uk, -(-uk%uk + 1));
}
}

/*
The gubbins of our code

k - number we want to get below
z - cofactor of k ?

Returns: random number on 0 to k-1
Keeps bits in the buffers W and X

*/
static U32 uunder32( U32 k, U32 z )
{
static U64 W = 1, X = 0; /* invariant: X < W */
U32 u; /* uniform random number on 0 - 0x7FFFFFFFF */
U32 result;
U32 m, y;

while(1)
{
if(k > W)
{
/* k is above our range, need more bits */
u = uniform_32();
if(u > z)
{
/* u over value, add bits to W and X */
y = z +1;
W *= -y;
X = X * -y + (u -y);
}
else
{
/* u OK, take bits and add remainer to W and X */
m = 1 + z/k;
W *= m;
X = X*m + u/k;
return u%k;
}
}
else
{
/* k OK, extract bits from W and X */
y = W - W%k;
if(X >= y)
{
/* take y bits from W and X */
W -= y;
X -= y;
}
else
{
result = X %k;
W /= k;
X /= k;
return result;
}
}
}
}

I've given it a new name to avoid clashing with yours when I
compile both.

I've changed the interface, you can't be exporting symbol like U32
from a random number generator, and you can't be dependent on that
symbol from outside. It can take an int and assert fail if out of
range, that loses us some high values, but since it's a wrapper
function, it's unlikely to matter.

It's not clear to me why you need two functions, but I kept the
concept of the wrapper and gubbins. In general, it's often a good
way to go.

Now I understand that the code is storing bits in internal buffers,
and just doling out the number it needs to honour the random
request. The exact mathematics I can't follow. Generally you want
to provide a link or search term to the place your algorithm was
published (presuming it's not your own invention) so anyone
who's interested can follow. You can't provide a detailed
explanation in source code comments.

There's no need to be recursive. Whilst tail recursion is frequently
optimised, it's confusing in C code. Keep recursion for inherently
splitting, not as a concise form of iteration.

Richard Heathfield

unread,
Sep 27, 2014, 7:35:15 PM9/27/14
to
Keith Thompson wrote:

> Richard Heathfield <inv...@see.sig.invalid> writes:
> [...]
>> Why? (Second time of asking.) The fact that a name is much-used is not a
>> sufficient reason for making it short. In fact, it could be argued that
>> it's a reason for making it long. I understand why sin(), cos(), and
>> tan() have been chosen, rather than sine(), cosine(), and tangent() -
>> this is in line with standard mathematical usage. But there is no good
>> reason for, say, shortening create() to creat(), a fact that kt pointed
>> out himself.
>
> Whereas there must be a very good reason to shorten Ken Thompson to kt.
> 8-)}

:-) There is indeed a very good reason. Just as sin(), cos(), and tan() are
canonical abbrevs, so is kt. For yet another, see my sig (and that's
another). Canonical abbrevs convey significant amounts of information in a
very dense fashion. It is non-canonical abbrevs that I prefer to avoid.

Identifiers should not be long for the sake of being long. They should be as
long as is appropriate for maximising their clarity (subject to other
constraints, such as broken linkers or the Standard's rather low minimum
number of guaranteed significant characters in an identifier).

Here is a counter-example to my point, taken from my own codebase:
pcl_BbstSetExecutorFunction

This is 27 characters long. The Standard only guarantees us 31 characters
for external identifiers, so this is perilously close. Were I to expand it
(as I would prefer) to pcl_BalancedBinarySearchTreeSetExecutorFunction (!),
it would come to 47 characters, which could well impact on its portability,
especially as the 31 would only give me pcl_BalancedBinarySearchTreeSet,
which would clash with what would have rejoiced in the name of
pcl_BalancedBinarySearchTreeSetComparatorFunction - something had to give,
so I abbrev'd BalancedBinarySearchTree to Bbst.

It's a trade-off. Clarity against brevity. I prefer to err on the side of
clarity. But "kt" is self-documenting. :-)

glen herrmannsfeldt

unread,
Sep 27, 2014, 9:01:12 PM9/27/14
to
Richard Heathfield <inv...@see.sig.invalid> wrote:

(snip)

> :-) There is indeed a very good reason. Just as sin(), cos(), and tan() are
> canonical abbrevs, so is kt. For yet another, see my sig (and that's
> another). Canonical abbrevs convey significant amounts of information in a
> very dense fashion. It is non-canonical abbrevs that I prefer to avoid.

I suppose you might get different answers from different people,
and many won't know sin or cos at all.

As far as I know, kt (usually kT) is the abbreviation for thermal
energy, which is Boltzmann's constant times temperature.
At room temperature, it is about 25meV (0.025eV), which is the
electrical noise due to thermal effects in circuits.

Is that the meaning you meant?

-- glen

Richard Heathfield

unread,
Sep 28, 2014, 4:02:11 AM9/28/14
to
glen herrmannsfeldt wrote:

> Richard Heathfield <inv...@see.sig.invalid> wrote:
>
> (snip)
>
>> :-) There is indeed a very good reason. Just as sin(), cos(), and tan()
>> :are
>> canonical abbrevs, so is kt. For yet another, see my sig (and that's
>> another). Canonical abbrevs convey significant amounts of information in
>> a very dense fashion. It is non-canonical abbrevs that I prefer to avoid.
>
> I suppose you might get different answers from different people,
> and many won't know sin or cos at all.

People need to go to school more often instead of doing Twitbook or Faceoff
or whatever these new-fangled contraptions are called. Then they could learn
all about sin. And cos, of course.

> As far as I know, kt (usually kT) is the abbreviation for thermal
> energy, which is Boltzmann's constant times temperature.
> At room temperature, it is about 25meV (0.025eV), which is the
> electrical noise due to thermal effects in circuits.
>
> Is that the meaning you meant?

kt = Ken Thompson
dmr = Dennis M Ritchie
bwk = Brian W Kernighan

I find it quite odd that I should have to explain these abbrevs in
comp.lang.c.

Morris Dovey

unread,
Sep 28, 2014, 4:22:12 AM9/28/14
to
On 9/28/14 3:01 AM, Richard Heathfield wrote:

> I find it quite odd that I should have to explain these abbrevs in
> comp.lang.c.

"Usenet is a strange place" - dmr 29 July 1999 :-D

Tim Rentsch

unread,
Sep 28, 2014, 4:29:37 AM9/28/14
to
James Dow Allen <jdall...@yahoo.com> writes:

> On Saturday, September 27, 2014 2:05:55 PM UTC+7, James Dow Allen wrote:
>> Tim Rentsch <t...@alumni.caltech.edu> might have writ, in
>
>> Try
>> uniform_32_below(2);
>>
>> It should return 1 about 50% of the time, no? Instead it returns
>> 1 about 66% of the time.
>
> I need to apologize to Tim for this. This was NOT his bug but
> rather a cockpit failure by me. :-(

Thank you for checking. Apparently you were posting this
while I was composing my reply to the earlier message.

> I did notice something of slight interest. When I count the
> actual bits of PRNG consumed, I find
> .......... James ........ Tim
> k=127 ... 6.988736 ... 7.036160
> k=128 ... 7.000000 ... 7.000000
> k=129 ... 7.011264 ... 7.132160

I will comment about this as part of a response to your other
posting.

> No big deal. But the perfectionsism to approach optimality
> explains the extra eight lines of my code that Tim was
> concerned enough to eliminate.

Just one remark here - my concern was not to eliminate lines of
code but to offer a simpler solution. The "eight lines of code"
phrase is somewhat misleading because there is no particular
relationship between the two solution (and it doesn't reflect
my code as it was actually written). Anyway more in the next
response.

Richard Heathfield

unread,
Sep 28, 2014, 4:54:28 AM9/28/14
to
Morris Dovey wrote:

> On 9/28/14 3:01 AM, Richard Heathfield wrote:
>
>> I find it quite odd that I should have to explain these abbrevs in
>> comp.lang.c.
>
> "Usenet is a strange place" - dmr 29 July 1999 :-D

:-) Quite so.

For the record, since someone (glenn?) mentioned that some people might not
be familiar with sine and cosine:

Consider a circle with radius 1, centred around the origin (0, 0), and a
point P = (x, y) on that circle. Draw a line from the origin to P, and drop
a perpendicular from P to (x, 0) to give you a right-angled triangle. (If y
is negative, the perpendicular goes up to the x-axis rather than down, but
that's fine.) Let theta be the angle between the x-axis and the radial line,
measuring anti-clockwise.

By definition, sine(theta) - often written sin theta - is the distance y,
and cosine(theta) - often written cos theta - is the distance x.
Furthermore, tangent(theta) - often written tan theta - is
sine(theta)/cosine(theta) for all points where cosine(theta) is not equal to
0, and is undefined where cosine(theta) is 0.

And this is where we get back on topic. Here is some C code to approximate
sine(x) using 'prec' iterations of a Taylor series, where x is measured in
radians:

#include <math.h>

double sine(double x, unsigned int prec)
{
int sign = -1;
int divmultiplier = 3;
double q = 6.0;
double s = x;
double x2 = x * x;
if(s < -TWOPI)
{
double n = floor(-s / TWOPI);
s += (n * TWOPI);
s = fmod(s, TWOPI);
}
if(s >= TWOPI)
{
s = fmod(x, TWOPI);
}
while(prec-- > 0)
{
x *= x2;
s += ((x * sign) / q);
q *= ++divmultiplier;
q *= ++divmultiplier;
sign = -sign;
}
return s;
}

The cosine function is left as an exercise.

--
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999

Udyant Wig

unread,
Sep 28, 2014, 5:09:28 AM9/28/14
to
Richard Heathfield <inv...@see.sig.invalid> writes:

| kt = Ken Thompson
| dmr = Dennis M Ritchie
| bwk = Brian W Kernighan
|
| I find it quite odd that I should have to explain these abbrevs in
| comp.lang.c.

I, for one, found the ambiguity in the 'k' in "kt". I had a good hunch
that 't' stood for "Thompson", but could not decide whether 'k' meant
"Ken" or "Keith"; the latter because I suspected I had missed some
reference upthread.

--
Udyant Wig
GitHub: https://github.com/udyant
Poetry: http://www.writing.com/main/profile/biography/frosthrone
It is loading more messages.
0 new messages