randomizing trouble

Ben

Jun 9, 2009, 4:21:23 PM

I have a gawk script that puts random comments into a file. It is run 3
times in a row in quick succession. I found that seeding the random
number generator using gawk did not work because all 3 times it was run
was done within the same second (and it uses the time) - so I decided to
use bash's random number generator to do it which seems to work, but I
still find that almost every time I run the script I find that a line
appears twice in a row and almost every time a line within the file will
be the same as a line within one of the other 2 files (of the same row).
Ideally, it should always be different and the last line should be
different (always) from the next (there are 30 comments this is working
from). I was wondering if anyone could give me some suggestions as to
either what I am doing wrong or just what can be done to get around this
problem.

Here's the awk script:
# Load in comments file and seed random number generator
BEGIN {
    cmt_file = "/home/dir/comments.txt"

    # "> 0" stops the loop on a read error (-1) as well as at end of file
    while ((getline cmt[i++] < cmt_file) > 0)
        gsub(/ /, "<SP>", cmt[i-1]);

    # seed random number generator via shell random number
    # Why? Because sted.awk is run 3 times in a row so fast that srand()
    # with no argument makes the same random numbers each time, since it
    # uses the time in whole seconds.
    # Note: gawk runs this command through /bin/sh; $RANDOM is a bash/ksh
    # feature and is empty under a plain POSIX sh such as dash.
    "echo $RANDOM" | getline random;

    srand(random);
    if (! GRADE) GRADE = 5;
}

.
.
.

# Add CONTENT line for random comments
/CONTENT=/ {
    # try taking every 4th random number to get around some lines having
    # the same comment twice in a row despite random generation.
    for (i=0; i<3; i++) rand();
    comment = cmt[int(rand() * 30)];

    match($0, /.*ctl[0-9]*_3/);
    nline = substr($0, 1, RLENGTH - 1) "6" substr($0, RLENGTH + 1);
    sub(/CONTENT=.*/, "CONTENT=" comment, nline);
    print nline;
}

Thanks.
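
A note on the repeats described above: with 30 comments drawn independently, any two consecutive draws match with probability 1/30, so a file with more than a handful of CONTENT lines will usually contain a back-to-back repeat however the generator is seeded, and discarding extra rand() values does not change that. One way around it is to redraw whenever the new index equals the previous one. What follows is only a sketch under stated assumptions: the function name pick_indices and its three arguments are made up for illustration, and it prints indices instead of editing a file.

```shell
#!/bin/sh
# pick_indices COUNT N SEED (hypothetical helper, not from the thread):
# print COUNT indices in the range 0..N-1, one per line, never repeating
# the same index twice in a row.
pick_indices() {
    awk -v count="$1" -v n="$2" -v seed="$3" '
    BEGIN {
        # With fewer than 2 choices a repeat cannot be avoided.
        if (n < 2) exit 1

        srand(seed)
        prev = -1
        for (k = 0; k < count; k++) {
            # Redraw until the index differs from the previous one.
            do {
                idx = int(rand() * n)
            } while (idx == prev)
            print idx
            prev = idx
        }
    }'
}

# Example: 10 indices into a list of 30 comments, seeded with the shell PID.
pick_indices 10 30 "$$"
```

In the script above, the same do/while loop could stand in for the single rand() call in the /CONTENT=/ rule, with prev kept across matching lines. Keeping the 3 files different from each other row by row would need state shared between the runs, which this sketch does not attempt.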

Kenny McCormack

Jun 9, 2009, 4:31:21 PM

In article <7vzXl.1624$u86...@nwrddc01.gnilink.net>, Ben <a...@efg.com> wrote:
>I have a gawk script that puts random comments into a file. It is run 3
>times in a row in quick succession. I found that seeding the random
>number generator using gawk did not work because all 3 times it was run
>was done within the same second (and it uses the time) - so I decided to

When last I ran into this problem, what I did was to save the last value
returned by rand() to a file, then on the next run, read that in and use
that value as the arg to srand(). Worked well.

Note: You may have to multiply the value by something like 32000 since
rand() returns a small decimal and srand() takes an integer. Anyway,
you get the idea...
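
The save-and-reload idea might look roughly like the sketch below. It rests on assumptions that are not in the post: the function name next_random, the state file argument, and the scale factor of 1000000000 (the post suggests something like 32000) are all illustrative, and the sketch stores a fresh scaled draw as the next seed.

```shell
#!/bin/sh
# next_random STATE_FILE (hypothetical helper): print one random number,
# seeding from the integer saved by the previous run and saving a new
# integer for the next run.
next_random() {
    awk -v state="$1" '
    BEGIN {
        # Use the seed left by the previous run, or the clock if the
        # state file is missing or empty.
        if ((getline seed < state) > 0)
            srand(seed + 0)
        else
            srand()
        close(state)

        print rand()

        # rand() is in the range 0..1, so scale a fresh draw up to an
        # integer before storing it as the next seed.
        printf "%d\n", int(rand() * 1000000000) > state
        close(state)
    }'
}
```

Called as, say, next_random /tmp/sted.seed (a made-up path) three times in a row, each run starts from a different seed even within the same second. Runs that overlap in time would race on the state file, so this only suits runs that happen one after another.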

Aharon Robbins

Jun 10, 2009, 1:11:36 AM

In article <h0mgqp$kh8$2...@news.xmission.com>,
Kenny McCormack <gaz...@shell.xmission.com> wrote:
>In article <7vzXl.1624$u86...@nwrddc01.gnilink.net>, Ben <a...@efg.com> wrote:
>>I have a gawk script that puts random comments into a file. It is run 3
>>times in a row in quick succession. I found that seeding the random
>>number generator using gawk did not work because all 3 times it was run
>>was done within the same second (and it uses the time) - so I decided to

You could do something like add PROCINFO["pid"] to the value of the time,
or use that as the seed.
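
As a rough sketch of the time-plus-PID seed: the function name seed_demo is made up, and so is the fallback for awks other than gawk, which have no PROCINFO["pid"] and here ask the child shell for $PPID instead. The clock value is read back with a second srand() call so that the sketch does not depend on gawk's systime().

```shell
#!/bin/sh
# seed_demo (hypothetical helper): print the chosen seed and one random
# number drawn after seeding with it.
seed_demo() {
    awk '
    BEGIN {
        # gawk provides its own process ID in PROCINFO["pid"]. Other awks
        # do not, so ask the child shell for its parent PID instead.
        if ("pid" in PROCINFO) {
            pid = PROCINFO["pid"]
        } else {
            cmd = "echo $PPID"
            cmd | getline pid
            close(cmd)
        }

        srand()           # seed from the clock ...
        now = srand()     # ... then read that seed back (srand returns
                          # the previous seed)
        seed = now + pid
        srand(seed)
        printf "%.0f %s\n", seed, rand()
    }'
}
```

Running seed_demo several times within one second should give a different seed each time, because every run is a new process with its own PID. In gawk alone this shrinks to srand(systime() + PROCINFO["pid"]).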

>When last I ran into this problem, what I did was to save the last value
>returned by rand() to a file, then on the next run, read that in and use
>that value as the arg to srand(). Worked well.
>
>Note: You may have to multiply the value by something like 32000 since
>rand() returns a small decimal and srand() takes an integer. Anyway,
>you get the idea...

Gawk doesn't use rand() internally, so it should be producing a larger
range of values.

Thomas Weidenfeller

Jun 10, 2009, 3:43:05 AM

Ben wrote:
> I have a gawk script that puts random comments into a file. It is run 3
> times in a row in quick succession. I found that seeding the random
> number generator using gawk did not work because all 3 times it was run
> was done within the same second (and it uses the time) - so I decided to
> use bash's random number generator to do it which seems to work, but I
> still find that almost every time I run the script I find that a line
> appears twice in a row and almost every time a line within the file will
> be the same as a line within one of the other 2 files (of the same row).
> Ideally, it should always be different and the last line should be
> different (always) from the next (there are 30 comments this is working
> from). I was wondering if anyone could give me some suggestions as to
> either what I am doing wrong or just what can be done to get around this
> problem.

Is that good enough (random enough) for your task?

BEGIN {
    # Read 4 bytes from /dev/random as one unsigned decimal integer
    cmd = "od -tu4 -N4 -A n /dev/random"
    cmd | getline
    close(cmd)
    srand(0+$0)
}

BR,

Thomas

Kenny McCormack

Jun 10, 2009, 7:49:50 AM

In article <h0nfa7$8e$2...@news.bytemine.net>,
Aharon Robbins <arn...@skeeve.com> wrote:
...

>>Note: You may have to multiply the value by something like 32000 since
>>rand() returns a small decimal and srand() takes an integer. Anyway,
>>you get the idea...
>
>Gawk doesn't use rand() internally, so it should be producing a larger
>range of values.

I wasn't implying that it was in any way limited. Rather, I was saying
that it returns a value in the 0..1 range, and that you might want to
remap that to something in the 0..(some large integer) range.

Note that "man gawk" is somewhat terse about what sort of values should
be used as the argument to srand(), but it does mention the time of day,
which implies something like a 32 bit integer.

Ben

Jun 10, 2009, 4:19:39 PM

Thanks - will try out the suggestions. Much appreciated.