TIA
Andrew
>I'm trying to get an algorithm, preferably in C, that would pick up a
>random positive integer number, say, between 1 and n, n being a positive
>integer, too, but there would be a linear probability of picking the
>smaller number first. The higher the number, the less likely it will be
>selected. So 1 would have the most, n the least chance to be picked, and
>so on for the numbers in between.
Interesting problem, and not trivial. You could start with the quite
clever Floyd's Algorithm for generating random sets explained in Jon
Bentley's More Programming Pearls. I think the key to solving this problem
is in that algorithm, but it will require modification.
>I'm trying to get an algorithm, preferably in C, that would pick up a
>random positive integer number, say, between 1 and n, n being a positive
>integer, too, but there would be a linear probability of picking the
>smaller number first. The higher the number, the less likely it will be
>selected. So 1 would have the most, n the least chance to be picked, and
>so on for the numbers in between.
Here it is:
http://www.cs.ucsb.edu/~cs130a/slides/GrayCode.txt
2. Floyd's Algorithm.
The algorithm S is so simple that it's hard to imagine how it can
possibly be improved. Bob Floyd (1978 Turing Award winner) observed
the following peculiarity of algorithm S. Suppose N = M = 100.
At the time size = 99, the set S contains all but one element, and
the remaining job should be trivial. Unfortunately, the algorithm S
blindly keeps generating random numbers until it hits that single
number... Even with a truly random source, it will take on average 100
tries; with non-random sources, the algorithm may not even converge.
So Floyd set out to find a sampling algorithm that makes exactly one
call to randInt for each item in S. Here is his first algorithm, F1.
Sample (M, N)
    if M = 0 return the empty set
    else
        S = Sample (M-1, N-1)
        T = randInt (1, N)
        if T not in S then
            insert T in S
        else insert N in S
    return S
Proof idea: Suppose M = 5, N = 10. The algorithm first computes a
4 element random set from 1-9. Next, it chooses a random number
between 1 and 10 for T. Of the 10 possible values for T, exactly
5 result in 10 being inserted into S: four values in S plus 10
itself. Thus, element 10 is added to S with correct probability,
5/10.
A non-recursive implementation of Floyd's algorithm (note that on a
collision it is the loop variable J, not N, that gets inserted):
    S = emptyset
    for J = N-M+1 to N do
        T = randInt (1, J)
        if T not in S then
            insert T in S
        else insert J in S
    return S
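For concreteness, the iterative version might be sketched in C like this (a sketch, not the book's code; `sample_floyd` and the linear membership scan are my own naming and simplification, and `rand() % j` carries the usual slight modulo bias):

```c
#include <stdlib.h>

/* Linear membership test -- fine for small m (my simplification). */
static int member(const int *s, int count, int t)
{
    int i;
    for (i = 0; i < count; i++)
        if (s[i] == t)
            return 1;
    return 0;
}

/* Floyd's iterative sampling: fill s[0..m-1] with m distinct values
   from 1..n, making exactly one rand() call per element. */
void sample_floyd(int *s, int m, int n)
{
    int count = 0, j;
    for (j = n - m + 1; j <= n; j++) {
        int t = 1 + rand() % j;   /* random integer in 1..j */
        if (!member(s, count, t))
            s[count++] = t;       /* t not yet chosen: take it */
        else
            s[count++] = j;       /* collision: take j itself */
    }
}
```

For M = 5, N = 10 this produces five distinct values in 1..10 using exactly five calls to rand().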
I thought about this and would have approached it somewhat
differently, trying to make only a single call to the
random() function (or whatever it's called on various implementations).
Given the set of numbers 1...N, the OP wanted an algorithm
which would bias the RNG to the lower numbers in a linear
fashion. I interpreted this to mean that, if the set
of numbers was, for example N = 10, that 10 would be
chosen 10 times less frequently than 1.
We can temporarily reverse the selection (and
then unreverse it later) so that 10 is 10 times
more likely to be chosen than 1.
We can postulate that the total number of "slots"
(I'm not sure that's the right word)
is the sum of the range, i.e. sigma(1...N),
which in this case happens to be 55.
(1 occurrence of 1, 2 occurrences of 2, ...
10 occurrences of 10)
We can then pick a random integer in the range 1...55
(using the appropriate modulo arithmetic, etc.)
and determine which of the actual integers in the
range 1...10 it maps to by comparing the number
obtained with the cumulative probability frequencies
of any given number. Call this result A.
Unreversing the process again, we choose the result to
be N + 1 - A to solve the OP's problem.
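A sketch of the cumulative-frequency walk just described (`linear_low` is a name I made up for it; the `% total` step has the usual slight modulo bias):

```c
#include <stdlib.h>

/* Pick r in 1..N(N+1)/2, walk the cumulative frequencies
   (1 slot for 1, 2 slots for 2, ..., N slots for N) to find A,
   then un-reverse the bias by returning N + 1 - A. */
int linear_low(int n)
{
    int total = n * (n + 1) / 2;   /* 55 for n = 10 */
    int r = 1 + rand() % total;    /* random slot in 1..total */
    int a, cum = 0;
    for (a = 1; a <= n; a++) {
        cum += a;                  /* cumulative frequency through a */
        if (r <= cum)
            break;
    }
    return n + 1 - a;              /* small values now most likely */
}
```

With n = 10, the value 1 should come back roughly 10/55 of the time and the value 10 roughly 1/55.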
Now, whether or not this is more efficient than
calling random() or rand() multiple times
as described above is
probably implementation dependent.
Just my stab at it.
--
"It is impossible to make anything foolproof
because fools are so ingenious"
- A. Bloch
SNIP
> Given the set of numbers 1...N, the OP wanted an algorithm
> which would bias the RNG to the lower numbers in a linear
> fashion. I interpreted this to mean that, if the set
> of numbers was, for example N = 10, that 10 would be
> chosen 10 times less frequently than 1.
DRAT! I meant 10 times *MORE* frequently than 1.
My apologies!
SNIP
>I thought about this and would have approached it somewhat
>differently, trying to make only a single call to the
>random() function (or whatever it's called on various implementations).
I think an implementation would only require one call to rand() if they
only wanted one value. You may have misunderstood my point. I wasn't
saying that algorithm solved his problem. I said it was a good starting
point.
>We can postulate that the total number of "slots"
>(I'm not sure that's the right word)
>is the sum of the range, i.e. sigma(1...N),
>which in this case happens to be 55.
>(1 occurrence of 1, 2 occurrences of 2, ...
>10 occurrences of 10)
>
>We can then pick a random integer in the range 1...55
>(using the appropriate modulo arithmetic, etc.)
>and determine which of the actual integers in the
>range 1...10 it maps to by comparing the number
>obtained with the cumulative probability frequencies
>of any given number. Call this result A.
>
>Unreversing the process again, we choose the result to
>be N + 1 - A to solve the OP's problem.
>
>Now, whether or not this is more efficient than
>calling random() or rand() multiple times
>as described above is
>probably implementation dependent.
>
>Just my stab at it.
In any event, I do like your solution. Clever. Here's a quick
implementation of it that seems to work well and prints its results:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int WeightedRandRange( int Range);
#define RAND_RANGE(n) (rand() / (RAND_MAX / (n) + 1))
#define MAX_RANGE 1000 // Lookup table size; Range*(Range+1)/2 must fit
#define TEST_RANGE 10
#define TEST_RUNS 1000000 // Run 1,000,000 times to get a good sample
int main(void)
{
    int i, accum[TEST_RANGE+1] = {0};
    srand((unsigned)time(NULL));
    for (i = 0; i < TEST_RUNS; i++)
        accum[WeightedRandRange(TEST_RANGE)]++;
    for (i = 1; i <= TEST_RANGE; i++)
        // Convert to percentage
        printf("%d occurred %0.2f percent of the time\n", i,
               (double)accum[i] / (TEST_RUNS/100) );
    return 0;
}
// Linearly weighted: value k in 1..Range gets Range-k+1 slots
int WeightedRandRange( int Range)
{
    // Save these values for efficiency
    static int Inited = 0;
    static int Lookup[MAX_RANGE] = {0}, Total = 0;
    int i, j, k;
    // Don't init every time -- rebuild only when Range changes
    if (Inited != Range)
    {
        Inited = Range;
        Total = 0; // reset in case a previous Range left a stale sum
        // find sum for range (Range*(Range+1)/2 slots; must fit in Lookup)
        for (i = 1; i <= Range; i++)
            Total += i;
        // build the lookup table: value k gets Range-k+1 slots
        for ( k = 1, i = 0; i < Total; k++)
            for ( j = 0; j <= Range - k; j++)
                Lookup[i++] = k;
    }
    // Get a random number up to the Total
    return Lookup[RAND_RANGE(Total)];
}
> In comp.programming
> Nick Landsberg <huk...@NOSPAM.att.net> wrote:
>
>
>>I thought about this and would have approached it somewhat
>>differently, trying to make only a single call to the
>>random() function (or whatever it's called on various implementations).
>
>
> I think an implementation would only require one call to rand() if they
> only wanted one value. You may have misunderstood my point. I wasn't
> saying that algorithm solved his problem. I said it was a good starting
> point.
It was and it got me thinking. It seemed to require
multiple calls to the rand() function, but I may have
been mistaken. My apologies if I misunderstood.
BTW - I like what you did with the algorithm.
Now it's up to the OP to extend this for an
arbitrary range using malloc(), etc. ::grin::
but that's probably another exercise on the
homework list :)
>BTW - I like what you did with the algorithm.
>Now it's up to the OP to extend this for an
>arbitrary range using malloc(), etc. ::grin::
>but that's probably another exercise on the
>homework list :)
Yeah, I wasn't about to use a local array of size MAX_INT but I didn't want
to complicate it with a malloc call and subsequent error checking. I
originally didn't even have the static init code in it but couldn't stand
the thought of running that code 1,000,000X for no reason :) Were I writing
it for keeps, I would use dynamic memory.
Think of flipping a coin. You count how many times you have to flip it
until you get Heads.
If you get Heads the first time, you return 1.
If you first get Tails and then Heads, you return 2.
If you have to flip the coin more than n times, you wrap around and start
over at 1.
int nrand(int n)
{
    int counter = 0;
    for (;;)
    {
        int x = rand();
        if (x > RAND_MAX / 2)
            break;
        counter++;
    }
    return 1 + counter % n;
}
Carsten Hansen
Interesting problem. If I understand what you mean by linear, here is a
complete solution:
int randLinearBiasForLow (int n) {
    int m = n * (n + 1) / 2;
    int i, x;
    x = rand() % m;
    for (i=1; i <= n; i++) {
        if (x < i) return n+1-i;
        x -= i;
    }
    return 1; /* not reached; x < m guarantees the loop returns */
}
Simply, we create n*(n+1)/2 slots, and assign 1 slot to n, 2 slots to n-1, 3
slots to n-2, etc. Then we pick a slot at random and map it to the value it
was assigned to. The weighting is exactly linear as you desire. The mapping
function I show is linear (so this is O(n)). You could improve this to O(1) by
actually solving the quadratic equation and truncating, but I think I'll leave
that as an exercise to the reader. :)
Here's a related problem if you are interested:
http://www.pobox.com/~qed/bitdist.html
--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Firstly, I think your algorithm can be expressed a little more naturally as:
int nrand(int n)
{
    int counter = 0;
    int x;
    while((x = rand()) <= RAND_MAX / 2)
    {
        ++counter;
    }
    return 1 + counter % n;
}
Secondly, I'm not sure it answers the question. The OP calls for a linear
approach but, in your solution, 1 has (roughly!) a 50% chance of being
selected, 2 has a 25% chance of being selected, 3 has a 12.5% chance of
being selected, and so on. If you plot these on a graph, you get a curve,
rather than a straight line. (Well, I know a straight line /is/ a curve,
but it's not a very bendy one.)
I believe the OP is after a straight-line function, so y = mx + c may prove
to be more enlightening than y = 1/(2^x).
--
Richard Heathfield : bin...@eton.powernet.co.uk
"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton
> Think of flipping a coin. You count how many times you have to flip it
> until you get Heads.
> If you get Heads the first time, you return 1.
> If you first get Tails and then Heads, you return 2.
> If you have to flip the coin more than n times, you wrap around and start
> over at 1.
>
> int nrand(int n)
> {
>     int counter = 0;
>     for (;;)
>     {
>         int x = rand();
>         if (x > RAND_MAX / 2)
>             break;
>         counter++;
>     }
>     return 1 + counter % n;
> }
This will result in an exponentially weighted distribution, not a linear one.
Pr(selecting x) = 2^(n-x)/(2^n - 1)
While looking at this again, I stumbled upon another solution which is even
more twisted:
int randBiasLow (int n) {
    int x, y;
    x = rand () % n;
    y = rand () % n;
    if (x > y) x = y;
    return x + 1;
}
Now, the probabilities are *linear*, however they are shifted to a different
offset and slope. Why this works is left as an exercise to the reader. :)
> Interesting problem. If I understand what you mean by linear, here is a
> complete solution:
> You could improve this to O(1) by
> actually solving the quadratic equation and truncating, but I think I'll leave
> that as an exercise to the reader. :)
Why bother with the quadratic equation?
Just do
int randLinearBiasForLow (int n) {
    long int m = n*n - 1;
    return n - (int)floor(sqrt( rand() % m ));
}
(I am not sure floor is needed).
Whether this is better than the solution you posted later depends on
whether sqrt() runs faster than rand(). ;)
Coding an integer sqrt routine makes it run O(log n).
Michael
--
Feel the stare of my burning hamster and stop smoking!
Ok you win. That's the most twisted one. :)
> Whether this is better than the solution you posted later depends on whether
> sqrt() runs faster than rand(). ;)
Yeah, but there's also a numeric range issue -- in real implementations these
sqrt() based solutions will only work with really really small n (like what, 12
or 8 bits or something?) So my later solution has the slight advantage that it
works at least for the same range that rand() works.
Oh yeah, and as for implementing fast integer square roots you're not going to
do much better than:
http://www.pobox.com/~qed/sqroot.html :)
The following is a mathematical joke.
If you want something more linear, just change the constant so the loop
almost always continues. Instead of RAND_MAX / 2, use maybe 99 * (RAND_MAX / 100).
For n = 4 we get probabilities of
P(1) ~ 0.253781
P(2) ~ 0.251244
P(3) ~ 0.248731
P(4) ~ 0.246244
Those points lie nicely on a straight line.
Carsten Hansen
>While looking at this again, I stumbled upon another solution which is even
>more twisted:
>
> int randBiasLow (int n) {
>     int x, y;
>     x = rand () % n;
>     y = rand () % n;
>     if (x > y) x = y;
>     return x + 1;
> }
>
>Now, the probabilities are *linear*, however they are shifted to a different
>offset and slope. Why this works is left as an exercise to the reader. :)
This is beautiful Paul.
Thank you! :-D
> > Whether this is better than the solution you posted later depends on whether
> > sqrt() runs faster than rand(). ;)
>
> Yeah, but there's also a numeric range issue -- in real implementations these
> sqrt() based solutions will only work with really really small n (like what, 12
> or 8 bits or something?)
Why? sqrt should be numerically quite stable, and my old (95) compiler
offers a sqrt function that operates on long double. Or is it that the
logarithms used in some implementations aren't that accurate?
> So my later solution has the slight advantage that it
> works at least for the same range that rand() works.
Good point - the n*n effectively limits the range to half the size of
the datatype of rand().
The description calls for another parameter. n is one of them, but we
also need the ratio of the probability of selecting the 1 vs. the
probability of selecting n, which defines the slope of the linear
relationship.
Consider transforming this into selecting a real value in the range 0
to 1, which can easily be scaled into an integer 1 to n. The
probability density function would be a trapezoid. Let the relative
probability of selecting 0 be a and of selecting 1 be b. Then the
trapezoid has height a at x=0, height b at x=1. The total area is
(a+b)/2. We can scale a and b so that the area is 1.
Generate a random value r with even distribution in [0,1). Solve for
the value c, from 0 to 1, such that the area under the trapezoid curve
from 0 to c = r. c is then a derived random value with the distribution
described by the trapezoid, which has the linear change in probability.
Once you have the random value c, multiply by n, truncate, and add 1 to
get your desired range.
Solving for c given the area involves the solution of a quadratic
equation. If you write code to implement this allowing for a parameter
a/b, you probably will have to write a special case for even
distribution.
Thad
1 - Randomly generate a number x from 1 to n*(n+1)/2
2 - Compute y = (-1 + sqrt(1 + 8*x))/2 rounded up to the nearest integer
3 - Now the randomly generated integer is z = n + 1 - y, with frequency f = (n + 1 - z)/(n*(n+1)/2), which is the "linear"
frequency desired, I hope.
The above is untested.
The following is much simpler:
Generate 2 random integers a and b and let z = min(a, b), with linear frequency f = (2*n - 2*z + 1)/n²
Again this is untested.
where a and b are in the range of 1 to n.
If a solution is desired where a call to rand takes place only once, then generate one random number x in the range 0 to n² - 1,
then compute z = min(x%n, x/n) + 1.
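A sketch of this single-call variant, with the pair decoding factored out so the distribution can be enumerated (function names are mine; it assumes n*n - 1 <= RAND_MAX):

```c
#include <stdlib.h>

/* Decode x in 0..n*n-1 as the ordered pair (x % n, x / n) and
   return min + 1, a value in 1..n. */
int pair_min(int n, int x)
{
    int a = x % n, b = x / n;
    return (a < b ? a : b) + 1;
}

/* One rand() call in place of two. */
int one_call_min(int n)
{
    return pair_min(n, rand() % (n * n));   /* modulo bias aside */
}
```

Counting over all 36 values of x for n = 6 gives each value v exactly 2*(n - v) + 1 slots: 11, 9, 7, 5, 3, 1, matching the two-call min version.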
Well, Nick, it wasn't homework exactly, but I am aware of groups being used (or abused) that way.

It began when my music collection started to grow and I (being a background music listener while reading) just didn't have the time to spend in front of my CD collection picking out the disks for my carousel, so I wrote a small program to pick my next listening from a database of my CDs. That was twelve years ago. I wrote it in C for (laugh if you must) my C64 at the time.

It didn't take long to realize that picking them at random could leave out music I might not hear for years (well, I could still play them without the computer prompting me to, of course; I'm not that manic compulsive yet, I hope), so my solution was to have the program record how many times every CD had been picked before, calculate the average, and have a 1:20 chance of picking a CD that was above the average listening count. (Don't ask why 20. It seemed to be working so far, letting me listen to my new CDs most of the time, but also giving me a random shot at my old disks every once in a while as well.)
But I always felt this was an unsatisfactory solution I could live with, and occasionally I thought about streamlining it in the way I was asking for in my post. I actually got to the solution of "1 occurrence of 1, 2 occurrences of 2, ... 10 occurrences of 10" by Nick Landsberg, only I was afraid of making an awful spaghetti code of it. (Note that in my original post I was asking for a function, not an algorithm.)
I'm not a professional programmer, so I will try Paul Hsieh's solution:
int randBiasLow (int n) {
    int x, y;
    x = rand () % n;
    y = rand () % n;
    if (x > y) x = y;
    return x + 1;
}
although I have to admit I still don't see how it works. The fact that Bruce seemed to adore it spoke for it a lot, but can you, Paul, elaborate a little? I really didn't get your "Now, the probabilities are *linear*, however they are shifted to a different offset and slope".
Oh, I know I am a pest, but thanks again for all your help.
Andrew
> I'm not a professional programmer, so I will try Paul Hsieh solution:
>
> int randBiasLow (int n) {
>     int x, y;
>     x = rand () % n;
>     y = rand () % n;
>     if (x > y) x = y;
>     return x + 1;
> }
>
> although I have to admit I still don't see how it works.
    y
                   x = y
                    /
y = n +-----------+
      |          /|
      |  y > x  / |
      |        /  |
      |       /   |
      |      /    |
      |     /     |
      |    /      |
      |   / x > y |
      |  /        |
      | /         |
      |/          |
      +-----------+------ x
      0           x = n
Consider this graph of the equation x = y for the rectangle 0 < x < n,
0 < y < n. The line x = y partitions the rectangle into two halves,
one for which y > x and the other for which x > y. If you consider
just the y > x half, then you can see that for any randomly chosen
point (with discrete coordinates) inside it, the chance of it landing
in any given vertical slice decreases as the x coordinate increases,
because each vertical slice gets shorter and shorter as we move to the
right: for x = 0, 0 < y < n; for x = 1, 1 < y < n; for x = 2, 2 < y <
n; and so on. So the chance of getting x = 0 is greater than the
chance of getting x = 1 is greater than the chance of getting x = 2,
and so on. The same thing is true for the x > y half: for any
randomly chosen point, the chance of it landing in any given
horizontal slice decreases as the y coordinate increases. So for that
half we just use the y coordinate instead of the x coordinate.
The code might be more transparent if we replaced the two statements
if (x > y) x = y;
return x + 1;
by
if (x > y)
    return y + 1;
else
    return x + 1;
--
Ben Pfaff
email: b...@cs.stanford.edu
web: http://benpfaff.org
I adore it as well. :)
To understand this algorithm, I was trying it out on a small n, and
graphing all the possible outcomes in a table: if each outcome is
equally likely, you only need to count them to get the probabilities.
For n=6, the table looks like this:
      1 2 3 4 5 6
   1  1 1 1 1 1 1
   2  1 2 2 2 2 2
   3  1 2 3 3 3 3
   4  1 2 3 4 4 4
   5  1 2 3 4 5 5
   6  1 2 3 4 5 6
You can easily see the angle patterns, and you can also easily see that
each number occurs two more times than the next higher number; it is
also easy to see that this pattern extends to any size of table.
> I really didn't get your "Now, the probabilities are *linear*,
> however they are shifted to a different offset and slope".
Before, the probabilities were proportional to 1,2,3,4,...,n.
With this solution, they're proportional to 1,3,5,7,..., 2n-1. If you
graph these points, you'll see a slightly different slope and offset of
the line.