--
/**********************************************************
sa...@fubar.cs.montana.edu
**********************************************************/
Of all the questions that I've been too lazy or too ignorant or
too scatterbrained to add to the comp.lang.c FAQ list over the
years, this question ranks among the most pressing. Finally (as
part of the preparation of the book-length version), I've pulled
together the three answers that are always suggested, whenever
this question comes up, and composed the following. (Disclaimer:
I'm not a numerical analyst; part of my reason for posting this
here now is to let people poke holes in this code.)
Q: How can I generate random numbers with a normal or
Gaussian distribution?
A: There are a number of ways of doing this.
1. Exploit the Central Limit Theorem ("law of large numbers")
and add up several uniformly-distributed random numbers:
#include <stdlib.h>
#define NSUM 12
double gaussrand()
{
double x = 0;
int i;
for(i = 0; i < NSUM; i++)
x += (double)rand() / RAND_MAX;
x -= NSUM / 2.;
x /= NSUM / 12.;
return x;
}
2. Use a method described by Abramowitz and Stegun:
#include <stdlib.h>
#include <math.h>
#define PI 3.141592654
double gaussrand()
{
static double U, V;
static int phase = 0;
double Z;
if(phase == 0) {
U = (rand() + 1.) / (RAND_MAX + 2.);
V = rand() / (RAND_MAX + 1.);
Z = sqrt(-2 * log(U)) * sin(2 * PI * V);
} else
Z = sqrt(-2 * log(U)) * cos(2 * PI * V);
phase = 1 - phase;
return Z;
}
3. Use a method described by Box and Muller, and
discussed in Knuth:
#include <stdlib.h>
#include <math.h>
double gaussrand()
{
static double V1, V2, S;
static int phase = 0;
double Z;
if(phase == 0) {
do {
double U1 = (double)rand() / RAND_MAX;
double U2 = (double)rand() / RAND_MAX;
V1 = 2 * U1 - 1;
V2 = 2 * U2 - 1;
S = V1 * V1 + V2 * V2;
} while(S >= 1);
Z = V1 * sqrt(-2 * log(S) / S);
} else
Z = V2 * sqrt(-2 * log(S) / S);
phase = 1 - phase;
return Z;
}
All three methods generate numbers with mean 0 and standard
deviation 1. (To adjust to some other distribution, multiply by
the standard deviation and add the mean.) Method 1 is poor "in
the tails" (especially if NSUM is small), but methods 2 and 3
perform quite well. See the references for more information.
References:
Knuth Sec. 3.4.1 p. 117.
G.E.P. Box, M.E. Muller, and G. Marsaglia,
Annals Math. Stat. 28 (1958), 610.
Abramowitz and Stegun, ???
Does anyone have the full citations for the Box, Muller, and
Marsaglia or Abramowitz and Stegun papers?
Steve Summit
s...@eskimo.com
>--
>/**********************************************************
> sa...@fubar.cs.montana.edu
> **********************************************************/
Generate a sequence of uniform pseudo-random numbers in the range [t0,t1].
The mean of this sequence is a random variable with a normal distribution.
ie: If x1 and x2 are uniformly distributed random variables in [t0,t1], then
(x1+x2)/2 is a normally distributed random variable in [t0,t1].
Rick Castrapel
Two is the worst approximant of infinity that I have come across in a while
:-)
I think given two numbers x1 and x2 uniform [0,1], -log(x1)*sin(2*pi*x2) and
-log(x1)*cos(2*pi*x2) are two independent normal variates. (Check this: I
rederive it everytime I need it, so what I remember may not be completely
accurate).
Cheers
Tanmoy
--
tan...@qcd.lanl.gov(128.165.23.46) DECNET: BETA::"tan...@lanl.gov"(1.218=1242)
Tanmoy Bhattacharya O:T-8(MS B285)LANL,NM87544-0285,USA H:#3,802,9 St,NM87545
Others see <gopher://yaleinfo.yale.edu:7700/00/Internet-People/internet-mail>,
<http://alpha.acast.nova.edu/cgi-bin/inmgq.pl>or<ftp://csd4.csd.uwm.edu/pub/
internetwork-mail-guide>. -- <http://nqcd.lanl.gov/people/tanmoy/tanmoy.html>
fax: 1 (505) 665 3003 voice: 1 (505) 665 4733 [ Home: 1 (505) 662 5596 ]
Amazing.
Since when did the Central Limit Theorem become a statement about small
sample properties? (Not to mention the remarkable circumstance of a
normal variate with a finite interval for its domain.)
The proposed algorithm (averaging sets of uniform variates) is *only* a
practical approximation.
Cheers,
ar
This looks even more bizarrely redundant, doesn't it? Please read on.
>
> return x;
> }
>
> 2. Use a method described by Abramowitz and Stegun:
>
[a method based on a mathematical relation between the joint
distribution of a pair of independent uniform variates and that of a
pair of independent gaussians: don't bother to puzzle it out if the
phrase "Jacobian determinant" means nothing to you :-)]
>
> 3. Use a method described by Box and Muller, and
> discussed in Knuth:
[a variation on Method 2 to save on the trig function calls at the
expense of possibly multiple rand() calls: the basic idea is to sample
a point in the unit circle, and use the coordinates to calculate the
sin and cos values (!) -- a cool hack.]
The CLT basically says that given _any_ random variate (e.g., not
necessarily uniform and *also* not necessarily continuously distributed)
with certain constraints on its first two moments, for a sample of
size N the statistic
x(N) = ( Sum(N) - N*Expectation ) / sqrt( N*Variance )
tends to the Standard Normal distribution as N -> oo.
So, given precautions against overflow, normalizing rand() values
before summing at each iteration isn't really necessary: the
Expectation U/2 and Variance U^2/12 of a continuous [0,U] uniform
variate can be factored in at the end.
The practical consideration is how *small* can we make N without being
dangerously off-mark? 12 is the number usually cited in books -- it
saves the sqrt() call and obviates a division :-) -- but, as noted,
this renders the method definitely inferior to the next 2 methods.
>Does anyone have the full citations for the Box, Muller, and
>Marsaglia or Abramowitz and Stegun papers?
>
M. Abramowitz and I.A.Stegun (1964)
Handbook of Mathematical Functions
Applied Mathematics Series, vol. 55 (Washington: Nat. Bur. of Standards)
Reprinted, 1968, Dover Publications.
Cheers,
ar
The transformation is
y1 = sqrt( -2 * ln( x1 ) ) * sin( 2 * pi * x2 )
y2 = sqrt( -2 * ln( x1 ) ) * cos( 2 * pi * x2 )
with inverse
x1 = exp( -( y1^2 + y2^2 ) / 2 )
x2 = arctan( y1 / y2 ) / ( 2 * pi )
The Jacobian determinant of this transformation is the product of
gaussian density functions in y1 and y2.
Cheers,
ar
>>--
>>/**********************************************************
>> sa...@fubar.cs.montana.edu
>> **********************************************************/
>Rick Castrapel
Kurt Watzka (wat...@stat.uni-muenche.de), in a private e-mail, quite quickly
and quite correctly
called me on this one for overstating the case.
The problem is that, as the size of the uniform random sequence over which
you are taking the mean goes to infinity, the variance of that mean shrinks
to zero. Thus you cannot truly get a normal distribution.
However, the Central Limit Theorem justifies using this approximation in
computation. With tail between my legs, I apologize for the lack of
mathematical rigor, but I take refuge in the fact that those of us in
scientific computation must make due with approximations.
Rick Castrapel
> Does anybody have an algorithm (or C-code) for generating "normal"
> distributions from a random number generator?
Here is my file ranstdnor.c
I generate z ~ N(0,1), if you want N(mu, sigma), use mu + z*sigma
#include <math.h>
#include <stdio.h>
#include <utils.h>
#include <distribs.h>
double ranstdn(void) /* TOMS 712 */
{
const float s = .449871f, t = -.386595f, a = .196f, b = .25472f,
r1 = .27597f, r2 = .27846f;
float q, u, v, x, y;
do {
u = ranu(); v = ranu();
x = u - s;
v = (v - 0.5f) * 1.7156f;
y = fabs(v) - t;
q = x*x + y*(a*y - b*x);
/* Accept P if inside inner ellipse */
if (q < r1) return v/u;
if (q > r2) continue;
if (v*v > -4*log(u)*u*u) continue;
} while (1);
return v/u; /* ratio of P's coordinates is normal deviate */
}
/* if we are doing either of the two regression-testing procedures
we need the same fortran RNG: */
#ifdef COMPARE_TESTING
#define NEED_FORTRAN_RANU 1
#endif
#ifdef REG_TESTING
#ifndef NEED_FORTRAN_RANU
#define NEED_FORTRAN_RANU 1
#endif
#endif
/* If either COMPARE_TESTING or REG_TESTING are on, we NEED_FORTRAN_RANU */
#ifdef NEED_FORTRAN_RANU
struct {
int indx;
} ranuc_;
#define ranuc_1 ranuc_
double ranu()
{
/* Initialized data */
static float u[279] = { (float).5139,(float).1757,(float).3087,(float)
.5345,(float).9476,(float).1717,(float).7022,(float).2264,(float)
.4948,(float).1247,(float).0839,(float).3896,(float).2772,(float)
.3681,(float).9834,(float).5354,(float).7657,(float).6465,(float)
.7671,(float).7802,(float).823,(float).1519,(float).6255,(float)
.3147,(float).3469,(float).9172,(float).5198,(float).4012,(float)
.6068,(float).7854,(float).9315,(float).8699,(float).8665,(float)
.6745,(float).7584,(float).5819,(float).3892,(float).3556,(float)
.2002,(float).8269,(float).4159,(float).4635,(float).9792,(float)
.1264,(float).2126,(float).9585,(float).7375,(float).4091,(float)
.7801,(float).7579,(float).9568,(float).0281,(float).3187,(float)
.7569,(float).243,(float).5895,(float).0434,(float).956,(float)
.3191,(float).0594,(float).4419,(float).915,(float).5722,(float)
.1188,(float).5698,(float).252,(float).4959,(float).2367,(float)
.477,(float).4061,(float).873,(float).427,(float).3582,(float)
.382,(float).0432,(float).1606,(float).5224,(float).6966,(float)
.0971,(float).4008,(float).7734,(float).2448,(float).3428,(float)
.23,(float).2979,(float).3045,(float).8872,(float).0367,(float)
.6511,(float).3986,(float).6763,(float).7326,(float).9378,(float)
.2333,(float).8385,(float).9672,(float).7786,(float).4315,(float)
.6741,(float).8094,(float).1588,(float).2799,(float).1353,(float)
.8642,(float).7502,(float).208,(float).14,(float).2946,(float)
.8028,(float).2189,(float).5631,(float).7156,(float).1975,(float)
.9898,(float).25,(float).4306,(float).7553,(float).8609,(float)
.8948,(float).9781,(float).3954,(float).4322,(float).1271,(float)
.4577,(float).2378,(float).986,(float).6528,(float).6042,(float)
.2419,(float).4549,(float).79,(float).0788,(float).4764,(float)
.1526,(float).2458,(float).945,(float).614,(float).9882,(float)
.4773,(float).7997,(float).7442,(float).3807,(float).4799,(float)
.5269,(float).0981,(float).5942,(float).3472,(float).1434,(float)
.7795,(float).711,(float).4461,(float).7046,(float).0953,(float)
.9628,(float).5513,(float).7403,(float).579,(float).6379,(float)
.7817,(float).1879,(float).3021,(float).2828,(float).684,(float)
.2929,(float).5654,(float).4184,(float).3066,(float).4445,(float)
.5657,(float).4879,(float).6066,(float).4159,(float).1304,(float)
.256,(float).0358,(float).9771,(float).1145,(float).3781,(float)
.6467,(float).3504,(float).553,(float).3584,(float).5655,(float)
.4756,(float).1637,(float).6152,(float).1722,(float).5547,(float)
.2922,(float).8722,(float).8351,(float).8449,(float).8955,(float)
.5948,(float).5406,(float).1682,(float).655,(float).6905,(float)
.2639,(float).1067,(float).8149,(float).1914,(float).4233,(float)
.3519,(float).8392,(float).1373,(float).2627,(float).1773,(float)
.4799,(float).3802,(float).5048,(float).5028,(float).3519,(float)
.5256,(float).1206,(float).5196,(float).6071,(float).7329,(float)
.5569,(float).3441,(float).802,(float).591,(float).2669,(float)
.6707,(float).5522,(float).7889,(float).8877,(float).89,(float)
.0681,(float).8006,(float).9074,(float).6441,(float).1652,(float)
.3014,(float).1663,(float).2852,(float).842,(float).5363,(float)
.0363,(float).2072,(float).0212,(float).3581,(float).6215,(float)
.52,(float).546,(float).1537,(float).8234,(float).0334,(float)
.026,(float).3781,(float).6163,(float).0204,(float).6266,(float)
.9152,(float).3748,(float).7295,(float).3958,(float).9823,(float)
.5973,(float).1123,(float).2216,(float).7992,(float).8707,(float)
.7382,(float).0136,(float).7396,(float).4184,(float).362,(float)
.2039,(float).1832,(float).0763,(float).1156,(float).1591,(float)
.7883,(float).0404,(float).7906,(float).599,(float).4026,(float)
.2291 };
/* System generated locals */
float ret_val;
/* This is a dummy routine for generated uniform numbers in the */
/* interval (0,1). A list of stored numbers is sequentially */
/* accessed and returned. The routine is supplied to permit testing */
/* of the subprograms RANDN() and RAND0(). */
/* The routine cycles through 279 values stored in the data array U. */
/* The variable INDX in the named common RANUC retains the index of the
*/
/* number returned. This variable can be initialized to 0 in a calling
*/
/* routine to restart the sequence. */
++ranuc_1.indx;
if (ranuc_1.indx > 279) {
ranuc_1.indx = 1;
}
ret_val = u[ranuc_1.indx - 1];
return ret_val;
} /* randu_ */
#endif /* NEED_FORTRAN_RANU */
#ifdef COMPARE_TESTING /* compare against ranstdn_dumb */
double ranstdn_dumb(void)
{
static float r, u, v;
/* This function generates a normally distributed number using the */
/* ratio of uniforms method. No auxiliary boundary curves are used. */
/* Calls to the subprogram RANDU() must return independent random */
/* numbers uniform in the interval (0,1). */
/* Generate P = (u,v) uniform in rectangle enclosing acceptance region */
L50:
u = ranu();
v = ranu();
v = (v - .5f) * 1.7156f;
/* Compute coordinate ratio for candidate point */
r = v/u;
/* Reject point if outside acceptance region */
/* Computing 2nd power */
if (r*r > -4*log(u)) goto L50;
/* Return ratio as the normal deviate */
return r;
} /* randn0_ */
#include <stdio.h>
int main()
{
int i;
float good[100], forcheck[100];
FILE *univar;
ranuc_.indx = 0;
for (i=0; i<100; i++) good[i] = ranstdn();
printf("index at end of ranstdn is %d\n", ranuc_.indx);
ranuc_.indx = 0;
for (i=0; i<100; i++) forcheck[i] = ranstdn_dumb();
printf("index at end of dumb is %d\n", ranuc_.indx);
printf("Look at properties of discrepency:\n");
fflush(stdout);
univar = popen("univar -1", "w");
for (i=0; i<100; i++) fprintf(univar, "%g\n", forcheck[i]-good[i]);
pclose(univar); fflush(stdout);
printf("\nExpect N(0,1):\n");
fflush(stdout);
univar = popen("univar -1", "w");
for (i=0; i<100; i++) fprintf(univar, "%f\n", good[i]);
pclose(univar); fflush(stdout);
return 0;
}
#endif /* COMPARE_TESTING */
#ifdef REG_TESTING
int main()
{
int i;
ranuc_.indx = 0;
for (i=0; i<100; i++) printf("%f\n", ranstdn());
return 0;
}
#endif /* REG_TESTING */
/* --------------------------------------------------------------------- */
/* Now we go on to Numerical Recipes version */
double ranstdn2(void)
/* Creates 2 draws from N(0, 1) on each alternate call.
So even-numbered calls cause no calculations, a variable
"saved" is simply returned. The variable "alternating"
tracks which'th call this is.
*/
{
static int alternating=0;
static double saved;
double fac, r, v1, v2;
if (alternating) {
alternating = 0; return saved;
} else {
do {
v1=2.0*ranu()-1.0;
v2=2.0*ranu()-1.0;
r=v1*v1 + v2*v2;
} while (r >= 1.0 || r == 0.0);
fac = sqrt(-2.0*log(r)/r);
saved = v1*fac;
alternating = 1;
return v2*fac;
}
}
/* Another algorithm for ranstdn():
Algorithm 488, CACM, Vol. 17, #12, p704, Dec 1974, by R. P. Brent.
Except on the first call ranstdn3 returns a pseudo-random number
having a gaussian (i.e. normal) distribution with zero mean and unit
standard deviation.
Thus, the density is f(x) = exp(-0.5*x**2)/sqrt(2.0*pi).
The first call initializes ranstdn3 and returns zero
(no hassles -- it's a legit draw from N(0,1)).
It calls a function ranu(), and it is assumed that successive calls
to ranu(0) give independent pseudo- random numbers distributed
uniformly on (0, 1), possibly including 0 (but not 1). the method
used was suggested by Von Neumann, and improved by Forsythe, Ahrens,
Dieter and Brent. on the average there are 1.37746 calls of ranu() for
each call of ranstdn().
Warning - dimension and data statements below are machine-dependent.
The dimension of d must be at least the number of bits in the fraction
of a float number. Thus, on most machines the data statement
below can be truncated.
How is d calculated:
If the integral of sqrt(2.0/pi)*exp(-0.5*x**2) from a(i) to infinity
is 2**(-i), then d(i) = a(i) - a(i-1). Thus it seems that we could
do better than the constants in this code by calculating d[] in
double precision using a modern pnorm1. */
double ranstdn3(void)
{
/* Initialized data */
static float d[60] = { .67448975f,.47585963f,.383771164f,.328611323f,
.291142827f,.263684322f,.242508452f,.225567444f,.211634166f,
.199924267f,.189910758f,.181225181f,.1736014f,.166841909f,
.160796729f,.155349717f,.150409384f,.145902577f,.141770033f,
.137963174f,.134441762f,.13117215f,.128125965f,.12527909f,
.122610883f,.12010356f,.117741707f,.115511892f,.113402349f,
.11140272f,.109503852f,.107697617f,.105976772f,.104334841f,
.102766012f,.101265052f,.099827234f,.098448282f,.097124309f,
.095851778f,.094627461f,.093448407f,.092311909f,.091215482f,
.090156838f,.089133867f,.088144619f,.087187293f,.086260215f,
.085361834f,.084490706f,.083645487f,.082824924f,.082027847f,
.081253162f,.080499844f,.079766932f,.079053527f,.078358781f,
.077681899f };
static float u = 0;
/* Local variables */
static float a;
static int i;
static float v, w;
/* U MUST BE PRESERVED BETWEEN CALLS. */
/* INITIALIZE DISPLACEMENT A AND COUNTER I. */
a = 0;
i = 0;
/* INCREMENT COUNTER AND DISPLACEMENT IF LEADING BIT */
/* OF U IS ONE. */
L10:
u += u;
if (u < 1) goto L20;
u -= 1;
++i;
a -= d[i - 1];
goto L10;
/* FORM W UNIFORM ON 0 .LE. W .LT. D(I+1) FROM U. */
L20:
w = d[i] * u;
/* FORM V = 0.5*((W-A)**2 - A**2). NOTE THAT 0 .LE. V .LT. LOG(2). */
v = w * (w * .5f - a);
/* GENERATE NEW UNIFORM U. */
L30:
u = ranu();
/* ACCEPT W AS A RANDOM SAMPLE IF V .LE. U. */
if (v <= u) goto L40;
/* GENERATE RANDOM V. */
v = ranu();
/* LOOP IF U .GT. V. */
if (u > v) goto L30;
/* REJECT W AND FORM A NEW UNIFORM U FROM V AND U. */
u = (v - u) / (1 - u);
goto L20;
/* FORM NEW U (TO BE USED ON NEXT CALL) FROM U AND V. */
L40:
u = (u - v) / (1 - v);
/* USE FIRST BIT OF U FOR SIGN, RETURN NORMAL VARIATE. */
u += u;
if (u < 1) return (a-w);
else {
u -= 1; return (w-a);
}
} /* ranstdn3() */
#ifdef TESTING
#include <stdlib.h>
int main(int argc, char **argv)
{
int N, i, which, silent;
double x;
if (argc != 4) {
fprintf(stderr, "Usage: %s [123] N [-s]\n", argv[0]);
return 1;
}
which = atoi(argv[1]);
N = atoi(argv[2]);
silent = argc > 3;
switch (which) {
case 1 : {
printf("Using rantsdn() [TOMS 712, 1992]\n");
if (silent)
for (i=0; i<N; i++) x = ranstdn();
else
for (i=0; i<N; i++) printf("%.6g\n", ranstdn());
break;
}
case 2 : {
printf("Using ranstdn2() [Numerical Recipes]\n");
if (silent)
for (i=0; i<N; i++) x = ranstdn2();
else
for (i=0; i<N; i++) printf("%.6g\n", ranstdn2());
break;
}
case 3 : {
printf("Using ranstdn3() [CACM 488]\n");
if (silent)
for (i=0; i<N; i++) x = ranstdn3();
else
for (i=0; i<N; i++) printf("%.6g\n", ranstdn3());
break;
}
default : {
fprintf(stderr, "2nd argument must select a ranstdn() to time.\n");
fprintf(stderr, "It must be one of `1', `2' or `3'.\n");
return 1;
}
}
return 0;
}
#endif /* TESTING */
--
-------------------------
Ajay Shah, Centre for Monitoring Indian Economy, Bombay
ajay...@cmie.ernet.in (9-7 GMT+5:30) http://www.cmie.ernet.in/~ajayshah
<*(:-? - wizard who doesn't know the answer. Finger me for pgp public key