Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Bug in Random number generation?

31 views
Skip to first unread message

Elizabeth Bartilson

unread,
Jul 18, 2001, 2:17:42 PM7/18/01
to
I have been generating 1000 normally distributed random
numbers. Roughly 8% of the time one number (out of the
1000) returned does not fit into the distribution - either
a very large or a very small number. When I use the
numbers in my study (Mean of 2712 and standard deviation
of 1995), the incorrect number is either
9975002712 or -9974997288 (consistently).

If I use a mean of zero and standard deviation of one, the
problem observation number is either 5000000 or -5000000
consistently.

Is this a known bug? I am using NT, Excel 97 SR-2 and have
repeated the problem on two different computers.

Thanks.

Tom Ogilvy

unread,
Jul 18, 2001, 3:34:47 PM7/18/01
to
since the normal distribution goes form minus infinity to positive infinity,
it would be hard to envision a number that doesn't fit in that distribution.
That said, some of those who work in that arena claim that the algorithm for
the normal functions do have a bug similar to what you describe. So the bug
is in the conversion rather than in the random number generator itself.

So "known bug" would probably be the answer.

Regards,
Tom Ogilvy


"Elizabeth Bartilson" <ebar...@lexmark.com> wrote in message
news:4d6601c10fb5$ef5ea8e0$a4e62ecf@tkmsftngxa06...

Tom Ogilvy

unread,
Jul 18, 2001, 3:44:39 PM7/18/01
to
You can search on some of the past discussion:
http://groups.google.com/advanced_group_search

Use Normal as the word

*Excel* as the group

Perhaps Braden as the author or Lewis or Larsen

for example:
Here is an old post on the subject:

Search Result 10
From: David J. Braden (t...@fiastl.net)
Subject: Re: Random numbers from data analysis package
Newsgroups: microsoft.public.excel.misc
View complete thread (14 articles)
Date: 1999/11/09

Dan,
I disagree with you on the "life of the universe" metaphor (it is hilarious,
though!), but yes, this is a bug. My hypothesis is that MS is using its
Inverse
Normal approximation to generate the variates. The Inverse Normal itself is
badly done, IMO. Try it for =NORMINV(0.9999999,0,1) and you will get the
idea.
Further, their approach is veerrryyy sslloowww. There are workarounds that
are
far faster, and more accurate. Both Myrna Larson (recently) and I have
posted
code for this. Check out DejaNews.com, and do a power search for Myrna
Larson,
group would be *.excel.programming, date sometime in the last months.

If you run into problems, e-mail me. I would also appresiate your writing
Microsoft about this. The issue comes up every few weeks, and I first
contacted
them in about 1989 about the problem. So far, no action.

Oh, BTW, I recommend that for the stat stuff you minimize your use of the
Analysis ToolPak.

HTH
Dave Braden
MVP - Excel

Dan Evens wrote:
>
> Excel 97 SR-1
>
> I choose the Tools menu, item Data Analysis, then
> Random Number Generation. Then I choose 1 variable,
> 10000 random numbers, normal distribution, 0 mean, and
> standard deviation 1. Then I sort the results and look at the
> extremes. About half the time (I have not done any serious
> statistics on this) I find that one or the other end has a
> magnitude of 5000000. You may need to do this 3 or 4
> times to see this result. Seeing a point that is 5E6 standard
> deviations out on a normal curve should not happen in the
> life of the universe.
>
> Is this a bug? Is it known? Is there a patch? Or a workaround?
> I think this functionality comes in the addin atpvbaen.xla.
>
> --
> Dan Evens
> Standard disclaimers etc. No spam please.

Regards,
Tom Ogilvy


>


Jerry W. Lewis

unread,
Jul 19, 2001, 9:08:29 AM7/19/01
to
MS was careful to program log and trig functions, etc. to machine
accuracy. They did not attempt to do so with continuous probability
distributions (despite the fact that such algorithms were readily
available). The consequence is that while their probability functions
are usually adequate for simple hypotheses testing and confidence
interval construction at the most commonly used levels, they are totally
inadequate for most other purposes. In particular they are totally
useless for serious random number generation, although the Analysis
ToolPak (ATP) uses them for that anyway.

After the explanation, I give several alternatives, so read on ...

Explanation:

If U is a Uniform random variable, then F^(-1)(U) follows the
distribution defined by F(). This is a mathematically exact result, and
appears to be the way that ATP generates random numbers for all but the
Uniform distribution. There are two problems with doing it this way on
a computer:

- A computer cannot generate truly random numbers. The literature
seems to imply that F^(-1)(U) will magnify any problems in the algorithm
for generating pseudo-random numbers. As far as I know, MS has not
disclosed the particular algorithm they use, which makes it difficult to
determine how good (or bad) their method is. Since there are many truly
awful methods that have been used, this lack of information makes
Excel's random number generator inappropriate for any serious work.

- It requires a really accurate implementation of F^(-1). This is the
biggest problem with Excel's Normal random numbers. NORMSDIST is only
accurate to about 6 decimal places (not significant figures!). That
means that it is very inaccurate (not even 1 significant figure
correct!) in the tails. According to Help, NORMSINV(p) apparently does
an iterative search (probably Newton's method) for the x that gives
NORMSDIST(x) = p approximately. That means that NORMSINV is even worse
than NORMSDIST in the tails. The result is that you get random numbers
from a distribution that is similar to the Normal distribution over most
of the range, but has clumps of probability millions of standard
deviations away from the mean (as you discovered).

Alternatives:

- Either discard or redefine any ridiculous values to be 5 standard
deviations from the mean (in the same direction). If this is not an
adequate solution, then you probably should not be doing this in Excel.

- Generate n (at leat 10, more than 30 is probably overkill) Uniform
random numbers, average them, multiply the average by 2*SQRT(3*n), and
subtract 0.5. The result will be approximately N(0,1) by the Central
Limit Theorem. Multiply that by 1995 and add 2712 to get something that
is approximately Normal with a mean of 2712 and standard deviation of
1995. Repeat the process to generate as many more random numbers as you
need. Note that you only get k Normal random numbers from n*k Uniform
random numbers.

- Use the Box-Muller transformation to get 2*n Normal random numbers
from 2*n Uniform random numbers. This is a mathematically exact method
that again may magnify the problems of a poor pseudo-random number
generator, but at least the functions are implemented to machine
accuracy in Excel. My reference books are at the office, so I cannot
vouch for these formulas that I got from a Google search, but you take 2
Uniform random numbers (U1, U2) and transform them to 2 Normal random
numbers (N1, N2) with formulas like
N1 = SQRT(-2*LN(U1)) * COS(2*PI()*U2)
N2 = SQRT(-2*LN(U1)) * SIN(2*PI()*U2)

Good Luck,
Jerry

Jerry W. Lewis

unread,
Jul 19, 2001, 9:24:03 AM7/19/01
to
Oops, in using the Central Limit Theorem alternative, subtract 0.5 from
the average and then multiply by 2*SQRT(3*n). Sorry about reversing the
order originally.

Jerry

Tushar Mehta

unread,
Jul 19, 2001, 3:28:00 PM7/19/01
to
Up to and including XL2000, the normal random number generator has had a
bug. Any value equal to or greater than 0.999 999 7 (or less than or
equal to -0.999 999 7) yields 5,000,000

After reading your post, I check XL02. Amazingly, the bug seems to be
fixed!

0.999 999 7 4.991519862
0.999 999 999 999 999 7.935972938

Of course, it is of little help to you. Luckily, Jerry did provide you
with a mechanism by which to generate your own R.V.s

--
Regards,

Tushar Mehta
www.tushar-mehta.com
--

In <4d6601c10fb5$ef5ea8e0$a4e62ecf@tkmsftngxa06>, Elizabeth Bartilson
<ebar...@lexmark.com> wrote

Jerry W. Lewis

unread,
Jul 20, 2001, 8:24:21 AM7/20/01
to
Hi Tushar,

It's still not machine accuracy (puzzling, since m.a. is easily
obtainable), but it is good news that MS has finally decided to improve
its statistics.

What about STDEV, VAR, SLOPE, etc. -- has MS finally switched to a
numerically stable algorithm?

Did they improve probability functions across the board?
=SQRT(2*GAMMAINV(2*p-1,0.5,1))
should be identical to NORMSINV(p) for p>0.5. It used to be better than
NORMSINV, but this new NORMSINV has leapfrogged it.

Jerry

Tushar Mehta

unread,
Jul 20, 2001, 9:04:24 AM7/20/01
to
Hi Jerry,

I started a new discussion thread yesterday titled 'Surprising numerical
computation bug fix in XL2002.' The only function other than NORMSINV
that I tested was STDEV. That remains a 'feature' in that the STDEV of
100,000,000 and 100,000,001 still equals 0.

I haven't had a chance to look at the other functions.

--
Regards,

Tushar Mehta
www.tushar-mehta.com
--

In <3B5822F5...@bigfoot.com>, Jerry W. Lewis <JWL...@bigfoot.com>
wrote

0 new messages