If I use a mean of zero and standard deviation of one, the
problem observation number is either 5000000 or -5000000
consistently.
Is this a known bug? I am using NT, Excel 97 SR-2 and have
repeated the problem on two different computers.
Thanks.
So "known bug" would probably be the answer.
Regards,
Tom Ogilvy
"Elizabeth Bartilson" <ebar...@lexmark.com> wrote in message
news:4d6601c10fb5$ef5ea8e0$a4e62ecf@tkmsftngxa06...
Use Normal as the word
*Excel* as the group
Perhaps Braden as the author or Lewis or Larsen
for example:
Here is an old post on the subject:
Search Result 10
From: David J. Braden (t...@fiastl.net)
Subject: Re: Random numbers from data analysis package
Newsgroups: microsoft.public.excel.misc
View complete thread (14 articles)
Date: 1999/11/09
Dan,
I disagree with you on the "life of the universe" metaphor (it is hilarious,
though!), but yes, this is a bug. My hypothesis is that MS is using its
Inverse
Normal approximation to generate the variates. The Inverse Normal itself is
badly done, IMO. Try it for =NORMINV(0.9999999,0,1) and you will get the
idea.
Further, their approach is veerrryyy sslloowww. There are workarounds that
are
far faster, and more accurate. Both Myrna Larson (recently) and I have
posted
code for this. Check out DejaNews.com, and do a power search for Myrna
Larson,
group would be *.excel.programming, date sometime in the last months.
If you run into problems, e-mail me. I would also appresiate your writing
Microsoft about this. The issue comes up every few weeks, and I first
contacted
them in about 1989 about the problem. So far, no action.
Oh, BTW, I recommend that for the stat stuff you minimize your use of the
Analysis ToolPak.
HTH
Dave Braden
MVP - Excel
Dan Evens wrote:
>
> Excel 97 SR-1
>
> I choose the Tools menu, item Data Analysis, then
> Random Number Generation. Then I choose 1 variable,
> 10000 random numbers, normal distribution, 0 mean, and
> standard deviation 1. Then I sort the results and look at the
> extremes. About half the time (I have not done any serious
> statistics on this) I find that one or the other end has a
> magnitude of 5000000. You may need to do this 3 or 4
> times to see this result. Seeing a point that is 5E6 standard
> deviations out on a normal curve should not happen in the
> life of the universe.
>
> Is this a bug? Is it known? Is there a patch? Or a workaround?
> I think this functionality comes in the addin atpvbaen.xla.
>
> --
> Dan Evens
> Standard disclaimers etc. No spam please.
Regards,
Tom Ogilvy
>
After the explanation, I give several alternatives, so read on ...
Explanation:
If U is a Uniform random variable, then F^(-1)(U) follows the
distribution defined by F(). This is a mathematically exact result, and
appears to be the way that ATP generates random numbers for all but the
Uniform distribution. There are two problems with doing it this way on
a computer:
- A computer cannot generate truly random numbers. The literature
seems to imply that F^(-1)(U) will magnify any problems in the algorithm
for generating pseudo-random numbers. As far as I know, MS has not
disclosed the particular algorithm they use, which makes it difficult to
determine how good (or bad) their method is. Since there are many truly
awful methods that have been used, this lack of information makes
Excel's random number generator inappropriate for any serious work.
- It requires a really accurate implementation of F^(-1). This is the
biggest problem with Excel's Normal random numbers. NORMSDIST is only
accurate to about 6 decimal places (not significant figures!). That
means that it is very inaccurate (not even 1 significant figure
correct!) in the tails. According to Help, NORMSINV(p) apparently does
an iterative search (probably Newton's method) for the x that gives
NORMSDIST(x) = p approximately. That means that NORMSINV is even worse
than NORMSDIST in the tails. The result is that you get random numbers
from a distribution that is similar to the Normal distribution over most
of the range, but has clumps of probability millions of standard
deviations away from the mean (as you discovered).
Alternatives:
- Either discard or redefine any ridiculous values to be 5 standard
deviations from the mean (in the same direction). If this is not an
adequate solution, then you probably should not be doing this in Excel.
- Generate n (at leat 10, more than 30 is probably overkill) Uniform
random numbers, average them, multiply the average by 2*SQRT(3*n), and
subtract 0.5. The result will be approximately N(0,1) by the Central
Limit Theorem. Multiply that by 1995 and add 2712 to get something that
is approximately Normal with a mean of 2712 and standard deviation of
1995. Repeat the process to generate as many more random numbers as you
need. Note that you only get k Normal random numbers from n*k Uniform
random numbers.
- Use the Box-Muller transformation to get 2*n Normal random numbers
from 2*n Uniform random numbers. This is a mathematically exact method
that again may magnify the problems of a poor pseudo-random number
generator, but at least the functions are implemented to machine
accuracy in Excel. My reference books are at the office, so I cannot
vouch for these formulas that I got from a Google search, but you take 2
Uniform random numbers (U1, U2) and transform them to 2 Normal random
numbers (N1, N2) with formulas like
N1 = SQRT(-2*LN(U1)) * COS(2*PI()*U2)
N2 = SQRT(-2*LN(U1)) * SIN(2*PI()*U2)
Good Luck,
Jerry
Jerry
After reading your post, I check XL02. Amazingly, the bug seems to be
fixed!
0.999 999 7 4.991519862
0.999 999 999 999 999 7.935972938
Of course, it is of little help to you. Luckily, Jerry did provide you
with a mechanism by which to generate your own R.V.s
--
Regards,
Tushar Mehta
www.tushar-mehta.com
--
In <4d6601c10fb5$ef5ea8e0$a4e62ecf@tkmsftngxa06>, Elizabeth Bartilson
<ebar...@lexmark.com> wrote
It's still not machine accuracy (puzzling, since m.a. is easily
obtainable), but it is good news that MS has finally decided to improve
its statistics.
What about STDEV, VAR, SLOPE, etc. -- has MS finally switched to a
numerically stable algorithm?
Did they improve probability functions across the board?
=SQRT(2*GAMMAINV(2*p-1,0.5,1))
should be identical to NORMSINV(p) for p>0.5. It used to be better than
NORMSINV, but this new NORMSINV has leapfrogged it.
Jerry
I started a new discussion thread yesterday titled 'Surprising numerical
computation bug fix in XL2002.' The only function other than NORMSINV
that I tested was STDEV. That remains a 'feature' in that the STDEV of
100,000,000 and 100,000,001 still equals 0.
I haven't had a chance to look at the other functions.
--
Regards,
Tushar Mehta
www.tushar-mehta.com
--
In <3B5822F5...@bigfoot.com>, Jerry W. Lewis <JWL...@bigfoot.com>
wrote