Is this just rounding error due to IEEE double precision. It seems pretty
large.
-David
XL04:
=STDEV(1.4434,1.4434,1.4434) ===> 2.71947991102104E-16
(I don't have XL03 available right now, but I suspect that the result
would be the same).
The improvement was undoubtedly part of the overhaul of stats functions
for XL03/04:
http://support.microsoft.com/kb/828888/en-us
In article <978BF747-1E26-4897...@microsoft.com>,
Yes. But the numerical error caused by the binary
representation is exacerbated by the variance formula
apparently used in Excel 2002 (sp3), which seems to
be different from Excel 2003.
Mathematically, the sample variance can be computed
using either the following equivalent formulas:
1. var = SUM((x - mean)^2, for all x) / (n - 1)
2. var = (SUM(x^2, for all x) - SUM(x, for all x)^2 / n) / (n - 1)
In both cases, std dev = SQRT(var).
When we compute var manually using #2, where
x1 = x2 = x3 = 1.4434 (n = 3), we get:
var = 8.88178419700125E-16
sd = 2.98023223876953E-08
That matches your results. But when we use #1,
we get:
var = 7.39557098644699E-32
sd = 2.71947991102104E-16
That matches the STDEV(1.4434,1.4434,1.4434)
result using Excel 2003.
Note that in both cases, VAR() and STDEV() are not
zero, as you might have expected, even though
AVERAGE(1.4434,1.4434,1.4434) displays 1.44340000000000E+00.
I believe the displayed result of AVERAGE() belies the
fact that there are non-zero bits in the remaining 2-3
binary bits that Excel must round in order to display a
decimal result. I believe that is evidenced by the fact
that (1.4434 - AVERAGE(...))^2 yields 7.39557098644699E-32.
(But I am mystified by the fact that this numerical error
is not evident when I program #1 in VBA, which does
display var = 0 -- a pleasant surprise. Without access
to the VBA binary representations, VBA compiled code
and internal Excel algorithms, I can only speculate wildly
about the disparity.)
My suspicion was confirmed by a response by Martin
Brown to my inquiry in excel.programming and by some
more experimentation on my part. Apparently, even the
retooled Excel 2003 STDEV() implementation does not
take full advantage of the floating point coprocessor, as
the VBA compiler seems to. I'm surprised.
It's not the STDEV function, per se, it's the mean. The mean of x, x
and x isn't exactly x. That is,
=SUMSQ(1.4434-AVERAGE(1.4434,1.4434,1.4434),
1.4434-AVERAGE(1.4434,1.4434,1.4434),
1.4434-AVERAGE(1.4434,1.4434,1.4434))
isn't zero. This even defeats DEVSQ, which is usually pretty good, but
=DEVSQ(1.4434,1.4434,1.4434)
returns 1.47911E-31 rather than zero. More to the point, while the
formula
=1.4434-AVERAGE(1.4434,1.4434,1.4434)
returns zero, the formula
=(1.4434-AVERAGE(1.4434,1.4434,1.4434))
returns 2.22045E-16.
Ain't Excel fun!
As with all floating point calculations, use explicit rounding for
final results. In this case,
=ROUND(STDEV(1.4434,1.4434,1.4434),5)
where the 5 is the maximum significant digits in STDEV's arguments.
I understand. But the implementation of STDEV() could
easily compute the mean internally, taking advantage of
the 80-bit FP registers. It does not. Nor does AVERAGE(),
as you point out. But the OP asked about STDEV(), not
AVERAGE(). I tailored my comments to the OP's context.
80 bits wouldn't necessarily help in this case. If the mean is
calculated by summing all the values first then dividing by the count,
the sum itself could cause problems that dividing by the count wouldn't
necessarily solve. In an ideal world, cleverness would also be applied
to calculating the mean.
k = 0
n = 0
For Each v In d
n = n + 1
If n = 1 Then m = v Else m = m * (k / n) + v / n
k = n
Next v
Enter 1.4434 in A1:A3. Enter 1 in B1, 0 in C1 and =A1 in D1. Enter
=B1+1 in B2, =B1 in C2, =D1*(C2/B2)+A2/B2 in D2. Select B2:D2 and fill
down into B3:D3. Cell D3 contains the mean. Enter =AVERAGE(A1:A3) in
D4, and =(D3-D4) in D5. D5 evaluates to -2.22045E-16.
Enter =SUMSQ(A1-D4,A2-D4,A3-D4) in D6. It evaluates to 1.47911E-31. Now
enter =SUMSQ(A1-D3,A2-D3,A3-D3) in D7. It evaluates to 0.
Imagine that! Rounding error elimiated without using 80-bit FPU
registers!
Further, even calculating the mean as =(1.4434/3+1.4434/3+1.4434/3)
would eliminate the rounding error. Try
=SUMPRODUCT((A1:A3-SUMPRODUCT(A1:A3/3))^2).
The error is due *EXCLUSIVELY* to rounding error in the calculation of
the mean; however, it's possible to eliminate the rounding error
without using more bits of precision. For completeness, it's necessary
to understand that the order of operations is *ESSENTIAL* in floating
point arithmetic. While floating point addition and multiplication are
commutative, neither is associative, and the distributive law doesn't
hold (that is, all the time; there are fortuitous exceptions). In
floating point arithmetic it's usually best to perform multiplication
first, then addition if accuracy is more important than execution speed.
I think Harlan is correct. For an alternative opinion, Math programs that
DO take advantage of the FP processor show that the average of:
(1.4434 + 1.4434 + 1.4434)/3
is:
1.4433999999999998
We just can not see the last two digits with Excel.
It shows that an Excel 2003 Worksheet is doing it correctly. Vba rounded
differently. An argument could be made either way if this is good or bad.
StandardDeviation[{1.4434, 1.4434, 1.4434}]
Returns:
2.7194799110210365*^-16
Which is the same as Excel 2003.
It appears to me that an Excel 2003 worksheet is doing it correctly at the
math coprocessor level.
--
Dana DeLouis
Win XP & Office 2003
"joeu...@hotmail.com" <joeu2004h...@discussions.microsoft.com> wrote
in message news:25CF2CCE-A31E-4833...@microsoft.com...
I agree that 80 bits of precision instead of 52 bits does not
always lead to "exact" results. I did not intend to imply that.
But the fact is: it does in this case (viz., average of
1.4434 three times). Or at least, that was the explanation
give, and I accepted ti. See my posting in excel.programming
for details. This discussion should really be part of that
thread, not this one. See "Why do VBA and Excel floating
point results differ?".
> The error is due *EXCLUSIVELY* to rounding error in
> the calculation of the mean
Yes. And having (54%) more bits of precision will always
diminish those effects -- at least until the (intermediate)
result is stored into lower-precision variables.
> For completeness, it's necessary to understand that the
> order of operations is *ESSENTIAL* in floating point
> arithmetic.
Which is why I initially said, in this thread, that anything I
would say about the explanation would be wild speculation
without my knowing the internal implementations of Excel
and VBA. I should have left well enough alone -- in this
thread.
If you have that kind of insight, I would appreciate it if you
would contribute to the thread in excel.programming.
=SQRT(SUMSQ(1.4434-AVERAGE(1.4434,1.4434,1.4434),1.4434-AVERAGE(1.4434,1.4434,1.4434),1.4434-AVERAGE(1.4434,1.4434,1.4434))/2)
Prior to XL2003, STDEV was calculated as
=SQRT((SUMSQ(1.4434,1.4434,1.4434)-SUM(1.4434,1.4434,1.4434)^2/3)/2)
which returns the value that the OP reported.
These two formulas are mathematically but not numerically equivalent.
A third approach would use a one-pass updating algorithm
http://groups.google.com/group/microsoft.public.excel.misc/msg/4c6ee0c636ad016a
which would correctly return zero for this standard deviation, even though
1.4434 cannot be exactly represented in binary.
Jerry
Well, that's not correct. First, the mantissa is only 23%
more bits of precision (64 bits). Second, I forget that the
original 52-bit mantissa is effectively simply zero-extended.
So the 80-bit FP result is not necessarily better than the
64-bit result. But I don't believe it will ever be worse.
Function SD(x As Double) As Double
SD = Sqr(Abs((x * x + x * x + x * x - (x + x + x) * (x + x + x) / 3) / 2))
End Function
but SD(1.4434) returns 4.65661287307739E-10; better than XL2002's
2.98023223876953E-08, but nowhere near as good as XL2003's
2.71947991102104E-16 or the exact 0 that an updating algorithm would give
http://groups.google.com/group/microsoft.public.excel.misc/msg/4c6ee0c636ad016a
Jerry
64-bit reals can exactly represent all 15 digit integers, but require 17
digits to uniquely characterize a floating point number.
80-bit reals can exactly represent all 19 digit integers
The 80-byte representation of 1.4434 and of the average of three of them are
both equal to 1.44339999999999999997E0 (21 digits required to distinguish
from 1.4434), so the XL2003 algorithm would return 0 if it used 80-byte
calculations.
In this case, you could even use intermediate rounding to 64-bit reals and
still get 0 with a partial use of register precision, as the following VBA
code shows
Function DSq(Optional x As Double = 1.4434) As Double
Dim ave As Double
' each line is calculated in 80-bit register, then stored in 64-bit variable
ave = (x + x + x) / 3
DSq = (x - ave) * (x - ave) + (x - ave) * (x - ave) + (x - ave) * (x -
ave)
End Function
Jerry
--
Dana DeLouis
Win XP & Office 2003
"Jerry W. Lewis" <post_a_reply@no_e-mail.com> wrote in message
news:4C165249-0997-48DB...@microsoft.com...
Too lazy to crosspost this myself. Nothing prevents you from putting a
link to this thread in a follow-up in the other ng.
Jerry