Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Rounding error in Stdev function result.

11 views
Skip to first unread message

K@discussions.microsoft.com David K

unread,
Jan 9, 2006, 1:02:02 PM1/9/06
to
the function
=STDEV(1.4434,1.4434,1.4434)
gives
2.98023E-08
(at least on my computer using Excel 2002, sp3)

Is this just rounding error due to IEEE double precision. It seems pretty
large.

-David

JE McGimpsey

unread,
Jan 9, 2006, 1:16:48 PM1/9/06
to
It is pretty large, but it's not simply due to rounding...

XL04:

=STDEV(1.4434,1.4434,1.4434) ===> 2.71947991102104E-16

(I don't have XL03 available right now, but I suspect that the result
would be the same).

The improvement was undoubtedly part of the overhaul of stats functions
for XL03/04:

http://support.microsoft.com/kb/828888/en-us


In article <978BF747-1E26-4897...@microsoft.com>,

joeu2004h...@discussions.microsoft.com

unread,
Jan 9, 2006, 11:46:02 PM1/9/06
to
"David K" wrote:
> the function =STDEV(1.4434,1.4434,1.4434) gives
> 2.98023E-08 (at least on my computer using Excel
> 2002, sp3)
> Is this just rounding error due to IEEE double precision.

Yes. But the numerical error caused by the binary
representation is exacerbated by the variance formula
apparently used in Excel 2002 (sp3), which seems to
be different from Excel 2003.

Mathematically, the sample variance can be computed
using either the following equivalent formulas:

1. var = SUM((x - mean)^2, for all x) / (n - 1)

2. var = (SUM(x^2, for all x) - SUM(x, for all x)^2 / n) / (n - 1)

In both cases, std dev = SQRT(var).

When we compute var manually using #2, where
x1 = x2 = x3 = 1.4434 (n = 3), we get:

var = 8.88178419700125E-16
sd = 2.98023223876953E-08

That matches your results. But when we use #1,
we get:

var = 7.39557098644699E-32
sd = 2.71947991102104E-16

That matches the STDEV(1.4434,1.4434,1.4434)
result using Excel 2003.

Note that in both cases, VAR() and STDEV() are not
zero, as you might have expected, even though
AVERAGE(1.4434,1.4434,1.4434) displays 1.44340000000000E+00.
I believe the displayed result of AVERAGE() belies the
fact that there are non-zero bits in the remaining 2-3
binary bits that Excel must round in order to display a
decimal result. I believe that is evidenced by the fact
that (1.4434 - AVERAGE(...))^2 yields 7.39557098644699E-32.

(But I am mystified by the fact that this numerical error
is not evident when I program #1 in VBA, which does
display var = 0 -- a pleasant surprise. Without access
to the VBA binary representations, VBA compiled code
and internal Excel algorithms, I can only speculate wildly
about the disparity.)

joeu2004h...@discussions.microsoft.com

unread,
Jan 10, 2006, 1:43:03 PM1/10/06
to
I wrote:
> I am mystified by the fact that this numerical error
> is not evident when I program #1 in VBA, which does
> display var = 0 -- a pleasant surprise. Without access
> to the VBA binary representations, VBA compiled code
> and internal Excel algorithms, I can only speculate wildly
> about the disparity.

My suspicion was confirmed by a response by Martin
Brown to my inquiry in excel.programming and by some
more experimentation on my part. Apparently, even the
retooled Excel 2003 STDEV() implementation does not
take full advantage of the floating point coprocessor, as
the VBA compiler seems to. I'm surprised.

Harlan Grove

unread,
Jan 10, 2006, 3:02:58 PM1/10/06
to
joeu...@hotmail.com wrote...

It's not the STDEV function, per se, it's the mean. The mean of x, x
and x isn't exactly x. That is,

=SUMSQ(1.4434-AVERAGE(1.4434,1.4434,1.4434),
1.4434-AVERAGE(1.4434,1.4434,1.4434),
1.4434-AVERAGE(1.4434,1.4434,1.4434))

isn't zero. This even defeats DEVSQ, which is usually pretty good, but

=DEVSQ(1.4434,1.4434,1.4434)

returns 1.47911E-31 rather than zero. More to the point, while the
formula

=1.4434-AVERAGE(1.4434,1.4434,1.4434)

returns zero, the formula

=(1.4434-AVERAGE(1.4434,1.4434,1.4434))

returns 2.22045E-16.

Ain't Excel fun!

As with all floating point calculations, use explicit rounding for
final results. In this case,

=ROUND(STDEV(1.4434,1.4434,1.4434),5)

where the 5 is the maximum significant digits in STDEV's arguments.

joeu2004h...@discussions.microsoft.com

unread,
Jan 10, 2006, 3:26:02 PM1/10/06
to
"Harlan Grove" wrote:
> It's not the STDEV function, per se, it's the mean.

I understand. But the implementation of STDEV() could
easily compute the mean internally, taking advantage of
the 80-bit FP registers. It does not. Nor does AVERAGE(),
as you point out. But the OP asked about STDEV(), not
AVERAGE(). I tailored my comments to the OP's context.

Harlan Grove

unread,
Jan 10, 2006, 4:17:45 PM1/10/06
to
joeu...@hotmail.com wrote...
...

>I understand. But the implementation of STDEV() could
>easily compute the mean internally, taking advantage of
>the 80-bit FP registers. It does not. Nor does AVERAGE(),
>as you point out. But the OP asked about STDEV(), not
>AVERAGE(). I tailored my comments to the OP's context.

80 bits wouldn't necessarily help in this case. If the mean is
calculated by summing all the values first then dividing by the count,
the sum itself could cause problems that dividing by the count wouldn't
necessarily solve. In an ideal world, cleverness would also be applied
to calculating the mean.

k = 0
n = 0
For Each v In d
n = n + 1
If n = 1 Then m = v Else m = m * (k / n) + v / n
k = n
Next v

Enter 1.4434 in A1:A3. Enter 1 in B1, 0 in C1 and =A1 in D1. Enter
=B1+1 in B2, =B1 in C2, =D1*(C2/B2)+A2/B2 in D2. Select B2:D2 and fill
down into B3:D3. Cell D3 contains the mean. Enter =AVERAGE(A1:A3) in
D4, and =(D3-D4) in D5. D5 evaluates to -2.22045E-16.

Enter =SUMSQ(A1-D4,A2-D4,A3-D4) in D6. It evaluates to 1.47911E-31. Now
enter =SUMSQ(A1-D3,A2-D3,A3-D3) in D7. It evaluates to 0.

Imagine that! Rounding error elimiated without using 80-bit FPU
registers!

Further, even calculating the mean as =(1.4434/3+1.4434/3+1.4434/3)
would eliminate the rounding error. Try
=SUMPRODUCT((A1:A3-SUMPRODUCT(A1:A3/3))^2).

The error is due *EXCLUSIVELY* to rounding error in the calculation of
the mean; however, it's possible to eliminate the rounding error
without using more bits of precision. For completeness, it's necessary
to understand that the order of operations is *ESSENTIAL* in floating
point arithmetic. While floating point addition and multiplication are
commutative, neither is associative, and the distributive law doesn't
hold (that is, all the time; there are fortuitous exceptions). In
floating point arithmetic it's usually best to perform multiplication
first, then addition if accuracy is more important than execution speed.

Dana DeLouis

unread,
Jan 10, 2006, 4:46:10 PM1/10/06
to
> ... But the implementation of STDEV() could

> easily compute the mean internally, taking advantage of
> the 80-bit FP registers. It does not.

I think Harlan is correct. For an alternative opinion, Math programs that
DO take advantage of the FP processor show that the average of:
(1.4434 + 1.4434 + 1.4434)/3
is:
1.4433999999999998

We just can not see the last two digits with Excel.
It shows that an Excel 2003 Worksheet is doing it correctly. Vba rounded
differently. An argument could be made either way if this is good or bad.

StandardDeviation[{1.4434, 1.4434, 1.4434}]
Returns:
2.7194799110210365*^-16

Which is the same as Excel 2003.
It appears to me that an Excel 2003 worksheet is doing it correctly at the
math coprocessor level.
--
Dana DeLouis
Win XP & Office 2003

"joeu...@hotmail.com" <joeu2004h...@discussions.microsoft.com> wrote
in message news:25CF2CCE-A31E-4833...@microsoft.com...

joeu2004h...@discussions.microsoft.com

unread,
Jan 10, 2006, 9:16:01 PM1/10/06
to
"Harlan Grove" wrote:
> 80 bits wouldn't necessarily help in this case.

I agree that 80 bits of precision instead of 52 bits does not
always lead to "exact" results. I did not intend to imply that.

But the fact is: it does in this case (viz., average of
1.4434 three times). Or at least, that was the explanation
give, and I accepted ti. See my posting in excel.programming
for details. This discussion should really be part of that
thread, not this one. See "Why do VBA and Excel floating
point results differ?".

> The error is due *EXCLUSIVELY* to rounding error in
> the calculation of the mean

Yes. And having (54%) more bits of precision will always
diminish those effects -- at least until the (intermediate)
result is stored into lower-precision variables.

> For completeness, it's necessary to understand that the
> order of operations is *ESSENTIAL* in floating point
> arithmetic.

Which is why I initially said, in this thread, that anything I
would say about the explanation would be wild speculation
without my knowing the internal implementations of Excel
and VBA. I should have left well enough alone -- in this
thread.

If you have that kind of insight, I would appreciate it if you
would contribute to the thread in excel.programming.

Jerry W. Lewis

unread,
Jan 10, 2006, 9:23:02 PM1/10/06
to
XL2003 also returns 2.71947991102104E-16, but per
http://support.microsoft.com/kb/826349/
XL2003 was the first version that changed the STDEV calculation to

=SQRT(SUMSQ(1.4434-AVERAGE(1.4434,1.4434,1.4434),1.4434-AVERAGE(1.4434,1.4434,1.4434),1.4434-AVERAGE(1.4434,1.4434,1.4434))/2)

Prior to XL2003, STDEV was calculated as

=SQRT((SUMSQ(1.4434,1.4434,1.4434)-SUM(1.4434,1.4434,1.4434)^2/3)/2)

which returns the value that the OP reported.

These two formulas are mathematically but not numerically equivalent.

A third approach would use a one-pass updating algorithm

http://groups.google.com/group/microsoft.public.excel.misc/msg/4c6ee0c636ad016a

which would correctly return zero for this standard deviation, even though
1.4434 cannot be exactly represented in binary.

Jerry

joeu2004h...@discussions.microsoft.com

unread,
Jan 10, 2006, 10:38:02 PM1/10/06
to
I wrote:
> having (54%) more bits of precision will always
> diminish those effects -- at least until the (intermediate)
> result is stored into lower-precision variables.

Well, that's not correct. First, the mantissa is only 23%
more bits of precision (64 bits). Second, I forget that the
original 52-bit mantissa is effectively simply zero-extended.
So the 80-bit FP result is not necessarily better than the
64-bit result. But I don't believe it will ever be worse.

Jerry W. Lewis

unread,
Jan 11, 2006, 12:04:03 AM1/11/06
to
80-bit reals have 64-bit mantissas vs. effectively 53-bit mantissas in 64-bit
reals, so there would be some improvement, but not as much as from using a
better algorithm. VBA offers very limited control over register arithmetic.
The following code will use register arithmetic within the Abs()

Function SD(x As Double) As Double
SD = Sqr(Abs((x * x + x * x + x * x - (x + x + x) * (x + x + x) / 3) / 2))
End Function

but SD(1.4434) returns 4.65661287307739E-10; better than XL2002's
2.98023223876953E-08, but nowhere near as good as XL2003's
2.71947991102104E-16 or the exact 0 that an updating algorithm would give

http://groups.google.com/group/microsoft.public.excel.misc/msg/4c6ee0c636ad016a

Jerry

Jerry W. Lewis

unread,
Jan 11, 2006, 8:29:05 AM1/11/06
to
1.4433999999999998 is the result of calculating the average using 64-bit
reals. You can display additional figures as
=D2D(AVERAGE(1.4434,1.4434,1.4434)) using my D2D function from
http://groups.google.com/group/microsoft.public.excel/msg/b106871cf92f8465

64-bit reals can exactly represent all 15 digit integers, but require 17
digits to uniquely characterize a floating point number.

80-bit reals can exactly represent all 19 digit integers

The 80-byte representation of 1.4434 and of the average of three of them are
both equal to 1.44339999999999999997E0 (21 digits required to distinguish
from 1.4434), so the XL2003 algorithm would return 0 if it used 80-byte
calculations.

In this case, you could even use intermediate rounding to 64-bit reals and
still get 0 with a partial use of register precision, as the following VBA
code shows

Function DSq(Optional x As Double = 1.4434) As Double
Dim ave As Double
' each line is calculated in 80-bit register, then stored in 64-bit variable
ave = (x + x + x) / 3
DSq = (x - ave) * (x - ave) + (x - ave) * (x - ave) + (x - ave) * (x -
ave)
End Function

Jerry

Dana DeLouis

unread,
Jan 11, 2006, 10:48:46 AM1/11/06
to
Hi. Thanks Jerry. As a side note to the op, if you just need the STDEV of
3 cells, it "appears" that an equation will returns zero (0) more often if
all 3 cells are equal.
Perhaps use this:
=SQRT((A1^2+B1^2-B1*C1+C1^2-A1*(B1+C1))/3)
instead of this:
=STDEV(A1:C1)

--
Dana DeLouis
Win XP & Office 2003


"Jerry W. Lewis" <post_a_reply@no_e-mail.com> wrote in message
news:4C165249-0997-48DB...@microsoft.com...

Harlan Grove

unread,
Jan 11, 2006, 11:57:22 AM1/11/06
to
joeu...@hotmail.com wrote...
...

>If you have that kind of insight, I would appreciate it if you
>would contribute to the thread in excel.programming.

Too lazy to crosspost this myself. Nothing prevents you from putting a
link to this thread in a follow-up in the other ng.

Jerry W. Lewis

unread,
Jan 11, 2006, 11:19:02 PM1/11/06
to
Your alternat formula does seem to be a little better numerically than
pre-2003 STDEV; but it is still squaring terms before subtracting, so it will
be nowhere near as numerically stable as =SQRT(DEVSQ(A1:C1)/2), which is used
by xl2003.

Jerry

0 new messages