JWalk Excel 2003 review

David

Mar 18, 2004, 7:36:12 AM

Greetings & TIA for your help.
From John's review:
"Microsoft's marketing literature refers to "enhancements" to the
statistical functions (those available in the Analysis ToolPak
add-in). This, of course, is a new definition of the word
enhancements. Truth is, these functions have been broken for more than
a decade, and they finally return accurate results in Excel 2003."
Does anyone know which functions were defective and what the defects
were?

Howard Kaikow

Mar 18, 2004, 6:03:57 PM

Excel has traditionally used poor mathematical algorithms for distribution
functions, etc.

I expect that somebody is doing a study of the changes.

--
http://www.standards.com/; See Howard Kaikow's web site.
"David" <metheg...@yahoo.ie> wrote in message
news:82ac4f89.04031...@posting.google.com...

Jody Goldberg

Mar 19, 2004, 1:30:23 PM

On 2004-03-18, Howard Kaikow <kai...@standards.com> wrote:
> Excel has traditionally used poor mathematical algorithms for distribution
> functions, etc.
>
> I expect that somebody is doing a study of the changes.
>

http://www.csdassn.org/software_reports.html

Howard Kaikow

Mar 19, 2004, 5:09:45 PM

I figured that McCullough would do something.
Did not realize he was now at Drexel.

--
http://www.standards.com/; See Howard Kaikow's web site.

"Jody Goldberg" <jo...@gnome.org> wrote in message
news:slrnc5mf2...@athlon.thegoldbergs.ca...

Ian Smith

Mar 23, 2004, 5:56:32 AM

Jody Goldberg <jo...@gnome.org> wrote in message news:<slrnc5mf2...@athlon.thegoldbergs.ca>...

In the report, McCullough reruns some tests done by Knuesel on
Microsoft Excel 97 in 1998. The conclusion is that Microsoft have done
almost nothing to improve their statistical functions (I can't argue
with that) while "Gnumeric has largely fixed its flaws" (unfortunately
I do disagree with this although it's hard to say what "largely" is
supposed to mean).

I haven't got an executable copy of gnumeric to give specific
examples, but here are a few problems which can be detected from a
quick look at the code in "mathfunc.c".

pgeom
-----
R_DT_Cval(powgnum(1 - p, x + 1)) which if not log_p, simplifies to
(lower_tail ? (1 - (powgnum(1 - p, x + 1))) : (powgnum(1 - p, x + 1)))
So there are 2 potential "1-" disasters waiting to happen.
For cure see pexp. I think the calculations here avoid all "1-"
problems.
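
A minimal sketch of the two "1-" steps and one standard way round them, written in Python rather than the C of mathfunc.c. The function names are hypothetical, and using expm1 and log1p is an assumption about what a cure would look like, not a copy of the pexp code.

```python
import math

def pgeom_lower_naive(x, p):
    # Two subtractions from 1: "1 - p", then "1 - pow(...)".
    return 1.0 - math.pow(1.0 - p, x + 1)

def pgeom_lower_stable(x, p):
    # (1 - p)^(x + 1) = exp((x + 1) * log1p(-p)); expm1 then gives
    # 1 - (1 - p)^(x + 1) without forming either "1 -" explicitly.
    return -math.expm1((x + 1) * math.log1p(-p))
```

With x = 0 and p = 1e-20 the naive form rounds 1 - p to 1 and returns 0, while the expm1/log1p form stays close to p, which is the right answer for a single trial.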

phyper
------
term = lfastchoose(NR, xr) + lfastchoose(NB, xb) - lfastchoose(N,
n);
uses log of the gamma function which makes it inaccurate for large
values - just what dhyper seeks to avoid!
also uses R_DT_val macro so there are "1-" problems as well.
This algorithm is a disaster (it's slow and inaccurate) and needs to be
replaced.
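
To illustrate the log-gamma complaint, here is a small Python sketch (not the gnumeric code). lchoose_lgamma is a hypothetical stand-in for the lfastchoose pattern, and lchoose_ratio is only a comparison for small k, not a proposed replacement for phyper.

```python
import math

def lchoose_lgamma(n, k):
    # log C(n, k) as a difference of log-gamma values, each of which is
    # huge when n is huge, so the small difference loses its digits.
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def lchoose_ratio(n, k):
    # log C(n, k) as a sum of logs of modest ratios; only sensible for small k.
    return sum(math.log((n - k + i) / i) for i in range(1, k + 1))

n, k = 10**15, 2
# C(n, 2) = n * (n - 1) / 2, so its log is known directly.
exact = math.log(n) + math.log(n - 1) - math.log(2)
err_lgamma = abs(lchoose_lgamma(n, k) - exact)
err_ratio = abs(lchoose_ratio(n, k) - exact)
```

For n = 10^15 the individual lgamma values are of order 10^16, where adjacent doubles are several units apart, so their difference cannot pin down a log that should be about 68.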

pcauchy
-------
uses R_DT_val macro so there are "1-" problems
For cure see pexp.
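
A sketch of the same problem for the standard Cauchy upper tail, in Python with hypothetical function names. The direct form relies on the identity atan(x) + atan(1/x) = pi/2 for x > 0; whether gnumeric should adopt exactly this form is an assumption.

```python
import math

def pcauchy_upper_naive(x):
    # Lower tail first, then "1 -": fine near 0, useless far out in the tail.
    return 1.0 - (0.5 + math.atan(x) / math.pi)

def pcauchy_upper_direct(x):
    # For x > 0 the upper tail is atan(1/x) / pi, with no subtraction from 1.
    if x > 0:
        return math.atan(1.0 / x) / math.pi
    # For x <= 0 the upper tail is at least 0.5, so nothing cancels.
    return 0.5 - math.atan(x) / math.pi
```

At x = 1e20 the true upper tail is about 1/(pi * x), which the direct form reproduces, while the naive form has no correct digits left.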

pgamma
------
uses a normal approximation if the shape parameter alph > 1000, so it is
not very accurate there.
uses R_D_val(1 - sum) macro so there are "1-" problems if sum is close to
1, or if sum is small and it "logs" it.
The "1-" problem where sum is close to 1 is a real problem and not a
careless bug which can be easily removed.
E.g. if x = 1, alph = 10^(-n) and lower_tail is false, then we lose n
figures of accuracy in the answer.
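
The loss of n figures can be seen without any gamma code at all. The Python sketch below takes an upper tail q of order 10^(-n), stores the lower tail 1 - q as a double, and then recovers q by subtraction. The factor 0.2 is an illustrative magnitude, not a computed pgamma value, and the function name is hypothetical.

```python
def rel_error_after_round_trip(q_true):
    # Store the lower tail as a double near 1, then recover the upper tail.
    p = 1.0 - q_true
    q_back = 1.0 - p
    return abs(q_back - q_true) / q_true

# Doubles just below 1 are 2**-53 apart, so p is off by up to 2**-54
# in absolute terms; relative to q that grows as q shrinks.
errors = {n: rel_error_after_round_trip(0.2 * 10.0**-n) for n in (4, 8, 12)}
bounds = {n: 2.0**-54 / (0.2 * 10.0**-n) for n in (4, 8, 12)}
```

The bound on the relative error grows by a factor of 10 for every extra power of 10 in 1/alph, which is the "lose n figures" behaviour described above.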

pbeta
------
works with x only, and not with 1-x as well. Callers must choose whether to
call beta(x,p,q,lower_tail_option,log_p_option) or beta(accurate
version of 1-x,q,p,!lower_tail_option,log_p_option), but they don't
bother, so the results are not as accurate as they might be.
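
A sketch of what an "accurate version of 1-x" and the choice between the two calls might look like, in Python. It assumes the usual Student t relation x = df / (df + t^2), which is not spelled out in this post. All function names are hypothetical, and picking the smaller of x and 1-x is an assumed rule of thumb.

```python
def beta_args_naive(t, df):
    # 1 - x formed by subtraction: every digit is lost when x rounds to 1.
    x = df / (df + t * t)
    return x, 1.0 - x

def beta_args_accurate(t, df):
    # 1 - x formed directly as t^2 / (df + t^2), with no subtraction.
    denom = df + t * t
    return df / denom, (t * t) / denom

def choose_call(x, one_minus_x, p, q, lower_tail):
    # Argument tuple for a pbeta-style routine, using the identity
    # I_x(p, q) = 1 - I_{1-x}(q, p) to swap tails when 1 - x is smaller.
    if x <= one_minus_x:
        return (x, p, q, lower_tail)
    return (one_minus_x, q, p, not lower_tail)
```

For t = 1e-10 and df = 1 the naive pair is (1.0, 0.0), while the accurate pair keeps 1 - x at about 1e-20, and choose_call then passes that small value with p, q and the tail flag swapped.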

pbeta_raw "takes forever (or ends wrongly) when (one or) both p & q
are huge" (from FIXME comment)

I don't know the origins of the algorithm well enough to say
definitely that it can't be accurate for very small p and q, but I would
be surprised if it were.

pt
--
not accurate for large degrees of freedom.

pf
--
not accurate for large degrees of freedom. Not accurate elsewhere
because pbeta is not accurate.

Ian Smith

Christopher Browne

Mar 23, 2004, 5:52:35 PM

A long time ago, in a galaxy far, far away, iandj...@aol.com (Ian Smith) wrote:
> Jody Goldberg <jo...@gnome.org> wrote in message news:<slrnc5mf2...@athlon.thegoldbergs.ca>...
>> On 2004-03-18, Howard Kaikow <kai...@standards.com> wrote:
>> > Excel has traditionally used poor mathematical algorithms for distribution
>> > functions, etc.
>> >
>> > I expect that somebody is doing a study of the changes.
>>
>> http://www.csdassn.org/software_reports.html
>
> In the report, McCullough reruns some tests done by Knuesel on
> Microsoft Excel 97 in 1998. The conclusion is that Microsoft have
> done almost nothing to improve their statistical functions (I can't
> argue with that) while "Gnumeric has largely fixed its flaws"
> (unfortunately I do disagree with this although it's hard to say
> what "largely" is supposed to mean).

Well, the only place where they considered that there was a continuing
flaw was in the fact that Gnumeric was using a "true" random number
generator rather than using a pseudo-RNG.

For all of the tests that they did where they had found Gnumeric
wanting in version 0.67, they found that 1.1.2 had resolved the flaws
that they had found.

It looks as though it still prefers /dev/urandom; I'm not sure how to
explicitly get at the pseudo-RNG...

But when the Gnumeric developers had fixed nearly all of the flaws
that they had found, the conclusion "Gnumeric has largely fixed its
flaws" seems not too "out there," even if it is poor grammar.
--
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://www.ntlug.org/~cbbrowne/spreadsheets.html
If a mute swears, does his mother wash his hands with soap?

Jody Goldberg

Mar 24, 2004, 9:44:55 AM

On 2004-03-23, Ian Smith <iandj...@aol.com> wrote:
> Jody Goldberg <jo...@gnome.org> wrote in message news:<slrnc5mf2...@athlon.thegoldbergs.ca>...
>> On 2004-03-18, Howard Kaikow <kai...@standards.com> wrote:
>> > Excel has traditionally used poor mathematical algorithms for distribution
>> > functions, etc.
>> >
>> > I expect that somebody is doing a study of the changes.
>> >
>>
>> http://www.csdassn.org/software_reports.html
>
> In the report, McCullough reruns some tests done by Knuesel on
> Microsoft Excel 97 in 1998. The conclusion is that Microsoft have done
> almost nothing to improve their statistical functions (I can't argue
> with that) while "Gnumeric has largely fixed its flaws" (unfortunately
> I do disagree with this although it's hard to say what "largely" is
> supposed to mean).

The comment is related only to the flaws he mentioned to us after
reviewing the early development version.

> I haven't got an executable copy of gnumeric to give specific
> examples, but here are a few problems which can be detected from a
> quick look at the code in "mathfunc.c".

Thanks. This is exactly the sort of information we're looking for.
I've forwarded your write-up to the list. Depending on the
magnitude of the changes we may even be able to backport the fixes
to the 1.2.x stable tree for release later this month.