Re: LINEST bug in Excel 2003

Mike Middleton

unread,

Jun 8, 2005, 5:10:29 PM6/8/05

to

mathman -

I think you should also be concerned about the "insidious problem" you may
encounter from overfitting the data when using a sixth-order polynomial.

- Mike
www.mikemiddleton.com

"mathman" <mat...@discussions.microsoft.com> wrote in message
news:D40DCE7E-DA42-4D19...@microsoft.com...
>I have used LINEST to provide polynomial coefficients in Excel97 through
> Excel 2002 with no problems. My company has loaded Excel 2003 and now all
> the
> existing spreadsheets give the wrong answers but no error messages. I have
> read the KB articles on LINEST in 2003 and none give a clue to the
> problem.
>
> I use two columns of data for x and y with in this example a row of 7
> cells
> highlighted with this array formula:
>
> =LINEST(AN3:AN11,AM3:AM11^{1,2,3,4,5,6})
>
> Excel 97 -2002 dutifully reports 7 cells with the coeffs of
> X6,X5,X4,X3,X2,X1 & X0
> Excel 2003 sets X^6 to zero regardless of the data set and reports a
> wildly
> wrong value for the constant.
>
> I have tested this on a clean build of Excel 2003 with the Analysis Tool
> pack loaded as well as several of our 8000 corporate desktops. I cannot
> find
> any reference to this insidious problem on the internet, has anyone else
> experienced it?
>

mathman

unread,

Jun 8, 2005, 5:26:01 PM6/8/05

to

Thanks for trying to help, I can only find one post from Jerry on 5/6/2005
under the thread "simple statistical analysis" this did not seem relevant but
perhaps I missed the point.

"Gary's Student" wrote:

> Please check out Jerry W. Lewis' posting of 5/6/2005
> --
> Gary's Student

mathman

unread,

Jun 8, 2005, 5:30:05 PM6/8/05

to

Thanks for your interest - the real data has many more points the ranges were
just an example. The problem of missing first coeff applies to 3rd order 4th
order 5th order etc.

I have tried multiple linear fitting and that fails in the same way in Excel
2003 but works perfectly in Excel 97, Excel 2000 & Excel 2002

Harlan Grove

unread,

Jun 9, 2005, 3:00:41 AM6/9/05

to

"mathman" <mat...@discussions.microsoft.com> wrote...

>Thanks for trying to help, I can only find one post from Jerry on 5/6/2005
>under the thread "simple statistical analysis" this did not seem relevant
>but perhaps I missed the point.

...

See

http://groups-beta.google.com/group/microsoft.public.excel.worksheet.functions/msg/50fbb67b6cb745a1?dmode=source&hl=en

(or http://makeashorterlink.com/?A19821A3B ).

Jerry W. Lewis

unread,

Jun 9, 2005, 8:07:57 AM6/9/05

to

Select 7 columns in a row and array enter (Ctrl-Shift-Enter) the
following formula to reproduce the pre 2003 calculation, except that the
coefficients will be given in reverse order (intercept, linear, ... 6th
power).

=TRANSPOSE(MMULT(MINVERSE(MMULT(TRANSPOSE(xCol^{0,1,2,3,4,5,6}),
xCol^{0,1,2,3,4,5,6})), MMULT(TRANSPOSE(xCol^{0,1,2,3,4,5,6}), yCol)))

Note the explicit zero power in this formula.

In addition to the potential for overfitting the data, I would be
concerned about numerical stability of the calculation (the reason that
Excel 2003 changed its approach).

http://groups-beta.google.com/group/microsoft.public.excel.charting/browse_thread/thread/d1b0d3d6f9a19fe8/6c916351b5101102#6c916351b5101102

Plot the data as an "XY (Scatter)" chart and add a 6th degree polynomial
trendline (format to scientific notation with 14 decimal places); if
there is an appreciable difference between the graph and formula
coefficients, then the graph coefficients are better.

David Braden has posted code to directly extract coefficients from the
displayed chart trendline equation into cells

http://groups.google.com/groups?selm=dbraden-1C662A.14054705032003%40msnews.microsoft.com

Jerry

Harlan Grove

unread,

Jun 9, 2005, 7:23:29 PM6/9/05

to

Jerry W. Lewis wrote...
...

>Plot the data as an "XY (Scatter)" chart and add a 6th degree polynomial
>trendline (format to scientific notation with 14 decimal places); if
>there is an appreciable difference between the graph and formula
>coefficients, then the graph coefficients are better.
>
>David Braden has posted code to directly extract coefficients from the
>displayed chart trendline equation into cells

...

Meaning that Microsoft is still using different pieces of code for
LINEST and chart trend lines? And the older, trend line code is
better? If so, it begs the question why Microsoft squandered their
money changing the LINEST code rather than just using the trend
line code for LINEST.

Jerry W. Lewis

unread,

Jun 10, 2005, 9:15:36 AM6/10/05

to

You are correct, LINEST and chart trendline are not doing exactly the
same thing. I don't know why MS reinvented the wheel here, but it is
not the first example of redundant Excel functions which are programmed
differently.

It has been a while since I did the comparisons, but my vague
recollections are that:

- Chart is slightly better numerically than 2003 LINEST (when no
coefficient estimates are exactly zero)

- 2003 LINEST (when no coefficient estimates are exactly zero) is
comparable to lm() in S-PLUS and R, and thus is far superior to previous
versions of LINEST (except for the anomaly under discussion).

- 2003 LINEST appears to zero "small" coefficients, much like the
difference between =227.82-(227+0.75+0.05+0.02) and
=(227.82-(227+0.75+0.05+0.02)). Thankfully MS did not introduce this
"improvement" into the chart trendline.

- The nonzero coefficients from 2003 LINEST are correct for the full
problem, so you might be able (a new [untested] idea) to fit the terms
that gave zero coefficients to the residuals from the original 2003
LINEST fit, to restore the arbitrarily zeroed coefficients.

Jerry

Jerry W. Lewis

unread,

Jun 20, 2005, 12:24:03 PM6/20/05

to

"Jerry W. Lewis" wrote:

> It has been a while since I did the comparisons, but my vague
> recollections are that:

...

> - The nonzero coefficients from 2003 LINEST are correct for the full
> problem, so you might be able (a new [untested] idea) to fit the terms
> that gave zero coefficients to the residuals from the original 2003
> LINEST fit, to restore the arbitrarily zeroed coefficients.

Limited testing suggests that when there are coefficients that are exactly
zero, coefficients OTHER THAN THE ASSUMED INTERCEPT TERM (apparently
calculated by subtraction at the center of data) are correct. If you
subtract off the fitted cubic and linear terms from the OP's y-data and then
fit a quadratic (with intercept) without a linear term, then you get the
correct missing coefficients.

Jerry

Harlan Grove

unread,

Jun 20, 2005, 1:01:14 PM6/20/05

to

Jerry W. Lewis wrote...
...

>Limited testing suggests that when there are coefficients that are exactly
>zero, coefficients OTHER THAN THE ASSUMED INTERCEPT TERM (apparently
>calculated by subtraction at the center of data) are correct. If you
>subtract off the fitted cubic and linear terms from the OP's y-data and then
>fit a quadratic (with intercept) without a linear term, then you get the
>correct missing coefficients.

You've already acknowledged that Microsoft reinvented the wheel here.
It's almost funny that they seem to have come up with yet another
approach not as good as the one they use in chart trendlines. Or do you
mean that XL2003 LINEST is more accurate than chart trendlines for the
coefficients it happens to get right on the first pass?

Jerry W. Lewis

unread,

Jun 21, 2005, 9:05:16 AM6/21/05

to

My recollection is that the chart is slightly more accurate for very
ill-conditioned polynomials such as
http://groups.google.com/groups?selm=cEe%254.29665%24Gj5.531879%40news-east.usenetserver.com

I am not party to the code used by the chart polynomial trendline (if I
were I would probably be enjoined by NDA from discussing it). It is
possible that makes use of the fact that the model is a polynomial,
which would be inappropriate for the more general LINEST function. At
this point, I would be thrilled to get a LINEST that is not subject to
the cancellation problems inherent in forming the normal equations
(pre-2003) and does not mistakenly zero part of the solution.

Jerry