I think you should also be concerned about the "insidious problem" you may
encounter from overfitting the data when using a sixth-order polynomial.
- Mike
www.mikemiddleton.com
"mathman" <mat...@discussions.microsoft.com> wrote in message
news:D40DCE7E-DA42-4D19...@microsoft.com...
>I have used LINEST to provide polynomial coefficients in Excel97 through
> Excel 2002 with no problems. My company has loaded Excel 2003 and now all
> the
> existing spreadsheets give the wrong answers but no error messages. I have
> read the KB articles on LINEST in 2003 and none give a clue to the
> problem.
>
> I use two columns of data for x and y with in this example a row of 7
> cells
> highlighted with this array formula:
>
> =LINEST(AN3:AN11,AM3:AM11^{1,2,3,4,5,6})
>
> Excel 97 -2002 dutifully reports 7 cells with the coeffs of
> X6,X5,X4,X3,X2,X1 & X0
> Excel 2003 sets X^6 to zero regardless of the data set and reports a
> wildly
> wrong value for the constant.
>
> I have tested this on a clean build of Excel 2003 with the Analysis Tool
> pack loaded as well as several of our 8000 corporate desktops. I cannot
> find
> any reference to this insidious problem on the internet, has anyone else
> experienced it?
>
"Gary's Student" wrote:
> Please check out Jerry W. Lewis' posting of 5/6/2005
> --
> Gary's Student
I have tried multiple linear fitting and that fails in the same way in Excel
2003 but works perfectly in Excel 97, Excel 2000 & Excel 2002
See
(or http://makeashorterlink.com/?A19821A3B ).
=TRANSPOSE(MMULT(MINVERSE(MMULT(TRANSPOSE(xCol^{0,1,2,3,4,5,6}),
xCol^{0,1,2,3,4,5,6})), MMULT(TRANSPOSE(xCol^{0,1,2,3,4,5,6}), yCol)))
Note the explicit zero power in this formula.
In addition to the potential for overfitting the data, I would be
concerned about numerical stability of the calculation (the reason that
Excel 2003 changed its approach).
Plot the data as an "XY (Scatter)" chart and add a 6th degree polynomial
trendline (format to scientific notation with 14 decimal places); if
there is an appreciable difference between the graph and formula
coefficients, then the graph coefficients are better.
David Braden has posted code to directly extract coefficients from the
displayed chart trendline equation into cells
http://groups.google.com/groups?selm=dbraden-1C662A.14054705032003%40msnews.microsoft.com
Jerry
Meaning that Microsoft is still using different pieces of code for
LINEST and chart trend lines? And the older, trend line code is
better? If so, it begs the question why Microsoft squandered their
money changing the LINEST code rather than just using the trend
line code for LINEST.
It has been a while since I did the comparisons, but my vague
recollections are that:
- Chart is slightly better numerically than 2003 LINEST (when no
coefficient estimates are exactly zero)
- 2003 LINEST (when no coefficient estimates are exactly zero) is
comparable to lm() in S-PLUS and R, and thus is far superior to previous
versions of LINEST (except for the anomaly under discussion).
- 2003 LINEST appears to zero "small" coefficients, much like the
difference between =227.82-(227+0.75+0.05+0.02) and
=(227.82-(227+0.75+0.05+0.02)). Thankfully MS did not introduce this
"improvement" into the chart trendline.
- The nonzero coefficients from 2003 LINEST are correct for the full
problem, so you might be able (a new [untested] idea) to fit the terms
that gave zero coefficients to the residuals from the original 2003
LINEST fit, to restore the arbitrarily zeroed coefficients.
Jerry
> It has been a while since I did the comparisons, but my vague
> recollections are that:
...
> - The nonzero coefficients from 2003 LINEST are correct for the full
> problem, so you might be able (a new [untested] idea) to fit the terms
> that gave zero coefficients to the residuals from the original 2003
> LINEST fit, to restore the arbitrarily zeroed coefficients.
Limited testing suggests that when there are coefficients that are exactly
zero, coefficients OTHER THAN THE ASSUMED INTERCEPT TERM (apparently
calculated by subtraction at the center of data) are correct. If you
subtract off the fitted cubic and linear terms from the OP's y-data and then
fit a quadratic (with intercept) without a linear term, then you get the
correct missing coefficients.
Jerry
You've already acknowledged that Microsoft reinvented the wheel here.
It's almost funny that they seem to have come up with yet another
approach not as good as the one they use in chart trendlines. Or do you
mean that XL2003 LINEST is more accurate than chart trendlines for the
coefficients it happens to get right on the first pass?
I am not party to the code used by the chart polynomial trendline (if I
were I would probably be enjoined by NDA from discussing it). It is
possible that makes use of the fact that the model is a polynomial,
which would be inappropriate for the more general LINEST function. At
this point, I would be thrilled to get a LINEST that is not subject to
the cancellation problems inherent in forming the normal equations
(pre-2003) and does not mistakenly zero part of the solution.
Jerry