Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

Trend line equation error


Robin Winsor

Apr 23, 2001, 12:00:21 PM
I'm using the "Display equation on chart" option within the trendline
function to fit a polynomial curve and see the resultant equation. The
reason for wanting the equation is that I have two curves which I want to
multiply. To do that I need the equation so I can replot them with evenly
spaced datapoints. So here's the problem:

Excel gives an equation which, when used to construct a curve, does not
match the trendline drawn on the chart. It's close for much of the curve,
but it goes unstable at the end, and the higher the order of the polynomial,
the worse it gets. Does anyone know why? I am wondering about rounding
errors or something of the sort. If that is the problem, I wonder if there
is a way to increase the accuracy of the coefficients. If that's not it, I
can only think it must be some sort of bug, although I've never actually
seen a real bug in Excel's calculations before.
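[Archive note: Robin's rounding hypothesis is easy to check numerically. A minimal sketch in Python; the coefficients below are invented for illustration, not Robin's actual fit. It evaluates a 5th-order polynomial at full precision, then again with its coefficients rounded to roughly the two significant figures a default chart label shows, and compares across x in [0, 4.5].]

```python
# Show how rounding displayed polynomial coefficients distorts the curve.
# The coefficients are illustrative only, not from Robin's data.
coeffs = [0.9823, -0.41237, 1.80449, -1.29673, 0.31842, -0.02513]  # c0..c5

def poly(cs, x):
    """Evaluate sum(cs[i] * x**i) using Horner's rule."""
    y = 0.0
    for c in reversed(cs):
        y = y * x + c
    return y

# Round each coefficient to ~2 significant figures, as a chart label might.
rounded = [float(f"{c:.2g}") for c in coeffs]

# Largest deviation between the exact and rounded-coefficient curves on [0, 4.5].
worst = max(abs(poly(coeffs, i / 10) - poly(rounded, i / 10)) for i in range(46))
print(f"worst deviation on [0, 4.5]: {worst:.4f}")
```

Even tiny per-coefficient rounding errors get multiplied by x^4 and x^5 at the right end of the range, so the rebuilt curve diverges visibly there while matching well near x = 0, which is exactly the symptom described above.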

I'd appreciate any help or thoughts on the problem.

Robin


Tony

Apr 23, 2001, 1:39:36 PM
What you see is expected, although unpleasant, behavior of polynomial
fits--they fit *the points*, but know nothing about what you expect from them
beyond the range (extrapolation). The higher the degree of the polynomial, the
wilder the swings, as the polynomial becomes too flexible for your data points.

One way to improve on this property is to use the lowest-order polynomial
you can get away with; another is to use a monotonic function, such as a
logarithmic, exponential, or power curve, instead of an n-th degree polynomial.
Another trick is to place dummy points in the extrapolated range to get the
polynomial to behave *within* your desired range. Yet another is to
increase the density of your data, either by increasing the frequency of
the collected data points or by piecewise interpolation (quadratic will usually
do).
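[Archive note: the wild swings Tony describes can be reproduced with the classic Runge example. A sketch in Python, with Runge's function 1/(1+25x^2) standing in for real data: interpolate it at equally spaced points with higher and higher degree polynomials, and the fit gets worse between the points, not better.]

```python
# Runge's phenomenon: higher-degree polynomial interpolants of equally
# spaced samples swing ever more wildly between the sample points.
def runge(x):
    return 1.0 / (1.0 + 25.0 * x * x)

def lagrange_eval(xs, ys, x):
    """Evaluate the unique interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def max_error(degree, samples=201):
    """Worst gap between the interpolant and runge() on a fine grid in [-1, 1]."""
    xs = [-1.0 + 2.0 * i / degree for i in range(degree + 1)]
    ys = [runge(x) for x in xs]
    grid = [-1.0 + 2.0 * k / (samples - 1) for k in range(samples)]
    return max(abs(lagrange_eval(xs, ys, x) - runge(x)) for x in grid)

for deg in (5, 10, 20):
    print(deg, max_error(deg))
```

The interpolant is exact *at* the data points at every degree; it is the behavior between and beyond them that degrades, which is why the lowest workable order, or a different function family, is the safer choice.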

Tony


Robin Winsor

Apr 23, 2001, 2:10:24 PM
Tony,

Thanks for taking a look. I agree that polynomials don't like to behave
well outside the constrained area, but the ones Excel is listing in the
displayed equation don't even behave inside the constrained area. For
example, my x values go from 0 to 4.5, with data points approximately every
0.2. The y values are 0 to 1 (it's a plot of modulation versus spatial
frequency). The Excel trend lays nicely along the data points when I use a
5th-order poly. If I resample based on the equation, it becomes unstable at
x=3.2 - not even three quarters of the way through the series! Clearly
Excel is not actually using the equation as it is displayed. I've been
looking into VBA to see if I can get my hands on the coefficients that way,
but they don't seem to be available.

Robin

"Tony" <t_g...@yahoo.com> wrote in message
news:1063b01c0cc1c$5d4a87d0$19ef2ecf@tkmsftngxa01...

Bernard Liengme

Apr 23, 2001, 2:50:51 PM
to Robin Winsor
Hi Robin,
A) Have you formatted the trendline equation to show more sig figs?
B) Do you want to use LINEST to get the values of the coefficients of the
polynomial into spreadsheet cells? If so please visit
http://www.stfx.ca/people/bliengme and look at Excel Tips & tricks
Best wishes
Bernard

David J. Braden

Apr 23, 2001, 3:27:44 PM
Robin,
I take exception to what Tony writes, at least as I understand your post.
The trendline is supposed to be drawn according to its coefficients, however
derived. Whether or not you are extrapolating is immaterial here, as is the
degree of your polynomial *as far as its depiction is concerned*.

Click on the label for the trendline, the one that displays the underlying
equation, and choose Format -> Selected Data Labels. Click on the Number
tab to increase the precision displayed for the coefficients. If you were to
graph the equation with enough precision, it would coincide with the
trendline over the entire range.

I assume you have a pretty good reason for fitting the polynomial (e.g.,
nonmonotonicity of the x-y relationship). It is a pretty drastic approach
to smoothing each of the curves, however, before doing your calculation.
Might I suggest a less parametric approach or two?

(1) Use (piecewise) linear interpolation of your curves; this is how they
would be drawn individually with smoothing turned off. Several people have
posted nice VBA routines for doing this; Myrna Larson's is my favorite. You
could then use it to get y-values for each of your curves over a grid of
x-values, and go from there.

(2) Same as (1), but use a cubic spline for the interpolation. This is
*approximately* what Excel is doing with smoothing turned on. I posted code
some months back for doing this; a single call to the UDF with the x-data,
y-data, and series of points you want the spline evaluated at will do the
trick.
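[Archive note: Dave's posted VBA spline code is not reproduced in this thread, but a natural cubic spline is short enough to sketch independently. Python here; the function names and the usage data are the editor's, not from the thread.]

```python
# Natural cubic spline interpolation: solve a tridiagonal system for the
# second derivatives at the knots, then evaluate piecewise cubics.
def cubic_spline(xs, ys):
    """Return a function interpolating (xs, ys); xs must be strictly increasing."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for second derivatives m[0..n], natural BCs m[0]=m[n]=0.
    a = [0.0] * (n + 1)  # sub-diagonal
    b = [1.0] * (n + 1)  # diagonal
    c = [0.0] * (n + 1)  # super-diagonal
    d = [0.0] * (n + 1)  # right-hand side
    for i in range(1, n):
        a[i] = h[i - 1]
        b[i] = 2.0 * (h[i - 1] + h[i])
        c[i] = h[i]
        d[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n + 1):
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    m = [0.0] * (n + 1)
    m[n] = d[n] / b[n]
    for i in range(n - 1, -1, -1):
        m[i] = (d[i] - c[i] * m[i + 1]) / b[i]

    def f(x):
        # Locate the interval containing x (clamped to the data range).
        i = max(0, min(n - 1, next((k for k in range(n) if x <= xs[k + 1]), n - 1)))
        t = x - xs[i]
        return (ys[i]
                + t * ((ys[i + 1] - ys[i]) / h[i] - h[i] * (2.0 * m[i] + m[i + 1]) / 6.0)
                + t * t * m[i] / 2.0
                + t ** 3 * (m[i + 1] - m[i]) / (6.0 * h[i]))
    return f

# Usage: linear data is reproduced exactly by a natural spline.
s = cubic_spline([0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0])
print(s(1.5))  # prints 3.0
```

A single call builds the interpolant; evaluating it at any list of x-values then plays the role of the UDF call described above.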

Both of these approaches are far better at revealing detail than is
multiplying regressed curves.

Lastly, why evenly spaced datapoints??? You don't need that to do this. If,
for example, some of your data are more clustered in a certain range than
elsewhere, wouldn't you like more of the points at which you are multiplying
these curves to be from such a region? It's easy to do.
(a) Copy all of the x-values from each of the series into the same column,
and sort it. Assume this is in column A, starting in cell A1.
(b) Suppose you have n values altogether in A. Then in B1, enter 1/(2*n), in
B2 enter =B1+(1/n), where n is the actual value, then drag-copy that down to
the last row of your data.
(c) If you want m points to sample the x-range from, then choose the first
point x1 to be the 1/(2*m)-th percentile of the data, x2 to be the (1/m+
1/(2*m))-th percentile, etc., so that the k-th point is given by the
(2*k-1)/(2*m)-th percentile. You will likely need linear interpolation to
get the value of x that you need, but the whole process will be quicker to
do than it took you and me to have this exchange. This approach gives you a
maximum likelihood nonparametric approximation to the support of the x-data.
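[Archive note: steps (a)-(c) above can be sketched as follows. Python; the helper names and the small pooled data set are illustrative, not from the thread.]

```python
# Sample m x-values at the (2k-1)/(2m)-th percentiles of the pooled x-data,
# using linear interpolation between order statistics.
def percentile(sorted_xs, q):
    """Linear-interpolated percentile; q in [0, 1], plotting position (i+0.5)/n."""
    n = len(sorted_xs)
    pos = q * n - 0.5
    if pos <= 0:
        return sorted_xs[0]
    if pos >= n - 1:
        return sorted_xs[-1]
    i = int(pos)
    frac = pos - i
    return sorted_xs[i] * (1 - frac) + sorted_xs[i + 1] * frac

def sample_grid(xs, m):
    """Return m x-values at the (2k-1)/(2m) percentiles of the pooled data."""
    s = sorted(xs)
    return [percentile(s, (2 * k - 1) / (2 * m)) for k in range(1, m + 1)]

pooled = [0.1, 0.2, 0.25, 0.3, 0.5, 0.9, 1.4, 2.0, 3.1, 4.5]
print(sample_grid(pooled, 4))
```

Where the pooled x-values cluster, the returned grid points cluster too, which is the point of sampling by percentile rather than evenly.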

HTH
Dave Braden
MVP - Excel

ps - Excel *does* have calculation errors, but let's not get started on that
one <vbg>

Robin Winsor <rwi...@xrayimaging.com> wrote in message
news:#7Bbx7AzAHA.2096@tkmsftngp02...

Tony Gozdz

Apr 23, 2001, 7:48:58 PM
Unless I misunderstood the original post, it seems to me that two things are
happening in Robin's case. First, the fact that the listed coefficients do not
recover the trend line indicates that the displayed precision is not what the
program uses internally; Bernard and Dave have suggested a solution. If you have
a polynomial of a higher degree (say, >3-5), small changes in the coefficients
will result in large deviations.

Second, we may also have here a case of two differing points of view on the
problem: one of a very able programmer, Dave, and another of a non-programmer
(myself) who has had for a long time to try to guess probable relationships from
imprecise data. In principle, I do not disagree with the two solutions suggested
by Dave, but depending on the magnitude of error for each separate point, the
resulting interpolated points will still be in error--which will propagate. You
should multiply smoothed (most probable and physically relevant) curves, not
individual interpolated points.

The trend line is supposed to give a smoothed functional relationship based on
some theoretically justifiable dependence. It is *not* supposed to give a perfect
fit to every error-laden experimental point, as happens in Robin's case; such a
fit, while perfect, is a nonsensical solution. If you have n points, you can
*always* find an (n-1)-th degree polynomial that will give a perfect fit, but it
won't make any physical sense as far as the underlying relationship is concerned.

What do you think, Robin and Dave?

Tony

Martin Brown

Apr 24, 2001, 3:50:43 AM

Bernard Liengme wrote:

> Hi Robin,
> A) Have you formatted the trendline equation to show more sig figs?
> B) Do you want to use LINEST to get the values of the coefficients of the
> polynomial into spreadsheet cells? If so please visit
> http://www.stfx.ca/people/bliengme and look at Excel Tips & tricks

Setting sig fig to 16 or so ought to do it for the trendline equation on the
graph.

LINEST often gets the *wrong* answer for polynomials of order 3 or above.

For some reason MS used *different* algorithms for LINEST & graph trendline.

The one used in graph trendline gives the more accurate result, provided that you
extract the coefficients with sufficient significant digits. It has been discussed
here before.

Regards,
Martin Brown


Robin Winsor

Apr 24, 2001, 7:16:02 PM
Wow, that's a lot of good ideas. Thanks to all of you for taking the time.
I ended up using =LINEST(yrange,xrange^{1,2,3,4,5,6}) to find the
coefficients and =INDEX(LINEST(yrange,xrange^{1,2,3,4,5,6},,TRUE),3,1) for
the goodness of fit. If anyone is looking at this in the archives later,
don't forget that the first one is an array formula, so highlight 7 cells in
a row before entering it and hit Ctrl+Shift+Enter after typing it, not just
Enter. The trick of increasing sig figs in the displayed equation was a new
one for me and will surely come in handy in the future.
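[Archive note: for readers without Excel at hand, the same fit-plus-R^2 computation can be sketched in plain Python via the normal equations. This is a demo only: LINEST's internal algorithm is not documented here, and the normal equations become ill-conditioned at high degree, which is part of what this thread is about. The sample data are invented.]

```python
# Least-squares polynomial fit (analogue of =LINEST(y, x^{1,2,...,d}))
# plus the R^2 that LINEST reports in row 3, column 1 of its stats block.
def polyfit(xs, ys, degree):
    n = degree + 1
    # Normal equations A c = b, where A[i][j] = sum of x^(i+j).
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]
    c = [0.0] * n
    for i in range(n - 1, -1, -1):
        c[i] = (b[i] - sum(A[i][k] * c[k] for k in range(i + 1, n))) / A[i][i]
    return c  # c[0] + c[1]*x + ... + c[degree]*x^degree

def r_squared(xs, ys, c):
    fit = [sum(ci * x ** i for i, ci in enumerate(c)) for x in xs]
    ybar = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fit))
    ss_tot = sum((y - ybar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [1.0, 0.95, 0.8, 0.55, 0.3, 0.12, 0.03]
c = polyfit(xs, ys, 2)
print(c, r_squared(xs, ys, c))
```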

As for the validity of the exercise, yes, it's true that with a high enough
order polynomial you can hold your nose and jam in almost anything. In this
case, my data fit well even at order 3, but the last few points on the tail
were not being honored, and that's an area I really needed to fit. 4th and
5th orders were worse, but 6th fit quite well. The goal, remember, was to
allow even resampling of the data so that I could work easily between the
two curves. So, mission accomplished, and my conscience is clear about using
that ugly old 6th-order polynomial!

Thanks again,

Robin


Tushar Mehta

Apr 25, 2001, 9:09:45 AM
I am sure that you mean well, but I can't help but smile at the prospect
of Dave Braden sitting patiently as a professor of statistics explains
things to him. There is a certain irony in the mental picture since
Dave could probably teach most professors of statistics a thing or two!

--
Regards,

Tushar Mehta
www.tushar-mehta.com
--
In <uUthRzXzAHA.1932@tkmsftngp05>, R.Venkataraman <vra...@vsnl.net>
wrote
> This is loud thinking. Fitting a trendline is a statistical problem. In
> statistics they use what is called the <least squares method>, i.e. the difference
> between the actual and predicted is squared and summed up for all the data
> points and then minimised. I hope Excel also does the same. A professor of
> statistics will be able to explain the problem and give a solution.
>
> "David J. Braden" <dbr...@rochester.rr.com> wrote in message
> news:eRmiB1CzAHA.696@tkmsftngp05...

David J. Braden

Apr 25, 2001, 6:28:54 PM
to Tushar Mehta, R.Venkataraman

OK, Tushar, I'll put aside my oceans of humility <g> and take the bait.

Mr. Venkataraman:

My impression of Robin's post was that Robin had no uncertainty about the
(x,y) points, and was using Trendline to derive a low-order smooth - and
interpolation - of the data. If this is the case, then what I suggested
makes a lot of sense, in that you have an easily-derived curve with
excellent local properties that is extremely easy to justify. I would
suggest that any curve including all the points, or passing near all the
points, might be considered, and the choice of which one necessarily is
subjective. In principle I couldn't rule out an n-1 degree polynomial, for
example, but my esthetic sense would work against it. Linear and cubic
spline interpolation are commonly used techniques, and for presentation
purposes alone shine above other ideas unless there are compelling (and
perhaps cultural) reasons for not doing so.

As far as the statistical problem you allude to, this would be more in
keeping with how Tony (not unreasonably) viewed the problem: that
y = g(x) + e, where E(e | X = x) = 0, and g is some function of x that is
sought. Yes, statisticians do use minimization of squared residuals to
estimate parameters of linear models (linear in the parameters!!), but they
also use many other techniques, especially as they gain advanced training.
Early on they move towards maximum likelihood estimation for parameters; in
the case of a linear regression, this yields identical results to the
classic least-squares approach IF the residuals are distributed iid Normal.
Yet countless real-world applications cannot assume this, so maximum
likelihood is an alternative that is likely preferable. Further, other
weighting schemes are often employed, particularly where Bayesian estimation
is concerned.

The classical approach captured by LINEST relies on a wonderful body of
elegant mathematics that makes a *lot* of assumptions for the sake of
computational tractability. That is no longer needed, so frequentists can now
use computer-intensive techniques that go far in relaxing those assumptions.
And the Bayesians among us can finally calculate the stuff we could only
dream about before. It's a wonderful world.

Finally, nonparametric estimation is definitely worth considering here if
Robin's problem is as Tony views it. Computationally, it's a snap. There
are kernel regression methods such as Nadaraya-Watson and, for a nonrandom
design, the Priestley-Chao kernel estimator. The only thing to remember is
to take care of the boundary conditions, which is very easy and very fast.
Other approaches equally simple to implement are local polynomial
regression (including loess) and spline smoothing. They have excellent
statistical properties and can reveal all sorts of structure that would be
masked by parametric models.
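[Archive note: the Nadaraya-Watson estimator mentioned above is indeed computationally trivial. A minimal Gaussian-kernel sketch in Python; the bandwidth and data are illustrative, and in practice the bandwidth would be chosen by cross-validation.]

```python
import math

def nadaraya_watson(xs, ys, x0, bandwidth):
    """Locally weighted mean of ys, with Gaussian kernel weights centered at x0."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [0.02, 0.48, 1.05, 0.96, 0.55, 0.08, -0.45]
print(nadaraya_watson(xs, ys, 1.25, 0.5))
```

Because the estimate at each point is just a weighted average of nearby y-values, no global functional form is imposed, which is what lets these methods reveal structure a single fitted polynomial would smooth away.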

The only issue remaining for me is the choice of the points xi at which the
regressions are evaluated. I have a strong gut sense that my original
suggestion does the trick. Of course, I welcome any help in better
understanding all of this, as it has been my bread and butter for a few
years now.

Regards,


Dave Braden
MVP - Excel

<snip>

mam...@gmail.com

Jul 10, 2020, 10:29:05 AM
Thanks Dave. I was having a similar problem while fitting a 5th-order polynomial to COVID data, and I stumbled upon this discussion. I should have checked this aspect, given that Microsoft exposes every item in an object model which can be independently accessed.

Jibanananda Roy