Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Calculate slope of a data set or trend line

83 views
Skip to first unread message

kux

unread,
Mar 24, 2006, 4:36:32 PM3/24/06
to
Hello everyone,
I hope someone is out here who can help me with a simple calculation...


I have a sales data base in access with monthly sales history by
product. to make future predictions I need the slope (representing
growth) of monthly sales.
the data is setup in a crosstab with the months in different collums,
say column 1 to 12 for last year. I want to calculate the slope for
every row (on the same query or another if needed)

Does anyone have an idea how to calculate the slope of the trendline
for those values? In excel I can just use =slope(known y's, known x's).
the slope i need is from the trendline in common statistics (I believe
it is calcuated by minimizing the sum of the squared deviation - OLS)

I cannot use an ordinary excel export (too many lines).

Anyone an idea how to work around that?

Help is much appreciated,

Thanks
Kai

kux

unread,
Mar 24, 2006, 4:36:40 PM3/24/06
to

tommaso....@uniroma1.it

unread,
Mar 25, 2006, 3:17:36 PM3/25/06
to
In your case, if the months are m=1,2...12 and, for each record
you have the 12 observations: y(1)...y(m)...y(12),

the slope is given by:

[ sum for m = 1 to 12 of (m-6.5) * y(m) ] / 143

PS
In the particular case when y(m) = b * m for each m,
you should get slope = b where b is any real

-tom

kux ha scritto:

CDMAP...@fortunejames.com

unread,
Mar 26, 2006, 1:52:23 AM3/26/06
to

Data points: (X1,Y1), ..., (XN, YN)

Best Fit Line: Y = AX + B

Error from the line: Ei = Yi - AXi - B

Square of each error:

Ei^2 = Yi^2 + A^2 Xi^2 + B^2 - 2A XiYi - 2B Yi + 2AB Xi

Sum of the squares of the errors:

Sum(Ei^2) = Sum(Yi^2) + A^2 Sum(Xi^2) + B^2 Sum(1) - 2A Sum(XiYi) - 2B
Sum(Yi) + 2AB Sum(Xi), i = 1,...,N

Taking the partial derivatives with respect to A and B and setting them
to 0,

w.r.t. A: 0 = 2A Sum(Xi^2) - 2 Sum(XiYi) + 2B Sum(Xi)
w.r.t. B: 0 = 2BN + 2A Sum(Xi) -2Sum(Yi)

Rearranging so that A and B are the variables:

Sum(Xi^2) A + Sum(Xi) B = Sum(XiYi)
Sum(Xi) A + N B = Sum(Yi)

as a Matrix Equation:

- - - - - -
| Sum(Xi^2) Sum(Xi) | * | A | = | Sum(XiYi) |
| Sum(Xi) N | | B | | Sum(Yi) |
- - - - - -

As long as Abs(N Sum(Xi^2) - [Sum(Xi)]^2) is not miniscule, Cramer's
rule will work without numerical instabilities:

A = [N Sum(XiYi) - Sum(Xi) Sum(Yi)] / [N Sum(Xi^2) - Sum(Xi) ^ 2]
B = [Sum(Xi) ^ 2 Sum(Yi) - Sum(Xi) Sum(XiYi)] / [N Sum(Xi^2) - Sum(Xi)
^ 2]

where each sum is performed on i = 1,...,N

So if tblData has N records of (X, Y) values:

SELECT SUM(1) * SUM(X*X) - SUM(X) * SUM(X) AS Denom, (SUM(1) * SUM(X *
Y) - SUM(X) * SUM(Y)) / Denom AS A, (SUM(X * X) * SUM(Y) - SUM(X) *
SUM(X * Y)) / Denom AS B FROM tblData;

should return unrounded values for A and B.

I didn't check the equations against a book but they seem to be giving
the correct results for a couple of simple data sets. Least squares
regression of physical data often results in miniscule Denom values.
You'll know that numerical instabilities are creeping in when the
answers you get no longer solve the 2 X 2 matrix equation. In those
cases, alternate matrix solutions such as partial or full (matrix)
pivoting during gaussian elimination, may be required. A new wrinkle
is added when the numbers are in rows. Perhaps create a public
function with 12 TotalMonthlySales arguments, that returns A (the
slope) for an N value of 12.

I hope this helps,

James A. Fortune
CDMAP...@FortuneJames.com

tommaso....@uniroma1.it

unread,
Mar 26, 2006, 12:13:22 PM3/26/06
to
In this case, since the x's are not any real, but a sequence of
natural, it is convenient to take into consideration that the sum of
the first m natural is equal to (m+1) m / 2 (which Gauss discovered
when he was a little child). Also translating the x's as to center them
on the x axis, makes the average x equal to 0. Finally one gets a very
simplified version of the formula (that one i provided). The 2 formulas
are equivalent, as to the final result (but not as computational
effort).

-tom

CDMAP...@FortuneJames.com ha scritto:

CDMAP...@fortunejames.com

unread,
Mar 27, 2006, 11:32:32 AM3/27/06
to
tommaso....@uniroma1.it wrote:
> In this case, since the x's are not any real, but a sequence of
> natural, it is convenient to take into consideration that the sum of
> the first m natural is equal to (m+1) m / 2 (which Gauss discovered
> when he was a little child). Also translating the x's as to center them
> on the x axis, makes the average x equal to 0. Finally one gets a very
> simplified version of the formula (that one i provided). The 2 formulas
> are equivalent, as to the final result (but not as computational
> effort).
>
> -tom

Tom,

I think that using the fact that the Xi are integers to simplify the
calculation is an excellent idea. I think I remember reading that
Gauss was five when he put the two triangles together to get a
rectangle. I probably should have capitalized Gaussian Elimination.
My favorite technique for more difficult summations is the technique of
finite integration using an Inverse Delta operator that acts in a way
that is reminiscent of the integral operator. This technique is shown
in:

http://www.amazon.com/gp/product/B0007H8EP4/sr=8-7/qid=1143474259/ref=sr_1_7/104-9316705-5699137?%5Fencoding=UTF8

(Calculus of Finite Differences by George Boole)

A powerful way to obtain a formula for an unknown series of integers is
to keep taking differences until you get zeroes. I think that series
of integers that are based on polynomials always reach a row of zeroes
eventually:

Sum(i^2):
1 5 14 30 55 91
4 9 6 25 36
5 7 9 11
2 2 2
0 0

The 1,4,5,2 values are used as coeffients in some kind of polynomial
expansion to get an expression for the nth value in the series. I'll
try to work out the details. You can pretend that the sequence for the
sum(i^2) is an unknown sequence and get the formula that way. The
problem here is that the resulting polynomial usually needs to be
cleaned up to get it into its "Aha, that's so obvious!" simplified
form. These approaches are much more fun than looking up the summation
formulae.

James A. Fortune
CDMAP...@FortuneJames.com

Kux

unread,
Mar 27, 2006, 2:36:37 PM3/27/06
to
Hello Tom,
thanks a lot for that formula. It works great with the data i checked.

but how do you get to 143?
I need to do similar calculations with other time frames as well (24,
36 and 48 months). I could always try it out but I would rahter enter a
formula in case someone changes the lenght of the period later...

THanks

Kai

tommaso....@uniroma1.it

unread,
Mar 27, 2006, 2:57:34 PM3/27/06
to
Very good and interesting James!!

The problem now is to write a nice and simple SQL expression which will
yield the slope for each row (record), as originally requested (here
the intercept coefficient is of no interest, as the x's can be
translated arbitrarily on their axis).

I will leave that to you, who are much more knowledgeable than I about
SQL.
I think it would be much useful and appreciated by the group.

con i migliori saluti,

tommaso

CDMAP...@FortuneJames.com ha scritto:

tommaso....@uniroma1.it

unread,
Mar 27, 2006, 4:17:45 PM3/27/06
to
Hi Kai, sorry for ignoring your second question, I missed it.

I hope I got it right (I have done some quick hand computation)...

In a more general case, where one has H consecutive equispaced time
occasions
y(1)...y(m)...y(H), the slope should be:

[ sum for m = 1 to H of (m - (H+1)/2) * y(m) ] / D

where: D = H * (H ^ 2 - 1) / 12


or, if you prefer: D = H * (H + 1) * (H - 1) / 12


-tommaso


PS
let me know if it works fine

Kux ha scritto:

Kux

unread,
Mar 27, 2006, 5:07:32 PM3/27/06
to
Hey Tom,
to be honest, I dont understand why the formula looks like it does. but
it WORKS!

...and in the meantime I found that it is also the sum for m = 1 to 12
of (m-6.5)^2. (you can replace 12 with H and 6.5 with (H+1)/2)
That holds true at least for the values I checked. I would be surprised
if that is a coincidence but could not say why it works.

thank you so much. That completely solved my problem.

kai

CDMAP...@fortunejames.com

unread,
Mar 27, 2006, 5:46:15 PM3/27/06
to
CDMAP...@FortuneJames.com wrote:

> The 1,4,5,2 values are used as coeffients in some kind of polynomial
> expansion to get an expression for the nth value in the series. I'll
> try to work out the details. You can pretend that the sequence for the
> sum(i^2) is an unknown sequence and get the formula that way. The

I think I see how to get the polynomial expansion. Consider:

1 5 14
4 9
5

The 1 contributes once, the 4 twice and the five once. Think of
Pascal's Triangle rotated clockwise 60 degrees.

f(n) = Sum(Ai * C(n-1, i)) over i = 0 to n-1

C(m,n) = m! / (m-n)! n!

Example:

A0 = 1
A1 = 4
A2 = 5
A3 = 2

1 * 1 + 4 * (n-1) + 5 * (n-1) * (n-2) / 2 + 2 * (n-1) * (n-2) * (n-3) /
6

= (2n+1) (n+1) (n) / 6

So Sum(i^2) over i = 1 to n = (2n+1) (n+1) (n) / 6.

This is the only example I checked.

James A. Fortune
CDMAP...@FortuneJames.com

0 new messages