I have a sales data base in access with monthly sales history by
product. to make future predictions I need the slope (representing
growth) of monthly sales.
the data is setup in a crosstab with the months in different collums,
say column 1 to 12 for last year. I want to calculate the slope for
every row (on the same query or another if needed)
Does anyone have an idea how to calculate the slope of the trendline
for those values? In excel I can just use =slope(known y's, known x's).
the slope i need is from the trendline in common statistics (I believe
it is calcuated by minimizing the sum of the squared deviation - OLS)
I cannot use an ordinary excel export (too many lines).
Anyone an idea how to work around that?
Help is much appreciated,
Thanks
Kai
the slope is given by:
[ sum for m = 1 to 12 of (m-6.5) * y(m) ] / 143
PS
In the particular case when y(m) = b * m for each m,
you should get slope = b where b is any real
-tom
kux ha scritto:
Data points: (X1,Y1), ..., (XN, YN)
Best Fit Line: Y = AX + B
Error from the line: Ei = Yi - AXi - B
Square of each error:
Ei^2 = Yi^2 + A^2 Xi^2 + B^2 - 2A XiYi - 2B Yi + 2AB Xi
Sum of the squares of the errors:
Sum(Ei^2) = Sum(Yi^2) + A^2 Sum(Xi^2) + B^2 Sum(1) - 2A Sum(XiYi) - 2B
Sum(Yi) + 2AB Sum(Xi), i = 1,...,N
Taking the partial derivatives with respect to A and B and setting them
to 0,
w.r.t. A: 0 = 2A Sum(Xi^2) - 2 Sum(XiYi) + 2B Sum(Xi)
w.r.t. B: 0 = 2BN + 2A Sum(Xi) -2Sum(Yi)
Rearranging so that A and B are the variables:
Sum(Xi^2) A + Sum(Xi) B = Sum(XiYi)
Sum(Xi) A + N B = Sum(Yi)
as a Matrix Equation:
- - - - - -
| Sum(Xi^2) Sum(Xi) | * | A | = | Sum(XiYi) |
| Sum(Xi) N | | B | | Sum(Yi) |
- - - - - -
As long as Abs(N Sum(Xi^2) - [Sum(Xi)]^2) is not miniscule, Cramer's
rule will work without numerical instabilities:
A = [N Sum(XiYi) - Sum(Xi) Sum(Yi)] / [N Sum(Xi^2) - Sum(Xi) ^ 2]
B = [Sum(Xi) ^ 2 Sum(Yi) - Sum(Xi) Sum(XiYi)] / [N Sum(Xi^2) - Sum(Xi)
^ 2]
where each sum is performed on i = 1,...,N
So if tblData has N records of (X, Y) values:
SELECT SUM(1) * SUM(X*X) - SUM(X) * SUM(X) AS Denom, (SUM(1) * SUM(X *
Y) - SUM(X) * SUM(Y)) / Denom AS A, (SUM(X * X) * SUM(Y) - SUM(X) *
SUM(X * Y)) / Denom AS B FROM tblData;
should return unrounded values for A and B.
I didn't check the equations against a book but they seem to be giving
the correct results for a couple of simple data sets. Least squares
regression of physical data often results in miniscule Denom values.
You'll know that numerical instabilities are creeping in when the
answers you get no longer solve the 2 X 2 matrix equation. In those
cases, alternate matrix solutions such as partial or full (matrix)
pivoting during gaussian elimination, may be required. A new wrinkle
is added when the numbers are in rows. Perhaps create a public
function with 12 TotalMonthlySales arguments, that returns A (the
slope) for an N value of 12.
I hope this helps,
James A. Fortune
CDMAP...@FortuneJames.com
-tom
CDMAP...@FortuneJames.com ha scritto:
Tom,
I think that using the fact that the Xi are integers to simplify the
calculation is an excellent idea. I think I remember reading that
Gauss was five when he put the two triangles together to get a
rectangle. I probably should have capitalized Gaussian Elimination.
My favorite technique for more difficult summations is the technique of
finite integration using an Inverse Delta operator that acts in a way
that is reminiscent of the integral operator. This technique is shown
in:
(Calculus of Finite Differences by George Boole)
A powerful way to obtain a formula for an unknown series of integers is
to keep taking differences until you get zeroes. I think that series
of integers that are based on polynomials always reach a row of zeroes
eventually:
Sum(i^2):
1 5 14 30 55 91
4 9 6 25 36
5 7 9 11
2 2 2
0 0
The 1,4,5,2 values are used as coeffients in some kind of polynomial
expansion to get an expression for the nth value in the series. I'll
try to work out the details. You can pretend that the sequence for the
sum(i^2) is an unknown sequence and get the formula that way. The
problem here is that the resulting polynomial usually needs to be
cleaned up to get it into its "Aha, that's so obvious!" simplified
form. These approaches are much more fun than looking up the summation
formulae.
James A. Fortune
CDMAP...@FortuneJames.com
but how do you get to 143?
I need to do similar calculations with other time frames as well (24,
36 and 48 months). I could always try it out but I would rahter enter a
formula in case someone changes the lenght of the period later...
THanks
Kai
The problem now is to write a nice and simple SQL expression which will
yield the slope for each row (record), as originally requested (here
the intercept coefficient is of no interest, as the x's can be
translated arbitrarily on their axis).
I will leave that to you, who are much more knowledgeable than I about
SQL.
I think it would be much useful and appreciated by the group.
con i migliori saluti,
tommaso
CDMAP...@FortuneJames.com ha scritto:
I hope I got it right (I have done some quick hand computation)...
In a more general case, where one has H consecutive equispaced time
occasions
y(1)...y(m)...y(H), the slope should be:
[ sum for m = 1 to H of (m - (H+1)/2) * y(m) ] / D
where: D = H * (H ^ 2 - 1) / 12
or, if you prefer: D = H * (H + 1) * (H - 1) / 12
-tommaso
PS
let me know if it works fine
Kux ha scritto:
...and in the meantime I found that it is also the sum for m = 1 to 12
of (m-6.5)^2. (you can replace 12 with H and 6.5 with (H+1)/2)
That holds true at least for the values I checked. I would be surprised
if that is a coincidence but could not say why it works.
thank you so much. That completely solved my problem.
kai
> The 1,4,5,2 values are used as coeffients in some kind of polynomial
> expansion to get an expression for the nth value in the series. I'll
> try to work out the details. You can pretend that the sequence for the
> sum(i^2) is an unknown sequence and get the formula that way. The
I think I see how to get the polynomial expansion. Consider:
1 5 14
4 9
5
The 1 contributes once, the 4 twice and the five once. Think of
Pascal's Triangle rotated clockwise 60 degrees.
f(n) = Sum(Ai * C(n-1, i)) over i = 0 to n-1
C(m,n) = m! / (m-n)! n!
Example:
A0 = 1
A1 = 4
A2 = 5
A3 = 2
1 * 1 + 4 * (n-1) + 5 * (n-1) * (n-2) / 2 + 2 * (n-1) * (n-2) * (n-3) /
6
= (2n+1) (n+1) (n) / 6
So Sum(i^2) over i = 1 to n = (2n+1) (n+1) (n) / 6.
This is the only example I checked.
James A. Fortune
CDMAP...@FortuneJames.com