Re: Derivation of James' Pythagorean Formula (Long)

iosx...@gmail.com

Dec 5, 2012, 7:10:04 AM
On Monday, July 14, 2003 at 10:06:02 PM UTC+3, Hein Hundal wrote:
> I tried to post this article earlier today, but it didn't seem to get
> posted. Here is my second try.
>
>
>
>
> Bill James developed the "Pythagorean" win formula to estimate the
> number of wins a baseball team should have in one season given the
> runs scored (RS) and runs against (RA). His surprisingly accurate
> formula,
>
> estimated win percent = RS^1.83/(RA^1.83 + RS^1.83)
>
> was empirically derived
> (http://www.baseball.reference.com/about/faq.shtml). This note
> describes a derivation for the formula and provides a formula for a
> good exponent in the Pythagorean win formula.
>
>
> DERIVATION OF WIN PERCENT
>
> One way to derive a win formula from runs is to assume a distribution
> for the runs scored by each team, assume the team run distributions
> are independent so that you can multiply them to get a joint
> distribution of runs, and then sum the probabilities of winning
> situations. Jim Ferry used this procedure to get a win formula by
> assuming a Poisson distribution for the runs scored (sci.math,
> rec.puzzles 2003-06-24). The Poisson distribution is natural because
> it produces probabilities for all the possible runs: 0, 1, 2, 3, ....
> Ferry notes that the Poisson distribution does not fit the actual runs
> distribution in baseball, but he is able to work out a formula for win
> percentage based on this distribution.
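
The procedure described above (assume a run distribution for each team, multiply, and sum over the winning outcomes) can be sketched for the Poisson case as follows. This is only an illustration of the summation, not Ferry's actual formula, which is not reproduced here. The function names, the truncation at `max_runs`, and the even split of ties are all assumptions made for this sketch.

```python
import math


def poisson_pmf(k, lam):
    """P(K = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_win_prob(rs_per_game, ra_per_game, max_runs=60):
    """Sum the joint probabilities of all winning scores.

    Assumption: tied scores (which baseball resolves in extra innings)
    are split evenly between the two teams.
    """
    pmf_for = [poisson_pmf(k, rs_per_game) for k in range(max_runs + 1)]
    pmf_against = [poisson_pmf(k, ra_per_game) for k in range(max_runs + 1)]

    win = 0.0
    tie = 0.0
    opp_below = 0.0  # running P(opponent scores fewer than k runs)
    for k in range(max_runs + 1):
        win += pmf_for[k] * opp_below       # we score k, opponent scores < k
        tie += pmf_for[k] * pmf_against[k]  # both teams score exactly k
        opp_below += pmf_against[k]
    return win + 0.5 * tie
```

Calling `poisson_win_prob(5.0, 4.0)` gives the estimated win fraction for a team averaging 5 runs scored and 4 runs allowed per game under these assumptions.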
>
>
> There are several other common distributions used in statistics: the
> binomial, normal, lognormal, and Rayleigh could all be used to model
> runs scored, and thus each distribution could produce an estimated win
> formula. One of these distributions, the lognormal, produces a win
> formula which almost exactly matches James's Pythagorean formula.
> (The lognormal win formula differs from the Pythagorean by less than
> 0.0003 over the typical range of RS and RA).
>
>
>
> If you assume that the runs for each game are produced by independent
> continuous distributions with density functions Dx[x] and Dy[y] for
> each team, then the win percent is simply
>
> winper = Integrate[ Integrate[ Dx[x]*Dy[y], {y, 0, x}], {x, 0, Infinity}]
>
> (The notation Integrate[ f[x], {x,a,b}] means the integral of f[x]
> from x=a to x=b.)
>
> If we substitute lognormal distributions with a common log standard
> deviation sigma, then the resulting integral is
>
>
> winper = Integrate[ Integrate[ 1/(2 x y sigma^2 Pi) *
> Exp[ -(Log[x/mux]^2 + Log[y/muy]^2)/(2 sigma^2) ],
> {y, 0, x}], {x,0, Infinity}].
>
> where 'mux' is the geometric average of the first team's runs and
> 'muy' is the geometric average of the opposing team's runs. Changing
> variables to a = Log[x] and b = Log[y] gives
>
> winper = Integrate[ Integrate[ 1/(2 sigma^2 Pi) *
> Exp[ -((a- Log[mux])^2 + (b - Log[muy])^2)/(2 sigma^2) ],
> {b, -Infinity, a}], {a,-Infinity, Infinity}].
>
> Another change of variable to delta = a-b, followed by integrating
> out the remaining variable, gives
>
> winper = Integrate[ 1/(2 sigma Sqrt[Pi]) *
> Exp[-(delta-Log[mux/muy])^2 / (4 sigma^2)],
> {delta, 0, Infinity}]
>
> which simplifies to
>
> winper = 1/2 * (1 + Erf[ Log[mux/muy]/ (2 sigma)] )
>
> which we will call the LogNormal Formula.
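
As a sanity check on the closed form above, here is a minimal Python sketch that evaluates the LogNormal Formula and compares it with a simulation of independent lognormal scores. The function names, the number of simulated games, and the seed are illustrative choices, not part of the original derivation.

```python
import math
import random


def lognormal_winper(mux, muy, sigma):
    """LogNormal Formula: 1/2 * (1 + Erf[Log[mux/muy] / (2 sigma)])."""
    return 0.5 * (1.0 + math.erf(math.log(mux / muy) / (2.0 * sigma)))


def simulated_winper(mux, muy, sigma, games=100_000, seed=0):
    """Fraction of simulated games in which team x outscores team y.

    Each score is drawn from a lognormal distribution whose log has
    mean Log[mu] and standard deviation sigma, matching the integral
    in the text. Scores are continuous, so ties have probability zero.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(games):
        x = rng.lognormvariate(math.log(mux), sigma)
        y = rng.lognormvariate(math.log(muy), sigma)
        if x > y:
            wins += 1
    return wins / games
```

With enough simulated games, `simulated_winper(5.0, 4.0, 0.617)` should land close to `lognormal_winper(5.0, 4.0, 0.617)`, up to sampling noise.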
>
>
>
>
> COMPARISON OF JAMES PYTHAGOREAN FORMULA AND THE LOGNORMAL FORMULA
>
>
> If we approximate mux/muy by RS/RA and call that r, then the LogNormal
> Formula becomes
>
> winperlognorm = 1/2 * (1 + Erf[ Log[r]/ (2 sigma)] ).
>
> Rewriting the Pythagorean formula in terms of r and letting c be the
> James Pythagorean exponent gives
>
> winperpythag = r^c/(r^c + 1).
>
> Computing the first three terms of the Taylor series of each about r=1:
>
> winperlognorm = 1/2 + (r-1)/(2 Sqrt[Pi] sigma) - (r-1)^2/(4 Sqrt[Pi] sigma) + ...
>
> winperpythag = 1/2 + c (r-1)/4 - c (r-1)^2/8 + ...
>
> Notice that the first three terms of the Taylor series will be equal
> if
>
> James Pythagorean Exponent = c = 2/(Sqrt[Pi] sigma)
>
>
>
> where sigma is the standard deviation of the log of runs scored.
> It can be approximated by
>
> sigma ~= StandardDeviation(runs scored)/average(runs scored).
>
>
> For baseball, sigma = 0.617 gives c = 2/(Sqrt[Pi] sigma) ~= 1.83, and
> the two formulas give almost identical results. (They differ by less
> than 0.0003 between r=.8 and r=1.2.) The formulas also match exactly
> at r = 0, r = 1, and r = Infinity.
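
The closeness of the two formulas can be checked numerically. The sketch below scans r over a grid and reports the largest absolute gap between the LogNormal and Pythagorean formulas, with c tied to sigma by the exponent formula. The function names, grid size, and default range are assumptions chosen for illustration.

```python
import math


def winper_lognormal(r, sigma):
    """1/2 * (1 + Erf[Log[r] / (2 sigma)]) with r = RS/RA."""
    return 0.5 * (1.0 + math.erf(math.log(r) / (2.0 * sigma)))


def winper_pythagorean(r, c):
    """r^c / (r^c + 1) with r = RS/RA."""
    return r ** c / (r ** c + 1.0)


def max_abs_difference(sigma, r_lo=0.8, r_hi=1.2, steps=400):
    """Largest gap between the two formulas on an even grid of r values."""
    c = 2.0 / (math.sqrt(math.pi) * sigma)  # exponent formula from the text
    worst = 0.0
    for i in range(steps + 1):
        r = r_lo + (r_hi - r_lo) * i / steps
        gap = abs(winper_lognormal(r, sigma) - winper_pythagorean(r, c))
        worst = max(worst, gap)
    return worst
```

`max_abs_difference(0.617)` returns the worst-case gap over r between 0.8 and 1.2, which can be compared against the bound quoted above.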
>
> The Pythagorean Exponent formula also explains why a higher value for
> c is required to estimate win percent from basketball scores. In
> basketball the standard deviation of points scored divided by the
> average points scored is closer to sigma = 0.1, which gives a much
> higher value for c.
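
To make the baseball and basketball comparison concrete, here is a small Python sketch of the exponent formula together with the suggested approximation of sigma from raw scores. The function names are illustrative, and the use of the population standard deviation is an assumption, since the text does not say which estimator is meant.

```python
import math
import statistics


def pythagorean_exponent(sigma):
    """c = 2 / (Sqrt[Pi] sigma)."""
    return 2.0 / (math.sqrt(math.pi) * sigma)


def sigma_from_scores(scores):
    """Approximate sigma as StandardDeviation(scores) / average(scores).

    Assumption: the population standard deviation is used here.
    """
    return statistics.pstdev(scores) / statistics.mean(scores)


baseball_c = pythagorean_exponent(0.617)   # roughly 1.83
basketball_c = pythagorean_exponent(0.1)   # roughly 11.3
```

Given a list of per-game scores for a league, `pythagorean_exponent(sigma_from_scores(scores))` would give a candidate exponent for that sport under the lognormal assumption.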
>
>
> SUMMARY
>
> James's Pythagorean formula
>
> winper = RS^c/(RS^c + RA^c)
>
> is almost exactly the same as the win percent formula derived from the
> assumption of independent lognormal run distributions. These formulas
> are related by the exponent formula
>
> James Pythagorean Exponent = c = 2/(Sqrt[Pi] sigma)
>
> where sigma is the standard deviation of the log of runs, which is
> approximately equal to the standard deviation of runs scored divided
> by the average number of runs scored.

WE NEED TO TALK IMMEDIATELY! CONTACT ME: SKYPE ID xaralambos.iosifidis