Derivation of James' Pythagorean Formula (Long)
Newsgroups: rec.puzzles, sci.math, rec.sports.baseball.analysis
From: Hein Hundal <hunda...@aldelphia.net>
Date: Mon, 14 Jul 2003 19:06:27 GMT
Local: Mon, Jul 14 2003 3:06 pm
Subject: Derivation of James' Pythagorean Formula (Long)
I tried to post this article earlier today, but it didn't seem to get
posted.  Here is my second try.

Bill James developed the "Pythagorean" win formula to estimate the
number of wins a baseball team should have in one season given the
runs scored (RS) and runs against (RA).  His surprisingly accurate
formula,

estimated win percent =   RS^1.83/(RA^1.83 + RS^1.83)

was empirically derived
(http://www.baseball.reference.com/about/faq.shtml).  This note
describes a derivation for the formula and provides a formula for a
good exponent in the Pythagorean win formula.
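
As a quick illustration of the formula above, here is a minimal Python sketch.  The function name and the season totals (800 runs scored, 700 against, 162 games) are made up for the example; only the formula and the exponent 1.83 come from the text.

```python
def pythagorean_win_pct(rs, ra, c=1.83):
    """Estimated win fraction from runs scored (rs) and runs against (ra).

    c is the Pythagorean exponent; 1.83 is the value quoted in the post.
    """
    if rs < 0 or ra < 0 or (rs == 0 and ra == 0):
        raise ValueError("rs and ra must be non-negative and not both zero")
    return rs**c / (rs**c + ra**c)


# Hypothetical season totals, chosen only for illustration.
est = pythagorean_win_pct(800, 700)
print(f"estimated win percent: {est:.3f}")
print(f"estimated wins over 162 games: {162 * est:.1f}")
```

A team that scores exactly as many runs as it allows comes out at 0.5 for any exponent.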

DERIVATION OF WIN PERCENT

One way to derive a win formula from runs is to assume a distribution
for the runs scored by each team, assume the team run distributions
are independent so that you can multiply them to get a joint
distribution of runs, and then sum the probabilities of winning
situations.  Jim Ferry used this procedure to get a win formula by
assuming a Poisson distribution for the runs scored (sci.math,
rec.puzzles 2003-06-24).  The Poisson distribution is natural because
it produces probabilities for all the possible runs: 0, 1, 2, 3, ....
Ferry notes that the Poisson distribution does not fit the actual runs
distribution in baseball, but he is able to work out a formula for win
percentage based on this distribution.
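
The "multiply the distributions and sum the winning situations" procedure can be sketched for the Poisson case as follows.  This is not Ferry's code or his closed-form result: the function names, the truncation at 60 runs, and the treatment of ties (a tied game is assumed to be replayed, so the result is conditioned on no tie) are all my own assumptions.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k runs when runs are Poisson with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_win_prob(lam_x, lam_y, max_runs=60):
    """P(team X outscores team Y) for independent Poisson run totals.

    Assumption (not from the post): ties are replayed, so the answer is
    P(x > y) / (1 - P(x == y)).  Sums are truncated at max_runs.
    """
    px = [poisson_pmf(k, lam_x) for k in range(max_runs + 1)]
    py = [poisson_pmf(k, lam_y) for k in range(max_runs + 1)]
    win = 0.0
    tie = 0.0
    below = 0.0  # running total of P(y < x) as x increases
    for x in range(max_runs + 1):
        win += px[x] * below   # X scores x, Y scores fewer
        tie += px[x] * py[x]   # both score x
        below += py[x]
    return win / (1.0 - tie)


# Hypothetical per-game averages, for illustration only.
print(f"{poisson_win_prob(5.0, 4.0):.3f}")
```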

There are several other common distributions used in statistics:  the
binomial, normal, lognormal, and Rayleigh could all be used to model
runs scored, and each distribution would produce its own estimated win
formula.  One of these distributions, the lognormal, produces a win
formula which almost exactly matches James's Pythagorean formula.
(The lognormal win formula differs from the Pythagorean by less than
0.0003 over the typical range of RS and RA.)

If you assume that the runs for each game are produced by independent
continuous distributions with density functions Dx[x] and Dy[y] for
each team, then the win percent is simply

winper = Integrate[ Integrate[ Dx[x]*Dy[y], {y, 0, x}],
             {x, 0, Infinity}]

(The notation Integrate[ f[x], {x,a,b}] means the integral of f[x]
from x=a to x=b.)

If we substitute lognormal distributions, then the resulting integral is

winper = Integrate[ Integrate[ 1/(2 x y sigma^2 Pi) *
             Exp[ -(Log[x/mux]^2 + Log[y/muy]^2)/(2 sigma^2) ],
             {y, 0, x}], {x,0, Infinity}].

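where 'mux' is the geometric average of the first team's runs, 'muy'
is the geometric average of the other teams' runs, and 'sigma' is the
standard deviation of the log of runs (taken to be the same for both
distributions).  Changing variables to a = Log[x] and b = Log[y] gives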

winper = Integrate[ Integrate[ 1/(2 sigma^2 Pi) *
             Exp[ -((a- Log[mux])^2 + (b - Log[muy])^2)/(2 sigma^2) ],
             {b, -Infinity, a}], {a,-Infinity, Infinity}].

Another change of variable to delta = a-b and integrating gives

winper = Integrate[ 1/(2 sigma Sqrt[Pi]) *
            Exp[-(delta-Log[mux/muy])^2 / (4 sigma^2)],
            {delta, 0, Infinity}]

which simplifies to

winper  = 1/2 * (1 + Erf[ Log[mux/muy]/ (2 sigma)] )

which we will call the LogNormal Formula.
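
The LogNormal Formula can be sanity-checked by simulation.  The sketch below is a rough check under the same assumptions as the derivation (independent lognormal runs with a common sigma); the function names, the sample size, and the example values mux = 5, muy = 4 are mine, not from the derivation.

```python
import math
import random


def lognormal_win_pct(mux, muy, sigma):
    """Closed form: 1/2 * (1 + Erf[ Log[mux/muy] / (2 sigma) ])."""
    return 0.5 * (1.0 + math.erf(math.log(mux / muy) / (2.0 * sigma)))


def simulate_win_pct(mux, muy, sigma, games=100_000, seed=0):
    """Fraction of simulated games in which x > y.

    lognormvariate(mu, sigma) takes the mean of the underlying normal,
    so the geometric average mux enters as log(mux).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(games):
        x = rng.lognormvariate(math.log(mux), sigma)
        y = rng.lognormvariate(math.log(muy), sigma)
        if x > y:
            wins += 1
    return wins / games


# Hypothetical geometric averages, for illustration only.
print(lognormal_win_pct(5.0, 4.0, 0.617))
print(simulate_win_pct(5.0, 4.0, 0.617))
```

The two printed numbers should agree to within sampling noise, which is on the order of 0.002 for 100,000 simulated games.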

COMPARISON OF JAMES PYTHAGOREAN FORMULA AND THE LOGNORMAL FORMULA

If we approximate mux/muy by RS/RA and call that r, then the LogNormal
Formula becomes

winperlognorm  = 1/2 * (1 + Erf[ Log[r]/ (2 sigma)] ).

Rewriting the Pythagorean formula in terms of r and letting c be the
James Pythagorean exponent gives

winperpythag  = r^c/(r^c + 1).

Computing the first three terms of the Taylor series of each about
r = 1:

winperlognorm = 1/2 + (r-1)/(2 Sqrt[Pi] sigma)
                    - (r-1)^2/(4 Sqrt[Pi] sigma) + ...

winperpythag  = 1/2 + c (r-1)/4 - c (r-1)^2/8 + ...
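
The Taylor coefficients can be checked numerically with finite differences.  This is only a sanity check, not part of the derivation; the helper name, the step size h, and the trial values of sigma and c are my own choices.

```python
import math


def lognormal_win_pct(r, sigma):
    """LogNormal Formula in terms of r = RS/RA."""
    return 0.5 * (1.0 + math.erf(math.log(r) / (2.0 * sigma)))


def pythagorean_win_pct(r, c):
    """Pythagorean formula in terms of r = RS/RA."""
    return r**c / (r**c + 1.0)


def taylor_coeffs_at_one(f, h=1e-3):
    """Central-difference estimates of the constant, (r-1) and (r-1)^2
    Taylor coefficients of f about r = 1."""
    f0, fp, fm = f(1.0), f(1.0 + h), f(1.0 - h)
    linear = (fp - fm) / (2.0 * h)
    quadratic = (fp - 2.0 * f0 + fm) / (2.0 * h * h)
    return f0, linear, quadratic


sigma, c = 0.617, 1.83  # trial values
root_pi = math.sqrt(math.pi)
print(taylor_coeffs_at_one(lambda r: lognormal_win_pct(r, sigma)))
print((0.5, 1.0 / (2.0 * root_pi * sigma), -1.0 / (4.0 * root_pi * sigma)))
print(taylor_coeffs_at_one(lambda r: pythagorean_win_pct(r, c)))
print((0.5, c / 4.0, -c / 8.0))
```

Each numerical triple should agree with the analytic triple printed after it to several decimal places.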

Notice that the first three terms of the Taylor series will be equal
if

James Pythagorean Exponent = c = 2/(Sqrt[Pi] sigma)

where sigma is the standard deviation of the log of runs scored.
sigma can be approximated by

sigma ~=  StandardDeviation(runs scored)/average(runs scored).
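
A minimal sketch of both ways of estimating sigma from per-game run totals follows.  The run totals are invented for illustration, the function names are mine, and two details are assumptions not settled by the text: the population standard deviation is used, and shutouts (0 runs) are dropped before taking logs because Log[0] is undefined.

```python
import math
import statistics

# Hypothetical per-game run totals, made up for illustration only.
runs = [3, 5, 2, 7, 4, 1, 6, 4, 9, 2, 5, 3]


def sigma_from_ratio(runs):
    """StandardDeviation(runs scored) / average(runs scored)."""
    return statistics.pstdev(runs) / statistics.mean(runs)


def sigma_from_logs(runs):
    """Standard deviation of Log[runs].

    Assumption: games with 0 runs are skipped, since they have no log.
    """
    logs = [math.log(r) for r in runs if r > 0]
    return statistics.pstdev(logs)


print(sigma_from_ratio(runs))
print(sigma_from_logs(runs))
```

The ratio form does not change if every score is multiplied by a constant, which is what you want from a quantity that feeds a formula depending only on the ratio RS/RA.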

For baseball, sigma = 0.617 gives c = 2/(Sqrt[Pi] sigma) ~= 1.83, and
with these values the two formulas give almost identical results.
(They differ by less than 0.0003 between r = 0.8 and r = 1.2.)  The
formulas also match exactly at r = 0, r = 1, and r = Infinity.
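
The size of the gap between the two formulas can be checked with a short scan over r.  This is a sketch only; the function names and the number of grid points are my own choices, while sigma = 0.617 and the range r = 0.8 to 1.2 come from the text.

```python
import math


def pythagorean_exponent(sigma):
    """c = 2 / (Sqrt[Pi] sigma)."""
    return 2.0 / (math.sqrt(math.pi) * sigma)


def lognormal_win_pct(r, sigma):
    return 0.5 * (1.0 + math.erf(math.log(r) / (2.0 * sigma)))


def pythagorean_win_pct(r, c):
    return r**c / (r**c + 1.0)


def max_gap(sigma, c, r_lo=0.8, r_hi=1.2, steps=400):
    """Largest absolute difference between the two formulas on a grid."""
    worst = 0.0
    for i in range(steps + 1):
        r = r_lo + (r_hi - r_lo) * i / steps
        gap = abs(lognormal_win_pct(r, sigma) - pythagorean_win_pct(r, c))
        worst = max(worst, gap)
    return worst


sigma = 0.617  # baseball value quoted in the text
c = pythagorean_exponent(sigma)
print(f"c = {c:.4f}")
print(f"max gap on [0.8, 1.2] = {max_gap(sigma, c):.6f}")
```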

The Pythagorean Exponent formula also explains why a higher value for
c is required to estimate win percent from basketball scores.  In
basketball the standard deviation of points scored divided by the
average points scored is closer to sigma = 0.1, which gives a much
higher value for c (about 11.3).
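
To see what the larger exponent does in practice, the sketch below applies the exponent formula to both sports for a team that outscores its opponents by 5 percent.  The 5 percent scoring ratio and the function names are hypothetical; the sigma values are the ones quoted above.

```python
import math


def pythagorean_exponent(sigma):
    """c = 2 / (Sqrt[Pi] sigma)."""
    return 2.0 / (math.sqrt(math.pi) * sigma)


def pythagorean_win_pct(r, c):
    """Pythagorean formula in terms of the scoring ratio r."""
    return r**c / (r**c + 1.0)


r = 1.05  # hypothetical scoring ratio, for illustration only
for sport, sigma in (("baseball", 0.617), ("basketball", 0.1)):
    c = pythagorean_exponent(sigma)
    print(f"{sport}: sigma = {sigma}, c = {c:.2f}, "
          f"win percent at r = {r}: {pythagorean_win_pct(r, c):.3f}")
```

The same scoring edge is worth more wins when scores vary less from game to game, which is why the basketball exponent has to be larger.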

SUMMARY

James's Pythagorean formula

winper = RS^c/(RS^c + RA^c)

is almost exactly the same as the win percent formula derived from the
assumption of independent lognormal run distributions.  These formulas
are related by the exponent formula

James Pythagorean Exponent = c = 2/(Sqrt[Pi] sigma)

where sigma is the standard deviation of the log of runs, which is
approximately equal to the standard deviation of runs scored divided
by the average number of runs scored.


 
Newsgroups: rec.puzzles
From: iosxa...@gmail.com
Date: Wed, 5 Dec 2012 04:10:04 -0800 (PST)
Local: Wed, Dec 5 2012 7:10 am
Subject: Re: Derivation of James' Pythagorean Formula (Long)
On Monday, 14 July 2003 at 10:06:02 PM UTC+3, user Hein Hundal wrote:

WE NEED TO TALK IMMEDIATELY!!!!!!!!!!!!!!!!!!! CONTACT ME SKYPE ID xaralambos.iosifidis

 