Base Runs: a new run estimation formula

David Smyth

Dec 30, 1999, 3:00:00 AM

BASE RUNS

There are three main categories of run estimation formulas.

1) Multiplicative formulas: The primary example is Bill James's original
runs created. These formulas attempt, in effect, to answer the question: How
many runs would a team lineup of nine such batters score?

2) Linear formulas: The most well-known is Palmer's Batting Runs. A recent
entry is Furtado's eXtrapolated Runs. Linear formulas attempt, in effect, to
answer the question: How many runs would the batter add to a large league?

3) Team Context methods: Here we have the new runs created and Tate's
Marginal Lineup Value. These methods attempt, in effect, to answer the
question: How many runs would the batter add to an average team?

A fourth category is systems which use play-by-play data to generate run
estimates which are completely situational. An example is Ruane's Value
Added.

Contrary to what others have written, none of these approaches is inherently
superior, because none of the underlying questions is inherently superior.
They differ primarily in the emphasis on real-world value vs. abstract
isolated ability. The choice depends on the needs, goals, and preferences of
the informed user. It's good to have all of these approaches.

It would also be good to have a multiplicative formula which really does
what it's supposed to do. For example, consider a team which comes out and
hits 500 consecutive home runs. How many runs would they score? The answer
is 500, of course, but runs created says a whopping 2000 runs. An error of
this magnitude couldn't happen unless there were a glaring flaw in
construction. It wasn't difficult to find.
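As a quick check of the 2000-run figure, here is a minimal Python sketch of the basic runs created formula. It assumes the standard basic version, (H + BB) x TB / (AB + BB), which the post does not spell out; the function name and the batting line for the homer team are illustrative only.

```python
def runs_created_basic(h: int, bb: int, tb: int, ab: int) -> float:
    """Basic runs created, A x B / C (assumed standard basic form)."""
    a = h + bb   # A: baserunners
    b = tb       # B: advancement (total bases)
    c = ab + bb  # C: opportunity
    return a * b / c


# Hypothetical line for the 500-consecutive-homer team: no walks,
# 500 at-bats, 500 hits, 4 total bases per hit.
print(runs_created_basic(h=500, bb=0, tb=2000, ab=500))  # 2000.0
```

Every hit is also a baserunner who is credited with four bases of advancement, so the formula scores each homer four times instead of once.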

Runs created, as you know, consists of A times B divided by C, where A is
baserunners, B is advancement, and C is opportunity. Since A is baserunners,
in order to arrive at runs, the remaining B/C would have to be the
proportion of baserunners who score. But B/C is not in the form of a
proportion. Using the same symbols, a proportion would be written as
B/(B+C). Unfortunately, the revision doesn't work using the same A, B, and C
factors as in runs created. I pretty much had to start from scratch. A
further refinement was to take out one run from the equation body for each
home run. It is entirely appropriate to apply this technique to individuals
because the batter scoring on a homer is an individual act.
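The sketch below, again assuming the basic runs created factors (A = H + BB, B = TB, C = AB + BB), illustrates both points for the 500-homer team: B/C is not a proportion, and simply swapping in B/(B+C) while keeping the old factors still misses the true total of 500. The helper name is hypothetical.

```python
def rc_factors(h: int, bb: int, tb: int, ab: int) -> tuple[int, int, int]:
    """A, B, C of basic runs created: baserunners, advancement, opportunity."""
    return h + bb, tb, ab + bb


a, b, c = rc_factors(h=500, bb=0, tb=2000, ab=500)
ratio = b / c             # 4.0: "each baserunner scores four times"
proportion = b / (b + c)  # 0.8: a true proportion, below 1 whenever C > 0
print(ratio, proportion, a * proportion)  # 4.0 0.8 400.0
```

The proportion form fixes the shape of the formula, but with the old factors it now underestimates (400 instead of 500), which is why new A, B, and C components were needed.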

The end result is called Base Runs. There are four components:

A = H + BB - IBB - HR - CS
B = 1.39TB - 0.58H - 2.8HR + 0.19BB - 0.19IBB + 1.2SB
C = AB - H
D = HR

The components are put together as follows:

Base Runs = A x B / (B + C) + D

In words, you multiply A times B, then you divide this by the sum of B and
C, and then you add this total to D. In terms of what they represent, A is
baserunners, B is baserunner advancement, C is inning advancement, and D is
guaranteed runs. The formula structure portrays the team's struggle to score
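its baserunners before the inning ends. At extreme levels of offensive
productivity, the formula makes the proper adjustments and provides a
reasonable estimate. For the 500-consecutive-homer team, every hit is a home
run, so A = 0 and C = 0, and Base Runs reduces to D = 500.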
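The formula translates directly into code. Below is a minimal Python sketch using the weights as posted; the `BattingLine` container and the `base_runs` name are hypothetical, and the guard for B + C = 0 (which only arises for an empty line) is an added assumption rather than part of the post. The usage line checks the 500-consecutive-homer team.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BattingLine:
    """Hypothetical container for the counting stats Base Runs uses."""
    ab: int
    h: int
    tb: int
    hr: int
    bb: int = 0
    ibb: int = 0
    sb: int = 0
    cs: int = 0


def base_runs(s: BattingLine) -> float:
    """Base Runs = A x B / (B + C) + D, with the weights from the post."""
    a = s.h + s.bb - s.ibb - s.hr - s.cs              # A: baserunners
    b = (1.39 * s.tb - 0.58 * s.h - 2.8 * s.hr
         + 0.19 * s.bb - 0.19 * s.ibb + 1.2 * s.sb)   # B: baserunner advancement
    c = s.ab - s.h                                    # C: inning advancement
    d = s.hr                                          # D: guaranteed runs
    if b + c == 0:  # assumed guard: no advancement and no outs (empty line)
        return float(d)
    return a * b / (b + c) + d


print(base_runs(BattingLine(ab=500, h=500, tb=2000, hr=500)))  # 500.0
```

Because A and C are both zero for that team, the A x B / (B + C) term vanishes and only the guaranteed runs in D remain.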

Extreme levels are a valuable check, but in order to be useful a run
estimator must be competitively accurate for real major league teams. Jim
Furtado of the BBBA group was kind enough to add Base Runs to his accuracy
testing program. For the period 1970-1998 Base Runs has essentially the same
accuracy as runs created tech-1 and eXtrapolated Runs Basic. Since Base Runs
contains fewer statistical categories than these other formulas, this
performance is quite satisfactory. The question of which categories to
include has many different aspects. After weighing them all, I settled on
the categories listed above.

Here are the Base Runs leaders for 1999.

1999 A.L.
 1) Jeter-----------145
 2) Ramirez---------138
 3) Palmeiro--------137
 4) Alomar----------135
 5) Giambi----------132
 6) B. Williams-----130
 7) Belle-----------128
 8) Green-----------128
 9) Griffey---------126
10) E. Martinez-----124

1999 N.L.
 1) C. Jones--------154
 2) Bagwell---------152
 3) McGwire---------143
 4) Abreu-----------137
 5) Sosa------------134
 6) Giles-----------129
 7) L. Gonzalez-----125
 8) Olerud----------122
 9) Alfonzo---------122
10) Helton----------122

Compared to runs created, these totals are much more similar to those from a
linear formula or a team context method. Bill James wrote that the reason
runs created overestimates runs for the Babe Ruth type hitter is that it
allows his slugging to interact with his own on-base. The truth is that this
factor is only responsible for a minor portion of the discrepancy. The
remaining part is simply the result of faulty construction.

Here are some comparisons for Ruth's 1920 season. Forget, for the moment,
the problem of applicability to the 1920s. Also, I estimated the missing
data as IBB=13, SF=5, GDP=7, and SO=150.

XR = 170
new RC = 177
Base Runs = 182
RC tech-1 = 213

I believe that this is correct: Ruth added around 177 runs to his
team, and a team of Ruths would have scored around 182 runs. The reason
that these numbers are so close to each other is that the interaction of his
SLG with his OBA is not a complete positive for Ruth in Base Runs. As OBA
goes up, the relative value of SLG goes down. This is the reason why a home
run is worth only one run to the 500 consecutive homer team, instead of the
typical value of around 1.4 runs.
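To see the home-run effect numerically, the sketch below (hypothetical names again) estimates a home run's marginal Base Runs value by finite difference: add one HR, with its at-bat, hit, and four total bases, and take the difference. For the 500-homer team the result is exactly one run, since A and C stay at zero. For any line where A and C are both positive, the homer's net contribution to B (1.39 x 4 - 0.58 - 2.8 = 2.18) raises the A x B / (B + C) term, so the value comes out above one. The "typical" line below is invented for illustration and is not real data.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BattingLine:
    """Hypothetical container for the counting stats Base Runs uses."""
    ab: int
    h: int
    tb: int
    hr: int
    bb: int = 0
    ibb: int = 0
    sb: int = 0
    cs: int = 0


def base_runs(s: BattingLine) -> float:
    """Base Runs = A x B / (B + C) + D, with the weights from the post."""
    a = s.h + s.bb - s.ibb - s.hr - s.cs
    b = (1.39 * s.tb - 0.58 * s.h - 2.8 * s.hr
         + 0.19 * s.bb - 0.19 * s.ibb + 1.2 * s.sb)
    c = s.ab - s.h
    d = s.hr
    if b + c == 0:  # assumed guard: no advancement and no outs
        return float(d)
    return a * b / (b + c) + d


def marginal_hr_value(s: BattingLine) -> float:
    """Base Runs gained by adding one home run (an AB, a hit, 4 TB, a HR)."""
    plus_one = replace(s, ab=s.ab + 1, h=s.h + 1, tb=s.tb + 4, hr=s.hr + 1)
    return base_runs(plus_one) - base_runs(s)


homer_team = BattingLine(ab=500, h=500, tb=2000, hr=500)
typical = BattingLine(ab=5500, h=1450, tb=2250, hr=170,  # invented team line
                      bb=550, ibb=40, sb=100, cs=50)
print(marginal_hr_value(homer_team))  # 1.0
print(marginal_hr_value(typical))     # greater than 1.0
```

A finite difference is used here only because it is the simplest way to read a marginal value off the formula; it is not presented as the author's procedure.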

There is a companion stat to Base Runs called Base Wins. It's essentially an
offensive wins above replacement procedure. I intend to post an explanation
in the future. My email address for any personal feedback is
<crac...@msn.com>


--
* Keith Woolner, Moderator for rec.sport.baseball.analysis *
* Submissions: rs...@stathead.com *
* Questions/Info/Contact: rsba-r...@stathead.com *
* Charter: http://www.stathead.com/rsba-charter.htm *

J. Edward Tuttle

Jan 1, 2000, 3:00:00 AM

David Smyth wrote:
>
> BASE RUNS
>
> There are three main categories of run estimation formulas.
>
> 1) Multiplicative formulas: The primary example is the Bill James
> original runs created.
>
> 2) Linear formulas: The most well-known is Palmer's Batting Runs. A
> recent entry is Furtado's eXtrapolated Runs.
>
> 3) Team Context methods: Here we have the new runs created and Tate's
> Marginal Lineup Value.
>
> A fourth category is systems which use play-by-play data to generate
> run estimates which are completely situational. An example is Ruane's
> Value Added.

A fifth category, which you either forgot or never heard of, is my Base
Production System, which counts bases gained, subtracts stranded-runner
penalties, and divides the result by 4 to estimate runs. It uses
play-by-play data, but can be normalized to remove lineup effects, thus
becoming situation-independent.
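The final arithmetic step described here can be sketched as follows. This is only a minimal illustration: how bases gained and stranded-runner penalties are tallied from play-by-play data, and how the lineup normalization works, are not described in the post, so the inputs below are hypothetical totals.

```python
def bps_runs(bases_gained: float, stranded_penalty: float) -> float:
    """Hypothetical helper: (bases gained - stranded-runner penalties) / 4.

    Assumes both totals have already been counted from play-by-play data;
    the counting rules themselves are not specified here.
    """
    return (bases_gained - stranded_penalty) / 4


print(bps_runs(bases_gained=3000, stranded_penalty=600))  # 600.0
```

Dividing by 4 reflects that a runner must gain four bases to score one run.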
