Defense Independent Pitching Stats (Intro)
From: vo...@daruma.co.jp (Voros)
Date: 1999/11/18
Subject: Defense Independent Pitching Stats (Intro)

I've been working on a pitching evaluation tool and thought I'd post
it here to get some feedback.

I call it "Defense Independent Pitching" and what it does is
evaluate a pitcher based strictly on the statistics his defense has no
ability to affect (it uses Home Runs, and I guess technically a CF can
stop a HR every now and then, but I think that's more or less
insignificant). The stats which play a main role in the calculations
are BFP, HR, BB, SO, HP, IBB. These stats are all "rated" by number of
BFP except for BFP (of course) and IBB (which are rated by total BB).

The process works as follows:

1. The rates of each of the above stats are adjusted for the pitcher's
home park using a set of park factors.
2. The rates of each of the above stats are adjusted for the league in
which the pitchers pitch (The AL is the benchmark league so NL
Pitchers are adjusted to AL standards and AL pitchers are not affected
by this step).
3. The above stats are then "reassembled" to once again resemble
counting stats, i.e., the BFP stays the same and the new K/BFP is
multiplied by BFP to get the new K. Then the new BB/(BFP-K) is multiplied
by the new (BFP-K) to get the new BB, etc. on down the line. We now have
the above stats calculated for our Defense Independent Pitching Stats.
4. League Average (In this case the AL) rates are then assigned to the
remaining basic pitching stats, H, 2B, 3B, OUTS etc., based on all
BFP's which are not the above defense independent stats. We now have
another set of counting stats.
5. The counting stats are then used to come up with an earned run
total. I used Jim Furtado's Extrapolated Runs and then multiplied by a
factor to bring them in line with actual Earned Run Totals.
6. At this point we have a line of pitching stats just like a
traditional line of pitching stats. However, the new line has now
removed virtually any effect the defense could POSSIBLY have on the
pitcher's stats. This point can't be stressed enough. I'm not
assuming that pitchers can't have any effect on singles,
doubles, triples etc.; I'm simply saying that the defense CAN affect
these things and we're often uncertain how large those effects
are. A study of those stats I did using Sean Lahman's database showed
that non home run hits correlated very poorly for pitchers from year
to year (about .2-.25). They correlated worse
than HR (.45-.5) and much, much worse than BB and K (.65-.8). I think
this approach is radical enough to provide insight that other methods
don't, as most other methods give the pitcher complete credit for his
1B, 2B and 3B totals.
7. The new earned run totals can be used to calculate an ERA, but I
also used them to create an earned runs below replacement value, which
for now will serve as the overall "rating" for each of the pitchers.
The League Average ERA in the AL last year was a very high 4.86. That
will also serve as "average" for these stats. I used an ERA of 5.98 as
a replacement ERA, after some fiddling. It was originally higher, but
at that point the highest relief pitcher, Keith Foulke, was about 53rd,
behind some guys that were barely average starters. With the new
replacement level Foulke climbed up into the top 30.
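
As a rough illustration of steps 3 and 4, here is a minimal Python sketch of
the "reassembly". It is only a sketch under stated assumptions: the chained
denominators for HP and HR (everything after BB) are a guess at what "on down
the line" means, IBB is left out, and the names Rates, reassemble and
league_hit_rate are made up for the example. The input numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Park- and league-adjusted rates (hypothetical container)."""
    k_per_bfp: float     # SO / BFP, as in step 3
    bb_per_rest: float   # BB / (BFP - SO), as in step 3
    hp_per_rest: float   # HP / (BFP - SO - BB), assumed denominator
    hr_per_rest: float   # HR / (BFP - SO - BB - HP), assumed denominator


def reassemble(bfp: float, rates: Rates, league_hit_rate: float) -> dict:
    """Turn adjusted rates back into counting stats for a fixed BFP."""
    so = bfp * rates.k_per_bfp
    bb = (bfp - so) * rates.bb_per_rest
    hp = (bfp - so - bb) * rates.hp_per_rest
    hr = (bfp - so - bb - hp) * rates.hr_per_rest
    # Whatever is left over is treated as a ball in play.
    in_play = bfp - so - bb - hp - hr
    # Step 4: league-average share of balls in play that fall for hits.
    hits_in_play = in_play * league_hit_rate
    return {
        "SO": so, "BB": bb, "HP": hp, "HR": hr, "BIP": in_play,
        "H_IN_PLAY": hits_in_play,
        "OUTS_IN_PLAY": in_play - hits_in_play,
    }


# Invented inputs, for illustration only.
line = reassemble(1000, Rates(0.20, 0.10, 0.01, 0.03), league_hit_rate=0.30)
for stat, value in line.items():
    print(f"{stat:13s}{value:8.1f}")
```

The point of the chained denominators is that the reassembled counts can
never add up to more than the pitcher's BFP.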

For the following post there will only be three categories:

ERA=The new Defense Independent ERA.
DIPR=The Defense Independent Pitching Rating. A value equaling defense
independent earned runs over replacement value.
DIFF=The difference between the defense independent earned run total
and the pitcher's actual earned run total. A positive rating means the
DI ER is lower than actual earned runs. A negative value indicates the
opposite.
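
To make the three categories concrete, here is a small Python sketch. It
assumes that "over replacement" means the earned runs a 5.98 ERA pitcher
would allow in the same number of innings minus the defense independent
earned runs; the innings basis is an assumption, and the function names and
sample numbers are invented for the example.

```python
REPLACEMENT_ERA = 5.98  # replacement-level ERA quoted in step 7


def earned_runs_at_era(era: float, innings: float) -> float:
    """Earned runs a pitcher with the given ERA allows over `innings`."""
    return era * innings / 9.0


def di_era(di_er: float, innings: float) -> float:
    """ERA: the defense independent ERA."""
    return 9.0 * di_er / innings


def dipr(di_er: float, innings: float,
         replacement_era: float = REPLACEMENT_ERA) -> float:
    """DIPR: replacement-level earned runs minus DI earned runs (assumed)."""
    return earned_runs_at_era(replacement_era, innings) - di_er


def diff(actual_er: float, di_er: float) -> float:
    """DIFF: positive when the DI ER is lower than the actual ER."""
    return actual_er - di_er


# Invented pitcher: 200 IP, 100 actual ER, 90 defense independent ER.
print(round(di_era(90.0, 200.0), 2))
print(round(dipr(90.0, 200.0), 1))
print(diff(100.0, 90.0))
```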

The ratings for every single pitcher in the majors last year will be
posted on the web in a day or so. These numbers will look like
complete Traditional Pitching lines with Wins, Losses, ERA, IP, H, HR,
BB, SO, ER and the above DIPR. They are sorted by team and within
teams are sorted from highest DIPR to lowest. They will be posted at:

Stay tuned for a post on the "Top 20" DIPRs in the majors last year.
--
From: James Fraser <jfra...@planeteer.com>
Date: 1999/11/19
Subject: Re: Defense Independent Pitching Stats (Intro)

Voros wrote:
> I've been working on a pitching evaluation tool and thought I'd post
> it here to get some feedback.

> I call it "Defense Independent Pitching" and what it does is
> evaluate a pitcher based strictly on the statistics his defense has no
> ability to affect (it uses Home Runs and I guess technically a CF can
> stop a HR every now and then but I think that's more or less
> insignificant). The stats which play a main role in the calculations
> are BFP, HR, BB, SO, HP, IBB. These stats are all "rated" by number of
> BFP except for BFP (of course) and IBB (which are rated by total BB).

Instinctively I want to disagree with the league average being used for
hits.  Some pitchers give up many hits per 9 innings, while others
hardly ever are touched at all.  I understand that although we want to
try to eliminate the effects of fielding on a pitcher's performance,
penalizing pitchers who seldom give up hits is the wrong way to go.

I can think of other ways to accomplish what you have set out to do.
Normalize a pitcher's balls in play to the league average Zone Rating, or
Defensive Efficiency Record.  The Zone Rating would still
penalize pitchers who give up sure hits (line drives in the
gaps, etc.), while the DER would normalize every ball in play.
Attempting this would lead to similar results, although the question has
to be asked: Is Pedro Martinez getting hitters out in easier ways (more
routine 6-3 ground outs) than Jaime Navarro (who would give up more line
drives)?

I think that it's close to impossible to separate Fielding and Defense;
the only way we could do it is if we knew that all balls in play were
created equal.  My guess is that they're not.

I know that you stressed that these are only possible effects, and I
think that yours is a valid attempt to separate the two.  I think that
Bill James' ERC accomplishes something similar, but relies on an even
distribution of extra base hits, using the pitcher's actual hits allowed.

James Fraser
jfra...@planeteer.com
From: vo...@daruma.co.jp (Voros)
Date: 1999/11/20
Subject: Re: Defense Independent Pitching Stats (Intro)

On Fri, 19 Nov 1999 16:08:31 GMT, James Fraser <jfra...@planeteer.com>
wrote:

>Instinctively I want to disagree with the league average being used for
>hits.  Some pitchers give up many hits per 9 innings, while others
>hardly ever are touched at all.

However these are often factors of the pitcher's strikeout totals. It's
awfully hard to get a hit off Randy Johnson if you don't hit the ball.
The Defense Independent hit totals are based on a league average rate
of balls in play falling in for hits, but what you must understand is
that there are far fewer balls in play against Randy Johnson than
Scott Karl. Therefore Karl will give up more hits.

>I understand that although we want to
>try to eliminate the effects of fielding on a pitcher's performance,
>penalizing pitchers who seldom give up hits is the wrong way to go.

It's not a penalty. I'm simply ignoring their hit total. I've now made
another lengthy post in this NG explaining why I've chosen to do so.

>I can think of other ways to accomplish what you have set out to do.
>Normalize a pitcher's balls in play to the league average Zone Rating, or
>Defensive Efficiency Record.

In a very roundabout way, Defense Independent Pitching stats equal
what Defensive Efficiency Records will give you. Essentially I'm
taking the components of the Defensive Efficiency Record for each
individual pitcher and setting them equal. As my next post
explains, I have good reasons for doing so.

>The Zone Rating would still
>penalize pitchers who give up sure hits (line drives in the
>gaps, etc.), while the DER would normalize every ball in play.
>Attempting this would lead to similar results, although the question has
>to be asked: Is Pedro Martinez getting hitters out in easier ways (more
>routine 6-3 ground outs) than Jaime Navarro (who would give up more line
>drives).

Aha! You fell for my little trap. :)

I suppose it would surprise you to know that Pedro Martinez gave up on
average MORE hits this year on his balls in play than an AL average
pitcher did. More than did Navarro. The difference of course is that
Martinez set a record for K/9 IP for starting pitchers and that tended
to limit the number of times people put the ball in play against him.
But when they did, the results often were better for the hitter than
normally could be expected.

>I think that it's close to impossible to separate Fielding and Defense;
>the only way we could do it is if we knew that all balls in play were
>created equal.  My guess is that they're not.

The emphasis on a particular team's defense isn't really the point. The
emphasis here is on the aspects of a pitcher's statistical record that
are clearly measurements of the pitcher's _ability_ as opposed to things
that just happened to be going on while he's out there.

>I know that you stressed that these are only possible effects, and I
>think that yours is a valid attempt to separate the two.  I think that
>Bill James' ERC accomplishes something similar, but relies on an even
>distribution of extra base hits, using the pitcher's actual hits allowed.

Thanks for the input, James.

The weakness of ERC is the inclusion of the hit stat, when strikeouts
and home runs can be used to derive a hit stat much closer to the
pitcher's actual ability to prevent hits. Hit totals on balls in play
don't correlate from year to year for pitchers the way Home Runs do for
Mark McGwire, walks do for Barry Bonds and strikeouts do for Randy
Johnson. In 1999 Millwood had the best figure in the major leagues for
pitchers with 162 IP. In 1998 Millwood had among the worst. These
things don't happen with strikeout and walk
totals and very rarely happen with Home Runs. But they happen all the
time with hit totals on balls in play. I ask you, if this had anything
to do with pitching ability, would this happen?
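
For anyone who wants to try the year-to-year check on their own data, here
is a minimal Python sketch of a plain Pearson correlation between a rate in
one season and the same pitchers' rate in the next. The function name and
the sample rates are invented for the example and are not real pitcher
data; it only shows the calculation, not the figures from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / sqrt(var_x * var_y)


# Invented rates for the same five pitchers in consecutive seasons.
hit_rate_year1 = [0.285, 0.310, 0.295, 0.270, 0.305]
hit_rate_year2 = [0.300, 0.290, 0.280, 0.305, 0.295]
so_rate_year1 = [0.25, 0.12, 0.18, 0.30, 0.15]
so_rate_year2 = [0.24, 0.13, 0.17, 0.28, 0.16]

print("hits on balls in play:", round(pearson(hit_rate_year1, hit_rate_year2), 2))
print("strikeouts:", round(pearson(so_rate_year1, so_rate_year2), 2))
```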
From: James Fraser <jfra...@planeteer.com>
Date: 1999/11/22
Subject: Re: Defense Independent Pitching Stats (Intro)

Hi,
Voros, your "Explanation Post" went a long way toward convincing me of the
validity of your method.  In fact, I am now almost (very close) sold on
DIPS.  I know that some of these ideas/answers to questions/suggestions
are only feasible on "the magical planet where everyone has access to
STATS database" (maybe we really do need a Project Scoresheet II, or is
Total Sports releasing the data for this year's games?)

-Wouldn't using zone rating take away the problem of pitchers who give
up "sure hit" line drives?  It seems that using all balls in play lends
itself to putting them on equal ground.

-Aren't groundball pitchers given a bit of an advantage?  It seems that
GBs are less likely to turn into outs than flyballs (although without
the risk of turning into HR).  I'm not sure if this is true, but I
remember my baseball coach saying: "Hit all linedrives and bat 1.000,
all ground balls and hit .300, all flies and you'll never get on base."

-The fact that hit prevention is so far out of the pitcher's control is
surprising.  That means that keeping the ball out of play is a lot more
important than we think.  Wouldn't this validate Bill James' Game Scores
(which put heavy weight on K's) over the BBBA QMAX (which rely on Hit
prevention)?

-Although I used the wrong example, Kevin Millwood might be getting
hitters out in easier ways than Martinez.  Although assuming that all
chances are created equal might be a better alternative than what we
have now, it seems that a more detailed breakdown of GB/FB and
LineDrives would set the H rate better.

Thanks for the clarification, and keep up the excellent work,
Still surprised that H rate isn't consistent,
James Fraser
jfra...@planeteer.com
From: "FM" <dank...@Dartmouth.EDU>
Date: 1999/11/23
Subject: Re: Defense Independent Pitching Stats (Intro)

Voros <vo...@daruma.co.jp> wrote:
> 5. The counting stats are then used to come up with an earned run
> total. I used Jim Furtado's Extrapolated Runs and then multiplied by a
> factor to bring them in line with actual Earned Run Totals.

I'm pretty much sold on this after reading the explanation (though
I have trouble believing it), but does the above take account of
the pitcher's "clutch ability"? For example, does the above mean:

DIPS-Earned-Run-Total = Expected-Run-Total-from-DIPS *
Earned-Run-Total / Expected-Run-Total-from-actual

Dan.

From: vo...@daruma.co.jp (Voros)
Date: 1999/11/23
Subject: Re: Defense Independent Pitching Stats (Intro)

In article <38398f87$0$...@nntp1.ba.best.com>, jfra...@planeteer.com says...

>Hi,
>   Voros, your "Explanation Post" went a long way toward convincing me of the
>validity of your method.  In fact, I am now almost (very close) sold on
>DIPS.  I know that some of these ideas/answers to questions/suggestions
>are only feasible on "the magical planet where everyone has access to
>STATS database"

So tragically true. Intellectual property is tricky stuff.

>(maybe we really do need a Project Scoresheet II, or is
>Total Sports releasing the data for this year's games?)

>-Wouldn't using zone rating take away the problem of pitchers who give
>up "sure hit" line drives?  It seems that using all balls in play lends
>itself to putting them on equal ground.

Possibly, but the question again arises: are there actually any pitchers
who _consistently_ give up significantly more of these types of hits, and who
are these pitchers? Looking over the 1998 & 1999 seasons, the only pitcher
that jumps out is Shane Reynolds.  Sele and the Rockies pitchers have plenty
of reasons to excuse their rates, but Reynolds' is a bit enigmatic. I went
back a few years in Reynolds' career and found a few years not quite as high,
but it looks like it's _possible_ something is going on there.

>-Aren't groundball pitchers given a bit of an advantage?  It seems that
>GBs are less likely to turn into outs than flyballs (although without
>the risk of turning into HR).

The top 10 Ground Ball pitchers in the AL in 1999
Name        ER DPER DIFF
Erickson   123  129   -6
Pettitte   100  105   -5
Sele       109   92   17
Nagy       111  104    7
Heredia    107   93   14
Moehler    110   98   12
Suppan     105  113   -8
Mussina     79   80   -1
Mays        83   95  -12
Colon       90   96   -6
TOTAL                +12

I chose the AL because there were no league adjustments, so they should balance
pretty well close to zero difference. 6 of the 10 top GBers were hurt by the
DIP stats, although the 4 helped were helped by a larger amount. These ten
were helped by a total combined amount of 12 runs. However, I don't think
you're wrong here. If the league average totals are adjusted to represent
league average totals for GB, FB, LH, RH, SP and RP we might gain some
accuracy in the DIP numbers. Also of note is that, anecdotally, I believe
pitchers with trick deliveries (e.g. Knuckleballers) might post consistently
lower $H numbers than other pitchers. I looked at Tim Wakefield's career and
that seems to bear out slightly.
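
The DIFF column in the table can be rechecked from the ER and DPER columns.
A small Python sketch, assuming DIFF is simply ER minus DPER (which matches
every row as listed) and that a positive DIFF counts as "helped"; the
variable names are only illustrative:

```python
# (name, ER, DPER) rows copied from the table above.
rows = [
    ("Erickson", 123, 129), ("Pettitte", 100, 105), ("Sele", 109, 92),
    ("Nagy", 111, 104), ("Heredia", 107, 93), ("Moehler", 110, 98),
    ("Suppan", 105, 113), ("Mussina", 79, 80), ("Mays", 83, 95),
    ("Colon", 90, 96),
]

# DIFF = actual earned runs minus defense independent earned runs.
diffs = {name: er - dper for name, er, dper in rows}

helped = [name for name, d in diffs.items() if d > 0]
hurt = [name for name, d in diffs.items() if d < 0]
total = sum(diffs.values())

print("helped:", helped)
print("hurt:", hurt)
print("combined DIFF:", total)
```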

>I'm not sure if this is true, but I remember my baseball coach saying: "Hit
>all linedrives and bat 1.000, all ground balls and hit .300, all flies and
>you'll never get on base."

Then you're stuck asking, "What's a line drive and what's a flyball?" On
balls where there seems to be some question as to that, the usual method is,
"If it falls in, it was a line drive, and if it's caught, it's a flyball."

>-The fact that hit prevention is so far out of the pitcher's control is
>surprising.  That means that keeping the ball out of play is a lot more
>important than we think.  Wouldn't this validate Bill James' Game Scores
>(which put heavy weight on K's) over the BBBA QMAX (which rely on Hit
>prevention)?

Only in the sense that it uses strikeouts, but not in the sense that it also
uses hits, runs and earned runs (hits having a significant effect on the
other two). As a quick box score figure I do prefer it to QMAX or SNWL, which
basically ascribe everything that happens while he's on the mound to the
pitcher. Game scores do too, but they also give credit to a stat that is a
huge indicator of _ability_ but is often left out due to its lack of direct
_value_.

>-Although I used the wrong example, Kevin Millwood might be getting
>hitters out in easier ways than Martinez.

Of course, that still leaves to be explained why Martinez was significantly
better at it in 1998.

>Although assuming that all chances are created equal might be a better
>alternative than what we have now, it seems that a more detailed breakdown
>of GB/FB and LineDrives would set the H rate better.

But if GB/FB correlates for pitchers from year to year, and $H correlates
with GB/FB, then $H should correlate from year to year. But it doesn't.
I believe that GB/FB adjustments to league average figures would make the
system more accurate. I wonder how much more accurate, though, and whether the
improvements would make up for the very severe decrease in simplicity. A long
term project would be to do this and see what the differences are.

>Thanks for the clarification, and keep up the excellent work,
>Still surprised that H rate isn't consistent,

I'll leave the topic for just a second here. It's also surprising that the
pattern exists for hitters as well (with some important differences). $SO
still correlates very highly, but $BB correlation increases a bit, and $HR
correlation increases above $BB. About .80, .72 and .75 respectively. But the
batter's version of $H only correlates between .35 and .40. Now that's much
bigger than the pitchers' $H and does indicate that there is some correlation
going on, but it also suggests that there's a lot of fluctuation in the stat.
A topic for another time, but for fun, look at the yearly (H-HR)/(AB-SO-HR)
figures for Paul O'Neill from his rookie year until now.
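
The (H-HR)/(AB-SO-HR) figure is easy to compute from a standard batting
line. A minimal Python sketch; the function name and the sample season are
invented for the example and are not O'Neill's actual numbers:

```python
def batter_hit_rate(h: int, hr: int, ab: int, so: int) -> float:
    """Hits on balls in play per ball in play: (H - HR) / (AB - SO - HR)."""
    balls_in_play = ab - so - hr
    if balls_in_play <= 0:
        raise ValueError("no balls in play for this batting line")
    return (h - hr) / balls_in_play


# Invented season: 500 AB, 150 H, 20 HR, 80 SO.
print(round(batter_hit_rate(h=150, hr=20, ab=500, so=80), 3))
```

Running it once per season gives the yearly series to eyeball for
fluctuation.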

Thanks again for the input, James.
