
Defense Independent Pitching Stats (An Explanation)


Voros

Nov 20, 1999, 3:00:00 AM

The responses I've gotten thus far are pretty much what I expected.
The idea of NOT using a pitcher's hit totals to evaluate his
performance is not well supported in any community including the
sabermetric one. But what I'm going to detail here is why I think it's
important to do this. I'll warn you that this will be relatively
lengthy and involve some statistics, but if you're at all interested
in statistical evaluations of pitchers, I think it's important to
understand some things about the various pitching stats.

I'll start off by defining some rate stats. What I'm going to do is a
comparison of pitching statistics from 1998 to 1999 for the group of
pitchers who pitched 162+ innings in both seasons (there were 60 such
pitchers). The only statistics used will be IP, H, HR, BB and SO. The
comparison will be done using four rate stats:

$BB=BB/((IP*3)+H+BB); This rate stat essentially measures how often
the pitcher walked a batter per batter faced during the season.
Obviously the denominator doesn't equal the pitcher's actual BFP, but
I'm sure it tracks it pretty closely, i.e. the difference between BFP
and the denominator is most likely pretty constant in the long run
from pitcher to pitcher.

$SO=SO/((IP*3)+H); This stat is designed to measure how often he
struck guys out relative to how often he gave up fair batted balls. A
measure of how difficult it is to make contact in other words.

$HR=HR/((IP*3)+H-SO); This stat measures how often a batted fair ball
left the park against the pitcher.

$H=(H-HR)/((IP*3)+H-SO-HR); This stat measures how often a batted ball
in the field of play, not leaving the park, falls in for a hit. This
stat is the central focus of the discussion and its behavior is the
basis for my leaving hit totals out of the evaluation method.

(these are difficult stats to write out every time, so to understand
the following you might need to refer back to what the $BB, $SO, $HR
and $H abbreviations mean).
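
The four definitions above can be sketched in a few lines of Python.
This is only an illustration: the function name and the sample season
line are made up, and IP is assumed to be a plain number of innings
(so a third of an inning is 1/3, not the ".1" of box score notation).

```python
def rate_stats(ip, h, hr, bb, so):
    """Return the four rate stats as defined in this post.

    ip is assumed to be innings as a plain number (thirds as fractions).
    """
    outs = ip * 3  # the (IP*3) term: outs recorded
    return {
        "$BB": bb / (outs + h + bb),             # walks per approximate batter faced
        "$SO": so / (outs + h),                  # strikeouts per out or hit
        "$HR": hr / (outs + h - so),             # homers per fair batted ball
        "$H": (h - hr) / (outs + h - so - hr),   # hits per ball in play
    }


# Hypothetical season line, invented for illustration only.
example = rate_stats(ip=200, h=190, hr=20, bb=60, so=150)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Keeping the denominators side by side shows how each stat removes
what the previous one already accounted for: walks first, then
strikeouts, then home runs.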

I've debated the order to present the information, and I've decided to
give you the strict statistical analysis of the statistics for the
1998-1999 pitchers. What I did was compare each rate stat for each
pitcher in 1998 with what that same pitcher did in 1999. For example,
Andy Benes's $BB in 1998 was .075 and his $BB in 1999 was .092 (note
these numbers will not be park or league adjusted for simplicity's
sake). These will be part of a correlation string where it is
calculated how well the pitchers' 1998 stats correlate with their 1999
stats (consistency so to speak). The correlations when each stat is
compared to its counterpart (i.e. 1998 $BB to 1999 $BB and so on) are
as follows:

$BB:.681
$SO:.792
$HR:.505
$H: .153

If you know about statistics and you know about baseball, this should
have your attention. In three of the stats, the correlation ranges
from ok ($HR) to very good ($SO). In the other stat ($H) there really
is a very low level of correlation. Considering many of these pitchers
had the same defenses and pitched in the same parks, one could argue
that any correlation there is might be due to those factors as much as
anything.
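
For anyone who wants to run this kind of year-to-year comparison
themselves, here is a rough sketch. It assumes an ordinary Pearson
correlation, which is a guess on my part since the exact method isn't
spelled out above. Only the first pair of values (Benes, .075 and
.092) comes from this post; the rest are placeholders, so the printed
number means nothing.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 long")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# One entry per pitcher, same order in both lists.
# First pair is Benes from the post; the others are made-up placeholders.
bb_1998 = [0.075, 0.060, 0.095, 0.050, 0.082]
bb_1999 = [0.092, 0.065, 0.090, 0.055, 0.078]

print(f"$BB 1998 vs 1999: {pearson(bb_1998, bb_1999):.3f}")
```

With the real data you would have 60 entries per list and would run
it once per rate stat.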

What does this mean? Essentially it means that if a pitcher posts a
very low $H rate one year, you really can't expect him to repeat that
with any level of certainty at all. However if a player posts a very
high $SO rate, there is a level of comfort in thinking he'll have a
good one the following year as well. To leave the realm of statistics
and show examples of what I'm talking about, I'll provide the
following:

10 Lowest $H in 1998 and 1999 (of the 60 pitchers)
(In order)
1998: Hideki Irabu, Pete Harnisch, Woody Williams, Kenny Rogers, Greg
Maddux*, David Wells, Dustin Hermanson, Brian Moehler, Al Leiter, Tom
Glavine.
1999: Kevin Millwood*, Omar Daal, Masato Yoshii, Curt Schilling, Pete
Harnisch, Bartolo Colon, David Cone, Rick Helling, Eric Milton, Kevin
Brown.

You may have noticed the asterisks next to Maddux's and Millwood's
names on these lists. I want you to remember that they were on these
lists.

Notice that only one pitcher, Pete Harnisch, made the top 10 both
years. If this stat really had anything to do with the pitcher's actual
ABILITY, one would expect a few more guys to be on the list both
years.

10 Highest $H in 1998 and 1999
(in order)
1998: Aaron Sele, Shane Reynolds, Brian Meadows, Scott Erickson, Pedro
Astacio, Randy Johnson, Mike Sirotka, Kevin Millwood*, Brad Radke and
Darryl Kile.

Well look. In 1999 Kevin Millwood had the lowest $H rate of any
pitcher in the majors while in 1998 he had the 8th highest. If $H
reflected pitching abilities, would that make sense? Does Mark McGwire
ever finish among the lowest in the league in HR%? Does Rey Ordonez
ever finish among the highest? And in back-to-back years!? We'll
move on.

1999: Aaron Sele, LaTroy Hawkins, Jon Lieber, Greg Maddux*, Pedro
Martinez, Shane Reynolds, Pedro Astacio, Steve Woodard, Livan
Hernandez and Charles Nagy.

And there's Millwood's teammate, Maddux, pulling the opposite trick.
After posting the 5th lowest in 1998, Maddux then proceeded to post
the 4th highest. In other words there was one instance of a pitcher
making the top 10 both years, and two instances of pitchers making the
top 10 one year and the bottom 10 the other. There is an increase in
players being on both lists here, though, as Aaron Sele (who I'll get
to later on), Shane Reynolds and Pedro Astacio all made the highest
list both years (Coors effect mostly, I assume, for Astacio. Kile, the
only other Rockies pitcher who qualified, made the 1998 list too).
However, I bet the name Pedro Martinez jumped out at you, for 1999 no
less! Pedro was rightfully considered unhittable this year, but in
fact when they did hit Pedro, a large number of balls went for hits
this year. Did you expect that?
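
The list comparisons above are easy to check by hand, but here is a
small sketch that does the same bookkeeping with sets. The names are
copied from the four lists in this post; the variable names are just
mine.

```python
# Ten lowest and ten highest $H pitchers, as listed above.
lowest_1998 = {
    "Hideki Irabu", "Pete Harnisch", "Woody Williams", "Kenny Rogers",
    "Greg Maddux", "David Wells", "Dustin Hermanson", "Brian Moehler",
    "Al Leiter", "Tom Glavine",
}
lowest_1999 = {
    "Kevin Millwood", "Omar Daal", "Masato Yoshii", "Curt Schilling",
    "Pete Harnisch", "Bartolo Colon", "David Cone", "Rick Helling",
    "Eric Milton", "Kevin Brown",
}
highest_1998 = {
    "Aaron Sele", "Shane Reynolds", "Brian Meadows", "Scott Erickson",
    "Pedro Astacio", "Randy Johnson", "Mike Sirotka", "Kevin Millwood",
    "Brad Radke", "Darryl Kile",
}
highest_1999 = {
    "Aaron Sele", "LaTroy Hawkins", "Jon Lieber", "Greg Maddux",
    "Pedro Martinez", "Shane Reynolds", "Pedro Astacio", "Steve Woodard",
    "Livan Hernandez", "Charles Nagy",
}

# Pitchers who repeated on the same list in both years.
repeat_lowest = lowest_1998 & lowest_1999
repeat_highest = highest_1998 & highest_1999

# Pitchers who went from one extreme to the other.
switched = (lowest_1998 & highest_1999) | (highest_1998 & lowest_1999)

print("Lowest both years: ", sorted(repeat_lowest))
print("Highest both years:", sorted(repeat_highest))
print("Switched lists:    ", sorted(switched))
```

The same intersections can be run on the $BB, $SO and $HR lists to
get the counts given in the next few paragraphs.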

In the other categories things make more sense. In $BB, 7 pitchers
were on the lists for lowest $BB both years. 5 pitchers were on the
lists for highest $BB both years. No pitchers were on one list one
year and the other list the other.

For $SO, 5 pitchers were on the lists for highest $SO both years. 6
pitchers were on the lists for lowest $SO both years. Again, no
pitcher was on one list one year and the other list the other.

For $HR, 4 pitchers were on the lists for lowest $HR both years. 4
pitchers were on the lists for highest $HR both years. There was one
pitcher who was among the lowest in 1998 (10th lowest) and the highest
in 1999 (5th highest). It was the enigmatic Chan Ho Park.

Again, this shows us that for at least 2 of the stats and to a certain
extent the third, the guys that do well one year in a stat tend to do
well the next year too. In the $H stat though, such a conclusion is
not possible.

This borders on heresy, really. We talk about no hitters and we
denigrate Nolan Ryan because all he did was strike people out. But I
can't look at this and think that a pitcher's hit total is more
important than his strikeout total. I just can't do it. I've done some
cursory looks at historical patterns regarding these numbers back to
1946, and the same correlations keep popping up. High and very high
for $BB and $SO respectively. Mid range for $HR and low for $H. I
don't have exact figures because I haven't done the exact full work
for it yet, but when I do, I'll pass them along to whoever is
interested.

The Defense Independent Pitching stats work I did was based on this
discussion. More accurate forms of $BB, $SO and $HR were used to
compile the numbers, but I simply didn't see a good reason to bring $H
into this. Yes, the hits given up were costly and can lead to runs, but
I've yet to see much information that suggests the pitcher had a whole
lot to do with giving up the hits or preventing the hits, OTHER THAN
PREVENTING BATTERS FROM HITTING THE BASEBALL. Getting hits off Johnson
or Martinez is tough, not because their pitches are tough to center
(remember both pitchers made highest $H lists) but because they strike
you out so often.

It troubles me that more and more sabermetrics is leaning towards
increased attempts to measure _value_ (I'm about to sound like Maynard
here) and not enough is done to measure _ability_. Yes, balls falling in
for hits are costly, but what does that have to do with Steve Woodard?
I think we need to understand the difference between the things the
pitcher is actually causing to happen and the things that just happen
to occur while he's out there on the mound.

Ask yourself this question. Who do you think has the lowest $H rate
combined over the last 5 years with a minimum of say 800 innings? I
don't know the answer to this as I haven't looked it up. But I will
venture that a lot of people would give a lot of different answers and
a whole bunch would be wrong. I couldn't begin to tell you who it
might be. John Smoltz? Tim Wakefield? Omar Olivares? Who? What type of
pitcher would you equate with a low $H rate? Check out these
correlations:

Correlations to 1999 $H rate

1998 Rate   Correl.
$BB         -.116
$SO         -.131
$HR          .084

A negative correlation means an inverse relationship, i.e. the higher
the $SO rate the lower the $H rate. Again there's not a whole lot of
correlating going on here, but notice that $SO correlates with the
following year's $H almost as well as $H does. Which is to say, not
very much.

The implications of this can be applied elsewhere. We've been
struggling for many years now to make sense of minor league pitching
statistics. If hit totals are unreliable measurements for major league
stats, one certainly can conclude that you can throw them straight out
the window for minor leaguers. The same goes for the evaluation of
pitchers like Aaron Sele, who:

1. Plays in a hitter's park.
2. Has an atrocious defense behind him.
3. Pitches in a DH league.

A reevaluation of pitchers like Sele may be in order (note his
finishing 6th in my Defense Independent Rating) due to the inflation
of his hit totals, which probably has little or nothing to do with his
pitching abilities.

I consider pitchers' hit totals to be the RBI of pitching statistics.
On the surface they correlate, and they in a sense are measurements of
value, but they have very little to do with the player's abilities and
everything to do with the circumstances the player plays in. Bill
James in the 1987 Baseball Abstract assessed a statistic's
"reliability" as "the extent to which the statistic truly reflects the
ability." In this case I'm left with no other conclusion than to say
that a pitcher's hit totals are terribly unreliable ways to evaluate
pitchers when we have strikeout data we can use instead. There doesn't
seem to be any innate ability for pitchers to lower their hit totals
other than striking people out and keeping the ball in the park.

Which brings us back to Defense Independent Pitching Stats. Defense
Independent Pitching uses only the statistics that have been shown to
have some factor of reliability from year to year for the pitcher. I
would expect DIP stats to vary much less than the unadjusted stats
from year to year for each individual pitcher as things that don't
accurately reflect his abilities have been removed from the equation.
What is important though, is that DIP stats still show very wide
ranges in quality from pitcher to pitcher (ranges almost as wide as
the actual numbers). Pedro Martinez is WAY WAY WAY WAY WAY ahead of
Jaime Navarro. As such, it isn't just assigning a 4.25 ERA to
everybody and heading on your merry way. We are still evaluating
pitching quality. But now we're rewarding the pitcher for his
abilities and not his luck.
--
* Keith Woolner, Moderator for rec.sport.baseball.analysis *
* Submissions: rs...@stathead.com *
* Questions/Info/Contact: rsba-r...@stathead.com *
* Charter: http://www.stathead.com/rsba-charter.htm *
