
Clutch Hitting


Randy Palermo

Feb 14, 1992, 6:16:32 PM
I know this subject was widely discussed some time ago. Unfortunately, I
did not have net access at the time. I have one question. Why is BA with
men in scoring position, men in scoring position and 2 outs and men in
scoring position late in the game not a good measure of clutch hitting?
I can understand why BA is not always a good measure but with this kind
of granularity wouldn't it be more telling? I guess that's 2 questions.

luigi
--
Randy Palermo lu...@csd.sgi.com Silicon Graphics Computer Systems
2011 N. Shoreline Blvd Mt. View, CA 94039
Fax: (415)961-6502
I use not only all the brains I have, but all I can borrow. --Woodrow Wilson

Valentine

Feb 14, 1992, 8:13:37 PM
In article <1992Feb14....@odin.corp.sgi.com> lu...@bullpen.csd.sgi.com (Randy Palermo) writes:
>I know this subject was widely discussed some time ago. Unfortunately, I
>did not have net access at the time. I have one question. Why is BA with
>men in scoring position, men in scoring position and 2 outs and men in
>scoring position late in the game not a good measure of clutch hitting?

It should be a good measure of clutch hitting. However, batters show
no consistency in it from one year to the next. Take 10 players who
hit well with men in scoring position, and five of them will be below
average in that category the next year. And so forth.... Not the
kind of thing you want to bet money on.

Conclusion: Either clutch hitting doesn't exist, or BA with RISP is
not a good measure of clutch hitting.

(The full study with data, and validations, and everything is much
longer than this. But this accurately sums up the main points.)
--
Cheers,
-Valentine

"Definitely insane when the Red Sox are involved." ted...@cs.cornell.edu

Bob Gajarsky

Feb 14, 1992, 10:09:25 PM
without getting in to the whole clutch hitting discussion....

i'd say it's because THOSE situations don't give enough samples for
a true study to prove whether someone is clutch or not.

by the way, elias did a study a few years ago to "prove" that clutch hitting
existed. only problem was, looking at the stats they presented, you'd
have a better case for it NOT existing (according to their criteria).
one year harold baines was in the top 10 in batting average, the
next year he was in the bottom 10. and that shouldn't happen,
from year to year...

- bob gaj

Gordon Niguma

Feb 15, 1992, 1:14:32 AM

>I know this subject was widely discussed some time ago. Unfortunately, I
>did not have net access at the time. I have one question. Why is BA with
>men in scoring position, men in scoring position and 2 outs and men in
>scoring position late in the game not a good measure of clutch hitting?
>I can understand why BA is not always a good measure but with this kind
>of granularity wouldn't it be more telling? I guess that's 2 questions.

OK, before I start I want to say I don't believe that certain players perform
better in clutch than not in clutch. (You get the idea).
Anyways, I think the MOST important stat to define "clutch" is OBP and SLG in
the late innings (7+) of games within 2 or 3 runs as difference between two
teams.
The main problem with "clutch" stats is what to define as "clutch"? I don't
think driving in a run when the score is 10 to one is really all that clutch
to anyone but this may count as a "clutch" situation by two of the accounts
you listed (ScP 2 outs, ScPos "n" outs). But is driving in a runner from third
with two outs in a 1-1 game in the fifth inning a "clutch" hit? Is hitting a
sac fly with one out to drive in the winning run in the ninth a "clutch"
performance? (Here it's a game winning RBI but it doesn't help any of his other
"clutch" stats). Is drawing a leadoff walk in the ninth inning of a tie
ballgame a "clutch" performance? (Definitely; but it doesn't help a lot of his
"clutch" stats). So one has to properly define "clutch" and this is quite
difficult to do and get everyone to agree upon it. To me, I think batting with
Runners In Scoring Position is a stat that should be junked as far as "clutch"
stats go because clearly some R.B.I.'s are more important in close games than
lopsided.
Now I must say that we CANNOT use BA as an adequate indicator of clutch
performance. You talk about BA with RISP in late innings of close games; this
fails the test because you are only rewarding the guy who drives in the run;
don't forget that a guy had to get on base FOR the guy to drive home the run.
It's like giving an RBI without awarding a R; it's not fair for the high OBP
guy who is unrewarded for getting on base, while the RBI guy gets all the
kudos.
My opinion is that "clutch" should be defined as late innings of a close game
(say 7th inning +, and game within 2 runs either way). (By the way this "clutch"
definition is just my opinion; I'm sure there are plenty of others). Now to measure
performance late in the game I think we should measure performance THE SAME WAY
WE DO FOR ANY SITUATION. Which is probably something like OPS or just OBP and
SLG. It seems absolutely logical that the gain of a single in the first is just
as valuable (relatively speaking) as one in the 7th inning of a close ballgame.
That is, a single's importance in creating runs in the first is just as great as
its importance in creating runs in the 7th. OR IS IT????
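
As a rough illustration of the definition above, here is a minimal Python sketch. The 7th-inning and 2-run cutoffs are the ones proposed here; the BattingLine record, the function names, and the sample numbers are hypothetical, and OBP and SLG use the standard formulas.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Hypothetical record of one hitter's totals in some split."""
    ab: int       # at-bats
    h: int        # hits
    tb: int       # total bases
    bb: int = 0   # walks
    hbp: int = 0  # hit by pitch
    sf: int = 0   # sacrifice flies


def is_late_and_close(inning, run_diff, min_inning=7, max_diff=2):
    """True for the 7th inning or later with the game within max_diff runs either way."""
    return inning >= min_inning and abs(run_diff) <= max_diff


def obp(line):
    """On-base percentage: times on base over plate appearances that count."""
    denom = line.ab + line.bb + line.hbp + line.sf
    return (line.h + line.bb + line.hbp) / denom if denom else 0.0


def slg(line):
    """Slugging percentage: total bases per at-bat."""
    return line.tb / line.ab if line.ab else 0.0


# Made-up late-and-close totals for one hitter.
example = BattingLine(ab=80, h=22, tb=34, bb=12, sf=1)
print(f"OBP {obp(example):.3f}  SLG {slg(example):.3f}")
```

Note that a 10-1 game fails the filter no matter who is on base, which is the objection to plain RISP splits raised above.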

Here's where things get messy. When trying to play for one run strategies one
values the importance of OBP ahead of slugging because if a team is down by say
one run THE MOST IMPORTANT THING A GUY CAN DO IS GET ON BASE. Slugging is great
but it's more valuable in the early innings when YOU ARE TRYING TO MAXIMIZE
YOUR NUMBER OF RUNS. If you're in a tie ballgame, you don't send Pete Incaviglia
to go hit a HR unless there's two outs. Let's say you had men on 1st and 2nd
with two outs in the late innings of a game within 1 or 2 runs. You don't want
Incaviglia's high chance of making an out up at the plate because his HR power
ISN'T THAT IMPORTANT RIGHT NOW. You'd rather want a guy with line drive power
who can get on base (maybe Javier Ortiz or something). The key thing to remember
is that slugging is more useful to create innings of multiple runs and OBP
more important in scoring more often but not as many runs. I don't pretend to
know that this is ABSOLUTELY TRUE but it seems like a decent guess. Opinions?

I'm torn on whether OBP increases its value in the late innings of a
close game, stays the same, or maybe even decreases (I doubt
this). I do like having good OBP guys as pinch hitters rather than big sluggers
though, and I guess this implies somewhat that OBP does increase its value
later in close games. It's late now, so it's hard to make judgements on my
logic right now.....
Any opinions???

Gord Niguma
(fav type PH: Greg Gross, Mulliniks type)

Greg Sarcasm Is A Way Of Life Spira

Feb 15, 1992, 1:03:23 AM

>I know this subject was widely discussed some time ago. Unfortunately, I
>did not have net access at the time. I have one question. Why is BA with
>men in scoring position, men in scoring position and 2 outs and men in
>scoring position late in the game not a good measure of clutch hitting?
>I can understand why BA is not always a good measure but with this kind
>of granularity wouldn't it be more telling? I guess that's 2 questions.

I'm sure someone will answer this one better than I will, but here goes
anyway.

If you ignore the problems of batting average, then yes, the statistics
you mention are good measurements of clutch hitting. But they don't tell
you anything about who's a clutch hitter (if there is such a thing). There
is no real correlation from year to year as to who hits 30 points above
his average with RISP and who hits 30 points below his average in LIPS.
Even if you find someone who has hit better with RISP than without
for the last five years, there is only a 50/50 chance that he will do so
again next year. Yes, there are a few players who have, in the past, usually
done better or worse in so-called "pressure situations," but only as many as
you would expect if the "clutch effect" were completely random in its
distribution.
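
A back-of-the-envelope check of the "only as many as you would expect" point, as a minimal sketch. It assumes each season is an independent coin flip (a 50% chance of hitting better with RISP than without) and reads "the last five years" as five seasons out of five; the pool of 200 regulars is a made-up number, not a figure from any study.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Probability of at least k successes in n independent trials with success chance p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


n_players = 200                      # hypothetical number of regulars
p_five_of_five = prob_at_least(5, 5)  # better with RISP in every one of 5 seasons
p_four_plus = prob_at_least(4, 5)     # better with RISP in at least 4 of 5 seasons

print(f"5 of 5 by luck: {p_five_of_five:.4f} per hitter, "
      f"about {n_players * p_five_of_five:.1f} hitters out of {n_players}")
print(f"4+ of 5 by luck: {p_four_plus:.4f} per hitter, "
      f"about {n_players * p_four_plus:.1f} hitters out of {n_players}")
```

Under those assumptions roughly 3% of hitters go five for five on luck alone, so a handful of apparent clutch hitters is what pure chance predicts.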

Sabermetricians have studied almost every possible definition of clutch, and
they have found this to be the case. Some studies have assigned different
levels of "clutch" to each event, other studies have assumed that at-bats are
either "clutch" or "non-clutch." The only "study" to ever conclude that
clutch hitters exist was done by Elias, and their "proof" is as flimsy as
the rest of their statistical work (and frankly, their motivation may not
be all that pure - they sell books partly on the basis of the importance of
their clutch stats) and has been easily discredited by several members of this
board.

Greg
--
sp...@panix.com "The one-O delivery to Fisk. He swings. Long drive,
cmcl2!panix!spira left field! If it stays fair, it's gone! Home Run!"
158-17 Riverside Dr. Ned Martin, 10/22/75
Whitestone NY 11357 (Insert your favorite baseball moment here)

Sherri Nichols

Feb 14, 1992, 7:40:08 PM
In article <1992Feb14....@odin.corp.sgi.com> lu...@bullpen.csd.sgi.com (Randy Palermo) writes:
>I know this subject was widely discussed some time ago. Unfortunately, I
>did not have net access at the time. I have one question. Why is BA with
>men in scoring position, men in scoring position and 2 outs and men in
>scoring position late in the game not a good measure of clutch hitting?

A couple of things. One, it ignores the fact that you can't get a hit if
the pitcher won't pitch to you. Why would I put anything near the strike
zone for Barry Bonds to hit if I've got a base open in a tight situation,
and Steve Buechele following him? I know I'm not going to get Bonds to
chase a bad pitch, so it's going to be hard for me to get him out. I know
Steve Buechele is more likely to chase a bad pitch. Sure, sometimes I'll
get beat, but I'd rather take my chances with Steve Buechele beating me
than Barry Bonds.

Second, like all other measures of clutch hitting we've found so far, it
doesn't correlate well from year to year. That a hitter hit well with RISP
last year tells us nothing about whether he's likely to do so next year.
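
One way to see what "doesn't correlate" would look like if there were no clutch ability at all is a quick simulation. This is a sketch under stated assumptions, not a reconstruction of any actual study: every simulated hitter has the same true .260 average in all situations, with 150 RISP at-bats and 450 others per year, and the function names are invented for illustration.

```python
import random


def simulate_differentials(n_players, true_ba=0.260, risp_ab=150, other_ab=450, seed=0):
    """Return two lists of (RISP BA - non-RISP BA), one per simulated season, with no clutch skill."""
    rng = random.Random(seed)

    def season_diff():
        risp_hits = sum(rng.random() < true_ba for _ in range(risp_ab))
        other_hits = sum(rng.random() < true_ba for _ in range(other_ab))
        return risp_hits / risp_ab - other_hits / other_ab

    year1 = [season_diff() for _ in range(n_players)]
    year2 = [season_diff() for _ in range(n_players)]
    return year1, year2


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


year1, year2 = simulate_differentials(300)
r = pearson(year1, year2)
print(f"largest year-1 differential: {max(year1):+.3f}")
print(f"year-to-year correlation: {r:+.3f}")
```

Even with no skill in the model, some simulated hitters post large single-season differentials, while the year-to-year correlation hovers near zero.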

Sherri Nichols
snic...@adobe.com


Paul Benjamin

Feb 18, 1992, 12:14:02 PM
In article <1992Feb15.0...@adobe.com> snic...@adobe.com (Sherri Nichols) writes:
>In article <1992Feb14....@odin.corp.sgi.com> lu...@bullpen.csd.sgi.com (Randy Palermo) writes:
>>I know this subject was widely discussed some time ago. Unfortunately, I
>>did not have net access at the time. I have one question. Why is BA with
>>men in scoring position, men in scoring position and 2 outs and men in
>>scoring position late in the game not a good measure of clutch hitting?

>Second, like all other measures of clutch hitting we've found so far, it
>doesn't correlate well from year to year. That a hitter hit well with RISP
>last year tells us nothing about whether he's likely to do so next year.

As I've posted before, correlation per player from year to year is
a meaningless measure of this stat. There are too few RISP instances,
about 30-40 per player per year. This tells us about as much as a
player's BA after the first 30-40 atbats of the season. Any stat based
on so few instances will vary considerably from year to year, so that
clutch hitting, if it exists, cannot be found by examining yearly stats
for each player. Using a different definition of clutch that includes a
lot more situations might help, but that seems to contradict the
meaning of clutch.
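
The size of the sampling noise being described can be sketched with the binomial standard error of a batting average. This treats every at-bat as an independent trial with the same hit probability, which is a simplification; the .260 true average and the function name are assumptions for illustration.

```python
from math import sqrt


def ba_standard_error(true_ba, at_bats):
    """Standard error of an observed batting average over a given number of at-bats."""
    return sqrt(true_ba * (1 - true_ba) / at_bats)


for n in (30, 40, 150, 600):
    se = ba_standard_error(0.260, n)
    print(f"{n:4d} AB: standard error about {se * 1000:.0f} points")
```

On 30-40 at-bats the standard error comes out around 70 to 80 points, against under 20 points over a 600 at-bat season.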

Paul Benjamin

Roger Lustig

Feb 18, 1992, 1:38:25 PM
In article <1992Feb18.1...@sjuphil.uucp> pben...@sjuphil.uucp (Paul Benjamin) writes:
>In article <1992Feb15.0...@adobe.com> snic...@adobe.com (Sherri Nichols) writes:
>>In article <1992Feb14....@odin.corp.sgi.com> lu...@bullpen.csd.sgi.com (Randy Palermo) writes:
>>>I know this subject was widely discussed some time ago. Unfortunately, I
>>>did not have net access at the time. I have one question. Why is BA with
>>>men in scoring position, men in scoring position and 2 outs and men in
>>>scoring position late in the game not a good measure of clutch hitting?
>
>>Second, like all other measures of clutch hitting we've found so far, it
>>doesn't correlate well from year to year. That a hitter hit well with RISP
>>last year tells us nothing about whether he's likely to do so next year.
>
>As I've posted before, correlation per player from year to year is
>a meaningless measure of this stat. There are too few RISP instances,
>about 30-40 per player per year.

Paul, are we singing from the same choirbook? RISP means Runners in
Scoring Position, and in 1990 Cecil Fielder came up almost 200 times
with RISP. RISP represents almost exactly 25% of overall AB.

>This tells us about as much as a
>player's BA after the first 30-40 atbats of the season. Any stat based
>on so few instances will vary considerably from year to year, so that
>clutch hitting, if it exists, cannot be found by examining yearly stats
>for each player. Using a different definition of clutch that includes a
>lot more situations might help, but that seems to contradict the
>meaning of clutch.

a) see above.

b) What we tend to do is measure the *difference* between RISP
performance and non-RISP. Surely you're not going to tell us that a
2-sample t-test with n1=150 and n2=450 can't yield a significant
result!

c) We don't need to use individual players, either. All we need to do
is establish a class of players whose RISP BA (or stat of choice) is
*significantly* higher than their non-RISP BA/whatever. That's not
impossible to do, given the number of PA in a year. We take those
players, *each* of whom was significantly RISP-positive, if you wish,
and compare them a year later. The *group* RISP-differential can be
compared to that of the previous year, and plenty of significance testing
can be done.

d) Sherri and I are working on a study that may further irritate the
"clutch" folk, based on an aspect of LIPS (Late inning Pressure
Situations) that hadn't previously occurred to most people.

e) If you *were* thinking of LIPS and not RISP, then it's 14-15% of all
PA. And you can still do studies based on more than one person.

f) If you have data from enough years, LIPS, RISP, and LIPS*RISP data
can *all* be used for a given player; eventually you *do* have enough
observations.

g) It's also possible to trace the differences over the years. Simple +
and - plots, taken for enough players, will provide an idea of whether
there are more players with excess + (or -) than chance would dictate.
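
Point (b) can be sketched as follows. A two-proportion z-test is swapped in here for the two-sample t-test because each at-bat is a hit-or-no-hit outcome; at these sample sizes the two give much the same answer. The hit totals are hypothetical, not any real player's line.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(hits1, ab1, hits2, ab2):
    """Return (z, two-sided p-value) for the difference between two batting averages."""
    p1, p2 = hits1 / ab1, hits2 / ab2
    pooled = (hits1 + hits2) / (ab1 + ab2)
    se = sqrt(pooled * (1 - pooled) * (1 / ab1 + 1 / ab2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical hitter A: .320 with RISP (48/150) vs. about .251 otherwise (113/450).
z_small, p_small = two_proportion_z(48, 150, 113, 450)
# Hypothetical hitter B: .360 with RISP (54/150) vs. the same non-RISP line.
z_large, p_large = two_proportion_z(54, 150, 113, 450)

print(f"A: z = {z_small:.2f}, p = {p_small:.3f}")
print(f"B: z = {z_large:.2f}, p = {p_large:.3f}")
```

With n1=150 and n2=450 a significant result is possible, but the gap has to be on the order of 80 points before it clears the usual 5% threshold.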

Roger

PS: One study I'm working on involves variances of offense measures.
You mentioned some time ago that you'd tested these; do you have your
data, or at least a description of what you did? I'd be grateful. --R


David Grabiner

Feb 18, 1992, 6:02:13 PM
In article <1992Feb18.1...@sjuphil.uucp>, Paul Benjamin writes:

> In article <1992Feb15.0...@adobe.com> snic...@adobe.com (Sherri Nichols) writes:
>>In article <1992Feb14....@odin.corp.sgi.com> lu...@bullpen.csd.sgi.com (Randy Palermo) writes:
>>>I know this subject was widely discussed some time ago. Unfortunately, I
>>>did not have net access at the time. I have one question. Why is BA with
>>>men in scoring position, men in scoring position and 2 outs and men in
>>>scoring position late in the game not a good measure of clutch hitting?

>>Second, like all other measures of clutch hitting we've found so far, it
>>doesn't correlate well from year to year. That a hitter hit well with RISP
>>last year tells us nothing about whether he's likely to do so next year.

> As I've posted before, correlation per player from year to year is
> a meaningless measure of this stat. There are too few RISP instances,
> about 30-40 per player per year.

Actually, 1/4 of all at-bats are with runners in scoring position; a
regular thus gets 150 such at-bats, but only 20 with runners in scoring
position in the late innings of close games.

> This tells us about as much as a player's BA after the first 30-40
> atbats of the season. Any stat based on so few instances will vary
> considerably from year to year, so that clutch hitting, if it exists,
> cannot be found by examining yearly stats for each player.

Correcting 30-40 to 150, that still isn't enough to be statistically
significant for one player; the standard deviation is over 40 points.
But you can look at hitters as a group; if there is a real effect, it
should be significant when you look at a large sample.
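
The "over 40 points" figure can be checked with a short calculation. This is a sketch assuming a .260 hitter, 150 RISP at-bats against 450 others, and independent at-bats; the function name and the 30-season pool are hypothetical.

```python
from math import sqrt


def differential_sd(true_ba, risp_ab, other_ab):
    """SD of (RISP BA - non-RISP BA) when the true average is the same in both splits."""
    return sqrt(true_ba * (1 - true_ba) * (1 / risp_ab + 1 / other_ab))


one_player = differential_sd(0.260, 150, 450)
pooled_30 = differential_sd(0.260, 30 * 150, 30 * 450)  # 30 player-seasons lumped together

print(f"one player-season: about {one_player * 1000:.0f} points")
print(f"30 player-seasons pooled: about {pooled_30 * 1000:.1f} points")
```

Pooling shrinks the noise by the square root of the number of seasons, which is why a group can show an effect that a single player's line cannot.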

I looked at the players who had gained or lost the most with runners in
scoring position over a period of time, and checked their performance in
1989-1991.

For RISP clutch hitters, I had 24 full seasons and 10 half seasons, for
a total of about 4050 AB. The batting averages increased by .012 with
RISP.

An average clutch hitter has no change in batting average with RISP.

For RISP chokers, I had 31 full seasons and 8 half seasons, for a total
of about 4900 AB. The batting averages decreased by .001.

The difference of .013 between good and bad clutch hitters isn't
statistically significant; the standard deviation of the difference is
about .010. And even if it is real, the value of this difference is
less than two hits a year.
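
The arithmetic behind those two sentences, as a minimal sketch. The .013 difference and .010 standard deviation are the figures quoted above; the normal approximation and the 150 RISP at-bats per season are assumptions.

```python
from statistics import NormalDist

diff = 0.013            # clutch group minus choker group, from the figures above
sd_diff = 0.010         # standard deviation of that difference, as quoted
risp_ab_per_year = 150  # assumed RISP at-bats for a regular

z = diff / sd_diff
p_two_sided = 2 * (1 - NormalDist().cdf(z))
extra_hits = diff * risp_ab_per_year

print(f"z = {z:.2f}, two-sided p = {p_two_sided:.2f}")
print(f"extra hits per year if the gap were real: {extra_hits:.2f}")
```

A z of 1.3 falls short of the conventional 1.96 cutoff, and 13 points over 150 at-bats works out to just under two hits.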

A study of batting in late-inning pressure situations gave similar
results.

So you can say, "Yes, I know that X hit just .258, but he's a great
clutch hitter. That means that he's a more valuable hitter than that
choker Y who hit .261 with the same power last year." This is why I
don't care about clutch hitting; if the ability does exist, it isn't
important enough to worry about.

(Yes, I know that the study shouldn't be based on batting average, but
that's what Elias gave me to work with.)

--
David Grabiner, grab...@zariski.harvard.edu
"We are sorry, but the number you have dialed is imaginary."
"Please rotate your phone 90 degrees and try again."
Disclaimer: I speak for no one and no one speaks for me.

Greg Spira

Feb 18, 1992, 5:30:42 PM
In article <1992Feb18.1...@sjuphil.uucp> pben...@sjuphil.uucp (Paul Benjamin) writes:
The biggest problem with your argument is that each player has a lot more than
30-40 RISP instances each year - the average regular has over 100 RISP
opportunities, and it is possible to get over 190 RISP at-bats in a year -
just ask Joe Carter.

Now, I'm not an expert statistician, so I can't say what the statistical
significance of the results of 150 at-bats is - but I do know that there
is SOME significance.

John H. Rickert

Feb 19, 1992, 10:58:23 AM
In article <1992Feb18.2...@panix.com> sp...@panix.com (Greg Spira)
writes:

> 30-40 RISP instances each year - the average regular has over 100 RISP
> opportunities, and it is possible to get over 190 RISP at-bats in a year -
> just ask Joe Carter.

Shucks. I left the list at home.
Yesterday I went through the '92 Great American Stat Book (GASB)
and totaled AB+BB (I'll call it ABBB) in
RISP (runners in scoring position) for the regulars.
I also looked at bases empty (R0) and runner on 1b only (R1B) totals.
(To approximate RBI opportunities).
per 600 ABBB, the AL average was about 155 RISP ABBB
(I'm going into initial shock (IS) soon).
The leaders (to the best of my recollection) were

212 Joe Carter (surprise :-/ )
209 Cecil Fielder
204 Ruben Sierra
200 Ken Griffey (well which one do you *think* it is?)
199 Ron Gant
196 Fred McGriff (finally got away from George Bell)
194 Barry Bonds
194 Juan Gonzalez
194 Pete O'Brien (Oh my.)
193 Tim Wallach (Wow.)
192 Chili Davis
192 Frank Thomas

Any corrections needed I'll post in the future, but this looks right.

john rickert
ric...@sabr.nextwork.rose-hulman.edu

Gerald R Hobbs

Feb 19, 1992, 2:11:35 PM
[David Grabiner]

I looked at the players who had gained or lost the most with runners in
scoring position over a period of time, and checked their performance in
1989-1991.

For RISP clutch hitters, I had 24 full seasons and 10 half seasons, for
a total of about 4050 AB. The batting averages increased by .012 with
RISP.

An average clutch hitter has no change in batting average with RISP.

For RISP chokers, I had 31 full seasons and 8 half seasons, for a total
of about 4900 AB. The batting averages decreased by .001.

The difference of .013 between good and bad clutch hitters isn't
statistically significant; the standard deviation of the difference is
about .010. And even if it is real, the value of this difference is
less than two hits a year.

A study of batting in late-inning pressure situations gave similar
results.

So you can say, "Yes, I know that X hit just .258, but he's a great
clutch hitter. That means that he's a more valuable hitter than that
choker Y who hit .261 with the same power last year." This is why I
don't care about clutch hitting; if the ability does exist, it isn't
important enough to worry about.

(Yes, I know that the study shouldn't be based on batting average, but
that's what Elias gave me to work with.)

[Me]
Nice work! So maybe the best way to present this is to present a
one-sided confidence interval, as in "I'm 95% sure that the difference
in clutch average for the 2 groups is less than .0??". It should help to
dispel the notion that the clutch hitters are MUCH better in RISP/LIPS
circumstances. I guess the downside is that the .0?? will be about .030
and someone with a typewriter and a press pass will say you showed the
difference was 30 points!
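
The one-sided bound can be sketched like this. It assumes the difference is roughly normal and reuses the .013 and .010 figures from the quoted study; the variable names are only for illustration.

```python
from statistics import NormalDist

diff = 0.013     # observed gap between the two groups, from the quoted study
sd_diff = 0.010  # standard deviation of that gap, as quoted

z_95 = NormalDist().inv_cdf(0.95)    # one-sided 95% critical value
upper_bound = diff + z_95 * sd_diff  # "95% sure the true gap is below this"

print(f"one-sided 95% upper bound: {upper_bound:.4f}")
```

The bound comes out just under 30 points, which lines up with the "about .030" estimate.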

Gerry


Paul Benjamin

Feb 19, 1992, 11:51:04 AM

Please read my incredibly long posting in response to Roger. I'd
appreciate your response to the two main points I made: that
using more players doesn't necessarily help, and that clutch
performance needs to be compared to clutch expectation, not non-clutch
performance.

Paul Benjamin

P.S. The emailer and news software here just doesn't like me. I keep
trying to respond to your email, but it just comes back. I'll keep
on trying.

Paul Benjamin

Feb 19, 1992, 11:47:37 AM
In article <1992Feb18.1...@Princeton.EDU> ro...@phoenix.Princeton.EDU (Roger Lustig) writes:
>In article <1992Feb18.1...@sjuphil.uucp> pben...@sjuphil.uucp (Paul Benjamin) writes:
>>In article <1992Feb15.0...@adobe.com> snic...@adobe.com (Sherri Nichols) writes:
>>>In article <1992Feb14....@odin.corp.sgi.com> lu...@bullpen.csd.sgi.com (Randy Palermo) writes:
>>>>I know this subject was widely discussed some time ago. Unfortunately, I
>>>>did not have net access at the time. I have one question. Why is BA with
>>>>men in scoring position, men in scoring position and 2 outs and men in
>>>>scoring position late in the game not a good measure of clutch hitting?

>>>Second, like all other measures of clutch hitting we've found so far, it
>>>doesn't correlate well from year to year. That a hitter hit well with RISP
>>>last year tells us nothing about whether he's likely to do so next year.

>>As I've posted before, correlation per player from year to year is
>>a meaningless measure of this stat. There are too few RISP instances,
>>about 30-40 per player per year.

>Paul, are we singing from the same choirbook? RISP means Runners in
>Scoring Position, and in 1990 Cecil Fielder came up almost 200 times
>with RISP. RISP represents almost exactly 25% of overall AB.

No, apparently we're singing entirely different tunes. I was referring
to RISP in late innings of close games, meaning LISP, oops, LIPS.
Sorry for the acronym confusion.

>b) What we tend to do is measure the *difference* between RISP
>performance and non-RISP. Surely you're not going to tell us that a
>2-sample t-test with n1=150 and n2=450 can't yield a significant
>result!

Two things: 1) see above for n
            2) what we need to do is measure the difference between
               RISP and the expected production in RISP. This is not
               at all the same as non-RISP production. To compare RISP
               with non-RISP assumes that all other relevant factors
               are constant. This is likely not so: the population of
               pitchers in late inning close situations is definitely
               different than in non-RISP situations, as people like
               Eckersley, Williams, et al. appear primarily in RISP.
               Also include things like defensive replacements, pinch
               hitters/runners, etc. some of which may be insignificant.

>c) We don't need to use individual players, either. All we need to do
>is establish a class of players whose RISP BA (or stat of choice) is
>*significantly* higher than their non-RISP BA/whatever. That's not
>impossible to do, given the number of PA in a year. We take those
>players, *each* of whom was significantly RISP-positive, if you wish,
>and compare them a year later. The *group* RISP-differential can be
>compared to that of the previous year, and plenty of significance testing
>can be done.

Again, compare it with RISP expectation, not non-RISP. But even then,
this is questionable. Consider the following standard case of a
stochastic process (excuse the length of this posting, too!):

Consider Brownian motion, one of the classical stochastic processes.
If we dump some small particles of sand in a glass of water, we might
like to see if they will eventually fall to the bottom. Due to Brownian
motion, the impacts of the water molecules will keep the smallest
particles suspended, and will slow the motion of the slightly larger
particles towards the bottom. So the question that is analogous to
clutch is: is the effect of gravity strong enough to eventually pull
the particles to the bottom? Let us assume that the particles are
large enough to eventually move down, but small enough that they do
so very slowly (analogy: clutch is not too strong an effect.)

If we examine a slice of time large enough to see only a few dozen
impacts of the water molecules, we cannot know the answer to this
question. Each particle will move up about as much as it moves down,
even if it is heavy enough to eventually fall. We need to examine a
much longer period of time.

Now, the players are the particles, and gravity is clutch hitting.
The point of all this is that considering data from more players does
not help at all. It's exactly the same as using a larger glass of water.
It wouldn't matter if the glass were as large as the Pacific - the
small time-frame keeps us from knowing whether the particles are heavy
enough. Looking at more particles doesn't give us any more information.

And examining a subset of the particles, e.g., the ones that are closer
to the bottom at a particular moment, need not tell us anything, because
they may be closer to the bottom because they are heavier (better clutch
hitters) or just because of the random currents in the water (natural
variation of hitting production). So detecting clutch hitting could be
*extremely* difficult.
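
The short-window part of the analogy can be put in numbers with a minimal sketch. It models one particle as a random walk with a small constant downward drift and uses a normal approximation for the net displacement; the drift, step size, and step counts are made-up values, and it only covers a single particle, not the question of pooling many.

```python
from math import sqrt
from statistics import NormalDist


def prob_net_downward(drift, step_sd, n_steps):
    """Chance that one particle has moved net downward after n_steps (normal approximation)."""
    return NormalDist().cdf(drift * sqrt(n_steps) / step_sd)


short_window = prob_net_downward(drift=0.02, step_sd=1.0, n_steps=36)
long_window = prob_net_downward(drift=0.02, step_sd=1.0, n_steps=10_000)

print(f"after a few dozen impacts: {short_window:.2f}")
print(f"after ten thousand impacts: {long_window:.2f}")
```

Over a few dozen steps the particle is barely more likely to be down than up, while over a long enough run the drift shows through.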

>d) Sherri and I are working on a study that may further irritate the
>"clutch" folk, based on an aspect of LIPS (Late inning Pressure
>Situations) that hadn't previously occurred to most people.

Interesting, but please don't include me in clutch folk (just in case
you were).

>e) If you *were* thinking of LIPS and not RISP, then it's 14-15% of all
>PA. And you can still do studies based on more than one person.

Yes, but not for the purposes of classifying individuals.

>f) If you have data from enough years, LIPS, RISP, and LIPS*RISP data
>can *all* be used for a given player; eventually you *do* have enough
>observations.

Yes, but this doesn't help classify clutch hitters until their careers
are at least well along, or maybe over. For example, classifying Bonds
as a postseason clutch choker (as a number of clutch people have
claimed) based on his performance so far is unjustifiable.

>PS: One study I'm working on involves variances of offense measures.
>You mentioned some time ago that you'd tested these; do you have your
>data, or at least a description of what you did? I'd be grateful. --R

Ah, email! I sent you those things more than a year ago. Well, I
will have to try to dig those out again.

Paul Benjamin

Stephen Kingwin Wong

Feb 20, 1992, 12:44:22 PM

I'm a Yankees fan up here in Troy, NY. I don't get to hear very much about
baseball in New York City, so I was wondering if you all out there can answer
my questions. First, I want to know if the Yanks have made any deals after
the Steve Sax/Perez trade. Second, does anyone know if Steve Howe is reporting
for training camp? I know he's been in a lot of trouble lately, but I'm not
sure if Fay Vincent has reviewed the case.

Randy Palermo

Feb 20, 1992, 2:44:09 PM
In article <1992Feb18.1...@sjuphil.uucp> pben...@sjuphil.uucp (Paul Benjamin) writes:
I agree that "clutch" if it exists would be very difficult to measure.
It might be easier if measured over a longer period of time, say 5+ years.
I am confused about one thing. You stated that there are only about 30-40
RISP opportunities per year per player. Is this figure correct? I would
have thought it was much higher than that. Perhaps this is the league
average.

luigi
>Paul Benjamin

John H. Rickert

Feb 20, 1992, 4:01:18 PM
In article <1992Feb19....@sjuphil.uucp> pben...@sjuphil.uucp (Paul
Benjamin) writes:
> 2) what we need to do is measure the difference between
> RISP and the expected production in RISP. This is not
> at all the same as non-RISP production. To compare RISP
> with non-RISP assumes that all other relevant factors
> are constant. This is likely not so: the population of
> pitchers in late inning close situations is definitely
> different than in non-RISP situations, as people like
> Eckersley, Williams, et al. appear primarily in RISP.
> Also include things like defensive replacements, pinch
> hitters/runners, etc. some of which may be insignificant.

But isn't this taken into account if we take a look at what
the league non-RISP, RISP stats are?
Or how about the non-RISP, RISP totals for all regulars?
If not, how is this failing to measure the difference in quality
of pitchers?

john rickert
ric...@sabr.nextwork.rose-hulman.edu

Roger Lustig

Feb 20, 1992, 3:56:15 PM
In article <1992Feb19....@sjuphil.uucp>

OK, that's squared away. Now, since we know that hitting with men in
scoring position *is* more productive than other hitting (teams that
elevate their RISP production score more runs than predicted), it's not
unreasonable to use this simplest definition of "clutch" (i.e., hitting
when it counts) first, before going for smaller fish of bigger
importance to the "clutch" worldview. So, for now, let's DO stay with
RISP, and not insist on RISP*LIPS and the like. There ought to be an
effect with plain RISP, if the "clutch" concept is valid.

>>b) What we tend to do is measure the *difference* between RISP
>>performance and non-RISP. Surely you're not going to tell us that a
>>2-sample t-test with n1=150 and n2=450 can't yield a significant
>>result!

>Two things: 1) see above for n
> 2) what we need to do is measure the difference between
> RISP and the expected production in RISP. This is not
> at all the same as non-RISP production. To compare RISP
> with non-RISP assumes that all other relevant factors
> are constant.

Granted, and most studies I've seen/done adjust for this. RISP hitting
is generally a little higher, say, .008 of BA.
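
A minimal sketch of that adjustment: subtract the league-wide RISP bump from a player's raw differential. The .008 figure is the rough one given above; the function name and the sample averages are hypothetical.

```python
LEAGUE_RISP_BUMP = 0.008  # approximate league-wide gain with RISP, as stated above


def adjusted_differential(risp_ba, non_risp_ba, league_bump=LEAGUE_RISP_BUMP):
    """Player's RISP minus non-RISP average, net of what the league gains with RISP."""
    return (risp_ba - non_risp_ba) - league_bump


# Made-up hitter: .290 with RISP, .270 otherwise.
print(f"{adjusted_differential(0.290, 0.270):+.3f}")
```

A hitter who gains exactly the league bump comes out at zero, which is the expectation a no-clutch hitter should be measured against.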

> This is likely not so: the population of
> pitchers in late inning close situations is definitely
> different than in non-RISP situations, as people like
> Eckersley, Williams, et al. appear primarily in RISP.
> Also include things like defensive replacements, pinch
> hitters/runners, etc. some of which may be insignificant.

VERY interesting point. Watch this space: Sherri and I have some stuff
cooking...I'll let you guess for now.

(Of course, this stuff doesn't apply for plain RISP, another reason for
looking under that particular lamppost first.)

>>c) We don't need to use individual players, either. All we need to do
>>is establish a class of players whose RISP BA (or stat of choice) is
>>*significantly* higher than their non-RISP BA/whatever. That's not
>>impossible to do, given the number of PA in a year. We take those
>>players, *each* of whom was significantly RISP-positive, if you wish,
>>and compare them a year later. The *group* RISP-differential can be
>>compared to that of the previous year, and plenty of significance testing
>>can be done.

>Again, compare it with RISP expectation, not non-RISP. But even then,
>this is questionable. Consider the following standard case of a
>stochastic process (excuse the length of this posting, too!):

>Consider Brownian motion, one of the classical stochastic processes.
>If we dump some small particles of sand in a glass of water, we might
>like to see if they will eventually fall to the bottom. Due to Brownian
>motion, the impacts of the water molecules will keep the smallest
>particles suspended, and will slow the motion of the slightly larger
>particles towards the bottom. So the question that is analogous to
>clutch is: is the effect of gravity strong enough to eventually pull
>the particles to the bottom? Let us assume that the particles are
>large enough to eventually move down, but small enough that they do
>so very slowly (analogy: clutch is not too strong an effect.)

I'm lost. Are you saying that there are some clutch hitters whose
batting averages will not be any higher in the clutch?

I might point out that home runs are only about 3% of all at-bats. Can
it be that there are home run hitters who don't hit home runs?

That said, the analogy completely eludes me. Are different players the
different-sized particles? If so, what is the analogy to size? What if
we start with uniform sand? Surely a more regular effect will be
evident.

Start over, OK?

>If we examine a slice of time large enough to see only a few dozen
>impacts of the water molecules, we cannot know the answer to this
>question. Each particle will move up about as much as it moves down,
>even if it is heavy enough to eventually fall. We need to examine a
>much longer period of time.

On the other hand, the total number of impacts is vastly larger than the
total number of AB in a career. So again the analogy eludes me.

You also seem to be ignoring what we already know about variances of
batting averages per se. We *do* know how they vary from year to year.
With a null hypothesis of no clutch, we can predict the variance of
"clutch" BA from year to year as well. If the variance is lower than
expected, then there should be some effect we haven't accounted for.

I'm really not sure what's bugging you about time here.

>Now, the players are the particles, and gravity is clutch hitting.

Why? For one thing, that assumes that there *is* clutch hitting; our
null hypothesis is that there *isn't.* If we test for evidence of
gravity by dumping sand in water, we get a pretty unequivocal result
pretty fast. Surely it's the hypothesized suspensive effect of
molecular motion that would have to be the analogue to clutch.

>The point of all this is that considering data from more players does
>not help at all. It's exactly the same as using a larger glass of water.
>It wouldn't matter if the glass were as large as the Pacific - the
>small time-frame keeps us from knowing whether the particles are heavy
>enough. Looking at more particles doesn't give us any more information.

Again, the analogy makes no sense to me. We're not trying to find out
whether all players behave the same; we want to know a specific thing:
is an OBSERVED variation inexplicable by mere chance? Your experiment
is looking for something entirely different.

We KNOW that some players hit better in the clutch than otherwise --
over some time period. Now, we look to see whether, in the absence of
some clutch effect (cause unknown, btw; we don't have things like
Brownian motion and gravity as explanations or predictors), the observed
situation would be likely to occur. We are not seeking explanation; we
are seeking an answer to the question: is there anything to explain in
the first place? And, if we DO find that there's something to explain,
will it be useful in predicting future behavior?

>And examining a subset of the particles, e.g., the ones that are closer
>to the bottom at a particular moment, need not tell us anything, because
>they may be closer to the bottom because they are heavier (better clutch
>hitters) or just because of the random currents in the water (natural
>variation of hitting production). So detecting clutch hitting could be
>*extremely* difficult.

Again, examining a group is a PRELIMINARY. If you can't detect it in a
group, you SURELY won't detect it in an individual. And since we're
creating the group by using the hallmarks we WOULD use in evaluating an
individual (assuming the effect was there), this is not a problem.

Our question is: what would it look like if there was no clutch? Would
it look different?

If there were no gravity, things WOULD look quite different, believe me.
ALL the sand would remain suspended; or it wouldn't even fall into the
water.

>>d) Sherri and I are working on a study that may further irritate the
>>"clutch" folk, based on an aspect of LIPS (Late inning Pressure
>>Situations) that hadn't previously occurred to most people.

>Interesting, but please don't include me in clutch folk (just in case
>you were).

Did I say I had? 8-)

>>e) If you *were* thinking of LIPS and not RISP, then it's 14-15% of all
>>PA. And you can still do studies based on more than one person.

>Yes, but not for the purposes of classifying individuals.

Sure you can. You can show that the classifier is not strong enough to
classify the super-individual that the group represents, and therefore
not any of the individuals in the group. (This assumes only that other
variables not being used in the study are in fact not relevant to the
issue. Suffice it to say that these can be considered a priori and
included according to need, wish, or this week's definition of clutch.)

>>f) If you have data from enough years, LIPS, RISP, and LIPS*RISP data
>>can *all* be used for a given player; eventually you *do* have enough
>>observations.

>Yes, but this doesn't help classify clutch hitters until their careers
>are at least well along, or maybe over. For example, classifying Bonds
>as a postseason clutch choker (as a number of clutch people have
>claimed) based on his performance so far is unjustifiable.

Not the point. The point is: should we bother? If the answer to THAT
one is "no," then all the attempts to classify are irrelevant, and the
problems with those attempts are not problems to us.

>>PS: One study I'm working on involves variances of offense measures.
>>You mentioned some time ago that you'd tested these; do you have your
>>data, or at least a description of what you did? I'd be grateful. --R

>Ah, email! I sent you those things more than a year ago. Well, I
>will have to try to dig those out again.

Oh, dear. I don't recall getting it; I'd be most grateful.

Roger
>
>Paul Benjamin
>

Paul Benjamin

Feb 21, 1992, 10:38:21 AM
In article <1992Feb20.1...@odin.corp.sgi.com> lu...@sgi.com (Randy Palermo) writes:
>In article <1992Feb18.1...@sjuphil.uucp> pben...@sjuphil.uucp (Paul Benjamin) writes:
>>In article <1992Feb15.0...@adobe.com> snic...@adobe.com (Sherri Nichols) writes:
>>>In article <1992Feb14....@odin.corp.sgi.com> lu...@bullpen.csd.sgi.com (Randy Palermo) writes:

>I agree that "clutch" if it exists would be very difficult to measure.
>It might be easier if measured over a longer period of time, say 5+ years.
>I am confused about one thing. You stated that there are only about 30-40
>RISP opportunities per year per player. Is this figure correct? I would
>have thought it was much higher than that. Perhaps this is the league
>average.
>luigi

Sorry, I meant LIPS (late inning pressure situations). As David
Grabiner has pointed out to me in email, it comes to about 20 AB per
player per year.

Paul Benjamin

Paul Benjamin

Feb 21, 1992, 10:46:42 AM

Again, I really meant LIPS. In late innings of close games, teams
use their relief aces, who tend to hold opposing batters to low
batting averages. When the game isn't close, the mopup pitchers
come in, and give up more hits. So the expectation is that hitters
should do worse in LIPS than in non-LIPS, and we need to compare
the LIPS performance with LIPS expectation.

As David Grabiner points out, there may even be effects that are
not uniform among players, e.g., late inning aces may tend to be
more fastball types, thus changing the mix of pitches faced in
LIPS situations, and tilting the averages towards fastball hitters.

Paul Benjamin

Paul Benjamin

Feb 22, 1992, 3:19:42 PM
(Roger Lustig) writes:
>In article <1992Feb19....@sjuphil.uucp>

>>Consider the following standard case of a
>>stochastic process (excuse the length of this posting, too!):

>>Consider Brownian motion, one of the classical stochastic processes.
>>If we dump some small particles of sand in a glass of water, we might
>>like to see if they will eventually fall to the bottom. Due to Brownian
>>motion, the impacts of the water molecules will keep the smallest
>>particles suspended, and will slow the motion of the slightly larger
>>particles towards the bottom. So the question that is analogous to
>>clutch is: is the effect of gravity strong enough to eventually pull
>>the particles to the bottom? Let us assume that the particles are
>>large enough to eventually move down, but small enough that they do
>>so very slowly (analogy: clutch is not too strong an effect.)

>I'm lost. Are you saying that there are some clutch hitters whose
>batting averages will not be any higher in the clutch?

If we cannot get sufficient data points for an individual player,
then there may well be clutch hitters whose measured averages will
not show their true ability. Consider the case when a left-handed
clutch hitter happens to face lefties in 70% of his late-inning
close game RISP situations over a few years. His average will not
indicate clutch ability, due to an effect we already know is important.
Now, we can eliminate such effects by averaging over all players,
but then we are averaging any clutch hitters that might exist in
with clutch chokers, and cancelling the effect of clutch hitting.

>I might point out that home runs are only about 3% of all at-bats. Can
>it be that there are home run hitters who don't hit home runs?

If not enough AB are counted. Consider Bonilla. He hit only 3 HR his
first year in the majors, then 15 his second, and 25+ (or so) since.
There can be HR hitters who don't hit many HRs over periods of a few
hundred ABs, so why can't there be clutch hitters who don't hit many
clutch hits over similar periods (especially at the beginning of
their major league careers)?

>That said, the analogy completely eludes me. Are different players the
>different-sized particles? If so, what is the analogy to size? What if
>we start with uniform sand? Surely a more regular effect will be
>evident.

effect of clutch hitting = effect of gravity
size of particles = clutch hitting ability

If we have uniform sand ( = all hitters are roughly equal clutch hitters,
so no real clutch hitters exist, just hitters) then there still will be
some particles closer to the bottom and some closer to the top. Try it,
and you'll see that the particles don't fall identically to the bottom.
But this can't be due to different effects of gravity on the different
particles, as they are all of the same size ( = even if some hitters
are perceived as better/worse at clutch hitting, it need not be due to
any different clutch hitting ability). Some particles are just going to
get to the bottom faster ( = some hitters are just going to compile
better clutch numbers). This is because each particle undergoes a unique
history of impacts of water particles ( = each hitter has a unique history
of pitchers faced). Now, with hitting in general we have enough at-bats
for major league regulars to closely estimate their hitting ability ( =
we watch the particles for a long time) so that if particle X is well
below another particle Y then we can be sure with high probability that
gravity is having more effect on X, so it is heavier (X has more hitting
ability). But for clutch hitting ability, we are not able to watch the
particle for very long, so its position may be due to other effects,
like currents in the water (e.g., different mix of pitchers faced).
Now, if we make the water container much wider, and pour in more sand
( = we watch a lot more players) we still have absolutely no more
information about any individual. More sand may enable us to detect
some properties of the distribution of the sand in the water ( = we
may be able to detect some macro-effect of clutch hitting) but maybe
not. Perhaps a bootstrapping technique would be useful for this.
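
The "uniform sand" case can be tried on a computer as well as in a glass. The following is only a sketch under stated assumptions: every hitter is given the same true average (.260) in both situations, with 150 RISP at-bats and 450 others per season; those figures, the 30-point cutoff, and the function name apparent_clutch_fraction are all chosen for illustration.

```python
import random

def apparent_clutch_fraction(n_players=2000, risp_ab=150, other_ab=450,
                             true_avg=0.260, gap=0.030, seed=1992):
    """Fraction of identical hitters whose RISP average beats their
    non-RISP average by at least `gap`, purely through chance."""
    rng = random.Random(seed)

    def observed_avg(at_bats):
        # Each at-bat is an independent trial with the same hit probability.
        hits = sum(rng.random() < true_avg for _ in range(at_bats))
        return hits / at_bats

    lucky = 0
    for _ in range(n_players):
        if observed_avg(risp_ab) - observed_avg(other_ab) >= gap:
            lucky += 1
    return lucky / n_players

fraction = apparent_clutch_fraction()
print(f"{fraction:.1%} of identical hitters look 30+ points 'clutch'")
```

Under these assumptions something on the order of one hitter in four or five shows a 30-point "clutch" edge in a given season even though no hitter has any clutch ability at all.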

>>Now, the players are the particles, and gravity is clutch hitting.

>Why? For one thing, that assumes that there *is* clutch hitting; our
>null hypothesis is that there *isn't.* If we test for evidence of
>gravity by dumping sand in water, we get a pretty unequivocal result
>pretty fast. Surely it's the hypothesized suspensive effect of
>molecular motion that would have to be the analogue to clutch.

No. The point of Brownian motion is that for sufficiently light
particles, the water keeps them suspended indefinitely ( = clutch
hitting ability is negligible.)

Paul Benjamin

David Grabiner

Feb 22, 1992, 6:48:13 PM
In article <1992Feb22.2...@sjuphil.uucp>, Paul Benjamin writes:

> (Roger Lustig) writes:

>>I'm lost. Are you saying that there are some clutch hitters whose
>>batting averages will not be any higher in the clutch?

> If we cannot get sufficient data points for an individual player,
> then there may well be clutch hitters whose measured averages will
> not show their true ability. Consider the case when a left-handed
> clutch hitter happens to face lefties in 70% of his late-inning
> close game RISP situations over a few years. His average will not
> indicate clutch ability, due to an effect we already know is important.
> Now, we can eliminate such effects by averaging over all players,
> but then we are averaging any clutch hitters that might exist in
> with clutch chokers, and cancelling the effect of clutch hitting.

Now that I understand this argument, I can respond to it properly. The
above argument doesn't apply to what I did.

If there is an ability to hit better in the clutch, but there is also a
large random effect, it will be more likely for good clutch hitters to
have good clutch statistics. For example, if there are players who have
the ability to hit 30 points better in LIPS, then half of these players
should hit at least 29 points better in LIPS over the period 1979-1988.
(This was one of the cutoffs for my study; seven players active in 1989
met it.)

Thus, if there is an ability to hit in the clutch, then players who have
previously hit well in the clutch would be expected to hit well in
subsequent years. If clutch hitting has very little to do with the
ability to hit in the clutch, there should be only a weak correlation
between past and present clutch performance.

And the weak correlation is what I found; players who had previously
established themselves as good clutch hitters gained .004 in LIPS
batting average, compared to a league average of -.007 and -.017 for
chokers. This is not statistically significant, although it may become
statistically significant when I can add 1991 data. (The correlation
for hitting with runners in scoring position was even weaker.)
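
The logic of this kind of follow-up test can be sketched in a simulation. This is not the study itself, and every number in it is an assumption: 4000 simulated players, 200 LIPS at-bats per period (roughly ten years at 20 per year), a .260 baseline, and a hypothetical 20-point standard deviation of true clutch ability in the "real effect" world. The function name followup_of_top_group is made up for the example.

```python
import random

def followup_of_top_group(talent_sd, n_players=4000, at_bats=200,
                          base_avg=0.260, top_share=0.10, seed=7):
    """Pick the best clutch performers of period 1 and report their mean
    shift from the baseline in period 1 and again in period 2."""
    rng = random.Random(seed)

    def observed_shift(true_avg):
        hits = sum(rng.random() < true_avg for _ in range(at_bats))
        return hits / at_bats - base_avg

    players = []
    for _ in range(n_players):
        # True clutch average; talent_sd=0 means nobody has clutch ability.
        true_avg = base_avg + rng.gauss(0.0, talent_sd)
        players.append((observed_shift(true_avg), observed_shift(true_avg)))

    # Select on period 1 only, then look at what the same players did next.
    players.sort(key=lambda pair: pair[0], reverse=True)
    top = players[: int(n_players * top_share)]
    before = sum(first for first, _ in top) / len(top)
    after = sum(second for _, second in top) / len(top)
    return before, after

null_before, null_after = followup_of_top_group(talent_sd=0.0)
real_before, real_after = followup_of_top_group(talent_sd=0.020)
print(f"no ability:   selected at {null_before:+.3f}, later {null_after:+.3f}")
print(f"real ability: selected at {real_before:+.3f}, later {real_after:+.3f}")
```

In the no-ability world the selected group falls back to about zero in the second period; in the world with real ability it keeps part of its edge. A weak follow-up in actual data therefore points toward a small spread of true ability, whatever individual players may be doing.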

Since I don't have enough data on any individual player, my study cannot
estimate an individual player's clutch ability. It might be the case
that Tony Fernandez actually hits much better in the clutch, Greg Brock
actually hits much worse, and nobody else has any clutch ability at all.

But it seems more likely that clutch ability, like most abilities,
comes from something approximating a normal distribution. You might
lose part of the normal distribution because baseball players are not a
random sample, but the distribution should still look reasonable. Thus,
if there were players who had the ability to gain or lose 40 points more
than normal, there should be many more who had the ability to gain or
lose 20 points, and that *is* inconsistent with my studies.
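
The arithmetic behind that step can be sketched as follows, assuming (purely for illustration) that true clutch ability is normally distributed around zero with a standard deviation of 20 points; the figure and the function name share_beyond are hypothetical.

```python
from statistics import NormalDist

def share_beyond(points, talent_sd_points):
    """Two-sided share of players whose true clutch shift, in points of
    batting average, is at least `points` in either direction."""
    dist = NormalDist(mu=0.0, sigma=talent_sd_points)
    return 2 * (1 - dist.cdf(points))

sd = 20.0  # hypothetical spread of true clutch ability, in points
share_40 = share_beyond(40, sd)
share_20 = share_beyond(20, sd)
print(f"40+ points: {share_40:.1%}   20+ points: {share_20:.1%}")
```

Under that assumption about 4.6% of players would have a true shift of 40 points or more, and about 32% a shift of 20 points or more, so a few extreme clutch hitters would imply a large crowd of moderate ones.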

I haven't proven that clutch ability either does or doesn't exist, but I
have proven that, if it does exist, it isn't very important. You can
use the data as a basis for saying, "I know Player X hit just .260 this
year, but he's a great clutch hitter. I'd rather have him on my team
than that choker Y, even though he hit .263." But if you replace .263
by even .270, you have gone three standard deviations beyond what the
data justifies. And most fans discussing clutch hitters would be
willing to make this statement with .290 replacing .263.

Roger Lustig

Feb 23, 1992, 1:25:54 AM

Of course. But this is a different experiment from the one we're
proposing. We want to know WHETHER there is clutch hitting ability;
your experiment postulates it.

>But this can't be due to different effects of gravity on the different
>particles, as they are all of the same size ( = even if some hitters
>are perceived as better/worse at clutch hitting, it need not be due to
>any different clutch hitting ability). Some particles are just going to
>get to the bottom faster ( = some hitters are just going to compile
>better clutch numbers). This is because each particle undergoes a unique
>history of impacts of water particles ( = each hitter has a unique history
>of pitchers faced). Now, with hitting in general we have enough at-bats
>for major league regulars to closely estimate their hitting ability ( =
>we watch the particles for a long time) so that if particle X is well
>below another particle Y then we can be sure with high probability that
>gravity is having more effect on X, so it is heavier (X has more hitting
>ability). But for clutch hitting ability, we are not able to watch the
>particle for very long, so its position may be due to other effects,
>like currents in the water (e.g., different mix of pitchers faced).

a) for some definitions of clutch hitting (e.g., RISP hitting) we CAN
watch quite a lot. Only when we get to things like the LIPS*RISP
definition does this point have any relevance.

b) with a definition such as RISP, you get rid of a lot of the other
sources of variation you suggest, e.g., pitcher changes.

c) as I said, Sherri and I are working on a study that, we hope, WILL
show one of these effects.

d) we can also look at an actual, documented effect whose existence has
been validated by just the methods I described. It is the righty-lefty
effect. A simple comparison of two kinds of events, namely same-hand
vs. cross-hand matchups. Likewise, we can compare RISP hitting vs.
other hitting, determine a baseline, find variance estimates, and look
at stability over time. Whether with one player, or with a group, we
CAN use R/L differences to make predictions. Using the same methods, we
find that RISP performance does NOT predict future RISP performance.
Where's the difference? What do we learn from the comparison of these
two studies, if not that RISP-clutch is an effect for which we have no
evidence?

>Now, if we make the water container much wider, and pour in more sand
>( = we watch a lot more players) we still have absolutely no more
>information about any individual. More sand may enable us to detect
>some properties of the distribution of the sand in the water ( = we
>may be able to detect some macro-effect of clutch hitting) but maybe
>not.

Why not? After all, that's what we're looking at! We want to know if
ANY designation of "clutch ability" has predictive power. You seem to
be putting the cart before the horse: we don't WANT to make claims about
individual players' clutch ability -- YET. And we *know* that if we
can't do it for a group of players, we won't be able to do it for
individuals.

(By the way, I just looked in the new Stat Book. Darryl Strawberry,
everybody's favorite example of a choker, was back to his pre-1988 form,
and did better than average (and better than the normal gain) in RISP
situations last year.)

> Perhaps a bootstrapping technique would be useful for this.

Paul, you may be right about all your critiques, but they're not
critiques of the studies we're doing.

>>>Now, the players are the particles, and gravity is clutch hitting.

>>Why? For one thing, that assumes that there *is* clutch hitting; our
>>null hypothesis is that there *isn't.* If we test for evidence of
>>gravity by dumping sand in water, we get a pretty unequivocal result
>>pretty fast. Surely it's the hypothesized suspensive effect of
>>molecular motion that would have to be the analogue to clutch.

>No. The point of Brownian motion is that for sufficiently light
>particles, the water keeps them suspended indefinitely ( = clutch
>hitting ability is negligible.)

But your model already assumes that there *is* an effect. We want to
know *whether* there is an effect.

That randomness will cause hypothesized effects to be obscured in small
samples is obvious and granted. But it is also irrelevant, and your
critique of the methods we're interested in using is also off base,
imho.

One thing we're pretty sure of is that the players who do *worse* in the
clutch over some period of time are unlikely to be true clutch hitters,
i.e., are unlikely to have clutch ability that's merely been obscured by
randomness. In any case, they are less likely to have the ability (if
it exists) than those whose numbers show a positive clutch differential.

OK so far? We're better off looking for clutch hitters among those who
actually hit in the clutch last year.

Now, if the effect does exist, then the performance of the supposed
clutch-hitting group should, in the future, be higher than that of a
control group, at least most of the time. And we can test for that.

If you wish, we are not just dumping more light particles in the water,
we are lumping many light particles into a few heavy ones.
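
The "lumping" can be put in numbers with a small sketch. The inputs are assumptions for illustration: a .260 hitter, 150 RISP at-bats in a season, and a group of 25 such players pooled together; batting_avg_se is a made-up helper name.

```python
import math

def batting_avg_se(true_avg, at_bats):
    """Binomial standard error of an observed batting average."""
    return math.sqrt(true_avg * (1 - true_avg) / at_bats)

single = batting_avg_se(0.260, 150)        # one player, one season of RISP
group = batting_avg_se(0.260, 150 * 25)    # 25 players pooled together
print(f"one player: {single:.3f}   pooled group: {group:.3f}")
```

One player's RISP average carries a chance error of about 36 points; the pooled group's carries about 7. A differential too small to see in any one player can still show up clearly in the group.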

Roger

JXR...@psuvm.psu.edu

Feb 23, 1992, 3:15:33 PM
I'm a BIG TIME Yankee fan, but up here at Penn State I don't hear much
about them. I was wondering if anybody knew any trade rumors surrounding
them, and just any roster moves in general.

Paul Benjamin

Feb 23, 1992, 9:14:34 PM
(Roger Lustig) writes:
>(Paul Benjamin) writes:
>>(Roger Lustig) writes:

(lots of stuff about sand and water deleted)

>>effect of clutch hitting = effect of gravity
>>size of particles = clutch hitting ability

>>If we have uniform sand ( = all hitters are roughly equal clutch hitters,
>>so no real clutch hitters exist, just hitters) then there still will be
>>some particles closer to the bottom and some closer to the top. Try it,
>>and you'll see that the particles don't fall identically to the bottom.

>Of course. But this is a different experiment from the one we're
>proposing. We want to know WHETHER there is clutch hitting ability;
>your experiment postulates it.

No. The effect of gravity may be so small as to be completely
ignored, if the particles are too small ( = clutch hitting ability
is ignorable).

>we can also look at an actual, documented effect whose existence has
>been validated by just the methods I described. It is the righty-lefty
>effect. A simple comparison of two kinds of events, namely same-hand
>vs. cross-hand matchups. Likewise, we can compare RISP hitting vs.
>other hitting, determine a baseline, find variance estimates, and look
>at stability over time. Whether with one player, or with a group, we
>CAN use R/L differences to make predictions. Using the same methods, we
>find that RISP performance does NOT predict future RISP performance.
>Where's the difference? What do we learn from the comparison of these
>two studies, if not that RISP-clutch is an effect for which we have no
>evidence?

If by time, you mean seasons, then once again, there are too few AB
in one season of RISP-LIPS to expect there to be any predictive power.

...

>>The point of Brownian motion is that for sufficiently light
>>particles, the water keeps them suspended indefinitely ( = clutch
>>hitting ability is negligible.)

>But your model already assumes that there *is* an effect. We want to
>know *whether* there is an effect.

No. See above.

>That randomness will cause hypothesized effects to be obscured in small
>samples is obvious and granted. But it is also irrelevant, and your
>critique of the methods we're interested in using is also off base,
>imho.

>One thing we're pretty sure of is that the players who do *worse* in the
>clutch over some period of time are unlikely to be true clutch hitters,
>i.e., are unlikely to have clutch ability that's merely been obscured by
>randomness. In any case, they are less likely to have the ability (if
>it exists) than those whose numbers show a positive clutch differential.

But for LIPS-RISP there are too few AB. The example of Bonds comes to
mind again. Suppose we define a "postseason hitting" ability,
and attempt to measure it. Bonds does terribly, and we have two seasons'
worth of data to back it up. Now, are you saying that because he did worse
over two years that he is unlikely to be a true postseason hitter?
That we can predict with any confidence that he will not become a hot
postseason hitter in the future?

>OK so far? We're better off looking for clutch hitters among those who
>actually hit in the clutch last year.

Really? How much better? How much does the likelihood increase?

>Now, if the effect does exist, then the performance of the supposed
>clutch-hitting group should, in the future, be higher than that of a
>control group, at least most of the time. And we can test for that.
>If you wish, we are not just dumping more light particles in the water,
>we are lumping many light particles into a few heavy ones.

But we can't really know which are which ahead of time.

Paul Benjamin

Roger Lustig

Feb 23, 1992, 10:56:06 PM
In article <1992Feb24.0...@sjuphil.uucp> pben...@sjuphil.uucp (Paul Benjamin) writes:
>(Roger Lustig) writes:
>>(Paul Benjamin) writes:
>>>(Roger Lustig) writes:

>(lots of stuff about sand and water deleted)

>>>effect of clutch hitting = effect of gravity
>>>size of particles = clutch hitting ability

>>>If we have uniform sand ( = all hitters are roughly equal clutch hitters,
>>>so no real clutch hitters exist, just hitters) then there still will be
>>>some particles closer to the bottom and some closer to the top. Try it,
>>>and you'll see that the particles don't fall identically to the bottom.

>>Of course. But this is a different experiment from the one we're
>>proposing. We want to know WHETHER there is clutch hitting ability;
>>your experiment postulates it.

>No. The effect of gravity may be so small as to be completely
>ignored, if the particles are too small ( = clutch hitting ability
>is ignorable).

One more time: what does the size of the particles represent? If it
represents clutch hitting ability, and there *is* no clutch hitting
ability, then I'm completely lost once again.

Either that or your analogy contradicts itself.

>>we can also look at an actual, documented effect whose existence has
>>been validated by just the methods I described. It is the righty-lefty
>>effect. A simple comparison of two kinds of events, namely same-hand
>>vs. cross-hand matchups. Likewise, we can compare RISP hitting vs.
>>other hitting, determine a baseline, find variance estimates, and look
>>at stability over time. Whether with one player, or with a group, we
>>CAN use R/L differences to make predictions. Using the same methods, we
>>find that RISP performance does NOT predict future RISP performance.
>>Where's the difference? What do we learn from the comparison of these
>>two studies, if not that RISP-clutch is an effect for which we have no
>>evidence?

>If by time, you mean seasons, then once again, there are too few AB
>in one season of RISP-LIPS to expect there to be any predictive power.

Not the case for RISP alone. Why do you continue to harp on RISP*LIPS?

It's not as though I hadn't pointed out the difference between the two in
the very paragraph you're responding to.

Nor do you seem to have noticed the example of the method I'm using that
*has* produced a conclusive result. Or are you arguing that the
righty/lefty differential is not conclusively demonstrated?

>>>The point of Brownian motion is that for sufficiently light
>>>particles, the water keeps them suspended indefinitely ( = clutch
>>>hitting ability is negligible.)

>>But your model already assumes that there *is* an effect. We want to
>>know *whether* there is an effect.

>No. See above.

Actually, it's also lacking in that there's no "choke" force pulling in
the other direction. The "clutch" hypothesis says that there are
clutch hitters, choke hitters, and in-between hitters.

>>That randomness will cause hypothesized effects to be obscured in small
>>samples is obvious and granted. But it is also irrelevant, and your
>>critique of the methods we're interested in using is also off base,
>>imho.

>>One thing we're pretty sure of is that the players who do *worse* in the
>>clutch over some period of time are unlikely to be true clutch hitters,
>>i.e., are unlikely to have clutch ability that's merely been obscured by
>>randomness. In any case, they are less likely to have the ability (if
>>it exists) than those whose numbers show a positive clutch differential.

>But for LIPS-RISP there are too few AB. The example of Bonds comes to
>mind again. Suppose we define a "postseason hitting" ability,
>and attempt to measure it. Bonds does terribly, and we have two seasons'
>worth of data to back it up. Now, are you saying that because he did worse
>over two years that he is unlikely to be a true postseason hitter?
>That we can predict with any confidence that he will not become a hot
>postseason hitter in the future?

Paul, do you even *read* what you're responding to?

a) I have argued that RISP, and *not* LIPS-RISP, is adequate as a test
subset. And there ARE adequate PA in a season for such tests, even on
individuals.

b) NOWHERE have I advocated using short observation periods as
indicative of clutch or choke performance. I have *objected* to it.
But I see nothing wrong with aggregating lots of short-term performances
and using tests based on them to investigate clutch. That will not tell
us about the individuals, but it will tell us about the supposed
existence of the *effect*.

c) If you haven't noticed yet that I strongly support the null
hypothesis in the absence of evidence to the contrary, you've been away
from the net for a very long time.

>>OK so far? We're better off looking for clutch hitters among those who
>>actually hit in the clutch last year.

>Really? How much better? How much does the likelihood increase?

Irrelevant. The point is: if the hypothesis is true, then it SHOULD
increase by some noticeable amount, because the aggregate clutch
differential observed will be *in part* attributable to the
hypothesized effect.

We are setting up a null hypothesis: that it's all random, that the ones
who did better this time will NOT do better next time. A straw man.
Get it?

(Of course, no test has yet knocked down the straw man, which is the
interesting part.)

>>Now, if the effect does exist, then the performance of the supposed
>>clutch-hitting group should, in the future, be higher than that of a
>>control group, at least most of the time. And we can test for that.
>>If you wish, we are not just dumping more light particles in the water,
>>we are lumping many light particles into a few heavy ones.

>But we can't really know which are which ahead of time.

OF COURSE NOT. But to TEST the damn hypothesis, we have to imagine a
scenario in which it is true.

Paul, are you being deliberately obtuse here? I'm not at all sure I
understand what your objections to this experimental design are.

Roger

John H. Rickert

Feb 24, 1992, 8:30:44 AM
In article <1992Feb21.1...@sjuphil.uucp> pben...@sjuphil.uucp (Paul
Benjamin) writes:

> Again, I really meant LIPS. In late innings of close games, teams
> use their relief aces, who tend to hold opposing batters to low
> batting averages. When the game isn't close, the mopup pitchers
> come in, and give up more hits. So the expectation is that hitters
> should do worse in LIPS than in non-LIPS, and we need to compare
> the LIPS performance with LIPS expectation.

From the 1992 STATS Scoreboard Book:

Late & Close totals
League   BA    OBP   SLG
AL      .244  .319  .360   roughly 14,000 PA
NL      .257  .341  .377   roughly 12,000 PA

All PA
League   BA    OBP   SLG
AL      .260  .329  .395
NL      .250  .317  .373
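
To make the comparison explicit, here is a small sketch that subtracts the "All PA" line from the "Late & Close" line for each league. The numbers are copied from the table above; the dictionary and function names are made up for illustration.

```python
# (BA, OBP, SLG) per league, copied from the table above.
late_close = {"AL": (0.244, 0.319, 0.360), "NL": (0.257, 0.341, 0.377)}
all_pa = {"AL": (0.260, 0.329, 0.395), "NL": (0.250, 0.317, 0.373)}

def differentials(league):
    """Late & Close minus All PA, rounded to three decimals."""
    return tuple(round(lc - overall, 3)
                 for lc, overall in zip(late_close[league], all_pa[league]))

for league in ("AL", "NL"):
    ba, obp, slg = differentials(league)
    print(f"{league}: BA {ba:+.3f}  OBP {obp:+.3f}  SLG {slg:+.3f}")
```

As quoted, the AL line is lower in Late & Close on all three measures, while the NL line is slightly higher.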

> As David Grabiner points out, there may even be effects that are
> not uniform among players, e.g., late inning aces may tend to be
> more fastball types, thus changing the mix of pitches faced in
> LIPS situations, and tilting the averages towards fastball hitters.

OK. Also, having runners on first gives an advantage to left-handed batters.

john rickert
ric...@sabr.nextwork.rose-hulman.edu

Paul Benjamin

Feb 24, 1992, 12:13:33 PM
In article <1992Feb24....@Princeton.EDU> ro...@phoenix.Princeton.EDU (Roger Lustig) writes:
>In article <1992Feb24.0...@sjuphil.uucp> pben...@sjuphil.uucp (Paul Benjamin) writes:
>>(Roger Lustig) writes:
>>>(Paul Benjamin) writes:
>>>>(Roger Lustig) writes:

>>(lots of stuff about sand and water deleted)

>>>>effect of clutch hitting = effect of gravity
>>>>size of particles = clutch hitting ability

>>>>If we have uniform sand ( = all hitters are roughly equal clutch hitters,
>>>>so no real clutch hitters exist, just hitters) then there still will be
>>>>some particles closer to the bottom and some closer to the top. Try it,
>>>>and you'll see that the particles don't fall identically to the bottom.

>One more time: what does the size of the particles represent? If it
>represents clutch hitting ability, and there *is* no clutch hitting
>ability, then I'm completely lost once again.

It represents clutch hitting ability. Now, don't read that as meaning
that if there is no clutch hitting ability, then the particles are of
zero size - they are then all of the same size. "No clutch hitting
ability" doesn't mean that everyone hits .000 in clutch situations!
It just means that everyone has about the same clutch ability as they
do overall hitting ability.

>>>we can also look at an actual, documented effect whose existence has
>>>been validated by just the methods I described. It is the righty-lefty
>>>effect. A simple comparison of two kinds of events, namely same-hand
>>>vs. cross-hand matchups. Likewise, we can compare RISP hitting vs.
>>>other hitting, determine a baseline, find variance estimates, and look
>>>at stability over time. Whether with one player, or with a group, we
>>>CAN use R/L differences to make predictions. Using the same methods, we
>>>find that RISP performance does NOT predict future RISP performance.
>>>Where's the difference? What do we learn from the comparison of these
>>>two studies, if not that RISP-clutch is an effect for which we have no
>>>evidence?
>
>>If by time, you mean seasons, then once again, there are too few AB
>>in one season of RISP-LIPS to expect there to be any predictive power.

>Not the case for RISP alone. Why do you continue to harp on RISP*LIPS?
>It's not as though I hadn't pointed out the difference between the two in
>the very paragraph you're responding to.

I'm "harping" on it because it seems intuitively to be the better
definition of clutch (than does RISP) because it takes into account
more of the state of the game. Remember, the pro-clutch crowd has made
very explicit their belief that the game conditions can have a strong
effect on hitting. Just looking at RISP could easily be objected to
by a pro-clutch person who would say that hitters don't "get up" for
atbats earlier in the game, etc. Despite my intuition that this is
not so (and you have made it clear that you don't accept this either)
it is necessary to use such a definition to meet their statements
head on.

>>>>The point of Brownian motion is that for sufficiently light
>>>>particles, the water keeps them suspended indefinitely ( = clutch
>>>>hitting ability is negligible.)

>>>But your model already assumes that there *is* an effect. We want to
>>>know *whether* there is an effect.

>>No. See above.

>Actually, it's also lacking in that there's no "choke" force pulling in
>the other direction. The "clutch" hypothesis says that there are
>clutch hitters, choke hitters, and in-between hitters.

Good point. I could use another analogy, if you'd like.

>>>That randomness will cause hypothesized effects to be obscured in small
>>>samples is obvious and granted. But it is also irrelevant, and your
>>>critique of the methods we're interested in using is also off base,
>>>imho.

>>>One thing we're pretty sure of is that the players who do *worse* in the
>>>clutch over some period of time are unlikely to be true clutch hitters,
>>>i.e., are unlikely to have clutch ability that's merely been obscured by
>>>randomness. In any case, they are less likely to have the ability (if
>>>it exists) than those whose numbers show a positive clutch differential.

>>But for LIPS-RISP there are too few AB. The example of Bonds comes to
>>mind again. Suppose we define a "postseason hitting" ability,
>>and attempt to measure it. Bonds does terribly, and we have two seasons'
>>worth of data to back it up. Now, are you saying that because he did worse
>>over two years that he is unlikely to be a true postseason hitter?
>>That we can predict with any confidence that he will not become a hot
>>postseason hitter in the future?

>Paul, do you even *read* what you're responding to?

Yes, Roger. Perhaps it is better to refrain from transforming a
technical discussion into a personal one. Having read the recent
slew of postings between stat and anti-stat people, in which the
anti-stat people accused the stat people of personal attacks, I
am disappointed to see that you have provided an example of this.
Note also that David Tate did a similar thing recently, when he
questioned my partiality. Is there a need for this sort of thing?

>b) NOWHERE have I advocated using short observation periods as
>indicative of clutch or choke performance. I have *objected* to it.
>But I see nothing wrong with aggregating lots of short-term performances
>and using tests based on them to investigate clutch. That will not tell
>us about the individuals, but it will tell us about the supposed
>existence of the *effect*.

Sorry, but this still makes little sense to me. After all, pitchers
are people, too. Any clutch ability possessed by hitters will likely
be possessed by pitchers, too (and maybe even fielders), making the
overall measurement of clutch ability/performance difficult.

And there are a couple of questions here which are being confused:

1) Does clutch ability exist?

2) How much effect does performance in clutch situations have on
the outcome of games?

Question #1 can be investigated at the individual or group level, e.g.,
how much of the population exhibits clutch ability, and who has it.
The individual investigation will be very difficult, and the group
investigation can show at best that there appear to be hitters who
might possess the ability. David Grabiner's work is along these lines.

Question #2 is independent of #1. How many games is a +.010 in clutch
on the team level worth? How about +.010 for an individual? Even if
there's no clutch hitting ability (meaning that there is no ability
separate from overall ability) how crucial is the random variation
in clutch performance to a team's offense?

>>>OK so far? We're better off looking for clutch hitters among those who
>>>actually hit in the clutch last year.

>>Really? How much better? How much does the likelihood increase?

>Irrelevant. The point is: if the hypothesis is true, then it SHOULD
>increase by some noticeable amount, because the aggregate clutch
>differential observed will be *in part* attributable to the
>hypothesized effect.

No. It's extremely relevant. The amount of increase need not be
noticeable.

>Paul, are you being deliberately obtuse here? I'm not at all sure I
>understand what your objections to this experimental design are.

Deliberately obtuse? You mean I've switched off some of my neurons?
No, I don't think so. If I'm being obtuse, it's purely accidental.

Some objections to the design I've seen so far are:

- Definitions of clutch that encompass a sufficient number of data
points may be too broad to capture the notion of "clutch", as in
hitting when it counts most. More restricted definitions don't
permit enough data per player to detect any supposed ability with
any reliability.

- Given that clutch performance (if it exists) is almost certainly
shown by pitchers, too, it is likely that any hitter's clutch
performance will vary considerably from one year to the next, as
the population of pitchers he faces changes.

- Just examining averages really doesn't tell the story. It is
likely that any subpopulation you choose, e.g., people who hit
well in the clutch last year, contains a number of chokers, too
(if they exist). The chokers will reduce the average of the group,
probably a good amount.

Paul Benjamin

Roger Lustig

Feb 24, 1992, 2:55:58 PM
In article <1992Feb24....@sjuphil.uucp> pben...@sjuphil.uucp (Paul Benjamin) writes:
>In article <1992Feb24....@Princeton.EDU> ro...@phoenix.Princeton.EDU (Roger Lustig) writes:
>>In article <1992Feb24.0...@sjuphil.uucp> pben...@sjuphil.uucp (Paul Benjamin) writes:
>>>(Roger Lustig) writes:
>>>>(Paul Benjamin) writes:
>>>>>(Roger Lustig) writes:

>>>(lots of stuff about sand and water deleted)

>>>>>effect of clutch hitting = effect of gravity
>>>>>size of particles = clutch hitting ability

>>>>>If we have uniform sand ( = all hitters are roughly equal clutch hitters,
>>>>>so no real clutch hitters exist, just hitters) then there still will be
>>>>>some particles closer to the bottom and some closer to the top. Try it,
>>>>>and you'll see that the particles don't fall identically to the bottom.

>>One more time: what does the size of the particles represent? If it
>>represents clutch hitting ability, and there *is* no clutch hitting
>>ability, then I'm completely lost once again.

>It represents clutch hitting ability. Now, don't read that as meaning
>that if there is no clutch hitting ability, then the particles are of
>zero size - they are then all of the same size. "No clutch hitting
>ability" doesn't mean that everyone hits .000 in clutch situations!
>It just means that everyone has about the same clutch ability as they
>do overall hitting ability.

So: if there's no real clutch effect, then

a) the particles are the same size
b) there is no gravity. Or no effect of gravity.

At which point the sand hovers over the water, and never gets to meet
the water at all. Can we start over?

a) Why should I let *them* define my work? I don't need to include
every cockamamie ad-hoc hypothesis in my design in order to do the first
round of it. Besides, if *some* RISP at-bats are clutch (and with a
greater frequency than non-RISP ones), then the effect, if it exists,
should be visible, since it's unlikely that any RISP at-bats are
actually less clutch than equivalent non-RISP ones.

b) Plenty of clutch proponents DO accept RISP as at least a somewhat
clutch situation.

c) With all the emphasis on RBI, it IS a clutch situation--if you define
"clutch" as "a situation where a fan pays close attention."

d) RISP at-bats DO "count" more than other AB. That's an effective
argument for examining them.

>>>>>The point of Brownian motion is that for sufficiently light
>>>>>particles, the water keeps them suspended indefinitely ( = clutch
>>>>>hitting ability is negligible.)

>>>>But your model already assumes that there *is* an effect. We want to
>>>>know *whether* there is an effect.

>>>No. See above.

>>Actually, it's also lacking in that there's no "choke" force pulling in
>>the other direction. The "clutch" hypothesis says that there are
>>clutch hitters, choke hitters, and in-between hitters.

>Good point. I could use another analogy, if you'd like.

I can hardly wait... 8-)

Note that this equal-and-opposite-forces model means that ultimately
we'll be looking in the variances to determine whether the effect is
there.

>>>>That randomness will cause hypothesized effects to be obscured in small
>>>>samples is obvious and granted. But it is also irrelevant, and your
>>>>critique of the methods we're interested in using is also off base,
>>>>imho.

>>>>One thing we're pretty sure of is that the players who do *worse* in the
>>>>clutch over some period of time are unlikely to be true clutch hitters,
>>>>i.e., are unlikely to have clutch ability that's merely been obscured by
>>>>randomness. In any case, they are less likely to have the ability (if
>>>>it exists) than those whose numbers show a positive clutch differential.

>>>But for LIPS-RISP there are too few AB. The example of Bonds comes to
>>>mind again. Suppose we define a "postseason hitting" ability,
>>>and attempt to measure it. Bonds does terribly, and we have two seasons'
>>>worth of data to back it up. Now, are you saying that because he did worse
>>>over two years that he is unlikely to be a true postseason hitter?
>>>That we can predict with any confidence that he will not become a hot
>>>postseason hitter in the future?

>>Paul, do you even *read* what you're responding to?

>Yes, Roger. Perhaps it is better to refrain from transforming a
>technical discussion into a personal one. Having read the recent
>slew of postings between stat and anti-stat people, in which the
>anti-stat people accused the stat people of personal attacks, I
>am disappointed to see that you have provided an example of this.

Paul, it was real frustration that caused that. I think I was
perfectly clear before in explaining my use of sample means as
estimators of population means under a certain hypothesis.

>Note also that David Tate did a similar thing recently, when he
>questioned my partiality. Is there a need for this sort of thing?

Only when you don't respond to the things you include in your responses,
but march past them.

>>b) NOWHERE have I advocated using short observation periods as
>>indicative of clutch or choke performance. I have *objected* to it.
>>But I see nothing wrong with aggregating lots of short-term performances
>>and using tests based on them to investigate clutch. That will not tell
>>us about the individuals, but it will tell us about the supposed
>>existence of the *effect*.

>Sorry, but this still makes little sense to me. After all, pitchers
>are people, too. Any clutch ability possessed by hitters will likely
>be possessed by pitchers, too (and maybe even fielders), making the
>overall measurement of clutch ability/performance difficult.

How likely is it that the sample of "clutch" hitters we test for will
have faced a biased sample of pitchers? (If we use plain RISP, this is
very unlikely. If we go to LIPS, it's more likely; and I hope to
produce some real results on this front sometime soon. Watch this
space.)

>And there are a couple of questions here which are being confused:

>1) Does clutch ability exist?

>2) How much effect does performance in clutch situations have on
> the outcome of games?

>Question #1 can be investigated at the individual or group level, e.g.,
>how much of the population exhibits clutch ability, and who has it.
>The individual investigation will be very difficult, and the group
>investigation can show at best that there appear to be hitters who
>might possess the ability. David Grabiner's work is along these lines.

David? Does my design look like yours? Do you have problems with
aggregation? (If anything, aggregation should *reduce* hitter/pitcher
bias...)

>Question #2 is independent of #1. How many games is a +.010 in clutch
>on the team level worth? How about +.010 for an individual? Even if
>there's no clutch hitting ability (meaning that there is no ability
>separate from overall ability) how crucial is the random variation
>in clutch performance to a team's offense?

I don't recall confusing the questions. We know approximately what RISP
hitting does for a team; is it that problematic to transfer this
information to evaluations of individual player value?

>>>>OK so far? We're better off looking for clutch hitters among those who
>>>>actually hit in the clutch last year.

>>>Really? How much better? How much does the likelihood increase?

>>Irrelevant. The point is: if the hypothesis is true, then it SHOULD
>>increase by some noticeable amount, because the aggregate clutch
>>differential observed will be *in part* attributable to the
>>hypothesized effect.

>No. It's extremely relevant. The amount of increase need not be
>noticeable.

Oh, for heaven's sake. Then let's amend the "clutch hypothesis" to
something a little closer to what clutch proponents actually say: that
there is a NOTICEABLE effect.

>>Paul, are you being deliberately obtuse here? I'm not at all sure I
>>understand what your objections to this experimental design are.

>Deliberately obtuse? You mean I've switched off some of my neurons?
>No, I don't think so. If I'm being obtuse, it's purely accidental.

>Some objections to the design I've seen so far are:

>- Definitions of clutch that encompass a sufficient number of data
> points may be too broad to capture the notion of "clutch", as in
> hitting when it counts most.

"Most" is an undefined term. It could mean "In the last game of the WS,
when the score is tied in the 9th inning or thereafter."

What's wrong with "when it counts *more*"? RISP is clearly and
obviously and demonstrably such a situation.

You've put your finger on something: defining the notion of "clutch" is
difficult. Used to be, a big RBI man was by definition a clutch hitter.
Then Ted Williams got too many RBI for some schmuck sportswriter in
Boston, and the Splinter got accused of never hitting "important" RBI.
Ever since, the definition has changed whenever it became too clear that
the old definition didn't serve current prejudices sufficiently.

This is one reason why I stick with RISP. I'm not going to fool around
with people who want to redefine "clutch" so that all the work that, up
until now, has made them look like fools must be thrown out, just
because they feel like it. If they have something to prove, let THEM
prove it. I'm not getting paid for this, you know, and neither is
Sherri.

(Which is not to say that a grant from the Clutch Hitting Council
wouldn't convince us to change our minds and get all-new data and start
over...)

> More restricted definitions don't
> permit enough data per player to detect any supposed ability with
> any reliability.

Yet aggregation can alleviate this problem, to the extent of showing
whether there IS an effect wrt that definition, i.e., whether players
who do well once are more likely than others to do well again.

>- Given that clutch performance (if it exists) is almost certainly
> shown by pitchers, too, it is likely that any hitter's clutch
> performance will vary considerably from one year to the next, as
> the population of pitchers he faces changes.

Again, aggregate to minimize variation.

>- Just examining averages really doesn't tell the story. It is
> likely that any subpopulation you choose, e.g., people who hit
> well in the clutch last year, contains a number of chokers, too
> (if they exist). The chokers will reduce the average of the group,
> probably a good amount.

OK: is that subpopulation more or less likely to contain a given number
of chokers than the subpopulation of people who actually *did* choke
last year? Again, we're just using a sample mean to estimate a
population mean. And then testing whether this makes sense.

Roger

David Grabiner

Feb 24, 1992, 11:30:03 PM

In article <1992Feb24....@sjuphil.uucp>, Paul Benjamin writes:

> -Paul Benjamin
>>-Roger Lustig
>>>-Paul Benjamin
>>>>-Roger Lustig

>>b) NOWHERE have I advocated using short observation periods as
>>indicative of clutch or choke performance. I have *objected* to it.

>>But I see nothing wrong with aggregating lots of short-term performances
>>and using tests based on them to investigate clutch. That will not tell
>>us about the individuals, but it will tell us about the supposed
>>existence of the *effect*.

> Sorry, but this still makes little sense to me. After all, pitchers
> are people, too. Any clutch ability possessed by hitters will likely
> be possessed by pitchers, too (and maybe even fielders), making the
> overall measurement of clutch ability/performance difficult.

Given a large enough sample, you expect that the hitters in your study
will face a random mix of pitchers. There is still some error which is
due to luck.

>>>>OK so far? We're better off looking for clutch hitters among those who
>>>>actually hit in the clutch last year.

>>>Really? How much better? How much does the likelihood increase?

See below.

>>Irrelevant. The point is: if the hypothesis is true, then it SHOULD
>>increase by some noticeable amount, because the aggregate clutch
>>differential observed will be *in part* attributable to the
>>hypothesized effect.

> No. It's extremely relevant. The amount of increase need not be
> noticeable.

This is correct in theory, but I don't care whether clutch hitting
exists if the ability is insignificant.

How well a study of players whose previous clutch data was good will
find players with good clutch ability depends on how much of clutch
performance is due to ability.

If you look at five years of data (which is about what I did), then the
standard deviation of luck in clutch performance is 24 points of batting
average. If the standard deviation of ability in clutch performance is
24 points, then the correlation between ability and performance is .5.
Thus, given a group of players who averaged a gain of 47 points in the
clutch over the five-year period, you would expect their ability to be
halfway between +47 and the normal -7, which is +20. Likewise, you would
expect the chokers to average -34 in actual ability.

Thus you would expect the clutch terrors to beat the chokers in future
clutch performance by 54 points, with a standard deviation (given my
sample size) of 16 points. The actual difference I found was 24 points,
which is off by just under two standard deviations. Thus this is an
upper bound for the ability to hit in the clutch.

At the other extreme, you might assume that clutch hitting is entirely
due to luck. That gives an expected difference of 0, and the actual
value of 24 is one and a half standard deviations off.
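
The arithmetic in the last few paragraphs can be sketched as follows. Two things here are assumptions rather than statements from the post: the ".5" is treated as the regression (shrinkage) factor var(ability) / (var(ability) + var(luck)), and the chokers' observed average of -61 is inferred from the -34 figure by symmetry about the -7 mean. All values are in points of batting average.

```python
def shrinkage(sd_ability, sd_luck):
    """Share of an observed deviation from the mean expected to be real ability."""
    return sd_ability ** 2 / (sd_ability ** 2 + sd_luck ** 2)

def expected_ability(observed, mean, sd_ability, sd_luck):
    """Regress an observed clutch differential toward the population mean."""
    return mean + shrinkage(sd_ability, sd_luck) * (observed - mean)

MEAN = -7        # normal clutch differential quoted in the post
SD_LUCK = 24     # five-year luck standard deviation quoted in the post
SD_SAMPLE = 16   # standard deviation of the group difference quoted in the post
ACTUAL_GAP = 24  # observed future difference, terrors minus chokers

terrors = expected_ability(47, MEAN, sd_ability=24, sd_luck=SD_LUCK)
chokers = expected_ability(-61, MEAN, sd_ability=24, sd_luck=SD_LUCK)  # -61 is inferred
predicted_gap = terrors - chokers

z_if_ability_sd_24 = (predicted_gap - ACTUAL_GAP) / SD_SAMPLE
z_if_all_luck = (ACTUAL_GAP - 0) / SD_SAMPLE

print(terrors, chokers, predicted_gap)    # 20.0 -34.0 54.0
print(z_if_ability_sd_24, z_if_all_luck)  # 1.875 1.5
```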

A reasonable estimate for the importance of clutch ability is the value
which gives a prediction closest to the actual value. That comes out to
a standard deviation of 13 points. This means that (assuming a normal
distribution) 1/40 of all major league hitters, the true terrors, have
an ability to hit 26 points better in LIPS than otherwise; that's two
hits a year.

In addition, you can't find the top players perfectly from the clutch
data. Given the values above and using a linear regression, you can
predict that a player's clutch ability is less than 1/4 of his observed
difference from the mean. Thus the observed clutch terrors, who gained
an average of 54 points above the mean, have an ability to hit 12 points
better in LIPS than otherwise; that's less than one hit a year.
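
A rough check of these figures, under the same shrinkage assumption: the observed gap between terrors and chokers is taken to be 108 points (+54 and -54 around the mean, the latter inferred), and a full season is assumed to have about 550 at-bats with 1/7 of them in LIPS. The 550 is a hypothetical round number, not from the post.

```python
SD_LUCK = 24.0       # points, from the post
OBSERVED_GAP = 108.0 # terrors at +54, chokers at -54 relative to the mean (inferred)
ACTUAL_GAP = 24.0    # future difference actually found, from the post

def shrinkage(sd_ability, sd_luck=SD_LUCK):
    """Fraction of an observed deviation attributed to ability."""
    return sd_ability ** 2 / (sd_ability ** 2 + sd_luck ** 2)

def predicted_gap(sd_ability):
    """Future gap predicted between the two groups for a given ability SD."""
    return OBSERVED_GAP * shrinkage(sd_ability)

# Whole-number ability SD whose prediction is closest to the actual gap.
best_sd = min(range(0, 25), key=lambda sd: abs(predicted_gap(sd) - ACTUAL_GAP))

top_ability = 2 * best_sd                # two SDs above the mean, roughly the top 1/40
observed_terror_ability = 54 * shrinkage(best_sd)

LIPS_AB = 550 / 7                        # assumed season at-bats in LIPS
hits_top = top_ability / 1000 * LIPS_AB
hits_observed = observed_terror_ability / 1000 * LIPS_AB

print(best_sd, round(shrinkage(best_sd), 3))              # 13 0.227
print(top_ability, round(observed_terror_ability))        # 26 12
print(round(hits_top, 1), round(hits_observed, 1))        # 2.0 1.0
```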

> And there are a couple of questions here which are being confused:

> 1) Does clutch ability exist?

> 2) How much effect does performance in clutch situations have on
> the outcome of games?

> Question #1 can be investigated at the individual or group level, e.g.,
> how much of the population exhibits clutch ability, and who has it.
> The individual investigation will be very difficult, and the group
> investigation can show at best that there appear to be hitters who
> might possess the ability. David Grabiner's work is along these lines.

There certainly isn't enough data to say *who* has clutch ability, but
there is enough data to say that whoever does have it has approximately
a certain amount of it.

And there is no way to distinguish the ability to bear down in the
clutch from any other ability which causes players to hit better in the
clutch; thus the measurements of clutch ability are actually upper
bounds.

> Question #2 is independent of #1. How many games is a +.010 in clutch
> on the team level worth? How about +.010 for an individual? Even if
> there's no clutch hitting ability (meaning that there is no ability
> separate from overall ability) how crucial is the random variation
> in clutch performance to a team's offense?

I can also answer this. A study in The Hidden Game of Baseball showed
that relief aces' pitching had twice the effect per inning on the
probability of winning as starters' pitching did. That is, the average
hit given up by a relief ace costs the team twice as much as the average
hit given up by a starter.

Since 1/7 of all at-bats are in LIPS and each LIPS hit is worth two
non-LIPS hits, 6 points of batting average gained in LIPS is worth one
point on your season batting average with no LIPS advantage, for either
a player or a team.

I don't have comparable data for RISP or RISP in LIPS, but I would guess
that the factors are about 2 with RISP (a single is more than twice as
valuable, but extra bases aren't worth much more) and thus 4 with RISP
in LIPS. Since 1/4 of all at-bats are with RISP and 1/30 are with RISP
in LIPS, that means that 3 points of RISP batting advantage is worth one
point on your season batting average, and 10 points of RISP with LIPS
average is worth one point.

Now, let's put all of this together. Given the estimate that the top
clutch hitters have an ability to hit 26 points better in LIPS than
overall, that's worth about 4 points of season batting average. And
given that the *observed* clutch hitters have an ability to hit 12
points better in LIPS than overall, that's worth just 2 points.
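
As a worked example, the conversion can be written out directly. The factors of 6, 3 and 10 are taken from the post as stated, not re-derived here, and the function and dictionary names are illustrative only.

```python
# Points of situational advantage needed to equal one point of season average,
# as stated in the post.
POINTS_PER_SEASON_POINT = {"LIPS": 6, "RISP": 3, "RISP in LIPS": 10}

def season_equivalent(season_ba, advantage_points, situation="LIPS"):
    """Season batting average of an otherwise equal hitter with no situational edge."""
    bonus_points = advantage_points / POINTS_PER_SEASON_POINT[situation]
    return season_ba + bonus_points / 1000

observed_terror = season_equivalent(0.267, 12)  # regressed estimate for an observed clutch hitter
true_terror = season_equivalent(0.267, 26)      # if the hitter really is in the top 1/40

print(f"{observed_terror:.3f} {true_terror:.3f}")  # 0.269 0.271
```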

So it is legitimate to say, "Benito Santiago hit .267 last year, but
he's been one of the top clutch hitters in baseball, which makes
him as valuable as a .269 hitter." The data backs you up here.

But replace .269 by .271 and you are depending on Santiago's clutch data
being reliable, rather than due to luck. Replace it by .272 and you
have gone beyond any prediction which can be justified by the statistics
alone. Replace it by .276 and you have made a statement which is
inconsistent with the data on clutch hitting.

Remove the .269 from the above statement, and ask fans who believe
in clutch hitting to fill in the number. I'd be surprised to see many
numbers below .277. I believe the correct number is somewhere between
.267 and .271, and I don't really care about such a small range.

> - Given that clutch performance (if it exists) is almost certainly
> shown by pitchers, too, it is likely that any hitter's clutch
> performance will vary considerably from one year to the next, as
> the population of pitchers he faces changes.

This is already dealt with in the computation of luck; few hitters face
the same pitcher even twice in a game in a clutch situation, so the mix
of pitchers is pretty close to independent.

> - Just examining averages really doesn't tell the story. It is
> likely that any subpopulation you choose, e.g., people who hit
> well in the clutch last year, contains a number of chokers, too
> (if they exist). The chokers will reduce the average of the group,
> probably a good amount.

I dealt with this above. There are probably some average clutch hitters
among the players with excellent clutch data, but for a choker who lost
20 points more than normal in the clutch to make the list would be
almost impossible.

Using clutch data from one year, there might be a minor problem, but
with five or ten years of data, the effect of luck is greatly reduced.

Paul Benjamin

Feb 25, 1992, 3:13:08 PM
(Roger Lustig) writes:
>(Paul Benjamin) writes:
>>(Roger Lustig) writes:
>>>(Paul Benjamin) writes:
>>>>(Roger Lustig) writes:

(a lot more stuff about water and sand deleted)

>>Good point. I could use another analogy, if you'd like.

>I can hardly wait... 8-)

OK. Suppose I have a loaded coin that turns up heads 26% of the time, so
it's a .260 "hitter". Now, I suspect that there may be metal inside that
might be affected by a magnet, so I'd like to see if so, and how much.
If I bring in a magnet and flip the coin 100 times, and it comes up heads
28 times, what does that tell me about the coin's metal (or the player's
mettle ;) )?

Now suppose I have 100 coins like this, each with its own average, e.g.,
some "hit" .250, some .300. Now, some may be not affected by the magnet,
and some may have their % of heads increased, and some decreased. If
I flip them in the presence of the magnet 100 times each, what
information does the set of flips of one coin tell me about the other
coins?

Third question: If I can increase the number of trials for each coin,
clearly I can make a reliable guess about the magnetic properties of
each coin, e.g., flip each 1 million times in the presence of the
magnet, and get an almost certain classification as a "magnetic hitter",
a "magnetic choker", or even. But if I can't increase the number of
trials per coin in this way, what information do I get by getting a
million such coins, each of which can be flipped only a hundred times?
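
One way to look at the third question is to compare the spread of results across many coins with the spread that coin-flipping luck alone would produce. The sketch below is only an illustration under assumed conditions: each coin's base rate is known, the magnet's effect on a coin is drawn from a normal distribution, and the function name and parameters are made up.

```python
import random

def excess_variance(n_coins, flips, effect_sd, seed=0):
    """Observed variance of (heads rate - base rate) minus the binomial variance.

    A value near zero means luck explains the spread; a positive value estimates
    the variance of the magnet's effect across coins.
    """
    rng = random.Random(seed)
    observed = 0.0
    luck_only = 0.0
    for _ in range(n_coins):
        base = rng.uniform(0.25, 0.30)          # the coin's known "batting average"
        p = base + rng.gauss(0.0, effect_sd)    # true rate near the magnet
        heads = sum(rng.random() < p for _ in range(flips))
        observed += (heads / flips - base) ** 2
        luck_only += base * (1 - base) / flips  # binomial variance with no effect
    return (observed - luck_only) / n_coins

null_excess = excess_variance(10_000, 100, effect_sd=0.0)
effect_excess = excess_variance(10_000, 100, effect_sd=0.03)
print(f"no magnet effect: {null_excess:.5f}, effect SD .030: {effect_excess:.5f}")
```

With a hundred flips no single coin can be classified, but the excess variance across many coins should separate "no effect" from "an effect of this size". That says something about whether the effect exists and how large it is, not about which coins have it.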

>>>Paul, do you even *read* what you're responding to?

>>Yes, Roger. Perhaps it is better to refrain from transforming a
>>technical discussion into a personal one. Having read the recent
>>slew of postings between stat and anti-stat people, in which the
>>anti-stat people accused the stat people of personal attacks, I
>>am disappointed to see that you have provided an example of this.

>Paul, it was real frustration that caused that. I think I was
>perfectly clear before in explaining my use of sample means as
>estimators of population means under a certain hypothesis.

>>Note also that David Tate did a similar thing recently, when he
>>questioned my partiality. Is there a need for this sort of thing?

>Only when you don't respond to the things you include in your responses,
>but march past them.

No. There is never a need for that sort of thing. And it is not
as though I have found your responses totally to the point.

>>... pitchers
>>are people, too. Any clutch ability possessed by hitters will likely
>>be possessed by pitchers, too (and maybe even fielders), making the
>>overall measurement of clutch ability/performance difficult.

>How likely is it that the sample of "clutch" hitters we test for will
>have faced a biased sample of pitchers? (If we use plain RISP, this is
>very unlikely. If we go to LIPS, it's more likely; and I hope to
>produce some real results on this front sometime soon. Watch this
>space.)

I don't know how disparate the sets of opposing pitchers are
for two different hitters in one year. I don't think anyone knows.
I don't think this means the effect can be ignored. Certainly there
is some difference due merely to the fact that each hitter cannot
face the pitching staff of his own team.

>>And there are a couple of questions here which are being confused:

>>1) Does clutch ability exist?

>>2) How much effect does performance in clutch situations have on
>> the outcome of games?

>>Question #2 is independent of #1. How many games is a +.010 in clutch
>>on the team level worth? How about +.010 for an individual? Even if
>>there's no clutch hitting ability (meaning that there is no ability
>>separate from overall ability) how crucial is the random variation
>>in clutch performance to a team's offense?

>I don't recall confusing the questions. We know approximately what RISP
>hitting does for a team; is it that problematic to transfer this
>information to evaluations of individual player value?

Yes. As David Grabiner has pointed out to me (I'm really just a puppet
on his string) definitions of clutch (even RISP, to some extent) might
just be measuring things such as how well hitters hit fastballs, or
how quickly a hitter adjusts to a new pitcher, or perhaps other factors
we haven't thought of. The set of hitters who do better in RISP or LIPS
situations may just be the better fastball hitters or faster adjusters.
Now, such things are relevant for a manager trying to make decisions
about pinch hitters, etc., but I don't really think it is too relevant
to the supposed clutch ability.

I really think that the supposed clutch "ability" is one of the most
difficult aspects of baseball to pin down.

>>>>>OK so far? We're better off looking for clutch hitters among those who
>>>>>actually hit in the clutch last year.

Like looking for the coins that are positively affected by the magnet
by looking only among those that flipped a higher than expected
percentage of heads in their last 100 or 200 flips?

>>Some objections to the design I've seen so far are:

>>- Definitions of clutch that encompass a sufficient number of data
>> points may be too broad to capture the notion of "clutch", as in
>> hitting when it counts most.

>"Most" is an undefined term. It could mean "In the last game of the WS,
>when the score is tied in the 9th inning or thereafter."

>What's wrong with "when it counts *more*"? RISP is clearly and
>obviously and demonstrably such a situation.

I don't think anything is wrong with your definition. Nothing at all.
But in trying to investigate clutch, it also pays to look at RISP*LIPS.
After all, what if a player does better in RISP than overall, but
actually does worse in RISP*LIPS? Is he a clutch hitter?

>This is one reason why I stick with RISP. I'm not going to fool around
>with people who want to redefine "clutch" so that all the work that, up
>until now, has made them look like fools must be thrown out, just
>because they feel like it. If they have something to prove, let THEM
>prove it. I'm not getting paid for this, you know, and neither is
>Sherri.

Believe me, I know. Professors aren't exactly paid well, and I'd
be much better off doing something profitable. But back to the point:
I think you've stated this the right way: the obvious definitions of
clutch clearly don't show anything, e.g., someone who hits much better
in the ninth inning of close games. My point is that any further
investigation of clutch is 1) not going to be able to classify
individuals, and 2) lead at best to a statement like "using this
definition of clutch, we find that maybe this group of hitters hits
a certain amount better."

Now, pro-clutch people will object to your conclusions because 1) they
will point to numerous supposed cases of clutch hitters/chokers, and
2) they can always come up with different definitions of clutch (and
call them scientific hypotheses yet!). So, why spend the time working
on something like this, where the conclusions are likely to be
questionable, due to all sorts of effects like ratio of fastballs or
speed of adjustment to pitchers (and who knows what else?) when there
are very fundamental aspects of baseball that need to be investigated
and that should be more straightforward? Like defense (we know it's
there, and we know it affects teams' results - the question is how to
quantify it). Like investigating the variances of existing stats such
as RC/27 to be able to use them better to evaluate players. Another
idea: someone recently came up with a result
about college football, where he showed conclusively that if a team is
14 points behind, and scores a touchdown, it should always go for 2
points. Now, can we come up with a similarly strong case for things like
sacrifice bunts (such as showing that they're only good for pitchers
in the NL?) Not just an informed opinion, but a conclusive
demonstration. I think these things are doable, and if I didn't have
to work my butt off here, I'd do some of them myself. (I was hoping that
some of the grad students in my discrete simulation course would choose
baseball for their term project, but there's only one American in the
class!) I think such things could actually affect the way baseball is
played (at the very least, they would affect the simulation leagues),
whereas I think that any conclusions from a clutch study would likely
lead to just another statement of, "Well, we didn't find it." to which
the baseball people or pro-clutch people would say, "Well, we know it's
there, so you didn't look in the right place."

>>- Just examining averages really doesn't tell the story. It is
>> likely that any subpopulation you choose, e.g., people who hit
>> well in the clutch last year, contains a number of chokers, too
>> (if they exist). The chokers will reduce the average of the group,
>> probably a good amount.

>OK: is that subpopulation more or less likely to contain a given number
>of chokers than the subpopulation of people who actually *did* choke
>last year? Again, we're just using a sample mean to estimate a
>population mean. And then testing whether this makes sense.

Looking at the population of coins that flip greater than their average
for 100 or 200 flips will lead to including a good number of "magnetic
chokers" or unaffected coins, and omitting a good number of "magnetic
hitters."

Paul Benjamin

Roger Lustig

Feb 26, 1992, 9:42:55 AM
In article <1992Feb25.2...@sjuphil.uucp> pben...@sjuphil.uucp (Paul Benjamin) writes:
>(Roger Lustig) writes:
>>(Paul Benjamin) writes:
>>>(Roger Lustig) writes:
>>>>(Paul Benjamin) writes:
>>>>>(Roger Lustig) writes:

>(a lot more stuff about water and sand deleted)

>OK. Suppose I have a loaded coin that turns up heads 26% of the time, so
>it's a .260 "hitter". Now, I suspect that there may be metal inside that
>might be affected by a magnet, so I'd like to see if so, and how much.

I'm not doing an experiment like that. I want to know whether there
is a magnetic effect on coins in general. I know that there are
only a certain number of types of coin.

>If I bring in a magnet and flip the coin 100 times, and it comes up heads
>28 times, what does that tell me about the coin's metal (or the player's
>mettle ;) )?

>Now suppose I have 100 coins like this, each with its own average, e.g.,
>some "hit" .250, some .300. Now, some may be not affected by the magnet,
>and some may have their % of heads increased, and some decreased. If
>I flip them in the presence of the magnet 100 times each, what
>information does the set of flips of one coin tell me about the other
>coins?

Nothing yet, but of course this has nothing to do with the experiment I
proposed. All we do after the first 100 flips of the various coins is
stratify them by our observations. We have learned NOTHING about the
coins wrt the magnet. We can *surmise* things about the original coins,
and we can say which coins are *likeliest* to have been affected,
*assuming* that there was a magnetic effect, but we don't know yet that
there *was* an effect.

Now we finish the experiment. We flip the coins another 100 times each,
and see what happened in each stratum. If the observed differences from
mean are repeated at a significant level, i.e., the ones that gained
last time gain again this time, and the ones that lost last time lose
again, etc., THEN we are hard-put to deny the effect of the magnet; for
the expected value of each group, in the ABSENCE of a magnetic effect,
would have been no-difference-from-mean.
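
The two-round design described above can be sketched as a small simulation. This is only an illustration under stated assumptions: every coin is taken to have the same known base rate of .260, the magnet shifts (`effects`) are made-up values, and the names `flip` and `retest` are hypothetical.

```python
import random

def flip(p, n, rng):
    """Count heads in n flips of a coin whose heads probability is p."""
    return sum(rng.random() < p for _ in range(n))

def retest(effects, base=0.26, n=100, seed=0):
    """Two-round experiment: stratify on round 1, compare strata on round 2.

    effects[i] is a hypothetical magnet shift for coin i (0.0 = unaffected).
    Returns the mean round-2 deviation from `base` for the coins that
    gained in round 1 and for the coins that did not.
    """
    rng = random.Random(seed)
    gained, others = [], []
    for shift in effects:
        first = flip(base + shift, n, rng) / n - base
        second = flip(base + shift, n, rng) / n - base
        # Round 1 only decides the stratum; round 2 is what gets measured.
        (gained if first > 0 else others).append(second)
    return sum(gained) / len(gained), sum(others) / len(others)

# No magnet effect at all: both strata should come back near zero.
null_strata = retest([0.0] * 10_000)
# Assumed effect: half the coins shifted up 30 points, half down 30 points.
real_strata = retest([0.03, -0.03] * 5_000)
```

With no effect, the round-2 means of the two strata should be indistinguishable; with the assumed 30-point shifts, the round-1 gainers should stay ahead in round 2 even though neither stratum is pure.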

>Third question:

What was #2?

>If I can increase the number of trials for each coin,
>clearly I can make a reliable guess about the magnetic properties of
>each coin, e.g., flip each 1 million times in the presence of the
>magnet, and get an almost certain classification as a "magnetic hitter",
>a "magnetic choker", or even. But if I can't increase the number of
>trials per coin in this way, what information do I get by getting a
>million such coins, each of which can be flipped only a hundred times?

STOP RIGHT THERE.

NOWHERE have I advocated a single set of trials. NOWHERE. When I
advocated separating out observed clutchers from observed chokers from
observed no-effects, it was in order to observe them A SECOND TIME. If
the three groups evidence no difference the second time, then it is
clear that there was no effect, and pooling individuals doesn't change
that. True, one can't say that a particular individual displayed an
effect, but one can't say that *ANYWAY* until one has demonstrated that
there *IS* an effect.

Which is what I'm trying to do (or rather, to refute).

>>Paul, it was real frustration that caused that. I think I was
>>perfectly clear before in explaining my use of sample means as
>>estimators of population means under a certain hypothesis.

And note that you just did it again, by missing the point of separating
the groups out, after I'd explained it any number of times.

>>>Note also that David Tate did a similar thing recently, when he
>>>questioned my partiality. Is there a need for this sort of thing?

>>Only when you don't respond to the things you include in your responses,
>>but march past them.

>No. There is never a need for that sort of thing. And it is not
>as though I have found your responses totally to the point.

Then maybe I need to know what the point is. Your analogy was confusing
(which was the clutch effect, grain size or gravity?) and your
critiques had nothing to do with method, but rather with what, in the
ideal case, one might be able to prove or not to prove in order to shut
the clutch folks up for all time. And, without acknowledging that you
were doing so, you repeatedly addressed issues other than the ones you
were following up to.

>>>... pitchers
>>>are people, too. Any clutch ability possessed by hitters will likely
>>>be possessed by pitchers, too (and maybe even fielders), making the
>>>overall measurement of clutch ability/performance difficult.

>>How likely is it that the sample of "clutch" hitters we test for will
>>have faced a biased sample of pitchers? (If we use plain RISP, this is
>>very unlikely. If we go to LIPS, it's more likely; and I hope to
>>produce some real results on this front sometime soon. Watch this
>>space.)

>I don't know how disparate the sets of opposing pitchers are likely
>to be for two different hitters in one year. I don't think anyone knows.

What is the likelihood that a hitter will face similarly biased pitchers
year after year? That a *pool* of hitters will face similarly biased
pitchers year after year? Bloody unlikely, imho; and I'm really not
interested in tracking down every data-free surmise that is thrown in my
path.

>I don't think this means the effect can be ignored.

What effect? A moment ago you hypothesized it. Now you're assuming its
existence.

> Certainly there
>is some difference due merely to the fact that each hitter cannot
>face the pitching staff of his own team.

Difference, yes. Effect? Not "certainly" at all.

>>>And there are a couple of questions here which are being confused:

>>>1) Does clutch ability exist?

>>>2) How much effect does performance in clutch situations have on
>>> the outcome of games?

>>>Question #2 is independent of #1. How many games is a +.010 in clutch
>>>on the team level worth? How about +.010 for an individual? Even if
>>>there's no clutch hitting ability (meaning that there is no ability
>>>separate from overall ability) how crucial is the random variation
>>>in clutch performance to a team's offense?

>>I don't recall confusing the questions. We know approximately what RISP
>>hitting does for a team; is it that problematic to transfer this
>>information to evaluations of individual player value?

>Yes. As David Grabiner has pointed out to me (I'm really just a puppet
>on his string) definitions of clutch (even RISP, to some extent) might
>just be measuring things such as how well hitters hit fastballs, or
>how quickly a hitter adjusts to a new pitcher, or perhaps other factors
>we haven't thought of.

SO? Those are hypotheses about the SOURCE of an effect. We don't care
about the sources until we know that there IS an effect. And besides,
what does it matter? If there are more fastballs in RISP situations,
then clutch hitting *is* fastball hitting. And if it is fastball
hitting, then we will discover that fastball hitters show a clutch
effect.

But that would be a subset of showing that ANY group of hitters displays
a clutch effect.

> The set of hitters who do better in RISP or LIPS
>situations may just be the better fastball hitters or faster adjusters.
>Now, such things are relevant for a manager trying to make decisions
>about pinch hitters, etc., but I don't really think it is too relevant
>to the supposed clutch ability.

>I really think that the supposed clutch "ability" is one of the most
>difficult aspects of baseball to pin down.

>>>>>>OK so far? We're better off looking for clutch hitters among those who
>>>>>>actually hit in the clutch last year.

>Like looking for the coins that are positively affected by the magnet
>by looking only among those that flipped a higher than expected
>percentage of heads in their last 100 or 200 flips?

STOP RIGHT THERE. *Nowhere* did I advocate looking ONLY at those coins.
I said we should ISOLATE those coins, and compare their performance to
that of the rest of the population when we repeat the experiment. In
the absence of a difference between groups the second time, we are led
to reject the idea that there are magnet-affected coins.

On the other hand, if you were asked to point to the coin most likely
affected by the magnet, which one would you pick?

Right, the one that had shown the greatest deviation from its previously
experienced mean.

>>>- Definitions of clutch that encompass a sufficient number of data
>>> points may be too broad to capture the notion of "clutch", as in
>>> hitting when it counts most.

>>"Most" is an undefined term. It could mean "In the last game of the WS,
>>when the score is tied in the 9th inning or thereafter."

>>What's wrong with "when it counts *more*"? RISP is clearly and
>>obviously and demonstrably such a situation.

>I don't think anything is wrong with your definition. Nothing at all.
>But in trying to investigate clutch, it also pays to look at RISP*LIPS.

Pays? I wish.

>After all, what if a player does better in RISP than overall, but
>actually does worse in RISP*LIPS? Is he a clutch hitter?

Why are you asking *me* this? 8-)

>>This is one reason why I stick with RISP. I'm not going to fool around
>>with people who want to redefine "clutch" so that all the work that, up
>>until now, has made them look like fools must be thrown out, just
>>because they feel like it. If they have something to prove, let THEM
>>prove it. I'm not getting paid for this, you know, and neither is
>>Sherri.

>Believe me, I know. Professors aren't exactly paid well, and I'd
>be much better off doing something profitable. But back to the point:
>I think you've stated this the right way: the obvious definitions of
>clutch clearly don't show anything, e.g., someone who hits much better
>in the ninth inning of close games. My point is that any further
>investigation of clutch is 1) not going to be able to classify
>individuals, and 2) lead at best to a statement like "using this
>definition of clutch, we find that maybe this group of hitters hits
>a certain amount better."

No, AT WORST it would lead to that. I'm NOT expecting to reject the
null hypothesis, after all.

On the other hand, if the effect IS supported by the data, if the group
that DOES do well one time does well the next, then it is not
unreasonable to consider the effect as being part of any observed
deviation, whether that of an individual or that of a group. Unless
there is a clear reason NOT to attribute a difference to clutch (e.g.,
LIPS-R/L stuff), and a clutch effect has been shown to exist, then one
ought to go with observations that are consistent with the demonstrable
effect.

And RISP is nice in that it removes many of the confounds. Also, if you
don't see the effect there, you have a powerful tool in any argument
about clutch, because RISP is where most arguments start.

(Not that I expect there to BE an effect...)

>Now, pro-clutch people will object to your conclusions because 1) they
>will point to numerous supposed cases of clutch hitters/chokers, and
>2) they can always come up with different definitions of clutch (and
>call them scientific hypotheses yet!). So, why spend the time working
>on something like this, where the conclusions are likely to be
>questionable, due to all sorts of effects like ratio of fastballs or
>speed of adjustment to pitchers (and who knows what else?) when there
>are very fundamental aspects of baseball that need to be investigated
>and that should be more straightforward?

So it's my fault for doing the wrong kind of study, eh?

a) No study of ANYTHING will convince everyone.

b) Who cares what ad-hoc definitions silly people can come up with?

Look, Paul, if you don't like the *fact* that I'm doing the study,
fine. Go do a better study. But kindly don't a) complain about the
method based on all kinds of irrelevancies, and then b) tell me I
oughtn't to bother in the first place.

>Like defense (we know it's
>there, and we know it affects teams' results - the question is how to
>quantify it).

Do it.

> Like investigating the variances of existing stats such
>as RC/27 to be able to use them better to evaluate players.

*I*'m doing that. Got any ways of doing *that* one without some
pooling?

> Another idea: someone recently came up with a result
>about college football, where he showed conclusively that if a team is
>14 points behind, and scores a touchdown, it should always go for 2
>points. Now, can we come up with a similarly strong case for things like
>sacrifice bunts (such as showing that they're only good for pitchers
>in the NL?) Not just an informed opinion, but a conclusive
>demonstration.

NOW you're talking about an obviously situational problem. Can YOU come
up with such a thing?

> I think these things are doable, and if I didn't have
>to work my butt off here, I'd do some of them myself.

But there's plenty of time to complain about other people doing
methodologically sound studies you don't happen to approve of, is that
it? The vast majority of your criticism doesn't even address the study
I'm talking about.

>(I was hoping that
>some of the grad students in my discrete simulation course would choose
>baseball for their term project, but there's only one American in the
>class!) I think such things could actually affect the way baseball is
>played (at the very least, they would affect the simulation leagues),
>whereas I think that any conclusions from a clutch study would likely
>lead to just another statement of, "Well, we didn't find it." to which
>the baseball people or pro-clutch people would say, "Well, we know it's
>there, so you didn't look in the right place."

So you're saying that baseball people are MORE small-minded on this
matter than on others. Don't you suppose that a sac bunt study would
meet with the same result -- "you didn't take into account the fact of
situation X"?

>>>- Just examining averages really doesn't tell the story. It is
>>> likely that any subpopulation you choose, e.g., people who hit
>>> well in the clutch last year, contains a number of chokers, too
>>> (if they exist). The chokers will reduce the average of the group,
>>> probably a good amount.

>>OK: is that subpopulation more or less likely to contain a given number
>>of chokers than the subpopulation of people who actually *did* choke
>>last year? Again, we're just using a sample mean to estimate a
>>population mean. And then testing whether this makes sense.

>Looking at the population of coins that flip greater than their average
>for 100 or 200 flips will lead to including a good number of "magnetic
>chokers" or unaffected coins, and omitting a good number of "magnetic
>hitters."

Witness will answer the question, please.

For one thing, "a good number" is undefined. (What, other than 666, is
a bad number?) For another, it dodges the question -- and its answer --
regarding sample means. If the hypothesized effect a) exists and b) is
measurable, then ANY pool of results that exhibits a difference from the
overall mean is LIKELIER to have been affected by the effect than a
group that did NOT show a difference, or that showed a difference in the
opposite direction.

COINS: Suppose I suspect SOME coins in a set of being other than 50-50.
Furthermore, I suspect them of being biased towards heads.
I toss each one 100 times. I isolate those that show more than 60
heads. Now, IF there are weighted coins in that set, then p(heads) for
the group will be > .5. ALSO, if there are weighted coins ANYWHERE,
it is likelier that the 60+ coins will be those than it is for any of
the other coins. Sure, "a good number" (something I didn't expect from
you, Paul; you sound like one of the clutch proponents who always has
one more anecdote) will not be, but a *better* number on the other side
will almost certainly be pulling *that* sample down.

Is that so hard to understand? *IF* there is bias, then it is likeliest
to be found in those items that have displayed behavior consistent with
bias; and LEAST likely to be found in those items whose sample has been
biased the other way. Separating the two groups and retesting them is a
good way to determine whether the observed bias was anything but random;
and, yes, pooling results does reduce variance while not distracting
from the attempt to test for the existence of the hypothesized effect.
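
The claim that bias, if it exists, is likeliest to sit in the 60+ group can be checked with Bayes' rule. The sketch below is an illustration only: the 10% prior share of weighted coins and the weighted heads probability of .6 are assumed numbers, not figures from the thread, and the function names are hypothetical.

```python
from math import comb

def tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def posterior_weighted(prior, p_weighted, n=100, cutoff=61, above=True):
    """P(coin is weighted | it landed at/above the cutoff, or below it)."""
    hit_w = tail(cutoff, n, p_weighted)   # weighted coin reaches the cutoff
    hit_f = tail(cutoff, n, 0.5)          # fair coin reaches the cutoff
    if not above:
        hit_w, hit_f = 1 - hit_w, 1 - hit_f
    return prior * hit_w / (prior * hit_w + (1 - prior) * hit_f)

# Assumed for illustration: 10% of coins are weighted to p(heads) = .6.
in_group = posterior_weighted(0.10, 0.6)                # more than 60 heads
out_group = posterior_weighted(0.10, 0.6, above=False)  # 60 heads or fewer
```

Under these assumptions the 60+ group is enriched in weighted coins relative to the prior and the remainder is depleted, which is all the retest design requires; neither group needs to be pure.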

Roger

David Grabiner

Feb 26, 1992, 1:22:10 PM
In article <1992Feb25.2...@sjuphil.uucp>, Paul Benjamin writes:
> (Roger Lustig) writes:
>>(Paul Benjamin) writes:
>>>(Roger Lustig) writes:
>>>>(Paul Benjamin) writes:
>>>>>(Roger Lustig) writes:

> OK. Suppose I have a loaded coin that turns up heads 26% of the time, so
> it's a .260 "hitter". Now, I suspect that there may be metal inside that
> might be affected by a magnet, so I'd like to see if so, and how much.
> If I bring in a magnet and flip the coin 100 times, and it comes up heads
> 28 times, what does that tell me about the coin's metal (or the player's
> mettle ;) )?

> Now suppose I have 100 coins like this, each with its own average, e.g.,
> some "hit" .250, some .300. Now, some may be not affected by the magnet,
> and some may have their % of heads increased, and some decreased. If
> I flip them in the presence of the magnet 100 times each, what
> information does the set of flips of one coin tell me about the other
> coins?

I never claimed that I could do this. Benito Santiago is the best
clutch hitter in my study, and his four-year data (1987-1990) is
consistent with his being an average clutch hitter, or a true terror.
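
A rough way to see why one player's split says so little: treat clutch and non-clutch at-bats as independent binomial samples and compute the standard error of the difference. The at-bat counts and the .260 average below are hypothetical, chosen only for illustration; they are not Santiago's actual numbers.

```python
from math import sqrt

def se_difference(avg, ab_clutch, ab_other):
    """Standard error of (clutch BA - other BA) if true ability is `avg` in both."""
    return sqrt(avg * (1 - avg) * (1 / ab_clutch + 1 / ab_other))

# Hypothetical line: a .260 hitter with 400 clutch AB and 1600 other AB.
se = se_difference(0.260, 400, 1600)
# How many standard errors a 50-point clutch split would represent.
z_for_50_points = 0.050 / se
```

Under those assumed numbers the standard error is about .025, so a 50-point split is only about two standard errors. The best split among a hundred or so hitters could plausibly reach that by luck alone.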

But if you measure the change in bias of every coin in the past, and
then repeat the study, you can draw some conclusions.

Suppose that you find that 75 of the 100 coins had the same bias in
both measurements. This is conclusive evidence that some of the coins
are magnetically biased.

Or suppose that you find that 50 of the 100 coins had the same bias in
both measurements. This proves that the magnetic bias is not an
important factor; there could be only two magnetically loaded coins, or
all 100 coins could be magnetically loaded with an effect of .005.
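
The 75-versus-50 comparison can be made concrete with a sign test. A minimal sketch, assuming that with no magnetic effect each coin repeats its direction with probability exactly 1/2 (ties ignored); the function name is hypothetical.

```python
from math import comb

def agreement_tail(k, n=100):
    """P(at least k of n coins repeat their direction) when each repeat is 50-50."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

p_75 = agreement_tail(75)  # chance of 75+ agreements with no effect at all
p_50 = agreement_tail(50)  # chance of 50+ agreements with no effect at all
```

Seventy-five or more agreements is vanishingly unlikely by chance, while fifty or more happens a bit over half the time, so the second outcome cannot distinguish "no effect" from "a very small or very rare effect".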

The difference between clutch hitting and the coin analogy is that there
is a reasonable assumption that clutch hitting is close to normally
distributed. My data is consistent with Benito Santiago being 50 points
better in the clutch, Greg Brock being 50 points worse, and nobody else
having any change in ability, but with people, that's unlikely.

> I don't know how disparate the sets of opposing pitchers are likely
> to be for two different hitters in one year. I don't think anyone knows.
> I don't think this means the effect can be ignored. Certainly there
> is some difference due merely to the fact that each hitter cannot
> face the pitching staff of his own team.

But this won't affect clutch data that much; I doubt that the entire Red
Sox staff has a large enough clutch ability to skew the clutch data for
Red Sox batters. (It would have to be a *huge* effect; a non-Red Sox
batter gets only 1/13 of his clutch at-bats a year against Red Sox
pitching, so the Red Sox staff would have to have the ability to be .065
worse in the clutch to skew Red Sox batters' data by even five points.)
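
The arithmetic in that parenthesis, written out. The 1/13 share is the figure given above; the function names are hypothetical.

```python
def skew_on_own_batters(staff_effect, share_vs_staff=1 / 13):
    """Gap between what other batters face and what the staff's own batters face."""
    return staff_effect * share_vs_staff

def staff_effect_needed(skew, share_vs_staff=1 / 13):
    """Staff-wide clutch effect required to produce a given skew."""
    return skew / share_vs_staff

five_points = skew_on_own_batters(0.065)   # .065 spread over 1/13 of at-bats
needed = staff_effect_needed(0.005)        # effect needed for a 5-point skew
```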

> Yes. As David Grabiner has pointed out to me (I'm really just a puppet
> on his string) definitions of clutch (even RISP, to some extent) might
> just be measuring things such as how well hitters hit fastballs, or
> how quickly a hitter adjusts to a new pitcher, or perhaps other factors
> we haven't thought of.

RISP will have different effects; disciplined hitters who get walked
whenever they are ahead in the count with a base open, such as Mickey
Tettleton, might be expected to do poorly with RISP. (But Mark McGwire
is the same type of hitter, and he is the *best* RISP hitter in my
study.) Also, fly-ball hitters tend to hit a lot of sacrifice flies
with runners on third, and these get discarded from your batting average.

> The set of hitters who do better in RISP or LIPS
> situations may just be the better fastball hitters or faster adjusters.
> Now, such things are relevant for a manager trying to make decisions
> about pinch hitters, etc., but I don't really think it is too relevant
> to the supposed clutch ability.

However, all of these effects would cause clutch ability to look
*larger* than it actually is. Thus any estimate of clutch ability from
a study of my type is an overestimate of its actual importance.

> I really think that the supposed clutch "ability" is one of the most
> difficult aspects of baseball to pin down.

I agree with this, which is why I'm not trying to pin it down exactly,
just show that its importance is quite limited.

>>>>>>OK so far? We're better off looking for clutch hitters among those who
>>>>>>actually hit in the clutch last year.

> Like looking for the coins that are positively affected by the magnet
> by looking only among those that flipped a higher than expected
> percentage of heads in their last 100 or 200 flips?

Yes, although my sample size is larger and more reliable because I used
five years, 400 AB per player for LIPS and 750 for RISP. This won't be
a perfect sample, but if there is clutch ability and you look at enough
players (or if there are loaded coins and you look at enough coins), the
sample will have more clutch ability (or magnetic loading) than average.

> I don't think anything is wrong with your definition. Nothing at all.
> But in trying to investigate clutch, it also pays to look at RISP*LIPS.
> After all, what if a player does better in RISP than overall, but
> actually does worse in RISP*LIPS? Is he a clutch hitter?

If you believe that there is an ability to bear down harder in the
clutch, it seems unlikely that a player with this ability would bear
down harder with RISP in the early innings and not in the late innings.

> My point is that any further
> investigation of clutch is 1) not going to be able to classify
> individuals, and 2) lead at best to a statement like "using this
> definition of clutch, we find that maybe this group of hitters hits
> a certain amount better."

When I get 1991 LIPS data, I might be able to remove the "maybe."

> whereas I think that any conclusions from a clutch study would likely
> lead to just another statement of, "Well, we didn't find it." to which
> the baseball people or pro-clutch people would say, "Well, we know it's
> there, so you didn't look in the right place."

This is a problem. Now, if pro-clutch people could tell us where to
look (as Elias did), maybe we could look there and find something.

> Looking at the population of coins that flip greater than their average
> for 100 or 200 flips will lead to including a good number of "magnetic
> chokers" or unaffected coins, and omitting a good number of "magnetic
> hitters."

I'm not claiming that the study is perfect; note that my own estimates
suggest that only half of clutch ability is being found by the study.

But the question is not whether the sample is perfect, but whether it is
better than average. If the sample of "clutch" coins, or players, has a
clutch ability which is above average, a large enough study will show
that clutch ability exists.

David M Tate

Feb 24, 1992, 6:03:46 PM
In article <1992Feb24....@sjuphil.uucp> pben...@sjuphil.uucp (Paul Benjamin) writes:

Roger Lustig writes:

>>Actually, it's also lacking in that there's no "choke" force pulling in
>>the other direction. The "clutch" hypothesis says that there are
>>clutch hitters, choke hitters, and in-between hitters.
>
>Good point. I could use another analogy, if you'd like.

Yes please. :^) I don't think the Brownian motion analogy is contributing
to anyone's intuitions here.

>>Paul, do you even *read* what you're responding to?
>
>Yes, Roger. Perhaps it is better to refrain from transforming a
>technical discussion into a personal one. Having read the recent
>slew of postings between stat and anti-stat people, in which the
>anti-stat people accused the stat people of personal attacks, I
>am disappointed to see that you have provided an example of this.
>Note also that David Tate did a similar thing recently, when he
>questioned my partiality.

Clarification:

Bob Gaj posted a couple of times, to the effect that MLEs are basically as
useful for predicting major league future performance as major league past
performance is. Paul replied to these postings in a way which seemed to me
to consistently interpret Bob's comments in ways which made them obviously
false. I commented that I thought an "impartial observer" would not read
Bob's comments that way. Paul got quite upset that I had questioned his
"impartiality" (whatever that is in this context), thinking I had intended
some sort of personal attack (which I hadn't). I tried to clear this up in
a subsequent posting, but apparently it didn't take. (And I turned out to
be right about what Bob had meant in his posting.)

>Is there a need for this sort of thing?

You mean dredging up the embers of previous misunderstandings in unrelated
threads? I certainly hope not. But it's a little late now...

>Sorry, but this still makes little sense to me. After all, pitchers
>are people, too. Any clutch ability possessed by hitters will likely
>be possessed by pitchers, too (and maybe even fielders), making the
>overall measurement of clutch ability/performance difficult.

This is an interesting point. For one thing, if there is a significant
clutch pitching *and* clutch hitting effect, shouldn't it show up as a
higher population variance in "key" situation performance? In addition to
getting clutch hitters and chokers, you now get chokers who faced more
clutch hitters than average, clutch hitters who faced Jim Acker a lot, and
so on.

>And there are a couple of questions here which are being confused:
>
>1) Does clutch ability exist?
>
>2) How much effect does performance in clutch situations have on
> the outcome of games?

I don't think these questions are being confused; most of the people working
on this stuff (David, Roger, Sherri, etc.) are interested in how much weight
to give "clutch ability" in doing player evaluations and projections. For
that purpose, it doesn't matter whether you conclude that there are no clutch
hitters, or that being a clutch hitter isn't really worth anything. The
effect on your evaluation method is the same: "disregard clutchness". Now,
we are certainly all interested in questions 1 and 2 above, but answering them
both may not be necessary for answering the evaluation problem.

>Question #2 is independent of #1. How many games is a +.010 in clutch
>on the team level worth? How about +.010 for an individual? Even if
>there's no clutch hitting ability (meaning that there is no ability
>separate from overall ability) how crucial is the random variation
>in clutch performance to a team's offense?

Great questions all. In fact, answering the last one (under a hypothesis of
"no clutch hitting anywhere") would go a long way toward telling us how good
models like RC and LW are, by seeing if the observed errors in those measures
match the predicted error from random fluctuations in clutch batting at the
team level.

>>The point is: if the hypothesis is true, then it SHOULD
>>increase by some noticeable amount, because the aggregate clutch
>>differential observed will be *in part* attributable to the
>>hypothesized effect.
>
>No. It's extremely relevant. The amount of increase need not be
>noticeable.

At which point, why do we care? If Joe Batter hits 50 points higher in
RISP*LIPS than in "other" situations, over a 15 year career, that's about
300 at bats (out of 8000-10000), raising his lifetime batting average by
2 points or so with 15 "clutch" hits. Fifteen extra hits in a whole career
ain't much, and a 50 point boost is a *very* generous allowance. One extra
clutch hit per season, at a cost of slightly more than one random hit per
season (which is what a player with the same overall batting average and no
clutch effect would get). Even if half of all RISP*LIPS hits are game-
winners, that's less than one extra win every other year.

(Admittedly, this is anything but a rigorous calculation, but I've been
consistently generous in estimating the magnitude and marginal effect of
a great clutch hitter's contributions.)
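
A minimal sketch of that back-of-envelope estimate, in Python. The inputs are the round numbers from the paragraph above; the 9000 career at bats is simply the midpoint of the 8000-10000 range, and the function name is made up for illustration.

```python
# Back-of-envelope estimate of what a 50-point clutch boost is worth.
# Every default is a round number from the post; 9000 career at bats is
# an assumed midpoint of the quoted 8000-10000 range.

def clutch_career_effect(clutch_ab=300, career_ab=9000, boost=0.050,
                         seasons=15, game_winner_share=0.5):
    extra_hits = clutch_ab * boost                    # extra hits over the career
    avg_gain_points = extra_hits / career_ab * 1000   # points of lifetime average
    hits_per_season = extra_hits / seasons
    wins_per_season = hits_per_season * game_winner_share
    return extra_hits, avg_gain_points, hits_per_season, wins_per_season


hits, points, per_season, wins = clutch_career_effect()
print(f"{hits:.0f} extra hits, {points:.1f} points of lifetime average")
print(f"{per_season:.1f} extra hits and {wins:.2f} extra wins per season")
```

With these inputs the boost comes to 15 hits, a bit under 2 points of lifetime average, and at most half a win per season.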

I think the implicit question behind much of the "clutch" discussion has been
along the lines of "Is there a clutch effect that makes a difference in which
teams win and lose?". The existence of a true clutch effect which is small
compared to the inherent "noise" of baseball would be intellectually
intriguing, but of little value in decision-making, which is the root of the
sabermetric motivation.

Or have I again misunderstood where you're coming from?

--
David M. Tate | "Our flabbers were gasted."
dt...@unix.cis.pitt.edu | --Martin Keates
I was of three minds / Like a tree |
in which there are three blackbirds | <==Wallace Stevens, "13 Ways..."

Rob Strom

Feb 26, 1992, 9:35:34 PM
In article <202...@unix.cis.pitt.edu>, dt...@unix.cis.pitt.edu (David M Tate) writes:

|> I think the implicit question behind much of the "clutch" discussion has been
|> along the lines of "Is there a clutch effect that makes a difference in which
|> teams win and lose?". The existence of a true clutch effect which is small
|> compared to the inherent "noise" of baseball would be intellectually
|> intriguing, but of little value in decision-making, which is the root of the
|> sabermetric motivation.

I don't know whether there's a clutch effect or not, but
I doubt that clutch performance, if it exists,
correlates positively with winning.
Given the choice between someone who comes through primarily
in late innings and someone who comes through primarily
in the early innings, I think that I as a manager would
rather have the person who comes through in early innings,
if such a differential actually exists.

If there is such a thing as clutch differential, I think
it would be of interest to the fan, not the manager.
The fan wants excitement, drama, tension.
I would imagine that the manager would prefer
9-0 laughers.
--
Rob Strom, st...@watson.ibm.com, (914) 784-7641
IBM Research, 30 Saw Mill River Road, P.O. Box 704, Yorktown Heights, NY 10598

David H. Thornley

Feb 27, 1992, 4:39:22 PM
In article <202...@unix.cis.pitt.edu> dt...@unix.cis.pitt.edu (David M Tate) writes:
>
>I think the implicit question behind much of the "clutch" discussion has been
>along the lines of "Is there a clutch effect that makes a difference in which
>teams win and lose?". The existence of a true clutch effect which is small
>compared to the inherent "noise" of baseball would be intellectually
>intriguing, but of little value in decision-making, which is the root of the
>sabermetric motivation.
>
Let's consider clutch from a different viewpoint for a moment. If "clutch"
is the ability to wring the most winning out of the smallest input, we can
look for it in bunching hits and walks together to get more runs than
expected, grouping runs together to get more wins than you would expect,
or grouping wins together to get more titles than you would expect.

I haven't looked at the RC (or other) predictions for the 1987 Twins recently,
so I don't know whether they were clutch in scoring runs. They did of
course win 85 games during the regular season despite being outscored,
which is a clutch achievement, and they did win a division and later a
World Series with only 85 wins, which is also (from the above discussion)
a clutch achievement.

Amazing how little respect a real clutch team gets.

David Thornley
"Willing to take any WS victory, however besmirched."

Paul Benjamin

Feb 27, 1992, 4:29:37 PM
(Roger Lustig) writes:
>(Paul Benjamin) writes:
>>(Roger Lustig) writes:
>>>(Paul Benjamin) writes:
>>>>(Roger Lustig) writes:
>>>>>(Paul Benjamin) writes:

>>OK. Suppose I have a loaded coin that turns up heads 26% of the time, so
>>it's a .260 "hitter". Now, I suspect that there may be metal inside that
>>might be affected by a magnet, so I'd like to see if so, and how much.

...


>>If I can increase the number of trials for each coin,
>>clearly I can make a reliable guess about the magnetic properties of
>>each coin, e.g., flip each 1 million times in the presence of the
>>magnet, and get an almost certain classification as a "magnetic hitter",
>>a "magnetic choker", or even. But if I can't increase the number of
>>trials per coin in this way, what information do I get by getting a
>>million such coins, each of which can be flipped only a hundred times?

>STOP RIGHT THERE.

>NOWHERE have I advocated a single set of trials. NOWHERE. When I
>advocated separating out observed clutchers from observed chokers from
>observed no-effects, it was in order to observe them A SECOND TIME. If
>the three groups evidence no difference the second time, then it is
>clear that there was no effect, and pooling individuals doesn't change
>that. True, one can't say that a particular individual displayed an
>effect, but one can't say that *ANYWAY* until one has demonstrated that
>there *IS* an effect.

Yes, Roger, I understand. My point is that it is not necessarily clear
at all that there is no effect in the case you describe. If you put
the second set of trials together with the first ones and there is still
an effect, then it is wrong to dismiss the existence of the effect
based just on the change from the first set of trials to the second.
This is the original point that I made, and I feel that you have just
not addressed it. This is what distinguishes your approach from that
of David Grabiner, who is using averages over several years, rather
than changes from year to year.

You are characterizing my replies as "marching right past" your
points, yet I feel that you have failed to address the central and
original point I made: year-to-year changes in clutch just aren't
the way to get at clutch. A longer time average is needed. David
Grabiner has done (and is doing) this, and is showing some overall
effect, albeit not a large one.
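
One way to see why a longer average helps, sketched in Python under assumptions that are not from this post: clutch ability is normally distributed with a standard deviation of 13 points (the estimate David Grabiner gives elsewhere in this thread), luck is plain binomial noise around a .270 average, a season has about 80 LIPS at bats (his 400 over five years), and noise in the non-clutch baseline is ignored. Under those assumptions the chance that a hitter lands on the same side of average in two independent samples is 1/2 + arcsin(rho)/pi, where rho is the share of observed variance that is due to ability.

```python
import math


def repeat_probability(ability_sd, avg, at_bats):
    """Chance that a hitter's observed clutch differential has the same
    sign in two independent samples of `at_bats` clutch at bats each.

    Assumes normally distributed ability and luck (hypothetical model)."""
    luck_var = avg * (1 - avg) / at_bats           # binomial sampling variance
    ability_var = ability_sd ** 2
    rho = ability_var / (ability_var + luck_var)   # correlation between samples
    return 0.5 + math.asin(rho) / math.pi          # bivariate-normal sign agreement


one_year = repeat_probability(0.013, 0.270, 80)     # season vs. season
five_year = repeat_probability(0.013, 0.270, 400)   # five years vs. five years
print(f"one season:   {one_year:.3f}")
print(f"five seasons: {five_year:.3f}")
```

With these inputs the one-season figure is about 52% and the five-season figure about 58%, so a real but small ability would be nearly invisible in year-to-year comparisons.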

>Then maybe I need to know what the point is. Your analogy was confusing
>(which was the clutch effect, grain size or gravity?) and your
>critiques had nothing to do with method, but rather with what, in the
>ideal case, one might be able to prove or not to prove in order to shut
>the clutch folks up for all time. And, without acknowledging that you
>were doing so, you repeatedly addressed issues other than the ones you
>were following up to.

I disagree. I felt that I was addressing the issues in your posts, and
it is entirely unnecessary to transform the discussion into one about
the people involved. Why do net discussions have to turn into personal
arguments and then into flamewars? (except when they turn into mutual
fawning.)

>>>>... pitchers
>>>>are people, too. Any clutch ability possessed by hitters will likely
>>>>be possessed by pitchers, too (and maybe even fielders), making the
>>>>overall measurement of clutch ability/performance difficult.

>>>How likely is it that the sample of "clutch" hitters we test for will
>>>have faced a biased sample of pitchers? (If we use plain RISP, this is
>>>very unlikely. If we go to LIPS, it's more likely; and I hope to
>>>produce some real results on this front sometime soon. Watch this
>>>space.)

>>I don't know how disparate the sets of opposing pitchers are
>>for two different hitters in one year. I don't think anyone knows.

>What is the likelihood that a hitter will face similarly biased pitchers
>year after year? That a *pool* of hitters will face similarly biased
>pitchers year after year? Bloody unlikely, imho; and I'm really not
>interested in tracking down every data-free surmise that is thrown in my
>path.

Exactly the point. It *is* unlikely, and that may be one major reason
that a supposed clutch hitter one year might appear to be a clutch
choker the next, etc., e.g., a hitter may face a disproportionate rate of
righties one year in the clutch and the next year face far more lefties,
or face more fastballers one year and more breaking ball pitchers the next.
The variation need not be large to produce differences of reasonable
size. For example, consider a RH hitter who hits .240 against RHP, and
.320 against LHP. Suppose the league has 60% RHP and 40% LHP. Now, one
year he gets 70% RHP so he hits .264, and the next he gets 50% RHP
(maybe traded to a team with a LH lineup) and hits .280. 16 points
just from the lefty-righty difference. Then figure in other things,
like mix of pitches. For some players, these may cancel, but for others,
they will add.
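
The lefty-righty arithmetic above can be sketched in Python. The splits and pitcher mixes are the ones in the example; treating the mix as a share of at bats, and the function name, are assumptions for illustration.

```python
def blended_average(avg_vs_rhp, avg_vs_lhp, share_rhp):
    """Overall average as an at-bat-weighted mix of the two platoon splits."""
    return share_rhp * avg_vs_rhp + (1 - share_rhp) * avg_vs_lhp


year_one = blended_average(0.240, 0.320, 0.70)   # 70% of at bats vs. RHP
year_two = blended_average(0.240, 0.320, 0.50)   # 50% of at bats vs. RHP
swing = (year_two - year_one) * 1000             # difference in points

print(f"{year_one:.3f} -> {year_two:.3f}, a swing of {swing:.0f} points")
```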

You know, if there's one obvious thing about these characteristics of
groups of ballplayers, it's that they don't hold constant from year to
year. Things like the relative strengths of the leagues, the percentage
of fastballers or lefties, etc. are bound to show a natural variation over
time (some, like the relative strengths of leagues, vary slowly). If
performance in so-called clutch situations is actually just a measurement
of a more complex interaction of things that have nothing to do with
"getting up" for the big at-bats, then measurements of this performance
will vary in unpredictable ways. Then, if you are looking for "the ones
that gained last time gain again this time, and the ones that lost last
time lose", you are extremely unlikely to find any consistent clutch
hitters. And that is exactly your result, yes? No surprise. Looking at the
clutch data per year is just not convincing.

>>Certainly there
>>is some difference due merely to the fact that each hitter cannot
>>face the pitching staff of his own team.

>Difference, yes. Effect? Not "certainly" at all.

Certainly a hitter for the team with the worst pitching staff faces
on average better pitchers than a hitter for the team with the best
pitching staff, especially if they are in the same division in the NL
(in which case the difference in opponents' ERA can be .2 runs).

>> ... definitions of clutch (even RISP, to some extent) might
>>just be measuring things such as how well hitters hit fastballs, or
>>how quickly a hitter adjusts to a new pitcher, or perhaps other factors
>>we haven't thought of.

>SO? Those are hypotheses about the SOURCE of an effect. We don't care
>about the sources until we know that there IS an effect. And besides,
>what does it matter? If there are more fastballs in RISP situations,
>then clutch hitting *is* fastball hitting. And if it is fastball
>hitting, then we will discover that fastball hitters show a clutch
>effect.

1) David has already shown the size of the overall effect. His numbers
have been posted recently, but were also posted a while back. So there
is an effect. The remaining question is not whether some hitters
perform differently in RISP or LIPS, but why? Is there an ability?
Is it merely effects like hitting fastballs?

2) I agree that as far as the manager's decision is concerned, clutch
hitting may well be just something like fastball hitting. But that
is definitely *not* what the clutch people have in mind.

3) Do you (or anyone else) have data on who the fastball hitters are? Who
adjusts faster to new pitchers, etc.?

>>>>>>>OK so far? We're better off looking for clutch hitters among those who
>>>>>>>actually hit in the clutch last year.

>>Like looking for the coins that are positively affected by the magnet
>>by looking only among those that flipped a higher than expected
>>percentage of heads in their last 100 or 200 flips?

>STOP RIGHT THERE. *Nowhere* did I advocate looking ONLY at those coins.
>I said we should ISOLATE those coins, and compare their performance to
>that of the rest of the population when we repeat the experiment. In
>the absence of a difference between groups the second time, we are led
>to reject the idea that there are magnet-affected coins.

When we *should* be led to reject just this subpopulation as consisting
mostly of coins that are positively affected by the magnet.

>Look, Paul, if you don't like that *fact* that I'm doing the study,
>fine. Go do a better study. But kindly don't a) complain about the
>method based on all kinds of irrelevancies, and then b) tell me I
>oughtn't to bother in the first place.

Look, Roger, if you don't like the fact that I'm criticizing your
study, fine. Hit the 'n' key. But kindly don't a) complain that I'm
using the net to criticize your methods, because you have often been
known to post strong criticisms of other people's posts, and then
b) tell me that I oughtn't to bother posting in the first place.
What do you think the net is? If you don't want criticism, then
either don't post or don't respond.

>> Another idea: someone recently came up with a result
>>about college football, where he showed conclusively that if a team is
>>14 points behind, and scores a touchdown, it should always go for 2
>>points. Now, can we come up with a similarly strong case for things like
>>sacrifice bunts (such as showing that they're only good for pitchers
>>in the NL?) Not just an informed opinion, but a conclusive
>>demonstration.

>NOW you're talking about an obviously situational problem. Can YOU come
>up with such a thing?

I'd be more than willing to help work on such a thing.

>>I think these things are doable, and if I didn't have
>>to work my butt off here, I'd do some of them myself.

>But there's plenty of time to complain about other people doing
>methodologically sound studies you don't happen to approve of, is that
>it? The vast majority of your criticism doesn't even address the study
>I'm talking about.

No, that's not it. I disagree that your methodology is sound.
Clearly, you haven't grasped that, so there's no point in continuing.
I think you'll just come up with a measurement of perhaps many different
effects interacting. The measurement will be small, and then you'll
make some pronouncement about clutch hitting ability, when that will
not have been isolated at all.

>>(I was hoping that
>>some of the grad students in my discrete simulation course would choose
>>baseball for their term project, but there's only one American in the
>>class!) I think such things could actually affect the way baseball is
>>played (at the very least, they would affect the simulation leagues),
>>whereas I think that any conclusions from a clutch study would likely
>>lead to just another statement of, "Well, we didn't find it." to which
>>the baseball people or pro-clutch people would say, "Well, we know it's
>>there, so you didn't look in the right place."

>So you're saying that baseball people are MORE small-minded on this
>matter than on others. Don't you suppose that a sac bunt study would
>meet with the same result -- "you didn't take into account the fact of
>situation X."

I think that a well supported conclusion would catch someone's eye.
Certainly there would be those who would ignore it, but all you'd need
is one manager who used it.

Paul Benjamin

P.S. Note that I've cut out things, as you do, to reduce the
exponential explosion of these postings. If there's something
I omitted that you think is important, put it back in.

Sherri Nichols

Feb 25, 1992, 11:38:05 AM
In article <GRABINER.92...@zariski.harvard.edu> grab...@math.harvard.edu (David Grabiner) writes:
>Now, let's put all of this together. Given the estimate that the top
>clutch hitters have an ability to hit 26 points better in LIPS than
>overall, that's worth about 4 points of season batting average. And
>given that the *observed* clutch hitters have an ability to hit 12
>points better in LIPS than overall, that's worth just 2 points.
>
>So it is legitimate to say, "Benito Santiago hit .267 last year, but
>he's been one of the top clutch hitters in baseball, which makes
>him as valuable as a .269 hitter." The data backs you up here.
>
>But replace .269 by .271 and you are depending on Santiago's clutch data
>being reliable, rather than due to luck. Replace it by .272 and you
>have gone beyond any prediction which can be justified by the statistics
>alone. Replace it by .276 and you have made a statement which is
>inconsistent with the data on clutch hitting.

What about the problem that the variance in batting averages from year to
year is greater than the clutch effect you've found here?

Sherri Nichols
snic...@adobe.com

Paul Benjamin

Feb 27, 1992, 12:31:06 PM
In article <GRABINER.92...@zariski.harvard.edu> grab...@math.harvard.edu (David Grabiner) writes:

>In article <1992Feb24....@sjuphil.uucp>, Paul Benjamin writes:

>> -Paul Benjamin
>>>-Roger Lustig
>>>>-Paul Benjamin
>>>>>-Roger Lustig

>>>Irrelevant. The point is: if the hypothesis is true, then it SHOULD
>>>increase by some noticeable amount, because the aggregate clutch
>>>differential observed will be *in part* attributable to the
>>>hypothesized effect.

>> No. It's extremely relevant. The amount of increase need not be
>> noticeable.

>This is correct in theory, but I don't care whether clutch hitting
>exists if the ability is insignificant.

What I meant was that the increase in performance in these situations
need not be a good measure of clutch ability (if it exists). For
example, suppose clutch performance mostly measures ability to hit
fastballs. Now, good breaking ball hitters would then generally do
worse in the clutch. But a clutch breaking ball hitter might be able
to do about as well as his overall average. His increase in clutch
performance will not be noticeable from his averages. And a choking
fastball hitter might do only as well as his overall average.

>How well a study of players whose previous clutch data was good will
>find players with good clutch ability depends on how much of clutch
>performance is due to ability.

Exactly.

>If you look at five years of data (which is about what I did), then the
>standard deviation of luck in clutch performance is 24 points of batting
>average. If the standard deviation of ability in clutch performance is
>24 points, then the correlation between ability and performance is .5.

(much nice analysis deleted)

>A reasonable estimate for the importance of clutch ability is the value
>which gives a prediction closest to the actual value. That comes out to
>a standard deviation of 13 points. This means that (assuming a normal
>distribution) 1/40 of all major league hitters, the true terrors, have
>an ability to hit 26 points better in LIPS than otherwise; that's two
>hits a year.

It's not many hits, due to the relatively small number of instances
per year, but 26 points is a lot. A manager would definitely bring the
.300 hitter to the plate instead of the .274 hitter. And of course he'll
want to stay away from the choker, who will hit only .248 in LIPS. Now
if there were only a way to identify the clutch individuals.
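
A quick check of the "1/40" and "two hits a year" figures quoted above, sketched in Python. It assumes, as the quoted text does, a normal distribution of clutch ability with a standard deviation of 13 points; the 80 LIPS at bats per season is an assumption taken from the 400 at bats over five years mentioned elsewhere in this thread.

```python
import math


def upper_tail(z):
    """Share of a normal population lying more than z standard deviations
    above the mean."""
    return 0.5 * math.erfc(z / math.sqrt(2))


ability_sd = 0.013                  # 13 points of batting average
top_edge = 2 * ability_sd           # 26 points, two SDs above average
share = upper_tail(2.0)             # fraction of hitters at or beyond that edge
extra_hits_per_year = top_edge * 80 # assumed 80 LIPS at bats per season

print(f"share of hitters: {share:.4f} (about 1 in {1 / share:.0f})")
print(f"extra LIPS hits per year: {extra_hits_per_year:.2f}")
```

The tail share comes out a little under 1/40, and the extra hits a little over two per year.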

>And there is no way to distinguish the ability to bear down in the
>clutch from any other ability which causes players to hit better in the
>clutch; thus the measurements of clutch ability are actually upper
>bounds.

But there is also no way to distinguish it from any other factor which
might cause a player to hit worse in the clutch, e.g., he's a breaking
ball hitter, so that the clutch ability, if any, may be actually larger
than his clutch performance.

>> Question #2 is independent of #1. How many games is a +.010 in clutch
>> on the team level worth? How about +.010 for an individual? Even if
>> there's no clutch hitting ability (meaning that there is no ability
>> separate from overall ability) how crucial is the random variation
>> in clutch performance to a team's offense?

>I can also answer this.

Great.

>A study in The Hidden Game of Baseball showed
>that relief aces' pitching had twice the effect per inning on the
>probability of winning as starters' pitching did. That is, the average
>hit given up by a relief ace costs the team twice as much as the average
>hit given up by a starter.

>Since 1/7 of all at-bats are in LIPS and each LIPS hit is worth two
>non-LIPS hits, 6 points of batting average gained in LIPS is worth a
>point on your season batting average with no LIPS advantage, for either
>a player or a team.

Well, in a straight average, 6 points of LIPS avg balances 1 point
of nonLIPS avg, each giving 6/7 points on the overall average. If each
LIPS hit is then worth 2 nonLIPS ones, then 6 points of LIPS is worth
2 points of nonLIPS, giving 12/7 on the overall avg. Or did I
misunderstand you?

>I don't have comparable data for RISP or RISP in LIPS, but I would guess
>that the factors are about 2 with RISP (a single is more than twice as
>valuable, but extra bases aren't worth much more) and thus 4 with RISP
>in LIPS. Since 1/4 of all at-bats are with RISP and 1/30 are with RISP
>in LIPS, that means that 3 points of RISP batting advantage is worth one
>point on your season batting average, and 10 points of RISP with LIPS
>average is worth one point.

3 points of RISP balances 1 point of nonRISP, giving 3/4 point overall.
If each RISP hit is worth 2 nonRISP hits, then 3 points of RISP is worth
2 points of nonRISP, giving 3/2 overall. And 29 points of RISP-LIPS is
worth 4 points of non, giving 116/30 ~ 4 points overall (10 points is
worth 4/3). OK.

>Now, let's put all of this together. Given the estimate that the top
>clutch hitters have an ability to hit 26 points better in LIPS than
>overall, that's worth about 4 points of season batting average. And
>given that the *observed* clutch hitters have an ability to hit 12
>points better in LIPS than overall, that's worth just 2 points.

Using 12/7, we get 26 points LIPS = about 7.5 points overall.
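
The weighting arithmetic in these replies can be sketched in Python with exact fractions. The shares of at bats (1/7, 1/4, 1/30) and the hit weights (2, 2, 4) are the ones quoted above; the function name and the simple "points times share times weight" model are assumptions that follow the replies, not the quoted study.

```python
from fractions import Fraction


def overall_points(situation_points, share_of_ab, hit_weight=1):
    """Points of season average that an edge in one situation is worth:
    the edge, times the share of at bats, times the weight per hit."""
    return situation_points * share_of_ab * hit_weight


lips = overall_points(6, Fraction(1, 7), 2)           # 6 points in LIPS
risp = overall_points(3, Fraction(1, 4), 2)           # 3 points with RISP
risp_lips = overall_points(10, Fraction(1, 30), 4)    # 10 points, RISP in LIPS
risp_lips_29 = overall_points(29, Fraction(1, 30), 4) # 29 points, RISP in LIPS
top_clutch = overall_points(26, Fraction(1, 7), 2)    # 26 points in LIPS

for label, value in [("LIPS 6", lips), ("RISP 3", risp),
                     ("RISP-LIPS 10", risp_lips),
                     ("RISP-LIPS 29", risp_lips_29),
                     ("LIPS 26", top_clutch)]:
    print(f"{label}: {value} = {float(value):.2f} points overall")
```

This gives 12/7, 3/2, 4/3, 58/15 (the same as 116/30) and 52/7, matching the figures worked out by hand above.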

>> - Given that clutch performance (if it exists) is almost certainly
>> shown by pitchers, too, it is likely that any hitter's clutch
>> performance will vary considerably from one year to the next, as
>> the population of pitchers he faces changes.

>This is already dealt with in the computation of luck; few hitters face
>the same pitcher even twice in a game in a clutch situation, so the mix
>of pitchers is pretty close to independent.

Well, a hitter cannot face the pitchers on his own team, so, e.g.,
the Cincinnati hitters have not had to face the Nasty Boys, but
everyone else has. Is it possible that this might alter the "level
playing field"?

Paul Benjamin

Paul Benjamin

Feb 28, 1992, 3:04:04 PM
In article <GRABINER.92...@zariski.harvard.edu> grab...@math.harvard.edu (David Grabiner) writes:
>In article <1992Feb25.2...@sjuphil.uucp>, Paul Benjamin writes:
>> (Roger Lustig) writes:
>>>(Paul Benjamin) writes:
>>>>(Roger Lustig) writes:
>>>>>(Paul Benjamin) writes:
>>>>>>(Roger Lustig) writes:

>> OK. Suppose I have a loaded coin that turns up heads 26% of the time, so
>> it's a .260 "hitter". Now, I suspect that there may be metal inside that
>> might be affected by a magnet, so I'd like to see if so, and how much.

>> If I bring in a magnet and flip the coin 100 times, and it comes up heads
>> 28 times, what does that tell me about the coin's metal (or the player's
>> mettle ;) )?

>> Now suppose I have 100 coins like this, each with its own average, e.g.,
>> some "hit" .250, some .300. Now, some may be not affected by the magnet,
>> and some may have their % of heads increased, and some decreased. If
>> I flip them in the presence of the magnet 100 times each, what
>> information does the set of flips of one coin tell me about the other
>> coins?

>Suppose that you find that 75 of the 100 coins had the same bias in
>both measurements. This is conclusive evidence that some of the coins
>are magnetically biased.
>Or suppose that you find that 50 of the 100 coins had the same bias in
>both measurements. This proves that the magnetic bias is not an
>important factor; there could be only two magnetically loaded coins, or
>all 100 coins could be magnetically loaded with an effect of .005.

>The difference between clutch hitting and the coin analogy is that there
>is a reasonable assumption that clutch hitting is close to normally
>distributed. My data is consistent with Benito Santiago being 50 points
>better in the clutch, Greg Brock being 50 points worse, and nobody else
>having any change in ability, but with people, that's unlikely.
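
The 75-versus-50 comparison in the quoted coin example can be checked with a binomial tail, sketched here in Python. It assumes that, with no magnetic effect at all, each coin shows the same bias in both measurements with probability 1/2, independently of the others; the function name is made up for illustration.

```python
from math import comb


def binomial_upper_tail(n, k, p=0.5):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


p_75 = binomial_upper_tail(100, 75)   # 75 or more of 100 coins agree by chance
p_50 = binomial_upper_tail(100, 50)   # 50 or more of 100 coins agree by chance

print(f"P(75 or more agree by chance) = {p_75:.2e}")
print(f"P(50 or more agree by chance) = {p_50:.3f}")
```

Under that assumption, 75 agreements would happen by chance far less than once in a million tries, while 50 or more happens a bit more than half the time, so the second result says nothing either way about small effects.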

Actually, to me the big difference between the analogies is that there
may be many unknown factors affecting the hitters, whereas coins can
be flipped under closely controlled conditions. This weakens or maybe
even severs the connection between the measured clutch performance and
any clutch hitting ability.

The reason I raised the analogy was to try to point out the assumptions
being made in the study of clutch ability. Without some assumptions, we
can't really do anything here.

>> Yes. As David Grabiner has pointed out to me (I'm really just a puppet
>> on his string) definitions of clutch (even RISP, to some extent) might
>> just be measuring things such as how well hitters hit fastballs, or
>> how quickly a hitter adjusts to a new pitcher, or perhaps other factors
>> we haven't thought of.

>RISP will have different effects; disciplined hitters who get walked
>whenever they are ahead in the count with a base open, such as Mickey
>Tettleton, might be expected to do poorly with RISP. (But Mark McGwire
>is the same type of hitter, and he is the *best* RISP hitter in my
>study.) Also, fly-ball hitters tend to hit a lot of sacrifice flies
>with runners on third, and these get discarded from your batting average.

>However, all of these effects would cause clutch ability to look
>*larger* than it actually is. Thus any estimate of clutch ability from
>a study of my type is an overestimate of its actual importance.

On the whole, that may be true. But note that each effect obscures the
distribution of clutch ability, e.g., skewing the data to make fastball
hitters look better and breaking ball hitters worse. So although the
overall effect of clutch performance on winning games is small, it
cannot yet be ruled out as a factor for evaluating players.

>> I really think that the supposed clutch "ability" is one of the most
>> difficult aspects of baseball to pin down.

>I agree with this, which is why I'm not trying to pin it down exactly,
>just show that its importance is quite limited.

Agreed.

>>>>>>>OK so far? We're better off looking for clutch hitters among those who
>>>>>>>actually hit in the clutch last year.
>

>> Like looking for the coins that are positively affected by the magnet
>> by looking only among those that flipped a higher than expected
>> percentage of heads in their last 100 or 200 flips?

>Yes, although my sample size is larger and more reliable because I used
>five years, 400 AB per player for LIPS and 750 for RISP. This won't be
>a perfect sample, but if there is clutch ability and you look at enough
>players (or if there are loaded coins and you look at enough coins), the
>sample will have more clutch ability (or magnetic loading) than average.

Yes, if you assume a normal distribution of clutch ability. Although
I think this is a reasonable assumption, it may be wrong.
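
Under that normality assumption, how much clutch ability a selected group really has can be sketched in Python as simple regression toward the mean. The 13-point ability and 24-point luck standard deviations are David Grabiner's figures from elsewhere in this thread; the shrinkage formula and the function name are assumptions for illustration, not his method.

```python
def estimated_ability(observed_points, ability_sd=13.0, luck_sd=24.0):
    """Expected true clutch edge, in points, given an observed edge, when
    ability and luck are independent and normally distributed."""
    shrink = ability_sd**2 / (ability_sd**2 + luck_sd**2)
    return observed_points * shrink


for observed in (25, 50, -50):
    print(f"observed {observed:+d} points -> "
          f"estimated ability {estimated_ability(observed):+.1f} points")
```

On those figures an observed 50-point clutch edge over five years corresponds to roughly 11 points of estimated ability, which is why a group picked on past performance should still beat the average, but by much less than its record suggests.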

>> I don't think anything is wrong with your definition. Nothing at all.
>> But in trying to investigate clutch, it also pays to look at RISP*LIPS.
>> After all, what if a player does better in RISP than overall, but
>> actually does worse in RISP*LIPS? Is he a clutch hitter?

>If you believe that there is an ability to bear down harder in the
>clutch, it seems unlikely that a player with this ability would bear
>down harder with RISP in the early innings and not in the late innings.

Right, so that was why I felt that RISP-LIPS was a better test of
clutch performance.

>> Looking at the population of coins that flip greater than their average
>> for 100 or 200 flips will lead to including a good number of "magnetic
>> chokers" or unaffected coins, and omitting a good number of "magnetic
>> hitters."

>I'm not claiming that the study is perfect; note that my own estimates
>suggest that only half of clutch ability is being found by the study.

>But the question is not whether the sample is perfect, but whether it is
>better than average. If the sample of "clutch" coins, or players, has a
>clutch ability which is above average, a large enough study will show
>that clutch ability exists.

I agree.

Paul Benjamin

Paul Benjamin

Feb 28, 1992, 3:25:04 PM
In article <202...@unix.cis.pitt.edu> dt...@unix.cis.pitt.edu (David M Tate) writes:
>In article <1992Feb24....@sjuphil.uucp> pben...@sjuphil.uucp (Paul Benjamin) writes:

>Roger Lustig writes:

>>>Paul, do you even *read* what you're responding to?

>>Yes, Roger. Perhaps it is better to refrain from transforming a
>>technical discussion into a personal one. Having read the recent
>>slew of postings between stat and anti-stat people, in which the
>>anti-stat people accused the stat people of personal attacks, I
>>am disappointed to see that you have provided an example of this.
>>Note also that David Tate did a similar thing recently, when he
>>questioned my partiality.

>Clarification:
>Bob Gaj posted a couple of times, to the effect that MLEs are basically as
>useful for predicting major league future performance as major league past
>performance is. Paul replied to these postings in a way which seemed to me
>to consistently interpret Bob's comments in ways which made them obviously
>false. I commented that I thought an "impartial observer" would not read
>Bob's comments that way. Paul got quite upset that I had questioned his
>"impartiality" (whatever that is in this context), thinking I had intended
>some sort of personal attack (which I hadn't). I tried to clear this up in
>a subsequent posting, but apparently it didn't take. (And I turned out to
>be right about what Bob had meant in his posting.)

Yes, you were right. But I had read Bob's remarks in a way that seemed
right to me. I think you could have just pointed out that Bob probably
meant something else, without mentioning my "partiality" (which is a
strange term here!). Actually, you could have just let Bob do that
himself. So I responded as I did. After your clarification, I consider
the issue past.

>>Is there a need for this sort of thing?

>You mean dredging up the embers of previous misunderstandings in unrelated
>threads? I certainly hope not. But it's a little late now...

At the time I did not consider it unrelated, but now I do. Sorry to
have brought it up again.

>>Sorry, but this still makes little sense to me. After all, pitchers
>>are people, too. Any clutch ability possessed by hitters will likely
>>be possessed by pitchers, too (and maybe even fielders), making the
>>overall measurement of clutch ability/performance difficult.

>This is an interesting point. For one thing, if there is a significant
>clutch pitching *and* clutch hitting effect, shouldn't it show up as a
>higher population variance in "key" situation performance? In addition to
>getting clutch hitters and chokers, you now get chokers who faced more
>clutch hitters than average, clutch hitters who faced Jim Acker a lot, and
>so on.

Yes, this makes sense. And might there be different distributions of
clutch in pitchers and hitters? E.g., it might be that pitchers who
choke more than a little bit tend to be sent down, whereas a good
overall hitter who isn't good in the clutch would survive. Different
distributions of clutch would lead to a very irregular distribution of
clutch measurement.

>>And there are a couple of questions here which are being confused:
>>
>>1) Does clutch ability exist?
>>
>>2) How much effect does performance in clutch situations have on
>> the outcome of games?

>I don't think these questions are being confused; most of the people working
>on this stuff (David, Roger, Sherri, etc.) are interested in how much weight
>to give "clutch ability" in doing player evaluations and projections. For
>that purpose, it doesn't matter whether you conclude that there are no clutch
>hitters, or that being a clutch hitter isn't really worth anything. The
>effect on your evaluation method is the same: "disregard clutchness". Now,
>we are certainly all interested in questions 1 and 2 above, but answering them
>both may not be necessary for answering the evaluation problem.

Yes, I agree. When I said confused, I meant in the discussion we were
having, not in everyone's heads. It is central to the point that I am
trying to make that these two questions aren't closely related at all.

>>Question #2 is independent of #1. How many games is a +.010 in clutch
>>on the team level worth? How about +.010 for an individual? Even if
>>there's no clutch hitting ability (meaning that there is no ability
>>separate from overall ability) how crucial is the random variation
>>in clutch performance to a team's offense?

>Great questions all. In fact, answering the last one (under a hypothesis of
>"no clutch hitting anywhere") would go a long way toward telling us how good
>models like RC and LW are, by seeing if the observed errors in those measures
>match the predicted error from random fluctuations in clutch batting at the
>team level.

It would seem to me that the last question would be answerable by
an analysis of team season run scoring and clutch performance. Now,
I don't have all that data online. Question: is it perhaps possible that
we could set up an ftp server for baseball stats? Or does such a thing
already exist?

No. I think you're essentially on the button. I agree completely that
the effect of clutch *performance* on winning games is small. I just
want to say, again, that clutch *ability* and performance are not
necessarily closely related. Performance may be a function of a number
of other factors. I don't know.

Paul Benjamin

Roger Lustig

Mar 1, 1992, 6:45:05 PM3/1/92
to

>>STOP RIGHT THERE.

My case is the one where there *isn't* an effect. What do you mean?

> then it is wrong to dismiss the existence of the effect
>based just on the change from the first set of trials to the second.

We've lost track of the meaning of "effect." That there is a difference
between some individuals and others in a certain set of trials is
(according to our design) an effect of either randomness or something
else. The situation I described above was one in which it was
impossible to dismiss randomness as the cause.

>This is the original point that I made, and I feel that you have just
>not addressed it. This is what distinguishes your approach from that
>of David Grabiner, who is using averages over several years, rather
>than changes from year to year.

>You are characterizing my replies as "marching right past" your
>points, yet I feel that you have failed to address the central and
>original point I made: year-to-year changes in clutch just aren't
>the way to get at clutch.

Not in the individual case. But your invocation of extreme cases simply
does not apply to the examination of *all* the cases, imho.

> A longer time average is needed. David
>Grabiner has done (and is doing) this, and is showing some overall
>effect, albeit not a large one.

Is it a significant one, esp. with Bonferroni adjustments to
significance levels?
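
For readers unfamiliar with the Bonferroni adjustment mentioned here: when many hypotheses are tested at once (say, one per hitter or per clutch definition), the per-test significance level is divided by the number of tests. A minimal sketch follows; the p-values are hypothetical illustrations, not results from any study in this thread.

```python
def bonferroni_threshold(alpha: float, num_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    return alpha / num_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Return one True/False flag per p-value, using the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


# Hypothetical p-values, invented purely for illustration.
example_p_values = [0.004, 0.03, 0.20, 0.012]
flags = significant_after_bonferroni(example_p_values)
print(flags)
```

With four tests and an overall level of 0.05, each test has to clear 0.0125, so a p-value of 0.03 that looks significant on its own no longer qualifies.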

> I felt that I was addressing the issues in your posts, and
>it is entirely unnecessary to transform the discussion into one about
>the people involved. Why do net discussions have to turn into personal
>arguments and then into flamewars? (except when they turn into mutual
>fawning.)

Well, perhaps the analogy and your prosecution of it got on my nerves.
I still don't get it...

>>What is the likelihood that a hitter will face similarly biased pitchers
>>year after year? That a *pool* of hitters will face similarly biased
>>pitchers year after year? Bloody unlikely, imho; and I'm really not
>>interested in tracking down every data-free surmise that is thrown in my
>>path.

>Exactly the point. It *is* unlikely, and that may be one major reason
>that a supposed clutch hitter one year might appear to be a clutch
>choker the next, etc., e.g., a hitter may face a disproportionate rate of
>righties one year in the clutch and the next year face far more lefties,
>or face more fastballers one year and more breaking-ball pitchers the next.
>The variation need not be large to produce differences of reasonable
>size. For example, consider a RH hitter who hits .240 against RHP, and
>.320 against LHP. Suppose the league has 60% RHP and 40% LHP. Now, one
>year he gets 70% RHP so he hits .264, and the next he gets 50% RHP
>(maybe traded to a team with a LH lineup) and hits .280. 16 points
>just from the lefty-righty difference. Then figure in other things,
>like mix of pitches. For some players, these may cancel, but for others,
>they will add.
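
The arithmetic in the quoted example can be checked with a short sketch. It assumes the hitter's overall average is simply the weighted blend of his two platoon averages; the function name is made up for illustration.

```python
def blended_average(avg_vs_rhp: float, avg_vs_lhp: float, share_rhp: float) -> float:
    """Overall batting average given the share of at-bats against right-handers."""
    return share_rhp * avg_vs_rhp + (1.0 - share_rhp) * avg_vs_lhp


# Figures taken from the quoted example: .240 vs. RHP, .320 vs. LHP.
year_one = blended_average(0.240, 0.320, 0.70)  # 70% RHP
year_two = blended_average(0.240, 0.320, 0.50)  # 50% RHP
swing_in_points = round((year_two - year_one) * 1000)

print(f"{year_one:.3f} {year_two:.3f} {swing_in_points}")
```

Under that assumption the two seasons come out at .264 and .280, matching the 16-point swing described.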

Of course! Now, how does that argue *against* my design? You've just
shown that much of the source of "clutch" variation has nothing to do
with actual clutch performance.

By taking many players and observing all of them from year to year, we
avoid the problems of observing a single player.

>You know, if there's one obvious thing about these characteristics of
>groups of ballplayers, it's that they don't hold constant from year to
>year.

They hold constant a lot better than the characteristics of individual
ballplayers...

> Things like the relative strengths of the leagues, the percentage
>of fastballers or lefties, etc. are bound to show a natural variation over
>time (some, like the relative strengths of leagues, vary slowly). If
>performance in so-called clutch situations is actually just a measurement
>of more complex interactions of things that have nothing to do with
>"getting up" for the big atbats, then measurements of this performance
>will vary in unpredictable ways. Then, as you are looking for "the ones
>that gained last time gain again this time, and the ones that lost last
>time lose", then you are extremely unlikely to find any consistent clutch
>hitters.

There *are* consistent home run hitters, consistent platoon hitters,
consistent fly-ball hitters, consistent opposite-field hitters. What's
so special about "clutch" that it can exist without showing up in the
numbers? All these other effects are easily tangible. For all these
other things, the ones who did better last time are likely to do better
this time.

> And that is exactly your result, yes? No surprise. Looking at the
>clutch data per year is just not convincing.

So, how will these natural variations affect my study? I don't get it.
If the league gets weaker, then the baseline batting average (or OPS, or
whatever) rises or falls. If there are more lefties, then *all* the
hitters in the study get affected by that. Again, I'm not sure you're
clear on the design of my study.

>Certainly a hitter for the team with the worst pitching staff faces
>on average better pitchers than a hitter for the team with the best
>pitching staff, especially if they are in the same division in the NL
>(in which case the difference in opponents' ERA can be .2 runs).

Can be, indeed. How often *is* it anything like this? How much noise
IS introduced by this? Looking only at the extreme case doesn't tell
you about the overall noise level.

(With LIPS it's probably worse, given the variability in bullpens...)

>>> ... definitions of clutch (even RISP, to some extent) might
>>>just be measuring things such as how well hitters hit fastballs, or
>>>how quickly a hitter adjusts to a new pitcher, or perhaps other factors
>>>we haven't thought of.

>>SO? Those are hypotheses about the SOURCE of an effect. We don't care
>>about the sources until we know that there IS an effect. And besides,
>>what does it matter? If there are more fastballs in RISP situations,
>>then clutch hitting *is* fastball hitting. And if it is fastball
>>hitting, then we will discover that fastball hitters show a clutch
>>effect.

>1) David has already shown the size of the overall effect. His numbers
> have been posted recently, but were also posted a while back. So there
> is an effect. The remaining question is not whether some hitters
> perform differently in RISP or LIPS, but why? Is there an ability?
> Is it merely effects like hitting fastballs?

a) I haven't seen David's results, but hope to. I'm unclear about his
methods, too.

b) That some hitters perform differently in RISP or LIPS is clear. I
only want to know: can they be expected to continue doing so? If not,
then asking about fastballs seems pointless.

c) fastball-hitting ability is a *cause*, not an effect.

>2) I agree that as far as the manager's decision is concerned, clutch
> hitting may well be just something like fastball hitting. But that
> is definitely *not* what the clutch people have in mind.

That's their problem.

>3) Do you (or anyone else) have data on who the fastball hitters are? Who
> adjusts faster to new pitchers, etc.?

No. Do I need it at this stage?

>>>>>>>>OK so far? We're better off looking for clutch hitters among those who
>>>>>>>>actually hit in the clutch last year.

>>>Like looking for the coins that are positively affected by the magnet
>>>by looking only among those that flipped a higher than expected
>>>percentage of heads in their last 100 or 200 flips?

>>STOP RIGHT THERE. *Nowhere* did I advocate looking ONLY at those coins.
>>I said we should ISOLATE those coins, and compare their performance to
>>that of the rest of the population when we repeat the experiment. In
>>the absence of a difference between groups the second time, we are led
>>to reject the idea that there are magnet-affected coins.

>When we *should* be led to reject just this subpopulation as consisting
>mostly of coins that are positively affected by the magnet.

Um, how's that again? Remember, the other populations didn't show any
positive effect in the first place. Why shouldn't we reject the
existence of a positive effect for them? Statistically speaking, the
interpretations are about the same: no evidence for the effect of the
magnet.

I suppose one should rephrase the whole thing: instead of rejecting the
magnet hypothesis, we state that we *don't* reject the no-magnet
hypothesis.
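
The isolate-and-retest design can be illustrated with a small simulation. This is only a sketch under stated assumptions: coins stand in for hitters, each coin has a fixed heads probability, and the function names and parameters are invented for this example rather than taken from anyone's actual study.

```python
import random


def flip_heads(n_flips, p_heads, rng):
    """Count heads in n_flips tosses of a coin with heads probability p_heads."""
    return sum(1 for _ in range(n_flips) if rng.random() < p_heads)


def mean(values):
    return sum(values) / len(values)


def split_and_retest(p_heads_per_coin, n_flips, rng):
    """Group coins by first-round result, then compare second-round heads rates.

    Returns (rate of the first-round high group, rate of everyone else),
    both measured only on the second round.
    """
    first = [flip_heads(n_flips, p, rng) for p in p_heads_per_coin]
    second = [flip_heads(n_flips, p, rng) for p in p_heads_per_coin]
    cutoff = sorted(first)[len(first) // 2]  # roughly the median first-round count
    high = [s for f, s in zip(first, second) if f > cutoff]
    rest = [s for f, s in zip(first, second) if f <= cutoff]
    if not high:
        raise ValueError("no coin beat the cutoff; use more coins or flips")
    return mean(high) / n_flips, mean(rest) / n_flips


rng = random.Random(0)
fair_coins = [0.5] * 2000                    # no magnet effect at all
mixed_coins = [0.55] * 1000 + [0.45] * 1000  # a real, persistent effect
print(split_and_retest(fair_coins, 100, rng))
print(split_and_retest(mixed_coins, 100, rng))
```

With identical coins the two groups should look alike on the retest, which is the "don't reject the no-magnet hypothesis" outcome; with a persistent difference built in, the first-round high group should stay ahead the second time.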

>>Look, Paul, if you don't like the *fact* that I'm doing the study,
>>fine. Go do a better study. But kindly don't a) complain about the
>>method based on all kinds of irrelevancies, and then b) tell me I
>>oughtn't to bother in the first place.

>Look, Roger, if you don't like the fact that I'm criticizing your
>study, fine. Hit the 'n' key. But kindly don't a) complain that I'm
>using the net to criticize your methods, because you have often been
>known to post strong criticisms of other people's posts, and then
>b) tell me that I oughtn't to bother posting in the first place.
>What do you think the net is? If you don't want criticism, then
>either don't post or don't respond.

Paul, that's not the point and I think you know it. I am open to all
kinds of critiques of my methods; but when I respond to a critique and
get the original critique, essentially unaltered, as a rebuttal to my
response, I have to wonder whether my point has even been received, let
alone processed.

>>>I think these things are doable, and if I didn't have
>>>to work my butt off here, I'd do some of them myself.

>>But there's plenty of time to complain about other people doing
>>methodologically sound studies you don't happen to approve of, is that
>>it? The vast majority of your criticism doesn't even address the study
>>I'm talking about.

>No, that's not it. I disagree that your methodology is sound.
>Clearly, you haven't grasped that, so there's no point in continuing.

There's no point in continuing, certainly, if you refuse to see my
proposals for what they are -- and no more. That they don't do a whole
lot of other things is entirely beside my point.

>I think you'll just come up with a measurement of perhaps many different
>effects interacting.

What effects are those? If you mean "fastball in RISP," or "different
pitchers in LIPS," well, those are clearly *part* of clutch hitting, as
they're integral parts of baseball as she is spoke. Whether a given
player hits better with men on base in a type of baseball where pitchers
*don't* go to their fastball in that situation is really pretty
uninteresting to me; I don't watch that kind of baseball.

The righty-lefty effect in LIPS is perhaps an exception, and I hope to
show that what we call "clutch" hitting is often in fact a result of
platooning, i.e., selection of clutch opportunities. Suffice it to say
that one doesn't get a choice of pitchers in the late innings, or a
choice of pitches in a RISP at-bat. But a manager *does* have a choice
of hitters...

>The measurement will be small, and then you'll
>make some pronouncement about clutch hitting ability, when that will
>not have been isolated at all.

When *what* will not have been isolated? We're still evidently at the
level of *defining* "clutch." If it's simply "hitting better in RISP,"
or "hitting better in situation X," then we *can* make the comparison
and see whether any hitters routinely hit better in situation X. We
agree that the platooning bias can skew the numbers for some definitions
of X; for the rest, however, you seem to be thinking of various *causes*
for the clutch effect, which don't concern me at this moment.

Oh, and the only pronouncement I intend to make is "I [see/do not see]
evidence of the existence of the clutch effect as defined by X, through
the examination of the dataset Y."

>>>whereas I think that any conclusions from a clutch study would likely
>>>lead to just another statement of, "Well, we didn't find it." to which
>>>the baseball people or pro-clutch people would say, "Well, we know it's
>>>there, so you didn't look in the right place."

>>So you're saying that baseball people are MORE small-minded on this
>>matter than on others. Don't you suppose that a sac bunt study would
>>meet with the same result -- "you didn't take into account the fact of
>>situation X"?

>I think that a well supported conclusion would catch someone's eye.
>Certainly there would be those who would ignore it, but all you'd need
>is one manager who used it.

Frankly, I think managers *do* act pretty clutch-independently. How
often do you see a star with a bad clutch rep getting benched for the
big game? Or taken out for a pinch-hitter, even? The platoon effect
explains the vast majority of substitutions (along with double-switch,
etc.).

Roger

MotrinMan

Mar 1, 1992, 9:32:26 PM3/1/92
to
(Roger Lustig) writes:
>(Paul Benjamin) writes:
>>(Roger Lustig) writes:
>>>(Paul Benjamin) writes:
>>>>(Roger Lustig) writes:
>>>>>(Paul Benjamin) writes:
>>>>>>(Roger Lustig) writes:
>>>>>>>(Paul Benjamin) writes:

C'mon, guys, maybe some of us OTHER net readers are tired of your unending
tirade!


Can you say *email*?

Sure, I knew you could.
--
ditt...@usc.edu (Matt Dittrich, aka MotrinMan)

\\\\\\\\\\\\\\\\FIGHT ON TROJANS///////////////

Greg Sarcasm Is A Way Of Life Spira

Mar 2, 1992, 1:43:54 AM3/2/92
to

> (Roger Lustig) writes:
> >(Paul Benjamin) writes:
> >>(Roger Lustig) writes:
> >>>(Paul Benjamin) writes:
> >>>>(Roger Lustig) writes:
> >>>>>(Paul Benjamin) writes:
> >>>>>>(Roger Lustig) writes:
> >>>>>>>(Paul Benjamin) writes:

>C'mon, guys, maybe some of us OTHER net readers are tired of your unending
>tirade!

And maybe some other net readers are actually interested in reading this
discussion, which doesn't resemble a tirade in the least.

Greg
--
sp...@panix.com "The one-O delivery to Fisk. He swings. Long drive,
cmcl2!panix!spira left field! If it stays fair, it's gone! Home Run!"
158-17 Riverside Dr. Ned Martin, 10/22/75
Whitestone NY 11357 (Insert your favorite baseball moment here)

Paul Benjamin

Mar 2, 1992, 1:05:36 PM3/2/92
to
>(Roger Lustig) writes:
>>(Paul Benjamin) writes:
>>>(Roger Lustig) writes:

>>>NOWHERE have I advocated a single set of trials. NOWHERE. When I
>>>advocated separating out observed clutchers from observed chokers from
>>>observed no-effects, it was in order to observe them A SECOND TIME. If
>>>the three groups evidence no difference the second time, then it is
>>>clear that there was no effect, and pooling individuals doesn't change
>>>that. True, one can't say that a particular individual displayed an
>>>effect, but one can't say that *ANYWAY* until one has demonstrated that
>>>there *IS* an effect.

>>Yes, Roger, I understand. My point is that it is not necessarily clear
>>at all that there is no effect in the case you describe. If you put
>>the second set of trials together with the first ones and there is still
>>an effect,

>My case is the one where there *isn't* an effect. What do you mean?

I read your previous posting as saying that if you take a group of
players who appeared to hit well/poorly one year, and compared their
performance for a second year, and found no difference for that second
year, then you would conclude that there is no clutch effect. I am
saying that if the total for the two years still shows the clutchers
and chokers, then that conclusion would be wrong.

>> A longer time average is needed. David
>>Grabiner has done (and is doing) this, and is showing some overall
>>effect, albeit not a large one.

>Is it a significant one, esp. with Bonferroni adjustments to
>significance levels?

I don't know. David?

----------------------------------------------------------


>>>What is the likelihood that a hitter will face similarly biased pitchers
>>>year after year? That a *pool* of hitters will face similarly biased
>>>pitchers year after year? Bloody unlikely, imho; and I'm really not
>>>interested in tracking down every data-free surmise that is thrown in my
>>>path.

>>Exactly the point. It *is* unlikely, and that may be one major reason
>>that a supposed clutch hitter one year might appear to be a clutch
>>choker the next, etc., e.g., a hitter may face a disproportionate rate of
>>righties one year in the clutch and the next year face far more lefties,
>>or face more fastballers one year and more breaking-ball pitchers the next.
>>The variation need not be large to produce differences of reasonable
>>size. For example, consider a RH hitter who hits .240 against RHP, and
>>.320 against LHP. Suppose the league has 60% RHP and 40% LHP. Now, one
>>year he gets 70% RHP so he hits .264, and the next he gets 50% RHP
>>(maybe traded to a team with a LH lineup) and hits .280. 16 points
>>just from the lefty-righty difference. Then figure in other things,
>>like mix of pitches. For some players, these may cancel, but for others,
>>they will add.

>Of course! Now, how does that argue *against* my design? You've just
>shown that much of the source of "clutch" variation has nothing to do
>with actual clutch performance.
>By taking many players and observing all of them from year to year, we
>avoid the problems of observing a single player.

Yes, but you do not avoid the problem of observing them each for only
a year at a time. Only by observing them over a longer period of time
can you try to ameliorate the effects of variations in things like
the pitchers faced. By observing them from year to year, you are
preserving these variations.

>> Things like the relative strengths of the leagues, the percentage
>>of fastballers or lefties, etc. are bound to show a natural variation over
>>time (some, like the relative strengths of leagues, vary slowly). If
>>performance in so-called clutch situations is actually just a measurement
>>of more complex interactions of things that have nothing to do with
>>"getting up" for the big atbats, then measurements of this performance
>>will vary in unpredictable ways. Then, as you are looking for "the ones
>>that gained last time gain again this time, and the ones that lost last
>>time lose", then you are extremely unlikely to find any consistent clutch
>>hitters.

>There *are* consistent home run hitters, consistent platoon hitters,
>consistent fly-ball hitters, consistent opposite-field hitters. What's
>so special about "clutch" that it can exist without showing up in the
>numbers? All these other effects are easily tangible. For all these
>other things, the ones who did better last time are likely to do better
>this time.

Clutch isn't special. Any effect will be difficult to measure if it
can be greatly influenced by another effect. For example, consider
home-run hitters, as you mention. Now, HR hitting can be greatly
affected by the park in which a player hits. If a player moves
from Atlanta to KC, he should expect a real effect on his HR totals,
exactly as things like opposing pitchers affect clutch. Now, few
hitters make such transitions between great and poor HR parks every
year, so yearly averages of HR are stable. But I don't know that
the things that can affect clutch are that stable, e.g., yearly
variation in LHP-RHP faced.

>> And that is exactly your result, yes? No surprise. Looking at the
>>clutch data per year is just not convincing.

>So, how will these natural variations affect my study? I don't get it.
>If the league gets weaker, then the baseline batting average (or OPS, or
>whatever) rises or falls. If there are more lefties, then *all* the
>hitters in the study get affected by that. Again, I'm not sure you're
>clear on the design of my study.

Well, if there are more lefties, then some of the hitters will look
better, some worse, and some will be unaffected. The effect will not
be uniform. But even if there is the same number of lefties, then
the natural variation in the % of lefties each hitter faces will cause
such effects from year to year. And when measuring only a subset of
atbats (like for clutch) the variance gets larger. For example, suppose
we measure the % of lefties faced in the first 10 atbats in June. Now,
if the league has 60% RHP and 40% LHP, then the mean will be 4 atbats.
The variance will be large, with a fair number of hitters facing as few as
1-2 or as many as 6-7 lefties. The change from year to year per hitter will
be large, with
little correlation from year to year. Now, in clutch, you are measuring
a subset of the season's atbats; the variance increase will be less than
in this example, because there are many more than 10 atbats, but I
still maintain that by measuring the year-to-year change in clutch
you will be largely observing this variance.
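
The ten-atbat example can be put in numbers with a simple binomial model. This is a sketch, and it assumes each atbat is an independent draw with a 40% chance of a lefty, which real pitcher usage will not match exactly; the 150-atbat figure is an arbitrary illustration of a larger clutch sample.

```python
from math import comb, sqrt


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k lefties in n atbats, each a lefty with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def sd_of_share(n: int, p: float) -> float:
    """Standard deviation of the observed share of lefties over n atbats."""
    return sqrt(p * (1 - p) / n)


n, p = 10, 0.40
mean = n * p                 # expected number of lefties
sd = sqrt(n * p * (1 - p))   # spread around that expectation
prob_extreme = binomial_pmf(0, n, p) + binomial_pmf(9, n, p) + binomial_pmf(10, n, p)

print(f"mean={mean:.1f} sd={sd:.2f} P(0, 9 or 10 lefties)={prob_extreme:.4f}")
print(f"sd of lefty share: 10 AB={sd_of_share(10, p):.3f}, 150 AB={sd_of_share(150, p):.3f}")
```

Under this model the spread is about one and a half atbats around the mean of four, while the true extremes (zero, nine or ten lefties) come up in under 1% of cases. The spread in the lefty share shrinks roughly with the square root of the number of atbats, so it is smaller, but not zero, for a season's worth of clutch atbats.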

----------------------------------------------------------


>>Certainly a hitter for the team with the worst pitching staff faces
>>on average better pitchers than a hitter for the team with the best
>>pitching staff, especially if they are in the same division in the NL
>>(in which case the difference in opponents' ERA can be .2 runs).

>Can be, indeed. How often *is* it anything like this?

Every year. Every year there is a worst ERA team and a best one, and
there are hitters for both those teams. Some years there is more than
one team at each extreme.

>How much noise
>IS introduced by this? Looking only at the extreme case doesn't tell
>you about the overall noise level.

I don't know. I am merely pointing out that this is a possible source
of noise, and that it should be eliminated for an experiment.

>>3) Do you (or anyone else) have data on who the fastball hitters are? Who
>> adjusts faster to new pitchers, etc.?

>No. Do I need it at this stage?

Well, I think it might help shed some light in things. For example, if
clutch performance correlates strongly with fastball hitting, then we
can confidently theorize that that's what clutch *is*, which is always
more satisfying than having to state what clutch isn't, or what might
be the extent of clutch.

----------------------------------------------------------


>>>I said we should ISOLATE those coins, and compare their performance to
>>>that of the rest of the population when we repeat the experiment. In
>>>the absence of a difference between groups the second time, we are led
>>>to reject the idea that there are magnet-affected coins.

>>When we *should* be led to reject just this subpopulation as consisting
>>mostly of coins that are positively affected by the magnet.

>Um, how's that again? Remember, the other populations didn't show any
>positive effect in the first place. Why shouldn't we reject the
>existence of a positive effect for them? Statistically speaking, the
>interpretations are about the same: no evidence for the effect of the
>magnet.

>I suppose one should rephrase the whole thing: instead of rejecting the
>magnet hypothesis, we state that we *don't* reject the no-magnet
>hypothesis.

Ah, now *that* I agree with. The second statement is much weaker than
the first.

----------------------------------------------------------


>> I felt that I was addressing the issues in your posts, and
>>it is entirely unnecessary to transform the discussion into one about
>>the people involved. Why do net discussions have to turn into personal
>>arguments and then into flamewars? (except when they turn into mutual
>>fawning.)

>Well, perhaps the analogy and your prosecution of it got on my nerves.
>I still don't get it...

OK. It was a less than perfect analogy, to be sure.

>Paul, that's not the point and I think you know it. I am open to all
>kinds of critiques of my methods; but when I respond to a critique and
>get the original critique, essentially unaltered, as a rebuttal to my
>response, I have to wonder whether my point has even been received, let
>alone processed.

The reason I have repeated the same criticisms is that I still perceive
the same flaw in your approach, no matter how you state it: clutch
performance may be a function of a number of variables, including a
clutch "ability", so that measuring the year-to-year variation in
clutch performance might simply be measuring the year-to-year variation
in a number of other variables.

----------------------------------------------------------


>>I think you'll just come up with a measurement of perhaps many different
>>effects interacting.

>What effects are those? If you mean "fastball in RISP," or "different
>pitchers in LIPS," well, those are clearly *part* of clutch hitting, as
>they're integral parts of baseball as she is spoke. Whether a given
>player hits better with men on base in a type of baseball where pitchers
>*don't* go to their fastball in that situation is really pretty
>uninteresting to me; I don't watch that kind of baseball.

You turn off the set when a curveball pitcher is in that situation?
Bert Blyleven will be insulted! :)

>The righty-lefty effect in LIPS is perhaps an exception, and I hope to
>show that what we call "clutch" hitting is often in fact a result of
>platooning, i.e., selection of clutch opportunities. Suffice it to say
>that one doesn't get a choice of pitchers in the late innings, or a
>choice of pitches in a RISP at-bat. But a manager *does* have a choice
>of hitters...

So that clutch hitting may be just situational hitting. Could be.

Paul Benjamin

David M Tate

Mar 3, 1992, 11:22:40 AM3/3/92
to
In article <1992Mar2.1...@sjuphil.uucp> pben...@sjuphil.uucp (Paul Benjamin) writes:
> [Roger Lustig writes:]

>>The righty-lefty effect in LIPS is perhaps an exception, and I hope to
>>show that what we call "clutch" hitting is often in fact a result of
>>platooning, i.e., selection of clutch opportunities. Suffice it to say
>>that one doesn't get a choice of pitchers in the late innings, or a
>>choice of pitches in a RISP at-bat. But a manager *does* have a choice
>>of hitters...

>So that clutch hitting may be just situational hitting. Could be.

Inconclusive anecdote:

In glancing through my GABSB, I noticed that noted clutch maniac Pat Tabler
has a career BA/OBP/SLG platoon split that is all but identical to his split
in scoring position vs. bases empty situations. I mean *really* identical;
I'll post the numbers when I get the chance.

--
David M. Tate |"A tendency to drastically underestimate
dt...@unix.cis.pitt.edu | the frequency of coincidences is a prime
Less musteliform than Gary Huckabay | characteristic of innumerates."
Less cephalopoid than Lance Smith. | --John Allen Paulos, _Innumeracy_
