Value of a draft pick

igor eduardo küpfer

Oct 9, 2004, 6:12:40 PM

Okay, my long awaited post. To tell the truth, I've kinda forgotten why I did
this study in the first place, so if I wander off the path, give me a
little nudge, won't you?

I've decided to break up my study into two parts: this post is an in-depth
explanation of the method I've used to come up with a model to predict
player performance based on draft position; a post to follow will apply
that model to the drafts over the years, to see if we can spot variations
and deviations from the model. This post is going to be math-intensive and
very, very dull -- feel free to skip it. The next post should have a little
more meat.

I've decided to present my numbers in terms of Net Wins Above Replacement
instead of using more traditional boxscore stats. Even though nWAR is
slightly esoteric, it can be translated back into the real world simply:
One nWAR = one win added to the team by that player, over and above what
his replacement would've done given the same amount of touches. Got that?
One nWAR = One Win. In this post, I may use nWAR and Win interchangeably.

To give you a feel for what a nWAR is worth, I present here the top 10 for
2003-04, in conjunction with the usual per game numbers. (GR is "Games
Responsible" -- an estimate of how many team possessions over the season
were used up by that player.)

    Player           Year    GP    GR   PPG   APG   RPG  nWAR
 1  garnett,kevin    2004    82  16.3  24.2   5.0  13.9  13.4
 2  duncan,tim       2004    69  12.7  22.3   3.1  12.4  10.3
 3  stojakovic,peja  2004    81  13.9  24.2   2.1   6.3   9.9
 4  cassell,sam      2004    81  13.8  19.8   7.3   3.3   9.2
 5  kirilenko,andri  2004    78  12.6  16.5   3.1   8.1   9.1
 6  ming,yao         2004    82  12.2  17.5   1.5   9.0   8.8
 7  nowitzki,dirk    2004    77  12.8  21.8   2.7   8.7   8.7
 8  billups,chaunce  2004    78  12.3  16.9   5.7   3.5   8.7
 9  jefferson,richa  2004    82  13.6  18.5   3.8   5.7   8.6
10  o'neal,jermaine  2004    78  13.9  20.1   2.1  10.0   8.0

Yes, I too was surprised to see Jeff.

Here are the top 10 nWAR of all time (actually, since 1977-78, when the
numbers became available for the first time):

    Player           Year    GP    GR   PPG   APG   RPG  nWAR
 1  jordan,michael   1988    82  17.7  35.0   5.9   5.5  14.4
 2  robinson,david   1994    80  17.1  29.8   4.8  10.7  14.0
 3  jordan,michael   1989    81  17.3  32.5   8.0   8.0  13.7
 4  jordan,michael   1996    82  16.3  30.4   4.3   6.6  13.6
 5  garnett,kevin    2004    82  16.3  24.2   5.0  13.9  13.4
 6  o'neal,shaq      2000    79  16.3  29.7   3.8  13.6  13.3
 7  jordan,michael   1990    82  17.1  33.6   6.3   6.9  13.2
 8  jordan,michael   1991    82  15.9  31.5   5.5   6.0  13.2
 9  jordan,michael   1987    82  18.7  37.1   4.6   5.2  13.2
10  jordan,michael   1997    82  16.2  29.6   4.3   5.9  12.9

(One thing I'll take from this stat is just how good MJ really was.
Incredible, isn't it?)

So just keep in mind: 1 nWAR over a full season isn't very good -- we're
talking Samaki Walker. 3 nWAR is pretty decent: Joe Barry Carroll, say. 5
nWAR is good second banana territory: Derrick Coleman, say, or Otis
Birdsong. 7 nWAR is getting into the really good category: Brad Daugherty,
Larry Johnson, Alonzo Mourning. 10 is team-leader-having-a-career-year
territory: Ray Allen 2001, Mookie 1997, Terrell Brandon 1996, Kidd 2003. 12
is the type of season only HOFers have: Moses, DRob, Shaq, MJ, Dirk, Karl,
Duncan, Hill, Bird, Barkley. (Yes, Dirk will be a HOFer.) MJ leads with ten
12-nWAR seasons, Karl next with 7. Kareem, who played with better teammates for
many of his years, had to share the ball too much to rack up many 10+ nWAR
seasons -- which highlights one aspect of nWAR as a performance metric: it
only measures production, not ability. Keep that in mind.

Got that out of the way. So how much is a draft pick worth? I ran into
problems right away -- what the hell does that question mean? Worth to whom?
To the team that has the pick, obviously, but for how long? A team can't be
expected to hold on to a drafted player forever. What does San Antonio
get out of picking Tim Duncan anyway? They get his services (8-10
nWAR/year), but for how long?

Say a team gets a draft pick's services for 3 years -- the three years
following the year of the pick. Here are the average wins per year a draft
pick contributes to his team.


Level       N   Mean  StDev  ---------+---------+---------+-------
1          25  4.607  2.775  (---*--)
2          25  3.217  1.886  (--*---)
3          25  4.388  2.738  (--*---)
4          25  2.366  1.592  (---*--)
5          25  3.366  2.381  (--*---)
6          25  2.023  1.801  (--*---)
7          25  2.030  1.523  (---*--)
8          25  2.096  2.076  (---*---)
9          25  2.456  1.985  (--*---)
10         25  2.255  1.880  (---*---)
MID       225  1.483  1.499  (*)
LATE      250  0.896  1.337  (*)
2ndRound  556  0.322  0.758  (*)
                             ---------+---------+---------+-------
Pooled StDev = 1.356                 1.5       3.0       4.5


What does all that mean? "Level" is the Draft Pick position: 1-10; "MID" is
a midlevel first rounder, 11-20; "LATE" is a late first rounder, 21-29; and
then there are the second round picks. "N" is the number of picks in my
sample. "Mean" is the average nWAR per year by each pick. "StDev" is
standard deviation, a measure of variation. The parentheses enclosing the
asterisk, like this -> (----*----), mark an interval around each mean (the
asterisk): the wider the parentheses are set apart, the less precisely that
average is known. The intervals are built from the pooled standard
deviation, so they mostly narrow as N grows, which is why the big MID, LATE
and 2ndRound groups show up as a tight (*).
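In standard one-way ANOVA output, the "Pooled StDev" line at the bottom of each table is the square root of the group variances, averaged with (N - 1) weights. Here is a minimal Python sketch that reproduces the 1.356 shown, using the rounded N and StDev values from the 3-year table. The function name is illustrative, not from the post.

```python
import math

# (N, StDev) for each Level of the 3-year table: picks 1-10 (25 each),
# then MID, LATE and 2ndRound.
GROUPS = [
    (25, 2.775), (25, 1.886), (25, 2.738), (25, 1.592), (25, 2.381),
    (25, 1.801), (25, 1.523), (25, 2.076), (25, 1.985), (25, 1.880),
    (225, 1.499), (250, 1.337), (556, 0.758),
]


def pooled_stdev(groups):
    """Square root of the group variances averaged with (n - 1) weights."""
    weighted_ss = sum((n - 1) * s ** 2 for n, s in groups)
    degrees_of_freedom = sum(n - 1 for n, _ in groups)
    return math.sqrt(weighted_ss / degrees_of_freedom)


print(round(pooled_stdev(GROUPS), 3))  # 1.356
```

The 2nd-round group is both the largest and the least variable. That is why the pooled figure sits well below the StDev of the top picks.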

Clearly, then, the top five draft picks produce wins roughly in accordance
with their draft positioning. Draft picks 6-10 are indistinguishable from
each other. The other picks decrease in value, fading to an average of about
a third of a win for second round picks.

That's for the first 3 years. What can we expect over the first 5 years
following the draft?

Level       N   Mean  StDev  ---------+---------+---------+-------
1          23  5.217  2.672  (---*--)
2          23  3.637  1.983  (---*---)
3          23  4.693  2.682  (--*---)
4          23  3.088  1.591  (--*---)
5          23  3.751  2.502  (--*---)
6          23  2.132  1.872  (--*---)
7          23  2.529  1.716  (---*---)
8          23  2.409  2.112  (---*---)
9          23  3.044  2.221  (---*---)
10         23  2.476  2.062  (--*---)
MID       207  1.701  1.754  (-*)
LATE      230  0.991  1.467  (*)
2ndRound  500  0.379  0.866  *)
                             ---------+---------+---------+-------
Pooled StDev = 1.483                 1.6       3.2       4.8

The same pattern. How about 7 seasons?

Level       N   Mean  StDev  ---------+---------+---------+-------
1          21  5.556  2.646  (---*--)
2          21  3.640  1.950  (--*---)
3          21  4.699  2.786  (--*---)
4          21  3.262  1.831  (--*---)
5          21  3.757  2.632  (---*--)
6          21  2.108  2.126  (---*--)
7          21  2.667  1.878  (--*---)
8          21  2.326  2.113  (---*--)
9          21  2.669  1.957  (--*---)
10         21  2.240  1.859  (--*---)
MID       189  1.798  1.939  (*)
LATE      210  0.990  1.496  (*)
2ndRound  442  0.389  0.892  (*)
                             ---------+---------+---------+-------
Pooled StDev = 1.552                 2.0       4.0       6.0

These all display the same pattern. What I will do is use the average nWAR
production over the first five seasons following the draft as the measure
under study.

It's clear then that the average draft pick produces wins roughly in
accordance with his draft position. This isn't a huge surprise, but what we
want to know is the amount of variation about the mean -- the amount of
certainty we can have that a draft pick will produce the expected number of
wins. What we need is a model of win production which takes draft position
into account, along with any other factors that may affect his performance.
I'll use a typical multiple regression model for this.

The multiple regression equation takes the form
y = b1*x1 + b2*x2 + ... + bn*xn + c. The b's are the regression
coefficients, each representing the amount the dependent variable y changes
when its independent variable x changes by 1 unit, holding the others
fixed. The c is the constant, where the regression
line intercepts the y axis, representing the amount the dependent y will be
when all the independent variables are 0.

In English: the model I've used will predict a player's nWAR by using the
following equation: nWAR = b1*x1 + b2*x2 + ... + bn*xn + c, where the x's
are factors used to predict player performance (draft position, season,
height, etc.) and the b's are coefficients used to weight the x-factors
properly (because one inch of height has less effect than one draft
position). c is the constant: the baseline the prediction starts from
before any of the x-factors are added in.
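To make the arithmetic concrete, here is a minimal Python sketch of the equation above: a constant plus each coefficient times its factor. The factor names and numbers in the example are hypothetical placeholders, not values from the study.

```python
def predict(constant, coefficients, factors):
    """Multiple-regression prediction: c + b1*x1 + b2*x2 + ... + bn*xn.

    `coefficients` and `factors` are dicts keyed by the same factor names.
    """
    return constant + sum(b * factors[name] for name, b in coefficients.items())


# Hypothetical two-factor model: 1.0 + 0.5*4 + (-0.1)*10 = 2.0
print(predict(1.0, {"x1": 0.5, "x2": -0.1}, {"x1": 4, "x2": 10}))  # 2.0
```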

These are the factors I used to try to predict player performance:

Year - that is, the season the player was drafted
AGE - player age at draft
Ht - height
OvAlPk - overall pick in the draft
Teams - # of teams in the league
AllPs - total # of picks taken in the draft
HHI5 - a measure of team equality, averaged over the 5 years
HHI5_1 - team equality from the season before
W_L5 - team win/loss record
W_L5_1 - team win/loss record from the season before

Additionally, I've included squared and cubed versions of each of these
variables in the regression, denoted by "^2" and "^3" respectively -- eg
AGE^2 = the player's age, squared. The reason for this is that some
variables' effects are non-linear (for example, the effect of rest on team
performance is non-linear: 1 day's rest is twice as good as 0 days, but 2
days is four times as good as 1). Squared and cubed terms sometimes
capture the non-linearity.
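Here is a sketch of how the squared and cubed columns might be generated. It assumes each factor is simply raised to the 2nd and 3rd power; the post doesn't say whether values were centered first. The naming follows the post's "^2"/"^3" convention, and the helper name and the AGE value are illustrative only.

```python
def with_powers(factors, max_power=3):
    """Expand each factor x into x, x^2, ..., x^max_power (AGE, AGE^2, AGE^3).

    The model stays linear in its coefficients; the curvature comes
    entirely from these extra columns.
    """
    expanded = {}
    for name, x in factors.items():
        expanded[name] = x
        for p in range(2, max_power + 1):
            expanded[f"{name}^{p}"] = x ** p
    return expanded


print(with_powers({"AGE": 21}))  # {'AGE': 21, 'AGE^2': 441, 'AGE^3': 9261}
```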

Okay, that's done. The regression equation (1st5 is the player's average
nWAR over his first five seasons following the draft) is

1st5 = - 421363 + 644*Year - 0.33*Year^2 +0.000056*Year^3
+ 0.070*AGE - 0.00455*AGE^2 +0.000040*AGE^3
- 5.61*Ht + 0.0732*Ht^2 -0.000318*Ht^3
- 0.290*OvAlPk + 0.00719*OvAlPk^2 -0.000060*OvAlPk^3
+ 8.6*Teams - 0.299*Teams^2 + 0.0034*Teams^3
+ 0.0708*AllPs -0.000485*AllPs^2 +0.000001*AllPs^3
- 139*HHI5 + 570*HHI5^2 - 732*HHI5^3
- 23.6*HHI5_1 + 64.3*HHI5_1^2 - 52*HHI5_1^3
+ 22.1*W_L5 - 66.6*W_L5^2 + 70.5*W_L5^3
- 15.6*W_L5_1 + 43.2*W_L5_1^2 - 41.6*W_L5_1^3

The next step is to remove the variables that are not statistically
significant. The following table shows the significance of each term:

Predictor        Coef     SE Coef       T      P
Constant      -421363     3258105   -0.13  0.897
Year              644        4917    0.13  0.896
Year^2         -0.328       2.473   -0.13  0.895
Year^3      0.0000556   0.0004147    0.13  0.893
AGE            0.0703      0.1599    0.44  0.660
AGE^2       -0.004551    0.004870   -0.93  0.350
AGE^3      0.00003986  0.00003607    1.10  0.269
Ht             -5.609       5.471   -1.03  0.306
Ht^2          0.07317     0.06984    1.05  0.295
Ht^3       -0.0003185   0.0002969   -1.07  0.284
OvAlPk       -0.28996     0.02819  -10.29  0.000
OvAlPk^2     0.007192    0.001164    6.18  0.000
OvAlPk^3  -0.00005979  0.00001380   -4.33  0.000
Teams            8.63       19.89    0.43  0.664
Teams^2       -0.2992      0.7769   -0.39  0.700
Teams^3       0.00342     0.01007    0.34  0.734
AllPs         0.07076     0.06993    1.01  0.312
AllPs^2    -0.0004846   0.0005099   -0.95  0.342
AllPs^3    0.00000101  0.00000112    0.90  0.366
HHI5          -139.17       44.58   -3.12  0.002
HHI5^2          569.5       184.0    3.10  0.002
HHI5^3         -732.1       240.2   -3.05  0.002
HHI5_1         -23.62       25.07   -0.94  0.346
HHI5_1^2        64.25       91.30    0.70  0.482
HHI5_1^3        -51.9       100.5   -0.52  0.606
W_L5            22.07       36.03    0.61  0.540
W_L5^2         -66.62       75.52   -0.88  0.378
W_L5^3          70.53       51.85    1.36  0.174
W_L5_1         -15.62       34.40   -0.45  0.650
W_L5_1^2        43.20       73.26    0.59  0.556
W_L5_1^3       -41.59       51.30   -0.81  0.418

S = 1.423   R-Sq = 47.5%   R-Sq(adj) = 46.0%   <-- that reflects a pretty good fit!

The column labeled "P" shows the statistical significance. We are looking
for variables which have low p-values, below 0.05. Once I remove the
variables that aren't significant, we end up with this:
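A single pass of the p < 0.05 rule over the full-model table keeps only the six terms in the sketch below. The reduced model that follows keeps a different set: it adds Ht, W_L5 and W_L5_1, and it drops OvAlPk^3 and HHI5^3. So the author presumably refit and re-tested after each removal rather than filtering once. That is an inference; the post doesn't say so. One-at-a-time removal is the usual practice because closely related terms such as Year, Year^2 and Year^3 inflate each other's standard errors. Here is a minimal Python sketch of the single first pass; the function name is illustrative.

```python
# P values copied from the full-model table above.
P_VALUES = {
    "Constant": 0.897, "Year": 0.896, "Year^2": 0.895, "Year^3": 0.893,
    "AGE": 0.660, "AGE^2": 0.350, "AGE^3": 0.269,
    "Ht": 0.306, "Ht^2": 0.295, "Ht^3": 0.284,
    "OvAlPk": 0.000, "OvAlPk^2": 0.000, "OvAlPk^3": 0.000,
    "Teams": 0.664, "Teams^2": 0.700, "Teams^3": 0.734,
    "AllPs": 0.312, "AllPs^2": 0.342, "AllPs^3": 0.366,
    "HHI5": 0.002, "HHI5^2": 0.002, "HHI5^3": 0.002,
    "HHI5_1": 0.346, "HHI5_1^2": 0.482, "HHI5_1^3": 0.606,
    "W_L5": 0.540, "W_L5^2": 0.378, "W_L5^3": 0.174,
    "W_L5_1": 0.650, "W_L5_1^2": 0.556, "W_L5_1^3": 0.418,
}


def significant(p_values, alpha=0.05):
    """Predictors (excluding the constant) whose p-value is below alpha."""
    return [name for name, p in p_values.items()
            if name != "Constant" and p < alpha]


print(significant(P_VALUES))
# ['OvAlPk', 'OvAlPk^2', 'OvAlPk^3', 'HHI5', 'HHI5^2', 'HHI5^3']
```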

1st5 = 2.69
+ 0.00668*Ht
- 0.189*OvAlPk + 0.00228*OvAlPk^2
- 15.5*HHI5 + 32.7*HHI5^2
+ 8.12*W_L5
- 2.45*W_L5_1


Predictor        Coef     SE Coef       T      P
Constant       2.6878      0.8788    3.06  0.002
Ht           0.006684    0.001792    3.73  0.000
OvAlPk       -0.18878     0.01091  -17.31  0.000
OvAlPk^2    0.0022751   0.0001927   11.80  0.000
HHI5          -15.514       6.740   -2.30  0.022
HHI5^2          32.69       14.04    2.33  0.020
W_L5            8.116       1.098    7.39  0.000
W_L5_1         -2.449       1.131   -2.16  0.031
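For readers who want to plug numbers in, here is a minimal Python sketch of the reduced model using the unrounded coefficients from the table above. The post doesn't give units for Ht, HHI5 or the W_L5 variables (inches or centimetres, fraction or percentage), so any inputs you supply are your own assumption. The function names are illustrative.

```python
# Coefficients from the reduced-model table above.
COEF = {
    "Constant": 2.6878,
    "Ht": 0.006684,
    "OvAlPk": -0.18878,
    "OvAlPk^2": 0.0022751,
    "HHI5": -15.514,
    "HHI5^2": 32.69,
    "W_L5": 8.116,
    "W_L5_1": -2.449,
}


def predict_1st5(ht, pick, hhi5, w_l5, w_l5_1, c=COEF):
    """Predicted average nWAR over a player's first five seasons."""
    return (c["Constant"]
            + c["Ht"] * ht
            + c["OvAlPk"] * pick + c["OvAlPk^2"] * pick ** 2
            + c["HHI5"] * hhi5 + c["HHI5^2"] * hhi5 ** 2
            + c["W_L5"] * w_l5
            + c["W_L5_1"] * w_l5_1)


def pick_term(pick, c=COEF):
    """The draft-position part of the prediction on its own."""
    return c["OvAlPk"] * pick + c["OvAlPk^2"] * pick ** 2


# The quadratic in OvAlPk bottoms out at -b1 / (2 * b2).
vertex = -COEF["OvAlPk"] / (2 * COEF["OvAlPk^2"])
print(round(vertex, 1))  # 41.5
```

One consequence of the quadratic in OvAlPk: the fitted curve bottoms out near pick 41 and turns back up after that. So the pick term alone rates #56 slightly above #51 (about -3.44 versus -3.71 nWAR). That may partly explain why Nate Erdmann is predicted ahead of DeJuan Wheat in the 1997 example below, though the team variables also play a part.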

So how does that work? Take a look at the picks from the 97 draft, for
example: I'll show picks at intervals of 5, beginning with the #1 pick.

                          actual  predicted
Name                Pick    nWAR       nWAR  Error
Tim Duncan             1    11.2        5.5   -5.7
Ron Mercer             6     0.8        2.5    1.7
Tariq Abdul-Wahad     11     0.2        2.1    1.9
Brevin Knight         16     2.9        1.1   -1.8
Anthony Parker        21     0.1        1.4    1.3
Charles C. Smith      26     0.0        0.3    0.3
Charles O'Bannon      31     0.1        0.6    0.5
James Collins         36     0.0        0.0   -0.1
Jason Lawson          41     0.0        0.4    0.4
Eric Washington       46     0.0       -0.3   -0.3
DeJuan Wheat          51     0.0        0.2    0.2
Nate Erdmann          56     0.0        0.8    0.8

Except for Duncan, our regression equation does a pretty good job of
predicting the number of wins these players will contribute. (The Error
column is predicted minus actual nWAR, so a negative Error means the player
beat his prediction.) In fact, if we look at all the picks from every
season, we'll see that about two-thirds of the predictions are off by less
than 2 nWAR. We'll call this (2 nWAR) the Error term of the equation -- the
amount of uncertainty inherent in the equation.
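Here is a sketch of the "two-thirds of predictions" idea applied to just the twelve 1997 picks above. It reads the Error term as the absolute miss that covers about two-thirds of predictions. The post doesn't spell out exactly how the 2 nWAR figure was computed, and that figure uses every pick from every season, so these twelve values are only an illustration. The function names are illustrative.

```python
# Error column (predicted minus actual nWAR) for the 1997 picks above.
ERRORS_1997 = [-5.7, 1.7, 1.9, -1.8, 1.3, 0.3, 0.5, -0.1, 0.4, -0.3, 0.2, 0.8]


def share_within(errors, limit):
    """Fraction of predictions that miss by less than `limit` nWAR."""
    return sum(abs(e) < limit for e in errors) / len(errors)


def error_band(errors, coverage=2 / 3):
    """Smallest absolute miss that covers at least `coverage` of predictions."""
    misses = sorted(abs(e) for e in errors)
    for i, miss in enumerate(misses, start=1):
        if i / len(misses) >= coverage - 1e-12:
            return miss
    return misses[-1]


print(share_within(ERRORS_1997, 2.0))  # 11 of 12 picks, about 0.917
print(error_band(ERRORS_1997))         # 1.3
```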

In my next post I will apply this model to drafts over the years to see if
there are consistent deviations from the model.

--

-------------------------------------
| best, | Sticking it to |
| ed | The Man since 1971 |
-------------------------------------

Big Chris

Oct 10, 2004, 2:03:40 AM

"igor eduardo küpfer" <edku...@example.com> wrote in message
news:aungm0la7ufvscemn...@4ax.com...

So the million dollar question is: if you took the Clippers out of this
equation....how would it change the numbers? Really. They have wasted a
LOT of decent picks.

Big Chris


Ron Coscorrosa

Oct 10, 2004, 1:49:13 PM

igor eduardo küpfer wrote:

<snip>

Excellent. Really.

Just curious: how did you calculate this? I presume some software package,
it seems to me that with all the variables, even with software, that would
take forever to compute. That's really really cool though, I can see how
that would come in useful in the general "how much does X correlate with Y"
type questions.

<snip>

> The column labeled "P" shows the statistical significance. We are looking
> for variables which have low p-values, below 0.05. Once I remove the
> variables that aren't significant, we end up with this:
>
> 1st5 = 2.69
> + 0.00668*Ht
> - 0.189*OvAlPk + 0.00228*OvAlPk^2
> - 15.5*HHI5 + 32.7*HHI5^2
> + 8.12*W_L5
> - 2.45*W_L5_1

Interesting how age isn't in there.

That's really pretty damn good. But, you need to apply the model for a
longer period of time to see how good it really is. It would be
interesting to see how relevant the increase in high school players drafted
really is (my guess: not that much) - just look at the average NWAR over
the first X years prior to, say, the draft with KG and the draft with KG
and after. It'd also be interesting to see, in general, if high school
players end up being better than college players on average (for whatever
definition of "average").

> In my next post I will apply this model to drafts over the years to see if
> there are consistent deviations from the model.

Well, there you go.

Also, unrelated, but I remember you mentioning something (how's that for
vague?) which analyzed game logs to pull out interesting stats - what was
that? I was thinking of making something like that in my "free time." I
was also thinking of yanking the box score stats from every game and making
them freely available in an RSS feed so other people could parse them easily
(or, more accurately, so I could mess with them later in the year).

Anyway, that's really interesting stuff. You should have one of those "blog
thingys" man.

--
Ron Coscorrosa
http://coscorrosa.com

igor eduardo küpfer

Oct 10, 2004, 4:23:51 PM

One thing. I adjusted my regression equation from the previous post,
changing the constant to 2.0. This had the effect of lowering the ERROR
term to 0.87.

On Sun, 10 Oct 2004 01:03:40 -0500, "Big Chris" <mr...@yahoo.com> wrote in
<2ss1ltF...@uni-berlin.de>:

The Clippers are as good a place to start as any. Between 1977 and 1999
(the years used in my sample), they have had 27 1st round picks. They break
down like this (each "X" represents one pick):

Pick |#of picks
-----|----------
1-3 |XXXXXX
4-6 |XXXX
7-9 |XXXXXX
10-12|
13-15|XXXXX
16-18|X
19-21|X
22-24|XX
25-27|XX
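The tally above can be reproduced from the Pick column of the table that follows. Here is a small Python sketch; the 3-pick band width matches the chart, and the function names are illustrative.

```python
from collections import Counter

# Overall pick numbers of the 27 Clipper first-rounders, in table order.
CLIPPER_PICKS = [9, 8, 2, 4, 8, 14, 3, 4, 13, 19, 1, 6, 2, 8,
                 13, 22, 16, 25, 13, 7, 25, 2, 7, 14, 1, 22, 4]


def bucket(pick, width=3):
    """Label of the band a pick falls in, e.g. 7 -> '7-9'."""
    low = (pick - 1) // width * width + 1
    return f"{low}-{low + width - 1}"


def histogram_lines(picks, width=3):
    """One text row per band, with an 'X' for each pick in it."""
    counts = Counter(bucket(p, width) for p in picks)
    rows = []
    for low in range(1, max(picks) + 1, width):
        label = f"{low}-{low + width - 1}"
        rows.append(f"{label:<5}|{'X' * counts[label]}")
    return rows


print("\n".join(histogram_lines(CLIPPER_PICKS)))
```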

How good have the Clippers picks been? Have they underachieved or surpassed
expectations? Take a look at the following table, containing every 1st
round Clipper pick:

                                             Actual  Expected
TEAM  Year  Round  Pick  PLAYER                nWAR      nWAR   DIFF  SIGNIFICANCE

SDC   1980      1     9  Mike Brooks           +2.2      +3.1   -0.9  -
SDC   1981      1     8  Tom Chambers          +3.3      +1.8   +1.5  ++
SDC   1982      1     2  Terry Cummings        +6.9      +4.3   +2.6  +++
SDC   1983      1     4  Byron Scott           +5.3      +4.8   +0.5  +
LAC   1984      1     8  Lancaster Gordon      -0.2      +1.7   -1.9  --
LAC   1984      1    14  Michael Cage          +4.1      +1.0   +3.1  ++++
LAC   1985      1     3  Benoit Benjamin       +2.4      +2.4   -0.1
LAC   1987      1     4  Reggie Williams       +1.2      +2.3   -1.1  -
LAC   1987      1    13  Joe Wolf              -0.2      +0.8   -1.1  -
LAC   1987      1    19  Ken Norman            +2.0      +0.7   +1.3  +
LAC   1988      1     1  Danny Manning         +4.5      +3.7   +0.7  +
LAC   1988      1     6  Hersey Hawkins        +5.8      +3.0   +2.8  +++
LAC   1989      1     2  Danny Ferry           +1.0      +4.1   -3.1  ----
LAC   1990      1     8  Bo Kimble             -0.0      +3.2   -3.2  ----
LAC   1990      1    13  Loy Vaught            +3.2      +1.6   +1.7  ++
LAC   1991      1    22  LeRon Ellis           +0.3      +1.5   -1.2  -
LAC   1992      1    16  Randy Woods           +0.0      +1.0   -1.0  -
LAC   1992      1    25  Elmore Spencer        -0.1      +0.6   -0.6  -
LAC   1993      1    13  Terry Dehere          +0.9      +1.1   -0.2
LAC   1994      1     7  Lamond Murray         +0.8      +1.6   -0.8  -
LAC   1994      1    25  Greg Minor            +1.2      +0.1   +1.1  +
LAC   1995      1     2  Antonio McDyess       +4.0      +3.6   +0.4
LAC   1996      1     7  Lorenzen Wright       +2.1      +1.6   +0.5  +
LAC   1997      1    14  Maurice Taylor        +0.4      +0.9   -0.5  -
LAC   1998      1     1  Michael Olowokandi    -0.4      +3.0   -3.3  ----
LAC   1998      1    22  Brian Skinner         +1.2      +0.3   +0.9  +
LAC   1999      1     4  Lamar Odom            +2.8      +3.0   -0.3


The DIFF column is actual wins minus expected wins -- a positive result
denotes a player who exceeded expectations. The SIGNIFICANCE column shows
how many ERRORS away from expectations that player's performance was. For
example, Mike Brooks averaged 2.2 wins over his first 5 seasons. Based on
his draft position and other factors, he was expected to average 3.1 wins,
for a DIFFerence of -0.9. That is about one ERROR (0.87) below
expectations, shown here as "-". Benoit Benjamin, by contrast, missed his
expected 2.4 wins by only 0.1, well under one ERROR, so his difference is
not significant and the column is left blank. Tom Chambers averaged 3.3
wins, but was expected to average only 1.8. He exceeded expectations by
+1.5 wins, which is roughly two ERRORS over (1.5 / 0.87 is about 1.7),
shown here as "++".
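Checking the tables, each SIGNIFICANCE mark appears to be DIFF divided by the ERROR (0.87), rounded to the nearest whole number. That rule is inferred from the tables rather than stated in the post. A few borderline rows (Danny Vranes's -1.3 shown as "--", for instance) only fit if the unrounded DIFF was used. Here is a minimal Python sketch; the function name is illustrative.

```python
ERROR = 0.87  # the revised ERROR term, in nWAR


def significance(diff, error=ERROR):
    """One '+' or '-' per ERROR of deviation, rounded to the nearest whole ERROR."""
    n = round(diff / error)
    return ("+" if n > 0 else "-") * abs(n)


for name, diff in [("Tom Chambers", 1.5), ("Benoit Benjamin", -0.1),
                   ("Michael Cage", 3.1), ("Danny Ferry", -3.1)]:
    print(f"{name:<16}{diff:+.1f}  {significance(diff)}")
```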

Now we can look at the Clippers' picks in terms of disappointments and
pleasant surprises. Those players with "-" in the SIGNIFICANCE column are
the disappointments and those with "+" are the pleasant surprises. Those
with nothing in that column performed essentially to expectations. The
following graph shows how many ERRORS the Clippers' picks deviated from
expectations.

-4 |XXX
-3 |
-2 |X
-1 |XXXXXXXX
0 |XXXX
+1 |XXXXXX
+2 |XX
+3 |XX
+4 |X

Four picks (15%) met expectations, and eleven more (41%) exceeded
their expected win totals. Twelve picks (44%) were disappointments. That
seems to me like a pretty average draft record.
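The percentages come straight from the histogram. Picks at 0 met expectations, picks above 0 exceeded them, and picks below 0 fell short. Here is a short Python sketch with the Clippers' counts; the dictionary layout and function name are illustrative.

```python
# ERROR-count histogram for the Clippers' picks (from the chart above).
CLIPPERS = {-4: 3, -3: 0, -2: 1, -1: 8, 0: 4, 1: 6, 2: 2, 3: 2, 4: 1}


def outcome_shares(hist):
    """Split a deviation histogram into met / exceeded / fell-short shares."""
    total = sum(hist.values())
    met = hist.get(0, 0)
    exceeded = sum(n for k, n in hist.items() if k > 0)
    short = sum(n for k, n in hist.items() if k < 0)
    return met / total, exceeded / total, short / total


met, exceeded, short = outcome_shares(CLIPPERS)
print(f"met {met:.0%}, exceeded {exceeded:.0%}, disappointed {short:.0%}")
# met 15%, exceeded 41%, disappointed 44%
```

Met-or-exceeded is the first two shares added together: 15 of 27 picks, about 56%.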

Let's compare that to the Sonics' draft record:

                                             Actual  Expected
TEAM  Year  Round  Pick  PLAYER                nWAR      nWAR   DIFF  SIGNIFICANCE

SEA   1977      1     8  Jack Sikma            +7.1      +3.4   +3.7  ++++
SEA   1979      1     6  James Bailey          +1.3      +2.6   -1.4  --
SEA   1979      1     7  Vinnie Johnson        +3.1      +3.2   -0.1
SEA   1980      1    20  Bill Hanzlik          +1.6      +1.5   +0.0
SEA   1981      1     5  Danny Vranes          +1.9      +3.2   -1.3  --
SEA   1983      1    16  Jon Sundvold          +0.8      +1.1   -0.3
SEA   1985      1     4  Xavier McDaniel       +4.0      +3.6   +0.5  +
SEA   1987      1     5  Scottie Pippen        +5.5      +4.5   +1.0  +
SEA   1987      1     9  Derrick McKey         +4.5      +3.0   +1.5  ++
SEA   1988      1    15  Gary Grant            +0.6      +1.5   -0.9  -
SEA   1989      1    16  Dana Barros           +2.4      +1.8   +0.6  +
SEA   1989      1    17  Shawn Kemp            +5.8      +2.4   +3.4  ++++
SEA   1990      1     2  Gary Payton           +5.7      +4.8   +0.9  +
SEA   1991      1    14  Rich King             +0.0      +3.0   -3.0  ---
SEA   1992      1    17  Doug Christie         +1.2      +1.4   -0.1
SEA   1993      1    23  Ervin Johnson         +2.7      +1.8   +1.0  +
SEA   1994      1    11  Carlos Rogers         +0.9      +1.7   -0.9  -
SEA   1995      1    26  Sherell Ford          +0.1      +1.2   -1.2  -
SEA   1997      1    23  Bobby Jackson         +1.3      +1.3   +0.1
SEA   1998      1    27  Vladimir Stepania     +0.6      +0.0   +0.6  +
SEA   1999      1    13  Corey Maggette        +3.2      +1.6   +1.7  ++

Fifteen of Seattle's picks (56%) met or exceeded expectations, about the
same as the Clippers.

-4 |
-3 |X
-2 |XX
-1 |XXX
+0 |XXXXX
+1 |XXXXXX
+2 |XX
+3 |
+4 |XX

My next post will explore the deviations from expectations over time, which
I believe was the original topic under discussion.

igor eduardo küpfer

Oct 10, 2004, 5:22:38 PM

On Sun, 10 Oct 2004 16:23:51 -0400, igor eduardo küpfer
<edku...@example.com> wrote in
<na3jm01bhj5ch504k...@4ax.com>:

>Fifteen of Seattle's picks (56%) met or exceeded expectations, about the
>same as the Clippers.

That should be 71%, way better than the Clippers' 56%.

Big Chris

Oct 10, 2004, 5:56:44 PM

igor eduardo küpfer wrote:
> On Sun, 10 Oct 2004 16:23:51 -0400, igor eduardo küpfer
> <edku...@example.com> wrote in
> <na3jm01bhj5ch504k...@4ax.com>:
>
>> Fifteen of Seattle's picks (56%) met or exceeded expectations, about
>> the same as the Clippers.
>
> That should be 71%, way better than the Clippers' 56%.

I see this type of application as interesting, in that it is a measurable
way to verify if your management team is doing a good job over time. One
could plot the Jerry West Lakers years to see if he had a big impact, or if
it was only partly him, and other parts xxx.

Big Chris


rave

Oct 10, 2004, 6:15:57 PM

"Big Chris" <mr...@yahoo.com> wrote in message
news:2stpguF...@uni-berlin.de...

But how many GMs does the Sonics sample cover? And I think 3 owners. To
isolate the draft in terms of management, it would need to be broken down by
GM and owner. You could actually add in a variable for coach and see which
coaches have affected the Sonics' drafts.



igor eduardo küpfer

Oct 11, 2004, 1:02:24 PM

In <aungm0la7ufvscemn...@4ax.com> I described my method for
calculating the expected number of wins (nWAR) a player will contribute
over the first five seasons following his being drafted. From this, we can
see which players may be judged to be "busts" -- those who produce fewer
than their expected number of wins. We can also find "steals," players who
exceed their expected win total.

Before we do that, I want to show how well 1st round players met
expectations over the years. If we subtract Expected nWAR from Actual nWAR,
and divide the difference by 0.87, we get a measure of how much that player
under- or over-achieved in terms of nWAR ERRORs (the ERROR term was
described in my previous post). The graph below shows the standard
deviation of nWAR ERRORs for each season between 1977 and 1999. A standard
deviation is a statistical measure of variability -- the more variation in
a sample, the higher the standard deviation. If my nWAR Expectation measure
could perfectly predict win production, the standard deviations would be
zero. However, if players' production was utterly uncorrelated with draft
position, and draft day was ultimately a crap shoot, the standard deviation
would be as large as the spread in player production itself. Of course,
reality is somewhere between the two extremes.
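Here is a sketch of the measure described above: each player's deviation expressed in ERRORs, then the standard deviation across one draft class. The post doesn't say whether it used the sample or population formula, so `statistics.stdev` (sample) is an assumption. The two-player example uses the Clippers' 1988 picks from the earlier table, not a full draft class, and the function names are illustrative.

```python
import statistics

ERROR = 0.87  # nWAR


def in_errors(actual, expected, error=ERROR):
    """How many ERRORs a player finished above (+) or below (-) expectation."""
    return (actual - expected) / error


def class_spread(players, error=ERROR):
    """Sample standard deviation, in ERROR units, of one class's deviations.

    `players` is a list of (actual_nwar, expected_nwar) pairs (at least two).
    """
    return statistics.stdev([in_errors(a, e, error) for a, e in players])


# Danny Manning and Hersey Hawkins (Clippers, 1988).
print(round(class_spread([(4.5, 3.7), (5.8, 3.0)]), 2))  # 1.63
```

A class whose players all hit their expectations exactly would score 0.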


[Chart: standard deviation of each draft class's deviations from expectations, in nWAR ERRORs, 1977-1999. Vertical axis: StDev (0.0 to 3.0); horizontal axis: draft year.]


No real pattern emerges from this measure. The highest amount of deviation
from expectations came from the draft class of '84, the second highest in
'85 and '99. The draft classes that came closest to meeting expectations
were '88, '92, and '96.

But this isn't the only way of looking at this. The plot above looked at
all deviations from the expectations of my model, the better-than-expected
and the worse. But imagine if we were only interested in avoiding wasting a
draft pick. We'd want to know what percentage of draft picks underachieve
their expectations, how many become busts.

Let us define "bust" as a 1st round pick who averages 2 fewer wins over his
first 5 seasons than our regression model predicts for a player of his
draft position. Here, then, is the percentage of picks per season who
became busts:
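Here is a minimal sketch of the bust rate per draft class. It reads "2 fewer wins" as actual minus expected nWAR of -2.0 or lower; whether the boundary is inclusive is an assumption. The sample rows are the Clippers' 1988-1990 first-rounders from the earlier table, a small slice of those classes rather than the data behind the plot. The function name is illustrative.

```python
from collections import defaultdict

BUST_MARGIN = 2.0  # "2 fewer wins" than predicted, in nWAR


def bust_rate_by_year(picks, margin=BUST_MARGIN):
    """Share of each draft class's picks who fell `margin` or more nWAR short.

    `picks` is a list of (year, actual_nwar, expected_nwar) tuples.
    """
    totals = defaultdict(int)
    busts = defaultdict(int)
    for year, actual, expected in picks:
        totals[year] += 1
        if actual - expected <= -margin:
            busts[year] += 1
    return {year: busts[year] / totals[year] for year in sorted(totals)}


sample = [(1988, 4.5, 3.7), (1988, 5.8, 3.0),
          (1989, 1.0, 4.1),
          (1990, -0.0, 3.2), (1990, 3.2, 1.6)]
print(bust_rate_by_year(sample))  # {1988: 0.0, 1989: 1.0, 1990: 0.5}
```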

[Chart: percentage of 1st round picks per draft class who became busts, 1977-1999. Vertical axis: bust % (0% to 30%); horizontal axis: draft year.]

Although the data are pretty noisy, one can see a definite trend: the
percentage of 1st round picks who become busts has decreased slightly over
time, although the '99 draft class (the latest one in my sample) has gone
back to the high pre-'90s levels. Overall, about 20% of all picks in the
70s and 80s became busts. That number dropped to 13% in the 90s.

Now imagine that we are only interested in "steals," in picks that vastly
exceed their expectations. Someone in this position may take the "lottery"
picture of the draft fairly literally, and see that most picks never amount
to much. This person would wonder how many times the winning ticket has
come up.


[Chart: percentage of 1st round picks per draft class who became steals, 1977-1999. Vertical axis: steal % (0% to above 30%); horizontal axis: draft year.]

These data are even noisier than the bust data, but a similar trend, I
think, is apparent: getting a steal in the draft was much likelier in the
past than it has become -- even if '99 harkened back to pre-1985 levels. In
the 70s, 27% of all picks became steals. That number dropped to 19% in the
80s, and dropped further to 17% in the 90s.

More to come. In a future post I will look at variations within the first
round picks, and also include some analysis of second-round picks and
non-drafted players.

Jeremey Wilson

Oct 11, 2004, 1:17:01 PM

"igor eduardo küpfer" <edku...@example.com> wrote in message
news:ldelm01svnuv7i09d...@4ax.com...

>
> Although the data are pretty noisy, one can see a definite trend: the
> percentage of 1st round picks who become busts has decreased slightly over
> time, although the '99 draft class (the latest one in my sample) has gone
> back to the high pre-90's levels. Overall, about 20% of all picks in the
> 70s and 80s became busts. That number dropped to 13% in the 90s.
>
> Now imagine that we are only interested in "steals," in picks that vastly
> exceed their expectations. Someone in this position may take the "lottery"
> picture of the draft fairly literally, and see that most picks never amount
> to much. This person would wonder how many times the winning ticket has
> come up.

[...]

> These data are even noisier than the bust data, but a similar trend, I
> think, is apparent: getting a steal in the draft was much likelier in the
> past than it has become -- even if '99 harkened back to pre-1985 levels. In
> the 70s, 27% of all picks became steals. That number dropped to 19% in the
> 80s, and dropped further to 17% in the 90s.

Sorry that I have to rely on you all my math for me -- I was learning about
postmodern literary theory when people with futures were taking stat courses --
but how does bustiness in a draft correlate to boominess? Eyeballing the
graphs, it looks like some, which would make sense (if the talent pool remains
relatively constant, a bad player getting drafted earlier means that a good
player will be available to be drafted later), but that's just eyeballing, and
it's eyeballing with a preconceived notion, to boot.

I can clarify, if that didn't make sense.

--
Jeremey


Jeremey Wilson

Oct 11, 2004, 1:25:16 PM

"Jeremey Wilson" <noaddre...@yahoo.com> wrote in message
news:hezad.5645$5b1....@newssvr17.news.prodigy.com...

> Sorry that I have to rely on you all my math for me --

> I can clarify, if that didn't make sense.

Rely on you to do all my math for me. Goddammit.

--
Jeremey


igor eduardo küpfer

Oct 11, 2004, 1:38:40 PM

On Mon, 11 Oct 2004 17:17:01 GMT, "Jeremey Wilson"
<noaddre...@yahoo.com> wrote in
<hezad.5645$5b1....@newssvr17.news.prodigy.com>:

The correlation is weak, and statistically insignificant (the latter due to
small sample sizes, likely). Below I plot the Bust% (on the left axis)
against the Steal% (on the bottom axis). A perfect positive correlation
would be displayed as a straight diagonal line from the bottom left to the
top right.
Zero correlation would look something like a ball of marks centered in the
plot. You can see that the actual data shows little relationship between
Bust% and Steal%.
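The per-season Bust% and Steal% values aren't listed in the post, so they can't be recomputed here. A sketch can still show how strong a correlation would have to be to clear the usual 5% bar with this many points. The sketch assumes one point per draft class from 1977 to 1999 (n = 23). On that assumption, the standard t test for Pearson's r needs |t| above about 2.08, the two-sided 5% critical value for 21 degrees of freedom. That works out to |r| of roughly 0.41. The function names are illustrative.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson's r between two equal-length lists (needs Python 3.10+)."""
    return statistics.correlation(x, y)


def t_statistic(r, n):
    """t statistic for testing whether a correlation r from n points differs from 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def critical_r(t_crit, n):
    """Smallest |r| whose t statistic reaches t_crit with n points."""
    return t_crit / math.sqrt(n - 2 + t_crit ** 2)


# n = 23 draft classes (1977-1999); 2.08 is the two-sided 5% t value for 21 df.
print(round(critical_r(2.08, 23), 2))  # 0.41
```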

[Scatter plot: Bust% (vertical axis, 0.00 to about 0.20) against Steal% (horizontal axis, 0.00 to 0.30).]

However, I think there is a relationship, but the data is much too noisy to
register it. I attempted to correct for this by lumping steals and busts
together, labeling them all "deviations from expectations," and showing the
results in the Standard Deviation graph upthread. What I was trying to
capture is the "chanciness" of the draft, and how it has slightly declined
over time.

Big Chris

Oct 11, 2004, 2:04:48 PM

igor eduardo küpfer wrote lots of good math stuff that I clipped:

Have you examined this on a position by position basis at all? Now that I
think of it, that has a double (and doubly interesting) meaning: by
position in the draft (does #4 more consistently perform at or above
standard than #3, for instance?) as well as by floor position (do SGs
regularly exceed expectations where PGs underperform?). I assume this could
be extrapolated anyhow, though perhaps it is not of interest to anyone
else. I was just thinking how the "best available player" drafting theory
might be affirmed or cut down through this, if, say, SGs were shown to
consistently outperform all other positions, and on the draft board you
were picking between an SG and a PF with all other things being equal (not
that they ever are).

Well, if nothing else, I'm glad you're smart enough to pull this all
together and make a cohesive presentation of it all.

Big Chris

igor eduardo küpfer

Oct 11, 2004, 2:13:31 PM

On Mon, 11 Oct 2004 13:04:48 -0500, "Big Chris" <mr...@yahoo.com> wrote in
<2t00a6F...@uni-berlin.de>:

Thanks Chris. I'll be looking at your question about draft position in the
next couple of days. My guess is that there is a draft position effect, ie
that some draft positions deviate more from expectations than others.

The other question WRT floor position is a little more difficult, but worth
studying. I attempted to include it in a half-ass way in my original
regression equation: I included a "height" variable, which turned out to be
statistically significant in predicting production. Height is, of course,
strongly correlated to position.

<talking to myself>
See if you can remove the height variable, and re-run the regression. Check
the error against the original error. If not substantially different, group
positions and check for trends. A bust/steal plot would probably be a good
place to start.

Big Chris

Oct 11, 2004, 11:28:24 PM

----- Original Message -----
From: "igor eduardo küpfer" <edku...@example.com>
> <talking to myself>
> See if you can remove the height variable, and re-run the regression. Check
> the error against the original error. If not substantially different, group
> positions and check for trends. A bust/steal plot would probably be a good
> place to start.


I'm looking forward to your findings. Certainly more interesting than
pre-season games.

Big Chris


Chris Hafner

Oct 12, 2004, 6:11:36 PM
"igor eduardo küpfer" <edku...@example.com> wrote in message
news:iv9jm0trk4r15qpte...@4ax.com...

> On Sun, 10 Oct 2004 16:23:51 -0400, igor eduardo küpfer
> <edku...@example.com> wrote in
> <na3jm01bhj5ch504k...@4ax.com>:
>
> >Fifteen of Seattle's picks (56%) met or exceeded expectations, about the
> >same as the Clippers.
>
> That should be 71%, way better than the Clippers' 56%.

Damn right!

Cheers,
Chris Hafner


Chris Hafner

Oct 12, 2004, 6:13:10 PM
"igor eduardo küpfer" <edku...@example.com> wrote in message
news:aungm0la7ufvscemn...@4ax.com...

> Okay, my long awaited post. To tell the truth, I've kinda forgot why I did
> this study in the first place, so if I wander off the path, give me a
> little nudge, won't you?

<snip>

Both this and your follow-ups are absolutely brilliant, Ed. This is the kind
of thing only you can add to a newsgroup.

Great work. Whatever you do for a living, I'm convinced you need to stop
that immediately and find a way to make your aptitude for statistics work
for you (if it doesn't already).

Cheers,
Chris Hafner


Chris Hafner

Oct 12, 2004, 6:10:20 PM
"Michael" <mich...@twentyten.org> wrote in message
news:10mo7i4...@corp.supernews.com...

> "igor eduardo küpfer" <edku...@example.com> wrote in message
>
> [snip]

>
> > Although the data are pretty noisy, one can see a definite trend: the
> > percentage of 1st round picks who become busts has decreased slightly
> > over time, although the '99 draft class (the latest one in my sample)
> > has gone back to the high pre-90's levels. Overall, about 20% of all
> > picks in the 70s and 80s became busts. That number dropped to 13% in
> > the 90s.
>
> Less busts -- check.

Keep in mind that the draft classes Ed has studied end at 1999. The major
influx of high school players that was at issue in our conversation happened
after that, which means that his findings don't necessarily address our
conversation completely, especially since my point was that more busts
*higher* in the first round push better players lower, not that there are
more busts in the whole first round (which is what Ed's data show).

> [snip]


>
> > These data are even noisier than the bust data, but a similar trend, I
> > think, is apparent: getting a steal in the draft was much likelier in
> > the past than it has become -- even if '99 harkened back to pre-1985
> > levels. In the 70s, 27% of all picks became steals. That number dropped
> > to 19% in the 80s, and dropped further to 17% in the 90s.
>

> More steals -- check.

He's saying that there are fewer steals, right?

"... getting a steal in the draft was much likelier in the past than it has
become ..."

And since we both agreed there were more steals now (though we disagreed on
the reasons), I guess we're both wrong here.

:-(

We both have an out here again, though, because again the years most
fiercely under debate are the ones with the highest percentage of
high-school players, which are the ones not included in the study (for legit
reasons).

If we assume that either one of us is right, perhaps the effect was weak
enough up to 1999 (because the draft hadn't changed as dramatically yet)
that increased scouting sophistication took some of the uncertainty out?

> See Chris, Igor agrees with me! As does his calculator!

Ed's calculator is non-sentient. I'm hoping so, anyway.

Cheers,
Chris Hafner


Chris Hafner

Oct 13, 2004, 2:25:38 PM
"igor eduardo küpfer" <edku...@example.com> wrote in message
news:5trqm0poffqfmdf9l...@4ax.com...
> On Wed, 13 Oct 2004 10:50:27 -0700, "Chris Hafner" <haf...@peoplepc.com>
> wrote in <416d...@news.usenetzone.com>:
>
> >It's amazing how orderly the downward progression of win shares
> >is as you descend draft order, especially after the big gaps in the first
> >five picks.
>
> Not so amazing: I smoothed the data out to produce the orderly
> progression. The reality, like all facets of life, is messier:
>
> Smoothed Actual
> #1 20.1 20.3
> #2 16.1 13.6
> #3 13.8 17.7
> #4 12.1 10.1
> #5 10.8 12.0
> #6 9.8 6.9
> #7 8.9 7.7
> #8 8.1 7.8
> #9 7.5 10.1
> #10 6.9 7.9
> #11 6.3 7.4
> #12 5.8 6.7
> #13 5.3 7.5
> #14 4.9 5.6
> #15 4.5 2.7
> #16 4.2 3.8
> #17 3.8 1.4
> #18 3.5 7.0
> #19 3.2 2.0
> #20 2.9 3.1
> NonTop20 2.6 0.9
> Undrafted 2.3 1.2

Hey! That's more like what I would've expected to see.

What's the reasoning behind the smoothing process?

Cheers,
Chris Hafner



igor eduardo küpfer

Oct 13, 2004, 5:04:36 PM
On Wed, 13 Oct 2004 11:25:38 -0700, "Chris Hafner" <haf...@peoplepc.com>
wrote in <416d...@news.usenetzone.com>:

>"igor eduardo küpfer" <edku...@example.com> wrote in message
>news:5trqm0poffqfmdf9l...@4ax.com...
>> On Wed, 13 Oct 2004 10:50:27 -0700, "Chris Hafner" <haf...@peoplepc.com>
>> wrote in <416d...@news.usenetzone.com>:
>>
>> >It's amazing how orderly the downward progression of win shares
>> >is as you descend draft order, especially after the big gaps in the first
>> >five picks.
>>
>> Not so amazing: I smoothed the data out to produce the orderly
>> progression. The reality, like all facets of life, is messier:
>>
>> Smoothed Actual
>> #1 20.1 20.3

...


>> #19 3.2 2.0
>> #20 2.9 3.1
>> NonTop20 2.6 0.9
>> Undrafted 2.3 1.2
>
>Hey! That's more like what I would've expected to see.
>
>What's the reasoning behind the smoothing process?

The smoothing was done using Excel's Trendline chart function. Essentially,
it's a linear-log regression, used when the drop-off starts quickly and
then fades to almost zero, like we see above.

If you're asking about mathematical justification, well, I have none. You
aren't supposed to use this sort of regression on ordinal data. There are
linear-log regression models for non-interval/ratio data, but I don't know
exactly how to use them. I was hoping that my linear-log model was robust
enough to handle the non-standard data, and that since we weren't doing
open heart surgery, any mistakes wouldn't make all that much difference.
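For the curious, here is a rough Python/NumPy sketch of what a linear-log trendline does: a least-squares fit of y = a + b*ln(pick). It fits the 22 table values quoted above, which may not be exactly what the Excel trendline was fit to. It also assumes "NonTop20" and "Undrafted" are coded as slots 21 and 22, which is precisely the kind of ordinal shortcut described above.

```python
import numpy as np

# Smoothed and actual 1st-five-season nWAR by draft slot, copied from the table above.
smoothed = [20.1, 16.1, 13.8, 12.1, 10.8, 9.8, 8.9, 8.1, 7.5, 6.9, 6.3,
            5.8, 5.3, 4.9, 4.5, 4.2, 3.8, 3.5, 3.2, 2.9, 2.6, 2.3]
actual = [20.3, 13.6, 17.7, 10.1, 12.0, 6.9, 7.7, 7.8, 10.1, 7.9, 7.4,
          6.7, 7.5, 5.6, 2.7, 3.8, 1.4, 7.0, 2.0, 3.1, 0.9, 1.2]
# Assumption: NonTop20 -> slot 21, Undrafted -> slot 22 (ordinal codes, not real distances).
slot = np.arange(1, 23, dtype=float)


def fit_log_trend(x, y):
    """Least-squares fit of y = a + b*ln(x); returns (a, b, r_squared)."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), np.log(x)])
    (a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - (a + b * np.log(x))
    r_squared = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return float(a), float(b), float(r_squared)


a_s, b_s, r2_s = fit_log_trend(slot, smoothed)
a_a, b_a, r2_a = fit_log_trend(slot, actual)
print(f"smoothed: a={a_s:.2f} b={b_s:.2f} R^2={r2_s:.4f}")
print(f"actual:   a={a_a:.2f} b={b_a:.2f} R^2={r2_a:.4f}")
```

If the smoothed column really is a log curve, its R^2 should come out very close to 1 (the only misfit being rounding to one decimal), while the actual column scatters much more widely around a similar downward curve.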

Jeremey Wilson

Oct 13, 2004, 12:54:19 PM

"igor eduardo küpfer" <edku...@example.com> wrote in message
news:h6mqm0d2lcj5q7nrh...@4ax.com...

> | 1st5 2nd5 | season season season season season
> DraftPick|seasons seasons | 1 2 3 4 5
> -------------------------------------------------------------------
> #1 | 20.1 16.5 | 3.4 3.9 4.5 4.2 4.0
> #2 | 16.1 13.2 | 2.7 3.1 3.7 3.4 3.2
> #3 | 13.8 11.3 | 2.3 2.6 3.1 3.0 2.8
> #4 | 12.1 9.9 | 1.9 2.3 2.8 2.6 2.5
> #5 | 10.8 8.9 | 1.7 2.0 2.5 2.4 2.2
> | |
> #6 | 9.8 8.0 | 1.5 1.8 2.3 2.2 2.1
> #7 | 8.9 7.3 | 1.4 1.6 2.1 2.0 1.9
> #8 | 8.1 6.6 | 1.2 1.4 1.9 1.8 1.7
> #9 | 7.5 6.1 | 1.1 1.3 1.7 1.7 1.6
> #10 | 6.9 5.6 | 1.0 1.2 1.6 1.6 1.5
> | |
> #11 | 6.3 5.1 | 0.9 1.1 1.5 1.5 1.4
> #12 | 5.8 4.7 | 0.8 1.0 1.4 1.4 1.3
> #13 | 5.3 4.3 | 0.7 0.9 1.3 1.3 1.2
> #14 | 4.9 3.9 | 0.6 0.8 1.2 1.2 1.1
> #15 | 4.5 3.6 | 0.6 0.7 1.1 1.1 1.1
> | |
> #16 | 4.2 3.3 | 0.5 0.6 1.0 1.0 1.0
> #17 | 3.8 3.0 | 0.4 0.5 0.9 1.0 0.9
> #18 | 3.5 2.7 | 0.4 0.5 0.9 0.9 0.9
> #19 | 3.2 2.5 | 0.3 0.4 0.8 0.8 0.8
> #20 | 2.9 2.2 | 0.2 0.3 0.7 0.8 0.8
> | |
> NonTop20 | 2.6 2.0 | 0.2 0.3 0.7 0.7 0.7
> pick | |
>
> Undrafted| 2.3 1.8 | 0.1 0.2 0.6 0.7 0.7

Any idea on why the second 5 seasons are worse than the first 5, and production
seems to peak in year 3? Is that busts leaving the league? Just not what I was
expecting.
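A quick arithmetic check of the quoted table, written as a small Python sketch. It assumes that the "1st5" column is the sum of the five single-season columns; that is an inference from the numbers, not something stated in the thread. It also reports the first season holding each slot's highest value.

```python
# (slot, 1st5 total, [seasons 1..5]) copied from the quoted table.
rows = [
    ("#1", 20.1, [3.4, 3.9, 4.5, 4.2, 4.0]),
    ("#2", 16.1, [2.7, 3.1, 3.7, 3.4, 3.2]),
    ("#3", 13.8, [2.3, 2.6, 3.1, 3.0, 2.8]),
    ("#4", 12.1, [1.9, 2.3, 2.8, 2.6, 2.5]),
    ("#5", 10.8, [1.7, 2.0, 2.5, 2.4, 2.2]),
    ("#6", 9.8, [1.5, 1.8, 2.3, 2.2, 2.1]),
    ("#7", 8.9, [1.4, 1.6, 2.1, 2.0, 1.9]),
    ("#8", 8.1, [1.2, 1.4, 1.9, 1.8, 1.7]),
    ("#9", 7.5, [1.1, 1.3, 1.7, 1.7, 1.6]),
    ("#10", 6.9, [1.0, 1.2, 1.6, 1.6, 1.5]),
    ("#11", 6.3, [0.9, 1.1, 1.5, 1.5, 1.4]),
    ("#12", 5.8, [0.8, 1.0, 1.4, 1.4, 1.3]),
    ("#13", 5.3, [0.7, 0.9, 1.3, 1.3, 1.2]),
    ("#14", 4.9, [0.6, 0.8, 1.2, 1.2, 1.1]),
    ("#15", 4.5, [0.6, 0.7, 1.1, 1.1, 1.1]),
    ("#16", 4.2, [0.5, 0.6, 1.0, 1.0, 1.0]),
    ("#17", 3.8, [0.4, 0.5, 0.9, 1.0, 0.9]),
    ("#18", 3.5, [0.4, 0.5, 0.9, 0.9, 0.9]),
    ("#19", 3.2, [0.3, 0.4, 0.8, 0.8, 0.8]),
    ("#20", 2.9, [0.2, 0.3, 0.7, 0.8, 0.8]),
    ("NonTop20", 2.6, [0.2, 0.3, 0.7, 0.7, 0.7]),
    ("Undrafted", 2.3, [0.1, 0.2, 0.6, 0.7, 0.7]),
]

max_gap = 0.0
for slot, first5, seasons in rows:
    total = sum(seasons)
    max_gap = max(max_gap, abs(first5 - total))
    peak = seasons.index(max(seasons)) + 1   # earliest season holding the maximum
    print(f"{slot:>9}  table={first5:5.1f}  sum={total:5.1f}  peak season={peak}")

# Every entry is rounded to 0.1, so a sum can drift from the total by a few tenths.
print(f"largest gap between total and sum of seasons: {max_gap:.2f}")
```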

--
Jeremey


igor eduardo küpfer

Oct 13, 2004, 1:31:06 PM
On Wed, 13 Oct 2004 16:54:19 GMT, "Jeremey Wilson"
<noaddre...@yahoo.com> wrote in
<%4dbd.14613$wT....@newssvr31.news.prodigy.com>:


>> Undrafted| 2.3 1.8 | 0.1 0.2 0.6 0.7 0.7
>
>Any idea on why the second 5 seasons are worse than the first 5, and production
>seems to peak in year 3? Is that busts leaving the league? Just not what I was
>expecting.

Elementary. Most players flame out after reaching their 5th season. The
teams that chose those players are getting zero value.

Average number of minutes played by year in the league, with a little
graph to show the steep drop-off after year 5:

Years MIN
1 637 XXXXXXXXXXXXXXXXXXXXXXXXX
2 786 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
3 761 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
4 749 XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
5 707 XXXXXXXXXXXXXXXXXXXXXXXXXXXX
6 631 XXXXXXXXXXXXXXXXXXXXXXXXX
7 576 XXXXXXXXXXXXXXXXXXXXXXX
8 496 XXXXXXXXXXXXXXXXXXX
9 404 XXXXXXXXXXXXXXXX
10 319 XXXXXXXXXXXX
11 231 XXXXXXXXX
12 178 XXXXXXX
13 128 XXXXX
14 84 XXX
15 48 X
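The bar chart above can be regenerated with a few lines of Python. The scale is an assumption inferred from the bars themselves: one X per 25 minutes, rounded down, appears to match. The sketch also prints the year-over-year change in average minutes.

```python
# Average minutes by years in the league, from the table above.
minutes = [637, 786, 761, 749, 707, 631, 576, 496, 404, 319, 231, 178, 128, 84, 48]
MINUTES_PER_X = 25   # assumed scale: one X per 25 minutes, rounded down


def ascii_bars(values, scale=MINUTES_PER_X):
    """Return one text line per year: year, minutes, and a bar of X's."""
    return [f"{year:>5} {m:>4} {'X' * (m // scale)}"
            for year, m in enumerate(values, start=1)]


bars = ascii_bars(minutes)
print(f"{'Years':>5} {'MIN':>4}")
print("\n".join(bars))

# Percent change in average minutes from each year to the next.
for year, (prev, cur) in enumerate(zip(minutes, minutes[1:]), start=2):
    print(f"year {year - 1} -> {year}: {100 * (cur - prev) / prev:+.1f}%")
```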

Chris Hafner

Oct 14, 2004, 11:03:32 PM
"igor eduardo küpfer" <edku...@example.com> wrote in message
news:df5rm019ou7ighe88...@4ax.com...

Ah - gotcha.

> If you're asking about mathematical justification, well, I have none. You
> aren't supposed to use this sort of regression on ordinal data. There are
> linear-log regression models for non-interval/ratio data, but I don't know
> exactly how to use them. I was hoping that my linear-log model was robust
> enough to handle the non-standard data, and that since we weren't doing
> open heart surgery, any mistakes wouldn't make all that much difference.

Since I followed roughly seven percent of what you said above, I can offer
no objection. I'm not even sure that I'd want to - the cleaner data is
easier to absorb and tells us what we really want to know.

And, as you say, this isn't exactly life-or-death stuff.

Cheers,
Chris Hafner


Ron Coscorrosa

Oct 16, 2004, 10:51:20 PM
On Tue, 12 Oct 2004 15:10:20 -0700, Chris Hafner wrote:

> Ed's calculator is non-sentient. I'm hoping so, anyway.

Hell, Ed was non-sentient for a while there.

(Ed, your sister will laugh at that one).

Ron Coscorrosa

Oct 16, 2004, 10:51:21 PM
On Sun, 10 Oct 2004 21:34:47 -0400, igor eduardo küpfer wrote:

> By the way, sorry about the Seahawks game. First Hawks game I've seen in
> about 3 years, and they blew it in overtime. Look forward to watching the
> Pats game.

Same here, it should be a good one.

Oh whoops, I forgot to put on my Seahawks fan hat. The game will
probably suck, the Seahawks blew their chances at making the playoffs,
Holmgren should be fired, Koren Robinson is the antichrist, and it's the
end of the world.

> ....


>>> Okay, that done. The regression equation is
>>>
>>> 1st5 = - 421363 + 644*Year - 0.33*Year^2 +0.000056*Year^3
>>> + 0.070*AGE - 0.00455*AGE^2 +0.000040*AGE^3
>>> - 5.61*Ht + 0.0732*Ht^2 -0.000318*Ht^3
>>> - 0.290*OvAlPk + 0.00719*OvAlPk^2 -0.000060*OvAlPk^3
>>> + 8.6*Teams - 0.299*Teams^2 + 0.0034*Teams^3
>>> + 0.0708*AllPs -0.000485*AllPs^2 +0.000001*AllPs^3
>>> - 139*HHI5 + 570*HHI5^2 - 732*HHI5^3
>>> - 23.6*HHI5_1 + 64.3*HHI5_1^2 - 52*HHI5_1^3
>>> + 22.1*W_L5 - 66.6*W_L5^2 + 70.5*W_L5^3
>>> - 15.6*W_L5_1 + 43.2*W_L5_1^2 - 41.6*W_L5_1^3
>>
>>Just curious: how did you calculate this? I presume some software package,
>>it seems to me that with all the variables, even with software, that would
>>take forever to compute. That's really really cool though, I can see how
>>that would come in useful in the general "how much does X correlate with Y"
>>type questions.
>>
>

> The method is known as Ordinary Least Squares. It is among the oldest
> statistical techniques -- I'm sure the algorithm for minimizing errors (the
> "least squares" part) goes back a century. The method for using multiple
> variables like I've done above probably goes back almost as long, but it
> had to wait until computers to make it really accessible.

That figures.

> It doesn't take
> any time at all for any stats package to work out the optimal values for
> each variable.

Well, maybe I'm being pedantic, or ignorant, but are you sure about that?

Are you sure it's not just giving you a *good* value for each variable,
and not the optimal? There's a lot of algorithms that can generate a
"good" result but not guarantee that that result is the best. It just
seems like with 20 independent variables it's asking a lot to get the
optimal solution (although it's not asking a lot to get a solution that's
"close enough."). Anyway, it's interesting either way.


>><snip>
>>
>>> The column labeled "P" shows the statistical significance. We are
>>> looking for variables which have low p-values, below 0.05. Once I
>>> remove the variables that aren't significant, we end up with this:
>>>
>>> 1st5 = 2.69
>>> + 0.00668*Ht
>>> - 0.189*OvAlPk + 0.00228*OvAlPk^2
>>> - 15.5*HHI5 + 32.7*HHI5^2
>>> + 8.12*W_L5
>>> - 2.45*W_L5_1
>>
>>Interesting how age isn't in there.
>>
>>

> Got weeded out as statistically insignificant. I *think* it's because so
> many players in the sample are the same 2 ages -- almost 75% of the
> players are drafted at 21 and 22 -- that my sample isn't large enough to
> pick up the subtle effect of age.

Or that the percentage of old flame-outs is the same as for young flame-outs.

I would have intuited that old players would contribute sooner, but maybe
that's not the case, since many of them never pan out.
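A minimal sketch of the "keep variables with p below 0.05" step, in Python/NumPy. Two assumptions, stated up front: the data are synthetic and hypothetical (a made-up pick effect plus an age variable that has no effect by construction, with every draftee aged 21 or 22), and the p-values use a normal approximation to the t distribution, which is only reasonable when the sample is much larger than the number of predictors. A real stats package would use the exact t distribution.

```python
import math

import numpy as np


def ols_with_pvalues(X, y):
    """OLS with an intercept; returns (coefficients, two-sided p-values).

    p-values use a normal approximation to the t distribution (large-n assumption).
    """
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    k = design.shape[1]
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = (resid @ resid) / (n - k)                  # residual variance estimate
    cov = sigma2 * np.linalg.inv(design.T @ design)     # coefficient covariance matrix
    t_stats = coef / np.sqrt(np.diag(cov))
    # Two-sided tail probability of a standard normal: P(|Z| > |t|) = erfc(|t| / sqrt(2)).
    p_values = np.array([math.erfc(abs(t) / math.sqrt(2.0)) for t in t_stats])
    return coef, p_values


# Hypothetical data: pick matters, age (only 21 or 22) has no effect by construction.
rng = np.random.default_rng(2)
n = 600
pick = rng.uniform(1, 60, size=n)
age = rng.choice([21.0, 22.0], size=n)
y = 6.0 - 0.1 * pick + rng.normal(0.0, 2.0, size=n)

names = ["pick", "age"]
coef, p_values = ols_with_pvalues(np.column_stack([pick, age]), y)
kept = [name for name, p in zip(names, p_values[1:]) if p < 0.05]
for name, b, p in zip(names, coef[1:], p_values[1:]):
    print(f"{name:>5}: coef={b:+.4f}  p={p:.4f}")
print("kept at p < 0.05:", kept)
```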

> ....


>>
>>That's really pretty damn good. But, you need to apply the model for a
>>longer period of time to see how good it really is.
>

> Well, I did it for every draft from 1977. I only showed a sample above
> to give a feel for the data.

Oh. For some reason I thought you ran it over less than a 10-year period.

>> It would be
>>interesting to see how relevant the increase in high school players
>>drafted really is (my guess: not that much) - just look at the average
>>NWAR over the first X years prior to, say, the draft with KG and the
>>draft with KG and after. It's also be interesting to see, in general,
>>if high school players end up being better than college players on
>>average (for whatever definition of "average").
>>
>>

> I can't imagine that HS players affect this very much. The sample simply
> isn't large enough to draw any firm conclusions -- remember, I'm looking
> at data from players' first five seasons following the draft, which
> means I can't use player data post-1999, and that's when most of the HS
> players have come into the league.

I remember you did a study correlating a player's first 30 games with their
career - and that the correlation was actually pretty strong (or stronger
than one would think). Maybe you could cheat and just use first year
contributions.

>>> In my next post I will apply this model to drafts over the years to
>>> see if there are consistent deviations from the model.
>>
>>Well, there you go.
>>
>>Also, unrelated, but I remember you mentioning something (how's that for
>>vague?) which analyzed game logs to pull out interesting stats - what
>>was that? I was thinking of making something like that in my "free
>>time." I was also thinking of yanking the box score stats from every
>>game and making them free available in an RSS feed so other people could
>>parse them easily (or, more accurately, so I could mess with them later
>>in the year).
>>
>>

> I have all the raw data, box scores and game logs. I'd love to provide
> the former to anyone who wants to maintain a public DB or something. I
> can do limited work on the play-by-play logs -- limited by my lame
> programming abilities. What is needed is someone to program a parser for
> these logs. I'm asking around to see if I can get someone interested in
> that project.

I could write a parser for them, but I don't know if I'll have enough time.
Send me an e-mail with a few game logs attached and what info you would
like extracted. I'm not making any promises.

>>Anyway, that's really interesting stuff. You should have one of those
>>"blog thingys" man.
>

> Yeah, if someone wants to donate the software and technical knowledge,
> I'll get right on that.

Blogs are like driver's licenses. They must obviously be easy to use
considering how many shitty blogs (or drivers) there are in existence.

The key part would be to find one that a) was free and b) didn't suck ass.
I have no experience on this unfortunately. If you run into technical
snafus, I could help, but as far as choosing one system or another - I
have no idea. If I were to have a blog I'd probably write the software
myself (spend 10 minutes learning blog software or a week writing it? The
latter obviously!).

Hmm.

Ron Coscorrosa

Oct 16, 2004, 10:51:27 PM
On Mon, 11 Oct 2004 13:02:24 -0400, igor eduardo küpfer wrote:

<snip>

I love ascii graphs. *love*

> These data are even noisier than the bust data

I love taking quotes out of context. *love*

Ron Coscorrosa

Oct 16, 2004, 10:51:22 PM

His stats and his writing both. He's a talented dude, that Ed.

Ron Coscorrosa

Oct 16, 2004, 10:51:32 PM

Fire Wally Holmgren!!!!1!!11!

igor eduardo küpfer

Oct 17, 2004, 7:33:28 PM
On Sat, 16 Oct 2004 19:51:21 -0700, Ron Coscorrosa
<cosco...@SPAMSUCKS.comcast.net> wrote in
<pan.2004.10.17....@SPAMSUCKS.comcast.net>:

>On Sun, 10 Oct 2004 21:34:47 -0400, igor eduardo küpfer wrote:
>
>> By the way, sorry about the Seahawks game. First Hawks game I've seen in
>> about 3 years, and they blew it in overtime. Look forward to watching the
>> Pats game.
>
>Same here, it should be a good one.
>
>Oh whoops, I forgot to put on my Seahawk's fan hat. The game will
>probably suck, the Seahawks blew their chances at making the playoffs,
>Holmgren should be fired, Koren Robinson is the antichrist, and it's the
>end of the world.
>

The game did suck (and by suck, I mean it was pretty good but the Hawks
lost).

...


>
>> It doesn't take
>> any time at all for any stats package to work out the optimal values for
>> each variable.
>
>Well, maybe I'm being pedantic, or ignorant, but are you sure about that?
>
>Are you sure it's not just giving you a *good* value for each variable,
>and not the optimal? There's a lot of algorithms that can generate a
>"good" result but not guarantee that that result is the best. It just
>seems like with 20 independent variables it's asking a lot to get the
>optimal solution (although it's not asking a lot to get a solution that's
>"close enough."). Anyway, it's interesting either way.

Of course I'm not sure. My stats pack is giving me 4 significant digits --
I'm going to hazard the guess that it optimizes to the point where these
digits no longer change. The computing time for calculating the least
squares for ~30 variables was about 4 or 5 seconds.
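For what it's worth on the "optimal vs. good" question: when a model is linear in its coefficients, least squares has a closed-form minimizer (assuming no predictor is an exact combination of the others), and standard stats packages typically compute it directly rather than searching for a good-enough answer. Cubic terms like OvAlPk^2 and OvAlPk^3 still count as linear here, because the coefficients enter linearly. The only remaining error is floating-point rounding. Below is a small Python/NumPy sketch on made-up data; the pick range and coefficients are hypothetical.

```python
import numpy as np

# Hypothetical data: a made-up cubic relationship in overall pick, plus noise.
rng = np.random.default_rng(1)
pick = rng.uniform(1, 60, size=400)
y = 8.0 - 0.29 * pick + 0.0072 * pick**2 - 0.00006 * pick**3 + rng.normal(0.0, 1.5, size=400)

# Cubic terms are still linear in the coefficients, so this is ordinary least squares.
X = np.column_stack([np.ones_like(pick), pick, pick**2, pick**3])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef


def sse(b):
    """Sum of squared errors for coefficient vector b."""
    r = y - X @ b
    return float(r @ r)


best = sse(coef)
# Nudge each coefficient up and down slightly: the error never goes down.
never_lower = True
for j in range(len(coef)):
    for step in (1e-4, -1e-4):
        trial = coef.copy()
        trial[j] += step * max(1.0, abs(coef[j]))
        never_lower = never_lower and sse(trial) >= best

# First-order condition (the normal equations): residuals are orthogonal to every column.
cosines = (X.T @ resid) / (np.linalg.norm(X, axis=0) * np.linalg.norm(resid))
print(f"SSE at fitted coefficients: {best:.3f}; nudges never lowered it: {never_lower}")
print("max |cosine(residual, column)|:", float(np.max(np.abs(cosines))))
```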

...


>> I can't imagine that HS players affect this very much. The sample simply
>> isn't large enough to draw any firm conclusions -- remember, I'm looking
>> at data from players' first five seasons following the draft, which
>> means I can't use player data post-1999, and that's when most of the HS
>> players have come into the league.
>
>I remember you did a study correlating a players first 30 games with their
>career - and that the correlation was actually pretty strong (or stronger
>than one would think). Maybe you could cheat and just use first year
>contributions.

That was for counting stats -- e.g., points per game, assists per game, etc.
The stats I used for this study were more ability-based. These are highly
variable from year to year. Punching them in would screw the results, I
think.

>>>
>> I have all the raw data, box scores and game logs. I'd love to provide
>> the former to anyone who wants to maintain a public DB or something. I
>> can do limited work on the play-by-play logs -- limited by my lame
>> programming abilities. What is needed is someone to program a parser for
>> these logs. I'm asking around to see if I can get someone interested in
>> that project.
>
>I could write a parser for them, I don't know if I'll have enough time.
>Send me an e-mail with a few game logs attached and what info you would
>like extracted. I'm not making any promises.

I forbid you to spend any time whatsoever on this. Do not devote time which
would be better spent on improving your life even considering the idea.
This is not reverse psychology.

I'll email you this week.

...

Ron Coscorrosa

Oct 21, 2004, 10:21:37 PM
On Sun, 17 Oct 2004 19:33:28 -0400, igor eduardo küpfer wrote:

> The game did suck (and by suck, I mean it was pretty good but the Hawks
> lost).

Yeah, another game where they outplayed the other team for 3 quarters and
still lost.

>>> It doesn't take
>>> any time at all for any stats package to work out the optimal values for
>>> each variable.
>>
>>Well, maybe I'm being pedantic, or ignorant, but are you sure about that?
>>
>>Are you sure it's not just giving you a *good* value for each variable,
>>and not the optimal? There's a lot of algorithms that can generate a
>>"good" result but not guarantee that that result is the best. It just
>>seems like with 20 independent variables it's asking a lot to get the
>>optimal solution (although it's not asking a lot to get a solution that's
>>"close enough."). Anyway, it's interesting either way.
>
> Of course I'm not sure.

You seem pretty goddamn certain of your uncertainty there pal.

> My stats pack is giving me 4 significant digits --
> I'm going to hazard the guess that it optimizes to the point when these
> digits change no longer. The computing time for calculating the least
> squares for ~30 variables was about 4 or 5 seconds.

I guess it's not too complicated, it's a lot like solving systems of
equations, which can be done pretty fast using matrices. But still, 30
variables in 4-5 seconds is pretty amazing to me.
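The matrix intuition holds up: for OLS, the core step is the normal equations, (X^T X) b = X^T y, which is a single k-by-k linear system, so 30 predictors is light work for a computer. Here is a Python/NumPy sketch with synthetic, hypothetical data (1,500 rows, 30 columns) that compares a direct normal-equations solve with NumPy's SVD-based `lstsq`. The printed timing depends on the machine and says nothing about what any particular stats package spent its 4 or 5 seconds on.

```python
import time

import numpy as np

# Hypothetical sizes: ~1500 drafted players, 30 predictor columns (including the intercept).
rng = np.random.default_rng(3)
n, k = 1500, 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ rng.normal(size=k) + rng.normal(scale=0.5, size=n)

start = time.perf_counter()
b_normal = np.linalg.solve(X.T @ X, X.T @ y)    # normal equations: one k-by-k solve
elapsed = time.perf_counter() - start

# Same coefficients from an SVD-based solver, which is more robust when columns are
# nearly collinear (as raw Year, Year^2, Year^3 terms tend to be).
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"{k} coefficients in {1000 * elapsed:.2f} ms; "
      f"max difference from lstsq: {np.max(np.abs(b_normal - b_lstsq)):.1e}")
```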

> ...
>>> I can't imagine that HS players affect this very much. The sample simply
>>> isn't large enough to draw any firm conclusions -- remember, I'm looking
>>> at data from players' first five seasons following the draft, which
>>> means I can't use player data post-1999, and that's when most of the HS
>>> players have come into the league.
>>
>>I remember you did a study correlating a players first 30 games with their
>>career - and that the correlation was actually pretty strong (or stronger
>>than one would think). Maybe you could cheat and just use first year
>>contributions.
>
> That was for counting stats -- eg points per game, assists per game, etc.
> The stats I used for this study were more ability-based. These are highly
> variable from year to year. Punching them in would screw the results, I
> think.

Well, screw it then (not the results, the idea).

>>>>
>>> I have all the raw data, box scores and game logs. I'd love to provide
>>> the former to anyone who wants to maintain a public DB or something. I
>>> can do limited work on the play-by-play logs -- limited by my lame
>>> programming abilities. What is needed is someone to program a parser for
>>> these logs. I'm asking around to see if I can get someone interested in
>>> that project.
>>
>>I could write a parser for them, I don't know if I'll have enough time.
>>Send me an e-mail with a few game logs attached and what info you would
>>like extracted. I'm not making any promises.
>
> I forbid you to spend any time whatsoever on this. Do not devote time which
> would be better spent on improving your life even considering the idea.
> This is not reverse psychology.
>
> I'll email you this week.

Liar. Unless you e-mail me in the second half of the week.
