Pepperdine PTEM (Player's Team Efficiency Margin)


KL Onthank

Dec 23, 2011, 7:28:27 PM
to cougcenter-stati...@googlegroups.com
For the Pepperdine game I calculated what I am going to start calling the "Player's Team Efficiency Margin" (PTEM).  This is the same stat I calculated for the Western Oregon game, and it is supposed to be an efficiency-based equivalent of the +/- stat.
In short, this is the difference between the team's offensive efficiency and defensive efficiency while a given player is on the floor.
You can also view this as WSU's average net points per possession while a player was in the game.  So on average, while Reggie was in the game WSU went ahead by 1 point every 4 possessions (PTEM=0.25); while Brock was in, WSU went up by a point every other possession (PTEM=0.49); WSU went up by one point every 10 possessions while Lacy was in the game (PTEM=0.1), but went down by more than a point per possession while Simon was in the game (PTEM=-1.08).
Anyhow, here is the chart with team offensive efficiency while each player was in the game (OE), team defensive efficiency while each player was in the game (DE), and the difference between the two: PTEM.
Oh, I haven't seen this particular stat generated anywhere else, but if you have, please let me know.  I don't mean to be ripping off anyone else.

Player          OE      DE      PTEM
Moore           0.98    0.73     0.25
Motum           1.12    0.63     0.49
Capers          0.89    0.75     0.14
Lacy            1.05    0.95     0.10
Aden            1.22    0.82     0.39
Enquist         0.89    1.02    −0.13
Shelton         1.00    0.33     0.67
Kernich-Drew    0.40    1.50    −1.10
Ladd            1.09    0.96     0.13
Dilorio         1.00    1.67    −0.67
Simon           0.67    1.75    −1.08
Lodwick         1.08    1.00     0.08
Also, here is the possession by possession dataset for the Pepperdine game:
https://docs.google.com/spreadsheet/ccc?key=0AkQiMBZz54ffdFZmaU9pdjNha0ZIbkxzWkphSmdTcGc
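
A minimal sketch of how a table like the one above could be computed from possession-by-possession data. The log format here (side, points, set of players on the floor) and the six sample possessions are made up for illustration; they are not the layout or contents of the linked spreadsheet.

```python
from collections import defaultdict

# Hypothetical possession log: (side, points scored, WSU players on the floor).
# "off" = WSU had the ball, "def" = the opponent had the ball.
possessions = [
    ("off", 2, {"Moore", "Motum", "Capers", "Lacy", "Aden"}),
    ("def", 0, {"Moore", "Motum", "Capers", "Lacy", "Aden"}),
    ("off", 3, {"Moore", "Motum", "Ladd", "Lacy", "Aden"}),
    ("def", 2, {"Moore", "Motum", "Ladd", "Lacy", "Aden"}),
    ("off", 0, {"Simon", "Motum", "Ladd", "Lacy", "Aden"}),
    ("def", 3, {"Simon", "Motum", "Ladd", "Lacy", "Aden"}),
]

def ptem_table(possessions):
    """Return {player: (OE, DE, PTEM)} from a possession log."""
    points = defaultdict(lambda: {"off": 0, "def": 0})
    counts = defaultdict(lambda: {"off": 0, "def": 0})
    for side, pts, on_floor in possessions:
        for player in on_floor:
            points[player][side] += pts
            counts[player][side] += 1
    table = {}
    for player, c in counts.items():
        if c["off"] == 0 or c["def"] == 0:
            continue  # efficiency is undefined without possessions on both ends
        oe = points[player]["off"] / c["off"]   # points scored per offensive possession
        de = points[player]["def"] / c["def"]   # points allowed per defensive possession
        table[player] = (oe, de, oe - de)
    return table

for player, (oe, de, ptem) in sorted(ptem_table(possessions).items()):
    print(f"{player:8s} OE={oe:.2f} DE={de:.2f} PTEM={ptem:+.2f}")
```

Each possession is credited to every player on the floor at the time, which is why teammates who always play together end up with identical rows.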




Jeff Nusser

Dec 23, 2011, 7:44:58 PM
to cougcenter-stati...@googlegroups.com
Interesting. I'm not a huge fan of +/- because I think there's so much noise (the constant shuffling of personnel changes the context so much), but I think this is definitely better than traditional +/-.

Dan Shirley

Dec 23, 2011, 7:56:58 PM
to cougcenter-stati...@googlegroups.com
Just a quick thought off the top of my head, this sort of looks like the Roland Rating for each player (Reggie Moore's for this season indicated by the RR column)

Jeff Nusser

Dec 23, 2011, 7:59:00 PM
to cougcenter-stati...@googlegroups.com
Roland Rating is like +/- on steroids. It takes into account what happens when the player is both on the floor and off the floor. It's not a possession-based stat like Kirt is doing.

KL Onthank

Dec 23, 2011, 9:44:41 PM
to cougcenter-stati...@googlegroups.com
I agree, there is a lot of noise in +/- type stats.  But as with a lot of stats, I think it is important to ask: what can this number tell me, and what can't it tell me?  With the +/- family of stats there is that noise, so any one result should be taken with a grain of salt.  But when there is a recurring trend in the stat, one can be fairly confident there is a real phenomenon behind it, not just random variation (incidentally, if there is a recurring trend it is possible to go back and see what the odds are that it's noise or signal).
However, one of the things that I like about the concept of the +/- family of stats is that it is an attempt to quantify a player's overall influence on a team.  On offense, if a player isn't shooting or getting assists, they are pretty invisible (unless of course they are turning it over).  On defense there really isn't anything to gauge a player's contribution.
Also, the more personnel sets a player plays with, the better for the usability of the stats.  Ideally a player would play with all possible combinations of teammates for an equal amount of time.  You can think about it this way: imagine if there was no variation in personnel sets, so that players 1-5 always played together and were always subbed out en masse for players 6-10.  In such a situation any +/- stat would be completely useless, and there would be no way to differentiate the contributions of players 1-5 from each other, or of players 6-10 from each other.  But as you increase the mixing of players playing together, the comparison becomes more reliable.  If the offensive efficiency with Aden on the floor is 0.95, while offensive efficiency with him off the floor is 1.40, and all other players on the floor are well mixed when Aden is both on and off, you can be pretty confident the drop in efficiency is somehow related to Aden being on the floor...

Anyhow, I don't mean to go on...  +/- type stats have limitations, and they should be interpreted with those limitations in mind, but I also think the general approach can provide valuable insight that other numbers can't really provide.  I do wonder, however, if there might be a way to help control for the noise...

Jeff Nusser

Dec 23, 2011, 9:51:49 PM
to cougcenter-stati...@googlegroups.com
These are excellent points. I hadn't thought about the combinations of players actually being beneficial.

Here's a piece Ken Pomeroy did on +/-. What do you guys think?

http://kenpom.com/blog/index.php/weblog/a_treatise_on_plus_minus/

KL Onthank

Dec 25, 2011, 1:53:33 AM
to cougcenter-stati...@googlegroups.com
Pomeroy gives a convincing condemnation of +/-, but really doesn't put it into context.  Yup, +/- has a ton of noise in it.  Let me give you a counterexample: a team's offensive efficiency is generally assumed to be a reliable stat for comparing performances from game to game and between teams, right?
So I used Pomeroy's same methodology: simulated 20 games, 70 possessions each.  In each possession there was a 3% chance of scoring 1 pt, a 30% chance of scoring 2 pts, and a 15% chance of scoring 3 pts.  Now, unlike Pomeroy's simulation, I calculated offensive efficiency.  We would expect offensive efficiencies to remain pretty stable from game to game because the probability of the team scoring on any given possession isn't changing, as it would when playing defenses of varying quality.  (Oh, and I should mention that with the percentages that Pomeroy selected, the expected average OE is 1.07.)
Here is what the OE did over the 20 games:

 1:  1.41
 2:  0.93
 3:  1.20
 4:  1.21
 5:  1.11
 6:  1.01
 7:  1.03
 8:  1.30
 9:  0.80
10: 0.92
11: 0.96
12: 1.14
13: 1.21
14: 1.03
15: 1.17
16: 1.07
17: 0.97
18: 1.01
19: 1.16
20: 0.89

There is a ton of variation, ranging from 0.80 to 1.41.  Tell me that everyone wouldn't be going bonkers about games 9-11 and the slumping offense.
Anyhow, I ran it over 50 seasons like Pomeroy did.  Season-long OE ranged from 0.98 to 1.16, just from random variation.  Individual game OE ranged from 0.69 to 1.57.  So, is offensive efficiency a usable stat?
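
A rough sketch of the kind of simulation described above, assuming each possession is an independent draw with the quoted probabilities (3% for 1 point, 30% for 2, 15% for 3, and the remaining 52% for 0). The seed and function names are arbitrary, and the output will not reproduce the 20 numbers listed above. Taken literally, these percentages give an expected OE of 1.08, slightly above the 1.07 quoted.

```python
import random

# Per-possession outcomes and probabilities quoted in the post;
# the 52% for 0 points is simply the remainder (an assumption).
OUTCOMES = [0, 1, 2, 3]
WEIGHTS = [0.52, 0.03, 0.30, 0.15]

def simulate_game(rng, possessions=70):
    """Return points per possession for one simulated game."""
    scored = rng.choices(OUTCOMES, weights=WEIGHTS, k=possessions)
    return sum(scored) / possessions

def simulate_season(rng, games=20, possessions=70):
    """Return a list of per-game offensive efficiencies."""
    return [simulate_game(rng, possessions) for _ in range(games)]

# Long-run average points per possession implied by the weights.
expected_oe = sum(o * w for o, w in zip(OUTCOMES, WEIGHTS))

rng = random.Random(2011)  # arbitrary seed so the run is repeatable
season = simulate_season(rng)
for game_number, oe in enumerate(season, start=1):
    print(f"{game_number:2d}: {oe:.2f}")
print(f"expected {expected_oe:.2f}, min {min(season):.2f}, max {max(season):.2f}")
```

With these weights the standard deviation of a single 70-possession game works out to about 0.14 points per possession, so swings of the size listed above are what pure chance would be expected to produce.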


KL Onthank

Dec 28, 2011, 12:26:47 AM
to cougcenter-stati...@googlegroups.com
I have been thinking about this whole +/-, efficiency-based +/- thing for a while and finally came to this conclusion:
Why do basketball stats (or any sports stats, for that matter) try to reinvent the wheel?  The statistical literature is well developed on how to determine whether some system, with tons of noise and randomness, performs differently by some metric in the presence or absence of some factor.  It is done all the time to see if drugs work on crazy chaotic systems like human physiology, or if the presence/absence of certain animals changes ecosystem function and to what degree.  This is really pretty routine.  Why should it be very hard to show whether the presence/absence of a given player changes a given metric, say points scored per possession?  It really shouldn't be.
This is what I propose: a multifactorial ANOVA in which each player is a factor, followed by a post-hoc test, should tell you which players significantly influence scoring beyond random variation.  Then calculating the effect size for those significant players gives you a rough estimate of how much they influence points per possession.
The problem with this is that, because there really are only five possible point results of a possession (0, 1, 2, 3 and rarely 4), it is going to take more data to discern these effects of individual players.  My off-the-cuff guess would be about 5 games' worth if the player is playing about 20 min a game, more if they are playing more or less than that.
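
To make the "each player is a factor" idea concrete, here is a sketch of the least-squares fit that sits underneath a main-effects ANOVA: points on a possession are modeled as a baseline plus one on/off effect per player. This is only the effect-size half of the proposal. It does no significance testing and no post-hoc test, and the function names and four sample possessions are invented for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("players are not mixed enough to separate their effects")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def main_effects(rows, players):
    """Least-squares fit of points = baseline + sum of effects of players on the floor."""
    # Design matrix: a 1 for the baseline, then a 0/1 indicator per player.
    x = [[1.0] + [1.0 if p in on_floor else 0.0 for p in players] for on_floor, _ in rows]
    y = [float(pts) for _, pts in rows]
    k = len(players) + 1
    # Normal equations: (X^T X) coef = X^T y
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(x, y)) for i in range(k)]
    coef = solve(xtx, xty)
    return coef[0], dict(zip(players, coef[1:]))

# Made-up possessions: (set of players on the floor, points scored).
rows = [(set(), 1), ({"A"}, 2), ({"B"}, 0), ({"A", "B"}, 1)]
baseline, effects = main_effects(rows, ["A", "B"])
print(baseline, effects)
```

On this toy data the fit recovers a baseline of 1 with A worth +1 and B worth -1 per possession, up to floating-point rounding. If two players are always on or off the floor together, the system becomes singular and `solve` raises an error, since their effects cannot be told apart.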

I could be way off with this, however.  My mind has been swimming in this for a while and I may have been swept out to sea at some point...

Jeff Nusser

Dec 28, 2011, 12:09:38 PM
to cougcenter-stati...@googlegroups.com
Love the way you think. I've been pondering this whole thing for days now. Still thinking about it. :-)

KL Onthank

Dec 28, 2011, 1:24:29 PM
to cougcenter-stati...@googlegroups.com
I am glad I am not just coming off like a ranting mad man...

Jeff Nusser

Dec 28, 2011, 1:36:41 PM
to cougcenter-stati...@googlegroups.com
Not in the slightest. The team variation is really giving me something to ponder. Why would that be? Why do we find greater meaning in the team results? I'm formulating some thoughts in my brain, just haven't had time to sit down and flesh them out.

KL Onthank

Jan 8, 2012, 11:19:39 PM
to cougcenter-stati...@googlegroups.com
So, I have gotten possession-by-possession data for Pepperdine, Oregon, OSU, Utah and Colorado.  I performed a multifactor ANOVA on points scored per possession, using the presence/absence of each player on the floor as the different factors.  This is the same type of approach a medical researcher would use to sort out which lifestyle factors influence blood pressure.  Blood pressure is very noisy, with a lot of random/unmeasurable influences, but with enough data points we can begin to sort out the factors that influence even very noisy systems.  For this analysis I kept points per possession on offense and defense separate.  On offense only Reggie Moore, Marcus Capers and Dexter Kernich-Drew were significant factors; on defense only Reggie was a significant factor, but there was a significant interaction between Marcus Capers and DJ Shelton (more on that in a bit...).
OK: here is a table of the significant factors.  "On floor" is the efficiency of the team with the player on the floor and "Off floor" is the efficiency of the team with the player on the bench.  The number in parentheses is the number of possessions these numbers are based on.  The p-value is the probability of seeing a difference at least this large from random chance alone if the player made no real difference (my alpha, or the p-value below which I considered a factor to be significant, was 0.05, or 5%).


Team Offensive Efficiency

Player          On floor      Off floor     P-value
Reggie          1.12 (250)    0.75 (80)     0.0129
Capers          0.95 (239)    1.23 (91)     0.0159
Kernich-Drew    0.75 (44)     1.07 (286)    0.0458

Team Defensive Efficiency

Player          On floor      Off floor     P-value
Reggie          0.92 (250)    1.31 (83)     0.0282

So the really interesting thing to me here is that Aden was not a significant factor influencing team offensive efficiency.  I would have imagined that with him taking so many shots while on the floor, and missing so many, he would be a significant factor in team offensive efficiency, but he isn't even close (p-value=0.49; for comparison, Motum and Dilorio are close to being significant factors, both with p-values of about 0.08).  It seems strange to me, and counterintuitive to what I see with my eyes.

Now on to that interaction between Capers and Shelton on the defensive side.  A significant interaction means that the factors influence the variable (efficiency in this case) differently when occurring together.  Let me give you an (almost ridiculous) example: let's imagine we were looking at the influence of several factors on the pregnancy rate in a group of people.  We might find there was a significant interaction between the use of birth control (which would tend to decrease pregnancy rate) and gender (males tend not to get pregnant).  Birth control would only lower the pregnancy rate in one gender (females) and have no effect on the pregnancy rate in the other (males, since they wouldn't get pregnant either way).  So there is an interaction between these two factors.  Now let me show you the interaction between Capers and Shelton in graphical form.  The points on the left are team defensive efficiency with Shelton off the floor, on the right with him on.  The red points (and line) are efficiencies when Capers is off the floor, the green when he is on the floor:

When Capers is on the floor, Shelton coming onto the floor improves team defensive efficiency.  When Capers isn't out there, Shelton comes into the game and the opponent starts scoring more efficiently by a fairly dramatic margin.  This should be tempered a bit: Shelton was only on the floor for 19 possessions without Capers.  Still, it happened consistently enough to return a p-value of 0.0394.  Not insanely strong, but intriguing.
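
One way to see what an interaction means numerically is to split the defensive possessions into the four on/off combinations of the two players and compare Shelton's effect with and without Capers. The sketch below does that on invented data; the row format and function names are assumptions, and it computes only the cell means and their contrast, not the p-value.

```python
def cell_means(rows, a, b):
    """Mean points allowed in each (a on floor?, b on floor?) cell, with possession counts."""
    totals = {}
    for on_floor, pts in rows:
        key = (a in on_floor, b in on_floor)
        s, n = totals.get(key, (0, 0))
        totals[key] = (s + pts, n + 1)
    return {key: (s / n, n) for key, (s, n) in totals.items()}

def interaction_contrast(means):
    """(effect of b when a is on) minus (effect of b when a is off)."""
    effect_with_a = means[(True, True)][0] - means[(True, False)][0]
    effect_without_a = means[(False, True)][0] - means[(False, False)][0]
    return effect_with_a - effect_without_a

# Invented defensive possessions: (WSU players on the floor, points allowed).
rows = [
    ({"Capers"}, 1), ({"Capers"}, 1),
    ({"Capers", "Shelton"}, 0), ({"Capers", "Shelton"}, 1),
    (set(), 1), (set(), 1),
    ({"Shelton"}, 2), ({"Shelton"}, 2),
]
means = cell_means(rows, "Capers", "Shelton")
for (capers_on, shelton_on), (mean, n) in sorted(means.items()):
    print(f"Capers on={capers_on!s:5} Shelton on={shelton_on!s:5} DE={mean:.2f} ({n})")
print("interaction contrast:", interaction_contrast(means))
```

A contrast near zero would mean Shelton changes defensive efficiency by the same amount whether or not Capers is out there. A large contrast in either direction is the pattern the interaction term picks up, and the per-cell counts show how thin the evidence in any one cell is.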

Anyhow, a few observations based on the data I have gotten so far....

Jeff Nusser

Jan 8, 2012, 11:29:45 PM
to cougcenter-stati...@googlegroups.com
Interesting. Help me understand how this sort of analysis determines whether a player's contribution is significant or not.

KL Onthank

Jan 9, 2012, 3:07:36 AM
to cougcenter-stati...@googlegroups.com
Let me see if I can construct a useful illustration:
   Imagine you have a bag with 100 tiles in it: 10 tiles have a "3" printed on them, 30 have a "2", 10 have a "1", and 50 have a "0".  So if you were to put your hand into this bag and pull out a single tile you would have:
50% chance of pulling a tile out with a 0 on it,
30% chance of pulling a tile out with a 2 on it,
10% chance of pulling a tile out with a 1 on it and
10% chance of pulling a tile out with a 3 on it. 
And even if you had no idea what the proportions of tiles in the bag were, by pulling a tile out, seeing what it has on it, putting it back, shaking up the bag and pulling a tile out again, you would eventually get a good idea of what the composition of tiles in the bag was.
   Now imagine there was a second bag, and you have no idea what the proportion of tiles is in this bag.  You draw out a tile and put it back 100 times (shaking the bag between each sampling to mix it up).  Let's say you get 97 "3"s, one "1", one "2" and one "0".  Is it possible to get this kind of draw pulling from a bag with the exact same composition as the first?  Yes.  Is it likely?  No.  And we can assign a probability to that.
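
As a small worked example, the probability of that exact draw from the first bag can be computed with the multinomial formula, assuming the draws are independent and made with replacement. The function name is made up for illustration. Note that this is the probability of exactly this outcome; a formal test would add up every outcome at least as extreme.

```python
from math import factorial, prod

# Composition of the first bag: tile value -> probability of drawing it.
BAG_ONE = {0: 0.50, 1: 0.10, 2: 0.30, 3: 0.10}

def multinomial_probability(counts, bag):
    """Probability of seeing exactly these counts when drawing with replacement."""
    n = sum(counts.values())
    # Number of distinct orderings that produce these counts: n! / (c1! c2! ...)
    ways = factorial(n)
    for c in counts.values():
        ways //= factorial(c)
    # Each ordering has the same probability: product of p_tile ** count.
    return ways * prod(bag[tile] ** c for tile, c in counts.items())

draw = {3: 97, 2: 1, 1: 1, 0: 1}
print(multinomial_probability(draw, BAG_ONE))
```

The result is on the order of 1.5e-93: possible, but so unlikely that the second bag almost certainly has a different mix of tiles.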
   That is essentially what this test does.  It divides the data up and assigns an adjusted probability for those groups of data being drawn from the same "bag".  So for instance we have 250 "tiles" with Reggie in the game and 80 "tiles" with Reggie on the bench.  What this test does is give a probability that we could have drawn the "tiles" that we did from the same "bag", or in other terms, that the samples were drawn from the same population.  Put another way: what is the probability that the "tiles" drawn while Reggie was on the floor came from a "bag" with the same number of 3's, 2's, 1's and 0's in it as the "bag" the "tiles" were drawn from when Reggie was not on the floor?  If those "bags" are different, the simplest explanation is that they are different because Reggie is on the floor, but that is not the only possible reason.  For instance, I don't necessarily trust Kernich-Drew's p-value on that offensive efficiency table.  He played a bunch of possessions at Colorado, where the Cougs played bad offensive ball overall.  My bags & tiles illustration breaks down here, but suffice it to say that the better you can "mix" all other factors, the better this analysis performs.  The other factors are not "mixed" well for Kernich-Drew.
   Two things make it easier to detect if you are drawing from bags with different ratios of tiles: making more draws, and very different proportions of tiles.  For example, let's say you want to know if an unknown bag has tiles in the proportions of the first bag I gave earlier.  You draw out one tile and it is a 3; do you have much evidence that this bag is different?  Nope.  But if you draw 1000 times, you will likely be able to tell very well.
   Secondly, let's say you draw 10 times out of the bag and get all 3's; is it very likely the bag contains only 10% threes?  Nope.  The bag likely has a much higher % of 3s, and only a few draws make that clear...
 
Anyhow, I am not sure if this makes it any clearer, or even answers the question you were asking.  Essentially what this analysis does is tell you, when you divide the data into groups by a bunch of factors, what the probability is that you could get the differences you see in those groups of data by the random chance of drawing tiles from a bag.  If the probability is very low, the numbers were likely drawn from bags with different ratios of tiles, and the reason the ratios were different is most likely the factor (or player) you used to divide the data.
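
The "same bag" question can also be asked directly with a permutation test, which is a different and simpler technique than the multifactor ANOVA used above, and handles only one player at a time. Pool all the tiles, reshuffle them into "on" and "off" piles of the original sizes many times, and count how often chance alone produces a gap at least as large as the real one. The tile counts below are invented so that the means match the 1.12 (250) and 0.75 (80) reported for Reggie on offense; they are not the real possession data, so the p-value will not match the table.

```python
import random

def efficiency(points):
    """Average points per possession."""
    return sum(points) / len(points)

def permutation_p_value(on_points, off_points, shuffles=10000, seed=0):
    """Two-sided p-value for the on/off efficiency gap if both piles came from one bag."""
    rng = random.Random(seed)
    observed = abs(efficiency(on_points) - efficiency(off_points))
    pooled = list(on_points) + list(off_points)
    n_on = len(on_points)
    at_least_as_big = 0
    for _ in range(shuffles):
        rng.shuffle(pooled)  # deal the same tiles into new random on/off piles
        gap = abs(efficiency(pooled[:n_on]) - efficiency(pooled[n_on:]))
        if gap >= observed - 1e-12:
            at_least_as_big += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (at_least_as_big + 1) / (shuffles + 1)

# Invented tiles: 250 possessions averaging 1.12 and 80 averaging 0.75.
on_floor = [3] * 30 + [2] * 90 + [1] * 10 + [0] * 120
off_floor = [3] * 4 + [2] * 22 + [1] * 4 + [0] * 50
print(efficiency(on_floor), efficiency(off_floor))
print(permutation_p_value(on_floor, off_floor))
```

Because it looks at a single player in isolation, this does nothing about the mixing problem described for Kernich-Drew; it only answers whether the two piles of tiles look like they came from the same bag.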

Jeff Nusser

Jan 9, 2012, 3:15:57 AM
to cougcenter-stati...@googlegroups.com
That actually makes a ton of sense. Think about compiling this all into one post for CougCenter. It would be interesting to hear others' interpretations.

KL Onthank

Jan 11, 2012, 1:51:41 AM
to cougcenter-stati...@googlegroups.com
You think I should include the rambling explanation as well?