So I dug back through my old queries and decided I had done this really inefficiently. I rewrote the procedure to be fewer steps.
Step One - Run Expectancy:
create table run_expectancy_retro_pbp as
select
substring(game_id,4,4) AS yearID,
outs_ct AS outs,
concat( if(base1_run_id <> "","1","_"),
if(base2_run_id <> "","2","_"),
if(base3_run_id <> "","3","_")) AS base_state,
count(*) AS count,
sum(fate_runs_ct) + sum(event_runs_ct) AS inn_runs,
(sum(fate_runs_ct) + sum(event_runs_ct))/count(*) AS run_expectancy
from
events_regseason
where
substring(game_ID,4,4)>1973
AND ((inn_ct < 9) OR (inn_ct = 9 AND bat_team_id = away_team_id))
group by
yearID,outs,base_state
order by yearID asc, outs asc, find_in_set(base_state,'___,1__,_2_,__3,12_,1_3,_23,123')
Step Two - Linear Weights by year and retrosheet event:
select
substring(a.game_ID,4,4) AS year,
a.event_cd AS event,
count(a.event_cd) AS count,
sum(b.run_expectancy)/count(a.event_cd) AS re_pre,
sum(c.run_expectancy + a.event_runs_ct)/count(a.event_cd) AS re_post,
(sum(c.run_expectancy + a.event_runs_ct)/count(a.event_cd))-(sum(b.run_expectancy)/count(a.event_cd)) AS re_total
from
events_regseason a
inner join run_expectancy_retro_pbp b
on (substring(a.game_ID,4,4) = b.yearID
AND a.outs_ct = b.outs
AND concat(
if(a.base1_run_id <> "","1","_"),
if(a.base2_run_id <> "","2","_"),
if(a.base3_run_id <> "","3","_")) = b.base_state)
inner join run_expectancy_retro_pbp c
on (substring(a.game_ID,4,4) = c.yearID
AND (a.outs_ct+a.event_outs_ct) = c.outs
AND concat(
if((a.bat_dest_id = 1 OR a.run1_dest_id = 1),"1","_"),
if((a.bat_dest_id = 2 OR a.run1_dest_id = 2 OR a.run2_dest_id = 2),"2","_"),
if((a.bat_dest_id = 3 OR a.run1_dest_id = 3 OR a.run2_dest_id = 3 OR a.run3_dest_id = 3),"3","_")) = c.base_state)
group by year, event
As per usual, let me know if you see any mistakes. This relies on Retrosheet PBP as processed by the Chadwick tool including extra fields, and will not work when used with bevent-processed files. The second query takes ~1.5 hrs to run on my machine, fyi. Note that the LWTS query relies on a table created by the RE queries; there might be a way around that, but I wanted both to exist anyway.
-John