Weekly Challenge

85 views
Skip to first unread message

Chris St. John

unread,
Apr 19, 2013, 7:09:33 PM4/19/13
to baseball-sq...@googlegroups.com
I thought it might be fun to start something of a weekly challenge to get people's juices flowing. Anyone can post one, but I'll start: 

Using a play-by-play database, find the percentage of first hit and first baserunner at each inning/out state: For instance, 10% of the time, the first hit of the game happens in the 1st inning with 0 out (or whatever the actual number is). Post the code you used to find the answer.
Message has been deleted

Mark McC

unread,
Apr 22, 2013, 1:08:04 PM4/22/13
to baseball-sq...@googlegroups.com
Here is my first effort, focused on the first batter to reach base; the first hit could be determined by changing some of the criteria of the query.  It's in MySQL, so it may not work for people with other database management systems.  As you may note, it uses session variables to select the condition where all prior plate appearances had not resulted in a batter reaching base, but the current batter did reach base.  This doesn't give the percentages, just the aggregate counts; I usually pull the data into Excel Pivot tables for further calculation.

It could use another set of eyes, because some of the outputs don't make any sense to me, e.g. it says that there were 24 games in 1950 in which the fourth batter (or higher) was the first to reach base, which seems impossible.  I may have missed some conditions needed in the defintion of the session variables (I had to add one for lead-off Home Runs, and there certainly could be others).  I've only been working with Retrosheet for six weeks or so, and I admittedly have more to learn.

/* Program code */
set @prior =0;
set @current =0;
select `season`,`BAT_HOME_ID`,`inn_ct`,`inn_pa_ct`,count(*) as `dist_ct`
from
(
SELECT 
 `GAME_ID`,CAST(substr(`game_id`,4,4) AS UNSIGNED) as `season`, `BAT_HOME_ID`,`inn_ct`,`INN_PA_CT`,`GAME_PA_CT`,`EVENT_CD`,`EVENT_TX`,`START_BASES_CD`,`END_BASES_CD`
FROM (
SELECT 
 `GAME_ID` ,`BAT_HOME_ID`,`inn_ct`,`INN_PA_CT`,`GAME_PA_CT`,`EVENT_CD`,`EVENT_TX`,`START_BASES_CD`,`END_BASES_CD`
,@prior:=IF(`GAME_NEW_FL`='T',0,@current) as `prior`
,@current:=IF(`GAME_NEW_FL`='T',IF(`start_bases_cd`=0 and `h_cd`=4,1,0),@current+IF(`START_BASES_CD`=0 and `END_BASES_CD`>0,1,0)+IF(`start_bases_cd`=0 and `h_cd`=4,1,0)) as `current`
FROM   `events` 
where `BAT_EVENT_FL`='T'
) as `e` 
where `prior`=0 and `current`=1) as `sel`
group by
 `season`,`BAT_HOME_ID`,`inn_ct`,`inn_pa_ct`
;
/* 100.9459 sec */

jonath...@gmail.com

unread,
Nov 30, 2013, 2:41:53 PM11/30/13
to baseball-sq...@googlegroups.com
Here's another one.  Write a SQL query answers the following question:

Who has hit for a natural cycle?

jonath...@gmail.com

unread,
Dec 22, 2013, 11:18:23 PM12/22/13
to baseball-sq...@googlegroups.com
Whoah, I don't know why but I struggled with this one. Humbling experience.

I originally wrote it using table aliases for the EVENTS table and did all the dirty work in the WHERE clause. Something went wrong and I couldn't get it under an hour (and couldn't get the right results). Anyway, I ended up going simple with temp tables storing each hit type and then querying those temp tables to get the batters who have hit for a natural cycle.

Please feel free to share better solutions because I for one could definitely learn from others' experience.

(sorry I didn't bother formatting the results with better dates and full names, but the script took 18 mins already and I didn't want to make it worse.)
bat_id game_id
watsb001 BAL197909150
kiner101* BRO195006250
valej003 CHA200004270
folit001 CHN197604210
mabrj001 COL199605180
mattg002 DET200609130
boyek101 HOU196406160
wilkb002 MON200306240
hickj103 NYN196308070
willb104 SLN196607172

*
Kiner hit a HR in the 1st and then hit for a natural cycle in the remaining innings. I guess that doesn't count according to the Baseball Almanac. http://www.baseball-reference.com/boxes/BRO/BRO195006250.shtml


drop temporary table if exists tmpS;
create temporary table if not exists tmpS as 
-- creates temp table containing all singles
(
select 
  game_id,
  inn_ct,
  bat_id,
  case 
    when event_tx like 'S%'
    then 'S'
  end as 'event' -- extracts hit type from event_tx
from events
where
  event_tx like 'S%' -- single 
  and event_tx not like 'SB%' -- exclude stolen bases
);


drop temporary table if exists tmpD;
create temporary table if not exists tmpD as 
-- creates temp table containing all doubles
(
select 
  game_id,
  inn_ct,
  bat_id,
  case 
    when event_tx like 'D%'
    then 'D'
  end as 'event' -- extracts hit type from event_tx
from events
where
  event_tx like 'D%' -- double
);


drop temporary table if exists tmpT;
create temporary table if not exists tmpT as 
-- creates temp table containing all triples
(
select 
  game_id,
  inn_ct,
  bat_id,
  case 
    when event_tx like 'T%'
    then 'T'
  end as 'event' -- extracts hit type from event_tx
from events
where
  event_tx like 'T%' -- triple
);


drop temporary table if exists tmpHR;
create temporary table if not exists tmpHR as 
-- creates temp table containing all home runs
(
select 
  game_id,
  inn_ct,
  bat_id,
  case 
    when event_tx like 'HR%'
    then 'HR'
  end as 'event' -- extracts hit type from event_tx
from events
where
  event_tx like 'HR%' -- home run
);

-- put it all together
select
  hr.bat_id,
  hr.game_id
from tmpHR hr
inner join tmpT t 
  on hr.game_id = t.game_id 
  and hr.bat_id = t.bat_id
inner join tmpD d 
  on t.game_id = d.game_id 
  and t.bat_id = d.bat_id
inner join tmpS s
  on d.game_id = s.game_id
  and d.bat_id = s.bat_id
where 
-- finding HR after T and 
-- T after D and D after S
  (hr.inn_ct > t.inn_ct)
  and (t.inn_ct > d.inn_ct)
  and (d.inn_ct > s.inn_ct)
group by hr.game_id, hr.bat_id
-- all by the same batter
-- in one game
;
naturalCycle.sql
Reply all
Reply to author
Forward
0 new messages