BaseWinner,
I am also interested in the median runs per game. I agree with Joe that adding a median function as a tool to SDQL could be useful for handicapping.
Figuring out the median (the long way) is pretty easy using SDQL with a little help from Excel and some patience. I was able to figure out the median runs for just the Yankees since 2004 in less than 5 minutes. Here's what I did:
I used this query to get the runs scored and allowed by the Yankees in every regular-season game since 2004:
- team=Yankees and date and runs and o:runs and REG
The first thing I noticed was that there are only 2104 game results. With 13 seasons of 162 games each, I expected 13 × 162 = 2106 results, so it appeared the database was missing 2 Yankee games since 2004. To find which seasons the games were missing from, I parsed the SDQL results in Excel and made a quick Pivot Table. It showed that 2007 had 163 regular-season games (I can't remember, but there was probably a tie-breaking game played to get into the playoffs), while 2004, 2010, and 2011 have only 161 games each. So it seems 3 games are actually missing from the database, partly offset by the extra game in 2007. See the table below.
| Season | Games (Count of Date) |
|---|---|
| 2004 | 161 |
| 2005 | 162 |
| 2006 | 162 |
| 2007 | 163 |
| 2008 | 162 |
| 2009 | 162 |
| 2010 | 161 |
| 2011 | 161 |
| 2012 | 162 |
| 2013 | 162 |
| 2014 | 162 |
| 2015 | 162 |
| 2016 | 162 |
| Grand Total | 2104 |
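The Pivot Table step boils down to grouping games by season and counting them. Here's a minimal Python sketch of the same check. It assumes each SDQL row has been exported as a `(date, runs, o_runs)` tuple with `date` as a YYYYMMDD integer. That layout is my assumption about the export, not a documented SDQL format, and the helpers `games_per_season` and `irregular_seasons` are hypothetical names made up for illustration.

```python
from collections import Counter

# Assumed (hypothetical) row layout: (date, runs, o_runs), where date is a
# YYYYMMDD integer such as 20160403. Not a documented SDQL export format.
Game = tuple[int, int, int]


def games_per_season(games: list[Game]) -> Counter:
    """Count games per season, like a Pivot Table of 'Count of Date' by year."""
    return Counter(date // 10000 for date, _runs, _o_runs in games)


def irregular_seasons(counts: Counter, expected: int = 162) -> dict[int, int]:
    """Map each season whose game count differs from `expected` to the difference."""
    return {season: n - expected
            for season, n in sorted(counts.items())
            if n != expected}


if __name__ == "__main__":
    # Made-up rows, just to show the grouping by year.
    sample = [(20040404, 3, 1), (20040405, 5, 7), (20050403, 2, 4)]
    print(games_per_season(sample))  # Counter({2004: 2, 2005: 1})

    # Season counts copied from the table above.
    season_counts = Counter({2004: 161, 2005: 162, 2006: 162, 2007: 163,
                             2008: 162, 2009: 162, 2010: 161, 2011: 161,
                             2012: 162, 2013: 162, 2014: 162, 2015: 162,
                             2016: 162})
    print(sum(season_counts.values()), len(season_counts) * 162)  # 2104 2106
    print(irregular_seasons(season_counts))
    # {2004: -1, 2007: 1, 2010: -1, 2011: -1}
```

Listing the per-season differences makes it obvious that the 2-game shortfall is really 3 missing games plus 1 extra game, which a plain total hides.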
From there, with the SDQL data parsed into Excel columns, I used Excel's MEDIAN function to find the median number of runs, i.e. the middle value when every game's runs are sorted from lowest to highest, so that half the games fall on each side of it. (With an even number of games, such as 2104, Excel averages the two middle values.) The median runs scored by the Yankees is 5, the median runs scored by their opponents is 4, and the median total runs in games the Yankees play is 9. By comparison, the Yankees average 5.05608365 runs scored per game, allow an average of 4.45152091 runs per game, and their games average 9.507604563 total runs.
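You can get the same medians and averages outside Excel too. This sketch uses Python's standard `statistics` module, whose `median` (like Excel's MEDIAN) averages the two middle values when there's an even number of games. It assumes each game is a hypothetical `(date, runs, o_runs)` tuple. `run_summary` is a made-up helper, and the sample rows are invented, not real Yankee results.

```python
from statistics import mean, median

# Assumed (hypothetical) row layout: (date, runs, o_runs).
Game = tuple[int, int, int]


def run_summary(games: list[Game]) -> dict[str, tuple[float, float]]:
    """Return {label: (median, mean)} for runs scored, allowed, and game totals."""
    scored = [runs for _date, runs, _o_runs in games]
    allowed = [o_runs for _date, _runs, o_runs in games]
    totals = [r + o for r, o in zip(scored, allowed)]
    return {label: (median(values), mean(values))
            for label, values in (("scored", scored),
                                  ("allowed", allowed),
                                  ("total", totals))}


if __name__ == "__main__":
    # Made-up games, not real Yankee results.
    sample = [(20160405, 3, 5), (20160406, 8, 2),
              (20160407, 5, 4), (20160408, 1, 6)]
    for label, (med, avg) in run_summary(sample).items():
        print(f"{label}: median={med}, mean={avg}")
    # scored: median=4.0, mean=4.25
    # allowed: median=4.5, mean=4.25
    # total: median=8.5, mean=8.5

    # Medians don't add: both medians are 2 here, but the median total is 11.
    demo = run_summary([(20160501, 1, 10), (20160502, 10, 1), (20160503, 2, 2)])
    print(demo["scored"][0], demo["allowed"][0], demo["total"][0])  # 2 2 11
```

The last check is worth keeping in mind for totals bets: the average game total always equals average runs scored plus average runs allowed, but the median total doesn't have to equal the sum of the two medians. The Yankees' 5 + 4 = 9 happens to line up, but that won't hold in general.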