Data


Danny

Mar 12, 2013, 5:27:11 AM
to Machine March Madness
If you come across other sources of data, please post them here.

Here's one:
http://www.reddit.com/r/MachineLearning/comments/1a444o/machine_march_madness_2013/c8u25wg

Dr. Pain

Mar 12, 2013, 10:05:06 AM
to machine-ma...@googlegroups.com
If someone needs specific data, post here.  I'll be glad to provide it if I have it.

Danny

Mar 12, 2013, 10:39:17 AM
to machine-ma...@googlegroups.com
Also, here is the data from past years:

Danny

Mar 12, 2013, 2:43:53 PM
to Machine March Madness

The Regressor

Mar 19, 2013, 6:28:20 PM
to machine-ma...@googlegroups.com
I added data for this past season and the new 2013 bracket to the GitHub repo.

You will have to change the code to read in the new GameResults_201213.tsv and 2013_bracket.py.

Note that I put in random winners for the couple of games that are still pending. 

Enjoy!

Jasper

Zachary Mayer

Mar 19, 2013, 9:02:53 PM
to machine-ma...@googlegroups.com
I'd really like to calculate a "distance from home" variable for each team in a game, but so far can't even find a comprehensive list of all the stadium locations.  Any ideas?

Scott Turner

Mar 19, 2013, 9:39:28 PM
to machine-ma...@googlegroups.com
On Tue, Mar 19, 2013 at 9:02 PM, Zachary Mayer <zach....@gmail.com> wrote:
> I'd really like to calculate a "distance from home" variable for each team in a game, but so far can't even find a comprehensive list of all the stadium locations.  Any ideas?

I've read an academic paper that did that for football, and I believe that Nate Silver does that for his predictions, but I don't know of a ready source of that data.  For a coarse estimation, you could probably use the state capital for each state as the single location in that state and then calculate distances between the capitals.

This page has the lat/lons for all the US state capitals:
http://www.xfront.com/us_states/

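Here are the geographical centers if you want to use those (latitude in degrees north, then longitude in degrees west, so negate the longitude if your tool expects signed coordinates):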

Alabama 12 mi. SW of Clanton 32.9167 86.6333
Alaska 60 mi. NW of Mt. McKinley 64.8667 152.5000
Arizona 55 mi. ESE of Prescott 34.3833 111.8833
Arkansas 12 mi. NW of Little Rock 34.9500 92.3167
California 38 mi. E of Madera 37.1000 120.2167
Colorado 30 mi. NW of Pikes Peak 39.1333 105.7167
Connecticut at East Berlin 41.7000 72.7667
Delaware 11 mi. S of Dover 39.1000 75.6167
District of Columbia Near 4th & L Sts. N.W. 39.1667 76.8500
Florida 12 mi. NNW of Brooksville 28.1333 81.7667
Georgia 18 mi. SE of Macon 32.8333 83.6000
Hawaii near Maui Island 20.9667 157.3667
Idaho at Custer, SW of Challis 44.3167 115.0167
Illinois 28 mi. NE of Springfield 40.1333 89.3667
Indiana 14 mi. NNW of Indianapolis 40.0000 86.2667
Iowa 5 mi. NE of Ames 42.0667 93.4000
Kansas 15 mi. NE of Great Bend 38.6333 98.8333
Kentucky 3 mi. NNW of Lebanon 37.4333 85.5667
Louisiana 3 mi. SE of Marksville 30.9833 92.5667
Maine 18 mi. N of Dover 45.2833 69.2333
Maryland 4½ mi. NW of Davidsonville 39.5167 77.4167
Massachusetts in northern Worcester 42.4000 72.1667
Michigan 5 mi. NNW of Cadillac 45.1667 84.9833
Minnesota 10 mi. SW of Brainerd 46.1000 95.4167
Mississippi 9 mi. WNW of Carthage 32.9500 89.7167
Missouri 20 mi. SW of Jefferson City 38.6000 92.7667
Montana 11 mi. W of Lewistown 47.1667 109.6833
Nebraska 10 mi. NW of Broken Bow 41.6000 99.9667
Nevada 26 mi. SE of Austin 39.5500 117.0667
New Hampshire 3 mi. E of Ashland 43.7167 71.6167
New Jersey 5 mi. SE of Trenton 40.1000 74.6333
New Mexico 12 mi. SSW of Willard 34.5167 106.2167
New York 12 mi. S of Oneida and 26 mi. SW of Utica 43.1000 76.0167
North Carolina 10 mi. NW of Sanford 35.6333 79.5000
North Dakota 5 mi. SW of McClusky 47.5167 100.5833
Ohio 25 mi. NNE of Columbus 40.4667 82.8167
Oklahoma 8 mi. N of Oklahoma City 35.5667 97.7500
Oregon 25 mi. SSE of Prineville 43.8833 121.0833
Pennsylvania 2½ mi. SW of Bellefonte 41.0167 77.8667
Rhode Island 1 mi. SSW of Crompton 41.7167 71.6667
South Carolina 13 mi. SE of Columbia 33.9500 80.9333
South Dakota 8 mi. NE of Pierre 44.4167 100.5833
Tennessee 5 mi. NE of Murfreesboro 35.9000 86.6667
Texas 15 mi. NE of Brady 31.3333 99.5333
Utah 3 mi. N of Manti 39.4167 111.7000
Vermont 3 mi. E of Roxbury 44.0167 72.7167
Virginia 5 mi. SW of Buckingham 37.5333 78.6833
Washington 10 mi. WSW of Wenatchee 47.3333 120.2833
West Virginia 4 mi. E of Sutton 38.7333 80.7333
Wisconsin 9 mi. SE of Marshfield 44.4333 89.8833
Wyoming 58 mi. ENE of Lander 43.0167 107.7167
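
A minimal sketch, in Python, of reading rows like the ones above into coordinates. It assumes each row ends with a latitude and a longitude, and that every longitude is west of Greenwich, so it is returned as a negative number. The names `ROW` and `parse_center` are made up for this illustration. The label keeps the state name and the description together, because separating them reliably would need a list of state names.

```python
import re

# A row ends with "latitude longitude", e.g. "... 32.9167 86.6333".
ROW = re.compile(r"^(?P<label>.+?)\s+(?P<lat>\d+\.\d+)\s+(?P<lon>\d+\.\d+)$")


def parse_center(line):
    """Return (label, lat, lon) for one row, with lon negated (degrees west)."""
    match = ROW.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized row: {line!r}")
    lat = float(match.group("lat"))
    lon = -float(match.group("lon"))  # assumption: all longitudes are west
    return match.group("label"), lat, lon


rows = [
    "Alabama 12 mi. SW of Clanton 32.9167 86.6333",
    "New York 12 mi. S of Oneida and 26 mi. SW of Utica 43.1000 76.0167",
]
centers = [parse_center(row) for row in rows]
```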

Then you just need the code to calculate distance from lat/lon, which isn't hard to code but might already be available depending upon what language/tool you're using.
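
For anyone without a ready-made function, here is a minimal sketch of one common way to do it in Python, the haversine formula. It treats the Earth as a sphere with a mean radius of 6371 km, which is an approximation, and it expects signed coordinates in degrees (west longitudes negative). The function name `haversine_km` is chosen for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in signed degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing the value slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Example: Alabama and Georgia centers from the list, longitudes negated.
alabama_to_georgia = haversine_km(32.9167, -86.6333, 32.8333, -83.6000)
```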

Wolfram Alpha does a pretty good job with queries like this:

http://www.wolframalpha.com/input/?i=distance+from+the+university+of+pennsylvania+to+san+jose+california

if you want to iterate through all the teams that way.

-- Scott

Zachary Mayer

Mar 22, 2013, 9:17:36 AM
to machine-ma...@googlegroups.com
Thanks a lot for the suggestions.  I didn't have time to incorporate this into my model this year, but perhaps next year.  Do you know of a historical data source for game locations? I know most games are played at the home team's location, but some are played at a neutral site, and I'd like to know where those sites are.  After that, the plan is basically:

1. Collect a list of all unique game locations (e.g. University of Pennsylvania or San Jose, California)
2. Geocode all the locations using Google's geocoding API (maybe make some corrections by hand)
3. Calculate a distance matrix between all the locations, using a geographic distance formula (e.g. http://en.wikipedia.org/wiki/Great-circle_distance)
4. Assign each team to a home location, and then look up the distance to each game they played
5. Assign each tournament game in the 2014 bracket to a location, so I can use my model to make predictions for a given pair of teams and their respective locations.

The most difficult part will probably be getting locations for neutral games.  I think just university names should be enough for Google to geocode.
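
Steps 3 and 4 of the plan could look roughly like the Python sketch below. Everything in it is illustrative: the three coordinates are state centers from the list earlier in the thread, standing in for geocoded venues, and the team names, the `home` mapping, and the function names are hypothetical. The geocoding step itself is not shown.

```python
import math
from itertools import combinations


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, a)))


# Placeholder locations (lat, lon); real ones would come from geocoding.
locations = {
    "Pennsylvania": (41.0167, -77.8667),
    "California": (37.1000, -120.2167),
    "Indiana": (40.0000, -86.2667),
}


def distance_matrix(locs):
    """Step 3: symmetric lookup table keyed by (location, location)."""
    dist = {(name, name): 0.0 for name in locs}
    for a, b in combinations(locs, 2):
        dist[a, b] = dist[b, a] = haversine_km(*locs[a], *locs[b])
    return dist


# Step 4: hypothetical assignment of teams to home locations.
home = {"Team A": "Pennsylvania", "Team B": "California"}


def travel_km(team, game_location, dist, home):
    """Distance from a team's home location to the game location."""
    return dist[home[team], game_location]


dist = distance_matrix(locations)
team_a_trip = travel_km("Team A", "Indiana", dist, home)
team_b_trip = travel_km("Team B", "Indiana", dist, home)
```

The two trip lengths (or their difference) could then be used as features for a game between Team A and Team B played at the third location.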

Scott Turner

Mar 22, 2013, 11:02:31 AM
to machine-ma...@googlegroups.com
On Fri, Mar 22, 2013 at 9:17 AM, Zachary Mayer <zach....@gmail.com> wrote:
> Do you know of a historic data source for game locations?

ESPN provides locations for neutral-court games, but they're typically something like "ARMED FORCES CLASSIC AT RAMSTEIN Germany", so using them would (I'd think) take a lot of manual intervention.
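
As a rough first pass before the manual cleanup, a string in that shape could be split at the word "AT" to separate the event name from the place. This Python sketch is a guess based only on the single example above. Other formats would fall through unchanged and still need hand correction, and the function name is made up for this illustration.

```python
import re


def split_event_and_place(raw):
    """Split 'EVENT AT PLACE' at the first standalone 'AT'.

    Returns (event, place), or (None, raw) when no ' AT ' is present.
    """
    text = raw.strip()
    parts = re.split(r"\s+AT\s+", text, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) == 2:
        return parts[0], parts[1]
    return None, text


event, place = split_event_and_place("ARMED FORCES CLASSIC AT RAMSTEIN Germany")
```

The `place` part is what would be passed to a geocoder.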

-- Scott
