Data rules

Danny

Feb 25, 2012, 4:20:58 PM

to Machine March Madness

This thread is for discussing what data is and is not allowed.

Danny

Feb 25, 2012, 4:36:03 PM

to Machine March Madness

The official statement of the rule is as follows (we reserve the right
to update this with clarifications as necessary):
Any data is allowed, so long as *no human judgement went into
producing it*. (I know this is unclear, because human judgement goes
into any model, but we're going to try to enforce the spirit of the
rule as best we can.) For example, all of the score and box score
data we provide, as well as RPI, Sagarin, and Pomeroy-based scores,
are allowed. Seeding, the results of AP polls, and other human-produced
rankings are not allowed. When in doubt, the burden of proof lies with
you to show that no human was in the loop, but we can discuss
specific cases.
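
As a rough illustration of how an entrant might screen their own inputs
against this rule, here is a minimal sketch. It is not an official
checker: the source labels and the function names (check_source,
filter_features) are made up for the example, and the two sets only
mirror the examples named in the rule. Anything not listed falls into
"needs discussion", in line with the burden-of-proof clause.

```python
# Hypothetical labels; the split mirrors the examples given in the rule.
ALLOWED_SOURCES = {"scores", "box_scores", "rpi", "sagarin", "pomeroy"}
DISALLOWED_SOURCES = {"seeding", "ap_poll"}


def check_source(name):
    """Classify a data source label under the 'no human judgement' rule."""
    key = name.strip().lower()
    if key in ALLOWED_SOURCES:
        return "allowed"
    if key in DISALLOWED_SOURCES:
        return "not allowed"
    # Unknown sources are not assumed to be fine; raise them on the list.
    return "needs discussion"


def filter_features(features):
    """Keep only features whose source is explicitly allowed.

    `features` maps a feature name to the label of the source it came from.
    """
    return {
        feature: source
        for feature, source in features.items()
        if check_source(source) == "allowed"
    }
```

For instance, filter_features({"off_eff": "pomeroy", "seed": "seeding"})
would keep only the "off_eff" entry.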

Danny

Feb 25, 2012, 5:27:29 PM

to Machine March Madness

Scott writes...
"I think you might want to make exceptions for at least seeding --
since you might well have models which are trained on past year's
seedings & results, etc., and seeding is really fundamental to the
tournament."

My response:
I lean towards taking a hard line on seeding data and saying "no",
but I realize that I'm probably atypical on this issue, so I'm willing
to compromise if the majority wants to include it, because--as you
say--it is pretty fundamental to the tournament.

Thoughts from others?
