Revised Statistics Proposal


Jason Johnson

Dec 30, 2013, 12:29:59 PM
to sports-sch...@googlegroups.com
Hi all,

As with the Sports Event proposal, as I started to write up the collective proposal for our sports extension, I identified a potential need to revise our previous statistics proposal as well.  You can find the new proposal in the google drive folder I shared.  Please take the time to review and provide comments in that document or this thread as appropriate. 

My thoughts on the revision are:

- we should include 'startDate', 'endDate', and 'event' properties to support capturing the scope of the statistics being defined
- these new properties should be maintained at a top level 'Statistics' class which applies to all manner of statistics, including Sports
- renamed 'bonusPoints' to 'pointsBonus' to align with 'points' and 'pointsDeducted'
- added 'penaltiesAgainst' as a top level stat, since this applies to multiple sports
- added 'gamesPlayed' which will apply broadly to both teams and more importantly athletes
- removed 'eventOutcome' and 'score' to be captured as part of the proposed 'CompetitionResults' class (see new event proposal)
- introduced a proposed subclass naming scheme of '[Sport][Team/Player]Statistics' akin to '[domain][type]Statistics'
- identified and illustrated a likely need to leverage 'domainIncludes' to apply statistics like 'turnOvers' across multiple parallel statistic subclasses like 'american football' and 'basketball'
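
To make the list above a bit more concrete, here is a rough Python sketch of a statistics record scoped by 'startDate', 'endDate', and an optional 'event'. The 'Statistics' type and the property names come from the proposal in this thread and are not published schema.org terms; the helper name `make_statistics` and all values are made up for illustration.

```python
import json
from datetime import date


def make_statistics(start, end, event=None, **stats):
    """Build a JSON-LD-style dict scoped by a date range and, optionally, an event."""
    if end < start:
        raise ValueError("endDate must not precede startDate")
    record = {
        "@type": "Statistics",  # hypothetical top-level class from the proposal
        "startDate": start.isoformat(),
        "endDate": end.isoformat(),
    }
    if event is not None:
        record["event"] = event  # e.g. a reference to a single sports event
    record.update(stats)  # stat properties such as gamesPlayed
    return record


# Placeholder values, for illustration only.
season_stats = make_statistics(
    date(2013, 9, 1),
    date(2013, 12, 29),
    gamesPlayed=16,
    pointsBonus=0,
    penaltiesAgainst=0,
)
print(json.dumps(season_stats, indent=2))
```

Leaving 'event' out would describe stats over a date range (say, a season), while passing an event would scope the same properties to a single game.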

Hope the holidays are treating all of you well.

Cheers,

- Jason

Paul Kelly

Jan 2, 2014, 7:57:55 PM
to sports-sch...@googlegroups.com
On 2013-12-30, at 12:29 PM, Jason Johnson <jasjo...@gmail.com> wrote:
>
> - introduced a proposed subclass naming scheme of '[Sport][Team/Player]Statistics' akin to '[domain][type]Statistics'

Player and team stats are generally the same. SportsML uses the same sets for each. We could simplify by just having, for example, AmericanFootballStatistics. The context would be implied in the markup, right?


> - identified and illustrated a likely need to leverage 'domainIncludes' to apply statistics like 'turnOvers' across multiple parallel statistic subclasses like 'american football' and 'basketball'

So turnovers will just appear under both football and basketball sets? We're not going to create a special category of stats that are used in more than one sport, are we?

Tom Grahame BBC

Jan 3, 2014, 10:16:17 AM
to sports-sch...@googlegroups.com
I think we still need 'score'. I think this is an understood term, as in "the final score was five, nil." It's often rendered as a single string, please see the example here: https://gist.github.com/tfgrahame/7583212

I'm not sure of the need for 'domainIncludes'. Could not each subclass simply use the same vocabulary term? This will happen anyway when different sports use the same term with different semantics. For example, I understand 'saves' means something different in Baseball to Hockey but both statistics classes will need exactly the same syntactic property.

Is Division a suitable term? Is it interchangeable with League, for example? If not I would expect to see rankInLeague etc too.

Jason Johnson

Jan 6, 2014, 1:26:31 PM
to sports-sch...@googlegroups.com, pa...@xmlteam.com


On Thursday, January 2, 2014 4:57:55 PM UTC-8, Paul Kelly wrote:
> On 2013-12-30, at 12:29 PM, Jason Johnson <jasjo...@gmail.com> wrote:
>>
>> - introduced a proposed subclass naming scheme of '[Sport][Team/Player]Statistics' akin to '[domain][type]Statistics'
>
> Player and team stats are generally the same. SportsML uses the same sets for each. We could simplify by just having, for example, AmericanFootballStatistics. The context would be implied in the markup, right?

I worked through some examples, and you are right: in general, most stats apply to both players and teams, especially given that player-specific stats can be rolled up to the team in aggregate.


>> - identified and illustrated a likely need to leverage 'domainIncludes' to apply statistics like 'turnOvers' across multiple parallel statistic subclasses like 'american football' and 'basketball'
>
> So turnovers will just appear under both football and basketball sets? We're not going to create a special category of stats that are used in more than one sport, are we?

That is correct. 

Jason Johnson

Jan 6, 2014, 2:46:34 PM
to sports-sch...@googlegroups.com


On Friday, January 3, 2014 7:16:17 AM UTC-8, Tom Grahame BBC wrote:
> I think we still need 'score'. I think this is an understood term, as in "the final score was five, nil." It's often rendered as a single string, please see the example here: https://gist.github.com/tfgrahame/7583212


This is supported via the proposed 'CompetitionResult' in the sports event proposal, right?  I have added an example to the google document to illustrate.
 

> I'm not sure of the need for 'domainIncludes'. Could not each subclass simply use the same vocabulary term? This will happen anyway when different sports use the same term with different semantics. For example, I understand 'saves' means something different in Baseball to Hockey but both statistics classes will need exactly the same syntactic property.

There are two options to support these scenarios.  Use 'domainIncludes' or maintain those properties in a superclass of all classes that will need them.  For example, to support the latter option for 'saves', we would need to define the 'saves' property within a superclass of hockey and baseball statistics.  Using 'domainIncludes' is a more elegant solution as it doesn't require creating arbitrary superclasses to support organizing properties - something Schema.Org in general has tried to avoid.
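
As a rough illustration of the two options, here is a small Python sketch that models each as a lookup. It is not how schema.org tooling works internally, and every class name in it (including the deliberately arbitrary 'SavesStatistics' superclass) is hypothetical.

```python
# Option A: 'domainIncludes' - each property lists every class it applies to.
DOMAIN_INCLUDES = {
    "turnOvers": {"AmericanFootballStatistics", "BasketballStatistics"},
    "saves": {"HockeyStatistics", "BaseballStatistics"},
}


def allowed_by_domain(prop, cls):
    """True if the property names the class directly in its domain."""
    return cls in DOMAIN_INCLUDES.get(prop, set())


# Option B: a shared superclass that exists only to hold the property.
SUPERCLASS = {
    "HockeyStatistics": "SavesStatistics",    # hypothetical grouping class
    "BaseballStatistics": "SavesStatistics",
    "SavesStatistics": "SportsStatistics",
    "BasketballStatistics": "SportsStatistics",
}
PROPERTIES_ON = {"SavesStatistics": {"saves"}}


def allowed_by_inheritance(prop, cls):
    """Walk up the class hierarchy looking for a class that defines the property."""
    while cls is not None:
        if prop in PROPERTIES_ON.get(cls, set()):
            return True
        cls = SUPERCLASS.get(cls)
    return False


print(allowed_by_domain("saves", "HockeyStatistics"))
print(allowed_by_inheritance("saves", "BaseballStatistics"))
```

Both lookups give the same answer for 'saves', but option B needs a new grouping class for every combination of sports that happens to share a property, which is the overhead 'domainIncludes' avoids.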
 

> Is Division a suitable term? Is it interchangeable with League, for example? If not I would expect to see rankInLeague etc too.

Potentially there is a need for both.  If we can think of a more elegant solution than creating '_InLeague' and '_InDivision' versions of a bunch of stats, I'm happy to hear it.
 

Tom Grahame BBC

Jan 7, 2014, 5:16:32 AM
to sports-sch...@googlegroups.com


On Monday, January 6, 2014 7:46:34 PM UTC, Jason Johnson wrote:


> On Friday, January 3, 2014 7:16:17 AM UTC-8, Tom Grahame BBC wrote:
>> I think we still need 'score'. I think this is an understood term, as in "the final score was five, nil." It's often rendered as a single string, please see the example here: https://gist.github.com/tfgrahame/7583212


> This is supported via the proposed 'CompetitionResult' in the sports event proposal, right?  I have added an example to the google document to illustrate.

Great, I can see this now, thanks.
 

>> I'm not sure of the need for 'domainIncludes'. Could not each subclass simply use the same vocabulary term? This will happen anyway when different sports use the same term with different semantics. For example, I understand 'saves' means something different in Baseball to Hockey but both statistics classes will need exactly the same syntactic property.

> There are two options to support these scenarios.  Use 'domainIncludes' or maintain those properties in a superclass of all classes that will need them.  For example, to support the latter option for 'saves', we would need to define the 'saves' property within a superclass of hockey and baseball statistics.  Using 'domainIncludes' is a more elegant solution as it doesn't require creating arbitrary superclasses to support organizing properties - something Schema.Org in general has tried to avoid.

I've had a look at https://schema.org/domainIncludes and understand now; I agree that it is the better choice.
 

>> Is Division a suitable term? Is it interchangeable with League, for example? If not I would expect to see rankInLeague etc too.

> Potentially there is a need for both.  If we can think of a more elegant solution than creating '_InLeague' and '_InDivision' versions of a bunch of stats, I'm happy to hear it.
 
There's some disagreement here about the meaning of 'division' and 'league'. Do we need to state _inDivision at all? Could we not just have rank? How about rankInCompetition?

Jason Johnson

Jan 7, 2014, 11:30:24 AM
to sports-sch...@googlegroups.com
In the US, the most common grouping for rankings (place) ...
... in the big 4 professional sports is Division (which are part of a conference - except in Baseball)
... in college sports is Conference (which are part of a division)
... in soccer, it is Conference (which is part of a league)

You could also argue that there might be a desire to include rankings at higher (parent) levels as well, though this is much less common
- Seahawks had best record in the NFC (conference) - same for basketball, hockey
- Seahawks tied for best record in the NFL (league) - same for basketball, hockey
- Yankees had the best record in AL (MLB league - equivalent to NFL conference)
- Yankees had the best record in MLB (also MLB league - equivalent to NFL league) - note ESPN uses the term 'Overall' here

Given the lack of consistency in usage of league vs conference vs division (especially in baseball where league means two levels of grouping) and the bloating that would occur if we allowed for 'InConference', 'InDivision', and 'InLeague' versions of team stats, I think our best option might be to simply include a 'withinGroup' property of 'SportsStatistic' with a range of 'String' - recommending that folks stick with the basics of 'Conference', 'Division', and 'League' with exceptions only when needed (e.g. for MLB).
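
A rough Python sketch of what that could look like, assuming a single 'rank' property paired with 'withinGroup'. The property name 'rank', the helper names, and the values are made up for illustration; only 'withinGroup', 'SportsStatistic', and the three recommended strings come from the suggestion above.

```python
RECOMMENDED_GROUPS = ("Conference", "Division", "League")


def ranking(rank, within_group):
    """Build a stat whose grouping is a free-form string rather than a separate property."""
    if not isinstance(within_group, str) or not within_group:
        raise ValueError("withinGroup must be a non-empty string")
    return {
        "@type": "SportsStatistic",
        "rank": rank,  # hypothetical property name
        "withinGroup": within_group,
    }


def uses_recommended_group(stat):
    """True when the stat sticks to one of the three recommended group names."""
    return stat["withinGroup"] in RECOMMENDED_GROUPS


division_rank = ranking(1, "Division")
overall_rank = ranking(1, "Overall")  # an exception, in the style ESPN uses for MLB

print(uses_recommended_group(division_rank))
print(uses_recommended_group(overall_rank))
```

Because the range is a plain string, exceptions like 'Overall' remain valid markup; a check like `uses_recommended_group` could flag them for review without rejecting them.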