A general grouping mechanism: the "collection" class (or JSON-stat for all the stages in statistics dissemination)

Xavier Badosa

Jun 28, 2015, 1:13:06 AM

to json...@googlegroups.com

(This is a follow-up of discussion https://groups.google.com/d/msg/json-stat/GjlGHJiVJoo/A_wHuKYoHIEJ. At that time, JSON-stat wasn't fully enriched with hypermedia properties.)

Extending JSON-stat to support not only dataset dissemination but also dataset discovery would mean that a single format (and a single library) could be used for all the stages in statistics dissemination. In the current state of the standard, this extension seems simple, as it only requires enabling a general grouping mechanism.

JSON-stat supports different native responses (bundle, dataset, dimension). Bundles were just a convenient way to retrieve data from different datasets in a single query (like a waste table and a population table to compute waste per person): they weren't meant as a meaningful way of grouping datasets.

A general grouping mechanism would allow a data provider to group datasets (with data or only metadata [full tabulations of a survey]), but also to group dimensions [classifications], or even to group groups [for example, grouping surveys by topic].

Implementing this general mechanism only requires adding a new "collection" class value (http://json-stat.org/format/#class). Collections are sets of links that point to resources. According to the specs, the "link" property (http://json-stat.org/format/#link) supports any IANA link relation name (http://json-stat.org/format/#relationid). The "item" relation name (https://www.iana.org/assignments/link-relations/link-relations.xhtml) is used to point to a resource that is a member of the collection represented by the context.

If this addition is approved (v. 1.03), a JSON-stat response of class "collection" would look like this:

{
  "class" : "collection",
  "label" : "The label (optional)",
  "href" : "The self link (optional)",
  "link" : {
    "item" : [...]
  }
}

This is 100% backward compatible with the current specification: it contains a subset of the properties used when class is "dataset". When encountering a class of "collection", a client must look for the link.item array, as this is the core content of this class of native response.
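
A minimal sketch of that client rule, in Python. The function name collection_items is hypothetical, and treating a missing "link" or "item" as an empty collection is an assumption of this sketch, not something the proposal states.

```python
from typing import Any, Dict, List


def collection_items(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the link.item array of a JSON-stat response of class "collection"."""
    if response.get("class") != "collection":
        raise ValueError('expected a response of class "collection"')
    # Assumption: a missing "link" or "item" is treated as an empty collection.
    return list(response.get("link", {}).get("item", []))


# Hypothetical response, shaped like the skeleton above.
root = {
    "class": "collection",
    "label": "An example collection",
    "href": "http://api.provider.org/v1",
    "link": {
        "item": [
            {"class": "collection", "label": "Topics",
             "href": "http://api.provider.org/v1/topics"},
            {"class": "collection", "label": "Statistics",
             "href": "http://api.provider.org/v1/stats"},
        ]
    },
}

labels = [item["label"] for item in collection_items(root)]
print(labels)
```

A client would typically call something like this first and then decide, item by item, whether to keep browsing (class "collection") or to retrieve a table (class "dataset").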

Let's see some examples of a possible full JSON-stat API that benefits from this addition:

1) Get the root ontology of API version 1 from provider.org

Request: http://api.provider.org/v1

Response:

{
  "class" : "collection",
  "label" : "Provider.org API root ontology",
  "href" : "http://api.provider.org/v1",
  "link" : {
    "item" : [
      {
        "class" : "collection",
        "label" : "Topics",
        "href" : "http://api.provider.org/v1/topics"
      },
      {
        "class" : "collection",
        "label" : "Statistics",
        "href" : "http://api.provider.org/v1/stats"
      }
    ]
  }
}

Explanation:

This API declares (using a response of class "collection") two paths to retrieve data: browsing through a collection of topics and browsing through a collection of statistics. This response is a collection of collections.

Of course, any provider can organize its datasets as it pleases. "Topics" and "statistics" are not elements of the JSON-stat standard; the response to the API root request is where the provider declares them.

2) Get an ordered list of available topics

Request: http://api.provider.org/v1/topics

Response:

{
  "class" : "collection",
  "label" : "Main topics",
  "href" : "http://api.provider.org/v1/topics",
  "link" : {
    "item" : [
      {
        "class" : "collection",
        "label" : "Labor",
        "href" : "http://api.provider.org/v1/topics/labor"
      },
      {
        "class" : "collection",
        "label" : "Prices",
        "href" : "http://api.provider.org/v1/topics/prices"
      }
    ]
  }
}

Explanation:

Again we have a collection of collections. The top-level topics for this provider are "Labor" and "Prices". We know that "Labor" and "Prices" are collections, but in this response it's not yet possible to know if they are collections of datasets or, again, collections of collections (subtopics). You need to continue browsing to know this. In version 1.03 "extension" (http://json-stat.org/format/#extension) will also be available inside Relation ID (http://json-stat.org/format/#relationid) and a provider would be able to use it to describe the nature of these collections.
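
The "continue browsing" step can be sketched as a recursive walk. This is only an illustration: iter_datasets, link, collection and the fetch callable are hypothetical names, and fetch is backed here by an in-memory dictionary (a trimmed copy of the example responses in this message) instead of real HTTP requests. The seen set is an assumption of this sketch, added so that a collection reachable through two paths is browsed only once.

```python
from typing import Any, Callable, Dict, Iterator, Optional, Set

Response = Dict[str, Any]


def iter_datasets(
    href: str,
    fetch: Callable[[str], Response],
    seen: Optional[Set[str]] = None,
) -> Iterator[Response]:
    """Yield every item of class "dataset" reachable from the collection at href."""
    seen = set() if seen is None else seen
    if href in seen:  # already browsed through another path
        return
    seen.add(href)
    for item in fetch(href).get("link", {}).get("item", []):
        if item.get("class") == "collection":
            yield from iter_datasets(item["href"], fetch, seen)
        elif item.get("class") == "dataset":
            yield item


BASE = "http://api.provider.org/v1"


def link(cls: str, path: str) -> Response:
    """Build a link object (labels omitted for brevity)."""
    return {"class": cls, "href": BASE + path}


def collection(path: str, *items: Response) -> Response:
    """Build a collection response whose link.item holds the given links."""
    return {"class": "collection", "href": BASE + path, "link": {"item": list(items)}}


# In-memory stand-in for the provider's API, keyed by href.
RESPONSES = {
    r["href"]: r
    for r in [
        collection("", link("collection", "/topics"), link("collection", "/stats")),
        collection("/topics", link("collection", "/topics/labor")),
        collection("/topics/labor", link("collection", "/stats/lfs")),
        collection("/stats", link("collection", "/stats/lfs")),
        collection("/stats/lfs",
                   link("dataset", "/stats/lfs/emp"),
                   link("dataset", "/stats/lfs/unemp")),
    ]
}

found = [d["href"] for d in iter_datasets(BASE, RESPONSES.__getitem__)]
print(found)
```

In this toy setup the Labor Force Survey is reachable both through topics and through the full list of stats, yet its two tables are reported once.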

3) Get the contents of the "Labor" topic

Request:

http://api.provider.org/v1/topics/labor

Response:

{
  "class" : "collection",
  "label" : "Statistics available for topic Labor",
  "href" : "http://api.provider.org/v1/topics/labor",
  "link" : {
    "item" : [
      {
        "class" : "collection",
        "label" : "Labor Force Survey",
        "href" : "http://api.provider.org/v1/stats/lfs"
      },
      {
        "class" : "collection",
        "label" : "Structure of Earnings Survey",
        "href" : "http://api.provider.org/v1/stats/ses"
      }
    ]
  }
}

Explanation:

Again a collection of collections, as in the previous request. For this provider, topic "Labor" has two children: the Labor Force Survey and the Structure of Earnings Survey. These could have been subtopics of Labor, but for this provider they are something called "Statistics" (see example 1). This is noticeable in the "href" property (.../stats/...). It also means that you can reach these "statistics" by a different path: instead of browsing through topics as we have been doing, you can find them in the full list of stats:

http://api.provider.org/v1/stats

or as the result of a search:

http://api.provider.org/v1/stats?q=survey

Again, if you want to declare these collections as something "special" you can use the "extension" property: from a format point of view, they are just plain JSON-stat collections.

4) Get the content of the Labor Force Survey

Request:

http://api.provider.org/v1/stats/lfs

Response:

{
  "class" : "collection",
  "label" : "Labor Force Survey tables",
  "href" : "http://api.provider.org/v1/stats/lfs",
  "link" : {
    "item" : [
      {
        "class" : "dataset",
        "label" : "Metadata for employment by sex and age",
        "href" : "http://api.provider.org/v1/stats/lfs/emp"
      },
      {
        "class" : "dataset",
        "label" : "Metadata for unemployment by sex and age",
        "href" : "http://api.provider.org/v1/stats/lfs/unemp"
      }
    ]
  }
}

Explanation:

Finally, we have a collection of datasets: that's the meaning of "statistic" (survey) for this provider (a collection of tables). The Labor Force Survey contains two datasets (tables): "Employment by sex and age" and "Unemployment by sex and age".

According to the "label" property (for humans), these "datasets" only contain metadata. At this stage, though, machines cannot be aware of this (dataset metadata, dataset data and dataset metadata+data all use the same native response: "dataset").

Again, the provider can use the "extension" property to specify that these datasets are metadata-only.

Could the provider have skipped this step? Yes, it could have chosen to offer datasets with full info at this stage. But in some cases there's a good reason not to (see the end of this message).

5) Get the metadata of the "Employment by sex and age" table

Request:

http://api.provider.org/v1/stats/lfs/emp

Response:

{
  "class" : "dataset",
  "label" : "Metadata for employment by sex and age",
  "href" : "http://api.provider.org/v1/stats/lfs/emp",
  "link" : {
    "describes" : [
      {
        "class" : "dataset",
        "label" : "Data for employment by sex and age",
        "href" : "http://api.provider.org/v1/stats/lfs/emp/data"
      }
    ]
  },
  "source" : "...",
  "updated" : "...",
  "dimension" : {...}
  //No "value" present
}

Explanation:

This is not a collection anymore. It's a response of class "dataset". Because "value" is missing, a machine can know that this response only contains the metadata for the table. It's also noticeable because in "link" the "describes" relation name is present (it points to the table data).
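
Both signals can be checked mechanically. The following is a rough sketch, not part of the format: is_metadata_only and data_href are hypothetical helper names, and the sample dictionary is a trimmed copy of the response above with "dimension" left empty.

```python
from typing import Any, Dict, Optional


def is_metadata_only(response: Dict[str, Any]) -> bool:
    """True for a "dataset" response that carries no "value" property."""
    return response.get("class") == "dataset" and "value" not in response


def data_href(response: Dict[str, Any]) -> Optional[str]:
    """Return the href of the first "describes" link, or None if there is none."""
    links = response.get("link", {}).get("describes", [])
    return links[0]["href"] if links else None


metadata = {
    "class": "dataset",
    "label": "Metadata for employment by sex and age",
    "href": "http://api.provider.org/v1/stats/lfs/emp",
    "link": {
        "describes": [
            {"class": "dataset",
             "label": "Data for employment by sex and age",
             "href": "http://api.provider.org/v1/stats/lfs/emp/data"}
        ]
    },
    "dimension": {},  # placeholder: elided in the example above
}

print(is_metadata_only(metadata), data_href(metadata))
```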

6) Get the "Employment by sex and age" table

Request:

http://api.provider.org/v1/stats/lfs/emp/data

Response:

{
  "class" : "dataset",
  "label" : "Employment by sex and age data",
  "href" : "http://api.provider.org/v1/stats/lfs/emp/data",
  "link" : {
    "describedby" : [
      {
        "class" : "dataset",
        "label" : "Metadata for employment by sex and age",
        "href" : "http://api.provider.org/v1/stats/lfs/emp"
      }
    ]
  },
  "source" : "...",
  "updated" : "...",
  "dimension" : {...},
  "value" : [...],
  "status" : {...}
}

Explanation:

Because this response contains, for example, the "dimension" property, a machine is aware that this is not a data-only response (it contains metadata too). The provider could have chosen not to include the metadata: in that case, thanks to the "describedby" relation, a machine knows where to go to retrieve the associated metadata.
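
A client could apply that rule roughly as follows. This is a sketch under assumptions: metadata_for and fetch are hypothetical names, fetch is a dictionary lookup standing in for an HTTP request, and the dimension content is a placeholder because the examples elide it.

```python
from typing import Any, Callable, Dict

Response = Dict[str, Any]


def metadata_for(response: Response, fetch: Callable[[str], Response]) -> Response:
    """Return a response that holds the metadata for the given dataset response."""
    if "dimension" in response:
        return response  # the metadata travels with the data
    links = response.get("link", {}).get("describedby", [])
    if not links:
        raise LookupError('no "dimension" property and no "describedby" link')
    return fetch(links[0]["href"])


META_HREF = "http://api.provider.org/v1/stats/lfs/emp"

# Placeholder metadata response: the real "dimension" content is elided in the thread.
metadata = {"class": "dataset", "href": META_HREF, "dimension": {}}

# Hypothetical data-only variant of the response above (no "dimension").
data_only = {
    "class": "dataset",
    "href": META_HREF + "/data",
    "link": {"describedby": [{"class": "dataset", "href": META_HREF}]},
    "value": [],
}

FAKE_API = {META_HREF: metadata}
resolved = metadata_for(data_only, FAKE_API.__getitem__)
print(resolved["href"])
```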

In this example, we have assumed that the provider has pre-defined the tabulation of its surveys: these surveys are just a collection of small cubes (tables). Sometimes this is not the case: sometimes what we have is a queryable supercube. That's why step 4 makes sense: you'll get the metadata of the supercube and (new) a description of how it can be queried (what dimensions are required and what aren't, how to filter a particular category of a dimension).

I'll explain how I propose to do this in another message. First, I need your feedback on the "collection" class addition. Does the API I have described in this post make sense to you?

Trygve Falch

Jun 30, 2015, 8:27:20 AM

to json...@googlegroups.com

Hi Xavier, and all!

Thank you for this update!

I am personally very happy with this. I haven't looked at all the details, but I like what I see. The really important aspect of your update is that it creates a great framework for a general-purpose statistical API that can be implemented on top of any kind of general-purpose dissemination system!

Superb work!

I will look more into it, and maybe try to make a simple prototype that translates existing statistical APIs.

--

Trygve Falch

Enterprise Architect – Development
Statistics Norway, Pb 8131 Dep 0033 Oslo, Norway
Tel: +47 450 04 958

Xavier Badosa

Jul 1, 2015, 4:03:47 PM

to json...@googlegroups.com, xba...@gmail.com, trygve...@gmail.com

Trygve,

I built this example using the full collection of datasets from Statistics Norway:

http://json-stat.org/samples/collection/

It's a "collection" response that points to thousands of "dataset" responses (they are converted on the fly from "bundle" responses).

X.

On Wednesday, July 1, 2015 at 8:18:23 PM UTC+2, Xavier Badosa wrote:

Thank you for your opinion, Trygve.

and maybe try to make a simple prototype that translates existing statistical APIs.

That would be awesome!

X.

Xavier Badosa

Jul 2, 2015, 10:31:23 AM

to json...@googlegroups.com, xba...@gmail.com, trygve...@gmail.com

The samples at JSON-stat.org are of class "bundle". I have uploaded "dataset" versions of them. They are all discoverable thanks to the JSON-stat Dataset Sample Collection

http://json-stat.org/samples/collection.json

X.

Xavier Badosa

Jul 5, 2015, 2:58:45 PM

to json...@googlegroups.com, xba...@gmail.com

Considering the proposed change is small and all feedback has been positive, I have updated the specs (1.03):

http://json-stat.org/format/

X.

Xavier Badosa

Jul 6, 2015, 1:36:09 PM

to json...@googlegroups.com, xba...@gmail.com

I have updated the JSON-stat Javascript Toolkit (0.8.0) to support collections:

https://github.com/badosa/JSON-stat

Anne Abelseth

Aug 11, 2015, 7:31:47 AM

to json-stat, xba...@gmail.com

I'm all new to json-stat, and still trying to figure out things. But it seems to me http://json-stat.org/format/ hasn't been updated with the new features.

I did a search for "1.03" on the page and only got one hit, at the top. So I can't really figure out what's new.

Maybe you could post release notes or something? (I saw your tweet of July 5 where you said version 1.03 is available, but I'd like some more details.)

Anne

Xavier Badosa

Aug 12, 2015, 2:53:32 AM

to json...@googlegroups.com

Anne,

I'm sorry it's not easy to figure out what's new in 1.03. It's explained here

https://groups.google.com/d/msg/json-stat/N4xyvgMC1CI/pu42mPeK3pEJ

You're absolutely right that some release notes or a changelog will have to be published on the site sooner or later. So far, I have considered (probably incorrectly) that they weren't strictly needed, as changes have been few (1.01 added new properties; 1.02 mainly introduced recommendations and clarifications and moved some suggested unit properties to standard JSON-stat properties). The procedure of stating the version in each property seemed enough, particularly considering that all versions are backward compatible (no property has been removed).

But it's true this probably does not work well for version 1.03. The spec was indeed updated (even the examples in the spec were updated). It does not seem so because, technically, 1.03 didn't add any new property: that's why no property has a "1.03" attached.

1.03 added a new value for "class": "collection" and allowed "extension" inside relation ID and "link" at the root level. Nevertheless, if you were not familiar with JSON-stat, these changes are probably not meaningful to you and you can read the specs ignoring the version information.

X.

Xavier Badosa

Jan 2, 2016, 6:59:31 PM

to json-stat

I have provided an example of collections of collections, the discovery mechanism described at https://groups.google.com/d/msg/json-stat/N4xyvgMC1CI/pu42mPeK3pEJ, in this single-entry-point collection:

http://json-stat.org/samples/index.json

You can use the JSON-stat Viewer to move from one collection to another until you reach the dataset you are interested in:

http://json-stat.org/format/viewer/?uri=http%3A%2F%2Fjson-stat.org%2Fsamples%2Findex.json

X.
