Towards a JSON-stat Query Language (JSTQL)

Xavier Badosa

Oct 23, 2013, 11:00:39 AM
The JSON-stat response format is documented and already in use. It is time to define a standard query language for JSON-stat.

These are some of the desirable features of the JSON-stat Query Language (JSTQL):

1) Creation of bundles

JSON-stat responses allow multiple datasets. To minimize the number of requests, JSTQL should provide a way of getting a single response with several datasets.

For example:

api/jsonstat?dataset=cpi,gdp
api/cpi.json+gdp.json
etc.
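
A minimal sketch of how a server might read the first form, assuming a hypothetical "dataset" query parameter holding comma-separated dataset ids. The function names and the in-memory "store" are illustrative only and are not part of any JSON-stat specification:

```python
from urllib.parse import urlsplit, parse_qs


def bundle_ids(url):
    """Return the dataset ids listed in the (hypothetical) `dataset` parameter."""
    query = parse_qs(urlsplit(url).query)
    ids = []
    for raw in query.get("dataset", []):
        # "cpi,gdp" -> ["cpi", "gdp"]; empty fragments are ignored
        ids.extend(part for part in raw.split(",") if part)
    return ids


def build_bundle(url, store):
    """Assemble one response object holding every requested dataset found in `store`."""
    return {ds_id: store[ds_id] for ds_id in bundle_ids(url) if ds_id in store}


# Stand-in for whatever backend holds the datasets (an assumption).
store = {"cpi": {"value": [1]}, "gdp": {"value": [2]}, "pop": {"value": [3]}}
bundle = build_bundle("api/jsonstat?dataset=cpi,gdp", store)
```

The second form (api/cpi.json+gdp.json) would need a path-based parser instead; only the parsing step changes, the bundle assembly stays the same.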

2) Data selections 

A dataset can include many dimensions with many categories. JSTQL should provide a way of getting only the categories and dimensions of interest.

For example, to select from a "pop" dataset only dimensions "age" and "sex", and inside "sex" only category "F":

pop/sex/F,pop/age (dataset/dimension/category)

Such selections would affect not only the cells retrieved but also the "dimension" and "category" properties.
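
As a rough illustration, the dataset/dimension/category expression above could be parsed into a nested structure like this. The syntax is only the one proposed in this message, and the output shape (None meaning "all categories") is an assumption made for the sketch:

```python
def parse_selection(expr):
    """Parse 'pop/sex/F,pop/age' into {dataset: {dimension: [categories] or None}}.

    None stands for "every category of this dimension" (an assumed convention).
    """
    selection = {}
    for rule in expr.split(","):
        parts = rule.strip().split("/")
        if not 2 <= len(parts) <= 3 or not all(parts):
            raise ValueError("expected dataset/dimension[/category]: %r" % rule)
        dataset, dimension = parts[0], parts[1]
        dims = selection.setdefault(dataset, {})
        if len(parts) == 2:
            # Whole dimension requested: this overrides any category list.
            dims[dimension] = None
        elif dims.get(dimension, []) is not None:
            # Add one category unless the whole dimension was already requested.
            dims.setdefault(dimension, []).append(parts[2])
    return selection


selection = parse_selection("pop/sex/F,pop/age")
```

Here selection is {"pop": {"sex": ["F"], "age": None}}, which a server could then use to filter both the cells and the "dimension" and "category" properties.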

3) Property selections

For certain use cases, some JSON-stat properties may be unneeded. JSTQL should provide a way of avoiding the retrieval of those unneeded elements without affecting the data.

For example, to retrieve only the information of the "year" dimension (the rest are never updated in this particular dataset) and the "value" array of the full "pop" dataset:

pop/value,pop/dimension/year (property/child)

There should probably be a simplified way of specifying the same property in different places of the JSON-stat tree (for example, keep all the category indices).
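
A sketch of how such a property/child selection might be applied on the server side, assuming the response is a plain dictionary keyed by dataset id. The sample "pop" dataset is a simplified stand-in and the function name is hypothetical:

```python
_MISSING = object()  # sentinel for "no such property"


def select_properties(response, expr):
    """Keep only the branches named in expr, e.g. 'pop/value,pop/dimension/year'.

    Kept branches are shared with the original response, not deep-copied.
    """
    out = {}
    for rule in expr.split(","):
        dataset, _, path = rule.strip().partition("/")
        if dataset not in response or not path:
            continue
        keys = path.split("/")
        # Walk down the tree to find the requested property.
        node = response[dataset]
        for key in keys:
            node = node.get(key, _MISSING) if isinstance(node, dict) else _MISSING
            if node is _MISSING:
                break
        if node is _MISSING:
            continue
        # Recreate the parent objects in the output and attach the property.
        target = out.setdefault(dataset, {})
        for key in keys[:-1]:
            target = target.setdefault(key, {})
        target[keys[-1]] = node
    return out


response = {"pop": {
    "label": "Population",
    "value": [10, 11, 12, 13],
    "dimension": {
        "year": {"category": {"index": {"2012": 0, "2013": 1}}},
        "sex": {"category": {"index": {"M": 0, "F": 1}}},
    },
}}
partial = select_properties(response, "pop/value,pop/dimension/year")
```

The result keeps "value" and the "year" dimension and drops everything else, so it is only useful to a client that already holds the remaining properties from an earlier request.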

4) Rules of inclusion and exclusion

Previous examples focused on inclusion rules. There should be a way to express exclusion rules too.

For example, include all but category "F" of dimension "sex" in the "pop" dataset:

~pop/sex/F
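
One possible reading of the "~" prefix, sketched under the assumption that a rule is either an inclusion or an exclusion of a single category. The Rule type and both function names are invented for this illustration:

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    exclude: bool            # True when the rule starts with "~"
    dataset: str
    dimension: str
    category: Optional[str]  # None when the rule names a whole dimension


def parse_rule(text):
    """Parse 'pop/sex/F' (include) or '~pop/sex/F' (exclude)."""
    exclude = text.startswith("~")
    parts = (text[1:] if exclude else text).split("/")
    if not 2 <= len(parts) <= 3 or not all(parts):
        raise ValueError("expected [~]dataset/dimension[/category]: %r" % text)
    category = parts[2] if len(parts) == 3 else None
    return Rule(exclude, parts[0], parts[1], category)


def apply_rule(categories, rule):
    """Return the categories of one dimension that survive the rule."""
    if rule.category is None:
        return [] if rule.exclude else list(categories)
    # Inclusion keeps matches; exclusion keeps everything except matches.
    return [c for c in categories if (c == rule.category) != rule.exclude]


rule = parse_rule("~pop/sex/F")
remaining = apply_rule(["M", "F", "T"], rule)
```

With the sample categories M, F and T, the exclusion rule leaves M and T, while the same rule without "~" would leave only F.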


Do we need more features? 

What syntax must we use?

X.

Trygve Falch

Nov 8, 2013, 3:28:31 PM
to json...@googlegroups.com
POST or GET, or BOTH?

Xavier Badosa

Nov 8, 2013, 4:23:08 PM
to json...@googlegroups.com
All the examples are about GETting info, so I expect these calls to use the GET method.

Do you think 1), 2), 3) and 4) are difficult to implement?

X.
--
You received this message because you are subscribed to the Google Groups "json-stat" group.
To unsubscribe from this group and stop receiving emails from it, send an email to json-stat+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Trygve Falch

Nov 9, 2013, 5:19:22 AM
to json...@googlegroups.com
I agree, although I am a little concerned about the size of the GET requests. We have potentially really large cubes, and the query string could end up being really large. But that might be a corner case, and there might be a completely different solution for that.

I think we should set up a use case which we could use as a starting point when discussing the syntax of the query. 

Anyway:

1) This is relatively easy, but it requires sub-queries to each cube, so the rest of the query syntax has to be settled first. It is probably not the first thing I would try to solve.

2) This makes sense, but not all dimensions can be eliminated from a cube if the resulting dataset is going to make sense. This is metadata that should be shown to the user by a partial response when requesting the information about the metadata for a cube. If JSON-stat is going to be used for partial responses, this is a property that JSON-stat does not have at the moment. The Nordic statistical data model will have this as a requirement.

3) Hmm.. I think you need to elaborate on this one. I'm not sure what the use case is here. (I might have had too little coffee this afternoon)

4) This is one of the real power features of this query language if we nail it. The selection of values in a dimension is what typically bloats a query, so it should be precise and understandable. Maybe it should even include some type of range selection or some type of limited regex?

When we get close to a concrete syntax, I would like to make a simple parser that parses the query and gives you the dimensions back, so that we have an idea of how effective the query language is. It could probably be written as a simple JS script. I would prioritize 2 & 4 (and maybe 3), and then 1).
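
A rough sketch of that idea, written in Python rather than JS: given the categories of each dimension and an already parsed selection, it reports the size of each selected dimension so the resulting cell count can be compared with the full cube. The selection format ({dimension: [categories] or None}, with None meaning all categories) and the rule that unnamed dimensions are left out of the result are assumptions for illustration, not an agreed syntax:

```python
from math import prod


def selected_shape(categories, selection):
    """Return {dimension: number of categories kept} for the selected dimensions.

    categories: {dimension id: [category ids]} for the full cube.
    selection:  {dimension id: [category ids] or None}; None keeps all categories.
    Dimensions absent from `selection` are assumed to be left out of the result.
    """
    shape = {}
    for dim, wanted in selection.items():
        available = categories[dim]
        if wanted is None:
            shape[dim] = len(available)
        else:
            wanted_set = set(wanted)
            # Count only requested categories that really exist in the cube.
            shape[dim] = sum(1 for cat in available if cat in wanted_set)
    return shape


# Made-up cube used only for this example.
pop_categories = {
    "age": ["0-14", "15-64", "65+"],
    "sex": ["M", "F", "T"],
    "year": ["2011", "2012", "2013"],
}
shape = selected_shape(pop_categories, {"sex": ["F"], "age": None})
selected_cells = prod(shape.values())
total_cells = prod(len(cats) for cats in pop_categories.values())
```

For this made-up cube the selection keeps 3 of 27 cells. Comparing that ratio with the length of the query string would give one crude measure of how compact a given syntax is.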

I'll go and have another coffee now.

--
Trygve

Xavier Badosa

Nov 11, 2013, 2:29:46 PM
to json...@googlegroups.com
> 3) Hmm.. I think you need to elaborate on this one. I'm not sure what the use case is here. (I might have had too little coffee this afternoon)

A certain dataset contains production indices for hundreds of products, for many regions and many years. Every year a new year is added to the dataset (maybe some year is removed too) but regions and products remain the same.

After the first retrieval of this dataset, the consumer already has the product classification and the region classification (probably, the region classification is also present in other datasets previously retrieved).

The goal of 3 is to be able to request all data and dimensions BUT avoid retrieving again the categories under "region" and "product". This is particularly useful for any standardized dimension (you don't want to retrieve the NACE again and again!).

How would you retrieve a classification using JSON-stat? This is the other side of 3: you could request a dataset that uses such a classification, asking not to include the "value" array.

X.

Trygve Falch

Nov 12, 2013, 5:02:01 AM
to json...@googlegroups.com

> The goal of 3 is to be able to request all data and dimensions BUT avoid retrieving again the categories under "region" and "product". This is particularly useful for any standardized dimension (you don't want to retrieve the NACE again and again!).
> How would you retrieve a classification using JSON-stat? This is the other side of 3: you could request a dataset that uses such a classification, asking not to include the "value" array.

Hmm.. didn't JSON-stat initially have support for referenced classifications (using a URI scheme)? I can't seem to find it in the current documentation. But this is essentially what you are describing here, right?

A) the ability to retrieve a dataset without the (specified) categories and B) a way of getting the categories on demand using a URI?


Xavier Badosa

Nov 13, 2013, 1:26:08 AM
to json...@googlegroups.com
Yes, it is essential to A and B. But as I said in a reply to Erlend, it didn't make it to the final doc, as I thought URIs might be useful for several goals and a unified approach could be needed. Now that Erlend has brought up the issue, it's probably time to face it.

X

On 12/11/2013, at 11:02, Trygve Falch <trygve...@gmail.com> wrote: