Standards compliance, rfc2396

squashee

Nov 6, 2009, 7:33:15 AM
to json-query
Based on my experience with the current JSONQuery implementation in
the Dojo Toolkit and the way it is used when querying resources, I have a
few suggestions.

1. Be sure to make it RFC 2396 compliant; specifically, make it respect
the reserved characters. Building a backend for the current
implementation requires sidestepping the normal handling of URIs and
doing bad things™ in order to accommodate the filter (?), sort (/) and
map (=) operators.

2. Use specification subsets and accept or negotiate partial compliance
or capabilities. The current Persevere implementation is quite
elegant but also quite uncommon in its nature, being an object store.
Looking at the current state of backend data sources, most turn out to
be relational databases, which bring additional limitations. Things
like the recursive descent operator become quite hard to implement in
a meaningful way and should thus be outside the spec or placed in a
subset of capabilities.

I'm currently finishing up development of a backend
based on the PHP Doctrine ORM that delivers RESTful JSON and has
limited JSONQuery capability (based on the Persevere/Dojo work done
by Kris Zyp).

Jaanus Heeringson

Kris Zyp

Nov 19, 2009, 12:34:07 PM
to json-...@googlegroups.com
I've been thinking about this as well, I think you are absolutely right.
Previously I had made some suggestions of JSON Query requirements [1]
and mentioned making it work with URIs, but I think that real adherence
to existing URI standards should be a primary goal. I would suggest
these as important goals:
* Properly adhere to the URI guidelines of RFC 3986 (which obsoletes RFC 2396)
* Make it simple to parse
* Make it a proper superset of application/x-www-form-urlencoded (RFC 1866,
section 8.2.1)
* Extensible: it should be easy for users to add custom functionality in
their query handlers.
As Jaanus points out, the current JSON Query syntax forces important operators to
be percent-encoded in URLs, essentially bypassing the whole intended structure
of query strings (and the delimiters described in RFC 3986). Also, there are
plenty of form-urlencoded queries that would take on completely
different meanings in JSON Query (or be invalid) due to the different
structure. In reality, these existing standards are far more important than
JSONPath. Consequently I would like to revamp JSON Query, moving
towards a query format that complies with these standards.

Another format that is available for inspiration that I just found out
about, that very closely follows our goals, is the feed item query
language (FIQL) [2]. However, FIQL does not properly superset
application/x-www-form-urlencoded, so I don't think I would want to use
it as is.

So my basic proposal for future JSON query is this:
There are two types of terms: comparisons and calls. A comparison is of
the form name=value, just like form URL encoding. Name and value are
percent-encoded as URI components (instead of being quote-delimited). Also, the
comparator is not limited to =, but can include != and less-than and greater-than
comparators. A comparison should be interpreted as a filter on the
objects in a collection.

A call is of the form functionName(param1,param2). There will be several
predefined functions, including sort, group, distinct, and select, but
this is the main extension point, as users can have their own functions
with their own meanings.

Terms can be joined by & or | operators (meaning AND or OR, respectively).

I think this provides a relatively easy-to-parse format that is still easy
to extend.

One of the painful points of putting queries in URIs is that < and > are
not legal characters in URIs. Consequently here are some possibilities:
Current JSONQuery:
= - equal
%3C - less than
%3E - greater than
%3C= - less than or equal

FIQL:
== - equal
=lt= - less than
=gt= - greater than
=le= - less than or equal

My suggestion:
= - equal
-+ - less than
+- - greater than
-=+ - less than or equal

Anyway, here are some examples and a start on ABNF for what I am thinking:
?foo=bar%20baz # property foo must be equal to "bar baz"
?price-+10 # property price must be less than 10
?price-+10&sort(-rating) # price under 10, sorted by rating in descending order
?price-+10&select(brand)&distinct() # all the brands that have an object with a price under 10
?foo=bar&tags.contains(fun)
?foo=bar&group(brand)&select(brand,average(price))


property = 1*pchar
property-path = property [ "." property-path ]
value = 1*pchar
comparator = ( "=" / "-+" / "+-" / "-=+" / "!=" )
comparison = property-path comparator value
logic = ( "&" / "|" )
function-parameter = ( value / expression )
function-parameters = function-parameter *( "," function-parameter )
function-name = "sort" / "group" / "distinct" / "select" / extension-function-name
function-path = property-path "." ( "contains" / extension-method-name )
extension-function-name = 1*pchar
extension-method-name = 1*pchar
named-function = ( function-name / function-path ) "(" [ function-parameters ] ")"
expression = ( comparison / named-function ) [ logic expression ]
query = expression
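
To show roughly how a format like this could be parsed, here is a minimal Python sketch. It is not the author's parser: the names `parse_query`, `Comparison` and `Call` are invented for illustration. It uses the "-+" style comparators suggested above, so property names may not contain "-" or "+". It also restricts function parameters to plain values or nested calls, and returns terms and logic operators as a flat list with no precedence.

```python
import re
from dataclasses import dataclass
from urllib.parse import unquote


@dataclass
class Comparison:
    path: str
    op: str
    value: str


@dataclass
class Call:
    name: str
    args: list


# Assumption: names stop at any delimiter or comparator character.
NAME = re.compile(r"[^&|(),=!+\-]+")
# Longest comparator spellings first so "-=+" wins over shorter prefixes.
OP = re.compile(r"-=\+|-\+|\+-|!=|=")
# Values run up to the next structural delimiter; they are decoded afterwards.
VALUE = re.compile(r"[^&|(),]*")


def parse_query(query):
    """Return a flat list: term, logic, term, ... (left to right, no precedence)."""
    text = query.lstrip("?")
    pos = 0
    out = []
    while True:
        term, pos = _parse_term(text, pos)
        out.append(term)
        if pos == len(text):
            return out
        if text[pos] not in "&|":
            raise ValueError(f"expected & or | at offset {pos}")
        out.append(text[pos])
        pos += 1


def _parse_term(text, pos):
    m = NAME.match(text, pos)
    if not m:
        raise ValueError(f"expected a name at offset {pos}")
    name, pos = unquote(m.group()), m.end()
    if text.startswith("(", pos):
        return _parse_call(name, text, pos + 1)
    m = OP.match(text, pos)
    if not m:
        raise ValueError(f"expected a comparator at offset {pos}")
    op, pos = m.group(), m.end()
    m = VALUE.match(text, pos)
    return Comparison(name, op, unquote(m.group())), m.end()


def _parse_call(name, text, pos):
    args = []
    if text.startswith(")", pos):  # zero-argument call such as distinct()
        return Call(name, args), pos + 1
    while True:
        m = VALUE.match(text, pos)
        raw, pos = m.group(), m.end()
        if text.startswith("(", pos):  # nested call such as average(price)
            arg, pos = _parse_call(unquote(raw), text, pos + 1)
        else:
            arg = unquote(raw)
        args.append(arg)
        if text.startswith(",", pos):
            pos += 1
        elif text.startswith(")", pos):
            return Call(name, args), pos + 1
        else:
            raise ValueError(f"expected , or ) at offset {pos}")
```

Calling `parse_query("?price-+10&sort(-rating)")` yields a `Comparison`, the string "&", and a `Call`. Percent-decoding happens only after the delimiters have been found.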


Let me know if you have any thoughts on this. Any preferences on
comparison operators? (Maybe FIQL-style operators would be better, while
keeping equal as "=".)

[1] http://groups.google.com/group/json-query/web/json-query-requirements
[2] http://tools.ietf.org/html/draft-nottingham-atompub-fiql-00

Thanks,
Kris

Jaanus Heeringson

Nov 19, 2009, 7:10:34 PM
to json-...@googlegroups.com
> Anyway here some examples and a start on ABNF for what I am thinking:
So... this means I have to get my head around ABNF... guess it won't hurt to add another acronym to my CV...

My initial reaction to this is that it seems to be much easier to parse, at least with object trees as I do in my case... A welcome addition would be mandatory quoting of strings; otherwise it might be a bit tricky to differentiate between strings and properties in some cases.

Otherwise you seem to cover most of it except some parts of the scope definition and the chaining (?) part where you used [] before.

Taking another look at RFC 3986 suggests there is a bit of a gray zone as to which characters to consider available, though. Since we are messing with the query part of the URI, we might get away with only considering the gen-delims (section 2.2) and the ones defined in the query part (section 3.4) as reserved. This would mean we need a new character for the "current" scope operator (@) if it is deemed necessary.
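
To make the character question concrete, the sets from RFC 3986 can be written out and checked mechanically. The Python sketch below is only an illustration (the helper name `needs_encoding` is invented here, and "%" is assumed to always start a valid percent-encoded triplet). It lists the characters of a candidate query string that may not appear literally in the query component under section 3.4.

```python
import string

# Character classes from RFC 3986, section 2.2 and 2.3.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")

# Section 3.4: query = *( pchar / "/" / "?" ), where
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@".
QUERY_LITERALS = UNRESERVED | SUB_DELIMS | set(":@/?")


def needs_encoding(query):
    """Return the distinct characters that cannot appear literally in a query,
    in order of first appearance."""
    bad = []
    for ch in query:
        if ch == "%":  # assumption: always part of a valid %XX triplet
            continue
        if ch not in QUERY_LITERALS and ch not in bad:
            bad.append(ch)
    return bad
```

Under these sets, "@" is a gen-delim that is nevertheless permitted literally in the query component, which is the gray zone described above. The square brackets of the old filter syntax and the "<" and ">" comparators are not permitted literally.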

As I understand it an initial '.' indicates root?

--
MVH

Jaanus Heeringson
Underwerks Design

www.underwerks.com
+46 735 016179

Kris Zyp

Nov 20, 2009, 10:05:30 AM
to json-...@googlegroups.com


Jaanus Heeringson wrote:
>
> Anyway here some examples and a start on ABNF for what I am thinking:
>
> So... this means I have to get my head around ABNF... guess it won't
> hurt to add another acronym to my CV...
I wouldn't worry about the ABNF too much, that is more for formal
completeness, and a reference for syntactic precision. Examples are much
easier to go off of for describing the main intent.
>
> My initial reaction to this is that it seems to be much easier to
> parse at least with object trees as I do in my case... A welcome
> addition would be mandatory quoting of strings, otherwise it might be
> a bit tricky to differentiate between strings and properties in some cases
In the proposed changes, the left side of the = is always the property
name, and the right side is always the value. I realize that it
eliminates the possibility of comparing properties with other
properties, but I don't think that is done very often. The goal was to
eliminate quoting to make it easier to parse. Quoting's purpose is to
allow tokens/delimiters inside the quotes, but that forces the parser to
differentiate symbols found inside quotes from symbols
outside the quotes (which otherwise have a different meaning). In contrast,
URL encoding defines a limited set of characters that are allowed in
string values, so the parser can safely assume that =, (, and ) can
always be interpreted as delimiters and not simply as characters in
strings.
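
The practical consequence is that a parser can split on the literal delimiters first and percent-decode afterwards. A small Python sketch using the standard `urllib.parse` module (the helper name `split_comparison` is hypothetical, and it only handles a single name=value term):

```python
from urllib.parse import quote, unquote


def split_comparison(term):
    """Split on the first literal "=", then percent-decode each side."""
    name, sep, value = term.partition("=")
    if not sep:
        raise ValueError("not a comparison: " + term)
    # An "=" that belongs to the data arrives as %3D, so it cannot be
    # confused with the delimiter found above.
    return unquote(name), unquote(value)


# A value containing delimiter characters, encoded the way a client would send it.
encoded_term = "formula=" + quote("a=b(c)", safe="")
```

Here `encoded_term` contains only one literal "=", so no quoting rules are needed to recover the original value.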

And of course the other issue with quoting is that " is an illegal
character in URIs (it must be percent-encoded); ' is permitted by RFC 3986,
but only as a reserved sub-delimiter.

You can take a look at the parser I am working on here:
http://github.com/kriszyp/pintura/blob/master/lib/resource-query.js
<http://github.com/kriszyp/pintura/blob/master/lib/resource-query.js#L53>
>
> Otherwise you seem to cover most of it except some parts of the scope
> definition and the chaining (?) part where you used [] before.
>
> Taking another look at RFC 3986 suggests there is a bit of a gray zone
> as to which characters to consider available though. Since we are
> messing with the query part of the URI we might get away with only
> considering the gen-delims (2.2) and the ones defined in the query
> part (3.4) as reserved. This would mean we need a new character for
> the "current" scope operator (@) if it is deemed necessary.
>
> As I understand it an initial '.' indicates root?
The intent was that the root would always be assumed (unless a function call
needs to resolve its logical parameters using a different scope, as
"contains" may do). So "[?(@.name='value')]" would simply be "name=value".

Thanks,
Kris

Kris Zyp

Nov 20, 2009, 10:50:38 AM
to json-...@googlegroups.com


Kris Zyp wrote:
> One of the painful points of putting queries in URIs is that < and > are
> not legal characters in URIs. Consequently here are some possibilities:
> Current JSONQuery:
> = - equal
> %3C - less than
> %3E - greater than
> %3C= - less than or equal
>
> FIQL:
> == - equal
> =lt= - less than
> =gt= - greater than
> =le= - less than or equal
>
> My suggestion
> = - equal
> -+ - less than
> +- - greater than
> -=+ - less than or equal
>
After thinking about this more, perhaps it would be better to just
follow FIQL for greater/less than operators. Their operators are a
little easier on the grammar since - and + should be allowable
characters in values. And this would make it a lot easier to superset
both FIQL and www-form-urlencoded.
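
As a sketch of what that combination could look like, the Python snippet below recognises the FIQL comparison operators (`==`, `!=`, `=lt=`, `=le=`, `=gt=`, `=ge=`) and additionally accepts a bare "=" as equality, so that ordinary form-urlencoded pairs still parse. The function name `split_term` and the short operator names are invented for this example. It assumes any "=" that is part of a name or value has been percent-encoded.

```python
import re

# FIQL operators plus a bare "=" for form-urlencoded compatibility.
OPERATORS = {
    "=lt=": "lt",
    "=le=": "le",
    "=gt=": "gt",
    "=ge=": "ge",
    "==": "eq",
    "!=": "ne",
    "=": "eq",
}

# Try longer spellings first so "=lt=" is not read as "=" followed by "lt=".
_OP_RE = re.compile(
    "|".join(re.escape(op) for op in sorted(OPERATORS, key=len, reverse=True))
)


def split_term(term):
    """Split a single comparison into (property, operator name, raw value)."""
    m = _OP_RE.search(term)
    if m is None:
        raise ValueError("no comparator in term: " + term)
    return term[: m.start()], OPERATORS[m.group()], term[m.end():]
```

Because none of these operators use "-" or "+", a term such as `max-price=le=10` splits cleanly, which is the grammar advantage mentioned above.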

Kris

Dean Landolt

Nov 25, 2009, 12:59:11 PM
to json-query
> > My initial reaction to this is that it seems to be much easier to
> > parse at least with object trees as I do in my case... A welcome
> > addition would be mandatory quoting of strings, otherwise it might be
> > a bit tricky to differentiate between strings and properties in some cases
>
> In the proposed changes, the left side of the = is always the property
> name, and the right side is always the value. I realize that it
> eliminates the possibility of comparing properties with other
> properties, but I don't think that is done very often.

What about differentiating between strings and integers? And true/
false/null? I know Date isn't part of JSON but how would one signify a
date in this way? At the very least the basic JSON scalars would need
to be usable -- and if they're just coerced from the string how would
I search for the string "true"?

Kris Zyp

Nov 25, 2009, 1:15:09 PM
to json-query
Yeah, there is definitely a loss of type information, but I would
think that the type for the vast majority of properties is pretty
uniform. If you do ?foo=true, then matching either "true" or true seems
pretty acceptable, as most collections will have all strings or all
booleans. Are there really very many use cases out there that need to
query for "true" the string and exclude true the boolean?

FIQL actually conveniently defines dates in the same way that de-facto
JSON serializers do. A query could be:
?uploaded=2009-09-28T08:00:00Z
In this case one could even disambiguate strings by requiring that ":"
be escaped in strings (and they normally are within URI components, at
least JS's encodeURIComponent escapes ":").
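
A sketch of that disambiguation rule in Python: a raw (still encoded) value that matches the ISO timestamp shape with literal colons is treated as a date, while a string with the same text arrives with its colons escaped and stays a string. The helper name `coerce_raw` and the restriction to UTC "Z" timestamps are assumptions made for this example. `urllib.parse.quote` with `safe=""` stands in for a client that escapes ":".

```python
import re
from datetime import datetime, timezone
from urllib.parse import quote, unquote

# Assumption: only second-precision UTC timestamps are recognised.
ISO_UTC = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def coerce_raw(raw):
    """Interpret a still-encoded value: literal colons in ISO form mean a date."""
    if ISO_UTC.fullmatch(raw):
        parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ")
        return parsed.replace(tzinfo=timezone.utc)
    return unquote(raw)


# The same text sent as a string: its colons are escaped by the client.
as_string = quote("2009-09-28T08:00:00Z", safe="")
```

The check has to run before percent-decoding, since decoding erases the difference between the two forms.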

Another thing I had considered was providing a type disambiguation
prefix, for the (hopefully rare) use case of needing to specify a certain type:

?foo=boolean:true
?foo=string:true
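
A minimal Python sketch of how such prefixes might be handled on the server side. Only `boolean:` and `string:` come from the examples above. The `number:` prefix, the `CONVERTERS` table and the function name `coerce` are assumptions added for illustration. It relies on literal ":" being reserved, with colons inside string data arriving as %3A.

```python
from urllib.parse import unquote


def _boolean(text):
    if text not in ("true", "false"):
        raise ValueError("not a boolean: " + text)
    return text == "true"


# Prefix -> converter. "number" is a hypothetical extension, not from the thread.
CONVERTERS = {
    "string": str,
    "boolean": _boolean,
    "number": float,
}


def coerce(raw):
    """Apply a type prefix if one is present; otherwise return the decoded string."""
    prefix, sep, rest = raw.partition(":")
    if sep and prefix in CONVERTERS:
        return CONVERTERS[prefix](unquote(rest))
    # No recognised prefix: leave it as a string and let the store decide
    # how loosely to match it.
    return unquote(raw)
```

New prefixes can be added to the table without touching the parsing logic.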

But I think the vast majority of the time, ?foo=true will be sufficient.
Kris



Dean Landolt

Nov 25, 2009, 1:16:38 PM
to json-query
Of course, the supersetting of www-form-urlencoded is really nice, so
strings as default does make a lot of sense. Perhaps a prefix
character (something less common) for other scalars (or properties)
would solve this problem...

foo=true : foo == "true"
foo=`true : foo == true

And allow it to be escaped...

foo=``bar` : foo == "`bar`"

I don't really care what the character is as long as it's legal --
aesthetically the ` doesn't look so bad. Other ideas?
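
A sketch of how this prefix scheme could be decoded, in Python. Treating the text after a single prefix character as a JSON literal is an assumption made for this example, as is the function name `decode_value`. The prefix character itself is just the placeholder used in the examples above.

```python
import json

PREFIX = "`"  # placeholder prefix character from the examples above


def decode_value(text):
    """Plain text is a string; one prefix marks a scalar; two escape the prefix."""
    if text.startswith(PREFIX * 2):
        # Escaped prefix: drop one prefix character and keep the rest as a string.
        return text[1:]
    if text.startswith(PREFIX):
        # Assumption: the remainder is a JSON literal (true, false, null, number).
        return json.loads(text[1:])
    return text
```

With this rule, strings remain the default, so existing form-urlencoded queries keep their meaning.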

Dean Landolt

Nov 25, 2009, 1:24:46 PM
to json-...@googlegroups.com
I hadn't seen this reply before I sent my last one...

> Yeah, there is definitely loss in information of types, but I would
> think that the type for the vast majority of properties is pretty
> uniform. If you do ?foo=true, then matching for "true" or true seems
> pretty acceptable as most collections will have all strings or all
> booleans. Are there really very many use cases out there that need to do
> a query for "true" the string and exclude true the boolean?

Hardly any, I'll concede. But it would still be great to be able to disambiguate...

> FIQL actually conveniently defines dates in the same way that de-facto
> JSON serializers do. A query could be:
> ?uploaded=2009-09-28T08:00:00Z

Ah yes, very nice! But how do you specify just a date :)

> In this case one could even disambiguate strings by requiring that ":"
> be escaped in strings (and they normally are within URI components, at
> least JS's encodeURIComponent escapes ":").
>
> Another thing I had considered was providing a type disambiguation
> prefix, for (hopefully rare) use case of needed to specify a certain type:
>
> ?foo=boolean:true
> ?foo=string:true

I like it -- plus reserving ":" would also allow for future extensibility (and could be used to solve the date question I posed above).

> But I think the vast majority of the time, ?foo=true will be sufficient.

Indeed. But these edge cases always seem to pop up eventually -- damn you, Hofstadter! Reserving ":" AFAICT is a complete and elegant solution.
