Re: [json] Format & interpretation of URL fragments for JSON resources


Kris Zyp

Feb 26, 2010, 11:06:34 PM
to, Jacob Davies,
You may already be aware of this, but a specification for the dot-delimited hash/fragment resolution mechanism is in the JSON Schema I-D (6.2.1) [1]. One thing to note is that you can specify alternate hash/fragment resolution mechanisms in the schema; the draft just defines dot-delimited as the default. However, we certainly do want the default to be legitimate, and I'd be glad to change the draft to slashes if there is consensus that slashes are more appropriate.

Based on prior conversations [2], though, I had thought there was agreement that the stipulations of RFC 3986 didn't need to be strictly applied to hashes, since they aren't transferred over the wire and don't identify resources (they identify internal parts of a resource, and the text you quoted from RFC 3986 refers to how resources are identified). I am certainly open to the idea that slashes might be better, but since dots are currently in use, I would only want to alter the JSON Schema draft if there is sufficient reason.



On 2/26/2010 5:34 PM, Jacob Davies wrote:

I have a question regarding the use of URL fragments (the part after
the # (hash) character in a standard URL) for navigating JSON
resources. So far as I can see from some searches & investigation,
there does not seem to be a firm consensus on the format and
interpretation of them, and there is a fairly major problem with the
most common suggestion I've seen, which is the interpretation of the
fragment as a series of dot-delimited, URL-encoded keys to be used to
navigate through a set of nested JSON objects and arrays.

So, an example. The fragment:

when used to navigate the JSON resource:

    "foo" : {
        "bar" : [

would refer to the value "xyz".

This has the attractive feature of looking like the Javascript or Java
dot-notation for navigating objects.

The problem is that dot/period is explicitly included in the list of
unreserved characters in URL-encoding (RFC 3986, section 2.3):

"For consistency, percent-encoded octets [...] period (%2E) [...]
should not be created by URI producers"

So the simple statement of the format ("dot-delimited, URL-encoded
keys") is either ambiguous or cannot accommodate keys containing
periods.

A simple example to illustrate:

    {
        "foo" : {
            "bar" : "xyz"
        },
        "foo.bar" : "abc"
    }

Does the fragment #foo.bar refer to the value "xyz" or "abc"?

Obviously it is straightforward to replace the periods in keys with %2E
and therefore distinguish between these fragments:

#foo.bar - intended to refer to "xyz"
#foo%2Ebar - intended to refer to "abc"

But there are some problems with this procedure: two minor and one major.

The first minor problem is that standard URL-encoding routines do not
replace dots with the %2E escape. The second minor problem is that it
makes it awkward to construct fragments by hand that refer to keys that
contain dots.
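
The first minor problem can be seen with any stock encoder. The sketch below uses Python's urllib.parse purely as an illustration (the choice of language and library is an assumption, not something this thread prescribes). It shows that the routine leaves a dot alone even when asked to escape as much as possible, while it does escape a slash, so producing %2E takes an extra hand-written step.

```python
from urllib.parse import quote, unquote

# A key containing a dot, as in the example above.
key = "foo.bar"

# safe="" asks quote() to escape everything it is willing to escape;
# the dot still survives because it is an unreserved character.
encoded_dot = quote(key, safe="")

# A slash is a reserved character, so the same routine does escape it.
encoded_slash = quote("foo/bar", safe="")

# Getting %2E requires an extra manual replacement after encoding.
hand_escaped = encoded_dot.replace(".", "%2E")

print(encoded_dot)            # foo.bar
print(encoded_slash)          # foo%2Fbar
print(hand_escaped)           # foo%2Ebar
print(unquote(hand_escaped))  # foo.bar
```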

The major problem is that this method of interpretation of a URL is
explicitly disallowed. Quoting again from RFC 3986:

"URIs that differ in the replacement of an unreserved character with
its corresponding percent-encoded US-ASCII octet are equivalent: they
identify the same resource."

Clearly this is not true in the above example. Replacement of %2E with
a period changes the interpretation of the fragment. Note that the
word "unreserved" is significant in the above quote - the
replacement of a reserved character by its URL-encoded counterpart IS
allowed to make a difference in distinguishing between resources.
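
To make the major problem concrete, here is a rough sketch of the kind of normalization RFC 3986 permits: decoding percent-encoded octets that correspond to unreserved characters. The function name normalize_unreserved is hypothetical and this is not taken from any particular library. Any intermediary applying such a step would turn #foo%2Ebar into #foo.bar, erasing the distinction the dot-delimited format relies on, while an escaped slash would be left alone.

```python
import re

# Unreserved characters as listed in RFC 3986 section 2.3.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

def normalize_unreserved(text):
    """Decode only those percent-escapes that stand for unreserved characters."""
    def decode(match):
        char = chr(int(match.group(1), 16))
        # Escapes of other characters (such as %2F) stay in place;
        # only their hex digits are uppercased.
        return char if char in UNRESERVED else match.group(0).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", decode, text)

print(normalize_unreserved("foo%2Ebar"))  # foo.bar
print(normalize_unreserved("foo%2Fbar"))  # foo%2Fbar
```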

So, I have a suggestion for an alternative format and interpretation,
which is:

"URL fragments contain a slash-delimited, URL-encoded list of keys
used to navigate a JSON structure from the root".

So, given the JSON resource:

    {
        "foo" : {
            "bar" : "xyz"
        },
        "foo.bar" : "abc",
        "foo/bar" : "123"
    }

the contained values can be unambiguously referred to using the
following fragments:

#foo/bar - "xyz"
#foo.bar - "abc"
#foo%2Fbar - "123"
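
A minimal sketch of how such a slash-delimited fragment might be resolved follows. The function name resolve_fragment, the handling of array indices, and the treatment of an empty fragment as the root are all assumptions made for illustration. The sample resource assumes the "abc" entry is stored under the dotted key "foo.bar". The important detail is that the fragment is split on literal slashes before each key is percent-decoded, so an escaped slash stays inside its key.

```python
import json
from urllib.parse import unquote

def resolve_fragment(document, fragment):
    """Walk a parsed JSON value using a slash-delimited, URL-encoded fragment."""
    if fragment.startswith("#"):
        fragment = fragment[1:]
    value = document
    if fragment == "":
        # Assumption: an empty fragment refers to the whole document.
        return value
    # Split first, decode second: %2F must not be treated as a delimiter.
    for raw_key in fragment.split("/"):
        key = unquote(raw_key)
        if isinstance(value, list):
            # Assumption: array elements are addressed by decimal index.
            value = value[int(key)]
        else:
            value = value[key]
    return value

resource = json.loads("""
{
    "foo" : { "bar" : "xyz" },
    "foo.bar" : "abc",
    "foo/bar" : "123"
}
""")

print(resolve_fragment(resource, "#foo/bar"))    # xyz
print(resolve_fragment(resource, "#foo.bar"))    # abc
print(resolve_fragment(resource, "#foo%2Fbar"))  # 123
```

Because the dot is never a delimiter here, a key such as "org.itemscript.Name" can be used in a fragment without any escaping.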

Slash IS a reserved character for URL-encoding, which means,
firstly, that we can legitimately distinguish between the first and
last examples there as referring to different resources; secondly,
that standard URL-encoding routines will correctly escape it, and
the wording of the format is unambiguous; and thirdly, that keys
containing dots can be easily used in URLs - in my experience such
keys are far more common than keys containing slashes, and there
have been several recent suggestions for using reversed domain names
in dotted keys as an ad-hoc namespace mechanism in JSON similar to the
use for Java package names, for instance:

"org.itemscript.Name" : "Jacob"

One final note: the use of an initial slash to indicate that the value
is rooted at the top level of the JSON structure seems unnecessary,
since fragment identifiers by definition are global to a given resource
or document.

Anyway, just some thoughts. I know that the dot-delimited fragment
format already has some momentum, but I had to make a decision about
which format to use for something I was working on recently, and after
thinking about it (and using the dot-delimited format for a while) I
found that the problems with dot-delimited were significant enough that
I didn't use it. I do think a consistent interpretation of URL fragments
in JSON resources would be quite useful though.

Jacob Davies


