About the latest JSON Pointer draft: the role of the caret ('^')


Francis Galiegue

Apr 7, 2012, 3:02:32 PM
to json-...@googlegroups.com
Hello list,

I'd like to know if I understand the grammar correctly:

json-pointer = *( "/" reference-token )
reference-token = *( unescaped / escaped )
unescaped = %x00-2E / %x30-5D / %x5F-10FFFF
escaped = "^" ( "/" / "^" )

With "^" being %x5E, the caret, it means that a "^" can only be legal
if followed by "/" or "^".

Why isn't the caret a more generalized escape character? I.e., ^ must be
followed by at least one character, and escapes that character. It
would simplify parsing quite a lot imho.
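
To make the two options concrete, here is a minimal sketch (not from the draft or from any poster) of a tokenizer for the grammar quoted above. The class name `CaretPointer` and the `strict` flag are hypothetical: with `strict` set, only "^/" and "^^" are accepted, as the grammar requires; without it, "^" escapes whatever character follows, as proposed here.

```java
import java.util.ArrayList;
import java.util.List;

final class CaretPointer
{
    // Splits a pointer into decoded reference tokens.
    // strict = true:  only "^/" and "^^" are legal escapes (grammar as quoted).
    // strict = false: "^" escapes whatever single character follows it.
    static List<String> parse(final String pointer, final boolean strict)
    {
        final List<String> tokens = new ArrayList<>();
        if (pointer.isEmpty())
            return tokens; // the empty pointer has no reference tokens
        if (pointer.charAt(0) != '/')
            throw new IllegalArgumentException("pointer must start with '/'");

        final StringBuilder token = new StringBuilder();
        for (int i = 1; i < pointer.length(); i++) {
            final char c = pointer.charAt(i);
            if (c == '/') {
                // an unescaped slash ends the current token
                tokens.add(token.toString());
                token.setLength(0);
            } else if (c == '^') {
                if (++i == pointer.length())
                    throw new IllegalArgumentException("dangling '^'");
                final char next = pointer.charAt(i);
                if (strict && next != '/' && next != '^')
                    throw new IllegalArgumentException("illegal escape ^" + next);
                token.append(next);
            } else {
                token.append(c);
            }
        }
        tokens.add(token.toString());
        return tokens;
    }

    public static void main(final String... args)
    {
        System.out.println(parse("/^/key/a^^b", true)); // [/key, a^b]
        System.out.println(parse("/a^bc", false));      // [abc]
    }
}
```

In this sketch the only difference between the two modes is a single check, which gives a rough idea of how much parsing the generalized form would save.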

Comments?
--
Francis Galiegue, fgal...@gmail.com
"It seems obvious [...] that at least some 'business intelligence'
tools invest so much intelligence on the business side that they have
nothing left for generating SQL queries" (Stéphane Faroult, in "The
Art of SQL", ISBN 0-596-00894-5)

Paul C. Bryan

Apr 9, 2012, 1:25:14 AM
to json-...@googlegroups.com
On Sat, 2012-04-07 at 21:02 +0200, Francis Galiegue wrote:
> I'd like to know if I understand the grammar correctly:
>
>    json-pointer = *( "/" reference-token )
>    reference-token = *( unescaped / escaped )
>    unescaped = %x00-2E / %x30-5D / %x5F-10FFFF
>    escaped = "^" ( "/" / "^" )
>
> With "^" being %x5E, the caret, it means that a "^" can only be legal
> if followed by "/" or "^".

Yes, this is currently the intent.

> Why isn't the caret a more generalized escape character? Ie, ^ must be
> followed by at least one character, and escapes that character. It
> would simplify parsing quite a lot imho.

Sounds good to me. Anyone else have an opinion?

Paul

Xample

Apr 10, 2012, 2:58:10 AM
to json-...@googlegroups.com
For my part, I still do not see the benefit of reinventing escaping with a caret:
- an escaping mechanism already exists in JSON;
- you only need to enforce escaping of the key when writing your JSON Pointer.

Examples:

{
"/key":"slash"
}

which in JSON is equivalent to (escaped):

{
"\/key":"slash"
}

and therefore should be accessible through:

#\/key

-> we are done…

I would be interested to know why you do not consider this simple approach.

Francis Galiegue

Apr 10, 2012, 3:57:05 AM
to json-...@googlegroups.com
I think the reason is that you cannot know in advance whether the
string value you read is JSON escaped or not. It means you have to go
through a JSON decoding routine before parsing the pointer proper. And
such a routine will not alter non encoded strings.

The following JSON strings are equivalent: "#\/\/key", "#//key",
"#/\/key", "#\//key". Only the second is "fully decoded". All of them
will return the second string when munged by a JSON string decoder.
Ultimately, as the spec currently stands, all point to key "key" under
key "" of an object instance (and not key "/key" under the root).

The choice of a caret as an escape character therefore makes sense.
You'll write "#/^/key" for the fully unescaped pointer, but you can
also write "#\/^/key" etc. The thing is, you obtain an unambiguous
pointer to a location whether the string is "raw", fully JSON encoded,
or partially JSON encoded. A good JSON API will never distinguish
between these forms. Example with Jackson:

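// Jackson 1.x API (ObjectMapper is in org.codehaus.jackson.map)
public static void main(final String... args)
    throws IOException
{
    // Two JSON texts for the same string; s2 JSON-escapes the first "/"
    final String s1 = "\"#/^/key\"";   // "#/^/key"
    final String s2 = "\"#\\/^/key\""; // "#\/^/key"
    final ObjectMapper mapper = new ObjectMapper();

    // Both lines print the decoded string value
    System.out.println(mapper.readTree(s1).getTextValue());
    System.out.println(mapper.readTree(s2).getTextValue());

    System.exit(0);
}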


This prints #/^/key for both strings, a pointer which unambiguously
points to key "/key" under the root (whether it be written "/key" or
"\/key" in the document).

Xample

Apr 10, 2012, 4:35:02 AM
to json-...@googlegroups.com
OK, but the only problematic character is still the slash, because it has a meaning in JSON Pointer but can also be used in a key.
Therefore it is the only character we must escape in a JSON Pointer.
Also, we can always unescape non-escaped content; it does not matter.
For those reasons I would strongly suggest adopting a convention (escaped slashes can only be part of a key) rather than a new way of escaping.
That said, I now understand the use of a caret, which is not in the JSON subset, but I'm not convinced that adding one more layer of indirection is worthwhile.

Francis Galiegue

Apr 10, 2012, 2:41:06 PM
to json-...@googlegroups.com
On Tue, Apr 10, 2012 at 10:35, Xample <flavien...@gmail.com> wrote:
> ok but still the only char which is problematic is the "slash" because it
> does have it's meaning in json pointer but can be used for a key too.
> Therefore this is the only char we must escape in a json pointer.

No, look: when you are given a string which is potentially a JSON
Pointer, _you cannot know in advance whether this string is JSON
encoded or not_.

Consider this sample:

\/

If the string is _not_ JSON encoded, then it would refer literally to
a / within a reference token. HOWEVER, if it was encoded, then it
would be equal to:

/

Which is _not the same at all_.

Now, compare this with:

^/

Whether you decode it once or more, the end result will always be ^/,
therefore unambiguous, which stands for a literal / in a reference
token.
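
The stability argument can be checked with a small sketch. It is an illustration only: `jsonUnescape` is a hypothetical, deliberately simplified stand-in for a real JSON string decoder and handles only the `\/` and `\\` escapes, which is enough for this comparison.

```java
final class DecodeTwice
{
    // Simplified stand-in for a JSON string decoder (assumption: only the
    // "\/" and "\\" escapes are handled; a real decoder handles many more).
    static String jsonUnescape(final String s)
    {
        final StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            final char c = s.charAt(i);
            final boolean escape = c == '\\' && i + 1 < s.length()
                && (s.charAt(i + 1) == '/' || s.charAt(i + 1) == '\\');
            if (escape)
                sb.append(s.charAt(++i)); // keep only the escaped character
            else
                sb.append(c);
        }
        return sb.toString();
    }

    public static void main(final String... args)
    {
        final String backslash = "\\/"; // the two characters \ and /
        final String caret = "^/";

        // The backslash form changes when decoded: \/ becomes /
        System.out.println(jsonUnescape(backslash));

        // The caret form is unchanged by one or several decoding passes
        System.out.println(jsonUnescape(caret));
        System.out.println(jsonUnescape(jsonUnescape(caret)));
    }
}
```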

This is why the choice of ^ as an escape for JSON Pointers makes
perfect sense to me. I was dubious at first, just like you, until I
grasped the full implications of paragraph 5 of the spec (although
this particular paragraph only seems to imply that encoding may only
have happened once, as far as I read it).

Paul C. Bryan

Apr 11, 2012, 11:11:46 AM
to json-...@googlegroups.com
On Tue, 2012-04-10 at 01:35 -0700, Xample wrote:

> ok but still the only char which is problematic is the "slash" because it does have it's meaning in json pointer but can be used for a key too.

Or, more precisely, a slash can be contained within a key.

> Therefore this is the only char we must escape in a json pointer.

Correct.

> Also we can always unescape non escaped content, it does not matter.

I'm not sure what this means.

> Therefore for those reasons I would strongly suggest adopting a convention (escaped slashes can only be part of a key) more than a new way of escaping.

I cannot control what an arbitrary JSON document uses in object member keys. If you're going to use JSON Pointer (or a spec that depends on it), it's a good idea not to use empty keys, or keys with slashes, so as to keep pointers simple.

> This said I now understand the usage of a caret which is not in the json subset, I'm not convinced by adding one more layer of indirection.

I'm not sure what you mean.

Paul

Gary Court

Apr 12, 2012, 5:00:21 PM
to json-...@googlegroups.com
Using the caret "^" character code for escaping is a bad idea,
especially when the intent is to use JSON Pointers in URIs. This is
because this character is not a valid URI character, and will
be escaped by any RFC-compliant URI parsing library. Example:

URI.normalize("#/^/foo/bar") == "#/%5E/foo/bar"
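
The same behaviour can be observed with the JDK's `java.net.URI`, as a rough sketch. The helper name `encodeFragment` is hypothetical; the multi-argument `URI` constructor is used because it quotes illegal characters, whereas the single-argument constructor would reject the raw caret.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class CaretInFragment
{
    // Builds a fragment-only URI reference; the three-argument constructor
    // percent-encodes characters that are not legal in a fragment.
    static String encodeFragment(final String fragment)
    {
        try {
            return new URI(null, null, fragment).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(final String... args)
    {
        final String encoded = encodeFragment("/^/foo/bar");
        System.out.println(encoded);                           // #/%5E/foo/bar
        // Decoding the fragment restores the caret
        System.out.println(URI.create(encoded).getFragment()); // /^/foo/bar
    }
}
```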

I propose JSON Pointer use URI component escaping rules for handling
property name delimiters. Example:

{"/foo":{"bar":true}}

becomes:

#/%2Ffoo/bar

This will make JSON Pointer URI compliant, and follows the same
reasoning why JSON Pointer was switched to use "/" instead of "." for
a delimiter.

--Gary

Francis Galiegue

Apr 12, 2012, 6:40:50 PM
to json-...@googlegroups.com

Section 1: "This syntax is intended to be easily expressed in JSON
string values and Uniform Resource Identifier (URI) [RFC3986] fragment
identifiers."

How is this a problem at all? AFAICS, the spec does not mandate
anywhere that the input string should be URL-encoded, or JSON-encoded,
or whatever.

Francis Galiegue

Apr 12, 2012, 6:56:35 PM
to json-...@googlegroups.com
And BTW this particular example is exactly why I welcome the use of the caret:

>>
>> #/%2Ffoo/bar
>>

The previous draft (I still have it in my git tree) indeed says that
the / should be encoded as such. But this basically _forces_ the input
string to be URI-escaped. URI-decode once too much and your JSON
Pointer is corrupted. This very subject was brought to light (albeit,
not exactly this way) on this list not so long ago.
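
A small sketch of the over-decoding problem, using `java.net.URI`. The helper `splitTokens` is hypothetical and does a naive split on "/" for illustration only.

```java
import java.net.URI;
import java.util.Arrays;

final class OverDecoding
{
    // Naive split of a pointer into reference tokens (no unescaping).
    static String[] splitTokens(final String pointer)
    {
        return pointer.substring(1).split("/", -1);
    }

    public static void main(final String... args)
    {
        final URI uri = URI.create("#/%2Ffoo/bar");
        final String raw = uri.getRawFragment(); // still percent-encoded
        final String decoded = uri.getFragment(); // percent-decoded once

        // Two tokens: the %2F is still recognizable as an escaped slash
        System.out.println(Arrays.toString(splitTokens(raw)));     // [%2Ffoo, bar]
        // Three tokens: the escaped slash now looks like a separator
        System.out.println(Arrays.toString(splitTokens(decoded))); // [, foo, bar]
    }
}
```

Whether the consumer receives the raw or the decoded fragment decides which document location the pointer resolves to, which is the ambiguity described above.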

The draft as it currently stands does not suffer this limitation. Nor
does it care whether the string is JSON encoded or not. What do we
more could we want, really?

Gary Court

Apr 12, 2012, 6:57:53 PM
to json-...@googlegroups.com
On Thu, Apr 12, 2012 at 4:40 PM, Francis Galiegue <fgal...@gmail.com> wrote:
> On Thu, Apr 12, 2012 at 23:00, Gary Court <gary....@gmail.com> wrote:
>> Using the carat "^" character code for escaping is a bad idea,
>> especially when the intent is to use JSON Pointers in URIs. This is
>> because this character code is not a valid URI character key, and will
>> be escaped by any RFC compliant URI parsing library. Example:
>>
>> URI.normalize("#/^/foo/bar") == "#/%5E/foo/bar"
>>
>> I propose JSON Pointer use URI component escaping rules for handling
>> property name delimiters. Example:
>>
>> {"/foo":{"bar":true}}
>>
>> becomes:
>>
>> #/%2Ffoo/bar
>>
>> This will make JSON Pointer URI compliant, and follows the same
>> reasoning why JSON Pointer was switched to use "/" instead of "." for
>> a delimiter.
>>
>> --Gary
>>
>
> Section 1: "This syntax is intended to be easily expressed in JSON
> string values and Uniform Resource Identifier (URI) [RFC3986] fragment
> identifiers."
>
> How is this a problem at all? AFAICS, the spec does not mandate
> anywhere that the input string should be URL-encoded, or JSON-encoded,
> or whatever.
>

No, but the spec DOES mandate that it must be "caret encoded". I
don't believe a new escaping mechanism needs to be introduced when an
existing one can be used.

--Gary

Francis Galiegue

Apr 12, 2012, 7:06:52 PM
to json-...@googlegroups.com
On Fri, Apr 13, 2012 at 00:57, Gary Court <gary....@gmail.com> wrote:
[...]

>>
>> How is this a problem at all? AFAICS, the spec does not mandate
>> anywhere that the input string should be URL-encoded, or JSON-encoded,
>> or whatever.
>>
>
> No, but the spec DOES mandates that it must be "carat encoded". I
> don't believe a new escaping mechanism needs to be introduced when an
> existing one can be used.
>

Sure, but then why specifically the URI-encoding mechanism? JSON is
not Web-only...

Considering the simple need for speed, with your proposal, if a % is
encountered, an implementation must look ahead at _the next two
characters_ and replace them with a / only in the appropriate case. What to
do if another sequence is encountered?
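
For illustration, the two-character lookahead could be sketched as follows. The class and method names are hypothetical, and leaving any other "%" sequence untouched is only one possible answer to the open question, not something the proposal specifies.

```java
final class PercentSlash
{
    // Decodes "%2F" (or "%2f") inside a single reference token.
    // Assumption: every other "%" sequence is copied through unchanged.
    static String decodeToken(final String token)
    {
        final StringBuilder sb = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            final char c = token.charAt(i);
            // look ahead at the next two characters, ignoring case
            if (c == '%' && token.regionMatches(true, i + 1, "2F", 0, 2)) {
                sb.append('/');
                i += 2; // skip the two hex digits
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(final String... args)
    {
        System.out.println(decodeToken("%2Ffoo")); // /foo
        System.out.println(decodeToken("%41bc"));  // %41bc (left untouched)
        System.out.println(decodeToken("100%"));   // 100%
    }
}
```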

With the proposal as it currently stands, JSON Pointers can be passed
through JSON data or URIs all the same, _unabmibguously_. Not so with
the %2F notation. ^ is %5E in URI-encoded strings, so what?

Francis Galiegue

Apr 12, 2012, 7:12:53 PM
to json-...@googlegroups.com
> [...] _unabmibguously_.

Hmm, sorry for that :/

Gary Court

Apr 12, 2012, 7:16:51 PM
to json-...@googlegroups.com
On Thu, Apr 12, 2012 at 4:56 PM, Francis Galiegue <fgal...@gmail.com> wrote:
> And BTW this particular example is exactly why I welcome the use of the caret:
>
>>>
>>> #/%2Ffoo/bar
>>>
>
> The previous draft (I still have it in my git tree) indeed says that
> the / should be encoded as such. But this basically _forces_ the input
> string to be URI-escaped. URI-decode once too much and your JSON
> Pointer is corrupted.
> The draft as it currently stands does not suffer this limitation.

Wrong; all escaping mechanisms have this problem. Example:

{"^^":true} == "/^^^^"
escape(escape("/^^^^")) == "/^"
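
The effect in this example comes from removing the caret escaping twice, which the following sketch reproduces. The `unescape` helper is hypothetical and lenient: a trailing lone "^" is copied through unchanged.

```java
final class CaretUnescape
{
    // Removes one level of caret escaping from a reference token:
    // "^" followed by any character yields that character.
    static String unescape(final String token)
    {
        final StringBuilder sb = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            final char c = token.charAt(i);
            if (c == '^' && i + 1 < token.length())
                sb.append(token.charAt(++i));
            else
                sb.append(c);
        }
        return sb.toString();
    }

    public static void main(final String... args)
    {
        final String token = "^^^^"; // reference token for the key "^^"
        final String once = unescape(token);
        final String twice = unescape(once);

        System.out.println(once);  // ^^  (the intended key)
        System.out.println(twice); // ^   (a different key)
    }
}
```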

> Sure, but then why specifically the URI-encoding mechanism? JSON is
> not Web-only...

As I stated, I believe we can reuse an escaping mechanism.

> Considering the simple need for speed, with your proposal, if a % is
> encountered, an implementation must lookahead _the two next
> characters_ and replace with a / only in the appropriate case. What to
> do if another sequence is encountered?

Consider that URI escaping is extensible and future proof. Any JSON
Pointer parser can handle the current version, but when a new escape
is added (say "^." for example), these older parsers will break. While
URI escaping provides a method for escaping any character.

> With the proposal as it currently stands, JSON Pointers can be passed
> through JSON data or URIs all the same, _unabmibguously_. Not so with
> the %2F notation.

I don't understand the logic in this statement. Regardless of the
escaping method: It's just string data, it can be passed anywhere.

--Gary

Xample

Apr 13, 2012, 3:23:51 AM
to json-...@googlegroups.com
I have already written some posts about this. In my opinion:
- We should distinguish a URI from a JSON Pointer path. It is the browser's job to escape it where applicable. For instance "www.a test.com" will be escaped to "www.a%20test.com".
- The benefits are more readable schemas and independence from the underlying protocol (I'm not always using schemas within a URL context).
- As we can benefit from a JSON specification that includes its own escaping AND already describes an escape for the slash "/", we should make clever use of this property.
- Only slashes MUST be escaped within a JSON Pointer, that's all. Given that "\/" is a valid equivalent for "/" in JSON, we are all done.

Example:
#a/tricky/\/StartingSlash/path

split this with a regex (?<!\\)\/

gives you

["#a","tricky","\/StartingSlash","path"]

Note that this requires a regex engine that supports lookbehind; for a JS implementation, take a look at http://blog.stevenlevithan.com/archives/mimic-lookbehind-javascript

As "\/StartingSlash" is a valid json key equivalent for { "/StartingSlash" : "something" } you are done
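
As a sketch, the same split in Java, whose regex engine supports lookbehind natively. The class name is hypothetical, and this only illustrates the backslash proposal, not the draft's caret syntax.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class BackslashSplit
{
    // A "/" that is not immediately preceded by a backslash.
    private static final Pattern UNESCAPED_SLASH = Pattern.compile("(?<!\\\\)/");

    static String[] split(final String pointer)
    {
        return UNESCAPED_SLASH.split(pointer, -1);
    }

    public static void main(final String... args)
    {
        // The Java literal below is the string  #a/tricky/\/StartingSlash/path
        final String pointer = "#a/tricky/\\/StartingSlash/path";
        System.out.println(Arrays.toString(split(pointer)));
        // [#a, tricky, \/StartingSlash, path]
    }
}
```

One limitation of this sketch: a key that ends in a backslash cannot be told apart from an escaped slash, because the lookbehind inspects only the single preceding character.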

Francis Galiegue

Apr 13, 2012, 7:24:32 AM
to json-...@googlegroups.com
On Fri, Apr 13, 2012 at 01:16, Gary Court <gary....@gmail.com> wrote:
[...]
>
> Wrong; all escaping mechanisms have this problem. Example:
>
> {"^^":true} == "/^^^^"
> escape(escape("/^^^^")) == "/^"
>

No, you didn't understand what I meant.

JSON Pointer data will be transmitted via JSON files or URIs. Which
means, they MAY be JSON encoded or URI encoded. MAY, not MUST.

Which means that if you choose either of these two methods, you get in
the situation where you don't know whether your string is encoded in
either way. Conflict. Conundrum. Call it what you want. But in all
cases, a source of bugs.

With the spec as it currently is, this problem just doesn't surface _at all_.

Armishev, Sergey

Apr 13, 2012, 5:51:05 PM
to json-...@googlegroups.com
I absolutely agree with Gary: it is better to reuse an existing, widely known and accepted mechanism, which means much less headache for developers and newcomers, and better and quicker acceptance by companies. How can somebody start using a product that requires special escaping, special software for it, and googling for special documents and discussions about such a simple operation? What can I say when looking at those caret-escaped strings? My first reaction to ^ is "beginning of line in a regular expression", not "JSON escaping".
-Sergey


Francis Galiegue

Apr 14, 2012, 1:27:45 AM
to json-...@googlegroups.com
On Fri, Apr 13, 2012 at 00:57, Gary Court <gary....@gmail.com> wrote:
[...]
>
> No, but the spec DOES mandates that it must be "carat encoded". I
> don't believe a new escaping mechanism needs to be introduced when an
> existing one can be used.
>

It mandates that the REFERENCE TOKENS be caret encoded. You don't have
to decode or encode a JSON Pointer. escape(escape("/^^^^")) ==
"/^^^^".

Again: the spec as it currently stands is the ONLY proposal that I can see
which makes JSON Pointers unambiguous.

Francis Galiegue

Apr 14, 2012, 1:58:42 AM
to json-...@googlegroups.com
On Fri, Apr 13, 2012 at 23:51, Armishev, Sergey <sarm...@idirect.net> wrote:
> I absolutely agree with Gary - better to reuse existing widely known and accepted mechanism

... when you come from the Web world. I don't. I'm more of a systems
programmer, and don't see URI as an "existing widely known and accepted
mechanism". In fact, as a systems programmer, I positively hate it
because URI decoding/encoding is _slow_.

I only know that in some places, including fragments in URIs and JSON
string values, I may encounter a JSON Pointer. I want it to read
unambiguously and be _fast_ to process. In Java, URI's .getFragment()
and Jackson's ObjectMapper guarantee that I get a "pure" string, which
nobody will have tried to "outsmart" in any way since "oh, it's a JSON
Pointer, it's URI encoded/JSON encoded, let's do some work for the guy
below". THIS will lead to bugs, you can be sure about that.

Xample

Apr 14, 2012, 10:35:11 AM
to json-...@googlegroups.com
Well… please tell me what is wrong with my proposal. Theoretically speaking, in JSON the following is not allowed:
{
"/":"slash"
}

according to the spec we must write it as:
{
"\/":"slash"
}

Now, if your Java in-memory representation uses a "/" for the key, that's up to you, but keep in mind that when serializing your object you should then convert (escape) your keys back.

I don't really get it, because it looks like everything has already been thought through in JSON, and yet we would like to add a new escaping mechanism.

Xample

Apr 14, 2012, 11:40:52 AM
to json-...@googlegroups.com
Update: having had doubts, I reread the JSON RFC and, as I understand it, we are not obliged to escape the solidus (it is not a control character).
Therefore
{
"/":"slash"
}
is valid…