I'd like to know if I understand the grammar correctly:
json-pointer = *( "/" reference-token )
reference-token = *( unescaped / escaped )
unescaped = %x00-2E / %x30-5D / %x5F-10FFFF
escaped = "^" ( "/" / "^" )
With "^" being %x5E, the caret, it means that a "^" can only be legal
if followed by "/" or "^".
Why isn't the caret a more generalized escape character? Ie, ^ must be
followed by at least one character, and escapes that character. It
would simplify parsing quite a lot imho.
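For reference, here is a minimal sketch of a parser that follows the grammar above as I read it. The names (CaretPointerParser, parse) are made up for illustration and come from neither the draft nor any library, and rejecting a "^" that is not followed by "/" or "^" is my interpretation of the "escaped" rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the draft or of any library.
final class CaretPointerParser
{
    // Splits a pointer such as "/a^/b/c" into its unescaped reference tokens.
    static List<String> parse(final String pointer)
    {
        final List<String> tokens = new ArrayList<String>();
        if (pointer.isEmpty())
            return tokens; // the empty pointer has no reference tokens
        if (pointer.charAt(0) != '/')
            throw new IllegalArgumentException("pointer must start with '/'");

        StringBuilder current = new StringBuilder();
        for (int i = 1; i < pointer.length(); i++) {
            final char c = pointer.charAt(i);
            if (c == '/') {
                // an unescaped slash ends the current reference token
                tokens.add(current.toString());
                current = new StringBuilder();
            } else if (c == '^') {
                if (i + 1 >= pointer.length())
                    throw new IllegalArgumentException("dangling '^' at end");
                final char next = pointer.charAt(++i);
                // strict reading of: escaped = "^" ( "/" / "^" )
                if (next != '/' && next != '^')
                    throw new IllegalArgumentException("illegal escape ^" + next);
                current.append(next);
            } else {
                current.append(c);
            }
        }
        tokens.add(current.toString());
        return tokens;
    }
}
```

With a generalized escape, the check on the character following the caret would simply disappear and whatever follows "^" would be appended as is.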
Comments?
--
Francis Galiegue, fgal...@gmail.com
"It seems obvious [...] that at least some 'business intelligence'
tools invest so much intelligence on the business side that they have
nothing left for generating SQL queries" (Stéphane Faroult, in "The
Art of SQL", ISBN 0-596-00894-5)
The following JSON strings are equivalent: "#\/\/key", "#//key",
"#/\/key", "#\//key". Only the second is "fully decoded". All of them
will return the second string when munged by a JSON string decoder.
Ultimately, as the spec currently stands, all point to key "key" under
key "" of an object instance (and not key "/key" under the root).
The choice of a caret as an escape character therefore makes sense.
You'll write "#/^/key" for the fully unescaped pointer, but you can
also write "#\/^/key" etc. The thing is, you obtain an unambiguous
pointer to a location whether the string is "raw", fully JSON encoded,
or partially JSON encoded. A good JSON API will never make a
distinction between them. Example with Jackson:
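import java.io.IOException;
import org.codehaus.jackson.map.ObjectMapper;

public final class PointerDecodeDemo
{
    public static void main(final String... args)
        throws IOException
    {
        // Two JSON encodings of the same pointer; "\/" is a legal JSON escape for "/"
        final String s1 = "\"#/^/key\"";   // JSON text: "#/^/key"
        final String s2 = "\"#\\/^/key\""; // JSON text: "#\/^/key"
        // Jackson 1.x API; Jackson 2.x moved to com.fasterxml.jackson.databind and textValue()
        final ObjectMapper mapper = new ObjectMapper();
        System.out.println(mapper.readTree(s1).getTextValue());
        System.out.println(mapper.readTree(s2).getTextValue());
    }
}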
This prints #/^/key for both strings, a pointer which unambiguously
points to key "/key" under the root (whether it be written "/key" or
"\/key" in the document).
No, look: when you are given a string which is potentially a JSON
Pointer, _you cannot know in advance whether this string is JSON
encoded or not_.
Consider this sample:
\/
If the string is _not_ JSON encoded, then it would refer literally to
a / within a reference token. HOWEVER, if it was encoded, then it
would be equal to:
/
Which is _not the same at all_.
Now, compare this with:
^/
Whether you decode it once or more, the end result will always be ^/,
therefore unambiguous, which stands for a literal / in a reference
token.
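To make the comparison concrete, here is a small sketch. The jsonUnescape helper is a deliberately simplified stand-in for a real JSON string decoder (it only strips the backslash from two-character escapes and ignores \n, \uXXXX and the like), and all names are made up for illustration.

```java
// Hypothetical demo class; jsonUnescape is NOT a full JSON string decoder.
final class DecodeOnceOrTwice
{
    // Drops the backslash from any two-character escape such as \/ or \\.
    static String jsonUnescape(final String s)
    {
        final StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            final char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length())
                sb.append(s.charAt(++i)); // keep only the escaped character
            else
                sb.append(c);
        }
        return sb.toString();
    }

    public static void main(final String... args)
    {
        final String backslash = "\\/"; // the two characters \/
        final String caret = "^/";      // the two characters ^/

        // One decode already changes the backslash form: prints /
        System.out.println(jsonUnescape(backslash));
        // Decoding the caret form twice leaves it alone: prints ^/
        System.out.println(jsonUnescape(jsonUnescape(caret)));
    }
}
```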
This is why the choice of ^ as an escape for JSON Pointers makes
perfect sense to me. I was dubious at first, just like you, until I
grasped the full implications of paragraph 5 of the spec (although
this particular paragraph only seems to imply that encoding may only
have happened once, as far as I read it).
OK, but still, the only character that is problematic is the slash, because it has its own meaning in a JSON Pointer but can be used in a key too.
Therefore this is the only character we must escape in a JSON Pointer.
Also, we can always unescape content that was not escaped; it does not matter.
For those reasons, I would strongly suggest adopting a convention (escaped slashes can only be part of a key) rather than a new way of escaping.
That said, I now understand the use of a caret, which is not in the JSON subset; I'm just not convinced by adding one more layer of indirection.
URI.normalize("#/^/foo/bar") == "#/%5E/foo/bar"
I propose JSON Pointer use URI component escaping rules for handling
property name delimiters. Example:
{"/foo":{"bar":true}}
becomes:
#/%2Ffoo/bar
This will make JSON Pointer URI compliant, and follows the same
reasoning that led JSON Pointer to switch to "/" instead of "." as
a delimiter.
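A rough sketch of what building such a pointer could look like in Java. The class and method names are hypothetical, and java.net.URLEncoder is only a convenient stand-in here: it implements form encoding (a space becomes "+", for instance), not strict RFC 3986 component escaping.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper illustrating percent-escaped reference tokens.
final class PercentPointer
{
    // Percent-escapes one property name; "/" becomes "%2F".
    static String encodeToken(final String token)
    {
        try {
            return URLEncoder.encode(token, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Joins the escaped property names into a fragment-style pointer.
    static String toFragment(final String... keys)
    {
        final StringBuilder sb = new StringBuilder("#");
        for (final String key : keys)
            sb.append('/').append(encodeToken(key));
        return sb.toString();
    }

    public static void main(final String... args)
    {
        // Path to "bar" in {"/foo":{"bar":true}}: prints #/%2Ffoo/bar
        System.out.println(toFragment("/foo", "bar"));
    }
}
```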
--Gary
Section 1: "This syntax is intended to be easily expressed in JSON
string values and Uniform Resource Identifier (URI) [RFC3986] fragment
identifiers."
How is this a problem at all? AFAICS, the spec does not mandate
anywhere that the input string should be URL-encoded, or JSON-encoded,
or whatever.
>>
>> #/%2Ffoo/bar
>>
The previous draft (I still have it in my git tree) indeed says that
the / should be encoded as such. But this basically _forces_ the input
string to be URI-escaped. URI-decode once too often and your JSON
Pointer is corrupted. This very subject was brought to light (albeit
not exactly this way) on this list not so long ago.
The draft as it currently stands does not suffer from this limitation. Nor
does it care whether the string is JSON encoded or not. What more do we
want, really?
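A sketch of the "decoded once too often" failure with the %2F notation. The names are hypothetical, and the split-then-decode order in tokens() is my assumption about how an implementation of the previous draft would process a pointer.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of an extra URI-decoding pass corrupting a pointer.
final class DecodeTooOften
{
    static String uriDecode(final String s)
    {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Splits on "/" first, then percent-decodes each reference token.
    static List<String> tokens(final String pointer)
    {
        final List<String> result = new ArrayList<String>();
        if (pointer.isEmpty())
            return result;
        for (final String raw : pointer.substring(1).split("/", -1))
            result.add(uriDecode(raw));
        return result;
    }

    public static void main(final String... args)
    {
        final String pointer = "/%2Ffoo/bar";
        // Processed as is: key "/foo", then key "bar". Prints [/foo, bar]
        System.out.println(tokens(pointer));
        // After a well-meaning extra decode: "", "foo", "bar". Prints [, foo, bar]
        System.out.println(tokens(uriDecode(pointer)));
    }
}
```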
No, but the spec DOES mandate that it must be "caret encoded". I
don't believe a new escaping mechanism needs to be introduced when an
existing one can be used.
--Gary
Sure, but then why specifically the URI-encoding mechanism? JSON is
not Web-only...
Considering the simple need for speed, with your proposal, if a % is
encountered, an implementation must look ahead at _the next two
characters_ and replace them with a / only in the appropriate case. What to
do if another sequence is encountered?
With the proposal as it currently stands, JSON Pointers can be passed
through JSON data or URIs all the same, _unambiguously_. Not so with
the %2F notation. ^ is %5E in URI-encoded strings, so what?
Hmm, sorry for that :/
Wrong; all escaping mechanisms have this problem. Example:
{"^^":true} == "/^^^^"
escape(escape("/^^^^")) == "/^"
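A sketch of the example above, assuming the operation in question strips one level of caret escaping each time it is applied. That reading, the lenient handling of a lone "^", and the names are all assumptions made for illustration.

```java
// Hypothetical helper: removes one level of caret escaping.
final class CaretUnescape
{
    // "^^" becomes "^" and "^/" becomes "/"; any other "^" is kept as is.
    static String unescape(final String s)
    {
        final StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            final char c = s.charAt(i);
            final boolean escaped = c == '^' && i + 1 < s.length()
                && (s.charAt(i + 1) == '/' || s.charAt(i + 1) == '^');
            // for an escape pair, skip the caret and keep the next character
            sb.append(escaped ? s.charAt(++i) : c);
        }
        return sb.toString();
    }

    public static void main(final String... args)
    {
        final String once = unescape("/^^^^");
        System.out.println(once);           // prints /^^
        System.out.println(unescape(once)); // prints /^
    }
}
```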
> Sure, but then why specifically the URI-encoding mechanism? JSON is
> not Web-only...
As I stated, I believe we can reuse an escaping mechanism.
> Considering the simple need for speed, with your proposal, if a % is
> encountered, an implementation must look ahead at _the next two
> characters_ and replace them with a / only in the appropriate case. What to
> do if another sequence is encountered?
Consider that URI escaping is extensible and future proof. Any JSON
Pointer parser can handle the current version, but when a new escape
is added (say "^." for example), these older parsers will break,
whereas URI escaping provides a method for escaping any character.
> With the proposal as it currently stands, JSON Pointers can be passed
> through JSON data or URIs all the same, _unambiguously_. Not so with
> the %2F notation.
I don't understand the logic in this statement. Regardless of the
escaping method: It's just string data, it can be passed anywhere.
--Gary
No, you didn't understand what I meant.
JSON Pointer data will be transmitted via JSON files or URIs. Which
means, they MAY be JSON encoded or URI encoded. MAY, not MUST.
Which means that if you choose either of these two methods, you end up in
a situation where you don't know whether your string is encoded in
either way. Conflict. Conundrum. Call it what you want. But in all
cases, a source of bugs.
With the spec as it currently is, this problem just doesn't surface _at all_.
It mandates that the REFERENCE TOKENS be caret encoded. You don't have
to decode or encode a JSON Pointer. escape(escape("/^^^^")) ==
"/^^^^".
Again: the spec as it currently stands is the ONLY proposal that I can see
which makes JSON Pointers unambiguous.
... when you come from the Web world. I don't. I'm more of a systems
programmer, and don't see URI as an "existing widely known and accepted
mechanism". In fact, as a systems programmer, I positively hate it
because URI decoding/encoding is _slow_.
I only know that in some places, including fragments in URIs and JSON
string values, I may encounter a JSON Pointer. I want it to read
unambiguously and be _fast_ to process. In Java, URI's .getFragment()
and Jackson's ObjectMapper guarantee that I get a "pure" string, which
nobody will have tried to "outsmart" in any way since "oh, it's a JSON
Pointer, it's URI encoded/JSON encoded, let's do some work for the guy
below". THIS will lead to bugs, you can be sure about that.
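A small sketch of the java.net.URI side of this. The example.com URL and the class and method names are made up for illustration; getFragment() and getRawFragment() are the standard accessors.

```java
import java.net.URI;

// Hypothetical demo: the URI layer hands over a decoded fragment once.
final class FragmentDemo
{
    // Returns the fragment with percent-escapes already decoded.
    static String decodedFragment(final String uri)
    {
        return URI.create(uri).getFragment();
    }

    public static void main(final String... args)
    {
        // Assumed sample URI; "^" must travel as %5E inside a URI
        final String sample = "http://example.com/doc.json#/%5E/key";

        // As transmitted: prints /%5E/key
        System.out.println(URI.create(sample).getRawFragment());
        // What the pointer code receives: prints /^/key
        System.out.println(decodedFragment(sample));
    }
}
```

The pointer implementation then works on /^/key directly and never has to guess whether another decoding pass is still pending.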