Re: [json-schema] id, $ref, and JSON Pointer implementation

Francis Galiegue

Dec 20, 2012, 8:45:30 PM
to json-...@googlegroups.com
On Fri, Dec 21, 2012 at 1:54 AM, <diamon...@users.sourceforge.net> wrote:
[...]
>
> Finally, there is the issue of how to interpret differences between v3 and
> future drafts. Some people have suggested using the latest draft of JSON
> Pointer for resolving JSON Schema draft v3 URI fragments. While I don't mind
> interpreting the specification to make it internally consistent, doing this
> is not necessary to accomplish a consistent implementation. JSON Schema
> draft v3 specifies URI encoding of property names only. I'd also like to see
> this in JSON Pointer, and that the current JSON Pointer syntax is
> unnecessarily complex by deviating from the URI encoding standard, which
> starting at v3, and currently, treats ~ as a special character, when it is
> normally a valid URI character needing no special encoding. %2f is
> sufficient as an escape for the / character. Since / is a reserved URI
> character, it is not the same as %2F in a URI (unlike <a> and <%61> which
> are equivalent URIs).
>

JSON string values are not URI encoded by default, which is why this
format is used. Also, keep in mind that JSON Pointer has two
representations, one as a "plain" JSON String, and another as a URI
encoded fragment:

JSON Pointer <-> URI
"" <-> "#"
"/~0" <-> "#/~0"
"/a b" <-> "#/a%20b"

etc

Also note that / is not a reserved URI character as far as the
fragment part is concerned!
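
To make the two representations concrete, here is a minimal Python sketch of
the mapping in the table above. The function names are made up for this
illustration, and it assumes the only difference between the two forms is
percent-encoding plus the leading "#":

```python
from urllib.parse import quote, unquote


def pointer_to_fragment(pointer: str) -> str:
    """Turn a plain JSON Pointer string into its URI fragment form."""
    # "/" and "~" are legal inside a fragment, so they are left untouched;
    # everything else that needs it (such as the space) is percent-encoded.
    return "#" + quote(pointer, safe="/~")


def fragment_to_pointer(fragment: str) -> str:
    """Turn a URI fragment (with its leading '#') back into a JSON Pointer."""
    if not fragment.startswith("#"):
        raise ValueError("expected a URI fragment starting with '#'")
    return unquote(fragment[1:])


for pointer in ["", "/~0", "/a b"]:
    print(repr(pointer), "<->", repr(pointer_to_fragment(pointer)))
```

The ~ sequences pass through unchanged in both directions; only the URI layer
is added or removed here.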

> Finally finally, I'd note I make a subtle distinction between </schema.json>
> and </schema.json#> where schema.json is an application/json document: The
> former identifies a byte stream, an information resource, the latter
> identifies a conceptual resource, a parsed JSON object, like one in memory.
> They can both be used the same to refer to a JSON Schema, though.
>

JSON Schema defines (well, proposes) "application/schema+json" so yes,
this distinction is important.

--
Francis Galiegue, fgal...@gmail.com
"It seems obvious [...] that at least some 'business intelligence'
tools invest so much intelligence on the business side that they have
nothing left for generating SQL queries" (Stéphane Faroult, in "The
Art of SQL", ISBN 0-596-00894-5)

Francis Galiegue

Dec 21, 2012, 6:04:42 AM
to json-...@googlegroups.com
On Fri, Dec 21, 2012 at 3:37 AM, <diamon...@users.sourceforge.net> wrote:
[...]
>
>
> My point is that this is an unnecessary extra step. If %2F were used as the
> escape for "/", and %25 to escape "%", instead of using the ~ form, then
> there would be no additional URI encoding necessary, it would be a single
> step, not two steps.
>

And you would then be confused as to whether you need the decoding at
all etc. The way JSON Pointer is defined, it is unambiguous.

JSON Pointer has been defined as such for a reason ;)
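
A rough Python sketch of why the two layers stay unambiguous. The helper
names are hypothetical; the "~0" and "~1" escapes are the ones JSON Pointer
defines for "~" and "/". Property names are ~-escaped first, and the whole
pointer is percent-encoded only when it goes into a URI:

```python
from urllib.parse import quote, unquote


def escape_token(key: str) -> str:
    # "~" must be escaped first, otherwise the "~" of "~1" would be escaped again.
    return key.replace("~", "~0").replace("/", "~1")


def unescape_token(token: str) -> str:
    # Reverse order on the way back: "~1" first, then "~0".
    return token.replace("~1", "/").replace("~0", "~")


def keys_to_fragment(keys):
    """Layer 1: ~-escape each key. Layer 2: percent-encode for the URI."""
    pointer = "".join("/" + escape_token(key) for key in keys)
    return "#" + quote(pointer, safe="/~")


def fragment_to_keys(fragment):
    """Undo layer 2 (percent-decoding), then layer 1 (~-unescaping)."""
    pointer = unquote(fragment[1:])
    if pointer == "":
        return []
    return [unescape_token(token) for token in pointer.split("/")[1:]]


keys = ["a/b", "50%", "m~n", "a%2Fb"]
fragment = keys_to_fragment(keys)
print(fragment)
print(fragment_to_keys(fragment) == keys)
```

Note that a property literally named "a%2Fb" survives the round trip, because
the "%" is itself percent-encoded at the URI layer and never mistaken for an
escaped "/".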

Geraint (David)

Dec 21, 2012, 1:52:46 PM
to json-...@googlegroups.com, diamon...@users.sourceforge.net
On Friday, December 21, 2012 12:54:19 AM UTC, diamon...@users.sourceforge.net wrote:
While there seems to be lots of discussion on the topic of $ref, id, and JSON Pointer, I haven't found any discussion on my issue in particular. My issue impacts both how to interpret the v3 draft and making proposals for future drafts.

Currently the v3 draft specifies:

If id is missing, the current URI of a schema is defined to be that of the parent schema. The current URI of the schema is also used to construct relative references such as for $ref.

Literally, this is impossible: two different resources, by definition, cannot have the same URI (a resource is anything uniquely identifiable by a URI). I read this to mean that the URI base of a schema is unchanged from the parent, not that the schema can somehow be identified by the same URI that another schema identifies by.

Yes, I think that's the only obvious interpretation.
 
This issue has also been raised as an issue to abandon the id attribute, but this does not follow. What if two schemas are defined with the same URI? In the Web this is impossible as long as the data being utilized is correct. It's not the library's job to determine what is correct, and what is false or erroneous data. There's plenty of schemas which, while "wrong" (maybe it's malicious, stale, outdated, or otherwise incorrect), will still be internally consistent and will validate instances without raising any error (the error is only uncovered in certain impossible situations, and if uncovered, the validator should raise a Schema Error, the same kind of error it would emit if it's passed {"divisibleBy": 0}). It's the developer's job to make sure that the schemas being passed to a validation library are correct, and if so, this error will never be raised.

This was not quite the issue Francis raised - it was not to do with schemas inheriting a (base) URI from a parent schema.

His criticism was around the idea that "id" allowed schemas to claim to represent URIs other than the one from which they were fetched, therefore leading to possible inconsistency if the two versions disagree. I agree this is possible, but categorised it as "servers breaking their own promises" which naturally will cause inconsistent behaviour. Given the quite stringent guidelines laid out in the standard about when clients should believe the claims made by "id", I don't have a problem with this at all - servers can only screw themselves up, not other people.

But anyway - the argument you mention was not the proposed reason for removing "id".
 
Some JSON Schema validators may have an option to automatically dereference URIs. In this case, it's taking on responsibility beyond the scope of the JSON Schema, and should make sure that schemas with the same URI re-defined in multiple places are consistent with one another, that the HTTP cache is not stale, that the external URI is a trusted third party or that a malicious party's schema is validated and checked for security, and so on.

That sounds pretty reasonable, and is pretty much how I feel.  The only problem I have with "id" is the fact that some big players aren't using it properly (I'm looking at you, Google... >_>)

Let Parse be the JSON Schema validation function with three arguments: First, the instance being validated; second, the schema being validated against; and third, the base which is the document URI of the schema, or URI to which the URIs in the base of the schema are resolved against (e.g. if in the presence of an HTML <base/> tag);
Let superValid = Parse(instance, schema, base);
Let subBase = schema.id, if it exists, or <> otherwise (<> being the blank URI), resolved against base;
Let subValid = Parse(instance[x], schema.properties[x], subBase)
 
The schema resolution algorithm Parse I call consistent if subValid returns valid whenever superValid returns valid, and superValid returns invalid whenever subValid returns invalid, and where "properties" is any attribute containing a sub-schema used to process some corresponding instance sub-value, for every possible value of instance, schema, base, and x. (This means every possible value of schema for this test contains a sub-schema.)

Yes - that is how schema validation is defined for things like "properties", "items", and so on.  Either the sub-schemas validate, or the parent schema does not.

The only exception is "disallow" (replaced with "not" in v4).
 
I believe consistency is a desirable trait for JSON Schema because it allows developers to combine schemas or substitute them into {"$ref":"..."} expressions with little computational effort or consideration. If the URI <#> were to always resolve against base and not the base of the current schema (i.e. the parent schema id if it exists), this would be inconsistent: If I extracted a JSON Schema from a super schema, the base would change, and thus the meaning of <#> within that schema. Merely extracting a schema from another schema would no longer be enough work; I would have to actually modify the values of $ref properties wherever they appear. If the algorithm were consistent, however, I merely need to note the base of the sub-schema, and provide it to the parser as the document URL, and no modification of the schema object itself is necessary.

That makes sense only if you consider the result of the "$ref" to be actually part of the parent schema document, which I think would be fairly radical.  If you instead consider that the target of $ref should be "spliced in", but retain information about its own URI that is separate from the original
 
To create a consistent validation function, I implement the JSON Schema draft v3 with the understanding I mentioned above (that only the schema URI base is inherited, not the id itself). This is also consistent with the behavior of the Web, where any resource may be given any number of URIs (by definition), and where arbitrary fragments may also identify a resource within a document (like anchors in HTML):
  1. Let target be the value of the $ref property resolved against the URI base, which is the parent schema URI (or their parent as necessary, ad infinitum, and failing that, the document URL, if available)
  2. If a schema exists with target, then
    • Use the schema at target and end
  3. Split target into deref and fragment, containing the part before the fragment and the fragment (if any), respectively
  4. If deref equals target, then
    • Raise a schema does-not-exist error and end
  5. If a schema exists at deref, then
    • Let schema be the schema found by starting at the schema identified by deref and walking it according to fragment
    • If schema does not exist, then
      • Raise a schema does-not-exist error and end
    • Else use schema and end
  6. Else raise a schema does-not-exist error and end
Interestingly - this sequence of parent schemas is exactly how I interpreted it.  Francis, however, understood "parent schema" to be equivalent to "root schema".  While I disagree, I do admit that his interpretation is consistent (as well as being quite a lot easier to implement).

I don't understand #4, though - if the fragment part is empty, won't that always fail?
 
This algorithm, complete with URI resolution relative to a base, is now implemented in <https://github.com/tdegrunt/jsonschema> and it passes all the tests in <https://github.com/json-schema/JSON-Schema-Test-Suite> (except some fragment-escaping-related tests, as described below -- this isn't terribly meaningful since there's a lack of relative URI resolution tests).

This allows for a schema to be identified by multiple URIs. This is not a novel concept, and again is seen throughout the Web, like Semantic Web applications (which, I may note, is an increasingly heavy user of JSON and hyper-JSON).

Finally, there is the issue of how to interpret differences between v3 and future drafts. Some people have suggested using the latest draft of JSON Pointer for resolving JSON Schema draft v3 URI fragments. While I don't mind interpreting the specification to make it internally consistent, doing this is not necessary to accomplish a consistent implementation. JSON Schema draft v3 specifies URI encoding of property names only. I'd also like to see this in JSON Pointer, and that the current JSON Pointer syntax is unnecessarily complex by deviating from the URI encoding standard, which starting at v3, and currently, treats ~ as a special character, when it is normally a valid URI character needing no special encoding. %2f is sufficient as an escape for the / character. Since / is a reserved URI character, it is not the same as %2F in a URI (unlike <a> and <%61> which are equivalent URIs).

No, it's not necessary.  Using "id" for fragment resolution, however, is not elegant, and not as flexible.  Throw in the fact that Google are mis-using "id", and you can see why we're more keen on JSON Pointer.

If you have complaints about JSON Pointer, then go take it up with the people who wrote that standard.  However, it's flexible, we don't have to specify our own fragment-resolution algorithms, and on top of that - I think JSON Pointer is fine, and I have no issue with ~-escaping.
 
Finally finally, I'd note I make a subtle distinction between </schema.json> and </schema.json#> where schema.json is an application/json document: The former identifies a byte stream, an information resource, the latter identifies a conceptual resource, a parsed JSON object, like one in memory. They can both be used the same to refer to a JSON Schema, though.

The web convention is that "<URI>" and "<URI>#" represent the same resource.  Given that they should be able to be used interchangeably, I'm afraid I can't quite see the benefit of trying to define any distinction between them.

However, if you're just talking about normalising the URIs of all your JSON data once it's been parsed (by adding an empty fragment if needed), then go for it.
 
Austin Wright.

Geraint 

Austin Wright

Dec 21, 2012, 6:47:43 PM
to json-...@googlegroups.com, diamon...@users.sourceforge.net


On Friday, December 21, 2012 11:52:46 AM UTC-7, Geraint (David) wrote:
On Friday, December 21, 2012 12:54:19 AM UTC, diamon...@users.sourceforge.net wrote:
While there seems to be lots of discussion on the topic of $ref, id, and JSON Pointer, I haven't found any discussion on my issue in particular. My issue impacts both how to interpret the v3 draft and making proposals for future drafts.

Currently the v3 draft specifies:

If id is missing, the current URI of a schema is defined to be that of the parent schema. The current URI of the schema is also used to construct relative references such as for $ref.

Literally, this is impossible: two different resources, by definition, cannot have the same URI (a resource is anything uniquely identifiable by a URI). I read this to mean that the URI base of a schema is unchanged from the parent, not that the schema can somehow be identified by the same URI that another schema identifies by.

Yes, I think that's the only obvious interpretation.
 
Alright then, nice to know I'm not totally going out on a limb. Adopting vocabulary like "URI base" would be nice though.

 
This issue has also been raised as an issue to abandon the id attribute, but this does not follow. What if two schemas are defined with the same URI? In the Web this is impossible as long as the data being utilized is correct. It's not the library's job to determine what is correct, and what is false or erroneous data. There's plenty of schemas which, while "wrong" (maybe it's malicious, stale, outdated, or otherwise incorrect), will still be internally consistent and will validate instances without raising any error (the error is only uncovered in certain impossible situations, and if uncovered, the validator should raise a Schema Error, the same kind of error it would emit if it's passed {"divisibleBy": 0}). It's the developer's job to make sure that the schemas being passed to a validation library are correct, and if so, this error will never be raised.

This was not quite the issue Francis raised - it was not to do with schemas inheriting a (base) URI from a parent schema.

His criticism was around the idea that "id" allowed schemas to claim to represent URIs other than the one from which they were fetched, therefore leading to possible inconsistency if the two versions disagree. I agree this is possible, but categorised it as "servers breaking their own promises" which naturally will cause inconsistent behaviour. Given the quite stringent guidelines laid out in the standard about when clients should believe the claims made by "id", I don't have a problem with this at all - servers can only screw themselves up, not other people.

But anyway - the argument you mention was not the proposed reason for removing "id".

Hopefully I impress why the id attribute is important, though.
 
 
Some JSON Schema validators may have an option to automatically dereference URIs. In this case, it's taking on responsibility beyond the scope of the JSON Schema, and should make sure that schemas with the same URI re-defined in multiple places are consistent with one another, that the HTTP cache is not stale, that the external URI is a trusted third party or that a malicious party's schema is validated and checked for security, and so on.

That sounds pretty reasonable, and is pretty much how I feel.  The only problem I have with "id" is the fact that some big players aren't using it properly (I'm looking at you, Google... >_>)

Let Parse be the JSON Schema validation function with three arguments: First, the instance being validated; second, the schema being validated against; and third, the base which is the document URI of the schema, or URI to which the URIs in the base of the schema are resolved against (e.g. if in the presence of an HTML <base/> tag);
Let superValid = Parse(instance, schema, base);
Let subBase = schema.id, if it exists, or <> otherwise (<> being the blank URI), resolved against base;
Let subValid = Parse(instance[x], schema.properties[x], subBase)
 
The schema resolution algorithm Parse I call consistent if subValid returns valid whenever superValid returns valid, and superValid returns invalid whenever subValid returns invalid, and where "properties" is any attribute containing a sub-schema used to process some corresponding instance sub-value, for every possible value of instance, schema, base, and x. (This means every possible value of schema for this test contains a sub-schema.)

Yes - that is how schema validation is defined for things like "properties", "items", and so on.  Either the sub-schemas validate, or the parent schema does not.

The only exception is "disallow" (replaced with "not" in v4).
 
I believe consistency is a desirable trait for JSON Schema because it allows developers to combine schemas or substitute them into {"$ref":"..."} expressions with little computational effort or consideration. If the URI <#> were to always resolve against base and not the base of the current schema (i.e. the parent schema id if it exists), this would be inconsistent: If I extracted a JSON Schema from a super schema, the base would change, and thus the meaning of <#> within that schema. Merely extracting a schema from another schema would no longer be enough work; I would have to actually modify the values of $ref properties wherever they appear. If the algorithm were consistent, however, I merely need to note the base of the sub-schema, and provide it to the parser as the document URL, and no modification of the schema object itself is necessary.

That makes sense only if you consider the result of the "$ref" to be actually part of the parent schema document, which I think would be fairly radical.  If you instead consider that the target of $ref should be "spliced in", but retain information about its own URI that is separate from the original

What do you mean by "parent schema document" exactly? I'm saying the referenced object is not merely substituted in, but the referenced object keeps its own URI base, otherwise equivalent to substitution.

 
To create a consistent validation function, I implement the JSON Schema draft v3 with the understanding I mentioned above (that only the schema URI base is inherited, not the id itself). This is also consistent with the behavior of the Web, where any resource may be given any number of URIs (by definition), and where arbitrary fragments may also identify a resource within a document (like anchors in HTML):
  1. Let target be the value of the $ref property resolved against the URI base, which is the parent schema URI (or their parent as necessary, ad infinitum, and failing that, the document URL, if available)
  2. If a schema exists with target, then
    • Use the schema at target and end
  3. Split target into deref and fragment, containing the part before the fragment and the fragment (if any), respectively
  4. If deref equals target, then
    • Raise a schema does-not-exist error and end
  5. If a schema exists at deref, then
    • Let schema be the schema found by starting at the schema identified by deref and walking it according to fragment
    • If schema does not exist, then
      • Raise a schema does-not-exist error and end
    • Else use schema and end
  6. Else raise a schema does-not-exist error and end
Interestingly - this sequence of parent schemas is exactly how I interpreted it.  Francis, however, understood "parent schema" to be equivalent to "root schema".  While I disagree, I do admit that his interpretation is consistent (as well as being quite a lot easier to implement).

Francis' implementation doesn't lead to consistent results, as I defined consistency. If I have a {"$ref": "#"} expression somewhere in my JSON schema then the result of the lookup would be dependent on the root schema URI, when it instead should always yield the same results whenever the environment URI is the same, e.g., if I have {"id": "http://example.com/cars.json", "additionalProperties": {"$ref": ""}} then it should always refer to itself no matter where it appears, or the document URI. However, in Francis' implementation, it does not, therefore I would not call it consistent.

The algorithm that I laid out I believe is easier to implement: Keeping an internal table of schemas as a map (AbsURI -> Schema), that whenever you lookup a schema from that map, just change your URI base to the looked-up URI. But in any event, the point of a JSON Schema implementation is to do the heavy lifting so the developer doesn't need to.
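
For illustration, steps 1-6 and the (AbsURI -> Schema) table might be sketched in Python roughly as follows. This is not the code in the jsonschema repository; the names register, resolve_ref and SchemaNotFound are invented for the example, fragment tokens are only percent-decoded (no ~ handling), and the walk naively visits every nested object, including ones that are not really sub-schemas:

```python
from urllib.parse import urldefrag, urljoin, unquote


class SchemaNotFound(LookupError):
    """Raised when a $ref target cannot be located (hypothetical error type)."""


def strip_empty_fragment(uri):
    # Store "x#" and "x" under the same table key.
    return uri[:-1] if uri.endswith("#") else uri


def register(node, base, table):
    """Fill table (absolute URI -> schema); the URI base is inherited downwards."""
    if isinstance(node, dict):
        if isinstance(node.get("id"), str):
            base = strip_empty_fragment(urljoin(base, node["id"]))
            table[base] = node
        for value in node.values():
            register(value, base, table)
    elif isinstance(node, list):
        for item in node:
            register(item, base, table)


def resolve_ref(ref, base, table):
    """Return (schema, new_base) for a $ref value resolved against base."""
    target = strip_empty_fragment(urljoin(base, ref))       # step 1
    if target in table:                                     # step 2
        return table[target], target
    deref, fragment = urldefrag(target)                     # step 3
    if deref == target:                                     # step 4
        raise SchemaNotFound(target)
    if deref not in table or not fragment.startswith("/"):  # step 6
        raise SchemaNotFound(target)
    node = table[deref]                                     # step 5
    for token in fragment.split("/")[1:]:
        key = unquote(token)
        try:
            node = node[int(key)] if isinstance(node, list) else node[key]
        except (KeyError, IndexError, ValueError, TypeError):
            raise SchemaNotFound(target) from None
    return node, target


ROOT = "http://example.org/root.json"  # made-up document URI
cars = {"id": "http://example.com/cars.json",
        "additionalProperties": {"$ref": ""}}
doc = {"properties": {"car": cars}}

table = {ROOT: doc}
register(doc, ROOT, table)

# {"$ref": ""} inside cars.json resolves against the inherited base, so it
# finds cars.json itself even though it is embedded in another document.
schema, new_base = resolve_ref("", "http://example.com/cars.json", table)
print(schema is cars, new_base)
```

The lookup returns the new URI base alongside the schema, which is the "just change your URI base to the looked-up URI" part.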
 
I don't understand #4, though - if the fragment part is empty, won't that always fail?
 
That's the point. If no schema at target exists, and there's no fragment, then it follows there's no schema at the URI without the fragment (it's the same URI of course), so that's a failure.
 
 
This algorithm, complete with URI resolution relative to a base, is now implemented in <https://github.com/tdegrunt/jsonschema> and it passes all the tests in <https://github.com/json-schema/JSON-Schema-Test-Suite> (except some fragment-escaping-related tests, as described below -- this isn't terribly meaningful since there's a lack of relative URI resolution tests).

This allows for a schema to be identified by multiple URIs. This is not a novel concept, and again is seen throughout the Web, like Semantic Web applications (which, I may note, is an increasingly heavy user of JSON and hyper-JSON).

Finally, there is the issue of how to interpret differences between v3 and future drafts. Some people have suggested using the latest draft of JSON Pointer for resolving JSON Schema draft v3 URI fragments. While I don't mind interpreting the specification to make it internally consistent, doing this is not necessary to accomplish a consistent implementation. JSON Schema draft v3 specifies URI encoding of property names only. I'd also like to see this in JSON Pointer, and that the current JSON Pointer syntax is unnecessarily complex by deviating from the URI encoding standard, which starting at v3, and currently, treats ~ as a special character, when it is normally a valid URI character needing no special encoding. %2f is sufficient as an escape for the / character. Since / is a reserved URI character, it is not the same as %2F in a URI (unlike <a> and <%61> which are equivalent URIs).

No, it's not necessary.  Using "id" for fragment resolution, however, is not elegant, and not as flexible.  Throw in the fact that Google are mis-using "id", and you can see why we're more keen on JSON Pointer.

How does one use the id attribute for fragment resolution? If you want to put fragments in "id" I'd think you can do so (e.g. a schema {"id": "http://example.com/spec#schema"} is legal and referencable), but it's largely unnecessary.
 
If you have complaints about JSON Pointer, then go take it up with the people who wrote that standard. However, it's flexible, we don't have to specify our own fragment-resolution algorithms, and on top of that - I think JSON Pointer is fine, and I have no issue with ~-escaping.

It appears to be directly derived from JSON Schema, including the authors. While media types are allowed to define their own fragment resolution schemes, JSON does not do so, nor is JSON Pointer an attempt to do so. Perhaps a new specification entirely is in order.
 
 
Finally finally, I'd note I make a subtle distinction between </schema.json> and </schema.json#> where schema.json is an application/json document: The former identifies a byte stream, an information resource, the latter identifies a conceptual resource, a parsed JSON object, like one in memory. They can both be used the same to refer to a JSON Schema, though.

The web convention is that "<URI>" and "<URI>#" represent the same resource.  Given that they should be able to be used interchangeably, I'm afraid I can't quite see the benefit of trying to define any distinction between them.

They'll both go to the same webpage, but they're not (necessarily) the same resource. Tim Berners-Lee's argument is that "HTTP URIs (without "#") should be understood as referring to documents, not cars." (I have a slightly different take on the so-called httpRange-14 issue: cars can have URIs without fragments so long as you use Content-Type negotiation. But I digress; the point is that <URI> and <URI#> are not necessarily the same resource.)
 

Francis Galiegue

Dec 21, 2012, 7:08:04 PM
to json-...@googlegroups.com, diamon...@users.sourceforge.net
On Sat, Dec 22, 2012 at 12:47 AM, Austin Wright
<diamon...@users.sourceforge.net> wrote:
[...]
>
> The algorithm that I laid out I believe is easier to implement: Keeping an
> internal table of schemas as a map (AbsURI -> Schema), that whenever you
> lookup a schema from that map, just change your URI base to the looked-up
> URI. But in any event, the point of a JSON Schema implementation is to do
> the heavy lifting so the developer doesn't need to.
>

You seem to be assuming the fact that any fully resolved ID will be
absolute. And that is not the case.

You can perfectly well load this schema locally:

{
    "id": "foo",
    "whatever": [ "you", "want" ],
    "sub": {
        "id": "other"
    }
}

No URI is absolute here. The Internet does not revolve around HTTP!
But with my implementation, I can fully, and accurately, resolve any
$ref into that schema.

Austin Wright

Dec 21, 2012, 7:45:50 PM
to json-...@googlegroups.com


On Friday, December 21, 2012 5:08:04 PM UTC-7, fge wrote:
On Sat, Dec 22, 2012 at 12:47 AM, Austin Wright
<diamon...@users.sourceforge.net> wrote:
[...]
>
> The algorithm that I laid out I believe is easier to implement: Keeping an
> internal table of schemas as a map (AbsURI -> Schema), that whenever you
> lookup a schema from that map, just change your URI base to the looked-up
> URI. But in any event, the point of a JSON Schema implementation is to do
> the heavy lifting so the developer doesn't need to.
>

You seem to be assuming the fact that any fully resolved ID will be
absolute. And that is not the case.

You can perfectly well load this schema locally:

{
    "id": "foo",
    "whatever": [ "you", "want" ],
    "sub": {
        "id": "other"
    }
}

No URI is absolute here. The Internet does not revolve around HTTP!
But with my implementation, I can fully, and accurately, resolve any
$ref into that schema.
 
Resolving URIs doesn't require HTTP! Section 5.1 of RFC 3986 specifies how to handle this case. I use an anonymous scheme+host, like local filesystem paths (actually, it is local filesystem path resolution). While it cannot be referred to in an absolute URI or anywhere outside of that URI base, it allows for relative-only URIs to successfully point to one another.
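
As a small illustration with the schema above, here is a Python sketch that resolves the two relative ids against a placeholder base. The "file:///anonymous/" value is purely an assumption for the example, standing in for whatever anonymous base an implementation picks:

```python
from urllib.parse import urljoin

# Hypothetical placeholder base; any base URI supplied by the application would do.
ANON_BASE = "file:///anonymous/"

schema = {
    "id": "foo",
    "whatever": ["you", "want"],
    "sub": {
        "id": "other",
    },
}

# The root id is resolved against the placeholder base ...
root_uri = urljoin(ANON_BASE, schema["id"])
# ... and the sub-schema id is resolved against its parent's URI.
sub_uri = urljoin(root_uri, schema["sub"]["id"])

print(root_uri)
print(sub_uri)
```

Neither id is absolute on its own, but once a base is fixed both schemas get distinct URIs and relative references between them resolve normally.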

Austin Wright.

Geraint (David)

Dec 22, 2012, 5:17:34 AM
to json-...@googlegroups.com, diamon...@users.sourceforge.net
On Friday, 21 December 2012 23:47:43 UTC, Austin Wright wrote:
 
Hopefully I impress why the id attribute is important, though.

I agree it's very useful. 

That makes sense only if you consider the result of the "$ref" to be actually part of the parent schema document, which I think would be fairly radical.  If you instead consider that the target of $ref should be "spliced in", but retain information about its own URI that is separate from the original

What do you mean by "parent schema document" exactly? I'm saying the referenced object is not merely substituted in, but the referenced object keeps its own URI base, otherwise equivalent to substitution.
 
Ah - I was misunderstanding what you meant by "extracted". 

Francis' implementation doesn't lead to consistent results, as I defined consistency. If I have a {"$ref": "#"} expression somewhere in my JSON schema then the result of the lookup would be dependent on the root schema URI, when it instead should always yield the same results whenever the environment URI is the same, e.g., if I have {"id": "http://example.com/cars.json", "additionalProperties": {"$ref": ""}} then it should always refer to itself no matter where it appears, or the document URI. However, in Francis' implementation, it does not, therefore I would not call it consistent.

A-ha!  I see what your line of argument is about now.  I actually agree with you, and that's a very good way to put it.

The algorithm that I laid out I believe is easier to implement: Keeping an internal table of schemas as a map (AbsURI -> Schema), that whenever you lookup a schema from that map, just change your URI base to the looked-up URI. But in any event, the point of a JSON Schema implementation is to do the heavy lifting so the developer doesn't need to.

I think if there is a difference in implementation difficulty, it's not very significant.  Francis's method simply requires that schemas keep a reference to the document URI from which they were fetched, and use that for the URI base instead.

I don't understand #4, though - if the fragment part is empty, won't that always fail?
 
That's the point. If no schema at target exists, and there's no fragment, then it follows there's no schema at the URI without the fragment (it's the same URI of course), so that's a failure.

D'oh!  Sorry, I somehow skipped over point 2.
 
No, it's not necessary.  Using "id" for fragment resolution, however, is not elegant, and not as flexible.  Throw in the fact that Google are mis-using "id", and you can see why we're more keen on JSON Pointer.

How does one use the id attribute for fragment resolution? If you want to put fragments in "id" I'd think you can do so (e.g. a schema {"id": "http://example.com/spec#schema"} is legal and referencable), but it's largely unnecessary.

I believe that's the standard way to do fragment resolution in v3  ( {"id": "#someFragment"} ).  In fact, until JSON Pointer was introduced in v4, I think this is the only way to do fragment resolution.
 
If you have complaints about JSON Pointer, then go take it up with the people who wrote that standard. However, it's flexible, we don't have to specify our own fragment-resolution algorithms, and on top of that - I think JSON Pointer is fine, and I have no issue with ~-escaping.

It appears to be directly derived from JSON Schema, including the authors. While media types are allowed to define their own fragment resolution schemes, JSON does not do so, nor is JSON Pointer an attempt to do so. Perhaps a new specification entirely is in order.

I'm afraid I don't quite know what you want.  Do you just want the JSON Pointer standard to change, or are you saying you would prefer JSON Schema to define its own fragment resolution scheme (separate from JSON Pointer)?
 
Finally finally, I'd note I make a subtle distinction between </schema.json> and </schema.json#> where schema.json is an application/json document: The former identifies a byte stream, an information resource, the latter identifies a conceptual resource, a parsed JSON object, like one in memory. They can both be used the same to refer to a JSON Schema, though.

The web convention is that "<URI>" and "<URI>#" represent the same resource.  Given that they should be able to be used interchangeably, I'm afraid I can't quite see the benefit of trying to define any distinction between them.

They'll both go to the same webpage, but they're not (necessarily) the same resource. Tim Berners-Lee's argument is that "HTTP URIs (without "#") should be understood as referring to documents, not cars." (I have a slightly different take on the so-called httpRange-14 issue: cars can have URIs without fragments so long as you use Content-Type negotiation. But I digress; the point is that <URI> and <URI#> are not necessarily the same resource.)

 OK.  What practical impact does that have, though?  In what situation would a URI with/without a "#" be actually treated differently?

Geraint (David)

Dec 22, 2012, 5:27:16 AM12/22/12
to json-...@googlegroups.com, diamon...@users.sourceforge.net
Just for clarity:

The biggest point I feel you're making is that if URIs  (such as "id" or "$ref") are resolved against the document root, instead of the URI of the immediate parent schema, then behaviour is inconsistent.  It also sounds like you believe this is the correct reading of v3 of the draft, and would like this behaviour to be continued in future versions?

If so, I most definitely agree with you.  However, Francis is the person to convince on this, as he has been the one writing the relevant sections of the specification, and he has a different view.

Francis:  I'm sorry!  I didn't bring it up again - but it's reassuring to know I'm not the only person who has my interpretation.

Francis Galiegue

Dec 22, 2012, 5:36:40 AM12/22/12
to json-...@googlegroups.com, diamon...@users.sourceforge.net
Well, that's 2 against 1, so I guess I lose, right? ;)

Now the way to word _that_ in the draft will be a real PITA, but hey,
I was enough in trouble writing this section in the first place.

Geraint (David)

Dec 22, 2012, 7:01:17 AM12/22/12
to json-...@googlegroups.com, diamon...@users.sourceforge.net
I wasn't saying that. ;)   But now I'm more confident I'm not the lone person with this view, it could be worth talking about again.

Geraint (David)

Dec 22, 2012, 8:42:53 AM12/22/12
to json-...@googlegroups.com, diamon...@users.sourceforge.net
Austin:

The argument you have made for consistency is true for your definition of consistency: if the sub-schema has an absolute URI as its value of "id", then it can be plucked out of its parent schema and used independently.

But there are a few issues I'd like to raise with it.

Firstly: unless I'm misunderstanding something, it still doesn't work if the URI in "id" is in any way relative.  If you plucked the sub-schema out and returned it from a different host, then the behaviour is still inconsistent unless "id" is an absolute URI.

Secondly: it complicates the issue of what should be considered the "document".  The JSON Reference (the canonical definition of the "$ref" behaviour) states that the URL in "$ref" should be resolved relative to the document.  So we have a few options:
  1. Don't use JSON Reference - in this case, we shouldn't use "$ref", but should think of some other syntax for it.
  2. We do some shenanigans to define each schema as its own "document"
  3. We just deal with it, and resolve everything relative to the root document URI.
Obviously (I hope), I think #1 is not a good idea.

I think #2 is very confusing, and raises even more problems.  If each schema is its own "document", then how can two schemas have the same URI?  How can a "document" have a URI with a fragment in it?

So actually, even though I find your interpretation more intuitive than the one currently settled on for draft v4, I think it has problems that the current interpretation does not.

Geraint (David)

Dec 22, 2012, 8:56:31 AM12/22/12
to json-...@googlegroups.com, diamon...@users.sourceforge.net
From my point of view, the main reason I wanted "$ref"s and so on to be interpreted relative to the immediate parent schema (and not the document root) is because I wanted to be able to have two schemas like so:

Schema 1:
    {
        "id": "http://example.com/schemas/",
        "title": "Some schema",
        "type": "array",
        "items": {"$ref": "subSchemas/schema2"}
    }
Schema 2:
    {
        "id": "http://example.com/schemas/subSchemas/schema2",
        "title": "Some other schema",
        "type": "object",
        "allOf": [{"$ref": "schema3"}]
    }

If we resolve relative to the parent schema, then we could serve up the following, presenting both schemas in one request:

Schema 1b:
    {
        "id": "http://example.com/schemas/",
        "title": "Some schema",
        "type": "array",
        "items": {
            "id": "http://example.com/schemas/subSchemas/schema2",
            "title": "Some other schema",
            "type": "object",
            "allOf": [{"$ref": "schema3"}]
        }
    }

You can just dump the contents of the other schema in without modification.  If you were resolving against the root schema, however, you would need to edit the {"$ref": "schema3"} so that it became {"$ref": "subSchemas/schema3"}.
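
To make the difference concrete, here is a minimal Python sketch of the two resolution rules applied to the embedded {"$ref": "schema3"}. It only uses the standard library's urljoin; the variable names are made up for illustration and are not part of any draft.

```python
from urllib.parse import urljoin

ROOT_ID = "http://example.com/schemas/"
EMBEDDED_ID = "http://example.com/schemas/subSchemas/schema2"
REF = "schema3"

# Rule A: resolve against the "id" of the immediate parent schema.
against_parent = urljoin(EMBEDDED_ID, REF)

# Rule B: resolve against the "id" of the root schema of the document.
against_root = urljoin(ROOT_ID, REF)

# Under rule B the embedded reference has to be rewritten to keep its target.
rewritten_for_root = urljoin(ROOT_ID, "subSchemas/schema3")

print(against_parent)      # http://example.com/schemas/subSchemas/schema3
print(against_root)        # http://example.com/schemas/schema3
print(rewritten_for_root)  # http://example.com/schemas/subSchemas/schema3
```

Only rule A keeps the copied schema pointing at the same target without edits, which is the convenience being weighed here.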

The thing is, although that would be occasionally very convenient, that's a one-off modification, and it's not actually that bad.  So the question for me is: how important is it to be able to do that substitution without any modifications to the schema?  Is it worth the other issues it would raise (previous message)?

Francis Galiegue

Dec 22, 2012, 9:01:22 AM12/22/12
to json-...@googlegroups.com, diamon...@users.sourceforge.net
(did I say "id" was a viper's nest? No? :p)

I'd first like to mention that there is no obligation at all that "id"
be an absolute URI. In fact, draft v3 does not constrain what "id" can
be at all, and that is one of my problems. But nevermind that.

Let's take this schema:

{
    "title": "rootSchema",
    "id": "/some/node#",
    "nested": {
        "alsoNested": {
            "title": "alsoNested",
            "id": "otherNode#withFragment",
            "type": "array",
            "items": { "$ref": "#" }
        }
    }
}

In this situation:

* if all references are resolved against the root schema, "#" refers
to "rootSchema",
* but if they are resolved against the uppermost parent having an id,
here "alsoNested" which has URI "/some/otherNode#withFragment", the
resolved URI gives "/some/otherNode#" which is a dangling ref.

Yes, the fact that "id" can be whatever the author wants is a real
headache here.
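
For anyone who wants to check the two outcomes, a small Python sketch of the resolution steps follows. It assumes plain RFC 3986 reference resolution as implemented by the standard library's urljoin, and it strips the fragment with urldefrag to show which document each "#" ends up pointing into. The variable names are illustrative only.

```python
from urllib.parse import urljoin, urldefrag

root_id = "/some/node#"

# The nested "id" is itself relative, so it resolves against the root "id".
nested_id = urljoin(root_id, "otherNode#withFragment")

ref = "#"
against_root = urljoin(root_id, ref)
against_nested = urljoin(nested_id, ref)

print(nested_id)                      # /some/otherNode#withFragment
print(urldefrag(against_root).url)    # /some/node
print(urldefrag(against_nested).url)  # /some/otherNode
```

In the second case the reference points at the root of a document "/some/otherNode", which this schema never defines, hence the dangling ref.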

Geraint (David)

Dec 22, 2012, 9:08:15 AM12/22/12
to json-...@googlegroups.com, diamon...@users.sourceforge.net
On Saturday, December 22, 2012 2:01:22 PM UTC, fge wrote:
(did I say "id" was a viper's nest? No? :p)

I'd first like to mention that there is no obligation at all that "id"
be an absolute URI. In fact, draft v3 does not constrain what "id" can
be at all, and that is one of my problems. But nevermind that.

Actually, I think this is a major point, given that non-absolute URIs in "id" break Austin's definition of consistency.
 
In this situation:

* if all references are resolved against the root schema, "#" refers
to "rootSchema",
* but if they are resolved against the uppermost parent having an id,
here "alsoNested" which has URI "/some/otherNode#withFragment", the
resolved URI gives "/some/otherNode#" which  is a dangling ref.

Well, personally I would count this as "schema authors being incompetent".

Yes, the fact that "id" can be whatever the author wants is a real
headache here.

I dunno - I mean, I can write <a href="/somewhere/that/does/not/exist">, but that's not a flaw in HTML.

However, I definitely agree the fact that "id" does not have to be absolute does limit any benefits that "resolving to immediate parent schema" might have.

Austin Wright

Dec 22, 2012, 10:29:11 AM12/22/12
to json-...@googlegroups.com
On Saturday, December 22, 2012 3:17:34 AM UTC-7, Geraint (David) wrote:
On Friday, 21 December 2012 23:47:43 UTC, Austin Wright wrote:


The algorithm that I laid out I believe is easier to implement: Keeping an internal table of schemas as a map (AbsURI -> Schema), that whenever you lookup a schema from that map, just change your URI base to the looked-up URI. But in any event, the point of a JSON Schema implementation is to do the heavy lifting so the developer doesn't need to.

I think if there is a difference in implementation difficulty, it's not very significant.  Francis's method simply requires that schemas keep a reference to the document URI from which they were fetched, and use that for the URI base instead.

This would be an even better way to put it: Whenever you reference a schema using a URI, you use that (resolved, absolute) URI as its URI base. This discards any need to keep track of where it ultimately came from, or which document it was embedded in. This works even if you use some sort of path lookup in the fragment, and even if the schema goes by multiple URIs (assuming the schemas within it are also resolvable under multiple URIs).
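
A rough Python sketch of that lookup table, under stated assumptions: the names index_schemas and lookup are hypothetical, "id" is treated as a URI reference resolved against the enclosing base, and the walker naively visits every nested object rather than only schema-bearing keywords.

```python
from urllib.parse import urljoin


def index_schemas(node, base_uri, registry):
    """Record each object that declares a string "id" under its resolved URI.

    Simplification: every nested object is visited, so values under keywords
    such as "enum" would be indexed too.
    """
    if isinstance(node, dict):
        declared = node.get("id")
        if isinstance(declared, str):
            base_uri = urljoin(base_uri, declared)
            registry[base_uri] = node
        for value in node.values():
            index_schemas(value, base_uri, registry)
    elif isinstance(node, list):
        for item in node:
            index_schemas(item, base_uri, registry)
    return registry


def lookup(ref, current_base, registry):
    """Resolve ref against the current base; the resolved URI is the new base."""
    uri = urljoin(current_base, ref)
    return uri, registry.get(uri)


schema_1b = {
    "id": "http://example.com/schemas/",
    "type": "array",
    "items": {
        "id": "http://example.com/schemas/subSchemas/schema2",
        "type": "object",
        "allOf": [{"$ref": "schema3"}],
    },
}
schema_3 = {"id": "http://example.com/schemas/subSchemas/schema3"}

registry = {}
index_schemas(schema_1b, "", registry)
index_schemas(schema_3, "", registry)

base, schema2 = lookup("subSchemas/schema2", "http://example.com/schemas/", registry)
base, schema3 = lookup(schema2["allOf"][0]["$ref"], base, registry)
print(base)  # http://example.com/schemas/subSchemas/schema3
```

The caller never needs to know which document a schema was embedded in; it only carries the URI it used for the last lookup.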
 
 
No, it's not necessary.  Using "id" for fragment resolution, however, is not elegant, and not as flexible.  Throw in the fact that Google are mis-using "id", and you can see why we're more keen on JSON Pointer.

How does one use the id attribute for fragment resolution? If you want to put fragments in "id" I'd think you can do so (e.g. a schema {"id": "http://example.com/spec#schema"} is legal and referencable), but it's largely unnecessary.

I believe that's the standard way to do fragment resolution in v3  ( {"id": "#someFragment"} ).  In fact, until JSON Pointer was introduced in v4, I think this is the only way to do fragment resolution.

v3 does define fragment resolution; for reasons I can't currently explain, it offers two methods that you can choose to switch between. Right now I'm implementing only slash-prefixed walking with URI encoding, so <#> is the root, <#/arr/1> is root.arr[1], <#/%2f> is root["/"], etc. I rather like this method. It works better than, at least, the "." separator, which is an unreserved character (though otherwise might be preferable).
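
As a sketch of what slash-prefixed walking with URI encoding could look like in Python: the function name resolve_fragment is hypothetical, array indices are assumed to be zero-based, and the fragment is split on "/" before percent-decoding so that %2f can stand for a literal slash in a property name.

```python
from urllib.parse import unquote


def resolve_fragment(root, fragment):
    """Walk a parsed JSON value along a slash-delimited, percent-encoded fragment."""
    if fragment.startswith("#"):
        fragment = fragment[1:]
    if fragment == "":
        return root
    if not fragment.startswith("/"):
        raise ValueError("fragment must be empty or start with '/'")
    node = root
    for raw_token in fragment.split("/")[1:]:
        token = unquote(raw_token)  # decode only after splitting on "/"
        if isinstance(node, list):
            node = node[int(token)]
        else:
            node = node[token]
    return node


doc = {"arr": ["zero", "one"], "/": "slash"}
print(resolve_fragment(doc, "#") is doc)  # True
print(resolve_fragment(doc, "#/arr/0"))   # zero
print(resolve_fragment(doc, "#/%2f"))     # slash
```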

I'm afraid I don't quite know what you want.  Do you just want the JSON Pointer standard to change, or are you saying you would prefer JSON Schema to define its own fragment resolution scheme (separate from JSON Pointer)?

I'm thinking standardized fragment resolution scheme would be preferable. Then objects could formally be referred to outside the context of a JSON Schema document.
 

They'll both go to the same webpage, but they're not (necessarily) the same resource. Tim Berners-Lee's argument is that "HTTP URIs (without "#") should be understood as referring to documents, not cars." (I have a slightly different take on the so-called httpRange-14 issue: cars can have URIs without fragments so long as you use Content-Type negotiation. But I digress; the point is that <URI> and <URI#> are not necessarily the same resource.)

 OK.  What practical impact does that have, though?  In what situation would a URI with/without a "#" be actually treated differently?
 
Let's say I want to give my car a URI. It could be an item in a car inventory, for instance. Typically, it will be given <car#>, and <car> will be the information resource, the formatted representation of the car. So if I want to talk about the car's make and model, I'd describe <car#>; if I want to describe the JSON document (maybe it's JSON-LD), say how many bytes it is or who is allowed to access it, then I describe <car>.

Obviously, this poses a problem if <#> is also defined to refer to the root of the JSON schema document <>, because now I have a car defined to live at <#>, but a JSON fragment lookup which could resolve to a JSON object in addition.

I don't think this is a problem with JSON, certainly not JSON Schema, so this may be drifting off topic a bit, but it's something to keep in mind. I do think the problem lies in the httpRange-14 resolution. It was made under the (correct) assumption that you can't dereference a URI with a fragment, because user agents strip it off before dereferencing a document. But instead, <car> should be its URI, and when it's dereferenced, servers should utilize the Content-Location header (to tell user-agents you're addressing a resource different than was requested) with Content-Type negotiation to return a <car.json> resource in its place. Then you're never stuck trying to figure out how to describe an information resource and a non-information resource at the same time.

Again, I mention it (linked data, the semantic web) as a growing use-case for JSON that JSON Schema needs to be aware of.

Austin Wright.

Francis Galiegue

Dec 22, 2012, 10:42:37 AM12/22/12
to json-...@googlegroups.com
On Sat, Dec 22, 2012 at 4:29 PM, Austin Wright
<diamon...@users.sourceforge.net> wrote:
[...]
>
> This would be an even better way to put it: Whenever you reference a schema
> using a URI, you use that (resolved, absolute) URI as its URI base.
>

Again: a schema URI need _not_ be absolute.

[...]
>> I'm afraid I don't quite know what you want. Do you just want the JSON
>> Pointer standard to change, or are you saying you would prefer JSON Schema
>> to define its own fragment resolution scheme (separate from JSON Pointer)?
>
>
> I'm thinking standardized fragment resolution scheme would be preferable.
> Then objects could formally be referred to outside the context of a JSON
> Schema document.
>

Well, "standardized fragment resolution scheme" is exactly what JSON
Pointer offers. So, that is not really a problem, is it?

I'll repeat again why JSON Pointer was designed the way it is (note, I
am _not_ the spec author): it allows unambiguous, uniform access to
any part of any JSON value, and context-independent encoding. You can
"decode" a JSON Pointer all you want, you'll always obtain the same
pointer. Not so if you use %-encoded escape sequences. JSON is not
linked to HTTP, nor to URI, nor to JavaScript. JSON Pointer has been
designed (successfully) with this in mind.
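
For readers who have not looked at the JSON Pointer draft, here is a short Python sketch of the two representations mentioned earlier in the thread: the plain string form with ~0 and ~1 escapes, and the URI fragment form obtained by percent-encoding that string. The function names are made up for illustration.

```python
from urllib.parse import quote, unquote


def escape_token(token):
    """Escape one reference token: '~' becomes '~0', then '/' becomes '~1'."""
    return token.replace("~", "~0").replace("/", "~1")


def unescape_token(token):
    """Reverse the escaping; '~1' must be handled before '~0'."""
    return token.replace("~1", "/").replace("~0", "~")


def to_pointer(tokens):
    """Build the plain-string JSON Pointer for a list of property names."""
    return "".join("/" + escape_token(t) for t in tokens)


def from_pointer(pointer):
    """Split a plain-string JSON Pointer back into property names."""
    if pointer == "":
        return []
    if not pointer.startswith("/"):
        raise ValueError("a non-empty pointer must start with '/'")
    return [unescape_token(t) for t in pointer.split("/")[1:]]


def to_uri_fragment(pointer):
    """Percent-encode the pointer for use as a URI fragment."""
    return "#" + quote(pointer, safe="/~")


def from_uri_fragment(fragment):
    """Recover the plain-string pointer from a fragment that starts with '#'."""
    return unquote(fragment[1:])


print(to_uri_fragment(to_pointer([])))        # #
print(to_uri_fragment(to_pointer(["~"])))     # #/~0
print(to_uri_fragment(to_pointer(["a b"])))   # #/a%20b
print(from_pointer(to_pointer(["a/b"])))      # ['a/b']
```

The ~ escapes live in the pointer itself, so they survive unchanged whether the pointer is stored in a JSON string or carried in a fragment; only the outer percent-encoding depends on the context.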

Austin Wright

Dec 22, 2012, 10:46:45 AM12/22/12
to json-...@googlegroups.com, diamon...@users.sourceforge.net
I'm arguing that each schema should have its own URI base which is inherited, and that the language of the id attribute should adopt this terminology ("URI base"). Like I mention, this has the desirable side-effect that the URI uniquely identifying the schema is also its URI base.

Also, the adoption of a standard fragment resolution scheme for the application/json media type.

Austin Wright.

Austin Wright

Dec 22, 2012, 10:47:10 AM12/22/12
to json-...@googlegroups.com

On Saturday, December 22, 2012 6:42:53 AM UTC-7, Geraint (David) wrote:
Austin:

The argument you have made for consistency is true for your definition of consistency: if the sub-schema has an absolute URI as its value of "id", then it can be plucked out of its parent schema and used independently.

But there are a few issues I'd like to raise with it.

Firstly: unless I'm misunderstanding something, it still doesn't work if the URI in "id" is in any way relative.  If you plucked the sub-schema out and returned it from a different host, then the behaviour is still inconsistent unless "id" is an absolute URI.

The URI base is preserved, because the URI you look it up by will serve as its URI base for resolving "id". Therefore it is "consistent" even with relative URIs in "id" and "$ref" alike.

Secondly: it complicates the issue of what should be considered the "document".  The JSON Reference (the canonical definition of the "$ref" behaviour) states that the URL in "$ref" should be resolved relative to the document.  So we have a few options:
  1. Don't use JSON Reference - in this case, we shouldn't use "$ref", but should think of some other syntax for it.
  2. We do some shenanigans to define each schema as its own "document"
  3. We just deal with it, and resolve everything relative to the root document URI.
Obviously (I hope), I think #1 is not a good idea.

I think #2 is very confusing, and raises even more problems.  If each schema is its own "document", then how can two schemas have the same URI?  How can a "document" have a URI with a fragment in it?

So actually, even though I find your interpretation more intuitive than the one currently settled on for draft v4, I think it has problems that the current interpretation does not.

v3 doesn't use JSON Reference so it's not a problem for that specification. I'd look into simply reverting the behavior for v4, or allowing a string to define a URI of a schema in place of an object schema (as v3 seems to inconsistently do, and v4 separates types like "object" from schemas so there's no potential for this to be ambiguous).

Austin Wright 

Francis Galiegue

Dec 22, 2012, 10:54:16 AM12/22/12
to json-...@googlegroups.com
On Sat, Dec 22, 2012 at 4:47 PM, Austin Wright
<diamon...@users.sourceforge.net> wrote:
[...]
>
> v3 doesn't use JSON Reference so it's not a problem for that specification.
>

Yes it does, albeit a little disguised: consider the json-ref
metaschema. Consider that the core metaschema itself uses it ({
"$ref": "#" }).

The only way in which JSON Reference is not suitable is when
references like { "$ref": "#foo" } are used, and this is why the
current draft v4 text allows this as an extension. It fits what v3 can
do and plain JSON Reference cannot. The only source of disagreement is
the interpretation of "parent" in section 5.23 of draft v3. Looks like
I've lost the battle of it actually meaning "root". I'll post a wrap
up about that.

> I'd look into simply reverting the behavior for v4, or allowing a string to
> define a URI of a schema in place of an object schema (as v3 seems to
> inconsistently do, and v4 separates types like "object" from schemas so
> there's no potential for this to be ambiguous).
>

Sorry, I don't understand this part at all. Can you elaborate? A URI
in JSON _is_ a JSON string. Also, I don't see where "v4 separates
types like "object" from schemas", it only makes the distinction of an
instance (a JSON value being validated) and a schema (a JSON value
which must be an object by definition).

Geraint (David)

Dec 22, 2012, 11:00:32 AM12/22/12
to json-...@googlegroups.com, diamon...@users.sourceforge.net

Sure - no problem.  However, "$ref"s would still resolve relative to the "referring document" which would be the "root schema" defined in v4.

Although I believe that, if you're using "id", a schema is not uniquely identified by a URI.  For example:
{
    "id": "http://example.com/schema/",
    "definitions": {
        "subSchema": {
            "id": "http://example.com/schema/subSchema"
        }
    }
}
The sub-schema there is equally accurately identified by both "http://example.com/schema/#/definitions/subSchema" and "http://example.com/schema/subSchema".
 
Also, the adoption of a standard fragment resolution scheme for the application/json media type.

Sure - my favourite candidate is JSON Pointer.  I'm weighing up the possibility of removing the "fragmentResolution" property, to encourage the use of JSON Pointer everywhere.

Austin Wright

Dec 22, 2012, 11:07:44 AM12/22/12
to json-...@googlegroups.com


On Saturday, December 22, 2012 8:42:37 AM UTC-7, fge wrote:
On Sat, Dec 22, 2012 at 4:29 PM, Austin Wright
<diamon...@users.sourceforge.net> wrote:
[...]
>
> This would be an even better way to put it: Whenever you reference a schema
> using a URI, you use that (resolved, absolute) URI as its URI base.
>

Again: a schema URI need _not_ be absolute.

All URIs are resolvable to some absolute form, even if it's anonymous, and not encodable as an absolute URI. This is not a problem, RFC 3986 encompasses it. A document of all relative URIs in a purely programming environment with no base URI should still work 100%.
 

[...]
>> I'm afraid I don't quite know what you want.  Do you just want the JSON
>> Pointer standard to change, or are you saying you would prefer JSON Schema
>> to define its own fragment resolution scheme (separate from JSON Pointer)?
>
>
> I'm thinking standardized fragment resolution scheme would be preferable.
> Then objects could formally be referred to outside the context of a JSON
> Schema document.
>

Well, "standardized fragment resolution scheme" is exactly what JSON
Pointer offers. So, that is not really a problem, is it?

JSON Pointer doesn't adopt the language necessary to make it formal for all application/json documents.
 

I'll repeat again why JSON Pointer was designed the way it is (note, I
am _not_ the spec author): it allows unambiguous, uniform access to
any part of any JSON value, and context-independent encoding. You can
"decode" a JSON Pointer all you want, you'll always obtain the same
pointer. Not so if you use %-encoded escape sequences. JSON is not
linked to HTTP, nor to URI, nor to JavaScript. JSON Pointer has been
designed (successfully) with this in mind.

Not being a URI is a bad thing. URIs are a core Web technology. So I'm suggesting that instead of using "~0"  "~1" and "%", that JSON Pointer utilize "~" "%2f" and "%25" respectively. Or, adopt a dedicated fragment resolution syntax. I don't see such a URI-less syntax being necessary unless it concerns more complex logic, similar to XPath or XQuery for XML.

Austin Wright.

Geraint (David)

Dec 22, 2012, 11:08:18 AM12/22/12
to json-...@googlegroups.com


On Saturday, December 22, 2012 3:54:16 PM UTC, fge wrote:
On Sat, Dec 22, 2012 at 4:47 PM, Austin Wright
<diamon...@users.sourceforge.net> wrote:
[...]
>
> v3 doesn't use JSON Reference so it's not a problem for that specification.
>

Yes it does, albeit a little disguised: consider the json-ref
metaschema. Consider that the core metaschema itself uses it ({
"$ref": "#" }).
 
Actually, no - the json-ref metaschema is not used by the hyper-schema.  For example:
{
    "enum": [{"$ref": "/home"}, {"$ref": "/away"}]
}
According to the "json-ref" schema (which is completely recursive), the two enum values are the values of /home and /away.  However, according to the JSON Schema spec, the two enum values are in fact the two objects:
  1. {"$ref": "/home"}
  2. {"$ref": "/away"}
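
To illustrate the JSON Schema reading, a tiny Python sketch of an "enum" check that compares the instance against the listed values literally, with no dereferencing. The helper name matches_enum is hypothetical.

```python
def matches_enum(instance, schema):
    """Return True if the instance equals one of the literal "enum" values.

    Note: Python's == treats 1 and True as equal, which JSON does not, so a
    real validator would need a stricter comparison.
    """
    return any(instance == allowed for allowed in schema["enum"])


schema = {"enum": [{"$ref": "/home"}, {"$ref": "/away"}]}

print(matches_enum({"$ref": "/home"}, schema))  # True
print(matches_enum("/home", schema))            # False
```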

The only way in which JSON Reference is not suitable is when
references like { "$ref": "#foo" } are used, and this is why the
current draft v4 text allows this as an extension. It fits what v3 can
do and plain JSON Reference cannot. The only source of disagreement is
the interpretation of "parent" in section 5.23 of draft v3. Looks like
I've lost the battle of it actually meaning "root". I'll post a wrap
up about that.

What? No!  The other definition conflicts with the definition of JSON Reference.  I am not happy changing it unless those are somehow sorted out.

> I'd look into simply reverting the behavior for v4, or allowing a string to
> define a URI of a schema in place of an object schema (as v3 seems to
> inconsistently do, and v4 separates types like "object" from schemas so
> there's no potential for this to be ambiguous).
>

Sorry, I don't understand this part at all. Can you elaborate? A URI
in JSON _is_ a JSON string. Also, I don't see where "v4 separates
types like "object" from schemas", it only makes the distinction of an
instance (a JSON value being validated) and a schema (a JSON value
which must be an object by definition).
 
For the first part, I think he's talking about an old syntax where you didn't need "$ref", you just stuck the URL in as a string:
{
    "type": "array",
    "items": "/path/to/some/schema"
}
I don't like it, and prefer the $ref syntax.

For the second part... yeah, I'm confused about that as well.

Francis Galiegue

Dec 22, 2012, 11:11:55 AM12/22/12
to json-...@googlegroups.com
On Sat, Dec 22, 2012 at 5:07 PM, Austin Wright
<diamon...@users.sourceforge.net> wrote:
[...]
>>
>> Again: a schema URI need _not_ be absolute.
>
>
> All URIs are resolvable to some absolute form, even if it's anonymous, and
> not encodable as an absolute URI. This is not a problem, RFC 3986
> encompasses it. A document of all relative URIs in a purely programming
> environment with no base URI should still work 100%.
>

OK, I guess we have different definitions of an "absolute URI". For
me, an absolute URI is a URI which starts with a scheme, ie
"foo://bar" is absolute but "/foo/bar" isn't.

[...]
>>
>> Well, "standardized fragment resolution scheme" is exactly what JSON
>> Pointer offers. So, that is not really a problem, is it?
>
>
> JSON Pointer doesn't adopt the language necessary to make it formal for all
> application/json documents.
>

Sorry but to my eyes it does. Care to give a counter example?

[...]
>
> Not being a URI is a bad thing.
>

As I mentioned, JSON Pointers can very easily be represented as URI
fragments. So again, there is no problem there. And JSON Pointer can
be used in _non web environments_ as well.

Also have a look at "format": "uri".

Austin Wright

Dec 22, 2012, 11:14:44 AM12/22/12
to json-...@googlegroups.com


On Saturday, December 22, 2012 9:00:32 AM UTC-7, Geraint (David) wrote:

Sure - no problem.  However, "$ref"s would still resolve relative to the "referring document" which would be the "root schema" defined in v4.

Although I believe that, if you're using "id", a schema is not uniquely identified by a URI.  For example:
{
    "id": "http://example.com/schema/",
    "definitions": {
        "subSchema": {
            "id": "http://example.com/schema/subSchema"
        }
    }
}
The sub-schema there is equally accurately identified by both "http://example.com/schema/#/definitions/subSchema" and "http://example.com/schema/subSchema".

"Uniquely identified" meaning, by the URI only, I can resolve to one and exactly one schema. This doesn't preclude that there may be multiple such URIs capable of uniquely identifying the resource.
 
 
Also, the adoption of a standard fragment resolution scheme for the application/json media type.

Sure - my favourite candidate is JSON Pointer.  I'm weighing up the possibility of removing the "fragmentResolution" property, to encourage the use of JSON Pointer everywhere.
 
I'd appreciate the removal of it, fragment resolution is supposed to be media-type wide, not application-defined. So that just leads to needless complexity.

Francis Galiegue

Dec 22, 2012, 11:16:45 AM12/22/12
to json-...@googlegroups.com
On Sat, Dec 22, 2012 at 5:08 PM, Geraint (David) <gerai...@gmail.com> wrote:
[...]
>
> What? No! The other definition conflicts with the definition of JSON
> Reference. I am not happy changing it unless those are somehow sorted out.
>

It can be worked around if the draft tells that JSON Reference
resolution must be made relative to the uppermost parent URI (that is,
if you trust "id" at all). This would be JSON Schema specific, but
then "id" is specific enough to have stirred this thread and others ;)

[...]
>
> For the first part, I think he's talking about an old syntax where you
> didn't need "$ref", you just stuck the URL in as a string:
> {
> "type": "array",
> "items": "/path/to/some/schema"
> }
> I don't like it, and prefer the $ref syntax.
>

This syntax is not legal as far as draft v3 is concerned, is it? Or I
have really misread the draft. I know that some schemas on the web
site were not up to date with regards to this, but they have been
updated since then.

Geraint (David)

Dec 22, 2012, 11:17:45 AM12/22/12
to json-...@googlegroups.com
 JSON Pointer doesn't adopt the language necessary to make it formal for all application/json documents.
Well, it presents itself as a nice default.  And the inclusion of "fragmentResolution" with only one pre-defined value ("json-pointer") is pretty strong support.

Are you looking for JSON Schema to have a separate (competing) fragment resolution standard?  Because I would much rather reference another standard, given that there is one.  Defining our own (especially one that started with "#/") would be confusing.
 
Not being a URI is a bad thing. URIs are a core Web technology. So I'm suggesting that instead of using "~0"  "~1" and "%", that JSON Pointer utilize "~" "%2f" and "%25" respectively. Or, adopt a dedicated fragment resolution syntax. I don't see such a URI-less syntax being necessary unless it concerns more complex logic, similar to XPath or XQuery for XML.

 In case it's relevant: have you seen the JSON Patch standard?  It uses JSON Pointers in a non-URI context.
 

Geraint (David)

Dec 22, 2012, 11:21:26 AM12/22/12
to json-...@googlegroups.com
"Uniquely identified" meaning, by the URI only, I can resolve to one and exactly one schema. This doesn't preclude that there may be multiple such URIs capable of uniquely identifying the resource.

That sounds more like "resolvable" to me.  "Uniquely identified" implies a one-to-one mapping.
 
I'd appreciate the removal of it, fragment resolution is supposed to be media-type wide, not application-defined. So that just leads to needless complexity.
 
I'll count that as another vote in favour of removal, then. :)

Geraint (David)

Dec 22, 2012, 11:23:50 AM12/22/12
to json-...@googlegroups.com


On Saturday, December 22, 2012 4:16:45 PM UTC, fge wrote:
On Sat, Dec 22, 2012 at 5:08 PM, Geraint (David) <gerai...@gmail.com> wrote:
[...]
>
> What? No!  The other definition conflicts with the definition of JSON
> Reference.  I am not happy changing it unless those are somehow sorted out.
>

It can be worked around if the draft tells that JSON Reference
resolution must be made relative to the uppermost parent URI (that is,
if you trust "id" at all). This would be JSON Schema specific, but
then "id" is specific enough to have stirred this thread and others ;)

True enough.  Let's start a separate thread about it - I don't expect many people will have the courage to follow this thread all the way down by now. 

[...]
>
> For the first part, I think he's talking about an old syntax where you
> didn't need "$ref", you just stuck the URL in as a string:
> {
>     "type": "array",
>     "items": "/path/to/some/schema"
> }
> I don't like it, and prefer the $ref syntax.
>

This syntax is not legal as far as draft v3 is concerned, is it? Or I
have really misread the draft. I know that some schemas on the web
site were not up to date with regards to this, but they have been
updated since then.

No, it's not legal.  It was in v2, or something like that, so it hung around in the meta-schemas for a while because they weren't updated.

Austin Wright

Dec 22, 2012, 11:26:15 AM12/22/12
to json-...@googlegroups.com


On Saturday, December 22, 2012 9:11:55 AM UTC-7, fge wrote:

OK, I guess we have different definitions of an "absolute URI". For
me, an absolute URI is a URI which starts with a scheme, ie
"foo://bar" is absolute but "/foo/bar" isn't.

Correct. However, that doesn't mean you can't resolve </xyz> against an anonymous base. For an application, you might implement this by selecting a randomly-generated scheme and authority, or use one defined in the application, e.g. define a default base to be <myapplication://authority/>. This is the correct way to resolve URIs when there is no other base to resolve against. In my application, I end up treating </> as an absolute URI, which works since the same library that parses paths in URIs also parses filesystem paths. Strictly speaking, however, a URI in one anonymous base can't refer to a URI in another anonymous base, but in most applications this is impossible to attempt anyways.
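
A minimal Python sketch of the fallback-base idea, with loud assumptions: DEFAULT_BASE, has_scheme and resolve_with_fallback are invented names, and the placeholder base uses the http scheme because, as far as I know, the standard library's urljoin only merges paths for schemes it recognises and would hand back "/xyz" untouched for a custom scheme such as myapplication.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical application-defined base used when nothing better is known.
DEFAULT_BASE = "http://anonymous.invalid/"


def has_scheme(uri):
    """True if the reference starts with a scheme, e.g. "foo://bar"."""
    return bool(urlsplit(uri).scheme)


def resolve_with_fallback(ref, base=None):
    """Resolve ref against base, or against DEFAULT_BASE if no base is given."""
    return urljoin(base or DEFAULT_BASE, ref)


print(has_scheme("foo://bar"))         # True
print(has_scheme("/foo/bar"))          # False
print(resolve_with_fallback("/xyz"))   # http://anonymous.invalid/xyz
```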
 
 

Sorry but to my eyes it does. Care to give a counter example?

Perhaps it does, actually. I'm seeking this out.

To be clear, I don't have a problem with URI-encoded JSON Pointer being standard, but the ~ syntax strikes me as downright odd and unnecessary for use in URI fragments.

Austin Wright.

Austin Wright

Dec 22, 2012, 11:27:38 AM12/22/12
to json-...@googlegroups.com
On Saturday, December 22, 2012 9:17:45 AM UTC-7, Geraint (David) wrote:
 In case it's relevant: have you seen the JSON Patch standard?  It uses JSON Pointers in a non-URI context.
 
Yeah. That's the sort of example I was trying to come up with.

Francis Galiegue

unread,
Dec 22, 2012, 11:29:58 AM12/22/12
to json-...@googlegroups.com
On Sat, Dec 22, 2012 at 5:26 PM, Austin Wright
<diamon...@users.sourceforge.net> wrote:
>
>
> On Saturday, December 22, 2012 9:11:55 AM UTC-7, fge wrote:
>>
>>
>> OK, I guess we have different definitions of an "absolute URI". For
>> me, an absolute URI is a URI which starts with a scheme, ie
>> "foo://bar" is absolute but "/foo/bar" isn't.
>
>
> Correct. However that doesn't mean you can't resolve </xyz> against an
> anonymous base.

It is not needed at all. At least, I don't need it ;)

Also consider that if your URI is a URN (which _is_ an absolute URI),
then resolving any URI against that URN gives the target URI -- the
URN part is gone.

[...]
>
> To be clear, I don't have a problem with URI-encoded JSON Pointer being
> standard, but that the ~ syntax strikes me as downright odd and unnecessary
> for using in URI fragments.
>

Yes, that is still a concern and I don't like the looks of it
either... ^ was proposed as an escape character instead but ultimately
rejected because, when URI-encoded, it led to a %-encoded character ;)

But the IETF is pretty much adamant that this will be it, so...
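
For anyone following along, the two representations can be sketched in a few lines of Python. This assumes the escaping as I understand it from the current JSON Pointer draft ("~" becomes "~0", "/" becomes "~1"); the function names are invented for the example.

```python
from typing import Iterable
from urllib.parse import quote


def escape_token(name: str) -> str:
    """Escape one property name: '~' first, then '/'."""
    return name.replace("~", "~0").replace("/", "~1")


def to_pointer(names: Iterable[str]) -> str:
    """Build the plain JSON-string form of a pointer."""
    return "".join("/" + escape_token(name) for name in names)


def to_fragment(pointer: str) -> str:
    """Build the URI fragment form; '/' and '~' are left unencoded."""
    return "#" + quote(pointer, safe="/~")


print(to_fragment(to_pointer(["a b"])))
print(to_fragment(to_pointer(["m~n", "a/b"])))
```

The order of the two replacements matters: escaping "/" first would turn it into "~1", and the second pass would then mangle that "~".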

Geraint (David)

unread,
Dec 22, 2012, 2:08:35 PM12/22/12
to json-...@googlegroups.com
OK - I mentioned it in the other thread, so I'll explain it here.

The behaviour of "id" is simply the behaviour of a link with relation "self".  Similarly, a "$ref" just defines a link with relation "full".  So the behaviour we choose for "id"/"$ref" will affect the behaviour of links more generally.

We could define two possible behaviours for resolving relative URIs in links.
  1. Relative URIs in links are resolved relative to the URI of the document containing them.
  2. When resolving relative URIs in links, we step up the data hierarchy looking for rel="self" links.  If we find one, then that's what we use.

For example, let's look at the following data, fetched from "http://example.com/just-some-document/":

{
    "id": "just-some-document",
    "title": "Just Some Document",
    "sections": [
        {
            "id": "just-some-document-subsection-1",
            "title": "Subsection #1"
        }
    ]
}

with the following schema:

{
    "type": "object",
    "properties": {
        "sections": {
            "type": "array",
            "items": {
                "links": [
                    {
                        "rel": "author",
                        "href": "author"
                    }
                ]
            }
        }
    }
}

So with either method, the URI of the "author" link for the sub-section is "http://example.com/just-some-document/author".

However, let's then apply a second schema to the item as well:
{
    "links": [
        {
            "rel": "self",
            "href": "/{id}/"
        }
    ],
    "type": "object",
    "properties": {
        "sections": {
            "items": {
                "links": [
                    {
                        "rel": "self",
                        "href": "/{id}/"
                    }
                ]
            }
        }
    }
}

Now, once we apply this second schema, the sub-section suddenly has a link with relation "self".  This means it actually represents the data at "http://example.com/just-some-document-subsection-1/".

At this point, the two methods for resolving URIs diverge.  If we always resolve relative to the document root, then the "author" link is unchanged from before, with a value of "http://example.com/just-some-document/author".

But if we are resolving the URI relative to that particular part of the data (which is affected by rel="self" links), the "author" link changes to "http://example.com/just-some-document-subsection-1/author".

The behaviour we want in this example must be consistent with the behaviour we want for "id"/"$ref".
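
The divergence can be reproduced with plain URI resolution. This is only a sketch using Python's `urljoin`; the variable names are made up, and the "{id}" template is filled in by hand rather than by a real template processor.

```python
from urllib.parse import urljoin

document_uri = "http://example.com/just-some-document/"
section_id = "just-some-document-subsection-1"

# The rel="self" link from the second schema, with {id} filled in by hand.
section_self = urljoin(document_uri, f"/{section_id}/")

# Method 1: always resolve against the URI of the containing document.
author_by_document = urljoin(document_uri, "author")

# Method 2: resolve against the nearest rel="self" link up the hierarchy.
author_by_self = urljoin(section_self, "author")

print(author_by_document)
print(author_by_self)
```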

Austin Wright

unread,
Dec 22, 2012, 2:15:06 PM12/22/12
to json-...@googlegroups.com

On Saturday, December 22, 2012 6:56:31 AM UTC-7, Geraint (David) wrote:
From my point of view, the main reason I wanted "$ref"s and so on to be interpreted relative to the immediate parent schema (and not the document root) is because I wanted to be able to have two schemas like so:

Schema 1:
    {
        "id": "http://example.com/schemas/",
        "title": "Some schema",
        "type": "array",
        "items": {"$ref": "subSchemas/schema2"}
    }
Schema 2:
    {
        "id": "http://example.com/schemas/subSchemas/schema2",
        "title": "Some other schema",
        "type": "object",
        "allOf": [{"$ref": "schema3"}]
    }

If we resolve relative to the parent schema, then we could serve up the following, presenting both schemas in one request:

Schema 1b:
    {
        "id": "http://example.com/schemas/",
        "title": "Some schema",
        "type": "array",
        "items": {
            "id": "http://example.com/schemas/subSchemas/schema2",
            "title": "Some other schema",
            "type": "object",
            "allOf": [{"$ref": "schema3"}]
        }
    }

You can just dump the contents of the other schema in without modification.  If you were resolving against the root schema, however, you would need to edit the {"$ref": "schema3"} so that it became {"$ref": "subSchemas/schema3"}.
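
To make the difference concrete, here is a small sketch with Python's `urljoin`, using the "id" values from Schema 1b. The variable names are only for illustration.

```python
from urllib.parse import urljoin

root_id = "http://example.com/schemas/"
embedded_id = "http://example.com/schemas/subSchemas/schema2"

# Resolving against the immediate parent schema: the embedded copy works as-is.
against_parent = urljoin(embedded_id, "schema3")

# Resolving against the root schema: the unmodified $ref points somewhere else...
against_root = urljoin(root_id, "schema3")

# ...so it has to be rewritten to reach the same target.
against_root_edited = urljoin(root_id, "subSchemas/schema3")

print(against_parent)
print(against_root)
print(against_root_edited)
```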

The thing is, although that would be occasionally very convenient, that's a one-off modification, and it's not actually that bad.  So the question for me is: how important is it to be able to do that substitution without any modifications to the schema?  Is it worth the other issues it would raise (previous message)?
 
That's exactly the feature which I believe is desirable. Although we might dismiss it as uncommon, this doesn't mean no one will use it. It occurs all the time in any database/object store where you address each schema separately, a very common use-case in applications, especially the Web and Linked Data.

But I also make the point it makes using JSON Schema simpler. What's the use-case in which this is harder to use or implement?

If the specification were to require URI resolution against the document, you now have two separate notions of a resource, a JSON Schema document, and a JSON Schema object, and each JSON Schema object needs to point to the JSON Schema document that it's contained in, in order to resolve references to another JSON Schema object.

If you specify an "id", then it should be possible to resolve that URI and get a byte-for-byte identical schema. And if you don't specify an "id" in a sub-schema, then the URI base doesn't change, and this doesn't impact you (the fragment effectively changes, but the fragment is never considered in a URI base being resolved against).

Note on "byte-for-byte identical": If the "id" attribute is relative and changes directory, it will be wrong when resolved against the URL used to download it, no matter what method is used to resolve a URI when nested. The solution is to avoid directory-changing relative paths in id attributes, or, if you serve a JSON document, to remove the "id" attribute or adjust it to be consistent at serve-time.

Geraint (David)

unread,
Dec 22, 2012, 2:16:52 PM12/22/12
to json-...@googlegroups.com
Of course, the above confusion could be fixed by defining the "author" link in the example to have "href": "/{id}/author", which would be unambiguous - it would always refer to the author for that specific section.

I hope that my example illustrates the issues.
  1. If we resolve relative to the document root, then the addition of a schema does not affect the links defined by any other schemas.  However, it also means that the link definitions have to be more complicated - the URI for the "author" link cannot just be "author", it has to specify "/{id}/author" if it wants to refer to the author for that specific section.
  2. On the other hand, if we take rel="self" links into account, simple links like the "author" one are more concise.  However, we have to perform a sweep up the data hierarchy looking for "self" links, and therefore adding another schema (that might define additional "self" links) can change the resolved URIs of existing links.
Geraint

Austin Wright

unread,
Dec 22, 2012, 2:32:14 PM12/22/12
to json-...@googlegroups.com

On Saturday, December 22, 2012 9:11:55 AM UTC-7, fge wrote:

Sorry but to my eyes it does. Care to give a counter example?

Alright: Defining how a fragment works is something assigned to a particular media type. While the JSON Pointer draft does say a pointer can be included as part of a URI, it doesn't do so in any formal terms that would make it a fragment identifier for application/json.

Some specifications that do this are http://tools.ietf.org/html/rfc2854 and http://tools.ietf.org/html/rfc5147 .

I'd like to see the JSON Pointer draft include this language, or create a new specification to do so.

Francis Galiegue

unread,
Dec 22, 2012, 2:47:19 PM12/22/12
to json-...@googlegroups.com
On Sat, Dec 22, 2012 at 8:15 PM, Austin Wright
<diamon...@users.sourceforge.net> wrote:
[...]
>
>
> But I also make the point it makes using JSON Schema simpler. What's the
> use-case in which this is harder to use or implement?
>

This is in the eye of the beholder. While it may seem more simple to
_you_, it may not be so for others.

Francis Galiegue

unread,
Dec 22, 2012, 2:50:19 PM12/22/12
to json-...@googlegroups.com
On Sat, Dec 22, 2012 at 8:15 PM, Austin Wright
<diamon...@users.sourceforge.net> wrote:
[...]
>
> (the fragment effectively changes, but the fragment is never considered in a
> URI base being resolved against).
>

And that is wrong: fragments _do_ play a role in URI resolution
against other URIs, even if that means the fragment in the original is
effectively ignored.
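
A quick way to see both halves of this is to try it with Python's `urljoin`. This is a sketch only; the base URI is made up.

```python
from urllib.parse import urljoin

base = "http://example.com/schemas/root#/properties/a"

# The fragment of the base is dropped when a relative path is resolved...
path_only = urljoin(base, "other")

# ...while a fragment on the reference itself is kept.
path_and_fragment = urljoin(base, "other#/items")

# A fragment-only reference keeps the base's path and swaps the fragment.
fragment_only = urljoin(base, "#/items")

print(path_only)
print(path_and_fragment)
print(fragment_only)
```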

Austin Wright

unread,
Dec 22, 2012, 2:51:37 PM12/22/12
to json-...@googlegroups.com

On Saturday, December 22, 2012 12:47:19 PM UTC-7, fge wrote:

This is in the eye of the beholder. While it may seem more simple to
_you_, it may not be so for others.

Describe your objection, then?

I'm talking about the complexity of an implementation. Your implementation, that you linked to, requires two functions, a function to validate a document, and a function that recurses the schemas, maintaining a pointer back to the document. This is more complex than simply treating a nested schema as a document in its own right.

Francis Galiegue

unread,
Dec 22, 2012, 2:54:08 PM12/22/12
to json-...@googlegroups.com
On Sat, Dec 22, 2012 at 8:51 PM, Austin Wright
<diamon...@users.sourceforge.net> wrote:
[...]
>
>
> Describe your objection, then?
>
> I'm talking about the complexity of an implementation. Your implementation,
> that you linked to, requires two functions, a function to validate a
> document, and a function that recurses the schemas, maintaining a pointer
> back to the document. This is more complex than simply treating a nested
> schema as a document in its own right.
>

Again, implementation does not matter, what we want is consistency.
The more I listen to David's argument about "rel", which is a
fundamental feature of hyper schema, the more it seems that even
though I didn't see these aspects, my initial "out-of-the-wild"
decision was the good one.

Austin Wright

unread,
Dec 22, 2012, 3:04:05 PM12/22/12
to json-...@googlegroups.com

On Saturday, December 22, 2012 12:54:08 PM UTC-7, fge wrote:

Again, implementation does not matter, what we want is consistency.
The more I listen to David's argument about "rel", which is a
fundamental feature of hyper schema, the more it seems that even
though I didn't see these aspects, my initial "out-of-the-wild"
decision was the good one.

Implementation complexity shouldn't be regarded, correct; developer complexity is a concern. Which is why I presented a definition of consistency as a desired feature of a JSON Schema validator. Your implementation is not consistent with it.
 
Changing the meaning of the tags to rel-links doesn't change the fact that URIs have to be resolved. How to go about it hasn't changed.

Francis Galiegue

unread,
Dec 22, 2012, 3:06:39 PM12/22/12
to json-...@googlegroups.com
On Sat, Dec 22, 2012 at 9:04 PM, Austin Wright
<diamon...@users.sourceforge.net> wrote:
[...]
>
> Implementation complexity shouldn't be regarded, correct, developer
> complexity is a consern. Which is why I presented a definition of
> consistency as a desired feature of a JSON Schema validator. Your
> implementation is not consistent with it.
>

And we fundamentally disagree on what "consistent" is. I find my
initial decision plenty consistent, and have so far not found any
argument telling otherwise.

But again, this all boils down on what section 5.23 of draft v3
defines as "parent", and we still do not have the original authors'
view on the matter.

Francis Galiegue

unread,
Dec 22, 2012, 3:12:28 PM12/22/12
to json-...@googlegroups.com
On Sat, Dec 22, 2012 at 9:06 PM, Francis Galiegue <fgal...@gmail.com> wrote:
[...]
>
> And we fundamentally disagree on what "consistent" is. I find my
> initial decision plenty consistent, and have so far not found any
> argument telling otherwise.
>

Just for the record: with my interpretation, in a schema, { "$ref":
"#" } always refers to the same content, whereas with yours, it
depends on where that $ref is. I find my interpretation consistent and
not yours. What is more, it _still_ refers to the same thing if you
don't trust id, which you may very well choose to do.

If that is not consistent, I don't know what is.
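
Here is a small Python sketch of the difference. The schema, the `hash_ref_targets` helper and its `follow_id` switch are all invented for illustration; it only reports which document each { "$ref": "#" } ends up pointing at.

```python
from urllib.parse import urldefrag, urljoin

document_uri = "http://example.com/schemas/"
schema = {
    "id": "http://example.com/schemas/",
    "items": {
        "id": "http://example.com/schemas/subSchemas/schema2",
        "allOf": [{"$ref": "#"}],
    },
    "additionalItems": {"$ref": "#"},
}


def hash_ref_targets(node, base, follow_id, found):
    """Collect the document URI that each {"$ref": "#"} resolves to."""
    if isinstance(node, dict):
        # Only the "nearest id" interpretation lets a nested "id" change the base.
        if follow_id and isinstance(node.get("id"), str):
            base = urljoin(base, node["id"])
        if node.get("$ref") == "#":
            found.append(urldefrag(urljoin(base, "#")).url)
        for value in node.values():
            hash_ref_targets(value, base, follow_id, found)
    elif isinstance(node, list):
        for value in node:
            hash_ref_targets(value, base, follow_id, found)
    return found


by_document = hash_ref_targets(schema, document_uri, False, [])
by_nearest_id = hash_ref_targets(schema, document_uri, True, [])
print(by_document)
print(by_nearest_id)
```

With `follow_id` off, every "#" lands on the same document; with it on, the answer depends on where the reference sits.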

Austin Wright

unread,
Dec 22, 2012, 3:36:39 PM12/22/12
to json-...@googlegroups.com


On Saturday, December 22, 2012 12:08:35 PM UTC-7, Geraint (David) wrote:
OK - I mentioned it in the other thread, so I'll explain it here.

The behaviour of "id" is simply the behaviour of a link with relation "self".  Similarly, a "$ref" just defines a link with relation "full".  So the behaviour we choose for "id"/"$ref" will affect the behaviour of links more generally.

Let's assume that "id" should mean a "self" link. That means you're defining an equivalence between the schema at the URL you downloaded (the subject of the statement, e.g. <http://json-schema.org/schema>) and the target/object of the statement (e.g. <http://json-schema.org/draft-03/schema#>). This doesn't help us any in determining what sets the base URI of a schema. If the "id" attribute does not set the URI base in sub-schemas, then to be consistent it must not set the URI base in the root schema, either, and only the document URL should be used.
Problem: the "href" attribute is a template, not a URI to be resolved in the schema. It's applied to the instance, and then resolved relative to that instance's URL base.
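
The two steps (fill in the template from the instance, then resolve) might look like this in Python. The `expand` helper is a deliberately simplistic stand-in that I made up for the example, not a full template processor; the instance data is taken from the earlier example in this thread.

```python
import re
from urllib.parse import quote, urljoin


def expand(template: str, instance: dict) -> str:
    """Replace each {name} with the percent-encoded value from the instance."""
    return re.sub(
        r"\{(\w+)\}",
        lambda match: quote(str(instance[match.group(1)]), safe=""),
        template,
    )


instance_base = "http://example.com/just-some-document/"
section = {"id": "just-some-document-subsection-1", "title": "Subsection #1"}

href = expand("/{id}/", section)         # step 1: apply the template to the instance
self_uri = urljoin(instance_base, href)  # step 2: resolve against the instance's base
print(self_uri)
```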

Benjamin

unread,
Dec 22, 2012, 6:42:08 PM12/22/12
to json-...@googlegroups.com
I'm quite new to JSON schema in general (wanting to use it for a project, thus learning all I can about it), so I'm confident there are some nuances I'm missing and my perspective is obviously rather naive. However, in my observation, a schema is rather similar to a variable scope. Assuming this is a good (enough) metaphor, the "id" URI could be considered to be defining a new scope, and all relative links would be referenced against that scope.

Francis Galiegue

unread,
Dec 23, 2012, 8:31:07 AM12/23/12
to json-...@googlegroups.com
I like this notion of scope, especially since it is compatible with
the "don't trust id" way of doing things. If we define a scope as
being:

* a URI (whether this URI be absolute or not, it does not matter),
* a schema,
* a parent scope,

then we can tell that a JSON Reference is to be resolved relatively to
the current scope, and also define that a resolved reference (ie, a
URI) is within the current scope if the resolved reference without a
fragment part is equal to the current scope's URI without a fragment
part. It would then all be down to fragment resolution.
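
A minimal sketch of that definition in Python, to check that it hangs together. The `Scope` class and its method names are assumptions for illustration, not taken from any implementation.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urldefrag, urljoin


@dataclass
class Scope:
    uri: str                         # may be absolute or relative
    schema: dict
    parent: Optional["Scope"] = None

    def resolve(self, ref: str) -> str:
        """Resolve a JSON Reference relative to this scope's URI."""
        return urljoin(self.uri, ref)

    def contains(self, resolved: str) -> bool:
        """True if the resolved URI differs from the scope URI only by fragment."""
        return urldefrag(resolved).url == urldefrag(self.uri).url


root = Scope("http://example.com/schemas/", {"type": "array"})
inner = Scope("http://example.com/schemas/subSchemas/schema2",
              {"type": "object"}, parent=root)

local = inner.resolve("#/properties/a")  # stays in scope: fragment resolution only
remote = inner.resolve("schema3")        # leaves the scope: needs a lookup elsewhere
print(local, inner.contains(local))
print(remote, inner.contains(remote))
```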

Geraint (David)

unread,
Dec 24, 2012, 4:56:55 AM12/24/12
to json-...@googlegroups.com
Hmm - variable scope is a very good way of thinking about this.

I'd like to mention how this would work for links more generally -  it would mean that the presence of a rel="self" link for an instance also provides a "URI resolution scope" for that instance and its children, with behaviour when two or more competing rel="self" links are present being undefined (and therefore discouraged).  I'm quite happy with this solution, and to put wording in the hyper-schema that cements it, if people are generally OK with that.

Geraint.

Francis Galiegue

unread,
Dec 24, 2012, 11:12:52 AM12/24/12
to json-...@googlegroups.com
On Mon, Dec 24, 2012 at 10:56 AM, Geraint (David) <gerai...@gmail.com> wrote:
> Hmm - variable scope is a very good way of thinking about this.
>
> I'd like to mention how this would work for links more generally - it would
> mean that the presence of a rel="self" link for an instance also provides a
> "URI resolution scope" for that instance and its children, with behaviour
> when two or more competing rel="self" links are present being undefined (and
> therefore discouraged). I'm quite happy with this solution, and to put
> wording in the hyper-schema that cements it, if people are generally OK with
> that.
>

So, define schema scope and instance scope? How does pathStart come
into play here?

Geraint (David)

unread,
Dec 31, 2012, 5:48:30 AM12/31/12
to json-...@googlegroups.com
pathStart doesn't have anything to do with URI resolution.

OK - my current understanding of our conclusion is that even if "id" is not trusted, then it still defines a new URI resolution scope for the schema.

This sounds great, and I'll put language in the hyper-schema so that behaviour for links in general matches this.

Geraint