Also note that id-based referencing can't be used to reference arrays
(obviously arrays can't declare their identity).
Kris
You received this message because you are subscribed to the Google Groups "JSON Schema" group.
-Matt
* Merging identity and referencing properties - In json-ref, there are
separate properties for referencing an object and self-identification of
an object. You are suggesting that an identity property be used for
both. The fundamental problem with this approach is that it reduces the
information available for the linker and creates ambiguities that are
more difficult to deal with. For example, if we had
[{id:"foo"},{id:"foo"}], it is not clear if these are both references to
a "foo" object defined elsewhere, or if one of these is the definition
of "foo" (wouldn't matter which one). By using a separate reference
property, referencing is explicit, and the interpreter knows if the
object is really an empty object (except for the identity property) or if
it is referencing some other object (that the interpreter may or may not
be able to retrieve).
* Using non-URI values for identity - I am still not sure why relative
URIs are not suitable for the id-based referencing from the examples.
Relative URIs free you from needing to know anything about the parent
context, and you can use identities just as you describe. You can come
up with your own internal URI scheme (that would never be exposed), if
you want to resolve relative identifiers to full URIs. As Ganesh pointed
out, URIs include the subset of identifiers that are non-locatable
(URNs) as well. If you want to only resolve objects within the current
document ({id:"foo",me:{$ref:"foo"}}), this is perfectly fine with URIs,
and if you don't want to handle references outside the current document
({id:"foo",me:{$ref:"bar"}}), it is still fine with URIs, you just throw
an error indicating that you can't locate the target object
(you have to throw either way, I would assume). Are there situations
where URIs are not adequate for the needs of id-based references?
* Resolving references prior to validation - This is certainly critical
for correct instance validation and is not addressed properly right now.
However, I would think this could be resolved by including language in
the specification that any links that use the "full" relation should be
resolved and substituted prior to validation. Would that be sufficient?
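
To make the first point concrete, here is a minimal sketch of how an interpreter might classify objects when referencing and identity use separate properties. It assumes the "$ref" and "id" property names used in the examples in this thread; the classify helper is hypothetical and not part of any specification.

```python
def classify(obj):
    """Classify a JSON object under the separate-property convention.

    Assumption: "$ref" marks a reference and "id" marks a definition,
    as in the json-ref examples in this thread.
    """
    if "$ref" in obj:
        return "reference"
    if "id" in obj:
        return "definition"
    return "anonymous"


# Merged convention: both objects look identical, so nothing distinguishes
# a definition of "foo" from a reference to it.
merged = [{"id": "foo"}, {"id": "foo"}]

# Separate convention: the second object is explicitly a reference.
explicit = [{"id": "foo"}, {"$ref": "foo"}]

merged_kinds = [classify(o) for o in merged]
explicit_kinds = [classify(o) for o in explicit]
print(merged_kinds)
print(explicit_kinds)
```

With the merged form, both entries necessarily get the same classification, which is the ambiguity described above; with an explicit reference property the linker can tell them apart without any further context.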
Thanks,
Kris
Yep - thanks for splitting it up - in retrospect separate emails would have been better
> * Merging identity and referencing properties - In json-ref, there are
> separate properties for referencing an object and self-identification of
> an object. You are suggesting that an identity property be used for
> both. The fundamental problem with this approach is that it reduces the
> information available for the linker and creates ambiguities that are
> more difficult to deal with. For example, if we had
> [{id:"foo"},{id:"foo"}], it is not clear if these are both references to
> a "foo" object defined elsewhere, or if one of these is the definition
> of "foo" (wouldn't matter which one). By using a separate reference
> property, referencing is explicit, and the interpreter knows if the
> object is really empty object (except for the identity property) or if
> it is referencing some other object (that the interpreter may or may not
> be able to retrieve.
The idea is that the schema, when talking at the level of pure JSON data, just dictates what it means for two instances of this schema to be considered equal. That is enough to describe how to serialize and deserialize bits of graph-based data in a way that doesn't require a root-relative addressing mechanism (see further on for the problem with this).
It would be up to the application to determine how much resolution it attempts to do when looking for instances to equate in the object graph, but it need not limit itself to those present in the document itself. In particular, many applications might want to use hyper-schema to find out exactly what resolution-by-id mechanisms are available for instances of that schema, and then use these in all cases to find the canonical full version of the object to use when joining up the graph. Or, for example, server-side apps might want to look up {id: 1234} in the database, and then switch the canonical full version into the graph before validating, to save clients from having to supply all the non-optional properties for some schema in order to communicate which instance they're talking about when the server already maintains the canonical database for objects of this schema.
That's in keeping with a desire to separate the description of data (in particular the description of data which comes in a graph, not just a tree), which is quite a pure and beautiful domain, from metadata relating to hypertext mechanisms that might be used to resolve that data. It seemed you were already keen on this, based on the json-schema / hyper-schema separation.
That said I realise that this makes the semantics for resolving links a little less obvious / more application-defined, but I think hyper-schema is the place to define the precise application-specific object resolution semantics (since hypertext is exactly what this is!).
So, I think it would be wonderful if there were two layers:
* Some general core concepts which allow json-schema to describe the serialization and deserialization of graph-based data based on identity
* Extra hypertext metadata which pins down exactly how resolution is performed and allows precise hypertext-reference-style semantics where desired
If you think my suggestion doesn't do enough to achieve the latter I'm happy to work on it
> * Using non-URI values for identity - I am still not sure why relative
> URIs are not suitable for the id-based referencing from the examples.
> Relative URIs free you from needing to know anything about the parent
> context, and you can use identities just as you describe. You can come
> up with your own internal URI schema (that would never be exposed)
I think there's some confusion here between relative URIs like in your example ("foo") and those with a fragment identifier ("#foo")
The former kind of relative URI, "foo", means approximately "a file called foo in the same directory as this file", is only really defined for URL protocols which have filesystem-like path components (HTTP, FTP, …), and very much depends on the parent context.
To explain the problem with them by an analogy, consider why using this kind of relative URL in HTML websites is such a pain in the bum - if you ever need to move some of the HTML files around in the directory structure, it can break the relative URLs (unless all the referenced files move alongside it); and perhaps more critically, if you want to cache some HTML snippet for use at multiple different URLs, you're out of luck because the relative URIs contained within it will mean different things in the different parent URI contexts.
So, I'm guessing you actually meant to suggest eg "#foo":
[{uri: "#foo"}, {uri: "#foo"}]
Which does work a bit better, because the meaning of these links, when interpreted as relative URIs, no longer varies depending on the URL of the tree they're contained within.
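
As an illustration of the difference between the two kinds of relative reference, here is a small sketch using Python's standard urllib.parse.urljoin, which follows the generic relative-resolution rules. The base URLs are made-up examples, not anything from json-ref.

```python
from urllib.parse import urljoin

# Two hypothetical locations at which the same JSON snippet might be served.
base_a = "http://example.com/articles/index"
base_b = "http://example.com/cache/snippets/index"

# A path-style relative reference resolves to a sibling resource,
# and that sibling differs depending on where the snippet lives.
path_a = urljoin(base_a, "foo")
path_b = urljoin(base_b, "foo")

# A fragment-only reference always stays inside the containing document.
frag_a = urljoin(base_a, "#foo")
frag_b = urljoin(base_b, "#foo")

print(path_a, path_b)
print(frag_a, frag_b)
```

Under generic URI resolution, "foo" names a different resource for each base, whereas "#foo" always points back into whichever document contains it.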
> Are there situations
> where URIs are not adequate for the needs of id-based references?
BUT, and this is a big but, their meaning still depends on the context of what is considered the root of the tree they're to be used in. This makes it impossible to cache some fragment of JSON for use in different positions in different parent trees.
That's a big problem for me, and I imagine for others, because we use caching of JSON fragments heavily to help scale our web services.
Here's the simplest example I can think of to illustrate the problem (admittedly rather trivial, but I can construct a more real-world example too if you don't mind it being more verbose)
Say I want an object foo with a link to itself. I want a snippet of JSON for this, which I can re-use in whatever context I like, and I want it to mean the same thing (an object with a link to itself) wherever I use it. So I write:
{"foo": {"$ref": "#"}}
Using a relative URI to refer to myself. This works because I know that (in this instance at least!) I am the root of the tree.
But now, let's say I want to re-use this JSON snippet within another tree:
{"some": {"other": {"tree": {"foo": {"$ref": "#"}}}}}
My object no longer refers to itself, but to the new root object. It means something different; to fix it I would have to re-write the snippet to:
{"some": {"other": {"tree": {"foo": {"$ref": "#some.other.tree"}}}}}
But the precise reason I'm caching these snippets is so I don't have to fiddle around re-generating them for every context I want to use them in :) So this is a bit of a deal-breaker for me when it comes to using "path-from-root" linking rather than identity-based linking.
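
The snippet-reuse problem above can be reproduced with a minimal sketch. It assumes the dot-separated path syntax used in these examples ("#", "#some.other.tree"); resolve_path_ref is a hypothetical helper, not a json-ref implementation.

```python
def resolve_path_ref(root, ref):
    """Resolve a root-relative reference such as "#" or "#a.b.c".

    Assumption: dot-separated property names after "#", as in the
    examples in this thread.
    """
    if not ref.startswith("#"):
        raise ValueError("only root-relative references are handled here")
    target = root
    path = ref[1:]
    for key in (path.split(".") if path else []):
        target = target[key]
    return target


snippet = {"foo": {"$ref": "#"}}

# Standalone: the snippet is the root, so "#" points back at the snippet.
standalone_target = resolve_path_ref(snippet, snippet["foo"]["$ref"])

# Embedded: the very same snippet reused inside a larger tree.
tree = {"some": {"other": {"tree": snippet}}}
embedded_target = resolve_path_ref(tree, snippet["foo"]["$ref"])

# Only a rewritten, context-specific path reaches the snippet again.
rewritten_target = resolve_path_ref(tree, "#some.other.tree")

print(standalone_target is snippet)
print(embedded_target is snippet)
print(rewritten_target is snippet)
```

The unchanged "#" reference silently changes target once the snippet is embedded, which is why a cached fragment cannot carry root-relative references safely.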
> * Resolving references prior to validation - This is certainly critical
> for correct instance validation and is not addressed properly right now.
> However, I would think this could be resolved by including language in
> the specification that any links that use the "full" relation should be
> resolved and substituted prior to validation. Would that be sufficient?
Something along those lines should work, yep, although then it no longer really makes sense to have your metaschema extend a schema for json-ref, because the json-refs are expanded before the meta-schema validates itself; the raw reference objects themselves are never validated, at least not during the main validation pass, since they need to be expanded beforehand for the validation process to be correctly defined.
That's kinda what I meant about having to link up the references as a first pass for the meta-schema and the json-schema spec to make full logical sense.
-Matt
> The former kind of relative URI, "foo", means approximately "a file called foo in the same directory as this file", is only really defined for URL protocols which have filesystem-like path components (HTTP, FTP, …), and very much depends on the parent context.
>
I don't see anything in http://www.ietf.org/rfc/rfc3986.txt that forbids
the use of relative URIs in the context of URIs that do not have a
location mechanism. It seems perfectly valid to use relative URIs even
if the identity and references are completely contained in the document.
> To explain the problem with them by an analogy, consider why using this kind of relative URL in HTML websites is such a pain in the bum - if you ever need to move some of the HTML files around in the directory structure, it can break the relative URLs (unless all the referenced files move alongside it); and perhaps more critically, if you want to cache some HTML snippet for use at multiple different URLs, you're out of luck because the relative URIs contained within it will mean different things in the different parent URI contexts.
>
> So, I'm guessing you actually meant to suggest eg "#foo":
>
> [{uri: "#foo"}, {uri: "#foo"}]
>
> Which does work a bit better, because the meaning of these links, when interpreted as relative URIs, no longer varies depending on the URL of the tree they're contained within.
>
No, I meant "foo", not "#foo". "foo" is an identity-based reference,
"#foo" is a root-relative JSON path reference.
>
>> Are there situations
>> where URIs are not adequate for the needs of id-based references?
>>
>
> BUT, and this is a big but, their meaning still depends on the context of what is considered the root of the tree they're to be used in. This makes it impossible to cache some fragment of JSON for use in different positions in different parent trees.
>
> That's a big problem for me, and I imagine for others, because we use caching of JSON fragments heavily to help scale our web services.
>
> Here's the simplest example I can think of to illustrate the problem (admittedly rather trivial, but I can construct a more real-world example too if you don't mind it being more verbose)
>
> Say I want an object foo with a link to itself. I want a snippet of JSON for this, which I can re-use in whatever context I like, and I want it to mean the same thing (an object with a link to itself) wherever I use it. So I write:
>
> {"foo": {"$ref": "#"}}
>
> Using a relative URI to refer to myself. This works because I know that (in this instance at least!) I am the root of the tree.
>
> But now, let's say I want to re-use this JSON snippet within another tree:
>
> {"some": {"other": {"tree": {"foo": {"$ref": "#"}}}}}
>
> My object no longer refers to itself, but to the new root object. It means something different; to fix it I would have to re-write the snippet to:
>
> {"some": {"other": {"tree": {"foo": {"$ref": "#some.other.tree"}}}}}
>
> But the precise reason I'm caching these snippets is so I don't have to fiddle around re-generating them for every context I want to use them in :) So this is a bit of a deal-breaker for me when it comes to using "path-from-root" linking rather than identity-based linking.
>
>
Yes, I agree, that is why I was not suggesting fragment identifiers.
>> * Resolving references prior to validation - This is certainly critical
>> for correct instance validation and is not addressed properly right now.
>> However, I would think this could be resolved by including language in
>> the specification that any links that use the "full" relation should be
>> resolved and substituted prior to validation. Would that be sufficient?
>>
> Something along those lines should work, yep, although then it no longer really makes sense to have your metaschema extend a schema for json-ref, because the json-refs are expanded before the meta-schema validates itself; the raw reference objects themselves are never validated, at least not during the main validation pass, since they need to be expanded beforehand for the validation process to be correctly defined.
>
> That's kinda what I meant about having to link up the references as a first pass for the meta-schema and the json-schema spec to make full logical sense.
>
Not sure I exactly follow. Are you saying that JSON Schema can't
properly be interpreted as relying on json-ref via metaschemas, because
one would need to understand json-ref in order to interpret the json-ref
schema? I agree, although that simply seems like it would necessitate
some bootstrapping information about how to do the initial reference
resolution/substitution (before reaching json-ref, which would simply
"agree" with the current bootstrap resolution mechanism). Or do you mean
something else?
--
Thanks,
Kris
> No, I meant "foo", not "#foo". "foo" is a identity based reference,
> "#foo" is a root-relative JSON path reference.
OK, well that's great news in that the intention is to support identity-based referencing after all. So I misunderstood somewhat and we've been talking past each other on this. But there are some problems to be resolved with the syntax for it to work in terms of relative URIs in the URI spec, which contributed to this misunderstanding.
I understood 'foo' as a relative URL to mean "a file called foo in the same directory as this file" (which is how it's interpreted for example in HTML). Relative URI references do appear to be defined in the URI spec in this sense, and only for hierarchical URI schemes:
A relative reference (Section 4.2) refers to a resource by describing
the difference within a hierarchical name space between the reference
context and the target URI. The reference resolution algorithm,
presented in Section 5, defines how such a reference is transformed
to the target URI. As relative references can only be used within
the context of a hierarchical URI, designers of new URI schemes
should use a syntax consistent with the generic syntax's hierarchical
components unless there are compelling reasons to forbid relative
referencing within that scheme.
Whereas it appears you're using it to mean an identity-based fragment identifier within the same document, analogous to <a href="#section-1"> in an HTML page to jump to <h2 id="section-1">.
As I read the URI spec, you'd need to identify this as a fragment identifier via a preceding '#' for it to parse as a relative URI with the semantics you want (resolving a fragment within the current document, rather than resolving to a separate file which is referred to relative to the current file's namespace in a hierarchical URI scheme).
So yeah the confusion arises from the fact that (as I understand it, but I may again misunderstand) you effectively want to support two different forms of fragment-based identifier: an identity-based fragment identifier (like the way HTML defines fragment identifiers - the #section-1 example above) and a path-based identifier (like #path.from.root.to.object)
So you could fix it by, for example:
* Using the '#' in both cases, but add some syntax to the fragment identifier after the '#' to distinguish these two cases, eg #id:foo vs #path:the.path.to.foo
* Adding something explicitly to the spec for json-ref to explain when a ref is to be treated as an identity-based fragment identifier, and when it's to be treated as a path-based fragment identifier, rather than just calling it a URI and relying on the semantics of the URI spec to communicate the distinction
* ...
Anyway yeah I think precisely specifying these semantics for JSON-ref, and making sure they're compatible with the URI spec where appropriate, would avoid any future misunderstandings :)
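
For the first option, a rough sketch of what distinguishing the two forms could look like. The "#id:" and "#path:" prefixes are only the illustrative syntax floated above, not anything json-ref defines, and parse_fragment is a hypothetical helper.

```python
def parse_fragment(ref):
    """Split a reference like "#id:foo" or "#path:a.b" into (kind, value).

    Assumption: the "#id:" / "#path:" prefixes are a suggested syntax
    from this thread, not part of any published specification.
    """
    if not ref.startswith("#"):
        raise ValueError("expected a fragment identifier starting with '#'")
    kind, sep, value = ref[1:].partition(":")
    if not sep or kind not in ("id", "path"):
        raise ValueError(f"unrecognised fragment syntax: {ref!r}")
    return kind, value


by_id = parse_fragment("#id:foo")
by_path = parse_fragment("#path:the.path.to.foo")
print(by_id, by_path)
```

An explicit prefix like this would let a client decide which resolution strategy to apply without guessing from the shape of the string.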
> Not sure I exactly follow, are you are saying that JSON Schema can't
> properly be interpreted to rely json-ref based on metaschemas because
> one would need to understand json-ref in order to interpret the json-ref
> schema? I agree, although that simply seems like it would necessitate
> some bootstrapping information about how to do the initial reference
> resolution/substition (before reaching json-ref, which would simply
> "agree" with the current bootstrap resolution mechanism). Or do you mean
> something else?
Yep that's pretty much what I mean. And yeah it should be relatively easy to fix by defining a bootstrapping phase to get you from json-ref to an object graph, before going on to define the rest of the json-schema semantics in terms of those object graphs. Personally I think it wouldn't hurt to specify json-ref as a distinct underlying standard, cos some people might wanna use it without the schema stuff, but i'm not overly bothered either way.
Note that both the schema object itself, and the object which is being validated, need JSON-refs expanding prior to validation; you might also need to specify exactly how validation is defined in the case of cyclic references / a general object graph, in the schema and in the object being validated. Eg once a node of the graph has been visited once by the validator, it's marked as valid and doesn't need to be visited again. Maybe that goes without saying, but it wouldn't hurt to specify precisely, in case it reveals any awkward corner-cases with extending the spec for validation from trees to graphs.
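
A minimal sketch of the visited-once idea for validating a cyclic object graph. The check callback stands in for whatever per-object schema validation applies; validate_graph and has_name are hypothetical names, and marking a node as visited before its check completes is one possible design choice, not something the spec currently states.

```python
def validate_graph(node, check, visited=None):
    """Apply `check` to every dict in an object graph exactly once.

    Nodes are tracked by object identity, so cycles terminate: a node
    already seen is treated as handled and is not descended into again.
    """
    if visited is None:
        visited = set()
    if not isinstance(node, (dict, list)):
        return True  # scalars have nothing to descend into
    if id(node) in visited:
        return True
    visited.add(id(node))
    if isinstance(node, dict):
        if not check(node):
            return False
        children = list(node.values())
    else:
        children = node
    return all(validate_graph(child, check, visited) for child in children)


# Two objects that refer to each other, as after reference expansion.
a = {"name": "a"}
b = {"name": "b", "peer": a}
a["peer"] = b

checked_names = []


def has_name(obj):
    checked_names.append(obj.get("name"))
    return isinstance(obj.get("name"), str)


ok = validate_graph(a, has_name)
print(ok, checked_names)
```

Each object is checked once even though the graph is cyclic; whether an in-progress node should count as provisionally valid is exactly the kind of corner case that would need pinning down.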
About the role of the json-ref schema: you could still usefully have a schema file for json-ref, and refer to it in the spec to help describe the pre-conditions for the bootstrapping phase. But that description would be in a circular fashion, given that the json-ref bootstrapping needs to be defined before you can define how to interpret the json-ref schema. So I think a plain english description would be needed too.
> (before reaching json-ref, which would simply "agree" with the current bootstrap resolution mechanism)
I don't think schemas (whose validation semantics is defined on the object graph not on the tree-with-reference-object-stubs) could meaningfully extend that json-ref schema via the semantics of the normal extension mechanism; validation against the parent json-ref schema would happen on the object graph which would no longer contain the references, so it wouldn't be meaningful.
Does that make any sense?
-Matt
On 6/3/2010 9:33 AM, Matthew Willson wrote:
>>> The former kind of relative URI, "foo", means approximately "a file called foo in the same directory as this file", is only really defined for URL protocols which have filesystem-like path components (HTTP, FTP, …), and very much depends on the parent context.
>>>
>
>> No, I meant "foo", not "#foo". "foo" is a identity based reference,
>> "#foo" is a root-relative JSON path reference.
>>
> OK, well that's great news in that the intention is to support identity-based referencing after all. So I misunderstood somewhat and we've been talking past each other on this. But there are some problems to be resolved with the syntax for it to work in terms of relative URIs in the URI spec, which contributed to this misunderstanding.
>
> I understood 'foo' as a relative URL to mean "a file called foo in the same directory as this file" (which is how it's interpreted for example in HTML). Relative URI references do appear to be defined in the URI spec in this sense, and only for hierarchical URI schemes:
>
> A relative reference (Section 4.2) refers to a resource by describing
> the difference within a hierarchical name space between the reference
> context and the target URI. The reference resolution algorithm,
> presented in Section 5, defines how such a reference is transformed
> to the target URI. As relative references can only be used within
> the context of a hierarchical URI, designers of new URI schemes
> should use a syntax consistent with the generic syntax's hierarchical
> components unless there are compelling reasons to forbid relative
> referencing within that scheme.
>
>
Just to be clear, this doesn't specify anywhere that the sibling with
the hierarchical URI needs to be a file. URIs are for specifying
"resources" which is a more general concept (a file is just one type of
thing that can be a resource).
> Whereas it appears you're using it to mean an identity-based fragment identifier within the same document, analogous to <a href="#section-1"> in an HTML page to jump to <h2 id="section-1">.
>
> As I read the URI spec, you'd need to identify this as a fragment identifier via a preceding '#' for it to parse as a relative URI with the semantics you want (resolving a fragment within the current document, rather than resolving to a separate file which is referred to relative to the current file's namespace in a hierarchical URI scheme).
>
No, the identifier based referencing is not intended to be a fragment
style link. Rather the identifier (the property with the relation of
"self", which is the "id" property in json-ref) is defining the
resource, thus indicating it can be referenced as a resource on its own,
even though it is embedded in the representation of another resource.
Therefore when we have an instance of a json-ref schema:
{
  "something": {
    "id": "foo",
    "me": {"$ref": "foo"}
  }
}
We are saying that the inner object (the property value of something) is
a resource on its own, that is identified by the relative URI "foo".
This could have a resolution to a full URI of whatever://foo, but if it
can be resolved internally, then the full URI doesn't actually need to
be made visible or really computed. The reference is then linking to the
resource "foo", which has already been defined. The resource "foo" is
indeed embedded in the resource representation above (and thus could be
referenced by fragments), but fragments are not necessary since it is
also a separate resource itself. The "foo" resource is distinct from the
resource above (which hasn't been identified, and doesn't need to be)
even though it was embedded in the view.
Consequently the above block is a valid use of URIs and I believe
provides the type of identifier-based referencing that you are after.
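
A minimal sketch of how such identifier-based references might be linked up without ever computing a full URI. It assumes the "id" and "$ref" property names from the example; link_by_id is a hypothetical helper, and it only looks inside the current document, raising an error for anything it cannot locate.

```python
def link_by_id(document):
    """Replace {"$ref": id} stubs with the object that declares that id."""
    index = {}

    def collect(node):
        # First pass: record every object that declares an identity.
        if isinstance(node, dict):
            if "id" in node:
                index[node["id"]] = node
            for value in node.values():
                collect(value)
        elif isinstance(node, list):
            for value in node:
                collect(value)

    def resolve(value):
        if value["$ref"] not in index:
            raise LookupError(f"cannot locate target object {value['$ref']!r}")
        return index[value["$ref"]]

    def link(node):
        # Second pass: swap reference stubs for their targets in place.
        items = node.items() if isinstance(node, dict) else enumerate(node)
        for key, value in list(items):
            if isinstance(value, dict) and "$ref" in value:
                node[key] = resolve(value)
            elif isinstance(value, (dict, list)):
                link(value)

    collect(document)
    link(document)
    return document


doc = link_by_id({"something": {"id": "foo", "me": {"$ref": "foo"}}})
inner = doc["something"]

# The same snippet nested elsewhere still links to itself, unchanged.
nested = link_by_id(
    {"some": {"other": {"tree": {"id": "foo", "me": {"$ref": "foo"}}}}}
)
nested_inner = nested["some"]["other"]["tree"]
print(inner["me"] is inner, nested_inner["me"] is nested_inner)
```

Because the lookup goes through declared identities rather than positions in the tree, the snippet means the same thing wherever it is embedded.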
> So yeah the confusion arises from the fact that (as I understand it, but I may again misunderstand) you effectively want to support two different forms of fragment-based identifier: an identity-based fragment identifier (like the way HTML defines fragment identifiers - the #section-1 example above) and a path-based identifier (like #path.from.root.to.object)
>
> So you could fix it by, for example:
> * Using the '#' in both cases, but add some syntax to the fragment identifier after the '#' to distinguish these two cases, eg #id:foo vs #path:the.path.to.foo
> * Adding something explicitly to the spec for json-ref to explain when a ref is to be treated as an identity-based fragment identifier, and when it's to be treated as a path-based fragment identifier, rather than just calling it a URI and relying on the semantics of the URI spec to communicate the distinction
> * ...
>
> Anyway yeah I think precisely specifying these semantics for JSON-ref, and making sure they're compatible with the URI spec where appropriate, would avoid any future misunderstandings :)
>
>
>> Not sure I exactly follow, are you are saying that JSON Schema can't
>> properly be interpreted to rely json-ref based on metaschemas because
>> one would need to understand json-ref in order to interpret the json-ref
>> schema? I agree, although that simply seems like it would necessitate
>> some bootstrapping information about how to do the initial reference
>> resolution/substition (before reaching json-ref, which would simply
>> "agree" with the current bootstrap resolution mechanism). Or do you mean
>> something else?
>>
> Yep that's pretty much what I mean. And yeah it should be relatively easy to fix by defining a bootstrapping phase to get you from json-ref to an object graph, before going on to define the rest of the json-schema semantics in terms of those object graphs. Personally I think it wouldn't hurt to specify json-ref as a distinct underlying standard, cos some people might wanna use it without the schema stuff, but i'm not overly bothered either way.
>
> Note that both the schema object itself, and the object which is being validated, need JSON-refs expanding prior to validation; you might also need to specify exactly how validation is defined in the case of cyclic references / a general object graph, in the schema and in the object being validated. Eg once a node of the graph has been visited once by the validator, it's marked as valid and doesn't need to be visited again. Maybe that goes without saying, but it wouldn't hurt to specify precisely, in case it reveals any awkward corner-cases with extending the spec for validation from trees to graphs.
>
> About the role of the json-ref schema: you could still usefully have a schema file for json-ref, and refer to it in the spec to help describe the pre-conditions for the bootstrapping phase. But that description would be in a circular fashion, given that the json-ref bootstrapping needs to be defined before you can define how to interpret the json-ref schema. So I think a plain english description would be needed too.
>
>
>> (before reaching json-ref, which would simply "agree" with the current bootstrap resolution mechanism)
>>
> I don't think schemas (whose validation semantics is defined on the object graph not on the tree-with-reference-object-stubs) could meaningfully extend that json-ref schema via the semantics of the normal extension mechanism; validation against the parent json-ref schema would happen on the object graph which would no longer contain the references, so it wouldn't be meaningful.
>
> Does that make any sense?
>
I agree that we can't just rely on meta-schemas extending json-ref to
give us json-ref. It has to be articulated with English such that the
bootstrapping is sane. The extension of json-ref just creates a logical
consistency to agree with the pre-defined referencing strategy. But
having a referencing mechanism that can be described with English and
later described consistently with JSON Schema seems desirable.
--
Thanks,
Kris
> Just to be clear, this doesn't specify anywhere that the sibling with
> the hierarchical URI needs to be a file. URIs are for specifying
> "resources" which is a more general concept (a file is just one type of
> thing that can be a resource).
Yep - I was using file as a quicker (but technically incorrect) way of saying 'resource exposing a JSON representation'
>> Whereas it appears you're using it to mean an identity-based fragment identifier within the same document, analogous to <a href="#section-1"> in an HTML page to jump to <h2 id="section-1">.
>>
>> As I read the URI spec, you'd need to identify this as a fragment identifier via a preceding '#' for it to parse as a relative URI with the semantics you want (resolving a fragment within the current document, rather than resolving to a separate file which is referred to relative to the current file's namespace in a hierarchical URI scheme).
>
> No, the identifier based referencing is not intended to be a fragment
> style link. Rather the identifier (the property with the relation of
> "self", which is the "id" property in json-ref)
Just to be clear: so the semantics of json-ref depend on the 'self' concept from hyper-schema?
Is this specified anywhere?
This sounds like it might be heading in the direction I originally wanted to go, where rather than treating the 'id' itself as a URI, hyper-schema is used to translate it into a URI… but yeah I'm a bit confused now.
is defining the
resource, thus indicating it can be referenced as a resource on its own,
even though it is embedded in the representation of another resource.
Therefore when we have an instance of a json-ref schema:
{
  "something": {
    "id": "foo",
    "me": {"$ref": "foo"}
  }
}
We are saying that the inner object (the property value of something) is
a resource on its own, that is identified by the relative URI "foo".
Relative to what though?
Relative URLs are typically interpreted as being relative to the URL of the resource whose representation they are contained within. Certainly unless explicitly specified otherwise that is what I would expect. Again I'd give HTML as the canonical example of a hypertext media type containing relative URIs, where <a href="foo"> links to a resource foo within the same part of the hierarchical namespace of the containing resource, and not to "some-arbitrary-urn-scheme://foo"
Sorry, I know I seem like I'm nitpicking here, but this led me to a lot of confusion about json references and I'm still not convinced it's compatible with the URI spec.
If you want "foo", when parsed as a URI, to make sense as a relative URI which is not relative to the URI of the parent resource within a hierarchical URI scheme, but instead relative to some implicit URN scheme (like whatever:// in your example) that would need to be explicitly specified somewhere (either in the spec itself or perhaps via hyper-schema)
In particular some people might actually want to use a genuine hierarchically-relative URI, e.g. {"$ref": "1234"} to refer to /articles/1234 from /articles/index.
As a client, how am I to decide what to consider these URIs relative to? It seems you'd need something analogous to HTML's <base href="…"> tag to do this.
But all the commonly used URIs are hierarchical, and if you were using your own scheme for a set of non-locatable ids, there is no reason to shoot yourself in the foot and call it non-hierarchical. In practice, non-hierarchical is the exception, and really doesn't seem to even warrant much attention.
I'm also not convinced the URI spec allows for relative URIs relative to an arbitrary URN scheme unless that scheme is hierarchical ("relative references can only be used within the context of a hierarchical URI") and there is some base URI in that scheme which the relative URI can be considered as relative to.
This could have a resolution to a full URI of whatever://foo, but if it
can be resolved internally, then the full URI doesn't actually need to
be made visible or really computed. The reference is then linking to the
resource "foo", which has already been defined. The resource "foo" is
indeed embedded in the resource representation above (and thus could be
referenced by fragments), but fragments are not necessary since it is
also a separate resource itself. The "foo" resource is distinct from the
resource above (which hasn't been identified, and doesn't need to be)
even though it was embedded in the view.
Consequently the above block is a valid use of URIs and I believe
provides the type of identifier-based referencing that you are after.
I agree that we can't just rely on meta-schemas extending json-ref to
give us json-ref. It has to be articulated with English such that the
bootstrapping is sane. The extension of json-ref just creates a logical
consistency to agree with the pre-defined referencing strategy. But
having a referencing mechanism that can be described with English and
later described consistently with JSON Schema seems desirable.
Hm… I still maintain that schemas in general extending a json-ref schema doesn't make logical sense for the reasons I tried to explain. The relationship isn't "foo extends json-ref", it's "foo acts on already-processed object graphs obtained from trees which validate against json-ref", and the former just doesn't work as a stand-in for the latter, other than as a rather confusing bit of inbuilt documentation.
But I guess I'll just choose not to extend json-ref from my schemas :)
-- Thanks, Kris
Just to be clear: so the semantics of json-ref depend on the 'self' concept from hyper-schema?
Yes.
Is this specified anywhere?
This needs to be defined/specified/added to the spec.
The relative URIs definitely should be relative to the URI of the parent resource. When I suggested whatever://, that meant both the parent resource and the "foo" resource are within that scheme. And that is the point of using relative URIs: it doesn't actually even matter what the parent resource URI is if all the URIs are relative. Doing {"id": "foo", "me": {"$ref": "foo"}} means the same thing regardless of what the parent URI is (HTTP, FTP, or whatever). If the parent URI is http://site.com/parent then the identity and reference resolve to http://site.com/foo and a circular reference is created. If the parent URI is whatever://something/parent then the identity and reference resolve to whatever://something/foo and a circular reference is created. You can correctly evaluate the references without any knowledge of the parent URI (it doesn't even really need to exist); relative URIs free you from contextual dependence.
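A minimal sketch of that claim, assuming Python's standard `urllib.parse.urljoin` as the resolver. The helper name `resolve_pair` is made up for illustration, and an FTP parent stands in for "whatever" because `urljoin` only resolves relative references against schemes it knows to be hierarchical.

```python
from urllib.parse import urljoin

def resolve_pair(parent_uri, identity, reference):
    """Resolve an object's id and a $ref against the same parent URI."""
    return urljoin(parent_uri, identity), urljoin(parent_uri, reference)

for parent in ("http://site.com/parent", "ftp://site.com/parent"):
    id_uri, ref_uri = resolve_pair(parent, "foo", "foo")
    # Whatever the parent is, the id and the $ref land on the same URI,
    # so the reference points back at the object that declares the id.
    print(parent, "->", id_uri, id_uri == ref_uri)
```

The absolute URIs differ per parent, but the equality between identity and reference holds in both cases, which is all a linker needs.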
As a client, how am I to decide what to consider these URIs relative to? It seems you'd need something analogous to HTML's <base href="…"> tag to do this.
No, it can be relative to the parent document, just like HTML.
No, the important relationship isn't foo extends json-ref, it is foo is an instance of json-ref, so the "already-processed object graphs" are using the referencing rules defined by json-ref.
OK, will be interested to see this spec defined precisely.
Say we have an object {id: "foo"}, to which applies a hyper-schema defining a link relation: {"self": "uri://given-by-hyper-schema/{id}"}.
Should we attempt to treat "foo" itself as the URI for the object in question, relative to that of the parent resource (say http://example.com/given-by-namespace-of-parent-resource/foo) when hyper-schema defines the 'self' relation for this object as having a different URI ("uri://given-by-hyper-schema/foo")?
Which is the real URI, "http://example.com/given-by-namespace-of-parent-resource/foo" or "uri://given-by-hyper-schema/foo"? Which should be used for resolution of references? Why two different URIs?
OK so given that I see a reference to a relative URI 'foo', I should:
* First attempt to find an object in the current JSON tree defined with id: "foo", treating both 'foo's as being relative URIs relative to some undefined base URI of some undefined URI scheme and hence resolving both objects as the same instance having the same URI
* If this object exists then whatever representation might actually be served from http://foo.com/namespace/of/parent/resource/foo is irrelevant; even though the relative URI, taken relative to that of its parent, resolves to this URI I should make no attempt to fetch this URI, even though some resource might exist at this URI with a different, conflicting representation
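The two steps above can be sketched roughly as follows. This is only an illustration of the lookup order being asked about, not anything the json-ref draft specifies: `index_by_id`, `resolve_ref` and the `fetch` callback are hypothetical names, and the base URI is assumed.

```python
from urllib.parse import urljoin

def index_by_id(node, base_uri, index=None):
    """Walk a parsed JSON tree and map each resolved "id" to its object."""
    if index is None:
        index = {}
    if isinstance(node, dict):
        if isinstance(node.get("id"), str):
            index.setdefault(urljoin(base_uri, node["id"]), node)
        for value in node.values():
            index_by_id(value, base_uri, index)
    elif isinstance(node, list):
        for item in node:
            index_by_id(item, base_uri, index)
    return index

def resolve_ref(ref, tree, base_uri, fetch):
    """Prefer an object defined in the tree; fetch only if none is found."""
    target = urljoin(base_uri, ref)
    index = index_by_id(tree, base_uri)
    if target in index:
        return index[target]  # embedded definition wins, nothing is fetched
    return fetch(target)      # hypothetical retrieval callback

tree = {"something": {"id": "foo", "me": {"$ref": "foo"}}}
base = "http://foo.com/namespace/of/parent/resource/index"  # assumed base URI
```

With this ordering, `resolve_ref("foo", tree, base, fetch)` returns the embedded object and never calls `fetch`, which is exactly the behaviour whose trust implications are in question.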
-- Thanks, Kris
OK, will be interested to see this spec defined precisely.
Say we have an object {id: "foo"}, to which applies a hyper-schema defining a link relation: {"self": "uri://given-by-hyper-schema/{id}"}.
Should we attempt to treat "foo" itself as the URI for the object in question, relative to that of the parent resource (say http://example.com/given-by-namespace-of-parent-resource/foo) when hyper-schema defines the 'self' relation for this object as having a different URI ("uri://given-by-hyper-schema/foo")?
No, it should be based on the hyper-schema's target URI, which can be relative or absolute (as in this example). If the target is relative, it resolves in context to the parent resource, and if it is absolute, no context is needed.
The resource should also exist at the URI if it is a URL (locatable). It would obviously be undesirable for the target to be in conflict or non-existent, as the server would be providing inconsistent data. If you are dealing with data where one resource can't be trusted to assign a definition for another resource, then the resource should be fetched (this is discussed in the security section of the json schema spec).
Would you like to see the addition of a definition indicating properties that can be treated like anchor ids that can be referenced through fragment identifiers? I have been making sure that it is clear that that is not how rel="self" properties work, and it is slightly different from your pure identity proposal, but maybe that would be a good approach for preserving a pure URI referencing scheme while allowing for intra-document id-based referencing. If we did that, I think I would prefer to have an "anchor" attribute rather than an "identityProperties". For example, schema:
{
properties: {
id: {anchor: true},
$ref: {rel:"full", href:"{$ref}"}
}
}
instance:
{
id:"foo",
me:{$ref:"#foo"}
}
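A rough sketch of how a linker might treat such anchors, assuming the proposed "anchor" attribute simply marks which properties contribute to a per-document anchor namespace. The function names are hypothetical and nothing here is part of any draft.

```python
def collect_anchors(node, anchor_props, anchors=None):
    """Map each anchor value found in the tree to the object declaring it."""
    if anchors is None:
        anchors = {}
    if isinstance(node, dict):
        for prop in anchor_props:
            value = node.get(prop)
            if isinstance(value, (str, int)):
                anchors.setdefault(str(value), node)
        for value in node.values():
            collect_anchors(value, anchor_props, anchors)
    elif isinstance(node, list):
        for item in node:
            collect_anchors(item, anchor_props, anchors)
    return anchors

def resolve_fragment(ref, anchors):
    """Resolve a same-document reference such as "#foo"."""
    if not ref.startswith("#"):
        raise ValueError("not a same-document fragment reference: %r" % ref)
    return anchors[ref[1:]]

schema = {"properties": {"id": {"anchor": True}}}
anchor_props = [name for name, sub in schema["properties"].items()
                if sub.get("anchor")]
instance = {"id": "foo", "me": {"$ref": "#foo"}}
anchors = collect_anchors(instance, anchor_props)
target = resolve_fragment(instance["me"]["$ref"], anchors)
```

Here `target` is the instance object itself, so the reference is circular, and no URI outside the document is ever consulted.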
OK, will be interested to see this spec defined precisely.
Say we have an object {id: "foo"}, to which applies a hyper-schema defining a link relation: {"self": "uri://given-by-hyper-schema/{id}"}.
Should we attempt to treat "foo" itself as the URI for the object in question, relative to that of the parent resource (say http://example.com/given-by-namespace-of-parent-resource/foo) when hyper-schema defines the 'self' relation for this object as having a different URI ("uri://given-by-hyper-schema/foo")?
No, it should be based on the hyper-schema's target URI, which can be relative or absolute (as in this example). If the target is relative, it resolves in context to the parent resource, and if it is absolute, no context is needed.
Aha, great.
Still not sure I've understood it 100% though. So the requirement isn't for the value of that id property to be interpreted as a URI itself (relative or otherwise) - instead the expanded href of the self relation defined by hyper-schema for that object is what determines the URI which an object is defined as having, and this URI can be relative or not depending on your wishes?
(realised I used the wrong syntax for my example, I meant: {"rel": "self", "href": "uri://given-by-hyper-schema/{id}"} )
That would imply in particular that:
* There's nothing special about the 'id' property - it just happens to be one of the variables in the URI template href of the 'self' link relation
* Different objects might have different hyper-schemas applying to them with different 'self' link relations, meaning that two objects with the same 'id' value are not necessarily equal - eg:
if a hyperschema with {"rel": "self", "href": "some://uri//{id}"} applies to one object with an id: "foo" property and {"rel": "self", "href": "some://other/uri//{id}"} applies to another object, also with "id": "foo"
these are distinct objects with distinct URIs, neither of which is 'foo' taken as a relative URI relative to the URI of the parent tree.
If there's then a reference to {"$ref": "foo"}, this might not necessarily resolve to either of them; it would expand as a relative URI to http://the.parent/resource/namespace/foo and it would match any object which is an instance of a hyper-schema specifying a 'self' link relation whose target href expands (using the properties of that object) to that URI. It would look for that object first within the current JSON tree, and if that fails it would then attempt to fetch the URI in question in order to obtain the object.
The only problem that occurs to me with this approach is that while objects themselves can allow hyper-schema to expand their full URIs, "$ref" references to those objects can't and have to be based on the full expanded URIs.
That introduces a bit of a coupling / dependency, requiring knowledge of the (possibly arbitrary) URI template for the self relation in order to create a correct $ref object. And means there isn't symmetry between the way identity is expressed in referenced and referencing objects.
So, it would be nice if one could also do something like
{"$ref": true, "id": "foo"}
which would then expand the 'self' relation of the hyper-schema applying it to determine the actual URI for the $ref.
That then gives a nice symmetry (where desired) between the way identity is expressed for both referenced and referencing objects, and is quite close to what I was aiming for with identityProperties
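To make the symmetry concrete, here is a small sketch under the assumption that the 'self' href is a simple {name} template filled from the object's own properties. `expand` and `uri_of` are made-up helper names, and the percent-encoding choice is mine.

```python
import re
from urllib.parse import quote

def expand(href, obj):
    """Fill {name} variables in a link href from the object's properties."""
    return re.sub(r"\{([^{}]+)\}",
                  lambda m: quote(str(obj[m.group(1)]), safe=""),
                  href)

SELF_LINK = {"rel": "self", "href": "uri://given-by-hyper-schema/{id}"}

def uri_of(obj, self_link=SELF_LINK):
    """Same rule for a defined object and for a {"$ref": true, ...} stand-in."""
    return expand(self_link["href"], obj)

definition = {"id": "foo", "name": "the real object"}
reference = {"$ref": True, "id": "foo"}
print(uri_of(definition) == uri_of(reference))
```

Because both sides go through the same template, the author of the reference never needs to know what the template actually is.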
The resource should also exist at the URI if it is a URL (locatable). It would obviously be undesirable for the target to be in conflict or non-existent, as the server would be providing inconsistent data. If you are dealing with data where one resource can't be trusted to assign a definition for another resource, then the resource should be fetched (this is discussed in the security section of the json schema spec).
Hm ok. If (as I think you're saying above) the 'self' relation of the hyper-schema is what actually determines the URI, and hence I can choose to use an absolute URI (a URN even) for that, then this isn't a problem for me.
If not though, if having "foo" in {"id": "foo"} interpreted as a relative URI was my only choice, then there would be a problem with "The resource should also exist at the URI, if it is a URL".
In that case:
* What would I do if I wanted to use 'foo' just to express identity local to that JSON tree (which happens to be served from an HTTP URL, say) and I didn't want to make a separate subresource available for 'foo' over HTTP at the relative URI?
* Even if I am happy to make these subresources available at the relative URLs in this one case, what if I want to use that JSON snippet within a JSON tree at another URL? Do I have to then make the subresources available at URLs relative to that URL too?
Would you like to see the addition of a definition indicating properties that can be treated like anchor ids that can be referenced through fragment identifiers? I have been making sure that it is clear that that is not how rel="self" properties work, and it is slightly different from your pure identity proposal, but maybe that would be a good approach for preserving a pure URI referencing scheme while allowing for intra-document id-based referencing. If we did that, I think I would prefer to have an "anchor" attribute rather than an "identityProperties". For example, schema:
{
properties: {
id: {anchor: true},
$ref: {rel:"full", href:"{$ref}"}
}
}
bit confused by the $ref property in this schema - it looks like a hyperschema link object (which I thought had to go in a separate 'links' property) not a schema object?
instance:
{
id:"foo",
me:{$ref:"#foo"}
}
That sort of helps yeah, although one problem is that it assumes that values of anchor properties on instances of all kinds of different object schemas will happily coexist within the same anchor namespace for a given JSON tree.
In my case I have a bunch of different ID namespaces, so it would be nice to know that, eg {id: 1234} in the case of a 'release' object actually expands (based on hyperschema self) to "/releases/1234" whereas {id: 1234} in the case of a 'recording' object expands to "/recordings/1234" hence they are not the same identity. Hence my suggestion above, which maybe I misunderstood again but seemed to be compatible with your statement about using the hyper-schema's target URI.
Another problem might be that this introduces some ambiguity between the two kinds of fragment identifier (path-based and anchor-based) -- or would you ditch the path-based ones?
-- Thanks, Kris
Hmm ok.
So if I have a schema like so:
[
{
id: "first-schema",
properties: {
"id": {"type": "string"},
"myself": {"$ref": "first-schema"}
},
links: [
{"rel": "full", "href": "/first-namespace/{$ref}"},
{"rel": "self", "href": "/first-namespace/{id}"}
]
},
{
id: "second-schema",
properties: {
"id": {"type": "string"},
"myself": {"$ref": "second-schema"}
},
links: [
{"rel": "full", "href": "/second-namespace/{$ref}"},
{"rel": "self", "href": "/second-namespace/{id}"}
]
}
]
With the following instance:
[
{
"id": "foo",
"myself": {"$ref": "foo"}
},
{
"id": "foo",
"myself": {"$ref": "foo"}
}
]
And I'll get two separate objects, each with a link to itself? But with those 'foo' ids and $refs being expanded to different URIs (/first-namespace/foo and /second-namespace/foo) based on the hyper-schema full and self relations in force at the points those ids and $refs are found? And the bootstrapping phase which joins up the object graph will take this into account and join things up based on this notion of identity?
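Spelled out as code, the behaviour I'm asking about would look something like the sketch below. It assumes each instance item is paired with the schema at the same array position, and that the parent schema's 'full' link is the one applied to the $ref object under "myself"; both are my assumptions, and `expand` and `link_href` are hypothetical helpers.

```python
import re

def expand(href, obj):
    """Fill {name} variables in a link href from the object's properties."""
    return re.sub(r"\{([^{}]+)\}", lambda m: str(obj[m.group(1)]), href)

def link_href(links, rel):
    """Return the href of the first link with the given relation."""
    return next(link["href"] for link in links if link["rel"] == rel)

first_links = [{"rel": "full", "href": "/first-namespace/{$ref}"},
               {"rel": "self", "href": "/first-namespace/{id}"}]
second_links = [{"rel": "full", "href": "/second-namespace/{$ref}"},
                {"rel": "self", "href": "/second-namespace/{id}"}]

# Assumed pairing of each instance with the links of the schema it is under.
instances = [
    ({"id": "foo", "myself": {"$ref": "foo"}}, first_links),
    ({"id": "foo", "myself": {"$ref": "foo"}}, second_links),
]

# Pass 1: index every object by its expanded 'self' URI.
by_uri = {expand(link_href(links, "self"), obj): obj
          for obj, links in instances}

# Pass 2: swap each raw reference for the object its 'full' URI names.
for obj, links in instances:
    obj["myself"] = by_uri[expand(link_href(links, "full"), obj["myself"])]
```

After the second pass each object points at itself, and the two objects stay distinct even though both carry "id": "foo".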
I guess that means the object graph bootstrapping phase needs to be defined precisely, in terms of those 'self' and 'full' hyper-schema relations. It would need to be defined in terms that work not just for the json-ref syntax ($ref and id) but with whatever full and self relations are defined.
So yeah provisionally that sounds OK if you can find a watertight way to define that in a non-circular fashion.
I think there may still be some demons lurking though, relating to the problem I raised with the approach of trying to treat validation of raw reference objects, and validation of the bootstrapped object graph created from those objects, at the same semantic level. I think if an attempt was made to define the semantics formally that would reveal trouble.
IMO it really, really does need specifying precisely though, if not completely formally then with as much rigour as can be mustered.
Because there appear to be a lot of assumptions which you're making about the semantics of json-ref which aren't clear to others; given that the semantics of json-schema, hyper-schema and json-ref are at present so closely interconnected, it really needs pinning down. This thread has clarified some of the assumptions, I think, but I won't really feel like I understand fully (and won't have full confidence that there aren't logical flaws in the combined definition of json-ref and json-schema) until I see a more rigorous spec especially of the json-ref / object-graph-bootstrapping stuff.
I guess this is part of the pain of defining a standard without relying on assumptions implicit in the way some reference implementation does things.
(by the way: if the above works the way I described, I don't think the anchor property proposal is required, since you could instead use something like:
[{"rel": "full", "href": "my-app-specific-urn-scheme:{$ref}"}, {"rel": "self", "href": "my-app-specific-urn-scheme:{id}"}])
-Matt
{
"id": "my-card",
"name":"A schema that extends card",
"extends":"http://json-schema.org/card",
"properties":{
"another-card": "my-card"
}
}
Rather than:
{
"id": "my-card",
"name":"A schema that extends card",
"extends":{"$ref": "http://json-schema.org/card"},
"properties":{
"another-card": {"$ref": "my-card"}
}
}
Essentially one could describe the grammar of a schema as being either
an object (with all the standard attributes), or a string value, in
which case it should be interpreted as a URL and the target resource
should be used as the schema definition. We might also say that if the
string value is one of the primitive types, then that is a special
indicator that it is a primitively typed instance (since that is how
"type" works).
The advantage of this approach is that it provides more concise
schemas, and it might be slightly easier to describe/define and parse.
I think it would be backwards compatible in the sense that updated
schema validators could handle json-ref-based schemas or this design,
since we weren't using strings in place of schemas before.
What do you think, would this be a good change, or problematic?
If you view this just as some syntactic sugar on top of the existing referencing and identity mechanisms which allows you to avoid the {"$ref": …} boilerplate, I'd be inclined to say that, given these things are primarily (?) for machine-readable use, perhaps syntactic sugar should take a back seat to sorting out the semantics in a rigorous way. (If you wanted to add syntactic sugar, it might be easier to define as a separate step how to transform the sugar into a canonical form, and have the metaschemas only operate in terms of this canonical form, in order to stop the metaschemas getting too messy).
Another issue would be: what if you had a schema which is actually a union of 'string' together with some object type which might be supplied via a reference? When looking at an instance, how would you tell whether the string was syntactic sugar for the reference, or an actual string from the union type? (Perhaps your 'special indicator' suggestion was intended as a solution to this - not sure I understood it).
But, I think maybe what you're getting at is less about the special syntax, more that it would be nice if you could define two levels of identity semantics, a simpler one and then a full one based on hyperschema. Which might make bootstrapping easier if the metaschemas themselves stick to using only the simpler level 1 approach.
The level 1 approach would not require hyperschema relations or the use of json-ref, meaning it could be bootstrapped without requiring you to define the semantics of hyperschema stuff. It would just use a standard identity property, "id", with the rule that the object graphs are always to be interpreted as having been joined up by identifying objects in the graph based on the value of this identity property. It would not place any assumptions or constraints on how these identities/references are to be resolved outside the current document.
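As a rough idea of what level 1 joining could amount to, here is a sketch that uses nothing but the "id" property. It has to pick some rule for telling a definition from a mention of the same id; the rule below (the object with the most properties is the definition) is purely my assumption, and `join_by_id` is a hypothetical name.

```python
def join_by_id(tree, id_prop="id"):
    """Join a parsed JSON tree into a graph using only an identity property."""
    definitions = {}

    def collect(node):
        if isinstance(node, dict):
            key = node.get(id_prop)
            # Assumption: the object with the most properties is the definition.
            if isinstance(key, str) and len(node) > len(definitions.get(key, {})):
                definitions[key] = node
            for value in node.values():
                collect(value)
        elif isinstance(node, list):
            for item in node:
                collect(item)

    def link(node):
        if isinstance(node, dict):
            for name, value in list(node.items()):
                node[name] = link(value)
            key = node.get(id_prop)
            # Every object carrying an id is replaced by that id's definition.
            return definitions[key] if isinstance(key, str) else node
        if isinstance(node, list):
            return [link(item) for item in node]
        return node

    collect(tree)
    return link(tree)

tree = {"author": {"id": "kris", "name": "Kris"}, "reviewer": {"id": "kris"}}
graph = join_by_id(tree)
```

After joining, `graph["reviewer"]` and `graph["author"]` are the same object, and nothing in the process needed a URI, a base document, or hyperschema.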
Level 2 could then add extra hypermedia semantics on top of this, allowing hyperschema to be used to specify exactly how any identity/reference properties should be mapped to URIs, or URLs in order to resolve references outside the current document. That would then also allow you to use something other than the default of 'id' as the identity property, and to specify other hypermedia relations as desired.
Ideally the approach would allow for level 2 to be backwards compatible, at least with some interpretation of the current not-quite-pinned-down spec.
I'd like this since it decouples things quite a bit and could reduce the complexity of the json-schema semantics quite a bit. Think it would really require someone to sit down and rigorously hash this stuff out in a formalised way in order to pin it down, ideally writing a reference implementation as part of the process with a bunch of test cases for any awkward corner cases. I imagine a bunch of issues might come up during that process, but without actually attempting it it's a bit hard to say. I'd be willing to give it a go if I had a clear mandate to sort this stuff out, but I feel like I might be stepping on people's toes a bit, or that the standard has already advanced to a point where full backwards compatibility might turn out to be too much of a constraint to really get it right. I'd also probably need to spend a bunch more time making sure I fully understand the motivation and the mental model behind the draft hyperschema / json-ref specs.
Let me know what you think I guess...
Regards,
-Matt
On 9/14/2010 2:29 PM, Gary Court wrote:
>> How painful is it to have a reserved set of identifiers/URLs for the
>> primitive types? What issues would result from having "string",
>> "number", etc. as a fixed set of reserved identifiers. This seems like
>> a pretty small set of reserved identifiers. From the resolver's point
>> of view, handling these would be pretty trivial. From the schema
>> author's perspective this is only a problem if you had a preexisting
>> sibling resource (although if the resource is another schema, it
>> should be easy to avoid giving one of the reserved names) that you had to
>> reference, you could always use "./string" to achieve the same
>> relative resolution, I believe. Are there aspects that I haven't
>> considered?
> It's certainly accomplishable to reserve a set of keywords, it just
> seems like such a... dirty way of doing it. You're then mixing
> keywords with URIs, meaning the semantics of the property value depend
> on its value. Every property in JSON Schema to this point has had a
> different instance type to indicate a different type of data. For the
> type property: if it's a string it's a type, if it's an object it's a
> schema, if it's an array it's a union of types. With this change, a
> string is either a type or a reference (URI) and the contents must be
> examined to determine its meaning. As I said in another post, despite
> its verbose nature, {"$ref":"uri"} is unmistakable as a reference;
> it's even easier to read as you don't have to know what property it
> belongs to to understand that it is a reference.
I don't think that the meaning of "string" or "number" is semantically
all that different, one can think of it as being defined as a reference to
a built-in schema that accepts string values or number values. The only
thing that is special is that it doesn't follow normal relative URL
resolution rules (because it would normally be treated as a relative
URL). If you can swallow the URL interpretation, having built-in schemas
seems reasonably consistent.
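A small sketch of how a resolver could treat the reserved names, assuming the reserved set is the list of simple type names (that list, the `resolve_type` helper and the example base URL are all assumptions here). It also shows the "./string" escape mentioned above for reaching a sibling resource that happens to share a reserved name.

```python
from urllib.parse import urljoin

# Assumed reserved identifiers: the simple type names.
RESERVED = {"string", "number", "integer", "boolean",
            "object", "array", "null", "any"}

def resolve_type(value, base_uri):
    """Classify a string-valued schema as built-in or as a URL reference."""
    if value in RESERVED:
        return ("builtin", value)          # skips relative URL resolution
    return ("schema", urljoin(base_uri, value))

base = "http://example.com/schemas/card"   # hypothetical schema location
print(resolve_type("string", base))
print(resolve_type("./string", base))
print(resolve_type("my-card", base))
```

The only special case is the membership test on the first line of the function; everything else is ordinary relative resolution.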
> But I'll be honest, I'm indifferent about this decision. It does have
> the advantage of reducing the amount of schema you have to write, and
> makes schema referencing easier for simple validators. I think I'd
> feel better about this approach if string values for types were always
> URIs, and you would have special URIs to denote the primitive types
> like "json:string", "json:object", etc.
>
A custom protocol would indeed be more correct :). What is more
important, convenience (and backwards compatibility) or correctness?
--
Thanks,
Kris
On 9/16/2010 2:59 PM, Gary Court wrote:
>> Good point, we would need to modify "requires".
> Do you mean you agree we should abandon "schemas can be URIs", or that
> the "requires" attribute needs to be modified to support this? I
> don't see how we can do the latter without breaking backwards
> compatibility.
I think we solve this by simply not allowing/supporting URIs as the
value for the requires attribute. This is one particular attribute that
takes schema, but probably rarely actually needs to reference other
schemas. Most of the time this is used to just add an additional
constraint that is more complex than simple property existence.
Therefore "requires" simply behaves like it did in draft 2 (so it is
back-compat). Of course you can still reference another schema with
"requires":
"requires":{
"extends":"http://some.other/schema"
}
--
Thanks,
Kris
On 9/18/2010 11:19 AM, Gary Court wrote:
> Ok, I was under the impression that a schema was a union type of
> ["string", "object"]. But as you're stating here, we can explicitly
> choose which attributes accept reference URIs or not. Don't you think
> that will be confusing for people? Having to know that requires is
> different than all other attributes that accept schemas?
We could alternately separate requiring a property and requiring a
schema into separate properties. We could have a "requires" or
"requiresProperty" and a "requiresSchema" so there is no ambiguity (or
decide one of them is not needed).
--
Thanks,
Kris